Utilising SimMetrics in SSMS 2014

SimMetrics is a Similarity Metric Library created by UK Sheffield University. Full Project here.

This library is very useful for providing fuzzy match functions that can be used to match customer data within a relational database environment.

The functions include edit distance between strings (e.g. Levenshtein, Gotoh, JaroWinkler etc)  as well as other metrics, (e.g Chapman).

Below are some simple instructions that allow you to use this library in SQL Server Management Studio 2014 without the need for Visual Studio.

I have used the very helpful article from here, but had to play around with Visual Studio in order to get the dll to work. Below are instructions that allow you to skip that step (hopefully).

Pop the following files in c:\Temp directory

TextFunctions.zip

Then unzip them and run the following SQL in SSMS (remembering to update the reference at the top to your database name).

EXEC sp_configure ‘clr enabled’, 1
RECONFIGURE

GO

USE <Insert DB name here>
GO

Declare @TextFunctionDllPath nvarchar(300)

SELECT @TextFunctionDllPath = ‘C:\Temp\TextFunctions.dll’
————————-

CREATE ASSEMBLY [TextFunctions]
AUTHORIZATION [dbo]
FROM @TextFunctionDllPath
WITH PERMISSION_SET = SAFE
GO

CREATE FUNCTION Levenstein(@firstword NVARCHAR(255),@secondword NVARCHAR(255))
RETURNS float EXTERNAL NAME TextFunctions.StringMetrics.Levenstein
GO

CREATE FUNCTION NeedlemanWunch(@firstword NVARCHAR(255),@secondword NVARCHAR(255))
RETURNS float EXTERNAL NAME TextFunctions.StringMetrics.NeedlemanWunch
GO

CREATE FUNCTION SmithWaterman(@firstword NVARCHAR(255),@secondword NVARCHAR(255))
RETURNS float EXTERNAL NAME TextFunctions.StringMetrics.SmithWaterman
GO

CREATE FUNCTION SmithWatermanGotoh(@firstword NVARCHAR(255),@secondword NVARCHAR(255))
RETURNS float EXTERNAL NAME TextFunctions.StringMetrics.SmithWatermanGotoh
GO

CREATE FUNCTION SmithWatermanGotohWindowedAffine(@firstword NVARCHAR(255),@secondword NVARCHAR(255))
RETURNS float EXTERNAL NAME TextFunctions.StringMetrics.SmithWatermanGotohWindowedAffine
GO

CREATE FUNCTION Jaro(@firstword NVARCHAR(255),@secondword NVARCHAR(255))
RETURNS float EXTERNAL NAME TextFunctions.StringMetrics.Jaro
GO

CREATE FUNCTION JaroWinkler(@firstword NVARCHAR(255),@secondword NVARCHAR(255))
RETURNS float EXTERNAL NAME TextFunctions.StringMetrics.JaroWinkler
GO

CREATE FUNCTION ChapmanLengthDeviation(@firstword NVARCHAR(255),@secondword NVARCHAR(255))
RETURNS float EXTERNAL NAME TextFunctions.StringMetrics.ChapmanLengthDeviation
GO

CREATE FUNCTION ChapmanMeanLength(@firstword NVARCHAR(255),@secondword NVARCHAR(255))
RETURNS float EXTERNAL NAME TextFunctions.StringMetrics.ChapmanMeanLength
GO

CREATE FUNCTION QGramsDistance(@firstword NVARCHAR(255),@secondword NVARCHAR(255))
RETURNS float EXTERNAL NAME TextFunctions.StringMetrics.QGramsDistance
GO

CREATE FUNCTION BlockDistance(@firstword NVARCHAR(255),@secondword NVARCHAR(255))
RETURNS float EXTERNAL NAME TextFunctions.StringMetrics.BlockDistance
GO

CREATE FUNCTION CosineSimilarity(@firstword NVARCHAR(255),@secondword NVARCHAR(255))
RETURNS float EXTERNAL NAME TextFunctions.StringMetrics.CosineSimilarity
GO

CREATE FUNCTION DiceSimilarity(@firstword NVARCHAR(255),@secondword NVARCHAR(255))
RETURNS float EXTERNAL NAME TextFunctions.StringMetrics.DiceSimilarity
GO

CREATE FUNCTION EuclideanDistance(@firstword NVARCHAR(255),@secondword NVARCHAR(255))
RETURNS float EXTERNAL NAME TextFunctions.StringMetrics.EuclideanDistance
GO

CREATE FUNCTION JaccardSimilarity(@firstword NVARCHAR(255),@secondword NVARCHAR(255))
RETURNS float EXTERNAL NAME TextFunctions.StringMetrics.JaccardSimilarity
GO

CREATE FUNCTION MatchingCoefficient(@firstword NVARCHAR(255),@secondword NVARCHAR(255))
RETURNS float EXTERNAL NAME TextFunctions.StringMetrics.MatchingCoefficient
GO

CREATE FUNCTION MongeElkan(@firstword NVARCHAR(255),@secondword NVARCHAR(255))
RETURNS float EXTERNAL NAME TextFunctions.StringMetrics.MongeElkan
GO

CREATE FUNCTION OverlapCoefficient(@firstword NVARCHAR(255),@secondword NVARCHAR(255))
RETURNS float EXTERNAL NAME TextFunctions.StringMetrics.OverlapCoefficient
GO

CREATE FUNCTION dbo.CompareStringMetrics(@firstword [nvarchar](255), @secondword [nvarchar](255))
RETURNS TABLE
AS
RETURN
(
    SELECT dbo.Jaro(@firstword, @secondword) as Score, ‘Jaro’ as Metric
    UNION SELECT dbo.JaroWinkler(@firstword, @secondword), ‘JaroWinkler’
    UNION SELECT dbo.BlockDistance(@firstword, @secondword), ‘BlockDistance’
    UNION SELECT dbo.ChapmanLengthDeviation(@firstword, @secondword), ‘ChapmanLengthDeviation’
    UNION SELECT dbo.ChapmanMeanLength(@firstword, @secondword), ‘ChapmanMeanLength’
    UNION SELECT dbo.CosineSimilarity(@firstword, @secondword), ‘CosineSimilarity’
    UNION SELECT dbo.DiceSimilarity(@firstword, @secondword), ‘DiceSimilarity’
    UNION SELECT dbo.EuclideanDistance(@firstword, @secondword), ‘EuclideanDistance’
    UNION SELECT dbo.JaccardSimilarity(@firstword, @secondword), ‘JaccardSimilarity’
    UNION SELECT dbo.Levenstein(@firstword, @secondword), ‘Levenstein’
    UNION SELECT dbo.MatchingCoefficient(@firstword, @secondword), ‘MatchingCoefficient’
    UNION SELECT dbo.MongeElkan(@firstword, @secondword), ‘MongeElkan’
    UNION SELECT dbo.NeedlemanWunch(@firstword, @secondword), ‘NeedlemanWunch’
    UNION SELECT dbo.OverlapCoefficient(@firstword, @secondword), ‘OverlapCoefficient’
    UNION SELECT dbo.QGramsDistance(@firstword, @secondword), ‘QGramsDistance’
    UNION SELECT dbo.SmithWaterman(@firstword, @secondword), ‘SmithWaterman’
    UNION SELECT dbo.SmithWatermanGotoh(@firstword, @secondword), ‘SmithWatermanGotoh’
    UNION SELECT dbo.SmithWatermanGotohWindowedAffine(@firstword, @secondword), ‘SmithWatermanGotohWindowedAffine’
)

Now you should find that the functions are available in SQL Server Management Studio under Object Explorer – Database | Programmability | Functions | Scalar-Valued Functions

Usage for CompareStringMetrics Function:

SELECT metric, Score
FROM dbo.CompareStringMetrics(‘katee smith’,’katie smith’) AS CompareStringMetrics

You could also consider using the built-in soundex function:

select soundex(‘katee smith’)