[How To] Calculate Similarity Between Two Strings in C#

Compute similarity of two names

This can be used in many cases, such as comparing names to find out how similar they are. For example, if you are working for an organization that tracks down suspicious people with the same or similar names, this algorithm will be helpful to you. I have implemented the Levenshtein distance algorithm to calculate the similarity between two strings.

Levenshtein distance algorithm

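This edit distance algorithm calculates the minimum number of single-character edits (insertions, deletions, or substitutions) required to transform the source string into the target string. For more information about the Levenshtein algorithm, see the Wikipedia article on Levenshtein distance.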
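To make the "number of steps" concrete, the sketch below fills the same kind of distance table. It then walks back through the table to list one shortest sequence of edits. The class EditSteps and its Explain method are hypothetical names used only for this illustration; they are not part of the original code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not from the original post): lists one shortest
// sequence of edits that turns source into target.
public static class EditSteps
{
    public static List<string> Explain(string source, string target)
    {
        int m = source.Length, n = target.Length;

        // d[i, j] = edits needed to turn the first i chars of source into the first j chars of target
        var d = new int[m + 1, n + 1];
        for (int i = 0; i <= m; i++) d[i, 0] = i;
        for (int j = 0; j <= n; j++) d[0, j] = j;
        for (int i = 1; i <= m; i++)
            for (int j = 1; j <= n; j++)
            {
                int cost = source[i - 1] == target[j - 1] ? 0 : 1;
                d[i, j] = Math.Min(Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1), d[i - 1, j - 1] + cost);
            }

        // Walk back from the bottom-right cell, following a move that produced each value.
        var steps = new List<string>();
        int r = m, c = n;
        while (r > 0 || c > 0)
        {
            if (r > 0 && c > 0 && source[r - 1] == target[c - 1] && d[r, c] == d[r - 1, c - 1])
            {
                r--; c--;   // characters already match: no edit needed
            }
            else if (r > 0 && c > 0 && d[r, c] == d[r - 1, c - 1] + 1)
            {
                steps.Add($"substitute '{source[r - 1]}' with '{target[c - 1]}'");
                r--; c--;
            }
            else if (r > 0 && d[r, c] == d[r - 1, c] + 1)
            {
                steps.Add($"delete '{source[r - 1]}'");
                r--;
            }
            else
            {
                steps.Add($"insert '{target[c - 1]}'");
                c--;
            }
        }

        steps.Reverse();   // report the edits from left to right
        return steps;
    }
}
```

For "kitten" and "sitting" this returns three steps: substitute 'k' with 's', substitute 'e' with 'i', and insert 'g'. The number of steps always equals the Levenshtein distance. When several edit sequences are equally short, the walk-back picks only one of them.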

Source code

    using System;

    public static class ComputeSimilarity
    {
        /// <summary>
        /// Calculates the similarity of two strings as a value from 0.0 to 1.0.
        /// </summary>
        /// <param name="source">Source string to compare with.</param>
        /// <param name="target">Target string to compare.</param>
        /// <returns>Similarity between the two strings, from 0.0 (nothing in common) to 1.0 (identical).</returns>
        public static double CalculateSimilarity(this string source, string target)
        {
            if ((source == null) || (target == null)) return 0.0;
            if (source == target) return 1.0;   // also covers two empty strings
            if ((source.Length == 0) || (target.Length == 0)) return 0.0;

            int stepsToSame = ComputeLevenshteinDistance(source, target);
            return 1.0 - ((double)stepsToSame / Math.Max(source.Length, target.Length));
        }

        /// <summary>
        /// Returns the minimum number of single-character insertions, deletions,
        /// or substitutions required to transform the source string into the target string.
        /// </summary>
        static int ComputeLevenshteinDistance(string source, string target)
        {
            if (source == null) throw new ArgumentNullException(nameof(source));
            if (target == null) throw new ArgumentNullException(nameof(target));
            if (source == target) return 0;   // identical strings need no edits

            int sourceLength = source.Length;
            int targetLength = target.Length;

            // Step 1: turning a string into (or out of) an empty string takes one edit per character
            if (sourceLength == 0)
                return targetLength;

            if (targetLength == 0)
                return sourceLength;

            // distance[i, j] = edits needed to turn the first i chars of source
            // into the first j chars of target
            int[,] distance = new int[sourceLength + 1, targetLength + 1];

            // Step 2: first column is pure deletions, first row is pure insertions
            for (int i = 0; i <= sourceLength; i++)
                distance[i, 0] = i;
            for (int j = 0; j <= targetLength; j++)
                distance[0, j] = j;

            for (int i = 1; i <= sourceLength; i++)
            {
                for (int j = 1; j <= targetLength; j++)
                {
                    // Step 3: substitution is free when the characters already match
                    int cost = (target[j - 1] == source[i - 1]) ? 0 : 1;

                    // Step 4: cheapest of deletion, insertion, or substitution
                    distance[i, j] = Math.Min(
                        Math.Min(distance[i - 1, j] + 1, distance[i, j - 1] + 1),
                        distance[i - 1, j - 1] + cost);
                }
            }

            return distance[sourceLength, targetLength];
        }
    }
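The distance table stores one integer per pair of character positions, but each row depends only on the row before it. If memory matters for long strings, a variant along the following lines keeps just two rows. LevenshteinTwoRow.Distance is a hypothetical name for this sketch, not part of the original post. It uses the same recurrence, so it should return the same distances as ComputeLevenshteinDistance.

```csharp
using System;

// Hypothetical variant (not from the original post): same recurrence as
// ComputeLevenshteinDistance, but only two rows of the table are kept.
public static class LevenshteinTwoRow
{
    public static int Distance(string source, string target)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (target == null) throw new ArgumentNullException(nameof(target));

        var previous = new int[target.Length + 1];   // row i - 1 of the full table
        var current = new int[target.Length + 1];    // row i being filled in

        // Row 0: turning "" into the first j characters of target takes j insertions.
        for (int j = 0; j <= target.Length; j++)
            previous[j] = j;

        for (int i = 1; i <= source.Length; i++)
        {
            current[0] = i;   // column 0: i deletions
            for (int j = 1; j <= target.Length; j++)
            {
                int cost = source[i - 1] == target[j - 1] ? 0 : 1;
                current[j] = Math.Min(
                    Math.Min(previous[j] + 1, current[j - 1] + 1),
                    previous[j - 1] + cost);
            }

            // The row just finished becomes "previous" for the next iteration.
            var swap = previous;
            previous = current;
            current = swap;
        }

        // After the final swap, previous holds the last row of the table.
        return previous[target.Length];
    }
}
```

For strings of lengths m and n this still performs m × n cell updates. Only the memory drops, from O(m·n) to O(n). Passing the shorter string as target keeps the rows as small as possible.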

Usage

    double percentage = ComputeSimilarity.CalculateSimilarity("John", "Joan");

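The variable percentage holds the similarity as a fraction from 0.0 to 1.0 (for example, 0.80 = 80%). For "John" and "Joan" the Levenshtein distance is 1 (one substitution, h → a). The longer string has 4 characters, so the result is 1 - 1/4 = 0.75, or 75%. Because CalculateSimilarity is declared as an extension method, "John".CalculateSimilarity("Joan") returns the same value.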
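The comparison is case-sensitive, so CalculateSimilarity("John", "john") gives 0.75, not 1.0. For the name-tracking scenario mentioned at the start, a sketch like the one below trims and lower-cases names before comparing them. It then keeps only the matches at or above a cut-off. NameScreening, FindSimilar, and the 0.75 threshold in the example are assumptions for illustration. The class carries its own compact copy of the similarity formula so that it compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical name-screening helper (not from the original post).
public static class NameScreening
{
    // Same formula as CalculateSimilarity: 1 - distance / length of the longer string.
    public static double Similarity(string a, string b)
    {
        if (a == null || b == null) return 0.0;
        if (a == b) return 1.0;
        if (a.Length == 0 || b.Length == 0) return 0.0;
        return 1.0 - (double)Distance(a, b) / Math.Max(a.Length, b.Length);
    }

    // Plain Levenshtein distance, computed two rows at a time.
    static int Distance(string a, string b)
    {
        var prev = new int[b.Length + 1];
        var curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) prev[j] = j;
        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(Math.Min(prev[j] + 1, curr[j - 1] + 1), prev[j - 1] + cost);
            }
            var swap = prev; prev = curr; curr = swap;
        }
        return prev[b.Length];
    }

    // Trim and lower-case so that "John " and "john" compare as identical.
    static string Normalize(string name) => name.Trim().ToLowerInvariant();

    // Returns every known name whose similarity to the query is at least the threshold.
    public static List<string> FindSimilar(string query, IEnumerable<string> knownNames, double threshold)
    {
        string q = Normalize(query);
        return knownNames
            .Where(n => Similarity(q, Normalize(n)) >= threshold)
            .ToList();
    }
}
```

For example, NameScreening.FindSimilar("john", new[] { "Joan", "JOHN", "Peter" }, 0.75) returns "Joan" (similarity 0.75) and "JOHN" (identical after normalization). Where to set the threshold depends on how many false matches you can tolerate.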
Reviewed by TechDoubts on 7:51 AM

