Similarity detection algorithms for good accuracy

Posted By: Sharmishtha Paliwal | 31st May 2023


In the artificial intelligence and computer vision world, we hear a lot about supervised algorithms, unsupervised algorithms and image classification. Alongside these, many cases also require similarity detection algorithms, because an ordinary prediction can misjudge how similar something actually is to the input. There are many use cases; a few of them are:

1) Learning image or text similarity

2) Measuring similarity between two non-zero vectors, as used in NLP

3) String comparison


So far I have used three types of algorithm for three different cases: a Siamese network for learning image similarity, cosine similarity for text similarity, document detection and similar tasks, and Levenshtein distance for comparing words, letters and strings.
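Cosine similarity measures the angle between two non-zero vectors: their dot product divided by the product of their lengths, so vectors pointing in the same direction score 1 and texts with no words in common score 0. The following is a minimal pure-Python sketch under stated assumptions, not a production implementation. The `bag_of_words` helper is a hypothetical, deliberately simple text representation (lowercased whitespace tokens with raw counts); in practice TF-IDF weights or learned embeddings would usually take its place.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero sparse vectors (dicts of term -> weight)."""
    dot = sum(weight * b.get(term, 0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def bag_of_words(text):
    # Hypothetical, very simple representation: lowercase tokens split on whitespace, raw counts.
    return Counter(text.lower().split())


doc1 = bag_of_words("the cat sat on the mat")
doc2 = bag_of_words("the cat lay on the mat")
print(round(cosine_similarity(doc1, doc2), 3))  # → 0.875
```

Here the two sentences share every word except "sat"/"lay", so the dot product is 7 and each vector has squared length 8, giving 7/8.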

Below is a brief overview of the two algorithms that have shown good results: first the Siamese network, and second the Levenshtein distance.


Siamese algorithm: This approach is widely used in deep learning applications and is better known as a Siamese network. The Siamese architecture uses twin branches, called identical subnetworks, which share the same weights. Each branch takes one input and passes it to the next phase, feature extraction, built from convolutional or recurrent layers. This phase, also called input encoding, is trained to reduce the distance between the encodings of two similar inputs and to increase it when the inputs are not similar. In the next phase, the two encoded inputs are compared with a distance metric such as Euclidean or cosine distance to measure how similar they are; during training, this distance is fed into a loss function (for example, contrastive loss).
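To make the twin-branch idea concrete, here is a toy, pure-Python sketch of a single forward pass. It assumes a hypothetical `TwinEncoder` made of one linear layer with ReLU and random, untrained weights standing in for the convolutional or recurrent layers, Euclidean distance between the encodings, and a contrastive loss with a margin of 1.0. The key point is that both inputs go through the same encoder instance, so the two branches share weights. A real Siamese network would be built in a framework such as PyTorch or TensorFlow and would train those shared weights by backpropagating the loss, which this sketch omits.

```python
import math
import random


class TwinEncoder:
    """One shared encoder: both Siamese branches call this same instance, so they share weights."""

    def __init__(self, in_dim, out_dim, seed=0):
        rng = random.Random(seed)
        # Hypothetical single linear layer with random, untrained weights.
        self.weights = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)]

    def encode(self, x):
        # Linear projection followed by ReLU (stands in for conv/recurrent feature extraction).
        return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in self.weights]


def euclidean_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(distance, is_similar, margin=1.0):
    # Similar pairs are penalised for being far apart;
    # dissimilar pairs are penalised only while they are closer than the margin.
    if is_similar:
        return distance ** 2
    return max(0.0, margin - distance) ** 2


encoder = TwinEncoder(in_dim=4, out_dim=3)
x1 = [0.2, 0.1, 0.9, 0.4]
x2 = [0.25, 0.1, 0.85, 0.4]
d = euclidean_distance(encoder.encode(x1), encoder.encode(x2))  # branch 1 and branch 2, same weights
print(d, contrastive_loss(d, is_similar=True), contrastive_loss(d, is_similar=False))
```

With `is_similar=True` the loss equals the squared distance, so training would pull the two encodings together; with `is_similar=False` the loss drops to zero once the encodings are at least `margin` apart.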




Applications of Siamese networks include image matching, recommendation systems, text similarity, face recognition and more.

Levenshtein distance: It is widely used for string matching, is also known as edit distance, and was introduced by Vladimir Levenshtein in 1965.


Insertion, deletion and substitution operations


The idea is to compare the characters of one string with the characters of the other. The distance is the minimum number of single-character edits needed to turn one string into the other, using three operations: insertion, deletion and substitution.
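This definition translates almost directly into a recursive function: at each step we delete a character, insert one, or substitute one (at no cost when the two characters already match), and keep the cheapest option. This is a minimal sketch for illustration; `functools.lru_cache` memoises repeated subproblems, and since the recursion depth grows with the combined string length, it suits short strings only.

```python
from functools import lru_cache


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""

    @lru_cache(maxsize=None)
    def dist(i: int, j: int) -> int:
        # dist(i, j) = edit distance between the first i chars of a and the first j chars of b.
        if i == 0:
            return j  # insert the remaining j characters of b
        if j == 0:
            return i  # delete the remaining i characters of a
        cost = 0 if a[i - 1] == b[j - 1] else 1
        return min(
            dist(i - 1, j) + 1,         # deletion of a[i - 1]
            dist(i, j - 1) + 1,         # insertion of b[j - 1]
            dist(i - 1, j - 1) + cost,  # substitution (free if the characters match)
        )

    return dist(len(a), len(b))


print(levenshtein("kitten", "sitting"))  # → 3
```

For "kitten" and "sitting" the three edits are: substitute k with s, substitute e with i, and insert g at the end.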




Matrix calculation for similarity prediction


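To make the calculation efficient, it uses a matrix in which the rows correspond to the characters of one word and the columns to the characters of the other word, each with an extra leading row or column for the empty string. Each cell stores the edit distance between the corresponding prefixes of the two words, computed from its upper, left and upper-left neighbours, and the bottom-right cell gives the final distance.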
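The matrix can be filled iteratively, row by row. In this sketch, `levenshtein_matrix` is a hypothetical helper name; row `i` and column `j` stand for the prefixes `a[:i]` and `b[:j]`, the first row and column hold the cost of building a prefix from the empty string, and every other cell takes the minimum of its upper (deletion), left (insertion) and upper-left (substitution or match) neighbours.

```python
from typing import List


def levenshtein_matrix(a: str, b: str) -> List[List[int]]:
    """Build the (len(a)+1) x (len(b)+1) table; cell [i][j] is the distance between a[:i] and b[:j]."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # a[:i] -> "" needs i deletions
    for j in range(cols):
        d[0][j] = j  # "" -> b[:j] needs j insertions
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion (cell above)
                d[i][j - 1] + 1,         # insertion (cell to the left)
                d[i - 1][j - 1] + cost,  # substitution or match (diagonal)
            )
    return d


table = levenshtein_matrix("cat", "cut")
for row in table:
    print(row)
print("distance:", table[-1][-1])  # bottom-right cell → distance: 1
```

Keeping the whole table is useful for inspecting it or tracing back which edits were made; if only the distance is needed, keeping two rows at a time is enough, so memory grows with the length of one string instead of the product of both lengths.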

It is widely used in text similarity and DNA clustering.


About Author

Sharmishtha Paliwal

Sharmishtha Paliwal is an accomplished backend developer with extensive industry experience. She has successfully contributed to multiple projects, including I-infinity, Call Recording Library, Trip Congo, and Text-OCR. Sharmishtha is proficient in utilizing the latest technologies and is adept at working with tools and frameworks such as Python, Django Rest Framework, Flask Framework, Node.js, SQL, and NoSQL databases. She has also acquired expertise in trending skills such as Artificial Intelligence, Machine Learning, Data Analytics, PySpark, and Data Visualization using Grafana, Matplotlib, and Seaborn.
