SIMILARITY DETECTION ALGORITHM
In the artificial intelligence and computer vision world, we hear a lot about supervised algorithms, unsupervised algorithms, and image classification. Alongside these, in many cases a similarity detection algorithm is needed as well, because a model on its own can wrongly predict how similar something is to the actual input. There are many use cases, and a few of them are:
1) Learning image or text similarity
2) Detection based on two non-zero vectors, which is used in NLP
3) Comparison of strings
To date I have used three algorithms for three different cases: the first is the Siamese algorithm for learning image similarity, the second is cosine similarity for text similarity, document detection, and so on, and the third is Levenshtein distance for comparing words, letters, and strings.
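As a small illustration of cosine similarity on two non-zero vectors, here is a minimal sketch in plain Python. The function name `cosine_similarity` and the word-count vectors are made up for this example; in a real NLP pipeline the vectors would come from something like TF-IDF or word embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Hypothetical word-count vectors for three short documents
doc_a = [1, 2, 0, 1]
doc_b = [2, 4, 0, 2]   # same direction as doc_a, so the score is close to 1
doc_c = [0, 0, 3, 0]   # shares no terms with doc_a, so the score is 0

print(cosine_similarity(doc_a, doc_b))
print(cosine_similarity(doc_a, doc_c))
```

The score depends only on the direction of the vectors, not their length, which is why a document and a twice-as-long copy of it still score as near-identical.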
A brief description follows of the two algorithms that have shown good results: the first is the Siamese algorithm and the second is Levenshtein distance.
Siamese algorithm: This is widely used in deep learning applications and is known as the Siamese network. In brief, the Siamese architecture uses twin branches, called identical subnetworks, which share the same weights. Each branch takes one input and forwards it to the next phase, feature extraction, through convolutional or recurrent layers. This phase is also called input encoding. The network is trained to reduce the distance between two similar inputs and to increase the distance between two inputs that are not similar. In the next phase, a Euclidean or cosine distance is computed on the encoded inputs to find the similarity between them. During training this distance is fed into the loss function; it is not the loss function itself.
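To make the two phases concrete, here is a minimal sketch in plain Python. It is not a trainable network: `encode` is a hypothetical one-layer linear encoder standing in for the convolutional or recurrent branch, and the contrastive loss with a `margin` parameter is one common choice assumed for this example, since the exact loss is not fixed by the description above.

```python
import math

def encode(x, weights):
    """Hypothetical one-layer linear encoder.
    Both branches call it with the SAME weights (the 'twin' idea)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def euclidean_distance(u, v):
    """Distance between two encoded inputs."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def contrastive_loss(emb_a, emb_b, is_similar, margin=1.0):
    """Assumed loss: small when similar pairs are close and
    dissimilar pairs are at least `margin` apart."""
    d = euclidean_distance(emb_a, emb_b)
    if is_similar:
        return d ** 2                      # pull similar pairs together
    return max(0.0, margin - d) ** 2       # push dissimilar pairs apart

# Shared weights (identity matrix here, purely for illustration)
shared_weights = [[1.0, 0.0], [0.0, 1.0]]

emb_1 = encode([0.0, 0.0], shared_weights)
emb_2 = encode([3.0, 4.0], shared_weights)

print(euclidean_distance(emb_1, emb_2))                 # 5.0
print(contrastive_loss(emb_1, emb_2, is_similar=True))  # 25.0, far apart but labelled similar
print(contrastive_loss(emb_1, emb_2, is_similar=False)) # 0.0, already beyond the margin
```

In a real Siamese network the weights would be updated by gradient descent on this loss, which is what moves similar pairs closer and dissimilar pairs farther apart.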
Applications of the Siamese algorithm include image matching, recommendation systems, text similarity, face recognition, and so on.
Levenshtein distance: It is widely used for string matching and is also known as edit distance. The algorithm was introduced by Vladimir Levenshtein in 1965.
Insertion, deletion and substitution operations
The idea is to compare the characters of one word with the characters of the word it is being compared against. The distance is the minimum number of single-character edits needed to turn one string into the other. It uses three basic operations on the string: insertion, deletion, and substitution.
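The three operations can be sketched as small string helpers. The function names `insert`, `delete`, and `substitute` are hypothetical and chosen only for this illustration; the example walks the well-known pair "kitten" and "sitting" through three edits.

```python
def insert(s, i, ch):
    """Insert character ch before position i."""
    return s[:i] + ch + s[i:]

def delete(s, i):
    """Delete the character at position i."""
    return s[:i] + s[i + 1:]

def substitute(s, i, ch):
    """Replace the character at position i with ch."""
    return s[:i] + ch + s[i + 1:]

word = "kitten"
word = substitute(word, 0, "s")   # kitten -> sitten
word = substitute(word, 4, "i")   # sitten -> sittin
word = insert(word, 6, "g")       # sittin -> sitting
print(word)                       # sitting

print(delete("cart", 3))          # car
```

Each call counts as one edit, so this path costs 3, which is also the Levenshtein distance between the two words.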
Matrix calculation for similarity prediction
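To make the calculation easy, it uses a matrix in which the rows correspond to the characters of one word and the columns to the characters of the other, with an extra first row and column for the empty string. Each cell holds the edit distance between the prefixes of the two words that end at that row and column. The cells are filled in from the number of insertions, deletions, and substitutions needed so far, and the bottom-right cell gives the distance between the two full words.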
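The matrix calculation can be sketched as follows. This is a minimal, unoptimized version of the standard dynamic-programming approach; the function name `levenshtein_matrix` is an assumption for this example, and every operation is given a cost of 1.

```python
def levenshtein_matrix(a, b):
    """Build the full edit-distance matrix for strings a and b.
    matrix[i][j] is the distance between a[:i] and b[:j]."""
    rows, cols = len(a) + 1, len(b) + 1
    matrix = [[0] * cols for _ in range(rows)]

    # First column: delete i characters to reach the empty string
    for i in range(rows):
        matrix[i][0] = i
    # First row: insert j characters starting from the empty string
    for j in range(cols):
        matrix[0][j] = j

    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            matrix[i][j] = min(
                matrix[i - 1][j] + 1,         # deletion
                matrix[i][j - 1] + 1,         # insertion
                matrix[i - 1][j - 1] + cost,  # substitution (free on a match)
            )
    return matrix

matrix = levenshtein_matrix("kitten", "sitting")
for row in matrix:
    print(row)
print("distance:", matrix[-1][-1])   # distance: 3
```

Only the previous row is needed to compute the next one, so a production version could keep two rows instead of the whole matrix; the full matrix is kept here because it mirrors the description.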
Levenshtein distance is widely used in text similarity and DNA clustering.