Similarities and Dissimilarities (Distances)
Similarities measure the relatedness of a pair of samples, i.e. they measure how close two samples
are to each other. Dissimilarities (distances) measure the number of differences between a pair of
samples.
Absolute and relative distances
We can distinguish absolute distances from relative distances. For absolute distances, the
number of differing positions is counted. For relative distances, the absolute distance
is normalized to (divided by) the total number of positions that are compared.
Normalization
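Distances are usually normalized (relative distances) to compensate for different sample sizes and
different total numbers of positions. Relative values therefore lie in the range 0 to 1. A relative
similarity of 0 means no relatedness between two samples and 1 means a total match. A relative
distance runs the other way: 0 means a total match and 1 means that the samples differ at every
compared position.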
Example
If 10 character positions are compared between two samples and the samples agree at two of
those positions, they have an absolute similarity of 2 and a relative similarity of 0.2 (2 out of
10). The absolute dissimilarity, or absolute distance, is 8 (10 minus 2), and the relative distance
is 0.8 (8 differing positions out of 10).
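The arithmetic of this example can be reproduced with a short Python sketch. It assumes that the two samples are character strings of equal length and that a position counts as a match only when the characters are identical. The function name compare_samples and the two sample strings are hypothetical and chosen purely for illustration; real distance measures may treat gaps, missing values, or weights differently.

```python
def compare_samples(a, b):
    """Position-wise comparison of two equal-length character samples.

    Hypothetical helper for illustration: a position is a match only
    if both samples carry the identical character there.
    """
    if len(a) != len(b):
        raise ValueError("samples must have the same number of positions")
    total = len(a)
    if total == 0:
        raise ValueError("at least one position is required")

    # Absolute values are plain counts of matching / differing positions.
    abs_similarity = sum(x == y for x, y in zip(a, b))
    abs_distance = total - abs_similarity

    # Relative values are the counts normalized to the positions compared.
    return {
        "absolute_similarity": abs_similarity,
        "relative_similarity": abs_similarity / total,
        "absolute_distance": abs_distance,
        "relative_distance": abs_distance / total,
    }


# Two made-up samples of 10 positions that agree at the first two only.
sample_1 = "AACCGGTTAA"
sample_2 = "AAGGTTCCGG"
result = compare_samples(sample_1, sample_2)
print(result)
```

With these two strings the counts are 2 matching and 8 differing positions, so the relative similarity and the relative distance add up to 1, as they always do under this simple matching definition.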
A large number of different distance measures are available and each distance measure has its
own logic.
