Distance Measures
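
| Character | Sample #2: 1 | Sample #2: 0 |
|---|---|---|
| Sample #1: 1 | a = 1/1 (in both samples) | b = 1/0 (only in sample #1) |
| Sample #1: 0 | c = 0/1 (only in sample #2) | d = 0/0 (in neither sample) |

Each character is scored 1 or 0 in both samples; a notation such as 1/0 gives the value in sample #1, then the value in sample #2. n = a+b+c+d is the total number of characters compared.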
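As a minimal sketch (Python, not part of the original page), the four counts can be tallied from two equal-length 0/1 vectors. The function name `contingency_counts` and the encoding 1 = character present are illustrative assumptions.

```python
from typing import Sequence, Tuple

def contingency_counts(x: Sequence[int], y: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (a, b, c, d) for sample #1 (x) and sample #2 (y), both 0/1 vectors."""
    if len(x) != len(y):
        raise ValueError("both samples must have the same number of characters")
    a = b = c = d = 0
    for xi, yi in zip(x, y):
        if xi and yi:
            a += 1  # 1/1: present in both samples
        elif xi:
            b += 1  # 1/0: only in sample #1
        elif yi:
            c += 1  # 0/1: only in sample #2
        else:
            d += 1  # 0/0: in neither sample
    return a, b, c, d

s1 = [1, 1, 0, 1, 0, 0]
s2 = [1, 0, 1, 1, 0, 0]
print(contingency_counts(s1, s2))  # (2, 1, 1, 2)
```

Here n = 2+1+1+2 = 6, the length of each vector.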

| Name | Formula | Comment |
|---|---|---|
| Hamming distance (Manhattan, city-block, taxi-cab) | b+c | non-normalized distance; grows with the number of characters |
| Euclidean distance | sqrt(b+c) | non-normalized distance; grows with the number of characters |
| Soergel distance | (b+c)/(a+b+c) | normalized distance; complement of the Tanimoto (Jaccard) similarity: 1 - a/(a+b+c) |
| Mean Hamming distance | (b+c)/n = (b+c)/(a+b+c+d) | normalized distance |
| Mean Euclidean distance | sqrt((b+c)/n) = sqrt(b+c)/sqrt(a+b+c+d) | normalized distance |
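Given the counts, each distance in the table above is a one-line expression. The sketch below is hypothetical (the name `binary_distances` is ours); it returns 0.0 for the Soergel distance when a+b+c = 0 (both samples empty), a convention the table does not specify.

```python
import math

def binary_distances(a: int, b: int, c: int, d: int) -> dict:
    """Distances between two binary samples from their 2x2 counts a, b, c, d."""
    n = a + b + c + d       # total number of characters
    mismatches = b + c      # characters present in only one sample
    present = a + b + c     # characters present in at least one sample
    return {
        "hamming": mismatches,
        "euclidean": math.sqrt(mismatches),
        # Assumption: two all-zero samples (present == 0) are treated as identical.
        "soergel": mismatches / present if present else 0.0,
        "mean_hamming": mismatches / n,
        "mean_euclidean": math.sqrt(mismatches / n),
    }

dist = binary_distances(2, 1, 1, 2)
print(dist["hamming"], dist["soergel"])  # 2 0.5
```

With a=2, b=1, c=1, the Soergel distance 2/4 = 0.5 is exactly 1 minus the Jaccard similarity 2/4.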

| Name | Similarity formula |
|---|---|
| Jaccard | a/(n-d) = a/(a+b+c) |
| Russell & Rao | a/n |
| Rogers & Tanimoto | (a+d)/(a+2*(b+c)+d) |
| Kulczynski #1 | a/(b+c) |
| Kulczynski #2 | 0.5*(a/(a+b)+a/(a+c)) |
| Dice | 2*a/(2*a+b+c) |
| Pearson's Phi coefficient | ((a*d)-(c*b))/sqrt((a+c)*(c+d)*(a+b)*(b+d)) |
| - bc - | 1 - D |
| Baroni-Urbani/Buser | (a+sqrt(a*d))/(a+b+c+sqrt(a*d)) |
| Braun-Blanquet | if (a+b)>(a+c) then S:=a/(a+b) else S:=a/(a+c) |
| Simpson similarity coefficient | if (a+b)<(a+c) then S:=a/(a+b) else S:=a/(a+c) |
| Michael | 4*(a*d-b*c)/((a+d)*(a+d)+(b+c)*(b+c)) |
| Sokal and Sneath #1 | a/(a+2*(b+c)) |
| Sokal and Sneath #2 | 0.25*(a/(a+b) + a/(a+c) + d/(b+d) + d/(c+d)) |
| Sokal and Sneath #3 | a*d/sqrt((a+b)*(a+c)*(d+b)*(d+c)) |
| Sokal and Sneath #4 | (a+d)/(b+c) |
| Sokal and Sneath #5 | 2*(a+d)/(2*(a+d)+b+c) |
| Simple Matching | (a+d)/(a+b+c+d) = (a+d)/n |
| Mean Hamming | 1 - D, where D is the mean Hamming distance (b+c)/n |
| Sneath & Sokal | (a+d)/(a+0.5*(b+c)+d) |
| Kocher & Wong | a*n/((a+b)*(c+d)) |
| Faith | (a+d/2)/n |
| Ochiaï #1 | a/sqrt((a+b)*(a+c)) |
| Ochiaï #2 | a*d/sqrt((a+b)*(a+c)*(d+b)*(d+c)) |
| Q0 | (b*c)/(a*d) |
| Yule's Sigma | (sqrt(a*d)-sqrt(b*c))/(sqrt(a*d)+sqrt(b*c)) |
| Yule's Q | (a*d-b*c)/(a*d+b*c) |
| Upholt | F = 2*a/(2*a+b+c); S = Power(0.5*(-F+sqrt(F*F+8*F)), 1/n) |
| Excoffier | n*(1-(a/n)) |
| Hamann | (a-(b+c)+d)/n |
| Roux #1 | (a+d)/(min(b,c)+min(n-b,n-c)) |
| Roux #2 | (n-a*d)/sqrt((a+b)*(c+d)*(a+c)*(b+d)) |
| Michelet | a*a/(b*c) |
| Fager & McGowan | a/sqrt((a+b)*(a+c)) + 1/sqrt(a+b) |
| Fager | a/sqrt((a+b)*(a+c)) - max(b,c) |
| Unigram subtuples | Log(a*d/b/c) - 3.29*sqrt(1/a+1/b+1/c+1/d) |
| U cost | Log(1+(min(b,c)+a)/(max(b,c)+a)) |
| S cost | Log(1+min(b,c)/(a+1))**-.5 |
| R cost | Log(1+a/(a+b))*log(1+a/(a+c)) |
| T combined cost | Sqrt(U*S*R), with U = Log(1+(min(b,c)+a)/(max(b,c)+a)), S = Log(1+min(b,c)/(a+1))**-.5, R = Log(1+a/(a+b))*log(1+a/(a+c)) |
| McConnaughey | (a*a - b*c)/((a+b)*(a+c)) |
| Phi Square | power((a*d - b*c), 2)/((a+b)*(a+c)*(b+d)*(c+d)) |
| Forbes | n*a/((a+b)*(a+c)) |
| Fossum | n*(a-0.5)*(a-0.5)/((a+b)*(a+c)) |
| Stiles | log10(n*power((abs(a*d-b*c)-n/2), 2)/((a+b)*(a+c)*(b+d)*(c+d))) |
| Dispersion | (a*d-b*c)/power(a+b+c+d, 2) |
| Dennis | (a*d-b*c)/sqrt(n*(a+b)*(a+c)) |
| Pearson Chi Square | n*power((a*d-b*c), 2)/((a+b)*(c+d)*(a+c)*(b+d)) |
| Mountford | 2*a/(2*b*c+a*b+a*c) |
| Mutual Information | ln(a*n/((a+b)*(a+c))) |
| Weighted Mutual Information #3 | ln(power(a, 3)*n/((a+b)*(a+c))) |
| Chi Square with Yates correction (n/2) | (n*(abs(a*d-b*c)-n/2)**2)/((a+b)*(c+d)*(a+c)*(b+d)) |
| Normalized Collocation | a/(b+c-a) |
| Dunning | 2*(a*log(a) + b*log(b) + c*log(c) + d*log(d) - (a+b)*log(a+b) - (a+c)*log(a+c) - (b+d)*log(b+d) - (c+d)*log(c+d) + (a+b+c+d)*log(a+b+c+d)) |
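A handful of the coefficients in the similarity table can be sketched directly from the counts. This is an illustrative Python sketch, not the program's own implementation; the function name `binary_similarities` and the dictionary keys are assumptions, and the code assumes every denominator is non-zero (otherwise Python raises `ZeroDivisionError`), since the table does not say how empty margins are handled.

```python
import math

def binary_similarities(a: int, b: int, c: int, d: int) -> dict:
    """A selection of binary similarity coefficients from the counts a, b, c, d.

    Assumes all denominators are non-zero; otherwise ZeroDivisionError is raised.
    """
    n = a + b + c + d
    return {
        "jaccard": a / (a + b + c),
        "russell_rao": a / n,
        "rogers_tanimoto": (a + d) / (a + 2 * (b + c) + d),
        "dice": 2 * a / (2 * a + b + c),
        "simple_matching": (a + d) / n,
        "braun_blanquet": a / max(a + b, a + c),  # divide by the larger margin
        "simpson": a / min(a + b, a + c),         # divide by the smaller margin
        "ochiai_1": a / math.sqrt((a + b) * (a + c)),
        "phi": (a * d - b * c) / math.sqrt((a + b) * (a + c) * (b + d) * (c + d)),
        "yule_q": (a * d - b * c) / (a * d + b * c),
    }

sim = binary_similarities(2, 1, 1, 2)
print(sim["jaccard"], sim["yule_q"])  # 0.5 0.6
```

Several coefficients are monotone transforms of one another; for example Dice = 2J/(1+J), where J is the Jaccard value, so both produce the same ranking of sample pairs.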
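The Dunning entry is the G² log-likelihood statistic, with the factor 2 applied to the whole bracketed sum. The sketch below assumes natural logarithms (as is usual for G²) and the convention 0·ln(0) = 0, so tables with empty cells do not fail; the helper names `xlogx` and `dunning_g2` are hypothetical.

```python
import math

def xlogx(v: float) -> float:
    """v * ln(v), with the usual convention 0 * ln(0) = 0."""
    return v * math.log(v) if v > 0 else 0.0

def dunning_g2(a: int, b: int, c: int, d: int) -> float:
    """Dunning's log-likelihood (G^2) statistic for the 2x2 table a, b, c, d."""
    n = a + b + c + d
    cells = xlogx(a) + xlogx(b) + xlogx(c) + xlogx(d)
    margins = xlogx(a + b) + xlogx(a + c) + xlogx(b + d) + xlogx(c + d)
    # The factor 2 multiplies the whole bracket: cells - margins + n*ln(n).
    return 2.0 * (cells - margins + xlogx(n))

print(round(dunning_g2(10, 0, 0, 10), 4))  # 27.7259
```

This form is algebraically equal to 2 * sum(O * ln(O/E)) over the four cells, with expected counts E = row total * column total / n; it is 0 when the samples are statistically independent (for example a = b = c = d).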