The unweighted pair-group method with arithmetic mean (UPGMA) is a popular distance analysis
method.
UPGMA characteristics
· UPGMA is the simplest method for constructing trees.
· The great disadvantage of UPGMA is that it assumes the same evolutionary speed on all
lineages, i.e. the rate of mutations is constant over time and for all lineages in the tree. This is
called a 'molecular clock hypothesis'.
This would mean that all leaves (terminal nodes) have the same distance from the root. In
reality the individual branches are very unlikely to have the same mutation rate. Therefore, UPGMA frequently generates wrong tree toplogies!
· Generates rooted trees (re-rooting is not allowed!)
· Generates ultrametric trees
The UPGMA algorithm
· UPGMA starts with a matrix of pairwise distances D[1..n, 1..m].
· In the following text each sample (taxon, operational taxonomic unit=OTU) is denoted as a
'cluster'.
· starts by assigning all clusters (samples) to a star-like tree
1. Find that pair (cluster i and j) with the smallest distance value in the distance matrix:
D[i,j].
2. Define a new cluster comprising cluster i and j:
Cluster i is connected by a branch to the common ancestor node. The same applies for
cluster j.
Therefore, the distance D[i,j] is split onto the two branches. So, each of the two branches
obtains a length of D[i,j]/2.
3. If i and j were the last 2 clusters, the tree is finished. If not the algorithm finds a new
cluster
called u.
4. Define the distance from u to each other cluster (k, with k <> i or j) to be an average
of the
distances dki and dkj.
For 'Weighted PGMA (WPGM)': dku = dki+dkj/2).
For 'Complete linkage': dku = max(dki, dkj).
For 'Single linkage': dku = min(dki, dkj).
5. Go back to step 1 with one less cluster. Clusters i and j are eliminated, and cluster u is
added
to the tree.
Reference for UPGMA:
Michener, C.D., Sokal, R.R. (1957): A quantitative approach to a problem of classification.
Evolution, 11:490–499.
See