UPGMA
Previous Top Next

The unweighted pair-group method with arithmetic mean (UPGMA) is a popular distance analysis method.

UPGMA characteristics

·     UPGMA is the simplest method for constructing trees.
·     The great disadvantage of UPGMA is that it assumes the same evolutionary speed on all lineages, i.e. the rate of mutations is constant over time and for all lineages in the tree. This is called a 'molecular clock hypothesis'.
This would mean that all leaves (terminal nodes) have the same distance from the root. In reality the individual branches are very unlikely to have the same mutation rate. Therefore,
UPGMA frequently generates wrong tree toplogies!
·     Generates rooted trees (re-rooting is not allowed!)
·     Generates ultrametric trees


The UPGMA algorithm

·     UPGMA starts with a matrix of pairwise distances D[1..n, 1..m].
·     In the following text each sample (taxon, operational taxonomic unit=OTU) is denoted as a 'cluster'.
·     starts by assigning all clusters (samples) to a star-like tree

1.   Find that pair (cluster i and j) with the smallest distance value in the distance matrix: D[i,j].
2.   Define a new cluster comprising cluster i and j:
Cluster i is connected by a branch to the common ancestor node. The same applies for cluster j.
Therefore, the distance D[i,j] is split onto the two branches. So, each of the two branches obtains a length of D[i,j]/2.
3.   If i and j were the last 2 clusters, the tree is finished. If not the algorithm finds a new cluster called u.
4.   Define the distance from u to each other cluster (k, with k <> i or j) to be an average of the distances dki and dkj.
For 'Weighted PGMA (WPGM)': dku = dki+dkj/2).
For 'Complete linkage': dku = max(dki, dkj).
For 'Single linkage': dku = min(dki, dkj).
5.   Go back to step 1 with one less cluster. Clusters i and j are eliminated, and cluster u is added to the tree.


Reference for UPGMA:
Michener, C.D., Sokal, R.R. (1957): A quantitative approach to a problem of classification. Evolution, 11:490–499.

See
< Cluster analysis
> Explanation of the Term 'Ultrametric'