Neighbor-Joining method Previous Top Next

The neighbor-joining method (NJ) is a distance based method (requires a distance matrix) and uses the star decomposition method.

Algorithm
Neighbor-joining is a recursive algorithm. Each step in the recursion consists of the following steps:
1.   Based on the current distance matrix calculate a modified distance matrix Q (see below).
2.   Find the least distant pair of nodes in Q (= the closest neighbors = the pair with the lowest distance value). Create a new node on the tree joining the two closest nodes: the two nodes are linked by their common ancestral node.
3.   Calculate the distance of each of the nodes in the pair to their ancestral node.
4.   Calculate the distance of all nodes outside of this pair to their ancestral node.
5.   Start the algorithm again, considering the pair of joined neighbors as a single taxon (the terminal nodes are replaced by their ancestral node and the ancestral node is then treated as a terminal node) and using the distances calculated in the previous step.

Tab. 1. Formula used in the NJ clustering method
 Description Formula Distance matrix Q Each member in the distance matrix Q is calculated as follows: Q(i,j) = (r-2) * d(i,j) - sum[k=1..r](d(i,k)) - sum[k=1..r](d(j,k)) Neighbors in the pair For each neighbor in the pair just joined, use the following formula to calculate to the new node: d(f,u) = 0.5 * d(f,g) + 1/(2*(r-2)) * [sum[k=1..r](d(f,k)) - sum[k=1..r](d(g,k))] With: f and g are the paired taxa u is the newly generated node Each node/taxon outside the pair The distance of the other taxa to the new node the distance to the new node is calculated as follows: d(u,k) = 0.5 * [ d(f,k) - d(f,u) ] + 0.5 * [ d(g,k) - d(g,u) ] With: u is the new node k is the node for which we want to calculate the distance f and g are the members of the pair just joined

Negative branch lengths
NJ represents the data in an additive tree. Therefore it can assign negative branch lengths. Usually, branch lengths can be interpreted as an estimate for the substitutions. However, here we have difficulties in doing so.
If this occurs one can set negative branch length to zero and transfer the difference to the adjacent branch length so that the total distance between an adjacent pair of terminal nodes remains unaffected. This does not alter the overall topology of the tree (Kuhner and Felsenstein, 1994).

·     fast (suited for large datasets)
·     does not require ultrametric data: suited for datasets comprising lineages with largely varying rates of evolution
·     permits correction for multiple substitutions