The neighbor-joining method (NJ) is a distance based method (requires a distance matrix) and
uses the star decomposition method.
Neighbor-joining is a recursive algorithm. Each step in the recursion consists of the following
1. Based on the current distance matrix calculate a modified distance matrix Q (see below).
2. Find the least distant pair of nodes in Q (= the closest neighbors = the pair with the
lowest distance value). Create a new node on the tree joining the two closest nodes: the
two nodes are linked by their common ancestral node.
3. Calculate the distance of each of the nodes in the pair to their ancestral node.
4. Calculate the distance of all nodes outside of this pair to their ancestral node.
5. Start the algorithm again, considering the pair of joined neighbors as a single taxon (the
terminal nodes are replaced by their ancestral node and the ancestral node is then
treated as a terminal node) and using the distances calculated in the previous step.
Tab. 1. Formula used in the NJ clustering method
Distance matrix Q
Each member in the distance matrix Q is calculated as
Q(i,j) = (r-2) * d(i,j) - sum[k=1..r](d(i,k)) - sum[k=1..r](d(j,k))
Neighbors in the pair
For each neighbor in the pair just joined, use the
following formula to calculate to the new node:
d(f,u) = 0.5 * d(f,g) + 1/(2*(r-2)) * [sum[k=1..r](d(f,k)) -
f and g are the paired taxa
u is the newly generated node
Each node/taxon outside
The distance of the other taxa to the new node the
distance to the new node is calculated as follows:
d(u,k) = 0.5 * [ d(f,k) - d(f,u) ] + 0.5 * [ d(g,k) - d(g,u) ]
u is the new node
k is the node for which we want to calculate the distance
f and g are the members of the pair just joined
Negative branch lengths
NJ represents the data in an additive tree. Therefore it can assign negative branch lengths.
Usually, branch lengths can be interpreted as an estimate for the substitutions. However, here we
have difficulties in doing so.
If this occurs one can set negative branch length to zero and transfer the difference to the
adjacent branch length so that the total distance between an adjacent pair of terminal nodes
remains unaffected. This does not alter the overall topology of the tree (Kuhner and Felsenstein,
Advantages of NJ
· fast (suited for large datasets)
· does not require ultrametric data: suited for datasets comprising lineages with largely varying
rates of evolution
· permits correction for multiple substitutions
Disadvantages of NJ
· information is reduced (distance matrix based)
· gives only one tree (out of several possible trees)
· the resulting tree depends on the model of evolution used
Kuhner, M.K., Felsenstein, J. (1994): A simulation comparison of phylogeny algorithms under
equal and unequal evolutionary rates. Mol Biol Evol. 11(3): 459-68.
Saitou, N., Nei, M. (1987): The neighbor-joining method: a new method for reconstructing
phylogenetic trees. In: Molecular Biology and Evolution. 1987, Vol 4(4), p. 406-425.