Principal Coordinates Analysis

Principal Coordinates Analysis (PCoA, = Multidimensional Scaling, MDS) is a method for exploring and visualizing similarities or dissimilarities in data. It starts from a similarity matrix or dissimilarity matrix (= distance matrix) and assigns each item a location in a low-dimensional space, e.g. for display as 3D graphics.

PCoA tries to find the main axes through a matrix. It is a kind of eigenanalysis (sometimes referred to as "singular value decomposition") and calculates a series of eigenvalues and eigenvectors. Each eigenvalue has a corresponding eigenvector, and there are as many eigenvectors and eigenvalues as there are rows in the initial matrix.

Eigenvalues are usually ranked from the greatest to the least. The first eigenvalue is often called the "dominant" or "leading" eigenvalue. Using the eigenvectors we can visualize the main axes through the initial distance matrix. Eigenvalues are also often called "latent values".
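The ranking of eigenvalues described above can be illustrated with a minimal NumPy sketch (the matrix values are hypothetical, chosen only for demonstration):

```python
import numpy as np

# Hypothetical symmetric matrix (e.g. a centered distance matrix)
M = np.array([[4.0, 1.0, 0.5],
              [1.0, 3.0, 0.2],
              [0.5, 0.2, 2.0]])

# eigh is the eigenanalysis routine for symmetric matrices
eigenvalues, eigenvectors = np.linalg.eigh(M)

# Rank eigenvalues from greatest ("dominant"/"leading") to least,
# and reorder the eigenvectors to match
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]
```

Note that there are exactly as many eigenvalue/eigenvector pairs as rows in `M`, and that the original matrix can be reconstructed from them.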

The result is a rotation of the data matrix: it does not change the positions of the points relative to each other, it only changes the coordinate system!

By using PCoA we can visualize individual and/or group differences. Individual differences can, for example, reveal outliers.

There is also a method called 'Principal Component Analysis' (PCA, sometimes also misleadingly abbreviated as 'PCoA') which is different from PCoA. PCA is used for similarities and PCoA for dissimilarities. However, binary measures (Jaccard, Dice etc.) yield distance matrices and, therefore, PCoA should be used. For details see the following box:
Why does ClusterVis perform Principal Coordinates Analysis (PCoA, = 'Classical
Multidimensional Scaling') instead of Principal Components Analysis (PCA)?
Let's look at the differences between PCA and PCoA:

Principal Components Analysis (PCA)
- transforms a number of possibly correlated variables (a similarity matrix!) into a smaller number of uncorrelated variables called principal components. It thus reduces the dimensionality of a complex data set and can be used to visualize complex data.
The first principal component accounts for as much of the variability in the data as possible, and each succeeding component accounts for as much of the remaining variability as possible.
- captures as much of the variation in the data as possible
- principal components are ...
* summary variables
* linear combinations of the original variables
* uncorrelated with each other
* capture as much of the original variance as possible
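The properties listed above can be demonstrated in a short NumPy sketch (the random data set is hypothetical; this is an illustration of the PCA principle, not ClusterVis code):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))       # 50 samples, 4 possibly correlated variables

Xc = X - X.mean(axis=0)            # center each variable
cov = np.cov(Xc, rowvar=False)     # covariance matrix of the variables
eigvals, eigvecs = np.linalg.eigh(cov)

order = np.argsort(eigvals)[::-1]  # rank components by explained variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Principal component scores: linear combinations of the original variables
scores = Xc @ eigvecs
```

The covariance matrix of `scores` is diagonal, confirming that the principal components are uncorrelated with each other and that each captures the variance given by its eigenvalue.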

Classical Multidimensional Scaling (CMDS)
[syn. Torgerson scaling, Torgerson-Gower scaling]
- is similar in spirit to PCA but it takes a dissimilarity as input! A dissimilarity matrix shows the distance between every possible pair of objects.
- is a set of data analysis techniques that project the structure of (complex) distance-like data (a dissimilarity matrix!) from a high-dimensional space into a lower-dimensional space without too much loss of information.
- The goal of MDS is to faithfully represent these distances in as few dimensions as possible.
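As noted above, binary measures such as Jaccard are used as distances for PCoA. A minimal sketch of building a Jaccard distance matrix from binary data (the sample data is hypothetical):

```python
import numpy as np

def jaccard_distance(a, b):
    """Jaccard distance between two binary vectors: 1 - |intersection|/|union|."""
    a, b = np.asarray(a, bool), np.asarray(b, bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0
    return 1.0 - np.logical_and(a, b).sum() / union

# Hypothetical presence/absence data: 3 samples, 4 attributes
samples = np.array([[1, 1, 0, 1],
                    [1, 0, 0, 1],
                    [0, 1, 1, 0]])

# Distance between every possible pair of objects (a dissimilarity matrix)
n = len(samples)
D = np.array([[jaccard_distance(samples[i], samples[j]) for j in range(n)]
              for i in range(n)])
```

The resulting matrix is symmetric with a zero diagonal, which is exactly the kind of input PCoA expects.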

ClusterVis calculates a principal coordinates analysis (PCoA) of a distance matrix (see Gower, 1966). The distance matrix is first centered, and the centered matrix is then decomposed into its component eigenvalues and eigenvectors.
The eigenvectors, scaled by the square root of their corresponding eigenvalues, are output as the principal coordinate axes. This analysis is also called metric multidimensional scaling. It is useful for ordination of multivariate data on the basis of any distance function.
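The procedure just described (centering, eigendecomposition, scaling by the square root of the eigenvalues) can be sketched as follows; this is a minimal NumPy illustration of the Gower (1966) method, not ClusterVis's actual code:

```python
import numpy as np

def pcoa(D):
    """Principal coordinates analysis of a distance matrix D (Gower, 1966)."""
    D = np.asarray(D, float)
    n = D.shape[0]
    # Double-center the matrix of squared distances
    A = -0.5 * D ** 2
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = J @ A @ J
    # Decompose the centered matrix into eigenvalues and eigenvectors
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1]            # rank greatest to least
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Keep axes with positive eigenvalues; scale unit eigenvectors
    # by the square root of their eigenvalues to get coordinates
    pos = eigvals > 1e-10
    coords = eigvecs[:, pos] * np.sqrt(eigvals[pos])
    return coords, eigvals

# Hypothetical distances between three points lying on a line at 0, 3, 5
D = np.array([[0.0, 3.0, 5.0],
              [3.0, 0.0, 2.0],
              [5.0, 2.0, 0.0]])
coords, eigvals = pcoa(D)
```

Because the example distances are Euclidean, the pairwise distances between the recovered coordinates reproduce the input matrix, illustrating that PCoA only rotates and re-embeds the points without distorting their mutual distances.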

Zuur, A.F., Ieno, E.N. & Smith, G.M. (2007): Analysing Ecological Data. Statistics for Biology and Health. Springer, New York. ISBN 978-0-387-45967-7 (Print), 978-0-387-45972-1 (Online).
Gower, J.C. (1966): Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika 53: 325-338.

Also see:

< Cluster analysis