Principal Coordinates Analysis
Why does ClusterVis perform Principal Coordinates Analysis (PCoA, = 'Classical
Multidimensional Scaling') instead of Principal Components Analysis (PCA)?
Let's look at the differences between PCA and PCoA:
Principal Components Analysis (PCA)
- transforms a number of possibly correlated variables (summarized by a similarity matrix, such as their correlation or covariance matrix!) into a smaller number of uncorrelated variables called principal components. It thereby reduces the dimensionality of a complex data set and can be used to visualize complex data.
The first principal component accounts for as much of the variability in the data as possible, and each succeeding component accounts for as much of the remaining variability as possible.
- captures as much of the variation in the data as possible
- principal components are ...
* summary variables
* linear combinations of the original variables
* uncorrelated with each other
* capture as much of the original variance as possible
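The properties listed above can be checked with a minimal NumPy sketch. This is an illustration only, not ClusterVis code: the function name `pca`, the parameter `n_components`, and the random example data are assumptions made for this example.

```python
import numpy as np

def pca(data, n_components=2):
    """Project the rows of `data` (objects x variables) onto the leading principal components."""
    # Center each variable so the components describe variation around the mean.
    centered = data - data.mean(axis=0)
    # Covariance matrix of the variables (variables are in columns).
    cov = np.cov(centered, rowvar=False)
    # eigh is appropriate for symmetric matrices; it returns ascending eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Scores are linear combinations of the original (centered) variables.
    scores = centered @ eigvecs[:, :n_components]
    # Fraction of the total variance captured by each component.
    explained = eigvals / eigvals.sum()
    return scores, explained

# Hypothetical data: the second variable is strongly correlated with the first.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
data = np.column_stack([x, 2 * x + rng.normal(scale=0.1, size=200), rng.normal(size=200)])

scores, explained = pca(data)
print(explained)                             # variance fractions, largest first
print(np.corrcoef(scores, rowvar=False))     # off-diagonal entries are close to zero
```

The printed correlation matrix of the scores illustrates that the components are uncorrelated with each other, and `explained` shows how much of the original variance each one captures.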
Classical Multidimensional Scaling (CMDS)
[syn. Torgerson scaling, Torgerson-Gower scaling]
- is similar in spirit to PCA, but it takes a dissimilarity matrix as input! A dissimilarity matrix gives the distance between every possible pair of objects.
- is a set of data analysis techniques that display the structure of (complex) distance-like data (a dissimilarity matrix!) from a high-dimensional space in a lower-dimensional space without too much loss of information.
- The goal of MDS is to represent these distances faithfully in as few dimensions as possible.
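To make the input concrete, the following sketch builds a dissimilarity matrix from a small set of points. Euclidean distance is used here only as one possible distance function; the helper name `euclidean_dissimilarity` and the example points are assumptions for illustration.

```python
import numpy as np

def euclidean_dissimilarity(points):
    """Return the matrix of Euclidean distances between all pairs of rows in `points`."""
    # Broadcasting yields every pairwise difference: shape (n, n, n_variables).
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Three hypothetical objects described by two variables each.
points = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
D = euclidean_dissimilarity(points)
print(D)
# [[ 0.  5. 10.]
#  [ 5.  0.  5.]
#  [10.  5.  0.]]
```

The matrix is symmetric with zeros on the diagonal: each object is at distance zero from itself, and the distance from object i to object j equals the distance from j to i.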
ClusterVis performs a principal coordinates analysis (PCoA) of a distance matrix (see Gower, 1966). It first calculates a centered matrix from the distances; the centered matrix is then decomposed into its eigenvalues and eigenvectors.
The eigenvectors, standardized by dividing by the square root of their corresponding eigenvalue, are output as the principal coordinate axes. This analysis is also called metric multi-dimensional scaling. It is useful for ordination of multivariate data on the basis of any distance function.
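The steps described above (centering, eigendecomposition, scaling of the eigenvectors) can be sketched in NumPy as follows. This is a textbook-style illustration and not the ClusterVis implementation. The function name `pcoa` and the parameter `n_axes` are hypothetical. The sketch also assumes the common convention in which unit-length eigenvectors are multiplied by the square root of their eigenvalue, so that distances between the resulting points reproduce the input distances; whether ClusterVis scales its output in exactly this way is not confirmed here.

```python
import numpy as np

def pcoa(D, n_axes=2):
    """Classical scaling (PCoA) of a symmetric distance matrix D (a sketch)."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Gower's transformation of the distances.
    A = -0.5 * D ** 2
    # Double centering: subtract row and column means, add back the grand mean.
    J = np.eye(n) - np.ones((n, n)) / n
    B = J @ A @ J
    # Eigendecomposition of the symmetric centered matrix.
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Assumed scaling: unit eigenvector times sqrt(eigenvalue).
    # Tiny negative eigenvalues from rounding are clipped to zero.
    scale = np.sqrt(np.clip(eigvals[:n_axes], 0.0, None))
    coords = eigvecs[:, :n_axes] * scale
    return coords, eigvals

# Hypothetical example: four points in the plane and their Euclidean distances.
points = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 0.0], [3.0, -4.0]])
diff = points[:, None, :] - points[None, :, :]
D = np.sqrt((diff ** 2).sum(axis=-1))

coords, eigvals = pcoa(D, n_axes=2)

# Distances between the principal coordinates match the input distances.
cdiff = coords[:, None, :] - coords[None, :, :]
recovered = np.sqrt((cdiff ** 2).sum(axis=-1))
print(np.allclose(recovered, D))
```

Because the example points lie in a plane, two axes are enough to reproduce all pairwise distances. With a non-Euclidean distance function some eigenvalues can be negative, which is why the sketch clips them before taking the square root.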
Zuur, A.F., Ieno, E.N. & Smith, G.M. (2007): Analysing Ecological Data.
Statistics for Biology and Health, Springer, New York. ISBN
978-0-387-45967-7 (Print), 978-0-387-45972-1 (Online).
Gower, J.C. (1966): Some distance properties of latent root and vector
methods used in multivariate analysis. Biometrika 53: 325-338.