排序之 PCA & tb-PCA

主成分分析是线性非限制性排序方法。主成分分析的轴是基于特征值(变量解释程度)而排序的。

主成分分析相关function简介:

\(\color{red}{\text{rda()}}\) 源于vegan包。如果相关环境变量没有制定,rda默认的是进行pca分析。如果不用原始数据而是对数据进行了相应转换(利用decostand(),如helinger或者chord转换),然后运行rda进行tb-PCA。

排序结果可以用\(\color{red}{\text{plot()}}\)来展示。还可以用biplot() 或者cleanplot.pca。

\(\color{red}{\text{evplot()}}\)通过可视化方法判断排序轴因子的重要性。using Keiser-Guttman criterion and broken stick model.类似的还有PCAsignificance (library BiodiversityR) - calculates broken-stick model for PCA axes.

\(\color{red}{\text{ordiequilibriumcircle}}\) (library BiodiversityR) - adds the circle of equivalent contribution onto the ordination diagram plotted by vegan’s functions ordiplot or biplot.

举个栗子 我们以草原群落为例。鉴于该草原的物种群落组成相对匀质,我们认为比较适合pca分析。统计上讲可以通过第一DCA轴的长度来判断。

#读进群落数据
grasslands.spe <- read.delim('http://www.davidzeleny.net/anadat-r/data-download/grasslands-spe.txt', row.names = 1)
## Warning in file(file, "rt"): "internal" method cannot handle https
## redirection to: 'https://www.davidzeleny.net/anadat-r/data-download/
## grasslands-spe.txt'
## Warning in file(file, "rt"): "internal" method failed, so trying "libcurl"
##读进环境变量
grasslands.env <- read.delim('http://www.davidzeleny.net/anadat-r/data-download/grasslands-env.txt')
## Warning in file(file, "rt"): "internal" method cannot handle https
## redirection to: 'https://www.davidzeleny.net/anadat-r/data-download/
## grasslands-env.txt'

## Warning in file(file, "rt"): "internal" method failed, so trying "libcurl"
grasslands.spe.log <- log1p(grasslands.spe)
library(vegan)
## Loading required package: permute
## Loading required package: lattice
## This is vegan 2.4-4
decorana(grasslands.spe.log)
## 
## Call:
## decorana(veg = grasslands.spe.log) 
## 
## Detrended correspondence analysis with 26 segments.
## Rescaling of axes with 4 iterations.
## 
##                   DCA1   DCA2   DCA3   DCA4
## Eigenvalues     0.2652 0.2285 0.1485 0.1545
## Decorana values 0.2887 0.2228 0.1676 0.1383
## Axis lengths    2.6961 2.1441 2.5704 1.8207

结果中第一个DCA轴的长度为2.7倍的sd(小于3即可适用pca分析)。

PCA <- rda(grasslands.spe.log)
PCA
## Call: rda(X = grasslands.spe.log)
## 
##               Inertia Rank
## Total            35.4     
## Unconstrained    35.4   47
## Inertia is variance 
## 
## Eigenvalues for unconstrained axes:
##   PC1   PC2   PC3   PC4   PC5   PC6   PC7   PC8 
## 4.625 3.492 2.445 2.297 1.648 1.543 1.479 1.331 
## (Showed only 8 of all 47 unconstrained eigenvalues)

pca轴一和轴二一共解释了(4.625+3.492)/35.4 = 23%。然后就是画图。我们可以设置用不同的形状和颜色代表不同的植被类型。

vegtype <- as.numeric(grasslands.env$classification)
ordiplot(PCA, display = "sites", type = "n")
points(PCA, pch=vegtype, col=vegtype)