MDS and network!   

MDS and network!

降维分析和网络

04 Sep 2016

Go back 2016-09-04-MDS

1.MDS降维技术

MDS就是保持高维度两两之间的距离的数据,在低纬度上展现出来的数学工具

这个距离数据可以由dist函数产生

#R 自带的数据集
#UScitiesD        Distances Between European Cities and Between US Cities
fit<-cmdscale(UScitiesD)
x<-fit[,1]
y<-fit[,2]
plot(-x,-y)#实际绘图时稍微修下图即可!
dat<-UScitiesD
text(-x,-y,pos=4,labels = attr(dat,"Labels"))#和真实的地图还是比较相似的

Igraph包中的layout.mds()也可以实现降维。也可以用来画这个图

library(igraph)
## 
## Attaching package: 'igraph'
## The following objects are masked from 'package:stats':
## 
##     decompose, spectrum
## The following object is masked from 'package:base':
## 
##     union
g<-graph.full(nrow(as.matrix(UScitiesD)))
V(g)$label<-attr(dat,"Labels")
layout<-layout.mds(g,dist = as.matrix(UScitiesD))
plot(g,layout=layout,vertex.size=3)

2.社交网络分析

下面记录一下这个社交分析的思想:就是一个m × n的矩阵 m是你要构建网络的节点,n是你选取的指标,然后你设置一个阈值如果达到这个阈值就认为其在这个指标中显著。然后m × n 矩阵和 转置后的 n × m之后的矩阵相乘得到一个m × m的邻接矩阵adjacency matrix

#产生测试数据
set.seed(2)
termDocMatrix<-matrix(data =runif(100,min = -1,max = 1),nrow = 10,ncol = 10)
rownames(termDocMatrix)<-rep(paste0("gene",1:10))
colnames(termDocMatrix)<-rep(paste0("tissue",1:10))
termDocMatrix[termDocMatrix>0]<-1#值大于0赋值为1
termDocMatrix[termDocMatrix<=0]<-0
#转化成adjacency matrix
termMatrix<-termDocMatrix %*% t(termDocMatrix) 
head(termMatrix)#展示数据
##       gene1 gene2 gene3 gene4 gene5 gene6 gene7 gene8 gene9 gene10
## gene1     5     1     3     1     2     2     2     0     2      1
## gene2     1     3     2     0     2     2     1     1     1      1
## gene3     3     2     6     2     3     4     3     2     3      2
## gene4     1     0     2     3     1     2     2     1     2      1
## gene5     2     2     3     1     4     3     2     1     2      2
## gene6     2     2     4     2     3     7     5     3     4      4

开始绘图

library(igraph)#build a graph from the above matrix
g<-graph.adjacency(termMatrix,weighted = T,mode = "undirected")
g<-simplify(g)#remove loops
#set labels and degrees of vertices
V(g)$label <- V(g)$name
V(g)$degree<-degree(g)
#绘制图形
set.seed(1)#可重复
layout1<-layout.fruchterman.reingold(g)
plot(g,layout=layout1)

优化图形输出效果

进一步设置定点标签的尺寸,对重要的term加以突出,基于权重设置连接线的宽度和透明度。这在定点和连接线众多时应用优势突出。在下面的代码中,顶点和连接线分别从V()和E()获得。函数rgb()定义颜色,alpha定义透明度。

V(g)$label.cex<-2.2*V(g)$degree/max(V(g)$degree)+0.2
V(g)$label.color<-rgb(0,0,0.2,0.8)
V(g)$frame.color<-NA
egam<-(log(E(g)$weight)+0.4)/max(log(E(g)$weight)+0.4)
E(g)$color<-rgb(0.5,0.5,0,egam)
E(g)$width<-egam*2
#绘图
plot(g,layout=layout1)