Plotting a sequence logo using ggplot2?

Is it possible to plot a sequence logo using ggplot2?

There is a grid-based package called seqLogo that does this, but I was wondering whether there is a ggplot2 version.

Thanks.


ggseqlogo should be what you are looking for. I hope it relieves some of the frustration that I'm sure many of you have had when plotting sequence logos in R.

I have implemented an alternative scheme devised by Charles Berry, which addresses some of the weaknesses of seqLogo discussed in the comments section below. It uses ggplot2:

    library("devtools")
    install_github("leipzig/berrylogo")
    library("berrylogo")

    # Base frequencies: one row per base (A, C, G, T), one column per position
    freqs <- matrix(data = c(
      0.25,0.65,0.87,0.92,0.16,0.16,0.04,0.98,0.98,1.00,0.02,0.10,0.10,0.80,0.98,0.91,0.07,0.07,
      0.11,0.05,0.04,0.00,0.26,0.17,0.00,0.01,0.00,0.00,0.29,0.17,0.01,0.03,0.00,0.00,0.32,0.32,
      0.53,0.26,0.07,0.02,0.53,0.18,0.96,0.01,0.00,0.00,0.65,0.01,0.89,0.17,0.01,0.09,0.59,0.12,
      0.11,0.04,0.02,0.06,0.05,0.49,0.00,0.00,0.02,0.00,0.04,0.72,0.00,0.00,0.01,0.00,0.02,0.49),
      byrow = TRUE, nrow = 4, dimnames = list(c('A','C','G','T')))

    p <- berrylogo(freqs, gc_content = .41)
    print(p)


I am submitting a ggplot2 attempt that is somewhat similar to the Leipzig/Berry solution above. This format is closer to the standard sequence logo.

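But my solution, and I think any ggplot2 solution, still falls short, because ggplot2 does not offer control over the aspect ratio of plotting symbols. That is (I think) the core capability needed to generate sequence logos, and ggplot2 lacks it.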

Also note: I built on Jeremy Leipzig's answer, but I did not apply any correction for small sample sizes or for %GC values different from 50%.

    library(ggplot2)
    library(reshape2)

    # Base frequencies: one row per base (A, C, G, T), one column per position
    freqs <- matrix(data = c(
      0.25,0.65,0.87,0.92,0.16,0.16,0.04,0.98,0.98,1.00,0.02,0.10,0.10,0.80,0.98,0.91,0.07,0.07,
      0.11,0.05,0.04,0.00,0.26,0.17,0.00,0.01,0.00,0.00,0.29,0.17,0.01,0.03,0.00,0.00,0.32,0.32,
      0.53,0.26,0.07,0.02,0.53,0.18,0.96,0.01,0.00,0.00,0.65,0.01,0.89,0.17,0.01,0.09,0.59,0.12,
      0.11,0.04,0.02,0.06,0.05,0.49,0.00,0.00,0.02,0.00,0.04,0.72,0.00,0.00,0.01,0.00,0.02,0.49),
      byrow = TRUE, nrow = 4, dimnames = list(c('A','C','G','T')))

    # One row per position, one column per base
    freqdf <- as.data.frame(t(freqs))
    freqdf$pos <- as.numeric(as.character(rownames(freqdf)))

    # Information content per position in bits: 2 - H = 2 + sum(p * log2(p)).
    # Writing it as log(x^x) keeps the 0 * log(0) terms at 0.
    freqdf$height <- apply(freqdf[, c('A','C','G','T')], MARGIN = 1,
                           FUN = function(x) { 2 + sum(log(x^x, base = 2)) })

    # Scale each base frequency by the information content of its position
    logodf <- data.frame(A = freqdf$A * freqdf$height, C = freqdf$C * freqdf$height,
                         G = freqdf$G * freqdf$height, T = freqdf$T * freqdf$height,
                         pos = freqdf$pos)

    lmf <- melt(logodf, id.vars = 'pos')

    # Stacked bars with the base letters drawn on top, sized by their height
    ggplot(data = lmf, aes(x = as.numeric(as.character(pos)), y = value)) +
      geom_bar(aes(fill = variable), position = 'stack', stat = 'identity', alpha = 0.5) +
      geom_text(aes(label = variable, size = value, vjust = value), position = 'stack') +
      theme_bw()

    # Save at 8 x 3 inches; ggsave() works on any platform, unlike the macOS-only quartz device
    ggsave('StackOverflow_5438474.png', height = 3, width = 8)

This produces a plot that looks nice but is not a true sequence logo.
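For anyone who wants the small-sample correction mentioned above, here is a minimal base-R sketch of the per-position information content. The function name `info_content` and its `n_seqs` argument are made up for illustration and do not come from any package. The correction term `(s - 1) / (2 * ln(2) * n)` is the usual approximation for an alphabet of size `s` and `n` sequences. The %GC adjustment is not covered here.

```r
# Information content (bits) for each column of a frequency matrix
# with one row per symbol and one column per position.
# n_seqs is a hypothetical argument: the number of aligned sequences,
# used only for the optional small-sample correction.
info_content <- function(freqs, n_seqs = NULL) {
  # p * log2(p), using the convention 0 * log2(0) = 0
  plogp <- ifelse(freqs > 0, freqs * log2(freqs), 0)
  entropy <- -colSums(plogp)
  # Approximate small-sample correction; zero when n_seqs is not given
  correction <- if (is.null(n_seqs)) 0 else (nrow(freqs) - 1) / (2 * log(2) * n_seqs)
  # Maximum possible information minus observed entropy, floored at 0
  pmax(log2(nrow(freqs)) - entropy - correction, 0)
}

# Toy matrix: a fully conserved column, a uniform column, and a two-base column
toy <- matrix(c(1, 0, 0, 0,
                0.25, 0.25, 0.25, 0.25,
                0.5, 0.5, 0, 0),
              nrow = 4, dimnames = list(c('A', 'C', 'G', 'T'), NULL))
ic <- info_content(toy)
ic_corrected <- info_content(toy, n_seqs = 10)
```

Multiplying each column of the frequency matrix by its information content gives the letter heights that a standard logo stacks at that position.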

As far as I know, there is no straightforward way to do this in ggplot2.

However, check out RWebLogo. It is an R wrapper I wrote for the WebLogo Python library. You can download it from CRAN, and it is hosted on GitHub.

A simple example:

    # Load package
    library('RWebLogo')

    # Sample alignment
    aln <- c('CCAACCCAA', 'CCAACCCTA', 'AAAGCCTGA', 'TGAACCGGA')

    # Plot logo to file
    weblogo(seqs = aln, file.out = 'logo.pdf')

    # Plot logo to R graphics device (uses generated jpeg logo and raster package)
    weblogo(seqs = aln, plot = TRUE, open = FALSE, format = 'jpeg', resolution = 600)

For more options, see ?weblogo and ?plotlogo.
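Note that this takes aligned sequences, whereas the ggplot2-based answers above start from a base-frequency matrix. If you need to go from one to the other, here is a rough base-R sketch. The helper name `seqs_to_freqs` is made up for illustration, and it assumes all sequences have the same length and contain only A, C, G, and T.

```r
# Sample alignment (same toy sequences as in the RWebLogo example)
aln <- c('CCAACCCAA', 'CCAACCCTA', 'AAAGCCTGA', 'TGAACCGGA')

# Hypothetical helper: turn equal-length sequences into a matrix of
# base frequencies with one row per base and one column per position.
seqs_to_freqs <- function(seqs, alphabet = c('A', 'C', 'G', 'T')) {
  stopifnot(length(unique(nchar(seqs))) == 1)
  # Character matrix: one row per sequence, one column per position
  chars <- do.call(rbind, strsplit(seqs, ''))
  # Count each base per position and convert the counts to proportions
  freqs <- apply(chars, 2, function(col) {
    as.vector(table(factor(col, levels = alphabet))) / length(col)
  })
  rownames(freqs) <- alphabet
  freqs
}

freqs <- seqs_to_freqs(aln)
```

The result has the same layout as the `freqs` matrices used in the other answers, so it can be passed to the same plotting code.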

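Here is another option. motiflogo is a new representation of motif (sequence) logos implemented with ggplot2. It can be used in two ways:

  1. As a canonical motif logo representation
  2. As a SNP-specific motif logo representation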


There is now a gglogo package (also on CRAN; it is another amazing ggplot2 extension by Heike Hofmann).

The package produces logo plots with code like this:

    library(ggplot2)
    library(gglogo)

    # Assumes the `sequences` example data set that ships with gglogo
    data(sequences)

    ggplot(data = ggfortify(sequences, "peptide")) +
      geom_logo(aes(x = position, y = bits, group = element, label = element,
                    fill = interaction(Polarity, Water)),
                alpha = 0.6) +
      scale_fill_brewer(palette = "Paired") +
      theme(legend.position = "bottom")


This example is taken from https://github.com/heike/gglogo/blob/master/visual_test/logos.R, and there is a manuscript about the package here: https://github.com/heike/logopaper/blob/master/logos.Rmd

I'm glad someone finally did this: https://omarwagih.github.io/ggseqlogo/

This is the GitHub page for a ggplot2 add-on that makes it easy to plot sequence logos. It is also well documented.
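A minimal sketch of what that looks like, assuming ggseqlogo is installed from CRAN. The toy alignment is made up for illustration, and `method = 'bits'` asks for information-content heights rather than raw probabilities.

```r
library(ggplot2)
library(ggseqlogo)

# Toy alignment, for illustration only
aln <- c('CCAACCCAA', 'CCAACCCTA', 'AAAGCCTGA', 'TGAACCGGA')

# ggseqlogo() returns a regular ggplot object
p <- ggseqlogo(aln, method = 'bits')
print(p)
```

Because the result is an ordinary ggplot object, the usual ggplot2 layers and themes can be added on top of it.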