Today's topic is how to use R's Bioconductor packages to get the mapping between microarray probes and genes~

Most of the important chips have a package in Bioconductor, and different chips (across the common species) correspond to different packages.

First install AnnotationDbi:

    source('http:///biocLite.R')
    biocLite('AnnotationDbi')

Taking hgu95av2.db as the example chip package, install it as well:

    biocLite('hgu95av2.db')

Then load the two packages:

    library(AnnotationDbi)
    library(hgu95av2.db)

Take a look at the database information~

    > hgu95av2.db
    ChipDb object:
    | DBSCHEMAVERSION: 2.1
    | Db type: ChipDb
    | Supporting package: AnnotationDbi
    | DBSCHEMA: HUMANCHIP_DB
    | ORGANISM: Homo sapiens
    | SPECIES: Human
    | MANUFACTURER: Affymetrix
    | CHIPNAME: Human Genome U95 Set
    | MANUFACTURERURL: http://www./support/technical/byproduct.affx?product=hgu95
    | EGSOURCEDATE: 2015-Sep27
    | EGSOURCENAME: Entrez Gene
    | EGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
    | CENTRALID: ENTREZID
    | TAXID: 9606
    | GOSOURCENAME: Gene Ontology
    | GOSOURCEURL: ftp://ftp.geneontology.org/pub/go/godatabase/archive/latest-lite/
    | GOSOURCEDATE: 20150919
    | GOEGSOURCEDATE: 2015-Sep27
    | GOEGSOURCENAME: Entrez Gene
    | GOEGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
    | KEGGSOURCENAME: KEGG GENOME
    | KEGGSOURCEURL: ftp://ftp.genome.jp/pub/kegg/genomes
    | KEGGSOURCEDATE: 2011-Mar15
    | GPSOURCENAME: UCSC Genome Bioinformatics (Homo sapiens)
    | GPSOURCEURL: ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19
    | GPSOURCEDATE: 2010-Mar22
    | ENSOURCEDATE: 2015-Jul16
    | ENSOURCENAME: Ensembl
    | ENSOURCEURL: ftp://ftp.ensembl.org/pub/current_fasta
    | UPSOURCENAME: Uniprot
    | UPSOURCEURL: http://www./
    | UPSOURCEDATE: Thu Oct 1 23:31:58 2015

    Please see: help('select') for usage information

The columns the database contains, and the columns that can be used as lookup keys, can be listed with columns() and keytypes() respectively. For example:

    > columns(hgu95av2.db)
    [1] 'ACCNUM' 'ALIAS' 'ENSEMBL'
    [4] 'ENSEMBLPROT' 'ENSEMBLTRANS' 'ENTREZID'
    [7] 'ENZYME' 'EVIDENCE' 'EVIDENCEALL'

To see the actual values in one of these columns, call keys() with the matching keytype:

    > head(keys(hgu95av2.db, keytype='SYMBOL'))
    [1] 'A1BG' 'A2M' 'A2MP1' 'NAT1' 'NAT2' 'NATP'

Finally, if we have some PROBEIDs that need to be converted to SYMBOLs, we can do it like this:

    > # take some PROBEIDs
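One way the PROBEID-to-SYMBOL conversion could be done is with AnnotationDbi's select(), which the database printout itself points to via help('select'). The following is only a minimal sketch, not necessarily the code the post goes on to use. It assumes both packages are already installed; the variable names probe_ids and probe2symbol, and the choice of simply taking the first six probes, are illustrative.

```r
# Sketch only: assumes AnnotationDbi and hgu95av2.db are already installed.
library(AnnotationDbi)
library(hgu95av2.db)

# Take a few PROBEIDs straight from the database (illustrative choice;
# in practice these would come from your own expression matrix).
probe_ids <- head(keys(hgu95av2.db, keytype = "PROBEID"))

# Look up the gene symbol for each probe. The explicit AnnotationDbi::
# prefix avoids clashes with other packages that also export select().
probe2symbol <- AnnotationDbi::select(hgu95av2.db,
                                      keys    = probe_ids,
                                      columns = "SYMBOL",
                                      keytype = "PROBEID")

# Result is a data.frame with one PROBEID column and one SYMBOL column.
probe2symbol
```

A probe can map to more than one symbol, or to none (reported as NA), so the result does not always have exactly one row per probe and may need de-duplicating before it is merged with expression data.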