
Commercial companies may do a better job of data curation: a complete catalog of human lncRNAs

 健明 2021-07-14

A VIP student asked us how to compile human lncRNA information for data mining.

I happened to come across a commercial microarray, the Arraystar Human LncRNA Array V4.0, whose product page says:

Arraystar Human LncRNA Array V4.0 has a total of 40,173 lncRNAs in two major lncRNA collections, 7,506 Gold Standard LncRNAs and 32,667 Reliable LncRNAs, from more than 47 Tb worth of RNA-seq data and all major public databases and repositories, such as RefSeq, UCSC Known Genes, GENCODE, lincRNA catalogs, lncRNAdb, T-UCRs, RNAdb, NRED, and scientific publications.

https://www./human-lncrna-expression-array-v4-0/

Their approach is worth learning from.

https://www./assets/1/6/Arraystar_Human_LncRNA_Array_V4.0.pdf

LncRNA Transcript Collection

-Arraystar Human LncRNA Microarray V4.0
The set of LncRNAs covered by the Arraystar Human LncRNA Microarray V4.0 is carefully constructed
using the most highly-respected public transcriptome databases (RefSeq, UCSC Known Genes, GENCODE,
etc.), as well as landmark publications [1-17]. Our LncRNA database is continually being updated to
ensure that all the latest annotated LncRNAs are included on the array.

1. RefSeq (Updated Aug 2015)

The Reference Sequence (RefSeq) database maintained by NCBI
(http://www.ncbi.nlm./projects/RefSeq/) is a comprehensive and well-annotated collection of
genome, transcript, and protein sequences [1]. There are 4,927 human LncRNAs in RefSeq as of August
2015, all of which are included on the Arraystar Human LncRNA Microarray V4.0.

2. UCSC Known Genes dataset (Known Genes 7)

The UCSC Known Genes dataset (http://genome./cgi-bin/hgTables) contains predicted genes
based on data from RefSeq, GenBank, CCDS and UniProt [2]. After removing small RNAs and other
unrelated transcripts, the Arraystar Human LncRNA Microarray V4.0 covers 3,521 LncRNAs from UCSC
Known Genes.

3. GENCODE version 19

The GENCODE project is a database of annotations of all human protein-coding and noncoding genes
using evidence-based gene features [3]. After analyzing the noncoding sequences and removing those
unrelated to LncRNAs, we have designed probes to detect 13,332 LncRNAs from GENCODE.
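The filtering step described above can be approximated on a GENCODE GTF file by keeping transcript records whose `transcript_type` is a lncRNA biotype. The following is only a minimal sketch: the biotype set is an assumption for illustration (it is not Arraystar's published criterion), and the sample records use made-up IDs.

```python
import re
from typing import Dict, Iterable, List, Set

# Assumption: biotypes treated as lncRNA in this sketch; adjust to your GENCODE release.
LNCRNA_BIOTYPES: Set[str] = {
    "lincRNA", "antisense", "processed_transcript",
    "sense_intronic", "sense_overlapping", "3prime_overlapping_ncrna",
}

# Matches quoted GTF attributes such as: transcript_type "lincRNA"
_ATTR = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(field: str) -> Dict[str, str]:
    """Parse the quoted key/value pairs of the 9th GTF column (first value wins)."""
    attrs: Dict[str, str] = {}
    for key, value in _ATTR.findall(field):
        attrs.setdefault(key, value)
    return attrs


def lncrna_transcript_ids(lines: Iterable[str],
                          biotypes: Set[str] = LNCRNA_BIOTYPES) -> List[str]:
    """Return transcript IDs of 'transcript' records with a lncRNA biotype."""
    ids: List[str] = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "transcript":
            continue  # only transcript-level records are counted
        attrs = parse_attributes(cols[8])
        tx_id = attrs.get("transcript_id")
        if tx_id and attrs.get("transcript_type") in biotypes:
            ids.append(tx_id)
    return ids


# Hypothetical toy records (IDs are invented, not real GENCODE entries).
SAMPLE_GTF = [
    'chr1\tHAVANA\ttranscript\t100\t900\t.\t+\t.\tgene_id "G1"; transcript_id "TX1"; transcript_type "lincRNA";',
    'chr1\tHAVANA\texon\t100\t300\t.\t+\t.\tgene_id "G1"; transcript_id "TX1"; transcript_type "lincRNA";',
    'chr1\tHAVANA\ttranscript\t2000\t5000\t.\t-\t.\tgene_id "G2"; transcript_id "TX2"; transcript_type "protein_coding";',
    'chr2\tHAVANA\ttranscript\t10\t700\t.\t-\t.\tgene_id "G3"; transcript_id "TX3"; transcript_type "antisense"; level 2;',
]

print(lncrna_transcript_ids(SAMPLE_GTF))
```

For a real annotation file, the same function can be fed an open file handle instead of the toy list, since it only iterates over lines.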

4. LncRNAdb

LncRNAdb (http:///) is a database of functional LncRNAs that are connected in one way or
another with eukaryotic biological function, including expression patterns, subcellular localization, etc. [6].
122 human LncRNAs in this database are represented on the Arraystar Human LncRNA Microarray
V4.0.

5. NRED

The Noncoding RNA Expression Database (NRED)
(http://jsm-research.imb./nred/cgi-bin/ncrnadb.pl) includes human and mouse LncRNAs with
experimental expression and ancillary information [5]. 645 human LncRNAs from NRED are covered by
the Arraystar Human LncRNA Microarray V4.0.

6. RNAdb 2.0

The RNAdb (http://research.imb./rnadb/) database is archived by the Mattick group at the
Institute for Molecular Bioscience (IMB) [4]. This database contains the legacy sequences and
annotations for thousands of non-coding RNAs. 1,318 human LncRNAs from RNAdb are represented on
the Arraystar Human LncRNA Microarray V4.0. 

7. LincRNAs identified by Khalil et al.

Khalil et al. identified and characterized 3,289 large intergenic noncoding RNAs (lincRNAs) by searching
for regions of chromatin methylation (H3K4me3 and H3K36me3) outside of known protein-coding loci [7].
By mapping these chromatin state data to transcriptome databases, eliminating all annotated
non-lincRNA transcripts (e.g., annotated protein-coding genes, rRNAs and tRNAs), and evaluating their
coding potential, 2,193 of the lincRNAs described by Khalil et al. were included on the Arraystar Human
LncRNA Microarray V4.0.

8. LincRNAs identified by Cabili et al.

Cabili et al. defined a reference catalog of more than 8,000 human lincRNA genes using their RNA
sequencing results and public database information [8]. 14,353 transcripts expressed from 4,662
stringently-defined human lincRNA genes were identified. 6,969 out of these lincRNA transcripts are
covered by the Arraystar Human LncRNA Microarray V4.0.

9. LincRNAs identified by Iyer et al. & Clark et al.

Clark et al. used CaptureSeq to greatly improve RNA-seq coverage and support the identification of
16,453 lncRNA transcripts in 78 tissue samples. Iyer et al. integrated 7,256 RNA-seq libraries from 25
independent studies, including TCGA, ENCODE and others, to derive 58,648 LncRNAs [17]. 20,142 of
these LncRNAs are covered by the Arraystar Human LncRNA Microarray V4.0.

10. Ultraconserved regions encoding LncRNAs (T-UCRs)

Ultraconserved regions (UCRs) are intra- and intergenic sequences greater than 200 nt in length that are
100% identical among humans, mice, and rats. 481 human UCRs were identified by Bejerano et al. [9]. A
large fraction of UCRs transcribe a subset of LncRNAs, known as T-UCRs, that are aberrantly expressed
in several human cancers. All T-UCRs are represented on the Arraystar Human LncRNA Microarray V4.0.
To help discover potential non-coding transcripts from these regions, we also designed 962 probes to
target both strands of these UCRs (http://users.soe./~jill/ultra.html).

11. HOX loci LncRNAs (HOX LncRNAs)

HOX cluster genes are fundamental regulators of pattern and axis formation during animal development.
Rinn et al. identified 407 transcribed regions within the four HOX loci in humans (101 HOX gene exons,
75 introns and 231 intergenic ncRNA transcripts) [10]. All of these distinct transcribed regions are
targeted by probes on the Arraystar Human LncRNA Microarray V4.0. Furthermore, 68 potential
LncRNAs, whose transcript units (TUs) overlap HOX cluster genes on the same or antisense genomic
strand, are covered by the Arraystar Human LncRNA Microarray V4.0.

12. LncRNAs with Enhancer-like Function (LncRNA-a)

Using the human GENCODE annotation, Orom et al. identified 3,019 human LncRNAs with
enhancer-like function expressed from 2,286 unique genes [11]. LncRNAs with enhancer-like function
are included on the Arraystar Human LncRNA Microarray V4.0.
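For readers who want to assemble a similar collection themselves, note that the counts quoted for sources 1 to 9 alone sum to 53,169, more than the 40,173 total, so the same transcript presumably appears in several sources. The sketch below shows one simple way to collapse such duplicates: group records by chromosome, strand, and exact exon structure. This is an illustrative assumption, not Arraystar's documented procedure, and all record names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple

Exons = Tuple[Tuple[int, int], ...]


class Transcript(NamedTuple):
    source: str   # catalog the record came from
    name: str     # identifier within that catalog
    chrom: str
    strand: str
    exons: Exons  # (start, end) pairs, in any order


def structure_key(t: Transcript) -> Tuple[str, str, Exons]:
    """Key that is identical for records with the same exon structure."""
    return (t.chrom, t.strand, tuple(sorted(t.exons)))


def merge_catalogs(transcripts: Iterable[Transcript]) -> Dict[Tuple[str, str, Exons], List[str]]:
    """Group records from all sources by exon structure, keeping provenance."""
    merged: Dict[Tuple[str, str, Exons], List[str]] = defaultdict(list)
    for t in transcripts:
        merged[structure_key(t)].append(f"{t.source}:{t.name}")
    return dict(merged)


# Hypothetical toy records: two sources describe the same plus-strand transcript.
TOY = [
    Transcript("sourceA", "tx_a1", "chr1", "+", ((100, 300), (500, 900))),
    Transcript("sourceB", "tx_b7", "chr1", "+", ((500, 900), (100, 300))),
    Transcript("sourceB", "tx_b8", "chr1", "-", ((100, 300), (500, 900))),
]

for key, names in merge_catalogs(TOY).items():
    print(key, names)
```

Exact-structure matching is deliberately strict. A real merge would probably also need all coordinates on one genome build and some tolerance for transcripts that differ only at their 5' or 3' ends.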

References:

  1. Pruitt, K.D., et al., RefSeq: an update on mammalian reference sequences. Nucleic Acids
    Res, 2014. 42(Database issue): p. D756-63.

  2. Hsu, F., et al., The UCSC Known Genes. Bioinformatics, 2006. 22(9): p. 1036-46.

  3. Harrow, J., et al., GENCODE: producing a reference annotation for ENCODE. Genome Biol,
    2006. 7 Suppl 1: p. S4 1-9.

  4. Pang, K.C., et al., RNAdb 2.0--an expanded database of mammalian non-coding RNAs. Nucleic
    Acids Res, 2007. 35(Database issue): p. D178-82.

  5. Dinger, M.E., et al., NRED: a database of long noncoding RNA expression. Nucleic Acids Res,
    2009. 37(Database issue): p. D122-6.

  6. Quek, X.C., et al., lncRNAdb v2.0: expanding the reference database for functional long
    noncoding RNAs. Nucleic Acids Res, 2015. 43(Database issue): p. D168-73.

  7. Khalil, A.M., et al., Many human large intergenic noncoding RNAs associate with
    chromatin-modifying complexes and affect gene expression. Proc Natl Acad Sci U S A, 2009.
    106(28): p. 11667-72.

  8. Cabili, M.N., et al., Integrative annotation of human large intergenic noncoding RNAs reveals
    global properties and specific subclasses. Genes Dev, 2011. 25(18): p. 1915-27.

  9. Bejerano, G., et al., Ultraconserved elements in the human genome. Science, 2004. 304(5675):
    p. 1321-5.

  10. Rinn, J.L., et al., Functional demarcation of active and silent chromatin domains in human HOX
    loci by noncoding RNAs. Cell, 2007. 129(7): p. 1311-23.

  11. Orom, U.A., et al., Long noncoding RNAs with enhancer-like function in human cells. Cell, 2010.
    143(1): p. 46-58.

  12. Pang, K.C., et al., RNAdb--a comprehensive mammalian noncoding RNA database. Nucleic
    Acids Res, 2005. 33(Database issue): p. D125-30.

  13. Mercer, T.R., et al., Specific expression of long noncoding RNAs in the mouse brain. Proc Natl
    Acad Sci U S A, 2008. 105(2): p. 716-21.

  14. Guttman, M., et al., Chromatin signature reveals over a thousand highly conserved large
    non-coding RNAs in mammals. Nature, 2009. 458(7235): p. 223-7.

  15. Benson, D.A., et al., GenBank: update. Nucleic Acids Res, 2004. 32(Database issue): p. D23-6.

  16. Clark, et al., Quantitative gene profiling of long noncoding RNAs with targeted RNA sequencing.
    Nat Methods, 2015. 12(4): p. 339-42.

  17. Iyer, et al., The landscape of long noncoding RNAs in the human transcriptome. Nat Genet, 2015.
    47(3): p. 199-208.
