An integrated knowledge database dedicated to ncRNAs, especially lncRNAs.

What is NONCODE?


NONCODE (current version v6.0) is an integrated knowledge database dedicated to non-coding RNAs (excluding tRNAs and rRNAs). It currently covers 39 species: 16 animals and 23 plants. NONCODE draws its data from the literature and from other public databases. We searched PubMed using the keywords ‘ncrna’, ‘noncoding’, ‘non-coding’, ‘no code’, ‘non-code’, ‘lncrna’ or ‘lincrna’, and retrieved the newly identified lncRNAs and their annotations from the supplementary material or websites of these articles. These data, together with the newest data from Ensembl, RefSeq, lncRNAdb and GENCODE, were processed through a standard pipeline for each species. The pipeline includes seven steps:

  1. Format normalization. All input data were converted into BED or GTF format based on a single assembly version per species. For example, TAIR9 and TAIR10 are two different assembly versions of A. thaliana; all related data were converted to TAIR10.
  2. Multi-source data combination. All normalized data files were combined using the Cuffcompare program in the Cufflinks suite.
  3. Protein-coding RNA filtration. Protein-coding RNAs were filtered out in two ways. First, all RNAs were compared with the coding RNAs in RefSeq and Ensembl. Second, CNIT (Coding-NonCoding Identifying Tool) was applied, and only the RNAs classified as noncoding by CNIT were kept.
  4. General information presentation. Location, exons, length, assembly sequence and source are listed for each transcript.
  5. Expression profiles and function prediction in plants. This information is shown for four common plants out of the 23. Their expression profiles were curated from multiple tissues; detailed data sources are listed in Supplementary Table 1. lncRNA functions were predicted from co-expression with coding genes.
  6. Conservation analysis at the transcript level. Plant lncRNA conservation was analyzed with BLAST using an E-value cutoff of 1e-10. Each transcript in a plant species was searched against all transcripts in the other 22 plant species.
  7. Web presence. New web pages, especially for plants, were built for NONCODE v6.0, and more annotation information has been updated.
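Step 1 leaves both BED and GTF as valid targets. A common normalization chore is turning a transcript's GTF exon lines (1-based, inclusive coordinates) into one BED12 line (0-based, half-open). The sketch below shows that conversion only. It assumes all exons already sit on the target assembly (liftover between assemblies such as TAIR9 and TAIR10 is out of scope). The names `GtfExon`, `parse_gtf_exon` and `exons_to_bed12` are hypothetical, not part of NONCODE.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GtfExon:
    """One GTF 'exon' record; GTF coordinates are 1-based and inclusive."""
    chrom: str
    start: int
    end: int
    strand: str
    transcript_id: str


def parse_gtf_exon(line: str) -> GtfExon:
    """Parse one tab-separated, 9-column GTF line whose feature type is 'exon'."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9 or fields[2] != "exon":
        raise ValueError("expected a 9-column GTF exon line")
    # The attribute column looks like: gene_id "g1"; transcript_id "t1";
    attrs = {}
    for item in fields[8].split(";"):
        key, _, value = item.strip().partition(" ")
        if key:
            attrs[key] = value.strip('"')
    return GtfExon(fields[0], int(fields[3]), int(fields[4]), fields[6], attrs["transcript_id"])


def exons_to_bed12(exons: list[GtfExon]) -> str:
    """Collapse one transcript's GTF exons into a BED12 line (0-based, half-open)."""
    exons = sorted(exons, key=lambda e: e.start)
    first = exons[0]
    tx_start = first.start - 1            # 1-based inclusive start -> 0-based start
    tx_end = max(e.end for e in exons)    # inclusive end equals half-open end
    block_sizes = [e.end - e.start + 1 for e in exons]
    block_starts = [e.start - 1 - tx_start for e in exons]  # relative to tx_start
    fields = [
        first.chrom, tx_start, tx_end, first.transcript_id, 0, first.strand,
        tx_end, tx_end,  # thickStart == thickEnd: no coding region is drawn
        "0", len(exons),
        ",".join(map(str, block_sizes)) + ",",
        ",".join(map(str, block_starts)) + ",",
    ]
    return "\t".join(map(str, fields))


lines = [
    'chr1\tsrc\texon\t101\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\tsrc\texon\t301\t350\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
]
print(exons_to_bed12([parse_gtf_exon(line) for line in lines]))
```

Exons are sorted before block offsets are computed, so the input order of GTF lines does not matter.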
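Step 3 states that candidates were compared with known coding RNAs and then screened by CNIT, without giving the comparison criterion. The sketch below assumes one plausible rule: drop any candidate whose exons overlap an annotated coding transcript's exons on the same strand. CNIT's verdicts are modeled simply as a set of transcript IDs called noncoding; CNIT's real output format is not assumed. `Transcript`, `shares_exonic_overlap` and `filter_noncoding` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    tx_id: str
    chrom: str
    strand: str
    exons: tuple  # ((start, end), ...) in 0-based, half-open coordinates


def shares_exonic_overlap(a: Transcript, b: Transcript) -> bool:
    """True if any exon of `a` overlaps any exon of `b` on the same chromosome and strand."""
    if a.chrom != b.chrom or a.strand != b.strand:
        return False
    # Half-open intervals [s1, e1) and [s2, e2) overlap iff s1 < e2 and s2 < e1.
    return any(s1 < e2 and s2 < e1 for s1, e1 in a.exons for s2, e2 in b.exons)


def filter_noncoding(candidates, known_coding, cnit_noncoding_ids):
    """Keep candidates that (1) overlap no known coding transcript and
    (2) were classified as noncoding by CNIT (given here as a set of IDs)."""
    kept = []
    for tx in candidates:
        if any(shares_exonic_overlap(tx, c) for c in known_coding):
            continue  # resembles an annotated RefSeq/Ensembl coding RNA
        if tx.tx_id not in cnit_noncoding_ids:
            continue  # CNIT did not call it noncoding
        kept.append(tx)
    return kept
```

The pairwise scan is quadratic and only suits illustration; at genome scale an interval index per chromosome would replace the inner loop.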
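Step 5 predicts lncRNA functions from co-expression with coding genes but does not name the statistic. A minimal guilt-by-association sketch, assuming Pearson correlation across tissues and a hypothetical cutoff of r >= 0.9, collects the annotation terms of correlated coding genes and ranks them by how many such genes carry each term. The term labels and all function names here are illustrative only.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / math.sqrt(vx * vy)


def predict_functions(lnc_profile, coding_profiles, coding_terms, min_r=0.9):
    """Rank annotation terms of coding genes co-expressed with one lncRNA.

    lnc_profile: expression values across tissues (same tissue order everywhere)
    coding_profiles: {gene_id: [values, ...]}
    coding_terms: {gene_id: [term, ...]}, e.g. GO-style labels (hypothetical)
    Returns [(term, number_of_coexpressed_genes), ...], most frequent first.
    """
    counts = {}
    for gene, profile in coding_profiles.items():
        if pearson(lnc_profile, profile) >= min_r:
            for term in coding_terms.get(gene, ()):
                counts[term] = counts.get(term, 0) + 1
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


lnc = [1, 2, 3, 4]
profiles = {"geneA": [2, 4, 6, 8], "geneB": [4, 3, 2, 1], "geneC": [1, 2, 3, 5]}
terms = {"geneA": ["photosynthesis", "stress response"],
         "geneB": ["root development"], "geneC": ["stress response"]}
print(predict_functions(lnc, profiles, terms))
# [('stress response', 2), ('photosynthesis', 1)]
```

geneB is perfectly anti-correlated with the lncRNA, so its term is excluded; a real analysis would also need a significance test and more tissues than this toy profile.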
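Step 6 runs BLAST with an E-value cutoff of 1e-10 and compares each species only against the other 22 plants. If the hits are written in BLAST+'s default 12-column tabular format (`-outfmt 6`, where column 11 is the E-value), the post-filtering can be sketched as below. The `SPECIES|transcript` ID scheme used to tell species apart is a hypothetical convention borrowed from the abbreviations in the table, not NONCODE's actual identifier format; treating the cutoff as inclusive is also an assumption.

```python
def species_of(seq_id: str) -> str:
    """Hypothetical ID scheme: species abbreviation, then '|', e.g. 'ATH|tx1'."""
    return seq_id.split("|", 1)[0]


def conserved_pairs(blast_lines, max_evalue=1e-10):
    """Collect cross-species (query, subject) hits from BLAST tabular output.

    Expects the default 12-column `-outfmt 6` layout, where column 11
    (index 10) is the E-value. Hits within the same species are skipped.
    """
    pairs = set()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        cols = line.rstrip("\n").split("\t")
        query, subject, evalue = cols[0], cols[1], float(cols[10])
        if species_of(query) != species_of(subject) and evalue <= max_evalue:
            pairs.add((query, subject))
    return pairs
```

Filtering on species after the search keeps the sketch simple; building one database per species and skipping the self-comparison would avoid computing same-species hits at all.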

In all, NONCODE aims to present the most complete collection and annotation of non-coding RNAs. It provides not only basic lncRNA information such as location, strand, exon number, length and sequence, but also advanced information such as expression profiles, exosome expression profiles, conservation information, predicted functions and disease associations.
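The basic per-transcript fields listed above can all be derived from a transcript's exon coordinates plus the genome sequence. A minimal sketch, assuming 0-based, half-open exon coordinates and a genome held as a plain `{chromosome: sequence}` dictionary, shows how exon number, length, location and spliced sequence relate. `LncRNARecord` and its fields are hypothetical and do not reflect NONCODE's actual schema.

```python
from dataclasses import dataclass

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


@dataclass(frozen=True)
class LncRNARecord:
    transcript_id: str
    chrom: str
    strand: str   # "+" or "-"
    exons: tuple  # ((start, end), ...), 0-based, half-open, any order

    @property
    def exon_number(self) -> int:
        return len(self.exons)

    @property
    def length(self) -> int:
        """Transcript length = sum of exon lengths (introns excluded)."""
        return sum(end - start for start, end in self.exons)

    @property
    def location(self) -> str:
        """1-based, inclusive genomic span for display, e.g. 'chr1:101-350'."""
        start = min(s for s, _ in self.exons)
        end = max(e for _, e in self.exons)
        return f"{self.chrom}:{start + 1}-{end}"

    def sequence(self, genome: dict) -> str:
        """Splice exon sequences; reverse-complement for minus-strand transcripts."""
        chrom_seq = genome[self.chrom]
        spliced = "".join(chrom_seq[s:e] for s, e in sorted(self.exons))
        if self.strand == "-":
            spliced = spliced.translate(_COMPLEMENT)[::-1]
        return spliced


genome = {"chr1": "AACCGGTTAC"}
rec = LncRNARecord("tx1", "chr1", "+", ((4, 6), (0, 2)))
print(rec.exon_number, rec.length, rec.location, rec.sequence(genome))
# 2 4 chr1:1-6 AAGG
```

Deriving these values on demand keeps the record consistent with its exons; a database would typically precompute and store them instead.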

Genome version of each species in the current NONCODE version

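| Species | Genome version | Abbreviation | Group |
|---|---|---|---|
| Chimp | panTro4 | PTR | Animal |
| Gorilla | gorGor3 | GGO | Animal |
| Opossum | monDom5 | MDO | Animal |
| Orangutan | ponAbe2 | PPY | Animal |
| Platypus | ornAna1 | OAN | Animal |
| Rhesus | rheMac3 | MML | Animal |
| Human | hg38 | HSA | Animal |
| Mouse | mm10 | MMU | Animal |
| C. elegans | ce10 | CEL | Animal |
| Cow | bosTau6 | BTA | Animal |
| Chicken | galGal4 | GGA | Animal |
| Fruit fly | dm6 | DME | Animal |
| Rat | rn6 | RNO | Animal |
| Yeast | sacCer3 | SCE | Animal |
| Zebrafish | danRer10 | DRE | Animal |
| Pig | susScr3 | SUS | Animal |
| A. thaliana | TAIR10 | ATH | Plant |
| B. napus | AST_PRJEB5043_v1 | BNA | Plant |
| B. rapa | IVFCAASv1 | BRA | Plant |
| Quinoa | ASM168347v1 | CQU | Plant |
| C. reinhardtii | Chlamydomonas_reinhardtii_v5.5 | CRE | Plant |
| Cucumber | ASM407v2 | CSA | Plant |
| Soybean | Glycine_max_v1.0 | GMA | Plant |
| G. raimondii | Graimondii2_0 | GRA | Plant |
| Apple | ASM211411v1 | MAL | Plant |
| Cassava | Manihot_esculenta_v6 | MES | Plant |
| M. truncatula | MedtrA17_4.0 | MTR | Plant |
| Banana | MA1 | MAC | Plant |
| O. rufipogon | OR_W1943 | ORU | Plant |
| O. sativa | IRGSP-1.0 | OSA | Plant |
| P. patens | Phypa_V3 | PPA | Plant |
| P. trichocarpa | JGI2.0 | POP | Plant |
| Tomato | 390_v2.5 | SLY | Plant |
| Potato | SolTub_3.0 | STU | Plant |
| Cacao | Theobroma_cacao_20110822 | TCA | Plant |
| Trefoil | Trpr | TPR | Plant |
| Wheat | IWGSC | TAE | Plant |
| Grape | IGGP_12x | VVI | Plant |
| Maize | AGPv4 | ZMA | Plant |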