The NCBI Taxonomy database is a curated classification and nomenclature for all of the organisms in the public sequence databases. It does not follow a single taxonomic treatise but instead attempts to incorporate phylogenetic and taxonomic knowledge from a variety of sources, including the published literature, web-based databases, and the advice of sequence submitters and outside taxonomy experts.

Record updated: Aug. 9, 2016, 7:55 p.m. by Madekale.

Eukaryotic Linear Motifs
This computational biology resource focuses mainly on the annotation and detection of eukaryotic linear motifs (ELMs), providing both a repository of annotated motif data and an exploratory tool for motif prediction. ELMs, also called short linear motifs (SLiMs), are compact protein interaction sites composed of short stretches of adjacent amino acids.
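ELM describes each motif class with a regular expression, so at its simplest, motif detection amounts to scanning a protein sequence with that pattern. The Python sketch below assumes that model only. The pattern `P..P` (a generic proline-rich core of the kind recognised by SH3 domains) and the function name `find_motif` are illustrative placeholders, not an actual ELM class definition, and the real ELM prediction tool additionally filters candidate hits by context such as taxonomic range and cellular compartment.

```python
import re


def find_motif(sequence: str, pattern: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched peptide) for every match, including overlapping ones."""
    hits = []
    # Wrapping the pattern in a zero-width lookahead lets re.finditer report
    # overlapping matches, which a plain finditer would skip.
    for m in re.finditer(f"(?=({pattern}))", sequence.upper()):
        hits.append((m.start() + 1, m.group(1)))
    return hits


# Hypothetical PxxP-style pattern standing in for an ELM class regular expression.
print(find_motif("APPAPPAP", "P..P"))  # [(2, 'PPAP'), (3, 'PAPP'), (5, 'PPAP')]
```

The lookahead matters because short motifs often overlap in low-complexity regions. A non-overlapping scan of the same sequence would report only the first `PPAP`.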

Pfam Protein Families
The Pfam database is a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs). Proteins are generally composed of one or more functional regions, commonly termed domains. Different combinations of domains give rise to the diverse range of proteins found in nature. The identification of domains that occur within proteins can therefore provide insights into their function. Pfam also generates higher-level groupings of related entries, known as clans. A clan is a collection of Pfam entries which are related by similarity of sequence, structure or profile-HMM.
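One practical use of clans is resolving overlapping matches. When several families from the same clan match the same region of a protein, Pfam's annotation keeps only the most significant of them. The sketch below illustrates that idea in simplified form. The `Hit` record, the placeholder family and clan names (`FamA`, `CL_X`, and so on) and the rule that hits overlap if they share any residue are assumptions made for the example, not Pfam's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Hit:
    family: str          # Pfam family name (placeholder names in the example)
    clan: Optional[str]  # clan the family belongs to, or None if it has no clan
    start: int           # first matched residue (1-based)
    end: int             # last matched residue (inclusive)
    evalue: float        # lower is more significant


def overlaps(a: Hit, b: Hit) -> bool:
    """True if the two hits share at least one residue."""
    return a.start <= b.end and b.start <= a.end


def resolve_clan_overlaps(hits: list[Hit]) -> list[Hit]:
    """Keep only the most significant hit among overlapping hits from the same clan."""
    kept: list[Hit] = []
    # Visit hits from most to least significant so the best hit in each clash wins.
    for hit in sorted(hits, key=lambda h: h.evalue):
        clash = hit.clan is not None and any(
            k.clan == hit.clan and overlaps(hit, k) for k in kept
        )
        if not clash:
            kept.append(hit)
    # Report the surviving domains in sequence order, i.e. as a domain architecture.
    return sorted(kept, key=lambda h: h.start)


hits = [
    Hit("FamA", "CL_X", 10, 120, 1e-30),
    Hit("FamB", "CL_X", 15, 118, 1e-10),  # same clan, same region: FamA wins
    Hit("FamC", None, 150, 220, 1e-5),
]
print([h.family for h in resolve_clan_overlaps(hits)])  # ['FamA', 'FamC']
```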

Comparative Toxicogenomics Database
The Comparative Toxicogenomics Database (CTD) advances understanding of the effects of environmental chemicals on human health. Biocurators manually curate chemical-gene, chemical-disease, and gene-disease relationships from the scientific literature. These core data are then integrated internally to generate inferred chemical-gene-disease networks. The core data are also integrated with external data sets (such as Gene Ontology and pathway annotations) to predict many novel associations between different data types. A distinctive feature of CTD is the set of inferred relationships generated by this data integration, which help turn knowledge into discoveries by identifying novel connections between chemicals, genes, diseases, pathways, and GO annotations that might not otherwise be apparent from other biological resources.
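The inference step described above can be pictured as a join through genes. If a chemical is curated as interacting with a gene, and that gene is curated as associated with a disease, the chemical-disease pair becomes an inferred association with that gene as its supporting evidence. The sketch below assumes this simple two-hop model. The identifiers are placeholders, and CTD's own pipeline also ranks inferences with scores that are not reproduced here.

```python
from collections import defaultdict


def infer_chemical_disease(chem_gene, gene_disease):
    """Infer chemical-disease pairs that share at least one curated gene.

    chem_gene:    iterable of (chemical, gene) curated interactions
    gene_disease: iterable of (gene, disease) curated associations
    Returns {(chemical, disease): sorted list of inference genes}.
    """
    diseases_by_gene = defaultdict(set)
    for gene, disease in gene_disease:
        diseases_by_gene[gene].add(disease)

    inferred = defaultdict(set)
    for chemical, gene in chem_gene:
        # Every disease linked to this gene yields an inferred pair, with the gene as evidence.
        for disease in diseases_by_gene.get(gene, ()):
            inferred[(chemical, disease)].add(gene)
    return {pair: sorted(genes) for pair, genes in inferred.items()}


chem_gene = [("chemA", "GENE1"), ("chemA", "GENE2"), ("chemB", "GENE2")]
gene_disease = [("GENE1", "diseaseX"), ("GENE2", "diseaseX"), ("GENE2", "diseaseY")]
print(infer_chemical_disease(chem_gene, gene_disease)[("chemA", "diseaseX")])  # ['GENE1', 'GENE2']
```

Keeping the list of inference genes for each pair, rather than only the pair itself, preserves the evidence trail that makes an inferred link checkable against the curated data.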

ArchDB
ArchDB is a compilation of structural classifications of loops extracted from known protein structures. The structural classification is based on the geometry and conformation of the loop. The geometry is defined by four internal variables and the type of regular flanking secondary structures, resulting in 10 different loop types. Loops in ArchDB have been classified using an improved version (Espadaler et al.) of the original ArchType program published in 1997 by Oliva et al.
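Because loop types depend partly on the regular secondary structures flanking each loop, any such classification must first locate loops and label their flanks. The sketch below does only that, working from a per-residue secondary-structure string in which `H` marks helix and `E` marks strand (a simplified DSSP-style alphabet assumed for the example). It does not compute the four geometric variables ArchDB also uses, so its labels are coarser than ArchDB's 10 loop types, and the function name `flanked_loops` is hypothetical.

```python
from itertools import groupby

# Simplified alphabet assumed here: H = helix, E = strand, anything else = coil.
SSE_NAMES = {"H": "alpha", "E": "beta"}


def flanked_loops(ss: str) -> list[tuple[str, int, int]]:
    """Return (flank_type, start, end) for every coil run bounded by two regular SSEs.

    Positions are 1-based and inclusive.
    """
    # Collapse every non-H/E symbol to one coil code so coil runs are contiguous.
    ss = "".join(c if c in SSE_NAMES else "C" for c in ss.upper())
    # Convert the string into runs of (state, start, end_exclusive), 0-based.
    runs = []
    pos = 0
    for state, group in groupby(ss):
        length = sum(1 for _ in group)
        runs.append((state, pos, pos + length))
        pos += length
    loops = []
    # A loop is a coil run whose neighbours on both sides are helices or strands.
    for prev, cur, nxt in zip(runs, runs[1:], runs[2:]):
        if cur[0] == "C" and prev[0] in SSE_NAMES and nxt[0] in SSE_NAMES:
            flank = f"{SSE_NAMES[prev[0]]}-{SSE_NAMES[nxt[0]]}"
            loops.append((flank, cur[1] + 1, cur[2]))
    return loops


print(flanked_loops("CCHHHHCCEEEECCCEEEEC"))  # [('alpha-beta', 7, 8), ('beta-beta', 13, 15)]
```

Terminal coil segments are skipped because they lack a regular secondary structure on one side, which matches the idea of a loop as a connection between two regular elements.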

A CLAssification of Mobile genetic Elements
ACLAME is a database dedicated to the collection and classification of mobile genetic elements (MGEs) from various sources, comprising all known phage genomes, plasmids and transposons.

GigaScience Database
GigaDB primarily serves as a repository to host data and tools associated with articles in GigaScience; however, it also includes a subset of datasets that are not associated with GigaScience articles. GigaDB defines a dataset as a group of files (e.g., sequencing data, analyses, imaging files, software programs) that are related to and support an article or study.

UniCarbKB
UniCarbKB is an initiative that aims to promote the creation of an online information storage and search platform for glycomics and glycobiology research. The knowledgebase will offer a freely accessible and information-rich resource supported by querying interfaces, annotation technologies and the adoption of common standards to integrate structural, experimental and functional data.

probeBase
probeBase is a manually maintained and curated database of rRNA-targeted oligonucleotide probes and primers. Each oligonucleotide entry comes with contextual information and multiple options for evaluating its in silico hybridization performance against the most recent rRNA sequence databases, which makes probeBase an important and frequently used resource for microbiology research and diagnostics. Its major features include a classification of probes and primers according to the NCBI Taxonomy database and a powerful, customizable search function for querying target organisms, probe names, primers, target sites, and references. The probeBase match tool matches near-full-length rRNA sequences against probeBase to find all published probes targeting the query sequences. The new proxy match tool extends this analysis to partial rRNA sequences by exploiting full-length sequences in the rRNA sequence database SILVA to find published probes that potentially target the partial queries. A tool for submitting new or missing probe sequences or references helps keep probeBase up to date.
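At its core, matching a probe against an rRNA sequence means looking for the probe's target site, the reverse complement of the probe, in the query. The sketch below is a naive version of that idea. It assumes probes are written 5'→3' in DNA letters and tolerates a fixed number of mismatches (Hamming distance). It ignores gaps and hybridization thermodynamics, and the function names are hypothetical, not probeBase's own. The example uses the widely used bacterial 16S rRNA probe EUB338 (GCTGCCTCCCGTAGGAGT) against a short synthetic query.

```python
# Complement table for DNA/RNA letters; U (RNA) pairs with A.
COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def target_site(probe: str) -> str:
    """Reverse complement of a 5'->3' probe: the rRNA region it binds, written as DNA."""
    return probe.upper().translate(COMPLEMENT)[::-1]


def match_probes(query: str, probes: dict[str, str], max_mismatches: int = 0):
    """Return (probe name, 1-based start in query, mismatches) for every acceptable hit."""
    seq = query.upper().replace("U", "T")  # treat RNA and DNA queries alike
    hits = []
    for name, probe in probes.items():
        site = target_site(probe)
        k = len(site)
        for i in range(len(seq) - k + 1):
            # Hamming distance between the target site and this window of the query.
            mismatches = sum(a != b for a, b in zip(site, seq[i:i + k]))
            if mismatches <= max_mismatches:
                hits.append((name, i + 1, mismatches))
    return hits


probes = {"EUB338": "GCTGCCTCCCGTAGGAGT"}
print(match_probes("AAAACUCCUACGGGAGGCAGCAAAA", probes))  # [('EUB338', 4, 0)]
```

A brute-force scan like this is fine for a handful of probes and one sequence. Searching thousands of probes against large rRNA collections would call for an index over the target sites instead.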

FAIRDOMHub
The FAIRDOMHub is a publicly available resource, built using the SEEK software, that enables collaboration within the scientific community. FAIRDOM will establish a support and service network for European Systems Biology, serving projects in standardizing, managing and disseminating data and models in a FAIR manner: Findable, Accessible, Interoperable and Reusable. FAIRDOM is an initiative to develop a community and to establish an internationally sustained Data and Model Management service for the European Systems Biology community. FAIRDOM is a joint action of the ERA-Net ERASysAPP and the European Research Infrastructure ISBE.

ENCODE Project
The ENCODE (Encyclopedia of DNA Elements) Consortium is an international collaboration of research groups funded by the National Human Genome Research Institute (NHGRI). The goal of ENCODE is to build a comprehensive parts list of functional elements in the human genome, including elements that act at the protein and RNA levels, and regulatory elements that control the cells and circumstances in which a gene is active.

Microenvironment Perturbagen (MEP) LINCS Center image server
The MEP LINCS project contributes to the NIH Library of Integrated Network-based Cellular Signatures (LINCS) program by developing a dataset and computational strategy to elucidate how microenvironment (ME) signals affect cell-intrinsic, transcriptionally and proteomically defined intracellular molecular networks to generate experimentally observable cellular phenotypes, as measured by high-content imaging.

GrainGenes, a Database for Triticeae and Avena
The GrainGenes website hosts a wealth of information for researchers working on Triticeae species, oat, and their wild relatives. Its database covers genetic maps, genes, alleles, genetic markers, phenotypic data, quantitative trait loci (QTL) studies, experimental protocols and publications, and can be queried by text search, browsing, Boolean queries, MySQL commands, or pre-made queries created by the curators. GrainGenes is not solely a database: it also serves as an information site for researchers, a means of communicating project aims and outcomes, and a forum for discussion.

Ensembl Genomes
The Ensembl genome annotation system, developed jointly by the EBI and the Wellcome Trust Sanger Institute, has been used for the annotation, analysis and display of vertebrate genomes since 2000. Since 2009, the Ensembl site has been complemented by five new sites, for bacteria, protists, fungi, plants and invertebrate metazoa, enabling users to access and compare genome-scale data from species of scientific interest across the taxonomy through a single collection of interactive and programmatic interfaces.

Ensembl Bacteria
Over 30,000 genome sequences from bacteria and archaea have been annotated and deposited in the public archives of the members of the International Nucleotide Sequence Database Collaboration. This site provides access to complete, annotated genomes from bacteria and archaea (present in the European Nucleotide Archive) through the Ensembl graphical user interface (genome browser).

Ensembl Protists
From release 27 onwards, all protist genomes whose sequence and annotation have been completed and submitted to the International Nucleotide Sequence Database Collaboration (INSDC; i.e. the ENA, GenBank and DDBJ databases) are available in Ensembl Protists. The release now comprises over 150 genomes, of which over 100 were taken directly from the INSDC archives and the remainder from other sources. The new genomes have been functionally annotated with InterPro entries and GO terms using InterPro v53.

Ensembl Plants
A new genome assembly of Triticum aestivum cv. Chinese Spring is now available in Ensembl Plants. The assembly (TGACv1) and its accompanying annotation were produced by the Earlham Institute, formerly The Genome Analysis Centre (TGAC), as part of the Triticeae Genomics for Sustainable Agriculture project.

Ensembl Fungi
From release 28 onwards, all fungal genomes whose sequence and annotation have been completed and submitted to the International Nucleotide Sequence Database Collaboration (i.e. the ENA, GenBank and DDBJ databases) are available in Ensembl Fungi. The release now comprises 589 genomes, of which 536 were taken from the archives and 53 directly from other sources.

Ensembl Metazoa
This site provides access to complete, annotated genomes from metazoa through the Ensembl graphical user interface (genome browser).

