General Information
Gene Ontology is a structured controlled vocabulary for use by the research community in the annotation of genes, gene products and sequences.
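Because GO terms are organised as a directed acyclic graph, a gene annotated to a specific term is implicitly annotated to all of that term's ancestors (the "true path rule"). The sketch below illustrates that propagation in Python under simplifying assumptions: the term names and the gene names geneA/geneB are hypothetical toy data, not real GO identifiers, and only a single generic parent relation is followed (real GO distinguishes relations such as is_a and part_of).

```python
from typing import Dict, Set

# Hypothetical toy ontology: each term maps to its direct parent terms.
# Real GO terms carry IDs such as "GO:0008150"; these names are illustrative only.
PARENTS: Dict[str, Set[str]] = {
    "biological_process": set(),
    "metabolic_process": {"biological_process"},
    "cellular_process": {"biological_process"},
    "cellular_metabolic_process": {"metabolic_process", "cellular_process"},
}


def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Return every term reachable by following parent edges (excluding the term itself)."""
    seen: Set[str] = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def propagate(annotations: Dict[str, Set[str]],
              parents: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Expand each gene's direct annotations with all ancestor terms."""
    expanded: Dict[str, Set[str]] = {}
    for gene, terms in annotations.items():
        closure = set(terms)
        for term in terms:
            closure |= ancestors(term, parents)
        expanded[gene] = closure
    return expanded
```

With this toy graph, a gene annotated only to cellular_metabolic_process is also counted under metabolic_process, cellular_process and biological_process, which is what term-enrichment tools rely on when they count genes per term.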

Implementing Databases (140)
A genetic database for attention deficit hyperactivity disorder
ADHDgene aims to provide the research community with a central genetic resource and analysis platform for ADHD, to help unveil the genetic basis of ADHD and to contribute to global mental health.

Aspergillus Genome Database
The Aspergillus Genome Database is a resource for genomic sequence data as well as gene and protein information for Aspergilli. This publicly available repository is a central point of access to genome, transcriptome and polymorphism data for the fungal research community.

Central Aspergillus Data REpository
This project aims to support the international Aspergillus research community by gathering all genomic information regarding this significant genus into one resource, the Central Aspergillus Data REpository (CADRE). CADRE facilitates visualisation and analyses of data using the Ensembl software suite. Much of our data has been extracted from GenBank and augmented with the consent of the original sequencing groups. This additional work has been carried out using both automated and manual efforts, with support from specific annotation projects and the general Aspergillus community.

CAPS-DB : a structural classification of helix-capping motifs
CAPS-DB is a structural classification of helix-cappings or caps compiled from protein structures. Caps extracted from protein structures have been structurally classified based on geometry and conformation and organized in a tree-like hierarchical classification where the different levels correspond to different properties of the caps.

Database of Differentially Expressed Proteins in Human Cancer
The dbDEPC is a database of differentially expressed proteins in human cancers.

Database of Bacterial Exotoxins for Human
DBETH is the Database of Bacterial Exotoxins for Human. The aim of this database is to assemble information on the toxins responsible for causing bacterial pathogenesis in humans.

EcoliWiki: A Wiki-based community resource for Escherichia coli
Community-based resource for the annotation of all non-pathogenic E. coli, its phages, plasmids, and mobile genetic elements.

Eukaryotic Linear Motifs
This computational biology resource mainly focuses on annotation and detection of eukaryotic linear motifs (ELMs) by providing both a repository of annotated motif data and an exploratory tool for motif prediction. ELMs, or short linear motifs (SLiMs), are compact protein interaction sites composed of short stretches of adjacent amino acids.

A database of exosomes, membrane vesicles of endocytic origin released by diverse cell types.

Genetic, genomic and molecular information pertaining to the model organism Drosophila melanogaster and related sequences. This database also contains information relating to human disease models in Drosophila, the use of transgenic constructs containing sequence from other organisms in Drosophila, and information on where to buy Drosophila strains and constructs.

DRSC Functional Genomics Resources
DRSC Functional Genomics Resources (DRSC-FGR) began as the Drosophila RNAi Screening Center (DRSC), founded by Prof. Norbert Perrimon in 2003, and the Transgenic RNAi Project (TRiP), founded by Prof. Perrimon in 2008. DRSC-FGR has previously been known as …; it has since grown into a functional genomics platform that meets the needs of the Drosophila and broader research communities.

Networks of Functional Coupling of proteins
FunCoup is a framework to infer genome-wide functional couplings in 11 model organisms. Functional coupling, or functional association, is a non-specific form of association that encompasses direct physical interaction but also more general types of direct or indirect interaction, such as regulatory interaction or participation in the same process or pathway.

Fungal and Oomycete genomics resource
FungiDB is an integrated genomic and functional genomic database for the kingdom Fungi. The database integrates whole-genome sequence and annotation, as well as experimental and environmental isolate sequence data. It also offers comparative genomics, analysis of gene expression, supplemental bioinformatics analyses and a web interface for data mining.

Expression Atlas
The Expression Atlas is a free resource providing information on gene expression patterns under different biological conditions. Gene expression data is re-analysed in-house to detect genes showing interesting baseline and differential expression patterns, allowing a user to ask questions such as "what are the genes expressed in normal human liver" and "what genes are differentially expressed between water-stressed rice plants and controls with normal watering?". The resource also features a few proteomics data sets provided by collaborators for corroboration between gene- and protein-level expression results.

GeneDB is a genome database for prokaryotic and eukaryotic organisms and provides a portal through which data generated by the "Pathogen Genomics" group at the Wellcome Trust Sanger Institute and other collaborating sequencing centres can be accessed.

IMG/M: the integrated metagenome data management and comparative analysis system
Data management and analysis system for metagenomes

IntAct molecular interaction database
IntAct provides a freely available, open source database system and analysis tools for protein interaction data. All interactions are derived from literature curation or direct user submissions and are freely available.

Mechanism, Annotation and Classification in Enzymes
A database of enzyme reaction mechanisms, with their annotation and classification.

MetaCrop 2.0
The MetaCrop resource contains information on the major metabolic pathways in crops of agricultural and economic importance. The database includes manually curated information on reactions and the kinetic data associated with these reactions. Ontology terms and publication identifiers are provided to ease data mining.

Molecular INTeraction database
MINT focuses on experimentally verified protein-protein interactions mined from the scientific literature by expert curators. Since September 2013, MINT has used the IntAct database infrastructure to limit duplication of effort and to optimise future software development. Data maintenance and release, as well as the MINT PSICQUIC and IMEx services, are the responsibility of the IntAct team, while curation is carried out by both groups. Data manually curated by the MINT curators can also be accessed from the IntAct homepage at the EBI.

Integrated web resource of mitochondrial localisation evidence and phenotype data for mammals, zebrafish and yeasts.

modMine is an integrated web resource of data & tools to browse and search modENCODE data and experimental details, download results and access the GBrowse genome browser.

Network of Cancer Genes
The Network of Cancer Genes (NCG) is a manually curated repository of cancer genes derived from the scientific literature. NCG also provides information on the experimental validation that supports the role of these genes in cancer and annotates their properties (duplicability, evolutionary origin, expression profile, function and interactions with proteins and miRNAs).

A comprehensive repository for omics data from high-throughput experiments on the red spotted newt Notophthalmus viridescens. Newt-Omics aims to provide a comprehensive platform of genes expressed during tissue regeneration, including extensive annotations, expression data and experimentally verified peptide sequences with no known homology to other publicly available gene sequences. The goal is to obtain a detailed understanding of the molecular processes underlying tissue regeneration in the newt, which may lead to the development of approaches that efficiently stimulate regenerative pathways in mammals.

GENI-ACT is a resource that allows the research community to collaboratively annotate bacterial genomes. Changes can be suggested to existing genomes, and these alterations can be ported back to NCBI GenBank. GENI-ACT also has modules that can be used for educational purposes.

Pfam Protein Families
The Pfam database is a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs). Proteins are generally composed of one or more functional regions, commonly termed domains. Different combinations of domains give rise to the diverse range of proteins found in nature. The identification of domains that occur within proteins can therefore provide insights into their function. Pfam also generates higher-level groupings of related entries, known as clans. A clan is a collection of Pfam entries which are related by similarity of sequence, structure or profile-HMM.

Phenomics of yeast Mutants
PhenoM (Phenomics of yeast Mutants) stores, retrieves, visualises and data mines the quantitative single-cell measurements extracted from micrographs of temperature-sensitive mutant cells. PhenoM allows users to rapidly search and retrieve raw images and their quantified morphological data for genes of interest. The database also provides several data-mining tools, including a PhenoBlast module for phenotypic comparison between mutant strains and a Gene Ontology module for functional enrichment analysis of gene sets showing similar morphological alterations.

Protein Interaction Network Analysis
Protein Interaction Network Analysis (PINA) is an integrated platform for protein interaction network construction, filtering, analysis, visualization and management. It integrates protein-protein interaction data from six public curated databases and builds a complete, non-redundant protein interaction dataset for six model organisms.
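Building a non-redundant dataset from several sources mostly comes down to treating A-B and B-A as the same undirected interaction and merging duplicates while remembering where each one came from. The Python sketch below shows that idea under stated assumptions: it is not PINA's actual pipeline, the source names db1/db2 and protein identifiers P1 to P3 are hypothetical, and it assumes all identifiers have already been mapped to one shared namespace (for example UniProt accessions), which in practice is the harder step.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# An undirected protein-protein interaction, stored as an ordered pair.
Pair = Tuple[str, str]


def normalise(a: str, b: str) -> Pair:
    """Order the two partners so that A-B and B-A map to the same key."""
    return (a, b) if a <= b else (b, a)


def merge_sources(sources: Dict[str, Iterable[Pair]]) -> Dict[Pair, Set[str]]:
    """Merge interaction lists from several (hypothetical) source databases.

    Returns each unique interaction mapped to the set of sources reporting it.
    """
    merged: Dict[Pair, Set[str]] = defaultdict(set)
    for source_name, pairs in sources.items():
        for a, b in pairs:
            merged[normalise(a, b)].add(source_name)
    return dict(merged)
```

Keeping the set of supporting sources, rather than just deduplicating, lets a user later filter interactions by how many databases report them.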

Plant Natural Antisense Transcripts Database
Natural antisense transcripts (NATs), a class of regulatory RNAs, occur prevalently in plant genomes and play significant roles in physiological and/or pathological processes. PlantNATsDB (Plant Natural Antisense Transcripts DataBase) is a platform for annotating and discovering NATs by integrating various data sources. PlantNATsDB also provides an integrative, interactive and information-rich graphical web interface that displays multidimensional data, supporting the plant research community in the discovery of functional NATs.

Saccharomyces Genome Database
The Saccharomyces Genome Database (SGD) collects and organizes information about the molecular biology and genetics of the yeast Saccharomyces cerevisiae. SGD contains a variety of biological information and tools with which to search and analyze it.

The Arabidopsis Information Resource
The Arabidopsis Information Resource (TAIR) maintains a database of genetic and molecular biology data for the model higher plant Arabidopsis thaliana.

BacMap is a picture atlas of annotated bacterial genomes. It is an interactive visual database containing hundreds of fully labeled, zoomable, and searchable maps of bacterial genomes.

Precalculated structural assignments for whole genomes.

GWASdb comprises collections of trait/disease-associated SNPs (TASs) from current GWAS, together with their comprehensive functional annotations and disease classifications.

miRNEST is an integrative collection of animal, plant and virus microRNA data.

This site is the home page of the parasitic nematode EST project at The Genome Institute at Washington University in St. Louis. The site was established in 2000 as a component of the NIH-NIAID grant "A Genomic Approach to Parasites from the Phylum Nematoda". Although it started as a project site, over the years it has become a community resource dedicated to the study of parasitic nematodes.

neXtProt is a comprehensive human-centric discovery platform, offering its users a seamless integration of and navigation through protein-related data.

PomBase is a model organism database that provides organization of and access to scientific data for the fission yeast Schizosaccharomyces pombe. PomBase supports genomic sequence and features, genome-wide datasets and manual literature curation as well as providing structural and functional annotation and access to large-scale data sets.

Collaborative resource for the Bacillus community.

Drug-related information: medical indications, adverse drug effects, drug metabolism and Gene Ontology terms of the target proteins.

Termini-Oriented Protein Function INferred Database
TopFIND is a protein-centric database for the annotation of protein termini, currently in its third version. Non-canonical protein termini can result from several different biological processes, including pre-translational processes such as alternative splicing and alternative translation initiation, and post-translational protein processing by proteases that cleave proteins as part of protein maturation or as a regulatory modification. Accordingly, protein termini evidence in TopFIND is inferred from other databases such as Ensembl transcripts, TISdb for alternative translation initiation, MEROPS for protein cleavage by proteases, and UniProt for canonical and protein isoform start sites. Additionally, termini are annotated from user-submitted lists of termini and inferred from user-submitted lists of cleavage sites.
As a protein-centric database, TopFIND presents a web page for each protein isoform (organized around UniProt accession codes). These pages contain general protein information, such as organism, chromosome location and protein sequence, followed by position information such as specific termini evidence, known cleavage sites, sequence features and domains. In addition, TopFIND shows each protein in the context of the protease web, a network of proteases and their inhibitors in which a protease can cleave other proteases and their inhibitors, thus influencing their activity. All information in TopFIND can be filtered by a powerful filter engine that relies on rich annotation of the origin of each piece of data. TopFIND can also be queried programmatically using the PSICQUIC or XML API. Recently, software tools were developed to enable quick access to TopFIND data for lists of termini obtained by, for example, proteomic termini screens (terminomics).
TopFIND Explorer “TopFINDer” reports position specific protein information for protein termini, such as terminus evidences, prime and non-prime sequences, and protein domains affected by cleavage. TopFINDer further reports summary statistics for protein cleavage by known proteases. PathFINDer is a second tool that reports proteolytic paths from a query protease to identified protein substrates thus enabling the differentiation between direct and indirect protease substrates and yielding mechanistic insights into pathways based on existing information.
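Finding a proteolytic path from a protease to a substrate is a path search in a directed graph where an edge u → v means "u cleaves v". The Python sketch below uses breadth-first search to find the shortest such path; this is a stand-in technique chosen for illustration, not PathFINDer's documented algorithm, it ignores inhibitors and any edge weighting, and the protease and substrate names are hypothetical. A path of two nodes indicates a direct substrate, while a longer path suggests an indirect one.

```python
from collections import deque
from typing import Dict, List, Optional, Set


def proteolytic_path(web: Dict[str, Set[str]], protease: str,
                     substrate: str) -> Optional[List[str]]:
    """Breadth-first search for the shortest chain protease -> ... -> substrate.

    `web` maps each (hypothetical) protease to the set of proteins it cleaves.
    Returns the path as a list of nodes, or None if the substrate is unreachable.
    """
    previous: Dict[str, Optional[str]] = {protease: None}
    queue = deque([protease])
    while queue:
        node = queue.popleft()
        if node == substrate:
            # Walk back through the predecessor links to rebuild the path.
            path: List[str] = []
            current: Optional[str] = node
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for target in sorted(web.get(node, ())):  # sorted for deterministic output
            if target not in previous:
                previous[target] = node
                queue.append(target)
    return None


def is_direct(path: Optional[List[str]]) -> bool:
    """A substrate is direct if the query protease cleaves it in one step."""
    return path is not None and len(path) == 2
```

For example, with a web in which proteaseA cleaves proteaseB and substrate1, and proteaseB cleaves substrate2, substrate1 is reported as a direct substrate of proteaseA and substrate2 as an indirect one reached via proteaseB.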

TTD, Therapeutic Target Database
The Therapeutic Target Database provides information about therapeutic protein and nucleic acid targets, the targeted diseases, pathway information and the corresponding drugs directed at each of these targets. The database also links to relevant databases containing information about target function, sequence, 3D structure, ligand binding properties, enzyme nomenclature, drug structure, therapeutic class and clinical development status. All information is fully referenced.

The UCSC Archaeal Genome Browser
The UCSC Archaeal Genome Browser is a window on the biology of more than 100 microbial species from the domain Archaea. Basic gene annotation is derived from NCBI GenBank/RefSeq entries, with overlays of sequence conservation across multiple species, nucleotide and protein motifs, non-coding RNA predictions, operon predictions, and other types of bioinformatic analyses. In addition, we display available gene expression data (microarray or high-throughput RNA sequencing). Direct contributions or notices of publication of functional genomic data or bioinformatic analyses from archaeal research labs are very welcome.

Virus Pathogen Database and Analysis Resource
The Virus Pathogen Database and Analysis Resource (ViPR) is an integrated repository of data and analysis tools for multiple virus families, supported by the National Institute of Allergy and Infectious Diseases (NIAID) Bioinformatics Resource Centers (BRC) program. ViPR captures various types of information, including sequence records, gene and protein annotations, 3D protein structures, immune epitope locations, clinical and surveillance metadata and novel data derived from comparative genomics analysis. The database is available without charge as a service to the virology research community to help facilitate the development of diagnostics, prophylactics and therapeutics for priority pathogens and other viruses.

WikiPathways is an open, collaborative platform dedicated to the curation of biological pathways. WikiPathways was established to facilitate the contribution and maintenance of pathway information by the biology community.

WormBase is an international consortium of biologists and computer scientists dedicated to providing the research community with accurate, current, accessible information concerning the genetics, genomics and biology of C. elegans and related nematodes.

The Yeast Metabolome Database (YMDB) is a manually curated database of small molecule metabolites found in or produced by Saccharomyces cerevisiae (also known as Baker’s yeast and Brewer’s yeast). This database covers metabolites described in textbooks, scientific journals, metabolic reconstructions and other electronic databases.

A comprehensive online knowledgebase for the monkey research community.

Human Ageing Genomic Resources
The Human Ageing Genomic Resources (HAGR) is a collection of databases and tools for the biology and genetics of ageing. HAGR features several databases with high-quality, manually-curated data: 1) GenAge, a database of genes associated with ageing in humans and model organisms; 2) AnAge, an extensive collection of longevity records and complementary traits for over 4,000 vertebrate species; and 3) GenDR, a database containing both gene mutations that interfere with dietary restriction-mediated lifespan extension and consistent gene expression changes induced by dietary restriction.

Comparative Toxicogenomics Database
The Comparative Toxicogenomics Database (CTD) advances understanding of the effects of environmental chemicals on human health. Biocurators manually curate chemical-gene, chemical-disease, and gene-disease relationships from the scientific literature. These core data are then internally integrated to generate inferred chemical-gene-disease networks. Additionally, the core data are integrated with external data sets (such as Gene Ontology and pathway annotations) to predict many novel associations between different data types. A distinctive feature of CTD is the set of inferred relationships generated by data integration, which help turn knowledge into discoveries by identifying novel connections between chemicals, genes, diseases, pathways, and GO annotations that might not otherwise be apparent from other biological resources.
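The inference step can be pictured as a join: if a chemical is curated as interacting with a gene, and that gene is curated as associated with a disease, the chemical and disease become linked through the shared gene. The Python sketch below shows only that join; it is a simplified illustration, not CTD's actual inference rules or scoring, and the chemical, gene and disease names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def infer_chemical_disease(
    chem_gene: Iterable[Tuple[str, str]],
    gene_disease: Iterable[Tuple[str, str]],
) -> Dict[Tuple[str, str], Set[str]]:
    """Link each chemical to diseases via genes curated in both relationships.

    Returns {(chemical, disease): {bridging genes}}.
    """
    # Index the curated gene-disease associations by gene.
    diseases_by_gene: Dict[str, Set[str]] = defaultdict(set)
    for gene, disease in gene_disease:
        diseases_by_gene[gene].add(disease)

    # For every chemical-gene interaction, follow the gene to its diseases.
    inferred: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for chemical, gene in chem_gene:
        for disease in diseases_by_gene.get(gene, ()):
            inferred[(chemical, disease)].add(gene)
    return dict(inferred)
```

Recording the bridging genes keeps each inferred link traceable back to the curated statements that produced it; a pair supported by several genes is a natural candidate for closer inspection.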

mycoCLAP is a searchable resource for the knowledge and annotation of Characterized Lignocellulose-Active Proteins of fungal origin.

wiki-pain is a wiki containing molecular interactions that are relevant to pain. Each molecular interaction is shown in relation to pain, disease, mutations, anatomy and a summary of its mentions throughout the literature is provided.

CellFinder maps validated gene and protein expression, phenotype and images related to cell types. The data allow characterization and comparison of cell types and can be browsed by using the body browser and by searching for cells or genes. All cells are related to more complex systems such as tissues, organs and organisms and arranged according to their position in development. CellFinder provides long-term data storage for validated and curated primary research data and provides additional expert validation through relevant information extracted from text.

This resource implements the SNP discovery software autoSNP within a relational database to enable efficient mining of the identified polymorphisms and detailed interrogation of the data. AutoSNP was selected because it does not require sequence trace files and is thus applicable to a broader range of species and datasets.

AgBase is a curated, open-source, Web-accessible resource for functional analysis of agricultural plant and animal gene products.

Arabidopsis Thaliana Trans-factor and cis-Element prediction Database
ATTED-II is a coexpression database for plant species with parallel views of multiple coexpression data sets and network analysis tools. The user can find functional gene relationships and design experiments to identify gene functions by reverse genetics and general molecular biology techniques.

Mammalian Protein Localization Database
LOCATE is a curated database that houses data describing the membrane organization and subcellular localization of proteins from the RIKEN FANTOM4 mouse and human protein sequence set.

Pathogen Host Interactions
PHI-base contains expertly curated molecular and biological information on genes proven to affect the outcome of pathogen-host interactions. PHI-base catalogues experimentally verified pathogenicity, virulence and effector genes from fungal, Oomycete and bacterial pathogens, which infect animal, plant, fungal and insect hosts.

Chicken Variation Database
The chicken Variation Database (ChickVD) is an integrated information system for storage, retrieval, visualization and analysis of chicken variation data.

Berkeley Drosophila Genome Project EST database
The goals of the Drosophila Genome Center are to finish the sequence of the euchromatic genome of Drosophila melanogaster to high quality and to generate and maintain biological annotations of this sequence.

Agile Protein Interactomes Dataserver
Agile Protein Interactomes Dataserver (APID) is an interactive bioinformatics web tool developed to allow exploration and analysis of the main currently known information about protein-protein interactions, integrated and unified in a common comparative platform. The new, fully redesigned server holds a comprehensive collection of protein interactomes for more than 400 organisms, built by integrating only experimentally validated protein–protein physical interactions. This reincarnation supersedes the Agile Protein Interaction DataAnalyzer.

Database of Rice Transcription Factors
DRTF contains 2025 putative transcription factors (TFs) in Oryza sativa L. ssp. indica and 2384 in ssp. japonica, distributed in 63 families, identified by computational prediction and manual curation. It includes detailed annotations of each TF including sequence features, functional domains, Gene Ontology assignment, chromosomal localization, EST and microarray expression information, as well as multiple sequence alignment of the DNA-binding domains for each TF family.

Plant Transcription Factor Database
Plant Transcription Factor Database (PlantTFDB) provides a comprehensive, high-quality resource of plant transcription factors (TFs), regulatory elements and interactions between them. Its latest version contains 320,370 TFs, classified into 58 families, from 165 species. Abundant functional and evolutionary annotations (e.g., GO, functional descriptions, binding motifs, cis-elements, regulation, references, orthologous groups and phylogenetic trees) are provided for the identified TFs. In addition, multiple online tools are available for TF identification, regulation prediction and functional enrichment analyses.

Candida Genome Database
The Candida Genome Database (CGD) provides access to genomic sequence data and manually curated functional information about genes and proteins of the human pathogen Candida albicans. It collects gene names and aliases, and assigns gene ontology terms to describe the molecular function, biological process, and subcellular localization of gene products.

GreenPhylDB: A phylogenomic database for plant comparative genomics
GreenPhylDB comprises 37 full genomes from the major phyla of plant evolution. These genomes were clustered to define a consistent and extensive set of homeomorphic plant families.

OryGenesDB: an interactive tool for rice reverse genetics
The aim of this Oryza sativa database was first to display sequence information such as the T-DNA and Ds flanking sequence tags (FSTs) produced in the framework of the French genomics initiative Genoplante and the EU consortium Cereal Gene Tags. This information was later linked with related molecular data from external rice molecular resources (cDNA full length, Gene, EST, Markers, Expression data...).

TropGENE DB is a database that manages genetic and genomic information about tropical crops studied by Cirad. The database is organised into crop specific modules.

The Diatom EST Database
Diatoms are photosynthetic unicellular eukaryotes that play an essential role in marine ecosystems. On a global scale, they generate around one fifth of the oxygen we breathe. On this web site we present searchable databases of diatom ESTs (expressed sequence tags) that can be used to explore diatom biology.

Gramene, a comparative mapping resource for grains
Gramene's purpose is to provide added value to data sets available within the public sector, which will facilitate researchers' ability to understand the grass genomes and take advantage of genomic sequence known in one species for identifying and understanding corresponding genes, pathways and phenotypes in other grass species.

Plant Genome Network
PGN is a repository for plant EST sequence data located at Cornell. It comprises an analysis pipeline and a website, and presently contains mainly data from the Floral Genome Project.

Tomato Functional Genomics Database
The Tomato Functional Genomics Database integrates several prior databases, including the Tomato Expression Database, the Tomato Metabolite Database and the Tomato Small RNA Database.

Protein Data Bank in Europe
The Protein Data Bank in Europe is a founding member of the worldwide Protein Data Bank which collects, organises and disseminates data on biological macromolecular structures.

Reactome - a curated knowledgebase of biological pathways
The Reactome project is a collaboration to develop a curated resource of core pathways and reactions in human biology.

MycoBrowser leprae
Mycobrowser is a resource that provides both in silico generated and manually reviewed information within databases dedicated to the complete genomes of Mycobacterium tuberculosis, Mycobacterium leprae, Mycobacterium marinum and Mycobacterium smegmatis. This collection references Mycobacterium leprae information.

MycoBrowser marinum
Mycobrowser is a resource that provides both in silico generated and manually reviewed information within databases dedicated to the complete genomes of Mycobacterium tuberculosis, Mycobacterium leprae, Mycobacterium marinum and Mycobacterium smegmatis. This collection references Mycobacterium marinum information.

MycoBrowser smegmatis
Mycobrowser is a resource that provides both in silico generated and manually reviewed information within databases dedicated to the complete genomes of Mycobacterium tuberculosis, Mycobacterium leprae, Mycobacterium marinum and Mycobacterium smegmatis. This collection references Mycobacterium smegmatis information.

Simple Modular Architecture Research Tool
SMART (Simple Modular Architecture Research Tool) is a web resource providing simple identification and extensive annotation of protein domains and the exploration of protein domain architectures.

The Gene Index Project
The goal of The Gene Index Project is to use the available EST and gene sequences, along with the reference genomes wherever available, to provide an inventory of likely genes and their variants and to annotate these with information regarding the functional roles played by these genes and their products.

MatrixDB: Extracellular Matrix interactions database
MatrixDB is a database reporting mammalian protein-protein and protein-carbohydrate interactions involving extracellular molecules. Interactions with lipids and cations are also reported. Full-length molecules, fragments and multimers present in the extracellular matrix are all included in the database.

Eukaryotic Genes
euGenes provides a common summary of gene and genomic information from eukaryotic organism databases, including gene symbol and full name, chromosome, genetic and molecular map information, Gene Ontology (Function/Location/Process), gene homology and product information.

Aphid Genomics Database
The Aphid Genomics Database aims to improve the current pea aphid genome assembly and annotation, and to provide new aphid genome sequences as well as tools for analysing these genomes.

Human Protein Reference Database
The Human Protein Reference Database represents a centralized platform to visually depict and integrate information pertaining to domain architecture, post-translational modifications, interaction networks and disease association for each protein in the human proteome.

Integrated Microbial Genomes
The integrated microbial genomes (IMG) system is a data management, analysis and annotation platform for all publicly available genomes. IMG contains both draft and complete JGI (DoE Joint Genome Institute) microbial genomes integrated with all other publicly available genomes from all three domains of life, together with a large number of plasmids and viruses.

Yeast Search for Transcriptional Regulators And Consensus Tracking
YEASTRACT (Yeast Search for Transcriptional Regulators And Consensus Tracking) is a curated repository of more than 48333 regulatory associations between transcription factors (TF) and target genes in Saccharomyces cerevisiae, based on more than 1200 bibliographic references.

Maize Genetics and Genomics Database
MaizeGDB is the maize research community's central repository for genetics and genomics information.

The Global Proteome Machine Database
Rather than being a complete record of a proteomics experiment, this database holds the minimum amount of information necessary for certain bioinformatics-related tasks, such as sequence assignment validation. Most of the data is held in a set of XML files.

Rat Genome Database
The Rat Genome Database is the premier site for genetic, genomic, phenotype, and disease data generated from rat research. It provides easy access to corresponding human and mouse data for cross-species comparison and its comprehensive data and innovative software tools make it a valuable resource for researchers worldwide.

SUPERFAMILY is a database of structural and functional annotation for all proteins and genomes.

Pathway Commons
Pathway Commons is a convenient point of access to biological pathway information collected from public pathway databases, which can be readily searched, visualized, and downloaded. The data are freely available under the license terms of each contributing database.

Mouse Genome Database - a Mouse Genome Informatics (MGI) Resource
MGI is the international database resource for the laboratory mouse, providing integrated genetic, genomic, and biological data to facilitate the study of human health and disease. Data includes gene characterization, nomenclature, mapping, gene homologies among mammals, sequence links, phenotypes, allelic variants and mutants, and strain data.

Pathway Interaction Database
The Pathway Interaction Database is a highly-structured, curated collection of information about known biomolecular interactions and key cellular processes assembled into signaling pathways.

The NCBI BioSystems database centralizes and cross-links existing biological systems databases, increasing their utility and target audience by integrating their pathways and systems into NCBI resources. The resource provides categorical information on genes, proteins and small molecules of biosystems.

NCBI Gene provides information for genes from a wide range of species. A record may include nomenclature, Reference Sequences (RefSeqs), maps, pathways, variations, phenotypes, and links to genome-, phenotype-, and locus-specific resources worldwide.

PubChem is organized as three linked databases within the NCBI's Entrez information retrieval system: PubChem Substance, PubChem Compound, and PubChem BioAssay. PubChem also provides a fast chemical structure similarity search tool. More information about using each component database can be found via the links on the homepage.

Rice Genome Annotation Project
This website provides the genome sequence of the Nipponbare cultivar of rice (Oryza sativa ssp. japonica) and annotation of the 12 rice chromosomes. These data are available through search pages and a Genome Browser that provides an integrated display of annotation data.

The Cardiovascular Research Grid
The CardioVascular Research Grid (CVRG) project is creating an infrastructure for sharing cardiovascular data and data analysis tools. CVRG tools are developed using the Software as a Service model, allowing users to access tools through their browser, thus eliminating the need to install and maintain complex software.

Transporter Classification Database
The database details a comprehensive, IUBMB-approved classification system for membrane transport proteins known as the Transporter Classification (TC) system. The TC system is analogous to the Enzyme Commission (EC) system for classification of enzymes, except that it incorporates both functional and phylogenetic information. Descriptions, TC numbers, and examples of over 600 families of transport proteins are provided. Transport systems are classified on the basis of five criteria, and each criterion corresponds to one of the five numbers or letters within the TC# for a particular type of transporter.
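
The five-part TC# can be illustrated with a small parser. This is a minimal sketch, assuming the V.W.X.Y.Z layout used by TCDB (a class number, a subclass letter, then family, subfamily and transport-system numbers); the names `parse_tc` and `TCNumber` are hypothetical and not part of any TCDB tool or API.

```python
import re
from typing import NamedTuple

# Assumed TC# layout: <class number>.<subclass letter>.<family>.<subfamily>.<system>
TC_PATTERN = re.compile(r"^(\d+)\.([A-Z])\.(\d+)\.(\d+)\.(\d+)$")


class TCNumber(NamedTuple):
    tc_class: int   # top-level transporter class
    subclass: str   # a single capital letter
    family: int
    subfamily: int
    system: int     # the specific transporter / transport system


def parse_tc(tc: str) -> TCNumber:
    """Split a five-part TC number into its components; raise ValueError if malformed."""
    match = TC_PATTERN.match(tc.strip())
    if match is None:
        raise ValueError(f"not a five-part TC number: {tc!r}")
    c, sub, fam, subfam, system = match.groups()
    return TCNumber(int(c), sub, int(fam), int(subfam), int(system))


print(parse_tc("2.A.1.1.1"))
# TCNumber(tc_class=2, subclass='A', family=1, subfamily=1, system=1)
```

A single regular expression validates and splits the identifier in one step, so a malformed TC# raises `ValueError` instead of yielding partial fields.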

The Oryzabase is a comprehensive rice science database established in 2000 by a committee of rice researchers in Japan. It consists of five parts: (1) genetic resource stock information, (2) gene dictionary, (3) chromosome maps, (4) mutant images, and (5) fundamental knowledge of rice science.

Protein Data Bank Japan
The Protein Data Bank is the single worldwide archive of structural data of biological macromolecules.

TargetTrack, a target registration database, provides information on the experimental progress and status of targets selected for structure determination.

3D interacting domains
The database of 3D Interaction Domains (3did) is a collection of domain-domain interactions in proteins for which high-resolution three-dimensional structures are known. 3did exploits structural information to provide critical molecular details necessary for understanding how interactions occur.

Pseudomonas Genome DB
The Pseudomonas Genome Database is a resource for peer-reviewed, continually updated annotation for all Pseudomonas species. It includes gene and protein sequence information, as well as information on regulation, predicted function and annotation.

InnateDB has been developed to facilitate systems-level investigations of the mammalian (human, mouse and bovine) innate immune response. Its goal is to provide a manually curated knowledgebase of the genes, proteins, and particularly the interactions and signaling responses involved in mammalian innate immunity. InnateDB incorporates information on the whole human, mouse and bovine interactomes by integrating interaction and pathway information from several of the major publicly available databases, but aims to improve coverage of the innate immunity interactome through manual curation.

Protein ANalysis THrough Evolutionary Relationships: Classification of Genes and Proteins
The PANTHER (Protein ANalysis THrough Evolutionary Relationships) Classification System is a unique resource that classifies genes by their functions, using published scientific experimental evidence and evolutionary relationships to predict function even in the absence of direct experimental evidence.

Stanford Microarray Database
The Stanford Microarray Database is a repository of microarray-based gene expression and comparative genomics data. This resource is no longer maintained; please use the public repositories NCBI Gene Expression Omnibus or EBI ArrayExpress.

HAMAP database of microbial protein families
HAMAP is a system, based on manual protein annotation, that identifies and semi-automatically annotates proteins that are part of well-conserved families or subfamilies: the HAMAP families. HAMAP is based on manually created family rules and is applied to bacterial, archaeal and plastid-encoded proteins.

UniPathway is a manually curated resource of metabolic pathways for the UniProtKB/Swiss-Prot knowledgebase. It provides a structured controlled vocabulary to describe the role of a protein in a metabolic pathway.

UniProt Knowledgebase
The UniProt Knowledgebase (UniProtKB) is the central hub for the collection of functional information on proteins, with accurate, consistent and rich annotation. In addition to capturing the core data mandatory for each UniProtKB entry (mainly, the amino acid sequence, protein name or description, taxonomic data and citation information), as much annotation information as possible is added. This includes widely accepted biological ontologies, classifications and cross-references, and clear indications of the quality of annotation in the form of evidence attribution of experimental and computational data. The UniProt Knowledgebase consists of two sections: a reviewed section containing manually-annotated records with information extracted from literature and curator-evaluated computational analysis (aka "UniProtKB/Swiss-Prot"), and an unreviewed section with computationally analyzed records that await full manual annotation (aka "UniProtKB/TrEMBL").

A CLAssification of Mobile genetic Elements
ACLAME is a database dedicated to the collection and classification of mobile genetic elements (MGEs) from various sources, comprising all known phage genomes, plasmids and transposons.

CATH Protein Structure Classification
The CATH database is a hierarchical domain classification of protein structures in the Protein Data Bank. Protein structures are classified using a combination of automated and manual procedures. There are four major levels in this hierarchy: Class (secondary structure classification, e.g. mostly alpha), Architecture (classification based on overall shape), Topology (fold family) and Homologous superfamily (protein domains which are thought to share a common ancestor). This collection is concerned with superfamily classification.
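
Because the four CATH levels nest, a superfamily identifier can be read as a lineage from Class down to Homologous superfamily. The sketch below assumes CATH's dotted numeric codes with one field per level (for example 3.40.50.300); the `cath_lineage` helper is hypothetical and only splits strings, it does not query CATH.

```python
from typing import List

# The four CATH levels named in the description, from broadest to narrowest.
LEVELS = ("Class", "Architecture", "Topology", "Homologous superfamily")


def cath_lineage(code: str) -> List[str]:
    """Return the node ID at each CATH level for a dotted superfamily code."""
    parts = code.split(".")
    if len(parts) != len(LEVELS) or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected a four-part numeric CATH code, got {code!r}")
    # Each level's ID is the prefix of the code down to (and including) that level.
    return [".".join(parts[: i + 1]) for i in range(len(parts))]


for level, node in zip(LEVELS, cath_lineage("3.40.50.300")):
    print(f"{level}: {node}")
```

Treating each level as a prefix of the full code makes grouping by Architecture or Topology a matter of comparing prefixes.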

The Human Metabolome Database
The Human Metabolome Database (HMDB) is a database containing detailed information about small molecule metabolites found in the human body. It contains or links (1) chemical, (2) clinical and (3) molecular biology/biochemistry data.

Xenopus laevis and tropicalis biology and genomics resource
Xenbase is the model organism database for Xenopus laevis and X. (Silurana) tropicalis. It contains genomic and developmental data and community information for Xenopus research. It includes gene expression patterns that incorporate image data from the literature, large-scale screens and community submissions.

Knockout Mouse Project
Knockout mutant mouse strains. The KOMP repository is a resource of mouse embryonic stem (ES) cells containing a null mutation in every gene in the mouse genome.

The Zebrafish Model Organism Database
The Zebrafish Model Organism Database, ZFIN, serves as the primary community database resource for the laboratory use of zebrafish. We develop and support integrated zebrafish genetic, genomic, developmental and physiological information and link this information extensively to corresponding data in other model organism and human databases.

Compendium of Protein Lysine Modifications
CPLM (Compendium of Protein Lysine Modifications) is an online data resource specifically designed for protein lysine modifications (PLMs).

A Systematic Annotation Package
ASAP is a relational database and web interface developed to store, update and distribute genome sequence data and gene expression data. It was designed to facilitate ongoing community annotation of genomes and to grow with genome projects as they move from the preliminary data stage through post-sequencing functional analysis.

Genome Database for Rosaceae
The Genome Database for Rosaceae (GDR) is a curated and integrated web-based relational database providing centralized access to Rosaceae genomics and genetics data and analysis tools to facilitate cross-species utilization of data.

Yeast Resource Center Public Data Repository
The National Center for Research Resources' Yeast Resource Center is located at the University of Washington in Seattle, Washington. The mission of the center is to facilitate the identification and characterization of protein complexes in the yeast Saccharomyces cerevisiae.

EBI Metagenomics
"EBI Metagenomics" is a free-to-use resource that aims to support all metagenomics researchers. The service is an automated pipeline for the analysis and archiving of metagenomic data that aims to provide insights into the phylogenetic diversity as well as the functional and metabolic potential of a sample. You can freely browse all the public data in the repository.

CentrosomeDB is a collection of human and Drosophila centrosomal genes that were reported in the literature and other sources. The database offers the possibility to study the evolution, function, and structure of the centrosome. Its curators have compiled information from many sources, including Gene Ontology, disease associations, single nucleotide polymorphisms, and associated gene expression experiments.

Manually Curated Database of Rice Proteins
‘Manually Curated Database of Rice Proteins’ (MCDRP) is a unique manually curated database based on published experimental data. Currently, the database has data for over 1800 rice proteins curated from more than 4000 different experiments in over 400 research articles. Since every aspect of an experiment, such as gene name, plant type, tissue and developmental stage, has been digitized, experimental data can be rapidly accessed and integrated.

The MOuse NOnCode Lung database
MONOCLdb is an integrative and interactive database designed to retrieve and visualize annotations and expression profiles of long non-coding RNAs (lncRNAs) expressed in Collaborative Cross founder mice in response to respiratory influenza and SARS infections.

FlyMine is an integrated database of genomic, expression and protein data for Drosophila, Anopheles and C. elegans. Integrating data makes it possible to run sophisticated data mining queries that span domains of biological knowledge.

MouseMine @ MGI
A database of integrated mouse data from MGI, powered by InterMine. MouseMine is a member of InterMOD, a consortium of model organism databases dedicated to making cross-species data analysis easier through ongoing coordination and collaborative system development.

Compartmentalized Protein-Protein Interaction
The compartmentalized protein-protein interaction database (ComPPI) provides qualitative information on the interactions, proteins and their localizations integrated from multiple databases for protein-protein interaction network analysis.

The Immunology Database and Analysis Portal - OpenImmPort
The ImmPort system serves as a long-term, sustainable archive of immunology research data generated by investigators mainly funded through the NIAID/DAIT. The core component of the ImmPort system is an extensive data warehouse that integrates experimental and clinical trial data. The analytical tools created and integrated as part of the ImmPort system are available to any researcher within ImmPort after registration and approval by DAIT. Additionally, the data provided in ImmPort, mainly by NIAID/DAIT-funded researchers, will be available to all registered users after the appropriate embargo period. ImmPort is the data submission portal where researchers upload, QC, and curate their data prior to sharing it in OpenImmPort.

Colorectal Cancer Atlas
Colorectal Cancer Atlas is a web-based resource that integrates genomic and proteomic data pertaining to colorectal cancer cell lines and tissues. Data catalogued include quantitative and non-quantitative protein expression, sequence variations, cellular signaling pathways, protein-protein interactions, Gene Ontology terms, protein domains and post-translational modifications (PTMs).

The GeneWeaver data and analytics website is a publicly available resource for storing, curating and analyzing sets of genes from heterogeneous data sources. The system enables discovery of relationships among genes, variants, traits, drugs, environments, anatomical structures and diseases that are implicit in gene set intersections. By enumerating the common and distinct biological molecules associated with all subsets of curated or user-submitted groups of gene sets and gene networks, GeneWeaver enables users to construct data-driven descriptions of shared and unique biological processes, diseases and traits within and across species.

SwissLipids is an expert-curated resource that provides a framework for the integration of lipid and lipidomic data with biological knowledge and models. SwissLipids is updated daily.

Ascidian Network for In Situ Expression and Embryological Data
Aniseed is a database designed to offer a representation of ascidian embryonic development at the level of the genome (cis-regulatory sequences, spatial gene expression, protein annotation), of the cell (cell shapes, fate, lineage) or of the whole embryo (anatomy, morphogenesis).

MitoCheck aims to integrate information on the cellular function of human genes while giving access to supporting information such as microscopy images of phenotypes.

Image Data Repository
IDR is a prototype platform for publishing, mining and integrating bioimaging data at scale, following the Euro-BioImaging/ELIXIR imaging strategy using the OMERO and Bio-Formats open source software built by the Open Microscopy Environment. Deployed on an OpenStack cloud running on the EMBL-EBI’s Embassy resource, it includes image data linked to independent studies from genetic, RNAi, chemical, localisation and geographic high content screens, super-resolution microscopy, and digital pathology.

Banana Genome Hub
The Banana Genome Hub centralises databases of genetic and genomic data for the Musa acuminata crop, and is the official portal for the Musa genome resources.

REGULATOR is a metazoan transcription factor (TF) and maternal factor resource, specifically designed for developmental biology studies. Maternal factors are expressed in unfertilized eggs, and their levels gradually decrease over time. To reduce the search space of developmentally important genes, only genes specifically expressed at egg stages, with an expression value at least 8 times the mean value of the others, were considered. The current database covers about 77 metazoan species. TFs were identified based on the statistical similarity of protein features.
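
The egg-stage selection rule described above can be sketched as a simple filter. This assumes a hypothetical per-gene table of expression values keyed by stage name and reads "the others" as all non-egg stages; REGULATOR's actual normalization and sample grouping are not described here, so the function name, stage names and toy values are illustrative only.

```python
from statistics import mean
from typing import Dict, List

FOLD_THRESHOLD = 8.0  # the ">= 8 times the mean of the others" cutoff quoted above


def egg_specific_genes(
    expression: Dict[str, Dict[str, float]], egg_stage: str
) -> List[str]:
    """Return genes whose egg-stage value is at least FOLD_THRESHOLD x the mean of all other stages."""
    selected = []
    for gene, by_stage in expression.items():
        others = [v for stage, v in by_stage.items() if stage != egg_stage]
        if egg_stage not in by_stage or not others:
            continue  # cannot compare without an egg value and at least one other stage
        if by_stage[egg_stage] >= FOLD_THRESHOLD * mean(others):
            selected.append(gene)
    return sorted(selected)


# Hypothetical toy values, not data from REGULATOR.
toy = {
    "geneA": {"egg": 90.0, "blastula": 5.0, "gastrula": 10.0},  # 90 >= 8 * 7.5 = 60
    "geneB": {"egg": 40.0, "blastula": 6.0, "gastrula": 8.0},   # 40 < 8 * 7.0 = 56
}
print(egg_specific_genes(toy, "egg"))  # ['geneA']
```

Comparing against the mean of the remaining stages, rather than against each stage individually, lets a gene pass even if one later stage shows moderate expression.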

Orthologous MAtrix
The OMA (“Orthologous MAtrix”) project is a method and database for the inference of orthologs among complete genomes. The distinctive features of OMA are its broad scope and size, high quality of inferences, feature-rich web interface, availability of data in a wide range of formats and interfaces, and frequent update schedule of two releases per year.

Ensembl Bacteria
Over 30,000 genome sequences from bacteria and archaea have been annotated and deposited in the public archives of the members of the International Nucleotide Sequence Database Collaboration. This site provides access to complete, annotated genomes from bacteria and archaea (present in the European Nucleotide Archive) through the Ensembl graphical user interface (genome browser).

Ensembl Protists
From release 27 onwards, all protist genomes whose sequence and annotation have been completed and submitted to the International Nucleotide Sequence Database Collaboration (i.e. the ENA, GenBank and DDBJ databases) are available in Ensembl Protists. The release now consists of a total of over 150 genomes, of which over 100 have been taken directly from the INSDC archives and the remainder from other sources. The new genomes have been functionally annotated with InterPro entries and GO terms using InterPro v53.

Ensembl Plants
A new genome assembly of Triticum aestivum cv. Chinese Spring is now available in Ensembl Plants. The assembly (TGACv1) and its accompanying annotation were produced by the Earlham Institute, formerly The Genome Analysis Centre (TGAC), as part of the Triticeae Genomics for Sustainable Agriculture project.

Ensembl Fungi
From release 28 onwards, all fungal genomes whose sequence and annotation have been completed and submitted to the International Nucleotide Sequence Database Collaboration (i.e. the ENA, GenBank and DDBJ databases) are available in Ensembl Fungi. The release now consists of a total of 589 genomes, of which 536 have been taken from the archives and 53 directly from other sources.

Ensembl Metazoa
This site provides access to complete, annotated genomes from metazoa through the Ensembl graphical user interface (genome browser).

