Note from FAN.  The CTD database began in 2004. As a matter of interest, as of this date (2-18-19):

Search for “sodium fluoride” = 75 results: 8 under “chemicals” and 67 under “references”.

Search for “fluoride” = 406 results: 148 under “chemicals”, 246 under “references”, 2 under “diseases”, 1 under “genes”, and 9 under “GO Terms”.

Overview

CTD [Comparative Toxicogenomics Database] is a robust, publicly available database that aims to advance understanding about how environmental exposures affect human health. It provides manually curated information about chemical–gene/protein interactions, chemical–disease and gene–disease relationships. These data are integrated with functional and pathway data to aid in development of hypotheses about the mechanisms underlying environmentally influenced diseases.

We also have additional ongoing projects involving manual curation of exposome data and chemical–phenotype relationships to help identify pre–disease biomarkers resulting from environmental exposures.

The initial release of CTD was on November 12, 2004. We’re grateful for our strong community support and encourage you to give us feedback so we can continue to evolve with your research needs.

Support

This program is supported by funds from the National Institute of Environmental Health Sciences (NIEHS):

  • R01ES014065, Comparative Toxicogenomics Database
  • R01ES019604, Generation of a centralized and integrated resource for exposure data
  • R01ES023788, Advancing mechanism–based studies with cross–species chemical–phenotype data
  • R01ES019604-04S1, Integrating Big Data and curated literature to advance discoveries about disease

We’re also proud to be part of the NIEHS Environmental Health Science Center at NC State, the Center for Human Health and the Environment (P30ES025128).

Data Categories

Chemicals

CTD integrates a chemical subset of the Medical Subject Headings (MeSH®), the hierarchical vocabulary from the U.S. National Library of Medicine. You can view diverse information about chemicals, including chemical structures, curated interacting genes and proteins, curated and inferred disease relationships, and enriched pathways and functional annotations. You can browse chemicals, or use them to formulate gene, chemical–gene interaction, or reference queries.

Diseases

CTD’s MEDIC disease vocabulary is a modified subset of descriptors from the “Diseases” category of the U.S. National Library of Medicine (NLM) Medical Subject Headings (MeSH®), combined with genetic disorders from the Online Mendelian Inheritance in Man® (OMIM®) database. CTD biocurators mapped OMIM diseases to terms within the hierarchical MeSH disease vocabulary to expand our disease representation. This combined vocabulary is used to curate gene–disease and chemical–disease associations. You can browse diseases, or use them to formulate gene or reference queries.

Genes

The CTD cross-species gene vocabulary (symbols, names, and synonyms) is derived from the Gene database at the National Center for Biotechnology Information (NCBI), a division of the U.S. National Library of Medicine. You can view diverse information about genes, including curated interacting chemicals, curated and inferred disease relationships, and associated pathways and functional annotations. You can browse genes, or access them using the Keyword search or by formulating advanced queries.

Phenotypes

At CTD, we distinguish between diseases and phenotypes, wherein a phenotype refers to a non-disease-term biological event: e.g., abnormal cell cycle arrest (phenotype) vs. lung cancer (disease), increased fat cell differentiation (phenotype) vs. obesity (disease), decreased spermatogenesis (phenotype) vs. male infertility (disease), etc. CTD uses the Gene Ontology (GO) as a source of phenotype vocabulary terms for biological outcomes. All GO terms have comprehensive definitions and stable accession identifiers, the latter of which allows GO annotations to act as a nexus to connect, integrate, and harmonize knowledge from domains curated across a variety of databases. You can browse Gene Ontology terms directly, or access them through the Keyword search, or you can perform Chemical-Phenotype Interaction queries.

Chemical–Gene/Protein Interactions

To improve understanding about the mechanisms of chemical actions, we manually curate chemical–gene and –protein interactions in vertebrates and invertebrates from the published literature. These interactions are both direct (e.g., “chemical binds to protein”) and indirect (e.g., “chemical results in increased phosphorylation of a protein” via intermediate events).
We curate interactions using a hierarchical interaction-type vocabulary that characterizes common physical, regulatory, and biochemical interactions between chemicals and genes or proteins. This vocabulary comprises 70 terms including actions (e.g., “binds to”, “imports”), operators that describe the degree of a chemical’s effect (e.g., “increases”), and qualifiers that specify the form of the gene or chemical involved in an interaction (e.g., “protein” or “chemical metabolite,” respectively).
You can search chemical–gene interactions directly via the chemical–gene interaction query, or access them via a gene, chemical, disease, or reference.

Gene–Disease Associations

[Figure: Gene–disease associations may be inferred via curated chemical–gene and chemical–disease associations.]

CTD contains curated and inferred gene–disease associations. Curated gene–disease associations are extracted from the published literature by CTD biocurators, or are derived from the OMIM database using the mim2gene file from the NCBI Gene database. Inferred associations (see figure) are established via CTD–curated chemical–gene interactions (e.g., gene A is associated with disease B because gene A has a curated interaction with chemical C, and chemical C has a curated association with disease B). Curated and inferred associations are each identified as such, and both help users develop hypotheses about mechanisms underlying environmental diseases.
Inference scores are calculated for all inferred relationships. These scores reflect the degree of similarity between CTD chemical–gene–disease networks and a similar scale-free random network. The higher the score, the more likely the inference network has atypical connectivity. Many biological networks, such as disease and metabolic networks, have been shown to be scale-free random networks.[4] The inference score is calculated as the log-transformed product of two common-neighbor statistics used to assess the functional relationships between proteins in a protein–protein interaction network.[5] The first statistic takes into account the connectivity of the chemical and disease along with the number of genes used to make the inference. The second statistic takes into account the connectivity of each of the genes used to make the inference.

Chemical–Disease Associations

[Figure: Chemical–disease associations may be inferred via curated chemical–gene and gene–disease associations.]

CTD contains curated and inferred chemical–disease associations. Curated chemical–disease associations are extracted from the published literature by CTD biocurators. Inferred associations (see figure) are established via CTD–curated chemical–gene interactions (e.g., chemical A is associated with disease B because chemical A has a curated interaction with gene C, and gene C has a curated association with disease B). Curated and inferred associations are each identified as such, and both help users develop hypotheses about mechanisms underlying environmental diseases.
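
The inference rule described above (A is linked to B because both have a curated link to a shared intermediate) can be illustrated with a short Python sketch. This is not CTD's implementation: the function name, the data layout, and the placeholder records are assumptions made for illustration, and no inference score is computed.

```python
from collections import defaultdict


def infer_via_intermediate(left_pairs, right_pairs):
    """Join curated (source, intermediate) pairs with curated
    (intermediate, target) pairs.

    Returns a dict mapping each inferred (source, target) pair to the
    sorted list of intermediates that support the inference.
    """
    # Index the targets reachable from each intermediate.
    targets_by_intermediate = defaultdict(set)
    for intermediate, target in right_pairs:
        targets_by_intermediate[intermediate].add(target)

    # Every intermediate shared by a source and a target supports one inference.
    supporting = defaultdict(set)
    for source, intermediate in left_pairs:
        for target in targets_by_intermediate.get(intermediate, ()):
            supporting[(source, target)].add(intermediate)

    return {pair: sorted(items) for pair, items in supporting.items()}


# Placeholder records (hypothetical, not taken from CTD).
chemical_gene = [
    ("chemical A", "gene C"),
    ("chemical A", "gene D"),
    ("chemical X", "gene D"),
]
gene_disease = [
    ("gene C", "disease B"),
    ("gene D", "disease B"),
]

inferred = infer_via_intermediate(chemical_gene, gene_disease)
for (chemical, disease), genes in sorted(inferred.items()):
    print(chemical, "->", disease, "via", ", ".join(genes))
```

The same function covers inferred gene–disease associations if the roles are swapped: pass (gene, chemical) pairs first and (chemical, disease) pairs second, so that chemicals act as the intermediates.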

Chemical–Phenotype Interactions

A CTD chemical-phenotype interaction statement includes 8 types of data (C-Q-E-A-T-M-S-P) annotated using 8 controlled vocabularies, including, at a minimum: C, a chemical from the CTD Chemical Vocabulary; Q, a CTD action qualifier that reflects the direction of the interaction (“increases,” “decreases,” or “affects,” when not specified by the authors); E, the entity phenotype from GO; A, an anatomical term from the MeSH “Anatomy [A]” branch; T, an organism from NCBI Taxonomy; M, a CTD method code (in vivo, in vitro); S, the CTD information source code (abstract, full text); and P, the article identifier (PMID) from NCBI PubMed. “Not reported” is allowed for both taxon and anatomy fields if the authors do not provide this information.
Chemical-phenotype content can be accessed using the Keyword Search Box in the upper right-hand corner of any CTD page by querying either the “Chemical” or “GO” field (from the drop-down pick-list) with a term of interest. A phenotype icon identifies the retrieved matching terms that have associated chemical-phenotype data. Clicking the icon, or going to the “Chemical Interactions” tab on the respective GO page, shows all the curated chemical-phenotype interactions in a tabular web display. Users can sort the information by clicking on any column header. Any co-mentioned terms (e.g., chemicals, genes, and other phenotypes) are hyperlinked to their respective CTD pages, allowing users to easily traverse the database.
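
As a rough illustration of the eight-field statement (C-Q-E-A-T-M-S-P) described above, the record could be modeled as follows. This is a sketch only: the class name, field names, and validation rules are assumptions, CTD's actual schema and codes may differ, and the example values are placeholders rather than curated data.

```python
from dataclasses import dataclass

# Qualifiers named in the text above.
ALLOWED_QUALIFIERS = {"increases", "decreases", "affects"}
# Per the text, only the anatomy and taxon fields may carry this value.
NOT_REPORTED = "Not reported"


@dataclass(frozen=True)
class ChemicalPhenotypeStatement:
    chemical: str   # C: term from the CTD Chemical Vocabulary
    qualifier: str  # Q: direction of the interaction
    phenotype: str  # E: GO term for the biological outcome
    anatomy: str    # A: MeSH "Anatomy [A]" term, or "Not reported"
    taxon: str      # T: NCBI Taxonomy organism, or "Not reported"
    method: str     # M: method code, e.g. "in vivo" or "in vitro"
    source: str     # S: information source, e.g. "abstract" or "full text"
    pmid: str       # P: PubMed article identifier

    def __post_init__(self):
        if self.qualifier not in ALLOWED_QUALIFIERS:
            raise ValueError(f"unknown qualifier: {self.qualifier!r}")
        # All fields other than anatomy and taxon must carry a real value.
        for name in ("chemical", "phenotype", "method", "source", "pmid"):
            value = getattr(self, name)
            if not value or value == NOT_REPORTED:
                raise ValueError(f"{name} is required")


# Placeholder statement (hypothetical values, not a curated CTD record).
statement = ChemicalPhenotypeStatement(
    chemical="example chemical",
    qualifier="decreases",
    phenotype="spermatogenesis",
    anatomy=NOT_REPORTED,
    taxon=NOT_REPORTED,
    method="in vitro",
    source="abstract",
    pmid="PLACEHOLDER",
)
print(statement)
```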

Gene–Gene Interactions

CTD represents gene–gene interactions from BioGRID[6] that consist of genetic and protein interactions curated from primary literature for all major model organisms by BioGRID curators. These interactions are available for each gene and reference, and for the inference networks underlying each chemical–disease association. In addition, you can generate pathways for custom collections of genes using the Set Analyzer tool.

References

CTD contains reference articles related to toxicologically significant vertebrate and invertebrate genes, diseases, and associated chemicals. References were identified by information retrieval methods, and comprise a subset of MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine.

Organisms

CTD’s hierarchical organism vocabulary consists of the Eumetazoa (vertebrates and invertebrates) branch of the Taxonomy Database from the National Center for Biotechnology Information (NCBI), a division of the U.S. National Library of Medicine. You can browse organisms, or use them to formulate gene, interaction, or reference queries.

Gene Ontology

Gene Ontology (GO) annotations are integrated with gene data in CTD. In addition, GO terms that are statistically enriched among genes/proteins that interact with a chemical are displayed for each chemical. You can browse GO and use it to formulate gene and interaction queries.
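
The text above does not say which statistical test is used to decide that a GO term is enriched. One common choice for this kind of question is a one-sided hypergeometric test, sketched below purely as an illustration. The function name, parameter names, and the numbers in the example are assumptions, and no multiple-testing correction is applied.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, overlap):
    """Probability of seeing at least `overlap` annotated genes in a sample.

    population: total number of genes considered
    annotated:  genes in the population annotated with the GO term
    sample:     genes that interact with the chemical
    overlap:    genes in the sample that carry the GO term
    """
    total = comb(population, sample)
    upper = min(annotated, sample)
    # Sum the hypergeometric probabilities for k = overlap .. upper.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy numbers (hypothetical): 10 genes in total, 3 carry the term,
# 4 interact with the chemical, and 2 of those 4 carry the term.
p_value = enrichment_p_value(population=10, annotated=3, sample=4, overlap=2)
print(round(p_value, 4))
```

A small p-value suggests the term appears among the chemical's interacting genes more often than chance alone would explain. In practice many terms are tested at once, so some correction for multiple comparisons would normally follow.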

Pathways

KEGG and REACTOME pathway data describe known molecular interaction and reaction networks. These data are integrated with chemicals, genes, and diseases in CTD to provide insights into molecular networks that may be affected by chemicals, and possible mechanisms underlying environmental diseases. You can browse pathways, or use them to formulate gene or chemical–gene interaction queries. Pathway information is provided for chemical, gene, and disease detail pages. Pathways that are statistically enriched among genes/proteins that interact with a chemical are displayed for each chemical.

Exposures

CTD is working to enhance the capacity to identify environment–disease connections by developing an Exposure Ontology (ExO) that will be used to curate and present exposure data.

*Online at https://ctdbase.org/