Guide to MetaCyc
This guide provides additional information on the MetaCyc database (DB) beyond that found in other MetaCyc publications [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], to help users of the database understand its contents in more depth. MetaCyc is a member of the BioCyc collection of Pathway/Genome Databases. In contrast to all other members of that collection, which are organism-specific DBs, MetaCyc is a multiorganism DB. The other BioCyc databases describe the metabolic network and genome of a single organism, and mix experimentally determined pathways with computationally predicted pathways. MetaCyc contains experimentally elucidated pathways only; one goal of the MetaCyc project is for MetaCyc to contain a representative example of every experimentally determined metabolic pathway.
MetaCyc does not seek to model the complete metabolism of any particular organism, which is the role of individual BioCyc DBs. Instead, MetaCyc serves as a broad reference on metabolic pathways and enzymes. For example, MetaCyc is a high-quality reference DB for predicting metabolic pathways in other organisms. Scientists typically use MetaCyc to answer metabolic questions that span multiple domains of life, such as “what are all the pathways for arginine degradation in microbes,” or “what cofactor biosynthesis pathways are known in bacteria?” For questions that require information about the complete genome, proteome, or metabolic network of an organism, instead consult the organism-specific PGDB. For example, MetaCyc contains 14 pathways that have been experimentally studied in Staphylococcus aureus, and 36 enzymes that participate in these pathways. In contrast, the BioCyc Staphylococcus aureus RF122 PGDB contains 189 pathways (most of which are computationally predicted), plus the entire genome and proteome of that strain.
MetaCyc is a database of non-redundant, experimentally elucidated metabolic pathways and enzymes. It also contains reactions, chemical compounds, and genes. It stores predominantly qualitative information rather than quantitative data, although it does contain some quantitative data such as enzyme kinetics data. “MetaCyc” is pronounced “met-a-sike”. It sounds like “encyclopedia”. A unique property of MetaCyc is that it is curated from the scientific experimental literature according to an extensive process, such that:
The MetaCyc mission is to serve a broad community of researchers from genetics, molecular biology, microbiology, biochemistry, genomics, bioinformatics, metabolic engineering, and systems biology in support of the following tasks:
MetaCyc also stores metabolites, enzymes, enzyme complexes, and genes associated with these pathways.
MetaCyc is extensively linked to other biological databases containing protein and nucleic-acid sequence data, bibliographic data, and protein structures.
Unlike EcoCyc, MetaCyc provides little genomic data. MetaCyc does contain objects for the genes that encode most of the enzymes within the DB, but MetaCyc contains no sequence data. It does contain links to external sequence databases.
Comparison features combine MetaCyc with other BioCyc databases to provide additional ways of viewing data. Examples of cross-species comparisons include:
Additionally, a desktop version of the software provides substantially more powerful capabilities than the website. When installed locally with multiple organism-specific databases, the desktop version enables operations such as:
The desktop version can be downloaded here.
MetaCyc inter-relates pathway information (including reactions and their substrates) with genes and their protein products. The diagram below depicts the hyperlinks that are typically available within MetaCyc, allowing the user to navigate among pathways, genes, enzymes, etc.
Users are encouraged to link their Web site or application to MetaCyc as described here.
MetaCyc is a collaborative project between SRI International and the Boyce Thompson Institute for Plant Research. Since its beginning in 1998, MetaCyc’s data have been gathered from a variety of literature and on-line sources. A staff of several full-time curators updates MetaCyc on an ongoing basis using a literature-based strategy.
MetaCyc is available in several different forms to facilitate different uses of the data:
Curation is the process of manually refining and updating a bioinformatics database. The MetaCyc project uses a literature-based curation approach in which database contents are extracted in a step-wise manner from evidence in the experimental literature, as depicted below.
The curation procedures that MetaCyc curators follow are described in the Curator’s Guide to Pathway/Genome Databases.
MetaCyc data are derived from primary literature, from reviews, and from external databases.
For certain organisms, some of the data within MetaCyc have been directly imported from other databases which we consider to be the authoritative sources of data on those organisms:
Pathways include a mini-review summary that includes:
Other collected data include:
Enzymes include a mini-review summary that covers:
Other collected data:
Compound structures are obtained either from the primary literature or from public compound structure databases such as ChEBI and ChemSpider. The structures are edited using the Marvin software to provide a consistent look and to reflect the most prevalent protonation state at pH 7.3. For more information about protonation, see Reaction Balancing and Protonation State in BioCyc in the Guide to the BioCyc Database Collection.
We would like to express our gratitude to Chemaxon for granting us a free license to their Marvin software.
MetaCyc contains experimentally elucidated metabolic pathways. MetaCyc pathways are labeled with the name of one or more taxonomic groups in which wet-lab experiments have indicated that the pathway is present. These taxonomic designations are present on the pathway page in a line labeled “Some taxa known to possess this pathway include,” and include species names, species and strain names, and names of higher taxa such as genus names, e.g., Pseudomonas. When a high-level taxon, such as a genus, is present as a pathway label, the interpretation is that experimental evidence suggests that the pathway is present in all members of that taxon.
The “number of organisms” row in the MetaCyc statistics indicates the total number of different organisms that are listed in the taxonomic designations of all MetaCyc pathways. There is wide variation in how many pathways a given taxon contributes to MetaCyc, with some taxa contributing only a single pathway, and other taxa contributing more than 100 pathways. The taxonomic distribution of MetaCyc pathways is summarized here: [Pathway Taxonomic Distribution] .
To query MetaCyc pathways by species:
MetaCyc pathway pages also specify an “Expected taxonomic range,” which lists the taxonomic groups in which the pathway is expected to occur, in contrast to the taxonomic groups in which the pathway has been experimentally shown to occur (discussed previously). This information is useful for pathway prediction.
New versions of MetaCyc are released 3–4 times per year.
A detailed history of the enhancements to MetaCyc in each MetaCyc release is available here. This page also contains statistics on the size of MetaCyc over time.
The MetaCyc staff perform the following operations as part of each MetaCyc release:
A common early step in performing pathway analysis of genomes and metagenomes is to associate protein sequences with MetaCyc reactions. The Pathway Tools software infers such associations by using EC numbers, enzyme names, and Gene Ontology terms within protein annotations. Such annotations might be inferred using a variety of sequence-analysis methods. To aid researchers in associating sequences with MetaCyc reactions, each release of MetaCyc includes a file that associates MetaCyc reaction IDs with the UniProt identifiers of enzymes known to catalyze those reactions. Note that not all MetaCyc reactions have EC numbers (because not all enzyme-catalyzed reactions have yet been assigned EC numbers); therefore, EC numbers are not a comprehensive mechanism for associating sequences with reactions.
The file is called
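As a rough illustration of how such a reaction-to-UniProt file might be consumed, the following Python sketch loads a mapping and inverts it so that a protein identifier can be looked up. The file layout assumed here (two tab-separated columns, reaction ID then UniProt ID, with optional "#" comment lines) is an assumption made for this example and not the documented format of the MetaCyc file. The identifiers are placeholders, and the function name is hypothetical.

```python
import csv
import io
from collections import defaultdict


def load_reaction_to_uniprot(handle):
    """Read 'reaction_id<TAB>uniprot_id' rows (assumed layout) into a dict of sets."""
    mapping = defaultdict(set)
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines, malformed rows, and comment lines.
        if len(row) < 2 or row[0].startswith("#"):
            continue
        mapping[row[0].strip()].add(row[1].strip())
    return dict(mapping)


# Placeholder data standing in for the real file; all IDs are invented.
SAMPLE = (
    "# reaction_id\tuniprot_id\n"
    "RXN-EXAMPLE-1\tP00001\n"
    "RXN-EXAMPLE-1\tP00002\n"
    "\n"
    "RXN-EXAMPLE-2\tP00002\n"
)

reaction_to_uniprot = load_reaction_to_uniprot(io.StringIO(SAMPLE))

# Invert the mapping: which reactions is each protein known to catalyze?
uniprot_to_reactions = defaultdict(set)
for reaction_id, proteins in reaction_to_uniprot.items():
    for uniprot_id in proteins:
        uniprot_to_reactions[uniprot_id].add(reaction_id)
uniprot_to_reactions = dict(uniprot_to_reactions)
```

With the inverted mapping, a sequence that matches a known UniProt entry (for example, through a similarity search run separately) could be assigned to reactions even when no EC number exists for them.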
MetaCyc contains links to many other bioinformatics DBs. Some MetaCyc links are “unification links”, meaning that they are links from an object in MetaCyc to an object in another DB that represents the same biological object. Other links are “relationship links”, meaning that they are links from an object in MetaCyc to an object in another DB that represents a related object, such as a link from a MetaCyc reaction to a PIR protein that catalyzes that reaction. Note that not all objects contain links to all of the databases listed here; rather, this list describes the potential links for each object type.
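The distinction between the two link types can be sketched as a small data structure. The Python below is only an illustration of the concept, not the representation used inside MetaCyc or Pathway Tools. The class names, field names, and identifiers are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class LinkKind(Enum):
    UNIFICATION = "unification"    # target represents the same biological object
    RELATIONSHIP = "relationship"  # target represents a related object


@dataclass(frozen=True)
class DbLink:
    source_id: str   # MetaCyc object ID (placeholder values below)
    target_db: str   # name of the external database
    target_id: str   # accession in the external database
    kind: LinkKind


def unification_links(links):
    """Return only the links whose targets denote the same biological object."""
    return [link for link in links if link.kind is LinkKind.UNIFICATION]


example_links = [
    # A compound linked to the entry for the same compound in another DB.
    DbLink("CPD-EXAMPLE", "ChEBI", "0000", LinkKind.UNIFICATION),
    # A reaction linked to a protein that catalyzes it (a related object).
    DbLink("RXN-EXAMPLE", "PIR", "X00000", LinkKind.RELATIONSHIP),
]

same_object = unification_links(example_links)
```

Keeping the kind explicit matters when merging data across databases: only unification links justify treating two records as the same entity.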
The following types of MetaCyc objects contain links to the following databases.
MetaCyc incorporates information that was obtained from the following sources:
A detailed comparison of KEGG and MetaCyc is published in .
KEGG contains two types of pathways: maps and modules. A KEGG map is typically derived from multiple literature sources, and typically integrates reactions and pathways found in multiple species (it is therefore chimeric) to create a pathway that is not found in its entirety in any one species. The KEGG web site can display KEGG maps with reactions colored to indicate which reactions within that map are predicted to be catalyzed by a given organism. We call these diagrams “species views of pathway maps.” The organism-specific PGDBs within BioCyc correspond to KEGG species views of its pathway maps. KEGG maps are comparable to MetaCyc superpathways in size. However, whereas KEGG maps are rarely if ever found in a single species in their entirety, entire MetaCyc superpathways usually are found in a single species.
KEGG also contains a smaller type of pathway called a module. Modules are similar in size to individual MetaCyc pathways.
We argue that MetaCyc pathways (and KEGG modules) are closer to true biological pathways than are KEGG maps, in part because of the chimeric nature of KEGG maps. KEGG maps are typically 3–4 times larger than are KEGG modules and MetaCyc pathways because MetaCyc pathways attempt to model individual biological pathways from individual organisms. For example, KEGG map MAP00270 called “cysteine and methionine metabolism” combines pathways for the biosynthesis of L-methionine, L-cysteine, L-homocysteine, L-homoserine, ethylene, and methanethiol; for degradation of L-serine, L-cysteine, L-methionine, sulfolactate, S-methyl-5’-thioadenosine, and S-methyl-5-thio-alpha-D-ribose 1-phosphate; for homocysteine and cysteine interconversion; and for methionine salvage.
The smaller pathways in MetaCyc (and KEGG modules) are advantageous for several reasons. First, these smaller pathways correspond more closely to biologically meaningful units, meaningful in the sense that they correspond to a single biological function, they are regulated as a unit, and they tend to be conserved through evolution. For example, a program for predicting the metabolic pathways of an organism could not predict methionine biosynthesis independently of cysteine degradation using KEGG maps (say in an organism that had only one of those pathways), because those two separate processes are fused into one unit in KEGG. Similarly, a program for enrichment analysis of transcriptomics data could not separately detect the enrichment of these two pathways. These issues are discussed in more detail in , which discusses how these different pathway ontologies can affect computational analyses of pathway data. KEGG modules are similar in extent to MetaCyc pathways, but KEGG’s collection of modules is very incomplete because they are a relatively new development in KEGG.
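The enrichment argument can be made concrete with a toy calculation. The Python sketch below applies a one-sided hypergeometric test, a common choice for enrichment analysis but not one this guide prescribes. All gene counts are invented for illustration and do not come from MetaCyc or KEGG.

```python
from math import comb


def enrichment_pvalue(hits, pathway_size, selected, universe):
    """One-sided hypergeometric tail probability P(X >= hits).

    universe     -- total number of genes considered
    selected     -- number of differentially expressed genes
    pathway_size -- number of genes annotated to the pathway unit
    hits         -- selected genes that fall inside the pathway unit
    """
    upper = min(pathway_size, selected)
    favorable = sum(
        comb(pathway_size, i) * comb(universe - pathway_size, selected - i)
        for i in range(hits, upper + 1)
    )
    return favorable / comb(universe, selected)


UNIVERSE = 1000   # invented genome size
SELECTED = 20     # invented number of differentially expressed genes

# Two small pathways scored separately: one responds, the other does not.
p_small = enrichment_pvalue(8, 10, SELECTED, UNIVERSE)
p_other = enrichment_pvalue(0, 10, SELECTED, UNIVERSE)

# The same 8 hits scored against a large fused unit of 100 genes.
p_fused = enrichment_pvalue(8, 100, SELECTED, UNIVERSE)

print(f"small pathway: {p_small:.3g}")
print(f"other pathway: {p_other:.3g}")
print(f"fused unit:    {p_fused:.3g}")
```

In this toy setting the small pathway gives a much stronger signal than the fused unit. The fused unit also offers no way to tell which of its constituent pathways is responsible for the hits.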
Maps and superpathways are useful in showing how individual pathways connect, and in presenting the larger biochemical context in which a pathway operates. MetaCyc pathways can be displayed at multiple detail levels, such as showing chemical structures for substrates. In addition, all MetaCyc pathway diagrams include chemical names and enzyme names; KEGG module diagrams contain unintelligible identifiers only.
MetaCyc version 21.1 (2017) contained 2,572 pathways, compared to the 316 modules in a KEGG version downloaded in January 2017. MetaCyc 21.1 contained 377 superpathways, compared to the 237 maps found in KEGG. MetaCyc contained 14,347 reactions, compared to 10,411 in KEGG. MetaCyc cited 54,000 articles from which its contents were derived and contained 7,900 textbook-equivalent pages of mini-review summaries for enzymes and pathways; KEGG contains small numbers of citations and of mini-reviews. In addition, MetaCyc records separately the different pathway variants that have been observed in different organisms. For example, MetaCyc contains six different pathway variants for synthesizing L-lysine. KEGG does not identify pathway variants. Within the large maps defined by KEGG, it is impossible for the user to tell which subnetworks correspond to distinct biological units, nor in which species these units have been elucidated experimentally.
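The relative sizes implied by these counts can be checked with simple arithmetic. The short Python sketch below uses only the figures quoted above. The pairing of MetaCyc pathways with KEGG modules and of MetaCyc superpathways with KEGG maps follows the size comparison made earlier in this section.

```python
# (MetaCyc 21.1 count, KEGG count from the January 2017 download)
counts = {
    "pathways vs. modules": (2572, 316),
    "superpathways vs. maps": (377, 237),
    "reactions": (14347, 10411),
}

# Ratio of the MetaCyc count to the corresponding KEGG count.
ratios = {label: metacyc / kegg for label, (metacyc, kegg) in counts.items()}

for label, value in ratios.items():
    print(f"{label}: MetaCyc/KEGG = {value:.1f}")
```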
MetaCyc curators author extensive mini-review summaries that describe individual pathways and enzymes. KEGG contains short summaries for approximately half of its pathway maps.
MetaCyc pathways are labeled with the name(s) of some of the species in which the presence of those pathways has been experimentally determined, whereas KEGG maps do not state the species in which they were experimentally observed. Pathways in MetaCyc and in other BioCyc PGDBs contain evidence codes that indicate whether experimental or computational evidence supports the presence of the pathway in that organism; KEGG does not use evidence codes.
MetaCyc contains data on enzyme properties for specific enzymes from specific species, such as subunit composition, substrate specificity, cofactor requirements, activators, and inhibitors. KEGG has only cofactor data. However, because those data are associated with KEGG reactions rather than with KEGG enzymes, it is difficult to be sure for which proteins from which species the cofactor requirement was experimentally elucidated.
BioCyc Organism-Specific PGDBs Compared to KEGG Species Views of Pathway Maps
As of April 2018, KEGG contained 5,702 organisms whereas BioCyc version 22.0 (April 2018) contained 13,075 organism databases.
Forty-nine of the BioCyc databases are designated as Tier 1 or Tier 2, meaning that they have undergone anywhere from person-months of curation (e.g., the Bacillus subtilis and Saccharomyces cerevisiae databases) to person-years of curation (e.g., for Arabidopsis thaliana) to person-decades of curation (e.g., for EcoCyc and MetaCyc). In contrast, KEGG curates only its reference pathway maps and modules; it does not curate organism-specific views of those data, which are generated computationally. Manually curated databases have a number of advantages, including higher accuracy and richer information. Curators remove incorrectly predicted pathways and add pathways that should have been predicted. Curators also add information from the literature that cannot be predicted computationally, such as mini-review summaries, evidence codes, literature citations, and enzyme properties.
As of 2006, KEGG contained a large and systematic set of errors in the assignment of enzymes to reactions in its species views of reference pathways. These errors were caused by the improper use of partial EC numbers .
Pathway Tools Software Compared to KEGG Software
The Pathway Tools software that underlies MetaCyc and BioCyc is more advanced than the KEGG software in many respects. Pathway Tools can be installed locally at your site, and many of its operations are available via the BioCyc website.
The Biocatalysis/Biodegradation Database was developed by the University of Minnesota and used to be known as UM-BBD. The database contains information on microbial biocatalytic reactions and biodegradation pathways for chemicals largely considered to be potential environmental pollutants.
The data is now hosted at the Swiss Federal Institute of Aquatic Science and Technology and the database is known as the EAWAG Biocatalysis/Biodegradation Database. As of January 18, 2016 the database contained 219 pathways and 993 enzymes from 543 microorganisms. Pathways are curated from the biomedical literature and contain significant comments and literature citations.
Reactome is a curated database of biological processes in humans and other organisms. It covers biological pathways ranging from the basic processes of metabolism to high-level processes such as hormonal signaling. Reactome information is curated from the literature, and includes significant comments and literature citations. Reactome contains far fewer metabolic pathways than does MetaCyc, and because most Reactome pathways are curated based on human biology, Reactome does not have the taxonomic breadth of MetaCyc, although Reactome pathways have been computationally projected to a number of model organisms.
This section summarizes the many past and present contributors to the MetaCyc project.
Roles: Curation of non plant-related pathways, software development, Website operations
Role: Curation of plant-related and fungal pathways
We at MetaCyc would like to incorporate pathways created by other scientists into the database.
If you are a Pathway Tools user and have created a pathway that fits our criteria, why not send it to us? If we end up including it in MetaCyc, we will credit your contribution in the MetaCyc release notes, and if you wish, your name and your institution will appear on the pathway page. In addition, by submitting pathways to MetaCyc you increase the power of the PathoLogic metabolic-pathway prediction software. PathoLogic recognizes MetaCyc pathways in genome sequence data, and is now in use by more than 100 groups worldwide.
If you would like to submit a pathway for inclusion in a future release of MetaCyc, please make sure that you curate the pathway following these guidelines:
For examples of pathways that have been curated based on these guidelines, please see:
Further information can be found in the Curator’s Guide for Pathway/Genome Databases.
Pathway Tools includes an author crediting system that can attach author and organization credentials to individual pathways. We recommend that prior to creating new objects in the PGDB you should create an Organization frame for your institute and an Author frame for yourself. This way, items that you create afterwards will be associated with these frames, providing you with the credit that you deserve. This credit information would be retained upon exporting the pathways and importing them into MetaCyc. It is also possible to add credit information to older pathways that were created prior to the creation of your author frame, through the Pathway Info Editor.
Detailed instructions on how to create organization and author frames are found in the user manual, in the section ’Creating author frames’.
Pathways should be exported into a text file, which can be emailed to us at: . The procedure for exporting pathways is:
Please indicate if you would like your name and/or affiliation to appear on the pathway and enzyme pages.
If you use MetaCyc in your research, we ask that you cite the following publication:
[MetaCyc14] Caspi, R., Altman, T., Billington, R., Dreher, K., Foerster, H., Fulcher, C.A., Holland, T.A., Keseler, I.M., Kothari, A., Kubo, A., Krummenacker, M., Latendresse, M., Mueller, L.A., Ong, Q., Paley, S., Subhraveti, P., Weaver, D.S., Weerasinghe, D., Zhang, P., and Karp, P.D. (2014)
[MetaCyc13] Altman, T., Travers, M., Kothari, A., Caspi, R., and Karp, P.D.
A systematic comparison of the MetaCyc and KEGG pathway
[MetaCyc12] Caspi, R., Altman, T., Dreher, K., Fulcher, C.A., Subhraveti, P., Keseler, I.M., Kothari, A., Krummenacker, M., Latendresse, M., Mueller, L.A., Ong, Q., Paley, S., Pujar, A., Shearer, A.G., Travers, M., Weerasinghe, D., Zhang, P., and Karp, P.D. (2012)
[MetaCyc11] Karp, P.D., and Caspi, R.
A survey of metabolic databases emphasizing the MetaCyc family
[MetaCyc10] Caspi, R., Altman, T., Dale, J.M., Dreher, K., Fulcher, C.A., Gilham, F., Kaipa, P., Karthikeyan, A.S., Kothari, A., Krummenacker, M., Latendresse, M., Mueller, L.A., Paley, S., Popescu, L., Pujar, A., Shearer, A., Zhang, P., and Karp, P.D. (2010)
[MetaCyc08] Caspi, R., Foerster, H., Fulcher, C.A., Kaipa, P., Krummenacker, M., Latendresse, M., Paley, S., Rhee, S.Y., Shearer, A., Tissier, C., Walk, T.C., Zhang, P., and Karp, P.D. (2008)
[MetaCyc06] Caspi, R., Foerster, H., Fulcher, C.A., Hopkinson, R., Ingraham, J., Kaipa, P., Krummenacker, M., Paley, S., Pick, J., Rhee, S.Y., Tissier, C., Zhang, P., and Karp, P.D. (2006)
Krieger, C.J., Zhang, P., Mueller, L.A., Wang, A., Paley, S., Arnaud, M., Pick, J., Rhee, S.Y., and Karp, P.D. (2004)
Karp, P.D. (2003)
Karp, P.D., Riley, M., Paley, S. and Pellegrini-Toole, A. (2002)
[MetaCyc00] Karp, P.D., Riley, M., Saier, M., Paulsen, I.T., Paley, S., and Pellegrini-Toole, A. (2000)
See also the BioCyc Publications Page.
©2018 SRI International, 333 Ravenswood Avenue, Menlo Park, CA 94025-3493