Adoption of BioCyc Databases
The resources available to SRI for database curation are limited. By creating a large collection of Tier 3 PGDBs, we provide a useful first step. However, turning these computationally generated databases into resources that accurately reflect the depth and breadth of knowledge produced by the biomedical research community requires harnessing the expertise and effort of the scientific community. We encourage scientists to adopt and curate individual PGDBs within the BioCyc collection, bringing these databases to a high level of curation.
All of the Tier 3 BioCyc PGDBs, and some of the Tier 2 PGDBs, are available for adoption. The DBs are available under an open license agreement, meaning that they are freely available to all, they may be modified, and they may be freely redistributed.
The Pathway Tools software system on which BioCyc is based provides a package of graphical editing tools that allow biologists to create and update database entities such as genes, proteins, metabolic pathways, metabolites, operons, and transcriptional regulatory interactions.
To initiate the adoption process, contact biocyc-support at ai dot sri dot com.
This page summarizes steps you can take after adopting a BioCyc Pathway/Genome Database (PGDB).
1. Obtain and Install Pathway Tools
Obtain SRI's Pathway Tools software by executing the Pathway Tools software license agreement.
Install Pathway Tools on a computer at your site. Note that versions exist for Linux, Microsoft Windows, and Apple OS X.
2. Install the Database within Pathway Tools
All of our databases are available for download via a facility called the PGDB Registry, which is invoked from within the Pathway Tools application. Once you have installed Pathway Tools, download your adopted PGDB to your local computer. Detailed information about downloading databases via the Registry is provided in the Pathway Tools User's Guide, section 6.1.
3. Refine the Database
If the PGDB is based on an old annotation and a newer one is available, we highly recommend that you start by updating the annotation.
Keep in mind that SRI used only the automated portions of the PathoLogic component of Pathway Tools to create the PGDB that you adopted. SRI did not execute the aspects of PathoLogic that require manual intervention. These manual steps are listed in Chapter 6.4, "Refining the PGDB," of the Pathway Tools User's Guide.
Recommended operations include:
- Propagate updates from MetaCyc (command is found under the Tools menu)
- Re-run the name matcher (a PathoLogic Refine procedure)
- Assign probable enzymes (a PathoLogic Refine procedure)
- Create protein complexes (a PathoLogic Refine procedure)
- Rescore pathways (a PathoLogic Refine procedure)
- Rerun the Transport Inference Parser (a PathoLogic Refine procedure)
- Rerun the Pathway Hole Filler (although it was run by SRI, it is a good idea to run it again to take advantage of new enzyme/reaction assignments made above)
- Run the Consistency Checker (command is found under the Tools menu)
- Update the cellular overviews (a PathoLogic Refine procedure)
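Because these operations are run by hand inside Pathway Tools, it can help to keep a simple record of which ones have been completed for a given PGDB. The following is a minimal sketch of such a record, not part of Pathway Tools: it does not call any Pathway Tools API, and the names `RefineStep`, `new_checklist`, `mark_done`, `remaining`, and `next_step` are hypothetical. The steps and their menu locations are copied from the list above; treating the listed order as the intended order is an assumption.

```python
from dataclasses import dataclass


@dataclass
class RefineStep:
    name: str        # operation as named in the list above
    location: str    # where the list says the command is found
    done: bool = False


def new_checklist():
    """Return a fresh checklist in the order the operations are listed above."""
    return [
        RefineStep("Propagate updates from MetaCyc", "Tools menu"),
        RefineStep("Re-run the name matcher", "PathoLogic Refine"),
        RefineStep("Assign probable enzymes", "PathoLogic Refine"),
        RefineStep("Create protein complexes", "PathoLogic Refine"),
        RefineStep("Rescore pathways", "PathoLogic Refine"),
        RefineStep("Rerun the Transport Inference Parser", "PathoLogic Refine"),
        # The list above does not state where this command is found.
        RefineStep("Rerun the Pathway Hole Filler", "not stated"),
        RefineStep("Run the Consistency Checker", "Tools menu"),
        RefineStep("Update the cellular overviews", "PathoLogic Refine"),
    ]


def mark_done(steps, name):
    """Mark the step with the given name as completed."""
    for step in steps:
        if step.name == name:
            step.done = True
            return
    raise KeyError(f"unknown step: {name}")


def remaining(steps):
    """Names of the steps not yet completed, in listed order."""
    return [step.name for step in steps if not step.done]


def next_step(steps):
    """First step not yet completed, or None when all are done."""
    for step in steps:
        if not step.done:
            return step
    return None
```

A curator might call `mark_done(steps, "Rescore pathways")` after finishing that operation in Pathway Tools, then print `remaining(steps)` to see what is left.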
4. Refine the Database by Literature-Based Curation
You can refine the PGDB further by incorporating information from the experimental literature for the organism. For example, you could add metabolic pathways that were not predicted by PathoLogic. You could add new gene functions that were recently reported in the literature. You could author mini-review comments summarizing experimental information about proteins and pathways, along with literature citations, as is done for EcoCyc [example mini-review in EcoCyc]. You could add promoters and transcription-factor binding sites, and define interactions between those sites and the transcription factors that bind to them, to describe gene regulatory networks. Use of the Pathway Tools Editors is described in Chapter 7 of the Pathway Tools User's Guide. You may also want to refer to the curation guidelines used by SRI in curating its PGDBs, such as EcoCyc.
Curation of PGDBs is a very big job. Enlist colleagues who are specialists in different aspects of the organism's biology to help. A group of collaborating curators can all run Pathway Tools on separate computers to update a PGDB stored in a commonly accessed relational database manager such as MySQL.
Dedicated funding for a PGDB is important to provide significant resources over an extended period. Many government funding agencies support organism-specific database curation projects.
5. Submit the Database to BioCyc
Once your adopted PGDB is improved significantly, you can send it to SRI for inclusion in BioCyc, again via the PGDB Registry. Including your PGDB in the Registry will also make it available for instant download by all Pathway Tools users.
Including your PGDB in the Registry involves a registration procedure with a server at SRI. The PGDB remains on your FTP site, but other Pathway Tools users will be able to see it while browsing the catalog of available PGDBs from their installation of Pathway Tools, and will be able to download it directly from your FTP server. For more information, see "Chapter 4: Database Sharing" in the Pathway Tools User's Guide.