Coming July 29: Improved Carbohydrate Data at the PDB
PDB entries and reference data will incorporate a new representation for carbohydrates that improves the Findability and Interoperability of these molecules in macromolecular structures. To remediate and improve the representation of carbohydrates across the archive, the wwPDB has:
- standardized Chemical Component Dictionary nomenclature following IUPAC-IUBMB recommendations
- provided uniform representation for oligosaccharides
- adopted linear descriptors commonly used by the glycoscience community, generated with community tools
- annotated glycosylation sites in PDB structures
Starting July 29, 2020, users will be able to access the improved data via FTP or wwPDB partner websites. Developers of software packages that produce, access, or visualize PDB data are encouraged to review this information and adapt their software as soon as possible, as originally highlighted in the February 2020 announcement. Detailed information about this project is available at the wwPDB website; lists of impacted entries and chemical components will be published on this page after the data release.
The wwPDB has created a new ‘branched’ entity representation for polysaccharides that describes each individual monosaccharide component in the PDB entry. As part of this process, we have standardized the atom nomenclature of >1,000 monosaccharides in the Chemical Component Dictionary (CCD) and applied the branched entity representation to oligosaccharides in >8,000 PDB entries. To guarantee an unambiguous chemical description of oligosaccharides in the affected PDB entries, the covalent linkages between their monosaccharide units are described explicitly. In addition, wwPDB validation reports represent these oligosaccharides consistently and include 2D representations based on the Symbol Nomenclature for Glycans (SNFG).
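For software developers, the idea of a branched entity with explicit linkages can be sketched as follows. This is a minimal illustration, not the archive's actual schema: the `Linkage` record, its field names, and the helper functions are hypothetical, and the example values (a five-unit glycan built from the components NAG, BMA, and MAN) are illustrative only. Each record names the two linked units and the atoms that form the bond, which is enough to recover the branching pattern.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Linkage:
    """One covalent link between two monosaccharide units (hypothetical record)."""
    donor_num: int       # position of the unit that contributes its anomeric carbon
    donor_comp: str      # chemical component ID of that unit
    donor_atom: str      # atom on the donor unit
    acceptor_num: int    # position of the unit it attaches to
    acceptor_comp: str
    acceptor_atom: str   # atom on the acceptor unit


def describe(link):
    """Render one linkage as a short human-readable string."""
    return (f"{link.donor_comp}{link.donor_num}({link.donor_atom})"
            f"-({link.acceptor_atom}){link.acceptor_comp}{link.acceptor_num}")


def build_branches(links):
    """Map each unit number to the sorted unit numbers attached to it."""
    children = defaultdict(list)
    for link in links:
        children[link.acceptor_num].append(link.donor_num)
    return {num: sorted(attached) for num, attached in children.items()}


# Illustrative values only: a five-unit glycan with one branch point.
CORE = [
    Linkage(2, "NAG", "C1", 1, "NAG", "O4"),
    Linkage(3, "BMA", "C1", 2, "NAG", "O4"),
    Linkage(4, "MAN", "C1", 3, "BMA", "O3"),
    Linkage(5, "MAN", "C1", 3, "BMA", "O6"),
]

branches = build_branches(CORE)
branch_points = [num for num, attached in branches.items() if len(attached) > 1]

for link in CORE:
    print(describe(link))
print("branch points:", branch_points)
```

In this sketch, unit 3 is the only unit with more than one attached unit, so it is reported as the branch point. Software reading the remediated entries would obtain the equivalent information from the linkage records in the entry itself.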
To support the remediation of carbohydrate representation, software tools that generate linear descriptors were developed in collaboration with the glycoscience community, enabling easy translation of PDB data to other representations commonly used by glycobiologists. These include Condensed IUPAC from GMML at the University of Georgia, WURCS from PDB2Glycan at The Noguchi Institute, Japan, and LINUCS from pdb-care in Germany.
Furthermore, to ensure continued Findability of 118 common oligosaccharides (e.g., sucrose, Lewis Y antigen), we have expanded the Biologically Interesting molecule Reference Dictionary (BIRD), which contains the covalent linkage information and common synonyms for such molecules.
The wwPDB has also used this opportunity to improve the organization of chemical synonyms in the CCD by introducing a new _pdbx_chem_comp_synonyms data category. This will enable more comprehensive capture of alternative names for small molecules in the PDB. To minimize disruption to users, the legacy data item, _chem_comp.pdbx_synonyms, will be retained for a transition period through 2021.
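During the transition period, software may want to prefer the new category and fall back to the legacy item when the new one is absent. The sketch below shows one way to do that, under stated assumptions: the chemical component is assumed to be already parsed into a plain dictionary (the shape shown is hypothetical, not the output of any particular mmCIF library), each row of the new category is assumed to carry the synonym in a `name` field, and the legacy value is assumed to be a single semicolon-separated string. The synonym values are illustrative.

```python
# mmCIF conventionally uses "?" and "." as placeholders for missing values.
MISSING = {"?", ".", ""}


def collect_synonyms(component):
    """Return synonyms, preferring the new category over the legacy item.

    `component` is a hypothetical parsed form of one CCD entry:
      {"_pdbx_chem_comp_synonyms": [{"name": ...}, ...],
       "_chem_comp": {"pdbx_synonyms": "..."}}
    """
    rows = component.get("_pdbx_chem_comp_synonyms", [])
    names = [row.get("name", "?").strip() for row in rows]
    names = [name for name in names if name not in MISSING]
    if names:
        return names

    # Fall back to the legacy item (assumed semicolon-separated).
    legacy = component.get("_chem_comp", {}).get("pdbx_synonyms", "?").strip()
    if legacy in MISSING:
        return []
    return [part.strip() for part in legacy.split(";") if part.strip()]


# Illustrative records only.
new_style = {
    "_chem_comp": {"pdbx_synonyms": "beta-D-glucose; D-glucose; glucose"},
    "_pdbx_chem_comp_synonyms": [
        {"name": "beta-D-glucose"},
        {"name": "D-glucose"},
        {"name": "glucose"},
    ],
}
legacy_only = {"_chem_comp": {"pdbx_synonyms": "beta-D-glucose; D-glucose"}}
no_synonyms = {"_chem_comp": {"pdbx_synonyms": "?"}}

print(collect_synonyms(new_style))
print(collect_synonyms(legacy_only))
print(collect_synonyms(no_synonyms))
```

Reading the new category first means the code keeps working unchanged once the legacy item is retired.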
The carbohydrate remediation project is a wwPDB collaborative project carried out principally by RCSB PDB at Rutgers, The State University of New Jersey, in collaboration with the Complex Carbohydrate Research Center at the University of Georgia, and is funded by NIGMS grant U01 CA221216.