Remembering John Westbrook on the Anniversary of His Passing
John D. Westbrook Jr. (1957-2021), Research Professor at Rutgers University and Data & Software Architect Lead for the RCSB PDB, passed away on October 18, 2021.
He was incredibly beloved and respected by his colleagues at Rutgers and throughout the world, known for his dry wit and endless enthusiasm for thinking about all aspects of data and data management.
John had a long and highly successful career developing ontologies, tools, and infrastructure in data acquisition, validation, standardization, and mining in the structural biology and life science domains. His work established the PDBx/mmCIF data dictionary and format as the foundation of the modern Protein Data Bank (PDB) archive (wwPDB.org).
More than twenty-five years ago, while still a graduate student, John recognized the importance of a well-defined data model for ensuring delivery of high-quality, reliable structural information to data users. He was the principal architect of the mmCIF data representation for biological macromolecular data. The format is based on a simple, context-free grammar (without column width constraints) and presents data in either key-value or tabular form. All relationships between common data items (e.g., atom and residue identifiers) are explicitly documented within the PDBx Exchange Dictionary (mmcif.wwpdb.org). Use of the PDBx/mmCIF format enables software applications to evaluate and validate referential integrity within any PDB entry. A key strength of the mmCIF technology is the extensibility afforded by its rich collection of software-accessible metadata.
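To make the key-value and tabular forms concrete, the following is a minimal Python sketch, not the wwPDB implementation. The sample entry, the function names parse_simple_cif and dangling_references, and the simplified parsing rules (whitespace-separated tokens, no quoted or multi-line values) are all assumptions made for illustration. The parent-child link checked here, from _atom_site.label_asym_id to _struct_asym.id, is the kind of relationship the dictionary documents; real applications should rely on an established mmCIF library and the dictionary itself.

```python
# Hypothetical, heavily simplified mmCIF fragment (illustrative data only).
SAMPLE = """\
data_DEMO
_entry.id DEMO
_cell.length_a 50.0
loop_
_struct_asym.id
_struct_asym.entity_id
A 1
B 2
loop_
_atom_site.id
_atom_site.label_atom_id
_atom_site.label_asym_id
1 N A
2 CA A
3 N C
"""


def parse_simple_cif(text):
    """Parse the simplified fragment into {category: [row dicts]}.

    Assumption: values are single whitespace-free tokens; quoting and
    multi-line text fields of real mmCIF are deliberately not handled.
    """
    tables = {}
    columns = []            # item names of the loop currently being read
    reading_header = False  # True while collecting loop_ column names
    for line in text.splitlines():
        tokens = line.split()
        if not tokens or tokens[0].startswith(("data_", "#")):
            continue
        if tokens[0] == "loop_":
            columns, reading_header = [], True
            continue
        if tokens[0].startswith("_"):
            category, item = tokens[0][1:].split(".", 1)
            if reading_header:
                # Tabular form: one column name per line.
                columns.append((category, item))
                tables.setdefault(category, [])
            else:
                # Key-value form: item name and value on the same line.
                tables.setdefault(category, [{}])[0][item] = tokens[1]
            continue
        # Any other line is a data row of the current loop.
        reading_header = False
        row = {item: value for (_, item), value in zip(columns, tokens)}
        tables[columns[0][0]].append(row)
    return tables


def dangling_references(tables, child, child_item, parent, parent_item):
    """Return child values that have no matching key in the parent category."""
    parent_keys = {row[parent_item] for row in tables.get(parent, [])}
    return [row[child_item] for row in tables.get(child, [])
            if row[child_item] not in parent_keys]


tables = parse_simple_cif(SAMPLE)
missing = dangling_references(tables, "atom_site", "label_asym_id",
                              "struct_asym", "id")
print(missing)
```

In this sample the third atom refers to a chain identifier that is never declared in the struct_asym category, so the check reports it as a dangling reference.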
The current PDBx/mmCIF dictionary contains more than 6,200 definitions relating to experiments involved in macromolecular structure determination and descriptions of the structures themselves. The first implementation of this schema was used for the Nucleic Acid Database, a data resource of nucleic acid-containing X-ray crystallographic structures. Today, this dictionary underpins all data management of the PDB. Since 2014, it has served as the Master Format for the PDB archive. It also forms the basis of the Chemical Component Dictionary (wwpdb.org/data/ccd), which is used to maintain and distribute small molecule chemical reference data in the PDB.
In 2011, the Worldwide Protein Data Bank (wwPDB) PDBx/mmCIF Working Group was established to enable direct use of PDBx/mmCIF format files within major macromolecular crystallography software tools and to provide recommendations on format extensions required for deposition of larger macromolecule structures to the PDB. This was a key step in the evolution of the PDB archive, which enabled studies of macromolecular machines, such as the ribosome, as single PDB structures (instead of split entries with atomic coordinates distributed among different entry files). In 2019, mandatory submission of PDBx/mmCIF format files for deposition was announced (Adams et al. Acta Crystallographica D75, 451-454).
To ensure the success of the PDBx/mmCIF dictionary and format, John worked with a wide range of community experts to extend the framework to encompass descriptions of macromolecular X-ray crystallographic experiments, 3D cryo-electron microscopy experiments, NMR spectroscopy experiments, protein and nucleic acid structural features, diffraction image data, and protein production and crystallization protocols. Most recently, these efforts have been focused on developing compatible data representations for X-ray free electron laser (XFEL) methods, and for integrative or hybrid methods (I/HM). I/HM structures, currently stored in the prototype PDB-Dev archive (pdb-dev.wwpdb.org), presented new challenges for data exchange among rapidly evolving and heterogeneous experimental repositories. Proper management of I/HM structures in PDB-Dev also required extension of the PDBx/mmCIF data dictionary to include coarse-grained or multiscale models, which will be essential for studying macromolecular structures in situ using cryo-electron tomography and other bioimaging methods.
John contributed broadly to community data standards enabling interoperation and data integration within the biology and structural biology domains. His efforts have included (i) describing the increasing molecular complexity of macromolecular structure data, (ii) representing new experimental methodologies, including I/HM techniques, and (iii) expanding the biological context required to facilitate broader integration with a spectrum of biomedical resources. John’s work has been central to connecting crystallographic and related structural data for biological macromolecules to key resources across scientific disciplines. His efforts have been described in more than 120 peer-reviewed publications, one of which has been cited more than 21,000 times according to the Web of Science (Berman et al. Nucleic Acids Research 28, 235-242). Eight of his most influential published papers have appeared in the International Tables for Crystallography.
John also rendered yeoman service to the crystallographic community over many years and was recognized with the inaugural Biocuration Career Award from the International Society for Biocuration in 2016.
For the International Union of Crystallography, John served on the Committee for the Maintenance of the CIF Standard (COMCIFS), the Diffraction Data Deposition Working Group (DDDWG), and the Committee on Data (CommDat). He also served as an Associate Editor for Acta Crystallographica Section F.
John was a long-standing member of the American Crystallographic Association, and served on the Data, Standards & Computing Committee. He also served on the Metadata Interest Group for the Research Data Alliance.
John is survived by his wife, Bonnie J. Wagner-Westbrook, Ed.D.; his devoted mother-in-law, Joan N. Wagner of Clinton Twp., NJ; and many cousins, including Chandler Turner (of Portsmouth, VA), Ann (Turner) Heyes (of Tasmania, Australia), and Louise (Turner) Brown (of Oakland, CA).
- John D. Westbrook Jr (1957–2021) Acta Cryst (2021) D77: 1475-1476 doi: 10.1107/S2059798321011402
- RCSB Protein Data Bank: Celebrating 50 years of the PDB with new tools for understanding and visualizing biological macromolecules in 3D (2022) Protein Science 31: 187-208 doi: 10.1002/pro.4213
- Collecting Experiments. Making Big Data Biology. Helliwell, J. R. (2022). J. Appl. Cryst. 55: 211-214 doi: 10.1107/S1600576721012140
- PDBx/mmCIF Ecosystem: Foundational Semantic Tools for Structural Biology (2022) Journal of Molecular Biology 434: 167599 doi: 10.1016/j.jmb.2022.167599
- RCSB Protein Data Bank: improved annotation, search and visualization of membrane protein structures archived in the PDB (2022) Bioinformatics 38: 1452-1454 doi: 10.1093/bioinformatics/btab813