Web Services Overview
All static data exposed on rcsb.org is available through the Data API. The schema follows the mmCIF dictionary, extended with annotations from external resources. The core PDB data is split into core objects, one per level of the structural data hierarchy, with entity subdivided into polymeric and non-polymeric subschemas (unlike the mmCIF dictionary). These are some of the core objects:
- core_entry: data that relates to a PDB entry. Identified by a 4-character pdb_id.
- core_polymer_entity: data for each polymeric molecular entity in a PDB entry. Identified by a
- core_nonpolymer_entity: data for each non-polymeric molecular entity in a PDB entry. Identified by a
- core_assembly: data for each biological assembly in a PDB entry. Identified by a
- core_polymer_entity_instance: an instance of a certain polymeric molecular entity, also known as chain. Identified by a
- core_chem_comp: a chemical component. Identified by a 3-letter chem_comp_id.
Both internal additions to the mmCIF dictionary and annotations from external resources are prefixed with rcsb_. In each core object, the rcsb_<core_object>_container_identifiers field holds the cardinal identifiers for the object and for any parent/child objects. Additionally, every core object contains a single string identifier in field
The data is available via two different interfaces:
The REST API permits the retrieval of all data for one core object at a time.
The GraphQL interface offers more flexible data retrieval, making it possible to fetch any piece of data from any level of the hierarchy in a single query. To use it programmatically, POST your GraphQL queries under the
All output from both REST and GraphQL interfaces is offered in JSON format.
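The difference between the two interfaces can be sketched in Python using only the standard library. This is a minimal illustration, not a reference client: the base URL https://data.rcsb.org, the REST path layout /rest/v1/core/entry/<pdb_id>, the /graphql path, and the GraphQL field names are assumptions that are not stated in the text above and should be checked against the current Data API documentation.

```python
import json
import urllib.request

# Assumed base URL; the text above does not state it.
DATA_API_BASE = "https://data.rcsb.org"


def rest_entry_url(pdb_id: str, base: str = DATA_API_BASE) -> str:
    """Build a REST URL for one core_entry object (path layout is an assumption)."""
    return f"{base}/rest/v1/core/entry/{pdb_id.upper()}"


def graphql_request(query: str, base: str = DATA_API_BASE) -> urllib.request.Request:
    """Wrap a GraphQL query string in a JSON POST request (endpoint path is an assumption)."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{base}/graphql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_json(request_or_url):
    """Send the request and decode the JSON response; requires network access."""
    with urllib.request.urlopen(request_or_url, timeout=30) as response:
        return json.load(response)


# One GraphQL query can reach several levels of the hierarchy at once.
# The field names below are assumptions to be verified against the schema.
EXAMPLE_QUERY = """
{
  entry(entry_id: "4HHB") {
    struct { title }
    polymer_entities { rcsb_id }
  }
}
"""

# Example calls (not executed here because they need network access):
#   entry = fetch_json(rest_entry_url("4hhb"))
#   result = fetch_json(graphql_request(EXAMPLE_QUERY))
```

The REST helper addresses exactly one core object per call, whereas the GraphQL request carries a query that names only the fields of interest, which is why a single round trip can cover entry-level and entity-level data together.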
The Search API programmatically exposes all search functionality available at rcsb.org. Queries with arbitrary Boolean logic can be run across all data available in the RCSB PDB Data API via a JSON-format query language. At the root level it is also possible to combine text-based searches (any text or numerical field in the RCSB PDB Data API) with protein/nucleotide sequence searches (MMseqs2 software) and structure similarity searches (BioZernike software, described in Guzenko et al. 2020). All output from the Search API is offered in JSON format.
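To make the Boolean query language more concrete, the sketch below assembles a nested query as a Python dictionary and serializes it to JSON. The node layout ("terminal" and "group" nodes, "logical_operator", "return_type"), the attribute names, and the operator names are assumptions based on the general shape of such a query; none of them are given in the text above, so verify them against the Search API documentation before use.

```python
import json


def terminal(attribute: str, operator: str, value) -> dict:
    """A leaf node: one text-service condition on a single attribute."""
    return {
        "type": "terminal",
        "service": "text",
        "parameters": {"attribute": attribute, "operator": operator, "value": value},
    }


def group(logical_operator: str, *nodes: dict) -> dict:
    """An inner node that combines child nodes with 'and' or 'or'."""
    if logical_operator not in ("and", "or"):
        raise ValueError("logical_operator must be 'and' or 'or'")
    return {"type": "group", "logical_operator": logical_operator, "nodes": list(nodes)}


# X-ray entries that are either high resolution OR contain many protein entities.
# Attribute and operator names are assumptions for illustration.
query = {
    "query": group(
        "and",
        terminal("exptl.method", "exact_match", "X-RAY DIFFRACTION"),
        group(
            "or",
            terminal("rcsb_entry_info.resolution_combined", "less", 2.0),
            terminal("rcsb_entry_info.polymer_entity_count_protein", "greater", 4),
        ),
    ),
    "return_type": "entry",
}

# JSON text ready to be sent to the Search API.
payload = json.dumps(query, indent=2)
```

Because groups can contain other groups, arbitrary Boolean expressions reduce to a tree; a sequence or structure similarity search would presumably appear as another node with a different service value alongside the text nodes.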
The ModelServer is a service for accessing subsets of macromolecular model data. It delivers atomic coordinates together with annotations in the primary data files in a compressed BinaryCIF encoding (BCIF). Structure data can be served at different levels of granularity (e.g., assembly, polymer chain, ligand), and ligand data may also be delivered in popular chemical informatics formats (e.g., SDF, MOL, MOL2).
The specification of the BinaryCIF format can be found at: https://github.com/molstar/BinaryCIF.
The VolumeServer is a service for accessing subsets of volumetric data. It automatically downsamples the data depending on the volume of the requested region to reduce the bandwidth requirements and provide near-instant access to even the largest data sets.
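The idea of downsampling according to the size of the requested region can be illustrated with a small sketch. This is not the VolumeServer's actual algorithm: the voxel budget, the power-of-two rates, and both function names are hypothetical and serve only to show how a larger request leads to a coarser sampling.

```python
from typing import Sequence


def sampled_voxel_count(box_voxels: Sequence[int], rate: int) -> int:
    """Voxels left after keeping every `rate`-th sample along each axis."""
    count = 1
    for n in box_voxels:
        count *= -(-n // rate)  # ceiling division
    return count


def pick_downsampling_rate(box_voxels: Sequence[int], max_voxels: int) -> int:
    """Return the smallest power-of-two rate whose sampled box fits the budget."""
    if max_voxels < 1 or min(box_voxels) < 1:
        raise ValueError("box dimensions and budget must be positive")
    rate = 1
    while sampled_voxel_count(box_voxels, rate) > max_voxels:
        rate *= 2
    return rate
```

With a budget of 64 x 64 x 64 voxels, a 64-voxel cube is returned at full resolution (rate 1), while a 256-voxel cube is sampled at every fourth voxel along each axis (rate 4), so both responses carry the same amount of data.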
Both ModelServer and VolumeServer are part of Mol* (D. Sehnal, A.S. Rose, J. Koča, S.K. Burley, S. Velankar (2018) Mol*: Towards a common library and tools for web molecular graphics. MolVA/EuroVis Proceedings. doi:10.2312/molva.20181103).
1D Coordinate Server
The RCSB PDB 1D Coordinate Server compiles alignments between structural and sequence databases and integrates protein positional features from multiple resources. Alignment data is available for NCBI RefSeq (including protein and genomic sequences), UniProt and PDB sequences. Protein positional features are integrated from UniProt, CATH, SCOPe and RCSB PDB and collected from the RCSB PDB Data Warehouse.
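An alignment between two sequence databases is what allows a positional feature to be carried from one coordinate system to another. The sketch below shows that mapping for gap-free aligned regions. The AlignedRegion type and its field names (query_begin, query_end, target_begin, target_end) are assumptions made for illustration, not a description of the server's response format.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class AlignedRegion:
    """A gap-free block aligning query positions to target positions (1-based, inclusive)."""
    query_begin: int
    query_end: int
    target_begin: int
    target_end: int


def map_position(position: int, regions: Sequence[AlignedRegion]) -> Optional[int]:
    """Map a query position to the target sequence, or None if it is unaligned."""
    for region in regions:
        if region.query_begin <= position <= region.query_end:
            # Within a gap-free block the offset from the block start is preserved.
            return region.target_begin + (position - region.query_begin)
    return None


# Hypothetical alignment of a PDB chain (query) to a UniProt sequence (target).
regions = [
    AlignedRegion(query_begin=1, query_end=50, target_begin=101, target_end=150),
    AlignedRegion(query_begin=60, query_end=80, target_begin=151, target_end=171),
]
```

Here map_position(70, regions) falls in the second block and maps to target position 161, while position 55 lies between the blocks and returns None because it has no aligned counterpart.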
Use of RCSB PDB Web Services is subject to the same terms and conditions as the RCSB PDB web portal (see usage policies).
Contact RCSB PDB with questions or suggestions for specific services.