What is SCOP2?
The Structural Classification of Proteins (SCOP) database was created in the 1990s, largely by manual inspection, to order the domains of known protein structures in a hierarchy based on structural and evolutionary relationships (Murzin et al., 1995). Its successor, SCOP2, was designed to provide a more advanced framework for protein structure annotation and classification (Andreeva et al., 2014; Andreeva et al., 2020).
The legacy SCOP structural classification was last released in 2009 (version 1.75). This browser is based on the SCOP2 classification, which is updated at regular intervals. The main levels of organization in this classification include:
- Family groups closely related proteins with evidence of a common evolutionary origin that is detectable using sequence comparison methods, e.g., BLAST, PSI-BLAST, HMMER.
- Superfamily brings together distantly related protein domains with a probable evolutionary ancestry. These domains may have common structural features, conserved architecture of active or binding sites, or similar modes of oligomerization.
- Fold groups superfamilies on the basis of global structural features, shared by the majority of their members - e.g., composition of the secondary structures in the domain, their architecture, and topology. Although Fold is an attribute of a superfamily, some superfamilies have evolved distinct structural features and can belong to a different fold.
- IUPR (Intrinsically Unstructured Protein Region) organizes superfamilies of proteins or protein regions that do not adopt any specific globular folded structure. Some of these proteins exist in ensembles of different conformations or are unstructured until they are ordered by binding to other macromolecules.
- Classes bring together folds and IUPRs with different secondary structural content. There are five classes: all-alpha and all-beta proteins, containing predominantly alpha-helices and beta-strands, respectively; the two ‘mixed’ alpha and beta classes, (a/b) with alternating and (a+b) with segregated alpha-helices and beta-strands; and small proteins with little or no secondary structure.
- Protein type groups together folds and IUPRs into four groups: soluble, membrane, fibrous and intrinsically disordered. Each of these types correlates with characteristic sequence and structural features.
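As a rough illustration of how these levels nest, the sketch below models one lineage as a chain of parent links. The `Node` class and its fields are hypothetical and not part of SCOP2 or of this browser, and the strict class > fold > superfamily > family ordering is a simplifying assumption: in SCOP2 the fold is an attribute of a superfamily, so a real lineage is not always a simple tree path. The example names come from the globin example on this page.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Node:
    """One node of a simplified classification lineage (hypothetical model)."""
    level: str                       # e.g. "class", "fold", "superfamily", "family"
    name: str
    parent: Optional["Node"] = None  # None for a top-level node

    def lineage(self) -> List[str]:
        """Return the node names from the top level down to this node."""
        path = []
        node: Optional[Node] = self
        while node is not None:
            path.append(node.name)
            node = node.parent
        return list(reversed(path))


# Assumed ordering of levels, using the globin example from this page.
scop_class = Node("class", "all alpha proteins")
fold = Node("fold", "Globin-like", scop_class)
superfamily = Node("superfamily", "Globin-like", fold)
family = Node("family", "Globins", superfamily)

print(" >> ".join(family.lineage()))
```

Walking the parent links from the family upward reproduces the kind of path shown in the browser tree.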
The SCOP database classifies non-redundant protein domains. Representative proteins are selected based on sequence (UniProtKB) and structure (PDB) for manual SCOP classification. The classification of the representative is then automatically extended to related entries using SIFTS.
Why use the SCOP2 Browser?
Nearly all proteins in the PDB have structural similarities with other proteins, some of which share common evolutionary origins. The classification of proteins in SCOP2 builds on the knowledge acquired and the lessons learned from SCOP. This browser also organizes small proteins and non-globular, intrinsically unstructured parts of proteins. This gives you opportunities to discover functional and evolutionary relationships between proteins and to identify starting models for phasing (in X-ray experiments), for modeling in EM volumes (in EM experiments), for simulations, for hypothesis generation, and/or for experimental design.
How to use the SCOP2 Browser?
There are two ways in which an entry is classified in SCOP2: (a) by structural class (top-level classification, assigned the IDs 1000000 to 1000004) or (b) by protein type (top-level classification, assigned the IDs 1 to 4). Regardless of the type of classification used for browsing (structural class or protein type), each node in the SCOP2 classification can be uniquely identified by a seven-digit identifier, the SCOP node identifier.
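The identifier ranges described above can be checked with a few lines of code. This is a minimal sketch based only on the ranges stated on this page; the function name `describe_scop_id` and the returned labels are hypothetical, and the check looks only at the form of the identifier, not at whether the node actually exists in SCOP2.

```python
import re

# Top-level identifier ranges as stated on this page.
PROTEIN_TYPE_IDS = {"1", "2", "3", "4"}
STRUCTURAL_CLASS_IDS = range(1000000, 1000005)  # 1000000 to 1000004 inclusive


def describe_scop_id(text: str) -> str:
    """Label an identifier by its form only (hypothetical helper)."""
    text = text.strip()
    if text in PROTEIN_TYPE_IDS:
        return "protein type (top level)"
    # Seven ASCII digits with a non-zero leading digit.
    if re.fullmatch(r"[1-9][0-9]{6}", text):
        if int(text) in STRUCTURAL_CLASS_IDS:
            return "structural class (top level)"
        return "node below the top level"
    return "not a SCOP2 node identifier"


for candidate in ["3", "1000002", "4000551", "Globins"]:
    print(candidate, "->", describe_scop_id(candidate))
```

A check like this can be useful for deciding whether a search term should be treated as a SCOP2 ID or as a protein name before entering it in the search box.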
The SCOP2 browser allows users to type in a protein name in the search box and select from the options in the autocomplete list. Alternatively, you can enter a SCOP2 unique identifier (SCOP ID) to find structure(s) of interest. Since there are two ways of classifying a protein - the protein name
After locating the individual protein or protein class of interest in the browser, users can view the number of PDB structures in this group. Clicking on the number listed next to the node name will launch a search for the PDB structures that contain the SCOP2 domain of interest.
The SCOP domain boundaries assigned to PDB and UniProtKB entries can be found in the Sequence tab of the structure summary page of any structure of interest.
- Navigate through either of the two trees (browsing entry points) and their branches:
- “all alpha proteins” >> “Globin-like” >> “Globin-like” and “Globins”, OR
- “Globular proteins” >> “Globin-like” >> “Globin-like” and “Globins”
- Type “Globins” in the search box at the top of the page and select “globins 4000551” from the options, OR
- Type the SCOP2 ID (4000551) in the search box at the top of the page
- Murzin, A. G., Brenner, S. E., Hubbard, T., Chothia, C. (1995). SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247, 536–540, https://doi.org/10.1006/jmbi.1995.0159
- Andreeva, A., Howorth, D., Chothia, C., Kulesha, E., Murzin, A. G. (2014). SCOP2 prototype: a new approach to protein structure mining. Nucleic Acids Res. 42, D310–D314, https://doi.org/10.1093/nar/gkt1242
- Andreeva, A., Kulesha, E., Gough, J., Murzin, A. G. (2020). The SCOP database in 2020: expanded classification of representative family and superfamily domains of known protein structures. Nucleic Acids Res. 48, D376–D382, https://doi.org/10.1093/nar/gkz1064