PDB Statistics: Growth in Number of Unique Protein Sequences in Released PDB Structures (Cumulative) at Identity 90%

This chart shows the annual and cumulative numbers of unique protein sequences in released PDB structures since the beginning of the PDB archive. It can be viewed at several levels of sequence identity. The cumulative bars represent the growth in unique protein sequences (number of polymeric entities) over time. The yearly bars (dark blue) show how many new protein sequences were added in a given year.

Note: The total number of sequence clusters in the statistics table links to the sequence cluster group search result page. To balance performance against accuracy, the counts are calculated with a default precision threshold, so a statistics count may differ slightly from the actual non-redundant group search result when the result count approaches or exceeds 10,000. The group search result page provides the accurate count; the statistics page shows the trend.

Year    Number of New Protein Sequences    Total Number of Protein Sequences
1976    13                                 13
1977    12                                 25
1978    3                                  28
1979    5                                  33
1980    3                                  36
1981    8                                  44
1982    18                                 62
1983    10                                 72
1984    11                                 83
1985    10                                 93
1986    9                                  102
1987    10                                 112
1988    23                                 135
1989    45                                 180
1990    43                                 223
1991    49                                 272
1992    65                                 337
1993    215                                552
1994    434                                986
1995    346                                1,332
1996    384                                1,716
1997    551                                2,267
1998    731                                2,998
1999    917                                3,915
2000    1023                               4,938
2001    1070                               6,008
2002    1133                               7,141
2003    1597                               8,738
2004    2201                               10,939
2005    2496                               13,435
2006    2816                               16,251
2007    3177                               19,428
2008    2914                               22,342
2009    3025                               25,367
2010    3044                               28,411
2011    2865                               31,276
2012    2989                               34,265
2013    3236                               37,501
2014    4019                               41,520
2015    3362                               44,882
2016    3777                               48,659
2017    4169                               52,828
2018    3877                               56,705
2019    4285                               60,990
2020    5199                               66,189
2021    4537                               70,726
2022    5527                               76,253
2023    1166                               77,419
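
The two count columns are related by a running sum: each cumulative total is the previous year's total plus that year's new sequences. The following is a minimal Python sketch of that relationship, using the 1976 to 1980 yearly counts from the table as sample data. The names `new_per_year` and `cumulative_totals` are illustrative assumptions and not part of any PDB tool or API.

```python
from itertools import accumulate

# Yearly counts of new protein sequences, copied from the table (1976 to 1980).
new_per_year = {1976: 13, 1977: 12, 1978: 3, 1979: 5, 1980: 3}


def cumulative_totals(yearly):
    """Return {year: running total} for a {year: new count} mapping (hypothetical helper)."""
    years = sorted(yearly)
    # accumulate() yields the running sum of the yearly counts in year order.
    totals = accumulate(yearly[y] for y in years)
    return dict(zip(years, totals))


totals = cumulative_totals(new_per_year)
for year, total in totals.items():
    print(year, new_per_year[year], total)
```

Applying the same running sum to every yearly count in the table should reproduce the cumulative column, which is a quick way to check a downloaded copy of the data for transcription errors.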