PDB Statistics: Growth in Number of Unique Protein Sequences in Released PDB Structures (Cumulative) at Identity 70%

This chart shows the annual and cumulative numbers of unique protein sequences in released PDB structures since the beginning of the PDB archive. It can be viewed at several levels of sequence identity. The cumulative bars show the growth in unique protein sequences (number of polymeric entities) over time. The yearly bars (dark blue) show how many new protein sequences were added in a given year.

Note: The total number of sequence clusters in the statistics table links to the sequence cluster group search result page. For performance reasons, the numbers are calculated with a default precision threshold, so the statistics count may differ slightly from the actual non-redundant group search result when the result count approaches or exceeds 10,000. The group search result page provides the accurate count; the statistics page shows the trend.

Sequence cluster level: 70% sequence identity

Year    Number of New Protein Sequences    Total Number of Protein Sequences
1976    13                                 13
1977    11                                 24
1978    3                                  27
1979    2                                  29
1980    3                                  32
1981    7                                  39
1982    18                                 57
1983    8                                  65
1984    11                                 76
1985    9                                  85
1986    9                                  94
1987    9                                  103
1988    22                                 125
1989    39                                 164
1990    35                                 199
1991    43                                 242
1992    54                                 296
1993    179                                475
1994    367                                842
1995    309                                1,151
1996    341                                1,492
1997    507                                1,999
1998    625                                2,624
1999    808                                3,432
2000    924                                4,356
2001    962                                5,318
2002    1038                               6,356
2003    1460                               7,816
2004    2040                               9,856
2005    2314                               12,170
2006    2603                               14,773
2007    2833                               17,606
2008    2653                               20,259
2009    2728                               22,987
2010    2735                               25,722
2011    2529                               28,251
2012    2625                               30,876
2013    2785                               33,661
2014    3411                               37,072
2015    2863                               39,935
2016    3147                               43,082
2017    3359                               46,441
2018    3286                               49,727
2019    3538                               53,265
2020    4225                               57,490
2021    3471                               60,961
2022    4327                               65,288
2023    920                                66,208
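
In the table, each year's total is the running sum of the yearly counts up to and including that year. The following is a minimal Python sketch of that relationship, using only the first ten rows (1976 to 1985). The names `new_per_year` and `cumulative_totals` are made up for this illustration and are not part of any PDB tool or API.

```python
from itertools import accumulate

# Yearly counts of new protein sequences, copied from the first ten
# rows of the table (1976-1985). Illustrative subset only.
new_per_year = {
    1976: 13, 1977: 11, 1978: 3, 1979: 2, 1980: 3,
    1981: 7, 1982: 18, 1983: 8, 1984: 11, 1985: 9,
}


def cumulative_totals(yearly):
    """Return {year: running total} for a {year: new count} mapping."""
    years = sorted(yearly)
    running = accumulate(yearly[year] for year in years)
    return dict(zip(years, running))


totals = cumulative_totals(new_per_year)
for year, total in totals.items():
    print(year, new_per_year[year], total)
```

Running the same calculation over all rows should reproduce the total column, since every listed total equals the previous total plus that year's new count.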