Science for Health
09 September 2009
As the number of protein structures that have been determined has increased, the occurrence of a novel fold for a globular protein of reasonable size (over 100 residues) has become an increasingly rare event. This has led to speculation that we are close to having a structural representation for every basic protein fold, with the implication that all natural protein sequences can be constructed from known structures through the assembly of domains taken from the current collection. This position may have been reached either because there is, in principle, only a limited number of possible protein structures, or through a historical accident in which nature has been "lazy" and restricted herself to reusing a limited set of protein folds. Starting from a few basic folds, the known evolutionary mechanisms of gene duplication, fusion, and deletion could have generated the current variety of protein structures.
To investigate whether there is a theoretical constraint on the number of possible folds, Willie Taylor and his group from NIMR's Division of Mathematical Biology, in collaboration with colleagues at the University of Bergen, have used a protein structure prediction method to generate a variety of folds as models. When all the models were compared with a nonredundant set of all known structures, only one in ten was found to have a match. This large excess of novel folds was seen with each protein probe and, if true in general, implies that the space of possible folds is larger than the space of realized folds, in much the same way that sequence space is larger than fold space. The novel folds exhibited no unusual properties and have been referred to as the "dark matter" of protein fold space.
In this study we have merely probed a few points in the "dark matter" of protein fold space, but this has been sufficient to show that there is a plethora of unseen novel folds. For smaller proteins it has been shown previously that fold space is more completely covered by known folds, but for larger proteins the same analysis suggests that the number of unexplored topologies will expand greatly. As protein size increases, our estimate of a 10-fold ratio of novel to known folds may well be very conservative. If so, it seems likely that there would not be room in the combined genomes of life on Earth to hold such a variety of proteins; however, the universe is very big and, like dark matter, the bulk might exist elsewhere.
The research findings are published in full in:
William R. Taylor, Vijayalakshmi Chelliah, Siv Midtun Hollup, James T. MacDonald and Inge Jonassen (2009).
Probing the "dark matter" of protein fold space.
Structure, 17(9), 1244-52.
© MRC National Institute for Medical Research
The Ridgeway, Mill Hill, London NW7 1AA