Part of Advances in Neural Information Processing Systems 20 (NIPS 2007)
Ben Blum, David Baker, Michael I. Jordan, Philip Bradley, Rhiju Das, David E Kim
Rosetta is one of the leading algorithms for protein structure prediction today. It is a Monte Carlo energy minimization method requiring many random restarts to find structures with low energy. In this paper we present a resampling technique for structure prediction of small alpha/beta proteins using Rosetta. From an initial round of Rosetta sampling, we learn properties of the energy landscape that guide a subsequent round of sampling toward lower-energy structures. Rather than attempt to fit the full energy landscape, we use feature selection methods (both L1-regularized linear regression and decision trees) to identify structural features that give rise to low energy. We then enrich these structural features in the second sampling round. Results are presented across a benchmark set of nine small alpha/beta proteins demonstrating that our methods seldom impair, and frequently improve, Rosetta's performance.
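To make the feature-selection step concrete, here is a minimal sketch of L1-regularized linear regression (the Lasso, fit by cyclic coordinate descent) applied to first-round decoys. It is not the authors' implementation. It assumes each decoy is encoded as a 0/1 vector of hypothetical structural-feature indicators and uses synthetic energies rather than Rosetta output. Features with negative coefficients are associated with lower energy and would be the candidates to enrich in the second round. The helper names (`lasso_coordinate_descent`, `select_low_energy_features`) and the penalty `lam` are illustrative assumptions. The decision-tree variant and the enrichment step itself are not shown.

```python
import random


def soft_threshold(z, gamma):
    """Soft-thresholding operator: the proximal step for an L1 penalty."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iters=200):
    """Minimize (1/2n)||y - Xw - b||^2 + lam*||w||_1 by cyclic coordinate descent.

    X: list of decoys, each a list of 0/1 structural-feature indicators (assumed encoding).
    y: list of decoy energies (lower is better).
    """
    n, p = len(X), len(X[0])
    # Center features and target so the intercept b drops out of the problem.
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]

    w = [0.0] * p
    residual = yc[:]  # residual = yc - Xc @ w, and w starts at zero
    for _ in range(n_iters):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant feature carries no information
            # Correlation of feature j with the partial residual that excludes feature j.
            rho = sum(Xc[i][j] * residual[i] for i in range(n)) / n + col_sq[j] * w[j]
            new_wj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                # Keep the residual consistent with the updated coefficient.
                for i in range(n):
                    residual[i] -= delta * Xc[i][j]
                w[j] = new_wj
    return w


def select_low_energy_features(X, energies, lam):
    """Return indices of features whose Lasso coefficient is negative (lowers energy)."""
    w = lasso_coordinate_descent(X, energies, lam)
    return [j for j, wj in enumerate(w) if wj < 0.0], w


if __name__ == "__main__":
    rng = random.Random(0)
    n_decoys, n_features = 200, 6
    # Synthetic first-round decoys: random presence/absence of six hypothetical features.
    X = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_decoys)]
    # Synthetic energies: feature 0 lowers energy, feature 2 raises it, the rest are noise.
    energies = [-5.0 * row[0] + 3.0 * row[2] + rng.gauss(0.0, 1.0) for row in X]
    chosen, weights = select_low_energy_features(X, energies, lam=0.3)
    print("features to enrich:", chosen)
    print("coefficients:", [round(v, 2) for v in weights])
```

The L1 penalty drives the coefficients of uninformative features exactly to zero, so the surviving negative coefficients form a short, interpretable list of features to enrich rather than a full model of the energy landscape, which mirrors the motivation stated in the abstract. Larger values of `lam` yield sparser selections.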