Simulating proteins on a millisecond time-scale – Bioinformatics Centre - University of Copenhagen

Simulating proteins on a millisecond time-scale

Details of the grant

This project is funded by the Danish Council for Strategic Research's programme committee for nanoscience and nanotechnology, biotechnology and IT (Det Strategiske Forskningsråds programkomite for nanovidenskab og -teknologi, bioteknologi og IT) with 7.8 million DKK, covering the period from Sept. 2007 to Feb. 2011. The grant's PI is Prof. Anders Krogh. Assoc. Prof. Thomas Hamelryck is responsible for the daily leadership of the project. The project is carried out in close collaboration with Novozymes A/S. Key scientific collaborators include Assoc. Prof. Jesper Ferkinghoff-Borg, DTU, and Prof. Kanti Mardia, University of Leeds, UK.


The focus of the project is the exploration of the dynamic behaviour of proteins beyond what is currently possible. Molecular dynamics simulations are widely used in science, medicine and biotechnology to obtain a detailed view of the motions in biological macromolecules. Applications include understanding protein folding, drug design, increasing protein stability, identifying mutations that underlie disease, improving the properties of enzymes and understanding diseases that involve protein misfolding, such as Alzheimer's and type 2 diabetes. However, the use of molecular dynamics is severely hampered by several problems: the method requires huge amounts of computer time and suffers from several inherent limitations. Many relevant biological processes take place on time scales between ten milliseconds and one second, which is far out of reach for conventional molecular dynamics simulations.

The structure group at KU's Bioinformatics Centre has developed a revolutionary approach to the prediction, design and simulation of protein structure and dynamics, based on probabilistic models, Bayesian machine learning methods and directional statistics. This approach forms the first cornerstone of the project. For more information on our statistical approach to protein structure prediction, see our articles on probabilistic models of protein structure that appeared in PLoS Computational Biology (2006) and PNAS (2008), and the review on probabilistic methods in structural bioinformatics (2009).
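As a rough illustration of why directional statistics are needed for angular data such as dihedral angles, the sketch below compares an ordinary arithmetic mean with a circular mean for two angles near the ±180 degree boundary, and then draws samples from a von Mises distribution (a circular analogue of the normal distribution). It is only a toy example and not the group's model: the function name circular_mean, the chosen angles and the concentration parameter are all assumptions made for illustration.

```python
import numpy as np

def circular_mean(angles):
    """Mean direction of a set of angles (radians), returned in [-pi, pi]."""
    return np.arctan2(np.mean(np.sin(angles)), np.mean(np.cos(angles)))

# Two hypothetical dihedral angles on either side of the +/-180 degree boundary.
angles = np.deg2rad([170.0, -170.0])

# The arithmetic mean is 0 degrees, which points in the opposite direction.
naive = np.rad2deg(np.mean(angles))

# The circular mean lies at the boundary itself (180 or -180 degrees).
circ = np.rad2deg(circular_mean(angles))

# Draw angles from a von Mises distribution centred at -60 degrees.
# kappa controls the concentration; 20.0 is an arbitrary illustrative value.
rng = np.random.default_rng(0)
samples = rng.vonmises(mu=np.deg2rad(-60.0), kappa=20.0, size=5000)
sample_mean = np.rad2deg(circular_mean(samples))

print(f"naive mean: {naive:.1f}, circular mean: {circ:.1f}")
print(f"circular mean of von Mises samples: {sample_mean:.1f}")
```

Treating angles as points on a circle rather than on a line avoids the boundary artefact shown by the naive mean.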

The second cornerstone is the use of efficient methods to explore the conformational space of proteins. Instead of classic molecular dynamics, we use Markov chain Monte Carlo (MCMC) methods, which explore conformational space by sampling rather than by integrating equations of motion. This requires both efficient MCMC methods and efficient conformational sampling methods. This part of the project is led by, and carried out in collaboration with, Assoc. Prof. Jesper Ferkinghoff-Borg, DTU.
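To make the idea of sampling-based exploration concrete, here is a minimal sketch of the plain Metropolis algorithm applied to a single angle with a toy cosine energy. It is not the generalized ensemble method or the conformational moves used in the project; the energy function, its parameters (theta0, k), the step size and all function names are hypothetical choices for illustration only.

```python
import math
import random

def energy(theta, theta0=-1.0, k=5.0):
    """Toy energy in units of kT with a single minimum at theta0 (assumed values)."""
    return -k * math.cos(theta - theta0)

def metropolis(n_steps, step_size=1.0, seed=0):
    """Sample angles with probability proportional to exp(-energy)."""
    rng = random.Random(seed)
    theta = 0.0
    samples = []
    for _ in range(n_steps):
        # Propose a symmetric random move and wrap it back onto the circle.
        proposal = theta + rng.uniform(-step_size, step_size)
        proposal = (proposal + math.pi) % (2.0 * math.pi) - math.pi
        # Metropolis criterion: accept downhill moves always,
        # uphill moves with probability exp(-delta).
        delta = energy(proposal) - energy(theta)
        if delta <= 0.0 or rng.random() < math.exp(-delta):
            theta = proposal
        # The current state is recorded whether or not the move was accepted.
        samples.append(theta)
    return samples

def circular_mean(angles):
    """Mean direction of a list of angles in radians."""
    s = sum(math.sin(a) for a in angles)
    c = sum(math.cos(a) for a in angles)
    return math.atan2(s, c)

samples = metropolis(20000)
mean_angle = circular_mean(samples)
print(f"mean sampled angle: {mean_angle:.2f} rad (energy minimum at -1.00 rad)")
```

The chain visits angles in proportion to their Boltzmann weight, so the samples concentrate around the energy minimum without any equations of motion being integrated.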

Published research highlights

  • Our probabilistic model of side chain conformations, Basilisk (BMC Bioinformatics, 2010), abolishes the need for the use of discrete side chain rotamers in conformational sampling.
  • Our dynamic Bayesian network toolkit Mocapy++ (BMC Bioinformatics, 2010), which was used to formulate these models, is freely available from SourceForge.
  • Our probabilistic model of protein structure can now be used for structure determination using small angle X-ray scattering (SAXS) data (BMC Bioinformatics, 2010).
  • After 20 years of controversy and countless publications on the subject, we finally settle the discussion on the validity of so-called potentials of mean force as proposed by Sippl in 1990. Moreover, our results point to important new applications and solve the classic problem of the reference state. See our recent article in PLoS ONE (2010).
  • We developed TYPHON, a method to investigate protein dynamics using probabilistic models (Structure, 2012). This method is an attractive, computationally efficient replacement for molecular dynamics simulations for a wide range of problems. The method is part of PHAISTOS, our software framework for protein structure prediction and simulation.


Postdocs


  • Mikael Borg (Jan. 2008-April 2010)
  • Martin Paluszewski (Sept. 2008-July 2010)
  • Wouter Boomsma (July 2008-June 2009)

Publications


  • Boomsma, W., Borg, M., Frellsen, J., Harder, T., Stovgaard, K., Ferkinghoff-Borg, J., Krogh, A., Mardia, KV., Hamelryck, T. (2008) PHAISTOS: protein structure prediction using a probabilistic model of local structure. Proceedings of CASP8, Cagliari, Sardinia, Italy, December 3-7 2008, pp. 82-83.
  • Hamelryck, T. (2009) Probabilistic models and machine learning in structural bioinformatics. Statistical Methods in Medical Research, 18, 505-526. Review.
  • Borg, M., Mardia, KV., Boomsma, W., Frellsen, J., Harder, T., Stovgaard, K., Ferkinghoff-Borg, J., Røgen, P., Hamelryck, T. (2009) A probabilistic approach to protein structure prediction: PHAISTOS in CASP9. LASR 2009 - Statistical tools for challenges in bioinformatics, pp. 65-70. Leeds University Press, Leeds, UK.
  • Paluszewski, M., Hamelryck, T. (2010) Mocapy++ - A toolkit for inference and learning in dynamic Bayesian networks. BMC Bioinformatics, 11:126.
  • Harder, T., Boomsma, W., Paluszewski, M., Frellsen, J., Johansson, KE., Hamelryck, T. (2010) Beyond rotamers: A generative, probabilistic model of side chains in proteins. BMC Bioinformatics, 11:306.
  • Paulsen, J., Paluszewski, M., Mardia, KV., Hamelryck, T. (2010) A probabilistic model of hydrogen bond geometry in proteins. LASR 2010 - High-throughput sequencing, proteins and statistics, pp. 61-64. Leeds University Press, Leeds, UK.
  • Stovgaard, K., Andreetta, C., Ferkinghoff-Borg, J., Hamelryck, T. (2010) Calculation of accurate small angle X-ray scattering curves from coarse-grained protein models. BMC Bioinformatics, 11:429.
  • Hamelryck, T., Borg, M., Paluszewski, M., Paulsen, J., Frellsen, J., Andreetta, C., Boomsma, W., Bottaro, S., Ferkinghoff-Borg, J. (2010) Potentials of mean force for protein structure prediction vindicated, formalized and generalized. PLoS ONE, 5(11): e13714. Preprint available on arXiv.
  • Mardia, KV., Frellsen, J., Borg, M., Ferkinghoff-Borg, J., Hamelryck, T. (2011) A statistical view on the reference ratio method. LASR 2011 - High-throughput sequencing, proteins and statistics, pp. 55-61. Leeds University Press, Leeds, UK.
  • Olsson, S., Boomsma, W., Frellsen, J., Bottaro, S., Harder, T., Ferkinghoff-Borg, J., Hamelryck, T. (2011) Generative probabilistic models extend the scope of inferential structure determination. J. Magn. Reson., 213(1), 182-6.
  • Harder, T., Borg, M., Boomsma, W., Røgen, P., Hamelryck, T. (2012) Fast large-scale clustering of protein structures using Gauss integrals. Bioinformatics, 28(4), 510-515.
  • Bottaro, S., Boomsma, W., Johansson, KE., Andreetta, C., Hamelryck, T., Ferkinghoff-Borg, J. (2012) Subtle Monte Carlo updates in dense molecular systems. J. Chem. Theory Comput., 8, 695–702.
  • Boomsma, W., Bottaro, S., Hamelryck, T., Frellsen, J., Andreetta, C., Borg, M., Harder, T., Johansson, KE., Stovgaard, K., Tian, P. (2012) Phaistos user manual (version 1.0). University of Copenhagen.
  • Paluszewski, M., Frellsen, J., Hamelryck, T. (2009) Mocapy++: A C++ toolkit for inference and learning in dynamic Bayesian networks. University of Copenhagen.
  • Hamelryck, T., Mardia, KV., Ferkinghoff-Borg, J., Editors. (2012) Bayesian methods in structural bioinformatics. Book in the Springer series "Statistics for Biology and Health", 385 pages, 13 chapters. Springer-Verlag, March 2012.
  • Hamelryck, T. (2012) An overview of Bayesian inference and graphical models. In T. Hamelryck et al. (eds). Bayesian methods in structural bioinformatics. Statistics for Biology and Health. Springer-Verlag, Berlin, Heidelberg.
  • Borg, M., Hamelryck, T., Ferkinghoff-Borg, J. (2012) On the physical relevance and statistical interpretation of knowledge based potentials. In T. Hamelryck et al. (eds). Bayesian methods in structural bioinformatics. Statistics for Biology and Health. Springer-Verlag, Berlin, Heidelberg.
  • Frellsen, J., Mardia, KV., Borg, M., Ferkinghoff-Borg, J., Hamelryck, T. (2012) Towards a probabilistic model of protein structure: The reference ratio method. In T. Hamelryck et al. (eds). Bayesian methods in structural bioinformatics. Statistics for Biology and Health. Springer-Verlag, Berlin, Heidelberg.
  • Boomsma, W., Frellsen, J., Hamelryck, T. (2012) Probabilistic models of local biomolecular structure and their applications. In T. Hamelryck et al. (eds). Bayesian methods in structural bioinformatics. Statistics for Biology and Health. Springer-Verlag, Berlin, Heidelberg.
  • Harder, T., Borg, M., Bottaro, S., Boomsma, W., Olsson, S., Ferkinghoff-Borg, J., Hamelryck, T. (2012) An efficient null model for conformational fluctuations in proteins. Structure. Accepted and published online.

Publications in preparation

  • Muninn: a C++ toolkit for generalized ensemble Markov chain Monte Carlo sampling. Frellsen, J. et al.
  • Modeling flexible multidomain proteins using SAXS data. Andreetta, C., Stovgaard, K, et al.
  • PHAISTOS: a software framework for simulation and prediction of protein structure and dynamics. Boomsma et al.


Software

All software developed in this project is open source and released under the GPL through SourceForge.

  • Mocapy++: a toolkit implemented in C++ for training and using dynamic Bayesian networks, with special facilities for the formulation of probabilistic models of protein structure.
  • Phaistos: a molecular modelling toolkit implemented in C++, containing probabilistic models of protein structure, several force fields, a generalized ensemble MCMC method, and a highly efficient conformational sampling method.
  • Muninn: a generalized ensemble MCMC method, implemented in C++.