Structural bioinformatics – Bioinformatics Centre - University of Copenhagen



One of the major unsolved problems in modern-day molecular biology is the protein folding problem: given an amino acid sequence, predict the overall three-dimensional structure of the corresponding protein. It has been known since the seminal work of Christian B. Anfinsen in the early 1970s that the sequence of a protein encodes its structure, but the exact details of the encoding remain elusive. Because the protein folding problem is of enormous practical, theoretical and medical importance, and is also a fascinating intellectual challenge, it is often called the holy grail of bioinformatics. The Structural Bioinformatics group focuses on protein structure prediction, protein design and protein structure determination from experimental data (NMR, SAXS), including data obtained from protein ensembles.

We are tackling the protein structure prediction problem from an original angle. Our group develops sophisticated probabilistic models that describe various aspects of protein structure, and uses these models in prediction, design and structure determination. We have also extended our statistical approach to RNA 3D structure. These probabilistic models are mainly based on three key ingredients:

  1. Graphical models (including dynamic Bayesian networks), which are powerful machine learning methods that can be interpreted in the language of statistical physics.
  2. Directional statistics, the statistics of angles, directions and orientations. When combined with graphical models, this allows the formulation of efficient and flexible probabilistic models of protein structure on a local length scale. These models are statistically valid alternatives to the use of protein fragment libraries.
  3. The reference ratio method: this method is a statistical reformulation of the so-called "knowledge-based potentials of mean force" that are widely used in protein structure prediction. This method allows us to combine probabilistic models of local and nonlocal structure correctly. From a statistical point of view, the method is a formulation of probability kinematics or Jeffrey's conditioning.
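
To illustrate why angles need their own statistics (the second ingredient), here is a minimal sketch of a circular mean. It is not taken from the group's software; the function name `circular_mean_deg` and the example angles are assumptions chosen for illustration. Dihedral angles close to +180 and -180 degrees describe almost the same conformation, yet their ordinary arithmetic mean lands at 0 degrees.

```python
import math

def circular_mean_deg(angles_deg):
    """Mean direction of angles given in degrees, returned in [-180, 180]."""
    # Treat each angle as a unit vector, sum the vectors, and take the
    # direction of the resultant.
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c))

# Two nearly identical directions on either side of the +/-180 degree cut.
angles = [179.0, -179.0]
naive = sum(angles) / len(angles)   # arithmetic mean, ignores periodicity
circ = circular_mean_deg(angles)    # close to +/-180 degrees

print("arithmetic mean:", naive)
print("circular mean:  ", circ)
```

The sign of the circular mean in this example depends on floating-point rounding, since +180 and -180 degrees are the same direction.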

Combined, graphical models, directional statistics and probability kinematics allow, for the first time, the formulation of valid probabilistic models of protein structure in continuous space and in atomic detail (Valentin et al., 2013). The use of probability kinematics also allows the rigorous Bayesian inference of protein ensembles (Olsson et al., 2013).
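
The idea behind the reference ratio method can be sketched on a toy discrete example. This is only an illustration under stated assumptions, not PHAISTOS code: the four states, the coarse-grained variable `coarse`, and all probabilities are hypothetical. A local model q(x) is combined with a desired distribution over a coarse variable y = f(x) by multiplying q(x) with the ratio of the target to the "reference", that is, the distribution of y that q already implies.

```python
from collections import defaultdict

def marginal(dist, f):
    """Push a distribution over states x forward to the coarse variable y = f(x)."""
    out = defaultdict(float)
    for x, px in dist.items():
        out[f(x)] += px
    return dict(out)

def reference_ratio(q, f, target):
    """Return p(x) = q(x) * target(f(x)) / q_ref(f(x)).

    q_ref is the marginal of f(x) under q (the reference). This sketch
    assumes q_ref(y) > 0 for every coarse value y that occurs.
    """
    q_ref = marginal(q, f)
    return {x: qx * target[f(x)] / q_ref[f(x)] for x, qx in q.items()}

# Hypothetical local model over four conformations.
q = {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}

def coarse(x):
    # Hypothetical coarse-grained (nonlocal) descriptor of a conformation.
    return "compact" if x in ("A", "B") else "extended"

# Desired distribution of the coarse variable.
target = {"compact": 0.9, "extended": 0.1}

p = reference_ratio(q, coarse, target)
print(p)
print(marginal(p, coarse))
```

In this toy setting the combined distribution reproduces the target over the coarse variable, while the relative weights of states within each coarse class stay as the local model assigned them, which is the update described by Jeffrey's conditioning.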

Our probabilistic view of protein structure prediction, simulation and inference is expounded in the recently published book entitled "Bayesian methods in structural bioinformatics" (Springer, April, 2012). The above innovations are available in PHAISTOS version 1.0, our Markov chain Monte Carlo software framework for protein structure simulation.


People

Group leader

  • Thomas Hamelryck
  • Associate professor, KU, Denmark
  • Visiting professor, University of Leeds, UK
  • Address:
    Bioinformatics Centre, Department of Biology
    University of Copenhagen
    Ole Maaloes Vej 5
    DK-2200 Copenhagen N
    Denmark
    Tel: +45 23960613

Postdocs

  •  None at the moment.

    PhD students

    • Lubo Antonov
    • Jesper Foldager

    Publications

    Peer reviewed articles (2005-now)

    1. Hamelryck, T. (2005) An amino acid has two sides: A new 2D measure provides a different view of solvent exposure. Proteins Struct. Func. Bioinf. 59, 38-48. PDF
    2. Boomsma, W., Hamelryck, T. (2005) Full Cyclic Coordinate Descent: Solving the protein loop closure problem in Calpha space. BMC Bioinformatics 6:159. Abstract&PDF@BioMed
    3. Hamelryck, T., Kent, J., Krogh, A. (2006) Sampling realistic protein conformations using local structural bias. PLoS Comp. Biol. 2(9): e131 PDF@PLoS
    4. Paluszewski, M., Hamelryck, T. and Winter, P. (2006) Reconstructing protein structure from solvent exposure using Tabu Search. Algorithms Mol. Biol. 1:20. PDF@AlgMolBiol.
    5. Won, KJ., Hamelryck, T., Prugel-Bennett, A. and Krogh, A. (2007) An evolving method for learning HMM Structure: prediction of protein secondary structure. BMC Bioinformatics 8, 357 PDF@BMC Bioinformatics
    6. Boomsma, W., Mardia, KV., Taylor, CC., Ferkinghoff-Borg, J., Krogh, A. and Hamelryck, T. (2008) A generative, probabilistic model of local protein structure. Proc. Natl. Acad. Sci. USA 105, 8932-8937. PDF@PNAS, Video lecture by Wouter Boomsma
    7. Hamelryck, T. (2009) Probabilistic models and machine learning in structural bioinformatics. Statistical Methods in Medical Research Review. 18, 505-526. PDF
    8. Cock, P., Antao, T., Chang, J., Chapman, B., Cox, C., Dalke, A., Friedberg, I., Hamelryck, T., Kauff, F., Wilczynski, B., de Hoon, M. (2009) Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25(11), 1422-1423. Free PDF@Bioinformatics
    9. Frellsen, J., Moltke, I., Thiim, M., Mardia, KV., Ferkinghoff-Borg, J., Hamelryck, T. (2009) A probabilistic model of RNA conformational space. PLoS Comp. Biol. 5(6), e1000406. Free PDF@PLOS, Video of a presentation by Jes Frellsen
    10. Paluszewski, M., Hamelryck, T. (2010) Mocapy++ - A toolkit for inference and learning in dynamic Bayesian networks. BMC Bioinformatics 11:126. Free PDF@BMC
    11. Harder, T., Boomsma, W., Paluszewski, M., Frellsen, J., Johansson, KE., Hamelryck, T. (2010) Beyond rotamers: A generative, probabilistic model of side chains in proteins. BMC Bioinformatics 11:306. Free PDF@BMC
    12. Stovgaard, K., Andreetta, C., Ferkinghoff-Borg, J., Hamelryck, T. (2010) Calculation of accurate small angle X-ray scattering curves from coarse-grained protein models. BMC Bioinformatics 11:429. PDF@BMC Bioinformatics
    13. Hamelryck, T., Borg, M., Paluszewski, M., Paulsen, J., Frellsen, J., Andreetta, C., Boomsma, W., Bottaro, S., Ferkinghoff-Borg, J. (2010) Potentials of mean force for protein structure prediction vindicated, formalized and generalized. PLoS ONE 5(11): e13714. PDF@PLoS ONE, Preprint@arXiv
    14. Olsson, S., Boomsma, W., Frellsen, J., Bottaro, S., Harder, T., Ferkinghoff-Borg, J., Hamelryck, T. (2011) Generative probabilistic models extend the scope of inferential structure determination. J. Magn. Reson. 213(1), 182-6. PDF
    15. Harder, T., Borg, M., Boomsma, W., Røgen, P., Hamelryck, T. (2012) Fast large-scale clustering of protein structures using Gauss integrals. Bioinformatics 28, 510-515. PDF@Bioinformatics.
    16. Bottaro, S., Boomsma, W., Johansson, K.E., Andreetta, C., Hamelryck, T., Ferkinghoff-Borg, J. (2012) Subtle Monte Carlo updates in dense molecular systems. J. Chem. Theory Comput. 8, 695–702. PDF@ACS
    17. Harder, T., Borg, M., Bottaro, S., Boomsma, W., Olsson, S., Ferkinghoff-Borg, J., Hamelryck, T. (2012) An efficient null model for conformational fluctuations in proteins. Structure 20, 1028-1039. PDF@Structure.
    18. Mardia, KV., Kent, JT., Zhang, Z., Taylor, C., Hamelryck, T. (2012) Mixtures of concentrated multivariate sine distributions with applications to bioinformatics. J. Appl. Stat. 39, 2475-2492. PDF
    19. Johansson, KE., Hamelryck, T. (2013) A simple probabilistic model of multibody interactions in proteins. Proteins 81, 1340-50.
    20. Boomsma, W., Frellsen, J., Harder, T., Bottaro, S., Johansson, KE., Tian, P., Stovgaard, K., Andreetta, C., Olsson, S., Valentin, J., Antonov, L., Christensen, A., Borg, M., Jensen, J., Lindorff-Larsen, K., Ferkinghoff-Borg, J., Hamelryck, T. (2013) PHAISTOS: A framework for Markov chain Monte Carlo simulation and inference of protein structure. J. Comput. Chem. 34, 1697-705. PDF
    21. Valentin, J., Andreetta, C., Boomsma, W., Bottaro, S., Ferkinghoff-Borg, J., Frellsen, J., Mardia, KV, Tian, P., Hamelryck, T. (2013) Formulation of probabilistic models of protein structure in atomic detail using the reference ratio method. Proteins 82:288–299. PDF@Proteins
    22. Olsson, S., Frellsen, J., Boomsma, W., Mardia, KV., Hamelryck, T. (2013) Inference of structure ensembles of flexible biomolecules from sparse, averaged data. PLoS ONE. 8(11): e79439. Article@PLoS ONE
    23. Christensen, AS., Linnet, TE., Borg, M., Boomsma, W., Lindorff-Larsen, K., Hamelryck, T., Jensen, J. (2013) Protein structure validation and refinement using amide proton chemical shifts derived from quantum mechanics. PLoS ONE. 8(12):e84123. Article@PLoS ONE
    24. Christensen, AS., Hamelryck, T., Jensen, JH. (2014) FragBuilder: An efficient Python library to setup quantum chemistry calculations on peptide models. PeerJ. 2:e277. Article@PeerJ
    25. Olsson, S., Vögeli, B., Cavalli, A., Boomsma, W., Ferkinghoff-Borg, J., Lindorff-Larsen, K., Hamelryck, T. (2014) Probabilistic approach to the determination of native state ensembles of proteins. J. Chem. Theory Comput. 10(8):3484-3491. Article@JCT
    26. Boomsma, W., Tian, P., Ferkinghoff-Borg, J., Hamelryck, T., Lindorff-Larsen, K., Vendruscolo, M. (2014) Equilibrium simulations of proteins using molecular fragment replacement and NMR chemical shifts. Proc. Natl. Acad. Sci. USA. 111(38):13852-13857. Article@PNAS

    Conference proceedings

    1. Won, KJ., Hamelryck, T., Prugel-Bennett, A., Krogh, A. (2005) Evolving hidden Markov models for protein secondary structure prediction. Proceedings of the 2005 IEEE Congress on Evolutionary Computation, pp. 33-40, Edinburgh. PDF
    2. Kent, J.T., Hamelryck, T. (2005) Using the Fisher-Bingham distribution in stochastic models for protein structure. In S. Barber, P.D. Baxter, K.V.Mardia, & R.E. Walls (Eds.), LASR 2005 - quantitative biology, shape analysis, and wavelets, pp. 57-60. Leeds university press, Leeds, UK. PDF@LASR
    3. Boomsma, W., Kent, J.T., Mardia, K.V., Taylor, C.C. & Hamelryck, T. (2006) Graphical models and directional statistics capture protein structure. In S. Barber, P.D. Baxter, K.V.Mardia, & R.E. Walls (Eds.), LASR 2006 - Interdisciplinary statistics and bioinformatics, pp. 91-94. Leeds university press, UK. PDF@LASR
    4. Boomsma, W., Borg, M., Frellsen, J., Harder, T., Stovgaard, K., Ferkinghoff-Borg, J., Krogh, A., Mardia, KV. and Hamelryck, T. (2008) PHAISTOS: protein structure prediction using a probabilistic model of local structure.  Proceedings of CASP8, Cagliari, Sardinia, Italy, December 3-7 2008. pp 82-83. PDF@CASP8
    5. Borg, M., Mardia, KV., Boomsma, W., Frellsen, J., Harder, T., Stovgaard, K., Ferkinghoff-Borg, J., Røgen, P., Hamelryck, T. A probabilistic approach to protein structure prediction: PHAISTOS in CASP9. LASR 2009 - Statistical tools for challenges in bioinformatics, pp. 65-70. Leeds university press, Leeds, UK. PDF@LASR
    6. Paulsen, J., Paluszewski, M., Mardia, KV., Hamelryck, T. (2010) A probabilistic model of hydrogen bond geometry in proteins. LASR 2010 - High-throughput sequencing, proteins and statistics, pp. 61-64. Leeds university press, Leeds, UK. PDF@LASR
    7. Mardia, KV., Frellsen, J., Borg, M., Ferkinghoff-Borg, J., Hamelryck, T. (2011) A statistical view on the reference ratio method. LASR 2011 - High-throughput sequencing, proteins and statistics, pp. 55-61. Leeds university press, Leeds, UK. PDF@LASR
    8. Antonov, L., Andreetta, C., Hamelryck, T. (2012) An efficient parallel GPU evaluation of small angle X-ray scattering profiles. In BIOSTEC 2012, 5th Int'l Joint Conf. on Biomedical Engineering Systems and Technologies, 102-108, Algarve, Portugal. PDF
    9. Hamelryck, T., Haslett, J., Mardia, K., Kent, JT., Valentin, J., Frellsen, J., Ferkinghoff-Borg, J. (2013) On the reference ratio method and its application to statistical protein structure prediction. LASR 2013 - Statistical models and methods for non-Euclidean data with current scientific applications. Leeds university press, Leeds, UK. PDF@LASR
    10. Olsson, S., Hamelryck, T. (2013) On the significance of the reference ratio method in inferential structure determination of biomolecules. LASR 2013 - Statistical models and methods for non-Euclidean data with current scientific applications. Leeds university press, Leeds, UK. PDF@LASR
    11. Frellsen, J., Hamelryck, T., Ferkinghoff-Borg, J. (2013) Combining the multicanonical ensemble with generative probabilistic models of local biomolecular structure. 59th ISI World Statistics Congress. Hong Kong, China. 25-30 August, 2013. PDF

    Books and book chapters

    1. Chang, J., Chapman, B., Friedberg, I., Hamelryck, T., de Hoon, M., Cock, P., Antao, T., Talevich, E., Wilczyński, B. (2012) Biopython tutorial and cookbook. Biopython project. PDF@Biopython.org
    2. Boomsma, W., Bottaro, S., Hamelryck, T., Frellsen, J., Andreetta, C., Borg, M., Harder, T., Johansson, KE., Stovgaard, K., Tian, P. (2012) Phaistos user manual (version 1.0). University of Copenhagen. PDF@SourceForge
    3. Paluszewski, M., Frellsen, J., Hamelryck, T. (2009) Mocapy++: A C++ toolkit for inference and learning in dynamic Bayesian networks. University of Copenhagen. PDF
    4. Hamelryck, T., Mardia, KV., Ferkinghoff-Borg, J., Editors. (2012) Bayesian methods in structural bioinformatics. Book in the Springer series "Statistics for biology and health", 385 pages, 13 chapters. Springer Verlag, March, 2012. Book description at Springer.
    5. Hamelryck, T. (2012) An overview of Bayesian inference and graphical models. In T. Hamelryck et al. (eds). Bayesian methods in structural bioinformatics. Statistics for Biology and Health. Springer-Verlag, Berlin, Heidelberg.
    6. Borg, M., Hamelryck, T., Ferkinghoff-Borg, J. (2012) On the physical relevance and statistical interpretation of knowledge based potentials. In T. Hamelryck et al. (eds). Bayesian methods in structural bioinformatics. Statistics for Biology and Health. Springer-Verlag, Berlin, Heidelberg.
    7. Frellsen, J., Mardia, KV., Borg, M., Ferkinghoff-Borg, J., Hamelryck, T. (2012) Towards a probabilistic model of protein structure: The reference ratio method. In T. Hamelryck et al. (eds). Bayesian methods in structural bioinformatics. Statistics for Biology and Health. Springer-Verlag, Berlin, Heidelberg.
    8. Boomsma, W., Frellsen, J., Hamelryck, T. (2012) Probabilistic models of local biomolecular structure and their applications. In T. Hamelryck et al. (eds). Bayesian methods in structural bioinformatics. Statistics for Biology and Health. Springer-Verlag, Berlin, Heidelberg.
    9. Antonov, LD., Andreetta, C., Hamelryck, T. (2013) Parallel GPGPU evaluation of small angle X-ray scattering profiles in a Markov chain Monte Carlo framework. In J. Gabriel et al. (eds.). BIOSTEC 2012, CCIS, 357, 222-235. PDF@Springer

    In the press

    1. One step closer to green chemistry and improved pharmaceuticals. Press release, KU, June, 2008.
    2. Designerenzymer til grøn kemi (Designer enzymes for green chemistry). Press release, Det Frie Forskningsråd (DFF), June, 2009.