Structural bioinformatics group
Group leader: Thomas Hamelryck
One of the major unsolved problems in modern molecular biology is the protein folding problem: given an amino acid sequence, predict the three-dimensional structure of the corresponding protein. Since the seminal work of Christian B. Anfinsen in the early 1970s, it has been known that the sequence of a protein encodes its structure, but the details of this encoding remain elusive. Because the protein folding problem is of enormous practical, theoretical and medical importance, and is also a fascinating intellectual challenge, it is often called the holy grail of bioinformatics. The Structural Bioinformatics group focuses on protein structure prediction, protein design and protein structure determination from experimental data (NMR, SAXS), including data obtained from protein ensembles.
We approach the protein structure prediction problem from an original angle: our group develops probabilistic models that describe various aspects of protein structure, and uses these models in prediction, design and structure determination. These models rest on two key ingredients: graphical models (including dynamic Bayesian networks), which are powerful machine learning methods that can be interpreted in the language of statistical physics, and directional statistics, the statistics of angles, directions and orientations. Recently, we extended this statistical approach to RNA 3D structure.
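The combination of a hidden-state graphical model with directional statistics can be illustrated with a deliberately small sketch. A hidden Markov model is the simplest dynamic Bayesian network; here each hidden state emits a pair of backbone dihedral angles (phi, psi), each drawn from a von Mises distribution, the circular analogue of the normal distribution. The two states, their means, concentrations and transition probabilities are hypothetical illustrative values, and treating phi and psi as independent given the state is a simplification. This is not the group's published model, nor the Mocapy++ API.

```python
import math
import random

# Hypothetical hidden states with illustrative (phi, psi) means in degrees and
# von Mises concentration parameters; these are NOT fitted values from any model.
STATES = {
    "helix": {"mu": (-63.0, -43.0), "kappa": (20.0, 20.0)},
    "strand": {"mu": (-120.0, 130.0), "kappa": (10.0, 10.0)},
}

# Hypothetical transition probabilities between hidden states.
TRANSITIONS = {
    "helix": {"helix": 0.9, "strand": 0.1},
    "strand": {"helix": 0.2, "strand": 0.8},
}


def wrap(angle: float) -> float:
    """Map an angle in radians to the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def next_state(state: str, rng: random.Random) -> str:
    """Draw the next hidden state from the transition distribution."""
    r = rng.random()
    cumulative = 0.0
    for target, p in TRANSITIONS[state].items():
        cumulative += p
        if r < cumulative:
            return target
    return target  # guard against floating-point round-off


def sample_backbone(length: int, seed: int = 0) -> list[tuple[str, float, float]]:
    """Sample hidden states and (phi, psi) angles in degrees for `length` residues."""
    rng = random.Random(seed)
    state = rng.choice(list(STATES))
    chain = []
    for _ in range(length):
        params = STATES[state]
        # Each angle is drawn from a von Mises distribution centred on the state's
        # mean (phi and psi are treated as independent given the hidden state).
        angles = [
            math.degrees(wrap(rng.vonmisesvariate(math.radians(m) % (2.0 * math.pi), k)))
            for m, k in zip(params["mu"], params["kappa"])
        ]
        chain.append((state, angles[0], angles[1]))
        state = next_state(state, rng)
    return chain


for state, phi, psi in sample_backbone(5):
    print(f"{state:6s} phi={phi:7.1f} psi={psi:7.1f}")
```

Because consecutive hidden states are correlated through the transition table, helix-like and strand-like angle pairs tend to appear in runs along the chain, which is the kind of local sequential dependency a dynamic Bayesian network captures.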
Our probabilistic view of protein structure prediction, simulation and inference is set out in the upcoming book "Bayesian methods in structural bioinformatics", to be published by Springer in March 2012. The methods described above are available in PHAISTOS version 1.0, our Markov chain Monte Carlo software framework for protein structure simulation.
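PHAISTOS itself is not reproduced here, but the Markov chain Monte Carlo idea it builds on can be sketched in a few lines. The example below runs random-walk Metropolis sampling of a single dihedral angle under a hypothetical toy energy, E(theta) = -kappa cos(theta - mu) in units of kT, so that the samples follow a von Mises distribution with mean mu. The energy function, step size and parameter values are assumptions chosen for illustration and do not reflect the PHAISTOS API; a real protein simulation moves many coupled angles under a far richer energy.

```python
import math
import random


def wrap(angle: float) -> float:
    """Map an angle in radians to [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def energy(theta: float, mu: float = 1.0, kappa: float = 4.0) -> float:
    """Hypothetical toy energy for one dihedral angle, in units of kT.

    exp(-energy) is proportional to a von Mises density with mean mu.
    """
    return -kappa * math.cos(theta - mu)


def metropolis(n_steps: int, step_size: float = 0.5, seed: int = 0) -> list[float]:
    """Random-walk Metropolis sampling of a single angle from exp(-energy)."""
    rng = random.Random(seed)
    theta = wrap(rng.uniform(-math.pi, math.pi))
    e_current = energy(theta)
    samples = []
    for _ in range(n_steps):
        # Symmetric proposal: perturb the angle and wrap it back onto the circle.
        proposal = wrap(theta + rng.uniform(-step_size, step_size))
        e_proposal = energy(proposal)
        # Accept with probability min(1, exp(-(E_new - E_old))).
        if e_proposal <= e_current or rng.random() < math.exp(e_current - e_proposal):
            theta, e_current = proposal, e_proposal
        samples.append(theta)
    return samples


def circular_mean(angles: list[float]) -> float:
    """Mean direction of a set of angles in radians."""
    return math.atan2(sum(math.sin(a) for a in angles), sum(math.cos(a) for a in angles))


samples = metropolis(20_000)[2_000:]  # discard the first 2,000 steps as burn-in
print(f"circular mean: {circular_mean(samples):.2f} (target mu = 1.0)")
```

Because the proposal is symmetric on the circle, the Metropolis-Hastings acceptance ratio reduces to the Boltzmann factor of the energy difference, and the circular mean of the retained samples should lie close to mu.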
Research highlights
- For more on our statistical approach to protein structure prediction, see our articles on probabilistic models of protein structure in PLoS Computational Biology (2006) and PNAS (2008), and our review of probabilistic methods in structural bioinformatics (2009).
- Our probabilistic model of side-chain conformations, Basilisk (BMC Bioinformatics, 2010), removes the need for discrete side-chain rotamers in conformational sampling.
- We also developed a probabilistic model of RNA structure in atomic detail, see PLoS Computational Biology (2009).
- Our dynamic Bayesian network toolkit Mocapy++ (BMC Bioinformatics, 2010), which was used to formulate these models, is freely available from SourceForge.
- After 20 years of controversy and countless publications on the subject, we settle the debate on the validity of the so-called potentials of mean force proposed by Sippl in 1990. Our results also point to important new applications and solve the classic problem of the reference state. See our recent article in PLoS ONE (2010).
- PHAISTOS version 1.0, our Markov chain Monte Carlo framework for protein structure simulation, is now available from SourceForge.
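For readers unfamiliar with potentials of mean force: in their conventional knowledge-based form, a structural feature x (for example, a binned inter-residue distance) is assigned the energy E(x) = -kT ln[P_obs(x) / P_ref(x)], where P_obs is estimated from known structures and P_ref is the probability expected in a reference state lacking the interactions of interest. The sketch below only computes this textbook formula from histogram counts; the counts are hypothetical, and the reinterpretation developed in our 2010 PLoS ONE article is not reproduced here.

```python
import math


def potential_of_mean_force(
    observed_counts: list[int], reference_counts: list[int], kT: float = 1.0
) -> list[float]:
    """Return E(x) = -kT * ln(P_obs(x) / P_ref(x)) for each histogram bin x.

    Both count vectors are normalised to probabilities first; bins with a zero
    count in either histogram are undefined and returned as math.nan.
    """
    if len(observed_counts) != len(reference_counts):
        raise ValueError("histograms must have the same number of bins")
    n_obs = sum(observed_counts)
    n_ref = sum(reference_counts)
    energies = []
    for obs, ref in zip(observed_counts, reference_counts):
        if obs == 0 or ref == 0:
            energies.append(math.nan)
            continue
        p_obs = obs / n_obs
        p_ref = ref / n_ref
        energies.append(-kT * math.log(p_obs / p_ref))
    return energies


# Hypothetical counts for four distance bins (illustrative, not real data).
observed = [10, 40, 30, 20]
reference = [25, 25, 25, 25]
print([round(e, 3) for e in potential_of_mean_force(observed, reference)])
# → [0.916, -0.47, -0.182, 0.223]
```

Bins that are over-represented in known structures relative to the reference state receive negative (favourable) energies, which is why the choice of reference state directly shapes the resulting potential.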
People
Group leader: Thomas Hamelryck
Postdocs
- None at the moment.
- Kristoffer E. Johansson (in collaboration with DTU-Elektro/BIKU-Section for protein science)
- Simon Olsson
- Jan Valentin
- Martin Paluszewski, Postdoc
- Wouter Boomsma, PhD student and postdoc
- Tim Harder, PhD student
- Kasper Stovgaard, PhD student
- Mikael Borg, Postdoc
- Christian Andreetta, PhD student
- Jes Frellsen, PhD student and postdoc
News
- The Structural Bioinformatics group takes part in Google's Summer of Code with a project involving Biopython and Mocapy++.
Funding
- Danish Research Council for Technology and Production Sciences (FTP), "Data driven protein structure prediction" Feb 2007-Feb 2010. 3,800,000 DKK (510,200 EUR).
- Danish Research Council for Strategic Research (NABIIT), "Simulating proteins on a millisecond time-scale" Sep 2006-Feb 2010. 7,800,000 DKK (1,047,037 EUR). PI: Prof Anders Krogh. In collaboration with Novozymes.
- Danish Research Council for Technology and Production Sciences (FTP), "Protein design: Development of molecular biology and bioinformatics tools" Sep 2007-Sep 2010. 5,600,000 DKK (750,900 EUR). Partner in a project of Jakob R. Winther, Department of Biology, University of Copenhagen.
- Danish Research Council for Technology and Production Sciences (FTP), "Protein structure ensembles from mathematical models - with application to Parkinson's alpha-synuclein" Apr 2010-Mar 2013. 4,280,930 DKK.
In the press
- One step closer to green chemistry and improved pharmaceuticals. Press release, KU, June 2008.
- Designer enzymes for green chemistry ("Designerenzymer til grøn kemi"). Press release, Det Frie Forskningsråd (DFF), June 2009.
Teaching
- Structural bioinformatics course (7.5 ECTS), Block 2, Autumn 2011.
- Machine learning course: Advanced topics in data modelling (7.5 ECTS), Block 3, Spring 2010.
Publications
2004
- Winther, O., Krogh, A. (2004) Teaching computers to fold proteins. Phys. Rev. E, 70:030903.
2005
- Hamelryck, T. (2005) An amino acid has two sides: A new 2D measure provides a different view of solvent exposure. Proteins Struct. Funct. Bioinf., 59, 38-48.
- Boomsma, W., Hamelryck, T. (2005) Full Cyclic Coordinate Descent: Solving the protein loop closure problem in Cα space. BMC Bioinformatics, 6:159.
- Won, KJ., Hamelryck, T., Prugel-Bennett, A., Krogh, A. (2005) Evolving Hidden Markov Models for Protein Secondary Structure Prediction. Proceedings of the 2005 IEEE Congress on Evolutionary Computation, pp. 33-40, Edinburgh.
- Kent, J.T., Hamelryck, T. (2005) Using the Fisher-Bingham distribution in stochastic models for protein structure. In S. Barber, P.D. Baxter, K.V. Mardia, & R.E. Walls (Eds.), LASR 2005 - Quantitative biology, shape analysis, and wavelets, pp. 57-60. Leeds University Press, Leeds, UK.
2006
- Boomsma, W., Kent, J.T., Mardia, K.V., Taylor, C.C. & Hamelryck, T. (2006) Graphical models and directional statistics capture protein structure. In S. Barber, P.D. Baxter, K.V. Mardia, & R.E. Walls (Eds.), LASR 2006 - Interdisciplinary statistics and bioinformatics, pp. 91-94. Leeds University Press, Leeds, UK.
- Hamelryck, T., Kent, J., Krogh, A. (2006) Sampling realistic protein conformations using local structural bias. PLoS Comp. Biol., 2(9), e131.
- Paluszewski, M., Hamelryck, T. and Winter, P. (2006) Reconstructing protein structure from solvent exposure using Tabu Search. Algorithms Mol. Biol., 1:20.
2007
- Won, KJ., Hamelryck, T., Prugel-Bennett, A. and Krogh, A. (2007) An evolving method for learning HMM structure: prediction of protein secondary structure. BMC Bioinformatics, 8:357.
2008
- Boomsma, W., Mardia, KV., Taylor, CC., Ferkinghoff-Borg, J., Krogh, A. and Hamelryck, T. (2008) A generative, probabilistic model of local protein structure. Proc. Natl. Acad. Sci. USA, 105, 8932-8937.
- Boomsma, W., Borg, M., Frellsen, J., Harder, T., Stovgaard, K., Ferkinghoff-Borg, J., Krogh, A., Mardia, KV. and Hamelryck, T. (2008) PHAISTOS: protein structure prediction using a probabilistic model of local structure. Proceedings of CASP8, Cagliari, Sardinia, Italy, December 3-7, 2008, pp. 82-83.
2009
- Hamelryck, T. (2009) Probabilistic models and machine learning in structural bioinformatics. Statistical Methods in Medical Research, 18, 505-526 (review).
- Cock, P., Antao, T., Chang, J., Chapman, B., Cox, C., Dalke, A., Friedberg, I., Hamelryck, T., Kauff, F., Wilczynski, B., de Hoon, M. (2009) Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics, 25(11), 1422-1423.
- Frellsen, J., Moltke, I., Thiim, M., Mardia, KV., Ferkinghoff-Borg, J., Hamelryck, T. (2009) A probabilistic model of RNA conformational space. PLoS Comp. Biol., 5(6), e1000406.
- Borg, M., Mardia, KV., Boomsma, W., Frellsen, J., Harder, T., Stovgaard, K., Ferkinghoff-Borg, J., Røgen, P., Hamelryck, T. (2009) A probabilistic approach to protein structure prediction: PHAISTOS in CASP9. LASR 2009 - Statistical tools for challenges in bioinformatics, pp. 65-70. Leeds University Press, Leeds, UK.
2010
- Paluszewski, M., Hamelryck, T. (2010) Mocapy++ - A toolkit for inference and learning in dynamic Bayesian networks. BMC Bioinformatics, 11:126.
- Harder, T., Boomsma, W., Paluszewski, M., Frellsen, J., Johansson, KE., Hamelryck, T. (2010) Beyond rotamers: A generative, probabilistic model of side chains in proteins. BMC Bioinformatics, 11:306.
- Paulsen, J., Paluszewski, M., Mardia, KV., Hamelryck, T. (2010) A probabilistic model of hydrogen bond geometry in proteins. LASR 2010 - High-throughput sequencing, proteins and statistics, pp. 61-64. Leeds University Press, Leeds, UK.
- Stovgaard, K., Andreetta, C., Ferkinghoff-Borg, J., Hamelryck, T. (2010) Calculation of accurate small angle X-ray scattering curves from coarse-grained protein models. BMC Bioinformatics, 11:429.
2011
- Mardia, KV., Frellsen, J., Borg, M., Ferkinghoff-Borg, J., Hamelryck, T. (2011) A statistical view on the reference ratio method. LASR 2011 - High-throughput sequencing, proteins and statistics, pp. 55-61. Leeds University Press, Leeds, UK.
- Olsson, S., Boomsma, W., Frellsen, J., Bottaro, S., Harder, T., Ferkinghoff-Borg, J., Hamelryck, T. (2011) Generative probabilistic models extend the scope of inferential structure determination. J. Magn. Reson., 213(1), 182-186.
- Harder, T., Borg, M., Boomsma, W., Røgen, P., Hamelryck, T. (2011) Fast large-scale clustering of protein structures using Gauss integrals. Bioinformatics. Accepted.
2012
- Bottaro, S., Boomsma, W., Johansson, K.E., Andreetta, C., Hamelryck, T., Ferkinghoff-Borg, J. (2012) Subtle Monte Carlo updates in dense molecular systems. J. Chem. Theory Comput. Accepted.
- Hamelryck, T., Mardia, KV., Ferkinghoff-Borg, J. (Eds.) (2012) Bayesian methods in structural bioinformatics. Book in the Springer series "Statistics for Biology and Health", 400 pages, 13 chapters. To be published in March 2012.
- Antonov, L., Andreetta, C., Hamelryck, T. (2012) An efficient parallel GPU evaluation of small angle X-ray scattering profiles. In BIOSTEC 2012, 5th Int'l Joint Conf. on Biomedical Engineering Systems and Technologies, Algarve, Portugal.
Collaborations
Denmark
- François Anton, Department of Informatics and Mathematical Modelling (IMM), DTU
- Jesper Ferkinghoff-Borg, Ørsted.DTU. This collaboration includes postdoc Wouter Boomsma and PhD student Sandro Bottaro.
- Jan H. Jensen, Department of Chemistry, University of Copenhagen
- Peter Røgen, Department of Mathematics, DTU
- Jakob R. Winther, Department of Biology, University of Copenhagen
- Pawel Winter, Department of Computer Science (DIKU), University of Copenhagen
International
- Kanti Mardia, John Kent, Charles Taylor, University of Leeds, UK
- Karolin Luger, Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biology, Colorado State University, USA
