2009-11-26
Paper in PLoS Computational Biology
Eivind Valen, Albin Sandelin, Ole Winther and Anders Krogh have published a paper in PLoS Computational Biology, "Pattern Discovery is improved by a Discriminatory Approach", describing how pattern discovery in promoters can be improved by using a large negative set of actual promoters as background instead of a statistical model.
The method is called Motif Annealer (MoAN) and is available here, together with the data sets used for testing.
Here is the abstract:
A major goal in post-genome biology is the complete mapping of the gene
regulatory networks for every organism. Identification of regulatory
elements is a prerequisite for realizing this ambitious goal. A common
problem is finding regulatory patterns in promoters of a group of
co-expressed genes, but contemporary methods are challenged by the size
and diversity of regulatory regions in higher metazoans. Two key issues
are the small amount of information contained in a pattern compared to
the large promoter regions and the repetitive characteristics of
genomic DNA, which both lead to “pattern drowning”. We present a new
computational method for identifying transcription factor binding sites
in promoters using a discriminatory approach with a large negative set
encompassing a significant sample of the promoters from the relevant
genome. The sequences are described by a probabilistic model and the
most discriminatory motifs are identified by maximizing the probability
of the sets given the motif model and prior probabilities of motif
occurrences in both sets. Due to the large number of promoters in the
negative set, an enhanced suffix array is used to improve speed and
performance. Using our method, we demonstrate higher accuracy than the
best of contemporary methods, high robustness when extending the length
of the input sequences and a strong correlation between our objective
function and the correct solution. Using a large background set of real
promoters instead of a simplified model leads to higher discriminatory
power and markedly reduces the need for repeat masking, a common
pre-processing step for other pattern finders.
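
To make the discriminatory idea concrete, here is a minimal sketch in Python. It is not MoAN itself: it does not use the paper's probabilistic motif model or an enhanced suffix array, and simply enumerates fixed-length words (k-mers) and ranks them by how much more often they occur in a positive set of sequences than in a negative set. The function names, the pseudocount and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter
from math import log


def kmers_in(seq, k):
    """Return the set of distinct k-mers occurring in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def discriminative_kmers(positive, negative, k=7, pseudocount=1.0):
    """Rank k-mers by the log ratio of their occurrence frequency in the
    positive set versus the negative set.

    Hypothetical helper for illustration only: a simple word-count
    stand-in for a discriminatory objective, not the MoAN model.
    """
    pos_hits = Counter()
    neg_hits = Counter()
    # Count, for each k-mer, how many sequences contain it at least once.
    for seq in positive:
        pos_hits.update(kmers_in(seq, k))
    for seq in negative:
        neg_hits.update(kmers_in(seq, k))

    scores = {}
    for kmer, n_pos in pos_hits.items():
        # Pseudocounts keep the ratio finite when a k-mer is absent
        # from the negative set.
        p_pos = (n_pos + pseudocount) / (len(positive) + 2 * pseudocount)
        p_neg = (neg_hits[kmer] + pseudocount) / (len(negative) + 2 * pseudocount)
        scores[kmer] = log(p_pos / p_neg)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data (made up): the word TGACTCA is planted in every positive sequence.
positive = ["AATGACTCATT", "CCTGACTCAGG", "GGTGACTCACC"]
negative = ["AAAAAAAAAAA", "CCCCCCCCCCC", "GGGGGGGGGGG", "ACACACACACA"]
ranked = discriminative_kmers(positive, negative, k=7)
print(ranked[0][0])
```

On this toy input the planted word ranks first because it occurs in all positive sequences and in none of the negative ones. With real promoters, a word that is merely common in the genome, such as a repeat, would also be frequent in the negative set and so receive a low score, which is the intuition behind using actual promoters as background.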
