Signal finding (pattern discovery) in biological sequences is a fundamental problem in both computer science and molecular biology. From a biological point of view, knowing which motifs are the most frequent in a given set of sequences is only partially informative. In general, one can identify different subsets of sequences, each characterized by different motifs. Imposing a single set of frequent motifs on the whole dataset is often an oversimplification that does not expose important pieces of information. We propose a de novo framework that simultaneously builds partitions of protein sequences and groups of associated patterns. In this way we are able to identify a richer set of motifs, each possibly characterizing only some of the sequences in the dataset.