Computational Biology


Random Permutations Based Alignment / “Shuffling-Based” Alignment

cards01prototype example implementation: Download Page (RPA P3). See “software” page for more information.

An introduction paper: “A Random-Permutations-Based Approach to Fast Read Alignment”  (RECOMB-SEQ 2013).

Also see [Charikar, 2002] (random permutations in search without the special properties for the sequencing problem), and this report on the properties of sequencing.

For more information about the algorithm see search / alignment page and seach / alignment FAQ.


Additional applications: Assembly and more.

puzzle01Assembly: the algorithm is also used to construct approximate overlap graphs. These graph are used for fast assembly. Note that the algorithm allows errors in the reads, so no error correction is necessary prior to the construction of the graph. See: technical report.

More about using permutations-based search for assembly, error correction and other applications by constructing graphs of reads: search / alignment page.


 Long-Range “Independence” in DNA

elephants02The repetitive nature of DNA strings is one of the challenges in read alignment. When one examines longer substrings of DNA, they appear less repetitive, or more unique.

Permutations-based algorithms benefit from this property. We describe a way of measuring the property in this report and ways of using this property in reads with many indels, in this report.





Homopolymer Length Filters

CompBio05Homopolymer length filters eliminate the mapping problem caused by homopolymer length errors (ionTorrent/454). A technical report about “Homopolymer Length Filters” is available here.

More information: HPLFs page.