
LazyB: fast and cheap genome assembly

T. Gatter, S. von Löhneysen, J. Fallmann, P. Drozdova, T. Hartmann, and P. Stadler. Algorithms Mol. Biol., 16 (1): 8 (June 2021)

Abstract

BACKGROUND: Advances in genome sequencing over the last years have led to a fundamental paradigm shift in the field. With steadily decreasing sequencing costs, genome projects are no longer limited by the cost of raw sequencing data, but rather by computational problems associated with genome assembly. There is an urgent demand for more efficient and more accurate methods, in particular with regard to the highly complex and often very large genomes of animals and plants. Most recently, "hybrid" methods that integrate short- and long-read data have been devised to address this need.

RESULTS: LazyB is such a hybrid genome assembler. It has been designed specifically with an emphasis on utilizing low-coverage short and long reads. LazyB starts from a bipartite overlap graph between long reads and restrictively filtered short-read unitigs. This graph is translated into a long-read overlap graph G. Instead of the more conventional approach of removing tips, bubbles, and other local features, LazyB extracts, step by step, subgraphs whose global properties approach a disjoint union of paths. First, a consistently oriented subgraph is extracted, which in a second step is reduced to a directed acyclic graph. In the next step, properties of proper interval graphs are used to extract contigs as maximum weight paths. These paths are translated into genomic sequences only in the final step. A prototype implementation of LazyB, written entirely in Python, not only yields significantly more accurate assemblies of the yeast and fruit fly genomes compared to state-of-the-art pipelines but also requires much less computational effort.

CONCLUSIONS: LazyB is a new low-cost genome assembler that copes well with large genomes and low coverage. It is based on a novel approach for reducing the overlap graph to a collection of paths, thus opening new avenues for future improvements.

AVAILABILITY: The LazyB prototype is available at https://github.com/TGatter/LazyB.
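To illustrate the step in which contigs are extracted as maximum weight paths from a directed acyclic graph, the following is a minimal sketch of the textbook approach: order the nodes topologically, then apply dynamic programming over the edges. It is not taken from the LazyB code base. The function name `max_weight_path`, the `(u, v, weight)` edge format, and the read names in the usage note are hypothetical, and the sketch does not use the proper interval graph properties that the abstract mentions.

```python
from collections import defaultdict, deque


def max_weight_path(edges):
    """Return (total_weight, path) for the heaviest path in a weighted DAG.

    `edges` is an iterable of (u, v, weight) tuples. This input format is
    an assumption made for illustration, not LazyB's data model.
    """
    succ = defaultdict(list)   # node -> list of (successor, edge weight)
    indeg = defaultdict(int)   # node -> number of incoming edges
    nodes = set()
    for u, v, w in edges:
        succ[u].append((v, w))
        indeg[v] += 1
        nodes.update((u, v))
    if not nodes:
        return 0, []

    # Kahn's algorithm: repeatedly emit nodes without unprocessed predecessors.
    queue = deque(sorted(n for n in nodes if indeg[n] == 0))
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v, _ in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle; expected a DAG")

    # Dynamic programming: best[n] is the heaviest path weight ending at n.
    best = {n: 0 for n in nodes}
    prev = {n: None for n in nodes}
    for u in order:
        for v, w in succ[u]:
            if best[u] + w > best[v]:
                best[v] = best[u] + w
                prev[v] = u

    # Walk back from the best end node to recover the path.
    node = max(order, key=lambda n: best[n])
    total = best[node]
    path = []
    while node is not None:
        path.append(node)
        node = prev[node]
    return total, path[::-1]
```

For a toy graph with edges `("r1", "r2", 5)`, `("r2", "r3", 4)`, `("r1", "r3", 6)`, and `("r3", "r4", 2)`, the function returns `(11, ["r1", "r2", "r3", "r4"])`: the detour through `r2` outweighs the direct edge from `r1` to `r3`. Removing the returned nodes and repeating the call would yield further paths, which is one simple way to approximate a collection of disjoint contigs.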

Cite this publication

%0 Journal Article
%1 Gatter2021-ha
%A Gatter, Thomas
%A von Löhneysen, Sarah
%A Fallmann, Jörg
%A Drozdova, Polina
%A Hartmann, Tom
%A Stadler, Peter F
%D 2021
%I Springer Science and Business Media LLC
%J Algorithms Mol. Biol.
%K Anchors; Genome Illumina Nanopore Spanning Unitigs assembly; sequencing; tree;
%N 1
%P 8
%T LazyB: fast and cheap genome assembly
%V 16
