Abstract
Motivation: Many repetitive DNA elements are transcribed at appreciable expression levels. Mapping the corresponding RNA sequencing reads back to a reference genome is, however, a notoriously difficult and error-prone task. This is particularly true if chemical modifications introduce systematic mismatches while, at the same time, the genomic loci are only approximately identical, as in the case of tRNAs.

Results: We therefore developed a dedicated mapping strategy to handle RNA-seq reads that map to tRNAs. It relies on a modified target genome in which known tRNA loci are masked and intronless tRNA precursor sequences are instead appended as artificial 'chromosomes'. In the first pass, reads that overlap the boundaries of mature tRNAs are extracted. In the second pass, the remaining reads are mapped to a tRNA-masked target that is augmented by representative mature tRNA sequences. Using both simulated and real-life data, we show that our best-practice workflow removes most of the mapping artefacts introduced by simpler mapping schemes and makes it possible to reliably identify many of the chemical tRNA modifications in generic small RNA-seq data. On simulated data, the FDR is only 2%. We find compelling evidence for tissue-specific differences in tRNA modification patterns.

Availability and implementation: The workflow is available both as a bash script and as a Galaxy workflow from https://github.com/AnneHoffmann/tRNA-read-mapping.

Contact: fabian@tbi.univie.ac.at.

Supplementary information: Supplementary data are available at Bioinformatics online.
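The two ideas behind the target construction and the first-pass filter can be illustrated with a minimal Python sketch. It is not the published workflow, which is distributed as a bash script and a Galaxy workflow; all function names, the in-memory genome representation, and the 0-based, end-exclusive coordinate convention are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

# Hypothetical locus type: (sequence name, start, end), 0-based, end-exclusive.
Locus = Tuple[str, int, int]


def mask_loci(genome: Dict[str, str], loci: List[Locus]) -> Dict[str, str]:
    """Return a copy of the genome with every listed locus replaced by 'N'."""
    masked = {name: list(seq) for name, seq in genome.items()}
    for chrom, start, end in loci:
        end = min(end, len(masked[chrom]))  # clamp so the sequence length is kept
        masked[chrom][start:end] = "N" * max(0, end - start)
    return {name: "".join(chars) for name, chars in masked.items()}


def build_target(
    genome: Dict[str, str], loci: List[Locus], extra: Dict[str, str]
) -> Dict[str, str]:
    """Mask the tRNA loci, then append extra sequences as artificial 'chromosomes'.

    `extra` would hold precursor sequences for the first pass and
    representative mature tRNA sequences for the second pass.
    """
    target = mask_loci(genome, loci)
    for name, seq in extra.items():
        if name in target:
            raise ValueError(f"sequence name already in use: {name}")
        target[name] = seq
    return target


def overlaps_boundary(
    read_start: int, read_end: int, mature_start: int, mature_end: int
) -> bool:
    """True if the read [read_start, read_end) crosses either end of the
    mature tRNA region [mature_start, mature_end) within a precursor."""
    crosses_5p = read_start < mature_start < read_end
    crosses_3p = read_start < mature_end < read_end
    return crosses_5p or crosses_3p
```

In this sketch, `build_target` would be called once per pass with a different `extra` dictionary, and `overlaps_boundary` stands in for the first-pass decision to set aside reads that extend beyond the mature tRNA. The actual mapping step and the handling of real alignment files are deliberately left out.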