Author: | El Hadidi, Mohamed |
Title: | The use of RNA-seq data for re-annotation of transcriptomes |
Publication type: | Master's thesis |
Publication year: | 2012 |
Pages: | 56 Language: eng |
School/Department/Unit: | BIT Research Centre (BIT-tutkimuskeskus) |
Subject: | Information technology (T-61) |
Supervisor: | Lähdesmäki, Harri ; Correia, Isabel Sá |
Instructor: | Jiménez-Gómez, José M. |
OEVS: | The electronic archive copy can be read via the Aalto Thesis Database.
Help: Reading digital theses in the closed network of the Aalto University Harald Herlin Learning Centre. Digital and digitised theses that have not been granted permission for publication on the open web can be read in the Learning Centre's closed network. Learning Centre contact details and opening hours: https://learningcentre.aalto.fi/fi/harald-herlin-oppimiskeskus/ Theses can be read on the Learning Centre's customer computers, which are available on every floor.
|
Location: | P1 Ark Aalto | Archive |
Keywords: | RNA sequencing; transcriptome annotation; gene prediction; transcriptome reconstruction; de novo assembly |
Abstract (eng): | Demand for whole-genome sequencing has recently increased greatly for many applications, including the study of SNPs and their role in phenotypic diversity in nature. However, whole-genome sequencing with high-throughput methods remains expensive and is feasible only for large consortia of researchers backed by well-funded agencies. RNA-seq appears to be an appropriate alternative for several reasons. First, while genome sizes can differ by as much as 5 orders of magnitude, transcriptome sizes differ by less than 2 orders of magnitude, even between yeast and polyploid plants. Second, coding sequences are more conserved and contain fewer repetitive elements than non-coding sequences. Finally, RNA-seq allows not only the identification of coding polymorphisms but also the characterization of expression differences, both of which have been shown to underlie phenotypic diversity. In this study, we developed a pipeline for annotating the transcriptomes of species that lack a direct reference genome, based mainly on RNA-seq data and a closely related reference genome. Benchmarking studies were performed to select the software components of the pipeline. Among three programs (the AUGUSTUS gene prediction tool supplied with RNA-seq data, the Cufflinks transcriptome reconstruction tool and the Trinity de novo transcriptome assembler), AUGUSTUS proved to be the most accurate in terms of sensitivity and specificity. We used the published gene models of the Col accession of Arabidopsis thaliana as the reference annotation and compared them with the software-generated gene models. The performance of the pipeline in the absence of a direct reference genome was also tested. In that case, a pseudo-reference genome was constructed by incorporating accession-specific SNPs into the closest reference genome. RNA-seq reads were mapped against both the published A. thaliana (Ler accession) genome and the Ler pseudo-reference genome, which was constructed by incorporating Ler SNPs into the Col accession genome of A. thaliana. The two resulting sets of gene models gave highly similar results when compared with the published Ler gene models. Finally, the pipeline was applied to four tomato species: S. lycopersicum var. m82, S. pennellii, S. pimpinellifolium and S. habrochaites. Among the four, only S. lycopersicum var. m82 has a reference genome from the same species, that of S. lycopersicum var. Heinz, from which we constructed pseudo-reference genomes for all four species using the available RNA-seq data. AUGUSTUS with RNA-seq guidance was applied to predict gene models from the four pseudo-reference genomes. To monitor the effect of incorporating species-specific SNPs on annotation, we compared each of the four generated annotations with the published ITAG S. lycopersicum var. Heinz annotation. The results showed variation in sensitivity and specificity between pairs of compared gene models. We showed that the evolutionary distances between the four tomato species and the values of sensitivity and specificity are inversely correlated. |
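Note: the pseudo-reference step described in the abstract (incorporating accession-specific SNPs into the closest reference genome) can be illustrated with a minimal sketch. This is not the thesis's implementation. The function name `build_pseudo_reference`, the 0-based SNP dictionary format and the toy sequences are all assumptions made for illustration, and only single-base substitutions are handled.

```python
def build_pseudo_reference(reference: str, snps: dict[int, tuple[str, str]]) -> str:
    """Return a copy of `reference` with each SNP's alternate base substituted.

    `snps` maps a 0-based position to (expected_reference_base, alternate_base).
    This input format is hypothetical, chosen only for this illustration.
    """
    bases = list(reference)
    for pos, (ref_base, alt_base) in snps.items():
        if not 0 <= pos < len(bases):
            raise ValueError(f"SNP position {pos} is outside the sequence")
        # Guard against SNP calls that do not match the reference coordinates.
        if bases[pos].upper() != ref_base.upper():
            raise ValueError(
                f"reference base at {pos} is {bases[pos]!r}, expected {ref_base!r}"
            )
        bases[pos] = alt_base
    return "".join(bases)


# Toy data: a made-up "Col-like" sequence and two made-up "Ler" SNPs.
col_like = "ACGTACGTAC"
ler_snps = {2: ("G", "T"), 7: ("T", "C")}

ler_pseudo = build_pseudo_reference(col_like, ler_snps)
print(ler_pseudo)
```

Because only substitutions are applied, the pseudo-reference keeps the coordinate system of the original reference, so annotations produced on it can be compared position by position with the published annotation.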
ED: | 2012-09-19 |
INSSI record number: 45275