Author: | Casado Cuervo, Julia |
Title: | Integrating ENCODE data to model transcriptional regulation |
Publication type: | Master's thesis |
Publication year: | 2014 |
Pages: | vii + 54 pp. + app. 3 |
Language: | eng |
Department/School: | School of Science (Perustieteiden korkeakoulu) |
Main subject: | Computer and Information Science (Informaatiotekniikka, T-61) |
Supervisor: | Rousu, Juho |
Instructor: | Hautaniemi, Sampsa ; Hollmén, Jaakko ; Ovaska, Kristian |
Electronic version URL: | http://urn.fi/URN:NBN:fi:aalto-201507013756 |
OEVS: | Electronic archive copy is available via Aalto Thesis Database. |
Location: | P1 Ark Aalto 2390 | Archive |
Keywords: | gene regulation ; modeling ; linear regression ; ChIP-seq ; RNA-seq ; public biodatabases ; ENCODE project |
Abstract (eng): | Gene transcription is an essential step in protein expression and fundamental to understanding cell function and malfunction. Although our understanding of its underlying mechanisms has increased considerably during the last decade, determining the specific regulatory elements targeting the expression of a given gene remains a challenge. The development of next-generation sequencing technologies enabled international consortia like the ENCODE project to produce vast amounts of high-throughput data. The ENCODE project made publicly available sequencing data on dozens of different tissues and several biological features, including more than a hundred transcription factors. Due to the significant size and detail of these data, manual inspection is impractical, and automatic methods are required to perform this task. Here we present a computational pipeline to apply machine learning techniques to the high-throughput data available at the ENCODE repositories. We measure gene regulatory features from ChIP-seq data, and infer gene expression levels from RNA-seq data. Then, we apply linear regression analyses to model every protein-coding gene, and explore the predictive power of gene regulatory features for gene expression levels. Ultimately, we show the potential of using existing data and its contribution to further understanding of gene regulatory features. Our genome-wide regression analysis indicates that gene expression is not related to a linear combination of regulatory elements, but may be better modelled with a quadratic function. Further data will be necessary to accurately define the role that each regulatory feature plays in the initiation of gene transcription. |
ED: | 2014-12-10 |
INSSI record number: 50142