Author:Khakipoor, Banafsheh
Title:Integrated data analysis pipeline for whole human genome transcription factor binding sites prediction
Publication type:Master's thesis
Publication year:2015
Pages:36 p. + app. 1      Language:   eng
Department/School:School of Science (Perustieteiden korkeakoulu)
Main subject:Bioinformatics   (T3012)
Supervisor:Lähdesmäki, Harri
Instructor:Lähdesmäki, Harri
Electronic version URL: http://urn.fi/URN:NBN:fi:aalto-201506303593
Location:P1 Ark Aalto  2892   | Archive
Keywords:transcription factor
PWM
TRANSFAC
JASPAR
SELEX
PBM
Abstract (eng):Transcription factors (TFs) play a central role in regulating gene expression by binding to regulatory regions of DNA.
The position weight matrix (PWM) is the most commonly used model for representing and predicting TF binding sites.

Consequently, many studies have predicted TF binding sites using PWMs, and many databases containing large numbers of PWMs have been created.
However, these approaches require the user to search for binding sites for each PWM separately, which makes it difficult to get a general view of binding predictions for many PWMs simultaneously.

In response to this need, this thesis project evaluates both individual PWMs and groups of PWMs and creates a straightforward method to analyze and visualize a desired set of PWMs together, making it easier for biologists to analyze large amounts of data in a short period of time.

For this purpose, we used bioinformatics methods to detect putative TF binding sites in the human genome and made them available online via the UCSC genome browser.
Still, the sheer amount of data in PWM databases required a more efficient method to summarize TF binding predictions.
Hence, we used PWM similarity measures and clustering algorithms to group PWMs together and to create one integrated database from four popular PWM databases: SELEX, TRANSFAC, UniPROBE, and JASPAR.
All results are made publicly available to the research community via the UCSC genome browser.
ED:2015-08-16
INSSI record number: 52009