Inssi

Author:	Weldatsadik, Rigbe
Title:	Pool-seq analysis for the identification of polymorphisms in bacterial strains and utilization of the variants for protein database creation
Publication type:	Master's thesis
Publication year:	2016
Pages:	58 pp. + appendices 10
Language:	eng
Department/School:	School of Science
Main subject:	Bioinformatics (T3012)
Supervisor:	Lähdesmäki, Harri
Instructor:	Jokiranta, Sakari T.
Electronic version URL:	http://urn.fi/URN:NBN:fi:aalto-201611025401
Location:	P1 Ark Aalto 4740 | Archive
Keywords:	pooled sequencing; variant protein database; variant calling; shotgun proteomics
Abstract (eng):	Pooled sequencing (Pool-seq) is the sequencing of a single library that contains DNA pooled from different samples. It is a cost-effective alternative to whole-genome sequencing of individual samples. In this study, we used Pool-seq to sequence 100 Streptococcus pyogenes strains in two pools in order to identify polymorphisms and create variant protein databases for shotgun proteomics analysis. We assessed the efficacy of the pooling strategy and of the four variant calling tools using individual sequence data from six of the pooled strains as well as 3407 publicly available strains from the European Nucleotide Archive. Besides the raw sequence data from the public repository, we also extracted polymorphisms from 19 publicly available complete S. pyogenes genomes and compared these variants against our pools. In total, 78955 variants (76981 SNPs and 1725 INDELs) were identified from the two pools. Of these, about 60.5% were also found in the complete genomes and about 95.7% in the European Nucleotide Archive data. Collectively, the four variant calling tools recovered the majority (about 96.5%) of the variants found in the six individually sequenced strains, suggesting that Pool-seq is a robust approach for variant discovery. Variants from the pools that fell in coding regions and had nonsynonymous effects accounted for 24% of the variants and were used to create variant protein databases for shotgun proteomics analysis. These variant databases improved protein identification in mass spectrometry analysis.
ED:	2016-11-13

INSSI record number: 54932
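The abstract states that nonsynonymous coding variants from the pools were used to build variant protein databases, but the record does not say how. As an illustration only, the following is a minimal Python sketch of one common approach. It applies each SNP to a gene's coding sequence (CDS) and translates both the reference and the mutated sequence. It keeps the variant protein only when the amino acid sequence changes. The sketch makes several assumptions that are not taken from the thesis: genes lie on the forward strand, SNPs are given as 0-based CDS positions, each SNP yields its own entry, and INDELs are ignored. The names `translate`, `variant_protein_entries` and `to_fasta`, and the header format `gene|REF<pos>ALT`, are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

# Standard genetic code, codons enumerated in TCAG order. NCBI translation
# table 11 (bacteria) uses the same amino acid assignments as table 1 and
# differs only in alternative start codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

# Hypothetical SNP representation: (0-based position in the CDS, ref base, alt base).
Snp = Tuple[int, str, str]


def translate(cds: str) -> str:
    """Translate a forward-strand CDS, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def variant_protein_entries(
    gene_id: str, cds: str, snps: List[Snp]
) -> Iterator[Tuple[str, str]]:
    """Yield (header, protein) pairs for SNPs that change the encoded protein.

    Each SNP is applied on its own; synonymous SNPs are skipped.
    """
    reference_protein = translate(cds)
    for pos, ref, alt in snps:
        if cds[pos] != ref:
            raise ValueError(
                f"{gene_id}: expected {ref} at CDS position {pos + 1}, found {cds[pos]}"
            )
        mutated = cds[:pos] + alt + cds[pos + 1:]
        variant_protein = translate(mutated)
        # A different translation means the SNP is nonsynonymous
        # (missense, nonsense, or stop-loss).
        if variant_protein != reference_protein:
            yield f"{gene_id}|{ref}{pos + 1}{alt}", variant_protein


def to_fasta(entries: Iterable[Tuple[str, str]]) -> str:
    """Format (header, sequence) pairs as FASTA text."""
    return "".join(f">{header}\n{sequence}\n" for header, sequence in entries)


if __name__ == "__main__":
    toy_cds = "ATGGCTAAATGA"  # encodes M-A-K followed by a TGA stop codon
    toy_snps = [(4, "C", "A"), (5, "T", "C"), (6, "A", "T")]
    print(to_fasta(variant_protein_entries("geneX", toy_cds, toy_snps)), end="")
```

Consider the toy CDS `ATGGCTAAATGA`. Two changes are kept: the C to A change at CDS position 5 (GCT to GAT, Ala to Asp) and the A to T change at position 7 (AAA to TAA, an early stop codon). The synonymous T to C change at position 6 (GCT to GCC) is skipped. A real pipeline would also need to handle reverse-strand genes, INDELs and co-occurring variants.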
