Author: Dykstra, Karmen
Title: Predicting Protein Producibility: Binary classification of recombinant proteins produced in filamentous fungi
Publication type: Master's thesis
Year of publication: 2016
Pages: 71 p. + app. 7      Language:   eng
School/Department/Division: Perustieteiden korkeakoulu (School of Science)
Major subject: Machine Learning and Data Mining   (SCI3015)
Supervisor: Rousu, Juho
Advisor: Arvas, Mikko
Electronic publication: http://urn.fi/URN:NBN:fi:aalto-201602161354
Location: P1 Ark Aalto  5796   | Archive
Keywords: binary classification
SVM
protein
filamentous fungi
semi-supervised
Abstract (eng): Recombinant protein synthesis aims to produce specific protein products of interest in living cells.
However, protein production is subject to failure, so a computational tool that predicts whether a protein sequence will be produced successfully, before any laboratory experimentation, would save time and resources.
We demonstrate the ability of an SVM trained on protein amino acid composition to predict successful protein production in a dataset of sequences tested in the host species Trichoderma reesei.
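
As an illustration of the amino acid composition representation mentioned above, the following is a minimal sketch and not the thesis code. It assumes the representation is the relative frequency of each of the 20 standard residues. The function name `amino_acid_composition` and the toy sequence are hypothetical. The resulting 20-dimensional vector is the kind of input an SVM classifier could be trained on.

```python
from collections import Counter

# One-letter codes of the 20 standard amino acids (fixed feature order).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the relative frequency of each standard residue in `sequence`.

    Non-standard symbols (e.g. 'X', gaps) are ignored; this handling is an
    assumption made for the sketch, not a detail taken from the thesis.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical toy sequence, used only to show the shape of the output.
features = amino_acid_composition("MKVLAAGIVGL")
```

Each sequence maps to a fixed-length vector regardless of its length, which is what makes this representation convenient for kernel methods.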

We found that predictive models generalize well between two species of filamentous fungi, and furthermore that 50 training sequences are sufficient to train a model that yields an AUC above 0.7.
We introduced novel predictive features using protein domains detected with the InterProScan tool, which were modestly successful in the predictive task but whose addition did not improve over the use of amino acid composition alone.
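
For readers unfamiliar with the AUC metric quoted above, the sketch below computes it from its pairwise definition: the fraction of (successful, failed) sequence pairs in which the successful one receives the higher score, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are hypothetical and are not data from the thesis.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    `labels` holds 1 for a successful production and 0 for a failure;
    `scores` holds the classifier's decision values in the same order.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    # Count correctly ordered pairs; a tie contributes half a pair.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: three of the four pairs are ordered correctly.
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, so a value above 0.7 indicates that the model ranks most successful sequences ahead of failed ones.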

Experiments applying semi-supervised SVM formulations to the predictive task did not yield significant improvement, most likely because the spatial distribution of data points under the chosen numeric representations did not conform to the assumptions of the semi-supervised models.
We explored the species of origin and enzyme function of sequences from the UniProt SwissProt database predicted to be successful by the trained SVM models, and showed that models trained with an RBF kernel were the most conservative in terms of the number of predicted successes.
ED:2016-02-21
INSSI record number: 53146