Author:Dykstra, Karmen
Title:Predicting Protein Producibility: Binary classification of recombinant proteins produced in filamentous fungi
Publication type:Master's thesis
Publication year:2016
Pages:71 + 7 (appendices)
Language:eng
Department/School:School of Science (Perustieteiden korkeakoulu)
Main subject:Machine Learning and Data Mining (SCI3015)
Supervisor:Rousu, Juho
Instructor:Arvas, Mikko
Electronic version URL: http://urn.fi/URN:NBN:fi:aalto-201602161354
Location:P1 Ark Aalto 5796 | Archive
Keywords:binary classification; SVM; protein; filamentous fungi; semi-supervised
Abstract (eng):Recombinant protein synthesis aims to produce specific protein products of interest in living cells.
However, protein production can fail, so a computational tool that predicts whether a protein sequence will be produced successfully, before any laboratory experimentation, would save time and resources.
We demonstrate that a support vector machine (SVM) trained on amino acid composition can predict successful protein production in a dataset of sequences tested in the host species Trichoderma reesei.
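
The abstract does not spell out how the composition features were built. A common reading of "amino acid composition" is the relative frequency of each of the 20 standard residues, which gives every sequence a fixed-length vector regardless of its length. A minimal sketch under that assumption (the function name `amino_acid_composition` is hypothetical, and non-standard letters such as X or U are simply skipped):

```python
# Hypothetical sketch: amino acid composition as a 20-dimensional feature vector.
# Assumption: only the 20 standard one-letter codes are counted; any other
# characters (X, B, Z, U, gaps) are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> list[float]:
    """Return the fraction of each standard amino acid in `sequence`."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for residue in sequence.upper():
        if residue in counts:
            counts[residue] += 1
    total = sum(counts.values())
    if total == 0:
        # No recognizable residues: return an all-zero vector.
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]
```

For example, `amino_acid_composition("MKKLLA")` assigns 2/6 to K and L and 1/6 to A and M, with zeros elsewhere; vectors like this are what a standard SVM would take as input.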

We found that predictive models generalize well between two species of filamentous fungi, and that as few as 50 training sequences suffice to train a model with an AUC above 0.7.
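
An AUC of 0.7 means that, given one randomly chosen successfully produced sequence and one unsuccessful sequence, the model scores the successful one higher about 70% of the time. A minimal pure-Python sketch of that pairwise definition (equivalent to the normalized Mann-Whitney U statistic), assuming labels of 1 for produced and 0 for not produced; `roc_auc` is a hypothetical name:

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Assumes labels are 1 (success) or 0 (failure). A tie between a positive
    and a negative score counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

The pairwise loop is quadratic, which is fine for datasets of a few hundred sequences; a rank-based implementation scales better.
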
We also introduced novel predictive features based on protein domains detected with the InterProScan tool; these were modestly predictive on their own, but adding them did not improve on amino acid composition alone.
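
The abstract does not say how detected domains were turned into features. One plausible encoding, shown here purely as an assumption, is a binary presence vector over every domain identifier seen in training, which can then be appended to the composition vector. Parsing InterProScan output is not shown, and the identifiers in the usage note are placeholders, not real accessions:

```python
# Hypothetical sketch: binary "bag of domains" features.
# Assumption: each sequence has already been annotated with a set of domain
# identifiers (for example, parsed from InterProScan output).


def build_vocabulary(domain_sets: list[set[str]]) -> list[str]:
    """Sorted list of every domain identifier seen in the training sequences."""
    return sorted(set().union(*domain_sets))


def domain_features(domains: set[str], vocabulary: list[str]) -> list[int]:
    """1 if a vocabulary domain was detected in the sequence, else 0.

    Domains absent from the training vocabulary are dropped.
    """
    return [1 if d in domains else 0 for d in vocabulary]


def combine(composition: list[float], domains: list[int]) -> list[float]:
    """Concatenate composition and domain features into one input vector."""
    return composition + [float(x) for x in domains]
```

For instance, with training annotations `{"domain_A", "domain_B"}` and `{"domain_B"}`, the vocabulary is `["domain_A", "domain_B"]`, and a new sequence annotated with `{"domain_B", "domain_C"}` encodes as `[0, 1]`.
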

Applying semi-supervised SVM formulations to the predictive task did not yield a significant improvement, most likely because, under the chosen numeric representations, the data points were not distributed the way the semi-supervised models assume.
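
In one widely used formulation, the semi-supervised SVM (S3VM), the key assumption is low-density separation: the decision boundary should pass through regions with few unlabeled points. The objective is ½‖w‖² + C Σ max(0, 1 − yᵢ f(xᵢ)) over labeled points plus C* Σ max(0, 1 − |f(xⱼ)|) over unlabeled points, with f(x) = w·x + b. The sketch below evaluates that objective for a linear model; it is a generic textbook form, not necessarily one of the formulations used in the thesis, and all names are hypothetical:

```python
# Sketch of a standard linear S3VM objective (generic form, assumed here).


def decision(w: list[float], b: float, x: list[float]) -> float:
    """Linear decision value f(x) = w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def s3vm_objective(
    w: list[float],
    b: float,
    labeled: list[tuple[list[float], int]],
    unlabeled: list[list[float]],
    c: float = 1.0,
    c_star: float = 1.0,
) -> float:
    """labeled holds (x, y) pairs with y in {-1, +1}; unlabeled holds bare x."""
    regularizer = 0.5 * sum(wi * wi for wi in w)
    # Hinge loss on labeled points.
    labeled_loss = sum(max(0.0, 1.0 - y * decision(w, b, x)) for x, y in labeled)
    # "Hat" loss on unlabeled points: penalizes any unlabeled point inside the
    # margin (|f(x)| < 1), pushing the boundary toward low-density regions.
    unlabeled_loss = sum(max(0.0, 1.0 - abs(decision(w, b, x))) for x in unlabeled)
    return regularizer + c * labeled_loss + c_star * unlabeled_loss
```

If the unlabeled sequences do not form well-separated clusters, no boundary can keep the margin free of them, so the unlabeled term adds little useful guidance; this is one way to read the explanation given above.
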
We also examined the species of origin and enzyme function of UniProt SwissProt sequences that the trained SVM models predicted to be successful, and showed that models trained with an RBF kernel were the most conservative, predicting the fewest successes.
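
For reference, the RBF (Gaussian) kernel measures similarity as exp(−γ‖x − z‖²), so a kernel SVM's decision value for a query far from every support vector collapses toward the bias term b. A minimal sketch of the kernel, the resulting decision function, and a count of predicted successes over candidate sequences; the support vectors, dual coefficients (αᵢyᵢ), bias, and γ are hypothetical inputs that would come from a trained model:

```python
import math


def rbf_kernel(x: list[float], z: list[float], gamma: float = 1.0) -> float:
    """Gaussian RBF kernel k(x, z) = exp(-gamma * ||x - z||^2)."""
    if len(x) != len(z):
        raise ValueError("vectors must have the same length")
    squared_distance = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * squared_distance)


def kernel_decision(x, support_vectors, dual_coefs, b, gamma=1.0) -> float:
    """f(x) = sum_i dual_coefs[i] * k(s_i, x) + b, where dual_coefs[i] = alpha_i * y_i."""
    return sum(
        coef * rbf_kernel(s, x, gamma) for s, coef in zip(support_vectors, dual_coefs)
    ) + b


def count_predicted_successes(candidates, support_vectors, dual_coefs, b, gamma=1.0) -> int:
    """Number of candidate vectors with a positive decision value."""
    return sum(
        1
        for x in candidates
        if kernel_decision(x, support_vectors, dual_coefs, b, gamma) > 0
    )
```

With a negative bias, candidates unlike anything in the training data are predicted unsuccessful. This is one possible intuition for the conservative behaviour of the RBF models, not an explanation given in the abstract.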
ED:2016-02-21
INSSI record number: 53146