Inssi

Author:	Rauhala, Antti
Title:	Pair Expression: Re-expression Based Machine Learning Technique
	Pari-ilmaisu: Uudelleenilmaisuun perustuva koneoppimistekniikka
Publication type:	Master's thesis
Publication year:	2006
Pages:	110
Language:	eng
Department/School:	Informaatio- ja luonnontieteiden tiedekunta
Main subject:	Informaatiotekniikka (T-61)
Supervisor:	Lähdesmäki, Harri
Instructor:	Haavikko, Matti
OEVS:	Electronic archive copy is available via Aalto Thesis Database.
Location:	P1 Ark Aalto | Archive
Keywords:	re-expression; machine learning; compression; re-expression driven learning; language learning; pair expression; statistical learning; naive Bayesian; knowledge representation; uudelleenilmaisu; koneoppiminen; pakkaus; uudelleenilmaisuun perustuva oppiminen; pari-ilmaisu; naiivi bayesialainen; kielen oppiminen; tilastollinen oppiminen; tiedon esittäminen
Abstract (eng):	This paper introduces a new machine learning technique based on a novel approach. Pair expression attempts to find a simpler and denser expression for data, so that unknown variables become easier to predict and the data becomes easier to compress. The technique is therefore essentially a re-expression technique, but it has been designed for machine learning and can be applied to it successfully. Combined with naive Bayesian prediction, it efficiently eliminates the bias resulting from the naive assumption, and it can lead to a dramatic reduction in prediction error, depending on the sample. As a result, the naive Bayesian method no longer functioned as a mere classifier but as a predictor that provided (approximately) unbiased probability estimates for unknown variables. In contrast to traditional machine learners, the technique thus attempts to optimize the expression of information. In this problem setting, the aim is to find an optimal language L that can be used to re-express the original data in a form where the redundancy between variables has been minimized and, as a consequence, the regularities present in the data are captured in the structure of the language. During language construction, surprisingly common variable pairs are re-expressed by introducing new expression variables. The redundancy introduced by the new expression variables is eliminated with a special technique called 'variable reduction'. The properties of the technique were examined with a thought experiment in which the initial independence assumption is equated with classical analysis (where a problem is divided into subproblems) and re-expression is equated with classical synthesis (where a solution is assembled from subsolutions). In the method, the 'synthesis' is targeted at regular subsystems, which is considered to reduce approximation error at an ideally small cost in complexity and training error.
The performance of the technique was evaluated by training it on seven samples from the 1995 Statlog study and comparing the results against the published results of 22 machine learners. In testing, pair expression produced superior results for data that consisted mostly of discrete variables, ranking first or second on four of the seven samples, but on purely numeric samples the results were mediocre. The results were interpreted as extremely good and promising, especially because there is still much to develop in the actual algorithms. In particular, the predicted variables could not be included in re-expression because of problems with the prediction algorithm, which limited the learning ability. Based on the experience gained in this study, expression-driven learning appears to be very fruitful ground for future research.
Abstract (fin):	Tämä paperi esittelee uuden tekniikan koneoppimiseen, joka perustuu uudelle lähestymistavalle. Tekniikassa pyritään hakemaan yksinkertaisempaa ja tiiviimpää ilmaisua datalle siten, että tuntemattomat muuttujat on helpompi selvittää ja että data pakkaantuu pienempään tilaan. Varsinaisesti tekniikka on siis tiedon uudelleenilmaisemistekniikka, mutta se on suunniteltu ja sitä voidaan soveltaa menestyksekkäästi koneoppimiseen. Käytettynä naiivin Bayesialaisen ennustajan kanssa, se eliminoi tehokkaasti naiivista oletuksesta johtuvaa systemaattista harhaa, ja voi johtaa jopa dramaattiseen ennustusvirheen pienenemiseen riippuen otteesta. Seurauksena naiivi Bayesialainen ei toiminut niinkään luokittajana, vaan ennustajana, joka tarjosi (likimain) harhattomia todennäköisyysestimaatteja tuntemattomille muuttujille. Erona perinteisiin koneoppimismenetelmiin tekniikalla pyritään siis optimoimaan tiedon ilmaisua. Ongelman asettelussa pyritään löytämään optimaalinen kieli L, jolla alkuperäisen datan voidaan ilmaista uudelleen muodossa, jossa esityksen muuttujien välinen redundanssi on minimoitu ja seurauksena tieto datan säännönmukaisuuksista tallentuu kielen rakenteeseen. Kieltä muodostettaessa alkuperäiseen tiedon esitykseen lisätään yksitellen uusia pari-ilmaisumuuttujia ilmaisemaan yllättävän yleisiä tilapareja. Ilmaisujen lisäyksien jälkeen käytetään muuttujien vähennystekniikkaa, jolla eliminoidaan ilmaisumuuttujan järjestelmään tuoma redundanssi. Tekniikan ominaisuuksia tarkasteltiin ajatusleikillä, missä pohjaoletuksena tehty riippumattomuusoletus rinnastettiin klassiseen analyysiin (jossa ongelma jaetaan osaongelmiin) ja uudelleenilmaisu rinnastetaan klassiseen synteesiin (jossa ratkaisu kootaan osaratkaisuista). Menetelmässä "synteesi" on kohdistettu säännöllisiin osajärjestelmiin, minkä katsottiin vähentävän approksimointivirhettä jopa ideaalisen pienellä monimutkaisuuden ja opetusvirheen hinnalla. 
Tekniikan suorituskykyä arvioitiin opettamalla sille seitsemän otetta 1995 Statlog tutkimuksesta ja vertaamalla tuloksia 22 koneoppijan julkisia tuloksia vastaan. Testauksessa pari-ilmaisu tuotti ylivertaisia tuloksia datalle, joka koostui pääasiassa diskreeteistä muuttujista ollen ensimmäinen tai toinen neljälle otteelle, mutta puhtaasti numeerisilla otteilla tulokset olivat keskinkertaisia. Tulokset tulkittiin erittäin hyviksi ja lupaaviksi, erityisesti koska itse algoritmeissa on vielä paljon kehitettävää. Erityisesti on mainittava, että ennustettavia arvoja ei otettu uudelleenilmaisuun mukaan johtuen ongelmista ennustamisalgoritmin kanssa, mikä rajoitti oppimiskykyä. Perustuen tutkimuksessa saatuun kokemukseen, uudelleen-ilmaisuun pohjautuva oppiminen vaikuttaa erittäin lupaavalta alueelta lisätutkimukselle.
ED:	2010-07-12

INSSI record number: 39897
