Author:Nyberg, Katariina
Title:Document classification using machine learning and ontologies
Asiakirjojen luokittelu koneoppimista ja ontologioita käyttäen
Publication type:Master's thesis
Publication year:2011
Pages:[9] + 71 p.      Language:   eng
Department/School:Mediatekniikan laitos
Main subject:Viestintätekniikka   (T-75)
Supervisor:Hyvönen, Eero
Instructor:Hyvönen, Eero
OEVS:
Electronic archive copy is available via Aalto Thesis Database.
Location:P1 Ark Aalto     | Archive
Keywords:document classification
ontologies
syntactical analysis
machine learning
logistic discriminant
bag of words
YSO
asiakirjojen luokittelu
ontologiat
kieliopillinen analyysi
koneoppiminen
logistinen diskriminantti
YSO
Abstract (eng): This master's thesis explores a way in which documents can be automatically classified based on their contents.
Automatic classification of data is one of the main applications of machine learning.
A model that predicts the most likely class can be learned from data that has already been classified.
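
As a rough illustration of learning a class model from already classified data, the sketch below combines two techniques named in the keywords, bag of words and a logistic discriminant. It is a minimal two-class sketch under stated assumptions, not the thesis's implementation: the function names, the toy documents, the labels and the training settings (`epochs`, `lr`) are all hypothetical.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Map a text to word counts (lowercased, split on whitespace)."""
    return Counter(text.lower().split())

def train_logistic(docs, labels, epochs=200, lr=0.5):
    """Fit P(class=1 | doc) = sigmoid(w . x + b) by stochastic gradient ascent."""
    weights, bias = Counter(), 0.0
    bags = [bag_of_words(d) for d in docs]
    for _ in range(epochs):
        for bag, y in zip(bags, labels):
            z = bias + sum(weights[w] * c for w, c in bag.items())
            p = 1.0 / (1.0 + math.exp(-z))
            err = y - p  # gradient of the log-likelihood with respect to z
            bias += lr * err
            for w, c in bag.items():
                weights[w] += lr * err * c
    return weights, bias

def predict(weights, bias, text):
    """Return the more likely class (1 or 0) for an unseen text."""
    bag = bag_of_words(text)
    z = bias + sum(weights[w] * c for w, c in bag.items())
    return 1 if z > 0 else 0

# Hypothetical, manually labelled toy data: 1 = financial, 0 = meeting record.
DOCS = [
    "invoice payment due",
    "payment received invoice number",
    "meeting agenda minutes",
    "board meeting minutes approved",
]
LABELS = [1, 1, 0, 0]

WEIGHTS, BIAS = train_logistic(DOCS, LABELS)
```

Calling `predict(WEIGHTS, BIAS, "payment invoice")` then returns the class learned for the financial examples. A multi-class setting, such as an archive's classification scheme, would need one discriminant per class or a softmax variant.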

This master's thesis also explores whether adding background knowledge from ontologies to the model improves the classification accuracy.
A new machine learning model is introduced that incorporates ontology information.

The proposed method for learning a classification model and enhancing it with ontology information is applied in a case study for the Finnish National Archives, using a set of digital documents that have been manually classified.

An RDF schema for representing documents, sentences and words is created in order to prepare the data for the machine learning analysis.
The words are put into base form and matched semi-automatically with concepts of the General Finnish Ontology YSO.
The ontology-enhanced model is then applied to the data, and the most likely classes for the documents are learned.
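
The data preparation described above can be pictured with plain subject-predicate-object triples. The sketch below is an assumption-laden illustration, not the schema from the thesis: the `ex:` property names, the lemma table (standing in for a morphological analyser), and the concept identifiers (which are not real YSO URIs) are all hypothetical, and the concept lookup is fully automatic here, whereas the thesis describes the matching as semi-automatic. How the thesis actually feeds ontology information into its model is not stated in this record; counting matched concepts and their broader concepts as extra features is just one plausible option.

```python
from collections import Counter

# Hypothetical lemma table standing in for a morphological analyser.
LEMMAS = {"invoices": "invoice", "paid": "pay", "payments": "payment"}

# Hypothetical concept identifiers; these are NOT real YSO URIs.
CONCEPT_OF = {"invoice": "ex:c_invoice", "payment": "ex:c_payment"}
BROADER = {"ex:c_invoice": "ex:c_finance", "ex:c_payment": "ex:c_finance"}

def to_triples(doc_id, text):
    """Emit triples linking a document to its sentences, words and base forms."""
    triples = []
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    for i, sentence in enumerate(sentences):
        s_id = f"{doc_id}/s{i}"
        triples.append((doc_id, "ex:hasSentence", s_id))
        for j, token in enumerate(sentence.lower().split()):
            w_id = f"{s_id}/w{j}"
            lemma = LEMMAS.get(token, token)  # fall back to the surface form
            triples.append((s_id, "ex:hasWord", w_id))
            triples.append((w_id, "ex:baseForm", lemma))
            if lemma in CONCEPT_OF:
                triples.append((w_id, "ex:concept", CONCEPT_OF[lemma]))
    return triples

def features(triples):
    """Count base forms, matched concepts, and the broader concepts of matches."""
    counts = Counter()
    for _, pred, obj in triples:
        if pred == "ex:baseForm":
            counts["word:" + obj] += 1
        elif pred == "ex:concept":
            counts["concept:" + obj] += 1
            if obj in BROADER:
                counts["concept:" + BROADER[obj]] += 1
    return counts

TRIPLES = to_triples("ex:doc1", "Invoices paid. Payments late.")
FEATURES = features(TRIPLES)
```

In this toy example the words "invoices" and "payments" share no surface form, yet both contribute to the same broader concept feature, which is the kind of background knowledge a classifier could use in addition to the word counts.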

The master's thesis shows that the classification accuracy of the model increases when ontology information is added to it.
Abstract (fin, translated to English): This master's thesis studies the automatic classification of documents based on their content.
Automatic classification of data is one of the central topics of machine learning.
A model for a learning classifier is built from example data that has already been classified.

The task is to try using ontological background knowledge in a learning classifier and to determine whether adding the background knowledge improves the model's classification accuracy.
The thesis presents a new learning classifier that incorporates ontology information into its analysis.
The classifier is tested on electronic documents from the National Archives of Finland that have been classified by hand.

To represent documents and the sentences and words they contain, an RDF schema has been developed in the thesis; using it, words can be converted to base form and linked semi-automatically to concepts of the General Finnish Ontology.
The schema is used to prepare the data for analysis by the learning classifier.

The thesis shows that classification accuracy improves when ontology information is added to the learning classifier.
ED:2011-03-07
INSSI record number: 41554