Author: Zhao, Yang
Title: Anomaly Detection from Patient Visit Data
Pages: 50 + 4 (appendices)      Language: English
School/Department: School of Science
Major: Machine Learning and Data Mining (SCI3044)
Supervisor: Rousu, Juho
Advisor: Hollmén, Jaakko
Keywords: sequence data
generative Markov models
duration modelling
Poisson distribution
negative binomial distribution
Abstract: Hospital operating costs are rising as an increasingly elderly population drives up demand for outpatient services.
To reduce these costs and serve patients better, healthcare institutions need to become more efficient.
Among the possible areas for improvement, smoother patient visits are particularly desirable.
In the digital era, patient visits to a hospital can be recorded in full detail.
Oulu Hospital in Finland has been collecting patient visit data since 2011 using a queue system provided by the company X-Akseli.
Using these data, this thesis aims to design a practical method for detecting anomalies in patient visits.
With such a system, hospital administrative staff could analyse the performance of the hospital's queue procedure and optimize it.
Moreover, the system can identify anomalies in real time, so that patients can receive immediate help when they need it.

The thesis explores two categories of methods: clustering methods and generative methods.
Four candidate algorithms are discussed: K-Means, DBSCAN, Markov Chain, and Hidden Markov Model.
The discussion suggests that DBSCAN and the Hidden Markov Model are the more practical choices.
We then propose a new data representation and use a negative binomial distribution in the Hidden Markov Model to model the durations of patient states.
The experimental results are visualized using t-SNE and evaluated through user interpretation.
The analyses show that both DBSCAN and the Hidden Markov Model can effectively detect anomalies in patient visit data.
However, in terms of time and space complexity, and of real-time detection, the Hidden Markov Model is the better choice.
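
A standard Hidden Markov Model implicitly gives each state a geometric duration: with self-transition probability a, the probability of staying exactly d steps is a^(d-1)(1 - a), which is always largest at d = 1. A negative binomial duration can instead peak at a typical length, such as a usual waiting time. The following is a minimal sketch, not the thesis's implementation, that compares the two distributions. It assumes durations are counted in discrete time steps (d >= 1) and models d - 1 as the number of failures before the r-th success. The function names `nbinom_duration_pmf` and `geometric_duration_pmf` and the parameter values are hypothetical, chosen only for illustration.

```python
import math


def nbinom_duration_pmf(d: int, r: float, p: float) -> float:
    """Hypothetical helper: P(duration = d) for d >= 1, where d - 1 follows a
    negative binomial distribution (failures before the r-th success, success
    probability p). Computed in log space via lgamma so that r may be non-integer."""
    if not 0.0 < p < 1.0 or r <= 0.0:
        raise ValueError("need 0 < p < 1 and r > 0")
    if d < 1:
        return 0.0
    k = d - 1
    log_pmf = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
               + r * math.log(p) + k * math.log1p(-p))
    return math.exp(log_pmf)


def geometric_duration_pmf(d: int, self_loop: float) -> float:
    """Hypothetical helper: implicit duration of a standard HMM state with
    self-transition probability self_loop, i.e. self_loop**(d-1) * (1 - self_loop)."""
    if d < 1:
        return 0.0
    return self_loop ** (d - 1) * (1.0 - self_loop)


if __name__ == "__main__":
    ds = list(range(1, 31))
    # Both choices give a mean duration of 7 steps: 1 + r(1-p)/p = 7 and 1/(1-a) = 7.
    nb = [nbinom_duration_pmf(d, r=4.0, p=0.4) for d in ds]
    geo = [geometric_duration_pmf(d, self_loop=6 / 7) for d in ds]
    # The negative binomial peaks at d = 5; the geometric always peaks at d = 1.
    print("negative binomial mode:", ds[max(range(len(nb)), key=nb.__getitem__)])
    print("geometric mode:", ds[max(range(len(geo)), key=geo.__getitem__)])
```

With r = 1 the negative binomial reduces to the geometric case (p = 1 - a), so it strictly generalizes the standard HMM duration. Its extra parameter also lets the variance exceed the mean, which a Poisson duration cannot do.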
