Author:Canellas, Jorge
Title:Full-text search engines: Analysis and benchmarking of distributed text-search solutions
Publication type:Final Project work
Publication year:2014
Pages:64
Language:eng
Department/School:School of Science (Perustieteiden korkeakoulu)
Main subject:Computer Networks (Tietokoneverkot) (T-110)
Supervisor:Heljanko, Keijo
Instructor:Fabra Caro, Francisco Javier
OEVS:
Electronic archive copy is available via Aalto Thesis Database.
Location:P1 Ark Aalto 1772 | Archive
Keywords:full-text search engines
distributed systems
scalability
Cloudera
Solr
SolrCloud
elastic search
Lucene
HDFS
Abstract (eng): The amount of available data has increased notably in the last few years, exposing scalability problems of storage systems.
Traditional clusters built on expensive storage solutions have proven infeasible.
Many companies cannot afford the investment needed to build and expand such clusters.
Commodity hardware is much cheaper but fails more often.
Fault tolerance has been passed to the application layer, which allows building larger clusters with less investment, thus leading to more powerful systems.
However, the fault tolerance mechanisms have to be taken into account when designing the application.

The most common mechanism used when implementing data storage applications is replication.
Creating several copies of the same data ensures that the data remains available as long as at least one replica is alive.
On the other hand, replication introduces new problems.
Managing replicas can be complicated when modifying existing data.
It is important to make sure that all the replicas store the same version of the data.

Searching huge amounts of data requires new approaches, since non-distributed text search engines cannot return relevant documents quickly enough.
Scaling a text search engine requires that the storage capacity of the cluster can be increased horizontally and that the response time does not increase drastically as the number of computers grows.

The purpose of this work is to analyse two full-text search engines: Elasticsearch and Cloudera's distribution of SolrCloud.
Both are built on Lucene, a search library written in Java.
However, they manage data distribution and scaling in different ways.
We have prepared benchmarks to show how they behave with different setups and how the number of available nodes influences their search and indexing performance.
ED:2014-06-30
INSSI record number: 49350