Inverted Index and Analyzers in ElasticSearch

Posted By: Rozi Ali | 26th February 2021

 

ElasticSearch is an open-source search engine built on top of Apache Lucene, the library that does the actual indexing and searching. It stores data as JSON documents, and because it can infer field types on its own, you don't need to provide a schema before storing your data.

Internally, however, ElasticSearch does keep a schema, called a mapping, which it uses when handing data to Lucene. The mapping defines how each field is indexed and what its data type is. A mapping can be explicit (defined by you) or implicit (inferred by ElasticSearch from the documents you index).
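To make the idea of an explicit mapping concrete, here is a minimal sketch written as a Python dictionary that would be sent as the JSON body when creating an index. The field names (title, views, published_on) are hypothetical and chosen only for illustration.

```python
import json

# Hypothetical explicit mapping for a blog-post index.
# The field names are assumptions made up for this example.
explicit_mapping = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},         # analyzed for full-text search
            "views": {"type": "integer"},      # numeric field
            "published_on": {"type": "date"},  # date field
        }
    }
}

# This JSON would be the request body when creating the index.
print(json.dumps(explicit_mapping, indent=2))
```

With an implicit mapping you skip this step, and ElasticSearch infers the type of each new field from the documents it receives.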

In this blog, we will learn how ElasticSearch is able to process data very rapidly. 

 

Inverted Index

An inverted index is a data structure that supports fast full-text search: it maps each term to the documents that contain it. It is the reason behind the fast search that ElasticSearch provides.

How does it work? Let's understand this with a simple example.

Suppose we insert two documents:

 

Document 1: "It is a beautiful day"

Document 2: "What a beautiful flower"

 

An inverted index of the above documents would look like this:

 

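Term        Document   Position   Frequency (per document)
it          1          1          1
is          1          2          1
a           1, 2       3, 2       1
beautiful   1, 2       4, 3       1
day         1          5          1
what        2          1          1
flower      2          4          1

Positions are listed in the same order as the documents: for example, "a" is at position 3 in document 1 and at position 2 in document 2. Terms are shown lowercased, which is what the default analyzer does.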
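The table above can be reproduced with a few lines of Python. This is only a toy sketch: it lowercases the text and splits on whitespace, whereas the structures Lucene actually uses are far more elaborate. The function name build_inverted_index is made up for this example.

```python
from collections import defaultdict

documents = {
    1: "It is a beautiful day",
    2: "What a beautiful flower",
}

def build_inverted_index(docs):
    """Map each lowercased term to {doc_id: [positions]}; positions start at 1."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, word in enumerate(text.lower().split(), start=1):
            index[word].setdefault(doc_id, []).append(position)
    return dict(index)

inverted_index = build_inverted_index(documents)

for term, postings in inverted_index.items():
    print(term, postings)

# For example, inverted_index["beautiful"] is {1: [4], 2: [3]}
```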

 

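With this data structure, ElasticSearch can look up a search term directly and get the list of matching documents, instead of scanning the text of every document.

By default, ElasticSearch indexes the data in every field, and each indexed field has a dedicated, optimized data structure. For example, text fields are stored in inverted indices, while numeric and geo fields are stored in BKD trees.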
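As a rough illustration of why lookups are cheap, the sketch below searches a hand-written postings dictionary for the two example documents. It assumes AND semantics (every query term must match) and ignores scoring and ranking, which ElasticSearch also performs. The function name search is hypothetical.

```python
# Toy postings: term -> set of IDs of the documents that contain it.
# The values are copied from the two example documents.
postings = {
    "it": {1},
    "is": {1},
    "a": {1, 2},
    "beautiful": {1, 2},
    "day": {1},
    "what": {2},
    "flower": {2},
}

def search(query):
    """Return IDs of documents containing every term in the query (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Copy the first postings set so the index itself is never modified.
    result = set(postings.get(terms[0], set()))
    for term in terms[1:]:
        result &= postings.get(term, set())
    return result

print(search("beautiful"))         # {1, 2}
print(search("beautiful flower"))  # {2}
print(search("sunny day"))         # set()
```

Each query term costs one dictionary lookup, and the matching documents are found by intersecting small sets rather than reading any document text.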

 

ElasticSearch Analyzers

An analyzer is the algorithm that determines how the text of a field is transformed into terms in the inverted index. It first breaks the text into terms and then normalizes them. This is a three-step process:

 

Step 1: Character Filtering

This is a preprocessing step in which the stream of characters is transformed by adding, removing, or changing characters. For example, a character filter can strip HTML tags from the text.
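Character filtering can be sketched in plain Python as follows. The two functions are hypothetical stand-ins loosely modeled on the idea behind ElasticSearch's built-in html_strip and mapping character filters; they are not ElasticSearch code, and the regular expression is a simplification that would not handle every HTML document.

```python
import html
import re

def strip_html(text):
    """Remove HTML tags and decode entities such as &amp;."""
    without_tags = re.sub(r"<[^>]+>", "", text)
    return html.unescape(without_tags)

def map_characters(text, mapping):
    """Replace each key in `mapping` with its value."""
    for old, new in mapping.items():
        text = text.replace(old, new)
    return text

raw = "<p>It is a <b>beautiful</b> day &amp; night</p>"
filtered = map_characters(strip_html(raw), {"&": "and"})
print(filtered)  # It is a beautiful day and night
```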

 

Step 2: Tokenization

In this step, the stream of characters is broken down into terms, also known as tokens. For example, a stream can be tokenized on whitespace to produce individual words.
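A whitespace tokenizer is simple enough to sketch in a few lines. The function name whitespace_tokenize is made up for this example, and positions start at 1 to match the inverted index table earlier in the post.

```python
def whitespace_tokenize(text):
    """Split text on whitespace and return (token, position) pairs; positions start at 1."""
    return [(token, position) for position, token in enumerate(text.split(), start=1)]

tokens = whitespace_tokenize("What a beautiful flower")
print(tokens)
# [('What', 1), ('a', 2), ('beautiful', 3), ('flower', 4)]
```

Note that the tokenizer only splits the text. The tokens still carry their original casing, which is left for the next step to normalize.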

 

Step 3: Token Filters

In this final step, the tokens are filtered and transformed according to the configured rules, for example by lowercasing them or removing stop words.
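Two common token filters, lowercasing and stop-word removal, might look like this in Python. The stop-word list here is a tiny illustrative assumption, not the list ElasticSearch ships with, and the function names are hypothetical.

```python
# Tiny illustrative stop-word list (an assumption for this example).
STOP_WORDS = {"a", "an", "the", "is", "it"}

def lowercase_filter(tokens):
    """Normalize every token to lowercase."""
    return [token.lower() for token in tokens]

def stop_filter(tokens, stop_words=STOP_WORDS):
    """Drop tokens that appear in the stop-word list."""
    return [token for token in tokens if token not in stop_words]

tokens = ["It", "is", "a", "Beautiful", "day"]
terms = stop_filter(lowercase_filter(tokens))
print(terms)  # ['beautiful', 'day']
```

The order of the filters matters: if the stop filter ran first, "It" would not match the lowercase entry "it" and would survive as a term.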

 

The result of the analysis process is then put in the inverted index.
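In ElasticSearch itself, the three steps are usually wired together in the index settings rather than in application code. The sketch below shows one way such a request body might look, written as a Python dictionary. html_strip, standard, lowercase, and stop are built-in components, while the analyzer name my_analyzer and the field name body are hypothetical.

```python
import json

index_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                "my_analyzer": {                      # hypothetical analyzer name
                    "type": "custom",
                    "char_filter": ["html_strip"],    # step 1: character filtering
                    "tokenizer": "standard",          # step 2: tokenization
                    "filter": ["lowercase", "stop"],  # step 3: token filters
                }
            }
        }
    },
    "mappings": {
        "properties": {
            # Hypothetical text field that uses the analyzer defined above.
            "body": {"type": "text", "analyzer": "my_analyzer"}
        }
    },
}

# This JSON would be the request body when creating the index.
print(json.dumps(index_body, indent=2))
```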

ElasticSearch analyzers provide great support for improving search accuracy.


