
Introduction to Elasticsearch

Greetings!

Elasticsearch is a powerful tool that many companies use for a variety of use cases. It is one of the best tools for giving users a seamless search experience. Before jumping in, let's try to understand it at a high level.
Our data already lives in our databases, but we often cannot retrieve it quickly, mostly because of the spreadsheet-like structure of rows and columns. Elasticsearch is document-oriented, meaning it stores each piece of data as a document, as it is. It not only stores documents but also makes them easy to search.

What is Elasticsearch

Elasticsearch is a distributed, free, and open search and analytics engine for all types of data, including textual, numerical, geospatial, structured, and unstructured.
Elasticsearch is built on top of Apache Lucene but hides the complexity behind a simple, coherent RESTful API. Here are a few of its use cases:
  • Search in websites
  • Logging and log analytics
  • Application performance monitoring
  • Geospatial data analysis and visualization
  • Security analytics

Why do we need it

As mentioned above, our data sits in databases and warehouses without a fast way to explore it. Elasticsearch fulfills this requirement by making the data searchable. Other databases can search data too, but they tend to be harder to work with and slower in advanced search scenarios.

We are indexing

One of the main things to understand is that Elasticsearch indexes everything by default. In Elasticsearch, the term "index" is also used for the act of saving data. Relational databases usually create an index only for the primary key column and the columns we specify. By default, Elasticsearch indexes all data in every field, and each indexed field has a dedicated, optimized data structure.

Inverted index

Elasticsearch uses the inverted index data structure to index documents. An inverted index lists every unique word that appears in any document and identifies all of the documents each word occurs in. (Inverted Index)
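
To make the idea concrete, here is a minimal sketch of an inverted index in plain Python. It is only an illustration of the data structure, not how Elasticsearch or Lucene implement it internally. The function name build_inverted_index, the sample documents, and the simple lowercase word splitting are all assumptions made for this example.

```python
import re
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase word to the sorted list of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive tokenization: lowercase the text and split it into word characters.
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}


# Hypothetical sample documents, keyed by document ID.
docs = {
    1: "Elasticsearch makes data searchable",
    2: "Search data with Elasticsearch",
}
index = build_inverted_index(docs)
print(index["elasticsearch"])  # [1, 2]
print(index["search"])         # [2]
```

Looking up a word is now a single dictionary access instead of a scan over every document, which is why this structure suits search so well.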

Easy to start

It is fairly easy to start exploring it. There is a free cloud offering that we can use to practice indexing. If you like, there is a Docker image as well. However, it is not easy to design complex indices and complex queries. It takes time and effort to master them.

Index API

Unlike many other database systems, Elasticsearch provides a REST API to interact with data. Initially it feels odd, but it is nice because we can use any REST client. (Index API)
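
As a rough sketch of what such a call looks like, the snippet below builds a request that indexes one document with PUT /<index>/_doc/<id>, using only the Python standard library. The node address http://localhost:9200, the index name articles, and the helper build_index_request are assumptions for illustration. A real cluster will usually also need authentication, which is left out here.

```python
import json
import urllib.request


def build_index_request(base_url, index, doc_id, document):
    """Build (but do not send) a PUT /<index>/_doc/<id> request."""
    url = f"{base_url}/{index}/_doc/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


request = build_index_request(
    "http://localhost:9200",  # assumed local node without security enabled
    "articles",               # hypothetical index name
    1,
    {"title": "Introduction to Elasticsearch", "tags": ["search"]},
)
print(request.get_method(), request.full_url)
# To actually send it to a running node: urllib.request.urlopen(request)
```

Because it is plain HTTP with a JSON body, the same request could be sent from curl, a browser plugin, or any other REST client.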

Common search architecture

Most modern web and mobile applications need search. It is often handled by Elasticsearch to achieve the desired performance with complex searches. Typically, we keep the main database and, for searches, index the data into Elasticsearch. How the data is ingested varies with the business model:
  • Ingest through a queue like Kafka
  • Ingest only the changes at a regular time interval
  • Ingest everything every day
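
The interval-based option from the list above could be sketched as follows: pick the rows that changed since the last sync and turn them into the newline-delimited JSON body that the _bulk endpoint expects. The row shape (id, name, updated_at), the index name products, and both helper functions are hypothetical. Sending the body with POST /_bulk is left out.

```python
import json
from datetime import datetime


def changed_since(rows, last_sync):
    """Keep only the rows modified after the previous sync."""
    return [row for row in rows if row["updated_at"] > last_sync]


def to_bulk_body(index, rows):
    """Render rows as a newline-delimited JSON body for a _bulk request."""
    lines = []
    for row in rows:
        # Action line: which index and document ID the next line belongs to.
        lines.append(json.dumps({"index": {"_index": index, "_id": str(row["id"])}}))
        # Source line: the document itself.
        lines.append(json.dumps({"name": row["name"]}))
    # Every line, including the last one, must end with a newline.
    return "".join(line + "\n" for line in lines)


# Hypothetical rows read from the main database.
rows = [
    {"id": 1, "name": "Mouse", "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "name": "Keyboard", "updated_at": datetime(2024, 1, 3)},
]
last_sync = datetime(2024, 1, 2)
body = to_bulk_body("products", changed_since(rows, last_sync))
print(body)
```

Only the second row is newer than last_sync, so the body contains a single action line followed by its document.
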
This is a very quick and short introduction to Elasticsearch. Let's explore more in the coming articles.

Happy learning ☺

