
Elastic Index Lifecycle Management


Elastic Stack is quite capable of running blazing-fast queries against your data.
However, the more data you have, the longer those queries take.
Most of the time, though, Elastic Stack will not be the mechanism you use for long-term data retention. Consequently, you will have to get rid of old data.
Let's find out how.

A brief overview of the steps required
  1. Pick a naming convention
  2. Create an index lifecycle management (ILM) policy
  3. Create an index template that connects the alias and the ILM policy
  4. Create an index
  5. Associate an alias with the index
Let's dig into the details.

Pick a naming convention

Depending on the size of your company, your topology, and your team and org structure, you will probably have an Elastic Stack deployment that is shared between several teams and used by multiple services (which benefits distributed tracing).
Namespaces are a useful notion we can borrow when naming ELK objects, even though namespaces themselves are not directly supported. The purpose is, as always, to provide a convention and structure that reflects the purpose and usage of each object, and to partition objects so that similar concepts are easy to lump together and different ones are easy to distinguish.

The key things here are to
  • understand how you will be querying the data, and
  • create a naming convention that is easy to describe with an index pattern in Kibana

For example, one could choose names that follow patterns such as
  • {applicationName}-{environment}-{source}-{objectType}, or
  • {environment}-{source}-{applicationName}
Where
  • Application Name - either a specific service, or a set of related microservices that deliver one business feature
  • Environment - the environment where the services are deployed (Dev, Test, Pre-Prod, Prod)
  • Source - the source of the data, for example: logs, metrics, events, etc.
  • [Optionally] Object Type - the type of ELK object: index, alias, ILM policy, template, etc.
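
To make this concrete, here is a hypothetical set of names following the first pattern (the same names are used in the examples below):

myapplicaton-test-logs-ilm            (ILM policy)
myapplicaton-test-logs-template       (index template)
myapplicaton-test-logs-index-000001   (index)
myapplicaton-test-logs-alias          (alias)

A Kibana index pattern like myapplicaton-*-logs-* would then match this application's log indexes across all environments.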

Create an Index Lifecycle Management Policy

During its lifetime, an index in Elasticsearch can pass through several stages.
The purpose of a policy is to automate the transitions between those stages based on some criteria.
Managing the index lifecycle in the official documentation gives a pretty good description.
Here is how we could create an ILM policy that rolls over to a new index every 2 days (or as soon as the current index exceeds 1GB), and deletes indexes that are older than 10 days:

PUT _ilm/policy/myapplicaton-test-logs-ilm
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "1GB",
            "max_age": "2d"
          }
        }
      },
      "delete": {
        "min_age": "10d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
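
To double-check that the policy was stored as intended, we can read it back; the GET endpoint mirrors the PUT we just issued and also returns the policy version and modification date:

GET _ilm/policy/myapplicaton-test-logs-ilm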

Create an Index Template

Index templates allow you to apply settings and mappings to newly created indexes whose names match a certain index pattern.
Well, that's exactly what we need! We want every index created by the rollover action to receive the same lifecycle settings. Here is how:

PUT _template/myapplicaton-test-logs-template
{
  "index_patterns": ["myapplicaton-test-logs-index*"],
  "settings": {
    "index": {
      "lifecycle": {
        "name": "myapplicaton-test-logs-ilm",
        "rollover_alias": "myapplicaton-test-logs-alias"
      }
    }
  }
}
Basically, for every new index whose name matches the index_patterns, Elasticsearch will apply the settings that attach the lifecycle policy and set the rollover alias.
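
As before, we can read the template back to confirm the index_patterns and lifecycle settings before any indexes are created:

GET _template/myapplicaton-test-logs-template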

Create an Index

The simplest way to enable rollover in a lifecycle policy is to end the index name with a numeric suffix that the ILM policy can automatically increment, for example my-index-name-000001.
Something like this:
PUT myapplicaton-test-logs-index-000001
(index names must be lowercase)
Of course, this is just an example; for production use you should think about mappings and settings such as the number of shards and replicas.
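
For instance, a minimal sketch of creating the first index with explicit shard and replica settings might look like this (the values are placeholders, not recommendations):

PUT myapplicaton-test-logs-index-000001
{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 1
    }
  }
}

Note that the lifecycle settings do not need to be repeated here; the template we created above applies them automatically, because the index name matches its index_patterns.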

Associate an alias with the index

An alias gives us a stable, fixed name that can point to several indexes under the hood, while those indexes change over time.
Only one of those indexes is the write index; the others are only read from.
The fact that the alias is stable and does not change allows us to put it into our application config and forget about it. Nobody wants to manually increment the Elastic log index name in application configs!
The fact that the alias points to multiple indexes that change over time allows us to read from it without worrying about rollover and the automatic creation of indexes.
Imagine we would like to keep 10 days of data, and each day produces several GB. We could roll over every day, so at any moment in time we would have the last 10 indexes, and we could read all that data using just one alias, without worrying about the current index suffix. Awesome feature!
POST /_aliases
{
    "actions" : [
        {
            "add" : {
                 "index" : "myapplicaton-test-logs-index-000001",
                 "alias" : "myapplicaton-test-logs-alias",
                 "is_write_index" : true
            }
        }
    ]
}
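
Once everything is wired up, a quick sanity check (assuming the names above): index a document through the alias, then ask Elasticsearch which indexes the alias points to and what ILM is doing with them.

POST myapplicaton-test-logs-alias/_doc
{
  "message": "hello ILM"
}

GET myapplicaton-test-logs-alias/_alias

GET myapplicaton-test-logs-index-*/_ilm/explain

The first GET shows the concrete indexes behind the alias and which of them is the write index; the _ilm/explain call shows the lifecycle phase and action of each backing index.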

