Elastic Stack is quite capable of running blazing fast queries against your data.
However, the more data you have, the more time it will take to query it.
Most of the time, however, Elastic Stack will not be the mechanism you use for long-term data retention. Consequently, you will have to get rid of the old data.
Let's find out how.
Brief overview of steps required
- Pick a naming convention
- Create an index lifecycle management (ILM) policy
- Create an index template that connects the alias and the ILM policy
- Create an index
- Associate an alias with the index
Let's dig into the details.
Pick a naming convention
Depending on the size of your company, your topology, and your team and org structure, you will probably have an Elastic Stack deployment that is shared between several teams and used by multiple services (which also benefits distributed tracing).
Namespaces are a useful notion that we can leverage while naming ELK objects, even though the notion itself is not directly supported. The purpose is, as always, to provide a convention and structure that reflects the purpose and usage of each object and partitions different objects, so that similar concepts are easy to lump together and different ones are easy to tell apart.
The key thing here is to
- understand how you will be querying the data, and
- create a naming convention that is easy to describe with an index pattern in Kibana
For example, one could choose names that follow such patterns
- {applicationName}-{environment}-{source}-{objectType}, or
- {environment}-{source}-{applicationName}
Where
- Application Name - could be either a specific service, or a set of related microservices that deliver one business feature
- Environment - the environment where the services are deployed (Dev, Test, Pre-Prod, Prod)
- Source - the source of data, for example: logs, metrics, events, etc.
- [Optional] Object type - the type of ELK object: index, alias, ILM policy, template, etc.
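For example, the objects used in the rest of this post follow the first pattern, with myapplicaton as the application name, test as the environment and logs as the source:

myapplicaton-test-logs-ilm
myapplicaton-test-logs-template
myapplicaton-test-logs-index-000001
myapplicaton-test-logs-alias

All the indexes for this application can then be matched in Kibana with a single index pattern such as myapplicaton-test-logs-index*.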
Create an Index Lifecycle Management Policy
An index goes through several lifecycle stages (hot, warm, cold, delete); the purpose of a policy is to automate the transitions between those stages based on some criteria.
Managing Index Lifecycle gives a pretty good description.
Here is how we could create an ILM policy that would roll over to a new index every 2 days, or once the current index exceeds 1GB, and delete indexes that are older than 10 days:
PUT _ilm/policy/myapplicaton-test-logs-ilm
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "1GB",
            "max_age": "2d"
          }
        }
      },
      "delete": {
        "min_age": "10d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
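If you want to double-check the result, you can fetch the policy back:

GET _ilm/policy/myapplicaton-test-logs-ilm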
Create an Index Template
Index templates allow you to apply specific settings and mappings to newly created indexes that match a certain index pattern. Well, that's exactly what we need! We want every index created by the rollover to pick up the same ILM policy and rollover alias. Here is how:
PUT _template/myapplicaton-test-logs-template
{
  "index_patterns": ["myapplicaton-test-logs-index*"],
  "settings": {
    "index": {
      "lifecycle": {
        "name": "myapplicaton-test-logs-ilm",
        "rollover_alias": "myapplicaton-test-logs-alias"
      }
    }
  }
}
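Similarly, the template can be verified with a GET against the same legacy _template endpoint:

GET _template/myapplicaton-test-logs-template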
Create an Index
The simplest way to enable rollover in a lifecycle policy is to add a numeric suffix to the index name, so the ILM policy can automatically increment it, for example myIndexName-000001. Something like this:
PUT myapplicaton-test-logs-index-000001
Of course, this is just an example; for production use you should think about the mappings and settings such as the number of shards and replicas.
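As a minimal sketch (the shard and replica counts and the field mapping below are placeholders, not recommendations), the first index could be created with explicit settings and mappings like this:

PUT myapplicaton-test-logs-index-000001
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "@timestamp": { "type": "date" },
      "message": { "type": "text" }
    }
  }
}

In practice you would usually put these settings and mappings into the index template from the previous step, so every index created by the rollover inherits them.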
Associate an alias with the index
An alias allows us to have a stable, fixed name that can point to several indexes under the hood, where those indexes can change over time. Only one of those indexes is the write index; the others are only read from.
The fact that the alias is stable and does not change allows us to put it into our application config and forget about it. Nobody wants to manually increment the Elasticsearch index name in application configs!
The fact that the alias points to multiple indexes that can change over time allows us to read from it without worrying about rollover and the automatic creation of indexes.
Imagine we would like to keep 10 days of data, and each day brings several GB of data. We could roll over every day, so at any moment in time we would have the last 10 indexes, and we can read all that data using just one alias, without worrying what the current index suffix is. Awesome feature!
POST /_aliases
{
  "actions": [
    {
      "add": {
        "index": "myapplicaton-test-logs-index-000001",
        "alias": "myapplicaton-test-logs-alias",
        "is_write_index": true
      }
    }
  ]
}
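From this point on, the application only needs to know the alias. As an illustration (the query and the document below are just placeholders), searches against the alias hit all indexes behind it, while writes go to the current write index:

GET myapplicaton-test-logs-alias/_search
{
  "query": {
    "match_all": {}
  }
}

POST myapplicaton-test-logs-alias/_doc
{
  "@timestamp": "2020-01-01T00:00:00Z",
  "message": "a log line written through the alias"
}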