Feb 16, 2014
 

Edit: This post is pretty old and Elasticsearch/Logstash/Kibana have evolved a lot since it was written.

Part 4 of 4 (previous parts: Part 1, Part 2, Part 3)

Now that you’ve got all your logs flying through logstash into elasticsearch, how do you remove old records that are no longer doing anything but consuming disk space and RAM for the index?

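Both are built-in functions of elasticsearch. Deleting an index is pretty easy and frees its disk space; closing an index is just as easy and keeps the data on disk while releasing most of the memory it uses.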
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-delete-index.html
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-open-close.html
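For reference, the two API calls documented above boil down to a DELETE on the index name and a POST to its _close endpoint. Here is a minimal sketch in Python that only builds the requests, so nothing is touched until you explicitly send one. The node address (127.0.0.1:9200) and the daily index name logstash-2014.01.01 are assumptions for illustration; substitute your own.

```python
from urllib.request import Request

# Assumption: a local node listening on the default HTTP port.
ES_URL = "http://127.0.0.1:9200"


def delete_index_request(index, base_url=ES_URL):
    """Build (but do not send) the request that deletes an index."""
    return Request(f"{base_url}/{index}", method="DELETE")


def close_index_request(index, base_url=ES_URL):
    """Build (but do not send) the request that closes an index."""
    return Request(f"{base_url}/{index}/_close", method="POST")


if __name__ == "__main__":
    # Hypothetical index name, following a daily logstash-YYYY.MM.DD pattern.
    for req in (close_index_request("logstash-2014.01.01"),
                delete_index_request("logstash-2014.01.01")):
        # Print only; pass req to urllib.request.urlopen() to really send it.
        print(req.get_method(), req.full_url)
```

Building the request separately from sending it makes it easy to eyeball exactly what would be deleted before anything irreversible happens.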

The awesome people working on elasticsearch already have the solution! It’s called curator.
https://github.com/elasticsearch/curator
https://logstash.jira.com/browse/LOGSTASH-211


I like the idea of being able to let a cron job kick off the cleanup so I don’t forget.

To install curator, we’ll first have to install pip.

sudo apt-get install python-pip

Then use pip to install elasticsearch-curator

pip install elasticsearch-curator

When making a cron job, I always use full paths, so first find out where curator was installed:

which curator
/usr/local/bin/curator

Edit the crontab. Any user should have access, so I’ll run this under my own user.

crontab -e

Add the following line to run curator every day at 20 minutes past midnight (system time). It connects to the elasticsearch node on 127.0.0.1, deletes all indexes older than 120 days, and closes all indexes older than 90 days.

20 0 * * * /usr/local/bin/curator --host 127.0.0.1 -d 120 -c 90
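To make the two thresholds concrete, here is a rough sketch in Python of how index names could be sorted into "delete" and "close" buckets by age. This is an illustration only, not curator's actual implementation: it assumes daily indexes named logstash-YYYY.MM.DD, and the function names and the exact boundary handling (strictly older than the cutoff) are my own choices.

```python
from datetime import date, datetime, timedelta

# Assumption: daily indexes named logstash-YYYY.MM.DD.
PREFIX = "logstash-"


def index_date(name, prefix=PREFIX):
    """Return the date encoded in an index name, or None if it doesn't match."""
    if not name.startswith(prefix):
        return None
    try:
        return datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
    except ValueError:
        return None


def plan_cleanup(names, today, delete_days=120, close_days=90):
    """Split index names into (to_delete, to_close) lists based on their age."""
    delete_before = today - timedelta(days=delete_days)
    close_before = today - timedelta(days=close_days)
    to_delete, to_close = [], []
    for name in names:
        day = index_date(name)
        if day is None:
            continue  # not a dated logstash index, leave it alone
        if day < delete_before:
            to_delete.append(name)
        elif day < close_before:
            to_close.append(name)
    return to_delete, to_close


if __name__ == "__main__":
    names = ["logstash-2013.10.01", "logstash-2013.11.01",
             "logstash-2014.02.15", "kibana-int"]
    to_delete, to_close = plan_cleanup(names, today=date.today())
    print("delete:", to_delete)
    print("close:", to_close)
```

Indexes that are old enough to be deleted are not also listed for closing, since closing something you are about to delete is pointless.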

If you prefer an alternative, here’s one written in Perl.
https://github.com/bloonix/logstash-delete-index

  8 Responses to “Removing Old Records for Logstash / Elasticsearch / Kibana”

  1. Good article, helped me a lot! Thank you for that!

    I suggest using long parameters for cron jobs. It increases readability. Also maybe mention the --dry-run argument before executing the cronjob 🙂

  2. Hi, due to a lot of log data coming into my small server, I would like to delete log data that is older than 1 hour. How can I use the above crontab line to delete by hour instead of by day?

  3. Starting with curator 1.1.0 (2014/07/13) its command line syntax has changed. To delete data older than 90 days you’d use
    curator delete --older-than 90

    See: http://www.elasticsearch.org/blog/elasticsearch-curator-version-1-1-0-released/

  4. You can filter out old records with a filter as well. See this example:

    https://github.com/coolacid/GettingStartedWithELK/blob/master/Snippets/Date/drop-old-timestamps.txt

    I had an issue where syslog would see some *very* old timestamps on some machines, creating a ton of useless ES indices.

  5. Curator is in constant development. See the official documentation at https://www.elastic.co/guide/en/elasticsearch/client/curator/current/index.html

  6. You can also set curator to prune indices only once they take up more than a certain amount of disk space. Example:

    */15 * * * * root curator --loglevel CRITICAL delete --disk-space 480 indices --all-indices

    This prunes disk usage down to 480GB.

  7. Thanks, very helpful. It might be worth adding that the above commands work for the 3.5 branch of curator; it didn’t work for me until I specifically installed 3.5:

    pip install elasticsearch-curator==3.5.0

  8. Doesn’t work with 3.5.0 either. OP would/could you possibly put what version you have?
