You cannot back up an Elasticsearch cluster by copying its nodes’ data directories. Similarly, there are no officially supported methods for restoring data from a filesystem-level backup.
Attempting to recover a cluster using this method may result in operational failures, including reports of data corruption, missing files, or other inconsistencies. Sometimes, the restoration process might appear successful while silently omitting portions of the data.
A direct copy of the cluster’s nodes fails to deliver a consistent snapshot of the data at a specific moment. Due to Elasticsearch’s cluster-wide consistency requirements, this issue cannot be resolved by stopping nodes during the copying process or by using atomic filesystem-level snapshots. Instead, the integrated snapshot functionality that Elasticsearch provides must be used to ensure proper backups.
The primary method for backing up Elasticsearch is through the built-in Snapshot and Restore API [2], which provides:
- Point-in-time backups of indices, mappings, and cluster state.
- Incremental snapshots (only changed data is stored).
- Repository options, including shared filesystems, cloud storage, and plugins.
Snapshot and Restore
In Elasticsearch, a snapshot repository is a storage location for snapshots outside of the cluster. You must register a repository before you can take or restore snapshots. Elasticsearch supports several types of repositories, including local shared filesystem-based storage, HDFS, and various cloud options such as Azure, Google Cloud Storage, and Amazon S3.
To use the snapshot and restore feature, either through Kibana or the API, you must have the following permissions:
- Cluster privileges: monitor, manage_slm, cluster:admin/snapshot/*, and cluster:admin/repository.
- Index privileges: all on the monitor index.
The manage_slm privilege allows users to execute Snapshot Lifecycle Management tasks (see the Process section for more details), such as creating or modifying policies and starting or stopping SLM. The cluster:admin/snapshot/* privilege permits users to take and delete snapshots of any index, irrespective of their access to a specific index. Finally, the cluster:admin/repository privilege authorizes users to manage snapshot repositories.
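As a rough illustration, the privileges listed above could be collected into a role definition for the Elasticsearch security role API. The sketch below only builds the request body in Python and does not contact a cluster. The role name snapshot_admin and the helper name are assumptions made for this example, and the index name monitor is taken literally from the text above.

```python
import json


def build_snapshot_role_body():
    """Return a role body carrying the privileges described in this section."""
    return {
        "cluster": [
            "monitor",
            "manage_slm",
            "cluster:admin/snapshot/*",
            "cluster:admin/repository",
        ],
        # Index name copied from the text above; adjust it to your setup.
        "indices": [{"names": ["monitor"], "privileges": ["all"]}],
    }


if __name__ == "__main__":
    # Hypothetical role name; the body would be sent with
    # PUT /_security/role/snapshot_admin
    print("PUT /_security/role/snapshot_admin")
    print(json.dumps(build_snapshot_role_body(), indent=2))
```

Keeping the role in a small builder like this makes it easy to review the granted privileges before applying them.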
Each snapshot repository is separate and independent; no data is shared among registered repositories. However, the same location or bucket can be registered as a repository in multiple Elasticsearch clusters. In this scenario, only one cluster should have write access to the repository, while the others should register it as read-only.
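One way to picture the single-writer arrangement is as two registration bodies for the same location, one per cluster. This is a minimal sketch assuming a shared filesystem repository; the path /mnt/es_backups and the function name are hypothetical, while readonly is the repository setting that marks a registration as read-only.

```python
def repository_bodies(location):
    """Build registration bodies for one writer cluster and one reader cluster."""
    # The writer cluster registers the location normally.
    writer = {"type": "fs", "settings": {"location": location}}
    # Every other cluster registers the same location, but as read-only.
    reader = {"type": "fs", "settings": {"location": location, "readonly": True}}
    return writer, reader


# Hypothetical shared path, used only for illustration.
writer_body, reader_body = repository_bodies("/mnt/es_backups")
```

Each body would be sent to its own cluster with PUT /_snapshot/<repository name>.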
When a cluster is upgraded, it can continue to use the same repository. However, if multiple clusters access the same repository, they should all operate on the same version of Elasticsearch. If a specific version of Elasticsearch modifies a repository, it might not work correctly with other versions. Additionally, you can recover from a failed upgrade using a snapshot taken before the upgrade, even if new snapshots were created during or after the upgrade.
Process
You can register and manage snapshot repositories in Kibana by using the Snapshot and Restore feature or the Snapshot Repository Management API from Elasticsearch [2].
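For the API route, registering a repository comes down to a single PUT request. The sketch below builds such a request with the Python standard library without sending it. It assumes an Azure repository; the cluster URL http://localhost:9200 and the container name es-snapshots are placeholders, and authentication is left out.

```python
import json
import urllib.request


def build_register_repository_request(base_url, repo_name, container):
    """Build (but do not send) a PUT /_snapshot/<repo_name> request."""
    body = {
        "type": "azure",
        "settings": {
            "container": container,  # placeholder container name
        },
    }
    return urllib.request.Request(
        url=f"{base_url}/_snapshot/{repo_name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_register_repository_request(
    "http://localhost:9200", "my_backup", "es-snapshots"
)
# To register the repository against a running cluster:
# urllib.request.urlopen(request)
```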
Snapshot Lifecycle Management (SLM) can automate the snapshot process. It allows the creation of policies to manage regular backups for a cluster. These policies can also delete snapshots based on custom retention rules.
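To make the idea of a policy more concrete, here is a sketch of what an SLM policy body might look like, written as a Python dictionary. The schedule, snapshot name pattern, and retention values are illustrative assumptions, not recommendations; the body would be sent with PUT /_slm/policy/<policy id>.

```python
def build_slm_policy_body(repository):
    """Return an example SLM policy body for nightly snapshots."""
    return {
        "schedule": "0 30 1 * * ?",        # cron expression: daily at 01:30
        "name": "<nightly-snap-{now/d}>",  # date-math pattern for snapshot names
        "repository": repository,
        "config": {
            "indices": ["*"],              # assumption: snapshot all indices
            "include_global_state": True,
        },
        # Custom retention rules; values are placeholders.
        "retention": {
            "expire_after": "30d",
            "min_count": 5,
            "max_count": 50,
        },
    }


policy_body = build_slm_policy_body("my_backup")
```

The retention block is what lets the policy delete old snapshots automatically once they expire, while always keeping at least min_count of them.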
The overview of the snapshot and restore process is as follows:
- Create a repository for storing the data, e.g., an Azure repository, using Kibana or the Snapshot Repository Management API. See [3] for a repository configuration example for Azure.
- Schedule regular snapshots:
  - Define daily or more granular snapshots as needed.
  - Automate creation and management using Elasticsearch’s Snapshot Lifecycle Management (SLM).
- Optionally, you can create manual snapshots when needed:

    PUT /_snapshot/my_backup/snapshot_1
    {
      "indices": "index_1,index_2",
      "ignore_unavailable": true,
      "include_global_state": true
    }

  Alternatively, you can trigger an SLM policy manually to create a snapshot:

    POST /_slm/policy/nightly-snapshots/_execute
- Use the following command to restore from a snapshot:

    POST /_snapshot/my_backup/snapshot_1/_restore
    {
      "indices": "index_1,index_2",
      "rename_pattern": "index_(.+)",
      "rename_replacement": "restored_index_$1"
    }

  This restores the indices index_1 and index_2 from the snapshot named snapshot_1 as restored_index_1 and restored_index_2.
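Before running a restore, it can help to preview which names the rename rule will produce. The following sketch approximates the rename locally with Python's re module. It is not Elasticsearch's own implementation: Elasticsearch applies Java regular expressions, so behavior may differ for patterns beyond simple cases like the one above. The function name is hypothetical.

```python
import re


def preview_restored_names(index_names, rename_pattern, rename_replacement):
    """Approximate the restore rename rule for a list of index names."""
    # Elasticsearch replacements use $1-style group references;
    # Python's re.sub expects \g<1>, so translate them first.
    python_replacement = re.sub(r"\$(\d+)", r"\\g<\1>", rename_replacement)
    return {
        name: re.sub(rename_pattern, python_replacement, name)
        for name in index_names
    }


preview = preview_restored_names(
    ["index_1", "index_2"], "index_(.+)", "restored_index_$1"
)
```

Names that do not match the pattern are returned unchanged, which makes it easy to spot indices that would be restored under their original names.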
Additional Recommendations
- Back up the Elasticsearch configuration files separately. Snapshots do not include data outside the selected Elasticsearch indices.
- Regularly test restoring from snapshots to verify that they can be restored successfully, and document the restoration process step by step in runbooks.
- Monitor the status of snapshot creation, tracking both successes and failures, and set up alerts for failed snapshots.
- Regularly check that all required indices are included in the snapshots.
- Improve snapshot performance by adjusting snapshot throttling parameters (for example, the max_snapshot_bytes_per_sec repository setting) and scheduling snapshots during low-traffic times.
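The monitoring and coverage checks above could be automated along the following lines. This sketch inspects a response shaped like the output of GET /_snapshot/my_backup/_all and reports snapshots that did not succeed or that lack a required index. The function name and the sample response are made up for illustration; a real check would fetch the response from the cluster and feed the result into an alerting system.

```python
def find_snapshot_problems(snapshots_response, required_indices):
    """Return human-readable problems found in a snapshot listing."""
    problems = []
    for snap in snapshots_response.get("snapshots", []):
        name = snap.get("snapshot", "<unknown>")
        state = snap.get("state")
        # Anything other than SUCCESS is reported, including IN_PROGRESS.
        if state != "SUCCESS":
            problems.append(f"{name}: state is {state}")
        missing = sorted(set(required_indices) - set(snap.get("indices", [])))
        if missing:
            problems.append(f"{name}: missing indices {', '.join(missing)}")
    return problems


# Made-up sample data, not output from a real cluster.
sample_response = {
    "snapshots": [
        {"snapshot": "snapshot_1", "state": "SUCCESS", "indices": ["index_1", "index_2"]},
        {"snapshot": "snapshot_2", "state": "PARTIAL", "indices": ["index_1"]},
    ]
}
problems = find_snapshot_problems(sample_response, ["index_1", "index_2"])
```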
Conclusions
Backing up an Elasticsearch cluster requires a systematic approach that employs the built-in Snapshot and Restore API. Relying on filesystem-level copies can lead to data corruption and inconsistencies. The snapshot feature provides dependable, incremental backups that capture indices, mappings, and the cluster state. Moreover, it offers flexibility with various storage repository options.
To ensure data integrity and disaster recovery readiness, you should implement Snapshot Lifecycle Management (SLM) for automated backups, enforce appropriate access permissions, and routinely verify snapshot restorations. Maintaining Elasticsearch configuration backups and monitoring snapshot health are crucial to a robust backup strategy.
By following these best practices, you can safeguard the data in your Elasticsearch clusters and restore it if a failure occurs.
References
[1] Snapshot and Restore: https://www.elastic.co/guide/en/elasticsearch/reference/current/snapshot-restore.html
[2] Snapshot and Restore APIs: https://www.elastic.co/guide/en/elasticsearch/reference/current/snapshot-restore-apis.html#snapshot-restore-repo-apis
[3] Repository Configuration - Azure repository: https://www.elastic.co/guide/en/elasticsearch/reference/current/repository-azure.html