CUSTOMER TIPS

How to Repair an Elasticsearch Cluster

Note: For the purposes of this article, node refers to Elasticsearch node and not to storage node or bizstorenode.

Description

A broken Elasticsearch cluster can manifest in several ways:

  • The statistics look wrong.
  • The supervisor GUI reports that it is unable to connect to Elasticsearch.

Several issues can emerge simultaneously:

  • Elasticsearch may be stopped on some nodes.
  • Several shards may be unassigned.
  • Several shards may be assigned but unallocated.

Investigation Steps

Check overall Elasticsearch status

salt '*' cmd.run 'service elasticsearch status'

Check Elasticsearch cluster consistency

To prevent split-brain situations, the discovery.zen.minimum_master_nodes setting must be consistent with the total number of Elasticsearch nodes.
Gather the minimum number of nodes setting from each node:

salt '*' cmd.run 'grep discovery.zen.minimum_master_nodes /etc/elasticsearch/elasticsearch.yml'

The value should be identical on all members of the cluster.

For a given number (N) of Elasticsearch cluster nodes, it should be set to N/2 + 1 (integer division). For example, for a 5-node cluster, the minimum number of nodes should be set to 3 to prevent split brain.

Important: Make sure to restart the Elasticsearch service whenever an update is made to this setting.
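The quorum formula is plain integer arithmetic; a minimal sketch in shell (the `quorum` helper name is illustrative, not part of any Scality tooling):

```shell
# Quorum for split-brain protection: floor(N/2) + 1.
# Bash arithmetic uses integer division, so N/2 rounds down automatically.
quorum() { echo $(( $1 / 2 + 1 )); }

quorum 5   # -> 3 (matches the 5-node example above)
quorum 6   # -> 4
```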

Check cluster health

curl -Ls http://SUP:4443/api/v0.1/es_proxy/_cluster/health?pretty=true
or
curl http://NODE:9200/_cluster/health?pretty=true
Expected output:

{
  "cluster_name" : "Scality",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 6,
  "number_of_data_nodes" : 6,
  "active_primary_shards" : 36,
  "active_shards" : 72,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

Note: Another means of checking cluster health is to go to http://NODE:9200/_plugin/kopf/ and note the color of the bar at the top.
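When scripting these checks, the status field can be pulled out of the health JSON with sed alone (no jq dependency); a sketch with an abbreviated sample document — in practice, pipe the curl output in:

```shell
# Extract the "status" field from _cluster/health output using only sed.
# The JSON below is an abbreviated sample of the expected output shown above.
health='{ "cluster_name" : "Scality", "status" : "green", "timed_out" : false }'
status=$(echo "$health" | sed -n 's/.*"status" *: *"\([a-z]*\)".*/\1/p')
echo "$status"   # -> green
```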

If the cluster status is red

Compare the “number_of_data_nodes” value with the number of Elasticsearch nodes that should be running (N, computed earlier). If “number_of_data_nodes” < N, list the nodes in the cluster.

curl -Ls http://SUP:4443/api/v0.1/es_proxy/_cat/nodes

or

curl http://NODE:9200/_cat/nodes

1. Check which nodes are missing. For each missing node:

  • SSH to the host system.
  • Check /var/log/elasticsearch/Scality.log to learn why the Elasticsearch node failed:
    1. If there are OutOfMemory errors, it is necessary to raise Elasticsearch’s heap size (refer to Raise Elasticsearch Heap Size below).
    2. For any other issues, contact GS.
  • Kill and restart the Elasticsearch service.

2. Once all nodes are back in the Elasticsearch cluster, re-check the cluster’s health. If it is still not green:

  • If “relocating_shards” is greater than 0, wait for shard relocation to finish.
  • If “relocating_shards” is equal to 0 and “unassigned_shards” is greater than 0, follow the procedure below.

Use the autofix_es_cluster.sh script to force the nodes to assign and allocate the remaining shards.

#!/bin/bash
# Reroute every UNASSIGNED shard to the node given as the first argument.
NODE=$1
IFS=$'\n'
for line in $(curl -s 'localhost:9200/_cat/shards' | fgrep UNASSIGNED); do
    INDEX=$(echo "$line" | awk '{print $1}')
    SHARD=$(echo "$line" | awk '{print $2}')
    curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
      "commands": [
        {
          "allocate": {
            "index": "'"$INDEX"'",
            "shard": '"$SHARD"',
            "node": "'"$NODE"'",
            "allow_primary": true
          }
        }
      ]
    }'
done

3. The script must be run on one of the Elasticsearch hosts (it does not matter which).
Depending on how DNS is configured, its first and only argument should be one of:

  • the host’s FQDN;
  • the host’s short hostname;
  • the host’s IP address.

4. Ignore the output (unless it complains about name resolution). Kopf should reflect a real-time decrease in the number of unassigned shards.
5. When the autofix_es_cluster.sh script has finished, recheck to confirm that the cluster health is green.
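The parsing step at the heart of the script can be illustrated in isolation; a sketch using a sample UNASSIGNED line from `_cat/shards` (the index name and shard number are made up for the example):

```shell
# Sample UNASSIGNED line from `_cat/shards` (first two columns are the
# index name and shard number). Values here are illustrative only.
line='scality-stats-2017.01.01 3 p UNASSIGNED'
INDEX=$(echo "$line" | awk '{print $1}')
SHARD=$(echo "$line" | awk '{print $2}')
echo "$INDEX $SHARD"   # -> scality-stats-2017.01.01 3
```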

Raise Elasticsearch heap size

Raise the memory footprint of Elasticsearch nodes from 4 GB to 8 GB, in order for the nodes to work properly when hosting a large number of indexes.

  1. SSH as root on the Supervisor.
  2. Modify the /srv/scality/pillar/.sls file, which contains pillars common to all servers hosting Elasticsearch (e.g., scality-common.sls).
  3. Add:

     elasticsearch:
       default:
         ES_HEAP_SIZE: 8g

4. Apply the new configuration.

salt -G 'roles:ROLE_ELASTIC' state.sls scality.elasticsearch

5. Restart the Elasticsearch service on all storage nodes that host Elasticsearch.
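To confirm the new heap size took effect, the setting can be checked on each node before restarting; a sketch (the sysconfig path is an assumption and may differ by distribution):

```shell
# Verify ES_HEAP_SIZE on all Elasticsearch nodes, then restart the service.
# The /etc/sysconfig/elasticsearch path is an assumption; on Debian-based
# systems the file is typically /etc/default/elasticsearch.
salt -G 'roles:ROLE_ELASTIC' cmd.run 'grep ES_HEAP_SIZE /etc/sysconfig/elasticsearch'
salt -G 'roles:ROLE_ELASTIC' cmd.run 'service elasticsearch restart'
```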

Downsample old indexes

If the Elasticsearch cluster has been red for a long time, it is likely that there are old indexes waiting to be integrated into their respective *-archive indexes.

Manually run downsampling:

scality-stats-downsampling --config /etc/scality-stats-downsampling.yaml --debug --backtrace

Once downsampling is complete, check that there are no indexes older than 2 days.

curl -Ls http://SUP:4443/api/v0.1/es_proxy/_cat/indices

or

curl http://NODE:9200/_cat/indices
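Whether an index is "older than 2 days" can be decided from its name when a date is embedded; a sketch assuming names end in YYYY.MM.DD (the `scality-stats-` prefix and the `is_old` helper are illustrative — adjust to your actual index naming):

```shell
# Lexical comparison is safe for zero-padded YYYY.MM.DD dates.
# is_old INDEX CUTOFF -> success (exit 0) if the index's date precedes CUTOFF.
is_old() { [ "${1##*-}" \< "$2" ]; }

is_old scality-stats-2017.01.01 2017.01.05 && echo old     # -> old
is_old scality-stats-2017.01.06 2017.01.05 || echo recent  # -> recent
```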

If this does not solve your issues, please get in touch with support.
