CUSTOMER TIPS

NFS performance troubleshooting

Description

This Knowledge Base lists various common reasons an NFS connector (sfused) is slow, and explains how to remediate a situation where NFS clients encounter read or write operation slowness.

Slow NFS because of too many commits

If NFS is slow, watch the occurrence of commits by running fsops as root on the NFS connector:

# fsops -f /run/scality/connectors/sfused/misc/nfs_stats -c "nfs_write nfs_commit"
reading stats from /run/scality/connectors/sfused/misc/nfs_stats
                |     count
                | nfs_write  nfs_commit
07:25:53.313469 |       300         280
07:25:54.321811 |       200         170

When the nfs_write and nfs_commit counts are almost equal, the NFS client is sending too many commits per write operation. This can be caused by:

  • Memory exhaustion on the NFS client,
  • A misconfigured NFS client,
  • A misconfigured client application.

If the NFS client’s system is out of memory, free up memory, then unmount and remount the NFS share.
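
As a quick check, the commit-to-write ratio can be computed directly from fsops samples. The awk one-liner below is a sketch that flags samples where commits exceed roughly 80% of writes; the 0.8 threshold is our assumption chosen for illustration, not an official limit:

```shell
# Two samples in fsops column order: timestamp, nfs_write count, nfs_commit count
# (values taken from the example output in this KB).
stats='07:25:53.313469 300 280
07:25:54.321811 200 170'

# Flag any sample whose commit/write ratio exceeds 0.8 (assumed threshold).
flagged=$(echo "$stats" | awk '$3 / $2 > 0.8 { print $1, $3 "/" $2 }')
echo "$flagged"
```

Both sample lines are flagged here (280/300 ≈ 0.93 and 170/200 = 0.85), which matches the "almost equal" symptom described above.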

Slow NFS because of an oversized local database

With versions prior to 6.4.6/7.4.1/8.0.0, an NFS connector can be slowed down by an oversized internal database:

# ls -l /dev/shm/sfused/nfs_verifiers_dev-1.db
-rw------- 1 root root 46092288 May 7 09:56 /dev/shm/sfused/nfs_verifiers_dev-1.db

You need to:

1. Stop the scality-sfused service:

systemctl stop scality-sfused

2. Remove the /dev/shm/sfused/nfs_verifiers_dev-1.db file:

rm -f /dev/shm/sfused/nfs_verifiers_dev-1.db

3. Start the scality-sfused service:

systemctl start scality-sfused
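
Before applying the procedure, it can help to confirm that the database has actually grown large. The helper below is a sketch; the 32 MiB threshold in the commented invocation is an assumption chosen for illustration, not a documented limit:

```shell
# check_db_size FILE LIMIT: warn when FILE exists and exceeds LIMIT bytes.
check_db_size() {
    file=$1
    limit=$2
    if [ -f "$file" ] && [ "$(stat -c %s "$file")" -gt "$limit" ]; then
        echo "WARNING: $file exceeds $limit bytes; consider the procedure above"
    fi
}

# On a connector (the 32 MiB threshold is our assumption):
# check_db_size /dev/shm/sfused/nfs_verifiers_dev-1.db $((32 * 1024 * 1024))
```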

Scale-out on quota keys

If quotas are enabled and multiple connectors are deployed, the following errors are expected. Every connector updates the same quota keys, so these updates do not scale out across connectors and occasionally conflict; the connector retries automatically, which is why the errors appear:

Jan 17 02:01:30 server sfused: 17274: [info] trace[0x1]:
modules/nasdk/src/libscal/cache/src/cache.c, scal_cache_entry_update:2019: [cache:1] update: upload failed: SCAL_RING_EUNSURE_PUT (-21)
Jan 17 02:01:30 server sfused: 17274: [info] trace[0x1]:
modules/nasdk/src/libscal/quota/src/quota.c, obj_update:398: failed to update object 1D42B90000000000000000000000010D00000040 err=SCAL_CACHE_EUNSURE_PUT(-6)
Jan 17 03:03:12 server sfused: 17274: [info] trace[0x1]:
modules/nasdk/src/libscal/chord/src/chord_put.c, scal_chord_put_consistent_ext:304: [id=1D42B90000000000000000000000010D00000040]: put rsv replica 1 failed:
SCAL_RING_RSVPENDINGREAD (-4)
Jan 17 03:03:12 server sfused: 17274: [info] trace[0x1]:
modules/nasdk/src/libscal/cache/src/cache.c, scal_cache_entry_update:2019: [cache:1] update: upload failed: SCAL_RING_EUNSURE_PUT (-21)
Jan 17 03:03:12 server sfused: 17274: [info] trace[0x1]:
modules/nasdk/src/libscal/quota/src/quota.c, obj_update:398: failed to update object 1D42B90000000000000000000000010D00000040 err=SCAL_CACHE_EUNSURE_PUT(-6)
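
These messages are harmless when occasional, but a sustained storm of them indicates real contention. The sketch below tallies the quota retry messages per hour from a syslog extract; the two input lines reuse the excerpt above, and on a live connector you would instead feed it the syslog file that receives sfused messages (its path varies by distribution):

```shell
# Tally "failed to update object" quota retries per hour.
counts=$(printf '%s\n' \
  'Jan 17 02:01:30 server sfused: 17274: failed to update object 1D42B90000000000000000000000010D00000040 err=SCAL_CACHE_EUNSURE_PUT(-6)' \
  'Jan 17 03:03:12 server sfused: 17274: failed to update object 1D42B90000000000000000000000010D00000040 err=SCAL_CACHE_EUNSURE_PUT(-6)' \
  | grep 'failed to update object' \
  | awk '{ split($3, t, ":"); print $1, $2, t[1] ":00" }' \
  | sort | uniq -c)
echo "$counts"
```

A handful of retries per hour is normal behavior; hundreds per minute would be worth raising with support.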

Not enough workers

Under heavy load, the connector can run out of workers, in which case you will see errors like the following:

Jan 12 03:32:07 server sfused: 10202: [info] trace[0x2] reqid=000018510003E3EE:
modules/sfused/src/src/nfs/nfs_proc.c, nfsproc3_create_3_svc:885: NFS3: CREATE: WARNING: EXCLUSIVE mode not supported. Defaulting to GUARDED without attributes setting.
Jan 12 06:09:03 server sfused: 8768: [info] trace[0x2] reqid=00007E7B00105741:
modules/nasdk/src/libscal/std/src/task.c, scal_task_pool_put_wait_congestion:377: pool chord/queue[20] congestion reached [n_tasks=45]:more than 3 per workers [n_workers=15]
Jan 12 06:09:03 server sfused: 8768: [info] trace[0x2] reqid=00007E7B00105741:
modules/nasdk/src/libscal/std/src/task.c, scal_task_pool_put_wait_congestion:377: pool chord/queue[29] congestion reached [n_tasks=60]:more than 3 per workers [n_workers=15]
Jan 12 06:09:03 server sfused: 9127: [info] trace[0x2]:
modules/nasdk/src/libscal/std/src/task.c, worker_main:229: pool chord end of congestion: tasks/worker is less than 3
Jan 12 06:09:03 server sfused: 9429: [info] trace[0x2]:
modules/nasdk/src/libscal/std/src/task.c, worker_main:229: pool chord end of congestion: tasks/worker is less than 3
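
The numbers in these messages can be checked directly: the pool declares congestion once the tasks-per-worker ratio reaches about 3. A minimal sketch, using the figures from the second congestion message above:

```shell
# Values taken from the congestion log message above.
n_tasks=60
n_workers=15

# Integer tasks-per-worker ratio; the log's threshold is 3 per worker.
per_worker=$(( n_tasks / n_workers ))
if [ "$per_worker" -ge 3 ]; then
    echo "congested: $per_worker tasks per worker (threshold is 3)"
fi
```

Here 60 tasks over 15 workers gives 4 tasks per worker, which is why the pool reports congestion.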

Follow this procedure to raise the number of workers:
1. Edit the /etc/sfused.conf file and raise the n_workers field in the “general” section:

"general": {
[...]
"n_workers": 600,
[...]
}

In the example above, the n_workers configuration value was raised from 500 to 600.
2. Restart the scality-sfused service.

systemctl restart scality-sfused

If this does not solve your issue, please get in touch with Scality Support.
