Details

Type: Bug
Resolution: Fixed
Priority: Major
Fix Version/s: 7.6
Affects Version/s: 7.3, 7.4, 7.5
Component/s: Storage
Labels:
None
Environment:

We take snapshots of running VMs every night on a pool of three XenServer 7.2 hosts (fully patched) connected to two iSCSI SRs with multipath.

Team:
- xs-storage
Internal JIRA Reference:
XSI-75

Description

This bug report is related to XSO-837 and XSO-855. As I did not receive any constructive answer, I investigated the problem myself.

iSCSI SR volume group metadata is randomly corrupted during backups (snapshots) on three different installations. Every time the metadata was corrupted, I noticed that the last process that wrote it was 'vgs' on a slave node (as recorded in /etc/lvm/backup/VG_XenStorage-ead98b75-7449-80e9-a54d-1b0a0c9449ac):

description = "Created *after* executing '/sbin/vgs VG_XenStorage-ead98b75-7449-80e9-a54d-1b0a0c9449ac'"
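To check which command last wrote the metadata on a node, the description line can be pulled out of the backup file. The following is only a minimal sketch, assuming the backup file contains a top-level description line in the form shown above; `last_writer` is a hypothetical helper and not part of the storage manager.

```python
import re

# Matches a top-level `description = "..."` line as found in the LVM backup file.
DESCRIPTION_RE = re.compile(r'^description\s*=\s*"(.*)"\s*$', re.MULTILINE)


def last_writer(backup_text):
    """Return the command quoted in the description line, or None if absent."""
    match = DESCRIPTION_RE.search(backup_text)
    if match is None:
        return None
    description = match.group(1)
    # The description has the form: Created *after* executing '<command>'
    quoted = re.search(r"'([^']*)'", description)
    return quoted.group(1) if quoted else description


# Sample line copied from the report above.
sample = (
    'description = "Created *after* executing '
    "'/sbin/vgs VG_XenStorage-ead98b75-7449-80e9-a54d-1b0a0c9449ac'\"\n"
)
print(last_writer(sample))
```

On a real host, the text would be read from the file under /etc/lvm/backup/ for the volume group in question.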

vgs should not write metadata in a normal situation, but it can do so if it detects an anomaly. Because it is often called from the function '_checkVG' without holding any lock on the SR, it can corrupt the metadata if it writes at the same time as another LVM command is writing from another node.

There is an undocumented flag (absent from the manual, but listed in the command-line help) that prevents vgs from writing to the volume group: "--readonly".

Since I applied the following patch to '/opt/xensource/sm/lvutil.py':

184c184 
<         cmd_lvm([CMD_VGS, vgname]) 
--- 
>         cmd_lvm([CMD_VGS, "--readonly", vgname])

I have had no more volume group metadata corruption!
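For readers who do not have lvutil.py at hand, the effect of the patch can be sketched as follows. This is a self-contained illustration, not the storage manager's code: `build_vgs_cmd` and `check_vg` are hypothetical names, `subprocess` stands in for `cmd_lvm`, and the path for `CMD_VGS` is assumed from the backup description quoted earlier.

```python
import subprocess

CMD_VGS = "/sbin/vgs"  # assumed path, taken from the backup description above


def build_vgs_cmd(vgname, readonly=True):
    """Build the vgs argument list, adding --readonly unless disabled."""
    cmd = [CMD_VGS]
    if readonly:
        # Ask vgs not to write to the volume group metadata.
        cmd.append("--readonly")
    cmd.append(vgname)
    return cmd


def check_vg(vgname):
    """Return True if vgs can report on the volume group, False otherwise."""
    try:
        subprocess.check_call(build_vgs_cmd(vgname))
        return True
    except (subprocess.CalledProcessError, OSError):
        return False


print(build_vgs_cmd("VG_XenStorage-ead98b75-7449-80e9-a54d-1b0a0c9449ac"))
```

The only behavioural change is the extra argument; the existence check itself is untouched.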

This also explains why the corruption happens more frequently on pools with more than two nodes: the more nodes you have, the higher the risk that one of them calls vgs at the wrong time.
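As a rough illustration of that scaling, here is a toy model. It assumes each other node independently runs vgs during a given metadata write with the same probability p; the value of p below is purely hypothetical and not measured on any installation.

```python
def overlap_probability(nodes, p):
    """Chance that at least one other node runs vgs during a write.

    Assumes the other (nodes - 1) hosts behave independently, each with
    probability p of calling vgs inside the write window.
    """
    others = nodes - 1
    return 1 - (1 - p) ** others


# Hypothetical per-node probability, for illustration only.
p = 0.01
for nodes in (2, 3, 5):
    print(nodes, overlap_probability(nodes, p))
```

Under these assumptions the risk grows with every node added to the pool, which matches what I observed qualitatively.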

I think this bug exists in every version of XenServer (even the latest).

Kind regards,
Nicolas Michaux

Attachments

Issue Links

duplicates

XSO-837 SR volume group corrupted with simultaneous snapshots

Done

XSO-855 iSCSI SR volume group corrupted during snapshots

Done

Activity

People

Assignee: Unassigned

Reporter: Nicolas Michaux

Votes: 0

Watchers: 4

Dates

Created: 2018-08-10 02:36

Updated: 2018-09-01 15:35

Resolved: 2018-08-14 15:49