Status: Backlog
Affects Version/s: 7.3, 7.4, 7.5
Fix Version/s: None
We take snapshots of running VMs every night on a pool of 3 XenServer 7.2 hosts (fully patched) connected to 2 iSCSI SRs with multipath.
Since I resolved the LVM metadata corruption from a previous issue (XSO-883), I have noticed another problem during nightly backups: sometimes (it happens randomly), a VDI can't be activated and fails with this error:
The volume group shows no 'Checksum error', and if I launch the backup again a few minutes later, it works fine...
I investigated this problem and found that the commands 'vdi_activate' and 'vdi_deactivate' do not lock the SR while they run on an LVM-based SR (on a file-based SR they do take the lock). I think these commands should lock the SR, because they read metadata that another process on another node may be writing at the same time (I think this is what happened to my backup).
I added these 2 commands to the exclusive operations of the LVM backend SR driver (/opt/xensource/sm/LVHDSR.py):
So far it works fine: I can see (in /var/log/SMlog) the SR lock being acquired on VDI (de)activation, which should prevent this kind of error.
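
To make the idea easier to discuss, here is a minimal, self-contained sketch of how an "exclusive operations" list can decide whether a command runs under the SR lock. It is not the actual LVHDSR.py code: `sr_lock`, `run_command`, `OPS_EXCLUSIVE_PATCHED` and the contents of the lists are hypothetical names and values chosen for illustration, and a `threading.Lock` stands in for the real SR lock, which has to work across hosts.

```python
import threading

# Hypothetical stand-in for the SR-wide lock. The real driver uses its own
# lock implementation; this is only a local model of the behaviour.
sr_lock = threading.Lock()

# Illustrative list only: the real contents of the driver's
# exclusive-operations list are not reproduced here.
OPS_EXCLUSIVE = ["vdi_create", "vdi_delete", "vdi_snapshot"]

# The change described in this report: also treat VDI (de)activation
# as exclusive, so both commands run with the SR lock held.
OPS_EXCLUSIVE_PATCHED = OPS_EXCLUSIVE + ["vdi_activate", "vdi_deactivate"]


def run_command(cmd, handler, exclusive_ops, log):
    """Run handler(), holding the SR lock only if cmd is in exclusive_ops."""
    if cmd in exclusive_ops:
        with sr_lock:
            log.append((cmd, "locked"))
            return handler()
    log.append((cmd, "unlocked"))
    return handler()
```

With the unpatched list, `run_command("vdi_activate", ...)` runs without the lock, so it can read metadata while another operation is writing it. With the patched list, the same call waits for the lock first, which is the behaviour I now see in SMlog.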