XenServer Org / XSO-837

SR volume group corrupted with simultaneous snapshots

    Details

    • Type: Bug
    • Status: Done
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 7.2
    • Fix Version/s: None
    • Component/s: Storage
    • Labels:
      None
    • Environment:

      We use Xen Orchestra to take nightly delta backups of running VMs on a pool of three XenServer 7.1 hosts (fully patched), connected to two iSCSI SRs with multipath.

       

      Description

      Hi,

      The backup (which takes snapshots through XAPI) sometimes corrupts a volume group. It appears that a race condition occurs when several VMs are snapshotted at the same time.

      The backup job calls XAPI to snapshot several VMs concurrently (2 concurrent snapshots by default).
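      The bounded concurrency described above (at most two snapshot requests in flight at once) can be sketched as follows. This is a minimal illustration, not Xen Orchestra's code: `snapshot_vm` is a hypothetical stand-in for the XAPI snapshot call, and a thread pool is assumed as the way concurrency is capped. The tracker records the peak number of overlapping calls, and `max_concurrent=1` serializes them, which is one way to check whether the corruption only appears when snapshots overlap.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ConcurrencyTracker:
    """Records how many snapshot calls overlap at any moment."""

    def __init__(self):
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def enter(self):
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)

    def leave(self):
        with self._lock:
            self.active -= 1


def snapshot_vm(vm_name, tracker):
    """Hypothetical stand-in for a XAPI snapshot call (not the real API)."""
    tracker.enter()
    try:
        time.sleep(0.05)  # pretend the snapshot takes a moment
        return f"{vm_name}-snapshot"
    finally:
        tracker.leave()


def backup_all(vm_names, max_concurrent=2):
    """Snapshot every VM with at most `max_concurrent` calls in flight."""
    tracker = ConcurrencyTracker()
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        # pool.map keeps results in the same order as vm_names
        results = list(pool.map(lambda vm: snapshot_vm(vm, tracker), vm_names))
    return results, tracker.peak


if __name__ == "__main__":
    vms = [f"vm{i}" for i in range(6)]
    snaps, peak = backup_all(vms, max_concurrent=2)
    print(snaps, "peak concurrent snapshots:", peak)
```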

      Sometimes everything runs smoothly, but most of the time we lose a volume group after the backup:

      FAILED in util.pread: (rc 5) stdout: '', stderr: '  /dev/disk/by-scsid/23133613436326131/mapper: Checksum error

      After analyzing the logs, it seems that the pool master itself is corrupting the volume group (according to /etc/lvm/backup, no slave has written to it).
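      The /etc/lvm/backup check mentioned above can be sketched as a small script. This assumes the standard LVM2 text metadata format, in which each backup file records `creation_host`, `creation_time`, and the volume group's `seqno`; comparing the copies collected from every host shows which host wrote the most recent metadata. The function names and the sample contents below are hypothetical illustrations, not output from this pool.

```python
import re

# Fields written by LVM2 into its text-format metadata backups (assumed format).
HOST_RE = re.compile(r'^creation_host\s*=\s*"([^"]*)"', re.MULTILINE)
TIME_RE = re.compile(r'^creation_time\s*=\s*(\d+)', re.MULTILINE)
SEQNO_RE = re.compile(r'^\s*seqno\s*=\s*(\d+)', re.MULTILINE)


def summarize_backup(text):
    """Extract who wrote an LVM metadata backup, when, and at which seqno."""
    host = HOST_RE.search(text)
    ctime = TIME_RE.search(text)
    seqno = SEQNO_RE.search(text)
    return {
        "creation_host": host.group(1) if host else None,
        "creation_time": int(ctime.group(1)) if ctime else None,
        "seqno": int(seqno.group(1)) if seqno else None,
    }


def latest_writer(backups):
    """Given {source: backup_text}, return the summary with the highest seqno."""
    summaries = [summarize_backup(t) for t in backups.values()]
    summaries = [s for s in summaries if s["seqno"] is not None]
    return max(summaries, key=lambda s: s["seqno"]) if summaries else None


if __name__ == "__main__":
    # Hypothetical backup contents collected from two hosts of the pool.
    master = (
        'contents = "Text Format Volume Group"\n'
        'creation_host = "xs-master"  # hypothetical host name\n'
        'creation_time = 1519900000\n'
        'VG_XenStorage-example {\n\tseqno = 118\n}\n'
    )
    slave = (
        'creation_host = "xs-slave1"\n'
        'creation_time = 1519800000\n'
        'VG_XenStorage-example {\n\tseqno = 117\n}\n'
    )
    print(latest_writer({"master": master, "slave1": slave}))
```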

      We have logs of the issue from all servers in the pool if needed.

       

      Best regards,

      Nicolas Michaux

        Attachments

        1. SMlog-010218.zip
          856 kB
        2. SMlog-14032018.tar.gz
          1.23 MB
        3. status-report.zip
          45.16 MB
        4. status-report-2018-03-16-10-07-38.zip
          46.76 MB


              People

              • Assignee:
                Unassigned
                Reporter:
                sarabanjina Nicolas Michaux
              • Votes:
                0
                Watchers:
                6
