After upgrading node after node, we saw storage fail to attach, and "REPAIR" did not help.
A pbd-plug reported that the required volume group could not be found.
A "pvs -v" revealed the whole disaster: some PVs were no longer members of any VG.
Somewhat rattled, we got another coffee and thought about what had gone wrong.
We started looking through the install log:
The log shows a VG removal at two points: first, the correct VG on the local disk is removed.
But before it is recreated, another vgremove is run - and that one removes the VG on the first disk whose name starts with sda, for example sdab.
So we took a deep dive into the script (upgrade.py in the installer.img under /opt/xenserver/installer/) and found the problem beginning at line 227, peaking at line 231:
    if storage_partnum > 0 and self.vgs_output:
        storage_part = partitionDevice(primary_disk, storage_partnum)
        rc, out = util.runCmd2(['pvs', '-o', 'pv_name,vg_name', '--noheadings'], with_stdout = True)
        vgs_list = out.split('\n')
        vgs_output_wrong = filter(lambda x: str(primary_disk) in x, vgs_list)
        vgs_output_wrong = vgs_output_wrong.strip()
        if ' ' in vgs_output_wrong:
            _, vgs_label = vgs_output_wrong.split(None, 1)
            util.runCmd2(['vgremove', '-f', vgs_label])
        util.runCmd2(['vgcreate', self.vgs_output, storage_part])
Just before the vgcreate, another vgremove is run under certain circumstances, but the check is done with "str(primary_disk) in x" - and of course the primary-disk string "sda" is also contained in "sdab" and so on...
Since the result is not looped over here, only the first disk containing "sda" gets its VG removed...
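The flaw is easy to reproduce in plain Python: substring containment on device names matches more disks than intended. The sketch below uses made-up pvs output and a hypothetical helper (not the actual patched installer code) to show a stricter per-field comparison that would avoid the false match:

```python
# Simulated `pvs -o pv_name,vg_name --noheadings` output: the second PV
# ("/dev/sdab") merely *contains* the primary disk's name as a substring.
pvs_out = "  /dev/sda3  VG_XenStorage-local\n  /dev/sdab  VG_XenStorage-other\n"

primary_disk = "/dev/sda"

# The installer's check: plain substring containment over whole lines.
buggy_matches = [line for line in pvs_out.splitlines() if primary_disk in line]
# "/dev/sda" is a substring of "/dev/sdab", so BOTH lines match here.

def is_on_disk(pv_name, disk):
    # Hypothetical stricter check: the PV must be the disk itself or one of
    # its partitions (disk name followed only by digits, e.g. /dev/sda3).
    return pv_name == disk or (pv_name.startswith(disk) and pv_name[len(disk):].isdigit())

safe_matches = [line for line in pvs_out.splitlines()
                if is_on_disk(line.split()[0], primary_disk)]

print(len(buggy_matches))  # 2 - /dev/sdab is caught as well
print(len(safe_matches))   # 1 - only /dev/sda3
```

Note that this simple partition check would need extending for device names like /dev/nvme0n1p3, where the partition suffix is not purely numeric; it only illustrates why matching on the parsed pv_name field beats matching on the whole line.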
In our case this happened on each node - we stopped the upgrade early enough to prevent major downtime, but it was not pretty either.
We resolved the resulting problems with vgcfgrestore - and were extremely happy to have backups of the volume group metadata in /etc/lvm/backup...