Details

Type: Bug
Resolution: Unresolved
Priority: Critical
Fix Version/s: None
Affects Version/s: 7.3, 7.4, 7.5
Component/s: Storage
Labels:
None
Environment:
We take snapshots of running VMs every night on a pool of 3 XenServer 7.2 hosts (fully patched) connected to 2 iSCSI SRs with multipath.


Internal JIRA Reference:
XSI-88

Description

Hi,

This bug report is related to XSO-837 and XSO-855. As I haven't received any constructive answer, I investigated the problem myself.

After I resolved some LVM metadata corruption in a previous issue (XSO-883), I had another LVM metadata corruption this week (not related to vgs): I lost 3 volume groups! It took me hours to restore a stable environment (unplugging stale VBDs, deleting stale snapshots, copying some disks, ...).

This time, the last process that wrote to LVM was an 'lvcreate' command with the tag 'journaler':

['/sbin/lvcreate', '-n', 'coalesce_039c80ce-d70f-4b0d-a185-a95f6ce3b6aa_1', '-L', '4', 'VG_XenStorage-6010cef0-b5ef-a604-bfd3-a1fde94d0d6f', '--addtag', 'journaler']

After investigating your code, it seems that the 'SR._coalesce' function in '/opt/xensource/sm/cleanup.py' runs some LVM commands on the SR without any lock! The calls 'self.journaler.create' and 'self.journaler.remove' create and delete logical volumes on the SR without holding the SR lock. They trash the volume groups when they run at the wrong time:

   def _coalesce(self, vdi): 
       if self.journaler.get(vdi.JRN_RELINK, vdi.uuid): 
           # this means we had done the actual coalescing already and just  
           # need to finish relinking and/or refreshing the children 
           Util.log("==> Coalesce apparently already done: skipping") 
       else: 
           # JRN_COALESCE is used to check which VDI is being coalesced in  
           # order to decide whether to abort the coalesce. We remove the  
           # journal as soon as the VHD coalesce step is done, because we  
           # don't expect the rest of the process to take long 
           self.journaler.create(vdi.JRN_COALESCE, vdi.uuid, "1") 
           vdi._doCoalesce() 
           self.journaler.remove(vdi.JRN_COALESCE, vdi.uuid) 

           util.fistpoint.activate("LVHDRT_before_create_relink_journal",self.uuid) 

           # we now need to relink the children: lock the SR to prevent ops  
           # like SM.clone from manipulating the VDIs we'll be relinking and  
           # rescan the SR first in case the children changed since the last  
           # scan 
           self.journaler.create(vdi.JRN_RELINK, vdi.uuid, "1") 

       self.lock() 
       try: 
           self.scan() 
           vdi._relinkSkip() 
       finally: 
           self.unlock() 

       vdi.parent._reloadChildren(vdi) 
       self.journaler.remove(vdi.JRN_RELINK, vdi.uuid) 
       self.deleteVDI(vdi)
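
To illustrate the kind of hazard described above, here is a toy model of two writers performing an unlocked read-modify-write on shared metadata. It is not the actual LVM or SM code: `ToyMetadata`, `begin_update` and `commit` are hypothetical names invented for this illustration, and the interleaving is forced by hand so the example stays deterministic. Whether the corruption really arises this way is the hypothesis of this report, not something the sketch proves.

```python
import threading


class ToyMetadata(object):
    """Hypothetical stand-in for a volume group's metadata area."""

    def __init__(self):
        self.volumes = frozenset()
        self.lock = threading.Lock()

    def begin_update(self):
        # Step 1 of a read-modify-write: take a private copy of the state.
        return set(self.volumes)

    def commit(self, snapshot):
        # Step 2: write the modified copy back, replacing what is there now.
        self.volumes = frozenset(snapshot)


def unlocked_interleaving():
    meta = ToyMetadata()
    a = meta.begin_update()   # writer A (e.g. a journal create) reads
    b = meta.begin_update()   # writer B (e.g. a snapshot) reads the same state
    a.add("coalesce_journal")
    b.add("snapshot_lv")
    meta.commit(a)
    meta.commit(b)            # B overwrites A's write: A's volume is lost
    return meta.volumes


def locked_sequence():
    meta = ToyMetadata()
    for name in ("coalesce_journal", "snapshot_lv"):
        with meta.lock:       # each read-modify-write completes before the next
            snap = meta.begin_update()
            snap.add(name)
            meta.commit(snap)
    return meta.volumes
```

In the unlocked case only the second writer's volume survives; with the lock, both updates are kept.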

Currently, I'm running a patched version of 'cleanup.py' which holds the SR lock for almost the entire '_coalesce' function, but this is suboptimal, as only some of the calls need a lock on the SR:

   def _coalesce(self, vdi): 
       if self.journaler.get(vdi.JRN_RELINK, vdi.uuid): 
           # this means we had done the actual coalescing already and just  
           # need to finish relinking and/or refreshing the children 
           Util.log("==> Coalesce apparently already done: skipping") 
       else: 
           self.lock() 
           try: 
               # JRN_COALESCE is used to check which VDI is being coalesced in  
               # order to decide whether to abort the coalesce. We remove the  
               # journal as soon as the VHD coalesce step is done, because we  
               # don't expect the rest of the process to take long 
               self.journaler.create(vdi.JRN_COALESCE, vdi.uuid, "1") 
               vdi._doCoalesce() 
               self.journaler.remove(vdi.JRN_COALESCE, vdi.uuid) 

               util.fistpoint.activate("LVHDRT_before_create_relink_journal",self.uuid) 

               # we now need to relink the children: lock the SR to prevent ops  
               # like SM.clone from manipulating the VDIs we'll be relinking and  
               # rescan the SR first in case the children changed since the last  
               # scan 
               self.journaler.create(vdi.JRN_RELINK, vdi.uuid, "1") 
           finally: 
               self.unlock() 

       self.lock() 
       try: 
           self.scan() 
           vdi._relinkSkip() 

           vdi.parent._reloadChildren(vdi) 
           self.journaler.remove(vdi.JRN_RELINK, vdi.uuid) 
           self.deleteVDI(vdi) 
       finally:
           self.unlock()
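
Since only some calls need the lock, a narrower variant might hold the SR lock just around the journal operations instead of around the whole coalesce. The following is a minimal, self-contained sketch of that idea, not a tested patch: `FakeSR`, `FakeJournaler`, `sr_locked` and `coalesce_with_narrow_locks` are hypothetical stand-ins, and it assumes `lock()`/`unlock()` behave as in the excerpts above. Whether `_doCoalesce` itself touches LVM metadata is not established here; if it does, it would need the lock as well.

```python
from contextlib import contextmanager


class FakeJournaler(object):
    """Hypothetical stand-in that records whether the SR lock was held."""

    def __init__(self, sr):
        self.sr = sr
        self.calls = []

    def create(self, kind, uuid, value):
        self.calls.append(("create", kind, self.sr.locked))

    def remove(self, kind, uuid):
        self.calls.append(("remove", kind, self.sr.locked))


class FakeSR(object):
    """Hypothetical stand-in for the SR object; only locking is modelled."""

    def __init__(self):
        self.locked = False
        self.journaler = FakeJournaler(self)

    def lock(self):
        self.locked = True

    def unlock(self):
        self.locked = False


@contextmanager
def sr_locked(sr):
    # Hold the SR lock for the duration of the block; release it on errors too.
    sr.lock()
    try:
        yield
    finally:
        sr.unlock()


def coalesce_with_narrow_locks(sr, uuid, do_coalesce):
    with sr_locked(sr):
        sr.journaler.create("coalesce", uuid, "1")
    do_coalesce()  # the long-running step runs without the SR lock
    with sr_locked(sr):
        sr.journaler.remove("coalesce", uuid)
        sr.journaler.create("relink", uuid, "1")
```

With this shape, other SR operations are only blocked for the short journal updates rather than for the full duration of the coalesce.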

Please don't ask me to provide full logs from XS 7.5, as the '_coalesce' function is the same in every version of XenServer...

I'm posting these bug reports to help you provide a better (open source) product, but I don't understand why such nasty bugs have never been reported before... They are really damaging your reputation and the trust in your product.

Regards,
Nicolas Michaux

Attachments

cleanup.py.patch
4 kB
2018-10-05 00:35

Coalesce is corrupting volume group metadata
