  1. XenServer Org
  2. XSO-887

Coalesce is corrupting volume group metadata

Details

    • Bug
    • Resolution: Unresolved
    • Critical
    • None
    • 7.3, 7.4, 7.5
    • Storage
    • None
    • XSI-88

    Description

      Hi,

      This bug report is related to XSO-837 and XSO-855. As I haven't received any constructive answer, I investigated the problem myself.

      After I resolved some LVM metadata corruption in a previous issue (XSO-883), I had another LVM metadata corruption this week (not related to vgs): I lost 3 volume groups! It took me hours to restore a stable environment (unplugging stale VBDs, deleting stale snapshots, copying some disks, ...).

      This time, the last process that wrote to LVM was an 'lvcreate' command with the tag 'journaler':

      ['/sbin/lvcreate', '-n', 'coalesce_039c80ce-d70f-4b0d-a185-a95f6ce3b6aa_1', '-L', '4', 'VG_XenStorage-6010cef0-b5ef-a604-bfd3-a1fde94d0d6f', '--addtag', 'journaler']
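
The LV name in that command appears to follow the pattern `<journal type>_<VDI uuid>_<value>`, which would match the `self.journaler.create(vdi.JRN_COALESCE, vdi.uuid, "1")` call quoted further down. A minimal sketch, assuming that naming pattern holds, for pulling journal entries out of logged lvcreate argument lists. The helpers `parse_journal_lv_name` and `journal_entry_from_argv` are hypothetical and not part of the sm code:

```python
import re

# Assumed pattern: <journal type>_<VDI uuid>_<value>. Inferred from the logged
# command in this report, not taken from the sm source.
JOURNAL_LV_RE = re.compile(
    r"^(?P<jtype>[a-z]+)_"
    r"(?P<uuid>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})_"
    r"(?P<value>.+)$"
)


def parse_journal_lv_name(lv_name):
    """Split a journal LV name into (type, uuid, value), or return None."""
    match = JOURNAL_LV_RE.match(lv_name)
    if match is None:
        return None
    return match.group("jtype"), match.group("uuid"), match.group("value")


def journal_entry_from_argv(argv):
    """Return the journal entry created by an lvcreate argv, or None."""
    # Only consider commands that tag the new LV as 'journaler'.
    tagged = any(
        flag == "--addtag" and tag == "journaler"
        for flag, tag in zip(argv, argv[1:])
    )
    if not tagged or "-n" not in argv:
        return None
    name_index = argv.index("-n") + 1
    if name_index >= len(argv):
        return None
    return parse_journal_lv_name(argv[name_index])


logged = ['/sbin/lvcreate', '-n',
          'coalesce_039c80ce-d70f-4b0d-a185-a95f6ce3b6aa_1', '-L', '4',
          'VG_XenStorage-6010cef0-b5ef-a604-bfd3-a1fde94d0d6f',
          '--addtag', 'journaler']
print(journal_entry_from_argv(logged))
```

Run against a list of logged commands, this should make it easier to see which VDI a coalesce journal belonged to when the corruption happened.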
      

      After investigating your code, it seems that the 'SR._coalesce' function in '/opt/xensource/sm/cleanup.py' runs some LVM commands on the SR without any lock! The calls 'self.journaler.create' and 'self.journaler.remove' create and delete logical volumes on the SR without holding the SR lock. They trash the volume groups when they run at the wrong time:

         def _coalesce(self, vdi): 
             if self.journaler.get(vdi.JRN_RELINK, vdi.uuid): 
                 # this means we had done the actual coalescing already and just  
                 # need to finish relinking and/or refreshing the children 
                 Util.log("==> Coalesce apparently already done: skipping") 
             else: 
                 # JRN_COALESCE is used to check which VDI is being coalesced in  
                 # order to decide whether to abort the coalesce. We remove the  
                 # journal as soon as the VHD coalesce step is done, because we  
                 # don't expect the rest of the process to take long 
                 self.journaler.create(vdi.JRN_COALESCE, vdi.uuid, "1") 
                 vdi._doCoalesce() 
                 self.journaler.remove(vdi.JRN_COALESCE, vdi.uuid) 
      
                 util.fistpoint.activate("LVHDRT_before_create_relink_journal",self.uuid) 
      
                 # we now need to relink the children: lock the SR to prevent ops  
                 # like SM.clone from manipulating the VDIs we'll be relinking and  
                 # rescan the SR first in case the children changed since the last  
                 # scan 
                 self.journaler.create(vdi.JRN_RELINK, vdi.uuid, "1") 
      
             self.lock() 
             try: 
                 self.scan() 
                 vdi._relinkSkip() 
             finally: 
                 self.unlock() 
      
             vdi.parent._reloadChildren(vdi) 
             self.journaler.remove(vdi.JRN_RELINK, vdi.uuid) 
             self.deleteVDI(vdi)
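
To illustrate the kind of race I suspect, here is a toy model of two writers doing an unlocked read-modify-write on shared metadata. It is only a sketch: `SharedMetadata`, `add_lv` and the LV names are hypothetical, and it makes no claim about how LVM actually stores or updates its metadata.

```python
import copy


class SharedMetadata:
    """Toy stand-in for a metadata area that is always rewritten as a whole."""

    def __init__(self):
        self.stored = {"seqno": 1, "lvs": ["vdi-a"]}

    def read(self):
        return copy.deepcopy(self.stored)

    def write(self, snapshot):
        self.stored = copy.deepcopy(snapshot)


def add_lv(snapshot, name):
    """Modify a private snapshot: add one LV and bump the sequence number."""
    snapshot["lvs"].append(name)
    snapshot["seqno"] += 1
    return snapshot


def unlocked_interleaving(meta):
    # Both writers read before either one writes back.
    writer_a = meta.read()
    writer_b = meta.read()
    meta.write(add_lv(writer_a, "coalesce_journal"))
    meta.write(add_lv(writer_b, "clone_of_vdi-a"))  # overwrites writer A's update


def locked_sequence(meta):
    # With a lock, each read-modify-write completes before the next one starts.
    meta.write(add_lv(meta.read(), "coalesce_journal"))
    meta.write(add_lv(meta.read(), "clone_of_vdi-a"))


racy = SharedMetadata()
unlocked_interleaving(racy)
safe = SharedMetadata()
locked_sequence(safe)
print("unlocked:", racy.stored)
print("locked:  ", safe.stored)
```

In the unlocked run the journal LV silently disappears from the stored metadata, while the locked run keeps both updates.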
      
      

      Currently, I'm running a patched version of 'cleanup.py' which locks the entire '_coalesce' function, but this is suboptimal as only some calls need a lock on the SR:

         def _coalesce(self, vdi): 
             if self.journaler.get(vdi.JRN_RELINK, vdi.uuid): 
                 # this means we had done the actual coalescing already and just  
                 # need to finish relinking and/or refreshing the children 
                 Util.log("==> Coalesce apparently already done: skipping") 
             else: 
                 self.lock() 
                 try: 
                     # JRN_COALESCE is used to check which VDI is being coalesced in  
                     # order to decide whether to abort the coalesce. We remove the  
                     # journal as soon as the VHD coalesce step is done, because we  
                     # don't expect the rest of the process to take long 
                     self.journaler.create(vdi.JRN_COALESCE, vdi.uuid, "1") 
                     vdi._doCoalesce() 
                     self.journaler.remove(vdi.JRN_COALESCE, vdi.uuid) 
      
                     util.fistpoint.activate("LVHDRT_before_create_relink_journal",self.uuid) 
      
                     # we now need to relink the children: lock the SR to prevent ops  
                     # like SM.clone from manipulating the VDIs we'll be relinking and  
                     # rescan the SR first in case the children changed since the last  
                     # scan 
                     self.journaler.create(vdi.JRN_RELINK, vdi.uuid, "1") 
                 finally: 
                     self.unlock() 
      
             self.lock() 
             try: 
                 self.scan() 
                 vdi._relinkSkip() 
      
                 vdi.parent._reloadChildren(vdi) 
                 self.journaler.remove(vdi.JRN_RELINK, vdi.uuid) 
                 self.deleteVDI(vdi) 
             finally:
                 self.unlock()
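
A narrower variant could take the SR lock only around the journaler calls and leave the long-running coalesce step unlocked, as in the original code. The following is a minimal sketch of that idea, not a tested patch: `StubSR`, `StubJournaler`, `StubVDI`, `sr_locked` and `coalesce_narrow` are hypothetical stand-ins, the journal type values are assumed, and the fist point is omitted for brevity. Whether `_doCoalesce` is safe without the SR lock is not established here.

```python
from contextlib import contextmanager

JRN_COALESCE = "coalesce"  # assumed values for vdi.JRN_COALESCE / vdi.JRN_RELINK
JRN_RELINK = "relink"


class StubSR:
    """Hypothetical stand-in for the SR: counts lock depth and traces calls."""

    def __init__(self):
        self.depth = 0
        self.trace = []  # (call name, whether the SR lock was held)
        self.journaler = StubJournaler(self)

    def lock(self):
        self.depth += 1

    def unlock(self):
        self.depth -= 1

    def record(self, name):
        self.trace.append((name, self.depth > 0))

    def scan(self):
        self.record("scan")

    def deleteVDI(self, vdi):
        self.record("deleteVDI")


class StubJournaler:
    """Hypothetical in-memory journal; the real one creates and removes LVs."""

    def __init__(self, sr):
        self.sr = sr
        self.entries = {}

    def get(self, jtype, uuid):
        return self.entries.get((jtype, uuid))

    def create(self, jtype, uuid, value):
        self.sr.record("journal.create:" + jtype)
        self.entries[(jtype, uuid)] = value

    def remove(self, jtype, uuid):
        self.sr.record("journal.remove:" + jtype)
        del self.entries[(jtype, uuid)]


class StubVDI:
    def __init__(self, sr, uuid, parent=None):
        self.sr, self.uuid, self.parent = sr, uuid, parent

    def _doCoalesce(self):
        self.sr.record("_doCoalesce")

    def _relinkSkip(self):
        self.sr.record("_relinkSkip")

    def _reloadChildren(self, child):
        self.sr.record("_reloadChildren")


@contextmanager
def sr_locked(sr):
    """Hold the SR lock for the duration of the with-block."""
    sr.lock()
    try:
        yield
    finally:
        sr.unlock()


def coalesce_narrow(sr, vdi):
    if not sr.journaler.get(JRN_RELINK, vdi.uuid):
        with sr_locked(sr):
            sr.journaler.create(JRN_COALESCE, vdi.uuid, "1")
        vdi._doCoalesce()  # long-running step stays outside the lock
        with sr_locked(sr):
            sr.journaler.remove(JRN_COALESCE, vdi.uuid)
            sr.journaler.create(JRN_RELINK, vdi.uuid, "1")
    with sr_locked(sr):
        sr.scan()
        vdi._relinkSkip()
    vdi.parent._reloadChildren(vdi)
    with sr_locked(sr):
        sr.journaler.remove(JRN_RELINK, vdi.uuid)
    sr.deleteVDI(vdi)


sr = StubSR()
vdi = StubVDI(sr, "child-uuid", parent=StubVDI(sr, "parent-uuid"))
coalesce_narrow(sr, vdi)
for name, held in sr.trace:
    print("%-26s lock held: %s" % (name, held))
```

The trace shows every journal create/remove running under the lock while `_doCoalesce` does not, so other SR operations would not be blocked for the whole duration of the coalesce.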
      

      Please don't ask me to provide full logs from XS 7.5, as the '_coalesce' function is the same in every version of XenServer...

      I'm posting these bug reports to help you provide a better (open source) product, but I don't understand why such nasty bugs were never reported before... They are really damaging your reputation and the trust in your product.

      Regards,
      Nicolas Michaux

          People

            Assignee: Unassigned
            Reporter: Nicolas Michaux (sarabanjina)
            Votes: 1
            Watchers: 6
