XenServer Org / XSO-586

XenServer 7 - Master fails to detect new master when reintroduced into pool


Details

    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: 7.0, 7.1
    • Component/s: Networking
    • Labels: None
    • Environment: Pool with 2 hosts (DL360 G5s). Fresh install of fully patched XS7

    Description

      On failure of the pool master, xe pool-emergency-transition-to-master is successfully executed on a slave.

      When the former (crashed) master is reintroduced to the pool, it fails to detect that a new master is present.

      Reviewing the log, it appears that the former master checks its known slaves for a new master before it has finished initializing its own management interface. Because it has no IP address at that point, it cannot reach the other hosts and wrongly concludes that no new master exists. This ultimately results in 2 masters and a broken pool.

      It appears the sequence (timing) of events may have changed in XenServer 7. The check for a new master should be done only after the management interface has finished initializing.
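      The ordering proposed above can be sketched as follows. This is a Python illustration only, not xapi code (xapi is written in OCaml) and not the actual fix; get_ip, is_master, wait_for_management_ip, find_other_master and startup_master_check are all hypothetical names. The point is that the probe of other hosts is gated on the restarted host having its own management IP, so an unconfigured interface can no longer be mistaken for "no other master".

```python
import time
from typing import Callable, Iterable, Optional

# Hypothetical hooks (NOT real xapi APIs):
#   get_ip()        -> current management IP, or None if not configured yet
#   is_master(host) -> True if `host` reports itself as pool master;
#                      raises OSError (e.g. ConnectionError) if unreachable


def wait_for_management_ip(
    get_ip: Callable[[], Optional[str]],
    timeout: float = 120.0,
    poll: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Block until the management interface has an IP address."""
    deadline = clock() + timeout
    while True:
        ip = get_ip()
        if ip:
            return ip
        if clock() >= deadline:
            raise TimeoutError("management interface has no IP address")
        sleep(poll)


def find_other_master(
    known_hosts: Iterable[str], is_master: Callable[[str], bool]
) -> Optional[str]:
    """Return the first known host that claims to be master, else None."""
    for host in known_hosts:
        try:
            if is_master(host):
                return host
        except OSError:
            # Unreachable host: it tells us nothing, so try the next one.
            continue
    return None


def startup_master_check(
    get_ip: Callable[[], Optional[str]],
    known_hosts: Iterable[str],
    is_master: Callable[[str], bool],
    **wait_kwargs,
) -> Optional[str]:
    """Probe other hosts only once our own management IP is configured."""
    wait_for_management_ip(get_ip, **wait_kwargs)
    return find_other_master(known_hosts, is_master)
```

      If a host is returned, the restarted host would demote itself to a slave of that master instead of resuming as master. Note that waiting for the IP only removes the race described in this report; the sketch still treats an unreachable host as "not master", which is a separate weakness.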

      The log lines below show the check for another master failing (10:44:57) one second before the original master assigned its own management IP address (10:44:58):

      Original Master: 192.168.1.221
      Original Slave (promoted to master): 192.168.1.222
      Log lines below are from 192.168.1.221

      Jul 18 10:44:54 xs7-dev1 xapi: [debug|xs7-dev1|0 |checking no other known hosts are masters D:7b3c4872ea7c|stunnel] stunnel start

      Jul 18 10:44:54 xs7-dev1 xapi: [debug|xs7-dev1|0 |checking no other known hosts are masters D:7b3c4872ea7c|xmlrpc_client] stunnel pid: 2168 (cached = false) connected to 192.168.1.222:443

      Jul 18 10:44:54 xs7-dev1 xapi: [debug|xs7-dev1|0 |checking no other known hosts are masters D:7b3c4872ea7c|xmlrpc_client] with_recorded_stunnelpid task_opt=None s_pid=2168

      Jul 18 10:44:54 xs7-dev1 xapi: [debug|xs7-dev1|131 |dom0 networking update D:e6ff2cdc0f63|xapi] Signalling anyone waiting for the management IP address to change

      Jul 18 10:44:57 xs7-dev1 xapi: [ warn|xs7-dev1|0 |checking no other known hosts are masters D:7b3c4872ea7c|xmlrpc_client] stunnel pid: 2168 caught Xmlrpc_client.Connection_reset

      Jul 18 10:44:57 xs7-dev1 xapi: [debug|xs7-dev1|0 |checking no other known hosts are masters D:7b3c4872ea7c|stunnel] 2016.07.18 10:44:57 LOG3[2168:140690849445952]: connect_blocking: connect 192.168.1.222:443: No route to host (113)

      Jul 18 10:44:57 xs7-dev1 xapi: [debug|xs7-dev1|0 |checking no other known hosts are masters D:7b3c4872ea7c|xapi] Couldn't contact slave on startup check: Stunnel.Stunnel_error("No route to host")

      Jul 18 10:44:58 xs7-dev1 xcp-networkd: [ info|xs7-dev1|35 |Bringing up managed physical PIFs D:e54921b71705|network_utils] /sbin/ip addr add 192.168.1.221/24 dev xenbr0 broadcast +
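      The timing in these lines can be checked mechanically. The following is a small sketch that finds the first line matching each event and reports how many seconds before the management IP was assigned the master check started and failed. The function names are hypothetical, the year (2016) is taken from the stunnel timestamp above because syslog lines carry none, and the match strings are copied from the log lines in this report.

```python
from datetime import datetime
from typing import Dict, Iterable, List, Optional


def parse_syslog_ts(line: str, year: int) -> datetime:
    # Syslog lines start with "Mon DD HH:MM:SS" and carry no year.
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def first_timestamp(lines: List[str], needle: str, year: int) -> Optional[datetime]:
    """Timestamp of the first line containing `needle`, or None."""
    for line in lines:
        if needle in line:
            return parse_syslog_ts(line, year)
    return None


def race_gaps(lines: Iterable[str], year: int = 2016) -> Optional[Dict[str, float]]:
    """Seconds between master-check events and the management IP being set.

    Positive values mean the event happened BEFORE the IP was assigned.
    """
    lines = list(lines)  # scanned several times below
    start = first_timestamp(lines, "checking no other known hosts are masters", year)
    failed = first_timestamp(lines, "Couldn't contact slave on startup check", year)
    ip_up = first_timestamp(lines, "/sbin/ip addr add", year)
    if start is None or failed is None or ip_up is None:
        return None
    return {
        "check_start_to_ip": (ip_up - start).total_seconds(),
        "check_failed_to_ip": (ip_up - failed).total_seconds(),
    }
```

      Fed the log lines above, this should report that the check started 4 seconds and failed 1 second before the address was added to xenbr0.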

      Issue can be reproduced as follows:

      • start with pool with 1 master and 1 slave
      • reset the master host (in testing, a crash was simulated with "echo b > /proc/sysrq-trigger")
      • perform xe pool-emergency-transition-to-master on the slave while the master is busy rebooting
      • this will result in a split pool with 2 masters. The original master fails to detect that the slave has been promoted while it was away
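      The reproduction steps above could be scripted roughly as below. This is a hedged sketch: it assumes root ssh access to both hosts, the function names are hypothetical, and `xe pool-list params=master --minimal` is used on the assumption that it prints the UUID of the host each side believes is master. Commands are passed through an injectable runner so the logic can be exercised without real hosts.

```python
import subprocess
from typing import Callable, Dict, List

# Runner signature: (host, command) -> stdout.
Runner = Callable[[str, str], str]


def ssh_run(host: str, command: str) -> str:
    # Assumption: root ssh access to the pool hosts.
    result = subprocess.run(
        ["ssh", f"root@{host}", command],
        capture_output=True, text=True, timeout=60,
    )
    return result.stdout.strip()


def crash_master(run: Runner, master: str) -> None:
    # Immediate reboot via magic SysRq, as in the report. The session is
    # expected to drop or time out, so errors are deliberately ignored.
    try:
        run(master, "echo b > /proc/sysrq-trigger")
    except Exception:
        pass


def promote_slave(run: Runner, slave: str) -> None:
    run(slave, "xe pool-emergency-transition-to-master")


def master_views(run: Runner, hosts: List[str]) -> Dict[str, str]:
    # Ask each host which host UUID it believes is the pool master.
    return {h: run(h, "xe pool-list params=master --minimal") for h in hosts}


def is_split_pool(views: Dict[str, str]) -> bool:
    # Two hosts reporting different masters means two masters exist.
    return len(set(views.values())) > 1


def reproduce(run: Runner, master: str, slave: str,
              wait_until_up: Callable[[str], None]) -> bool:
    crash_master(run, master)
    promote_slave(run, slave)   # while the old master is still rebooting
    wait_until_up(master)       # hypothetical: block until the host answers again
    return is_split_pool(master_views(run, [master, slave]))
```

      For the pool in this report, reproduce(ssh_run, "192.168.1.221", "192.168.1.222", wait_fn) would return True when the bug is hit, where wait_fn is whatever the tester uses to wait for the old master to finish booting.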

          People

            stephentu Stephen Turner
            sc Salvatore Costantino
            Votes: 3
            Watchers: 7