XenServer Org / XSO-846

XS7.2 patch XS72E015 igb bond flapping problem

    Details

      Description

      Hi there!

      We very often hit an igb bond flapping problem on XS7.2 (with patch XS72E015 applied).

      The problem occurs from time to time, and right now we have no way to reproduce it on demand.

      We hit it almost daily.

       

      What happens?

      One of the igb network interfaces hangs and stops working (sometimes it flaps UP/DOWN) until the driver is reloaded with modprobe -r igb followed by modprobe igb; after that, everything works again. Rebooting the server also helps.
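The reload workaround described above could be scripted roughly as follows. This is a minimal sketch, not an official XenServer tool: the helper name `reload_igb` and its `dry_run` flag are hypothetical. Keep in mind that unloading igb takes down every port driven by that module, not only the hung one.

```python
import subprocess
from typing import List


def reload_igb(dry_run: bool = True) -> List[List[str]]:
    """Unload and reload the igb kernel module (hypothetical helper).

    With dry_run=True the commands are only returned, never executed.
    Running them for real needs root in dom0 and briefly takes down
    every interface driven by igb.
    """
    commands = [
        ["modprobe", "-r", "igb"],  # unload the driver: all igb ports go down
        ["modprobe", "igb"],        # load it again: ports are re-probed
    ]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError if modprobe fails,
            # so the load step is skipped when the unload step failed
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    for cmd in reload_igb(dry_run=True):
        print(" ".join(cmd))
```

The dry-run default is deliberate: printing the commands first makes it harder to drop a remote host's management network by accident.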

      [root@xen ~]# ethtool -i eth0
      driver: igb
      version: 5.3.5.3
      firmware-version: 1.63, 0x800009fd
      bus-info: 0000:01:00.0
      supports-statistics: yes
      supports-test: yes
      supports-eeprom-access: yes
      supports-register-dump: yes
      supports-priv-flags: no
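When the same problem shows up on several hosts, it helps to collect the driver and firmware versions in a comparable form. A minimal sketch that turns `ethtool -i` output ("key: value" per line) into a dict. The function name `parse_ethtool_info` is hypothetical:

```python
from typing import Dict


def parse_ethtool_info(text: str) -> Dict[str, str]:
    """Parse `ethtool -i <iface>` output into a {key: value} dict."""
    info: Dict[str, str] = {}
    for line in text.splitlines():
        # split on the first colon only, so "bus-info: 0000:01:00.0" keeps its value intact
        key, sep, value = line.partition(":")
        if sep:  # ignore lines without a colon
            info[key.strip()] = value.strip()
    return info


sample = """driver: igb
version: 5.3.5.3
firmware-version: 1.63, 0x800009fd
bus-info: 0000:01:00.0"""

info = parse_ethtool_info(sample)
print(info["driver"], info["version"], info["bus-info"])
```

On a live host the text could come from `subprocess.run(["ethtool", "-i", "eth0"], capture_output=True, text=True).stdout` (Python 3.7+).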

      DMESG:
      [Mon Feb 19 20:24:23 2018] -----------[ cut here ]-----------
      [Mon Feb 19 20:24:23 2018] WARNING: CPU: 7 PID: 0 at net/sched/sch_generic.c:306 dev_watchdog+0x193/0x260()
      [Mon Feb 19 20:24:23 2018] NETDEV WATCHDOG: eth0 (igb): transmit queue 0 timed out
      [Mon Feb 19 20:24:23 2018] Modules linked in: tun nfsv3 nfs fscache iptable_filter openvswitch nf_defrag_ipv6 nf_conntrack libcrc32c 8021q garp mrp stp llc dm_multipath ipmi_devintf x86_pkg_temp_thermal coretemp crc32_pclmul dm_mod aesni_intel aes_x86_64 ablk_helper cryptd lrw gf128mul sg glue_helper i2c_i801 shpchp ipmi_si ipmi_msghandler video tpm_tis tpm nls_utf8 isofs nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc ip_tables x_tables hid_generic usbhid hid raid1 md_mod sd_mod ahci libahci libata xhci_pci igb(O) xhci_hcd ptp pps_core scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua scsi_mod ipv6 autofs4
      [Mon Feb 19 20:24:23 2018] CPU: 7 PID: 0 Comm: swapper/7 Tainted: G O 4.4.0+10 #1
      [Mon Feb 19 20:24:23 2018] Hardware name: Supermicro Super Server/B2SS2-F, BIOS 2.0a 06/10/2017
      [Mon Feb 19 20:24:23 2018] 0000000000000000 ffff880087fc3db0 ffffffff8131bb63 ffff880087fc3df8
      [Mon Feb 19 20:24:23 2018] ffffffff8186eb0a ffff880087fc3de8 ffffffff81071d6e 0000000000000000
      [Mon Feb 19 20:24:23 2018] ffff880080d7c000 0000000000000010 ffff880080d7b700 0000000000000007
      [Mon Feb 19 20:24:23 2018] Call Trace:
      [Mon Feb 19 20:24:23 2018] <IRQ> [<ffffffff8131bb63>] dump_stack+0x63/0x90
      [Mon Feb 19 20:24:23 2018] [<ffffffff81071d6e>] warn_slowpath_common+0x9e/0xc0
      [Mon Feb 19 20:24:23 2018] [<ffffffff81071ddc>] warn_slowpath_fmt+0x4c/0x50
      [Mon Feb 19 20:24:23 2018] [<ffffffff815083e3>] dev_watchdog+0x193/0x260
      [Mon Feb 19 20:24:23 2018] [<ffffffff81508250>] ? dev_deactivate_queue.constprop.34+0x60/0x60
      [Mon Feb 19 20:24:23 2018] [<ffffffff810ceb4f>] call_timer_fn+0x5f/0x140
      [Mon Feb 19 20:24:23 2018] [<ffffffff81508250>] ? dev_deactivate_queue.constprop.34+0x60/0x60
      [Mon Feb 19 20:24:23 2018] [<ffffffff810d02b0>] run_timer_softirq+0x220/0x2a0
      [Mon Feb 19 20:24:23 2018] [<ffffffff810761e9>] __do_softirq+0x129/0x290
      [Mon Feb 19 20:24:23 2018] [<ffffffff81076522>] irq_exit+0x42/0x90
      [Mon Feb 19 20:24:23 2018] [<ffffffff813c64d5>] xen_evtchn_do_upcall+0x35/0x50
      [Mon Feb 19 20:24:23 2018] [<ffffffff815a4dae>] xen_do_hypervisor_callback+0x1e/0x40
      [Mon Feb 19 20:24:23 2018] <EOI> [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
      [Mon Feb 19 20:24:23 2018] [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
      [Mon Feb 19 20:24:23 2018] [<ffffffff8100c460>] ? xen_safe_halt+0x10/0x20
      [Mon Feb 19 20:24:23 2018] [<ffffffff81020d67>] ? default_idle+0x57/0xf0
      [Mon Feb 19 20:24:23 2018] [<ffffffff8102149f>] ? arch_cpu_idle+0xf/0x20
      [Mon Feb 19 20:24:23 2018] [<ffffffff810aadb2>] ? default_idle_call+0x32/0x40
      [Mon Feb 19 20:24:23 2018] [<ffffffff810ab00c>] ? cpu_startup_entry+0x1ec/0x330
      [Mon Feb 19 20:24:23 2018] [<ffffffff81013c18>] ? cpu_bringup_and_idle+0x18/0x20
      [Mon Feb 19 20:24:23 2018] --[ end trace 2e96dee36a582c18 ]--
      [Mon Feb 19 20:24:24 2018] igb 0000:01:00.0 eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
      [Mon Feb 19 20:24:24 2018] igb 0000:01:00.1 eth1: igb: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
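The key line in this log is the NETDEV WATCHDOG warning. It comes from the kernel's transmit watchdog (dev_watchdog in net/sched/sch_generic.c, as the trace shows), which fires when a transmit queue stays stopped longer than the driver's timeout. Because the hang is hard to reproduce, a script that scans dmesg for these lines could show how often each interface is affected. A minimal sketch; the regex and the function name `find_tx_timeouts` are assumptions written against the message format shown above:

```python
import re
from typing import List, Tuple

# Matches the dev_watchdog message seen above, e.g.
# "NETDEV WATCHDOG: eth0 (igb): transmit queue 0 timed out"
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)


def find_tx_timeouts(dmesg_text: str) -> List[Tuple[str, str, int]]:
    """Return (interface, driver, queue) for every TX watchdog timeout found."""
    hits: List[Tuple[str, str, int]] = []
    for line in dmesg_text.splitlines():
        m = WATCHDOG_RE.search(line)
        if m:
            hits.append((m["iface"], m["driver"], int(m["queue"])))
    return hits


log = "[Mon Feb 19 20:24:23 2018] NETDEV WATCHDOG: eth0 (igb): transmit queue 0 timed out"
print(find_tx_timeouts(log))  # [('eth0', 'igb', 0)]
```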

      The latest igb version is now 5.3.5.15 (12-19-2017):

      https://sourceforge.net/projects/e1000/files/igb%20stable/

      However, it is an open question whether upgrading will help.
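To check whether a host runs an igb version older than 5.3.5.15, compare the versions numerically. A plain string comparison gives the wrong answer, because "5.3.5.3" sorts after "5.3.5.15" character by character. A minimal sketch; the helper name `version_tuple` is hypothetical and assumes purely numeric dotted versions like the ones in this report:

```python
from typing import Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '5.3.5.15' into (5, 3, 5, 15)."""
    return tuple(int(part) for part in version.split("."))


installed = "5.3.5.3"  # from `ethtool -i eth0` above
latest = "5.3.5.15"    # latest igb release mentioned in this report

# String comparison is wrong here: at the last field, '3' > '1'.
print(installed < latest)                                # False
print(version_tuple(installed) < version_tuple(latest))  # True
```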

       

      The same problem is also described here:

      https://sourceforge.net/p/e1000/bugs/549/?limit=25

      https://discussions.citrix.com/topic/385033-xen-70-p27-intel-network-flapping-in-lacpslb-bond/

       

      Without a bond, the problem occurs only very rarely.

       


            People

            • Assignee: Unassigned
            • Reporter: Gewissler Robert
            • Votes: 0
            • Watchers: 3
