Details
Bug
Resolution: Unresolved
Major
None
8.2
None
XSI-1225
Description
On Citrix Hypervisor 8.2 CU1, we reproduced a user issue involving an SMB ISO SR.
Whenever the SMB share goes down (to reproduce, we simply power off the TrueNAS server), a NULL pointer dereference appears in the kernel logs after a short delay (a few minutes at most). You might need to run `df -h` on the host to provoke it:
[Mon Apr 11 15:17:02 2022] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[Mon Apr 11 15:17:02 2022] PGD 51dce067 P4D 51dce067 PUD 51928067 PMD 0
[Mon Apr 11 15:17:02 2022] Oops: 0000 [#2] SMP NOPTI
[Mon Apr 11 15:17:02 2022] CPU: 1 PID: 9654 Comm: df Tainted: G D O 4.19.0+1 #1
[Mon Apr 11 15:17:02 2022] Hardware name: Dell Inc. Vostro 3550/0917G2, BIOS A07 07/18/2011
[Mon Apr 11 15:17:02 2022] RIP: e030:SMB2_query_info_free+0x8/0x10 [cifs]
[Mon Apr 11 15:17:02 2022] Code: c0 31 f6 48 c7 c7 80 d0 5e c0 31 c0 e8 55 80 b0 c0 eb d9 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 48 8b 07 <48> 8b 38 e9 60 22 fe ff 66 66 66 66 90 48 83 ec 30 4d 63 c0 48 8d
[Mon Apr 11 15:17:02 2022] RSP: e02b:ffffc90040fbfbc8 EFLAGS: 00010246
[Mon Apr 11 15:17:02 2022] RAX: 0000000000000000 RBX: ffffc90040fbfd50 RCX: 0000000000000000
[Mon Apr 11 15:17:02 2022] RDX: ffff8880529b6170 RSI: ffff8880522d0200 RDI: ffffc90040fbfd78
[Mon Apr 11 15:17:02 2022] RBP: ffffc90040fbfe00 R08: 0000000000000000 R09: 0000000000000000
[Mon Apr 11 15:17:02 2022] R10: 0000000000007ff0 R11: 00000108ebd988bc R12: ffff8880529b0800
[Mon Apr 11 15:17:02 2022] R13: ffffc90040fbfc30 R14: ffff8880525cd600 R15: 0000000000000000
[Mon Apr 11 15:17:02 2022] FS: 00007f2e09fc2740(0000) GS:ffff88805a280000(0000) knlGS:0000000000000000
[Mon Apr 11 15:17:02 2022] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[Mon Apr 11 15:17:02 2022] CR2: 0000000000000000 CR3: 000000004f854000 CR4: 0000000000040660
[Mon Apr 11 15:17:02 2022] Call Trace:
[Mon Apr 11 15:17:02 2022] smb2_queryfs+0x13a/0x310 [cifs]
[Mon Apr 11 15:17:02 2022] ? lookup_fast+0xcb/0x2b0
[Mon Apr 11 15:17:02 2022] ? __follow_mount_rcu.isra.42+0x3c/0xf0
[Mon Apr 11 15:17:02 2022] ? walk_component+0x48/0x280
[Mon Apr 11 15:17:02 2022] ? legitimize_path.isra.44+0x28/0x50
[Mon Apr 11 15:17:02 2022] ? terminate_walk+0x55/0xb0
[Mon Apr 11 15:17:02 2022] cifs_statfs+0xb0/0x290 [cifs]
[Mon Apr 11 15:17:02 2022] statfs_by_dentry+0x99/0x120
[Mon Apr 11 15:17:02 2022] vfs_statfs+0x16/0xc0
[Mon Apr 11 15:17:02 2022] user_statfs+0x50/0x90
[Mon Apr 11 15:17:02 2022] __do_sys_statfs+0x20/0x50
[Mon Apr 11 15:17:02 2022] do_syscall_64+0x4e/0x100
[Mon Apr 11 15:17:02 2022] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[Mon Apr 11 15:17:02 2022] RIP: 0033:0x7f2e09acf787
[Mon Apr 11 15:17:02 2022] Code: 2d 00 64 c7 00 16 00 00 00 b8 ff ff ff ff c3 48 8b 15 fd 66 2d 00 f7 d8 64 89 02 48 83 c8 ff c3 0f 1f 00 b8 89 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d9 66 2d 00 f7 d8 64 89 01 48
[Mon Apr 11 15:17:02 2022] RSP: 002b:00007ffdb6353bf8 EFLAGS: 00000206 ORIG_RAX: 0000000000000089
[Mon Apr 11 15:17:02 2022] RAX: ffffffffffffffda RBX: 00000000010df5a0 RCX: 00007f2e09acf787
[Mon Apr 11 15:17:02 2022] RDX: 00007ffdb6353f30 RSI: 00007ffdb6353c00 RDI: 00000000010df5a0
[Mon Apr 11 15:17:02 2022] RBP: 0000000000000001 R08: 00000000010df501 R09: 0000000000000000
[Mon Apr 11 15:17:02 2022] R10: 0000000000000002 R11: 0000000000000206 R12: 00007ffdb6353d40
[Mon Apr 11 15:17:02 2022] R13: 00007ffdb6353d40 R14: 0000000000000000 R15: 0000000000000001
[Mon Apr 11 15:17:02 2022] Modules linked in: arc4 md4 sha512_ssse3 sha512_generic cmac nls_utf8 cifs ccm fscache bnx2fc(O) cnic(O) uio fcoe libfcoe libfc scsi_transport_fc openvswitch nsh nf_nat_ipv6 nf_nat_ipv4 nf_conncount nf_nat 8021q garp mrp stp llc dm_multipath ipt_REJECT nf_reject_ipv4 xt_tcpudp xt_multiport xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_filter sunrpc crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper dm_mod dcdbas dell_smm_hwmon psmouse sg i2c_i801 lpc_ich ip_tables x_tables sr_mod cdrom sd_mod xhci_pci xhci_hcd ahci r8169 libahci serio_raw realtek libata ehci_pci ehci_hcd video backlight scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua scsi_mod xen_wdt ipv6 crc_ccitt
[Mon Apr 11 15:17:02 2022] CR2: 0000000000000000
[Mon Apr 11 15:17:02 2022] ---[ end trace 613fe9e5e7f12df4 ]---
[Mon Apr 11 15:17:02 2022] RIP: e030:SMB2_query_info_free+0x8/0x10 [cifs]
[Mon Apr 11 15:17:02 2022] Code: c0 31 f6 48 c7 c7 80 d0 5e c0 31 c0 e8 55 80 b0 c0 eb d9 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 48 8b 07 <48> 8b 38 e9 60 22 fe ff 66 66 66 66 90 48 83 ec 30 4d 63 c0 48 8d
[Mon Apr 11 15:17:02 2022] RSP: e02b:ffffc900418b3bc8 EFLAGS: 00010246
[Mon Apr 11 15:17:02 2022] RAX: 0000000000000000 RBX: ffffc900418b3d50 RCX: 0000000000000000
[Mon Apr 11 15:17:02 2022] RDX: ffff8880529b6170 RSI: ffff888004490200 RDI: ffffc900418b3d78
[Mon Apr 11 15:17:02 2022] RBP: ffffc900418b3e00 R08: 0000000000000000 R09: 0000000000000000
[Mon Apr 11 15:17:02 2022] R10: 0000000000007ff0 R11: 0000000000000000 R12: ffff8880529b0800
[Mon Apr 11 15:17:02 2022] R13: ffffc900418b3c30 R14: ffff8880525cd600 R15: 0000000000000000
[Mon Apr 11 15:17:02 2022] FS: 00007f2e09fc2740(0000) GS:ffff88805a280000(0000) knlGS:0000000000000000
[Mon Apr 11 15:17:02 2022] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[Mon Apr 11 15:17:02 2022] CR2: 0000000000000000 CR3: 000000004f854000 CR4: 0000000000040660
The same situation was reported to cause host unresponsiveness on XCP-ng in production.
Applying the following patch from kernel.org's 4.19 branch solved it for us: `df -h` now answers correctly, listing the devices without the SMB share while it is disconnected, and without blocking other operations:
https://github.com/xcp-ng-rpms/kernel/commit/2dd7b1f8feca463393e01b491d7e95b6fb6b3615
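For repeated reproduction attempts, the `df -h` trigger can be narrowed to the single statfs call that appears in the call trace (`__do_sys_statfs` down to `cifs_statfs`). The following is only a sketch, assuming Python 3 is available on the host; `probe_statfs` is a hypothetical helper name, and the mount point of the SMB ISO SR has to be supplied by the tester. The call runs in a child process so that a hang or a task killed by the oops does not take the calling shell with it.

```python
import subprocess
import sys


def probe_statfs(path, timeout_s=10.0):
    """Run statvfs(path) in a child process.

    Returns "ok" if the call succeeded, "error" if the child exited
    abnormally (for example ENOENT, or killed after a kernel oops),
    and "timeout" if it did not finish within timeout_s seconds.
    """
    child_code = "import os, sys; os.statvfs(sys.argv[1])"
    proc = subprocess.Popen(
        [sys.executable, "-c", child_code, path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        returncode = proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Best effort only: a task blocked inside the kernel may not exit,
        # so we do not wait for it again after sending the kill signal.
        proc.kill()
        return "timeout"
    return "ok" if returncode == 0 else "error"


if __name__ == "__main__":
    # Usage: python3 probe.py <mount point of the SMB ISO SR>
    print(probe_statfs(sys.argv[1]))
```

Checking `dmesg` after each run is still necessary to confirm whether an "error" result corresponds to the oops above or to an ordinary I/O error.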