Word of caution about XCP-ng 8.1 and TrueNAS upgrade with iSCSI over Chelsio NICs

I’m not certain what it is about this specific combo, but when I took the outage and updated both XCP-ng and FreeNAS (to TrueNAS) a few weeks back, iSCSI stopped working reliably. The NICs were clearly still communicating, but the share would always break before or during an import of the existing disks. To recover, I switched my iSCSI share over to my 1Gb NICs, created a new NFS share on the 10Gb NICs, and then migrated everything from the iSCSI share to the NFS share.

Side note: Be sure to run your Xen Orchestra VM on local storage!


I had this problem too, but I didn’t have time to debug it, so I reverted my TrueNAS server back to the last FreeNAS build. It was weird: I could get Xen Orchestra and XCP-ng Center to see the storage, but on the last step, while finalizing the connection, TrueNAS would log that the connection was made and then lost, over and over.

I am going to wait until U3 before giving it another shot, as this Reddit post suggests. The post links to a release schedule that implies waiting for U3 before using it in production.

Out of curiosity, what Chelsio NICs are you using? I upgraded from FreeNAS to TrueNAS 12.0 U1 back in November (I think) without issue. It was actually the easiest upgrade I’ve experienced. I’m running 2x Chelsio T520-CR cards in the TrueNAS box, and one each in my XCP-ng hosts. I only updated the XCP-ng hosts to 8.2 a few weeks ago, so the two upgrades were done a few months apart. I’d hate to recommend introducing more downtime, but if it were me, I’d follow the scientific method: set two maintenance windows and upgrade only one system at a time.

Looks like I’m running on older cards: Chelsio Communications Inc T320 10GbE Dual Port Adapter in both machines.

fwiw, when I was troubleshooting a failed NIC a couple years back, I learned that Chelsio NICs happily take new firmware when the OS on the computer provides it. My current theory is that one of the updates that happened at the host OS level (so the underlying FreeBSD or Linux) included new firmware and that the two versions don’t play nice over iSCSI. I don’t really have any supporting evidence for this, though, other than I’ve seen similar issues with SMB shares on these same NICs, so I know that they can be badly behaved at the protocol level.
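If anyone wants to test the firmware theory, one rough way is to record what each Linux-based host reports before and after an upgrade. The sketch below assumes `ethtool` is installed on the host, and it only covers the XCP-ng side; the FreeBSD-based TrueNAS box would need a different tool. The function names and the interface name are placeholders of mine, not anything from XCP-ng.

```python
import subprocess


def parse_ethtool_info(text):
    """Turn `ethtool -i` output ("key: value" lines) into a dict."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as PCI
        # addresses (which contain colons) stay intact.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def nic_firmware(interface):
    """Return (driver, firmware-version) for one interface on a Linux host.

    `interface` is whatever name the host gives the NIC; "eth0" in the
    note below is only a placeholder.
    """
    result = subprocess.run(
        ["ethtool", "-i", interface],
        capture_output=True,
        text=True,
        check=True,
    )
    info = parse_ethtool_info(result.stdout)
    return info.get("driver"), info.get("firmware-version")
```

Calling `nic_firmware("eth0")` on each host and keeping the results alongside the upgrade dates would at least show whether the firmware changed when the OS did.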

/me gestures at my Mellanox NICs that all work flawlessly

*sigh*