Good day, everyone.
I have the following setup: XCP-ng hosting 20 Windows 10 VMs, with the VDIs stored on an NFS share on an all-SSD TrueNAS.
These are connected over a 10G switch, and I've been monitoring usage because I was worried the 10G switch would be a bottleneck. However, while the VMs are running a bit slow, the traffic reports for the NIC on TrueNAS and for the switch show a peak of only about 1.78 Gb/s.
Does this seem right? For 20 Windows VMs that are all in active use at the same time, it seems like very little traffic.
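To put the 1.78 Gb/s peak in perspective, here is a rough per-VM breakdown. It is only a sketch under stated assumptions: the traffic is treated as spread evenly across the 20 VMs, and decimal units are used (1 Gb = 10^9 bits). On these numbers alone, the 10G link itself does not look saturated.

```python
# Rough per-VM share of the peak NFS traffic reported in the post.
# Assumptions: 1.78 Gb/s is the peak seen on the TrueNAS NIC, the traffic
# is spread evenly over the 20 VMs, and 1 Gb = 10**9 bits (decimal units).

LINK_GBPS = 10.0   # switch / NIC line rate
PEAK_GBPS = 1.78   # highest rate observed on the TrueNAS NIC
VM_COUNT = 20


def link_utilization(observed_gbps: float, link_gbps: float) -> float:
    """Fraction of the link in use (0.0 to 1.0)."""
    return observed_gbps / link_gbps


def per_vm_mbytes_per_s(observed_gbps: float, vms: int) -> float:
    """Average share per VM in megabytes per second (decimal MB)."""
    bits_per_s = observed_gbps * 1e9
    return bits_per_s / 8 / 1e6 / vms


if __name__ == "__main__":
    print(f"link utilization: {link_utilization(PEAK_GBPS, LINK_GBPS):.1%}")
    print(f"per-VM share: {per_vm_mbytes_per_s(PEAK_GBPS, VM_COUNT):.1f} MB/s")
```

With the figures from the post, that works out to roughly 18% of the link and about 11 MB/s per VM.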
I’d say that depends on what the VMs are doing, i.e. whether they are running high disk I/O processes. Can you specify what you mean by “actively being used”?
Users are remoting into the VMs to work: spreadsheets, PDFs, Word documents, web browsing and other office tasks. All 20 users work the same schedule, so they are all working at the same time.
I just remember planning this out and being told by a few people that local storage was my best bet because the 10Gb switch would be the bottleneck for 15 Windows VMs. Now that we're up to 20, it surprised me to see so little of it actually being used.
Well, I haven't found the bottleneck yet, but I tried to transfer a VM from TrueNAS to local storage. Both are thin provisioned and the VM has a 140 GB VDI. The transfer took 1.5 hours, which seems excessive. During the transfer I checked Insights on the UniFi Aggregation switch, and the highest rate was 1.44 Gbps, so I don't think it's the switch.
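The average rate of that copy can be worked out from the size and duration and compared with the 1.44 Gbps peak. This sketch assumes the full 140 GB was actually copied (with thin provisioning less data may have moved, which would make the real average even lower) and uses decimal units.

```python
# Average throughput of a completed copy, compared with the peak seen on the switch.
# Assumptions: the full 140 GB VDI was copied (thin provisioning may mean less
# data actually moved, making the true average lower still), and GB/Gb are
# decimal units (10**9), so 1 GB = 8 Gb.


def average_gbps(size_gb: float, hours: float) -> float:
    """Average rate in gigabits per second for size_gb gigabytes moved in `hours`."""
    bits = size_gb * 8          # gigabytes -> gigabits
    seconds = hours * 3600      # hours -> seconds
    return bits / seconds


if __name__ == "__main__":
    avg = average_gbps(140, 1.5)
    peak = 1.44  # highest rate shown by the UniFi switch during the copy
    print(f"average: {avg:.2f} Gb/s, peak: {peak} Gb/s, ratio: {avg / peak:.0%}")
```

That comes to roughly 0.21 Gb/s on average, about a seventh of the peak, so the copy appears to spend most of its time well below the rate the switch briefly reported.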
I did; all MTUs are set to 9000. I moved the VMs to local SR, and according to the users the VMs run faster and no longer freeze, so there is definitely some issue with network speed.
I also noticed during the migration that even though I selected the 10Gb network and migrated while the VM was powered off, migrating a 100 GB VM while also moving its disk to another SR takes close to 2 hours. It seems to be migrating at a fraction of the 10Gb network speed.
Any ideas?
I would say this is a badly implemented change of the MTU to 9000. In most cases the MTU should stay at 1500 unless you have a specific use case for jumbo frames. I would almost guarantee it's because of your MTU change.
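If jumbo frames are kept, one common way to check that MTU 9000 really works end to end (host NIC, switch and TrueNAS) is a ping with fragmentation prohibited and the largest payload that fits. With Linux iputils ping that is `ping -M do -s <payload> <target>`, where the payload is the MTU minus the 20-byte IPv4 header and the 8-byte ICMP header. The sketch below only computes that payload and builds the command line; it assumes plain IPv4 without header options, and the target address is a documentation placeholder, not anything from the thread.

```python
# Largest ICMP echo payload that fits in one unfragmented IPv4 packet.
# Assumes plain IPv4 (20-byte header, no options) and an 8-byte ICMP header.

IPV4_HEADER = 20
ICMP_HEADER = 8


def max_ping_payload(mtu: int) -> int:
    """Payload size for `ping -s` so the packet exactly fills the MTU."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def ping_command(target: str, mtu: int) -> list[str]:
    """Linux iputils ping: -M do sets Don't Fragment, -c 3 sends three probes."""
    return ["ping", "-M", "do", "-c", "3", "-s", str(max_ping_payload(mtu)), target]


if __name__ == "__main__":
    # 192.0.2.10 is a placeholder (TEST-NET-1), not the real TrueNAS address.
    for mtu in (1500, 9000):
        print(mtu, " ".join(ping_command("192.0.2.10", mtu)))
```

If the 1500 variant succeeds but the 9000 one (payload 8972) fails, some device in the path is not passing jumbo frames.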
Thanks for your reply. I only made the MTU change recently to see if it would help; I had the traffic issues before that. As for the VM migration, I just tried changing the MTU back to 1500 and it didn't make a difference. I hadn't noticed this before because my VMs were on shared storage, and migrating them from server to server didn't require migrating the VDIs.
As mentioned before, I migrated them to local storage to see if it would help VM performance, and it did, so for now I'll leave them there. But this creates an issue if I want to migrate them from one host to another for host maintenance, as the migration will take hours.
Doing some Google searches, I've come across many others seeing the same thing: migrations over 10Gb networks running at only about 1Gb. Some have mentioned it's a limitation of XCP-ng, which doesn't sound right to me. Unfortunately, nobody in any of those threads has confirmed being able to migrate VDIs over a 10Gb network at anywhere near 10Gb, so I guess it could indeed be a limitation.
Doh… Yes, it goes through the firewall. I don't know why I never thought about excluding it. I'll make those changes and let you know.
Thanks for the great idea.
So I tested a migration again, now bypassing the firewall, and there was no change. @xMAXIMUSx by chance, do you have a 10Gb connection you can use to migrate VDIs? Or 2.5Gb; anything faster than 1Gb will do. Can you confirm you are able to transfer VDIs at a faster speed?
I have 2 desktops at home I was going to do some testing on this morning, but I just realized they only have 1Gb, so they won't be of much help.
We have 10Gb with XCP-ng and Xen Orchestra. We set the migration network to the iSCSI network; you might give that a try. You might also want to set your MTU back to 1500 on all network devices if you haven't done so already. BTW, I only get about 1.5 to 2Gb when migrating. We have really good hardware too, configured properly. It's just a slow process.
Home > Pool > Advanced
Scroll to the bottom; under "Misc" you can set the default migration network.
OK, so I'm not getting the 1.5-2Gb you do, but my hardware is a bit older, so it's likely related to that. The problem then, as mentioned before, is that XCP-ng can't get anywhere near 10Gb, which is what I was trying to achieve for my disk migrations.
As for hosting the VDIs on a NAS with a 10Gbps connection, I wasn't seeing the same performance as with local storage, so I just need to keep testing and see where the bottleneck might be.
OK, got it. I was just curious because my math isn't adding up: even at 1Gbps, a 100 GB VM should take around 20 minutes, I think, and it's taking close to 2 hours, so I wanted to know what you were seeing.
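The back-of-the-envelope math can be checked directly. This sketch assumes decimal units (1 GB = 8 Gb), and the 80% "efficiency" figure is only a rough allowance for protocol overhead, not a measured value. It also includes the 1.5 to 2Gb rates mentioned earlier in the thread for comparison.

```python
# Expected time to move a disk image at a given sustained rate, versus the
# ~2 hours observed for the 100 GB VM. Assumptions: decimal units
# (1 GB = 8 Gb), and the efficiency factor is a rough allowance for
# protocol overhead, not a measured value.


def transfer_minutes(size_gb: float, rate_gbps: float, efficiency: float = 1.0) -> float:
    """Minutes to move size_gb gigabytes at rate_gbps * efficiency gigabits/s."""
    seconds = size_gb * 8 / (rate_gbps * efficiency)
    return seconds / 60


if __name__ == "__main__":
    size_gb = 100
    observed_minutes = 120  # "close to 2 hours"
    for rate in (1.0, 1.5, 2.0, 10.0):
        ideal = transfer_minutes(size_gb, rate)
        padded = transfer_minutes(size_gb, rate, efficiency=0.8)
        print(f"{rate:>4} Gb/s: {ideal:5.1f} min at line rate, {padded:5.1f} min at 80%")
    # Rate implied by the observed duration, in Gb/s.
    implied = size_gb * 8 / (observed_minutes * 60)
    print(f"observed 2 h implies about {implied:.2f} Gb/s")
```

At 1Gbps the estimate is about 13 minutes at line rate and about 17 minutes at 80%, so "around 20 min" is a reasonable padded figure. Two hours implies roughly 0.11 Gb/s, about a tenth of even a 1Gb link.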