I feel the need, the need for speed (but I don't seem to be getting it)

Hi All,

Looking for some real world advice here regarding expected and actual speeds when moving data around. It’s a bit of a long one (sorry) but I wanted to get as much info down as I could!

I have two older HP servers (ML350 G6), both with P410i RAID cards, a RAID 1 array (250GB on one, 480GB on the other) and a RAID 5 array (5.5TB on both). Both are running XCP-ng 8.2.0 using ext storage, not LVM. Each server has a management interface untagged on a UniFi gigabit switch and a “VM” interface tagged on the same switch.

I’m trying to migrate VMs from one server to the other. Live migrate works for the most part, but I have a backup server (~2TB of drive space) that consistently fails to migrate without giving a reason. I have looked at the failure previously and think it’s related to the space required to move 2TB actually being more than the 5.5TB available on the destination array.
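
For the space theory, the SR usage can be pulled straight from the CLI with the standard xe list command (nothing clever here):

xe sr-list params=name-label,physical-size,physical-utilisation

Comparing physical-utilisation to physical-size on the destination SR should show whether the 2TB VM (plus any snapshot overhead during the move) genuinely doesn’t fit.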

Either way, I did some testing on the array before installing XCP-ng and found that my write speeds on the RAID 5 array are circa 135MB/s, which seems reasonable (using dd if=/dev/zero of=/dev/sda bs=1024M count=1024). 135MB/s is 1080Mbit/s, so I should be able to max out a 1Gbit network connection whilst writing data. Read speeds are faster, so it will be the write process that is the bottleneck. I can also calculate that moving a 1.6TB disk image (the size of one of the disks) should, in theory, take about 3.5 hours…
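
One caveat on my own dd numbers: without a sync flag, dd from /dev/zero can report cached rather than sustained write speed, so a slightly more honest version of the same test would be something along the lines of

dd if=/dev/zero of=/dev/sda bs=1M count=100000 oflag=direct

(or adding conv=fdatasync instead). The arithmetic behind the estimate is just 135MB/s x 8 = 1080Mbit/s, and 1.6TB / 135MB/s is roughly 11,800 seconds, i.e. about 3.3 hours.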

When I actually come to move the data, the sort of time estimates I am seeing are more like 35 hours, not 3.5 hours. I'm currently using Clonezilla booted in network client/server mode on the source and destination servers. Both servers are on the same VLAN, so no routing is involved. I tried with and without compression and that made no difference.
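
One way I can think of to separate the disks from the network is a raw pipe over netcat rather than Clonezilla: something like nc -l 9000 > /dev/null on the destination (nc -l -p 9000 on older netcats) and

dd if=/dev/zero bs=1M count=10000 | nc <destination-ip> 9000

on the source, where the IP and port are just placeholders. That keeps the destination disks out of the picture entirely, so if that also crawls along at the same 10-15MB/s, the problem is in the network path rather than the arrays.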

Now, I know the speeds above are all theoretical maximums and that I will never actually achieve them (although the disk test was a fairly real-world case of block-copying data), but I would have expected something better than 10% of the calculated speed. I’ve not tried iPerf yet to check my network throughput; I probably should have done that before posting, but I will do so when the current move has finished.
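
For reference, the iPerf test I have in mind is just the basic one: iperf3 -s on one server and iperf3 -c <server-ip> on the other (plus -R to check the reverse direction), with the IP as a placeholder and plain iperf as a fallback if iperf3 isn’t on the boot media. A healthy gigabit link should report somewhere around 940Mbit/s of usable TCP throughput.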

What am I missing? Thoughts, tips, pointers all gratefully received.

For the record, I have also now run iPerf across various combinations of tagged and untagged connections to different VMs, and it maxes out the 1Gb connection.