VDI Infrastructure redesign advice

Hi everyone, here is some input first:

I manage IT for a small group of hotels, and we have been using VDI since 2012 as the primary workstation solution for the company's employees. My team and I have been running Hyper-V in a failover cluster for the past decade, with LeafOS thin clients as the user workstations, connecting via RDP to each employee's own Windows 10 VM. On sites with up to two hosts we attach the storage array with SAS cables; if there are more, we use iSCSI.

However, while we would still prefer to stay on the VDI side of the implementation (the investment to replace everything at once would be too high), we need to redesign the whole thing, because performance has been lacking.

We also have our own datacenter at one of the premises, and all premises are interconnected via private 10Gb fiber, so it's essentially one big LAN across all hotels.

I understand that a user connecting to their own VM might never be quite as fast and responsive as a physical machine, but we are looking into very capable hardware that can get close to the performance and user experience of a physical PC. Video capability has also been very poor, and as Teams/Zoom calls became more of a thing, we need better support for graphics-intensive workloads, so we would also like to look into an NVIDIA GPU for VDI, for example the A16.

So, since everything is on the table, I would appreciate some pointers on how to design the requirements so I can reach out to vendors and get more accurate quotes. However, in my country VDI knowledge is lacking at almost all vendors, so it would be better if we know exactly what we are asking for, set it up ourselves, and just buy the hardware and licenses.

As far as the hardware goes:

-Physical hosts: Any Dell/HPE/Lenovo servers with AMD EPYC CPUs (the more cores the better, right?), around 512GB of RAM, 10Gb+ networking (possibly 25GbE or 50GbE NICs), an NVIDIA A16 GPU, and SSD drives

-Storage: I am starting to consider ditching the storage array (which is a single point of failure) in favor of internal drives on the hosts with some sort of software clustering (like Storage Spaces Direct (S2D) or something equivalent)

-Hypervisor platform: As I mentioned, we are used to Hyper-V, but I am not sure how much that will limit the end result, since we need more advanced features, especially vGPU on VMs. Seeing that Tom is also a big fan and supporter of XCP-ng with XO, and of Proxmox, I would also put those on the table. I know that Citrix and VMware are probably the most complete products right now, but reading all those bad comments about their practices kind of puts me off.
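To turn a host spec like the one above into a number you can give vendors, it helps to estimate users per host from each resource separately and take the bottleneck. A rough sketch in Python; the per-VM figures and the vCPU overcommit ratio are assumptions to tune for your own workload (the A16's 64 GB of framebuffer across four GPUs is from NVIDIA's spec):

```python
# Rough per-host VDI sizing sketch. The workload numbers are
# assumptions to adjust, not vendor figures.

HOST_CORES = 64          # e.g. a single-socket AMD EPYC
HOST_RAM_GB = 512
A16_FRAMEBUFFER_GB = 64  # NVIDIA A16: 4 GPUs x 16 GB on one board

VM_VCPUS = 2             # assumed per-user Windows 10 VM
VM_RAM_GB = 8
VGPU_PROFILE_GB = 2      # assumed 2 GB vGPU profile for Teams/Zoom

OVERCOMMIT = 4           # assumed vCPU:pCPU ratio for office workloads
HOST_RESERVED_RAM_GB = 32  # kept back for the hypervisor itself

# How many VMs each resource can carry on its own:
by_cpu = (HOST_CORES * OVERCOMMIT) // VM_VCPUS
by_ram = (HOST_RAM_GB - HOST_RESERVED_RAM_GB) // VM_RAM_GB
by_gpu = A16_FRAMEBUFFER_GB // VGPU_PROFILE_GB

users_per_host = min(by_cpu, by_ram, by_gpu)
print(f"CPU-bound: {by_cpu}, RAM-bound: {by_ram}, GPU-bound: {by_gpu}")
print(f"Users per host (bottleneck): {users_per_host}")
```

With these assumptions the GPU is the bottleneck at 32 users per host; a smaller 1 GB profile would double that, but may be too tight for heavy video-call use, so the profile size is the first knob to validate in the proof-of-concept host.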
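On the S2D-style storage idea: plan for the resiliency overhead up front, because mirrored software-defined storage keeps multiple full copies of every block across hosts. A quick usable-capacity sketch; the node count and drive size are placeholder assumptions, and three-way mirroring is the resiliency type S2D typically uses on clusters of four or more nodes:

```python
# Usable-capacity sketch for an S2D-style hyperconverged setup with
# three-way mirroring. Node count and drive sizes are placeholders.

NODES = 4
DRIVES_PER_NODE = 6
DRIVE_TB = 3.84          # a common enterprise SSD capacity

raw_tb = NODES * DRIVES_PER_NODE * DRIVE_TB

MIRROR_COPIES = 3        # three-way mirror keeps 3 copies of every block
usable_tb = raw_tb / MIRROR_COPIES

print(f"Raw: {raw_tb:.2f} TB, usable (3-way mirror): {usable_tb:.2f} TB")
```

Only about a third of the raw SSD capacity ends up usable, which matters when you size the drives per host and compare the cost against keeping a shared array.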

Cost is not a big problem here, as I have the support of the company's board members, but it would be best to first get a single host as a proof of concept, try it out, and then deploy everything else.

Let me know if you need any other information that would help. Sorry for the long post, but this is an exciting (and expensive, lol) project, so I want to nail it so everyone can be happy (both the users, who get a solid workstation, and the IT team, who get new toys to play with).

Also, if you disagree with anything i wrote and have an opinion of implementing something in a better way, feel free to comment/roast or just state your opinion openly.

I know that each user having their own VM is probably harder to manage and that RDS might be easier, so I am considering this.

Basically, this implementation would be our middle step towards the cloud before migrating everything there.

Yes, RDS would be easier, but I really don't care for VDI solutions because of the issues you mentioned. Even with GPU acceleration, latency and bandwidth become an issue, contributing to a poor experience for the users.

If you are looking for an HA setup with XCP-ng, we do a lot of high-end iXsystems / TrueNAS storage systems that offer high availability. For this to work you would need at least three XCP-ng hosts for the HA cluster, an iXsystems TrueNAS-based storage server with HA, and then switches that support MC-LAG so you have redundancy in the switching fabric.

This can be achieved with a hyperconverged setup as well, but that would require really fast hardware on the storage side to get the IOPS you want.

Thanks a lot for your prompt reply, Tom; it's a pleasure to chat with you in person. Great job on your video content. Would you advise me to look in another direction? I doubt we can change course and move away from VDI at this point, though :frowning:

I personally would investigate cloud hosted VDI solutions from the likes of Azure. AWS has a similar offering but I used it when I worked there, and wasn’t impressed. I think Microsoft and possibly Google are the tops in this space. Financially, imagine telling your CFO that he/she can have their hardware refresh budget back? Might make you a hero. And you would never have to refresh your infrastructure (for this stuff anyway) ever again.

We are in a country, and specifically an area, that does not have good internet connections yet, so I don't believe we are ready to move to the cloud right now. At most we have a 500 Mbps fiber connection per premise, and I think that would affect both the hotel guests and the employee experience.
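To put the 500 Mbps link in perspective for cloud-hosted VDI, here is a back-of-the-envelope session count. Every figure in it (the reservation for guest traffic, the per-session bitrates, the share of users on a call at peak) is an assumption; measure your actual RDP and Teams traffic before trusting the result:

```python
# Back-of-the-envelope check of a 500 Mbps uplink for cloud-hosted VDI.
# All per-session figures are assumptions, not measured values.

LINK_MBPS = 500
GUEST_RESERVED_MBPS = 200      # assumed share kept for hotel guest traffic
SESSION_IDLE_MBPS = 0.5        # assumed typical office RDP session
SESSION_VIDEO_MBPS = 4.0       # assumed session during a Teams/Zoom call
VIDEO_SHARE = 0.3              # assume 30% of users on a call at peak

def max_users(avail_mbps, idle, video, video_share):
    # Average per-session demand is a blend of idle and video sessions.
    per_user = video_share * video + (1 - video_share) * idle
    return int(avail_mbps // per_user)

available = LINK_MBPS - GUEST_RESERVED_MBPS
print(max_users(available, SESSION_IDLE_MBPS, SESSION_VIDEO_MBPS, VIDEO_SHARE))
```

Under these assumptions the link carries a workable number of sessions, but with no headroom for backups, guest peaks, or an outage of the line itself, which is the real argument for keeping the VMs on the 10Gb private fiber instead.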

That doesn’t make any sense to me. If the internet isn’t good enough for cloud, how is it good enough for your data center to do the exact same thing?

Because the hotels are connected to each other with 10Gb fiber, the connection to the VMs will be a lot faster; it will not go over the internet. Any hotel that is not connected to the central datacenter will have its own servers.

I still think you should investigate it. I bet the cloud providers could use your 10Gb fiber link. I know AWS could; I'm not 100% sure about Azure, as I never worked there.