Are you planning to virtualize CAD, 3D or graphics workloads with a solution like Citrix XenDesktop or VMware View? How do you know which vGPU profile fits your workloads? In this blog post I would like to focus on how we can predict the size of the vGPU profile based on the existing GPUs in physical PCs.
It is extremely important to get the sizing right. If you size incorrectly, you may end up with a solution that gives poor user experience, or with an underutilized system.
Looking at the XenApp and XenDesktop 7.x design handbook, the recommendation is to create three categories of users: Designer, Power User and Knowledge Worker, and then select a profile designed for that kind of user. But how do we know which category a user belongs to, and will the profile actually fit the user? If you are moving from physical workstations to virtual ones, the best way is to look at the users' existing GPU workloads.
How can we predict the target vGPU profile for a physical machine that is going to be virtualized?
For large projects I recommend using SysTrack from Lakeside Software to monitor average GPU usage on the existing machines, and creating a predictive vGPU profile report based on the existing workloads.
But if you want to do it yourself and learn how applications use the GPU, here is some advice on methods you can use, based on experiments from projects I've been running.
I'm using a tool I've written that logs GPU compute and framebuffer usage to a CSV file, which I then analyze in Excel. For NVIDIA cards you can pull the numbers using Performance Monitor; for GPUs from AMD or Intel you can use GPU-Z. Once you have this data, you need to know the difference in computing power between GPU type X and an NVIDIA GRID K1 or K2. We need a GPU factor.
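If you log the counters yourself, a few lines of Python are enough to pull the key numbers out of the CSV instead of doing it in Excel. This is a minimal sketch; the column names `gpu_percent` and `framebuffer_mb` are my assumptions, so adjust them to whatever your logging tool actually writes:

```python
import csv
import statistics

def summarize(path):
    """Summarize a GPU usage log: average and peak compute load,
    plus peak framebuffer usage."""
    gpu, fb = [], []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            # Assumed columns: gpu_percent (0-100), framebuffer_mb
            gpu.append(float(row["gpu_percent"]))
            fb.append(float(row["framebuffer_mb"]))
    return {
        "gpu_avg": statistics.mean(gpu),
        "gpu_peak": max(gpu),
        "fb_peak_mb": max(fb),
    }
```

The average, peak, and framebuffer numbers from this summary are exactly the inputs used in the sizing steps below.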
I got this idea from Helge Klein. To get the GPU factor, we need to run the exact same workload on both machines and then compare the results. This is called benchmarking.
"A benchmark is a standard or point of reference against which things may be compared or assessed."
The ideal benchmark would be the application that you are going to virtualize, but if the application does not have a built-in way to do this, you need a benchmark tool. I know Cadalyst has made a benchmark tool for Autodesk products, but for this blog post I'm using PassMark. PassMark is mostly used for gaming graphics benchmarks, but it will give you an idea of how your GPU is performing. And the good thing is, they have an online database where you can search for benchmark results for all kinds of GPUs, so you can save a lot of time by using numbers from existing benchmarks. Just note that the benchmark score also depends on the rest of the computer hardware, so the comparison will not be 100% fair: a GPU in a powerful computer will perform better than the same GPU in a weak one. Still, you will get a number that roughly shows the difference in GPU computing power between different GPU types and vendors.
So let's take a scenario. You have workstations with NVIDIA Quadro 5000 GPUs, and you want to predict which vGPU profile to use for them.
Looking at the PassMark graphics results we get:
- NVIDIA Quadro 5000: 2662
- NVIDIA GRID K1 pass-through: 840
- NVIDIA GRID K2 pass-through: 3722
This will give us the following target vGPU profile factors:
- K100 : 840/8=105
- K120Q: 840/8=105
- K140Q: 840/4=210
- K160Q: 840/2=420
- K180Q: 840/1=840
- K200 : 3722/8=465
- K220Q: 3722/8=465
- K240Q: 3722/4=930
- K260Q: 3722/2=1861
- K280Q: 3722/1=3722
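The factor table above is just the pass-through benchmark score divided by the number of VMs sharing the physical GPU. A small Python sketch that reproduces it (note that K240Q comes out as 930.5, which the table rounds to 930):

```python
# PassMark-style pass-through scores from this post.
K1_SCORE = 840   # GRID K1
K2_SCORE = 3722  # GRID K2

# vGPU profile -> (pass-through score of its board, VMs per physical GPU)
PROFILES = {
    "K100":  (K1_SCORE, 8), "K120Q": (K1_SCORE, 8),
    "K140Q": (K1_SCORE, 4), "K160Q": (K1_SCORE, 2), "K180Q": (K1_SCORE, 1),
    "K200":  (K2_SCORE, 8), "K220Q": (K2_SCORE, 8),
    "K240Q": (K2_SCORE, 4), "K260Q": (K2_SCORE, 2), "K280Q": (K2_SCORE, 1),
}

def profile_factor(profile):
    """Per-VM share of the physical GPU's benchmark score."""
    score, density = PROFILES[profile]
    return score / density

for name in PROFILES:
    print(f"{name}: {profile_factor(name):.0f}")
```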
If the workload on the source machine were constantly at 100%, it would require the following vGPU profile:
- NVIDIA Quadro 5000 => K280Q
But the GPU workload for a CAD user very often looks like this (screenshot from a real workload on a Quadro 5000):
With CAD workloads the GPU usage is very spiky. This could be when the user rotates an object. On average this workload is at 9%, but it peaks at 94%. If I used the average of 9% to predict the vGPU profile, I could select a K220Q profile and run 8 workloads like this on one K2 pGPU. But what happens when all of these workloads peak at the same time? Performance is poor! If I select the maximum profile instead, user density will be low. How can I analyze this correctly?
Looking at the same data in a histogram, counting how many times the workload hits 10%, 20%, 30% etc, I get a diagram like this:
Now we can see that most of the time the load is no more than 40%. The remaining peaks will be handled by available GPU power from the other VMs on that pGPU. Calculating with a 40% load allows me to use a K240Q or K260Q profile for this kind of workload. That gives me better user density, and still good performance for the users.
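If you prefer to compute this instead of eyeballing a chart, both the histogram and the "load level that covers most samples" are easy to derive in Python. A sketch with made-up sample data; the 90% coverage threshold is my assumption, so tune it to your own tolerance for peaks:

```python
from collections import Counter

def load_histogram(samples, bin_width=10):
    """Count GPU-load samples (0-100) into 10%-wide bins: 0-9, 10-19, ..."""
    hist = Counter((min(int(s), 99) // bin_width) * bin_width for s in samples)
    return dict(sorted(hist.items()))

def sustained_load(samples, coverage=0.9):
    """The load level that covers `coverage` of all samples (a percentile),
    ignoring the rare peaks above it."""
    ordered = sorted(samples)
    return ordered[min(int(coverage * len(ordered)), len(ordered) - 1)]
```

For a spiky trace that mostly idles at 5%, occasionally works at 35% and rarely peaks at 94%, `sustained_load` returns 35 rather than the misleading average or the extreme peak.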
There is another way of finding the correct profile: count how many times the load exceeds 100% of a vGPU profile's capacity. If the load is above 100% of a vGPU profile more than 10% of the time, I select a higher profile.
This sample shows that K240Q and K180Q are not overcommitted more than 10% of the time.
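This rule can also be scripted: convert each load sample on the source GPU into "absolute" work via its benchmark score, and check what fraction of samples exceeds a candidate profile's per-VM score. A sketch under the assumption that benchmark scores scale linearly with load, which is a simplification:

```python
def overcommitted_fraction(samples, source_score, profile_score):
    """Fraction of time the source GPU's load exceeds the capacity of a
    candidate vGPU profile, both expressed via benchmark scores."""
    # A sample of X% on the source GPU corresponds to X% * source_score of
    # absolute work; the profile is overcommitted when that exceeds its score.
    over = sum(1 for s in samples if s / 100 * source_score > profile_score)
    return over / len(samples)

def smallest_fitting_profile(samples, source_score, profiles, limit=0.10):
    """Smallest profile that is overcommitted at most `limit` of the time.
    `profiles` maps name -> per-VM score (e.g. "K240Q" -> 930.5)."""
    for name, score in sorted(profiles.items(), key=lambda kv: kv[1]):
        if overcommitted_fraction(samples, source_score, score) <= limit:
            return name
    return None
```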
There is one more factor that determines which vGPU profile fits this workload: the framebuffer. A Quadro 5000 has a framebuffer of 2500 MB, and this workload uses 50% of it, so we need at least 1250 MB of framebuffer in the vGPU profile. The vGPU profiles have a dedicated framebuffer (in MB):
- K100 : 256
- K120Q: 512
- K140Q: 960
- K160Q: 2048
- K180Q: 4096
- K200 : 256
- K220Q: 512
- K240Q: 960
- K260Q: 2048
- K280Q: 4096
Both the K160Q and the K260Q have a large enough framebuffer, but only the K260Q can handle the compute workload. If you have to choose between a high K1 profile and a medium K2 profile, I would suggest going for the K2 profile to allow for growth. I've also seen that upgrading an application to a new version can almost double the load on the GPU, which is why you should always account for growth. A K1 will most of the time not work for high-end graphics. There are also applications, like Adobe Illustrator, that require a minimum framebuffer size to enable hardware acceleration, so in those cases select a vGPU profile that gives you a large enough framebuffer.
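Putting the two constraints together, profile selection becomes a simple filter over the tables above. A sketch, not an official sizing tool; the compute factors are the ones derived earlier in this post (K240Q rounded to 930):

```python
# vGPU profile -> (framebuffer in MB, compute factor from the earlier table)
PROFILES = {
    "K100":  (256, 105),   "K120Q": (512, 105),   "K140Q": (960, 210),
    "K160Q": (2048, 420),  "K180Q": (4096, 840),
    "K200":  (256, 465),   "K220Q": (512, 465),   "K240Q": (960, 930),
    "K260Q": (2048, 1861), "K280Q": (4096, 3722),
}

def candidates(required_fb_mb, required_compute):
    """Profiles satisfying both the framebuffer and the compute requirement."""
    return [name for name, (fb, comp) in PROFILES.items()
            if fb >= required_fb_mb and comp >= required_compute]
```

For the Quadro 5000 scenario (at least 1250 MB of framebuffer, and 40% of a score of 2662, roughly 1065), only K260Q and K280Q pass both filters, which matches the conclusion below.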
You also have to account for the number of monitors and the screen resolution on the end users' computers. I recommend the chart from Poppelgaard.com for an overview of vGPU capabilities.
So for the Quadro 5000 users we end up with a K260Q profile, giving us 2 users per pGPU, 4 users per K2 board, and 8 users per server on a server model with two K2 cards.
There is also another question you need to ask: does the application use OpenCL or CUDA? If yes, you can forget about vGPU, because running OpenCL and CUDA with vGPU is not supported.
The data above is based on real data from a project I'm working on. After moving the users to virtual machines, you can verify the result by monitoring the workload again and comparing the results.
In the end I need to say that the method of basing predictions on benchmark scores is not perfect. That's why you need to verify your workload in a POC environment. For large projects I would recommend a scale test with a recorded user workload using Login VSI. It will give you a better understanding of whether you have sized the environment correctly.
And the most important thing for any data analyst: look at the data! The more you study the data, the more you will understand, and the more data you have, the more accurate your predictions will become. Collect data for several weeks to be sure you get the whole picture.
Thanks to Thomas Poppelgaard from Poppelgaard.com and Helge Klein from HelgeKlein.com for helping me with this blog post. Also thanks to Jason Southern from NVIDIA for verifying the content.