UPDATE, September 2, 2014! I’ve received updated information about this topic from NVIDIA and Citrix, so this is a completely rewritten blog post.
With NVIDIA GRID vGPU on Citrix XenServer, a GPU is shared among a number of virtual machines.
Looking at the NVIDIA vGPU profiles, it’s quite clear that GPU memory is split into dedicated, equally sized slices, one per virtual machine. But what happens with computing power? My first thought was that GPU cores were dedicated in the same way as the memory, as shown in this table from http://www.poppelgaard.com/nvidia-grid-vgpu-1-1-for-citrix-xenserver-6-2sp1
But are we really limited to a fixed number of GPU cores? I asked NVIDIA, and here is their explanation of how vGPU shares computing power. Thanks to Jason Southern and Jeremy Main from NVIDIA for sharing this information with me, to Rachel Berry from Citrix for feedback on this subject, to Thomas Poppelgaard for valuable information, and to Shawn Bass for discussing the topic.
Each virtual GPU is allocated a number of GPU channels. Channels are the means by which the driver posts work directly to the GPU hardware, and they are accessed via the GPU’s PCI Express Base Address Register, or BAR. The CPU’s memory management unit (MMU) is used to partition access to the GPU’s BAR, so that a VM can only access the channels of the vGPU that it owns, and not those of any other VM.
The GPU has multiple execution engines: for example, the 3D engine (the massively parallel processor used for rendering), dedicated copy engines used for data movement, and NVENC/NVDEC processors used for video encode and decode. Each of these engines is time-shared among VMs, similar to the way the GPU handles multiple contexts on a single OS.
Time sharing provides a guaranteed minimum of time on the GPU; however, if the GPU is not fully utilized, the additional resource is available to the connected VMs. This is handled in the scheduler and defined by the policy.
For example, a K240Q profile guarantees 25% of GPU time, but if more is available it can be utilized.
Think of it in a similar way to CPU fair share in XenApp: a guaranteed minimum, but more performance if it’s available.
The profile defines your minimum resource level, but if more is available the VM can take advantage of it. This is why monitoring on a per-VM basis is challenging: GPU utilization currently reports the value for the whole GPU, not the profile’s share.
So, the important information here is that your vGPU computing power is not fixed: if capacity is available you can have more computing power, but you are guaranteed a minimum share of the GPU when the GPU is loaded. However, the dedicated vGPU core count in the table from Thomas Poppelgaard is still correct for sizing the environment and for comparing with other physical GPUs.
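The fair-share model described above can be sketched with a little arithmetic. This is my own illustration, not NVIDIA code; the density figure (four K240Q vGPUs per physical GPU, hence the guaranteed 25%) follows the profile table referenced earlier.

```python
# Sketch of the vGPU fair-share model: each vGPU is guaranteed 1/N of the
# GPU when N vGPUs share it, but idle capacity is redistributed to busy VMs.
# The K240Q numbers are illustrative, taken from the profile table above.

def guaranteed_share(vgpus_per_gpu):
    """Minimum fraction of GPU time each vGPU is guaranteed."""
    return 1.0 / vgpus_per_gpu

def attainable_share(vgpus_per_gpu, active_vgpus):
    """Fraction a busy vGPU can reach when only `active_vgpus` of the
    vGPUs on this GPU are actually generating work."""
    return 1.0 / max(active_vgpus, 1)

# K240Q: 4 vGPUs per physical GPU
print(guaranteed_share(4))     # 0.25 -> the guaranteed 25% under full load
print(attainable_share(4, 1))  # 1.0  -> a lone busy VM can use the whole GPU
```

This is exactly the "guaranteed minimum, more if available" behaviour: the floor is fixed by the profile, the ceiling by how busy the neighbours are.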
More about vGPU monitoring later in this post. Now let’s look at the FRL.
Frame rate limiter (FRL)
What is the FRL and how is it affecting the vGPU computing power?
The FRL is there to prevent the user experience from varying wildly between the case where a user is alone on the GPU and the case where the GPU is fully loaded with all connected VMs. It acts as a governor to minimize fluctuations and to keep resource demand under control. There are only two options, on or off. Off should only be used for benchmarking and isn’t validated for production.
60 fps is chosen because it’s the target most pro-vis users expect to hit, and delivering higher than that over a network becomes very challenging. Delivering 30 fps to the client can be challenging in some scenarios, but having additional frames in the VM is beneficial because it helps smooth visual performance: 30 fps is about one frame every 33 ms. HDX isn’t locked to the FRL or to V-Sync, and if packets aren’t perfectly matched to frames you can perceive judder or lag, which is very noticeable with software cursors. At 60 fps you get one frame every 17 ms, so visual latency drops and the remote session appears smoother even if the HDX frame rate is still 30 or lower on the client side.
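The frame-time arithmetic above is easy to verify. A quick sketch of my own, just to show where the ~33 ms and ~17 ms figures come from:

```python
# Milliseconds per frame for the frame rates discussed above.
def frame_time_ms(fps):
    """Time budget per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps

print(round(frame_time_ms(30), 1))  # 33.3 -> ~33 ms per frame at 30 fps
print(round(frame_time_ms(60), 1))  # 16.7 -> ~17 ms per frame at 60 fps
print(round(frame_time_ms(45), 1))  # 22.2 -> the 45 fps FRL on K100/K200
```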
In the non-Q profiles (K100 / K200) the FRL is set to 45 fps.
Also important to note: the FRL only limits the frame render rate. If the GPU is being used for other activity, the FRL will not limit that activity; there, the scheduler / time slice enforces the limit.
Originally in this blog post I talked about tuning the frame rate limiter when HDX delivers lower frame rates than the GPU, but as you can see from the information from NVIDIA and Citrix, there is no point in doing this. If you want to benchmark your vGPU, however, you can disable the FRL with this command:
xe vm-param-set uuid=<VM-UUID> platform:vgpu_extra_args="frame_rate_limiter=0"
How can we monitor vGPU computing power and frame rates produced by the GPU and delivered by the remote protocol?
For computing usage we could use nvidia-smi or some other GPU tool from within the virtual machine, but this only works with passthrough GPUs. With vGPU, computing power can NOT be monitored from within the virtual machine. Why? Because the GPU is shared, so the % GPU usage reported inside the VM will not be correct. This means that tools like GPU-Z cannot be trusted for vGPU monitoring. To monitor vGPU computing usage, we have to do it from the XenServer host, using XenCenter, nvidia-smi from the XenServer command line, or rrd2csv.
Remember that the vGPU computing usage monitored with XenCenter, nvidia-smi, or rrd2csv is for the entire physical GPU, not per vGPU.
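If you want to pull that whole-GPU utilization figure from the XenServer host programmatically, one option is to parse nvidia-smi output. A minimal sketch, assuming a driver recent enough to support nvidia-smi’s `--query-gpu` CSV output (those query flags are my assumption; check what your driver version supports):

```python
import subprocess

def parse_utilization(raw):
    """Parse nvidia-smi CSV output: one integer per physical GPU, one per line."""
    return [int(line.strip()) for line in raw.splitlines() if line.strip()]

def gpu_utilization():
    """Query GPU utilization (%) on the XenServer host via nvidia-smi.
    Note: with vGPU this is the whole physical GPU's utilization,
    not any single vGPU's share."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"]).decode()
    return parse_utilization(out)
```

The parsing is split out into its own function so it can be exercised without a GPU present.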
Here is where it gets complicated. For passthrough virtual machines you can NOT use XenCenter or nvidia-smi on the XenServer host to monitor computing power, because with passthrough the entire GPU is dedicated to the virtual machine and unavailable from the host. With passthrough, you have to monitor computing power from within the VM. If you have a mixed environment with both passthrough and vGPU, you need two different ways to monitor it.
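The host-versus-guest rule above is simple but easy to get backwards in a mixed environment, so here it is condensed into a tiny lookup of my own (the mode labels "vgpu" and "passthrough" are hypothetical names, not anything XenServer exposes):

```python
# Where GPU utilization must be measured, per the rules above.
def monitoring_location(gpu_mode):
    """Return where to monitor GPU usage for a VM's GPU mode."""
    if gpu_mode == "vgpu":
        return "host"   # XenCenter, nvidia-smi, or rrd2csv on the XenServer host
    if gpu_mode == "passthrough":
        return "guest"  # tools inside the VM; the host cannot see the GPU
    raise ValueError("unknown GPU mode: %r" % gpu_mode)
```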
Monitoring the GPU frame rate can be done with FRAPS from within the virtual machine.