How I increased IOPS 200 times with XenServer and PVS


In my previous blog post I described how the new PVS 7.1 "Cache to RAM with overflow to disk" option does not give you any more IOPS than cache to disk.

During that test, I discovered that enabling intermediate buffering in the PVS device can triple your performance on XenServer. With some more experimenting, I got up to 200 times the IOPS on a PVS device booted on XenServer. Here is a short description of what I did and how I measured it.

First, a little disclaimer: these are observations made in a LAB environment. DO NOT implement this in production without testing it properly.

I'm using Iometer for testing. Here is my Iometer setup for this test and for my previous cache to RAM test:

  • Target disk: C:
  • Maximum disk size: 204800 sectors (100 MB)
  • Default access specification
  • Update frequency: 10 seconds
  • Run time: 1 minute
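Iometer's maximum disk size is given in sectors, so the 100 MB figure above follows from the sector count. Here is a minimal sketch of the conversion. It assumes 512-byte sectors, which is what the 100 MB figure implies.

```python
# Convert Iometer's "Maximum Disk Size" (a sector count) to MiB.
# Assumption: 512-byte sectors, which matches the 100 MB figure in the setup.
SECTOR_SIZE_BYTES = 512


def iometer_size_mib(sectors: int, sector_size: int = SECTOR_SIZE_BYTES) -> float:
    """Return the size of the Iometer test file in MiB for a given sector count."""
    return sectors * sector_size / (1024 * 1024)


if __name__ == "__main__":
    sectors = 204800  # value used in the test setup
    print(f"{sectors} sectors = {iometer_size_mib(sectors):.0f} MiB")
```

The same function also works in reverse when you pick a test file size. Divide the desired size in bytes by the sector size to get the value to type into Iometer.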

I know there has been a lot of discussion about whether or not to enable intermediate buffering in the PVS device. Citrix has a KB article that describes this:

Basically, some disk controllers give you better performance with buffering enabled and some give you worse. In a virtual environment, it's the virtual disk controller that counts. So I tested on Hyper-V 2012 R2 and XenServer 6.2 whether the virtual disk controller got better performance with this option. I ran a baseline with a Windows 2008 R2 image on the same hardware on both hypervisors. Without buffering enabled, Hyper-V was 2x faster than XenServer. With buffering enabled, the VM got 3x the performance on XenServer, but on Hyper-V it actually got slower. So in my test, the virtual disk controller on Hyper-V reacted negatively to disk buffering. Please let me know if you know a way to tune this on Hyper-V.

Reading further in the Citrix KB description of disk buffering, you will see that the RAM on the disk controller is used when intermediate disk buffering is enabled. By chance I stumbled upon this article that describes how to increase XenServer performance by adding more RAM to dom0:

Reading the dom0 RAM documentation, it's clear that the virtual disk controller uses dom0 RAM. With many VMs, you have to increase the dom0 RAM to keep it from becoming a bottleneck for disk IO. The only reason not to increase it is if you don't have much RAM available and want to save it. Today most servers have 128 or 256 GB of RAM, so I guess increasing dom0 from 752 MB to 2940 MB is okay.
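To put "is okay" in numbers, the sketch below computes how much of the host's RAM the extra dom0 memory takes. It assumes 1 GB = 1024 MB. The host sizes are the 128 and 256 GB mentioned above, plus the 16 GB lab host mentioned in the comments.

```python
# Share of host RAM consumed by raising dom0 memory from 752 MB to 2940 MB.
# Assumption: 1 GB = 1024 MB; host sizes are illustrative.
OLD_DOM0_MB = 752
NEW_DOM0_MB = 2940


def extra_dom0_share(host_ram_gb: int) -> float:
    """Fraction of total host RAM used by the dom0 increase."""
    extra_mb = NEW_DOM0_MB - OLD_DOM0_MB
    return extra_mb / (host_ram_gb * 1024)


if __name__ == "__main__":
    for host_gb in (16, 128, 256):
        print(f"{host_gb:>3} GB host: +{NEW_DOM0_MB - OLD_DOM0_MB} MB for dom0 "
              f"= {extra_dom0_share(host_gb):.1%} of host RAM")
```

On a 128 GB host, the extra 2188 MB is under 2% of the RAM. On a small 16 GB lab host, the trade-off is noticeably bigger.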

I followed the guide to increase dom0 RAM to 3GB.

Booting a local VM running Iometer showed no difference. Booting a PVS VM without intermediate buffering was still no better: about 150 total IOPS, as my PVS cache was on a slow SATA disk. Then I enabled disk buffering in the PVS image:

REG ADD HKLM\SYSTEM\CurrentControlSet\Services\BNIStack\Parameters /v WcHDNoIntermediateBuffering /t REG_DWORD /d 2 /F
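If you script your image changes, the same registry change can be wrapped in a small helper. This is a hypothetical sketch: the helper names are made up for illustration, and only the key, value name, type and data come from the command above. It assumes the script runs elevated inside a vDisk that is open for editing (private or maintenance mode). By default it only prints the command.

```python
import subprocess
import sys

# Key, value name, type and data taken from the REG ADD command above.
KEY = r"HKLM\SYSTEM\CurrentControlSet\Services\BNIStack\Parameters"
VALUE_NAME = "WcHDNoIntermediateBuffering"


def build_reg_add(data: int = 2) -> list[str]:
    """Build the `reg add` argument list (data=2 is the value used in this post)."""
    return ["reg", "add", KEY, "/v", VALUE_NAME,
            "/t", "REG_DWORD", "/d", str(data), "/f"]


def apply(dry_run: bool = True) -> None:
    """Print the command, or run it on Windows when dry_run is False."""
    cmd = build_reg_add()
    if dry_run or sys.platform != "win32":
        # Nothing is changed: just show what would be executed.
        print(" ".join(cmd))
        return
    # Assumption: elevated prompt, vDisk in private or maintenance mode.
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    apply(dry_run=True)
```

Keeping dry-run as the default makes it harder to change an image by accident. That matches the lab-only caution at the top of this post.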

After booting up and running Iometer again, I was shocked: 35000 IOPS! That is more than 200 times better. Moving the cache to an SSD, I got 55000 total IOPS. Wow! I had to test several times, with different operating systems, but the results were similar. I also tested this with "Cache to RAM with overflow to disk", and now I got 55000 IOPS there too. This means that, in this setup, cache to device RAM with overflow to disk is effectively "cache to host RAM" and gives the same result as cache to device disk.
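The "more than 200 times" claim follows directly from the numbers above. Here is a quick check, using the approximate 150 IOPS baseline (PVS cache on slow SATA, buffering off) as the reference.

```python
# Speedup relative to the ~150 IOPS baseline measured without intermediate buffering.
BASELINE_IOPS = 150  # approximate, PVS cache on slow SATA disk

RESULTS = {
    "SATA cache, buffering on": 35000,
    "SSD cache, buffering on": 55000,
}


def speedup(iops: float, baseline: float = BASELINE_IOPS) -> float:
    """How many times faster a result is than the baseline."""
    return iops / baseline


if __name__ == "__main__":
    for label, iops in RESULTS.items():
        print(f"{label}: {iops} IOPS = {speedup(iops):.0f}x baseline")
```

Because the baseline is approximate, the ratios (roughly 233x and 367x) should be read as orders of magnitude rather than precise figures.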

Checking with uberAgent for Splunk, I measured a total logon time of 1.5 seconds! (No GPOs, local profile, no logon scripts.)

So the conclusion to my test is:

Intermediate disk buffering together with XenServer and plenty of dom0 RAM performs much better than PVS cache to RAM, but without the risk of a crash when the RAM is full. How this will work in a production environment, however, I don't know yet. I guess Login VSI could be used to simulate workloads and get a better picture of this, but I haven't been able to do so yet.

Please test this yourself and check whether you get the same results as I did.

MCS will not get the same boost, but adding more dom0 RAM is still a good idea. I don't know whether it's possible to enable similar buffering with MCS; I guess this is something for Citrix to look into.

I've tested with the following operating systems:

  • Windows 7 x64
  • Windows 2008 R2 x64
  • Windows 2012 R2 x64
  • Windows 8.1 x64

Below is a diagram showing the data from all my tests.

18 thoughts on “How I increased IOPS 200 times with XenServer and PVS”

  1. Thanks for publishing your efforts; I have a couple of improvements you might like to use next time. First, please don't link to cliffdavies: he copies and pastes content without linking to the original author. If Google leads you to his site, just search for the text and it will lead you to the original author, in this case CTX themselves.

    Also, your tiny Iometer test file is skewing your results. Essentially you are over-emphasising the role of the dom0 cache, because all 100 MB fits in the cache and is read from and written to there. Here are my settings for Iometer, and why.
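The commenter's own settings are not reproduced here. The following is a hedged sketch of the underlying point: it checks whether an Iometer test file fits in a given cache, and computes a sector count for a file several times larger. Several values are assumptions: 512-byte sectors, a cache the size of the full 2940 MB dom0 allocation (in practice dom0 cannot use all of it for caching), and a safety factor of 4, which is hypothetical.

```python
# Does the Iometer test file fit in the cache, and how large should it be instead?
# Assumptions: 512-byte sectors, cache = 2940 MB of dom0 RAM, hypothetical 4x factor.
SECTOR_SIZE_BYTES = 512
MIB = 1024 * 1024


def fits_in_cache(test_sectors: int, cache_mb: int) -> bool:
    """True if a test file of `test_sectors` sectors fits entirely in the cache."""
    return test_sectors * SECTOR_SIZE_BYTES <= cache_mb * MIB


def min_test_sectors(cache_mb: int, factor: int = 4) -> int:
    """Iometer maximum disk size (in sectors) for a file `factor` times the cache."""
    return cache_mb * factor * MIB // SECTOR_SIZE_BYTES


if __name__ == "__main__":
    dom0_mb = 2940
    print("100 MB file fits in cache:", fits_in_cache(204800, dom0_mb))
    print("Suggested maximum disk size:", min_test_sectors(dom0_mb), "sectors")
```

A test file that cannot fit in the cache forces Iometer to hit the underlying disk. That gives a more realistic picture than the 100 MB file used in the post.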

    At some point I need to do exactly the same set of tests in my lab, I’ll link the article here as soon as I have time to do the research.

  2. XenServer 6.2 changed the way dom0 memory is allocated, so that it scales with the amount of host memory. You can of course override it, as documented in that old KB. However, it's probably best to refer to the XenServer 6.2 Admin Guide, Section 7.1 (page 103), to make sure you are aware of and understand the changes in 6.2.
    The Admin guide can be found here:

    • Thank you, Martin, for the information. It's a good thing that it's auto-configured in 6.2, as I think a lot of people forget to configure this. My lab, however, only had 16 GB of RAM, so I had to do it manually. It would be nice if the XenServer tools could use disk buffering the same way PVS does; it would help a lot on MCS installations. I don't know how that works. Maybe it's not possible?

  3. Have you looked at the performance gains when using PVS 7.1 with Server 2012 R2 storage pool auto-tiering? I am testing with an SSD in my local XenServer and then using Server 2012 R2 storage tiering to set up tiers of storage for the vDisk volume. I know this isn't better than RAM cache, but every little bit helps.

  4. Howdy Magnar,
    Thanks for writing this article. We were able to reproduce your results of massively increased IOPS, higher disk throughput and much lower latency. Awesome! However, in our testing, user performance didn't change much. Login times might have been a bit faster (10-15%). I would have thought this should have improved dramatically?

    Also, we loaded up about 10-15 Windows 7 VDI machines with 4 GB of RAM each. After a certain number of VMs had started, all the machines using that image blue-screened and went into a blue screen loop. We rolled back the change, so we didn't have much time to diagnose it. We will try again this week. I'm wondering whether they blue-screened once they hit the dom0 memory allocation, just like Cache to RAM used to do?

    • You will not get 200 times better performance in reality, as the Iometer test uses just a small file. A real VDI workload will not perform 200 times faster, but you will see some performance improvement. I'm mostly using this with XenApp workloads; VDI/desktop workloads would probably get better results with third-party IO optimization tools like Atlantis. Remember that in XS 6.2 the dom0 memory is sized dynamically based on your total RAM:
      About the blue screens: I haven't seen this, but as I said, I'm only using this option with XenApp (XA), not XenDesktop (XD).

  5. Pingback: An update about my experience with PVS. | Virtual eXperience

  6. Has anyone had performance like this on Hyper-V, and how did you get there?

    We have an SSD array and are able to pull ~4.5 GB/s of reads on the write cache disk, but we are only able to read ~25 MB/s on the PVS disk.

    XenDesktop 7.5, PVS 7.1, all 2012 R2, Hyper-V, 10 Gb LAN, solid-state storage pool. And to be clear, I meant ~4.5 gigabytes a second for the write cache.

    Tried nearly everything under the sun.

  7. Pingback: An update about my experience with PVS. | Virtual Experience

  8. Pingback: IOPS increased in Citrix XenServer and PVS article by Magnar Johnsen | Tannyahmad's Weblog

  9. For some reason, the default maximum available RAM in dom0 of only around 700 MB seems like a bad design decision from Citrix. I'm not sure why they would do that, and I've never been able to find a clear explanation of what they were thinking. You need to reboot the whole host to change it, which means you want to try to get it right early.

    If you have perhaps 10 guests doing typical I/O, that might well fill up the default dom0 memory, and if you push your luck, dom0 will start thrashing the swap. Performance will immediately collapse. You will very likely hit this problem before you get to 20 guests, and that does not seem like a whole lot by today's standards.

    I can see that recent XenServer versions have improved on this, but I guess a lot of people will have been bitten by it by now (I know I was).

    I should also point out that the normal caution applies to an overly large disk cache. A crash of dom0 or a power failure will quite likely lose data that was sitting in the cache and never got written to disk (because the disk is much slower than the RAM). People argue all day about the value of battery-backed RAID controller cache and whether to turn on the cache RAM in the drives themselves. There is no point re-running all those arguments here, but at least be aware of the risks involved.
