I found out that some workloads perform much faster on an ESXi 5 guest than when running natively on the same hardware without VMware. I suspected some kind of caching and investigated further.
I use a SATA disk connected directly to the chipset of my HP DL320 G6, with no separate RAID controller or the like, and the disk's write cache enabled. The guest runs RHEL 6.3. The filesystems are ext4, mounted with barrier=1,data=ordered. This means the Linux kernel ensures that filesystem metadata is journaled while regular data is not, and that writes to the journal are actually flushed to disk in the right order before processing continues. This is a compromise between performance and data integrity on an unclean shutdown or crash.
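For reference, the corresponding /etc/fstab entry looks like this (device name and mount point are placeholders, not my actual layout):

```shell
# /etc/fstab fragment: ext4 with write barriers and ordered data mode.
# barrier=1    - the kernel issues cache-flush commands around journal commits
# data=ordered - only metadata is journaled; data blocks are written out
#                before the corresponding metadata commit
/dev/sda2  /data  ext4  barrier=1,data=ordered  1 2
```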
When the system runs natively, this works as expected. But when I run it as a guest on ESXi 5.0 Update 1, VMware appears to ignore all cache flushes and barriers.
You can see this in the benchmarks I ran with bonnie++ (command: bonnie++ -d /root -u 0 -r 2000 -s 4000 -b):
native:
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
native        4000M   913  97 116908   8 131299   7  4272  97 8057106  99 405.8   7
Latency              8857us     406ms     126ms    2515us     120us     217ms
Version  1.96       ------Sequential Create------ --------Random Create--------
native              -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16    39   0 +++++ +++    58   0    39   0 +++++ +++    52   0
Latency               150ms     604us     133ms     116ms      19us     358ms
as guest:
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
guest         4000M   918  98  40257   4  70786   5  4625  99 7115272  99  2601  37
Latency             15894us     831ms     467ms    3654us     121us   56414us
Version  1.96       ------Sequential Create------ --------Random Create--------
guest               -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16   784   6 +++++ +++   975   3   823   4 +++++ +++   981   3
Latency             58272us     521us   45372us   52726us     133us   55893us
While sequential input and output are, as expected, somewhat lower virtualized, the creates and deletes per second are 16 to 21 times faster in the guest. That is only possible if the cache flushes issued for writes to the filesystem journal are being ignored.
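A quick way to check whether flushes actually reach stable storage is to time a series of small synchronous writes: an honored flush costs on the order of a disk revolution (milliseconds), while an ignored one completes in microseconds. A minimal sketch using dd's conv=fsync (the temp file and iteration count are arbitrary; point the file at the filesystem you want to probe):

```shell
#!/bin/sh
# Time n small writes, each followed by an fsync (dd conv=fsync).
f=$(mktemp)          # replace with a file on the filesystem under test
n=100
t0=$(date +%s%N)     # GNU date: nanoseconds since epoch
i=0
while [ "$i" -lt "$n" ]; do
  dd if=/dev/zero of="$f" bs=4k count=1 conv=fsync 2>/dev/null
  i=$((i + 1))
done
t1=$(date +%s%N)
avg_us=$(( (t1 - t0) / n / 1000 ))
echo "average write+fsync latency: ${avg_us} us"
rm -f "$f"
```

On a disk whose flushes are honored this typically reports several milliseconds per write; an average in the low microseconds over many iterations suggests the flush never makes it past a volatile cache.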
I tried different virtual SCSI adapters (pvscsi, LSI SAS, LSI Parallel) and several advanced guest options I found on a VMware blog (scsi0.unsafeReordering=false, scsi0:0.writeThrough=false), but none of that changed anything.
Reading through the forums here, I found people having a similar problem with Windows guests and the Forced Unit Access (FUA) feature. However, it is stated that FUA should be supported by VMware ESXi (see: Forced Unit Access (FUA)), yet this does not seem to work as expected with Linux guests.
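One way to see what the Linux guest believes the (virtual) disk supports is the SCSI disk driver's probe message, which reports the write-cache state and DPO/FUA support. A minimal check (the exact wording of the sd line is what my 2.6.32 kernel prints; the fallback echo is only there so the command always produces output):

```shell
# The sd driver logs a line like:
#   sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
# If FUA is reported as unsupported, the kernel has to rely on full cache
# flushes for barriers instead.
dmesg | grep -i 'write cache' || echo "no write-cache lines in the dmesg buffer"
```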
When I completely disable the disk's write cache I get consistent numbers again, but of course much slower ones. I would like to keep the compromise between speed and integrity, because it lets me control the behavior per filesystem (e.g. by mounting with data=journal) instead of only per ESXi host.
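For completeness, the two knobs I am comparing look like this on the command line (device and mount point are placeholders; both commands need root, and note that ext4 does not allow changing the data= mode on a remount, so the stricter mode has to be set at mount time):

```shell
# Host-wide knob: disable the drive's volatile write cache entirely
# (consistent results, but slow for every filesystem on the disk)
hdparm -W0 /dev/sda

# Per-filesystem knob: keep the cache on, but journal data as well as
# metadata on the filesystems that need the extra integrity
mount -t ext4 -o barrier=1,data=journal /dev/sda2 /data
```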
How can I achieve this?