Azure VM Storage Performance - Part 2: Storage Spaces

In the previous post in this series we looked at the performance capabilities of Azure Standard Storage for VMs using a single disk. In this post, we are going to examine how well the performance of Azure Standard Storage can be scaled by pooling multiple disks together using Storage Spaces.

Storage Spaces is part of Microsoft's Software Defined Storage stack. Using Storage Spaces we can create pools of physical disks onto which logical disks can be deployed, allowing a logical disk to use the aggregate performance of the pool. Storage Spaces can also be used to increase the protection of data by creating multiple copies or using parity while distributing data across disks; however, that capability isn't particularly relevant here, as Azure already keeps multiple copies of all blob storage by default.

To test the performance characteristics of Storage Spaces on Azure, we will repeat the tests that were conducted against a single Standard LRS VHD, this time against a two disk and a four disk Storage Spaces pool, to see how well Storage Spaces on Azure scales when you throw more disks at it.

Test Setup

The diskspd tests were conducted in the following Azure VM configuration.

  • Azure Platform: Resource Manager
  • Azure Region: Australia Southeast
  • VM Size: A2_Standard
  • Storage Accounts: 7 (1 OS / 6 Data, each VHD was deployed to its own storage account)
  • Storage Account Configuration: Standard LRS
  • VHD Caching: None
  • NTFS Cluster Size: 64KB
  • Storage Spaces Configuration (see the PowerShell sketch after this list):
    • Column Count: Same as number of disks
    • Interleave Size: 256KB
    • Resiliency: Simple (Stripe)
    • Size: Max available
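
The configuration above can be reproduced with a few PowerShell commands. A minimal sketch is shown below; the pool and virtual disk friendly names are placeholders, not the exact commands used for these tests.

    # Pool every data disk that is available for pooling
    $disks = Get-PhysicalDisk -CanPool $true
    New-StoragePool -FriendlyName "AzurePool" `
        -StorageSubSystemFriendlyName "Windows Storage*" `
        -PhysicalDisks $disks

    # Simple (stripe) virtual disk: columns = number of disks, 256KB interleave, max size
    New-VirtualDisk -StoragePoolFriendlyName "AzurePool" -FriendlyName "DataVD" `
        -ResiliencySettingName Simple -NumberOfColumns $disks.Count `
        -Interleave 256KB -UseMaximumSize

    # Initialise, partition and format with a 64KB NTFS cluster size
    Get-VirtualDisk -FriendlyName "DataVD" | Get-Disk |
        Initialize-Disk -PartitionStyle GPT -PassThru |
        New-Partition -UseMaximumSize -AssignDriveLetter |
        Format-Volume -FileSystem NTFS -AllocationUnitSize 64KB -Confirm:$false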

Disk Configurations Tested

  • 2 VHDs
  • 4 VHDs

Tests

The test workload being run is the same as was used for the single VHD test; an example diskspd invocation follows the parameter list.

  • IO Patterns: Random, Sequential
  • IO R/W: 0/100, 20/80, 50/50, 80/20, 100/0
  • Block Size: 4K, 8K, 64K, 256K, 1M
  • Queue Depth per thread: 1, 2, 16, 32
  • Worker Threads: 2
  • Duration: 300s
  • Warmup/Cooldown: 60s
  • Buffer Size: 1G
  • Software caching: Disabled
  • Hardware caching: Disabled
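
To make the parameter mapping concrete, a single combination from the matrix above translates to a diskspd command line along these lines (the drive letter and file name are assumptions, not the exact invocation used):

    # 64K block, random, 80/20 read/write, 2 threads, 32 outstanding IOs per thread,
    # 300s duration with 60s warmup/cooldown, software and hardware caching disabled.
    # Drop -r (or use -si) for the sequential runs; -c100G creates the 100GB test file.
    .\diskspd.exe -c100G -d300 -W60 -C60 -t2 -o32 -b64K -r -w20 -h -L F:\testfile.dat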

During the initial testing, each workload run was conducted with only a single Storage Spaces pool deployed. At the end of the first test run, the 2 disk pool was destroyed, the VHDs were detached, and 4 new VHDs were attached to create the new pool for the second test run.

Test Results - A2_Standard - Storage Spaces

At first glance, the results of running Storage Spaces on Azure look very promising.

  • Max throughput: 398.66 MB/s (4 disk / random / 100% read / 256K block / depth 32)
  • Max IOPS: 1984.41 IO/s (4 disk / sequential / 20% write / 64K block / depth 32)

Looking at these results, it would appear that Storage Spaces performance on Azure scales linearly with the number of disks, with the total performance of the pool being equal to the sum of the performance of the individual disks. Great, but let's take a look at the details.

4K Block

If we look at how the 4K block workloads behave across the three disk configurations (1, 2 and 4 VHDs), some characteristics stand out immediately.

  • Not all workloads see a performance boost from using Storage Spaces to pool disk performance.
  • Workloads which cannot generate significant IO depth do not see linear gains in performance based on the number of underlying disks in the pool.
  • Workloads which were below the 500 IOPS target in the single VHD test saw their performance get worse under Storage Spaces. The 4K block, 100% read, 1 queue depth test fell to 98 IOPS using a 4 disk storage space.
  • Storage Spaces appears to perform better for sequential workloads, although the correlation is incredibly weak.

8K Block

The 8K workload tests followed the pattern of the 4K tests. Things that were very good under a single disk got a lot better with 2 and 4 disks. Things that were not very good under a single disk got worse. It is worth pointing out that, moving up to 8K blocks, we see a significantly greater performance drop off moving from high IO depth to low IO depth.

64K Block

64K workloads again follow the patterns seen before. Good is good. Bad gets worse.

  • When we move up in block size with Storage Spaces, the performance difference between read/write and random/sequential workloads on Azure Standard LRS becomes more pronounced.
  • At 64K block with 4 disks there is a performance penalty on random write workloads, even with high IO depth.
  • The number of workloads that can reach the peak performance of the disks in the 4 disk pool has fallen significantly.
    • At 8K block, there are 17 workloads that achieve greater than 1900 IOPS.
    • At 64K block, there are only 9 workloads that achieve greater than 1900 IOPS. These workloads are sequential/read heavy.
  • The importance of queue depth is incredibly pronounced when we start moving to larger blocks.

256K Block

At 256K things start to get wacky! In the single disk testing we saw the MB/s ceiling of Azure VHDs come into play. From those tests we found that a single disk could read at ~100MB/s and write at ~55MB/s. However, these ceilings don't appear to scale out linearly, especially not the write throughput limits.

If we focus on the write workloads, we see that the write performance of two disks peaks at 91MB/s, and the write performance of 4 disks can only reach 106MB/s. This is significantly less than the ~200MB/s (roughly four times the single disk write ceiling) I was expecting.

Focusing on the write workloads also, once again, highlights the importance of IO depth to performance when using Storage Spaces. Poor queue depth actually causes the four disk pool to deliver lower write performance than the two disk pool.

1M Block

At 1M, we see the read workloads reach the throughput limit of the Storage Spaces pools at ~400MB/s, while the write workloads peak at 105MB/s.

At this block size we also see a penalty to random write performance regardless of IO depth on the four disk pool. The random 100% write workload with 32 IO depth shows worse performance on the four disk pool compared to the two disk pool.

Something isn't right here.

As I worked my way through these numbers, something just didn't feel right about the Storage Spaces results. My initial thought was that I had incorrectly configured the storage pools or virtual disks. After reviewing the configuration that I had applied, everything appeared correct: the number of columns matched the number of underlying Azure VHDs, I was using Simple resiliency, and my interleave size was 256KB.
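
The sort of sanity check I mean is just pulling the layout back out of Storage Spaces, roughly as follows ("AzurePool" stands in for whatever the pool was actually named):

    # Confirm resiliency, column count and interleave on the virtual disk
    Get-VirtualDisk | Select-Object FriendlyName, ResiliencySettingName,
        NumberOfColumns, Interleave

    # Confirm all of the expected physical disks made it into the pool
    Get-StoragePool -FriendlyName "AzurePool" | Get-PhysicalDisk |
        Select-Object FriendlyName, Size, HealthStatus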

I did experiment with adjusting the interleave size. I rebuilt the 4 disk pool using a 64KB interleave size and reran a subset of the tests that showed unusual performance. There was no significant difference between the results.

I then found an article suggesting that volumes on Azure Storage Spaces should be formatted with the -UseLargeFRS switch. I rebuilt the storage space with the 256KB interleave, formatted the volume with this additional switch, then reran some of the tests. Still no significant change.
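
For completeness, the reformat looked roughly like this (drive letter assumed). -UseLargeFRS formats NTFS with large (4KB) file record segments, which is generally recommended for volumes hosting a small number of very large, heavily fragmented files.

    # Reformat the volume with 64KB clusters and large file record segments
    Format-Volume -DriveLetter F -FileSystem NTFS -AllocationUnitSize 64KB `
        -UseLargeFRS -Confirm:$false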

Then it occurred to me that I may have hit a write throughput ceiling of the A2 VM that wasn't documented anywhere. So I scaled the VM up to an A7 (8 Core/56GB) and reran some of the high throughput tests. Again, to my disappointment, there was no significant difference.
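
On the Resource Manager platform the resize itself is only a couple of commands (the resource group and VM names below are placeholders):

    # Resize the test VM from A2 to A7 using the AzureRM module
    $vm = Get-AzureRmVM -ResourceGroupName "perf-rg" -Name "perf-vm"
    $vm.HardwareProfile.VmSize = "Standard_A7"
    Update-AzureRmVM -ResourceGroupName "perf-rg" -VM $vm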

What else can I tune? Well, I had initially picked my worker thread count based on the number of CPUs available to the A2 VM. Now that I was testing on an A7, I decided to test with more worker threads, so I reran the write intensive tests with 8 threads. Maybe Storage Spaces needs more worker threads to drive the performance of all the disks in the pool; that sort of makes sense, right? Nope. No significant improvement.

Maybe Storage Spaces has some limit on the amount of throughput it can push through a single file. My tests run against a single 100GB data file; maybe if we spread the work across 4 files we'll see an improvement? Nope. Still ~110MB/s write throughput. I also tried 4 files with an assortment of different thread counts and IO depths to see if there was some sort of optimal configuration of workers/total IO depth. It turns out that whatever configuration I ran, I couldn't get past ~115MB/s.
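
The multi-file runs simply list extra targets on the diskspd command line. One thing to keep in mind is that -t is threads per target, so the total thread count multiplies with the number of files (paths and counts here are illustrative):

    # 256K sequential 100% write against four 100GB files; -t2 means 2 threads per file (8 total)
    .\diskspd.exe -c100G -d300 -W60 -C60 -t2 -o16 -b256K -w100 -h -L `
        F:\test1.dat F:\test2.dat F:\test3.dat F:\test4.dat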

Well, I'm out of ideas now. So let's do what we do in the world of traditional SANs: throw more disks at the problem. I attach 4 additional VHDs to my test VM and build an 8 disk pool. Fancy that, I'm getting slightly more performance.

With an 8 disk pool running a 256K / sequential / 100% write / 8 thread / 16 IO depth test against four 100GB test files I can get 120-160MB/s write performance against an expected ~400MB/s.

Just to confirm that I'm running into a write specific issue, I rerun a read workload. The outcome is as expected: I'm getting 800MB/s of read throughput.
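
Attaching the extra unmanaged data disks is straightforward with the AzureRM cmdlets; a sketch (names, LUNs and storage account URIs are placeholders, following the one-VHD-per-storage-account layout used earlier):

    # Attach four more empty data disks to the VM, then push the update
    $vm = Get-AzureRmVM -ResourceGroupName "perf-rg" -Name "perf-vm"
    4..7 | ForEach-Object {
        $null = Add-AzureRmVMDataDisk -VM $vm -Name "data$_" -Lun $_ -Caching None `
            -DiskSizeInGB 1023 -CreateOption Empty `
            -VhdUri "https://perfstorage$_.blob.core.windows.net/vhds/data$_.vhd"
    }
    Update-AzureRmVM -ResourceGroupName "perf-rg" -VM $vm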

There is something very unusual about how write performance scales with Storage Spaces on Azure.

Inconclusive Conclusions

Based on the data I have so far there is some unexpected behavior when working with Storage Spaces on Azure Standard Storage.

  • Read performance is as expected and scales linearly with the number of disks used.
  • Write performance gains from adding more disks taper off dramatically after 2 disks. Going from 4 to 8 disks inside a single pool does not come close to the expected aggregate write performance for that number of disks.

What's next? I'm thinking it might be time to try multiple pools inside a single VM to see whether this is a Storage Spaces problem or an Azure VM to Azure Storage throughput issue.
