Performance Testing Series
This post is part of a series of blog posts about my performance tests in Azure SQL Database. For the first post in this series (and links to all of the other posts) please see here. For a summary of all the results please see here.
For a general overview of the test architecture, test components and test types, please see here.
Bulk Load Test Overview
The Bulk Load tests use the .NET SqlBulkCopy class to load data. Loading batches of rows in this way is significantly more efficient than loading single rows via a stored procedure. It also allows the Bulk Load tests to provide a better estimate of the log write rate limit of each service tier.
The Cloud Service fires up a number of worker threads (as specified in the test definition). The worker threads generate random data to be inserted. Row Id values (the primary key field) are generated sequentially. The worker threads then load the rows in batches into the test database.
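The harness itself uses .NET SqlBulkCopy, so the sketch below is only illustrative: a minimal Python version of the worker's data generation, with hypothetical names, showing sequential Id values (the primary key) and fixed-size batches of random ~400-byte rows.

```python
import itertools
import random
import string

ROW_PAYLOAD_BYTES = 400   # approximate data content per row, as in these tests
BATCH_SIZE = 1000         # rows per bulk load batch (UT test configuration)

_next_id = itertools.count(1)  # sequential primary key values

def generate_row():
    """One row: a sequential Id plus a random character payload."""
    payload = "".join(random.choices(string.ascii_letters, k=ROW_PAYLOAD_BYTES))
    return (next(_next_id), payload)

def generate_batch(batch_size=BATCH_SIZE):
    """A batch of rows ready to hand to a bulk copy API."""
    return [generate_row() for _ in range(batch_size)]

batch = generate_batch()
# Each request = one batch = multiple rows; a worker thread would now
# bulk-load `batch` into the test database and generate the next one.
```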
The worker threads in the UT tests are not limited to a specified request rate: as soon as one bulk load batch of rows has completed, another is generated (i.e. each request = one batch = multiple rows). The worker threads in the LT tests are constrained to generate batch load requests at a limited rate specified in the test definition. The LT tests gradually increase the batch size over the course of a number of tests, in contrast to the other LT tests in this series, which increase the request rate. A few different permutations of thread count and batches per second are also tested.
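The LT pacing described above can be sketched as follows. This is a hypothetical illustration of fixed-rate request generation, not the harness's actual code; it paces against a fixed timeline so that one slow response does not permanently lower the achieved rate.

```python
import time

def run_rate_limited(send_batch, batches_per_second, duration_seconds):
    """Call send_batch() at a fixed rate for a fixed duration.

    Each request is scheduled against the test start time, so the loop
    catches up after a slow batch rather than drifting ever slower.
    """
    interval = 1.0 / batches_per_second
    start = time.monotonic()
    sent = 0
    while time.monotonic() - start < duration_seconds:
        send_batch()
        sent += 1
        # Sleep until the next request is due on the fixed timeline.
        delay = (start + sent * interval) - time.monotonic()
        if delay > 0:
            time.sleep(delay)
    return sent
```

The UT workers are simply this loop with the pacing removed: a new batch is issued the moment the previous one completes.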
UT Test Results
Results from the 30 UT Bulk Load tests are shown in the two charts below. In these charts, the “T” values are the number of worker threads generating requests against the database, e.g. “Business 4T” = a test against Business edition, running with 4 threads continuously generating requests.
The two charts are similar because the average row size was around 400 bytes throughout. The data volume figures here are based on the data content only (i.e. summing the sizes of the data values, according to the number of bytes each data value requires for its SQL Server data type, ignoring internal SQL Server overheads).
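The relationship between the two charts is just arithmetic on the average row size. A quick sketch of the conversion, assuming decimal MB and the ~400 byte average row size (both approximations, so the published figures will differ slightly):

```python
def mb_per_minute(rows_per_second, avg_row_bytes=400):
    """Convert a row rate into a data-content volume rate (decimal MB/min)."""
    return rows_per_second * avg_row_bytes * 60 / 1_000_000

# e.g. ~2500 rows per second at ~400 bytes per row is
# roughly 60 MB of data content written per minute
rate = mb_per_minute(2500)
```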
Several different tests of the current Web/Business edition were carried out. The Web 1T test was carried out on an A1-sized cloud instance, the Business 1T test on an A2-sized instance. This explains the higher throughput in the Business 1T test. As the number of threads was increased in the Business edition tests, throughput only increased marginally.
Performance of Business edition is considerably more variable than that of the new service tiers, which show good consistency. Business edition also considerably outperforms all of the new service tiers except P2, which generally equals Business edition here; the exception is one Business edition test which managed to hit 6000 rows per second.
The performance difference between S2 and P1 in this test is surprisingly small. This was also seen in the Inserts test (see previous post).
I will demonstrate below that all of the test results for the new service tiers shown above are limited by the SQL Server Log Write Rate limit, i.e. this is the determining factor in the performance.
The charts below show the performance profiles over the lifetime of the tests in the charts above. Since 30 lines is rather a lot to fit onto one chart, data for the different editions / tiers has been split across several charts.
It is also worth noting that, due to the way Azure SQL Database logs metrics into sys.resource_stats, data is only available at five minute intervals, and the last data point in each test (which would have been plotted as minute 27 in these charts) is often not useful and so is generally omitted.
Basic to P1 Tiers
Performance of the new tiers is very consistent. The charts show that these tests weren't constrained on CPU but were clearly constrained on log write quota. CPU utilisation % is significantly lower for the higher tier tests, even though the workload is also higher in those tests, indicating that the premium tiers have a significantly larger CPU quota than the lower tiers.
Std2 to P2 Tiers
Again, these profiles illustrate good consistency.
Web / Business Edition
These profiles show the much more variable nature of Business edition. Tests where multiple threads are executing show considerably more variable performance, sometimes wildly varying. Again, crudely speaking, this is the Wild Wild West of performance.
In addition, the Log Write profile looks suspiciously odd, being at exactly 100% for all data points in all tests. More on this below.
LT Test Results
Three series of LT tests were carried out, against the Web, Basic and S1 editions / tiers.
The chart below illustrates how the performance of the Basic Tier varies as the workload (batch size, i.e. number of rows per batch) is increased. A 30 minute test was conducted at each batch size.
As the batch size is increased, the CPU utilisation and log write utilisation increase. At a batch size of around 500 rows, we have hit the log write limit and the database is not able to keep up with the incoming volume of data.
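As a rough cross-check, taking the 9.3 MB per minute Basic tier figure from the conclusions table, ~400-byte rows, and one batch per second, the batch size at which the log write limit should bite can be estimated. This assumes decimal MB and counts data-content bytes only, so it is only expected to land in the same region as the observed figure:

```python
def saturating_batch_size(limit_mb_per_min, batches_per_second=1, avg_row_bytes=400):
    """Batch size at which the incoming data rate equals the log write limit."""
    limit_bytes_per_second = limit_mb_per_min * 1_000_000 / 60
    return limit_bytes_per_second / (batches_per_second * avg_row_bytes)

basic_rows = saturating_batch_size(9.3)  # roughly 390 rows per batch
```

That estimate is the same order as the ~500 rows observed; the gap is plausibly down to the approximate row size and the MB definition used.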
Performance is very consistent throughout.
The Standard S1 tier shows a similar type of profile to the Basic tier database – in this case we hit the log write rate limit at a batch size of around 1000 rows.
The first point to make about this profile is that the maximum row rate reached (around 1000 rows per second) is considerably less than the maximum reached in the equivalent Web Edition UT tests (around 2500 rows per second). The difference is because this LT test was conducted using an A0 cloud instance, whereas the UT test was conducted using an A1 cloud instance, which offers around four times the CPU power. Clearly, an A0 instance was insufficient: we start falling short of the target request rate not because SQL Database is blocking, but because the cloud service cannot generate the requests fast enough.
Noting that only one batch is sent per second, it is likely that having additional threads running would have also mitigated this (to remove the sequential nature of activity on a single thread – i.e. to allow a second thread to prepare the test data for the next batch whilst the first thread was awaiting the response from the current batch request).
It is also interesting to note that this chart confirms the well known behaviour that there is no strictly enforced log write limit per database in the Web Edition. That is, we apparently hit the reported log write limit at a batch size of 200 rows, yet throughput continues to increase up to a batch size of 1000 rows in this test (and up to 2500 rows per second in the UT test).
Bulk Load Test Conclusions
The Bulk Load Tests have shown that, for a bulk load workload (of average row size 400 bytes), performance of the new P2 tier generally equals or exceeds the current Business Edition. Business edition generally outperforms P1. The new tiers also show an impressive level of consistency compared to Web / Business edition.
| Service Tier | Bulk Load Limit | Bulk Load Limit Comparison |
| --- | --- | --- |
| Web / Business | 93.7 MB per minute | – |
| Basic | 9.3 MB per minute | 10% of Web / Business |
| Standard S1 | 18.7 MB per minute | 20% of Web / Business |
| Standard S2 | 37.5 MB per minute | 40% of Web / Business |
| Premium P1 | 46.9 MB per minute | 50% of Web / Business |
| Premium P2 | 91.3 MB per minute | 97% of Web / Business |
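The comparison column in the table above is simply each tier's limit as a percentage of the Web / Business figure, rounded to the nearest whole percent; a quick check:

```python
# Bulk load limits (MB per minute) from the table above
limits = {"Basic": 9.3, "Standard S1": 18.7, "Standard S2": 37.5,
          "Premium P1": 46.9, "Premium P2": 91.3}
web_business = 93.7

# Each tier's limit as a whole percentage of Web / Business
pct = {tier: round(100 * mb / web_business) for tier, mb in limits.items()}
```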
Appendix – UT Test Configuration
| Configuration | Web 1T | Business 1T | Business 2T | Business 4T | Business 8T | Basic 1T | Std S1 1T | Std S2 1T | Prem P1 2T | Prem P2 4T |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Cloud Svc Inst. Size | A1 | A2 | A2 | A2 | A3 | A1 | A1 | A2 | A2 | A2 |
| Req. Gen. Thread Count | 1 | 1 | 2 | 4 | 8 | 1 | 1 | 1 | 2 | 4 |
| Batch Size (Rows / Batch) | 1000 | 1000 | 1000 | 1000 | 1000 | 1000 | 1000 | 1000 | 1000 | 1000 |
| Initial Test Data (MB) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Test Tables Used | 1 | 1 | 2 | 4 | 8 | 1 | 1 | 1 | 2 | 4 |
Appendix – UT Test Results
| Configuration | Avg Rows Per Second (3 runs) | Avg MB Per Minute (3 runs) |
| --- | --- | --- |
| Std S1 1T | 821 / 908 / 909 | 17.5 / 19.4 / 19.4 |
| Std S2 1T | 1641 / 1817 / 1816 | 34.9 / 38.7 / 38.7 |
| Prem P1 2T | 2057 / 2268 / 2273 | 43.9 / 48.5 / 48.5 |
| Prem P2 4T | 3878 / 4458 / 4502 | 82.8 / 95.2 / 96.0 |
| Configuration | SQL Avg Log Write % (3 runs) | SQL Avg CPU % (3 runs) |
| --- | --- | --- |
| Std S1 1T | 99.7 / 99.8 / 99.9 | 36.5 / 36.5 / 35.5 |
| Std S2 1T | 99.5 / 99.9 / 92.9 | 34.7 / 34.7 / 33.3 |
| Prem P1 2T | 99.7 / 99.9 / 99.9 | 14.4 / 10.7 / 11.4 |
| Prem P2 4T | 96.7 / 98.3 / 99.0 | 13.6 / 13.8 / 14.1 |
| Configuration | Cloud Svc Avg CPU (3 runs) | Error Count (3 runs) |
| --- | --- | --- |
| Std S1 1T | 6.7 / 6.9 / 7.0 | 0 / 0 / 0 |
| Std S2 1T | 6.8 / 7.6 / 6.9 | 0 / 0 / 0 |
| Prem P1 2T | 9.1 / 8.8 / 9.5 | 0 / 0 / 0 |
| Prem P2 4T | 16.4 / 18.8 / 18.3 | 0 / 0 / 0 |
Appendix – LT Test Configuration
| Configuration | Web | Basic | Std S1 |
| --- | --- | --- | --- |
| Cloud Svc Inst. Size | A0 | A0 | A0 |
| Initial Test Data (MB) | 0 | 0 | 0 |
As discussed in the LT results above, A0 offered insufficient CPU power to reach the throughput maximum seen in the UT Web Edition test.
| Thread Count | Batches Per Second | Web Min BS | Web Max BS | Basic Min BS | Basic Max BS | Std1 Min BS | Std1 Max BS |
| --- | --- | --- | --- | --- | --- | --- | --- |
Appendix – LT Test Results
| Req. Per Sec | Batch Size | Avg Rows Per Second (Web) | Avg Rows Per Second (Basic) | Avg Rows Per Second (Std 1) | Avg MB Per Minute (Web) | Avg MB Per Minute (Basic) | Avg MB Per Minute (Std 1) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Req. Per Sec | Batch Size | SQL Avg Log Write % (Web) | SQL Avg Log Write % (Basic) | SQL Avg Log Write % (Std 1) | SQL Avg CPU % (Web) | SQL Avg CPU % (Basic) | SQL Avg CPU % (Std 1) |
| --- | --- | --- | --- | --- | --- | --- | --- |