SQL Performance Benchmark of InterSystems IRIS Data Platform
InterSystems has a long-standing goal of making every release of our products faster than the ones that came before. In addition to functionality for leveraging distributed data, InterSystems IRIS Data Platform™ includes enhancements to its single-instance SQL processing, namely:
- Pervasive use of parallel query execution
- Improved heuristics and logic for the query optimizer
- Enhancements to code generation
- Kernel updates for more efficient data access
In this report, we will give the SQL performance results of a benchmark test derived from a real-world solution developed by one of our application partners. Without changing any application code, we observed a 50% increase in throughput between the 2017.2.2 release of InterSystems Caché® and a 2019.1 alpha release of InterSystems IRIS.
To create the benchmark test, we worked with one of our European application partners. Their ERP application has a Java-based front end accessing a complex SQL schema spread across seven databases hosted on a data platform technology from InterSystems. In addition to the SQL engine, the application leverages InterSystems Business Intelligence technology, embedding ad-hoc analysis and dashboard capabilities.
In preparation, we logged over 1 million SQL statements from a production system, including hand-coded and application-generated queries submitted over JDBC and queries generated by Hibernate. This workload accounted for about 800 different SQL queries with varying run-time arguments representing real-world application usage. The test set contained simple queries as well as highly complex ones, with complex aggregates, WHERE clauses using IN and %INLIST constraints, and extensive use of subqueries. The corresponding dataset, against which the queries were run, measured 40 GB.
From this set, we selected 500,000 consecutive statements to make up a standard benchmark. To execute the benchmark, the queries were compiled and then the whole selection was run twice, using the throughput numbers from the second run to represent a “warm” cache as in normal operation. The sequence of queries was replayed from a single process, so as to preserve their natural order and any dependencies on updates to the data happening between SELECT queries. In order to focus on the SQL performance, queries were issued from within the database, thereby discounting the impact of the JDBC driver and Hibernate.
We ran the benchmark on three InterSystems data platform versions:
- InterSystems Caché version 2017.2.2
- InterSystems IRIS version 2018.2, with parallel query execution explicitly turned off
- InterSystems IRIS version 2019.1, with parallel query execution explicitly turned off
- InterSystems IRIS version 2019.1 with parallel query execution turned on (default)
- All tests ran on the same Virtual Machine1, allocating 24GB of global buffers and 512MB of routine buffers.
Figure 1 plots the total duration of the test on the three product releases and shows how the most recent version of InterSystems IRIS takes 55% less time to process the entire set of queries with parallel execution switched on, and 28% with parallel execution explicitly switched off2. None of the performance improvements required changing anything to the schema, queries, or underlying hardware. Therefore, we expect SQL users moving to InterSystems IRIS will see similar results.
One interesting observation from more detailed measurements taken during the test is that, while throughput increased with more than 30%, the number of raw data accesses (global references) also increased by more than 15%. We conclude two things. First, the kernel updates successfully speed up raw data access at the system level. Second, our query optimizer is now making smarter choices, preferring certain query plans that at first sight would require more data to be read and thus be possibly slower, but correctly estimating the impact of the actual data layout, as certain data structures can be read more efficiently than others.
Additional Performance Gains Possible
As stated before, this benchmark quantifies the benefits of moving to InterSystems IRIS without making any changes to the schema or application code. InterSystems expects our partner would achieve further performance gains by making several small changes to their current schema, including adopting a newer storage format3 and switching to a different data type for dates4. Also, this benchmark measured the performance improvements on the SQL side only, whereas InterSystems IRIS introduced certain important enhancements to the JDBC driver as well.
This customer benchmark clearly illustrates how InterSystems continued investment in improving platform performance pays off for our users. The tests quantify performance gains that stem strictly from upgrading, without requiring any changes to the schema or application code. We encourage our customers and partners to migrate to new releases of InterSystems IRIS and recommend evaluating not only the benefits of a raw upgrade, but also of adopting the new capabilities introduced with those releases.
1 – A Windows Server 2012 RS standard with 8 vCPUs at 2.4GHz and 32GB of RAM
2 – Parallel query execution leverages multiple processes to compute the results of a single query. It is switched on by default as of InterSystems IRIS 2018.2.
3 – The USEEXTENTSET class parameter enables a more efficient data structure without any impact on the SQL interface to the data. While it is on by default in the current version, it wasn’t enabled in the partner’s schema.
4 – Posix timestamp format offers a more packed way of storing date information and has been shown to offer small but measurable benefits over regular date formats.