A New Model for Scaling SQL Applications with InterSystems IRIS
When we set out to create the InterSystems IRIS Data Platform, we were inspired by new challenges we saw in delivering data-rich applications. Looking from one market to another, we consistently found application developers eager to compete by offering their customers access to more data, faster, while scaling to support an ever-growing user base. Our existing data platform products represented decades of innovation. Developers had taken advantage of our flexible data model and high-throughput ingestion and query capabilities to build powerful, reliable applications with unmatched performance. Could we build on these successes to give our customers and partners a data platform that could support 10 times the data ingestion and query volume on commodity infrastructure without sacrificing performance, reliability, or intuitive development?
With the release of the InterSystems IRIS Data Platform, we’ve done just that. I can’t wait to see what our customers and partners build on it.
For decades, the answer to every scaling question for a traditional database was “bigger”: vertical scaling meant faster CPUs, more memory, and larger disks in a single system. But year over year, single-threaded performance gains have slowed dramatically, while virtualization and container technology have amplified the economies of scale of commodity hardware. Increasingly, the answer to growing data and user volumes is “more”: horizontal scaling across a cluster of cooperating systems.
While developing what would become the InterSystems IRIS Data Platform, our first model of a horizontal scaling challenge came from the world of finance: Investment banks are increasingly committed to standardizing their computing infrastructure, both to reduce costs and to ensure that new applications are “built to scale” across virtualized commodity hardware, rather than requiring repeated, disruptive rip-and-replace upgrades to ever-larger non-commodity systems.
Many of these applications had built their data access layer on SQL for its long-proven elegance and reliability, but traditional SQL databases were ill-suited to scale horizontally. Developers were confronted with a choice: Abandon SQL for a more code-heavy, complex “NoSQL”-based solution — re-implementing over and over the query planning and processing that an SQL database provides — or adopt a traditional SQL partitioning scheme that would place the burden of query distribution and re-aggregation on their application.
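To see why that second option is burdensome, consider something as simple as a global AVG over a partitioned table: the application itself must decompose the query into per-shard partial aggregates and recombine them correctly. A minimal sketch (the shard data and query are hypothetical, purely for illustration):

```python
# Illustrative sketch (not InterSystems code): the query-distribution and
# re-aggregation work a traditional partitioning scheme pushes onto the
# application. Computes AVG(amount) WHERE region = ? across shards.

shards = [
    [("east", 100), ("west", 200)],                 # shard 1: (region, amount)
    [("east", 300), ("west", 100), ("east", 200)],  # shard 2
]

def partial_agg(rows, region):
    """Per-shard partial aggregate. An AVG of per-shard AVGs would be wrong,
    so each shard must return (sum, count) instead."""
    amounts = [a for r, a in rows if r == region]
    return sum(amounts), len(amounts)

def global_avg(shards, region):
    """The re-aggregation step the application must implement itself."""
    total, count = 0, 0
    for shard in shards:
        s, c = partial_agg(shard, region)
        total += s
        count += c
    return total / count

print(global_avg(shards, "east"))  # 600 / 3 = 200.0
```

Multiply this by every aggregate function, join strategy, and query shape an application needs, and the appeal of a database that does the decomposition and re-aggregation itself becomes clear.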
The InterSystems IRIS Data Platform offers a powerful alternative. Our sharded SQL architecture transparently distributes data and queries across a cluster. Large tables can have their data “sharded,” distributed automatically either by a system-assigned identifier or by a user-specified shard key. Queries are transparently rewritten to execute in parallel on each shard’s data, and the results are then re-aggregated as needed. Using InterSystems high-performance distributed caching technology, the data in non-sharded tables, or in sharded tables distributed using any strategy, can be accessed locally anywhere in the cluster—unlike traditional sharded SQL technologies, no physical replication is needed. The resulting panoramic view of the data allows for arbitrary joins between all database tables, whether sharded or not. Massive quantities of historical data can be directly combined with incoming real-time data in a single query. Query processing always occurs as close as possible to the data it requires—and in parallel across the cluster—to minimize latency and maximize throughput.
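In practice, this model surfaces in ordinary SQL. The following sketch uses hypothetical table and column names, and the exact DDL should be checked against the InterSystems IRIS documentation; the shape, though, is the point: sharding is declared once at table creation, and queries look like any other SQL.

```sql
-- Hypothetical schema: a large fact table sharded by a user-specified key,
-- joined with a small non-sharded dimension table.
CREATE TABLE Demo.Sales (
    CustomerId INT,
    Amount     NUMERIC(12,2),
    SaleDate   DATE
) SHARD KEY (CustomerId);

CREATE TABLE Demo.Customer (      -- non-sharded; available cluster-wide
    CustomerId INT PRIMARY KEY,
    Region     VARCHAR(50)
);

-- The join and aggregation run in parallel on each shard; partial results
-- are re-aggregated transparently by the platform, not by the application.
SELECT c.Region, SUM(s.Amount)
FROM Demo.Sales s
JOIN Demo.Customer c ON s.CustomerId = c.CustomerId
GROUP BY c.Region;
```

Nothing in the SELECT statement reveals that Demo.Sales is sharded; the distribution strategy is a physical design decision, not something each query must encode.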
Transparently distributed SQL queries are only the beginning of what the InterSystems IRIS Data Platform provides for data-rich applications—and we’re just getting started. We have always believed in giving developers access to their data through whatever tools suit their application best. Applications demanding the ultimate in low-latency, high-bandwidth ingestion and extraction can use a shared memory connection directly to the core data structures of InterSystems IRIS. “Big data” cluster computing applications can connect InterSystems IRIS to Apache Spark to produce and consume data in parallel at tremendous rates. Java applications can integrate the InterSystems IRIS JDBC driver to take advantage of automatic multicore execution of SQL queries and a high-speed result-set transfer mode.
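For the JDBC path, access looks like standard JDBC against any database. The sketch below assumes a hypothetical host, port, namespace, and table; the URL scheme follows the pattern documented for the InterSystems IRIS JDBC driver, and the actual round trip is shown in comments since it requires a running IRIS instance.

```java
// Sketch of connecting to InterSystems IRIS over JDBC.
// Host, port (1972 is the IRIS default), namespace, credentials, and the
// Demo.Sales table are all hypothetical placeholders.
public class IrisJdbcSketch {

    // Assembles a JDBC URL of the form jdbc:IRIS://host:port/NAMESPACE
    static String irisUrl(String host, int port, String namespace) {
        return "jdbc:IRIS://" + host + ":" + port + "/" + namespace;
    }

    public static void main(String[] args) {
        String url = irisUrl("localhost", 1972, "USER");
        System.out.println(url);

        // With a running IRIS instance, the query itself is plain JDBC:
        // try (java.sql.Connection c =
        //          java.sql.DriverManager.getConnection(url, "user", "password");
        //      java.sql.PreparedStatement ps = c.prepareStatement(
        //          "SELECT Region, SUM(Amount) FROM Demo.Sales GROUP BY Region");
        //      java.sql.ResultSet rs = ps.executeQuery()) {
        //     while (rs.next()) {
        //         System.out.println(rs.getString(1) + ": " + rs.getLong(2));
        //     }
        // }
    }
}
```

The application code is indistinguishable from JDBC against a single-node database; the multicore and distributed execution happens behind the driver.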
There’s so much technology in InterSystems IRIS that I’m proud of and excited about, but I’m most excited about the incredible high-performance, scalable applications developers will create with it.
Lastly, we recently worked with Enterprise Strategy Group, who ran a proof-of-concept comparison of InterSystems IRIS against three common databases. Their results were exceptionally positive for InterSystems IRIS, and we invite you to download the analyst report here.
Tom Woodfin is a Development Manager at InterSystems, responsible for the InterSystems IRIS SQL engine and other query and analytics technologies. A 15-year veteran of InterSystems and database technology development, he holds degrees in Computer Science and Philosophy from the Massachusetts Institute of Technology.