InterSystems Caché Technology Guide

Chapter Two:
Caché's Multidimensional Data Server

Caché’s high-performance database uses a multidimensional data engine that allows efficient and compact storage of data in a rich data structure. Objects and SQL are implemented by specifying a unified data dictionary that defines the classes and tables and provides a mapping to the multidimensional structures – a mapping that can be automatically generated.

INTEGRATED DATABASE ACCESS

Caché gives programmers the freedom to store and access data through objects, SQL, or direct access to multidimensional structures. Regardless of the access method, all data in Caché’s database is stored in Caché’s multidimensional arrays.

Multidimensional Access

Once the data is stored, all three access methods can be simultaneously used on the same data with full concurrency.

A unique feature of Caché is its Unified Data Architecture. Whenever a database object class is defined, Caché automatically generates a SQL-ready relational description of that data. Similarly, if a DDL description of a relational database is imported into the Data Dictionary, Caché automatically generates both a relational and an object description of the data, enabling immediate access as objects. Caché keeps these descriptions coordinated; there is only one data definition to edit. The programmer can edit and view the dictionary both from an object and a relational table perspective.

Caché automatically creates a mapping for how the objects and tables are stored in the multidimensional structures, or the programmer can explicitly control the mapping.

THE CACHÉ ADVANTAGE

Flexibility: Caché’s data access modes – Object, SQL, and multidimensional – can be used concurrently on the same data. This flexibility gives programmers the freedom to think about data in the way that makes the most sense and to use the access method that best fits each program’s needs.

Less Work: Caché’s Unified Data Architecture automatically describes data as both objects and tables with a single definition. There is no need to code transformations, so applications can be developed and maintained more easily.

Leverage Existing Skills and Applications: Programmers can leverage existing relational skills and introduce object capabilities gradually into existing applications as they evolve.

MULTIDIMENSIONAL DATA MODEL

At its core, the Caché database is powered by an extremely efficient multidimensional data engine. The built-in Caché scripting languages support direct access to the multidimensional structures – providing the highest performance and greatest range of storage possibilities – and many applications are implemented entirely using this data engine directly. Direct "global access" is particularly common when there are unusual or very specialized structures and no need to provide object or SQL access to them, or where the highest possible performance is required.

There is no data dictionary, and thus no data definitions, for the multidimensional data engine.

Rich Multidimensional Data Structure

Caché’s multidimensional arrays are called “globals”. Data can be stored in a global with any number of subscripts. What's more, subscripts are typeless and can hold any sort of data. One subscript might be an integer, such as 34, while another could be a meaningful name, like “LineItems” – even at the same subscript level.

For example, a stock inventory application that provides information about item, size, color, and pattern might have a structure like this:

^Stock(item,size,color,pattern) = quantity

Here’s some sample data:

^Stock("slip dress",4,"blue","floral") = 3

With this structure, it is very easy to determine if there are any size 4 blue slip dresses with a floral pattern – simply by accessing that data node. If a customer wants a size 4 slip dress and is uncertain about the color and pattern, it is easy to display a list of all of them by cycling through all the data nodes below:

^Stock("slip dress",4)

In this example, all of the data nodes were of a similar nature (they stored a quantity), and they were all stored at the same subscript level (four subscripts) with similar subscripts (the third subscript was always text representing a color). However, they don't have to be. Data nodes may have a different number or type of subscripts, and they may contain different types of data.
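The access patterns described above can be sketched in Python as a rough model. This is only an illustration of a sparse, subscript-addressed structure: it assumes a global can be pictured as a mapping from subscript tuples to values, and the sample rows and the helper name `nodes_below` are made up for the example. It says nothing about how Caché stores or traverses globals internally.

```python
# Illustrative model only: a "global" pictured as a sparse mapping from
# subscript tuples to values. Sample data is invented for this sketch.
stock = {
    ("slip dress", 4, "blue", "floral"): 3,
    ("slip dress", 4, "red", "plain"): 1,
    ("slip dress", 6, "blue", "floral"): 2,
    ("t-shirt", 4, "white", "plain"): 10,
}


def nodes_below(global_, *prefix):
    """Return (subscripts, value) pairs for every node under the prefix."""
    n = len(prefix)
    return sorted(
        (subs, value)
        for subs, value in global_.items()
        if subs[:n] == prefix
    )


# Direct access to one node: is there a size 4 blue floral slip dress?
in_stock = stock.get(("slip dress", 4, "blue", "floral"), 0) > 0

# All size 4 slip dresses, whatever the color and pattern.
size4 = nodes_below(stock, "slip dress", 4)
```

Only nodes that were actually set occupy space in the mapping, which is the sense in which the structure is sparse.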

Here’s an example of a more complex global with invoice data that has different types of data stored at different subscript levels:

^Invoice(invoice #,"Customer") = Customer information
^Invoice(invoice #,"Date") = Invoice date
^Invoice(invoice #,"Items") = # of Items in the invoice
^Invoice(invoice #,"Items",1,"Part") = part number of 1st Item
^Invoice(invoice #,"Items",1,"Quantity") = quantity of 1st Item
^Invoice(invoice #,"Items",1,"Price") = price of 1st Item
^Invoice(invoice #,"Items",2,"Part") = part number of 2nd Item
etc.

Multiple Data Elements per Node

Often only a single data element is stored in a data node, such as a date or quantity, but sometimes it is useful to store multiple data elements together in a single data node. This is particularly useful when there is a set of related data that is often accessed together. It can also improve performance by requiring fewer accesses of the database.

For example, in the above invoice, each item included a part number, quantity, and price all stored as separate nodes, but they could be stored as a list of elements in a single node:

^Invoice(invoice #,"LineItems",item #)

To make this simple, Caché supports a function called $list(), which can assemble multiple data elements into a length-delimited byte string and later disassemble them. Elements can in turn contain sub-elements, and so on.
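The idea of a length-delimited byte string can be illustrated with a small Python sketch. It assumes a one-byte length prefix per element and stores every element as text; the function names `pack_list` and `unpack_list` and the sample part number are hypothetical, and the byte layout is not the format Caché actually uses.

```python
def pack_list(elements):
    """Encode elements as a length-delimited byte string (1-byte prefix)."""
    out = bytearray()
    for element in elements:
        data = str(element).encode("utf-8")
        if len(data) > 255:
            raise ValueError("element too long for this 1-byte length sketch")
        out.append(len(data))   # length prefix
        out += data             # element bytes
    return bytes(out)


def unpack_list(blob):
    """Decode a byte string produced by pack_list back into strings."""
    elements, pos = [], 0
    while pos < len(blob):
        length = blob[pos]
        start = pos + 1
        elements.append(blob[start:start + length].decode("utf-8"))
        pos = start + length
    return elements


# One line item (part number, quantity, price) held in a single value.
line_item = pack_list(["P-1001", 2, "19.95"])
part, quantity, price = unpack_list(line_item)
```

Because each element carries its own length, no delimiter character has to be reserved, and an element may itself be a packed list.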

Logical Locking Promotes High Concurrency

In systems with thousands of users, reducing conflicts between competing processes is critical to providing high throughput. One of the biggest conflicts is between transactions wishing to access the same data.

Caché processes don't lock entire pages of data while performing updates. Instead, because transactions require frequent access or changes to small quantities of data, database locking in Caché is done at a logical level. Database conflicts are further reduced by using atomic addition and subtraction operations, which don't require locking. (These operations are particularly useful in incrementing counters used to allocate ID numbers and for modifying statistics counters.)
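As a rough illustration of the two ideas in this paragraph, the Python sketch below takes one lock per logical node name instead of one per storage page, and uses a counter whose increment is atomic from the caller's point of view. The class names are invented for the example, and because Python has no built-in atomic integer the counter emulates atomicity with a private mutex; this is an analogy, not a description of Caché's lock manager.

```python
import threading
from collections import defaultdict


class NodeLocks:
    """Hand out one lock per logical node name rather than one per page."""

    def __init__(self):
        self._guard = threading.Lock()
        self._locks = defaultdict(threading.Lock)

    def lock_for(self, node):
        with self._guard:
            return self._locks[node]


class Counter:
    """Counter whose increment callers can use without taking a node lock."""

    def __init__(self):
        self._value = 0
        self._mutex = threading.Lock()   # stands in for an atomic add

    def increment(self, amount=1):
        with self._mutex:
            self._value += amount
            return self._value


locks = NodeLocks()
next_id = Counter()
invoices = {}


def create_invoice(customer):
    invoice_id = next_id.increment()                 # allocate an ID
    with locks.lock_for(("Invoice", invoice_id)):    # lock only this node
        invoices[invoice_id] = {"Customer": customer}
    return invoice_id


threads = [
    threading.Thread(target=create_invoice, args=(f"customer {i}",))
    for i in range(50)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Workers updating different invoices never wait on each other, and every worker receives a distinct ID.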

With Caché, individual transactions run faster, and more transactions can run concurrently.

Variable Length Data in Sparse Arrays

Because Caché data is inherently variable length and is stored in sparse arrays, Caché often requires less than half of the space needed by a relational database. In addition to reducing disk requirements, compact data storage enhances performance because more data can be read or written with a single I/O operation and data can be cached more efficiently.

Declarations and Definitions Aren’t Required

Caché multidimensional arrays are inherently typeless, both in their data and subscripts. No declarations, definitions, or allocations of storage are required. Global data simply pops into existence as data is inserted.

Namespaces

In Caché, data and code are stored in disk files with the name CACHE.DAT (only one per directory). Each such file contains numerous globals (multidimensional arrays). Within a file, each global name must be unique, but different files may contain the same global name. These files may be loosely thought of as databases.

Rather than specifying which database file to use, each Caché process uses a “namespace” to access data. A namespace is a logical map that maps the names of multidimensional global arrays and code to databases. If a database is moved from one disk drive or computer to another, only the namespace map needs to be updated. The application itself is unchanged.

Usually, other than some system information, all data for a namespace is stored in a single database. However, namespaces provide a flexible structure that allows arbitrary mapping, and it is not unusual for a namespace to map the contents of several databases, including some on other computers.
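A namespace can be pictured as a lookup table from global names to database locations. The Python sketch below is a minimal model under that assumption; the class, the paths, and the server names are all hypothetical and do not reflect Caché's configuration format.

```python
class Namespace:
    """Logical map from global names to database locations (illustrative)."""

    def __init__(self, default_db):
        self.default_db = default_db
        self.mappings = {}   # global name -> database location

    def map_global(self, global_name, database):
        self.mappings[global_name] = database

    def resolve(self, global_name):
        # Unmapped globals live in the namespace's default database.
        return self.mappings.get(global_name, self.default_db)


app = Namespace(default_db="/data/app/CACHE.DAT")
app.map_global("Invoice", "server2:/data/billing/CACHE.DAT")
before = app.resolve("Invoice")

# Moving the billing database changes only the map, not the application.
app.map_global("Invoice", "server3:/data/billing/CACHE.DAT")
after = app.resolve("Invoice")
```

Application code keeps referring to the global by name; only the map entry changes when the database moves.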

THE CACHÉ ADVANTAGE

Performance: By using an efficient multidimensional data model with sparse storage techniques instead of a cumbersome maze of two-dimensional tables, data access and updates are accomplished with less disk I/O. Reduced I/O means that applications will run faster.

Scalability: The transactional multidimensional data model allows Caché-based applications to be scaled to many thousands of clients without sacrificing high performance. That’s because data access in a multidimensional model is not significantly affected by the size or complexity of the database in comparison to relational models. Transactions can access the data they need without performing complicated joins or bouncing from table to table.

Caché's use of logical locking for updates instead of locking physical pages is another important contributor to concurrency, as is its sophisticated data caching across networks.

Rapid Development: With Caché, development occurs much faster because the data structure provides natural, easily understood storage of complex data and doesn’t require extensive or complicated declarations and definitions. Direct access to globals is very simple, allowing the same language syntax as accessing local arrays.

Cost-Effectiveness: Compared to similarly sized relational applications, Caché-based applications require significantly less hardware and no database administrators. System management and operations are simple.


SQL ACCESS

SQL is the query language for Caché, and it is supported by a full set of relational database capabilities – including DDL, transactions, referential integrity, triggers, stored procedures, and more. Caché supports access through ODBC and JDBC (using a pure Java-based driver). SQL commands and queries can also be embedded in Caché ObjectScript and within object methods.

SQL accesses data viewed as tables with rows and columns. Because Caché data is actually stored in efficient multidimensional structures, applications that use SQL achieve better performance with Caché than with traditional relational databases.

Caché supports, in addition to the standard SQL syntax, many of the commonly used extensions in other databases so that many SQL-based applications can run on Caché without change – especially those written with database independent tools. However, vendor-specific stored procedures will require some work, and InterSystems has translators to help with that work.

Caché SQL includes object enhancements that make SQL code simpler and more intuitive to read and write.

Traditional SQL

SELECT
SC.FullName, SM.Descr, MS.Value, SI.InvDate, SI.InvNumber

FROM
MainSales MS, SalesItem SI, SalesProduct SP, SalesCustomer SC, SalesMarket SM

WHERE
SI.SalesItemID *= MS.SalesItem
AND SP.SalesProductID *= MS.Product
AND SC.SalesCustomerID *= MS.Customer
AND SM.SalesMarketID *= SC.SalesMarket
AND SP.DESCR = "Hammer"

Object Extended SQL

SELECT
Customer->FullName,
Customer->SalesMarket->Descr, Value,
SalesItem->InvDate, SalesItem->InvNumber

FROM
MainSales

WHERE
Product->Descr = 'Hammer'


Relational Gateway

Accessing Relational Databases with Caché Relational Gateway

The Caché Relational Gateway enables a SQL request that originates in Caché to be sent to other (relational) databases for processing. Using the Gateway, a Caché application can retrieve and update data stored in most relational databases.

Additionally, if Caché database classes are compiled using the CachéSQLStorage option, the Gateway allows Caché applications to transparently use relational databases. However, applications will run faster and be more scalable if they access Caché.

THE CACHÉ ADVANTAGE

Faster SQL: Relational applications can enjoy significantly enhanced performance by using Caché SQL to tap into Caché.

Faster Development: In Caché, SQL queries can be written more intuitively, using fewer lines of code.

Compatibility with Relational Applications and Report Writers: Caché’s native ODBC and JDBC drivers provide high-performance access to the Caché database for relational applications and reporting tools. The Caché Relational Gateway enables Caché applications to use SQL to access other (relational) databases.

CACHÉ OBJECTS

Caché’s object model is based upon the ODMG standard. Caché supports a full array of object programming concepts, including encapsulation, embedded objects, multiple inheritance, polymorphism, and collections.

The built-in Caché scripting languages directly manipulate these objects, and Caché also exposes Caché classes as Java, EJB, COM, .NET, and C++ classes. Caché classes can also be automatically enabled for XML and SOAP support by simply clicking a button in the Studio IDE. As a result, Caché objects are readily available to every commonly used object technology.

There are several ways for a program outside of the Caché Application Server to access Caché classes:

  1. Any Caché class can be projected as a class in the native language. When a Java, C++, C#, or other program accesses a Caché object, it calls a template of the class in the native language. That template class (which is automatically generated by Caché) communicates with the Caché Application Server to invoke methods on the Caché server and to access or modify properties. State for the Caché objects is maintained in the Caché Application Server. To speed execution and reduce messaging, Caché caches a copy of the object's data on the client and piggybacks updates with other messages when possible.

  2. A “lighter-weight” projection can be used for database classes in which the native language template class directly accesses the database – bypassing the application server. The object’s state is not kept on the application server; the in-memory properties are only maintained in the client. This approach provides significantly higher throughput but less functionality, since server-side instance methods of the class (i.e., methods that need access to the in-memory properties) cannot be invoked.

  3. InterSystems Jalapeño™ technology allows Java developers to first create Java database classes just like any other POJO (plain old Java object) class in their IDE of choice and then have Caché automatically generate a database schema and corresponding Caché class. Using this approach, the Java class is unchanged, and the application continues to access its properties and methods. Caché provides a library class (“Object Manager”) with an API that is used to store and retrieve database objects and issue queries.

With each of these three approaches, the object appears to be local to the user program. Caché transparently handles all communications, using either call-in or TCP.

The Java template and supporting library is completely Java-based, so it can be used across the Web or on specialized Java devices.

Method Generators

Caché includes a number of unique advanced object technologies – one of which is method generators. A method generator is a method that executes at compile time, generating code that can run when the program is executed. A method generator has access to class definitions, including property and method definitions and parameters, to allow it to generate a method that is customized for the class. Method generators are particularly powerful in combination with multiple inheritance – functionality can be defined in a multiply inherited class that customizes itself to the subclass.
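The general idea of a method that is generated when a class is defined, rather than written by hand, can be sketched in Python. The example below is an analogy only: it assumes a mixin that inspects each subclass's declared `properties` and builds a customized `describe` method once, at class-creation time. The names `Describable`, `properties`, and `Person` are invented for the sketch, and the mechanism (`__init_subclass__` plus `exec`) is Python's, not Caché's.

```python
class Describable:
    """Mixin whose describe() method is generated for each subclass."""

    properties = ()

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Build source code specialized to this subclass's property list.
        parts = ", ".join(
            f"{name}={{self.{name}!r}}" for name in cls.properties
        )
        source = (
            "def describe(self):\n"
            "    return f'" + cls.__name__ + "(" + parts + ")'\n"
        )
        scope = {}
        exec(source, scope)          # runs once, when the class is defined
        cls.describe = scope["describe"]


class Person(Describable):
    properties = ("name", "age")

    def __init__(self, name, age):
        self.name = name
        self.age = age


description = Person("Ada", 36).describe()
```

The generator runs once per subclass, so the resulting method contains no runtime introspection; it is ordinary code specialized to that class.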

THE CACHÉ ADVANTAGE

Caché is fully object-enabled, providing all the power of object technology to developers of high-performance transaction processing applications.

Rapid Application Development: Object technology is a powerful tool for increasing programmer productivity. Developers can think about and use objects – even extremely complex objects – in simple and realistic ways, thus speeding the application development process. Also, the innate modularity and interoperability of objects simplifies application maintenance and lets programmers leverage their work over many projects.

Natural Development: Database objects appear as objects native to the language being used by the developer. There is no need to write tedious code to decompose objects into rows and columns and later reassemble them.


WORD-AWARE TEXT SEARCHING

Caché supports free text searching in which queries can search for text containing words of interest, even though the actual words in the text may be variants of the search words.

To utilize Word-Aware searching, the text field must be Word-Aware indexed, which occurs in the following steps:

  1. Discrete words in the text field are first identified.

  2. Words that are so common as to provide little search value are removed (e.g., words such as “the” or “for” are removed).

  3. The remaining words are reduced to their stem words (e.g., “searching” becomes “search” and “flowers” becomes “flower”).

  4. The resulting words are indexed.

In Word-Aware searching, the search text is usually first processed in a similar manner, and then the index is used to produce matches.
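The four indexing steps and the matching search step can be sketched as follows. This is a toy illustration in Python: the stop-word list and the suffix-stripping stemmer are deliberately naive stand-ins chosen for the example, the function names are hypothetical, and none of it represents Caché's actual language-specific algorithms.

```python
import re
from collections import defaultdict

STOP_WORDS = {"the", "for", "a", "an", "of", "and", "to", "in", "is"}


def stem(word):
    """Very naive English stemmer: strips a few common suffixes."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def terms(text):
    words = re.findall(r"[a-z]+", text.lower())              # step 1
    return [stem(w) for w in words if w not in STOP_WORDS]   # steps 2 and 3


def build_index(documents):
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in terms(text):
            index[term].add(doc_id)                          # step 4
    return index


def search(index, query):
    """Return IDs of documents containing every term of the query."""
    sets = [index.get(t, set()) for t in terms(query)]
    return set.intersection(*sets) if sets else set()


documents = {
    1: "Searching for the flowers",
    2: "A flower search engine",
    3: "Payroll notes",
}
index = build_index(documents)
matches = search(index, "flower searches")
```

Because the query passes through the same stop-word and stemming steps as the indexed text, "searches" matches both "searching" and "search".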

Word-Aware Indexing

Word-Aware indexes are maintained by object and SQL updates. Searching is most commonly done through SQL queries, although procedural code can use the indexes directly. Such queries may include AND/OR logic for more sophisticated searches.

Word-Aware algorithms are specific to the natural language being used. Word-Aware searching is available in Caché for a wide range of natural languages, including English, French, German, Italian, Japanese, Portuguese, and Spanish. Others are being added.

Word-Aware Searching

THE CACHÉ ADVANTAGE

Powerful Unstructured Text Searches: Unstructured text, such as physician’s notes or documents, can be easily searched for keywords and related words.

Extremely Rapid Searches: Coupling Word-Aware with Caché bit-map technology, searching of massive quantities of text can be performed in a fraction of a second.

TRANSACTIONAL BIT-MAP INDEXING

Caché uniquely provides Transactional Bit-Map Indexing, which can radically increase the performance of complex queries, giving fast data warehouse query performance on live data.

Traditional and Bit-Map Indexing

Database performance is critically dependent on having indexes on properties that are frequently used in searching the database. Most databases use indexes that, for each possible value of the column or property, maintain a list of the IDs for the rows/objects that have that value.

A bit-map index is another type of index. Bit-map indexes contain a separate bit map for each possible value of a column/property, with one bit for each row/object that is stored. A 1 bit means that the row/object has that value for the column/property.

The advantage of bit-map indexes is that complex queries can be processed by performing Boolean operations (AND, OR) on the indexes – efficiently determining exactly which instances (rows) fit the query conditions, without searching through the entire database. Bit-map indexes can often boost response times for queries that search large volumes of data by a factor of 100 or more.
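A minimal Python sketch of the technique follows. It assumes each bit map can be held in one arbitrary-precision integer, with bit i standing for row i; the sample rows (state and car model) and the class name `BitmapIndex` are invented for illustration, and the sketch omits the compression and transactional update behavior discussed in this section.

```python
from collections import defaultdict


class BitmapIndex:
    """One integer bitmap per distinct value; bit i is set if row i has it."""

    def __init__(self):
        self.bitmaps = defaultdict(int)

    def add(self, row_id, value):
        self.bitmaps[value] |= 1 << row_id

    def rows(self, value):
        return self.bitmaps.get(value, 0)


def row_ids(bitmap):
    """Expand a bitmap into the list of row IDs whose bit is set."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Invented sample table: (state, car model), row ID = list position.
cars = [
    ("MA", "Sedan"),
    ("NY", "Coupe"),
    ("MA", "Coupe"),
    ("MA", "Sedan"),
    ("CA", "Sedan"),
]
by_state, by_model = BitmapIndex(), BitmapIndex()
for row_id, (state, model) in enumerate(cars):
    by_state.add(row_id, state)
    by_model.add(row_id, model)

ma_sedans = row_ids(by_state.rows("MA") & by_model.rows("Sedan"))   # AND
ma_or_ny = row_ids(by_state.rows("MA") | by_state.rows("NY"))       # OR
```

Each query condition costs one bitwise operation over the whole table, which is why combining several conditions stays cheap.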

Bit-maps traditionally suffer from two problems: a) they can be painfully slow to update in relational databases, and b) they can take up far too much storage. Thus, with relational databases, they are rarely used for transaction processing applications.

Caché has introduced a new technology – Transactional Bit-Map Indexing – that leverages multidimensional data structures to eliminate these two problems. Updating these bit-maps is often faster than traditional indexes, and they utilize sophisticated compression techniques to radically reduce storage. Caché also supports sophisticated “bit-slicing” techniques. The result is ultra fast bit-maps that can often be used to search millions of records in a fraction of a second on an online transaction-processing database. Business intelligence and data warehousing applications can work with “live” data.

Caché offers both traditional and transactional bit-map indexes. Caché also supports multi-column indexes. For example, an index on State and Car Model can quickly identify everyone who has a car of a particular type that is registered in a particular state.

THE CACHÉ ADVANTAGE

Radically Faster Queries: By using transactional bit-map techniques, users can get blazing fast searches of large databases – often millions of records can be searched in a fraction of a second – on a system that is primarily used for transaction processing.

Real-Time Data Analytics: Caché’s Transactional Bit-Map Indexing allows real-time data analytics on up-to-the-minute data.

Lower Cost: There is no need for a second computer dedicated to data warehouse and decision support. Nor is there any need for daily operations to transfer data to such a second system or database administrators to support it.

Scalability: The speed of transactional bit-maps enhances the ability to build systems with enormous amounts of data that need to be maintained and periodically searched.


ENTERPRISE CACHE PROTOCOL FOR DISTRIBUTED SYSTEMS

Scalable Performance in Distributed Systems

InterSystems’ Enterprise Cache Protocol (ECP) is an extremely high-performance and scalable technology that enables computers in a distributed system to use each other’s databases. The use of ECP requires no application changes – applications simply treat the database as if it were local.

Here’s how ECP works: Each Caché application server includes its own Caché data server, which can operate on data that resides in its own disk systems or on blocks that were transferred to it from another Caché data server by ECP. When a client makes a request for information that is maintained on a remote data server, the application server will attempt to satisfy the request from its local cache. If it cannot, it will request the necessary data from the remote data server. The reply includes the database block(s) where that data was stored. These blocks are cached on the application server, where they are available to all applications running on that server. ECP automatically takes care of managing cache consistency across the network and propagating changes back to data servers.
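The read path described above (try the local cache, fetch missing blocks from the remote data server, keep them for later requests) can be modeled with a short Python sketch. The classes and method names are hypothetical, there is a single application server, and cache consistency across several application servers, which ECP manages automatically, is left out entirely.

```python
class DataServer:
    """Owns the database blocks (stand-in for a remote data server)."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)
        self.requests = 0            # counts network round trips

    def read_block(self, block_id):
        self.requests += 1
        return self.blocks[block_id]

    def write_block(self, block_id, data):
        self.blocks[block_id] = data


class ApplicationServer:
    """Serves reads from a local block cache, fetching misses remotely."""

    def __init__(self, data_server):
        self.data_server = data_server
        self.cache = {}

    def read(self, block_id):
        if block_id not in self.cache:                     # cache miss
            self.cache[block_id] = self.data_server.read_block(block_id)
        return self.cache[block_id]

    def write(self, block_id, data):
        self.cache[block_id] = data                        # local copy
        self.data_server.write_block(block_id, data)       # propagate


server = DataServer({7: "invoice data"})
app_server = ApplicationServer(server)
first = app_server.read(7)    # fetched from the data server
second = app_server.read(7)   # served from the local cache
app_server.write(7, "updated invoice data")
```

Only the first read crosses the network, which is the source of the reduced traffic described in the next paragraph.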

The performance and scalability benefits of ECP are dramatic. Clients enjoy fast responses because they frequently use locally cached data. And caching greatly reduces network traffic between the database and application servers, so any given network can support many more servers and clients. However, while most applications benefit from ECP, there are some whose architecture does not readily support such scaling. Benchmarking is recommended, and often a few simple changes will increase performance.

Enterprise Cache Protocol

Easy to Use – No Application Changes

The use of ECP is transparent to applications. Applications written to run on a single server run in a multi-server environment without change. To use ECP, the system manager simply identifies one or more data servers to an application server and then uses Namespace Mapping to indicate that references to some or all global structures (or portions of global structures) refer to that remote data server.

Configuration Flexibility

Every Caché system can function both as an application server and as a data server for other systems. ECP supports any combination of application servers and data servers and any point-to-point topology of up to 255 systems.

THE CACHÉ ADVANTAGE

Massive Scalability: Caché’s Enterprise Cache Protocol allows the addition of application servers as usage grows, each of which uses the database as if it were a local database. If disk throughput becomes a bottleneck, more Data Servers can be added and the database becomes logically partitioned.

Higher Availability: Because users are spread across multiple computers, the failure of an application server affects a smaller population. Should a data server “crash” and be rebooted, or a temporary network outage occur, the application servers can continue processing with no observable effects other than a slight pause. Configuring data servers as a failover hardware cluster with backup data servers can significantly enhance availability.

Lower Costs: Large numbers of low-cost computers can be combined into an extremely powerful system supporting massive processing – “grid computing”.

Transparent Usage: Applications don’t need to be written specifically for ECP – Caché applications can automatically take advantage of ECP without change.


FAULT TOLERANCE

Even in the most rigorous environments unexpected events can occur – hardware failure, power loss, or something as severe as a flood or other natural disaster – yet hospitals, telecommunications, financial services and other critical operations cannot afford to be “down”. To meet such exacting standards, Caché is designed to recover gracefully from outages and offers a variety of fail-over and other options to reduce or eliminate the impact on users.

Caché Write-Image Journaling and other integrity features ensure database integrity for most types of hardware failures – including power outages – allowing rapid recovery while minimizing the impact on users.

Caché also provides advanced high-availability configuration options to further reduce or eliminate user impact, including:

  • Fail-over Clusters
  • Shadow Servers
  • Distributed ECP

Fail-over Clusters

Using fail-over clustered hardware, data servers share access to the same disks, but only one is actively running Caché at a time. If the active server fails, Caché is automatically started on another server that takes over the processing responsibilities. The users can immediately sign back on to the new server.

Shadow Servers

Caché Shadow Servers are backup servers that are “loosely connected” through TCP. The primary server is constantly sending a logical record of database updates to the Shadow Server so that the Shadow Server always has an “almost-up-to-date” copy of the database. Switching to the Shadow is less automated than with fail-over clusters, but survivability is improved because the hardware is not physically connected – the Shadow Server may even be at a different location.

A Shadow Server can be mixed with a Fail-over Cluster, further enhancing fault tolerance.

Distributed ECP

For distributed systems using Enterprise Cache Protocol (ECP), upon a temporary network outage or a data server crash and reboot, the application servers attempt to reconnect. If a successful reconnect occurs within a specified time period, the application servers resend any uncompleted requests and operations continue with no observable effect to users other than a slight pause.
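The recovery behavior described here (keep trying to reconnect for a limited period, then resend whatever was outstanding) can be sketched in Python. The function name, its parameters, and the simulated connections are all assumptions made for the example; this is not ECP's actual recovery logic.

```python
import time


def reconnect_and_resend(connect, pending, window_seconds, retry_delay=0.01):
    """Retry connect() until it succeeds or the recovery window expires.

    On success, resend every pending request and return the replies.
    Return None if no connection could be made in time.
    """
    deadline = time.monotonic() + window_seconds
    while time.monotonic() < deadline:
        try:
            send = connect()
        except ConnectionError:
            time.sleep(retry_delay)    # brief pause, then try again
            continue
        return [send(request) for request in pending]
    return None


attempts = {"count": 0}


def flaky_connect():
    """Simulated data server that comes back on the third attempt."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("data server not reachable yet")
    return lambda request: f"done:{request}"


def always_down():
    raise ConnectionError("data server still down")


replies = reconnect_and_resend(flaky_connect, ["set A", "set B"], 1.0)
gave_up = reconnect_and_resend(always_down, ["set C"], 0.05)
```

In the successful case the caller sees only a delay; in the timed-out case it learns that recovery failed and can report an error.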

If an ECP application server fails, only the users on the failed application server are affected. They can then sign on to another application server to continue working.

An ECP data server is frequently configured as a Fail-over Cluster. If the primary data server crashes, the backup data server takes over for the failed data server, allowing uninterrupted operation with users experiencing only a slight pause.

An ECP Fail-over Cluster

THE CACHÉ ADVANTAGE

Bullet-Proof Database: Caché Write-Image Journaling and other integrity features ensure database integrity for most types of hardware failures, including power outages.

High-Availability Fault-Tolerant Configurations: The use of Caché Shadow Servers, ECP, and/or Fail-over Clusters allows rapid recovery from outages while minimizing, or in some cases eliminating, their impact on users.

SECURITY MODEL

Caché is certified for Common Criteria EAL 3. Caché has a modern security model, designed to support application development in three ways:

Caché Security Model

Caché provides these security capabilities while minimizing the burden on application performance.

Users, Roles, Resources, and Privileges

There are a variety of resources (such as databases, applications, and system services) and users must be granted permission (such as READ, WRITE, or USE) to use them by the security administrator. In addition to the system-defined resources, the security administrator can create application-specific resources and use the same mechanisms for granting and checking permissions.

For simplicity, users are usually assigned one or more “roles” (e.g., “LabTech”, or “Payroll”), and the security administrator then grants privileges for a particular resource to those roles rather than to individual users. The user inherits all of the privileges granted to the roles it is assigned.
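The relationship between users, roles, resources, and permissions can be modeled as a pair of lookup tables. The Python sketch below is illustrative only: the role names come from the text, but the users, resources, and function names are made up, and Caché's own security checks are not implemented this way.

```python
# role -> {resource: permissions}; resource and user names are invented.
ROLE_PRIVILEGES = {
    "LabTech": {"LabDB": {"READ", "WRITE"}},
    "Payroll": {"PayrollDB": {"READ"}, "PayrollApp": {"USE"}},
}
USER_ROLES = {
    "alice": {"LabTech"},
    "bob": {"LabTech", "Payroll"},
}


def privileges(user):
    """Union of the privileges granted to every role the user holds."""
    granted = {}
    for role in USER_ROLES.get(user, set()):
        for resource, permissions in ROLE_PRIVILEGES.get(role, {}).items():
            granted.setdefault(resource, set()).update(permissions)
    return granted


def is_allowed(user, resource, permission):
    return permission in privileges(user).get(resource, set())


bob_can_use_payroll = is_allowed("bob", "PayrollApp", "USE")
alice_can_read_payroll = is_allowed("alice", "PayrollDB", "READ")
```

Granting a new privilege to a role takes effect for every user holding that role, without touching individual user records.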

Every process has an associated username, even if it is only “UnknownUser”. The username is established during “authentication”. A simple example of authentication is when a user enters a username and password and the system checks to see that the correct password was entered. Following authentication, the username is assigned to the process and the permissions associated with that username are granted. (A “user” is not necessarily a human being. It could, for example, be a measurement device generating data or an application running on another system that is connected to Caché.) If a user does not go through authentication, it has a username of “UnknownUser”, which only entitles that process to the permissions granted to everyone.

Connection to Caché is controlled by a set of Services. Each Service specifies whether it is Public – which means anyone can use it – or whether it requires authentication and, following authentication tests, whether the user has the required access privilege. Services can also be individually disabled, so that access is denied to everyone.

The assignment and management of privileges is normally performed through the Caché Management Portal.

Application-Assigned Roles

It is often useful for a user to temporarily gain additional privileges rather than have them permanently assigned. For example, rather than the security administrator granting a broad set of privileges to a user (such as the ability to access and modify the payroll database), the user can instead be given just the privilege to access the payroll application, and that application can then elevate the user’s privileges while that application is being used.

To accomplish this elevation, roles can be assigned to applications. When that application is accessed, the user temporarily acquires additional roles. The additional roles may be simply a list that everyone authorized to use the application acquires, or the additional roles may be more customized, based on the roles the user already has.
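Temporary role elevation can be pictured as a scoped operation: roles are added when the application is entered and removed when it is left. The Python sketch below uses a context manager to show that shape. The application name, role names, and the `Session` class are hypothetical, and the sketch covers only the simple case where everyone authorized for the application acquires the same list of roles.

```python
from contextlib import contextmanager

# application -> roles it grants while in use (invented names)
APPLICATION_ROLES = {"PayrollApp": {"PayrollEditor"}}


class Session:
    def __init__(self, username, roles):
        self.username = username
        self.roles = set(roles)


@contextmanager
def running_application(session, application):
    """Add the application's roles for the duration of the block only."""
    extra = APPLICATION_ROLES.get(application, set()) - session.roles
    session.roles |= extra
    try:
        yield session
    finally:
        session.roles -= extra   # drop only what was added here


session = Session("alice", {"PayrollUser"})
with running_application(session, "PayrollApp"):
    roles_inside = set(session.roles)
roles_after = set(session.roles)
```

Roles the user already held before entering the application are left in place when the block ends.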

This feature is particularly useful for browser-based applications using CSP (Caché Server Pages). With CSP, a portion of every URL specifies an application name. Following authentication and a determination that the user is authorized to use that CSP application, the user temporarily gains the additional roles assigned to that application for the duration of that page request.

The security administrator can also designate specific routines as capable of performing role elevation to gain the additional roles of specified applications, after passing user specified security tests. This facility is tightly controlled, and it is the mechanism by which non-CSP applications perform role elevation.

Authentication

Caché supports various levels of authentication, ranging from none, to the use of passwords, to the use of the Kerberos protocol to authenticate the identity of users. Kerberos provides very strong authentication and has the advantages of being fast, scalable, and easy to use. With Kerberos, passwords are never transmitted over the network, which provides an extra measure of security.

Caché supports the implementation of a single sign-on.

Database Encryption

Caché supports two forms of database encryption:

  • The security administrator can designate one or more CACHE.DAT files (databases) to be encrypted on disk. Everything in those files is then encrypted.
  • Developers can use system functions to encrypt/decrypt data, which then may be stored in the database or transmitted. This feature can be used to encrypt sensitive data to protect it from other users who have read access to the database but not the key.

By default, Caché encrypts data with an implementation of the Advanced Encryption Standard (AES), a symmetric algorithm that supports keys of 128, 192, or 256 bits. Encryption keys are stored in a protected memory location. Caché provides full capabilities for key management.

The journal can also be encrypted.

Auditing

Many applications, especially those that must comply with government regulations like HIPAA or Sarbanes-Oxley, need to provide secure auditing. In Caché, all system and application events are recorded in an append-only log, which is compatible with any query or reporting tool that uses SQL.
