The challenge of storing XML-formatted data
XML (and the standards based on it) is useful for sharing data between applications because the data contains tags that describe what the data represents. For example, Figure #1 shows how some simple information might be rendered in XML. From the tags, it can be seen that, in this case, Boston is the name of a city, not a person, and it is where John Smith lives, as opposed to where he works.
Figure #1: Some simple XML
<street>1234 Jones St.</street>
<street>56 Brown St.</street>
Figure #1 also shows that the structure of XML documents is hierarchical and multidimensional, and that the tags in an XML document typically take up more space than the data itself. In addition, data in XML format cannot easily be searched using SQL, the query language (developed for use with relational databases) that has become an almost universal standard for data analysis and reporting. All of these things should be considered when choosing a database technology for financial services applications that will use one or more XML-based standards for data exchange.
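The point about tag overhead can be checked with a short Python sketch. The sample record below is hypothetical: its element names and layout are assumptions made for illustration, not a reproduction of Figure #1.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample record; element names are assumed for illustration.
SAMPLE = (
    "<person>"
    "<name>John Smith</name>"
    "<home><street>1234 Jones St.</street><city>Boston</city></home>"
    "<work><street>56 Brown St.</street></work>"
    "</person>"
)


def markup_overhead(xml_text):
    """Return (data_chars, markup_chars) for an XML string."""
    root = ET.fromstring(xml_text)
    # itertext() yields only the character data, never the tags.
    data_chars = sum(len(text) for text in root.itertext())
    return data_chars, len(xml_text) - data_chars


data_chars, markup_chars = markup_overhead(SAMPLE)
print(f"data: {data_chars} characters, markup: {markup_chars} characters")
```

For a record like this one, the markup count comes out well above the data count, which is the effect described above. The exact ratio depends on tag names and on how much text each element holds.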
The pros and cons of various database technologies
There are several different database technologies to choose from. Each has strengths and weaknesses with regard to its suitability for use with XML-based standards:
Relational Databases
Relational technology has dominated the database landscape for thirty years, mostly because of its easily understood tabular format, and the popularity of SQL as a query language. However, relational technology is not very adept at modeling the rich, complex information often found in the real world. For example, it is difficult to store XML data in the rows-and-columns format of a relational database. Usually, XML data must be “mapped” into several different relational tables. A rich XML schema, such as is needed to describe data in the financial services industry, might map to a hundred or more tables. The processing overhead required to store and retrieve XML data from a relational database can be significant, and will adversely affect application performance.
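To make the mapping step concrete, here is a minimal Python sketch that splits one small XML record across two relational tables, using SQLite from the standard library as a stand-in for any relational database. The record layout, table names, and column names are all assumptions made for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical record; element names are assumed for illustration.
SAMPLE = """<person>
  <name>John Smith</name>
  <home><street>1234 Jones St.</street><city>Boston</city></home>
  <work><street>56 Brown St.</street></work>
</person>"""


def shred(xml_text, conn):
    """Map one <person> document onto two relational tables."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS person (id INTEGER PRIMARY KEY, name TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS address ("
        "person_id INTEGER REFERENCES person(id), "
        "kind TEXT, street TEXT, city TEXT)"
    )
    root = ET.fromstring(xml_text)
    cur = conn.execute(
        "INSERT INTO person (name) VALUES (?)", (root.findtext("name"),)
    )
    person_id = cur.lastrowid
    # Each nested element becomes a row in a child table.
    for kind in ("home", "work"):
        node = root.find(kind)
        if node is not None:
            conn.execute(
                "INSERT INTO address VALUES (?, ?, ?, ?)",
                (person_id, kind, node.findtext("street"), node.findtext("city")),
            )
    return person_id


conn = sqlite3.connect(":memory:")
person_id = shred(SAMPLE, conn)

# Reassembling even one fact about the person now requires a join.
rows = conn.execute(
    "SELECT p.name, a.city FROM person p "
    "JOIN address a ON a.person_id = p.id WHERE a.kind = 'home'"
).fetchall()
print(rows)
```

Even this two-level record needs two tables and a join to read back. Every additional level of nesting or repeating element in a schema tends to add another table, which is how a rich schema can grow to a hundred or more.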
Pure XML Databases
One way of avoiding the issue of mapping is to store XML in its native form, without any transformation at all. Pure XML databases have not become popular, however, partly because they are typically large and cumbersome – the inevitable result of saving the tags along with the data. Also, a pure XML database is not compatible with SQL and (unless a mapping layer is added) will not work with most data analysis and reporting tools.
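The trade-off can be sketched in a few lines of Python. The in-memory "store" below is a hypothetical stand-in for a native XML database, not any real product: documents are kept verbatim, tags included, and are searched with path expressions rather than SQL. The record layout is assumed for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical native store: document id -> raw XML text, tags included.
store = {}


def put(doc_id, xml_text):
    """Store a document unchanged after checking that it is well formed."""
    ET.fromstring(xml_text)  # raises ParseError on malformed input
    store[doc_id] = xml_text


def find_ids(path, value):
    """Return ids of documents where an element at `path` has text `value`."""
    hits = []
    for doc_id, xml_text in store.items():
        root = ET.fromstring(xml_text)
        if any(node.text == value for node in root.findall(path)):
            hits.append(doc_id)
    return hits


put(1, "<person><name>John Smith</name><home><city>Boston</city></home></person>")
put(2, "<person><name>Boston</name><home><city>Chicago</city></home></person>")

# The path distinguishes Boston the city from Boston the person's name.
hits = find_ids("home/city", "Boston")
print(hits)
```

No mapping is needed to store or retrieve a document, but the query is expressed as a path ("home/city"), so a reporting tool that speaks only SQL cannot use it without an extra translation layer.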
Pure Object Databases
Object technology, with its rich data structures, is a good match for XML. The XML schema – the tags – can be used to define an object class, and then the data itself is stored as instances of the class. Parsing between objects and XML is easy, and can be automated. Because the tags are not stored with each object, the database will be compact and nimble. However, like pure XML databases, pure object databases cannot be queried using SQL.
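As a rough illustration of this idea, the Python sketch below defines classes whose fields mirror the tags of a small record, then converts between XML and class instances. The class names, field names, and record layout are assumptions made for illustration; a real object database would generate such classes from the schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


# The tag names live once in the class definitions, not in every instance.
@dataclass
class Address:
    street: Optional[str] = None
    city: Optional[str] = None


@dataclass
class Person:
    name: str
    home: Optional[Address] = None
    work: Optional[Address] = None


def address_from_xml(node):
    if node is None:
        return None
    return Address(street=node.findtext("street"), city=node.findtext("city"))


def person_from_xml(xml_text):
    """Build a Person instance from a <person> document."""
    root = ET.fromstring(xml_text)
    return Person(
        name=root.findtext("name"),
        home=address_from_xml(root.find("home")),
        work=address_from_xml(root.find("work")),
    )


def person_to_xml(person):
    """Serialize a Person instance back to a <person> document."""
    root = ET.Element("person")
    ET.SubElement(root, "name").text = person.name
    for tag in ("home", "work"):
        address = getattr(person, tag)
        if address is None:
            continue
        node = ET.SubElement(root, tag)
        for field in ("street", "city"):
            value = getattr(address, field)
            if value is not None:
                ET.SubElement(node, field).text = value
    return ET.tostring(root, encoding="unicode")


xml_in = (
    "<person><name>John Smith</name>"
    "<home><street>1234 Jones St.</street><city>Boston</city></home>"
    "<work><street>56 Brown St.</street></work></person>"
)
person = person_from_xml(xml_in)
xml_out = person_to_xml(person)
print(person.home.city)
```

The stored instance holds only values such as "Boston"; the tags are regenerated from the class definition whenever the object is written back out as XML.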
A fourth option – InterSystems Caché
Caché is none of the above. Its fundamental data structures (called “globals”) are sparse multidimensional arrays. Rich enough to model complex information, they can also be presented in an SQL-compatible form. Therefore, Caché, through its “Unified Data Architecture”, uniquely allows both object and relational access to data – without mapping. Caché provides all the benefits of an object database, and the SQL compatibility of a relational database.
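The general idea of one sparse multidimensional structure serving two views can be sketched in Python. This is a conceptual illustration only: it does not use Caché's API or storage format, and the subscript layout, function names, and sample data are all assumptions. A plain dictionary keyed by subscript tuples stands in for a sparse array, since only the subscripts that were actually set exist.

```python
# Conceptual stand-in for a sparse multidimensional array.
# NOT Caché's API: just a dict keyed by tuples of subscripts.
global_store = {}


def set_node(subscripts, value):
    """Store a value at a leaf addressed by a tuple of subscripts."""
    global_store[tuple(subscripts)] = value


set_node(("Person", 1, "name"), "John Smith")
set_node(("Person", 1, "home", "street"), "1234 Jones St.")
set_node(("Person", 1, "home", "city"), "Boston")
set_node(("Person", 1, "work", "street"), "56 Brown St.")


def as_object(root, obj_id):
    """Object-style view: nested dicts rebuilt from the subscripts."""
    result = {}
    for subs, value in global_store.items():
        if subs[:2] != (root, obj_id):
            continue
        node = result
        for key in subs[2:-1]:
            node = node.setdefault(key, {})
        node[subs[-1]] = value
    return result


def as_rows(root):
    """Table-style view: one flat row per id, columns from joined subscripts."""
    rows = {}
    for subs, value in global_store.items():
        if subs[0] == root:
            row = rows.setdefault(subs[1], {"id": subs[1]})
            row["_".join(subs[2:])] = value
    return [rows[key] for key in sorted(rows)]


obj = as_object("Person", 1)
table = as_rows("Person")
print(obj["home"]["city"], table[0]["home_city"])
```

Both views read the same stored values, so nothing is copied or mapped into a second structure. How Caché itself projects globals as objects and as SQL tables is not shown here.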
Caché can automatically parse data between Caché objects and XML. Caché objects can also be projected to Java, .NET, C++, and other technologies, allowing rapid application development and easy integration into any environment.
Caché, from InterSystems, is an excellent choice of database technology for applications that use XML or XML-based standards for data exchange. Its Unified Data Architecture allows seamless and simultaneous data access via both objects and SQL. That means Caché can replicate rich XML schemas without incurring a performance penalty when making data available to data analysis and reporting tools.
Financial services enterprises that are developing applications around emerging XML-based standards such as XBRL, FpML, MDDL, RIXML, and FIXML would be well advised to consider using Caché.