Introduction

Because of increasing business and governmental pressure to integrate its operations, the financial services industry is developing a number of standards for data exchange and other common functions. Standards such as XBRL, FpML, MDDL, RIXML, and FIXML are all specialized dialects of XML (Extensible Markup Language). Any financial services application with good support for XML will be able to communicate effectively using one or more of the emerging industry standards.

Efficient data exchange increases the need for fast, scalable data persistence. Financial services applications must be able to process and persist large amounts of data very quickly. In addition, the data must be easily accessible via SQL for analysis and reporting purposes. This paper will examine various database technologies with regard to their suitability for use with XML-based standards. It will show that Caché, with its Unified Data Architecture and seamless XML parsing, is the ideal database for use with the XML-based standards of the financial services industry.

The challenge of storing XML-formatted data

XML (and the standards based on it) is useful for sharing data between applications because the data contains tags that describe what the data represents. For example, Figure #1 shows how some simple information might be rendered in XML. From the tags, it can be seen that, in this case, Boston is the name of a city, not a person, and it is where John Smith lives, as opposed to where he works.

Figure #1: Some simple XML

<person>
<lastname>Smith</lastname>
<firstname>John</firstname>

<home_address>
<street>1234 Jones St.</street>
<city>Boston</city>
<state>MA</state>
</home_address>

<work_address>
<street>56 Brown St.</street>
<city>Cambridge</city>
<state>MA</state>
</work_address>
</person>

Figure #1 also shows that the structure of XML documents is hierarchical and multidimensional, and that the tags in an XML document typically take up more space than the data itself. In addition, data in XML format cannot easily be searched using SQL, the query language (developed for use with relational databases) that has become an almost universal standard for data analysis and reporting. All of these factors should be considered when choosing a database technology for financial services applications that will use one or more XML-based standards for data exchange.
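
To make these observations concrete, the following Python sketch parses the document from Figure #1 with the standard library's xml.etree.ElementTree module. It finds each city by the path of tags that leads to it, then compares the number of characters of data with the number of characters of markup. The variable names are illustrative only, and the count ignores line breaks and indentation.

```python
import xml.etree.ElementTree as ET

# The document from Figure #1.
FIGURE_1 = """<person>
<lastname>Smith</lastname>
<firstname>John</firstname>
<home_address>
<street>1234 Jones St.</street>
<city>Boston</city>
<state>MA</state>
</home_address>
<work_address>
<street>56 Brown St.</street>
<city>Cambridge</city>
<state>MA</state>
</work_address>
</person>"""

root = ET.fromstring(FIGURE_1)

# The path of tags says what each value means: Boston is the city
# where the person lives, Cambridge the city where he works.
home_city = root.findtext("home_address/city")
work_city = root.findtext("work_address/city")

# Count the characters of actual data (element text, whitespace ignored).
data_chars = sum(
    len(el.text.strip())
    for el in root.iter()
    if el.text and el.text.strip()
)

# Everything else in the whitespace-free document is markup.
compact = "".join(line.strip() for line in FIGURE_1.splitlines())
markup_chars = len(compact) - data_chars

print(home_city, work_city)
print("data:", data_chars, "markup:", markup_chars)
```

For this small document the markup is several times larger than the data it describes, which is the storage cost that the discussion above refers to.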

The pros and cons of various database technologies

There are several different database technologies to choose from. Each has strengths and weaknesses with regard to its suitability for use with XML-based standards:

Relational Databases

Relational technology has dominated the database landscape for thirty years, mostly because of its easily understood tabular format, and the popularity of SQL as a query language. However, relational technology is not very adept at modeling the rich, complex information often found in the real world. For example, it is difficult to store XML data in the rows-and-columns format of a relational database. Usually, XML data must be “mapped” into several different relational tables. A rich XML schema, such as is needed to describe data in the financial services industry, might map to a hundred or more tables. The processing overhead required to store and retrieve XML data from a relational database can be significant, and will adversely affect application performance.
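
As a rough illustration of such a mapping, the sketch below uses Python's built-in sqlite3 module to split the document from Figure #1 across two tables. The table layout and the shred_person function are assumptions made for this example, not a prescribed design. Even this tiny document needs two tables and a join to reassemble; a rich financial schema would need far more.

```python
import sqlite3
import xml.etree.ElementTree as ET

PERSON_XML = """<person>
<lastname>Smith</lastname>
<firstname>John</firstname>
<home_address>
<street>1234 Jones St.</street>
<city>Boston</city>
<state>MA</state>
</home_address>
<work_address>
<street>56 Brown St.</street>
<city>Cambridge</city>
<state>MA</state>
</work_address>
</person>"""

# Hypothetical schema: one table per repeating structure in the XML.
SCHEMA = """
CREATE TABLE person (
    id INTEGER PRIMARY KEY,
    lastname TEXT,
    firstname TEXT
);
CREATE TABLE address (
    id INTEGER PRIMARY KEY,
    person_id INTEGER REFERENCES person(id),
    kind TEXT,
    street TEXT,
    city TEXT,
    state TEXT
);
"""

def shred_person(conn, xml_text):
    """Map one <person> document onto the person and address tables."""
    root = ET.fromstring(xml_text)
    cur = conn.execute(
        "INSERT INTO person (lastname, firstname) VALUES (?, ?)",
        (root.findtext("lastname"), root.findtext("firstname")),
    )
    person_id = cur.lastrowid
    for kind in ("home", "work"):
        addr = root.find(f"{kind}_address")
        if addr is None:
            continue
        conn.execute(
            "INSERT INTO address (person_id, kind, street, city, state) "
            "VALUES (?, ?, ?, ?, ?)",
            (person_id, kind, addr.findtext("street"),
             addr.findtext("city"), addr.findtext("state")),
        )
    return person_id

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
shred_person(conn, PERSON_XML)

# Reading the data back requires a join across the mapped tables.
row = conn.execute(
    "SELECT p.firstname, a.city FROM person p "
    "JOIN address a ON a.person_id = p.id WHERE a.kind = 'home'"
).fetchone()
address_count = conn.execute("SELECT COUNT(*) FROM address").fetchone()[0]
print(row, address_count)
```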

Pure XML Databases

One way of avoiding the issue of mapping is to store XML in its native form, without any transformation at all. Pure XML databases have not become popular, however, partly because they are typically large and cumbersome – the inevitable result of saving the tags along with the data. Also, a pure XML database is not compatible with SQL and (unless a mapping layer is added) will not work with most data analysis and reporting tools.

Pure Object Databases

Object technology, with its rich data structures, is a good match for XML. The XML schema (the tags) can be used to define an object class, and the data itself is then stored as instances of that class. Converting between objects and XML is easy, and can be automated. Because the tags are not stored with each object, the database will be compact and nimble. However, like pure XML databases, pure object databases cannot be queried using SQL.
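
The idea can be sketched in Python, with dataclasses standing in for the classes of an object database. The Person and Address classes and the from_xml and to_xml helpers are hypothetical names introduced for this example. The tag names appear once, in the class definitions, and each instance holds only data; the conversion in both directions is driven entirely by the class definitions.

```python
from dataclasses import dataclass, fields, is_dataclass
import xml.etree.ElementTree as ET

@dataclass
class Address:
    street: str
    city: str
    state: str

@dataclass
class Person:
    lastname: str
    firstname: str
    home_address: Address
    work_address: Address

def from_xml(cls, element):
    """Build an instance of cls from an element whose child tags match its fields."""
    values = {}
    for f in fields(cls):
        child = element.find(f.name)
        if is_dataclass(f.type):
            values[f.name] = from_xml(f.type, child)   # nested object
        else:
            values[f.name] = child.text                # plain string value
    return cls(**values)

def to_xml(obj, tag):
    """Render an instance back to XML, using field names as tags."""
    element = ET.Element(tag)
    for f in fields(obj):
        value = getattr(obj, f.name)
        if is_dataclass(value):
            element.append(to_xml(value, f.name))
        else:
            ET.SubElement(element, f.name).text = value
    return element

# The document from Figure #1, written without whitespace between tags.
PERSON_XML = (
    "<person>"
    "<lastname>Smith</lastname>"
    "<firstname>John</firstname>"
    "<home_address><street>1234 Jones St.</street>"
    "<city>Boston</city><state>MA</state></home_address>"
    "<work_address><street>56 Brown St.</street>"
    "<city>Cambridge</city><state>MA</state></work_address>"
    "</person>"
)

person = from_xml(Person, ET.fromstring(PERSON_XML))
round_trip = ET.tostring(to_xml(person, "person"), encoding="unicode")
print(person.home_address.city)
print(round_trip == PERSON_XML)
```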

A fourth option – InterSystems Caché

Caché is none of the above. Its fundamental data structures (called “globals”) are sparse multidimensional arrays. Rich enough to model complex information, they can also be presented in an SQL-compatible form. Therefore, Caché, through its “Unified Data Architecture”, uniquely allows both object and relational access to data – without mapping. Caché provides all the benefits of an object database, and the SQL compatibility of a relational database.
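
The general idea of a sparse multidimensional array can be pictured with a short Python sketch. This is an analogy only: it is not Caché code and does not use any Caché API. A plain dictionary keyed by tuples of subscripts stands in for the array, the subtree and as_rows helpers are hypothetical, and the second person record is made-up sample data. The same stored nodes are read once as a hierarchy and once as flat rows, with no second copy of the data.

```python
# Only nodes that actually hold data exist; nothing is reserved for
# missing values. Each key is a tuple of subscripts.
store = {
    ("Person", 1, "lastname"): "Smith",
    ("Person", 1, "firstname"): "John",
    ("Person", 1, "home_address", "city"): "Boston",
    ("Person", 1, "work_address", "city"): "Cambridge",
    # Made-up second record with no first name and no work address.
    ("Person", 2, "lastname"): "Jones",
    ("Person", 2, "home_address", "city"): "Salem",
}

def subtree(prefix):
    """Hierarchical view: every node stored beneath the given subscripts."""
    n = len(prefix)
    return {key[n:]: value for key, value in store.items() if key[:n] == prefix}

def as_rows(name, columns):
    """Tabular view: one row per ID, one column per subscript path."""
    ids = sorted({key[1] for key in store if key[0] == name})
    return [
        (i,) + tuple(store.get((name, i) + col) for col in columns)
        for i in ids
    ]

home = subtree(("Person", 1, "home_address"))
rows = as_rows(
    "Person",
    [("lastname",), ("firstname",), ("home_address", "city")],
)
print(home)
print(rows)
```

In the tabular view, the value missing from the second record simply shows up as an empty column; no storage was ever set aside for it.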

Caché can automatically convert data between Caché objects and XML. Caché objects can also be projected to Java, .NET, C++, and other technologies, allowing rapid application development and easy integration into any environment.

Conclusion

Caché, from InterSystems, is an excellent choice of database technology for applications that use XML or XML-based standards for data exchange. Its Unified Data Architecture allows seamless and simultaneous data access via both objects and SQL. That means Caché can replicate rich XML schemas without incurring a performance penalty when making data available to data analysis and reporting tools.

Financial services enterprises that are developing applications around emerging XML-based standards such as XBRL, FpML, MDDL, RIXML, and FIXML would be well advised to consider using Caché.