2019 promises to be an amazing year in the data management industry. While I certainly don’t have a perfect crystal ball, here are five predictions for data management in 2019.
Artificial Intelligence meets governance – There has been no shortage of discussion about artificial intelligence (AI) and machine learning (ML) in the last few years. AI and ML are highly hyped, but there are also real potential benefits and applications of the technology nearly everywhere you look. It’s still early; in a 2018 study, McKinsey examined 160 AI use cases and found only 12 percent had progressed beyond the experimental stage.
Machine learning has found applications in governance, risk and compliance (GRC), but ironically ML applications typically lack governance themselves. For example, the European Union’s General Data Protection Regulation (GDPR) requirement of “explainability” restricts the kind of ML algorithms that can be used. There’s also tension around access to ‘crown jewel’ data for machine learning purposes, often resulting in an impasse between data scientists (who want unrestricted access to all the data) and IT (who need to maintain compliance with data privacy and security regulations).
I predict this will come to a head in 2019. Organizations will start recognizing the problem and look for solutions that keep their data secure and compliant while giving data scientists appropriate access.
Enterprise data platforms gain traction – Since data lakes came on the scene around 2010, organizations have rushed to deploy data lake technology, mostly based on Hadoop. They were driven by a vision of unified data access for the entire organization, transforming their legacy firms into modern, data-driven companies. The reality, however, has been very far from the vision, and tales of data lake projects producing ‘data swamps’ abound.
Over the last year, I’ve noticed more attention to traditional concerns such as data integrity and rapid development of data-intensive applications. I also noticed the coining of a new variant of the data lake term: the Enterprise Data Platform (EDP). I predict that this will become a standard term in 2019, and that we’ll see many more organizations turn to this approach to power their digital transformations.
IDC issued a report about the emergence of Enterprise Data Platforms in December 2018. "[EDP] technology enables users and developers to find and leverage data where it lives, to combine it where it makes sense to do so, to understand the data, to record its meaning, and to generate more frequent and better analyses and, in some cases, data-driven operations based on the data in the platform," says Carl Olofson, research vice president for Data Management Software research at IDC. "The resulting business value includes reducing the time to insight, more comprehensive business decisions, better data to support artificial intelligence/machine learning (AI/ML) operations, and a more agile enterprise."
Trend towards humanizing data gains steam – I predict that “humanizing data” will be part of at least 20 percent of data management initiatives in 2019.
This trend is not simply a backlash against the impersonal and opaque analytics that so often have negative business and societal effects, as highlighted so well in the book "Weapons of Math Destruction". Humanizing data requires adding context to it, making it easy to access, and recognizing that a human touch is required to interpret it.
There's a lot of good to be gained from this mindset:
• considering data quality as well as quantity results in better outcomes, better decisions, and less unconscious bias
• bringing in more data about each individual customer provides a more complete picture, enabling customization of services and products on an unforeseen scale
• making data accessible to a wider audience using intuitive tools brings wisdom and common sense to bear
But this trend also raises the bar for knowledge and sensitivity in data analytics before many organizations are ready for it. Those that already have a Chief Data Officer, a data platform strategy, and an advanced analytics capability may find humanizing their data a natural next step. For those who have just gotten to the first rung of the data ladder, it can be discouraging to add a new level of complexity and fuzziness to their projects.
Information management changes as you work to humanize data initiatives. Bringing in context requires related data, which often requires different data structures or a multi-model approach to data management. Making large quantities of data intuitive to explore and understand requires high-performance data platforms as well as a variety of data visualization and analytics capabilities, all running on the same data. Data quality assessment is easier with good control over the sources, processing pipelines, and residency of the data; large quantities of historical data may need to be tapped to provide a more complete view.
DataOps overtakes ETL
The term “DataOps” is less than five years old, but it is already well recognized. If you haven’t heard of it, DataOps is an approach to the entire data lifecycle that applies the mindset and techniques of agile software development, DevOps, and statistical process control to data analytics.
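To make the statistical process control piece concrete, here is a minimal Python sketch of a control-chart style check on a pipeline metric. The row counts, the three-sigma threshold, and the function names are assumptions made up for illustration; they do not come from any particular DataOps tool.

```python
from statistics import mean, stdev


def control_limits(history, sigmas=3.0):
    """Return (lower, upper) control limits computed from past measurements."""
    center = mean(history)
    spread = stdev(history)  # sample standard deviation
    return center - sigmas * spread, center + sigmas * spread


def in_control(value, history, sigmas=3.0):
    """True if the new measurement falls inside the control limits."""
    lower, upper = control_limits(history, sigmas)
    return lower <= value <= upper


# Hypothetical row counts from the last seven runs of a nightly load.
row_counts = [10120, 9980, 10050, 10210, 9940, 10090, 10010]

print(in_control(10075, row_counts))  # a typical night
print(in_control(4200, row_counts))   # a suspiciously small load
```

In a real pipeline, a check like this would run after each load and stop or flag the run when a measurement falls outside the limits, much as a test failure would stop a software build.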
In 2018, DataOps appeared for the first time in Gartner’s Hype Cycle for Data Management. In 2019, I predict that DataOps practices will be a bigger part of organizations’ data management than the traditional extract, transform, and load (ETL) methodology.
Although DataOps may be considered by some to encompass ETL, I view it as a new school of data integration that is much more agile and integrated. ETL is batch-oriented and quite heavyweight to develop and maintain, and it’s the domain of specific data integration vendors. With a DataOps approach, batch and real-time data integration can be done using the same development platform and the same compute and database infrastructure.
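One way to picture batch and real-time integration sharing a code base is a single transformation function called from both paths. The sketch below is only an illustration under that assumption: the record fields, the default currency, and the function names are hypothetical, and a plain Python iterator stands in for a real message stream.

```python
from typing import Iterable, Iterator


def clean_order(record: dict) -> dict:
    """The one transformation shared by the batch and streaming paths."""
    return {
        "order_id": int(record["order_id"]),
        "amount": round(float(record["amount"]), 2),
        "currency": record.get("currency", "USD").upper(),
    }


def run_batch(records: list) -> list:
    """Batch path: transform a complete set of records at once."""
    return [clean_order(r) for r in records]


def run_stream(records: Iterable[dict]) -> Iterator[dict]:
    """Streaming path: transform records one at a time as they arrive."""
    for r in records:
        yield clean_order(r)


# Hypothetical raw input, as it might arrive from a source system.
raw = [
    {"order_id": "1", "amount": "19.999", "currency": "eur"},
    {"order_id": "2", "amount": "5"},
]

batch_result = run_batch(raw)
stream_result = list(run_stream(iter(raw)))
print(batch_result == stream_result)
```

Because both paths call the same function, a fix to the cleaning logic only has to be made and tested once.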
Database sprawl results in a recognition of multi-model
The number of different data models in use has continued to grow since the advent of NoSQL databases over a decade ago. I often run into applications that use multiple different data stores, with five or six different models across five or six different products. This adds a lot of cost and complexity, not to mention the risk of data inconsistency from duplicate data being incompletely synchronized across different products.
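The inconsistency risk is easy to show in miniature. In this Python sketch, two dictionaries stand in for two products that each hold a copy of the same customer records; the store names, keys, and values are invented for the example.

```python
def find_mismatches(primary: dict, replica: dict) -> dict:
    """Return keys whose values differ between two copies of the same data.

    A key missing from one side shows up with None on that side.
    """
    mismatches = {}
    for key in primary.keys() | replica.keys():
        left = primary.get(key)
        right = replica.get(key)
        if left != right:
            mismatches[key] = (left, right)
    return mismatches


# Hypothetical copies of customer emails held in two separate products.
document_store = {"c1": "ann@example.com", "c2": "bob@example.com", "c3": "cy@example.com"}
search_index = {"c1": "ann@example.com", "c2": "bob@old-example.com"}

drift = find_mismatches(document_store, search_index)
print(sorted(drift))
```

Every additional product that holds a copy adds another pair of stores to reconcile this way, which is part of why the cost grows so quickly.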
Multi-model databases provide an attractive option for those looking to support two or more different types of data models in the same application. A multi-model database offers several advantages over the ‘polyglot persistence’ approach of combining multiple products: the simplicity and lower cost of a single product, quicker time to market, and lower development, maintenance and support costs.
I predict that multi-model databases will be a recognized category by all major analysts within 2019. There’s just too much pain from database sprawl, and something has to give.
There you go: five predictions you can hold me to. I’ll report back in a year.