Big data analysis in 2015
Big data and analytics technologies have been going gangbusters for several years, with companies, funding rounds, technologies, and releases occurring rapid-fire since Hadoop and NoSQL stepped onto the industry stage. Each of the last several years saw an overarching theme in the data arena: 2012 was the year big data became really hot; 2013 was the year it grew more accessible, through SQL-on-Hadoop; and 2014 was the year it became far more versatile, with the addition of YARN and Spark. 2015 will be the year Hadoop matures.
This maturation process in the data world will involve developments that are less exciting than those of the last few years. But while 2015 won’t be a breakthrough year, it will be a much-needed credibility-building year for newer data and analytics technologies. As such, it will be a year when these technologies become more standardized, more broadly adopted, and more successful. And for vendors in the space, it may be a more lucrative year than the ones that preceded it.
Hadoop will become more:
- Usable by end users as it transitions to being more embedded and less directly exposed
- Enterprise-adoptable through such additions as better tooling and true role-based access controls (RBAC)
- Developer-friendly through the addition of tools and comprehensive APIs
Not coincidentally, the third point will help advance the first two.
NoSQL databases will mature as well. In fact, so will relational databases. Each will continue to acquire qualities of the other, with a likely outcome that the two categories will converge.
This report will examine the following likely trends affecting the analytics space in 2015:
- Data governance will become a higher priority, and the conundrum of enforcing governance over ever-growing, real-time streaming data sets will have to be confronted. Current approaches to governance, based on years of data warehousing and OLAP work, won’t be sufficient to the task, and continuing to avoid the governance question won’t work either. New thinking will be required.
- Hadoop will morph from an infrastructural entity that users must care for as they work with it to a running service that they can use in a task-oriented way. Discrete components, like Hive and Pig, will continue to be useful in their own right but will become constituent services that higher-level, cloud-based self-service platforms will leverage for their own functionality.
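To make the infrastructure-to-service shift concrete, here is a minimal sketch of what a task-oriented layer over Hive might look like. The `HiveService` class and its `top_n` method are hypothetical names invented for illustration, not a real API; the point is that the user requests a task while the platform owns the HiveQL generation, job submission, and cluster details.

```python
# Hypothetical sketch of a self-service layer over Hive.
# "HiveService" and "top_n" are illustrative names, not a real API.

class HiveService:
    """Turns a task request into a HiveQL statement that a hosted
    platform would submit and manage on the user's behalf."""

    def top_n(self, table, group_col, metric_col, n=10):
        # The platform, not the end user, owns cluster configuration,
        # job scheduling, and resource management; the user only
        # describes the task.
        return (
            f"SELECT {group_col}, SUM({metric_col}) AS total "
            f"FROM {table} "
            f"GROUP BY {group_col} "
            f"ORDER BY total DESC "
            f"LIMIT {n}"
        )

svc = HiveService()
query = svc.top_n("weblogs", "country", "bytes", n=5)
print(query)
```

In this model, Hive remains useful in its own right, but as a constituent service behind a simpler, task-shaped interface.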
- In much the same way that new user platforms will make Hadoop more self-service, so too will new developer platforms make Hadoop more developer-friendly and more embeddable in line-of-business applications. But proper evangelism and mentoring will be required to get enterprise developers over the big data “hump.”
- Relational databases will become more adept at handling semi-structured data. Such capability will need to extend beyond support for XML and JSON as datatypes to accommodating schema-less tables alongside conventional ones, in the same databases and the same SQL queries. Likewise, today’s NoSQL databases may need to allow schema-based tables, albeit within the key-value, column family, document store, or graph storage formats that they use today.
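The relational side of that convergence can already be glimpsed in SQLite’s JSON1 functions, used here as a stand-in for the broader JSON support the report anticipates. This sketch (which assumes a SQLite build with JSON1 enabled, standard in recent versions) mixes a conventional column and a field extracted from a schema-less JSON payload in a single SQL query.

```python
# Sketch: conventional and semi-structured data in one SQL query,
# using SQLite's JSON1 json_extract() as an example of relational
# JSON support. Assumes a SQLite build with JSON1 enabled.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, user TEXT, payload TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        (1, "alice", '{"action": "click", "ms": 120}'),
        (2, "bob",   '{"action": "view",  "ms": 45}'),
    ],
)

# One query combines a conventional column (user) with fields pulled
# out of the schema-less JSON payload, in both SELECT and WHERE.
rows = conn.execute(
    "SELECT user, json_extract(payload, '$.action') AS action "
    "FROM events WHERE json_extract(payload, '$.ms') > 100"
).fetchall()
print(rows)  # → [('alice', 'click')]
```

The same pattern, writ large, is what schema-less tables alongside conventional ones would look like: the query planner treats the JSON path expressions as just more columns.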