Mark Smith's Analyst Perspectives

Datameer Advances Big-Data Analytics on Hadoop

Written by Mark Smith | Apr 26, 2012 5:25:31 PM

The increasing pressure to store, retrieve and process data on an unprecedented scale in the enterprise has created a market for processes and tools to support it. Big data, as it’s widely known, is one of the six business technology innovations of the decade outlined in our research agenda, and it has created a renaissance in data management. Our benchmark research on big data finds that its top benefits are the ability to retain and analyze more data (74%) and to increase the speed of analysis (70%). This is the context in which a vendor named Datameer has emerged.

We have been tracking the company since its inception and first product announcement in 2010. Since then, Datameer has been steadily improving its tools for analyzing big data stored in the open source technology Hadoop. Typically, as organizations adopt and deploy Hadoop, they need tools that help analysts make sense of the mass of underlying data. Analyzing data efficiently in Hadoop has been a growing concern among organizations that establish clusters of Hadoop instances. They need to apply analytics and visualize data for decision support; analyzing data is important to 88 percent of organizations that are using Hadoop, according to our research, and they have plenty of it to work on. Organizations that use Hadoop are twice as likely (48%) as those that do not (23%) to produce more than 100 gigabytes a day.

Datameer has an interesting wizard-based approach to bringing structured and unstructured data together, along with the ability to schedule processing of it. Its automation capability addresses one of the key benefits of big data, namely reducing or eliminating manual processes, which is critical to more than half (59%) of organizations. Datameer provides a familiar spreadsheet-type approach and an analytics library of more than 200 functions. It also has a drag-and-drop visual approach to analytics, with reports that can be assembled into presentations or delivered in other ways. In addition, it has many of the capabilities you would see in a business intelligence tool, including query, analysis, reporting and the ability to assemble dashboards, and it helps users discover trends in the data through its visualization. The tool is simple to use, and usability is an evaluation criterion of top importance in 78 percent of organizations using Hadoop. Reliability (63%) is next, which is understandable as Hadoop users analyze data more frequently than others: hourly in 27 percent of organizations and at least daily in 77 percent. A big-data approach enables organizations to analyze large volumes of data at a fine level of detail, which is something 88 percent of organizations said they need.

In Datameer 1.4, the most recent version, the company has added critical functionality. The software can partition data into time-based sets to help with time-series analysis and to quickly determine trends from period to period or from a specific date forward. It also supports multiway joins that can be linked within one interface, eliminating the need to create extra sheets within the application. Datameer 1.4 expands the precision for big integer and decimal types and provides an SDK that offers further flexibility in using input adapters that can access data in PostgreSQL and Greenplum databases. It has also expanded its REST API, which is used for integration with other applications and for scripting. To ensure that data remains secure across network transport layers, Datameer has added Secure LDAP over SSL for authentication and embraces HDFS permissions down to the end-user level.
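To make the idea of time-based partitioning concrete, here is a minimal Python sketch of grouping records into monthly sets and comparing one period with the next. It is only an illustration of the general technique, not Datameer's implementation or API; the sample records and the function names `partition_by_month` and `period_over_period` are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample records: (event date, measured value).
records = [
    (date(2012, 1, 5), 120.0),
    (date(2012, 1, 20), 80.0),
    (date(2012, 2, 3), 150.0),
    (date(2012, 2, 25), 100.0),
    (date(2012, 3, 14), 300.0),
]


def partition_by_month(rows):
    """Group rows into time-based sets keyed by (year, month)."""
    partitions = defaultdict(list)
    for day, value in rows:
        partitions[(day.year, day.month)].append(value)
    return dict(partitions)


def period_over_period(partitions, start=None):
    """Return (period, total, percent change vs. previous period) tuples.

    `start` optionally limits the analysis to periods on or after a
    given (year, month), i.e. "from a specific date forward".
    """
    results = []
    previous_total = None
    for period in sorted(partitions):
        if start is not None and period < start:
            continue
        total = sum(partitions[period])
        change = None
        # Skip the first period and any zero baseline (avoids dividing by zero).
        if previous_total:
            change = (total - previous_total) / previous_total * 100.0
        results.append((period, total, change))
        previous_total = total
    return results


monthly = partition_by_month(records)
trend = period_over_period(monthly)
trend_from_feb = period_over_period(monthly, start=(2012, 2))
```

Each entry in `trend` pairs a month with its total and its percent change from the prior month, while `trend_from_feb` restricts the same comparison to February onward. A tool working at Hadoop scale would distribute this grouping across the cluster, but the period-to-period logic is the same in spirit.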

At last fall’s Hadoop World, Datameer was flaunting its support across the Hadoop ecosystem. The product is available on major cloud computing platforms and Hadoop distributions including Amazon Web Services, EMC, Cloudera, Hortonworks, IBM BigInsights and MapR. Datameer recently announced availability on Microsoft Windows Azure, and if you want to hear why that partnership is relevant, you can watch this video. Last summer I wrote about the faceoff between Hadoop and Oracle; Oracle is now embracing and integrating with Hadoop instead of ignoring it, which will lead to more dialogue about Datameer. Recently Datameer announced a partnership with DataSift, which provides social media analytics against large volumes of data from Twitter.

In these ways Datameer is helping advance the state of analytics on Hadoop. By simplifying the process it is addressing the largest challenges our research found in organizations using big data, which are staffing (80%) and training (74%). Datameer resembles the BI vendors of the 1990s that attached to data marts, let businesses analyze their own data and found significant growth as a result. If Datameer can continue to invest and expand its partner ecosystem with Hadoop vendors, other software providers and consulting firms, it will be able to grow rapidly. If you are working with Hadoop and have not tried Datameer, I recommend that you evaluate it.

Regards,

Mark Smith – CEO & Chief Research Officer