Big data has become a big deal as the technology industry has invested tens of billions of dollars to create the next generation of databases and data processing. After the accompanying flood of new categories and marketing terminology from vendors, most in the IT community are now beginning to understand the potential of big data. Ventana Research thoroughly covered the evolving state of the big data and information optimization sector in 2014 and will continue this research in 2015 and beyond. As it progresses, making big data systems interoperate with existing enterprise and information architecture, along with digital transformation strategies, becomes critical. Done properly, this enables companies to take advantage of big data innovations to optimize their established business processes and execute new business strategies. But deploying big data and applying analytics to understand it is just the beginning. Innovative organizations must go beyond the usual exploratory and root-cause analyses through applied analytic discovery and other techniques. This of course requires them to develop competencies in information management for big data.
Among big data technologies, the open source Hadoop has been commercialized by now established providers including Cloudera, Hortonworks and MapR and made available in the cloud through platforms such as Qubole, which received a Ventana Research Technology Innovation Award in 2014. Other big data technologies are growing as well; for example, in-memory and specialized databases, like Hadoop, are each used in more than 40 percent of organizations, according to our big data integration benchmark research. These technologies have been integrated into databases or what I call hybrid big data appliances like those from IBM, Oracle, SAP and Teradata that bring the power of Hadoop to the RDBMS and exploit in-memory processing to perform ever faster computing. When placed into hosted and cloud environments these appliances can virtualize big data processing. Another new provider, Splice Machine, brings the power of SQL processing to Hadoop in a scalable, cloud-based approach; it received a Ventana Research Technology Leadership Award last year. Likewise advances in NoSQL approaches help organizations process and utilize semistructured information, blend it with other information and apply analytics to it, as Datawatch does. These examples show that disruptive technologies still have the potential to revolutionize our approaches to managing information.
Our firm also explores what we call information optimization, which assesses techniques for gaining full value from business information. Big data is one of these when used effectively in an enterprise information architecture. In this context the “data lake” analogy is not helpful in representing the full scope of big data, suggesting simply a container like a data mart or data warehouse. With big data, taking an architectural approach is critical. This viewpoint is evident in our 2014 Ventana Research Technology Innovation Award in Information Management to Teradata for its Unified Data Architecture. Another award winner, Software AG, blends big data and information optimization using its real-time and in-memory processing technologies.
Businesses need to process data in rapid cycles, many in real time, through what we call operational intelligence, which utilizes events and streams and provides the ability to sense and respond immediately to issues and opportunities in organizations that adopt a data-driven culture. Our operational intelligence research finds that monitoring, alerting and notification are the top use cases for deployment, in more than half of organizations. Machine data can also help businesses optimize not just IT processes but the business processes that govern and control the security of data in the enterprise. This imperative is evident in the dramatic growth of suppliers such as Splunk, Sumo Logic and Savi Technology, all of which won Ventana Research Technology Innovation awards for how they process machine and business data in large volumes at rapid velocity.
Another increasing trend in big data is presenting it in ways that ordinary users can understand quickly. Discovery and advanced visualization is not enough for business users who are not trained to interpret these presentations. Some vendors can present location and geospatial data on maps that are easier to understand. At the other end of the user spectrum data scientists and analysts need more robust analytic and discovery tools, including predictive analytics, which is a priority for many organizations, according to our big data analytics research. In 2015 we will examine the next generation of predictive analytics in new benchmark research. But there is more work to do to present insights from information that are easy to understand. Some analytics vendors are telling stories by linking pages of content, but these narratives don’t as yet help individuals assess and act. Most analytics tools can’t match the simple functionality of Microsoft PowerPoint, placing descriptive titles, bullets and recommendations on a page with a graphic that represents something important to the business professionals who read it. Deeper insights may come from advances in machine learning and cognitive computing that have arrived on the market and bring more science to analytics.
So we see strong potential in the outputs of big data, but they don’t arrive simply by loading data into these new computing environments. Pragmatic and experienced professionals realize that information management processes do not disappear. A key process in this area is data preparation, which readies data sets for processing in big data environments. Preparing data is the second-most important task for 46 percent of organizations in our big data integration research. Another is data integration, which some new tools can automate. This can enable lines of business and IT to work together on big data integration, as 41 percent of organizations in our research are planning to do. To address these needs a new generation of technologies came into its own in 2014, including Ventana Research Technology Innovation Award winners Paxata and Tamr as well as Trifacta.
Yet another area to watch is the convergence of big data and cloud computing. The proliferation of data sources in the cloud forces organizations to manage and integrate data from a variety of cloud and Internet sources, hence the rise of information as a service for business needs. Ventana Research Technology Innovation Award winner DataSift provides information as a service to blend social media data with other big data and analytics. Such techniques require more flexible environments for integration that can operate anywhere at any time. Dell Boomi, MuleSoft, SnapLogic and others now challenge established data integration providers such as Informatica and others including IBM, Oracle and SAP. Advances in master data management, data governance, data quality and integration backbones from providers such as Informatica and Information Builders help provide better consistency of any type of big data for any business purpose. In addition our research finds that data security is critical for big data in 61 percent of organizations; only 14 percent said it is handled very adequately in their organization.
There is no doubt that big data is now widespread; almost 80 percent of organizations in our information optimization research, for example, will be using it in some form by the end of 2015. This is partly due to increased use across the lines of business; our research on next-generation customer analytics in 2014 shows that it is important to improving understanding of customers in 60 percent of organizations, is being used in one-fifth of organizations and will be in 46 percent by the end of this year. Similarly our next-generation finance analytics research in 2014 finds big data important to 37 percent of organizations, with 13 percent using it today and 42 percent planning to by the end of 2015. And we have already measured its impact on human capital management and HR and where organizations are applying it in this important area.
I invite you to download and peruse our big data agenda for 2015. We will examine how organizations can instrument information optimization processes that use big data and pass this guidance along. We will explore big data’s role in sales and product areas and produce new research on data and analytics in the cloud. Our research will uncover best practices that innovative organizations use not only to prepare and integrate big data but also more tightly unify it with analytics and operations across enterprise and cloud computing environments. For many organizations taking on this challenge and seeking its benefits will require new information platforms and methods to access and provide information as part of their big data deployments. (Getting consistent information across the enterprise is the top benefit of big data integration according to 39 percent of organizations.) We expect 2015 to be a big year for big data and information optimization. I look forward to providing more insights and information about big data and helping everyone get the most from their time and investments in it.
CEO and Chief Research Officer
I had the pleasure of attending Cloudera’s recent analyst summit. Presenters reviewed the work the company has done since its founding six years ago and outlined its plans to use Hadoop to further empower big data technology to support what I call information optimization. Cloudera’s executive team includes co-founders of Hadoop who developed and used it while working at Facebook, Oracle and Yahoo. Last year they brought in CEO Tom Reilly, who led successful organizations at ArcSight, HP and IBM. Cloudera now has more than 500 employees, 800 partners and 40,000 users trained in its commercial version of Hadoop. The Hadoop technology has brought to the market an integration of computing, memory and disk storage; Cloudera has expanded the capabilities of this open source software for its customers through unique extension and commercialization of open source for enterprise use. The importance of big data is undisputed now: For example, our latest research in big data analytics finds it to be very important in 47 percent of organizations. However, we also find that only 14 percent are very satisfied with their use of big data, so there is plenty of room for improvement. How well Cloudera moves forward this year and next will determine its ability to compete in big data over the next five years.
Cloudera’s technology supports what it calls an enterprise data hub (EDH), which ties together a series of integrated components for big data that include batch processing, analytic SQL, a search engine, machine learning, event stream processing and workload management; this is much like the way relational databases and tools evolved in the past. These features also can deal with the types of big data most often used, according to our research: 40 percent or more use five types, from transactional data (60%) to machine data (42%). Hadoop combines layers of the data and analytics stack from collection, staging and storage to data integration and integration with other technologies. For its part Cloudera has a sophisticated focus on both engineering and customer support. Its goal is to enable enterprise big data management that can connect and integrate with other data and applications from its range of partners. Cloudera also seeks to facilitate converged analytics. One of these partners, Zoomdata, demonstrated the potential of big data analytics in analytic discovery and exploration through its visualization on the Cloudera platform; its integrated and interactive tool can be used by business people as well as professionals in analytics, data management and IT.
Cloudera’s latest major release, Cloudera Enterprise 5, brought a range of enterprise advancements including in-memory processing, resource management, data management and data protection, to name a few. Cloudera has announced a range of product options to make it easier to adopt its Hadoop technology. Cloudera Express is its free version of Hadoop, and it also provides three editions licensed through subscription: Basic, Flex and Data Hub. The Flex Edition of Cloudera Enterprise has support for analytic SQL, search, machine learning, event stream processing and online NoSQL through the Hadoop components HBase, Impala, Spark and Navigator; a customer organization can use one of these per Hadoop cluster. The Enterprise Data Hub (EDH) Edition enables use of any of the components in any configuration. Cloudera Navigator is a product for managing metadata, discovery and lineage, and in 2014 it will add search, annotation and registration on metadata. Cloudera uses Apache Hive to support SQL through HiveQL, and Cloudera Impala provides a unique interface to the Hadoop file system HDFS using SQL. This is in line with what our research shows organizations prefer: More than half (52%) use standard SQL to access Hadoop. This range of choices in getting to data within Hadoop helps Cloudera’s customers realize a broad range of uses that include predictive customer care, market risk management, customer experience and other areas where very large volumes of information can be applied for applications that were not cost-effective before. With EDH Edition Cloudera can compete directly with large players IBM, Oracle, SAS and Teradata, all of which have ambitions to provide the hub of big data operations for enterprises.
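To make the SQL-on-Hadoop point concrete, here is a minimal, hypothetical sketch of the kind of standard SQL query that can run through Hive (as HiveQL) or Impala against files in HDFS; the table and column names are invented for illustration and are not from Cloudera’s documentation:

```sql
-- Hypothetical example: summarize machine-generated log events stored in HDFS.
-- The same ANSI-style SQL can be submitted through Hive (HiveQL) or Impala,
-- assuming a table named web_logs has been defined over the underlying files.
SELECT event_type,
       COUNT(*) AS event_count
FROM   web_logs
WHERE  log_date >= '2014-01-01'
GROUP BY event_type
ORDER BY event_count DESC
LIMIT 10;
```

The appeal for the organizations in our research is that analysts can reuse familiar SQL skills and tools rather than writing lower-level MapReduce code.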
Given its open source roots, community is especially important to Hadoop. Part of building a community is providing training to certify and validate skills. Cloudera has enrolled more than 50,000 professionals in its Cloudera University and works with online learning provider Udacity to increase the number of certified Hadoop users. It also has developed academic relationships to promote Hadoop skills being taught to computer science students. Our research finds that this sort of activity is necessary: The most common challenge in big data analytics processes for two out of three (67%) organizations is not having enough skilled resources; we have found similar issues in the implementation and management of big data. The other aspect of a community is to enlist partners that offer specific capabilities. I am impressed with Cloudera’s range of partners, from OEMs, system integrators and channel resellers such as Cisco, Dell, HP, NetApp and Oracle to cloud support from Amazon, IBM, Verizon and others.
To help it keep up, Cloudera announced it has raised another $160 million from investors including T. Rowe Price, Michael Dell Ventures and Google Ventures, adding to prior financing from venture capital firms. With this funding Cloudera outlined its investment focus for 2014, which will concentrate on advancing database and storage, security, in-memory computing and cloud deployment. I believe that it will need to go further to meet the growing needs for integration and analytics and prove that it can provide a high-value integrated offering directly as well as through partners. Investing in its Navigator product also is important, as our research finds that quality and consistency of data is the most challenging aspect of the big data analytics process in 56 percent of organizations. At the same time, Cloudera should focus on optimizing its infrastructure for the four types of data discovery that are required according to our analysis.
Cloudera’s advantage is being the focal point in the Hadoop ecosystem while others are still trying to match its numbers in developers and partners to serve big data needs. Our research finds substantial growth opportunity here: Hadoop will be used in 30 percent of organizations through 2015 and another 12 percent are planning to evaluate it. Our research also finds a significant lead for Cloudera in Hadoop distributions, but other options like Hortonworks and MapR are growing. The research finds that most of these organizations are seeking the ability to respond faster to opportunities and threats; to do that they will need a next generation of skills to apply to big data projects. Our research in information optimization finds that over half (56%) of organizations are planning to use big data, and Hadoop will be a key focus for those efforts. Cloudera has a strong position in the expanding big data market because it focuses on the fundamentals of information management and analytics through Hadoop. But it faces stiff competition from the established providers of RDBMSs and data appliances that are blending Hadoop with their technology as well as from a growing number of providers of commercial versions of Hadoop. Cloudera is well managed and has the finances to meet these challenges; now it needs to show many high-value production deployments in 2014 as the center of businesses’ big data strategies. If you are building a big data strategy with Hadoop, Cloudera should be a priority in your evaluation.
CEO & Chief Research Officer