The big-data landscape just got a little more interesting with the release of EMC’s Pivotal HD distribution of Hadoop. Pivotal HD takes Apache Hadoop and extends it with a data loader and command center capabilities to configure, deploy, monitor and manage Hadoop. Pivotal HD, from EMC’s Pivotal Labs division, integrates with Greenplum Database, a massively parallel processing (MPP) database from EMC’s Greenplum division, and uses HDFS as the storage technology. The combination should help organizations realize a key part of big data’s value: information optimization.
Greenplum and EMC have been working with Hadoop to provide robust database and analytic offerings. EMC is using Hadoop and HDFS as a foundation to support a new generation of information architectures, on top of which the company provides a value-added layer of data and analytic processing to support a range of big data needs. The aim is to deliver one of the key benefits of big data technology: increasing the speed of analysis, which our big data benchmark research found to be a key benefit for 70 percent of organizations.
EMC is placing a bet by building its distribution on top of Apache Hadoop 2.0.2, which has yet to be officially released. The company is testing its software on a thousand-node cluster to ensure it will be ready. While EMC calls Pivotal HD the most powerful Hadoop distribution, it is one of many providers building on Hadoop technology and commercializing it for organizations looking for direct support and services or for value-added technology on top of Hadoop. Oddly, however, EMC’s new offering appears to compete with its own licensing of MapR for a product it calls Greenplum MR.
EMC has given the advanced database processing technology in Pivotal HD a new name: HAWQ. It provides the ability to use ANSI SQL in an optimized manner against big data through a query parser and optimizer; its own HAWQ nodes execute queries against HDFS data nodes. HAWQ also has its own Xtension Framework for adaptability to other technologies. HAWQ improves on the performance of regular SQL access to Hadoop because it is a specialized technology for managing distributed, optimized queries against data in HDFS.
By supporting SQL as the language for accessing Hadoop, HAWQ standardizes and simplifies access to big data, providing query optimization through its query planning and pipelining methods. Providing a SQL interface and an ODBC connection is not new; many Hadoop distributions now provide ODBC connectivity, including Cloudera, Hortonworks and MapR. EMC, however, positions the optimized query engine and SQL connection in HAWQ as an accelerator, which lets it stack its software up against any data and analytic technology, not just Hadoop. The question for organizations considering an investment in this approach is whether they would limit their access to future Hadoop advancements by investing in HAWQ technology that operates only with the Pivotal HD distribution, or whether the gains provide enough immediate value to offset any Hadoop optimization challenges. I believe that an organization adopting HAWQ will need to invest in an information architecture that includes integration technology at the HDFS level, as businesses will inevitably operate against varying flavors of Hadoop.
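The appeal of SQL access is portability: in principle, the same ANSI SQL an analyst already writes against a relational database can run unchanged against a SQL-on-Hadoop engine such as HAWQ through its ODBC endpoint. A minimal sketch of that idea, using Python’s built-in SQLite as a stand-in for the real connection (the `page_views` table and query are illustrative assumptions, not part of any vendor’s product):

```python
import sqlite3

# Stand-in for an ODBC connection to a SQL-on-Hadoop engine such as HAWQ;
# in practice one would connect through the vendor's ODBC driver instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (page TEXT, hits INTEGER)")
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?)",
    [("/home", 120), ("/pricing", 45), ("/docs", 80)],
)

# Plain ANSI SQL -- no MapReduce jobs or Hadoop-specific syntax required
# of the analyst; only the connection behind the query would change.
rows = conn.execute(
    "SELECT page, hits FROM page_views WHERE hits > 50 ORDER BY hits DESC"
).fetchall()
print(rows)  # → [('/home', 120), ('/docs', 80)]
```

The point of the sketch is the interface, not the engine: tools that speak SQL over ODBC need not know whether the data sits in an RDBMS or in HDFS.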
Another area of differentiation EMC promises for HAWQ is performance. EMC claims dramatic improvements using its query optimizer and SQL versus using Hive to access HDFS or Cloudera Impala on native Hadoop; its own benchmark cites results 19 to 648 times faster. Since these benchmarks were not run independently, it is hard to place much weight on them for now. I made inquiries to several Hadoop software providers, including Cloudera; they said the metrics are probably not accurate and invited performance comparisons against their own technologies. These benchmarks should have been released to the Hadoop community so its members could design optimized Hive queries for a fairer comparison, but EMC is hoping its results will entice IT professionals to try it for themselves.
EMC’s stature in the market and its work with a broad range of technology partners make it an important player in the big data market. Tableau Software is one of those partners, providing discovery on data from HAWQ and Pivotal HD for analytics. Cirro also announced support for Pivotal HD, enabling a new generation of what I call big data integration. These partnerships give EMC a more complete, enterprise-ready stack for big data, from analyst tools to connectivity with other data sources.
EMC can deliver its big data technology through a variety of deployment methods, including public cloud with OpenStack and Amazon Web Services (AWS), private cloud using VMware, and on-premises. Our big data research shows faster growth planned for hosted (59%) and software-as-a-service (65%) deployments than for future on-premises deployments. While EMC cannot publicly name its customer references, and I have yet to validate them, the company says they include some of the largest banks and manufacturers.
Meanwhile, the Hadoop community’s new Tez project provides an alternative runtime that bypasses MapReduce to improve performance; it uses Hadoop YARN for more efficient execution and better query performance. Also, the Stinger Initiative is a project to improve interactive query support for Hive.
EMC acknowledges the open source efforts focused on improving the performance of accessing HDFS and looks forward to those advancements and to incorporating them into its Pivotal HD product, but it points to its query optimizer and ANSI SQL as a better approach. It also did not deny that its performance comparisons could have been better optimized. But EMC is betting that its HAWQ efforts and its reliance on the next release of Apache Hadoop 2, expected in 2013, will place it in a good market position while leveraging open source technology.
This move to introduce Pivotal HD Enterprise and HAWQ is clearly an opportunity to accelerate EMC’s efforts. Greenplum’s technology needed assistance to grow its adoption as it competes with approaches that encompass not only Hadoop but also in-memory, appliance and RDBMS technology. Only time will tell how EMC’s focus on big data with Pivotal HD and HAWQ will play out. The battle among big data providers continues to be very competitive, with dozens of approaches. As each organization moves from experimentation to development to production, it must carefully determine what technology will best meet its unique needs. Organizations should evaluate HAWQ and Pivotal HD not just on the merits of performance or SQL access but on IT’s architectural and management needs, spanning adaptability, manageability, reliability and usability, and on the business value of this technology compared with other Hadoop and big-data approaches.
CEO & Chief Research Officer
Business is starting to realize that taking advantage of big data is not just technically feasible but affordable for organizations of all sizes. However, as outlined in our agenda on big data and information optimization, the technology must be engineered to the information needs of the business. Hortonworks has been steadily advancing the big data technology Hadoop and contributing its developments back to the Apache Software Foundation across a range of projects. The company performs enterprise-level testing to ensure Hadoop not just operates but scales across operating systems, cloud computing, virtual machines and appliances. Over the last year Hortonworks has released a number of certifications and benchmarks for an enterprise-ready version of Hadoop for which it provides support and services. These are important steps toward meeting the needs of IT management, which is the audience evaluating big data technologies in 66 percent of organizations, according to our big data research.
Hortonworks Data Platform, the enterprise offering that I analyzed last year, is a Hadoop technology stack that is being adopted because of its enterprise readiness. At the beginning of this year, Hortonworks released its latest version, which advanced management and monitoring through the use of Apache Ambari and improved security and authentication. It supports multiple concurrent query connections to Hive, making it more scalable in support of business intelligence and applications. While many competing approaches to Hadoop point to Hive’s performance challenges, Hortonworks points to the improvements it is contributing to advance Hive and argues that those competitive benchmarks are significantly flawed in their design. It also offers improved SQL access to Hadoop; while it is not alone in providing this, Hortonworks worked with Simba to provide a Hive ODBC connector that supports SQL-92 access from business intelligence tools. Hortonworks also includes Talend Open Studio to help with Hadoop integration needs. To support testing and development for enterprises, the company released Hortonworks Sandbox at the beginning of 2013.
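In practice, SQL-92 access from a BI tool goes through an ODBC data source definition pointing at the Hive server. As a hedged illustration, a DSN entry on Linux might look roughly like the following; the driver path, host name and option keys shown here are assumptions for illustration, and the exact settings should be taken from the Hive ODBC driver’s own documentation:

```ini
# Illustrative odbc.ini entry for a Simba-built Hive ODBC connector.
# All values below are hypothetical -- verify names and paths against
# the driver's installation guide for your version.
[HortonworksHive]
Description=Hive access over ODBC (SQL-92 from BI tools)
Driver=/usr/lib/hive/lib/native/libhortonworkshiveodbc64.so
Host=hadoop-master.example.com
Port=10000
Schema=default
```

Once such a DSN exists, any ODBC-aware tool, from Excel to enterprise BI suites, can issue standard SQL against Hive without Hadoop-specific client code.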
Hortonworks is working with Microsoft to have Hadoop operate on Windows platforms. The beta is now available for download, and general availability is expected in the second quarter. By working with Microsoft, Hortonworks helps IT organizations that use Windows Server as the platform for their big data initiatives. Microsoft HDInsight Server and Windows Azure HDInsight Service are built on Hortonworks Data Platform and make Hadoop readily available on Microsoft Windows. This strategic alliance helps IT organizations bring the power of Hadoop to Microsoft platforms. It is important for Hortonworks to broaden its reach to the large, global audience that uses Windows, especially given that our research on big data finds that 89 percent of organizations today use RDBMSes like SQL Server; those organizations can now run Hadoop alongside them on a Windows platform.
This new approach with HDInsight can then connect with Microsoft SQL Server 2012 for sourcing data into Hadoop or accessing it there. Microsoft Excel can directly access Hadoop through SQL, which opens further support for a large number of organizations. Microsoft HDInsight provides integration with Microsoft System Center for management of deployments, and security is integrated with Windows Server Active Directory. Making Hortonworks Hadoop available on Windows is a significant step forward for Microsoft and its big data efforts. The company is not usually part of the discussion in the big data market, though its technologies are used in many deployments that could easily be described as big data in nature. Both Microsoft and Hortonworks are highlighting this alliance and technology availability at the Strata 2013 conference in Silicon Valley this week.
In addition, Hortonworks has announced new Hadoop initiatives to further advance its potential. It has a project underway to improve Hive performance through support for interactive queries, and a new project with Tez to provide a newer-generation runtime that improves Hive performance without depending on MapReduce. It is also working on Hadoop Gateway to provide a single point of secured authentication to Hadoop, which will help in operations across clusters. Part of Hortonworks’ approach is that each of these advancements is contributed to the open source community, where other developers and organizations can contribute and help, or use it once finalized and ready for distribution. This is a much different approach from others in the market who source Hadoop from the community and make proprietary extensions to it, or who embed it in their software and sell the license to the customer.
Hortonworks operates in a very competitive Hadoop market, let alone the broader market for big data technologies. Within the Hadoop market it faces many competitive forces, but Hortonworks states that its competitive advantage is its pure commitment to working through the open source community: not just its large number of Hadoop committers but the power of all the developers and organizations working to advance this big data technology. The company previously partnered with Teradata to have its Hadoop technology integrated into the Teradata Aster Big Analytics Appliance, a device we awarded the 2012 Technology Innovation Award for Big Data for its sophistication in blending Hadoop and AsterData into one appliance. I expect Hortonworks to continue investing in partnerships in areas from integration to analytics, as it has done to support data integration with partners like Talend and Informatica; as I have pointed out, big data is broken without an array of support for integration technologies.
I like the work that Hortonworks is doing to support Hadoop deployments. Our research on big data finds that 43 percent of organizations prefer on-premises deployments and 24 percent prefer the cloud, which Hortonworks addresses through a partnership with Rackspace and now with Microsoft and Windows Azure. Hortonworks offers many opportunities to help IT organizations use Hadoop across platforms and environments, and it supports interoperability with existing technology, allowing organizations to use employees who are already trained. Organizations interested in examining certified and supported Hadoop should evaluate Hortonworks. Businesses looking for Hadoop support on Microsoft Windows will find Hortonworks their only option, as it is Microsoft’s strategic choice for operating Hadoop on its platforms. Hortonworks’ open contributions to Hadoop, its sizable partner ecosystem complementing Hadoop, and its flexibility to operate across Linux and Windows, on-premises and in cloud computing environments, make it a strategic provider of big data technology.
CEO & Chief Research Officer