Big data has become a big deal as the technology industry has invested tens of billions of dollars to create the next generation of databases and data processing. After the accompanying flood of new categories and marketing terminology from vendors, most in the IT community are now beginning to understand the potential of big data. Ventana Research thoroughly covered the evolving state of the big data and information optimization sector in 2014 and will continue this research in 2015 and beyond. As it progresses, the importance of making big data systems interoperate with existing enterprise and information architecture, along with digital transformation strategies, becomes critical. Done properly, companies can take advantage of big data innovations to optimize their established business processes and execute new business strategies. But deploying big data and applying analytics to understand it is just the beginning. Innovative organizations must go beyond the usual exploratory and root-cause analyses through applied analytic discovery and other techniques. This of course requires them to develop competencies in information management for big data.
Among big data technologies, the open source Hadoop has been commercialized by now-established providers including Cloudera, Hortonworks and MapR and made available in the cloud through platforms such as Qubole, which received a Ventana Research Technology Innovation Award in 2014. Other big data technologies are growing as well; for example, use of in-memory and specialized databases, like that of Hadoop, is also growing in more than 40 percent of organizations, according to our big data integration benchmark research. These technologies have been integrated into databases or what I call hybrid big data appliances like those from IBM, Oracle, SAP and Teradata that bring the power of Hadoop to the RDBMS and exploit in-memory processing to perform ever faster computing. When placed into hosted and cloud environments these appliances can virtualize big data processing. Another new provider, Splice Machine, brings the power of SQL processing in a scalable approach that uses Hadoop in a cloud-based approach; it received a Ventana Research Technology Leadership Award last year. Likewise advances in NoSQL approaches help organizations process and utilize semistructured information along with other information and blend them with analytics as Datawatch does. These examples show that disruptive technologies still have the potential to revolutionize our approaches to managing information.
Our firm also explores what we call information optimization, which assesses techniques for gaining full value from business information. Big data is one of these when used effectively in an enterprise information architecture. In this context the “data lake” analogy is not helpful in representing the full scope of big data, suggesting simply a container like a data mart or data warehouse. With big data, taking an architectural approach is critical. This viewpoint is evident in our 2014 Ventana Research Technology Innovation Award in Information Management to Teradata for its Unified Data Architecture. Another award winner, Software AG, blends big data and information optimization using its real-time and in-memory processing technologies.
Businesses need to process data in rapid cycles, many in real time, through what we call operational intelligence, which utilizes events and streams and provides the ability to sense and respond immediately to issues and opportunities in organizations that adapt to a data-driven culture. Our operational intelligence research finds that monitoring, alerting and notification are the top use cases for deployment, in more than half of organizations. Also machine data can help businesses optimize not just IT processes but business processes that help govern and control the security of data in the enterprise. This imperative is evident in the dramatic growth of suppliers such as Splunk, Sumo Logic and Savi Technology, all of which won Ventana Research Technology Innovation awards for how they process machine and business data in large volumes at rapid velocity.
Another increasing trend in big data is presenting it in ways that ordinary users can understand quickly. Discovery and advanced visualization are not enough for business users who are not trained to interpret these presentations. Some vendors can present location and geospatial data on maps that are easier to understand. At the other end of the user spectrum data scientists and analysts need more robust analytic and discovery tools, including predictive analytics, which is a priority for many organizations, according to our big data analytics research. In 2015 we will examine the next generation of predictive analytics in new benchmark research. But there is more work to do to present insights from information that are easy to understand. Some analytics vendors are telling stories by linking pages of content, but these narratives don’t as yet help individuals assess and act. Most analytics tools can’t match the simple functionality of Microsoft PowerPoint, placing descriptive titles, bullets and recommendations on a page with a graphic that represents something important to the business professional who reads it. Deeper insights may come from advances in machine learning and cognitive computing that have arrived on the market and bring more science to analytics.
So we see strong potential for the outputs of big data, but they don’t arrive just by loading data into these new computing environments. Pragmatic and experienced professionals realize that information management processes do not disappear. A key one in this area is data preparation, which helps ready data sets for processing into big data environments. Preparing data is the second-most important task for 46 percent of organizations in our big data integration research. A second is data integration, which some new tools can automate. This can enable lines of business and IT to work together on big data integration, as 41 percent of organizations in our research are planning to do. To address this need a new generation of technologies came into their own in 2014 including those that received Ventana Research Technology Innovation Awards like Paxata and Tamr but also Trifacta.
Yet another area to watch is the convergence of big data and cloud computing. The proliferation of data sources in the cloud forces organizations to manage and integrate data from a variety of cloud and Internet sources, hence the rise of information as a service for business needs. Ventana Research Technology Innovation Award winner DataSift provides information as a service to blend social media data with other big data and analytics. Such techniques require more flexible environments for integration that can operate anywhere at any time. Dell Boomi, MuleSoft, SnapLogic and others now challenge established data integration providers such as Informatica and others including IBM, Oracle and SAP. Advances in master data management, data governance, data quality and integration backbones, along with providers such as Informatica and Information Builders, help provide better consistency of any type of big data for any business purpose. In addition our research finds that data security is critical for big data in 61 percent of organizations; only 14 percent said that it is very adequate in their organization.
There is no doubt that big data is now widespread; almost 80 percent of organizations in our information optimization research, for example, will be using it in some form by the end of 2015. This is partly due to increased use across the lines of business; our research on next-generation customer analytics in 2014 shows that it is important to improving understanding of customers in 60 percent of organizations, is being used in one-fifth of organizations and will be in 46 percent by the end of this year. Similarly our next-generation finance analytics research in 2014 finds big data important to 37 percent of organizations, with 13 percent using it today and 42 percent planning to by the end of 2015. And we have already measured how it will impact human capital management and HR and where organizations are leveraging it in this area of importance.
I invite you to download and peruse our big data agenda for 2015. We will examine how organizations can instrument information optimization processes that use big data and pass this guidance along. We will explore big data’s role in sales and product areas and produce new research on data and analytics in the cloud. Our research will uncover best practices that innovative organizations use not only to prepare and integrate big data but also more tightly unify it with analytics and operations across enterprise and cloud computing environments. For many organizations taking on this challenge and seeking its benefits will require new information platforms and methods to access and provide information as part of their big data deployments. (Getting consistent information across the enterprise is the top benefit of big data integration according to 39 percent of organizations.) We expect 2015 to be a big year for big data and information optimization. I look forward to providing more insights and information about big data and helping everyone get the most from their time and investments in it.
CEO and Chief Research Officer
Big data has great promise for many organizations today, but they also need technology to facilitate integration of various data stores, as I recently pointed out. Our big data integration benchmark research makes it clear that organizations are aware of the need to integrate big data, but most have yet to address it: In this area our Performance Index analysis, which assesses competency and maturity of organizations, concludes that only 13 percent reach the highest of four levels, Innovative. Furthermore, while many organizations are sophisticated in dealing with the information, they are less able to handle the people-related areas, lacking the right level of training in the skills required to integrate big data. Most said that the training they provide is only somewhat adequate or inadequate.
Big data is still new to many organizations, and they face challenges in integrating big data that prevent them from gaining full value from their existing and potential investments. Our research finds that many lack confidence in processing large volumes of data. More than half (55%) of organizations characterized themselves as only somewhat confident or not confident in their ability to accomplish that task. They have even less confidence in their ability to process data that arrives at high velocity: Only 29 percent said they are somewhat confident or not confident in that. In dealing with the variety of big data, confidence is somewhat stronger, as more than half (56%) declared themselves confident or very confident. Assurance in one aspect is often found in others: 86 percent of organizations that said they are very confident in their ability to integrate the variety of big data are satisfied with how they manage the storage of big data. Similarly 91 percent of those that are confident or very confident with their data quality are satisfied with the way they manage the storage of big data.
Turning to the technology being used, we find only one-third (32%) of organizations satisfied with their current data integration technology, but twice as many (66%) are satisfied with their data integration processes for loading and creating big data. A substantial majority (86%) of those very confident in their ability to integrate the needed variety of big data are satisfied with their existing data integration processes. Those that are not satisfied said the process is too slow (61%), analytics are hard to build and maintain (50%) and data is not readily available (39%). These findings indicate that making a commitment to data integration, for big data and otherwise, can pay off in confidence and satisfaction with the processes for doing it. Additionally, organizations that use dedicated data integration technology (86%) are satisfied much more often than those that don’t use dedicated technology (52%).
New types of big data technologies are being introduced to meet expanding demand for storage and use of information across the enterprise. One of those fast-growing technologies is the open source Apache Hadoop and commercial enterprise versions of it that provide a distributed file system to manage large volumes of data. The research finds that currently 28 percent of organizations use Hadoop and about as many more (25%) plan to use it in the next two years. Nearly half (47%) have Hadoop-specific skills to support big data integration. For those that have limited resources, open source Hadoop can be affordable, and to automate and interface with it, adopters can use SQL in addition to its native interfaces; about three in five organizations now use each of these options. Hadoop can be a capable tool to implement big data but must be integrated with other information and operational systems.
Big data is not found only in conventional in-house information environments. Our research finds that data integration processes are most often applied between systems deployed on-premises (58%), but more than one-third (35%) are integrating cloud-based systems, which reflects the progress cloud computing has made. Nonetheless, cloud-to-cloud integration remains least common (18%). In the next year or two 20 to 25 percent of organizations plan additional support for all types of integration; those being considered most often are cloud-to-cloud (25%) and on-premises-to-cloud (23%), further reflecting movement into the cloud. In addition, nearly all (95%) organizations using cloud-to-cloud integration said they have improved their activities and processes. This finding confirms the value of integration of big data regardless of what types of systems hold it. With a growing number of organizations using cloud computing, data integration is a critical requirement for big data projects; more than one-quarter (28%) of organizations are deploying big data integration into cloud computing environments.
Because of the intense need of business units and processes for big data, integration requires IT and business people to work together to build efficient processes. The largest percentage of organizations in the research (44%) have business analysts work with IT to design and deploy big data integration. Another one-third assign IT to build the integration, and half that many (16%) have IT use a dedicated data integration tool. The research finds some distrust in involving the business side. Almost one in four (23%) said they are resistant or very resistant to allowing business users to integrate big data that IT has not prepared first, and the majority (51%) resist somewhat. For more than half (58%) the IT group responsible for BI and data warehouse systems also is the key stakeholder for designing and deploying big data integration; no other option is used by more than 11 percent.
It is not surprising that IT is the department that most often facilitates big data and needs integration the most (55%). The most frequent issue arising between business units and IT is entrenchment of budgets and priorities (in 42% of organizations). Funding of big data initiatives most often comes from the general IT budget (50%); line-of-business IT budgets (38%) are the second-most commonly used. It is understandable that IT dominates this heavily technical function, but big data is beneficial only when it advances the organization’s goals for information that is needed by business. Management should ensure that IT works with the lines of business to enable them to get the information they need to improve business processes and decision-making and not settle for creating a more cost-effective and efficient method to store it.
Overcoming these challenges is a critical step in the planning process for big data. My analysis that big data won’t work well without integration is confirmed by the research. We urge organizations to take a comprehensive approach to big data and evaluate dedicated tools that can mitigate risks that others have already encountered.
CEO and Chief Research Officer