Big data holds great promise for many organizations today, but they also need technology to facilitate integration of various data stores, as I recently pointed out. Our big data integration benchmark research makes it clear that organizations are aware of the need to integrate big data, but most have yet to address it: In this area our Performance Index analysis, which assesses the competency and maturity of organizations, concludes that only 13 percent reach the highest of four levels, Innovative. Furthermore, while many organizations are sophisticated in dealing with the information, they are less able to handle the people-related areas and lack the right level of training in the skills required to integrate big data. Most said that the training they provide is only somewhat adequate or inadequate.
Big data is still new to many organizations, and they face challenges in integrating it that prevent them from gaining full value from their existing and potential investments. Our research finds that many lack confidence in processing large volumes of data: More than half (55%) of organizations characterized themselves as only somewhat confident or not confident in their ability to accomplish that task. They have even less confidence in their ability to process data that arrives at high velocity: Only 29 percent said they are confident or very confident in that. In dealing with the variety of big data, confidence is somewhat stronger, as more than half (56%) declared themselves confident or very confident. Assurance in one aspect is often found in others: 86 percent of organizations that said they are very confident in their ability to integrate the variety of big data are satisfied with how they manage the storage of big data. Similarly, 91 percent of those that are confident or very confident in their data quality are satisfied with the way they manage the storage of big data.
Turning to the technology being used, we find only one-third (32%) of organizations satisfied with their current data integration technology, but twice as many (66%) are satisfied with their data integration processes for loading and creating big data. A substantial majority (86%) of those very confident in their ability to integrate the needed variety of big data are satisfied with their existing data integration processes. Those that are not satisfied said the process is too slow (61%), analytics are hard to build and maintain (50%) and data is not readily available (39%). Organizations that use dedicated data integration technology are satisfied much more often (86%) than those that don't use dedicated technology (52%). These findings indicate that making a commitment to data integration, for big data and otherwise, can pay off in confidence and satisfaction with the processes for doing it.
New types of big data technologies are being introduced to meet expanding demand for storage and use of information across the enterprise. One of the fastest-growing is open source Apache Hadoop, along with commercial enterprise versions of it, which provide a distributed file system to manage large volumes of data. The research finds that currently 28 percent of organizations use Hadoop and about as many more (25%) plan to use it in the next two years. Nearly half (47%) have Hadoop-specific skills to support big data integration. Open source Hadoop can be affordable for organizations with limited resources, and adopters can automate and interface with it using SQL as well as its native interfaces; about three in five organizations now use each of these options. Hadoop can be a capable tool for implementing big data, but it must be integrated with other information and operational systems.
Big data is not found only in conventional in-house information environments. Our research finds that data integration processes are most often applied between systems deployed on-premises (58%), but more than one-third (35%) are integrating cloud-based systems, which reflects the progress cloud computing has made. Nonetheless, cloud-to-cloud integration remains the least common (18%). In the next year or two, 20 to 25 percent of organizations plan additional support for each type of integration; those considered most often are cloud-to-cloud (25%) and on-premises-to-cloud (23%), further reflecting the movement into the cloud. In addition, nearly all (95%) of the organizations using cloud-to-cloud integration said they have improved their activities and processes. This finding confirms the value of integrating big data regardless of what types of systems hold it. With a growing number of organizations using cloud computing, data integration is a critical requirement for big data projects; more than one-quarter (28%) of organizations are deploying big data integration into cloud computing environments.
Because business units and processes have an intense need for big data, integration requires IT and business people to work together to build efficient processes. The largest percentage of organizations in the research (44%) have business analysts work with IT to design and deploy big data integration. Another one-third assign IT to build the integration, and half that many (16%) have IT use a dedicated data integration tool. The research finds some distrust in involving the business side. Almost one in four (23%) said they are resistant or very resistant to allowing business users to integrate big data that IT has not prepared first, and the majority (51%) resist somewhat. For more than half (58%), the IT group responsible for BI and data warehouse systems is also the key stakeholder for designing and deploying big data integration; no other option is used by more than 11 percent.
It is not surprising that IT is the department that most often facilitates big data and most needs integration (55%). The most frequent issue arising between business units and IT is entrenchment of budgets and priorities (in 42% of organizations). Funding for big data initiatives most often comes from the general IT budget (50%); line-of-business IT budgets (38%) are the second most commonly used. It is understandable that IT dominates this heavily technical function, but big data is beneficial only when it advances the organization's goals for the information the business needs. Management should ensure that IT works with the lines of business to help them get the information they need to improve business processes and decision-making, rather than settling for a more cost-effective and efficient way to store it.
Overcoming these challenges is a critical step in the planning process for big data. The research confirms my analysis that big data won't work well without integration. We urge organizations to take a comprehensive approach to big data and evaluate dedicated tools that can mitigate risks others have already encountered.
CEO and Chief Research Officer
The market for big data continues to grow as organizations try to extract business value from their own masses of data and from other sources. Earlier this year I outlined the dynamics of the business opportunity for big data and information optimization. We continue to see advances as big data and associated information technologies deliver more value, but the range of innovation has also fragmented existing systems, including databases managed on-premises or in cloud computing environments. In this changing environment organizations face new challenges not only in adopting technology that automates data processing more efficiently but also in integrating it into their enterprise architecture. I've already explained how big data can be ineffective without integration. To examine the market in more depth, we conducted benchmark research on big data integration, which reveals how organizations are adopting this technology in their processes.
The research shows that use of big data techniques has become widespread: Almost half (48%) of all organizations participating in this research, and two-thirds of the very large ones, use it for storage, and 45 percent intend to use big data in the next year or sometime in the future. This is a significant change, since most organizations have relied on relational database management systems (RDBMSs) for nearly everything. We find that RDBMSs (76%) are still the most widely used big data technology, followed by flat files (61%) and data warehouse appliances (46%). But this is not the direction many companies plan to take in the future: in-memory databases (46%), Hadoop (44%), specialized databases (43%) and NoSQL (42%) are the tools most often planned for use by 2016 or being evaluated. Clearly there is a revolution in approaches to storing and using data, and it introduces both opportunities and challenges.
Establishing a big data environment requires integrating data through proper preparation and potentially continuous updates, whether in real time or in batches. A further complication is that many organizations will have not one but several big data environments to be integrated into the overall enterprise architecture, which requires both data and systems integration. Our research finds that some organizations are aware of this issue: Automating big data integration is very important to 45 percent and important to more than one-third. Automation not only brings efficiency to big data but also removes many of the risks of errors and of inaccurate or inconsistent data.
Data integration technologies have evolved over the past decade, but advances to support big data are more recent. Our research shows a disparity in how well organizations handle big data integration tasks. The tasks most often rated mostly or completely adequate are accessing (63%), loading (60%), extracting (59%), archiving (55%) and copying (52%) data, while the areas most in need of improvement are virtualizing (39%), profiling (37%), blending (34%), master data management (33%) and masking for privacy (33%). At the system level, the research finds that conventional enterprise capabilities are most often needed: load balancing (cited by 51%), cross-platform support (47%), a development and testing environment (42%), systems management (40%) and scalable execution of tasks (39%). To test the range of big data integration capabilities before they are applied to production projects, the "sandbox" has become the standard approach. For their development and testing environment, the largest percentage (36%) said they will use an internal sandbox with specialized big data. Together these findings show that big data integration has enterprise-level requirements that go beyond simply loading data and build on advances in data integration.
Big data must not be a separate store of data but part of the overall enterprise and data architecture; that is necessary to ensure full integration and use of the data. Organizations that see data integration as critical to big data are embarking on sophisticated efforts to achieve it. The data integration capabilities most critical to their big data efforts are to develop and manage metadata that can be shared across BI systems (cited by 58%), to join disparate data sources during transformation (56%) and to establish rules for processing and routing data (56%).
Other organizations are still examining how to automate integration tasks. The most common barriers to improving big data integration are the cost of the software or license (for 44%), lack of resources to devote to improvement (37%) and the sense that big data technologies are too complicated to integrate (35%). These findings demonstrate that many organizations need to better understand the efficiency and cost savings that can be realized by using purpose-built technology instead of manual approaches that rely on tools not designed for big data. Along with identifying solid business benefits, demonstrating savings of time and money is an essential part of a convincing rationale for investment in big data integration technology. Today most of the time spent on big data integration goes to basic tasks: reviewing data for quality and consistency (52%), preparing data for integration (46%) and connecting to data sources (39%). The first two relate to ensuring that data is ready to load into big data environments. Data preparation is a key part of big data and of overall information optimization, and more vendors are developing dedicated technology to help with it.
For a process as complex as big data integration, choosing the right technology can be difficult. More than half (55%) of organizations are planning to change the way they assess and select such technology. Evaluations of big data integration tools should consider how to deploy them and what sort of vendors can provide them. Almost half (46%) of organizations prefer to integrate big data on-premises, while 28 percent opt for cloud-based software as a service and 17 percent have no preference. Half of organizations plan to use cloud computing for managing big data, while nearly one-third (32%) don't know whether they will. The research shows that the most important technology and vendor criteria for evaluating big data integration technology are usability (very important for 53%), reliability (52%) and functionality (49%). These top three criteria are followed by manageability, TCO/ROI, adaptability and validation of vendors. Organizations are most concerned with having technology that is easy to use and can scale to meet their needs.
Big data cannot be used effectively without integration, yet we observe that the big data industry has not paid as much attention to information management as it should; after all, information management is what enables automating the flow of data. Organizations trying to use big data without a focus on information management will have difficulty optimizing the use of their data assets for business needs. Our research into big data integration finds that the proper technology is critical to meet these needs. We also learned from our benchmark research into big data analytics that data preparation is the largest and most time-consuming set of tasks, and it needs to be streamlined to make best use of the analytics that reveal actionable insights. Organizations that are initiating or expanding big data deployments, whether on-premises or in cloud computing environments, should put integration at the top of their priority list to ensure they do not create silos of data that they can't fully exploit.
CEO and Chief Research Officer