The market for big data continues to grow as organizations try to extract business value from their own masses of data and other sources. Earlier this year I outlined the dynamics of the business opportunity for big data and information optimization. We continue to see advances as big data and associated information technologies deliver more value, but the range of innovation also has created fragmentation among existing systems, including databases that are managed on-premises or in cloud computing environments. In this changing environment organizations encounter new challenges not only in adapting to technology that is more efficient in automating data processing but also in integrating it into their enterprise architecture. I’ve already explained how big data can be ineffective without integration, and we conducted more in-depth research into the market, resulting in our benchmark research on big data integration, which reveals the state of how organizations are adopting this technology in their processes.
The research shows that use of big data techniques has become widespread: Almost half (48%) of all organizations participating in this research and two-thirds of the very large ones use them for storage, and 45 percent intend to use big data in the next year or sometime in the future. This is a significant change in that most organizations have used relational database management systems (RDBMSs) for nearly everything. We find that RDBMSs (76%) are still the most widely used big data technology, followed by flat files (61%) and data warehouse appliances (46%). But this is not the direction many companies are planning to take in the future: Hadoop (44%), in-memory databases (46%), specialized databases (43%) and NoSQL (42%) are the tools most often planned to be used by 2016 or being evaluated. Clearly there is a revolution in approaches to storing and using data, and that introduces both opportunities and challenges.
Establishing a big data environment requires integrating data through proper preparation and potentially continuous updates of data, whether in real time or batch processing. A further complication is that many organizations will have not one but several big data environments to be integrated into the overall enterprise architecture; that requires data and systems integration. Our research finds that some organizations are aware of this issue: Automating big data integration is very important to 45 percent and important to more than one-third. Automation can not only bring efficiency to big data but also remove many risks of errors or inaccurate and inconsistent data.
Data integration technologies have evolved over the past decade, but advances to support big data are more recent. Our research shows a disparity in how well organizations handle big data integration tasks. The tasks they most often rate as mostly or completely adequate are accessing (for 63%), loading (60%), extracting (59%), archiving (55%) and copying (52%) data, while the areas most in need of improvement are virtualizing (39%), profiling (37%), blending (34%), master data management (33%) and masking for privacy (33%). At the system level, the research finds that conventional enterprise capabilities are most often needed: load balancing (cited by 51%), cross-platform support (47%), a development and testing environment (42%), systems management (40%) and scalable execution of tasks (39%). To test the range of big data integration capabilities before they are applied to production projects, the “sandbox” has become the standard approach. For their development and testing environment, the largest percentage (36%) said they will use an internal sandbox with specialized big data. This group of findings reveals that big data integration has enterprise-level requirements that go beyond just loading data to build on advances in data integration.
Big data must not be a separate store of data but part of the overall enterprise and data architecture; that is necessary to ensure full integration and use of the data. Organizations that see data integration as critical to big data are embarking on sophisticated efforts to achieve it. The data integration capabilities most critical to their big data efforts are to develop and manage metadata that can be shared across BI systems (cited by 58%), to join disparate data sources during transformation (56%) and to establish rules for processing and routing data (56%).
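As a rough illustration of two of the capabilities cited above, joining disparate sources during transformation and applying rules for processing and routing data, here is a minimal Python sketch. Everything in it is hypothetical and chosen only for illustration: the sample records, the field names, the rule set and the destination names ("warehouse", "quarantine", "hadoop_landing") do not come from the research or from any particular product.

```python
# Hypothetical records from two disparate sources that share a customer_id key.
crm_rows = [
    {"customer_id": 1, "name": "Acme"},
    {"customer_id": 2, "name": "Globex"},
]
web_events = [
    {"customer_id": 1, "event": "purchase", "amount": 120.0},
    {"customer_id": 2, "event": "page_view", "amount": 0.0},
    {"customer_id": 3, "event": "purchase", "amount": 15.0},
]


def join_sources(reference_rows, event_rows, key):
    """Keep every event row and enrich it with reference fields when the key matches."""
    index = {row[key]: row for row in reference_rows}
    joined = []
    for row in event_rows:
        match = index.get(row[key], {})
        joined.append({**match, **row})
    return joined


# Illustrative routing rules, checked in order; the first matching rule wins.
ROUTING_RULES = [
    (lambda r: "name" not in r, "quarantine"),          # no matching CRM record
    (lambda r: r["event"] == "purchase", "warehouse"),  # enriched purchase events
]
DEFAULT_DESTINATION = "hadoop_landing"


def route(record):
    """Return the destination name for one joined record."""
    for predicate, destination in ROUTING_RULES:
        if predicate(record):
            return destination
    return DEFAULT_DESTINATION


def integrate(reference_rows, event_rows, key):
    """Join the two sources, then group the joined records by destination."""
    routed = {}
    for record in join_sources(reference_rows, event_rows, key):
        routed.setdefault(route(record), []).append(record)
    return routed


if __name__ == "__main__":
    for destination, records in integrate(crm_rows, web_events, "customer_id").items():
        print(destination, records)
```

Keeping the rules in a plain ordered list is only one possible design; the point is that routing decisions live in one declared place rather than being scattered through hand-written load scripts.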
Other organizations are still examining how to automate integration tasks. The most common barriers to improving big data integration are cost of the software or license (for 44%), lack of resources to use on improvement (37%) and the sense that big data technologies are too complicated to integrate (35%). These findings demonstrate that many organizations need to better understand the efficiency and cost savings that can be realized by using purpose-built technology instead of manual approaches using tools not designed for big data. Along with identifying solid business benefits, establishing savings of time and money is an essential piece of a convincing rationale for investment in big data integration technology. The most time spent in big data integration today is on basic tasks: reviewing data for quality and consistency (52%), preparing data for integration (46%) and connecting to data sources for integration (39%). The first two are related to ensuring that data is ready to load into big data environments. Data preparation is a key part of big data and overall information optimization. More vendors are developing dedicated technology to help with it.
For a process as complex as big data integration, choosing the right technology tool can be difficult. More than half (55%) of organizations are planning to change the way they assess and select such technology. Evaluations of big data integration tools should include considerations of how to deploy them and what sort of vendors can provide them. Almost half (46%) of organizations prefer to integrate big data on-premises while 28 percent opt for cloud-based software as a service and 17 percent have no preference. Half of organizations plan to use cloud computing for managing big data; another one-third (32%) don’t know whether they will. The research shows that the most important technology and vendor criteria used to evaluate big data integration technology are usability (very important for 53%), reliability (52%) and functionality (49%). These top three evaluation criteria are followed by manageability, TCO/ROI, adaptability and validation of vendors. Organizations are most concerned with having technology that is easy to use and can scale to meet their needs.
Big data cannot be used effectively without integration; we observe that the big data industry has not paid as much attention to information management as it should – after all, this is what enables automating the flow of data. Organizations trying to use big data without a focus on information management will have difficulty in optimizing the use of their data assets for business needs. Our research into big data integration finds that the proper technology is critical to meet these needs. We also learned from our benchmark research into big data analytics that data preparation is the largest and most time-consuming set of tasks that needs to be streamlined for best use of the analytics that reveal actionable insights. Organizations that are initiating or expanding their big data deployments, whether on-premises or within cloud computing environments, should have integration at the top of their priority list to ensure they do not create silos of data that they can’t fully exploit.
CEO and Chief Research Officer
Teradata continues to expand its information management and analytics technology for big data to meet growing demand. My analysis last year discussed Teradata’s approach to big data in the context of its distributed computing and data architecture. I recently got an update on the company’s strategy and products at the annual Teradata analyst summit. Our big data analytics research finds that a broad approach to big data is wise: Three-quarters of organizations want analytics to access data from all sources and not just one specific to big data. This inclusive approach is what Teradata has designed its architecture and technology around in managing the access, storage and use of data and analytics.
Teradata has advanced its data warehouse appliance and database technologies to unify in-memory and distributed computing with Hadoop, other databases and NoSQL in one architecture; this enables it to move to center stage of the big data market. Teradata Intelligent Memory provides optimal accessibility to data based on usage characteristics for DBAs, analysts and business users consuming data from Teradata’s Unified Data Architecture (UDA). Teradata also introduced QueryGrid technology, which virtualizes distributed access to and processing of data across many sources, including the Teradata range of appliances, Teradata Aster technology, Hadoop through its SQL-H, other databases including Oracle’s, and languages including SAS, Perl, Python and even R. Teradata can push down the processing of data and analytics so that it runs through parallel execution in its UDA, including on data from Hadoop. The Teradata QueryGrid data virtualization layer can dynamically access data and compute analytics as needed, making it versatile enough to meet a broadening scope of big data needs.
Teradata has embraced Hadoop through a strategic relationship with Hortonworks. Its commercial distribution, Teradata Open Distribution for Hadoop (TDH) 2.1, originates from Hortonworks. Teradata recently announced Teradata Portfolio for Hadoop 2, which has many components. There is also a new Teradata Appliance for Hadoop; this is its fourth-generation machine and includes software already integrated and configured with the hardware and services. Teradata has embraced and integrated Hadoop into its UDA to ensure it is a unified part of its product portfolio; that is essential because Hadoop is still maturing and is not yet ready to operate in a fully managed and scalable environment.
Teradata has enhanced its existing portfolio of workload-specific appliances. It includes the Integrated Big Data Platform 1700, which handles up to 234 petabytes, the Integrated Data Warehouses 2750 for up to 21 petabytes for scalable data warehousing and the 6750 for balanced active data warehousing. Each appliance is configured for enterprise-class needs, works in a multisystem environment and supports balancing and shifting of workloads with high availability and disaster recovery. They are available in a variety of ratios including disks, arrays and nodes, which makes them uniquely focused for enterprise use. The appliances run version 15 of the Teradata database with Teradata Intelligent Memory and interoperate through integrated workload management. In a virtual data warehouse the appliances can provide maximum compute power, capacity and concurrent user potential for heavy work such as connecting to Hadoop and Teradata Aster. UDA enables distributed management and operations of workload-specific platforms to use data assets efficiently. Teradata Unity now is more robust in moving and loading data, and Ecosystem Manager now supports monitoring of Aster and Hadoop systems across the entire range of data managed by Teradata.
Teradata is entering the market for legacy SAP applications with Teradata Analytics for SAP, which provides integration and data models across lines of business to use logical data from SAP applications more efficiently. Teradata acquired this product from a small company last year; it uses an approach common among data integration technologies today and can make data readily available through new access points to SAP HANA. The product can help organizations that have not committed to SAP and its technology roadmap, which proposes using SAP HANA to streamline processing of data and analytics from business applications such as CRM and ERP. For others that are moving to SAP, Teradata Analytics for SAP can provide interim support for existing SAP applications.
Teradata continues expansion of its Aster Discovery Platform to process analytics for discovery and exploration and also advances visualization and interactivity with analytics, which could encroach on partners that provide advanced analytics capabilities like discovery and exploration. Organizations looking for analytic discovery tools should consider this technology overlap. Teradata provides a broad and integrated big data platform and architecture with advanced resource management to process data and analytics efficiently. In addition it provides archiving, auditing and compliance support for enterprises. It can support a range of data refining tasks including fast data landing and staging, lower workload concurrency, and multistructured and file-based data.
Teradata’s efforts also extend to what I call big data or data warehouse as a service, which the company calls Teradata Cloud. It can operate across and be accessed from a multitenant environment in which Teradata makes its portfolio of Teradata, Aster and Hadoop available in what it calls cloud compute units. This can be used in a variety of cloud computing approaches including public, private and hybrid, and for backup and discovery needs. It has gained brand-name customers such as BevMo and Netflix, which have served as public references for Teradata Cloud. Using this cloud computing approach eliminates the need to place Teradata appliances in the data center while still providing maximum value from the technology. Teradata’s advancements in cloud computing come at an opportune time: our information optimization research finds that a quarter of organizations now prefer a cloud computing approach, and eight percent prefer that it be hosted by a supplier in a specific private cloud approach.
What makes Teradata’s direction unique is moving beyond its own appliances to embrace the enterprise architecture and existing data sources; this makes it more inclusive in access than other big data approaches like those from Hadoop providers and in-memory approaches that focus more on themselves than their customers’ actual needs. Data architectures have become more complex with Hadoop, in-memory, NoSQL and appliances all in the mix. Teradata has gathered this broad range of database technology into a unified approach while integrating its products directly with those of other vendors. This inclusive approach is timely as organizations are changing how they make information available, and our information optimization benchmark research finds improving operational efficiency (for 67%) and gaining a competitive advantage (63%) to be the top two reasons for doing that. Teradata’s approach to big data helps broaden data architectures, which will help organizations in the long run. If you have not considered Teradata and its UDA and new QueryGrid technologies for your enterprise architecture, I recommend looking at them.
CEO & Chief Research Officer