Teradata recently gave me a technology update and a peek into the future of its portfolio for big data, information management and business analytics at its annual technology influencer summit. The company continues to innovate, building on its Teradata 14 releases and its new processing technology. Since my last analysis of Teradata's big data strategy, it has embraced technologies like Hadoop with its Teradata Aster Appliance, which won our 2012 Technology Innovation Award in Big Data. Teradata is steadily extending beyond providing just big data technology to offer a range of analytic options and appliances through advances in Teradata Aster and its overall data and analytic architectures. One example is its data warehouse appliance business, which according to our benchmark research is one of the key technological approaches to big data. Teradata has also advanced support, through its own technology offerings, for in-memory databases, specialized databases and Hadoop in one integrated architecture. It takes an enterprise management approach to these technologies through Teradata Viewpoint, which helps monitor and manage systems and supports a more distributed computing architecture.
By expanding its platform to include workload-based appliances that can support terabytes to petabytes of data, its Unified Data Architecture (UDA) can meet a broad class of enterprise needs. That can help support a range of big data analytic needs, as my colleague Tony Cosentino has pointed out, by providing a common approach to getting data from Hadoop into Teradata Aster and then into Teradata’s analytics. This UDA can begin to address challenges in data activities and tasks in the analytic process, which our research finds are issues for 42 percent of organizations. Teradata Aster Big Analytics Appliance is for organizations that are serious about retaining and analyzing more data, which 29 percent of organizations in our research cited as the top benefit of big data technology. This appliance can handle up to 5 petabytes and is tightly integrated with Aster and Hadoop technology from Hortonworks, a company that is rapidly expanding its footprint, as I have already assessed.
The packaged approach of an appliance can help organizations address what our technology innovation research identified as the largest challenges in big data: not enough skilled resources (for 56% of organizations) and being hard to build and maintain (52%). These can be overcome if an organization designs a big data strategy that applies a common set of skills, and the Teradata technology portfolio can help with that.
At the influencer summit, I was surprised that Teradata did not go into the role of data integration processes and the steps to profile, cleanse, master, synchronize and even migrate data (which its closest partner, Informatica, emphasizes) but focused more on access to and movement of data through its own connectors, Unity Data Mover, Smart Loader for Hadoop and support of SQL-H. In most of its deployments a range of complementary data integration technology from its partners is used alongside Teradata's own. For SQL-H, Teradata takes advantage of the HCatalog metadata to improve access to data in HDFS. I like how Teradata Studio 14 helps simplify the view and use of data in Hadoop, Teradata Aster and even spreadsheets and flat files for building sandbox and test environments for big data. (To learn more, look into the Teradata Developer Exchange.) Teradata has made it easy to add connectors for access to Hadoop on its Exchange, which is a great way to get the latest advances in its utilities and add-ons.
Teradata provided an early peek at the just-announced Teradata Intelligent Memory, a significant step in adapting big data architectures to the next generation of memory management. This advancement can cache and pool data that is in high demand across any number of Teradata workload-specific platforms by classifying data by importance (described as hot, warm or cold) for fast and efficient access and application of analytics. It can then utilize both solid-state and conventional disk storage to ensure the fastest access and computation of the data for a range of needs. This is a unique and powerful way to support an extended memory space for big data and to intelligently adapt to the data patterns of user organizations; its algorithms can interoperate across Teradata's family of appliances.
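To make the idea of data temperature concrete, here is a rough sketch in Python of frequency-based hot/warm/cold classification. The thresholds, tier names and logic are my own illustrative assumptions, not Teradata's actual algorithm, which is far more sophisticated.

```python
from collections import Counter

# Illustrative thresholds (assumptions, not Teradata's actual values)
HOT_THRESHOLD = 10   # accesses before a block is considered "hot"
WARM_THRESHOLD = 3   # accesses before a block is considered "warm"

def classify(access_counts):
    """Map each data block to a storage tier by access frequency."""
    tiers = {}
    for block, count in access_counts.items():
        if count >= HOT_THRESHOLD:
            tiers[block] = "memory"   # hot: keep in RAM
        elif count >= WARM_THRESHOLD:
            tiers[block] = "ssd"      # warm: solid-state storage
        else:
            tiers[block] = "disk"     # cold: conventional disk
    return tiers

# Simulate a workload: block "a" is read often, "b" occasionally, "c" rarely.
accesses = Counter(["a"] * 12 + ["b"] * 4 + ["c"])
placement = classify(accesses)
print(placement)  # {'a': 'memory', 'b': 'ssd', 'c': 'disk'}
```

The point of such a scheme is that placement follows observed demand rather than a static schema, which is what lets the same appliance serve both heavily queried and archival data efficiently.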
Teradata has also invested further into its data and computing architecture through what it calls fabric-based computing. That can help connect nodes across systems through access on the company's Fabric Switch using its BYNET, InfiniBand and other methods. (Teradata participates in the OpenFabrics Alliance, which works to optimize access and interconnection of systems data across storage-area networks.) Fabric Switch provides an access point through which other aspects of Teradata's UDA can access and use data for various purposes, including backup and restore or data movement. These advances will significantly increase the throughput and combined reliability of systems and enhance performance and scalability at both the user and data levels.
Tony Cosentino pointed out the various types of analytics that Teradata can support; one of them is analytics for discovery through its recently launched Teradata Aster Discovery Platform. This directly addresses two of the four types of discovery I have just outlined: data and visual discovery. Teradata Aster has a powerful library of analytics such as path, text, statistical and cluster analysis as core elements of its platform. Its nPath analytic expression has significant potential in enabling Aster to process distributed sets of data from Teradata and Hadoop in one platform. Analytic architectures should apply the same computational analytics across systems, from core database technology to Teradata Aster to the analytics tools that an analyst is actually using. Aster's approach to visual and data discovery is challenging in that it requires a high level of expertise in SQL to make customizations; the majority of analysts who could use this technology don't have that level of knowledge. But here Teradata can turn to partners such as MicroStrategy and Tableau, which have built more integrated support for Teradata Aster and offer easier-to-use tools that are interactive and visual, designed for analysts who do not want to muck with SQL. Teradata has internal challenges in improving support for analysts and the analytic processes they are responsible for; its IT-focused, data-centric approach will not help here. Our big data research finds that staffing and training are the top two barriers to using this technology, according to more than 77 percent of organizations; vendors should note this and reduce the custom and manual work that requires specific SQL and data skills in their products.
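For readers unfamiliar with path analysis, the following is a minimal sketch of the kind of funnel matching an expression like nPath performs over ordered event data. The sessions and funnel are invented, and this simplified consecutive-subsequence match is a stand-in for, not a reproduction of, Aster's actual SQL-MR pattern syntax and semantics.

```python
def matches_path(events, pattern):
    """Return True if `pattern` occurs as a run of consecutive events."""
    n, m = len(events), len(pattern)
    return any(events[i:i + m] == pattern for i in range(n - m + 1))

# Hypothetical clickstream sessions, one ordered event list per user.
sessions = {
    "user1": ["home", "search", "view", "purchase"],
    "user2": ["home", "view", "home"],
    "user3": ["search", "view", "purchase", "home"],
}

# Which users completed the search -> view -> purchase funnel?
funnel = ["search", "view", "purchase"]
converted = [u for u, ev in sessions.items() if matches_path(ev, funnel)]
print(converted)  # ['user1', 'user3']
```

The value of doing this inside the analytic platform, rather than in analyst scripts, is that the pattern runs where the event data already lives, across both Teradata and Hadoop sources.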
Regarding analytics specifically, Teradata has continued to deepen its analytics efforts with partner SAS. A new release of Teradata Appliance supports SAS High-Performance Analytics for up to 52 terabytes of data and also supports SAS Visual Analytics, which I have tried and assessed myself.
Through its Teradata Aprimo applications Teradata continues its efforts to attract marketing executives in business-to-consumer companies that require big data technology to utilize a broad range of information. Teradata has outlined a larger role for the CMO with big data and analytics capabilities that go well beyond its marketing automation software. The company announced expansion to support predictive analytics and has outlined its direction for supporting customer engagement. It needs to take steps such as these to ensure it tunes into business needs beyond what CIOs and IT are doing with Teradata as a big data environment for the enterprise.
Along these lines I have also pointed out that we should be cautious about accepting research that predicts the CMO will outspend the CIO in the future. These assertions are flawed in many facets and often come from those who have no experience in market research or in marketing's role in technology expenditure. As we have done research into both the business and IT sides, we have discovered the complexities of making practical technology investments; for example, our research into customer relationship maturity found that inbound interactions from customers occur across many departments: in marketing (in 46% of organizations), but more often through contact centers (77%), where Teradata should strengthen its efforts. On the plus side Teradata continues to demonstrate success in assisting customers in marketing, winning our 2013 Leadership Award for Marketing Excellence with its deployment at International Speedway Corp. and in 2012 at Nationwide Insurance with Teradata Aprimo. Our current research into next-generation customer engagement already identifies a need to support multichannel and multidepartment interactions. Teradata could further expand its efforts in these areas with existing customers; KPN won our 2013 Leadership Award in Customer Excellence after connecting Teradata with its Oracle-based applications and supporting BI systems.
Overall Teradata is doing a great job of focusing on its strengths in big data and areas where it can maximize the impact of its analytics, especially marketing and customer relations. While IBM, Oracle, SAP and other large technology providers in the database and analytic markets tend to minimize what Teradata has created, it has a loyal customer base that is attracted to the expanded architectures of its appliances and its broader UDA and intelligent memory systems. I think with more focus on the processes of real business analysts and further simplifying usability, Teradata's opportunity could grow significantly. In helping its customers process more of the vast volumes of data and information from the Internet, such as weather, demographic and social media data, it could make clear the broader value of big data in optimizing information from the variety of data in content and documents. It could expand its new generation of tools and applications to exploit this information, as it is beginning to do with marketing applications from Teradata Aprimo. If Teradata customers find it easier to access information and share it across lines of business through social collaboration and mobile technology, that will increase demand for its technology to operate at larger scales in both the number of users and the places where it can be accessed, even via cloud computing. Exploiting in-memory computing along with providing more discovery potential from analytics will help its customers utilize the power of big data and trust in Teradata to supply it.
CEO & Chief Research Officer
I recently attended the annual Informatica analyst summit to get the latest on that company’s strategy and plans. The data integration provider offers a portfolio of information management software that supports today’s big data and information optimization needs. Informatica is busy making changes in its presentation to the market and its marketing and sales efforts. New executives, including new CMO Marge Breya, are working to communicate what is possible with Informatica’s product portfolio, and it’s more than just data integration.
Big data and cloud computing have placed challenges on IT in its roles as both a facilitator and in providing governance and compliance with policies and regulations, including access and security. IT compliance costs are increasing, according to 53 percent of heavily regulated organizations, and even 17 percent of those subject to little or no regulation, according to our governance, risk and compliance research. CIOs should examine Informatica’s product portfolio to see how to increase efficiency in the access, governance and integration of data in IT systems for more effective business processes including those that are GRC related.
Governance over transactional, interaction and analytical systems is a complex task. Late last year I wrote about Informatica's latest efforts in big data and cloud computing; the company is now shipping its PowerCenter Big Data Edition, which facilitates integration with Hadoop. I have written about how integration with big data is broken today as organizations struggle not just with Hadoop but also with other big data technologies. Informatica provides tools to parse data so it can be profiled and processed efficiently. For example, Informatica can perform natural language processing to extract entities from text within unstructured data, which can help in a range of tasks, including IT reviews of data.
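As a toy illustration of what entity extraction for profiling looks like, here is a short Python sketch that pulls typed entities out of free text. Real NLP entity extraction, such as Informatica's, is far more sophisticated than these regular expressions; the patterns and the sample text are my own illustrative assumptions.

```python
import re

# Simple illustrative patterns (assumptions, not Informatica's parsers)
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
AMOUNT = re.compile(r"\$\d+(?:,\d{3})*(?:\.\d{2})?")

def extract_entities(text):
    """Pull simple typed entities out of free text for profiling."""
    return {
        "emails": EMAIL.findall(text),
        "amounts": AMOUNT.findall(text),
    }

note = "Refund $1,250.00 to jane.doe@example.com per ticket 4411."
print(extract_entities(note))
# {'emails': ['jane.doe@example.com'], 'amounts': ['$1,250.00']}
```

Once unstructured text is decomposed into typed entities like this, it can be profiled, quality-checked and joined to structured records like any other data.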
With its latest tools, Informatica has stepped beyond the Informatica Cloud Winter 2013 release, which started the software down the path of bringing master data management (MDM) and data governance into the cloud. The Cloud Spring 2013 release, expected in April, is about providing enterprise capabilities in the cloud. New Cloud Data Masking can help secure sensitive or confidential data; our data in the cloud research found that data security is the number one concern in 63 percent of organizations. A data loader for Salesforce makes a bulk read and write license available; I have written about how providing data plumbing is your business, as Salesforce has failed to meet customers’ needs in this area.
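To show the shape of the data masking technique, here is a hypothetical sketch of deterministic, format-preserving masking for building sandbox copies of sensitive data. This is a common approach, not Informatica's actual Cloud Data Masking algorithm; the salt, field and function names are my own assumptions.

```python
import hashlib

def mask_digits(value, salt="demo-salt"):
    """Replace each digit with one derived from a salted hash, keeping
    punctuation so downstream format validation still passes. The same
    input always masks to the same output, preserving joins."""
    digest = hashlib.sha256((salt + value).encode()).hexdigest()
    digits = [c for c in digest if c.isdigit()]
    out, i = [], 0
    for ch in value:
        if ch.isdigit():
            out.append(digits[i % len(digits)])
            i += 1
        else:
            out.append(ch)
    return "".join(out)

ssn = "123-45-6789"
masked = mask_digits(ssn)
print(masked)  # same xxx-xx-xxxx shape as the input, derived digits
```

Because the masking is deterministic, the same customer record masks identically across tables, so test environments keep referential integrity without exposing the real values.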
Informatica last month acquired Active Endpoints, whose Cloud Extend applies cloud-based workflow services to what would regularly just be state-based applications, such as Salesforce applications for SFA. Cloud Extend lets managers map out the steps that should be taken in an application and prompts users for action. This application, which is designed for line of business and analysts, can provide value for both business and centralized IT. Informatica is making it more efficient to set up and establish integration across the cloud, and its ability to subset data and support sandbox environments helps its customers reduce costs and time to get up and running.
Informatica has announced it is offering prepackaged integration with NetSuite and Workday applications that operate in the cloud in its Cloud Connector Marketplace Mall. This is a welcome step; Informatica needs to invest further to develop cloud connectors for the larger group of cloud computing applications in use today, as it has many more to address to reach critical mass or universal connectivity. The good news is that many software organizations that operate in the cloud, including MicroStrategy, Ultimate Software and Xactly, are embedding Informatica to improve their ability to be efficient with data and support customers’ needs. In its Spring release Informatica will also provide connectivity to Amazon Redshift, Oracle CRM On Demand and Microsoft Dynamics AX. The announced move to support Amazon Redshift is important as more organizations look to embrace cloud computing for their data storage and processing needs.
At the analyst summit Informatica presented its vision of the future of cloud as an IT-led activity, saying that the days when line of business owned and led cloud efforts are past. Here the company could not be more wrong: subscription to and use of cloud applications and services by business continues to grow as the need for them increases and business gets little to no support from IT. While IT might be getting engaged and starting to leverage this utility of computing, it is in no way leading or controlling what business is doing. We continue to see this in sales, marketing, customer service, operations, human resources and even finance. In the end, business is held accountable for business processes and outcomes, and I do not see any research points indicating this will change in the near future. What is needed is a more adaptive environment where analysts and business can facilitate more interactions through data requests and tasks, not just stewardship and improving the quality of the data that exists, which is only part of the bottleneck.
Informatica also provided more insight into how it uses its Virtual Data Machine, which lets Informatica products operate across platforms and environments while being insulated from their differences. I would expect to hear more from the company on where this can play a role in cloud and hosted environments as much as in on-premises environments. Ultimately this technology should be able to support more integration points and partners, as it has done with Teradata; Informatica recently announced further support for Teradata Unified Data Architecture, where it can streamline data integration from within the Informatica Virtual Data Machine to environments like Teradata.
Informatica also continues its strategic partnership with Heiler, which it is in the process of acquiring; the deal is expected to close by year end if approved by German regulatory review. Since my analysis of the announcement last fall the companies have been working to integrate MDM with product information management (PIM). Informatica has come to recognize that PIM is not MDM; they have different business and IT requirements, but together they can be a valuable combination. This simple position is not generally accepted by the majority of IT analysts, who have led many of the largest software companies into the IT approach, which our PIM benchmark research has found is wrong, and which led me to write a perspective on how PIM is for business. Heiler, which we rated as Hot in our 2012 Value Index for Product Information Management, plus Informatica, which was Hot in our 2012 Value Index for Data Integration, combined might be the next PIM powerhouse.
Informatica continues to expand its portfolio to support a range of real-time operations needs. It recently released a new version of Informatica Ultra Messaging that my colleague Robert Kugel assessed. Beyond the near-real-time features is Informatica’s capability of handling complex event processing (CEP) and what we call operational intelligence in its products. Unfortunately, with such a busy product portfolio, Informatica’s CEP and operational intelligence capabilities are rarely marketed and not very well known. Our benchmark research finds that activity or event monitoring is a top priority in 62 percent of organizations, and that is exactly what Informatica PowerCenter offers.
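For those unfamiliar with what event monitoring and CEP actually do, here is a minimal sketch of one classic pattern: raise an alert when too many failure events arrive within a sliding time window. The event names, window and threshold are illustrative assumptions, not Informatica's API or product behavior.

```python
from collections import deque

def detect_bursts(events, window_seconds=60, threshold=3):
    """events: iterable of (timestamp, name) pairs in time order.
    Returns the timestamps at which the count of 'login_failed'
    events within the trailing window reaches the threshold."""
    recent = deque()
    alerts = []
    for ts, name in events:
        if name != "login_failed":
            continue
        recent.append(ts)
        # Drop failures that have aged out of the window.
        while recent and recent[0] <= ts - window_seconds:
            recent.popleft()
        if len(recent) >= threshold:
            alerts.append(ts)
    return alerts

stream = [(0, "login_failed"), (10, "login_ok"), (20, "login_failed"),
          (30, "login_failed"), (200, "login_failed")]
print(detect_bursts(stream))  # [30]
```

Production CEP engines generalize this idea to declarative patterns over many event types at high throughput, but the core notion of correlating events within windows is the same.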
I expect to see more big steps forward for Informatica, as it has many development initiatives that are still confidential that will continue its expansion as an information-centered software provider. As technology providers such as Informatica are further pressured to demonstrate business value, we will see a further shift to what we call information optimization, which is in the end what business needs on a more timely and consistent basis, as I have outlined in our research agenda.
Informatica finds its customers moving to being stewards of business data, but they need to move further to support analysts' needs for data to perform analytics. Our latest research finds that 42 percent of organizations are still impeded by data-related tasks that prevent them from handling analytic ones. This has led to the startling reality, found in our latest research into spreadsheets, that spreadsheets are used 74 percent of the time for business intelligence tasks, despite the fact that they are responsible for a high number of errors from manual copy, paste and calculation tasks. The need to remedy data-related problems should help Informatica bridge the data divide between business and IT. Informatica continues to be bullish on its growth opportunities, and it does not have to convince me, as our research for a decade has shown the need for rationalization to improve efficiency and profitability.