SYDNEY, 4 MARCH 2011 - According to the analyst firm Gartner, the data warehouse is set to remain a critical component of IT infrastructure and, as the demand for business intelligence (BI) and business analytics increases, optimisation, flexible designs and alternative strategies will become more important.
"The data warehouse remains one of the largest if not the largest information repository in the enterprise," said research vice-president, Mark Beyer. "Only by being aware of the market trends and how emerging technology solutions will blend with proven practices can the CIO avoid budget waste through 'misdirection' by the data warehouse management and delivery team."
Gartner has identified nine major trends in the data warehousing market for 2011 through 2012:
Advanced functionality for hardware management of input/output (I/O), disk storage and CPU/memory balancing is now included almost as a matter of course in data-warehouse-capable platforms. Some new entrants are focusing on optimisation as a differentiator, and nearly every data warehouse vendor is now addressing the issue of optimising storage for the warehouse via compression and usage-based data placement strategies. Vendors are also expending great effort differentiating their products on performance claims and technology, in ways that are not necessarily significant to the use case.
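As a rough illustration of what a usage-based data placement strategy might look like, the sketch below ranks table partitions by how often they are read and assigns the busiest ones to faster storage. The partition names, access counts, tier labels and the `place_partitions` helper are all hypothetical and are not drawn from any vendor's product.

```python
def place_partitions(access_counts, fast_slots):
    """Assign the most frequently read partitions to the fast tier.

    access_counts: hypothetical mapping of partition name -> read count.
    fast_slots: how many partitions the fast tier can hold (assumed).
    """
    # Rank by descending read count; break ties by name for determinism.
    ranked = sorted(access_counts, key=lambda name: (-access_counts[name], name))
    fast = set(ranked[:fast_slots])
    return {name: ("fast" if name in fast else "slow") for name in access_counts}


# Illustrative figures only.
access_counts = {
    "sales_2011_q1": 950,
    "sales_2010_q4": 310,
    "sales_2009": 12,
    "sales_2008": 3,
}
placement = place_partitions(access_counts, fast_slots=2)
```

Real platforms typically make this decision continuously and at a finer granularity than whole partitions; the sketch only shows the ranking idea.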
Although there are many reasons why organisations consider buying an appliance, the main reason is simplicity. The vendor builds and certifies the configuration, balancing hardware, software and services for predictable performance. The appliance is delivered complete and installs rapidly. If there are any problems, a single call to the appliance vendor is the first course of action. There is a secondary effect as well, in that appliances can speed delivery by avoiding time-consuming hardware balancing.
Most organisations understand the need to perform a proof of concept (POC) with a shortlist of vendors during the selection phase of the data warehouse database management system. Gartner recommends that POCs use as much real data extracted from the operational source systems as possible and involve as many users as possible, creating a data warehouse workload that approaches that of the production environment.
The data warehouse platform delivers six workloads: bulk/batch load, basic reporting, basic online analytical processing (OLAP), real-time/continuous load, data mining and operational BI.
Warehouses delivering all six workloads need to be assessed for predictability of mixed-workload performance. Failing to plan for mixed workloads will increase administration costs over time as volume and additional workloads are added, potentially leading to major sustainability issues.
A data mart is an application-specific analytic repository of any size, normally with a specific, smaller group of users than a data warehouse. Data marts can be used to optimise the data warehouse by offloading part of the workload to the data mart, returning greater performance to the warehousing environment.
Column-store database management systems generally exhibit faster query response than traditional, row-based systems and can serve as excellent data mart platforms, and even as a main data warehouse platform. Gartner foresees several vendors changing the pricing model for the software from a more traditional per-user or per-core model to a price based on the volume of data loaded into the database.
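One reason column-store systems tend to answer analytic queries faster can be seen in a minimal sketch that compares how much data an aggregate has to read under each layout. The table, field names and values below are hypothetical, and the "fields read" count is only a stand-in for I/O; real systems add compression, vectorised execution and other techniques not shown here.

```python
# Hypothetical sales table; names and values are illustrative only.
rows = [
    {"order_id": 1, "region": "APAC", "amount": 120.0},
    {"order_id": 2, "region": "EMEA", "amount": 80.0},
    {"order_id": 3, "region": "APAC", "amount": 200.0},
]


def total_amount_row_store(table):
    """Row layout: each record is stored together, so an aggregate
    over one field still walks every complete record."""
    fields_read = 0
    total = 0.0
    for record in table:
        fields_read += len(record)  # the whole record is fetched
        total += record["amount"]
    return total, fields_read


def to_column_store(table):
    """Column layout: each field is stored as its own contiguous list."""
    return {name: [record[name] for record in table] for name in table[0]}


def total_amount_column_store(columns):
    amounts = columns["amount"]  # only this one column is fetched
    return sum(amounts), len(amounts)


columns = to_column_store(rows)
row_total, row_fields_read = total_amount_row_store(rows)
col_total, col_fields_read = total_amount_column_store(columns)
```

Both layouts return the same total, but the column layout reads one field per record instead of all three; the gap widens as tables gain more columns.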
In-memory database management technologies exhibit extremely fast query response and data commit times and introduce a higher probability that analytics and transactional systems can share the same database. Analytic data models, master data approaches and data services within a middle tier will begin to emerge as the dominant approach, forcing more traditional row-based vendors to adapt to column approaches and in-memory simultaneously. BI solutions will emerge sooner rather than later, and these will leverage in-memory database management with superior-performing products and will quickly become acquisition targets for megavendors.
In 2011, data warehouse as a service comes in two 'flavours': software as a service (SaaS) and outsourced data warehouses. Data warehouse in the cloud is primarily an infrastructure design option, as a data model must still be developed, an integration strategy must be deployed and BI user access must be enabled and managed. Private clouds are an emerging infrastructure design choice for some organisations in supporting their data warehouse and analytics.
Open source database management systems are still being used in both experimental and more formal approaches. At this point, open source warehouses are rare and usually smaller than traditional ones and also generally require a more manual level of support. However, some solutions are optimised specifically for data warehousing.