
The 3 Waves of Data Warehousing

February 7, 2011

The data warehousing market is fascinating to me. My past experience in real-time change-data-capture technology has exposed me to some remarkable use cases and real-world technologies.

The following is based on my experience working with (what I call) appliance vendors like Teradata and Netezza (now IBM), as well as more OLTP-style systems like Oracle, SQL Server and DB2.

The Fundamental Flaw:
Businesses are event based and event driven. Business is built on interactions, or transactions, between people, processes and systems. Every transaction may matter; some matter more than others, and some may not matter at all. Regardless, there are always transactions, or events, that do matter, and when harnessed effectively they give organizations the ability to sense and respond to changing business dynamics and requirements more effectively than their competition.

What’s required is what’s missing: the ability to capture and store the events that matter in a way that meets the business need for immediacy whilst minimizing business and IT risk. Businesses are looking for an environment capable of acting as a detailed transactional system for operational reporting, as well as the source for master information and strategic reporting activities.

A warehouse that can be all things to all business users

The detailed information about business events is more critical than ever before, especially given Master Data Management initiatives, operational intelligence requirements and the expanded use of data warehouses.

Companies with a dynamic warehouse that contains detailed transactional information fed in real time will have a competitive advantage. These organizations will have a repository of transactional information for operational intelligence reporting, as well as a complete source of information that easily supports changes to MDM definitions and report requirements.

First Wave
Data warehouses with highly normalized data schemas replace reporting against production environments and manual tabulation from hand-written documents (including spreadsheets). Data warehouses were created to aid reporting on the data contained within production systems, but they carry at least one fundamental, critical flaw from a purely technical perspective: it’s impossible to decompose aggregates. In the past, processing power and disk space were extremely expensive. For IT to meet the business requirements around reporting (speed and completeness), data was heavily transformed as it was loaded from production environments – aggregates were created at load time to shrink the time and effort required to generate reports.

A critical problem with this is the inability to decompose an aggregate. If the answer is “42”, what was the question? Maybe it’s the sum 10 + 30 + 2, or maybe it’s 84 / 2. And if the answer to the question “what is the average time to ship replacement inventory?” is “42 hours”, how can you track the ebbs and flows of business process execution? How can you correlate events across business process execution? You need transactional detail from every operation of every system.
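The point can be made concrete with a tiny sketch (the order numbers and shipping times below are made up for illustration): once only the average is stored, the detail behind it is gone for good, whereas retained detail can answer questions nobody thought to ask at load time.

```python
from statistics import mean

# Hypothetical detail rows: (order_id, hours_to_ship)
detail = [("A1", 10), ("A2", 30), ("A3", 86)]

# First-wave style: load only the pre-computed aggregate.
avg_ship_hours = mean(h for _, h in detail)  # 42.0

# The aggregate alone cannot be decomposed: many different
# detail sets produce exactly the same average.
assert mean([42, 42, 42]) == avg_ship_hours

# With the detail retained, new questions can be answered later:
worst_case = max(h for _, h in detail)              # 86
late_orders = [oid for oid, h in detail if h > 48]  # ["A3"]
```

The “42” by itself can never tell you that one order took 86 hours; the detail can.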

Second Wave
Data warehouses contain the transactional detail created by the execution of business processes over time. The schema format is fundamentally different from the “traditional” star or snowflake forms found in First Wave data warehouses: the warehouse records timestamps for the execution of every operation on each source system, along with metadata identifying the source system. By maintaining this metadata and transactional detail, it’s possible to recreate a production system at any point in the past (of potential value in heavily regulated companies) and to report on the status and temporal characteristics of business process execution.
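A minimal sketch of that idea, assuming a hypothetical change log of captured operations (the table layout and records here are invented for illustration, not a specific product’s schema): because every operation carries a timestamp and source metadata, replaying the log up to any moment reconstructs the source system as it stood at that moment.

```python
from datetime import datetime

# Hypothetical second-wave change log: every captured operation,
# with its timestamp and source-system metadata.
log = [
    (datetime(2011, 1, 1),  "crm", "insert", "cust1", {"tier": "silver"}),
    (datetime(2011, 1, 15), "crm", "update", "cust1", {"tier": "gold"}),
    (datetime(2011, 2, 1),  "crm", "delete", "cust1", None),
]

def as_of(ts):
    """Replay the log (kept in time order) to rebuild state at time ts."""
    state = {}
    for when, source, op, key, value in log:
        if when > ts:
            break
        if op == "delete":
            state.pop(key, None)
        else:  # insert or update both set the current value
            state[key] = value
    return state

# Mid-January: the customer exists and has already been upgraded.
as_of(datetime(2011, 1, 20))   # {"cust1": {"tier": "gold"}}
```

The `source` field isn’t used here, but retaining it is what lets a real warehouse filter or correlate the replay per source system.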

Businesses gain a better ability to understand events, and IT can more easily adapt to changing business requirements for report generation.

Summarizations may still be generated to speed reporting, but never without the associated detailed data showing how any aggregate can be decomposed.

Having the detailed information in a single repository allows greater value to be derived from MDM projects. As the definition of a “gold master” (e.g. the customer definition) changes over time, the past can always be reconstructed because the detailed information still exists.

The detailed data also provides value to quality processes. Over time, as the business changes its reporting requirements and master data definitions, quality processes will necessarily change too. Quality is no more static than the business itself. Detailed historical data allows companies to produce more value and understanding.

Operational empowerment becomes pervasive across organizations. Event propagation means more than just flowing data transactions between systems, people and processes. Companies will combine the historical context available in the data warehouse with event awareness to gain a further competitive advantage. The first fruits will probably be in customer service empowerment, such as proactively alerting pre-paid cellphone customers that they’re running low on available minutes. The critical component is the use of the data warehouse as a place to gain context on business events, enabling effective and, as much as possible, automated reaction.

Third Wave
Predictive computational analysis through holistic system awareness expands data warehouse usage into systems management, not just business reporting.

Detailed transactional data is stored in conjunction with systems operations information such as transaction volumes, system response times, network traffic, etc. Correlating metadata about transactional operations (inserts, updates or deletes, plus transaction duration) with systems information (such as the number of active users and bandwidth usage) can be used to predict system resource requirements.
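As a minimal sketch of what such a prediction could look like, the snippet below fits a straight line (ordinary least squares) to invented hourly samples of transaction volume versus response time, then projects forward to flag a potential SLA breach. The numbers, the 400 ms SLA threshold and the single-feature model are all assumptions for illustration; a real third-wave system would correlate many more signals.

```python
# Hypothetical hourly samples: (transactions_per_hour, avg_response_ms)
samples = [(1000, 120.0), (2000, 180.0), (3000, 260.0), (4000, 340.0)]

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b with a single feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

txns = [t for t, _ in samples]
resp = [r for _, r in samples]
a, b = fit_line(txns, resp)

# Project response time at a forecast 5000 transactions/hour and
# flag it against an assumed 400 ms SLA threshold.
predicted_ms = a * 5000 + b
sla_breach = predicted_ms > 400.0
```

Here the fitted trend predicts roughly 410 ms at 5000 transactions/hour, so capacity would need to be added before the load arrives rather than after the SLA is broken.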

IT now uses the data warehouse to help serve business requirements better and to ensure it meets internal and external SLAs.

The technology industry today is between the first and second waves.

Companies I’ve worked with understand the value of operational data in warehouse appliances, but are struggling to come to terms with the non-OLTP nature of existing appliance solutions and the challenges of loading data quickly and efficiently enough for effective reporting.

Teradata, Netezza and others in the warehouse appliance space are starting to understand the limitations of their technologies in supporting these changing requirements – limitations seemingly not faced by OLTP database vendors like Oracle, Microsoft and IBM (DB2).