Introduction to Data Warehousing

Data Warehousing Purpose The purpose of this paper is to define data warehousing to the beginning applied IT student, by discussing the theoretical definition of a data warehouse and by giving some practical examples of the technologies and tools available. What is a Data Warehouse? According to Ralph Kimball, author of The Data Warehouse Toolkit, a data warehouse is a copy of transactional data structured specifically for analysis, querying, and reporting. That does not mean that any database that you can query or write reports against is a data warehouse. Just because you can analyze, query, and report on the Northwind database that came with Microsoft Access does not make it a data warehouse. A data warehouse, in practical terms, is not only a large database that has its data structured using a special design (also known as a schema), but it also includes all of the tools used to create, update and access it. The things that make up the special design of the data warehouse include cubes, facts, dimensions, measures, and aggregations. Cubes are special data structures designed using either a Star schema or a Snowflake schema (see page 5 for more info on these schemas). ... A measure of a fact is some numerical data, such as sales, number of customers, or costs. ... The combination of this specially structured data, with special server software (in the case of Microsoft, their data warehouse software is made up of two components, the relational database engine, SQL Server 2000, and their analysis server, aptly called SQL Server 2000 Analysis Server), optimized server hardware, and some kind or kinds of client tools (like Microsoft Excel or the new Data Analyzer, also from Microsoft) for the end user to retrieve the data for the analysis, querying, and reporting, all make up the reality of a practical, real world data warehouse. Ralph Kimball’s definition of a data warehouse leaves out the other components of the data warehouse, which makes it a theoretical definition of a data warehouse. ... Kimball is basically saying that the data warehouse is just the data. The reality is that a real world data warehouse requires all of the above mentioned elements. Why a Data Warehouse? The goal of having a data warehouse is to allow the end user to be able to easily query for data, without putting an excessive load on the OLTP (online transaction processing) system. A good data warehouse will allow the end user to ‘slice and dice’ the data any way they so desire, without complicated tools or queries, and without constantly having to make requests from the IS department, requesting report after report (which is what a lot of companies do). A good data warehouse will keep these queries separate from the OLTP system, so as to not weigh it down with reporting requests, which is important, because the main focus of the OLTP must be in handling day to day transactions. ... Also, many of the reports the users require aren’t even known to the users until they become knowledgeable with the new system and recognize that the data may be available, which is usually long after the system has been deployed. ... A data warehouse addresses this problem by providing special querying and reporting tools (like Microsoft Excel’s Pivot Table and Data Analyzer), which allow the end user to run their own ad hoc queries and reports, as well as performing advanced data analysis such as charting and data mining. And since the data warehouse will generally be on a separate server and have its own data store, there will be no adverse effect of all these user’s queries to the OLTP system. Another goal of the data warehouse is to provide a centralized place for this reporting to occur. There may be many OLTP systems collecting data, and they usually all have some reporting facilities. What happens when the end user knows what data they want, but doesn’t know which of the OLTP systems to get it from? A data warehouse addresses this issue by providing a single central point querying of enterprise data. OLAP versus OLTP Understanding that the Northwind sample Access database is not a data warehouse, one asks, where can I find a data? ... The three main operations that occur are adding, editing, and deleting data. Reading or selecting data also occurs, but this is different in that selecting data does not cause an update to occur. ... The underlying database structure is usually normalized, meaning that data is not stored redundantly, so it is only updated and stored in one place. ... When you need to retrieve an order and the customer name that the order is for, the OLTP system joins the data from its different spots within the system and returns it to the end user. ... Conversely, an OLAP (Online Analytical Processing) system stores data redundantly, specifically so that it will not have to spend time joining data from many different spots within the database. By using more space, an OLAP system performs much faster at returning data, and thus is optimized for data retrieval (versus data update). Designing a data warehouse is very different from designing an OLTP system. In contrast to an OLTP system, in which the purpose is to organize large amounts of data changes and additions, the purpose of a data warehouse is to organize large amounts of mainly static data for ease of querying and reporting. Because of these differing purposes, there are many considerations in data warehouse design that differ from OLTP database design.

Essay Information


Words: 4554
Pages: 18.2
Rating: None

All Papers Are For Research And Reference Purposes Only. You must cite our web site as your source.