[Business Intelligence] Data Warehouse (Multidimensional Data Model | Multidimensional Data Analysis)

Business Intelligence Series Article Directory

[Business Intelligence] Data Preprocessing
[Business Intelligence] Data Warehouse (Multidimensional Data Model | Multidimensional Data Analysis)


Article Directory


Preface

The previous blog [Business Intelligence] Data Preprocessing explained data preprocessing operations. This blog introduces data warehouses, the difference between data warehouses and traditional databases, multidimensional data models, etc.;





1. The difference between data warehouse and traditional database


Data warehouse characteristics:

  • Subject-oriented
  • integrated
  • Not updateable
  • Constantly changing over time

Data Warehouse Definition: A data warehouse is used to better support business / organizational decision analysis , subject-oriented, integrated, non-renewable, over time, evolving set of data;


Traditional database , also known as operational databases , such as Oracle, MySQL database and the like;


Operational dataAnalytical data
Data granularityDetailedComprehensive
Data agingAccurate storage instantlyPast historical data
Read-onlyUpdatableNot updateable
Demand knowableKnow the requirements during operationDo not know the requirements in advance during operation
Life cycleLife cycle conforms to SDLCCompletely different life cycle
Performance requirementsHigh performance requirementsLow performance requirements
Operation sizeOperate the data of a unit at the same timeOperate a set of data at the same time
Data sizeSmall amount of data in a single operationLarge amount of data in a single operation
Driving forceTransaction drivenAnalysis driven
specific purposeApplication-orientedAnalysis-oriented
Application scenarioSupport daily operationsSupport management needs
Data contentBusiness related dataDecision-related data
Data modelRelationship, hierarchyRelationship, multidimensional data structure
data accessRandom read and write operationsOnly query operations
WorkloadThe amount of transaction processing is large, and each transaction involves few recordsSmall queries, large amounts of data need to be queried each time
Transaction outputRarelyVery big
System shutdownCatastrophic system downtimeDelayed decision





Second, the data warehouse system architecture


Data warehouse system architecture:

  • data source
  • Integrated tools
  • database
  • Data warehouse server
  • OLAP server
  • Metadata
  • Metadata management tools
  • Data mart
  • Foreground analysis tools





Three, multi-dimensional data model


Multi-dimensional data model: from the perspective of business analysis, a method of logically modeling data; it is simple, easy to understand, and convenient to query;

Multidimensional data model is also called dimensional data model, which is composed of dimension table and fact table;


1. Star mode


Star schema: the middle of a table , called the fact table , surrounded by many small tables, these tables are called dimension tables;


Disadvantages of star mode:

① The star schema does not support the layer structure of dimensions;

  • Single dimension table: There is only one dimension table for each dimension, and all dimension layer attributes are placed in one table without normalization;
  • Single dimension table example: Take the dimension table corresponding to the time of the above "commodity" fact table as an example, put the year, month, day, hour, minute, second and other fields in the same dimension table, and the time dimension can become multiple Dimension tables, such as dimension tables containing only year, month, and day, and dimension tables containing only year, month, etc.;

② Data redundancy:

  • Data redundancy: Each dimension table must represent all layers, each layer has its own attributes, and there is a lot of data redundancy;
  • Data redundancy example: For each product in the above time dimension table, the complete year, month, day, hour, minute, and second data must be stored. In fact, the year, month, and other data of the product are the same for many products. Only record once, and all products record year and month information, so a lot of redundant data is generated;

③ The same query problem of attribute names in different dimension layers:

  • Different dimension layers have the same attributes and can only be queried by changing the name;
  • Examples of the same attributes in different dimension layers: For example, in the store fact table, city, province, and country, each level has a manager Manager. When querying the Manager attribute, the city manager, province manager, and country manager are all queried directly. It is not possible to query the manager information of a single level;


2. Snow flake mode


For the dimension hierarchy of complex peacekeeping

  • To avoid redundant data taking up too much space
  • In order to support the same attribute query in different dimension layers

Use multiple dimension tables to describe complex dimensions , so that branches appear on the corners of the star of the star model, similar to the shape of a snowflake, so this variant of the star model is called a "snow flake model";


Advantages and disadvantages of the snow flake model:

  • Snowflake model advantage: dimension tables snowflake model is normalized dimension tables, dimension tables snowflake model is easy to maintain, saving storage space;
  • Snowflake model drawback: when snow model query needs to be more join operations, affecting system performance;


3. Fact constellations


The model than the star schema, snowflake more complex model, both models, the fact that there is only one table, but the fact that the constellation model, there are multiple fact tables, two fact tables, dimension tables may be a number of common;





Fourth, online analysis and processing


Online analytical processing: a class of software technology, analysts , managers take advantage of this technology, from multiple perspectives , through fast, consistent, interactive access to data, reaching for insights;





Five, multi-dimensional data analysis operations


Multidimensional data analysis operation:

Slice: In a certain dimension of the data cube, select a dimension member;

Dicing: Select a dimension member of a certain interval in a certain dimension of the data cube ;

Rotation: change the order of data cube dimensions;

Drill: analysis process, the user needs more dimensions , or on a finer level of dimensional observation data, the former add more dimension , which is drilled to a finer level of data in the existing dimension;

Volume: analysis process, the user needs less maintenance , or one of the dimension coarser level observational data, the former to reduce a dimensional analysis after which the volume available to a higher level on the analysis of a dimension;






to sum up

This blog analyzes the difference between data warehouse and data block from various angles, briefly introduces the data warehouse architecture, the three model structures of the multidimensional data model, and the data analysis operations in the multidimensional data model;

Insert picture description here