Warehouse Mechanism - Star Schema And Extended Star Schema

Star Schema: 

In data warehousing and business intelligence (BI), a star schema is the simplest form of a dimensional model, in which data is organized into facts and dimensions. A fact is an event that is counted or measured, such as a sale or login. A dimension contains reference information about the fact, such as date, product, or customer. A star schema is diagrammed by surrounding each fact with its associated dimensions. The resulting diagram resembles a star.

Star sctrial are optimized for querying large data sets and are used in data warehouses and data marts to support OLAP cubes, business intelligence and analytic applications, and ad-hoc queries. 

Within the data warehouse or data mart, a dimension table is associated with a fact table by using a foreign key relationship. The dimension table has a single primary key that uniquely identifies each member record (row). The fact table contains the primary key of each associated dimension table as a foreign key. Combined, these foreign keys form a multi-part composite primary key that uniquely identifies each member record in the fact table. The fact table also contains one or more numeric measures. 

For example, a simple Sales fact with millions of individual clothing sale records might contain a Product Key, Promotion Key, Customer Key, and Date Key, along with Units Sold and Revenue measures. The Product dimension would hold reference information such as product name, description, size, and color. The Promotion dimension would hold information such as promotion name and price. The Customer dimension would hold information such as first and last name, birth date, gender, address, etc. The Date dimension would include calendar date, week of year, month, quarter, year, etc. This simple Sales fact will easily support queries such as “total revenue for all clothing products sold during the first quarter of the 2020” or “count of female customers who purchased 5 or more dresses in December 2019”. 

The star schema supports rapid aggregations (such as count, sum, and average) of many fact records, and these aggregations can be easily filtered and grouped (“sliced & diced”) by the dimensions. A star schema may be partially normalized (snowflaked), with related information stored in multiple related dimension tables, to support specific data warehousing needs. 

Online analytical processing (OLAP) databases (data warehouses and data marts) use a denormalized star schema, with different but related information stored in one dimension table, to optimize queries against large data sets. A star schema may be partially normalized, with related information stored in multiple related dimension tables, to support specific data warehousing needs. In contrast, an online transaction processing (OLTP) database uses a normalized schema, with different but related information stored in separate, related tables to ensure transaction integrity and optimize processing of individual transactions.


Extended Star Schema

In the star schema design, fact table sits in the middle and is connected to other surrounding objects (dimension lookup tables) like a star. Each dimension is represented as a single table. The primary key in each dimension table is related to a foreign key in the fact table. Ok. Now what is Extended Star Schema? Let us take a sample transaction:



In the above figure, you can notice a cluster of tables. Let me explain what is actually happening with these tables.


We mentioned Sales, Customer, Product and Color as the master data.SID tables are the tables which store the Master Data in BW. So as soon as system encounters Sales person S1, Customer C1, Product P1 and Color Black which are not present in the system. BW System creates SID’s for them. In the above figure you can see SID’s created highlighted in red box.


In the above figure you can see how the Dimension Id’s (DID’s) are created. For every combination of SID’s we get a unique DID. 
We have 2 cases here: 

1) Keeping the Sales and Customer as separate dimensions, we can see how the Dimension ID’s are created. 

2) In the other case, where Product and Color are kept under the same dimension as the relationship between the product and Color is one-one. We can see that Dimension ID is created for the unique combination of both the SID’s.


We can see now how the Fact table is constructed with the help of Dimension and Surrogate ID tables. In this way the facts are analyzed with the help of dimensions. 

Now let us have a look on what happens if we have another transaction entry. 

Example:


In the above transaction, we find all the entries are same except color i.e. “RED”. Now let us see how BW captures the data into the database.

Now you can see that new color “RED” has created a new entry in SID table as 2 and therefore created a new entry in dimension table “2”.

And thus the fact table points to those values and transaction is stored This is how an Extended Star Schema works. 

Now let us focus on some interesting points of Extended Star Schema: 

1) By keeping the master data outside of Dimension tables, we are able to share the master data across the system there by reducing the need of redundancy of master data. 

2) We can have a maximum of 16 Dimension tables around Fact table. 

But with advent of HANA, do you think this design is relevant? Do we really need to maintain data? So is this most popular design is not going to be used anymore? 

With HANA we implement in-memory technique. Which means the system stores all the data “IN- MEMORY” like RAM storing all the data? When asks any question (reporting) to the system, then the system would take help of the views concept in HANA and gather the require data for answering the question.

Comments

Post a Comment