A "fact" is a numeric value that a business wishes to count or sum.
A "dimension" is essentially an entry point for getting at the facts. Dimensions are things of interest to the business.
A set of level properties that describe a specific aspect of a business, used for analyzing the factual measures.
What
is Fact Table?
A
Fact Table in a dimensional model consists of one or more numeric facts of
importance to a business. Examples of facts are as follows:
·
the
number of products sold
·
the
value of products sold
·
the
number of products produced
the number of service
calls received
What
is Factless Fact Table?
Factless
fact table captures the many-to-many relationships between dimensions, but
contains no numeric or textual facts. They are often used to record events or
coverage information.
Common
examples of factless fact tables include:
- Identifying
product promotion events (to determine promoted products that didn't sell)
- Tracking student
attendance or registration events
- Tracking
insurance-related accident events
Types
of facts?
- Additive: Additive facts
are facts that can be summed up through all of the dimensions in the fact
table.
- Semi-Additive: Semi-additive
facts are facts that can be summed up for some of the dimensions in the
fact table, but not the others.
- Non-Additive: Non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact table.
- Cumulative: This type of
fact table describes what has happened over a period of time. For example,
this fact table may describe the total sales by product by store by day.
The facts for this type of fact tables are mostly additive facts. The
first example presented here is a cumulative fact table.
- Snapshot: This type of
fact table describes the state of things in a particular instance of time,
and usually includes more semi-additive and non-additive facts. The second
example presented here is a snapshot fact table.
Fact Table Example:
Time ID | Product ID | Customer ID | Unit Sold |
1 | 17 | 2 | 1 |
3 | 21 | 3 | 2 |
1 | 4 | 1 | 1 |
Dimension tables contain details
about each instance of an object. For example, the items dimension table would
contain a record for each item sold in the store. It might include information
such as the cost of the item, the supplier, color, sizes, and similar data.
Types
of Dimensions?
- SCD(Slowly Changing Dimension)
- Conformed Dimension
- Junk Dimension/Dirty Dimension
- De-Generated Dimension
- Bridge Dimension
1. What
is Conformed Dimension?
Conformed
Dimensions (CD): these dimensions are something that is built once in your
model and can be reused multiple times with different fact tables. For example, consider a model containing
multiple fact tables, representing different data marts. Now look for a dimension that is common to
these facts tables. In this example
let’s consider that the product dimension is common and hence can be reused by
creating short cuts and joining the different fact tables.Some of the examples
are time dimension, customer dimensions, product dimension.
2.
What
is Junk Dimension?
A
"junk" dimension is a collection of random transactional codes, flags
and/or text attributes that are unrelated to any particular dimension. The junk
dimension is simply a structure that provides a convenient place to store the
junk attributes. A good example would be a trade fact in a company that brokers
equity trades.
When
you consolidate lots of small dimensions and instead of having 100s of small
dimensions, that will have few records in them, cluttering your database with
these mini ‘identifier’ tables, all records from all these small dimension
tables are loaded into ONE dimension table and we call this dimension table
Junk dimension table. (Since we are
storing all the junk in this one table) For example: a company might have
handful of manufacture plants, handful of order types, and so on, so forth, and
we can consolidate them in one dimension table called junked dimension table
It’s
a dimension table which is used to keep junk attributes
3.
What
is De Generated Dimension?
An
item that is in the fact table but is stripped off of its description, because
the description belongs in dimension table, is referred to as Degenerated
Dimension. Since it looks like
dimension, but is really in fact table and has been degenerated of its
description, hence is called degenerated dimension..
Degenerated
Dimension:
a dimension which is located in fact table known as Degenerated dimension
4.
What
is slowly Changing Dimension?
Slowly changing dimensions refers to the
change in dimensional attributes over time.
An
example of slowly changing dimension is a Resource dimension where attributes
of a particular employee change over
time like their designation changes or dept changes etc.
Types of SCD
Implementation:
Type 1 Slowly Changing Dimension
In Type 1 Slowly Changing Dimension, the new information
simply overwrites the original information. In other words, no history is kept.
In our example, recall we originally have the following
table:
Customer Key
|
Name
|
State
|
1001
|
Christina
|
Illinois
|
After Christina moved from Illinois to California, the
new information replaces the new record, and we have the following table:
Customer Key
|
Name
|
State
|
1001
|
Christina
|
California
|
Advantages:
- This is the easiest way to handle the Slowly Changing
Dimension problem, since there is no need to keep track of the old information.
Disadvantages:
-
All history is lost. By applying this methodology, it is
not possible to trace back in history. For example, in this case, the company
would not be able to know that Christina lived in Illinois before.
-
Usage:
About 50% of the time.
When to use Type 1:
Type 1 slowly changing dimension should be used when it
is not necessary for the data warehouse to keep track of historical changes.
Type 2 Slowly Changing Dimension
In Type 2 Slowly Changing Dimension, a new record is
added to the table to represent the new information. Therefore, both the
original and the new record will be present. The newe record gets its own
primary key.
In our example, recall we originally have the following
table:
Customer Key
|
Name
|
State
|
1001
|
Christina
|
Illinois
|
After Christina moved from Illinois to California, we add
the new information as a new row into the table:
Customer Key
|
Name
|
State
|
1001
|
Christina
|
Illinois
|
1005
|
Christina
|
California
|
Advantages:
- This allows us to accurately keep all historical
information.
Disadvantages:
- This will cause the size of the table to grow fast. In
cases where the number of rows for the table is very high to start with,
storage and performance can become a concern.
- This necessarily complicates the ETL process.
Usage:
About 50% of the time.
When to use Type 2:
Type 2 slowly changing dimension should be used when it
is necessary for the data warehouse to track historical changes.
Type 3 Slowly Changing Dimension
In Type 3 Slowly Changing Dimension, there will be two columns to indicate the particular attribute of interest, one indicating the original value, and one indicating the current value. There will also be a column that indicates when the current value becomes active.
In Type 3 Slowly Changing Dimension, there will be two columns to indicate the particular attribute of interest, one indicating the original value, and one indicating the current value. There will also be a column that indicates when the current value becomes active.
In our example, recall we originally have the following
table:
Customer Key
|
Name
|
State
|
1001
|
Christina
|
Illinois
|
To accommodate Type 3 Slowly Changing Dimension, we will
now have the following columns:
- Customer Key
- Name
- Original State
- Current State
- Effective Date
After Christina moved from Illinois to California, the
original information gets updated, and we have the following table (assuming
the effective date of change is January 15, 2003):
Customer Key
|
Name
|
Original State
|
Current State
|
Effective Date
|
1001
|
Christina
|
Illinois
|
California
|
15-JAN-2003
|
Advantages:
- This does not increase the size of the table, since new
information is updated.
- This allows us to keep some part of history.
Disadvantages:
- Type 3 will not be able to keep all history where an
attribute is changed more than once. For example, if Christina later moves to
Texas on December 15, 2003, the California information will be lost.
Usage:
Type 3 is rarely used in actual practice.
When to use Type 3:
Type III slowly changing dimension should only be used
when it is necessary for the data warehouse to track historical changes, and
when such changes will only occur for a finite number of time.
Dimension Table Examples:
Customer Dimension
Customer ID
|
Name
|
Gender
|
Income
|
Education
|
Region
|
1 | Brian Edge | M | 2 | 3 | 4 |
2 | Fred Smith | M | 3 | 5 | 1 |
3 | Sally Jones | F | 1 | 7 | 3 |
Date Dimension
Time ID
|
DateKey
|
Date_UK
|
Date_USA
|
DayofMonth
|
DayName
|
1 | 20130101 | 01/01/2013 | 01/01/2013 | 1 | Tuesday |
2 | 20130102 | 02/01/2013 | 02/01/2013 | 2 | Wednesday |
3 | 20130103 | 03/01/2013 | 03/01/2013 | 3 | Thursday |
Product Dimension
Product ID
|
Product Name
|
Product Business Key
|
Batch ID
|
Category
|
Group
|
1 | Aero Milk | AC-3AA | 1 | Dairy | Dairy |
2 | Bikky Rice | BK-B34 | 2 | Food | Cereals |
3 | Bikky Bics | BZ-CG5 | 2 | Biscuits | Cookies |
No comments:
Post a Comment
Note: Only a member of this blog may post a comment.