August 14, 2012

SQL Traps -- Data Modeling Issue

 Chasm Trap :
                       This also known as Joining path issue or Data modeling issue.A chasm trap is a type of join path between three tables when two many-to-one joins converge on a single table, and there is no context in place that separates the converging join paths.

You only get incorrect results when the following circumstances all exist simultaneously:
1. There is a “many-to-one-to-many” relationship between three tables in the universe structure.
2. The query includes objects based on the two “many” tables.
3. There are multiple rows returned for an object (usually a dimension) based on the "one"
table.

Ex:

Client_Name
Sale_Date
Sale_Revenue
XYZ
4/12/2012
30000
XYZ
3/11/2012
34000



Client_Name
Rental_Date
Rental_Revenue
XYZ
8/10/2011
300
XYZ
10/10/2011
400


Query 1: What is the sale amount?

Client_Name
Sale_Revenue
XYZ
64000

Correct results

Query 2: What is the sale Rental Amount?

Client_Name
Rental_Revenue
XYZ
700

Correct Results

Query 3: What is the sale amount and Rental Amount

Client_Name
Rental_Revenue
Sale_Revenue
XYZ
1400
128000


This is a wrong result, which is called as issue due to chasm trap.

What is the reason behind wrong results?

                The first two queries return the correct data, but combining Sales Revenue and Rental Revenue in the third query returns inaccurate results. The query returns every possible combination of sale rows with every possible combinationof rental rows. Hence, the sale transactions each appear twice as do the rental transactions, and as a result of this the aggregates have been multiplied by the number of related rows on the alternative "many" table.


Client_Name
Sale_Date
Sale_Revenue
Rental_Date
Rental_Revenue
XYZ
4/12/2012
30000
8/10/2011
300
XYZ
4/12/2012
30000
10/10/2011
400
XYZ
3/11/2012
34000
8/10/2011
300
XYZ
3/11/2012
34000
10/10/2011
400
Sum:
128000
1400

 Solutions:
   
1.       Modify the SQL parameters for the universe so you can generate separate SQL queries for
Each measure. (File > Parameters > SQL tab )
This method is not recommended as it only works with measures and results in certain
Inefficiencies in processing. It does not generate separate queries for dimension or detail
Objects.
2.       Create a context for each fact table.
This solution works in all cases and does not result in inefficiencies.
 

FAN Trap :

This also known as Joining path issue or Data modeling issue. A Fan trap occurs when below conditions are satisfied
 
  Case 1:

1.       Table1 --< table 2 --< table 3 i.e. one to many and again one to many 
2.       A dimension coming from the first table and measures coming from the two subsequent tables.

Or

  Case 2:
  
1.       Table1 --< table 2 i.e. Two tables in a one-to-many relationship.
2.       A dimension and a measure coming from the first table and a measure coming from the subsequent table(s).

Ex:

CLINET
Client ID
Client Name
1
XYZ

PURCHASE_ORDER
PO_Number
Client_ID
Total_Amount
Discount
Net_Amount
1
1
4000
300
3700

ORDER_DEATILS
OrderNumber
Item_id
Amount
1
100
3000
1
200
1000

In this case report designer wants pull the report for below columns for client XYZ
    Client_Name – Clinet.Clinet_name
    Net_Amount – Order.Net_Amount
    Amount – Order_Details.Amount

If I pull the report for this, report will look like this.

Client_Name
Net_Amount
Amount
XYZ
7400
4000

If you see Net_Amount is not coming correctly, the value should come as 3700 not 7400.

To understand what is happening, you need to look at the rows that are returned. Since two
Different item ID numbers are involved for one order, there are two rows returned. The amount 3700 is aggregated twice in this case.

Client_Name
Item_id
Net_Amount
Amount
XYZ
100
3700
3000
XYZ
200
3700
1000

   Where you have a one-many-many relationship for tables in the FROM clause the resulting logical table produces a Cartesian product. Only then is aggregation applied. This is the reason for the fan effect.

Solutions:
1.       Modify the SQL parameters for the universe so you can generate separate SQL queries for
Each measure. (File > Parameters > SQL tab)
This method is not recommended as it only works with measures and results in certain
Inefficiencies in processing. It does not generate separate queries for dimension or detail
Objects.
2.        Use a combination of aliases and contexts.

3.        Avoid the fan trap scenario.

No comments:

Post a Comment