'What should happen to a SCD 2 table record if the record becomes inactive?
Let's say we have a table that has these values:
id| name| country| start_dt| end_dt| is_current| current_dt
1001| John| CA| 2001-01-10| 2012-06-01| TRUE| 2012-06-01
Next day, the same record comes from upstream and the SCD2 record modifies to:
id| name| country| start_dt| end_dt| is_current| partition_dt
1001| John| CA| 2001-01-10| 2012-06-01| FALSE| 2012-06-02
1001| John| US| 2012-06-01| 2016-06-02| TRUE| 2012-06-02
But on the third day, this record stops coming from upstream table.
In this case, what should happen in the SCD2 table?
Scenario 1: The record remains as it is.
id| name| country| start_dt| end_dt| is_current| partition_dt
1001| John| CA| 2001-01-10| 2012-06-01| FALSE| 2012-06-03
1001| John| US| 2012-06-01| 9999-12-31| TRUE| 2012-06-03
Scenario 2: The record becomes inactive in the SCD2 table
id| name| country| start_dt| end_dt| is_current| partition_dt
1001| John| CA| 2001-01-10| 2012-06-01| FALSE| 2012-06-03
1001| John| US| 2012-06-01| 2012-06-03| FALSE| 2012-06-03
Solution 1:[1]
You leave the last version as IS_CURRENT=TRUE.
The last-known-good version of an SCD row is sometimes used instead of the time-sensitive version. So you can write
select d.Name, sum(f.someval) totalVal
from somefact f
join somedim d
on f.dimid = d.id
and d.is_current = 'TRUE'
for joins to the dimension that treat it as a non-SCD.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | David Browne - Microsoft |