'Collapse multiple rows into a single row based upon a break condition

I have a simple sounding requirement that has had me stumped for a day or so now, so its time to seek help from the experts.

My requirement is to simply roll-up multiple rows into a single row based upon a break condition - when any of these columns change Employee ID, Allowance Plan, Allowance Amount or To Date, then the row is to be kept, if that makes sense.

An example source data set is shown below:

enter image description here

and the target data after collapsing the rows should look like this:

enter image description here

As you can see I don't need any type of running totals calculating I just need to collapse the rows into a single record per from date/to date combination.

So far I have tried the following SQL using a GROUP BY and MIN function

select [Employee ID], [Allowance Plan], 
       min([From Date]), max([To Date]), [Allowance Amount] 
from   [dbo].[#AllowInfo] 
group by [Employee ID], [Allowance Plan], [Allowance Amount]

but that just gives me a single row and does not take into account the break condition.

what do I need to do so that the records are rolled-up (correct me if that is not the right terminology) correctly taking into account the break condition?

Any help is appreciated.

Thank you.



Solution 1:[1]

Note that your test data does not really exercise the algo that well - e.g. you only have one employee, one plan. Also, as you described it, you would end up with 4 rows as there is a change of todate between 7->8, 8->9, 9->10 and 10->11.

But I can see what you are trying to do, so this should at least get you on the right track, and returns the expected 3 rows. I have taken the end of a group to be where either employee/plan/amount has changed, or where todate is not null (or where we reach the end of the data)

CREATE TABLE #data
(
    RowID INT,
    EmployeeID INT,
    AllowancePlan VARCHAR(30),
    FromDate DATE, 
    ToDate DATE,
    AllowanceAmount DECIMAL(12,2)
);

INSERT INTO #data(RowID, EmployeeID, AllowancePlan, FromDate, ToDate, AllowanceAmount)
VALUES
(1,200690,'CarAllowance','30/03/2017', NULL, 1000.0),
(2,200690,'CarAllowance','01/08/2017', NULL, 1000.0),
(6,200690,'CarAllowance','23/04/2018', NULL, 1000.0),
(7,200690,'CarAllowance','30/03/2018', NULL, 1000.0),
(8,200690,'CarAllowance','21/06/2018', '01/04/2019', 1000.0),
(9,200690,'CarAllowance','04/11/2021', NULL, 1000.0),
(10,200690,'CarAllowance','30/03/2017', '13/05/2022', 1000.0),
(11,200690,'CarAllowance','14/05/2022', NULL, 850.0);

-- find where the break points are
WITH chg AS 
(
    SELECT *, 
        CASE WHEN LAG(EmployeeID, 1, -1) OVER(ORDER BY RowID) != EmployeeID
               OR LAG(AllowancePlan, 1, 'X') OVER(ORDER BY RowID) != AllowancePlan  
               OR LAG(AllowanceAmount, 1, -1) OVER(ORDER BY RowID) != AllowanceAmount
               OR LAG(ToDate, 1) OVER(ORDER BY RowID) IS NOT NULL
    THEN 1 ELSE 0 END AS NewGroup
    FROM #data   
), 
-- count the number of break points as we go to group the related rows
grp AS
(
    SELECT chg.*,
        ISNULL(
            SUM(NewGroup) 
                OVER (ORDER BY RowID 
                      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
              0) AS grpNum
    FROM chg
) 
SELECT  MIN(grp.RowID) AS RowID, 
        MAX(grp.EmployeeID) AS EmployeeID,      
        MAX(grp.AllowancePlan) AS AllowancePlan,
        MIN(grp.FromDate) AS FromDate, 
        MAX(grp.ToDate) AS ToDate,  
        MAX(grp.AllowanceAmount) AS AllowanceAmount
FROM grp
GROUP BY grpNum

Solution 2:[2]

one way is to get all rows the last todate, and then group on that

select min(t.RowID) as RowID,
       t.EmployeeID,
       min(t.AllowancePlan) as AllowancePlan,
       min(t.FromDate) as FromDate,
       max(t.ToDate) as ToDate,
       min(t.AllowanceAmount) as AllowanceAmount
from   ( select t.RowID,
                t.EmployeeID,
                t.FromDate,
                t.AllowancePlan,
                t.AllowanceAmount,
                case when t.ToDate is null then ( select top 1 t2.ToDate 
                                                  from   test t2
                                                  where  t2.EmployeeID = t.EmployeeID
                                                  and    t2.ToDate is not null
                                                  and    t2.FromDate > t.FromDate -- t2.RowID > t.RowID
                                                  order by t2.RowID, t2.FromDate 
                                                )
                     else t.ToDate
                end as todate     
         from   test t
       ) t

group by t.EmployeeID, t.ToDate
order by t.EmployeeID, min(t.RowID)

See and test yourself in this DBFiddle

the result is

RowID EmployeeID AllowancePlan FromDate ToDate AllowanceAmount
1 200690 CarAllowance 2017-03-30 2019-04-01 1000
9 200690 CarAllowance 2021-11-04 2022-05-13 1000
11 200690 CarAllowance 2022-05-14 (null) 850

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 James Casey
Solution 2