'SUM Values by Sequence Number and Group By Flag

We have a list with a sequence number. The sequence will break, then begin again.

As you can see below, the SalesOrderLine is missing the number 4.

SalesOrder      SalesOrderLine  MStockCode  MPrice      MBomFlag
000000000182181 1               901337      0.00000     P
000000000182181 2               901335      2476.90000  C
000000000182181 3               340151      0.00000     C
000000000182181 5               900894      0.00000     P
000000000182181 6               400379      0.00000     C
000000000182181 7               900570      600.90000   C

What I'm looking to do is summarize the MPrice field by a consecutive number sequence, then use the MBomFlag field to pick the "parent".

This would be the expected result of the above data. Any grouping will be done on the MBomFlag field if the value = P

SalesOrder      SalesOrderLine  MStockCode  MPrice      MBomFlag
000000000182181 1               901337      2476.90000  P
000000000182181 5               900894      600.90000   P

What would be the best way to attack this? I'm trying to figure out something using RANK(), ROW_NUMBER(), LEAD, and LAG but not having much success

Here is the source data:

CREATE TABLE #SalesOrder (
SalesOrder NVARCHAR(20),
SalesOrderLine INT,
MStockCode INT,
MPrice DECIMAL(18,2),
MBomFlag VARCHAR(1))

INSERT INTO #SalesOrder (SalesOrder, SalesOrderLine, MStockCode, MPrice, MBomFlag)
SELECT '000000000182181',   1,  '901337',   0.00000, 'P'
UNION
SELECT '000000000182181',   2,  '901335', 2476.90000, 'C'
UNION
SELECT '000000000182181',   3,  '340151',   0.00000,  'C'
UNION
SELECT '000000000182181',   5,  '900894',   0.00000,  'P'
UNION
SELECT '000000000182181',   6,  '400379',   0.00000,  'C'
UNION
SELECT '000000000182181',   7,  '900570',   2600.90000, 'C'

SELECT *
  FROM #SalesOrder

DROP TABLE #SalesOrder


Solution 1:[1]

This is a classic gaps-and-islands problem.

However, in this case the start of each island is clearly delineated by a P (or a row that is not C). So we don't need LAG for that.

We just need to assign a grouping ID for each island, which we can do using a windowed conditional COUNT. Then we simply group by that ID.

SELECT
  pv.SalesOrder,
  SalesOrderLine = MIN(CASE WHEN pv.MBomFlag <> 'C' THEN pv.SalesOrderLine END),
  MStockCode = MIN(CASE WHEN pv.MBomFlag <> 'C' THEN pv.MStockCode END),
  MPrice = MIN(CASE WHEN pv.MBomFlag <> 'C' THEN pv.MPrice END),
  MBomFlag = MIN(CASE WHEN pv.MBomFlag <> 'C' THEN pv.MBomFlag END)
FROM (
    SELECT *,
      GroupingId = COUNT(NULLIF(t.MBomFlag, 'C')) OVER (PARTITION BY t.SalesOrder ORDER BY t.SalesOrderLine ROWS UNBOUNDED PRECEDING)
    FROM @tbl t
) pv
GROUP BY
  pv.SalesOrder,
  pv.GroupingId;

Note that NULLIF(t.MBomFlag, 'C') returns null if the flag is C, so COUNT will only count the other rows. You could also write that explicitly using COUNT(CASE WHEN t.MBomFlag = 'P' THEN 1 END)

db<>fiddle

Solution 2:[2]

You can use lag() to detect when there is a jump in the sequence of SalesOrderLine, and keep a running total of the number of times there is a jump. Rows with the same number of running total jumps belong to the same group.

with u as
(select *, lag(SalesOrderLine) over (order by SalesOrderLine) as 
previousSOL
from #SalesOrder),
v as 
(select *, sum(case when SalesOrderLine = PreviousSOL+ 1 then 0 else 1 
end)
over (order by SalesOrderLine rows unbounded preceding) as jumps
from u
)
select min(case when MBomFlag = 'P' then SalesOrder end) as SalesOrder,
min(case when MBomFlag = 'P' then SalesOrderLine end) as SalesOrderLine,
min(case when MBomFlag = 'P' then MStockCode end) as MStockCode,
sum(MPrice) as Mprice,
'P' as MBomFlag
from v
group by jumps;

Fiddle

Solution 3:[3]

Please try the following solution.

SQL

-- DDL and sample data population, start
DECLARE @tbl TABLE  (ID INT IDENTITY(1,1) PRIMARY KEY,
SalesOrder NVARCHAR(20),
SalesOrderLine INT,
MStockCode INT,
MPrice DECIMAL(18,2),
MBomFlag VARCHAR(1))

INSERT INTO @tbl (SalesOrder, SalesOrderLine, MStockCode, MPrice, MBomFlag) VALUES
('000000000182181', 1, '901337',   0.00000, 'P'),
('000000000182181', 2, '901335', 2476.90000, 'C'),
('000000000182181', 3, '340151',   0.00000,  'C'),
('000000000182181', 5, '900894',   0.00000,  'P'),
('000000000182181', 6, '400379',   0.00000,  'C'),
('000000000182181', 7, '900570',   600.90000, 'C');
-- DDL and sample data population, end

WITH rs AS
(
    SELECT series.*,
           ns = SalesOrderLine - id
    FROM @tbl AS series
), cte AS
(
    SELECT * 
        , FIRST_VALUE(MStockCode) OVER(PARTITION BY ns ORDER BY id) AS _MStockCode
        , FIRST_VALUE(MBomFlag) OVER(PARTITION BY ns ORDER BY id) AS _MBomFlag
    FROM rs
)
SELECT MIN(cte.SalesOrder) AS SalesOrder
    , MIN(cte.SalesOrderLine) AS SalesOrderLine
    , MIN(cte._MStockCode) AS MStockCode
    , SUM(cte.MPrice) AS MPrice
    , MIN(cte._MBomFlag) AS MBomFlag
FROM cte
GROUP BY ns;

Output

+-----------------+----------------+------------+---------+----------+
|   SalesOrder    | SalesOrderLine | MStockCode | MPrice  | MBomFlag |
+-----------------+----------------+------------+---------+----------+
| 000000000182181 |              1 |     901337 | 2476.90 | P        |
| 000000000182181 |              5 |     900894 |  600.90 | P        |
+-----------------+----------------+------------+---------+----------+

Solution 4:[4]

I assume this happens on a per SalesOrder basis. This is standard gaps and islands. There's no need for recursive queries or temp tables:

with data as (
    select *, SalesOrderLine - row_number()
        over (partition by SalesOrder order by SalesOrderLine) as grp
    from #SalesOrder
)
select
    SalesOrderLine,
    min(SalesOrderLine) as SalesOrderLine,
    min(case when MBomFlag = 'P' then MStockCode end) as MStockCode,
    min(case when MBomFlag = 'P' then MPrice end) as MPrice,
    /* doesn't assume there will be a parent -- also retains column datatype */
    min(case when MBomFlag = 'P' then MBomFlag end) as MBomFlag
from data
group by SalesOrder, grp

Solution 5:[5]

The data is ordered by SalesOrderLine. You create a new group each time when the difference between the current SalesOrderLine value and the next one is not 1; in each group, find the record where MbomFlag is P and change its Mprice value to the sum of Mprice values in the current group. To implement this in SQL, you need to first create a marker column using the window function, group rows by this marker column, and get the final result using case statement. A common alternative is to fetch the original data out of the database and process it in Python or SPL. SPL, the open-source Java package, is easy to be integrated into a Java program and generates much simpler code. It handles the computation with only three lines of code: img

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Charlieface
Solution 2
Solution 3 Yitzhak Khabinsky
Solution 4
Solution 5