'SUM Values by Sequence Number and Group By Flag
We have a list with a sequence number. The sequence will break, then begin again.
As you can see below, the SalesOrderLine
is missing the number 4.
SalesOrder SalesOrderLine MStockCode MPrice MBomFlag
000000000182181 1 901337 0.00000 P
000000000182181 2 901335 2476.90000 C
000000000182181 3 340151 0.00000 C
000000000182181 5 900894 0.00000 P
000000000182181 6 400379 0.00000 C
000000000182181 7 900570 600.90000 C
What I'm looking to do is summarize the MPrice
field by a consecutive number sequence, then use the MBomFlag
field to pick the "parent".
This would be the expected result of the above data. Any grouping will be done on the MBomFlag
field if the value = P
SalesOrder SalesOrderLine MStockCode MPrice MBomFlag
000000000182181 1 901337 2476.90000 P
000000000182181 5 900894 600.90000 P
What would be the best way to attack this? I'm trying to figure out something using RANK()
, ROW_NUMBER()
, LEAD
, and LAG
but not having much success
Here is the source data:
CREATE TABLE #SalesOrder (
SalesOrder NVARCHAR(20),
SalesOrderLine INT,
MStockCode INT,
MPrice DECIMAL(18,2),
MBomFlag VARCHAR(1))
INSERT INTO #SalesOrder (SalesOrder, SalesOrderLine, MStockCode, MPrice, MBomFlag)
SELECT '000000000182181', 1, '901337', 0.00000, 'P'
UNION
SELECT '000000000182181', 2, '901335', 2476.90000, 'C'
UNION
SELECT '000000000182181', 3, '340151', 0.00000, 'C'
UNION
SELECT '000000000182181', 5, '900894', 0.00000, 'P'
UNION
SELECT '000000000182181', 6, '400379', 0.00000, 'C'
UNION
SELECT '000000000182181', 7, '900570', 2600.90000, 'C'
SELECT *
FROM #SalesOrder
DROP TABLE #SalesOrder
Solution 1:[1]
This is a classic gaps-and-islands problem.
However, in this case the start of each island is clearly delineated by a P
(or a row that is not C
). So we don't need LAG
for that.
We just need to assign a grouping ID for each island, which we can do using a windowed conditional COUNT
. Then we simply group by that ID.
SELECT
pv.SalesOrder,
SalesOrderLine = MIN(CASE WHEN pv.MBomFlag <> 'C' THEN pv.SalesOrderLine END),
MStockCode = MIN(CASE WHEN pv.MBomFlag <> 'C' THEN pv.MStockCode END),
MPrice = MIN(CASE WHEN pv.MBomFlag <> 'C' THEN pv.MPrice END),
MBomFlag = MIN(CASE WHEN pv.MBomFlag <> 'C' THEN pv.MBomFlag END)
FROM (
SELECT *,
GroupingId = COUNT(NULLIF(t.MBomFlag, 'C')) OVER (PARTITION BY t.SalesOrder ORDER BY t.SalesOrderLine ROWS UNBOUNDED PRECEDING)
FROM @tbl t
) pv
GROUP BY
pv.SalesOrder,
pv.GroupingId;
Note that NULLIF(t.MBomFlag, 'C')
returns null if the flag is C
, so COUNT
will only count the other rows. You could also write that explicitly using COUNT(CASE WHEN t.MBomFlag = 'P' THEN 1 END)
Solution 2:[2]
You can use lag()
to detect when there is a jump in the sequence of SalesOrderLine, and keep a running total of the number of times there is a jump. Rows with the same number of running total jumps belong to the same group.
with u as
(select *, lag(SalesOrderLine) over (order by SalesOrderLine) as
previousSOL
from #SalesOrder),
v as
(select *, sum(case when SalesOrderLine = PreviousSOL+ 1 then 0 else 1
end)
over (order by SalesOrderLine rows unbounded preceding) as jumps
from u
)
select min(case when MBomFlag = 'P' then SalesOrder end) as SalesOrder,
min(case when MBomFlag = 'P' then SalesOrderLine end) as SalesOrderLine,
min(case when MBomFlag = 'P' then MStockCode end) as MStockCode,
sum(MPrice) as Mprice,
'P' as MBomFlag
from v
group by jumps;
Solution 3:[3]
Please try the following solution.
SQL
-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT IDENTITY(1,1) PRIMARY KEY,
SalesOrder NVARCHAR(20),
SalesOrderLine INT,
MStockCode INT,
MPrice DECIMAL(18,2),
MBomFlag VARCHAR(1))
INSERT INTO @tbl (SalesOrder, SalesOrderLine, MStockCode, MPrice, MBomFlag) VALUES
('000000000182181', 1, '901337', 0.00000, 'P'),
('000000000182181', 2, '901335', 2476.90000, 'C'),
('000000000182181', 3, '340151', 0.00000, 'C'),
('000000000182181', 5, '900894', 0.00000, 'P'),
('000000000182181', 6, '400379', 0.00000, 'C'),
('000000000182181', 7, '900570', 600.90000, 'C');
-- DDL and sample data population, end
WITH rs AS
(
SELECT series.*,
ns = SalesOrderLine - id
FROM @tbl AS series
), cte AS
(
SELECT *
, FIRST_VALUE(MStockCode) OVER(PARTITION BY ns ORDER BY id) AS _MStockCode
, FIRST_VALUE(MBomFlag) OVER(PARTITION BY ns ORDER BY id) AS _MBomFlag
FROM rs
)
SELECT MIN(cte.SalesOrder) AS SalesOrder
, MIN(cte.SalesOrderLine) AS SalesOrderLine
, MIN(cte._MStockCode) AS MStockCode
, SUM(cte.MPrice) AS MPrice
, MIN(cte._MBomFlag) AS MBomFlag
FROM cte
GROUP BY ns;
Output
+-----------------+----------------+------------+---------+----------+
| SalesOrder | SalesOrderLine | MStockCode | MPrice | MBomFlag |
+-----------------+----------------+------------+---------+----------+
| 000000000182181 | 1 | 901337 | 2476.90 | P |
| 000000000182181 | 5 | 900894 | 600.90 | P |
+-----------------+----------------+------------+---------+----------+
Solution 4:[4]
I assume this happens on a per SalesOrder
basis. This is standard gaps and islands. There's no need for recursive queries or temp tables:
with data as (
select *, SalesOrderLine - row_number()
over (partition by SalesOrder order by SalesOrderLine) as grp
from #SalesOrder
)
select
SalesOrderLine,
min(SalesOrderLine) as SalesOrderLine,
min(case when MBomFlag = 'P' then MStockCode end) as MStockCode,
min(case when MBomFlag = 'P' then MPrice end) as MPrice,
/* doesn't assume there will be a parent -- also retains column datatype */
min(case when MBomFlag = 'P' then MBomFlag end) as MBomFlag
from data
group by SalesOrder, grp
Solution 5:[5]
The data is ordered by SalesOrderLine. You create a new group each time when the difference between the current SalesOrderLine value and the next one is not 1; in each group, find the record where MbomFlag is P and change its Mprice value to the sum of Mprice values in the current group. To implement this in SQL, you need to first create a marker column using the window function, group rows by this marker column, and get the final result using case statement. A common alternative is to fetch the original data out of the database and process it in Python or SPL. SPL, the open-source Java package, is easy to be integrated into a Java program and generates much simpler code. It handles the computation with only three lines of code: img
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Charlieface |
Solution 2 | |
Solution 3 | Yitzhak Khabinsky |
Solution 4 | |
Solution 5 |