'Query to retrieve rows where a value has changed
trying to get back into some very rusty SQL skills and already over my head! I have a table (example below) where each week I upload rows that indicate the progression of sales opportunities. The opportunities have indexed IDs, while some of the dimensions, like "Stage" and "Total Cost", may change week over week. Each file load also has a dateLoaded column in Date format.
dateLoaded | OppNumber | Stage | Total Cost |
---|---|---|---|
2022-04-25 | 12345 | 04 | 60.00 |
2022-04-25 | 23456 | 01 | 500.00 |
2022-04-25 | 34567 | 02 | 225.00 |
2022-04-25 | 45678 | 04 | 1750.00 |
2022-04-25 | 56789 | 06 | 50.00 |
2022-05-01 | 12345 | 04 | 100.00 |
2022-05-01 | 23456 | 01 | 500.00 |
2022-05-01 | 34567 | 02 | 275.00 |
2022-05-01 | 45678 | 04 | 2000.00 |
2022-05-01 | 56789 | 06 | 50.00 |
2022-05-07 | 12345 | 04 | 125.00 |
2022-05-07 | 23456 | 02 | 500.00 |
2022-05-07 | 34567 | 04 | 275.00 |
2022-05-07 | 56789 | 04 | 55.00 |
(Note that OppNumber 45678 has been removed in the 2022-05-07 file)
The query I'm trying to write would look at the 2 most recent file loads where "Total Cost" changes, and returns BOTH the original value and the new value. Using the table example from above, it would return (sorted by OppNumber, then dateLoaded ASC):
dateLoaded | OppNumber | Stage | Total Cost |
---|---|---|---|
2022-05-01 | 12345 | 04 | 100.00 |
2022-05-07 | 12345 | 04 | 125.00 |
2022-05-01 | 56789 | 06 | 50.00 |
2022-05-07 | 56789 | 06 | 55.00 |
Ideally, I'd also love to see where a record existed 2 loads ago and not in the most recent (in this example, it would return the record for 45678 from the 2022-05-01 file).
I end up having to dump this data into Excel and creating formulas, but I am sure there are ways to write a query for this.
THANK YOU in advance for your suggestions!
Solution 1:[1]
If you have mysql 8 or above you can use window functions
example 1
where the first cte picks those where there is an oppnumber in the most recent load and dense_ranks by date and lags to get previous price after that it's pretty simple
WITH CTE AS
(
SELECT * ,DENSE_RANK() OVER (ORDER BY DATELOADED) DR,
LAG(TOTAL_COST) OVER (PARTITION BY OPPNUMBER ORDER BY DATELOADED) PREV
FROM T
WHERE EXISTS (SELECT OPPNUMBER FROM T T1 WHERE T1.OPPNUMBER = T.OPPNUMBER AND T1.DATELOADED = (sELECT MAX(DATELOADED) FROM T) )
ORDER BY OPPNUMBER,DATELOADED DESC
),
CTE1 AS
(SELECT MAX(DR) MAXDR FROM CTE
)
SELECT CTE.*
FROM CTE
JOIN CTE1 ON (CTE.DR = CTE1.MAXDR)
WHERE TOTAL_COST <> PREV
UNION ALL
SELECT CTE.*
FROM CTE
JOIN CTE1 ON (CTE.DR = CTE1.MAXDR - 1)
JOIN (SELECT CTE.*
FROM CTE
JOIN CTE1 ON (CTE.DR = CTE1.MAXDR)
WHERE TOTAL_COST <> PREV) S ON S.OPPNUMBER = CTE.OPPNUMBER
ORDER BY OPPNUMBER, DATELOADED ASC;
+------------+-----------+-------+------------+----+--------+
| dateLoaded | OppNumber | Stage | Total_Cost | DR | PREV |
+------------+-----------+-------+------------+----+--------+
| 2022-05-01 | 12345 | 4 | 100.00 | 2 | 60.00 |
| 2022-05-07 | 12345 | 4 | 125.00 | 3 | 100.00 |
| 2022-05-01 | 56789 | 6 | 50.00 | 2 | 50.00 |
| 2022-05-07 | 56789 | 4 | 55.00 | 3 | 50.00 |
+------------+-----------+-------+------------+----+--------+
4 rows in set (0.028 sec)
Example 2
Similar to example 1 except specifically interested in most recent change rather than most recent load
WITH CTE AS
(
SELECT * ,row_number() OVER (PARTITION BY OPPNUMBER ORDER BY DATELOADED) rn,
LAG(TOTAL_COST) OVER (PARTITION BY OPPNUMBER ORDER BY DATELOADED) PREV
FROM T
ORDER BY OPPNUMBER,DATELOADED DESC
),
cte1 as
(
Select oppnumber opn, max(rn) maxrn from cte where total_cost <> prev group by oppnumber
)
select * from cte join cte1 on cte1.opn = cte.oppnumber and cte1.maxrn = cte.rn
union all
select * from cte join cte1 on cte1.opn = cte.oppnumber and cte1.maxrn - 1 = cte.rn
order by oppnumber , dateloaded desc
+------------+-----------+-------+------------+----+---------+-------+-------+
| dateLoaded | OppNumber | Stage | Total_Cost | rn | PREV | opn | maxrn |
+------------+-----------+-------+------------+----+---------+-------+-------+
| 2022-05-07 | 12345 | 4 | 125.00 | 3 | 100.00 | 12345 | 3 |
| 2022-05-01 | 12345 | 4 | 100.00 | 2 | 60.00 | 12345 | 3 |
| 2022-05-01 | 34567 | 2 | 275.00 | 2 | 225.00 | 34567 | 2 |
| 2022-04-25 | 34567 | 2 | 225.00 | 1 | NULL | 34567 | 2 |
| 2022-05-01 | 45678 | 4 | 2000.00 | 2 | 1750.00 | 45678 | 2 |
| 2022-04-25 | 45678 | 4 | 1750.00 | 1 | NULL | 45678 | 2 |
| 2022-05-07 | 56789 | 4 | 55.00 | 3 | 50.00 | 56789 | 3 |
| 2022-05-01 | 56789 | 6 | 50.00 | 2 | 50.00 | 56789 | 3 |
+------------+-----------+-------+------------+----+---------+-------+-------+
8 rows in set (0.004 sec)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | P.Salmon |