'Query to retrieve rows where a value has changed

trying to get back into some very rusty SQL skills and already over my head! I have a table (example below) where each week I upload rows that indicate the progression of sales opportunities. The opportunities have indexed IDs, while some of the dimensions, like "Stage" and "Total Cost", may change week over week. Each file load also has a dateLoaded column in Date format.

dateLoaded OppNumber Stage Total Cost
2022-04-25 12345 04 60.00
2022-04-25 23456 01 500.00
2022-04-25 34567 02 225.00
2022-04-25 45678 04 1750.00
2022-04-25 56789 06 50.00
2022-05-01 12345 04 100.00
2022-05-01 23456 01 500.00
2022-05-01 34567 02 275.00
2022-05-01 45678 04 2000.00
2022-05-01 56789 06 50.00
2022-05-07 12345 04 125.00
2022-05-07 23456 02 500.00
2022-05-07 34567 04 275.00
2022-05-07 56789 04 55.00

(Note that OppNumber 45678 has been removed in the 2022-05-07 file)

The query I'm trying to write would look at the 2 most recent file loads where "Total Cost" changes, and returns BOTH the original value and the new value. Using the table example from above, it would return (sorted by OppNumber, then dateLoaded ASC):

dateLoaded OppNumber Stage Total Cost
2022-05-01 12345 04 100.00
2022-05-07 12345 04 125.00
2022-05-01 56789 06 50.00
2022-05-07 56789 06 55.00

Ideally, I'd also love to see where a record existed 2 loads ago and not in the most recent (in this example, it would return the record for 45678 from the 2022-05-01 file).

I end up having to dump this data into Excel and creating formulas, but I am sure there are ways to write a query for this.

THANK YOU in advance for your suggestions!



Solution 1:[1]

If you have mysql 8 or above you can use window functions

example 1

where the first cte picks those where there is an oppnumber in the most recent load and dense_ranks by date and lags to get previous price after that it's pretty simple

WITH CTE AS 
(
SELECT * ,DENSE_RANK() OVER (ORDER BY DATELOADED) DR,
         LAG(TOTAL_COST) OVER (PARTITION BY OPPNUMBER ORDER BY DATELOADED) PREV
FROM T
WHERE EXISTS (SELECT OPPNUMBER FROM T T1 WHERE T1.OPPNUMBER = T.OPPNUMBER AND T1.DATELOADED = (sELECT MAX(DATELOADED) FROM T) )
ORDER BY OPPNUMBER,DATELOADED DESC
),
CTE1 AS
(SELECT MAX(DR) MAXDR FROM CTE
) 
SELECT CTE.*
FROM CTE 
JOIN CTE1 ON (CTE.DR = CTE1.MAXDR)
WHERE TOTAL_COST <> PREV
UNION ALL
SELECT CTE.*
FROM CTE 
JOIN CTE1 ON (CTE.DR = CTE1.MAXDR - 1)
JOIN (SELECT CTE.*
FROM CTE 
JOIN CTE1 ON (CTE.DR = CTE1.MAXDR)
WHERE TOTAL_COST <> PREV) S ON S.OPPNUMBER = CTE.OPPNUMBER
ORDER BY OPPNUMBER, DATELOADED ASC;

+------------+-----------+-------+------------+----+--------+
| dateLoaded | OppNumber | Stage | Total_Cost | DR | PREV   |
+------------+-----------+-------+------------+----+--------+
| 2022-05-01 |     12345 |     4 |     100.00 |  2 |  60.00 |
| 2022-05-07 |     12345 |     4 |     125.00 |  3 | 100.00 |
| 2022-05-01 |     56789 |     6 |      50.00 |  2 |  50.00 |
| 2022-05-07 |     56789 |     4 |      55.00 |  3 |  50.00 |
+------------+-----------+-------+------------+----+--------+
4 rows in set (0.028 sec)

Example 2

Similar to example 1 except specifically interested in most recent change rather than most recent load

WITH CTE AS 
(
SELECT * ,row_number() OVER (PARTITION BY OPPNUMBER ORDER BY DATELOADED) rn,
         LAG(TOTAL_COST) OVER (PARTITION BY OPPNUMBER ORDER BY DATELOADED) PREV
FROM T
ORDER BY OPPNUMBER,DATELOADED DESC
),
cte1 as
(
Select oppnumber opn, max(rn) maxrn from cte where total_cost <> prev group by oppnumber
)
select * from cte join cte1 on cte1.opn = cte.oppnumber and cte1.maxrn = cte.rn
union all
select * from cte join cte1 on cte1.opn = cte.oppnumber and cte1.maxrn - 1 = cte.rn
order by oppnumber , dateloaded desc

+------------+-----------+-------+------------+----+---------+-------+-------+
| dateLoaded | OppNumber | Stage | Total_Cost | rn | PREV    | opn   | maxrn |
+------------+-----------+-------+------------+----+---------+-------+-------+
| 2022-05-07 |     12345 |     4 |     125.00 |  3 |  100.00 | 12345 |     3 |
| 2022-05-01 |     12345 |     4 |     100.00 |  2 |   60.00 | 12345 |     3 |
| 2022-05-01 |     34567 |     2 |     275.00 |  2 |  225.00 | 34567 |     2 |
| 2022-04-25 |     34567 |     2 |     225.00 |  1 |    NULL | 34567 |     2 |
| 2022-05-01 |     45678 |     4 |    2000.00 |  2 | 1750.00 | 45678 |     2 |
| 2022-04-25 |     45678 |     4 |    1750.00 |  1 |    NULL | 45678 |     2 |
| 2022-05-07 |     56789 |     4 |      55.00 |  3 |   50.00 | 56789 |     3 |
| 2022-05-01 |     56789 |     6 |      50.00 |  2 |   50.00 | 56789 |     3 |
+------------+-----------+-------+------------+----+---------+-------+-------+
8 rows in set (0.004 sec)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 P.Salmon