'SQL: Pull rows based on sequence of values
I need to pull rows of data based on the existence of certain values that exist in a specific sequence.
Here's an example of the data:
Header | EventId | EventDate |
---|---|---|
67891882 | 382 | 2022-01-21 09:29:50.000 |
67891882 | 81 | 2022-01-21 09:03:23.000 |
67891882 | 273 | 2022-01-21 09:03:51.000 |
67891882 | 77 | 2022-01-21 09:05:58.000 |
67891882 | 2 | 2022-01-21 09:29:48.000 |
The results I need are to capture the Header and the EventDate for EventId=81. Further criteria include:
- EventID 81 is the "start" and EventID 77 is the "end"
- Any number of other events can exist between these two with the exception of (60, 72, 73, 74, 75, 76, 83, 85, 86, 87, 103, 154, 166, 197, 199)
So in the example above, Eventid 81 with EventDate 2022-01-21 09:03:23.000
would qualify as a row I want to pull as 273 is not in the exception list.
ATTEMPT: I have tried the following query
SELECT *
FROM #Table
WHERE EventDate BETWEEN (SELECT EventDate
FROM #Table
WHERE EventId = 81)
AND (SELECT eventdate
FROM #Table
WHERE EventId = 77)
AND EventId NOT IN (60, 72, 73, 74, 75, 76, 83, 85, 86, 87, 103, 154, 166, 197, 199)
ORDER BY 3
But I was immediately confronted with the fact that my sub-queries return more than one result, so this won't work (I was using this to test a singular Header # example, which worked fine). So now I'm not quite sure how to proceed. I'd hate to think that I'd be forced to use a CURSOR
, mostly because my source data is comprised of 266 million rows.
I had also previously tried using the LAG()
function to find my "starting point", but that possibility seemed to dissipate once the request started becoming more and more complex (with the addition of the exclusion list as well as the fact that there could be 1 or 40 rows in between the 81 and 77).
How should I proceed with this? Here's some example data to play with. The Header can be thought of as a parent key, associated with any number of EventID (representing a specific action) and the EventDate with when this occurred:
create table #data (header int, eventid int, eventdate datetime)
insert into #data
values
('62252595', '22', '5/23/2021 12:34:02 PM'),
('62252595', '81', '5/23/2021 12:34:03 PM'),
('62252595', '29', '5/23/2021 12:34:12 PM'),
('62252595', '40', '5/23/2021 12:34:27 PM'),
('62252595', '22', '5/23/2021 12:35:02 PM'),
('62252595', '22', '5/23/2021 12:36:12 PM'),
('62252595', '37', '5/23/2021 12:36:36 PM'),
('62252595', '77', '5/23/2021 12:37:04 PM'),
('62252595', '6', '5/23/2021 12:37:52 PM'),
('63252595', '39', '5/23/2021 12:38:01 PM'),
('63252595', '81', '5/23/2021 12:38:04 PM'),
('63252595', '37', '5/23/2021 12:38:06 PM'),
('63252595', '21', '5/23/2021 12:38:09 PM'),
('63252595', '75', '5/23/2021 12:38:10 PM'),
('63252595', '77', '5/23/2021 12:38:12 PM'),
('64252595', '29', '5/23/2021 12:38:15 PM'),
('64252595', '26', '5/23/2021 12:38:18 PM'),
('64252595', '81', '5/23/2021 12:38:20 PM'),
('64252595', '40', '5/23/2021 12:38:21 PM'),
('64252595', '81', '5/23/2021 12:38:24 PM'),
('64252595', '83', '5/23/2021 12:39:06 PM'),
('64252595', '77', '5/23/2021 12:39:07 PM'),
('65252595', '41', '5/23/2021 12:39:12 PM'),
('65252595', '81', '5/23/2021 12:39:16 PM'),
('65252595', '37', '5/23/2021 12:39:20 PM'),
('65252595', '18', '5/23/2021 12:39:56 PM'),
('65252595', '18', '5/23/2021 12:40:03 PM'),
('65252595', '77', '5/23/2021 12:40:15 PM'),
('65252595', '36', '5/23/2021 12:40:46 PM'),
('65252595', '77', '5/23/2021 12:40:53 PM')
EXPECTED RESULTS:
From this #Data
table, the results I would expect to see would be:
Header | EventId | EventDate |
---|---|---|
62252595 | 81 | 5/23/2021 12:34:03 PM |
65252595 | 81 | 5/23/2021 12:39:16 PM |
Header #'s 63252595 and 64252595 would not qualify because between the first instance of 81 and the first instance of 77 (partition by Header order by EventDate), there exists a 75 at 5/23/2021 12:38:10 PM
and an 83 at 5/23/2021 12:39:06 PM
respectively (both of which in exclusion list). I hope this clears up some confusion.
EDIT: After some thinking, I wonder if it would be possible to simplify this using a CASE
expression. Using the example data from the #Data
table above, I wrote this query:
select *
from (
select * from (
select *, id=case when EventId = 81 then 1
when EventId = 77 then 2
when EventId in (60, 72, 73, 74, 75, 76, 83, 85, 86, 87, 103, 154, 166, 197, 199) then 5 else 0 end
from #data) a
where id <> 0)b
order by 3
What this does is filters out all of the 'allowable' events and makes it so that I can filter to only see the unencumbered events where id
=1 and then follows with a 2. What I'm not sure of as of yet is how to get it to show me only entries of id
=1 with a following 2.
Solution 1:[1]
It would be handy to see the actual expected results for the sample data so I don't actually know if this is correct, it looks like you just need to calculate date ranges per each header:
with h as (
select *,
Min(case when eventid=81 then eventdate end) over(partition by header) Sdate,
Max(case when eventid=77 then eventdate end) over(partition by header) Edate
from #data
)
select header, eventId, EventDate
from h
where eventdate between SDate and EDate
and EventId not in(60, 72, 73, 74, 75, 76, 83, 85, 86, 87, 103, 154, 166, 197, 199)
order by eventdate;
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Stu |