'sqlite: select display few rows around matching rows
I have a table of the form
select rowid,* from t;
rowid f
----- -----
1 aaa
2 bbb
3 ccc
4 ddd
5 eee
6 fff
7 ggg
8 aaa
9 bbb
10 ccc
11 ddd
12 eee
13 fff
14 ggg
Id like to select n row before and m row after a given row match, i.e for instance for rows that match f='ccc' with n=m=1 I'd like to get
2 bbb
3 ccc
4 ddd
9 bbb
10 ccc
11 ddd
The rowid is sequential in my setup so I guess we can play with it. I tried thing along the line of
select rowid,f from t where rowid between
(select rowid-1 from t where f='ccc') and
(select rowid+1 from t where f='ccc');
rowid f
----- -----
2 bbb
3 ccc
4 ddd
But the result is obviously wrong I got only the 1st occurence of the 'ccc' match. I guess I got to join or may be recursive cte, but I am affraid it is beyound my knowlegde so far :) Thanx in advance.
Solution 1:[1]
A scalar subquery can return only a single value.
You could do two self joins, but it would be simpler to use set operations:
SELECT * FROM t
WHERE rowid IN (SELECT rowid - 1 FROM t WHERE f = 'ccc'
UNION ALL
SELECT rowid FROM t WHERE f = 'ccc'
UNION ALL
SELECT rowid + 1 FROM t WHERE f = 'ccc');
Larger values of n
and m
require more subqueries.
If there are too many, you can use a join:
SELECT *
FROM t
WHERE rowid IN (SELECT t.rowid
FROM t
JOIN (SELECT rowid - ? AS n,
rowid + ? AS m
FROM t
WHERE f = 'ccc'
) AS ranges
ON t.rowid BETWEEN ranges.n AND ranges.m);
Solution 2:[2]
I come with a solution that is not optimal I think but I am not able to simplify (remove) the temp (intermediate) table.
select rowid,f from t;
rowid f
----- -----
1 aaa
2 bbb
3 ccc
4 ddd
5 eee
6 fff
7 ggg
8 aaa
9 bbb
10 ccc
11 ddd
12 eee
13 fff
14 ggg
create table u as
select t2.rowid x,t1.rowid+2 y from t t1 // +2 ==> 2 rows after 'ccc'
join t t2 on t1.rowid=t2.rowid+1 // +1 ==> 1 row before 'ccc'
where t1.f='ccc';
select * from u;
x y
----- -----
2 5
9 12
select t.rowid,t.f from t inner
join u on t.rowid>=u.x and t.rowid<=u.y'
rowid f
----- -----
2 bbb 1 before
3 ccc <== match
4 ddd 2 after
5 eee
9 bbb 1 before
10 ccc <== match
11 ddd 2 after
12 eee
I think I am set with what I need, but optimisations are welome :)
Solution 3:[3]
I might be overlooking something, but the provided solutions that suggest adding/subtracting from values of the rowid
column could be improved upon. They will face issues should rowid
ever be missing a value (Which I'm aware was stated to never be the case in the top post, but in general is an assumption that's often not true).
By using sqlite's row_number()
you can have a solution that circumvents that problem and also can be used to fetch the entries "around" your row matches based on any arbitrary order you want, not just based on rowid.
Together with Common Table Expressions this can even be made somewhat readable, though should you have a larger amount of row-matches this will still be a slow query.
What you'll conceptually be doing is:
- Do a pre-select on your table (here
cte_t
) to get all possible values that could be a valid hit and attach a row-number to each entry - Do a select on that pre-select to fetch the specific row that you actually want and get only its row-number (here
targetRows
) - "join" the two by pretty much just multiplying the two tables generated in 1) and 2).
Now you can easily select for all entries whose row-number is in a specific range around the target's row-number using ABS
WITH
cte_t AS (
SELECT *, row_number() OVER (ORDER BY t.rowid) AS rownum
FROM t
-- If you can make this cte smaller by removing all entries that can't possibly be the solution with the appropriate WHERE clause, you can make the entire query substantially faster
),
targetRows AS(
SELECT rownum AS targetRowNum
FROM cte_t
WHERE f = 'ccc' -- This should be the WHERE condition that defines the entries that match your query exactly and for which you want to get the entries around them
)
SELECT cte_t.rowid, cte_t.f
FROM cte_t, targetRows -- This is basically multiplying both tables with one another, this part will be horribly slow if targetRows gets larger
WHERE ABS(cte_t.rownum - targetRows.targetRowNum) <= 1; --Get all entries in targetRows as well as those whose rownum is 1 larger or 1 lower than the rownum of a targetRow
This will return
rowid f
2 bbb
3 ccc
4 ddd
9 bbb
10 ccc
11 ddd
Here a good resource about this.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | Phi |
Solution 3 | Philipp Doerner |