'Snowflake subquery
I have two tables. Transaction(ID, TERMINALID) and Terminal(ID, TERMINALID, EXPORT_DATE). The goal is to obtain for each row from Transaction table newest recored from Terminal table. Snowflake is used as a backend.
I have this SQL query:
SELECT tr.ID,
(SELECT te.ID
FROM "Terminal" te
WHERE te.TERMINALID = tr.TERMINALID
ORDER BY te.EXPORT_DATE DESC
LIMIT 1)
FROM "Transaction" tr;
But I get this error:
SQL compilation error: Unsupported subquery type cannot be evaluated
Error disappears if I replace tr.TERMINALID with a specific value. So I can't reference parent table from nested SELECT. Why this is not possible? Query works in MySQL.
Solution 1:[1]
I'm afraid Snowflake doesn't support correlated subqueries of this kind.
You can achieve what you want by using FIRST_VALUE
to compute best per-terminalid id :
-- First compute per-terminalid best id
with sub1 as (
select
terminalid,
first_value(id) over (partition by terminalid order by d desc) id
from terminal
),
-- Now, make sure there's only one per terminalid id
sub2 as (
select
terminalid,
any_value(id) id
from sub1
group by terminalid
)
-- Now use that result
select tr.ID, sub2.id
FROM "Transaction" tr
JOIN sub2 ON tr.terminalid = sub2.terminalid
You can run subqueries first to see what they do.
We're working on making our support for subqueries better, and possibly there's a simpler rewrite, but I hope it helps.
Solution 2:[2]
SELECT
tr.ID
, (SELECT te.ID
FROM "Terminal" te
WHERE te.TERMINALID = tr.TERMINALID
ORDER BY te.EXPORT_DATE DESC
LIMIT 1
) AS the_id -- <<-- add an alias for the column
FROM "Transaction" tr
;
UPDATE:
length for type varchar cannot exceed 10485760
- just use type
varchar
(ortext
) instead
Works here (with quoted identifiers):
CREATE TABLE "Transaction" ("ID" VARCHAR(123), "TERMINALID" VARCHAR(123)) ;
CREATE TABLE "Terminal" ( "ID" VARCHAR(123), "TERMINALID" VARCHAR(123), "EXPORT_DATE" DATE);
SELECT tr."ID"
, (SELECT te."ID"
FROM "Terminal" te
WHERE te."TERMINALID" = tr."TERMINALID"
ORDER BY te."EXPORT_DATE" DESC
LIMIT 1) AS meuk
FROM "Transaction" tr
;
BONUS UPDATE: avoid the scalar subquery and use plain old NOT EXISTS(...)
to obtain the record with the most recent date:
SELECT tr."ID"
, te."ID" AS meuk
FROM "Transaction" tr
JOIN "Terminal" te ON te."TERMINALID" = tr."TERMINALID"
AND NOT EXISTS ( SELECT *
FROM "Terminal" nx
WHERE nx."TERMINALID" = te."TERMINALID"
AND nx."EXPORT_DATE" > te."EXPORT_DATE"
)
;
Solution 3:[3]
So few years later (2022), some correlated subqueries are support, but not this one:
using this data:
WITH transaction(id, terminalid) AS (
SELECT * FROM VALUES
(1,10),
(2,11),
(3,12)
), terminal(id, terminalid, export_date) AS (
SELECT * FROM VALUES
(100, 10, '2022-03-18'::date),
(101, 10, '2022-03-19'::date),
(102, 11, '2022-03-20'::date),
(103, 11, '2022-03-21'::date),
(104, 11, '2022-03-22'::date),
(105, 12, '2022-03-23'::date)
)
So compared to Marcin's we can now use a QUALIFY to select only one value per terminalid
in a single step:
WITH last_terminal as (
SELECT id,
terminalid
FROM terminal
QUALIFY row_number() over(PARTITION BY terminalid ORDER BY export_date desc) = 1
)
SELECT tr.ID,
te.id
FROM transaction AS tr
JOIN last_terminal AS te
ON te.TERMINALID = tr.TERMINALID
ORDER BY 1;
giving:
ID | ID |
---|---|
1 | 101 |
2 | 104 |
3 | 105 |
and if you have multiple terminals per day, and terimal.id is incrementing number you could use:
QUALIFY row_number() over(PARTITION BY terminalid ORDER BY export_date desc, id desc) = 1
Now if your table is not that larger, you can do the JOIN then prune via the QUALIFY, and avoid the CTE, but on large tables this is much less performant, so I would only use this form when doing ad-hoc queries, where swapping forms if viable if performance problems occur.
SELECT tr.ID,
te.id
FROM transaction AS tr
JOIN terminal AS te
ON te.TERMINALID = tr.TERMINALID
QUALIFY row_number() over(PARTITION BY tr.id ORDER BY te.export_date desc, te.id desc) = 1
ORDER BY 1;
Solution 4:[4]
This kind of subquery is currently not supported.
Working with subqueries - Limitations:
The only type of subquery that allows a LIMIT / FETCH clause is an uncorrelated scalar subquery. Also, because an uncorrelated scalar subquery returns only 1 row, the LIMIT clause has little or no practical value inside a subquery
Query in question is correlated subquery, thus the result.
SELECT tr.ID,
(SELECT te.ID
FROM "Terminal" te
WHERE te.TERMINALID = tr.TERMINALID --correlation
ORDER BY te.EXPORT_DATE DESC
LIMIT 1)
FROM "Transaction" tr;
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Marcin Zukowski |
Solution 2 | |
Solution 3 | Simeon Pilgrim |
Solution 4 | Lukasz Szozda |