'Snowflake subquery

I have two tables. Transaction(ID, TERMINALID) and Terminal(ID, TERMINALID, EXPORT_DATE). The goal is to obtain for each row from Transaction table newest recored from Terminal table. Snowflake is used as a backend.

I have this SQL query:

SELECT tr.ID,
       (SELECT te.ID
        FROM "Terminal" te
        WHERE te.TERMINALID = tr.TERMINALID
        ORDER BY te.EXPORT_DATE DESC
        LIMIT 1)
FROM "Transaction" tr;

But I get this error:

SQL compilation error: Unsupported subquery type cannot be evaluated

Error disappears if I replace tr.TERMINALID with a specific value. So I can't reference parent table from nested SELECT. Why this is not possible? Query works in MySQL.



Solution 1:[1]

I'm afraid Snowflake doesn't support correlated subqueries of this kind.

You can achieve what you want by using FIRST_VALUE to compute best per-terminalid id :

-- First compute per-terminalid best id
with sub1 as (
  select 
    terminalid, 
    first_value(id) over (partition by terminalid order by d desc) id
  from terminal 
),
-- Now, make sure there's only one per terminalid id
sub2 as (
  select 
    terminalid, 
    any_value(id) id
  from sub1
  group by terminalid
)
-- Now use that result
select tr.ID, sub2.id
FROM "Transaction" tr
JOIN sub2 ON tr.terminalid = sub2.terminalid

You can run subqueries first to see what they do.

We're working on making our support for subqueries better, and possibly there's a simpler rewrite, but I hope it helps.

Solution 2:[2]

SELECT
tr.ID
  , (SELECT te.ID
     FROM "Terminal" te 
     WHERE te.TERMINALID = tr.TERMINALID
     ORDER BY te.EXPORT_DATE DESC
     LIMIT 1
    ) AS the_id -- <<-- add an alias for the column
FROM "Transaction" tr
    ;

UPDATE:

  • length for type varchar cannot exceed 10485760
  • just use type varchar (or text) instead

Works here (with quoted identifiers):

CREATE TABLE "Transaction" ("ID" VARCHAR(123), "TERMINALID"  VARCHAR(123)) ;
CREATE TABLE "Terminal" ( "ID"  VARCHAR(123), "TERMINALID"  VARCHAR(123), "EXPORT_DATE" DATE);

SELECT tr."ID"
        , (SELECT te."ID"
        FROM "Terminal" te
        WHERE te."TERMINALID" = tr."TERMINALID"
        ORDER BY te."EXPORT_DATE" DESC
        LIMIT 1) AS meuk
FROM "Transaction" tr
        ;

BONUS UPDATE: avoid the scalar subquery and use plain old NOT EXISTS(...) to obtain the record with the most recent date:

SELECT tr."ID"
        , te."ID" AS meuk
FROM "Transaction" tr
JOIN "Terminal" te ON te."TERMINALID" = tr."TERMINALID"
        AND NOT EXISTS ( SELECT *
        FROM "Terminal" nx
        WHERE nx."TERMINALID" = te."TERMINALID"
        AND nx."EXPORT_DATE" > te."EXPORT_DATE"
        )
        ;

Solution 3:[3]

So few years later (2022), some correlated subqueries are support, but not this one:

using this data:

WITH transaction(id, terminalid) AS (
    SELECT * FROM VALUES
    (1,10),
    (2,11),
    (3,12)
), terminal(id, terminalid, export_date) AS (
     SELECT * FROM VALUES
    (100, 10, '2022-03-18'::date),
    (101, 10, '2022-03-19'::date),
    (102, 11, '2022-03-20'::date),
    (103, 11, '2022-03-21'::date),
    (104, 11, '2022-03-22'::date),
    (105, 12, '2022-03-23'::date)
)

So compared to Marcin's we can now use a QUALIFY to select only one value per terminalid in a single step:

WITH last_terminal as (
    SELECT id, 
        terminalid
    FROM terminal
    QUALIFY row_number() over(PARTITION BY terminalid ORDER BY export_date desc) = 1
)
SELECT tr.ID,
      te.id
FROM transaction AS tr
JOIN last_terminal AS te 
    ON te.TERMINALID = tr.TERMINALID
ORDER BY 1;

giving:

ID ID
1 101
2 104
3 105

and if you have multiple terminals per day, and terimal.id is incrementing number you could use:

QUALIFY row_number() over(PARTITION BY terminalid ORDER BY export_date desc, id desc) = 1

Now if your table is not that larger, you can do the JOIN then prune via the QUALIFY, and avoid the CTE, but on large tables this is much less performant, so I would only use this form when doing ad-hoc queries, where swapping forms if viable if performance problems occur.

SELECT tr.ID,
      te.id
FROM transaction AS tr
JOIN terminal AS te 
    ON te.TERMINALID = tr.TERMINALID
QUALIFY row_number() over(PARTITION BY tr.id ORDER BY te.export_date desc, te.id desc) = 1    
ORDER BY 1;

Solution 4:[4]

This kind of subquery is currently not supported.

Working with subqueries - Limitations:

The only type of subquery that allows a LIMIT / FETCH clause is an uncorrelated scalar subquery. Also, because an uncorrelated scalar subquery returns only 1 row, the LIMIT clause has little or no practical value inside a subquery


Query in question is correlated subquery, thus the result.

SELECT tr.ID,
       (SELECT te.ID
        FROM "Terminal" te
        WHERE te.TERMINALID = tr.TERMINALID  --correlation
        ORDER BY te.EXPORT_DATE DESC
        LIMIT 1)
FROM "Transaction" tr;

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Marcin Zukowski
Solution 2
Solution 3 Simeon Pilgrim
Solution 4 Lukasz Szozda