'LEFT JOIN with an OR in the ON clause BigQuery Standard SQL
I need some help understanding joins in bigquery standard sql. I want to do a left join keeping all the columns in table1, and joining to table2 if 2 fields match OR a different 2 fields match. This should be better explained in my example.
Table1:
id1 id2 column1
1 a first
2 b second
3 c third
4 d fourth
5 e fifth
6 f sixth
Table 2:
id3 id4 column2
1 5674 alpha
2 4535 bravo
345 a charlie
341 b delta
I want to keep all the rows in table1 and all the rows from table2 if id1 = id3 OR id2 = id4, the resulting table would look like this:
Result:
id1 id2 column1 id3 id4 column2
1 a first 1 5674 alpha
1 a first 345 a charlie
2 b second 2 4535 bravo
2 b second 341 b delta
3 c third
4 d fourth
5 e fifth
6 f sixth
Although I can't get this result as it seems I can't do a left join with an OR statement in the ON clause.
I have tried this query:
SELECT * FROM table1
JOIN table2
on (table1.id1 = table2.id3) or (table1.id2 = table2.id4)
Which is an inner join and results in:
id1 id2 column1 id3 id4 column2
1 a first 1 5674 alpha
1 a first 345 a charlie
2 b second 2 4535 bravo
2 b second 341 b delta
Which is almost there but does not include the other rows from table1
Trying this query:
SELECT * FROM table1
LEFT JOIN table2
on (table1.id1 = table2.id3) or (table1.id2 = table2.id4)
Results in the error:
LEFT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join.
I understand that there may be a workaround which I would love to hear, but also it would be good to understand why the same method for the inner join is not applicable to the left join.
Thanks in advance, Ryan
EDIT I am still struggling to get my head around why this is not possible, Mikhail's workaround is fine quite heavy on resources and I'd like to understand what is preventing me from using the conditional left join.
Solution 1:[1]
Below is for BigQuery Standard SQL
one of the options:
#standardSQL
SELECT DISTINCT * FROM (
SELECT * FROM `project.dataset.table1`
LEFT JOIN `project.dataset.table2`
ON id1 = id3
UNION ALL
SELECT * FROM `project.dataset.table1`
LEFT JOIN `project.dataset.table2`
ON id2 = id4
)
Solution 2:[2]
ORACLE SQL:
SELECT * FROM table1
LEFT JOIN table2
on (table1.id1 = table2.id3) or (table1.id2 = table2.id4)
The above ORACLE SQL can be re-written in BIGQUERY. As both queries will fetch the same results.
BIGQUERY SQL:
#standardSQL
SELECT * FROM (
SELECT * FROM `project.dataset.table1`
LEFT JOIN `project.dataset.table2`
ON id1 = id3
UNION ALL
SELECT * FROM `project.dataset.table1`
LEFT JOIN `project.dataset.table2`
ON id2 = id4
)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Thomas |
Solution 2 |