'SQL: Use RegEx to return multiple substrings of a specified string in column
I am querying metadata on Snowflake which contains a column of queries:
select query_name
from metadata
query_name |
---|
SELECT * FROM SYSIBM.SQLCOLUMNS |
SELECT * FROM SYSIBM.SYSDUMMY1 |
SELECT CCID, CCCODE FROM V820.ZCC ZCC JOIN V820.ZCI ZCI ON ZCC.CCID = ZCI.CCID WHERE trim(CCCODE) NOT IN ('89090','89089','89087') |
SELECT * FROM V820.IIM IIM LEFT JOIN V820.IIME AS IIME ON IIM.IPROD = IIME.IMPROD LEFT JOIN (SELECT CCCODE, CCDESC FROM V820.ZCC) ZCC ON IIM.CCCODE = ZCC.CCCODE LEFT JOIN (SELECT ICLAS, ICDES FROM V820.IIC) AS IIC ON IIM.ICLAS = IIC.ICLAS |
I want to grab ALL table references in the query_name column. The output should look like the following:
query_name |
---|
SYSIBM.SQLCOLUMNS |
SYSIBM.SYSDUMMY1 |
V820.ZCC V820.ZCI |
V820.IIM V820.IIME V820.ZCC V820.IIC |
Edit: I've updated what I think is a potential solution using regular expressions:
with data as
(select parentid,
query_name
FROM
metadata),
froms as
(select any_value(data.parentid) parentid, listagg(regexp_substr(value, '\.[^\.]+\.'), ' ') dependencies
from data, table(split_to_table(upper(query_name), 'FROM '))
where index>1
group by seq)
SELECT * FROM froms
dependencies |
---|
SYSIBM. |
SYSIBM. |
V820. V820. |
V820. V820. |
But I don't have the correct RegEx syntax. How can I adjust my RegEx to get everything before and after the periods?
Solution 1:[1]
Try this query:
Select
SUBSTRING(query_name,CHARINDEX('from ', lower(query_name)) + 5,CHARINDEX(' ', lower(query_name + ' '), CHARINDEX('from ', lower(query_name)) + 5) - CHARINDEX('from ', lower(query_name)) - 5) query_name
from infotech.log_analytics.metadata;
Solution 2:[2]
This wouldnt work with queries involving more than one table, neither if you have tables whose names cointains space character ([Example table], for example). Anyway, following your asumptions, you want to get the words between the 'FROM ' and the first space after that (please notice space left after from). Here is one possible approach. I use a query against a subquery:
Subquery: here we get the text after 'FROM ' from your table:
SELECT
SUBSTRING(
query_name,
CHARINDEX('FROM ', UPPER(query_name)) + 1,
ABS(LEN(query_name) - CHARINDEX('FROM ', UPPER(query_name)))
) AS AUXILIARFIELD
FROM
infotech.log_analytics.metadata
Once we have this (should return the following):
SYSIBM.SQLCOLUMNS
SYSIBM.SYSDUMMY1 V820.ZCC ZCC JOIN V820.ZCI ZCI ON ZCC.CCID = ZCI.CCID WHERE trim(CCCODE) NOT IN ('89090','89089','89087')
V820.IIM
what you need next is to get the text from the start to the first position where there is an space character, this is easier (please notice the AUX subquery is the one explained above):
SELECT
SUBSTRING(AUXILIARFIELD,
1,
IIF(CHARINDEX(' ', AUXILIARFIELD) = 0,
LEN(AUXILIARFIELD),
CHARINDEX(' ', AUXILIARFIELD))) AS EXPECTEDRESULT
FROM (
SELECT
SUBSTRING(
query_name,
CHARINDEX('FROM ', UPPER(query_name)) + 5,
ABS(LEN(query_name) - CHARINDEX('FROM ', UPPER(query_name)))
) AS AUXILIARFIELD
FROM
Table1
) AS AUXILIARQUERY
Please have a look at it and take into account I may have missed index exception cases if no space is after the table name. I just checked it against a different table to try it and later used your table and field names.
Fiddle here.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 |