'SQL: Use RegEx to return multiple substrings of a specified string in column

I am querying metadata on Snowflake which contains a column of queries:

select query_name
from metadata
query_name
SELECT * FROM SYSIBM.SQLCOLUMNS
SELECT * FROM SYSIBM.SYSDUMMY1
SELECT CCID, CCCODE FROM V820.ZCC ZCC JOIN V820.ZCI ZCI ON ZCC.CCID = ZCI.CCID WHERE trim(CCCODE) NOT IN ('89090','89089','89087')
SELECT * FROM V820.IIM IIM LEFT JOIN V820.IIME AS IIME ON IIM.IPROD = IIME.IMPROD LEFT JOIN (SELECT CCCODE, CCDESC FROM V820.ZCC) ZCC ON IIM.CCCODE = ZCC.CCCODE LEFT JOIN (SELECT ICLAS, ICDES FROM V820.IIC) AS IIC ON IIM.ICLAS = IIC.ICLAS

I want to grab ALL table references in the query_name column. The output should look like the following:

query_name
SYSIBM.SQLCOLUMNS
SYSIBM.SYSDUMMY1
V820.ZCC V820.ZCI
V820.IIM V820.IIME V820.ZCC V820.IIC

Edit: I've updated what I think is a potential solution using regular expressions:

with data as 
    (select parentid,
    query_name 
    FROM
        metadata),
    froms as
    (select any_value(data.parentid) parentid, listagg(regexp_substr(value, '\.[^\.]+\.'), '   ') dependencies
    from data, table(split_to_table(upper(query_name), 'FROM '))
    where index>1
    group by seq)
    SELECT * FROM froms
dependencies
SYSIBM.
SYSIBM.
V820. V820.
V820. V820.

But I don't have the correct RegEx syntax. How can I adjust my RegEx to get everything before and after the periods?



Solution 1:[1]

Try this query:

Select 
SUBSTRING(query_name,CHARINDEX('from ', lower(query_name)) + 5,CHARINDEX(' ', lower(query_name + ' '), CHARINDEX('from ', lower(query_name)) + 5) - CHARINDEX('from ', lower(query_name)) - 5) query_name
from infotech.log_analytics.metadata;

DBFiddle Demo Link

Solution 2:[2]

This wouldnt work with queries involving more than one table, neither if you have tables whose names cointains space character ([Example table], for example). Anyway, following your asumptions, you want to get the words between the 'FROM ' and the first space after that (please notice space left after from). Here is one possible approach. I use a query against a subquery:

Subquery: here we get the text after 'FROM ' from your table:

SELECT 
        SUBSTRING(  
                    query_name, 
                    CHARINDEX('FROM ', UPPER(query_name)) + 1, 
                    ABS(LEN(query_name) - CHARINDEX('FROM ', UPPER(query_name)))
                ) AS AUXILIARFIELD
     FROM 
        infotech.log_analytics.metadata
    
    

Once we have this (should return the following):

SYSIBM.SQLCOLUMNS

SYSIBM.SYSDUMMY1 V820.ZCC ZCC JOIN V820.ZCI ZCI ON ZCC.CCID = ZCI.CCID WHERE trim(CCCODE) NOT IN ('89090','89089','89087')

V820.IIM

what you need next is to get the text from the start to the first position where there is an space character, this is easier (please notice the AUX subquery is the one explained above):

SELECT  
      SUBSTRING(AUXILIARFIELD, 
                1, 
                IIF(CHARINDEX(' ', AUXILIARFIELD) = 0, 
                     LEN(AUXILIARFIELD), 
                     CHARINDEX(' ', AUXILIARFIELD))) AS EXPECTEDRESULT
FROM (
        SELECT 
                SUBSTRING(  
                            query_name, 
                            CHARINDEX('FROM ', UPPER(query_name)) + 5, 
                            ABS(LEN(query_name) - CHARINDEX('FROM ', UPPER(query_name)))
                        ) AS AUXILIARFIELD
             FROM 
                Table1
) AS AUXILIARQUERY
    

Please have a look at it and take into account I may have missed index exception cases if no space is after the table name. I just checked it against a different table to try it and later used your table and field names.

Fiddle here.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2