'How to extract tables names in a SQL script?
Assuming there is a sql script:
select *
from (
select col1 from test.test_a join test.test_a1 on a.col1 = a1.col1) a
left join test.test_b b
on a.col1 = b.col2
left join
test.test_c c
on b.col2 = c.col3
left jon
(select
col4
from
test.test_d) d
on c.col3 = d.col4
I am reading this question and tring to extract all TABLE NAMES after 'from
' or 'join
' in the script above using python.
The difficuty is I am processing the script line by line, but the table name and the key word may be NOT in the same line.
So how to extract such table names from the script? Any suggestion is appreciated.
Solution 1:[1]
If you would like to use core python:
txt = """
select *
from (
select col1 from test.test_a join test.test_a1 on a.col1 = a1.col1) a
left join test.test_b b
on a.col1 = b.col2
left join
test.test_c c
on b.col2 = c.col3
left jon
(select
col4
from
test.test_d) d
on c.col3 = d.col4"""
replace_list = ['\n', '(', ')', '*', '=']
for i in replace_list:
txt = txt.replace(i, ' ')
txt = txt.split()
res = []
for i in range(1, len(txt)):
if txt[i-1] in ['from', 'join'] and txt[i] != 'select':
res.append(txt[i])
print(res)
Solution 2:[2]
Just convert all multiple sequences of spaces (including newlines) to a single space, than you'll get a single line, and cat look for your table names using a regular expression.
import re
sql = """select *
from (
select col1 from test.test_a join test.test_a1 on a.col1 = a1.col1) a
left join test.test_b b
on a.col1 = b.col2
left join
test.test_c c
on b.col2 = c.col3
left join
(select
col4
from
test.test_d) d
on c.col3 = d.col4"""
sql_line = re.sub('\s+', ' ', sql)
tbl_re = re.compile(r'(?:\b(?:from)|(?:join)\b\s+)(\w+)\b')
tablenames = tbl_re.findall(sql_line)
print(tablenames)
Please note that the table name extraction regexp is simplified and only intended as an example (you have to take into account possible quoting, etc.).
Solution 3:[3]
Here is a quick improvement on top of @r.user.05apr answer. Combining some bits from https://stackoverflow.com/a/46177004/82961
import re
txt = """
select *
from (
select col1 from test.test_a join test.test_a1 on a.col1 = a1.col1) a
left join test.test_b b
on a.col1 = b.col2
left join
test.test_c c -- from xxx
on b.col2 = c.col3 /* join xxxxx */
left jon
(select
col4
from
test.test_d) d
on c.col3 = d.col4"""
def get_tables(sql_str):
# remove the /* */ comments
sql_str = re.sub(r"/\*[^*]*\*+(?:[^*/][^*]*\*+)*/", "", sql_str)
# remove whole line -- and # comments
lines = [line for line in sql_str.splitlines() if not re.match("^\s*(--|#)", line)]
# remove trailing -- and # comments
sql_str = " ".join([re.split("--|#", line)[0] for line in lines])
replace_list = ['\n', '(', ')', '*', '=']
for i in replace_list:
sql_str = sql_str.replace(i, ' ')
sql_str = sql_str.split()
res = []
for i in range(1, len(sql_str)):
if sql_str[i-1] in ['from', 'join'] and sql_str[i] != 'select':
res.append(sql_str[i])
print(res)
get_tables(txt)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | r.user.05apr |
Solution 2 | Marco Pantaleoni |
Solution 3 | Faiz |