'Window functions not working in pd.read_sql; Its shows error
I am currently using the European soccer SQLite database for my data analysis in Google collab (Jupyter notebook).
Aim of analysis; For a specific team ex: Chelsea, get the wins and loss label for every match (Done using CASE statement) and after this partition the match count by its season and win_loss result. This is all done within pd.read_sql() statement in google collab(Jupyter notebook).
The statement runs all fine until the window function is introduced. But the query runs all fine in the SQLite DB browser (image attached). The main error i get is OperationalError: near "(": syntax error
Here is the code
Home_Perf = pd.read_sql(""" --- CTE to get the wins and loss as a home team
WITH Homes AS (
SELECT season, team_long_name AS HomeTeam,
home_team_goal, away_team_goal,
CASE
WHEN home_team_goal > away_team_goal THEN 'win'
WHEN home_team_goal < away_team_goal THEN 'loss'
ELSE 'Tie' END AS Win_Loss
FROM match
---Inner JOIN for getting the team name
INNER JOIN team
ON team_api_id = home_team_api_id
WHERE home_team_api_id = 8455)
SELECT season, HomeTeam,
COUNT(Win_Loss) OVER(PARTITION BY season) AS counts
FROM homes""", conn)
Home_Perf
Here is the error
ERROR:root:An unexpected error occurred while tokenizing input
The following traceback may be corrupted or invalid
The error message is: ('EOF in multi-line string', (1, 38))
---------------------------------------------------------------------------
OperationalError Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/pandas/io/sql.py in execute(self, *args, **kwargs)
1585 try:
-> 1586 cur.execute(*args, **kwargs)
1587 return cur
OperationalError: near "(": syntax error
The above exception was the direct cause of the following exception:
DatabaseError Traceback (most recent call last)
3 frames
<ipython-input-17-9b1c924dbbdd> in <module>()
15 SELECT season, HomeTeam,
16 COUNT(Win_Loss) OVER(PARTITION BY season) AS counts
---> 17 FROM homes""", conn)
18 Home_Perf
/usr/local/lib/python3.6/dist-packages/pandas/io/sql.py in read_sql(sql, con, index_col, coerce_float, params, parse_dates, columns, chunksize)
410 coerce_float=coerce_float,
411 parse_dates=parse_dates,
--> 412 chunksize=chunksize,
413 )
414
/usr/local/lib/python3.6/dist-packages/pandas/io/sql.py in read_query(self, sql, index_col, coerce_float, params, parse_dates, chunksize)
1631
1632 args = _convert_params(sql, params)
-> 1633 cursor = self.execute(*args)
1634 columns = [col_desc[0] for col_desc in cursor.description]
1635
/usr/local/lib/python3.6/dist-packages/pandas/io/sql.py in execute(self, *args, **kwargs)
1596
1597 ex = DatabaseError(f"Execution failed on sql '{args[0]}': {exc}")
-> 1598 raise ex from exc
1599
1600 @staticmethod
DatabaseError: Execution failed on sql ' --- CTE to get the wins and loss as a home team
WITH Homes AS (
SELECT season, team_long_name AS HomeTeam,
home_team_goal, away_team_goal,
CASE
WHEN home_team_goal > away_team_goal THEN 'win'
WHEN home_team_goal < away_team_goal THEN 'loss'
ELSE 'Tie' END AS Win_Loss
FROM match
---Inner JOIN for getting the team name
INNER JOIN team
ON team_api_id = home_team_api_id
WHERE home_team_api_id = 8455)
SELECT season, HomeTeam,
COUNT(Win_Loss) OVER(PARTITION BY season) AS counts
FROM homes': near "(": syntax error
Solution 1:[1]
tl;dr Google Colab's using SQLite version 3.22 but SQLite only supports Window Functions after version 3.25.
I think the issue is that Google Colab uses an outdated version of SQLlite that does not support window functions. Google has to udpate this! I write this as of March 6, 2022.
To your point, more specifically:
On September 15, 2018 SQLite released version 3.25, and as you can see in the release log here, the first thing they did in this release was:
- Add support for window functions
This an issue I've encountered before, and the trick is to update your SQLite3 library, BUT, you are not on a local device, you are on a Google Colab.
So the next thing you want to do is check which version of SQLlite you have. You can do this by running the following line of SQL code: SELECT sqlite_version();
and I got that goodie from here.
And since you are querying within Pandas, you'd run:
pd.read_sql_query("""
SELECT sqlite_version();
""", conn)
So I just ran exactly that today (3/6/22), and I got the following in my Google Colab:
And this is too old of a version. I don't know how we can update the sqlite library within a Google Colab. So this is as far as I've gotten, and I think we just don't have that functionality at this point. This post may help with the updating of sqlite within the Notebook.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | George Hayward |