'panda df not showing all rows after loading from MS SQL
I'm using Pandas with latest sqlalchemy (1.4.36
) to query a MS SQL DB, using the following Python 3.10.3
[Win] snippet:
import pandas as pd #
from sqlalchemy import create_engine, event
from sqlalchemy.engine.url import URL
# ...
def get_table_columns():
SQLA = 'SELECT TABLE_NAME,COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME LIKE \'pa%\' ORDER BY TABLE_NAME;'
# Use pandas for getting named table & columns
conn_str = set_db_info()
conn_url = URL.create("mssql+pyodbc", query={"odbc_connect": conn_str})
engine = create_engine(conn_url)
df = pd.read_sql(SQLA, engine)
# Permanently changes the pandas settings
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
print(df)
return df
However, this only prints the first 292 rows, and not all of the entire 2351 rows. Using REPL, I can check this with:
>>> z = get_table_columns()
>>> z
TABLE_NAME COLUMN_NAME
0 paacc accesscd
... # <-- I added these
292 paapepi piapeheadat
>>> z.count()
TABLE_NAME 2351
COLUMN_NAME 2351
dtype: int64
>>> z.shape[0]
2351
>>> z.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2351 entries, 0 to 2350
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 TABLE_NAME 2351 non-null object
1 COLUMN_NAME 2351 non-null object
dtypes: object(2)
memory usage: 36.9+ KB
Q: What is going on, and why can't I print/show all the rows?
Solution 1:[1]
The problem was that pandas returns a packed dataframe (DF). So when printing you only get a partial (at best) or broken (at worst) strings from the data.
For some reason this is always on by default and the results varies widely as to how much of the string/data that is shown. The solution is to use the unpacking operator (*
) before/when trying to print the df, like this:
print(*df)
(This is also know as the splat operator for Ruby enthusiasts.)
References & tutorials:
- https://treyhunner.com/2018/10/asterisks-in-python-what-they-are-and-how-to-use-them/
- https://www.geeksforgeeks.org/python-star-or-asterisk-operator/
- https://medium.com/understand-the-python/understanding-the-asterisk-of-python-8b9daaa4a558
- https://towardsdatascience.com/unpacking-operators-in-python-306ae44cd480
Solution 2:[2]
To display all the rows in pandas, you should set the display option to None or 1 extra from the dataframe size as you have done in your code:
pd.set_option('display.max_rows', None)
pandas.set_option('display.max_rows', z.shape[0]+1)
Given that this is not the problem, it may be that the IDE or program that you use automatically crops this information from the view (For example Ipython crops every big output).
Other thing to try is to force the print of the dataframe instead of just is return value:
>>> print(z)
To inspect everything I would recommend you to pass that into a csv/excel file to do it better.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | not2qubit |
Solution 2 | mrleugim33 |