Placeholder for the DataFrame in DataFrame.query
I use DataFrame.query and pd.eval a lot. However, I sometimes want to filter an unnamed (intermediate) DataFrame with query, and it would be very handy if I could refer to that DataFrame by name inside the query expression. As an example, consider this DataFrame:
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(
...     data=np.arange(20).reshape(5, 4),
...     columns=pd.MultiIndex.from_product([['A', 'B'], ['x', 'y']]))
>>> df
    A       B
    x   y   x   y
0   0   1   2   3
1   4   5   6   7
2   8   9  10  11
3  12  13  14  15
4  16  17  18  19
Now, to get the y columns with values above 15, it would feel natural to try something like:
df.loc[:, (slice(None), 'y')].query('(PLACEHOLDER > 15).any(axis=1)')
My question is: is there such a placeholder to refer to the DataFrame on which query has been called, for use in the expression? This would save me a lot of typing in many situations.
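For comparison, here is a minimal sketch of how the same filter can be written without naming the intermediate frame. It does not use query at all: it assumes that passing a callable to .loc is an acceptable substitute, with the lambda argument (called d here, an arbitrary name) standing in for the placeholder. It is a workaround, not a placeholder in the query syntax itself.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    data=np.arange(20).reshape(5, 4),
    columns=pd.MultiIndex.from_product([['A', 'B'], ['x', 'y']]))

# Select the 'y' sub-columns, then filter rows with a callable.
# The lambda's argument `d` is bound to the unnamed intermediate
# DataFrame that the second .loc is called on, so it plays the
# role that PLACEHOLDER has in the expression above.
result = (
    df.loc[:, (slice(None), 'y')]
      .loc[lambda d: (d > 15).any(axis=1)]
)
print(result)
```

Because the callable receives the frame as an ordinary argument, the same pattern works anywhere in a method chain, at the cost of writing a Python expression instead of a query string.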
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow