Get index and column with multiple headers and index_col in Pandas DataFrame
I have a dataframe with multiple header rows and multiple index columns, and would like to retrieve the list of entries that are non-zero. The dataframe is constructed from a .csv file provided by another party.
It's hard to include the data as it's sensitive, but I read in the data and remove NaNs to make it smaller, keeping only the non-zero rows and columns.
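import pandas as pd

# Four header rows become the column MultiIndex; two columns become the row MultiIndex.
df = pd.read_csv('Example.csv', header=[0, 1, 2, 3], index_col=[0, 1])
# Keep only cells equal to 1, drop all-NaN rows, then drop columns containing any NaN.
a = df.where(df == 1).dropna(how='all').dropna(axis=1)
# Pair each row label whose cell equals 1 with the integer position of its column.
x = [(df[col][df[col].eq(1)].index[i], df.columns.get_loc(col)) for col in df.columns for i in range(len(df[col][df[col].eq(1)].index))]
for item in x:
    print(item)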
I am hoping for output of the form:
((index col1, index col2), (header 3))
As a hypothetical example, suppose I listed every version of each comic book character under the headers:
Brand: Marvel/DC/etc.
Hero: Spiderman/Captain America/...
Person: Parker/Riley/Morales
My index columns would then be the comic name, followed by the number of that comic.
Each entry is 1 if the character is present, and empty otherwise, in the .csv exported from Excel.
I would like the output to be ((Amazing Spiderman, 1), (Parker, Spiderman)), and so on.
I hope that makes sense.
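To make the hypothetical concrete, here is a minimal sketch of the kind of result I am after, built on made-up data rather than the real file. The level names (Brand, Hero, Person, Comic, Issue), the "Ultimate Spiderman" row, and the cell values are all assumptions for illustration, and the NumPy-based lookup is just one possible way to get there.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the sensitive .csv: three header levels, two index levels.
columns = pd.MultiIndex.from_tuples(
    [
        ("Marvel", "Spiderman", "Parker"),
        ("Marvel", "Spiderman", "Riley"),
        ("Marvel", "Spiderman", "Morales"),
    ],
    names=["Brand", "Hero", "Person"],
)
index = pd.MultiIndex.from_tuples(
    [("Amazing Spiderman", 1), ("Amazing Spiderman", 2), ("Ultimate Spiderman", 1)],
    names=["Comic", "Issue"],
)
nan = float("nan")
df = pd.DataFrame(
    [[1.0, nan, nan], [1.0, 1.0, nan], [nan, nan, 1.0]],
    index=index,
    columns=columns,
)

# Positions (row, column) of every cell equal to 1, in row-major order.
rows, cols = np.nonzero(df.eq(1).to_numpy())

# Look up the header levels wanted in the output by name.
person = df.columns.get_level_values("Person")
hero = df.columns.get_level_values("Hero")

# Each result pairs the full row label with (Person, Hero) for that column.
pairs = [(df.index[r], (person[c], hero[c])) for r, c in zip(rows, cols)]
for pair in pairs:
    print(pair)
```

Each element of `pairs` has the shape ((Comic, Issue), (Person, Hero)), which matches the output format described above.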
Solution 1:[1]
I resolved this by removing the rows that were not being used in the query at that time. It is not an ideal solution, but it will make your version work. The drawback is that it can be fiddly if you need two or more headers in the output.
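As a rough sketch of that idea, assuming "rows" here means the header rows that the query does not need, the unused header levels can be dropped before searching. The data and the level names (Brand, Hero, Person, Comic, Issue) are hypothetical, and `droplevel` is only one way to do the removal; it may not be exactly what was done originally.

```python
import pandas as pd

# Hypothetical data: three header levels, two index levels.
columns = pd.MultiIndex.from_tuples(
    [("Marvel", "Spiderman", "Parker"), ("Marvel", "Spiderman", "Morales")],
    names=["Brand", "Hero", "Person"],
)
index = pd.MultiIndex.from_tuples(
    [("Amazing Spiderman", 1), ("Ultimate Spiderman", 1)],
    names=["Comic", "Issue"],
)
nan = float("nan")
df = pd.DataFrame([[1.0, nan], [nan, 1.0]], index=index, columns=columns)

# Drop the header levels not needed for this query, leaving only "Person".
slim = df.droplevel(["Brand", "Hero"], axis=1)

# With a single header level, each hit is simply (row label, column label).
hits = [
    (idx, col)
    for col in slim.columns
    for idx in slim.index[slim[col].eq(1).to_numpy()]
]
print(hits)
```

Note that dropping levels can leave duplicate column labels if the same name appears under different parent headers, which is part of why this gets fiddly when several headers are needed in the output.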
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | EmptySet |