'dataframe transformation python products atrributes values
I have an excel file with products like this below. Is it possible to align the same kind of attributes to same column using python?
I have this
category | name | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|---|
board games | board game 1 | kind | family | box color | red | weight | 0.7 |
board games | board game 2 | kind | card game | box color | blue | lenght | 25 |
board games | board game 3 | box color | green | weight | 0.5 | lenght | 32 |
Desired output
category | name | kind | box color | weight | length |
---|---|---|---|---|---|
board games | board game1 | family games | red | 0.7 | |
board games | board game2 | card games | blue | 25 | |
board games | board game3 | green | 0.5 | 32 |
i have an other case with duplicate values.
category | name | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|---|---|
smartphones | samsung1 | sim type | dual | color | green | ram | 4GB | Storage | 128GB | year | 2021 |
smartphones | samsung2 | sim type | dual | color | green | ram | 4GB | storage | 256GB | year | 2021 |
smartphones | xiaomi3 | sim type | dual | color | blue | ram | 6GB | length | 32mm | Storage | 128GB |
desired output:
category | name | sim type | color | ram | Storage | length | year |
---|---|---|---|---|---|---|---|
smartphones | samsung1 | dual | green | 4GB | 128GB | nan | 2021 |
smartphones | samsung2 | dual | green | 4GB | 256GB | nan | 2021 |
smartphones | xiaomi3 | dual | blue | 6GB | 128GB | 32mm | nan |
error message:
ValueError: Index contains duplicate entries, cannot reshape
UPDATE_2:
Index contains duplicate entries, cannot reshape
category | name | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
smartphones | samsung1 | sim type | dual | color | green | ram | 4GB | Storage | 128GB | year | 2021 | camera | yes | fingerprint | yes | accelometer | yes | gyroscope | yes | NFC | yes |
smartphones | samsung2 | sim type | dual | color | green | ram | 4GB | storage | 256GB | year | 2021 | camera | yes | fingerprint | yes | ||||||
smartphones | xiaomi3 | sim type | dual | color | blue | ram | 6GB | length | 32mm | Storage | 128GB | camera | yes | fingerprint | yes |
desired output:
category | name | sim type | color | ram | Storage | length | year | camera | fingerprint | accelometer | gyroscope | NFC |
---|---|---|---|---|---|---|---|---|---|---|---|---|
smartphones | samsung1 | dual | green | 4GB | 128GB | nan | 2021 | yes | yes | yes | yes | no |
smartphones | samsung2 | dual | green | 4GB | 256GB | nan | 2021 | yes | yes | nan | nan | nan |
smartphones | xiaomi3 | dual | blue | 6GB | 128GB | 32mm | nan | yes | yes | nan | nan | nan |
Solution 1:[1]
Update
To solve duplicate issue, you can try:
out = df.set_index(['category', 'name'], append=True).melt(ignore_index=False)
m = out['variable'].astype(int) % 2 == 1
out = (pd.concat([out.loc[m, 'value'].rename('variable'),
out.loc[~m, 'value']], axis=1)
.set_index('variable', append=True)['value']
.unstack('variable').reset_index(['category', 'name'])
.rename_axis(columns=None))
Output:
category | name | Storage | color | length | ram | sim type | year |
---|---|---|---|---|---|---|---|
smartphones | samsung1 | 128GB | green | nan | 4GB | dual | 2021 |
smartphones | samsung2 | 256GB | green | nan | 4GB | dual | 2021 |
smartphones | xiaomi3 | 128GB | blue | 32mm | 6GB | dual | nan |
Old answer
You can mainly use stack
and unstack
to rearrange your dataframe:
# Separate odd (names) and even (values) columns
out = (pd.concat([df.iloc[:, 2::2].stack().droplevel(1),
df.iloc[:, 3::2].stack().droplevel(1)], axis=1)
.set_index(0, append=True)[1].unstack(0))
out = pd.concat([df.iloc[:, :2], out], axis=1)
Output:
category | name | box color | kind | lenght | weight |
---|---|---|---|---|---|
board games | board game 1 | red | family | nan | 0.7 |
board games | board game 2 | blue | card game | 25 | nan |
board games | board game 3 | green | nan | 32 | 0.5 |
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |