'Python Dataframes - Breaking out single rows with duplicate columns into multiple rows and fewer columns
I have a data frame like this:
A | B | C | Date1 | Time1 | Value1 | Date2 | Time2 | Value2 |
---|---|---|---|---|---|---|---|---|
abc | def | ghi | 01-01-2000 | 15:00:00 | 100 | 01-01-2000 | 19:00:00 | 110 |
There are duplicate columns for Date/Time/Value and I would like to merge them into a single column. The tricky part is that I would like the values of columns A/B/C to be shown for each new row added. Something like this:
A | B | C | Date | Time | Value |
---|---|---|---|---|---|
abc | def | ghi | 01-01-2000 | 15:00:00 | 100 |
abc | def | ghi | 01-01-2000 | 19:00:00 | 110 |
The only approach I could think of was to take a subset of the original dataframe such as:
A | B | C | Date1 | Time1 | Value1 |
---|---|---|---|---|---|
abc | def | ghi | 01-01-2000 | 15:00:00 | 100 |
AND
A | B | C | Date2 | Time2 | Value2 |
---|---|---|---|---|---|
abc | def | ghi | 01-01-2000 | 19:00:00 | 110 |
Then try to merge them together. There has to be a faster way to do this. Any help would be appreciated
Solution 1:[1]
pd.wide_to_long
works fine for this; however, if your indices are not unique, pivot_longer from pyjanitor can be helpful here, and is efficient:
# pip install pyjanitor
import pandas as pd
import janitor
df.pivot_longer(
index = ['A','B','C'],
names_to = '.value',
names_pattern = r"(\D+)\d")
A B C Date Time Value
0 abc def ghi 01-01-2000 15:00:00 100
1 abc def ghi 01-01-2000 19:00:00 110
The .value
here determines which part of the columns remain as column headers; this is determined from the grouped regex in named_pattern
.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | sammywemmy |