Combine Columns in Pandas

Let's say I have the following Pandas dataframe. It is what it is and the input can't be changed.

import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.array([['a', 1, 'e', 5],
                             ['b', 2, 'f', 6],
                             ['c', 3, 'g', 7],
                             ['d', 4, 'h', 8]]))
df1.columns = [1,1,2,2]

See how the columns have the same name? The output I want is to have columns with the same name combined (not summed or concatenated), meaning the second column 1 is added to the end of the first column 1, like so:

df2 = pd.DataFrame(np.array([['a', 'e'], 
                             ['b','f'], 
                             ['c', 'g'], 
                             ['d', 'h'], 
                             [1,5],
                             [2,6],
                             [3,7],
                             [4,8]]))
df2.columns = [1,2]

How do I do this? I can do it manually, except I actually have like 10 column titles, about 100 iterations of each title, and several thousand rows, so it takes forever and I have to redo it with each new dataset.

EDIT: the columns in actual datasets are unequal in length.

Solution 1:^[1]

You can use a dictionary whose default value is a list (a defaultdict) and loop through the dataframe columns by position. Use the column name as the dictionary key and extend that key's list with the column's values, so columns that share a name end up stacked one after the other.

from collections import defaultdict
d = defaultdict(list)

for i, col in enumerate(df1.columns):
    d[col].extend(df1.iloc[:, i].values.tolist())

df = pd.DataFrame.from_dict(d, orient='index').T

print(df)

   1  2
0  a  e
1  b  f
2  c  g
3  d  h
4  1  5
5  2  6
6  3  7
7  4  8

For df1.columns = [1,1,2,3], the output is

   1     2     3
0  a     e     5
1  b     f     6
2  c     g     7
3  d     h     8
4  1  None  None
5  2  None  None
6  3  None  None
7  4  None  None
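Regarding the EDIT in the question: if the shorter columns in the real data are padded with NaN (an assumption, the question does not show them), the padding would land in the middle of the stacked column. A minimal sketch of the same loop that drops missing values first, using a made-up input:

```python
from collections import defaultdict

import numpy as np
import pandas as pd

# Assumed input: the second column labelled 1 is shorter and padded with NaN
df1 = pd.DataFrame([['a', 1, 'e', 5],
                    ['b', 2, 'f', 6],
                    ['c', np.nan, 'g', 7],
                    ['d', np.nan, 'h', 8]])
df1.columns = [1, 1, 2, 2]

d = defaultdict(list)
for i, col in enumerate(df1.columns):
    # dropna() removes the padding so it does not end up mid-column
    d[col].extend(df1.iloc[:, i].dropna().tolist())

# Building from Series aligns on the index and pads short columns at the end
df = pd.DataFrame({col: pd.Series(vals) for col, vals in d.items()})
```

Here column 1 ends up with six values followed by two NaN, while column 2 keeps all eight. Note that dropna() would also remove genuine missing values inside a column, so this only fits data where NaN means "no more rows".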

Solution 2:^[2]

Try with groupby and explode:

output = df1.groupby(level=0, axis=1).agg(lambda x: x.values.tolist()).explode(df1.columns.unique().tolist())

>>> output
   1  2
0  a  e
0  1  5
1  b  f
1  2  6
2  c  g
2  3  7
3  d  h
3  4  8

Edit:

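To reorder the rows so that each original column's values stay together, you can do the following. Passing kind="stable" keeps the existing row order among ties, which the default sort algorithm does not guarantee on larger frames:

output = output.assign(order=output.groupby(level=0).cumcount()).sort_values("order", kind="stable", ignore_index=True).drop("order", axis=1)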

>>> output
   1  2
0  a  e
1  b  f
2  c  g
3  d  h
4  1  5
5  2  6
6  3  7
7  4  8
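As far as I know, groupby(..., axis=1) is deprecated in recent pandas releases (2.1 and later), so the one-liner above may warn or fail there. A sketch of the same idea that groups the transposed frame instead, assuming every label has the same number of columns (multi-column explode needs lists of matching length):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.array([['a', 1, 'e', 5],
                             ['b', 2, 'f', 6],
                             ['c', 3, 'g', 7],
                             ['d', 4, 'h', 8]]))
df1.columns = [1, 1, 2, 2]

labels = df1.columns.unique().tolist()

# Transpose so the duplicate labels become the index, collect each group
# into lists, transpose back, then explode the lists into rows
output = df1.T.groupby(level=0).agg(list).T.explode(labels)

# Rows come out interleaved; a stable sort on the position within each
# original row puts the first column's values before the second's
output = (
    output.assign(order=output.groupby(level=0).cumcount())
    .sort_values("order", kind="stable", ignore_index=True)
    .drop("order", axis=1)
)
```

The values are all strings here because np.array converts the mixed input to a single string dtype.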

Solution 3:^[3]

Depending on the size of your data, you could split the data into a dictionary and then create a new data frame from that:

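import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.array([['a', 1, 'e', 5],
                             ['b', 2, 'f', 6],
                             ['c', 3, 'g', 7],
                             ['d', 4, 'h', 8]]))
df1.columns = [1, 1, 2, 2]

dictionary = {}
for column in df1.columns.unique():
    # Pick every column carrying this label, then flatten column by column
    # (order="F") so the second column's values follow the first column's
    block = df1.loc[:, df1.columns == column]
    dictionary[column] = block.to_numpy().ravel(order="F").tolist()

# Requires every label to contribute the same number of values
new_df = pd.DataFrame(dictionary)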

print(new_df)

Solution 4:^[4]

If I understand correctly, this seems to work:

pd.concat([s.reset_index(drop=True) for _, s in df1.melt().groupby("variable")["value"]], axis=1)

Output:

In [3]: pd.concat([s.reset_index(drop=True) for _, s in df1.melt().groupby("variable")["value"]], axis=1)
Out[3]:
  value value
0     a     e
1     b     f
2     c     g
3     d     h
4     1     5
5     2     6
6     3     7
7     4     8

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1
Solution 2
Solution 3	zrvcat
Solution 4	ddejohn
