Sort column values based on floats inside a string, then concat
I'm working on a pretty messy DataFrame. It looks like this, but with 30 columns:
a | b |
---|---|
some text (other text) : 56.3% (text again: 40%) | again text (not same text) : 33% (text text: 60.1%) |
text (always text) : 26.6% (aaand text: 80%) | still text (too much text) : 86% (last text: 10%) |
What I'm trying to do is create another column, c, which concatenates a and b, but the concatenated parts must be sorted based on the first number in each cell (I don't want to change the row order). Expected result:
c |
---|
some text (other text) : 56% (text again: 40%) again text (not same text) : 33% (text text: 60%) |
still text (too much text) : 86% (last text: 10%) text (always text) : 26% (aaand text: 80%) |
Any ideas?
Solution 1:[1]
You can try apply with a customized function:
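def concat(row):
    # Extract the first percentage in each cell as a float sort key
    keys = row.str.extract(r'(\d+\.?\d*)%')[0].astype(float).tolist()
    # Order the cell texts by their key (ascending), then join them
    row = [x for _, x in sorted(zip(keys, row.tolist()))]
    return ' '.join(row)

df['c'] = df.apply(concat, axis=1)
print(df)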
a b
0 some text (other text) : 56.3% (text again: 40%) again text (not same text) : 33% (text text: 6...
1 text (always text) : 26.6% (aaand text: 80%) still text (too much text) : 86% (last text: 10%)
a \
0 some text (other text) : 56.3% (text again: 40%)
1 text (always text) : 26.6% (aaand text: 80%)
b \
0 again text (not same text) : 33% (text text: 60.1%)
1 still text (too much text) : 86% (last text: 10%)
c
0 again text (not same text) : 33% (text text: 60.1%) some text (other text) : 56.3% (text again: 40%)
1 text (always text) : 26.6% (aaand text: 80%) still text (too much text) : 86% (last text: 10%)
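Note that the output above is sorted ascending, while the expected result in the question puts the larger first number first. Below is a minimal, self-contained sketch of a descending variant. The helper names `first_pct` and `concat_desc` are hypothetical, the sample frame is copied from the question, and placing cells without a percentage last is an assumption. It also keeps the original text unchanged, so the decimals are not dropped as they are in the question's expected result.

```python
import re

import pandas as pd

# Sample data copied from the question (only two of the 30 columns).
df = pd.DataFrame({
    "a": [
        "some text (other text) : 56.3% (text again: 40%)",
        "text (always text) : 26.6% (aaand text: 80%)",
    ],
    "b": [
        "again text (not same text) : 33% (text text: 60.1%)",
        "still text (too much text) : 86% (last text: 10%)",
    ],
})

# First percentage in a cell, e.g. "56.3" in "... : 56.3% (...)".
FIRST_PCT = re.compile(r"(\d+\.?\d*)%")


def first_pct(text):
    """Return the first percentage in text as a float (hypothetical helper)."""
    match = FIRST_PCT.search(text)
    # Assumption: cells without any percentage sort last.
    return float(match.group(1)) if match else float("-inf")


def concat_desc(row):
    """Join the cells of one row, largest first percentage first."""
    # sorted() is stable, so ties keep their original column order.
    return " ".join(sorted(row.tolist(), key=first_pct, reverse=True))


# Restrict to the source columns so re-running does not pick up "c" itself.
df["c"] = df[["a", "b"]].apply(concat_desc, axis=1)
print(df["c"].tolist())
```

With 30 columns, the list `["a", "b"]` would be replaced by the full list of source columns. Using a key function instead of sorting `(key, text)` tuples means ties are never broken by comparing the texts themselves.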
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |