'How to completely reorganise a table using aggregate data from qualitative information
I have a pandas dataframe which has the following layout:
Column | data type |
---|---|
'Water-Binder' | float |
'Fly Ash' | float |
'Age' | int |
'Strength %' | float |
The age column is qualitative, features records at 1, 3, 7, 14, and 28 days. I want to group the rows by water-binder and fly ash, calculate the mean Strength % for the groups at each age, resulting in a table looking something like this:
Column | data type |
---|---|
'Water-Binder' | float |
'Fly Ash' | float |
'Mean Strength % 1 day' | float |
'Mean Strength % 3 days' | float |
'Mean Strength % 7 days' | float |
'Mean Strength % 14 days' | float |
'Mean Strength % 28 days' | float |
I've been trying to figure out how this could be done. This is the closest thing to something that works that I've managed to achieve:
age_strength_model = data[['Water-Binder', 'Fly Ash', 'Age', 'Strength %']].copy()
ages = np.unique(age_strength_model['Age'].values)
# create table investigating the relationship between fly ash, age, and compressive strength
for a in ages:
age_strength_model.query(f'`Fly Ash` == 0 & Age == {a}').groupby(['Water-Binder'])['Strength %'].transform(lambda x: x.mean())
However, that just shows me the values rather than organising them into a dataset and doesn't accommodate for grouping water-binder and fly ash together. How would I achieve the desired end result here?
Solution 1:[1]
Try to concatenate 'Water-Binder', 'Fly Ash', 'Age' fields and then group:
data = [
[0.43, 0.0, 3, 26.446759],
[0.43, 0.0, 7, 44.444444],
[0.43, 0.0, 28, 100.00000],
[0.43, 0.0, 3, 11.316173],
[0.43, 0.0, 7, 37.493929]
]
df = pd.DataFrame(data, columns= ['Water-Binder', 'Fly Ash', 'Age', 'Strength %'])
df['Water-Binder-Fly-Ash-Age'] = (df['Water-Binder'].astype(str)
+ '%'
+ df['Fly Ash'].astype(str)
+ '%'
+ df['Age'].astype(str))
df
Water-Binder Fly Ash Age Strength % Water-Binder-Fly-Ash
0 0.43 0.0 3 26.446759 0.43%0.0%3
1 0.43 0.0 7 44.444444 0.43%0.0%7
2 0.43 0.0 28 100.000000 0.43%0.0%28
3 0.43 0.0 3 11.316173 0.43%0.0%3
4 0.43 0.0 7 37.493929 0.43%0.0%7
# Creates a grouped df with the indices reset and 'Strength %' field renamed.
df_grouped = df.groupby(by='Water-Binder-Fly-Ash-Age').mean()['Strength %'].reset_index()
df_grouped.rename(columns={'Strength %': 'Mean Strength %'}, inplace=True)
df_grouped
Water-Binder-Fly-Ash-Age Mean Strength %
0 0.43%0.0%28 100.000000
1 0.43%0.0%3 18.881466
2 0.43%0.0%7 40.969186
If you want to recover 'Water-Binder', 'Fly Ash', 'Age' fields, just split it based on the '%' delimiter and 'expand' it:
df_grouped[['Water-Binder', 'Fly Ash', 'Age']] = df['Water-Binder-Fly-Ash-Age'].str.split('%', expand=True)
df_grouped
Water-Binder-Fly-Ash-Age Mean Strength % Water-Binder Fly Ash Age
0 0.43%0.0%28 100.000000 0.43 0.0 3
1 0.43%0.0%3 18.881466 0.43 0.0 7
2 0.43%0.0%7 40.969186 0.43 0.0 28
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |