Add Unique Groups To Df For Each Row Including Sum From Other Columns
I have a DataFrame that looks like this: ID field_1 area_1 field_2 area_2 field_3 area_3 field_4 area_4 1 scoccer 500 basketball 200
Solution 1:
Use pd.wide_to_long to reshape the DataFrame, which allows you to group by field and ID and sum the areas. Then use pivot_table to get back to the wide format, after creating the column label with cumcount.
import pandas as pd

# Reshape to long format, then sum the areas per ID and field.
df = (pd.wide_to_long(df, i='ID', j='num', stubnames=['field', 'area'], sep='_')
        .groupby(['ID', 'field'])['area'].sum()
        .reset_index())
#    ID       field    area
# 0   1  basketball   250.0
# 1   1     scoccer   500.0
# 2   1    swimming   100.0
# 3   2  volleyball   100.0
# 4   3  basketball  1000.0
# 5   3    football    10.0
# 6   4  basketball   320.0
# 7   4    swimming   480.0
# 8   5    football   160.0
# 9   5  volleyball   140.0

# Number the fields within each ID; this number becomes the column suffix.
df['idx'] = df.groupby('ID').cumcount()+1

# Pivot back to wide format and flatten the MultiIndex columns.
df = (pd.pivot_table(df, index='ID', columns='idx', values=['field', 'area'],
                     aggfunc='first')
        .sort_index(axis=1, level=1))
df.columns = ['_'.join(map(str, tup)) for tup in df.columns]
    area_1     field_1  area_2     field_2  area_3   field_3
ID
1    250.0  basketball   500.0     scoccer   100.0  swimming
2    100.0  volleyball     NaN         NaN     NaN       NaN
3   1000.0  basketball    10.0    football     NaN       NaN
4    320.0  basketball   480.0    swimming     NaN       NaN
5    160.0    football   140.0  volleyball     NaN       NaN
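Since the question's data is cut off above, here is a minimal self-contained sketch of the same pipeline on a small made-up frame. The two rows of data and the names long_df and wide_df are assumptions for illustration only, not the asker's real data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (not the asker's frame): three field/area pairs.
df = pd.DataFrame({
    "ID": [1, 2],
    "field_1": ["soccer", "volleyball"],
    "area_1": [500, 100],
    "field_2": ["basketball", np.nan],
    "area_2": [200, np.nan],
    "field_3": ["basketball", np.nan],
    "area_3": [50, np.nan],
})

# Wide -> long, then sum the areas per (ID, field).
# Rows with a missing field are dropped by groupby.
long_df = (pd.wide_to_long(df, i="ID", j="num",
                           stubnames=["field", "area"], sep="_")
             .groupby(["ID", "field"])["area"].sum()
             .reset_index())

# Number the unique fields within each ID: 1, 2, ...
long_df["idx"] = long_df.groupby("ID").cumcount() + 1

# Long -> wide, one field/area pair per idx, then flatten the column labels.
wide_df = (pd.pivot_table(long_df, index="ID", columns="idx",
                          values=["field", "area"], aggfunc="first")
             .sort_index(axis=1, level=1))
wide_df.columns = ["_".join(map(str, tup)) for tup in wide_df.columns]

print(wide_df)
```

In this sketch the two basketball entries for ID 1 collapse into a single field with their areas added together, so ID 1 ends up with two field/area pairs instead of three.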
Just for fun, you could use the undocumented pd.lreshape instead of wide_to_long.
# Change range to (1,31) for your real data.
pd.lreshape(df, {'area': [f'area_{i}' for i in range(1,5)],
                 'field': [f'field_{i}' for i in range(1,5)]})
#     ID    area       field
# 0    1   500.0     scoccer
# 1    2   100.0  volleyball
# 2    3  1000.0  basketball
# 3    4   280.0    swimming
# 4    5   110.0  volleyball
# 5    1   200.0  basketball
# ....
# 10   4   320.0  basketball
# 11   5    30.0  volleyball
# 12   1    50.0  basketball
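As a rough sketch of how the lreshape result could feed the same groupby step, the example below uses a small made-up frame. The data, the pair count n, and the names long_df and summed are assumptions for illustration. It relies on lreshape's default dropna=True to discard rows where the field or area is missing.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (not the asker's frame): three field/area pairs.
df = pd.DataFrame({
    "ID": [1, 2],
    "field_1": ["soccer", "volleyball"],
    "area_1": [500, 100],
    "field_2": ["basketball", np.nan],
    "area_2": [200, np.nan],
    "field_3": ["basketball", np.nan],
    "area_3": [50, np.nan],
})

n = 3  # number of field_/area_ pairs in this made-up frame

# Stack the area_* columns into 'area' and the field_* columns into 'field'.
long_df = pd.lreshape(df, {
    "area": [f"area_{i}" for i in range(1, n + 1)],
    "field": [f"field_{i}" for i in range(1, n + 1)],
})

# Same aggregation as in the wide_to_long version.
summed = long_df.groupby(["ID", "field"], as_index=False)["area"].sum()

print(summed)
```

From here, the cumcount and pivot_table steps shown earlier apply unchanged.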