Add Unique Groups To Df For Each Row Including Sum From Other Columns

June 10, 2023

I got a DataFrame looking like this:

ID field_1 area_1 field_2 area_2 field_3 area_3 field_4 area_4
1  scoccer 500    basketball 200

Solution 1:

Use pd.wide_to_long to reshape the DataFrame into long format, which lets you group by ID and field and sum the areas. Then number each ID's fields with cumcount to create the column labels, and use pivot_table to go back to the wide format.
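To see what the reshaping step produces on its own, here is a minimal sketch on a tiny hypothetical frame (the wide, long and totals names and the sample values are made up for illustration). It also shows why empty slots need no special handling: groupby drops rows whose field key is NaN by default (dropna=True), so they never reach the sum.

```python
import numpy as np
import pandas as pd

# Hypothetical wide data in the question's layout: numbered field_N/area_N pairs,
# with NaN where a row has fewer pairs than the widest row.
wide = pd.DataFrame({
    "ID": [1, 2],
    "field_1": ["soccer", "volleyball"],
    "area_1": [500, 100],
    "field_2": ["basketball", np.nan],
    "area_2": [200, np.nan],
})

# Each field_N/area_N pair becomes its own row; the suffix N goes into index level "num".
long = pd.wide_to_long(wide, stubnames=["field", "area"], i="ID", j="num", sep="_")
print(long)

# Rows whose "field" is NaN are dropped by groupby (dropna=True by default).
totals = long.groupby(["ID", "field"])["area"].sum()
print(totals)
```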

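import pandas as pd

# Reshape the field_N/area_N pairs into rows, then total the area per (ID, field).
df = (pd.wide_to_long(df, i='ID', j='num', stubnames=['field', 'area'], sep='_')
        .groupby(['ID', 'field'])['area'].sum()
        .reset_index())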
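#    ID       field    area
# 0   1  basketball   250.0
# 1   1     scoccer   500.0
# 2   1    swimming   100.0
# 3   2  volleyball   100.0
# 4   3  basketball  1000.0
# 5   3    football    10.0
# 6   4  basketball   320.0
# 7   4    swimming   480.0
# 8   5    football   160.0
# 9   5  volleyball   140.0

# Number the fields within each ID (1, 2, 3, ...) to use as the new column labels.
df['idx'] = df.groupby('ID').cumcount() + 1
# Back to wide format; 'first' simply takes the single value in each cell.
df = (pd.pivot_table(df, index='ID', columns='idx', values=['field', 'area'],
                     aggfunc='first')
        .sort_index(axis=1, level=1))
# Flatten the (name, idx) MultiIndex columns into area_1, field_1, area_2, ...
df.columns = ['_'.join(map(str, tup)) for tup in df.columns]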

    area_1     field_1  area_2     field_2  area_3   field_3
ID
1    250.0  basketball   500.0     scoccer   100.0  swimming
2    100.0  volleyball     NaN         NaN     NaN       NaN
3   1000.0  basketball    10.0    football     NaN       NaN
4    320.0  basketball   480.0    swimming     NaN       NaN
5    160.0    football   140.0  volleyball     NaN       NaN
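The answer does not include the input data, so here is a self-contained sketch of the same pipeline run on a small hypothetical frame with three field/area pairs instead of thirty (the sample values and the long/out names are assumptions). ID 1 has basketball twice, so its areas should be summed into a single basketball entry.

```python
import numpy as np
import pandas as pd

# Hypothetical sample in the question's wide layout (three pairs instead of thirty).
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "field_1": ["soccer", "volleyball", "basketball"],
    "area_1": [500, 100, 1000],
    "field_2": ["basketball", np.nan, "football"],
    "area_2": [200, np.nan, 10],
    "field_3": ["basketball", np.nan, np.nan],
    "area_3": [50, np.nan, np.nan],
})

# 1) Wide -> long, then total the area per (ID, field); NaN fields are dropped by groupby.
long = (pd.wide_to_long(df, stubnames=["field", "area"], i="ID", j="num", sep="_")
          .groupby(["ID", "field"])["area"].sum()
          .reset_index())

# 2) Number the unique fields within each ID: 1, 2, ...
long["idx"] = long.groupby("ID").cumcount() + 1

# 3) Long -> wide again; "first" just picks the single value in each cell.
out = (pd.pivot_table(long, index="ID", columns="idx",
                      values=["field", "area"], aggfunc="first")
         .sort_index(axis=1, level=1))
out.columns = ["_".join(map(str, tup)) for tup in out.columns]
print(out)
```

With this sample, ID 1 ends up with basketball (250.0) in slot 1 and soccer (500.0) in slot 2, because groupby sorts the fields within each ID before cumcount numbers them; ID 2 gets NaN in its second slot.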

Just for fun, you could use the undocumented pd.lreshape instead of wide_to_long.

# df here is the original wide DataFrame, not the reshaped result above.
# Change range to (1, 31) for your real data.
pd.lreshape(df, {'area': [f'area_{i}' for i in range(1, 5)],
                 'field': [f'field_{i}' for i in range(1, 5)]})

#     ID    area       field
# 0    1   500.0     scoccer
# 1    2   100.0  volleyball
# 2    3  1000.0  basketball
# 3    4   280.0    swimming
# 4    5   110.0  volleyball
# 5    1   200.0  basketball
# ....
# 10   4   320.0  basketball
# 11   5    30.0  volleyball
# 12   1    50.0  basketball
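Note that lreshape only stacks the pairs and does not aggregate, which is why the output above still lists ID 1 with basketball twice (200.0 and 50.0). By default (dropna=True) it does drop rows where any of the new columns is NaN. A minimal sketch, assuming a small hypothetical frame, that adds the same groupby sum as the first approach; since the answer calls lreshape undocumented, its behavior may differ between pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical sample in the question's wide layout.
df = pd.DataFrame({
    "ID": [1, 2],
    "field_1": ["soccer", "volleyball"],
    "area_1": [500, 100],
    "field_2": ["basketball", np.nan],
    "area_2": [200, np.nan],
    "field_3": ["basketball", np.nan],
    "area_3": [50, np.nan],
})

n_pairs = 3  # hypothetical; the answer uses range(1, 31) for the real data
long = pd.lreshape(df, {"area": [f"area_{i}" for i in range(1, n_pairs + 1)],
                        "field": [f"field_{i}" for i in range(1, n_pairs + 1)]})

# lreshape leaves duplicate (ID, field) rows, so total them explicitly.
totals = long.groupby(["ID", "field"], as_index=False)["area"].sum()
print(totals)
```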
