Recalculate Mean Considering Each Count

July 30, 2022

If the dataframe is given as below:

index  yearmon   college  major  gpa   num
0      20140401  1        a      3.36  29
1      20180401  2        b      2.63  48
2      20160401

Solution 1:

Option 1: a lazy way, given that the number of students is an integer, is to repeat each row num times and take the plain mean per major:

(df.loc[df.index.repeat(df['num']), ['major', 'gpa']]
   .groupby('major').mean()
)
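To see why the repeat trick gives a count-weighted mean, here is a minimal sketch on a small made-up frame (the values below are hypothetical and are not the data from the question). Repeating each index label num times produces one row per student, so the ordinary mean over those rows is the weighted mean.

```python
import pandas as pd

# Hypothetical data, chosen only to make the arithmetic easy to follow.
df = pd.DataFrame({
    "major": ["a", "a", "b"],
    "gpa": [3.0, 4.0, 2.5],
    "num": [1, 3, 2],
})

# Each index label is repeated `num` times, so .loc returns one row per student.
expanded = df.loc[df.index.repeat(df["num"]), ["major", "gpa"]]
print(len(expanded))  # 1 + 3 + 2 rows

# The plain mean over the expanded rows equals the count-weighted mean,
# e.g. for major "a": (3.0 * 1 + 4.0 * 3) / (1 + 3).
result = expanded.groupby("major")["gpa"].mean()
print(result)
```

Note that the expanded frame has as many rows as there are students in total, so this approach can get memory-hungry when the counts are large.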

Option 2: use groupby().apply() with np.average (this assumes numpy is imported as np):

(df.groupby('major')
   .apply(lambda x: np.average(x['gpa'], weights=x['num']))
)

Option 3: the most verbose but best-performing approach is to compute each row's total score, then divide the per-major sum of totals by the per-major sum of counts:

df['total'] = df['gpa'] * df['num']
groups = df.groupby('major')
out = groups['total'].sum()/groups['num'].sum()

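Output of the first option (options 2 and 3 give the same values, as a Series rather than a one-column DataFrame):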

         gpa
major       
a      3.360
b      3.284
c      3.230
d      4.220
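As a sanity check, the three options can be run side by side. The sketch below uses a small hypothetical frame (not the question's data) and confirms that all three return the same per-major values. In the second option, selecting the gpa and num columns before apply is my own addition; it keeps the grouping column out of the lambda's input.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "major": ["a", "a", "b", "b"],
    "gpa": [3.0, 4.0, 2.0, 3.5],
    "num": [10, 30, 20, 20],
})

# Option 1: repeat each row `num` times, then take the plain mean.
opt1 = (df.loc[df.index.repeat(df["num"]), ["major", "gpa"]]
          .groupby("major")["gpa"].mean())

# Option 2: weighted average per group via np.average.
opt2 = (df.groupby("major")[["gpa", "num"]]
          .apply(lambda x: np.average(x["gpa"], weights=x["num"])))

# Option 3: sum of gpa * num divided by sum of num, per group.
totals = (df["gpa"] * df["num"]).groupby(df["major"]).sum()
counts = df.groupby("major")["num"].sum()
opt3 = totals / counts

# All three should agree up to floating-point rounding.
print(np.allclose(opt1.to_numpy(), opt2.to_numpy()))
print(np.allclose(opt2.to_numpy(), opt3.to_numpy()))
```

Unlike the original Option 3 snippet, this version does not add a total column to df, which may be preferable when the input frame should stay unchanged.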