
Can I Replace Nans With The Mode Of A Column In A Grouped Data Frame?

I have some data that looks like...

Year  Make   Model    Trim
2007  Acura  TL       Base
2010  Dodge  Avenger  SXT
2009  Dodge  Caliber  SXT
2008  Dodge  Caliber  SXT
20

Solution 1:

Use mode

In [215]: df
Out[215]:
   Year   Make    Model  Trim
0  2007  Acura       TL  Base
1  2010  Dodge  Avenger   SXT
2  2009  Dodge  Caliber   NaN
3  2008  Dodge  Caliber   SXT
4  2008  Dodge  Avenger   SXT

In [216]: df.Trim.fillna(df.Trim.mode()[0])
Out[216]:
0    Base
1     SXT
2     SXT
3     SXT
4     SXT
Name: Trim, dtype: object

Use inplace=True to actually modify df

In [217]: df.Trim.fillna(df.Trim.mode()[0], inplace=True)

In [218]: df
Out[218]:
   Year   Make    Model  Trim
0  2007  Acura       TL  Base
1  2010  Dodge  Avenger   SXT
2  2009  Dodge  Caliber   SXT
3  2008  Dodge  Caliber   SXT
4  2008  Dodge  Avenger   SXT
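
As a side note, some recent pandas releases may warn when an inplace method is called on a column selected from a DataFrame, and the change may not propagate back to df. A minimal sketch of the same fill written as a plain assignment, assuming a frame built to mirror the session above (the variable name most_common is only illustrative):

```python
import numpy as np
import pandas as pd

# Sample frame mirroring the session above.
df = pd.DataFrame({
    "Year": [2007, 2010, 2009, 2008, 2008],
    "Make": ["Acura", "Dodge", "Dodge", "Dodge", "Dodge"],
    "Model": ["TL", "Avenger", "Caliber", "Caliber", "Avenger"],
    "Trim": ["Base", "SXT", np.nan, "SXT", "SXT"],
})

# mode() returns a Series of the most frequent values; take the first one.
most_common = df["Trim"].mode().iloc[0]

# Assign the filled column back instead of relying on inplace=True.
df["Trim"] = df["Trim"].fillna(most_common)
print(df)
```

Assigning the result back behaves the same whether or not the column selection returns a copy.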

If you're working on groups

In [227]: df
Out[227]:
   Year   Make Model  Trim
0  2007  Acura    TL  Base
1  2007  Acura    TL   XLR
2  2007  Acura    TL   NaN
3  2007  Acura    TL  Base

In [228]: (df.groupby(['Year', 'Make', 'Model'])['Trim']
     ...:    .apply(lambda x: x.fillna(x.mode()[0])))
     ...:
Out[228]:
0    Base
1     XLR
2    Base
3    Base
Name: Trim, dtype: object
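
Depending on the pandas version, apply on a groupby may prepend the group keys to the index of the result, so the output may not line up with df as cleanly as shown above. A possible alternative, swapped in here and not part of the original answer, is transform, which returns a result indexed like the input. A short sketch on the same four rows:

```python
import numpy as np
import pandas as pd

# Same four rows as the grouped example above.
df = pd.DataFrame({
    "Year": [2007, 2007, 2007, 2007],
    "Make": ["Acura"] * 4,
    "Model": ["TL"] * 4,
    "Trim": ["Base", "XLR", np.nan, "Base"],
})

# transform applies the function per group and keeps the original index,
# so the result can be compared with or assigned to df directly.
filled = (
    df.groupby(["Year", "Make", "Model"])["Trim"]
      .transform(lambda s: s.fillna(s.mode().iloc[0]))
)
print(filled)
```

Here the single group has Base twice and XLR once, so the missing value in row 2 is filled with Base.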

Solution 2:

Use groupby, then mode. Note that mode returns a Series (there can be more than one mode), so you want to grab its first element. @John Galt deserves credit for this and gets my upvote.

I use assign to create a copy of df with an overwritten version of the Trim column.

df.assign(Trim=df.groupby(
        ['Year', 'Make', 'Model']
    ).Trim.apply(lambda x: x.fillna(x.mode()[0])))

   Year   Make Model  Trim
0  2007  Acura    TL  Base
1  2007  Acura    TL   XLR
2  2007  Acura    TL  Base
3  2007  Acura    TL  Base

You can overwrite the column directly with

df['Trim'] = df.groupby(
    ['Year', 'Make', 'Model']
).Trim.apply(
    lambda x: x.fillna(x.mode()[0])
)
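
One edge case worth keeping in mind: if every Trim value in a group is missing, x.mode() is empty and indexing its first element raises an error. A hedged sketch of one way to guard against that, using a hypothetical helper named fill_with_mode and made-up sample rows (neither comes from the original answers), with transform so the result keeps df's index:

```python
import numpy as np
import pandas as pd

def fill_with_mode(s: pd.Series) -> pd.Series:
    """Fill missing values with the group's first mode.

    Groups with no non-missing values are returned unchanged.
    """
    modes = s.mode()  # NaN is excluded by default
    if modes.empty:
        return s
    return s.fillna(modes.iloc[0])

# Made-up rows: the Dodge Caliber group has no known Trim at all.
df = pd.DataFrame({
    "Year": [2007, 2007, 2007, 2008, 2008],
    "Make": ["Acura", "Acura", "Acura", "Dodge", "Dodge"],
    "Model": ["TL", "TL", "TL", "Caliber", "Caliber"],
    "Trim": ["Base", np.nan, "Base", np.nan, np.nan],
})

df["Trim"] = (
    df.groupby(["Year", "Make", "Model"])["Trim"].transform(fill_with_mode)
)
print(df)
```

When a group has several equally frequent values, mode returns them sorted, so taking the first element picks the smallest one; whether that tie-break is acceptable depends on the data.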
