Can I Replace NaNs With the Mode of a Column in a Grouped Data Frame?
I have some data that looks like...

Year  Make   Model    Trim
2007  Acura  TL       Base
2010  Dodge  Avenger  SXT
2009  Dodge  Caliber  SXT
2008  Dodge  Caliber  SXT
20
Solution 1:
Use mode
In [215]: df
Out[215]:
   Year   Make    Model  Trim
0  2007  Acura       TL  Base
1  2010  Dodge  Avenger   SXT
2  2009  Dodge  Caliber   NaN
3  2008  Dodge  Caliber   SXT
4  2008  Dodge  Avenger   SXT

In [216]: df.Trim.fillna(df.Trim.mode()[0])
Out[216]:
0    Base
1     SXT
2     SXT
3     SXT
4     SXT
Name: Trim, dtype: object
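Use inplace=True to actually set the values. On recent pandas versions an in-place call on a selected column may not update df, so assigning the result back with df['Trim'] = df['Trim'].fillna(...) is the more reliable form.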
In [217]: df.Trim.fillna(df.Trim.mode()[0], inplace=True)

In [218]: df
Out[218]:
   Year   Make    Model  Trim
0  2007  Acura       TL  Base
1  2010  Dodge  Avenger   SXT
2  2009  Dodge  Caliber   SXT
3  2008  Dodge  Caliber   SXT
4  2008  Dodge  Avenger   SXT
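The whole-column fill can also be written without inplace=True by assigning the result back. The following is a minimal sketch, assuming the five-row frame from the session above; the variable name most_common is only illustrative.

```python
import numpy as np
import pandas as pd

# Same five rows as in the session above, with one missing Trim value.
df = pd.DataFrame({
    "Year": [2007, 2010, 2009, 2008, 2008],
    "Make": ["Acura", "Dodge", "Dodge", "Dodge", "Dodge"],
    "Model": ["TL", "Avenger", "Caliber", "Caliber", "Avenger"],
    "Trim": ["Base", "SXT", np.nan, "SXT", "SXT"],
})

# mode() ignores NaN by default and returns a Series; take its first entry.
most_common = df["Trim"].mode()[0]

# Assign the filled column back instead of relying on inplace=True.
df["Trim"] = df["Trim"].fillna(most_common)
print(df)
```

Assigning back avoids depending on whether df.Trim hands out a view or a copy of the column.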
If you're working on groups
In [227]: df
Out[227]:
   Year   Make Model  Trim
0  2007  Acura    TL  Base
1  2007  Acura    TL   XLR
2  2007  Acura    TL   NaN
3  2007  Acura    TL  Base

In [228]: (df.groupby(['Year', 'Make', 'Model'])['Trim']
     ...:    .apply(lambda x: x.fillna(x.mode()[0])))
Out[228]:
0    Base
1     XLR
2    Base
3    Base
Name: Trim, dtype: object
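Two things can trip up the grouped version: x.mode()[0] raises when a group contains only NaN (its mode is empty), and on pandas 2.0 and later apply adds the group keys to the result's index. The sketch below is one possible way around both, using transform and a small helper. The helper name fill_with_group_mode and the extra all-NaN Dodge row are hypothetical additions for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

def fill_with_group_mode(s):
    """Fill NaNs in one group's values with that group's mode, if it has one."""
    modes = s.mode()           # empty when every value in the group is NaN
    if modes.empty:
        return s               # nothing to fill with; leave the group as-is
    return s.fillna(modes.iloc[0])

df = pd.DataFrame({
    "Year": [2007, 2007, 2007, 2007, 2009],
    "Make": ["Acura", "Acura", "Acura", "Acura", "Dodge"],
    "Model": ["TL", "TL", "TL", "TL", "Caliber"],
    # The last row is a hypothetical group whose only Trim value is missing.
    "Trim": ["Base", "XLR", np.nan, "Base", np.nan],
})

# transform returns a Series aligned to df's original index.
filled = df.groupby(["Year", "Make", "Model"])["Trim"].transform(fill_with_group_mode)
print(filled)
```

Groups with no known Trim stay NaN here; whether to fall back to the overall mode instead is a separate choice.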
Solution 2:
Use groupby and then mode. Note that mode returns a Series (there can be more than one most-frequent value), so you want to grab the first element of it. @John Galt deserves credit for this and gets my upvote.

I use assign to create a copy of df with an overwritten version of the Trim column.
df.assign(Trim=df.groupby(
    ['Year', 'Make', 'Model']
).Trim.apply(lambda x: x.fillna(x.mode()[0])))

   Year   Make Model  Trim
0  2007  Acura    TL  Base
1  2007  Acura    TL   XLR
2  2007  Acura    TL  Base
3  2007  Acura    TL  Base
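You can overwrite the column directly. Passing group_keys=False keeps the original row index on pandas 2.0 and later, where apply would otherwise add the group keys to the index and the assignment would no longer align:

df['Trim'] = df.groupby(
    ['Year', 'Make', 'Model'], group_keys=False
).Trim.apply(
    lambda x: x.fillna(x.mode()[0])
)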
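Because mode returns every value that ties for most frequent, taking the first element is a choice worth being aware of. A small sketch with made-up values (the Series trims is hypothetical) shows what comes back when two trims are equally common; pandas documents the modes as returned in sorted order.

```python
import numpy as np
import pandas as pd

# Hypothetical group where "SXT" and "Base" each appear twice.
trims = pd.Series(["SXT", "Base", np.nan, "SXT", "Base"])

modes = trims.mode()      # NaN is excluded by default (dropna=True)
print(modes.tolist())     # both tied values

first = modes.iloc[0]     # the value that x.mode()[0] would pick
print(first)
```

If a different tie-breaking rule is needed, the full modes Series is available to choose from.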