Cumulative Sum Using 2 Columns
I am trying to create a column that holds a cumulative sum based on 2 columns. Please see the example of what I am trying to do: the data has the columns index, lodgement_year, words, and sum, plus the cumulative column I want to add.
Solution 1:
You are almost there, Ian! The cumsum() method calculates the cumulative sum of a pandas column. You are looking for that applied to the rows grouped by words. Therefore:
In [303]: df_2['cumsum'] = df_2.groupby(['words'])['sum'].cumsum()

In [304]: df_2
Out[304]:
   index  lodgement_year      words  sum  cum_sum  cumsum
0      0            2000        the   14       14      14
1      1            2000  australia   10       10      10
2      2            2000       word   12       12      12
3      3            2000      brand    8        8       8
4      4            2000      fresh    5        5       5
5      5            2001        the    8       22      22
6      6            2001  australia    3       13      13
7      7            2001     banana    1        1       1
8      8            2001      brand    7       15      15
9      9            2001      fresh    1        6       6
Please comment if this fails on your bigger data set, and we'll work on a possibly more accurate version of this.
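For anyone who wants to run the grouped cumulative sum end to end, here is a minimal sketch. The frame is rebuilt by hand from the output shown above, so the way df_2 is constructed is an assumption about the data, not the asker's actual loading code.

```python
import pandas as pd

# Sample frame rebuilt from the output above (assumed; the original loading code is not shown).
df_2 = pd.DataFrame({
    "lodgement_year": [2000] * 5 + [2001] * 5,
    "words": ["the", "australia", "word", "brand", "fresh",
              "the", "australia", "banana", "brand", "fresh"],
    "sum": [14, 10, 12, 8, 5, 8, 3, 1, 7, 1],
})

# Running total of 'sum' within each word, accumulated in row order.
df_2["cum_sum"] = df_2.groupby("words")["sum"].cumsum()
print(df_2)
```

Note that cumsum() accumulates in the order the rows appear, so if the data is not already ordered by lodgement_year it is probably worth sorting on that column first.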
Solution 2:
If we only need to consider the column 'words', we can loop through its unique values and compute the cumulative sum for each word separately:
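import pandas as pd

for unique_words in df_2.words.unique():
    if 'cum_sum' not in df_2:
        # First word: create the column; rows for other words start out as NaN.
        df_2['cum_sum'] = df_2.loc[df_2['words'] == unique_words]['sum'].cumsum()
    else:
        # Later words: fill in their rows without touching the ones already set.
        df_2.update(pd.DataFrame({'cum_sum': df_2.loc[df_2['words'] == unique_words]['sum'].cumsum()}))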
The above results in:
>>> print(df_2)
   lodgement_year  sum      words  cum_sum
0            2000   14        the     14.0
1            2000   10  australia     10.0
2            2000   12       word     12.0
3            2000    8      brand      8.0
4            2000    5      fresh      5.0
5            2001    8        the     22.0
6            2001    3  australia     13.0
7            2001    1     banana      1.0
8            2001    7      brand     15.0
9            2001    1      fresh      6.0
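The loop produces float values such as 14.0 because the first assignment only covers the rows of one word, so every other row is NaN for a while and the column becomes float64. A rough sketch of casting back to integers once the loop has filled every row, again assuming a hand-built copy of the sample data (the name partial is just an illustrative helper):

```python
import pandas as pd

# Sample frame rebuilt from the printed output (assumed, not the asker's real data).
df_2 = pd.DataFrame({
    "lodgement_year": [2000] * 5 + [2001] * 5,
    "sum": [14, 10, 12, 8, 5, 8, 3, 1, 7, 1],
    "words": ["the", "australia", "word", "brand", "fresh",
              "the", "australia", "banana", "brand", "fresh"],
})

for unique_word in df_2["words"].unique():
    # Cumulative sum over the rows of this word only; keeps the original row labels.
    partial = df_2.loc[df_2["words"] == unique_word, "sum"].cumsum()
    if "cum_sum" not in df_2:
        # Rows belonging to other words become NaN here, which forces float64.
        df_2["cum_sum"] = partial
    else:
        # update() aligns on the index and overwrites only the non-NaN values.
        df_2.update(pd.DataFrame({"cum_sum": partial}))

# Every row has been filled by now, so the cast to int does not hit any NaN.
df_2["cum_sum"] = df_2["cum_sum"].astype(int)

# Sanity check against the grouped version from Solution 1.
matches = df_2["cum_sum"].tolist() == df_2.groupby("words")["sum"].cumsum().tolist()
print(df_2)
print(matches)
```

On a large frame the explicit loop will likely be slower than the grouped call, since it scans the words column once per unique word.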