
Cumulative Sum Using 2 Columns

I am trying to create a column that holds a cumulative sum based on 2 columns. Please see the example of what I am trying to do. The columns are: index, lodgement_year, words, sum, cum.

Solution 1:

You are almost there, Ian!

The cumsum() method calculates the cumulative sum of a pandas column. You want that applied separately to each group of rows that share the same value in 'words'. Therefore:

In [303]: df_2['cumsum'] = df_2.groupby(['words'])['sum'].cumsum()

In [304]: df_2
Out[304]:
   index  lodgement_year      words  sum  cum_sum  cumsum
0      0            2000        the   14       14      14
1      1            2000  australia   10       10      10
2      2            2000       word   12       12      12
3      3            2000      brand    8        8       8
4      4            2000      fresh    5        5       5
5      5            2001        the    8       22      22
6      6            2001  australia    3       13      13
7      7            2001     banana    1        1       1
8      8            2001      brand    7       15      15
9      9            2001      fresh    1        6       6

Please comment if this fails on your bigger data set, and we'll work on a possibly more accurate version of this.
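For anyone who wants to try this without the original data, here is a minimal, self-contained sketch. The DataFrame is rebuilt by hand from the table above (the redundant 'index' column is left out), so treat the construction as an assumption about how df_2 looks rather than the asker's actual loading code.

```python
import pandas as pd

# Sample data typed in from the table above (an assumption about df_2's shape).
df_2 = pd.DataFrame({
    "lodgement_year": [2000] * 5 + [2001] * 5,
    "words": ["the", "australia", "word", "brand", "fresh",
              "the", "australia", "banana", "brand", "fresh"],
    "sum": [14, 10, 12, 8, 5, 8, 3, 1, 7, 1],
})

# Running total of 'sum' within each word, following the current row order.
df_2["cumsum"] = df_2.groupby("words")["sum"].cumsum()

print(df_2)
```

Note that the running total follows the order in which the rows appear, so if the frame is not already sorted by lodgement_year, sorting it first (for example with sort_values) is probably what you want.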

Solution 2:

If we only need to consider the column 'words', another option is to loop through its unique values and compute the cumulative sum for each word separately:

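import pandas as pd

for unique_words in df_2['words'].unique():
    # Cumulative sum over the rows for this word only (original index is kept).
    partial = df_2.loc[df_2['words'] == unique_words, 'sum'].cumsum()
    if 'cum_sum' not in df_2:
        # First word: create the column; rows for other words start as NaN.
        df_2['cum_sum'] = partial
    else:
        df_2.update(pd.DataFrame({'cum_sum': partial}))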

The above results in:

>>> print(df_2)
   lodgement_year  sum      words  cum_sum
0            2000   14        the     14.0
1            2000   10  australia     10.0
2            2000   12       word     12.0
3            2000    8      brand      8.0
4            2000    5      fresh      5.0
5            2001    8        the     22.0
6            2001    3  australia     13.0
7            2001    1     banana      1.0
8            2001    7      brand     15.0
9            2001    1      fresh      6.0
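The values come out as floats (14.0 rather than 14) because the first assignment leaves NaN in every row that belongs to a different word, which forces a float column. The sketch below, again using hand-typed sample data as an assumption, runs the loop, checks that it agrees with the groupby approach from Solution 1, and casts the result back to integers.

```python
import pandas as pd

# Sample data typed in from the output above (an assumption about df_2's shape).
df_2 = pd.DataFrame({
    "lodgement_year": [2000] * 5 + [2001] * 5,
    "sum": [14, 10, 12, 8, 5, 8, 3, 1, 7, 1],
    "words": ["the", "australia", "word", "brand", "fresh",
              "the", "australia", "banana", "brand", "fresh"],
})

# Loop version: one cumulative sum per unique word.
for unique_words in df_2["words"].unique():
    partial = df_2.loc[df_2["words"] == unique_words, "sum"].cumsum()
    if "cum_sum" not in df_2:
        df_2["cum_sum"] = partial  # other rows become NaN, so dtype is float
    else:
        df_2.update(pd.DataFrame({"cum_sum": partial}))

loop_dtype = df_2["cum_sum"].dtype

# groupby version from Solution 1, kept separate for comparison.
grouped = df_2.groupby("words")["sum"].cumsum()
same_values = (df_2["cum_sum"] == grouped).all()

# Every row has been filled by now, so casting back to int is safe.
df_2["cum_sum"] = df_2["cum_sum"].astype(int)

print(loop_dtype, same_values)
print(df_2)
```

Both approaches give the same numbers on this sample; the groupby version does it in one pass and keeps the integer dtype.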
