
Enumerate Each Row For Each Group In A Dataframe

In pandas, how can I add a new column which enumerates rows based on a given grouping? For instance, assume the following DataFrame: import pandas as pd import numpy as np a_list

Solution 1:

There's cumcount, for precisely this case:

df['col_c'] = df.groupby('col_a').cumcount()

As it says in the docs:

Number each item in each group from 0 to the length of that group - 1.
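As a minimal sketch of cumcount in action: the question's DataFrame is cut off above, so the sample data here is an assumption, reconstructed from the col_a and col_b values that appear in the outputs on this page.

```python
import pandas as pd

# Assumed sample data (the original DataFrame is truncated in the question);
# the values mirror the col_a / col_b pairs shown in the outputs.
df = pd.DataFrame({
    "col_a": list("ABCAACBBAC"),
    "col_b": range(10),
})

# Number the rows within each col_a group, starting at 0,
# in the order the rows currently appear in the frame.
df["col_c"] = df.groupby("col_a").cumcount()

print(df)
```

No sorting is needed beforehand, and adding 1 to the result gives a counter that starts at 1.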


Original answer (before cumcount was defined).

You could create a helper function to do this:

def add_col_c(x):
    x['col_c'] = np.arange(len(x))
    return x

First sort by column col_a:

In [11]: df.sort_values('col_a', inplace=True)

then apply this function across each group:

In [12]: g = df.groupby('col_a', as_index=False)

In [13]: g.apply(add_col_c)
Out[13]:
  col_a  col_b  col_c
3     A      3      0
8     A      8      1
0     A      0      2
4     A      4      3
6     B      6      0
1     B      1      1
7     B      7      2
9     C      9      0
2     C      2      1
5     C      5      2

To number the rows 1, 2, ... instead, you could use np.arange(1, len(x) + 1).
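The helper-function idea can also be sketched with transform, which passes the function one column per group rather than the whole sub-frame. This is a swapped-in call, not the apply used in the original answer; number_rows and the sample data are hypothetical, with the data reconstructed from the outputs on this page.

```python
import numpy as np
import pandas as pd

# Assumed sample data matching the col_a / col_b values in the outputs.
df = pd.DataFrame({"col_a": list("ABCAACBBAC"), "col_b": range(10)})


def number_rows(group, start=0):
    """Hypothetical helper: return start, start + 1, ... for one group's rows."""
    return np.arange(start, start + len(group))


# A stable sort keeps the original row order inside each group.
df = df.sort_values("col_a", kind="stable")

# transform calls the helper once per group and aligns the result on the index.
df["col_c"] = df.groupby("col_a")["col_b"].transform(
    lambda s: number_rows(s, start=1)
)

print(df)
```

Because the sort is stable, rows within a group keep their original order, so the numbering may differ from the output shown above, where the sort reordered rows inside group A.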

Solution 2:

The given answers both involve calling a Python function for each group, and if you have many groups a vectorized approach should be faster (I haven't checked).

Here is my pure numpy suggestion:

In [5]: df.sort_values(['col_a', 'col_b'], inplace=True, ascending=[False, False])
In [6]: sizes = df.groupby('col_a', sort=False).size().values
In [7]: df['col_c'] = np.arange(sizes.sum()) - np.repeat(sizes.cumsum() - sizes, sizes)
In [8]: print(df)
  col_a  col_b  col_c
9     C      9      0
5     C      5      1
2     C      2      2
7     B      7      0
6     B      6      1
1     B      1      2
8     A      8      0
4     A      4      1
3     A      3      2
0     A      0      3
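The arithmetic behind this approach may be easier to follow as a standalone script. This is a sketch under the same assumption as the outputs above: the sample data is reconstructed from the printed col_a and col_b values, and the intermediate name starts is introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Assumed sample data matching the col_a / col_b values in the outputs.
df = pd.DataFrame({"col_a": list("ABCAACBBAC"), "col_b": range(10)})

# Sorting makes each group's rows contiguous, which the arithmetic relies on.
df = df.sort_values(["col_a", "col_b"], ascending=[False, False])

# Group sizes in order of appearance (sort=False keeps C, B, A here).
sizes = df.groupby("col_a", sort=False).size().to_numpy()

# Position at which each group starts, repeated once per row of that group.
starts = np.repeat(sizes.cumsum() - sizes, sizes)

# Global row position minus the group's start gives the within-group counter.
df["col_c"] = np.arange(len(df)) - starts

print(df)
```

With sizes of [3, 3, 4], the group starts are [0, 3, 6], so subtracting the repeated starts from 0..9 restarts the counter at each group boundary.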

Solution 3:

You could define your own function to deal with that:

In [58]: def func(x):
   ....:     x['col_c'] = x['col_a'].argsort() + 1 
   ....:     return x
   ....: 

In [59]: df.groupby('col_a').apply(func)
Out[59]: 
  col_a  col_b  col_c
0     A      0      1
3     A      3      2
4     A      4      3
8     A      8      4
1     B      1      1
6     B      6      2
7     B      7      3
2     C      2      1
5     C      5      2
9     C      9      3
