Enumerate Each Row For Each Group In A Dataframe
In pandas, how can I add a new column which enumerates rows based on a given grouping? For instance, assume the following DataFrame:
import pandas as pd
import numpy as np
a_list
Solution 1:
There's cumcount, for precisely this case:
df['col_c'] = df.groupby('col_a').cumcount()
As it says in the docs:
Number each item in each group from 0 to the length of that group - 1.
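As a minimal sketch of cumcount in action, assuming a small frame with the same column names as the outputs on this page (the data below is made up for illustration, not the question's own):

```python
import pandas as pd

# Hypothetical sample data; only the column names come from this page.
df = pd.DataFrame({
    "col_a": ["A", "B", "C", "A", "A", "C", "B", "B", "A", "C"],
    "col_b": list(range(10)),
})

# cumcount numbers the rows of each group in their existing order, starting at 0.
df["col_c"] = df.groupby("col_a").cumcount()
print(df)
```

No sorting is needed here: the numbering follows whatever row order the frame already has.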
Original answer (before cumcount was defined).
You could create a helper function to do this:
def add_col_c(x):
    x['col_c'] = np.arange(len(x))
    return x
First sort by column col_a:
In [11]: df.sort_values('col_a', inplace=True)
then apply this function across each group:
In [12]: g = df.groupby('col_a', as_index=False)
In [13]: g.apply(add_col_c)
Out[13]:
  col_a  col_b  col_c
3     A      3      0
8     A      8      1
0     A      0      2
4     A      4      3
6     B      6      0
1     B      1      1
7     B      7      2
9     C      9      0
2     C      2      1
5     C      5      2
To number from 1 instead (1, 2, ...), you could use np.arange(1, len(x) + 1).
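The helper-function idea can also be sketched with transform rather than apply, which avoids mutating each group in place. This is a swapped-in variation, not the answer's original code; the name number_from_one and the sample data are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from this page.
df = pd.DataFrame({
    "col_a": ["A", "B", "C", "A", "A", "C", "B", "B", "A", "C"],
    "col_b": list(range(10)),
})

def number_from_one(group):
    # Hypothetical helper: positions 1..n for a group of n rows.
    return np.arange(1, len(group) + 1)

# transform calls the helper once per group and aligns the results
# back to the original index of df.
df["col_c"] = df.groupby("col_a")["col_b"].transform(number_from_one)
print(df)
```

Like the apply version, this still calls a Python function once per group, so it is mainly useful when the per-group logic is more involved than a plain count.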
Solution 2:
The given answers both involve calling a Python function for each group; if you have many groups, a vectorized approach should be faster (I haven't checked).
Here is my pure numpy suggestion:
In [5]: df.sort_values(['col_a', 'col_b'], inplace=True, ascending=[False, False])
In [6]: sizes = df.groupby('col_a', sort=False).size().values
In [7]: df['col_c'] = np.arange(sizes.sum()) - np.repeat(sizes.cumsum() - sizes, sizes)
In [8]: print(df)
  col_a  col_b  col_c
9     C      9      0
5     C      5      1
2     C      2      2
7     B      7      0
6     B      6      1
1     B      1      2
8     A      8      0
4     A      4      1
3     A      3      2
0     A      0      3
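The arithmetic in the one-liner is easier to follow when split into steps. The sketch below assumes a made-up frame with the same column names, and the intermediate name starts is introduced here only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from this page.
df = pd.DataFrame({
    "col_a": ["A", "B", "C", "A", "A", "C", "B", "B", "A", "C"],
    "col_b": list(range(10)),
})

# Rows of each group must be contiguous, so sort first.
df = df.sort_values(["col_a", "col_b"], ascending=[False, False])

# Group sizes in order of appearance (sort=False keeps the C, B, A order).
sizes = df.groupby("col_a", sort=False).size().to_numpy()

# Position at which each group starts: cumulative size minus own size.
starts = sizes.cumsum() - sizes

# Global row position minus the start of the row's group gives the
# position within the group.
df["col_c"] = np.arange(sizes.sum()) - np.repeat(starts, sizes)
print(df)
```

With this data, sizes is [3, 3, 4] and starts is [0, 3, 6], so the repeated starts are subtracted from 0..9 to restart the count at every group boundary. The approach relies on the frame being sorted so that each group occupies one contiguous block.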
Solution 3:
You could define your own function to deal with that:
In [58]: def func(x):
   ....:     x['col_c'] = x['col_a'].argsort() + 1
   ....:     return x
   ....:
In [59]: df.groupby('col_a').apply(func)
Out[59]:
  col_a  col_b  col_c
0     A      0      1
3     A      3      2
4     A      4      3
8     A      8      4
1     B      1      1
6     B      6      2
7     B      7      3
2     C      2      1
5     C      5      2
9     C      9      3
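Within a group every col_a value is the same, so the argsort above depends on how ties are ordered, and the default sort kind is not guaranteed to be stable. If the intent is to number rows by col_b within each group, one possible alternative is a grouped rank. This is a different technique from the answer's, shown on made-up data:

```python
import pandas as pd

# Hypothetical sample data; only the column names come from this page.
df = pd.DataFrame({
    "col_a": ["A", "B", "C", "A", "A", "C", "B", "B", "A", "C"],
    "col_b": list(range(10)),
})

# rank(method="first") orders rows by col_b inside each group and breaks
# ties by order of appearance; it returns floats, hence the cast to int.
df["col_c"] = df.groupby("col_a")["col_b"].rank(method="first").astype(int)
print(df.sort_values(["col_a", "col_b"]))
```

Unlike cumcount, this numbers rows by the value of col_b rather than by their current position, so the frame does not need to be sorted beforehand.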