
Python Pandas Count Most Frequent Occurrences

This is my sample data frame with data about orders: import pandas as pd my_dict = { 'status' : ['a', 'b', 'c', 'd', 'a','a', 'd'], 'city' : ['London','Berlin','Paris',

Solution 1:

Here are some more "pandas" ways of doing the same thing:

To get the top three components:

# A list comprehension is usually faster than the .str accessor in pandas
pd.concat([pd.Series(i.split(',')) for i in df.components]).value_counts().head(3)
# Or, using "pure" pandas methods
df.components.str.split(',', expand=True).stack().value_counts().head(3)

Output:

 e05    6
 e04    5
 d02    4
dtype: int64
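
The leading space in the labels above comes from splitting on ',' alone. As a minimal sketch of the same count with that whitespace stripped, assuming a `components` column of comma-separated strings (the sample rows below are made up for illustration and are not the question's data):

```python
import pandas as pd

# Hypothetical sample data; the real df comes from the question.
df = pd.DataFrame({
    "components": [
        "a02, e04, e05",
        "e04, e05",
        "e04, e05, d02",
        "a02, e05",
    ],
})

top = (
    df["components"]
    .str.split(",")   # one list of components per order
    .explode()        # one row per component
    .str.strip()      # drop the space left after each comma
    .value_counts()   # count occurrences, most frequent first
    .head(3)
)
print(top)
```

Stripping before counting keeps ' e05' and 'e05' from being tallied as two different components.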

Next, find cohorts, i.e. groups of n components reported together (here n=3):

from itertools import combinations
n=3
pd.concat([pd.Series(list(combinations(i.split(','), n))) for i in df.components])\
  .value_counts().head(3)

Output:

( с43,  e04,  e05)    4
(a02,  e04,  e05)     3
( с43,  d07,  e05)    3
dtype: int64
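
Note that combinations keeps the components in the order they appear in each row, so the same group listed in a different order would be counted separately. A possible variant, sketched here with made-up rows and a hypothetical helper named `sorted_combos`, sorts and strips the components first:

```python
from itertools import combinations

import pandas as pd

# Hypothetical sample data; the real df comes from the question.
df = pd.DataFrame({
    "components": ["e05, e04, a02", "a02, e04, e05", "e04, e05"],
})

def sorted_combos(cell, size):
    """Return all size-length combinations of a row's components, in sorted order."""
    parts = sorted(part.strip() for part in cell.split(","))
    return list(combinations(parts, size))

size = 2  # pairs are used here only to keep the example small
pairs = pd.Series(
    [combo for cell in df["components"] for combo in sorted_combos(cell, size)]
).value_counts()
print(pairs.head(3))
```

With the sort in place, the first two rows contribute to the same pairs even though their components are listed in a different order.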

Solution 2:

@ScottBoston's answer shows vectorized (hence probably faster) ways to achieve this. The alternative below counts with collections.Counter and itertools instead.

Top occurring

from collections import Counter
from itertools import chain

n = 3
individual_components = chain.from_iterable(df['components'].str.split(', '))
counter = Counter(individual_components)
print(counter.most_common(n))
# [('e05', 6), ('e04', 5), ('a02', 4)]

Top-n co-occurring

Note that I'm using n twice: once for the size of the co-occurrence and once for the "top-n" part. You can, of course, use two different variables.

from collections import Counter
from itertools import combinations

n = 3
individual_components = []
for components in df['components']:
    order_components = sorted(components.split(', '))
    individual_components.extend(combinations(order_components, n))
counter = Counter(individual_components)
print(counter.most_common(n))
# [(('e04', 'e05', 'с43'), 4), (('a02', 'b08', 'e05'), 3), (('a02', 'd07', 'e05'), 3)]
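
If you prefer to keep the two meanings of n apart, one way to do it is to wrap the loop in a function. This is only a sketch: the function name `top_cooccurring`, its parameters `size` and `top`, and the sample rows are all made up for illustration.

```python
from collections import Counter
from itertools import combinations

def top_cooccurring(rows, size, top):
    """Return the `top` most common groups of `size` components.

    rows: iterable of comma-separated component strings (assumed format).
    """
    counter = Counter()
    for row in rows:
        # Sort so that the same group always produces the same tuple.
        parts = sorted(part.strip() for part in row.split(","))
        counter.update(combinations(parts, size))
    return counter.most_common(top)

# Hypothetical rows; with a DataFrame you could pass df['components'] instead.
rows = ["a02, e04, e05", "e05, e04", "e04, e05, d07"]
result = top_cooccurring(rows, size=2, top=1)
print(result)
```

Updating the Counter row by row also avoids building the full list of combinations in memory first.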
