Splitting Groupby() In Pandas Into Smaller Groups And Combining Them
Solution 1:
You can create a helper column via GroupBy
+ cumcount
to count the occurrence of each city.
Then use dict
+ tuple
with another GroupBy
to create a dictionary of dataframes, each one containing exactly one occurence of each city.
# add index column giving count of city occurrence
df['index'] = df.groupby('city').cumcount()
# create dictionary of dataframes
d = dict(tuple(df.groupby('index')))
Result:
print(d)
{0: city temperature windspeed event index
day
2017-01-01 newyork 32 6 Rain 0
2017-01-01 mumbai 90 5 Sunny 0
2017-01-01 paris 45 20 Sunny 0,
1: city temperature windspeed event index
day
2017-01-02 newyork 36 7 Sunny 1
2017-01-02 mumbai 85 12 Fog 1
2017-01-02 paris 50 13 Cloudy 1,
2: city temperature windspeed event index
day
2017-01-03 newyork 28 12 Snow 2
2017-01-03 mumbai 87 15 Fog 2
2017-01-03 paris 54 8 Cloudy 2,
3: city temperature windspeed event index
day
2017-01-04 newyork 33 7 Sunny 3
2017-01-04 mumbai 92 5 Rain 3
2017-01-04 paris 42 10 Cloudy 3}
You can then extract individual "groups" via d[0]
, d[1]
, d[2]
, d[3]
. In this particular case, you may wish to group by dates instead, i.e.
d = {df_.index[0]: df_ for _, df_ in df.groupby('index')}
Solution 2:
This is my approach to this. First sort your dataframe by day
and city
:
df = df.sort_values(by=['day', 'city'])
Next find an even split of 4 groups for your dataframe - if the split is not even then the last group will get the remaining:
n = int(len(df)/4)
groups_n = np.cumsum([0, n, n, n, len(df)-(3*n)])
print(groups_n)
OUT >> array([ 0, 6, 12, 18, 25], dtype=int32)
groups_n
is the start
and end
for each group. So Group 1
I will take df.iloc[0:6]
and Group 4
I will take df.iloc[18:25]
.
So your final dictionary, d
, of the 4 group split of your dataframe will be:
d = {}
for i in range(4):
d[i+1] = df.iloc[groups_n[i]:groups_n[i+1]]
Example Outputs:Group 1 (
d[1]
)
city temperature windspeed event
day
2017-01-01 mumbai 90 5 Sunny
2017-01-01 new york 32 6 Rain
2017-01-01 paris 45 20 Sunny
2017-01-02 mumbai 85 12 Fog
2017-01-02 new york 36 7 Sunny
2017-01-02 paris 50 13 Cloudy
Group 4: (
d[4]
)
city temperature windspeed event
day
2017-01-07 mumbai 85 9 Sunny
2017-01-07 new york 27 12 Rain
2017-01-07 paris 40 14 Rain
2017-01-08 mumbai 89 8 Rain
2017-01-08 new york 23 7 Rain
2017-01-08 paris 42 15 Cloudy
2017-01-09 paris 53 8 Sunny
Post a Comment for "Splitting Groupby() In Pandas Into Smaller Groups And Combining Them"