Skip to content Skip to sidebar Skip to footer

Splitting Groupby() In Pandas Into Smaller Groups And Combining Them

city temperature windspeed event day 2017-01-01 new york 32 6 Rain

Solution 1:

You can create a helper column via GroupBy + cumcount to count the occurrence of each city.

Then use dict + tuple with another GroupBy to create a dictionary of dataframes, each one containing exactly one occurence of each city.

# add index column giving count of city occurrence
df['index'] = df.groupby('city').cumcount()

# create dictionary of dataframes
d = dict(tuple(df.groupby('index')))

Result:

print(d)

{0:                city  temperature  windspeed  event  index
 day                                                      
 2017-01-01  newyork           32          6   Rain      0
 2017-01-01   mumbai           90          5  Sunny      0
 2017-01-01    paris           45         20  Sunny      0,
 1:                city  temperature  windspeed   event  index
 day                                                       
 2017-01-02  newyork           36          7   Sunny      1
 2017-01-02   mumbai           85         12     Fog      1
 2017-01-02    paris           50         13  Cloudy      1,
 2:                city  temperature  windspeed   event  index
 day                                                       
 2017-01-03  newyork           28         12    Snow      2
 2017-01-03   mumbai           87         15     Fog      2
 2017-01-03    paris           54          8  Cloudy      2,
 3:                city  temperature  windspeed   event  index
 day                                                       
 2017-01-04  newyork           33          7   Sunny      3
 2017-01-04   mumbai           92          5    Rain      3
 2017-01-04    paris           42         10  Cloudy      3}

You can then extract individual "groups" via d[0], d[1], d[2], d[3]. In this particular case, you may wish to group by dates instead, i.e.

d = {df_.index[0]: df_ for _, df_ in df.groupby('index')}

Solution 2:

This is my approach to this. First sort your dataframe by day and city:

df = df.sort_values(by=['day', 'city'])

Next find an even split of 4 groups for your dataframe - if the split is not even then the last group will get the remaining:

n = int(len(df)/4)
groups_n = np.cumsum([0, n, n, n, len(df)-(3*n)])
print(groups_n)
OUT >> array([ 0,  6, 12, 18, 25], dtype=int32)

groups_n is the start and end for each group. So Group 1 I will take df.iloc[0:6] and Group 4 I will take df.iloc[18:25].

So your final dictionary, d, of the 4 group split of your dataframe will be:

d = {}
for i in range(4):
    d[i+1] = df.iloc[groups_n[i]:groups_n[i+1]]

Example Outputs:Group 1 (d[1])

            city      temperature  windspeed    event
day             
2017-01-01  mumbai    90           5            Sunny
2017-01-01  new york  32           6            Rain
2017-01-01  paris     45           20           Sunny
2017-01-02  mumbai    85           12           Fog
2017-01-02  new york  36           7            Sunny
2017-01-02  paris     50           13           Cloudy

Group 4: (d[4])

            city       temperature  windspeed   event
day             
2017-01-07  mumbai     85           9           Sunny
2017-01-07  new york   27           12          Rain
2017-01-07  paris      40           14          Rain
2017-01-08  mumbai     89           8           Rain
2017-01-08  new york   23           7           Rain
2017-01-08  paris      42           15          Cloudy
2017-01-09  paris      53           8           Sunny

Post a Comment for "Splitting Groupby() In Pandas Into Smaller Groups And Combining Them"