How To Sort Dataset Based On 2 Custom Lists?
Solution 1:
You can convert your data to ordered categoricals. This brings advantages such as lower memory consumption and faster sorting once the categories are in place:
df4['category'] = pd.Categorical(df4['category'],
                                 categories=list(custom_dict),
                                 ordered=True)
df4['segment'] = pd.Categorical(df4['segment'],
                                categories=list(custom_dict2),
                                ordered=True)
df4 = df4.sort_values(by=['category', 'segment'])
NB: you don't need a dictionary for this solution; a list with the categories in the desired order is sufficient (list(custom_dict) simply takes the dictionary keys in insertion order).
Example output (from random input):
    category      segment
11        HC      Offline
14        HC      Offline
1         HC       Online
5         HC  Independent
16        HC  Independent
19    AMG HC      Offline
15    AMG HC       Online
3          S      Offline
4          S      Offline
0          S  Independent
12         S  Independent
9      AMG S       Online
10     AMG S  Independent
2        HCA      Offline
6        HCA      Offline
17       HCA  Independent
7    AMG HCA      Offline
13   AMG HCA  Independent
8        MUP       Online
18   AMG MUP  Independent
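For anyone who wants to try this end to end, here is a minimal, self-contained sketch of the same idea. The two order lists and the small frame are made-up stand-ins (the question's custom_dict and custom_dict2 are not reproduced here), so treat the names and values as assumptions:

```python
import pandas as pd

# Hypothetical orderings, standing in for the keys of custom_dict / custom_dict2.
category_order = ["HC", "AMG HC", "S", "AMG S"]
segment_order = ["Offline", "Online", "Independent"]

# Small made-up frame, not the data from the question.
df = pd.DataFrame({
    "category": ["AMG S", "HC", "S", "AMG HC", "HC"],
    "segment": ["Online", "Independent", "Offline", "Offline", "Offline"],
})

# Convert both columns to ordered categoricals using the custom orders.
df["category"] = pd.Categorical(df["category"], categories=category_order, ordered=True)
df["segment"] = pd.Categorical(df["segment"], categories=segment_order, ordered=True)

# Sorting now follows the category order instead of alphabetical order.
result = df.sort_values(by=["category", "segment"])
print(result)
```

One thing to watch for: any value that is missing from the categories list becomes NaN during the conversion, so it is worth checking that the lists cover every value in the column.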
Solution 2:
The key function in sort_values is applied to each of the columns category and segment individually, but your code tries to map both columns in one go, which produces the incorrect output. To fix it, we can create an additional order dictionary that maps each column name to its corresponding mapping dictionary:
order = {'category': custom_dict, 'segment': custom_dict2}
df4.sort_values(['category', 'segment'], key=lambda s: s.map(order[s.name]))
   category      segment  ytd2020  ytd2021  Evolution
0        HC      Offline   101142   105726       4.5%
1    AMG HC      Offline    38541    39463       2.4%
2         S      Offline    55653    57537       2.1%
5         S       Online    99301    97283      -2.0%
6     AMG S      Offline    80212    87011      12.4%
7       HCA  Independent    95731   119289      24.6%
4       MUP       Online    84921    90310       8.2%
3   AMG MUP      Offline    19561    21402       4.3%
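As a runnable illustration of the per-column key, here is a small sketch. The rank dictionaries and the frame are invented for the example (they only play the role of custom_dict and custom_dict2), and the key argument of sort_values needs pandas 1.1 or later:

```python
import pandas as pd

# Hypothetical rank dictionaries, standing in for custom_dict / custom_dict2.
category_rank = {"HC": 0, "AMG HC": 1, "S": 2, "AMG S": 3}
segment_rank = {"Offline": 0, "Online": 1, "Independent": 2}

# Look up the right mapping by column name.
order = {"category": category_rank, "segment": segment_rank}

# Small made-up frame, not the data from the question.
df = pd.DataFrame({
    "category": ["AMG S", "HC", "S", "AMG HC", "HC"],
    "segment": ["Online", "Independent", "Offline", "Offline", "Offline"],
})

# The key is called once per sort column; s.name tells us which column it is.
result = df.sort_values(["category", "segment"],
                        key=lambda s: s.map(order[s.name]))
print(result)
```

Unlike the categorical approach, this leaves the column values as plain strings and only uses the ranks while sorting.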
Solution 3:
As I understand from your previous question, your primary goal is to place every AMG XXX immediately after the corresponding XXX category for the same segment, e.g. AMG HC immediately following HC in the Offline segment, and AMG S immediately following S in the same Offline segment.
As such, you can use:
idx = (df4[['category', 'segment']].apply(tuple, axis=1)
          .sort_values(key=lambda x: x.str[0].map(custom_dict) * 10 + x.str[1].map(custom_dict2))
          .index
      )
df5 = df4.loc[idx]
Note that I have placed the sorted dataframe into a new name df5 instead of overwriting df4. You are free to change this df5 to df4 if it is more convenient to you.
Result:
print(df5)
   category      segment  ytd2020  ytd2021  Evolution
0        HC      Offline   101142   105726       4.5%
1    AMG HC      Offline    38541    39463       2.4%
2         S      Offline    55653    57537       2.1%
5         S       Online    99301    97283      -2.0%
6     AMG S      Offline    80212    87011      12.4%
7       HCA  Independent    95731   119289      24.6%
4       MUP       Online    84921    90310       8.2%
3   AMG MUP      Offline    19561    21402       4.3%
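To make the arithmetic behind the combined key easier to see, here is a simplified sketch that builds the same kind of key directly from the two columns, without the tuple step. The rank dictionaries and the frame are assumptions for illustration only. The key is category rank times a multiplier plus segment rank, which orders rows by category first and segment second as long as every segment rank is smaller than the multiplier (the fixed 10 above relies on that too):

```python
import pandas as pd

# Hypothetical rank dictionaries, standing in for custom_dict / custom_dict2.
category_rank = {"HC": 0, "AMG HC": 1, "S": 2, "AMG S": 3}
segment_rank = {"Offline": 0, "Online": 1, "Independent": 2}

# Small made-up frame, not the data from the question.
df = pd.DataFrame({
    "category": ["AMG S", "HC", "S", "AMG HC", "HC"],
    "segment": ["Online", "Independent", "Offline", "Offline", "Offline"],
})

# Use the number of segment ranks as the multiplier so keys cannot collide.
multiplier = len(segment_rank)
key = df["category"].map(category_rank) * multiplier + df["segment"].map(segment_rank)

# Reorder the rows by the combined key; a stable sort keeps ties in input order.
result = df.iloc[key.to_numpy().argsort(kind="stable")]
print(key.tolist())  # [10, 2, 6, 3, 0]
print(result)
```

Deriving the multiplier from the dictionary size is a design choice of this sketch; it avoids silently wrong ordering if the segment list ever grows past ten entries.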