How To Sort Dataset Based On 2 Custom Lists?
Solution 1:
You can set your data as ordered categories. This has several advantages: lower memory consumption, and faster sorting once the categories are in place:
import pandas as pd
# list(custom_dict) returns the dict's keys in insertion order
df4['category'] = pd.Categorical(df4['category'],
                                 categories=list(custom_dict),
                                 ordered=True)
df4['segment'] = pd.Categorical(df4['segment'],
                                categories=list(custom_dict2),
                                ordered=True)
df4 = df4.sort_values(by=['category', 'segment'])
NB. You don't need a dictionary for this solution; a list of the categories in the desired order is sufficient.
Example output (from random input):
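A minimal, self-contained sketch of the list-based variant. The two order lists are assumptions read off the order shown in the example outputs in this thread, and the rows are reconstructed from the category/segment columns of the output tables; substitute the keys of your own custom_dict / custom_dict2.

```python
import pandas as pd

# Assumed orderings, taken from the order the example outputs show;
# substitute the keys of your own custom_dict / custom_dict2
category_order = ["HC", "AMG HC", "S", "AMG S", "HCA", "AMG HCA", "MUP", "AMG MUP"]
segment_order = ["Offline", "Online", "Independent"]

# Rows reconstructed from the category/segment columns of the output tables
df4 = pd.DataFrame({
    "category": ["HC", "AMG HC", "S", "AMG MUP", "MUP", "S", "AMG S", "HCA"],
    "segment": ["Offline", "Offline", "Offline", "Offline",
                "Online", "Online", "Offline", "Independent"],
})

# Ordered categoricals sort by position in `categories`, not alphabetically.
# Values missing from the list become NaN and sort last by default.
df4["category"] = pd.Categorical(df4["category"], categories=category_order, ordered=True)
df4["segment"] = pd.Categorical(df4["segment"], categories=segment_order, ordered=True)

df4 = df4.sort_values(by=["category", "segment"])
print(df4)
```

Because the order is stored in the column dtype, later calls to sort_values on these columns reuse it without needing a key function.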
category segment
11 HC Offline
14 HC Offline
1 HC Online
5 HC Independent
16 HC Independent
19 AMG HC Offline
15 AMG HC Online
3 S Offline
4 S Offline
0 S Independent
12 S Independent
9 AMG S Online
10 AMG S Independent
2 HCA Offline
6 HCA Offline
17 HCA Independent
7 AMG HCA Offline
13 AMG HCA Independent
8 MUP Online
18 AMG MUP Independent
Solution 2:
The key function in sort_values is applied to each of the columns category and segment individually (each one is passed in as a separate Series), but you are instead trying to map both columns in one go, which produces the incorrect output. To fix your code, we can create an additional order dictionary that maps each column name to its corresponding mapping dictionary:
order = {'category': custom_dict, 'segment': custom_dict2}
df4.sort_values(['category', 'segment'], key=lambda s: s.map(order[s.name]))
category segment ytd2020 ytd2021 Evolution
0 HC Offline 101142 105726 4.5%
1 AMG HC Offline 38541 39463 2.4%
2 S Offline 55653 57537 2.1%
5 S Online 99301 97283 -2.0%
6 AMG S Offline 80212 87011 12.4%
7 HCA Independent 95731 119289 24.6%
4 MUP Online 84921 90310 8.2%
3 AMG MUP Offline 19561 21402 4.3%
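A self-contained sketch of the fix above. The rank values in custom_dict / custom_dict2 are assumptions (0-based positions matching the order in the outputs shown), since the question's actual dictionaries are not reproduced here; the rows are reconstructed from the output table.

```python
import pandas as pd

# Assumed rank dictionaries (value -> position); the question's actual
# custom_dict / custom_dict2 are not reproduced in this answer
custom_dict = {"HC": 0, "AMG HC": 1, "S": 2, "AMG S": 3,
               "HCA": 4, "AMG HCA": 5, "MUP": 6, "AMG MUP": 7}
custom_dict2 = {"Offline": 0, "Online": 1, "Independent": 2}

# Rows reconstructed from the output table (category/segment only)
df4 = pd.DataFrame({
    "category": ["HC", "AMG HC", "S", "AMG MUP", "MUP", "S", "AMG S", "HCA"],
    "segment": ["Offline", "Offline", "Offline", "Offline",
                "Online", "Online", "Offline", "Independent"],
})

# Column name -> the dictionary that ranks that column's values
order = {"category": custom_dict, "segment": custom_dict2}

# sort_values passes each column in `by` to the key separately, as a Series
# whose .name is the column label, so each one is mapped with its own dict
result = df4.sort_values(["category", "segment"], key=lambda s: s.map(order[s.name]))
print(result)
```

Any value missing from its dictionary maps to NaN, and sort_values places NaN last by default, so unexpected labels end up at the bottom rather than raising an error.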
Solution 3:
As I understand it from your previous question, your primary goal is to place every AMG XXX category immediately after the corresponding XXX category within the same segment, e.g. AMG HC immediately after HC in the Offline segment, and AMG S immediately after S in the same Offline segment.
As such, you can use:
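idx = (df4[['category', 'segment']].apply(tuple, axis=1)
       # combined rank = category rank * 10 + segment rank;
       # this assumes every segment rank is a non-negative integer below 10
       .sort_values(key=lambda x: x.str[0].map(custom_dict) * 10 + x.str[1].map(custom_dict2))
       .index
      )
df5 = df4.loc[idx]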
Note that I have placed the sorted dataframe into a new name, df5, instead of overwriting df4. You are free to change df5 to df4 if that is more convenient for you.
Result:
print(df5)
category segment ytd2020 ytd2021 Evolution
0 HC Offline 101142 105726 4.5%
1 AMG HC Offline 38541 39463 2.4%
2 S Offline 55653 57537 2.1%
5 S Online 99301 97283 -2.0%
6 AMG S Offline 80212 87011 12.4%
7 HCA Independent 95731 119289 24.6%
4 MUP Online 84921 90310 8.2%
3 AMG MUP Offline 19561 21402 4.3%
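The combined-rank idea in this answer only works while the multiplier is larger than every segment rank; with the hard-coded 10, an eleventh segment would start overlapping the next category. A sketch of a variant that skips the intermediate tuples and derives the multiplier from the segment dictionary instead. The dictionary values are assumed 0-based ranks (the question's real dictionaries are not shown), and the rows are reconstructed from the result table.

```python
import pandas as pd

# Assumed rank dictionaries (non-negative integers); not the question's originals
custom_dict = {"HC": 0, "AMG HC": 1, "S": 2, "AMG S": 3,
               "HCA": 4, "AMG HCA": 5, "MUP": 6, "AMG MUP": 7}
custom_dict2 = {"Offline": 0, "Online": 1, "Independent": 2}

# Rows reconstructed from the result table (category/segment only)
df4 = pd.DataFrame({
    "category": ["HC", "AMG HC", "S", "AMG MUP", "MUP", "S", "AMG S", "HCA"],
    "segment": ["Offline", "Offline", "Offline", "Offline",
                "Online", "Online", "Offline", "Independent"],
})

# Multiplier one larger than the largest segment rank, so the segment rank
# can only break ties within a category and never outweighs it
base = max(custom_dict2.values()) + 1
composite = df4["category"].map(custom_dict) * base + df4["segment"].map(custom_dict2)

# Sort the combined rank, then reorder the rows by the resulting index
df5 = df4.loc[composite.sort_values(kind="stable").index]
print(df5)
```

The row order matches the result table above; the only change is that the multiplier adapts to the number of segments instead of being fixed at 10.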