
How To Sort Dataset Based On 2 Custom Lists?

I want to sort df based on 2 custom dictionaries: custom_dict = {'HC': 0, 'AMG HC': 1, 'S': 2, 'AMG S': 3, 'HCA':4, 'AMG HCA':5, 'MUP':6, 'AMG MUP':7} custom_dict2 =

Solution 1:

You can convert your columns to ordered categoricals. Besides fixing the sort order, this lowers memory consumption and speeds up sorting once the categories are in place:

df4['category'] = pd.Categorical(df4['category'],
                                 categories=list(custom_dict),
                                 ordered=True)
df4['segment'] = pd.Categorical(df4['segment'],
                                categories=list(custom_dict2),
                                ordered=True)

df4 = df4.sort_values(by=['category','segment'])

NB. You don't need a dictionary for this solution; a list with the categories in the desired order is sufficient. list(custom_dict) only takes the keys, in insertion order.

Example output (from random input):

   category      segment
11       HC      Offline
14       HC      Offline
1        HC       Online
5        HC  Independent
16       HC  Independent
19   AMG HC      Offline
15   AMG HC       Online
3         S      Offline
4         S      Offline
0         S  Independent
12        S  Independent
9     AMG S       Online
10    AMG S  Independent
2       HCA      Offline
6       HCA      Offline
17      HCA  Independent
7   AMG HCA      Offline
13  AMG HCA  Independent
8       MUP       Online
18  AMG MUP  Independent
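
For reference, here is a minimal self-contained sketch of the same approach on a small made-up frame. The question's custom_dict2 is cut off, so the segment order used here (Offline, Online, Independent) is only an assumption inferred from the example output above. The names df, category_order and segment_order are illustrative.

```python
import pandas as pd

# Desired orderings as plain lists (no dictionary needed).
category_order = ['HC', 'AMG HC', 'S', 'AMG S',
                  'HCA', 'AMG HCA', 'MUP', 'AMG MUP']
# Assumption: the real custom_dict2 is truncated in the question.
segment_order = ['Offline', 'Online', 'Independent']

# Small made-up input.
df = pd.DataFrame({
    'category': ['S', 'AMG HC', 'HC', 'HC', 'AMG S'],
    'segment': ['Independent', 'Online', 'Online', 'Offline', 'Offline'],
})

# Ordered categoricals sort by category position, not alphabetically.
df['category'] = pd.Categorical(df['category'],
                                categories=category_order, ordered=True)
df['segment'] = pd.Categorical(df['segment'],
                               categories=segment_order, ordered=True)

sorted_df = df.sort_values(by=['category', 'segment'])
print(sorted_df)
```

One thing to watch for: any value that is not listed in categories becomes NaN when the column is converted, so the lists should cover every value present in the data.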

Solution 2:

The key function in sort_values is applied to the columns category and segment individually, but your code tries to map both columns in one go, which produces the incorrect output. To fix it, create an additional order dictionary that maps each column name to its corresponding mapping dictionary:

order = {'category': custom_dict, 'segment': custom_dict2}
df4.sort_values(['category', 'segment'], key=lambda s: s.map(order[s.name]))

  category      segment ytd2020 ytd2021 Evolution
0       HC      Offline  101142  105726      4.5%
1   AMG HC      Offline   38541   39463      2.4%
2        S      Offline   55653   57537      2.1%
5        S       Online   99301   97283     -2.0%
6    AMG S      Offline   80212   87011     12.4%
7      HCA  Independent   95731  119289     24.6%
4      MUP       Online   84921   90310      8.2%
3  AMG MUP      Offline   19561   21402      4.3%
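
A self-contained sketch of the same idea, showing that the key function receives one column at a time and that s.name says which one. The contents of custom_dict2 are an assumption (the question is truncated), and the frame df and the helper rank_column are made up for illustration.

```python
import pandas as pd

custom_dict = {'HC': 0, 'AMG HC': 1, 'S': 2, 'AMG S': 3,
               'HCA': 4, 'AMG HCA': 5, 'MUP': 6, 'AMG MUP': 7}
# Assumption: the real custom_dict2 is truncated in the question.
custom_dict2 = {'Offline': 0, 'Online': 1, 'Independent': 2}

# Small made-up input.
df = pd.DataFrame({
    'category': ['S', 'HC', 'S', 'AMG HC'],
    'segment': ['Online', 'Offline', 'Offline', 'Offline'],
})

order = {'category': custom_dict, 'segment': custom_dict2}

def rank_column(s):
    # Called once per column listed in `by`; s.name is that column's name,
    # so each column is mapped with its own dictionary.
    return s.map(order[s.name])

# sort_values returns a new frame; df itself is left unchanged.
result = df.sort_values(['category', 'segment'], key=rank_column)
print(result)
```

Note that the key argument of sort_values requires pandas 1.1 or later, and that the result has to be assigned if you want to keep it.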

Solution 3:

As I understand from your previous question, your primary goal is to place each AMG XXX immediately after the corresponding XXX category within the same segment, e.g. AMG HC immediately after HC in the Offline segment, and AMG S immediately after S in the same Offline segment.

As such, you can use:

idx = (df4[['category','segment']].apply(tuple, axis=1)
                                  .sort_values(key=lambda x: x.str[0].map(custom_dict) * 10 + x.str[1].map(custom_dict2))
                                  .index
      )

df5 = df4.loc[idx]

Note that I have assigned the sorted dataframe to a new name, df5, instead of overwriting df4. You are free to change df5 to df4 if that is more convenient for you.

Result:

print(df5)


  category      segment ytd2020 ytd2021 Evolution
0       HC      Offline  101142  105726      4.5%
1   AMG HC      Offline   38541   39463      2.4%
2        S      Offline   55653   57537      2.1%
5        S       Online   99301   97283     -2.0%
6    AMG S      Offline   80212   87011     12.4%
7      HCA  Independent   95731  119289     24.6%
4      MUP       Online   84921   90310      8.2%
3  AMG MUP      Offline   19561   21402      4.3%
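
The composite key in this solution (category rank * 10 + segment rank) only orders correctly while every value in custom_dict2 is below 10; otherwise a large segment rank could spill into the next category. As a hedged variant rather than the answer's exact code, the sketch below computes the same kind of key directly from the two columns and uses len(custom_dict2) as the multiplier. The contents of custom_dict2 are assumed (the question is truncated), and df and rank are illustrative names.

```python
import pandas as pd

custom_dict = {'HC': 0, 'AMG HC': 1, 'S': 2, 'AMG S': 3,
               'HCA': 4, 'AMG HCA': 5, 'MUP': 6, 'AMG MUP': 7}
# Assumption: the real custom_dict2 is truncated in the question.
custom_dict2 = {'Offline': 0, 'Online': 1, 'Independent': 2}

# Small made-up input.
df = pd.DataFrame({
    'category': ['AMG S', 'HC', 'S', 'HC'],
    'segment': ['Offline', 'Online', 'Independent', 'Offline'],
})

# One integer per row: the category rank is the major key, the segment
# rank the minor key. Valid because the segment ranks are 0..len-1.
rank = (df['category'].map(custom_dict) * len(custom_dict2)
        + df['segment'].map(custom_dict2))

# Reorder the rows by the sorted key, keeping the original index labels.
df5 = df.loc[rank.sort_values(kind='stable').index]
print(df5)
```

With this input the ranks are 9, 1, 8 and 0, so the rows come out in index order 3, 1, 2, 0.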
