
Looking To Transform Continuous Variables Into Categorical

Sample Data:

        id  val1  val2  val3  val4  val5  val6  val7
   ///+8yr   NaN   0.0   2.0   NaN     1     3    23
   ///1vjh   NaN   NaN   NaN   NaN   NaN     7    62
    ///4wu     3

Solution 1:

IIUC, you have two questions. The first, replacing values larger than 5 with 'Larger than 5', can be done with boolean indexing. The second, grouping val7 into ranges, can be done with pd.cut().

DEMO:

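import numpy as np
import pandas as pd

# Copy the sample table first, then read it from the clipboard
d = pd.read_clipboard()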
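pd.read_clipboard() only works if the sample table has just been copied. As a clipboard-free sketch, the same frame can be built by hand. The values below are read back from the outputs printed in this answer, so treat them as a reconstruction rather than the asker's exact data.

```python
import numpy as np
import pandas as pd

# Reconstructed from the outputs shown in this answer (an assumption,
# not the asker's original data source).
d = pd.DataFrame({
    'id':   ['///+8yr', '///1vjh', '///4wu'],
    'val1': [np.nan, np.nan, 3.0],
    'val2': [0.0, np.nan, np.nan],
    'val3': [2.0, np.nan, 6.0],
    'val4': [np.nan, np.nan, np.nan],
    'val5': [1.0, np.nan, 7.0],
    'val6': [3, 7, 8],
    'val7': [23, 62, 180],
})
print(d)
```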

Part 1

Obtaining the values that do not satisfy the larger-than-5 criterion:

rest = d.loc[:,'val1':'val6'][~(d.loc[:,'val1':'val6']>5)]
rest


   val1  val2  val3  val4  val5  val6
0   NaN   0.0   2.0   NaN   1.0   3.0
1   NaN   NaN   NaN   NaN   NaN   NaN
2   3.0   NaN   NaN   NaN   NaN   NaN

Obtaining the values larger than 5:

larger_than_5=d.loc[:,'val1':'val6'][d.loc[:,'val1':'val6']>5]
print(larger_than_5)

   val1  val2  val3  val4  val5  val6
0   NaN   NaN   NaN   NaN   NaN   NaN
1   NaN   NaN   NaN   NaN   NaN   7.0
2   NaN   NaN   6.0   NaN   7.0   8.0

Updating with your logic,

larger_than_5[larger_than_5.notnull()]='Larger than 5'
print(larger_than_5)

   val1  val2           val3  val4           val5           val6
0   NaN   NaN            NaN   NaN            NaN            NaN
1   NaN   NaN            NaN   NaN            NaN  Larger than 5
2   NaN   NaN  Larger than 5   NaN  Larger than 5  Larger than 5

Updating rest with the logic,

rest.update(larger_than_5)
print(rest)

   val1  val2           val3  val4           val5           val6
0   NaN   0.0              2   NaN              1              3
1   NaN   NaN            NaN   NaN            NaN  Larger than 5
2   3.0   NaN  Larger than 5   NaN  Larger than 5  Larger than 5

Replacing the values of the original df with the updated values from Part 1:

d.loc[:,'val1':'val6']= rest
print(d)

        id  val1  val2           val3  val4           val5           val6  \
0  ///+8yr   NaN   0.0              2   NaN              1              3
1  ///1vjh   NaN   NaN            NaN   NaN            NaN  Larger than 5
2   ///4wu   3.0   NaN  Larger than 5   NaN  Larger than 5  Larger than 5

   val7
0    23
1    62
2   180
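The split/update/reassign steps of Part 1 can also be collapsed into a single DataFrame.mask() call. This is an alternative sketch, not the method used above; the small frame and its values are hypothetical stand-ins. Casting to object first lets strings and numbers share a column, which matters because recent pandas versions may warn when strings are written into float columns.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in frame, for illustration only.
df = pd.DataFrame({
    'id':   ['a', 'b', 'c'],
    'val1': [np.nan, 2.0, 6.0],
    'val2': [7.0, np.nan, 1.0],
})

cols = ['val1', 'val2']
vals = df[cols]

# Comparisons with NaN are False, so missing cells are left untouched.
# Cast to object so the replacement string can sit next to numbers.
df[cols] = vals.astype(object).mask(vals > 5, 'Larger than 5')
print(df)
```

With the frame from this answer, the column list would be the val1 to val6 columns instead of the two used here.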

Part 2

Obtaining bins

bins = np.arange(0, d['val7'].max()+1, 30)
bins

array([  0,  30,  60,  90, 120, 150, 180], dtype=int64)
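Note that np.arange(0, max + 1, 30) only reaches the maximum when the maximum is a multiple of 30, as 180 happens to be here. As a sketch of a more general variant (the step of 30 comes from the answer, the value 170 is a made-up example), extending the stop by one full step guarantees that the last edge is at or above the maximum:

```python
import numpy as np
import pandas as pd

vals = pd.Series([23, 62, 170])   # hypothetical data; max is not a multiple of 30
step = 30

# Recipe from above: the last edge is 150, so 170 falls outside every bin.
short_bins = np.arange(0, vals.max() + 1, step)

# Extending the stop by a full step makes the last edge >= max (here 180).
bins = np.arange(0, vals.max() + step, step)

print(pd.cut(vals, short_bins).isna().tolist())  # [False, False, True]
print(pd.cut(vals, bins).isna().tolist())        # [False, False, False]
```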

Creating a new series

val7_groups = pd.cut(d['val7'], bins)
val7_groups

0       (0, 30]
1      (60, 90]
2    (150, 180]
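The intervals produced by pd.cut() are closed on the right by default, so a value exactly equal to the lowest edge (0 here) would land in no bin and come back as NaN. The sample val7 values do not hit that case, but as a sketch with a hypothetical series that does, include_lowest=True makes the first interval include its left edge (right=False is another option, which closes intervals on the left instead):

```python
import pandas as pd

vals = pd.Series([0, 23, 62, 180])   # hypothetical values, including an exact 0
bins = [0, 30, 60, 90, 120, 150, 180]

default_cut = pd.cut(vals, bins)                       # 0 is outside (0, 30]
with_lowest = pd.cut(vals, bins, include_lowest=True)  # first bin also takes 0

print(default_cut.isna().tolist())   # [True, False, False, False]
print(with_lowest.isna().tolist())   # [False, False, False, False]
```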

Adding that to the dataframe

d['val7_groups']= val7_groups
print(d)

        id  val1  val2           val3  val4           val5           val6  \
0  ///+8yr   NaN   0.0              2   NaN              1              3
1  ///1vjh   NaN   NaN            NaN   NaN            NaN  Larger than 5
2   ///4wu   3.0   NaN  Larger than 5   NaN  Larger than 5  Larger than 5

   val7 val7_groups
0    23     (0, 30]
1    62    (60, 90]
2   180  (150, 180]

You can also set group labels by passing values to the labels parameter of pd.cut().
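A minimal sketch of the labels parameter follows. The label strings are made up for illustration and assume integer data; the only hard requirement is one label per interval, that is, len(bins) - 1 labels.

```python
import pandas as pd

vals = pd.Series([23, 62, 180])           # the val7 values from this answer
bins = [0, 30, 60, 90, 120, 150, 180]

# One label per interval (hypothetical names).
labels = ['0-30', '31-60', '61-90', '91-120', '121-150', '151-180']

groups = pd.cut(vals, bins, labels=labels)
print(groups.tolist())   # ['0-30', '61-90', '151-180']
```

Passing labels=False instead returns the integer position of each bin rather than a name.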
