Sort A Dataframe And Count A Value With Percentages

November 16, 2024

I have a DataFrame like this:

Kind  Status
1     True
2     False
3     True
2     False
2     True

I counted the kinds with df.Kind.sort_values() and got this: 1
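To try the snippets in the solutions, the sample data can be rebuilt roughly like this. It is a minimal sketch that assumes Kind holds integers and Status holds booleans, which the question suggests but does not state.

```python
import pandas as pd

# Sample data from the question.
# Assumption: Kind is an integer column and Status is a boolean column.
df = pd.DataFrame({
    "Kind": [1, 2, 3, 2, 2],
    "Status": [True, False, True, False, True],
})

print(df)
```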

Solution 1:

crosstab + div

Using pandas.crosstab:

res = pd.crosstab(df['Kind'], df['Status'])

res[['Pct False', 'Pct True']] = res.div(res.sum(axis=1), axis=0)

print(res)

Status  False  True  Pct False  Pct True
Kind                                    
1           0     1   0.000000  1.000000
2           2     1   0.666667  0.333333
3           0     1   0.000000  1.000000

In my opinion, this is the most natural way to display your data. Combining counts with percentages in a single series is not recommended.

crosstab + crosstab normalize

Alternatively, you can join a couple of crosstab results, one normalized, the other not.

res = pd.crosstab(df['Kind'], df['Status'])\
        .join(pd.crosstab(df['Kind'], df['Status'], normalize='index'), rsuffix='_pct')

print(res)

Status  False  True  False_pct  True_pct
Kind                                    
1           0     1   0.000000  1.000000
2           2     1   0.666667  0.333333
3           0     1   0.000000  1.000000

crosstab normalize only

If you are looking only for percentages, you can just use the normalize argument:

res = pd.crosstab(df['Kind'], df['Status'], normalize='index')

print(res)

Status     False     True 
Kind                      
1       0.000000  1.000000
2       0.666667  0.333333
3       0.000000  1.000000
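The normalized values above are fractions between 0 and 1. If a 0 to 100 scale is preferred, one option is to scale and round the result afterwards. The sketch below rebuilds the sample data itself so that it runs on its own; the rounding to one decimal place is an arbitrary choice, not something the original answer specifies.

```python
import pandas as pd

# Sample data from the question (dtypes are an assumption)
df = pd.DataFrame({
    "Kind": [1, 2, 3, 2, 2],
    "Status": [True, False, True, False, True],
})

# Row-wise fractions, scaled to percentages and rounded to one decimal
pct = (
    pd.crosstab(df["Kind"], df["Status"], normalize="index")
    .mul(100)
    .round(1)
)

print(pct)
```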

Solution 2:

Use groupby with size and unstack to pivot the counts:

df1 = df.groupby(['Kind','Status']).size().unstack(fill_value=0)
# alternative solution, slower on large data
# df1 = pd.crosstab(df['Kind'], df['Status'])
print(df1)
Status  False  True 
Kind                
1           0     1
2           2     1
3           0     1

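Then divide by the row sums and append the result to the original counts:

# DataFrame.append was removed in pandas 2.0; pd.concat stacks the rows instead
df = pd.concat([df1, df1.div(df1.sum(axis=1), axis=0)]).sort_index()
print(df)
Status     False     True 
Kind                      
1       0.000000  1.000000
1       0.000000  1.000000
2       2.000000  1.000000
2       0.666667  0.333333
3       0.000000  1.000000
3       0.000000  1.000000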

print(df.loc[2])
Status     False     True 
Kind                      
2       2.000000  1.000000
2       0.666667  0.333333
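Once the rows are stacked, each Kind appears twice in the index and nothing marks which row holds the counts and which holds the fractions. One possible way around this, sketched here, is to pass keys to pd.concat so that every row carries a label. The labels 'count' and 'pct' and the level name 'measure' are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Sample data from the question (dtypes are an assumption)
df = pd.DataFrame({
    "Kind": [1, 2, 3, 2, 2],
    "Status": [True, False, True, False, True],
})

df1 = df.groupby(["Kind", "Status"]).size().unstack(fill_value=0)
pct = df1.div(df1.sum(axis=1), axis=0)

# The dict keys become an outer index level named 'measure';
# swaplevel moves it inside Kind, and sort_index groups rows by Kind.
res = (
    pd.concat({"count": df1, "pct": pct}, names=["measure"])
    .swaplevel()
    .sort_index()
)

print(res.loc[2])
```

Selecting res.loc[2] then returns one row labelled count and one labelled pct, so the two are no longer ambiguous.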

To avoid converting the integer counts to floats, use join instead of stacking rows, and use add_prefix to keep the column names unique:

df = df1.join(df1.div(df1.sum(axis=1), axis=0).add_prefix('pct '))
print(df)
Status  False  True  pct False  pct True
Kind                                    
1           0     1   0.000000  1.000000
2           2     1   0.666667  0.333333
3           0     1   0.000000  1.000000

print(df.loc[[2]])
Status  False  True  pct False  pct True
Kind                                    
2           2     1   0.666667  0.333333

Solution 3:

You can simply use:

g = df.loc[df['Kind']==2].groupby(['Kind', 'Status']).size().unstack()
pd.concat([g,g.apply(lambda x: round(x / (x[False]+x[True]), 2), axis=1)])

Output:

Status  False  True
Kind               
2        2.00  1.00
2        0.67  0.33
