Sort A Dataframe And Count A Value With Percentages
I have a DataFrame like this:
Kind Status
1 True
2 False
3 True
2 False
2 True
I counted the kinds with df.Kind.sort_values() and got this: 1
Solution 1:
crosstab + div
Using pandas.crosstab:
res = pd.crosstab(df['Kind'], df['Status'])
res[['Pct False', 'Pct True']] = res.div(res.sum(axis=1), axis=0)
print(res)
Status False True Pct False Pct True
Kind
1 0 1 0.000000 1.000000
2 2 1 0.666667 0.333333
3 0 1 0.000000 1.000000
In my opinion, this is the most natural way to display your data. Combining counts with percentages in a single series is not recommended.
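For anyone who wants to run this, here is a minimal self-contained sketch. The DataFrame construction is an assumption reconstructed from the sample rows in the question; the rest follows the crosstab + div steps above.

```python
import pandas as pd

# Sample data reconstructed from the question (assumed to match the original frame).
df = pd.DataFrame({
    "Kind": [1, 2, 3, 2, 2],
    "Status": [True, False, True, False, True],
})

# Count each Status value per Kind.
res = pd.crosstab(df["Kind"], df["Status"])

# Divide every row by its own total and store the shares in two new columns.
res[["Pct False", "Pct True"]] = res.div(res.sum(axis=1), axis=0)
print(res)
```

The right-hand side is evaluated before the new columns are created, so the row totals only include the two count columns.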
crosstab + crosstab normalize
Alternatively, you can join a couple of crosstab results, one normalized, the other not.
res = pd.crosstab(df['Kind'], df['Status'])\
.join(pd.crosstab(df['Kind'], df['Status'], normalize='index'), rsuffix='_pct')
print(res)
Status False True False_pct True_pct
Kind
1 0 1 0.000000 1.000000
2 2 1 0.666667 0.333333
3 0 1 0.000000 1.000000
crosstab normalize only
If you are looking only for percentages, you can just use the normalize argument:
res = pd.crosstab(df['Kind'], df['Status'], normalize='index')
print(res)
Status False True
Kind
1 0.000000 1.000000
2 0.666667 0.333333
3 0.000000 1.000000
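As a quick check, the sketch below compares normalize='index' with the manual row division from the first variant, then scales the result to percentages. The sample DataFrame is reconstructed from the question, and the rounding to one decimal is only an illustrative choice.

```python
import pandas as pd

# Sample data reconstructed from the question (an assumption).
df = pd.DataFrame({
    "Kind": [1, 2, 3, 2, 2],
    "Status": [True, False, True, False, True],
})

# Manual version: counts divided by each row's total.
counts = pd.crosstab(df["Kind"], df["Status"])
manual = counts.div(counts.sum(axis=1), axis=0)

# Built-in version: let crosstab normalize each row.
normalized = pd.crosstab(df["Kind"], df["Status"], normalize="index")

# Largest absolute difference between the two results.
max_diff = (manual - normalized).abs().to_numpy().max()
print(max_diff)

# Scale the fractions to percentages, rounded to one decimal place.
pct = (normalized * 100).round(1)
print(pct)
```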
Solution 2:
Use groupby with size and unstack to pivot by counts:
df1 = df.groupby(['Kind','Status']).size().unstack(fill_value=0)
# alternative solution, slower on large data
# df1 = pd.crosstab(df['Kind'], df['Status'])
print (df1)
Status False True
Kind
1 0 1
2 2 1
3 0 1
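Then divide by the row sums and append the result to the original counts. DataFrame.append was removed in pandas 2.0, so pd.concat is used here:
df = pd.concat([df1, df1.div(df1.sum(axis=1), axis=0)]).sort_index()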
print (df)
Status False True
Kind
1 0.000000 1.000000
1 0.000000 1.000000
2 2.000000 1.000000
2 0.666667 0.333333
3 0.000000 1.000000
3 0.000000 1.000000
print (df.loc[2])
Status False True
Kind
2 2.000000 1.000000
2 0.666667 0.333333
But if you want to avoid converting the integers to floats, replace the appending step with join, and use add_prefix to keep the column names unique:
df = df1.join(df1.div(df1.sum(axis=1), axis=0).add_prefix('pct '))
print (df)
Status False True pct False pct True
Kind
1 0 1 0.000000 1.000000
2 2 1 0.666667 0.333333
3 0 1 0.000000 1.000000
print (df.loc[[2]])
Status False True pct False pct True
Kind
2 2 1 0.666667 0.333333
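The integer-to-float conversion mentioned above can be seen by comparing dtypes. This is a small sketch on the question's sample data (reconstructed, so an assumption); the names stacked and joined are only illustrative, and pd.concat stands in for the row-wise append.

```python
import pandas as pd

# Sample data reconstructed from the question (an assumption).
df = pd.DataFrame({
    "Kind": [1, 2, 3, 2, 2],
    "Status": [True, False, True, False, True],
})

# Counts per (Kind, Status), pivoted so each Status becomes a column.
df1 = df.groupby(["Kind", "Status"]).size().unstack(fill_value=0)
pct = df1.div(df1.sum(axis=1), axis=0)

# Stacking rows puts counts and fractions in the same columns,
# so the integer counts are upcast to float.
stacked = pd.concat([df1, pct]).sort_index()

# Joining columns keeps counts and fractions in separate columns,
# so the counts keep their integer dtype.
joined = df1.join(pct.add_prefix("pct "))

print(stacked.dtypes)
print(joined.dtypes)
```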
Solution 3:
You can simply use:
g = df.loc[df['Kind']==2].groupby(['Kind', 'Status']).size().unstack()
pd.concat([g,g.apply(lambda x: round(x / (x[False]+x[True]), 2), axis=1)])
Output:
Status False True
Kind
2 2.00 1.00
2 0.67 0.33
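The snippet above is hard-coded to Kind 2, and its lambda indexes both x[False] and x[True], so it would likely raise a KeyError for a kind that has only one status value (such as Kind 1 in the sample). A possible generalization is sketched below; counts_with_pct is a hypothetical helper name, the sample DataFrame is reconstructed from the question, and the row sum replaces the explicit x[False] + x[True].

```python
import pandas as pd


def counts_with_pct(df, kind):
    """Hypothetical helper: counts and rounded shares of Status for one Kind."""
    # Count each Status for the selected Kind and pivot Status into columns.
    g = (
        df.loc[df["Kind"] == kind]
        .groupby(["Kind", "Status"])
        .size()
        .unstack(fill_value=0)
    )
    # Divide by the row total instead of naming the columns explicitly,
    # so a Kind with a single Status value still works.
    share = g.div(g.sum(axis=1), axis=0).round(2)
    # Stack the count row on top of the share row.
    return pd.concat([g, share])


# Sample data reconstructed from the question (an assumption).
df = pd.DataFrame({
    "Kind": [1, 2, 3, 2, 2],
    "Status": [True, False, True, False, True],
})

print(counts_with_pct(df, 2))
print(counts_with_pct(df, 1))
```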