Get The Count Of Matching And Not Matching Columns Data In A Dataframe
I have two dataframes, like the following. This is the input csv data:
Document_ID  OFFSET  PredictedFeature
0            0       2000
0            8       2000
0
Solution 1:
One idea is to convert the boolean comparison into an integer column new with Series.astype (Series.view('i1') does the same on older pandas, but is deprecated in recent releases), and then aggregate column new with size and sum, passing a list of tuples to specify the new column names:
df1['new'] = (df1['PredictedFeature'] == df2['PredictedFeature']).astype('int8')
df = (df1.groupby("PredictedFeature")['new']
         .agg([('inputCsvOccured','size'), ('outputcsvmatched','sum')])
         .reset_index())
print (df)
   PredictedFeature  inputCsvOccured  outputcsvmatched
0              2000                2                 1
1              2100                3                 1
2              2200                3                 1
Pandas 0.25+ solution, using named aggregation:
df1['new'] = (df1['PredictedFeature'] == df2['PredictedFeature']).astype('int8')
df = (df1.groupby("PredictedFeature")
         .agg(inputCsvOccured=pd.NamedAgg(column='new', aggfunc='size'),
              outputcsvmatched=pd.NamedAgg(column='new', aggfunc='sum'))
         .reset_index())
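Since the title also asks for the rows that do not match, here is a minimal self-contained sketch of the named-aggregation approach with one extra column. The sample values are copied from the frames in Solution 2 (as integers, PredictedFeature only), and the column name outputcsvnotmatched is a hypothetical addition, not part of the original answer.

```python
import pandas as pd

# Sample values copied from the frames in Solution 2 (PredictedFeature only).
df1 = pd.DataFrame({"PredictedFeature": [2000, 2000, 2200, 2200, 2200, 2100, 2100, 2100]})
df2 = pd.DataFrame({"PredictedFeature": [2000, 2100, 2100, 2100, 2200, 2000, 2000, 2100]})

# 1 where both frames agree on the same row, 0 otherwise.
df1["new"] = (df1["PredictedFeature"] == df2["PredictedFeature"]).astype(int)

summary = (
    df1.groupby("PredictedFeature")
       .agg(inputCsvOccured=("new", "size"),    # rows per class in the input
            outputcsvmatched=("new", "sum"))    # rows per class that matched
       .reset_index()
)

# Hypothetical extra column: rows of each class that did not match.
summary["outputcsvnotmatched"] = summary["inputCsvOccured"] - summary["outputcsvmatched"]
print(summary)
```

The not-matched count is simply size minus sum, so no second comparison is needed.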
Solution 2:
You can do it using groupby, as below:
import pandas as pd

# Input csv data
df1_inputPredictedFeature_column = pd.DataFrame([['0', '0', '2000'], ['0', '8', '2000'], ['0', '16', '2200'], ['0', '23', '2200'], ['0', '30', '2200'], ['1', '0', '2100'], ['1', '5', '2100'], ['1', '7', '2100']], columns=('Document_ID', 'OFFSET', 'PredictedFeature'))
# Output csv data
df1_predictedFeature_column = pd.DataFrame([['0', '0', '2000'], ['0', '8', '2100'], ['0', '16', '2100'], ['0', '23', '2100'], ['0', '30', '2200'], ['1', '0', '2000'], ['1', '5', '2000'], ['1', '7', '2100']], columns=('Document_ID', 'OFFSET', 'PredictedFeature'))
# 1 where both frames agree on the same row, 0 otherwise (np.int was removed from NumPy, so use the built-in int)
df1_inputPredictedFeature_column['new'] = (df1_inputPredictedFeature_column['PredictedFeature'] == df1_predictedFeature_column['PredictedFeature']).astype(int)
# Count the rows per class and sum the match flags
result = df1_inputPredictedFeature_column.groupby("PredictedFeature").agg({"PredictedFeature":"count", "new":"sum"})
result.columns = ["inputCsvOccured", "outputcsvmatched"]
result.index.name = "predictedFeatureClass"
result.reset_index(inplace=True)
print(result)
Result:
  predictedFeatureClass  inputCsvOccured  outputcsvmatched
0                  2000                2                 1
1                  2100                3                 1
2                  2200                3                 1
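Both solutions compare the two columns with ==, which requires the two Series to carry identical index labels; otherwise pandas raises a ValueError. If the two CSV files are only known to have the same row order, a positional comparison may be safer. The helper below is a minimal sketch under that assumption; the name positional_match is hypothetical.

```python
import pandas as pd


def positional_match(left: pd.Series, right: pd.Series) -> pd.Series:
    """Return 1/0 flags comparing two Series row by row, ignoring index labels."""
    if len(left) != len(right):
        raise ValueError("both columns must have the same number of rows")
    # Comparing the underlying arrays sidesteps index alignment entirely.
    flags = left.to_numpy() == right.to_numpy()
    return pd.Series(flags.astype(int), index=left.index)


a = pd.Series(["2000", "2100", "2200"], index=[0, 1, 2])
b = pd.Series(["2000", "2000", "2200"], index=[10, 11, 12])  # different labels
print(positional_match(a, b).tolist())
```

The returned flags can be assigned to the new column and aggregated exactly as in the solutions above.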