
Comparing Data Frames And Getting The Differences With Python

I have two data frames, shown below, that hold the hours logged on a project by each resource. One holds the data from about 10 days ago and the other is as of today. I want to find ONLY the di

Solution 1:

To compute the differences, first set the index of each data frame to the PR No. and Resource columns, then stack the two data frames into one. Group by the index (the combination of PR No. and Resource) and take the difference within each group. In a group that contains two rows, diff leaves a NaN in the first row; those rows are not needed, so dropna removes them. A group with a single row (an entry present in only one data frame) is kept unchanged. Finally, call reset_index to restore PR No. and Resource as columns.
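To see why the dropna step is needed, here is a minimal sketch of what diff does inside a single two-row group. The Series below is a hypothetical stand-in for one group (old value first, new value second), not part of the original solution.

```python
import pandas as pd

# Hypothetical two-row group: the old value first, the new value second.
group = pd.Series([1.0, 11.0], name="hours")

# diff() subtracts the previous row from each row; the first row has no
# predecessor, so it becomes NaN.
diffed = group.diff()
print(diffed.tolist())

# dropna() discards the NaN placeholder and keeps only the change.
change = diffed.dropna()
print(change.tolist())
```

The surviving value is the new hours minus the old hours, which is the number the full solution reports for entries present in both data frames.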

# setup
import pandas as pd
data1 = [
    ["PN1", "Chris", 1],
    ["PN2", "Julie", 80],
    ["PN3", "John", 2.4],
    ["PN4", "Steve", 2]
]

data2 = [
    ["PN1", "Chris", 11],
    ["PN2", "Julie", 76],
    ["PN8", "John", 2.4],
    ["PN9", "Jonas", 2]
]

df1 = pd.DataFrame(data1, columns = ["PR No.", "Resource", "hours"])
df2 = pd.DataFrame(data2, columns = ["PR No.", "Resource", "hours"])

print(df1)
print(df2)

# solution
group_by_cols = ["PR No.", "Resource"]
indexed_by_group_cols_1 = df1.set_index(group_by_cols)
indexed_by_group_cols_2 = df2.set_index(group_by_cols)
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the frames the same way
appended = pd.concat([indexed_by_group_cols_1, indexed_by_group_cols_2])
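# group_keys=False keeps apply() from adding the group keys as an extra index level
grouped_by_index = appended.groupby(level=group_by_cols, group_keys=False)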

compare_diff = grouped_by_index.apply(lambda x: x.diff() if len(x) > 1 else x) \
    .dropna().reset_index()

print(compare_diff)

Output:

DF1:

  PR No. Resource  hours
0    PN1    Chris    1.0
1    PN2    Julie   80.0
2    PN3     John    2.4
3    PN4    Steve    2.0

DF2:

  PR No. Resource  hours
0    PN1    Chris   11.0
1    PN2    Julie   76.0
2    PN8     John    2.4
3    PN9    Jonas    2.0

Result:

  PR No. Resource  hours
0    PN1    Chris   10.0
1    PN2    Julie   -4.0
2    PN3     John    2.4
3    PN4    Steve    2.0
4    PN8     John    2.4
5    PN9    Jonas    2.0
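The result above does not say whether a row is a change in hours or an entry that exists in only one data frame. As an alternative sketch (a different technique from the groupby approach, not the original author's method), an outer merge with indicator=True keeps the old and new hours side by side and labels where each row came from. The names hours_old, hours_new and change are illustrative choices.

```python
import pandas as pd

cols = ["PR No.", "Resource", "hours"]
df1 = pd.DataFrame(
    [["PN1", "Chris", 1], ["PN2", "Julie", 80], ["PN3", "John", 2.4], ["PN4", "Steve", 2]],
    columns=cols,
)
df2 = pd.DataFrame(
    [["PN1", "Chris", 11], ["PN2", "Julie", 76], ["PN8", "John", 2.4], ["PN9", "Jonas", 2]],
    columns=cols,
)

keys = ["PR No.", "Resource"]

# Outer merge keeps every (PR No., Resource) pair from either frame.
# indicator=True adds a "_merge" column: "both", "left_only" or "right_only".
merged = df1.merge(
    df2, on=keys, how="outer", suffixes=("_old", "_new"), indicator=True
)

# Change in hours; NaN where the pair exists in only one frame.
merged["change"] = merged["hours_new"] - merged["hours_old"]

print(merged)
```

Rows marked "both" carry the change in hours, while "left_only" and "right_only" rows identify entries that were dropped or added between the two snapshots.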
