Modify Function To Return Dataframe With Specified Values
With reference to the test data and the function I use to identify values within thresh of each other, can anyone please help me modify the function to show the desired output?
Solution 1:
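Use mask, and use sub with axis=0 to subtract the per-row result of closeCols2 (computed with apply along axis=1) from every column: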
df2.mask(df2.sub(df2.apply(closeCols2, axis=1), axis=0).abs() > thresh)
AAA BBB CCC DDD EEE
0 NaN NaN 100 98 103
1 NaN NaN 50 50 50
2 NaN 30.0 25 25 25
3 7.0 NaN 10 10 10
4 9.0 11.0 10 10 10
5 10.0 10.0 11 11 11
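To see what the mask/sub combination does in isolation, here is a minimal sketch on a hypothetical two-row DataFrame. A hard-coded Series (ref) stands in for the per-row values that df2.apply(closeCols2, 1) would return; df, ref and thresh are illustrative assumptions, not the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not the question's test data
df = pd.DataFrame({"A": [1, 20], "B": [2, 22], "C": [9, 21]})
# Hypothetical per-row reference values (stand-in for df.apply(closeCols2, axis=1))
ref = pd.Series([2, 22])
thresh = 3

# sub(..., axis=0) aligns ref on the row index, so ref[i] is subtracted from every column of row i
dist = df.sub(ref, axis=0).abs()
# mask replaces entries where the condition is True with NaN
result = df.mask(dist > thresh)
print(result)
```

Because ref is indexed by row, sub needs axis=0; with the default axis='columns', pandas would try to align ref's index against the column labels instead.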
Note:
I'd redefine closeCols2 to take thresh as a parameter. Then you can pass it through the apply call.
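from itertools import combinations

def closeCols2(row, thresh):
    # Largest value in any pair of columns in this row that are within thresh of each other
    max_value = None
    for k1, k2 in combinations(row.keys(), 2):
        if abs(row[k1] - row[k2]) < thresh:
            if max_value is None:
                max_value = max(row[k1], row[k2])
            else:
                max_value = max(max_value, max(row[k1], row[k2]))
    return max_value

# apply forwards extra keyword arguments (here thresh) to closeCols2
df2.apply(closeCols2, axis=1, thresh=5)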
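Putting the note together with Solution 1, a sketch that passes the same thresh both to apply (which forwards extra keyword arguments to the function) and to the final comparison might look like this. The DataFrame is hypothetical toy data, and closeCols2 is repeated so the block runs on its own.

```python
from itertools import combinations

import numpy as np
import pandas as pd


def closeCols2(row, thresh):
    # Largest value in any pair of columns in this row that are within thresh of each other
    max_value = None
    for k1, k2 in combinations(row.keys(), 2):
        if abs(row[k1] - row[k2]) < thresh:
            if max_value is None:
                max_value = max(row[k1], row[k2])
            else:
                max_value = max(max_value, max(row[k1], row[k2]))
    return max_value


# Hypothetical toy data, not the question's test data
df = pd.DataFrame({"A": [100, 7], "B": [60, 12], "C": [98, 10], "D": [103, 10]})
thresh = 5

# thresh=thresh is passed through apply to closeCols2 for every row
row_max = df.apply(closeCols2, axis=1, thresh=thresh)
# Mask every value farther than thresh from its row's reference value
result = df.mask(df.sub(row_max, axis=0).abs() > thresh)
print(result)
```

Defining thresh once and reusing it keeps the pairing test inside closeCols2 and the final masking threshold in sync.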
extra credit
I vectorized and embedded your closeCols for some mind-numbing fun.
Notice there is no apply call.
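- numpy broadcasting gets all combinations of columns subtracted from each other.
- np.abs takes the absolute differences.
- <= 5 marks which differences are within the threshold.
- sum(-1): I arranged the broadcasting so that the differences between, say, row 0, column AAA and all of row 0 are laid out across the last dimension. The -1 in sum(-1) says to sum across that last dimension, which gives, for each cell, the number of columns in its row that are within 5 of it.
- <= 1: every value is within 5 of itself (the difference is 0), so each count is at least 1. A value is close to some other column only if its count is greater than 1, so we mask every cell whose count is less than or equal to 1.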
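The shapes involved in the broadcasting step can be checked on a small array. This is a sketch on a hypothetical 2 x 3 array, not on df2.

```python
import numpy as np

# Hypothetical 2-row, 3-column array (rows = DataFrame rows, columns = DataFrame columns)
v = np.array([[1, 2, 9],
              [20, 22, 21]])

# (2, 3, 1) - (2, 1, 3) broadcasts to (2, 3, 3): diff[r, i, j] = v[r, i] - v[r, j]
diff = v[:, :, None] - v[:, None]
print(diff.shape)

# For each row r and column i, count the columns j (including i itself) within 5 of it
counts = (np.abs(diff) <= 5).sum(-1)
print(counts)

# A count of 1 means column i is only close to itself, so that cell would be masked
mask = counts <= 1
print(mask)
```

Here only the 9 in the first row is masked: it is more than 5 away from both 1 and 2, so its only match is itself.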
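import numpy as np
v = df2.values
# v[:, :, None] - v[:, None] has shape (rows, cols, cols): each column minus every column, per row
df2.mask((np.abs(v[:, :, None] - v[:, None]) <= 5).sum(-1) <= 1)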
AAA BBB CCC DDD EEE
0 NaN NaN 100 98 103
1 NaN NaN 50 50 50
2 NaN 30.0 25 25 25
3 7.0 NaN 10 10 10
4 9.0 11.0 10 10 10
5 10.0 10.0 11 11 11
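If you want the vectorized version to take thresh as a parameter, like the redefined closeCols2, one way is to wrap it in a small function. This is a sketch only: mask_isolated is a hypothetical name, and the DataFrame is toy data.

```python
import numpy as np
import pandas as pd


def mask_isolated(df, thresh):
    """Hypothetical helper: NaN out values not within thresh of any other column in their row."""
    v = df.to_numpy()
    # diff[r, i, j] = v[r, i] - v[r, j]
    diff = v[:, :, None] - v[:, None]
    # Each value is within thresh of itself, so a count of 1 means "close to nothing else"
    counts = (np.abs(diff) <= thresh).sum(-1)
    return df.mask(counts <= 1)


# Hypothetical toy data, not the question's test data
df = pd.DataFrame({"A": [100, 7], "B": [60, 12], "C": [98, 10], "D": [103, 10]})
result = mask_isolated(df, thresh=5)
print(result)
```

This version counts differences <= thresh, whereas closeCols2 uses a strict < thresh, so a pair exactly thresh apart is treated differently by the two approaches.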