Modify Function To Return Dataframe With Specified Values
With reference to the test data below and the function I use to identify values within a variable thresh of each other, can anyone please help me modify this to show the desired output?
Solution 1:
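Use mask and sub. The apply call runs the function on each row (axis=1), and sub subtracts the resulting Series along the index (axis=0):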
df2.mask(df2.sub(df2.apply(closeCols2, 1), 0).abs() > thresh)
    AAA   BBB  CCC  DDD  EEE
0   NaN   NaN  100   98  103
1   NaN   NaN   50   50   50
2   NaN  30.0   25   25   25
3   7.0   NaN   10   10   10
4   9.0  11.0   10   10   10
5  10.0  10.0   11   11   11
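To see what mask and sub do on their own, here is a minimal sketch. The frame df, the Series ref, and their values are made up for illustration (they are not the question's test data); ref stands in for the per-row value that the apply call would return.

```python
import pandas as pd

# Hypothetical data, not the test data from the question.
df = pd.DataFrame({"A": [1, 50, 9], "B": [100, 52, 11], "C": [98, 49, 10]})

# Hypothetical per-row reference values (a stand-in for the result of apply).
ref = pd.Series([100, 52, 11], index=df.index)
thresh = 5

# axis=0 aligns ref on the row index, so every cell in a row is
# compared against that row's own reference value.
diff = df.sub(ref, axis=0).abs()

# mask replaces cells with NaN wherever the condition is True.
result = df.mask(diff > thresh)
print(result)
```

Only the cell at row 0, column A is more than 5 away from its row's reference value, so it is the only one that becomes NaN.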
Note: I'd redefine closeCols to include thresh as a parameter. Then you could pass it in the apply call.
from itertools import combinations

def closeCols2(df, thresh):
    # df is a single row (a Series) when called through apply with axis=1
    max_value = None
    for k1, k2 in combinations(df.keys(), 2):
        if abs(df[k1] - df[k2]) < thresh:
            if max_value is None:
                max_value = max(df[k1], df[k2])
            else:
                max_value = max(max_value, max(df[k1], df[k2]))
    return max_value

df2.apply(closeCols2, 1, thresh=5)
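As a quick check of how apply forwards the extra argument, here is a small self-contained sketch. The frame df and its values are hypothetical, and the function body is a slightly condensed version of closeCols2 with the parameter renamed to row. Extra keyword arguments to DataFrame.apply are handed to the function, and args=(5,) is an equivalent positional form.

```python
from itertools import combinations

import pandas as pd


def closeCols2(row, thresh):
    """Return the largest value that belongs to a pair closer than thresh."""
    max_value = None
    for k1, k2 in combinations(row.keys(), 2):
        if abs(row[k1] - row[k2]) < thresh:
            pair_max = max(row[k1], row[k2])
            max_value = pair_max if max_value is None else max(max_value, pair_max)
    return max_value


# Hypothetical data, not the test data from the question.
df = pd.DataFrame({"A": [1, 7], "B": [100, 30], "C": [98, 10], "D": [103, 10]})

# thresh is passed through to closeCols2 as a keyword argument.
by_keyword = df.apply(closeCols2, axis=1, thresh=5)

# The same call with thresh passed positionally via args.
by_args = df.apply(closeCols2, axis=1, args=(5,))

print(by_keyword.tolist())
```

In row 0 the close pairs are (100, 98) and (100, 103), so the result is 103; in row 1 the value 7 is close to both 10s, so the result is 10.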
extra credit
I vectorized and embedded your closeCols for some mind-numbing fun. Notice there is no apply. The steps are:
numpy broadcasting: subtracts every combination of columns from each other within each row.
np.abs: takes the absolute value of those differences.
<= 5: flags the differences that are within the threshold.
sum(-1): I arranged the broadcasting so that the differences of, say, row 0, column AAA with all of row 0 are laid out across the last dimension. The -1 in sum(-1) says to sum across the last dimension.
<= 1: every value is less than 5 away from itself, so each count is at least 1. I want the count to be greater than 1, so we mask every cell whose count is less than or equal to one.
v = df2.values
df2.mask((np.abs(v[:, :, None] - v[:, None]) <= 5).sum(-1) <= 1)
    AAA   BBB  CCC  DDD  EEE
0   NaN   NaN  100   98  103
1   NaN   NaN   50   50   50
2   NaN  30.0   25   25   25
3   7.0   NaN   10   10   10
4   9.0  11.0   10   10   10
5  10.0  10.0   11   11   11
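If the one-liner is hard to follow, it may help to split the broadcasting into named steps. This is only a sketch on a made-up 2x3 array (not the question's data), and the intermediate names diffs, close, counts and isolated are mine.

```python
import numpy as np

# Hypothetical 2x3 array, not the test data from the question.
v = np.array([[1, 100, 98],
              [50, 52, 70]])

# v[:, :, None] has shape (2, 3, 1) and v[:, None] has shape (2, 1, 3).
# Broadcasting gives shape (2, 3, 3) with diffs[r, i, j] = v[r, i] - v[r, j].
diffs = v[:, :, None] - v[:, None]

# True where two values in the same row are within 5 of each other.
close = np.abs(diffs) <= 5

# For each cell, count the values in its row within 5 of it (itself included).
counts = close.sum(-1)

# A count of 1 means the cell is close only to itself, so it gets masked.
isolated = counts <= 1

print(diffs.shape)
print(counts)
print(isolated)
```

Here 1 and 70 have no neighbour within 5 in their rows, so they are the only cells flagged; passing isolated to mask would turn exactly those two cells into NaN.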