Finding Correct Person Among Multiple Person Names

I have a DataFrame. One column holds single values, and the corresponding column holds a subset of values. df = pd.DataFrame() Index Values_1

Solution 1:

Use:

#collect Values_1 into one list per index group, then subtract 1 from the index so each list lines up with the rows of the group before it
s = df.groupby(level=0)['Values_1'].agg(list)
s.index = s.index - 1
print (s)
Index
0    [Muhammad bin Bashr bin al-Farafsa, Muhammad b...
1    [Yahya bin Sa'id bin Farroukh al-Qatan, Yahya ...
2            [Hamza bin al-Mughira bin Shu'ba, Shu'ba]
Name: Values_1, dtype: object

#replace NaN (rows with no following group) with an empty list
df['test'] = df.index.map(s).map(lambda x: [] if isinstance(x, float) else x)

#test if at least one name from the next group's list occurs in Values_2 (substring match)
f = lambda x: any(y in x['Values_2'] for y in x['test'])
mask = df.apply(f, axis=1)

#filter by mask and remove helper column
df = df[mask].drop('test',axis=1)
print (df)
                         Values_1  \
Index                               
1      Muhammad bin Bashar Bindar   

                                                Values_2  
Index                                                     
1      Mua'dh bin Hisham bin Aby [20287], Yahya bin S...  
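
Because the question's DataFrame is cut off above, the answer cannot be run as shown. The following is a minimal sketch of the same three steps in plain Python (no pandas), so the logic is easier to follow. The rows and names are hypothetical, invented only for illustration, and `names_by_group` and `kept` are helper names that do not appear in the original answer.

```python
from collections import defaultdict

# Hypothetical sample rows: (Index, Values_1, Values_2).
# These are NOT the data from the question, which is truncated.
rows = [
    (1, "Ali bin Zayd", "Omar bin Said [1], Hasan bin Ali [2]"),
    (1, "Ali bin Amr", "Khalid bin Walid [3]"),
    (2, "Hasan bin Ali", "Yusuf bin Musa [4]"),
    (2, "Hasan bin Omar", "Bilal bin Rabah [5]"),
    (3, "Yusuf bin Musa", ""),
]

# Step 1: collect all Values_1 names per index group
# (the role of groupby(level=0)['Values_1'].agg(list)).
names_by_group = defaultdict(list)
for idx, name, _ in rows:
    names_by_group[idx].append(name)

# Steps 2 and 3: for each row, look up the names of the NEXT group
# (idx + 1, the effect of s.index - 1) and keep the row if any of
# them occurs as a substring of its Values_2. A missing next group
# yields an empty list, so any() is False and the row is dropped.
kept = [
    (idx, name, refs)
    for idx, name, refs in rows
    if any(candidate in refs for candidate in names_by_group.get(idx + 1, []))
]

for row in kept:
    print(row)
```

Note that `in` on strings is a substring test, so a short name such as "Shu'ba" would also match any longer name that contains it. If that is too permissive for your data, splitting `Values_2` into individual names before comparing may be a safer choice.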
