Manipulate Values In Pandas Dataframe Columns Based On Matching Ids From Another Dataframe

February 23, 2024

I have two dataframes like the following examples:

import pandas as pd
import numpy as np

df = pd.DataFrame({'a': ['20', '50', '100'], 'b': [1, np.nan, 1], 'c': [

Solution 1:

You can solve your problem using this instead:

for letter in ['b', 'c']:  # enumerate removed because the index isn't needed here; keep it if the rest of your code uses it
    df[letter] = df.apply(lambda row: row[letter] if row['a'] in df_id[letter].tolist() else np.nan, axis=1)

In short, replace isin with in.

The problem is that when you use apply on df with axis=1, the lambda's argument (row in the code above) is a single row of df, so row['a'] selects a single element rather than a whole column.

isin, however, is only defined for Series and other list-like structures, so calling it on a single element raises the error. Instead, use in to check whether that element is in the list.
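To make the difference concrete, here is a minimal sketch. The sample values in df and df_id are assumptions chosen for illustration only, since the question's data is truncated above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (assumed, not taken from the original question).
df = pd.DataFrame({'a': ['20', '50', '100'], 'b': [1, np.nan, 1]})
df_id = pd.DataFrame({'b': ['50', '4954', '20']})

# Inside apply(..., axis=1) each row is a Series, so row['a'] is a scalar.
first_row = df.iloc[0]
value = first_row['a']  # a plain Python str

# A scalar has no .isin method, which is why calling it there fails.
has_isin = hasattr(value, 'isin')

# Membership test for one element: convert to a list first, because
# `in` applied directly to a Series checks the index labels, not the values.
scalar_match = value in df_id['b'].tolist()

# Vectorized membership test for the whole column.
series_match = df['a'].isin(df_id['b'])
```

Here has_isin is False, scalar_match is True, and series_match is a boolean Series with one entry per row of df.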

Hope that was helpful. If you have any questions please ask.

Solution 2:

Adapting a hard-to-find answer from Pandas New Column Calculation Based on Existing Columns Values:

for i, letter in enumerate(['b','c']):
    mask = df['a'].isin(df_id[letter])
    name = letter + '_new'
    # for some reason, df[letter] = df.loc[mask, letter] does not work
    df.loc[mask, name] = df.loc[mask, letter]
    df[letter] = df[name]
    del df[name]

This isn't pretty, but seems to work.
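A possibly tidier variant, offered as a sketch rather than as the answer's own method: Series.where keeps values where the mask is True and fills NaN elsewhere, which avoids the temporary column. The frames below are hypothetical sample data, assumed for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (assumed, not taken from the original question).
df = pd.DataFrame({'a': ['20', '50', '100'],
                   'b': [1, np.nan, 1],
                   'c': [np.nan, 1, 1]})
df_id = pd.DataFrame({'b': ['50', '4954', '20'],
                      'c': ['123', '100', '6']})

for letter in ['b', 'c']:
    # True where the id in column 'a' appears in df_id's matching column.
    mask = df['a'].isin(df_id[letter])
    # Keep the existing value where mask is True, otherwise set NaN.
    df[letter] = df[letter].where(mask)
```

With this sample data, only rows whose 'a' value is listed in df_id keep their original 'b' or 'c' value.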

Solution 3:

If you have a larger DataFrame and performance matters, you can first build a boolean mask DataFrame and then apply it to your DataFrame. First, create the mask:

mask = df_id.apply(lambda x: df['a'].isin(x))

       b      c
0   True  False
1   True  False
2  False   True

This can be applied to the original dataframe:

df.iloc[:, 1:] = df.iloc[:, 1:].mask(~mask, np.nan)

     a    b    c
0   20  1.0  NaN
1   50  NaN  NaN
2  100  NaN  1.0
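For reference, the two steps can be run end to end as follows. This is a sketch under assumptions: the contents of df_id and of column 'c' are hypothetical, and it selects columns by name with where instead of by position with iloc, which is a substitution on my part.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (assumed, not taken from the original question).
df = pd.DataFrame({'a': ['20', '50', '100'],
                   'b': [1, np.nan, 1],
                   'c': [np.nan, 1, 1]})
df_id = pd.DataFrame({'b': ['50', '4954', '93920', '20'],
                      'c': ['123', '100', '6', '70']})

# One boolean column per df_id column, indexed like df.
mask = df_id.apply(lambda col: df['a'].isin(col))

# Select the masked columns by name so the column order in df does not matter.
cols = list(mask.columns)
# where keeps values at True positions and writes NaN elsewhere,
# which is equivalent to .mask(~mask, np.nan).
df[cols] = df[cols].where(mask)
```

Selecting by name is a design choice: the positional iloc version assumes the id column comes first and every other column has a counterpart in df_id.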
