Pandas | Merge Rows With Same Id

March 31, 2024

Here is the example data set:

id  firstname  lastname  email        updatedate
A1  wendy      smith     ws@mail.com  2018-01-02
A1  w…

Solution 1:

Use GroupBy.ffill to forward fill only within each id group, then use drop_duplicates to keep the last row per id:

df['lastname'] = df.groupby('id')['lastname'].ffill()
df = df.drop_duplicates('id', keep='last')

Or in one line (but less readable in my opinion), using assign:

df.assign(lastname=df.groupby('id')['lastname'].ffill()).drop_duplicates('id', keep='last')

Output:

   id firstname lastname              email  updatedate
1  A1     wendy    smith     smith@mail.com  2019-02-03
3  A2     harry     lynn  harylynn@mail.com  2019-03-12
5  A3     tinna   dickey     tinna@mail.com  2013-06-12
6  A4       Tom      Lee       Tom@mail.com  2012-06-12
7  A5      Ella      NaN      Ella@mail.com  2019-07-12
8  A6       Ben     Lang       Ben@mail.com  2019-03-12
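Because the example data set above is truncated, here is a minimal sketch of Solution 1 on a small made-up frame. The rows and values are hypothetical and chosen only to show the per-group forward fill followed by drop_duplicates.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (NOT the original data set, which is truncated above).
df = pd.DataFrame({
    "id": ["A1", "A1", "A2", "A2", "A3"],
    "firstname": ["wendy", "wendy", "harry", "harry", "ella"],
    "lastname": ["smith", np.nan, "lynn", np.nan, np.nan],
    "email": ["ws@mail.com", "smith@mail.com", "hl@mail.com",
              "harylynn@mail.com", "ella@mail.com"],
})

# Forward fill lastname within each id only, so values never leak across ids.
df["lastname"] = df.groupby("id")["lastname"].ffill()

# Keep the last listed row for every id.
result = df.drop_duplicates("id", keep="last").reset_index(drop=True)
print(result)
```

A3 keeps its missing lastname because no earlier row in the same group carries a value to fill from.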

Solution 2:

Use the following methods:

DataFrame.groupby - Group DataFrame or Series using a mapper or by a Series of columns.
GroupBy.last - Compute the last non-null value of each column within each group.
DataFrame.replace - Replace values given in to_replace with value.

Example:

import numpy as np

# Treat empty strings as missing values so that last() skips them
df = df.replace('', np.nan)
df1 = df.groupby('id', as_index=False, sort=False).last()
print(df1)

   id firstname lastname              email  updatedate
0  A1     wendy    smith     smith@mail.com  2019-02-03
1  A2     harry     lynn  harylynn@mail.com  2019-03-12
2  A3     tinna   dickey     tinna@mail.com  2013-06-12
3  A4       Tom      Lee       Tom@mail.com  2012-06-12
4  A5      Ella      NaN      Ella@mail.com  2019-07-12
5  A6       Ben     Lang       Ben@mail.com  2019-03-12
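The sketch below shows how the last() approach behaves when blanks are stored as empty strings. The frame is hypothetical. Note that last() picks the last non-null value independently for each column, so one merged row can combine values from different source rows.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with empty strings standing in for missing values.
df = pd.DataFrame({
    "id": ["A1", "A1", "A2", "A2"],
    "firstname": ["wendy", "wendy", "harry", ""],
    "lastname": ["smith", "", "lynn", ""],
    "email": ["ws@mail.com", "smith@mail.com", "hl@mail.com", "harylynn@mail.com"],
})

# Convert exact empty strings to NaN so that last() treats them as missing.
cleaned = df.replace("", np.nan)

# One row per id; each column takes its own last non-null value.
merged = cleaned.groupby("id", as_index=False, sort=False).last()
print(merged)
```

Here the A2 row takes its email from the second A2 record but its firstname and lastname from the first, since the later values are missing.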

Solution 3:

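Forward fill every column within each id, then keep the last row. In recent pandas versions groupby('id').ffill() does not return the id column, so join it back before dropping duplicates:

df[['id']].join(df.groupby('id').ffill()).drop_duplicates('id', keep='last')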

Output:

   id firstname lastname              email  updatedate
1  A1     wendy    smith     smith@mail.com  2019-02-03
3  A2     harry     lynn  harylynn@mail.com  2019-03-12
5  A3     tinna   dickey     tinna@mail.com  2013-06-12
6  A4       Tom      Lee       Tom@mail.com  2012-06-12
7  A5      Ella      NaN      Ella@mail.com  2019-07-12
8  A6       Ben     Lang       Ben@mail.com  2019-03-12
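Both ffill and keep='last' depend on row order, so if the rows are not already in chronological order the "last" row may not be the most recent one. The following sketch sorts by date first. The data is hypothetical, and the sorting step is an addition that assumes updatedate should decide which record wins.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, deliberately out of chronological order.
df = pd.DataFrame({
    "id": ["A1", "A2", "A1", "A2"],
    "lastname": [np.nan, "lynn", "smith", np.nan],
    "email": ["smith@mail.com", "hl@mail.com", "ws@mail.com", "harylynn@mail.com"],
    "updatedate": ["2019-02-03", "2018-05-01", "2018-01-02", "2019-03-12"],
})

# Parse the dates and sort so the last row of each id is the most recent one.
df["updatedate"] = pd.to_datetime(df["updatedate"])
df = df.sort_values(["id", "updatedate"])

# groupby().ffill() returns only the non-key columns, so join id back on the index.
filled = df[["id"]].join(df.groupby("id").ffill())

# Keep the most recent row per id, now carrying the forward-filled values.
result = filled.drop_duplicates("id", keep="last").reset_index(drop=True)
print(result)
```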

Solution 4:

Use a combination of groupby, apply, and iloc. Forward fill within each group, then take the group's last row:

df.groupby('id', as_index=False).apply(lambda x: x.ffill().iloc[-1])

   id firstname lastname              email  updatedate
0  A1     wendy    smith     smith@mail.com  2019-02-03
1  A2     harry     lynn  harylynn@mail.com  2019-03-12
2  A3     tinna   dickey     tinna@mail.com  2019-03-12
3  A4       Tom      Lee       Tom@mail.com  2019-06-12
4  A5      Ella      NaN      Ella@mail.com  2019-07-12
5  A6       Ben     Lang       Ben@mail.com  2019-03-12
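The apply-based approach can also be written with an explicit column selection, which sidesteps differences between pandas versions in whether the grouping column is passed to the applied function. This is a sketch on hypothetical data. The helper name last_filled_row and the value_cols list are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (not the original data set).
df = pd.DataFrame({
    "id": ["A1", "A1", "A2"],
    "firstname": ["wendy", "wendy", "ella"],
    "lastname": ["smith", np.nan, np.nan],
})

# Every column except the grouping key.
value_cols = [c for c in df.columns if c != "id"]

def last_filled_row(group):
    """Forward fill within one group and return its final row as a Series."""
    return group.ffill().iloc[-1]

merged = (
    df.groupby("id", sort=False)[value_cols]
      .apply(last_filled_row)   # one Series per group, stacked into a DataFrame
      .reset_index()            # turn the id index back into a column
)
print(merged)
```

Calling a Python function once per group is usually slower than the vectorised ffill and last variants above, so this form is mainly useful when the per-group logic needs to be customised.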