Pandas | Merge Rows With Same Id
Here is the example data set id firstname lastname email update date A1 wendy smith ws@mail.com 2018-01-02 A1 w
Solution 1:
Use GroupBy.ffill to forward fill only within the same group, then use drop_duplicates:
df['lastname'] = df.groupby('id')['lastname'].ffill()
df = df.drop_duplicates('id', keep='last')
Or in one line (less readable, in my opinion), using assign:
df.assign(lastname=df.groupby('id')['lastname'].ffill()).drop_duplicates('id', keep='last')
Output
   id firstname lastname              email  updatedate
1  A1     wendy    smith     smith@mail.com  2019-02-03
3  A2     harry     lynn  harylynn@mail.com  2019-03-12
5  A3     tinna   dickey     tinna@mail.com  2013-06-12
6  A4       Tom      Lee       Tom@mail.com  2012-06-12
7  A5      Ella      NaN      Ella@mail.com  2019-07-12
8  A6       Ben     Lang       Ben@mail.com  2019-03-12
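A minimal, self-contained sketch of the same idea. The sample frame below is hypothetical (the original data set is only partly shown); it includes an id whose lastname is missing in every row, to show that the fill does not leak across groups.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not the original data set.
df = pd.DataFrame({
    "id": ["A1", "A1", "A2", "A3", "A3"],
    "lastname": ["smith", np.nan, np.nan, "lynn", np.nan],
    "email": ["ws@mail.com", "smith@mail.com", "a2@mail.com",
              "hl@mail.com", "harylynn@mail.com"],
})

# Forward fill 'lastname' only within each id group.
df["lastname"] = df.groupby("id")["lastname"].ffill()

# Keep the last (most recent) row for each id.
result = df.drop_duplicates("id", keep="last")
print(result)
```

Here A2 keeps its NaN lastname, because there is no earlier value inside its own group to carry forward.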
Solution 2:
Use:
- DataFrame.groupby - Group DataFrame or Series using a mapper or by a Series of columns.
- GroupBy.last - Compute last of group values.
- DataFrame.replace - Replace values given in to_replace with value.
Example:
import numpy as np
# Treat empty strings as missing values so that last() skips them
df = df.replace('', np.nan)
df1 = df.groupby('id', as_index=False, sort=False).last()
print(df1)
   id firstname lastname              email  updatedate
0  A1     wendy    smith     smith@mail.com  2019-02-03
1  A2     harry     lynn  harylynn@mail.com  2019-03-12
2  A3     tinna   dickey     tinna@mail.com  2013-06-12
3  A4       Tom      Lee       Tom@mail.com  2012-06-12
4  A5      Ella      NaN      Ella@mail.com  2019-07-12
5  A6       Ben     Lang       Ben@mail.com  2019-03-12
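This approach works because GroupBy.last returns, for each column, the last non-null value in the group. A small sketch on hypothetical sample data (not the original data set), where the newest A1 row has an empty lastname:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not the original data set.
df = pd.DataFrame({
    "id": ["A1", "A1", "A2"],
    "lastname": ["smith", "", "lynn"],
    "email": ["ws@mail.com", "smith@mail.com", "harylynn@mail.com"],
})

# Treat empty strings as missing so that last() can skip them.
cleaned = df.replace("", np.nan)

# last() takes the last non-null value of each column within each group.
merged = cleaned.groupby("id", as_index=False, sort=False).last()
print(merged)
```

For A1 the email comes from the newest row while the lastname falls back to the older row, since each column is resolved independently.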
Solution 3:
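Try this. In recent pandas versions, groupby('id').ffill() leaves the grouping column out of its result, so the filled values are written back into the original frame before dropping duplicates:
df.fillna(df.groupby('id').ffill()).drop_duplicates('id', keep='last')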
Output:
   id firstname lastname              email  updatedate
1  A1     wendy    smith     smith@mail.com  2019-02-03
3  A2     harry     lynn  harylynn@mail.com  2019-03-12
5  A3     tinna   dickey     tinna@mail.com  2013-06-12
6  A4       Tom      Lee       Tom@mail.com  2012-06-12
7  A5      Ella      NaN      Ella@mail.com  2019-07-12
8  A6       Ben     Lang       Ben@mail.com  2019-03-12
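A sketch of forward filling every column per group while keeping the id column, on hypothetical sample data. It assumes that the result of groupby('id').ffill() may not contain id (this depends on the pandas version), so the gaps in the original frame are filled from the group-wise result with DataFrame.fillna instead of using that result directly.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not the original data set.
df = pd.DataFrame({
    "id": ["A1", "A1", "A2", "A2"],
    "firstname": ["wendy", np.nan, "harry", "harry"],
    "lastname": ["smith", np.nan, np.nan, "lynn"],
})

# Forward fill every non-key column within each id group.
filled_values = df.groupby("id").ffill()

# Fill gaps in the original frame from the group-wise result; this keeps
# the 'id' column whether or not ffill() returned it.
filled = df.fillna(filled_values)

# Keep the last (most recent) row for each id.
result = filled.drop_duplicates("id", keep="last")
print(result)
```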
Solution 4:
Use a combination of groupby, apply, and iloc:
df.groupby('id', as_index=False).apply(lambda x: x.ffill().iloc[-1])
   id firstname lastname              email  updatedate
0  A1     wendy    smith     smith@mail.com  2019-02-03
1  A2     harry     lynn  harylynn@mail.com  2019-03-12
2  A3     tinna   dickey     tinna@mail.com  2019-03-12
3  A4       Tom      Lee       Tom@mail.com  2019-06-12
4  A5      Ella      NaN      Ella@mail.com  2019-07-12
5  A6       Ben     Lang       Ben@mail.com  2019-03-12
- groupby - groups the dataframe by unique ids
- ffill - fills each NaN with the most recent non-NaN value above it in the same group
- iloc[-1] - gets you the row with the latest data
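The three steps can also be written out as an explicit loop, which may be easier to follow than the lambda. This is only an illustrative sketch on hypothetical sample data; it iterates over the groups directly instead of calling apply.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not the original data set.
df = pd.DataFrame({
    "id": ["A1", "A1", "A2"],
    "lastname": ["smith", np.nan, "lynn"],
    "email": ["ws@mail.com", "smith@mail.com", "harylynn@mail.com"],
})

rows = []
# Step 1: groupby splits the frame into one sub-frame per id.
for _, group in df.groupby("id", sort=False):
    # Step 2: forward fill carries earlier non-NaN values down the group.
    filled = group.ffill()
    # Step 3: iloc[-1] picks the last row of the filled group.
    rows.append(filled.iloc[-1])

# Stack the selected rows back into a single frame.
merged = pd.DataFrame(rows).reset_index(drop=True)
print(merged)
```

This relies on the rows already being ordered from oldest to newest within each id; if they are not, sorting by the date column first would be needed.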