Skip to content Skip to sidebar Skip to footer

Parse Multiple Date Formats Into A Single Format

I have one column called published (date). As you can see, it has multiple date formats and also nan values. I would like to skip nan values, convert all the other formats to %Y-%-

Solution 1:

First was added 2 new formats to fmt list:

fmt=['%Y-%m-%d', '%d-%m-%Y', '%d/%m/%Y',
     '%Y-%d-%m', '%Y-%d-%b', '%d-%b-%Y', '%d/%b/%Y','Year: %d; month',
     'month: %d;Year','%Y','%b %d %Y','%b %Y %d',
     '%Y %b %d', '%Y %b']

Then in list comprehension convert column to datetimes, parameter errors='coerce' is for non matched values to missing values. Last join together by concat.

Last because possible multiple values per rows because dd/mm/YYYY vs mm/dd/YYYY formats (not sure if month of day) is used back filling with select first column. It means what format is first in list it is selected with high priority.

dfs = [pd.to_datetime(df['publish_time'], format=f, errors='coerce') for f in fmt]
df['publish_time1']= pd.concat(dfs, axis=1).bfill(axis=1).iloc[:, 0]

print (df)
    publish_time publish_time1
0    2014 Jul 22    2014-07-22
1       2003 Aug    2003-08-01
2    2019 Nov 26    2019-11-26
3     2012-12-07    2012-12-07
4    2020 Jan 21    2020-01-21
5     2015-01-01    2015-01-01
6     2010-11-30    2010-11-30
7     2007-05-10    2007-05-10
8           2020    2020-01-01
9     2012-02-29    2012-02-29
10   2016 Apr 19    2016-04-19
11    2006-12-31    2006-12-31
12   2013 Jun 27    2013-06-27
13   2019 Jun 19    2019-06-19
14   2015 Jun 12    2015-06-12
15  2006 Jun-Dec           NaT
16    2006-07-31    2006-07-31
17           NaN           NaT
18    2017-04-15    2017-04-15
19   2016 May 22    2016-05-22
20      2020 Feb    2020-02-01
21    2017 May 6    2017-05-06
22   2020 Mar 11    2020-03-11
23    2013-04-30    2013-04-30
24    2020-03-07    2020-03-07
25           NaN           NaT
26          2018    2018-01-01

Post a Comment for "Parse Multiple Date Formats Into A Single Format"