Parse Multiple Date Formats Into A Single Format
I have one column called published (date). As you can see, it has multiple date formats and also nan values. I would like to skip nan values, convert all the other formats to %Y-%-
Solution 1:
First was added 2 new formats to fmt
list:
fmt=['%Y-%m-%d', '%d-%m-%Y', '%d/%m/%Y',
'%Y-%d-%m', '%Y-%d-%b', '%d-%b-%Y', '%d/%b/%Y','Year: %d; month',
'month: %d;Year','%Y','%b %d %Y','%b %Y %d',
'%Y %b %d', '%Y %b']
Then in list comprehension convert column to datetimes, parameter errors='coerce'
is for non matched values to missing values. Last join together by concat
.
Last because possible multiple values per rows because dd/mm/YYYY
vs mm/dd/YYYY
formats (not sure if month of day) is used back filling with select first column. It means what format is first in list it is selected with high priority.
dfs = [pd.to_datetime(df['publish_time'], format=f, errors='coerce') for f in fmt]
df['publish_time1']= pd.concat(dfs, axis=1).bfill(axis=1).iloc[:, 0]
print (df)
publish_time publish_time1
0 2014 Jul 22 2014-07-22
1 2003 Aug 2003-08-01
2 2019 Nov 26 2019-11-26
3 2012-12-07 2012-12-07
4 2020 Jan 21 2020-01-21
5 2015-01-01 2015-01-01
6 2010-11-30 2010-11-30
7 2007-05-10 2007-05-10
8 2020 2020-01-01
9 2012-02-29 2012-02-29
10 2016 Apr 19 2016-04-19
11 2006-12-31 2006-12-31
12 2013 Jun 27 2013-06-27
13 2019 Jun 19 2019-06-19
14 2015 Jun 12 2015-06-12
15 2006 Jun-Dec NaT
16 2006-07-31 2006-07-31
17 NaN NaT
18 2017-04-15 2017-04-15
19 2016 May 22 2016-05-22
20 2020 Feb 2020-02-01
21 2017 May 6 2017-05-06
22 2020 Mar 11 2020-03-11
23 2013-04-30 2013-04-30
24 2020-03-07 2020-03-07
25 NaN NaT
26 2018 2018-01-01
Post a Comment for "Parse Multiple Date Formats Into A Single Format"