
How To Drop Rows Based On Timestamp Where Hours Are Not In List

I have a large dataframe (several million rows) where one of my columns is a timestamp (labeled 'Timestamp') in the format 'hh:mm:ss' e.g. '07:00:04'. I want to drop the rows where

Solution 1:

First you can give your DataFrame a proper DatetimeIndex as follows:

dtidx = pd.DatetimeIndex(df['Date'].astype(str) + ' ' + df['Timestamp'].astype(str))
df.index = dtidx

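and then use between_time to keep the rows whose time of day falls in hours 07 through 21 inclusive. Both endpoints are included by default, so end at '21:59:59' rather than '22:00' to avoid keeping a row stamped exactly 22:00:00:

df.between_time('07:00', '21:59:59')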
# returns
#                          Date Timestamp    Close
# 2018-01-02 07:05:00  20180102  07:05:00  12926
# 2018-01-02 21:05:02  20180102  21:05:02  12925.5
# 2018-01-03 07:05:07  20180103  07:05:07  12925.8
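As a minimal sketch of the approach above, the snippet below builds a DatetimeIndex from hypothetical sample rows (the values are made up for illustration, and an explicit format string is assumed for the 'Date' and 'Timestamp' columns). It compares two end times to show that between_time keeps both endpoints by default.

```python
import pandas as pd

# Hypothetical sample rows shaped like the question's frame.
df = pd.DataFrame({
    "Date": [20180102, 20180102, 20180102, 20180103],
    "Timestamp": ["07:05:00", "21:05:02", "22:00:00", "06:59:59"],
    "Close": [12926.0, 12925.5, 12925.1, 12925.8],
})

# Combine date and time strings and parse them with an explicit format.
df.index = pd.DatetimeIndex(
    pd.to_datetime(
        df["Date"].astype(str) + " " + df["Timestamp"],
        format="%Y%m%d %H:%M:%S",
    )
)

# Both endpoints are included, so '22:00' keeps the 22:00:00 row.
closed = df.between_time("07:00", "22:00")

# Ending at 21:59:59 keeps only hours 07 through 21.
strict = df.between_time("07:00", "21:59:59")

print(len(closed), len(strict))
```

With second-resolution timestamps, ending at '21:59:59' is enough; data with sub-second precision would need a different cutoff.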

Solution 2:

Since you mentioned slicing and another answer already covers it, here is an alternative: extract the hour with dt.hour.

First convert the column from strings to datetime (the code assumes it is named 'date'; in the question's frame it would be 'Timestamp'):

df['date'] = pd.to_datetime(df['date'])

You can now easily extract the hour part using dt.hour:

df['hour'] = df['date'].dt.hour

You can also extract year, month, second, and so on in a similar way.

Now filter with an ordinary boolean mask, keeping hours 7 through 21:

df[(df.hour >= 7) & (df.hour <= 21)]
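Since the title asks about hours that are not in a list, the same idea can be sketched with isin instead of a range comparison. The sample rows and the hours_to_keep list are hypothetical, and the explicit format string assumes the 'hh:mm:ss' layout described in the question.

```python
import pandas as pd

# Hypothetical sample rows; only the 'Timestamp' layout comes from the question.
df = pd.DataFrame({
    "Timestamp": ["06:59:59", "07:00:04", "12:30:00", "21:05:02", "22:00:00"],
    "Close": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Hypothetical list of hours whose rows should be kept.
hours_to_keep = [7, 8, 21]

# Parse the time strings and pull out the hour as an integer Series.
hour = pd.to_datetime(df["Timestamp"], format="%H:%M:%S").dt.hour

# Keep rows whose hour is in the list; everything else is dropped.
kept = df[hour.isin(hours_to_keep)]

print(kept)
```

Computing the hour as a temporary Series avoids adding a helper column to a frame with several million rows.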

Solution 3:

I prefer the other answers which work with proper timestamp data types, but since you mentioned trying and failing with a string slicing method, it might be helpful for you to see a solution using string slicing that does work:

df['Hour'] = df['Timestamp'].str.slice(0, 2).astype(int)
df[(df['Hour'] >= 7) & (df['Hour'] <= 21)]

The first line creates a new integer column from the slice of the string which represents the hour, and the second line filters on said new column.

       Date Timestamp      Close  Hour
0  20180102  07:05:00  12925.979     7
1  20180102  21:05:02  12925.479    21
5  20180103  07:05:07  12925.780     7
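A variation on the string approach, sketched under the assumption that every 'Timestamp' value is a zero-padded 'hh:mm:ss' string: compare the two-character slice against a set of zero-padded hour strings, which skips the integer conversion and the extra column. The sample rows and the name wanted are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows in the question's 'hh:mm:ss' layout.
df = pd.DataFrame({"Timestamp": ["06:59:59", "07:00:04", "21:59:59", "22:00:00"]})

# Zero-padded hour strings for 07 through 21.
wanted = {f"{h:02d}" for h in range(7, 22)}

# Slice out the hour characters and test membership directly on strings.
mask = df["Timestamp"].str.slice(0, 2).isin(wanted)
filtered = df[mask]

print(filtered)
```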

Solution 4:

My guess would be to use DataFrame.between_time, which requires a DatetimeIndex:

df['Timestamp'] = pd.to_datetime(df['Timestamp'])
df.set_index('Timestamp').between_time('07:00:00', '21:59:59')
                         Date      Close
Timestamp
2019-07-22 07:05:00  20180102  12925.979
2019-07-22 21:05:02  20180102  12925.479
2019-07-22 07:05:07  20180103  12925.78
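If replacing the index or overwriting the 'Timestamp' column is undesirable, one possible alternative is DatetimeIndex.indexer_between_time, which returns the integer positions of matching rows. This is a sketch on hypothetical sample rows, again assuming the 'hh:mm:ss' layout; it is not part of the answer above.

```python
import pandas as pd

# Hypothetical sample rows; 'Timestamp' stays a plain string column.
df = pd.DataFrame({
    "Timestamp": ["07:05:00", "23:10:00", "21:05:02", "06:00:00"],
    "Date": [20180102, 20180102, 20180102, 20180103],
})

# Temporary DatetimeIndex built from the time strings (the date part is a dummy).
times = pd.DatetimeIndex(pd.to_datetime(df["Timestamp"], format="%H:%M:%S"))

# Integer positions of rows whose time lies between the two endpoints.
positions = times.indexer_between_time("07:00:00", "21:59:59")

# Select those rows; the original index and columns are left unchanged.
result = df.iloc[positions]

print(result)
```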
