
Adding 1 Hour To A Timestamp Column In A PySpark DataFrame

In PySpark I have a timestamp column called test_time. The column contains records like the ones below, and I want to add 1 hour to each value.

2017-03-12 03:19:51.0
2017-03-12 03:29:51.0

Solution 1:

This should be straightforward once you convert the value to a UTC timestamp. Here is one way to do it:

from pyspark.sql.functions import to_utc_timestamp, from_utc_timestamp
from datetime import timedelta

## Create a dummy dataframe
df = sqlContext.createDataFrame([('1997-02-28 10:30:00',)], ['t'])

## Add column to convert time to utc timestamp in PST
df2 = df.withColumn('utc_timestamp',to_utc_timestamp(df.t,"PST"))

## Add one hour with the timedelta function
## (a DataFrame has no .map in Spark 2+, so go through the underlying RDD)
df3 = df2.rdd.map(lambda x: (x.t, x.utc_timestamp + timedelta(hours=1))).toDF(['t', 'new_utc_timestamp'])

## Convert back to original time zone and format
df4 = df3.withColumn('new_t',from_utc_timestamp(df3.new_utc_timestamp,"PST"))

The "new_t" column in df4 is the required column, converted back to the appropriate time zone for your system.
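The UTC round trip that to_utc_timestamp/from_utc_timestamp perform can be sketched in plain Python with the standard library. This is only an illustration of the idea, not Spark's implementation, and it uses "America/Los_Angeles" as a stand-in for the "PST" zone id:

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

# Local wall-clock time from the dummy dataframe, interpreted as Pacific time
local = datetime(1997, 2, 28, 10, 30, tzinfo=ZoneInfo("America/Los_Angeles"))

# to_utc_timestamp: shift to UTC (Pacific time is UTC-8 in February)
utc = local.astimezone(ZoneInfo("UTC"))

# Add one hour with timedelta, then convert back (from_utc_timestamp)
new_local = (utc + timedelta(hours=1)).astimezone(ZoneInfo("America/Los_Angeles"))

print(utc.strftime("%Y-%m-%d %H:%M:%S"))        # 1997-02-28 18:30:00
print(new_local.strftime("%Y-%m-%d %H:%M:%S"))  # 1997-02-28 11:30:00
```

The same one-hour shift appears whether you compare the UTC values or the local ones, which is why the round trip through UTC is safe.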


Solution 2:

The correct way to do this in pyspark is:

from pyspark.sql.functions import expr
df = df.withColumn("test_time_plus_hour", df['test_time'] + expr('INTERVAL 1 HOURS'))
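Applied to the timestamps from the question, the interval arithmetic is plain addition of one hour to the wall-clock value. A minimal pure-Python check of the expected result (not Spark itself):

```python
from datetime import datetime, timedelta

# One of the sample values from the question, without the trailing ".0"
ts = datetime.strptime("2017-03-12 03:19:51", "%Y-%m-%d %H:%M:%S")
plus_hour = ts + timedelta(hours=1)
print(plus_hour)  # 2017-03-12 04:19:51
```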
