Groupby And Resample Timeseries So Date Ranges Are Consistent
I have a DataFrame that is essentially several time series stacked on top of one another. Each time series has a unique label (group), and the series have different date ranges. date = pd.t
Solution 1:
Another way:
import pandas as pd
from itertools import product
date = pd.to_datetime(pd.Series(['2010-01-01', '2010-01-02', '2010-01-03',
                                 '2010-01-06', '2010-01-01', '2010-01-03']))
group = [1,1,1,1, 2, 2]
value = [1,2,3,4,5,6]
df = pd.DataFrame({'date':date, 'group':group, 'value':value})
dates = pd.date_range(df.date.min(), df.date.max())
groups = df.group.unique()
df = (pd.DataFrame(list(product(dates, groups)), columns=['date', 'group'])
        .merge(df, on=['date', 'group'], how='left')
        .sort_values(['group', 'date'])
        .reset_index(drop=True))
df
#          date  group  value
# 0  2010-01-01      1    1.0
# 1  2010-01-02      1    2.0
# 2  2010-01-03      1    3.0
# 3  2010-01-04      1    NaN
# 4  2010-01-05      1    NaN
# 5  2010-01-06      1    4.0
# 6  2010-01-01      2    5.0
# 7  2010-01-02      2    NaN
# 8  2010-01-03      2    6.0
# 9  2010-01-04      2    NaN
# 10 2010-01-05      2    NaN
# 11 2010-01-06      2    NaN
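The merge leaves NaN wherever a group had no observation on a given day. As a minimal sketch (the names full, rows_per_group and filled are illustrative, not from the original answer), the result can be checked for consistent date ranges and, if zeros are preferred, the gaps can be filled afterwards:

```python
import pandas as pd
from itertools import product

# Same sample data as in the answer.
df = pd.DataFrame({
    'date': pd.to_datetime(['2010-01-01', '2010-01-02', '2010-01-03',
                            '2010-01-06', '2010-01-01', '2010-01-03']),
    'group': [1, 1, 1, 1, 2, 2],
    'value': [1, 2, 3, 4, 5, 6],
})

# Cartesian product of all dates and all groups, left-merged with the data.
dates = pd.date_range(df.date.min(), df.date.max())
full = (pd.DataFrame(list(product(dates, df.group.unique())),
                     columns=['date', 'group'])
          .merge(df, on=['date', 'group'], how='left')
          .sort_values(['group', 'date'])
          .reset_index(drop=True))

# Sanity check: every group should now have one row per day.
rows_per_group = full.groupby('group')['date'].count()

# Optional: replace the NaN gaps with 0 and restore the integer dtype.
filled = full.fillna({'value': 0}).astype({'value': 'int64'})
```

Whether 0 is a sensible fill depends on what value measures; for quantities where "no observation" differs from "zero", keeping the NaN is probably safer.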
Solution 2:
Credit to zipa for getting the dates correct. I've edited my post to correct my mistake.
Set the index, then use pandas.MultiIndex.from_product to produce the Cartesian product of values. I also use fill_value=0 to fill in those missing values.
d = df.set_index(['date', 'group'])
midx = pd.MultiIndex.from_product(
    [pd.date_range(df.date.min(), df.date.max()), df.group.unique()],
    names=d.index.names)
d.reindex(midx, fill_value=0).reset_index()
#          date  group  value
# 0  2010-01-01      1      1
# 1  2010-01-01      2      5
# 2  2010-01-02      1      2
# 3  2010-01-02      2      0
# 4  2010-01-03      1      3
# 5  2010-01-03      2      6
# 6  2010-01-04      1      0
# 7  2010-01-04      2      0
# 8  2010-01-05      1      0
# 9  2010-01-05      2      0
# 10 2010-01-06      1      4
# 11 2010-01-06      2      0
Or
d = df.set_index(['date', 'group'])
midx = pd.MultiIndex.from_product(
    [pd.date_range(df.date.min(), df.date.max()), df.group.unique()],
    names=d.index.names)
d.reindex(midx).reset_index()
#          date  group  value
# 0  2010-01-01      1    1.0
# 1  2010-01-01      2    5.0
# 2  2010-01-02      1    2.0
# 3  2010-01-02      2    NaN
# 4  2010-01-03      1    3.0
# 5  2010-01-03      2    6.0
# 6  2010-01-04      1    NaN
# 7  2010-01-04      2    NaN
# 8  2010-01-05      1    NaN
# 9  2010-01-05      2    NaN
# 10 2010-01-06      1    4.0
# 11 2010-01-06      2    NaN
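The reindex approach can be wrapped in a small function if it is needed in more than one place. This is only a sketch: align_date_ranges and its freq parameter are hypothetical names introduced here, and it assumes the columns are called date, group and value as in the question, with at most one row per (date, group) pair, since reindex raises on duplicate index labels.

```python
import pandas as pd

def align_date_ranges(df, fill_value=float('nan'), freq='D'):
    """Hypothetical helper: give every group the same full date range."""
    d = df.set_index(['date', 'group'])
    # Cartesian product of every date in the overall range and every group.
    midx = pd.MultiIndex.from_product(
        [pd.date_range(df['date'].min(), df['date'].max(), freq=freq),
         df['group'].unique()],
        names=d.index.names)
    # Rows missing from the original data receive fill_value.
    return d.reindex(midx, fill_value=fill_value).reset_index()

# Same sample data as in the question.
df = pd.DataFrame({
    'date': pd.to_datetime(['2010-01-01', '2010-01-02', '2010-01-03',
                            '2010-01-06', '2010-01-01', '2010-01-03']),
    'group': [1, 1, 1, 1, 2, 2],
    'value': [1, 2, 3, 4, 5, 6],
})

with_zeros = align_date_ranges(df, fill_value=0)
with_nans = align_date_ranges(df)
```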
Another dance we could do is a cleaned-up version of OP's attempt. Again I use fill_value=0 to fill in missing values. We could leave that out to produce NaN instead.
df.set_index(['date', 'group']) \
    .unstack(fill_value=0) \
    .asfreq('D', fill_value=0) \
    .stack().reset_index()
#          date  group  value
# 0  2010-01-01      1      1
# 1  2010-01-01      2      5
# 2  2010-01-02      1      2
# 3  2010-01-02      2      0
# 4  2010-01-03      1      3
# 5  2010-01-03      2      6
# 6  2010-01-04      1      0
# 7  2010-01-04      2      0
# 8  2010-01-05      1      0
# 9  2010-01-05      2      0
# 10 2010-01-06      1      4
# 11 2010-01-06      2      0
Or
df.set_index(['date', 'group']) \
    .unstack() \
    .asfreq('D') \
    .stack(dropna=False).reset_index()
#          date  group  value
# 0  2010-01-01      1    1.0
# 1  2010-01-01      2    5.0
# 2  2010-01-02      1    2.0
# 3  2010-01-02      2    NaN
# 4  2010-01-03      1    3.0
# 5  2010-01-03      2    6.0
# 6  2010-01-04      1    NaN
# 7  2010-01-04      2    NaN
# 8  2010-01-05      1    NaN
# 9  2010-01-05      2    NaN
# 10 2010-01-06      1    4.0
# 11 2010-01-06      2    NaN
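The same wide-then-long idea can also be sketched with pivot and melt, which keeps the NaN rows without relying on the dropna argument of stack. This is a variation on the answer, not the answerer's code; the names wide and long are illustrative.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    'date': pd.to_datetime(['2010-01-01', '2010-01-02', '2010-01-03',
                            '2010-01-06', '2010-01-01', '2010-01-03']),
    'group': [1, 1, 1, 1, 2, 2],
    'value': [1, 2, 3, 4, 5, 6],
})

# One column per group, one row per observed date, then fill in missing days.
wide = df.pivot(index='date', columns='group', values='value').asfreq('D')

# Back to long format; melt keeps rows whose value is NaN.
long = (wide.reset_index()
            .melt(id_vars='date', var_name='group', value_name='value')
            .astype({'group': 'int64'})
            .sort_values(['date', 'group'])
            .reset_index(drop=True))
```

Like the set_index and unstack route, pivot needs each (date, group) pair to appear at most once.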