How To Fill In Rows With Repeating Data In Pandas?

October 07, 2024

In R, when adding new data of unequal length to a data frame, the values repeat to fill the data frame:

df <- data.frame(first=c(1,2,3,4,5,6))
df$second <- c(1,2,3)

yielding a second column of 1, 2, 3, 1, 2, 3.

Solution 1:

The cycle function from itertools is good for repeating a common pattern.

from itertools import cycle

seq = cycle([1, 2, 3])
df['Seq'] = [next(seq) for _ in range(len(df))]

Solution 2:

There seems to be no elegant way. This is the workaround I just figured out: create a repeating list slightly longer than the original DataFrame, then left join the two, so the extra values are dropped.

import pandas
df = pandas.DataFrame(range(100), columns=['first'])
repeat_arr = [1, 2, 3]
# integer division (//) so the repeat count is an int in Python 3
df = df.join(pandas.DataFrame(repeat_arr * (len(df) // len(repeat_arr) + 1),
    columns=['second']))
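
The same idea can be sketched without the join: build a list at least as long as the frame, then slice it to the exact length. The helper name repeat_to_length is hypothetical and not part of pandas; this is only a minimal sketch assuming Python 3.

```python
import pandas as pd

def repeat_to_length(values, n):
    """Repeat `values` until there are at least n items, then trim to n."""
    # Ceiling division: whole repetitions needed to cover n items.
    reps = -(-n // len(values))
    return (list(values) * reps)[:n]

df = pd.DataFrame({'first': range(100)})
df['second'] = repeat_to_length([1, 2, 3], len(df))
```

Slicing plays the role of the left join here: anything past the last row of the frame is discarded.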

Solution 3:

import pandas as pd
import numpy as np

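def put(df, column, values):
    # np.put writes into an ndarray, so fill a buffer and assign it afterwards
    out = np.zeros(len(df), dtype=int)
    np.put(out, np.arange(len(df)), values)
    df[column] = out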

df = pd.DataFrame({'first':range(1, 8)})    
put(df, 'second', [1,2,3])

yields

   first  second
0      1       1
1      2       2
2      3       3
3      4       1
4      5       2
5      6       3
6      7       1

Not particularly beautiful, but one "feature" it possesses is that you do not have to worry if the length of the DataFrame is a multiple of the length of the repeated values. np.put repeats the values as necessary.
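
To see the repetition in isolation, here is a minimal sketch on a plain NumPy array (the name out is just illustrative). When the values are shorter than the index array, np.put cycles through them:

```python
import numpy as np

out = np.zeros(7, dtype=int)                  # one slot per row
np.put(out, np.arange(len(out)), [1, 2, 3])   # 3 values for 7 indices, so they repeat
print(out.tolist())  # [1, 2, 3, 1, 2, 3, 1]
```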

My first answer was:

import itertools as IT
df['second'] = list(IT.islice(IT.cycle([1,2,3]), len(df)))

but it turns out this is significantly slower:

In [312]: df = pd.DataFrame({'first':range(10**6)})

In [313]: %timeit df['second'] = list(IT.islice(IT.cycle([1,2,3]), len(df)))
10 loops, best of 3: 143 ms per loop

In [316]: %timeit df['second'] = 0; np.put(df['second'], np.arange(N), [1,2,3])
10 loops, best of 3: 27.9 ms per loop
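
Outside IPython, a comparison along these lines could be sketched with the standard timeit module. The function names with_cycle and with_put and the smaller N are assumptions made for illustration; the np.put variant fills a plain array first and then assigns it. Timings will vary by machine and library version, so none are shown.

```python
import itertools as IT
import timeit

import numpy as np
import pandas as pd

N = 10**5  # assumed size, smaller than the 10**6 used above
values = [1, 2, 3]

def with_cycle(df):
    # Pure-Python iteration over the cycled values
    df['second'] = list(IT.islice(IT.cycle(values), len(df)))

def with_put(df):
    # Fill a NumPy buffer, letting np.put repeat the values
    out = np.empty(len(df), dtype=int)
    np.put(out, np.arange(len(df)), values)
    df['second'] = out

df_a = pd.DataFrame({'first': range(N)})
df_b = pd.DataFrame({'first': range(N)})
t_cycle = timeit.timeit(lambda: with_cycle(df_a), number=5)
t_put = timeit.timeit(lambda: with_put(df_b), number=5)
print(f"cycle: {t_cycle:.4f}s  put: {t_put:.4f}s")
```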

Solution 4:

How general of a solution are you looking for? I tried to make this a little less hard-coded:

import numpy as np
import pandas 

df = pandas.DataFrame(np.arange(1,7), columns=['first'])

base = [1, 2, 3]
# integer division; this assumes len(df) is an exact multiple of len(base)
df['second'] = base * (df.shape[0] // len(base))
print(df.to_string())


   first  second
0      1       1
1      2       2
2      3       3
3      4       1
4      5       2
5      6       3
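
Multiplying the list only lines up when the frame length is an exact multiple of len(base). For the general case, one option is np.resize, which fills a larger array with repeated copies of its input. This swaps in a different technique from the answer above and is only a minimal sketch:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'first': np.arange(1, 8)})   # 7 rows, not a multiple of 3

base = [1, 2, 3]
df['second'] = np.resize(base, len(df))         # repeats base, trimmed to 7 items
print(df['second'].tolist())  # [1, 2, 3, 1, 2, 3, 1]
```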

Solution 5:

In my case I needed the values to restart for every group without knowing each group's length in advance, i.e. by checking the length of every group. This was my solution:

import numpy as np
import pandas 

df = pandas.DataFrame(['a','a','a','b','b','b','b'], columns=['first'])

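# one range per group; avoid naming the variable `list`, which shadows the builtin
groups = df.groupby('first').apply(lambda x: range(len(x))).tolist()
# flatten the per-group ranges into a single list (assumes rows are already grouped together)
loop = [val for sublist in groups for val in sublist]
df['second'] = loop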

df
  first  second
0     a       0
1     a       1
2     a       2
3     b       0
4     b       1
5     b       2
6     b       3
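
For this per-group counter, pandas also offers groupby(...).cumcount(), which numbers the rows within each group starting at 0. A minimal sketch on the same data; because the result is aligned on the index, it should also work when the rows of a group are not adjacent:

```python
import pandas as pd

df = pd.DataFrame({'first': ['a', 'a', 'a', 'b', 'b', 'b', 'b']})
df['second'] = df.groupby('first').cumcount()   # 0-based position within each group
print(df['second'].tolist())  # [0, 1, 2, 0, 1, 2, 3]
```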
