Pandas: "distribute" Column Values Into Multiple Rows

I am a pandas newbie, and I am trying to solve the following problem. I have a large DataFrame (10000 x 28) as follows.

Col1 Col2 Col3 Col4 Col5
A    B    C    D    E

How can I r

Solution 1:

Set ['Col1', 'Col2'] as the index and use .stack().

df.set_index(['Col1', 'Col2']).stack()

Col1  Col2   
A     B     0    C
            0    D
            0    E

Then call .reset_index() to get the format in your example. As suggested by @jezrael, you can also pass name='Col' to label the value column:

df.set_index(['Col1', 'Col2']).stack().reset_index(-1, drop=True).reset_index(name='Col')

  Col1 Col2 Col
0    A    B   C
1    A    B   D
2    A    B   E
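
For anyone who wants to run this end to end, here is a minimal sketch of the same chain. The one-row frame is an assumption rebuilt from the sample in the question, and the intermediate names `stacked` and `result` are only for illustration.

```python
import pandas as pd

# Sample frame assumed from the question: two key columns, three value columns.
df = pd.DataFrame(
    [["A", "B", "C", "D", "E"]],
    columns=["Col1", "Col2", "Col3", "Col4", "Col5"],
)

# Move the key columns into the index, then stack the remaining columns
# into a single Series with one value per row.
stacked = df.set_index(["Col1", "Col2"]).stack()

# Drop the innermost index level (the old column labels), then turn the
# key levels back into columns; name= labels the value column.
result = stacked.reset_index(level=-1, drop=True).reset_index(name="Col")
print(result)
```

Note that name= is accepted by Series.reset_index, so it has to be applied to the stacked Series rather than to the original DataFrame.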

Solution 2:

You can use melt and drop:

print(pd.melt(df, id_vars=['Col1','Col2'], value_name='Col').drop('variable', axis=1))
  Col1 Col2 Col
0    A    B   C
1    A    B   D
2    A    B   E

Timings:

df = pd.concat([df]*1000).reset_index(drop=True)

In [58]: %timeit pd.melt(df, id_vars=['Col1','Col2'],value_name='Col').drop('variable', axis=1)
100 loops, best of 3: 2.48 ms per loop

In [59]: %timeit df.set_index(['Col1', 'Col2']).stack().reset_index(-1, drop=True).reset_index(name='Col')
100 loops, best of 3: 3.83 ms per loop
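
The numbers above depend on the machine and the pandas version, so it may be worth rerunning them. Outside IPython, a rough equivalent can be sketched with the standard timeit module. The helper names `via_melt` and `via_stack` and the loop counts are assumptions chosen for illustration.

```python
import timeit

import pandas as pd

# Rebuild the benchmark frame: the question's sample row repeated 1000 times.
df = pd.DataFrame(
    [["A", "B", "C", "D", "E"]],
    columns=["Col1", "Col2", "Col3", "Col4", "Col5"],
)
df = pd.concat([df] * 1000).reset_index(drop=True)


def via_melt(frame):
    # Solution 2: melt, then drop the column that holds the old labels.
    return pd.melt(frame, id_vars=["Col1", "Col2"], value_name="Col").drop(
        "variable", axis=1
    )


def via_stack(frame):
    # Solution 1: index on the keys, stack, then rebuild the columns.
    return (
        frame.set_index(["Col1", "Col2"])
        .stack()
        .reset_index(level=-1, drop=True)
        .reset_index(name="Col")
    )


loops = 20
# Best of 3 repeats, reported per call, similar in spirit to %timeit.
melt_s = min(timeit.repeat(lambda: via_melt(df), number=loops, repeat=3)) / loops
stack_s = min(timeit.repeat(lambda: via_stack(df), number=loops, repeat=3)) / loops

print(f"melt:  {melt_s * 1e3:.2f} ms per loop")
print(f"stack: {stack_s * 1e3:.2f} ms per loop")
```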
