
Python - Accessing Columns Of A Pandas DataFrame Effectively

I have been working with Python pandas for quite a while, and I am now staring at the two commands below wondering what the difference between them is. df1['Col1'] #Shows only t

Solution 1:

Usually pandas takes a single key when selecting data with []. Pass either one column name or a list of column names as that single key. When you pass two values separated by a comma, they are treated as one tuple, and pandas searches the dataframe for a column labeled with that tuple. There are cases where tuples are used as column names, which is why the lookup is attempted at all. When no column has that tuple as its name, the result is a KeyError.

You can create a column whose name is a tuple, for example with df['Col1','Col2'] = 'x'; after that, df['Col1','Col2'] will work. To avoid this kind of ambiguity, more than one column name has to be passed as a list.
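As a minimal sketch of this point (the frame and its values are illustrative assumptions, not taken from the question), assigning with two comma-separated names creates one column labeled by a tuple. After that, the same tuple lookup succeeds, while a list still selects the separate columns:

```python
import pandas as pd

# Illustrative frame; names and values are assumptions for this sketch.
df = pd.DataFrame([[1, 2], [3, 4]], columns=['Col1', 'Col2'])

# Two comma-separated names form the tuple ('Col1', 'Col2'),
# so this adds ONE new column whose label is that tuple.
df['Col1', 'Col2'] = 'x'

tuple_column = df['Col1', 'Col2']    # found now: a Series holding 'x'
two_columns = df[['Col1', 'Col2']]   # a list selects the two original columns

print(list(df.columns))
print(type(tuple_column).__name__, type(two_columns).__name__)
```

The frame ends up with three columns, which is rarely what was intended, so the list form is the safer habit whenever more than one column is wanted.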


Solution 2:

Setup

import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=['col1', 'col2'])

In Python, [] is syntactic sugar for the __getitem__ method.

This:

df['col1']

0    1
1    3
Name: col1, dtype: int64

Is equivalent to:

df.__getitem__('col1')

0    1
1    3
Name: col1, dtype: int64

And this:

df[['col1', 'col2']]

   col1  col2
0     1     2
1     3     4

Is the same as this:

df.__getitem__(['col1', 'col2'])

   col1  col2
0     1     2
1     3     4

So when you do this:

df['col1', 'col2']

Python packs everything inside the brackets into a single argument, a tuple, so it is the same as:

df.__getitem__(('col1', 'col2'))

Which raises:

KeyError: ('col1', 'col2')
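To see the packing without pandas involved, here is a small sketch using a hypothetical Probe class (not part of pandas) whose __getitem__ simply returns the key it receives. It is followed by the failing tuple lookup and the list form that works, on the same frame as in the setup:

```python
import pandas as pd

class Probe:
    """Hypothetical helper: returns whatever key [] hands to __getitem__."""
    def __getitem__(self, key):
        return key

probe = Probe()
single = probe['col1']              # a plain string
as_tuple = probe['col1', 'col2']    # the two names arrive as one tuple
as_list = probe[['col1', 'col2']]   # an explicit list stays a list

df = pd.DataFrame([[1, 2], [3, 4]], columns=['col1', 'col2'])

try:
    df['col1', 'col2']              # looks for ONE column labeled ('col1', 'col2')
    tuple_lookup_failed = False
except KeyError:
    tuple_lookup_failed = True

both = df[['col1', 'col2']]         # list of names: a two-column DataFrame
```

The extra pair of brackets is what turns the key into a list, and a list is what pandas reads as "several columns".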

