
Fastest Way To Iterate Pandas Series/column

I'm more used to for loops, but they can become slow in pandas once the data gets large. I keep finding iterrows, iter..., etc. examples but want to know if there's a faster

Solution 1:

Just use the vectorised string operation:

newnames = df['name'].str.replace(' ', '_', regex=False).tolist()

Generally, with pandas, you want to avoid explicit loops where possible. The library usually offers a built-in way to do the same thing, so expect to spend some time looking up the right method (unless what you need is fairly nonstandard).

Basically, if a task seems at first glance to require a for loop and is something people would want to do regularly, it's probably already in the library.
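As a rough illustration of the point above, here is a minimal sketch that puts an explicit loop next to the vectorised call and checks that both give the same result. The sample DataFrame and the variable names `loop_names` and `vector_names` are made up for this example; only the `name` column and the `str.replace` call come from the answer.

```python
import pandas as pd

# Hypothetical sample data; the column name 'name' follows the answer above.
df = pd.DataFrame({"name": ["John Smith", "Mary Ann Lee", "Bob"]})

# Loop version: one Python-level str.replace call per element.
loop_names = []
for value in df["name"]:
    loop_names.append(value.replace(" ", "_"))

# Vectorised version: a single call on the whole column.
vector_names = df["name"].str.replace(" ", "_", regex=False).tolist()

# Both approaches should produce the same list of names.
print(loop_names == vector_names)
```

The loop is not wrong, it is just more code for the same result, and the vectorised form also handles missing values in the column without raising.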

Solution 2:

If you ultimately want to add the new names to df as a column, you can assign them directly:

df['newnames'] = df['name'].str.replace(' ', '_', regex=False)

If you just want to replace all spaces with _ in the name column, you can also overwrite the original column directly:

df['name'] = df['name'].str.replace(' ', '_', regex=False)

Either way, this uses pandas' vectorized string method, which applies the replacement to the whole column in one call and is typically faster than an explicit Python loop such as iterrows.
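If you want to check the speed difference on your own machine, a small timing sketch along these lines may help. The data, the row count, and the helper names `with_iterrows` and `with_str_replace` are assumptions made for this example, and the actual timings will depend on your pandas version and hardware.

```python
import timeit

import pandas as pd

# Hypothetical data: 20,000 rows, sized so the loop finishes quickly.
df = pd.DataFrame({"name": ["John Smith", "Mary Ann Lee"] * 10_000})


def with_iterrows():
    # Row-by-row loop: iterrows builds a Series object for every row.
    return [row["name"].replace(" ", "_") for _, row in df.iterrows()]


def with_str_replace():
    # Single vectorised call on the whole column.
    return df["name"].str.replace(" ", "_", regex=False).tolist()


# Time one run of each approach.
loop_seconds = timeit.timeit(with_iterrows, number=1)
vector_seconds = timeit.timeit(with_str_replace, number=1)

print(f"iterrows:    {loop_seconds:.4f} s")
print(f"str.replace: {vector_seconds:.4f} s")
```

Both functions return the same list, so the comparison measures only how the work is done, not what is computed.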
