Is Searchsorted Faster Than Get_loc To Find Label Location In A DataFrame Index?
I need to find the integer location for a label in a pandas index. I know I can use the get_loc method, but then I discovered searchsorted. Just wondering if I should use the latter fo
Solution 1:
It depends on your use case. Using @ayhan's example:
With get_loc, there is a big upfront cost of creating the hash table on the first lookup.
In [22]: idx = pd.Index(['R{0:07d}'.format(i) for i in range(10**7)])
In [23]: to_search = np.random.choice(idx, 10**5, replace=False)
In [24]: %time idx.get_loc(to_search[0])
Wall time: 1.57 s
But subsequent lookups may be faster (not guaranteed; it depends on the data).
In [9]: %%time
   ...: for i in to_search:
   ...:     idx.get_loc(i)
Wall time: 200 ms
In [10]: %%time
    ...: for i in to_search:
    ...:     np.searchsorted(idx, i)
Wall time: 486 ms
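If you want to repeat this comparison outside IPython, here is a rough sketch as a plain script. The sizes are my assumption (10**5 labels and 10**3 lookups instead of 10**7 and 10**5) so it finishes quickly, and the timings will differ on your machine and data, so none are quoted here.

```python
import time

import numpy as np
import pandas as pd

# Assumption: a smaller index than in the session above, to keep the run short.
n_labels = 10**5
n_lookups = 10**3

idx = pd.Index(['R{0:07d}'.format(i) for i in range(n_labels)])
rng = np.random.default_rng(0)
to_search = rng.choice(idx.to_numpy(), n_lookups, replace=False)

# First get_loc call: includes the one-off cost of building the hash table.
start = time.perf_counter()
idx.get_loc(to_search[0])
first_lookup_s = time.perf_counter() - start

# Later get_loc calls reuse the hash table.
start = time.perf_counter()
hash_locs = [idx.get_loc(label) for label in to_search]
get_loc_s = time.perf_counter() - start

# searchsorted does a binary search on every call (the index is sorted here).
start = time.perf_counter()
bisect_locs = [int(np.searchsorted(idx, label)) for label in to_search]
searchsorted_s = time.perf_counter() - start

print(f"first get_loc:     {first_lookup_s:.4f} s")
print(f"get_loc loop:      {get_loc_s:.4f} s")
print(f"searchsorted loop: {searchsorted_s:.4f} s")
```

Both loops return the same positions here only because the zero-padded labels are already in sorted order.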
Also, as Jeff noted, get_loc works regardless of how the index is ordered, whereas searchsorted requires a monotonic (sorted) index and doesn't check for it.
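To make the monotonicity point concrete, here is a small sketch on two tiny made-up indexes. `find_loc` is a hypothetical helper of mine, not a pandas function: it only trusts searchsorted when the index is sorted and unique, checks that the returned position really holds the label, and otherwise falls back to get_loc.

```python
import pandas as pd

# Made-up indexes for illustration only.
sorted_idx = pd.Index(["R0", "R1", "R2", "R3"])
unsorted_idx = pd.Index(["R2", "R0", "R3", "R1"])

# On a sorted, unique index the two methods agree.
assert sorted_idx.get_loc("R2") == sorted_idx.searchsorted("R2") == 2

# On an unsorted index get_loc is still correct ...
hash_pos = unsorted_idx.get_loc("R0")
# ... while searchsorted returns some position without any warning.
bisect_pos = int(unsorted_idx.searchsorted("R0"))
print(hash_pos, bisect_pos, unsorted_idx[bisect_pos])


def find_loc(index, label):
    """Hypothetical helper: binary search when safe, hash lookup otherwise."""
    if index.is_monotonic_increasing and index.is_unique:
        pos = int(index.searchsorted(label))
        # searchsorted returns an insertion point, so confirm the label is there.
        if pos < len(index) and index[pos] == label:
            return pos
        raise KeyError(label)
    return index.get_loc(label)
```

The extra checks cost something, so whether this beats plain get_loc depends on how many lookups you do against the same index.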