Fast Way To Obtain A Random Index From An Array Of Weights In Python
Solution 1:
Cumulative summing and bisect
In any generic case, it seems advisable to calculate the cumulative sum of weights, and use bisect from the bisect module to find a random point in the resulting sorted array
def weighted_choice(weights):
cs = numpy.cumsum(weights)
return bisect.bisect(cs, numpy.random.random() * cs[-1])
if speed is a concern. A more detailed analysis is given below.
Note: If the array is not flat, numpy.unravel_index
can be used to transform a flat index into a shaped index, as seen in https://stackoverflow.com/a/19760118/1274613
Experimental Analysis
There are four more or less obvious solutions using numpy
builtin functions. Comparing all of them using timeit
gives the following result:
import timeit
weighted_choice_functions = [
"""import numpy
wc = lambda weights: numpy.random.choice(
range(len(weights)),
p=weights/weights.sum())
""",
"""import numpy
# Adapted from https://stackoverflow.com/a/19760118/1274613
def wc(weights):
cs = numpy.cumsum(weights)
return cs.searchsorted(numpy.random.random() * cs[-1], 'right')
""",
"""import numpy, bisect
# Using bisect mentioned in https://stackoverflow.com/a/13052108/1274613
def wc(weights):
cs = numpy.cumsum(weights)
return bisect.bisect(cs, numpy.random.random() * cs[-1])
""",
"""import numpy
wc = lambda weights: numpy.random.multinomial(
1,
weights/weights.sum()).argmax()
"""]
for setup in weighted_choice_functions:
for ps in ["numpy.ones(40)",
"numpy.arange(10)",
"numpy.arange(200)",
"numpy.arange(199,-1,-1)",
"numpy.arange(4000)"]:
timeit.timeit("wc(%s)"%ps, setup=setup)
print()
The resulting output is
178.45797914802097161.72161589498864223.53492237901082224.809361800027551901.629826753982315.19778998004039719.98568787699332520.79507007700158320.91911376098869441.650940307998114.24094998504733717.33580147096654419.43371090502478219.5220504060271235.6053614219999926.619582256011220.50128275697352431.27199579699663427.20013752405066243.09768892999273
This means that numpy.random.choice
is surprisingly very slow, and even the dedicated numpy searchsorted
method is slower than the type-naive bisect
variant. (These results were obtained using Python 3.3.5 with numpy 1.8.1, so things may be different for other versions.) The function based on numpy.random.multinomial
is less efficient for large weights than the methods based on cumulative summing. Presumably the fact that argmax has to iterate over the whole array and run comparisons each step plays a significant role, as can be seen as well from the four second difference between an increasing and a decreasing weight list.
Post a Comment for "Fast Way To Obtain A Random Index From An Array Of Weights In Python"