Skip to content Skip to sidebar Skip to footer

Generate Lexicographic Series Efficiently In Python

I want to generate a lexicographic series of numbers such that for each number the sum of digits is a given constant. It is somewhat similar to 'subset sum problem'. For example if

Solution 1:

Very effective algorithm adapted from Jorg Arndt book "Matters Computational" (Chapter 7.2 Co-lexicographic order for compositions into exactly k parts)

n = 4
k = 3

x = [0] * n
x[0] = k

while True:
    print(x)
    v = x[-1]
    if (k==v ):
        break
    x[-1] = 0
    j = -2while (0==x[j]):
        j -= 1
    x[j] -= 1
    x[j+1] = 1 + v

[3, 0, 0, 0]
[2, 1, 0, 0]
[2, 0, 1, 0]
[2, 0, 0, 1]
[1, 2, 0, 0]
[1, 1, 1, 0]
[1, 1, 0, 1]
[1, 0, 2, 0]
[1, 0, 1, 1]
[1, 0, 0, 2]
[0, 3, 0, 0]
[0, 2, 1, 0]
[0, 2, 0, 1]
[0, 1, 2, 0]
[0, 1, 1, 1]
[0, 1, 0, 2]
[0, 0, 3, 0]
[0, 0, 2, 1]
[0, 0, 1, 2]
[0, 0, 0, 3]

Number of compositions and time on seconds for plain Python (perhaps numpy arrays are faster) for n=100, and k = 2,3,4,5 (2.8 ghz Cel-1840)

250500.04000020027160644531717000.99000144004821784442127520.02204465866089591962520372.03577995300293I expect time2 hours for 100/6 generation

Same with numpy arrays (x = np.zeros((n,), dtype=int)) gives worse results - but perhaps because I don't know how to use them properly

25050 0.0799999237060546931717002.3900032043457034442127554.74532389640808

Native code (this is Delphi, C/C++ compilers might optimize better) generates 100/6 in 21 seconds

3  171700  0.012
4  4421275  0.125
5  91962520  1.544
6  1609344100 20.748

Cannot go sleep until all measurements aren't done :)

MSVS VC++: 18 seconds! (O2 optimization)

5  91962520 1.466
6  1609344100 18.283

So 100 millions variants per second. A lot of time is wasted for checking of empty cells (because fill ratio is small). Speed described by Arndt is reached on higher k/n ratios and is about 300-500 millions variants per second:

n=25, k=152514084066060.981400 millions per second

Solution 2:

My recommendations:

  1. Rewrite it as a generator utilizing yield, rather than a loop that concatenates a global variable on each iteration.
  2. Keep a running sum instead of calculating the sum of some subset of the array representation of the number.
  3. Operate on a single instance of your working number representation instead of splicing a copy of it to a temporary variable on each iteration.

Note no particular order is implied.

Solution 3:

I have a better solution using itertools as follows,

from itertools import product
n = 4#number of elements
s = 3#sum of elements
r = []
for x inrange(n):
    r.append(x)
result = [p for p in product(r, repeat=n) ifsum(p) == s]
print(len(result))
print(result)

I am saying this is better because it took 0.1 secs on my system, while your code with numpy took 0.2 secs.

enter image description here

enter link description here

But as far as n=100 and s=6, this code takes time to go through all the combinations, I think it will take days to compute the results.

Solution 4:

I found a solution using itertools as well (Source: https://bugs.python.org/msg144273). Code follows:

import itertools
import operator

defcombinations_with_replacement(iterable, r):
    # combinations_with_replacement('ABC', 2) --> AA AB AC BB BC CC
    pool = tuple(iterable)
    n = len(pool)
    ifnot n and r:
        return
    indices = [0] * r
    yieldtuple(pool[i] for i in indices)
    whileTrue:
        for i inreversed(range(r)):
            if indices[i] != n - 1:
                breakelse:
            return
        indices[i:] = [indices[i] + 1] * (r - i)
        yieldtuple(pool[i] for i in indices)

int_part = lambda n, k: (tuple(map(c.count, range(k))) for c in combinations_with_replacement(range(k), n))
for item in int_part(3,4): print(item)

Post a Comment for "Generate Lexicographic Series Efficiently In Python"