How To Get All Intersections Of Sets In Python Fast
Solution 1:
Here is a recursive solution. It is almost instantaneous on your test example:
defallIntersections(frozenSets):
iflen(frozenSets) == 0:
return []
else:
head = frozenSets[0]
tail = frozenSets[1:]
tailIntersections = allIntersections(tail)
newIntersections = [head]
newIntersections.extend(tailIntersections)
newIntersections.extend(head & s for s in tailIntersections)
returnlist(set(newIntersections))
defall_intersections(lists):
sets = allIntersections([frozenset(s) for s in lists])
return [list(s) for s in sets]
On Edit Here is a cleaner, nonrecursive implementation of the same ideas.
The problem is easiest if you define the intersection of an empty collection of sets to be the universal set, and an adequate universal set can be obtained by taking the union of all elements. This is a standard move in lattice-theory, and is dual to taking the union of an empty collection of sets to be the empty set. You could always throw away this universal set if you don't want it:
defallIntersections(frozenSets):
universalSet = frozenset.union(*frozenSets)
intersections = set([universalSet])
for s in frozenSets:
moreIntersections = set(s & t for t in intersections)
intersections.update(moreIntersections)
return intersections
defall_intersections(lists):
sets = allIntersections([frozenset(s) for s in lists])
return [list(s) for s in sets]
The reason that this is so fast with your test example is that, even though your collection has 24 sets, hence having 2**24 (16.8 million) potential intersections, there are in fact only 242 (or 241 if you don't count the empty intersection) distinct intersections. Thus the number of intersections in each pass through the loop is in the low hundreds at most.
It is possible to pick 24 sets so that all of the 2**24 possible intersections are in fact different, so it is easy to see that the worst-case behavior is exponential. But if, as in your test example, the number of intersections is small, this approach will allow you to rapidly compute them.
A potential optimization might be to sort the sets in increasing size before you loop over them. Processing the smaller sets up front might result in more empty intersections appearing earlier, thus keeping the total number of distinct intersections smaller until towards the end of the loop.
Solution 2:
Iterative solution that takes about 3.5 ms on my machine for your large test input:
from itertools import starmap, product
from operator import and_
defall_intersections(sets):
# Convert to set of frozensets for uniquification/type correctness
last = new = sets = set(map(frozenset, sets))
# Keep going until further intersections add nothing to resultswhile new:
# Compute intersection of old values with newly found values
new = set(starmap(and_, product(last, new)))
last = sets.copy() # Save off prior state
new -= last # Determine truly newly added values
sets |= new # Accumulate newly added values in complete set# No more intersections being generated, convert results to canonical# form, list of lists, where each sublist is displayed in order, and# the top level list is ordered first by size of sublist, then by contentsreturnsorted(map(sorted, sets), key=lambda x: (len(x), x))
Basically, it just keeps doing two way intersections among the old result set and the newly found intersections until a round of intersections doesn't change anything, then it's done.
Note: This is not actually the best solution (recursion is sufficiently better algorithmically to win on the test data, where John Coleman's solution, after adding sorting to the outer wrapper so it matches format, takes about 0.94 ms, vs. 3.5 ms for mine). I'm mostly providing it as an example of solving the problem in other ways.
Post a Comment for "How To Get All Intersections Of Sets In Python Fast"