Search A List Of List Of Strings For A List Of Strings In Python Efficiently
Solution 1:
Supposing that you might have more than one "control" item in your L1 sublists, I'd do it using set() and itertools.product():
from itertools import product

def generate_edges(iterable, control):
    edges = []
    control_set = set(control)
    for e in iterable:
        e_set = set(e)
        # Control items present in this sublist
        common = e_set & control_set
        # Remaining items, each paired with every control item found
        to_pair = e_set - common
        edges.extend(product(to_pair, common))
    return edges
Example (sets are unordered, so the order of the pairs may differ on your machine):
>>> L1 = [["cat","dog","apple"],
... ["orange","green","red"],
... ["hand","cat","red"]]
>>> L2 = ["cat","red"]
>>> generate_edges(L1, L2)
[('apple', 'cat'),
('dog', 'cat'),
('orange', 'red'),
('green', 'red'),
('hand', 'red'),
('hand', 'cat')]
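To make the per-sublist step concrete, here is a minimal sketch of what happens for the third sublist only. The variable names sublist, control and step_pairs are illustrative and not part of the answer above; sorted() is used only so the printed order is reproducible.

```python
from itertools import product

sublist = ["hand", "cat", "red"]
control = ["cat", "red"]

common = set(sublist) & set(control)   # control items found in this sublist
to_pair = set(sublist) - common        # everything else in the sublist

# sorted() only fixes the display order; the sets themselves are unordered.
step_pairs = sorted(product(to_pair, common))
print(step_pairs)  # [('hand', 'cat'), ('hand', 'red')]
```

Note that the two control items are never paired with each other, because both are removed from to_pair.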
Solution 2:
I'd suggest transforming them all to sets and using set operations (intersection) to figure out which terms from L2 are in each L1 item. You can then use set subtraction to get the items you need to pair.
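edges = []
L2set = set(L2)
for L1item in L1:
    L1set = set(L1item)
    # Terms from L2 that occur in this sublist
    items_in_L1item = L1set & L2set
    for item in items_in_L1item:
        # Everything else in the sublist gets paired with the matched term
        items_to_pair = L1set - {item}
        edges.extend((item, i) for i in items_to_pair)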
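As a rough, self-contained sketch of the same idea, the loop can be wrapped in a function and run on the sample data from Solution 1. The function name pair_with_controls is hypothetical, and the result is sorted only to give a stable order for display.

```python
def pair_with_controls(sublists, controls):
    """Pair each control term found in a sublist with the other terms of that sublist."""
    control_set = set(controls)
    edges = []
    for sublist in sublists:
        terms = set(sublist)
        for hit in terms & control_set:
            # Pair the matched term with every other term in the same sublist
            edges.extend((hit, other) for other in terms - {hit})
    return edges

L1 = [["cat", "dog", "apple"], ["orange", "green", "red"], ["hand", "cat", "red"]]
L2 = ["cat", "red"]

solution2_edges = sorted(pair_with_controls(L1, L2))
for edge in solution2_edges:
    print(edge)
```

Unlike Solution 1, this variant puts the control term first and also pairs control terms with each other when they share a sublist, so ('cat', 'red') and ('red', 'cat') both appear.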
Solution 3:
To keep this efficient even if L1 and L2 are huge, use izip, which produces a lazy iterator instead of building a huge list of tuples. If you're working in Python 3, just use zip, which is already lazy.
try:
    from itertools import izip  # Python 2
except ImportError:
    izip = zip  # Python 3: the built-in zip is already lazy

pairs = []
for my_list, elem in izip(L1, L2):
    if elem in my_list:
        pairs += [(elem, e) for e in my_list if e != elem]
print(pairs)
The code is very comprehensible; it reads almost like plain English. First, you loop over each list together with its corresponding element, then you check whether the element is in the list; if it is, you collect every pair except (x, x).
Output:
[('cat', 'dog'), ('cat', 'apple'), ('red', 'orange'), ('red', 'green')]
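One thing worth checking before relying on this approach: zip pairs items by position and stops at the shorter input. The sketch below, which assumes the sample L1 and L2 from Solution 1 and uses the illustrative names examined and skipped, shows which sublists the loop actually visits.

```python
L1 = [["cat", "dog", "apple"], ["orange", "green", "red"], ["hand", "cat", "red"]]
L2 = ["cat", "red"]

# zip yields (L1[0], L2[0]), (L1[1], L2[1]) and then stops,
# because L2 has only two elements.
examined = [sublist for sublist, _ in zip(L1, L2)]
skipped = L1[len(examined):]

print(examined)
print(skipped)
```

With this data the third sublist is never examined, which is why the output above has no pairs involving 'hand'. This fits only if the i-th element of L2 is meant to be searched in the i-th sublist of L1.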
Solution 4:
If L1 is very large, you might want to look into using bisect. It requires that you flatten and sort L1 first. You could do something like:
from bisect import bisect_left, bisect_right
from itertools import chain

L1 = [["cat", "dog", "apple"], ["orange", "green", "red", "apple"]]
L2 = ["apple", "cat", "red"]

# M1[k] is the index of the sublist that flattened element k came from
M1 = [[i] * len(j) for i, j in enumerate(L1)]
M1 = list(chain(*M1))
L1flat = list(chain(*L1))
# Sort the flattened terms and apply the same permutation to M1
I = sorted(range(len(L1flat)), key=L1flat.__getitem__)
L1flat = [L1flat[i] for i in I]
M1 = [M1[i] for i in I]
for item in L2:
    s = bisect_left(L1flat, item)
    e = bisect_right(L1flat, item)
    print(item, M1[s:e])
# apple [0, 1]
# cat [0]
# red [1]
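If many lookups are needed, the flatten-and-sort step can be done once and reused. The following is a minimal sketch of that idea, not the answer's exact code: the helper names build_sorted_index and find_sublists are hypothetical, and it sorts (term, sublist index) tuples instead of building the permutation I by hand.

```python
from bisect import bisect_left, bisect_right

def build_sorted_index(sublists):
    """Return two parallel lists: the sorted terms and the sublist index of each term."""
    tagged = sorted(
        (term, idx) for idx, sublist in enumerate(sublists) for term in sublist
    )
    terms = [term for term, _ in tagged]
    owners = [idx for _, idx in tagged]
    return terms, owners

def find_sublists(terms, owners, item):
    """Binary-search the sorted terms and return the indices of sublists containing item."""
    s = bisect_left(terms, item)
    e = bisect_right(terms, item)
    return owners[s:e]

L1 = [["cat", "dog", "apple"], ["orange", "green", "red", "apple"]]
terms, owners = build_sorted_index(L1)

lookup = {item: find_sublists(terms, owners, item)
          for item in ["apple", "cat", "red", "fish"]}
print(lookup)  # {'apple': [0, 1], 'cat': [0], 'red': [1], 'fish': []}
```

Each lookup costs two binary searches over the sorted terms, and a term that is absent (here 'fish') simply yields an empty list.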