
Search A List Of List Of Strings For A List Of Strings In Python Efficiently

I have a list of lists of strings and a list of strings. For example:

L1 = [['cat', 'dog', 'apple'], ['orange', 'green', 'red']]
L2 = ['cat', 'red']

If L1[i] contains any item from L2, I need

Solution 1:

Supposing that you might have more than one "control" item (an item from L2) in an L1 sublist, I'd do it using set() and itertools.product():

from itertools import product

def generate_edges(iterable, control):
    edges = []
    control_set = set(control)
    for e in iterable:
        e_set = set(e)
        common = e_set & control_set
        to_pair = e_set - common
        edges.extend(product(to_pair, common))
    return edges

Example (sets are unordered, so the order of the pairs may differ):

>>> L1 = [["cat","dog","apple"],
...       ["orange","green","red"],
...       ["hand","cat","red"]]
>>> L2 = ["cat","red"]
>>> generate_edges(L1, L2)
[('apple', 'cat'),
 ('dog', 'cat'),
 ('orange', 'red'),
 ('green', 'red'),
 ('hand', 'red'),
 ('hand', 'cat')]
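
If a reproducible order matters, one option is to keep the set only for membership tests and build the two groups with list comprehensions. The following is a minimal sketch under that assumption; iter_edges is a hypothetical name, not part of the answer above, and it yields pairs lazily instead of building the whole list.

```python
from itertools import product

def iter_edges(iterable, control):
    """Yield (other, control_item) pairs lazily, in sublist order."""
    control_set = set(control)  # O(1) membership tests
    for sublist in iterable:
        # Split each sublist while preserving its original order
        common = [x for x in sublist if x in control_set]
        others = [x for x in sublist if x not in control_set]
        yield from product(others, common)

L1 = [["cat", "dog", "apple"],
      ["orange", "green", "red"],
      ["hand", "cat", "red"]]
L2 = ["cat", "red"]

edges = list(iter_edges(L1, L2))
print(edges)
# [('dog', 'cat'), ('apple', 'cat'), ('orange', 'red'),
#  ('green', 'red'), ('hand', 'cat'), ('hand', 'red')]
```

Unlike the set-based version, this keeps duplicates if a sublist repeats a string.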

Solution 2:

I'd suggest transforming them all to sets and using set operations (intersection) to figure out what terms from L2 are in each L1 item. You can then use set subtraction to get the list of items you need to pair.

edges = []
L2set = set(L2)
for L1item in L1:
    L1set = set(L1item)
    items_in_L1item = L1set & L2set
    for item in items_in_L1item:
        items_to_pair = L1set - {item}
        edges.extend((item, i) for i in items_to_pair)
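
As a rough illustration of how this behaves, here is the same loop wrapped in a function and run on a small input. The name control_edges and the third sublist are assumptions added for the example. Note that when a sublist holds two items from L2, they are also paired with each other, and the result is sorted only to make the output reproducible.

```python
def control_edges(sublists, control):
    """Pair every control item found in a sublist with the sublist's other items."""
    control_set = set(control)
    edges = []
    for sublist in sublists:
        members = set(sublist)
        for item in members & control_set:
            # Everything in the sublist except the control item itself
            edges.extend((item, other) for other in members - {item})
    return edges

L1 = [["cat", "dog", "apple"],
      ["orange", "green", "red"],
      ["hand", "cat", "red"]]
L2 = ["cat", "red"]

result = sorted(control_edges(L1, L2))  # sorted: set order is arbitrary
print(result)
# [('cat', 'apple'), ('cat', 'dog'), ('cat', 'hand'), ('cat', 'red'),
#  ('red', 'cat'), ('red', 'green'), ('red', 'hand'), ('red', 'orange')]
```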

Solution 3:

To keep this efficient even when L1 and L2 are huge, use izip, which returns a lazy iterator instead of building a huge list of tuples. If you're working in Python 3, just use the built-in zip, which is already lazy.

from itertools import izip  # Python 2 only; in Python 3 use the built-in zip

pairs = []
for my_list, elem in izip(L1, L2):
    if elem in my_list:
        pairs += [(elem, e) for e in my_list if e != elem]
print(pairs)

The code is very readable; it reads almost like plain English. First you loop over each list together with its corresponding element, then you check whether the element is in the list. If it is, you collect every pair except (x, x).

Output:

[('cat', 'dog'), ('cat', 'apple'), ('red', 'orange'), ('red', 'green')]
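
For Python 3, the same idea might look like the sketch below. positional_pairs is a hypothetical name, and the third sublist is added only to show that zip pairs by position and stops at the shorter input, so L1[i] is checked against L2[i] alone.

```python
def positional_pairs(sublists, elems):
    """Yield (elem, other) for each sublist and the element at the same index."""
    for sublist, elem in zip(sublists, elems):  # zip is lazy in Python 3
        if elem in sublist:
            for other in sublist:
                if other != elem:
                    yield (elem, other)

L1 = [["cat", "dog", "apple"],
      ["orange", "green", "red"],
      ["hand", "cat", "red"]]   # never examined: L2 has only two items
L2 = ["cat", "red"]

pairs = list(positional_pairs(L1, L2))
print(pairs)
# [('cat', 'dog'), ('cat', 'apple'), ('red', 'orange'), ('red', 'green')]
```

If every sublist should be compared against all of L2, the set-based approaches in Solutions 1 and 2 are the closer fit.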

Solution 4:

If L1 is very large, you might want to look into using bisect. It requires that you flatten and sort L1 first. You could do something like:

from bisect import bisect_left, bisect_right
from itertools import chain

L1=[["cat","dog","apple"],["orange","green","red","apple"]]
L2=["apple", "cat","red"]

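# M1[k] records which sublist of L1 the k-th flattened item came from
M1 = [[i]*len(j) for i, j in enumerate(L1)]
M1 = list(chain(*M1))
L1flat = list(chain(*L1))
# Sort the flattened items, applying the same permutation to M1
I = sorted(range(len(L1flat)), key=L1flat.__getitem__)
L1flat = [L1flat[i] for i in I]
M1 = [M1[i] for i in I]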

for item in L2:
    s = bisect_left(L1flat, item)
    e = bisect_right(L1flat, item)
    print(item, M1[s:e])

# apple [0, 1]
# cat [0]
# red [1]
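
The bisect lookup can also be packaged so the sorted index is built once and queried many times. This is a minimal sketch, assuming Python 3; build_index and find_sublists are hypothetical helper names, and "pear" is added to show the result for an item that is absent.

```python
from bisect import bisect_left, bisect_right

def build_index(sublists):
    """Return two parallel lists: sorted words and the sublist index of each."""
    tagged = sorted((word, i) for i, sub in enumerate(sublists) for word in sub)
    words = [w for w, _ in tagged]
    owners = [i for _, i in tagged]
    return words, owners

def find_sublists(words, owners, item):
    """Return the indices of the sublists that contain item (binary search)."""
    s = bisect_left(words, item)
    e = bisect_right(words, item)
    return owners[s:e]  # empty when item is absent

L1 = [["cat", "dog", "apple"], ["orange", "green", "red", "apple"]]
words, owners = build_index(L1)

lookup = {item: find_sublists(words, owners, item)
          for item in ["apple", "cat", "red", "pear"]}
print(lookup)
# {'apple': [0, 1], 'cat': [0], 'red': [1], 'pear': []}
```

Building the index costs one sort of the flattened list; each lookup afterwards is a pair of binary searches.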
