Find A Repeating Pattern In A List Of Strings

I'm looking for a way to strip the longest repeating pattern from a set of strings. I have a list of approximately 1000 web page titles, and they all share a common suffix, which is the

Solution 1:

Here's a solution using the os.path.commonprefix function on the reversed titles:

import os

titles = ['art gallery - museum and visits | expand knowledge',
 'lasergame - entertainment | expand knowledge',
 'coffee shop - confort and food | expand knowledge',
]

# Find the longest common suffix by reversing the strings and using a
# library function to find the common "prefix".
common_suffix = os.path.commonprefix([title[::-1] for title in titles])[::-1]

# Strip the common suffix from every title. Slicing by explicit length
# also works when the suffix is empty (title[:-0] would return '').
stripped_titles = [title[:len(title) - len(common_suffix)] for title in titles]

Result:

['art gallery - museum and visits', 'lasergame - entertainment', 'coffee shop - confort and food']

Because it finds the common suffix by itself, it should work on any group of titles, even if you don't know the suffix.
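
The same idea can be wrapped in a small reusable helper. This is a minimal sketch rather than part of the original answer: the function name `strip_common_suffix` is made up for illustration, and the guards for an empty list and for titles with no shared suffix are additions.

```python
import os.path


def strip_common_suffix(titles):
    """Return the titles with their longest shared suffix removed.

    Hypothetical helper name; the technique is the reversed-commonprefix
    trick described in this answer.
    """
    if not titles:
        return []
    # commonprefix compares character by character, so reversing every
    # title turns "longest common suffix" into "longest common prefix".
    reversed_titles = [title[::-1] for title in titles]
    common_suffix = os.path.commonprefix(reversed_titles)[::-1]
    # Slice by explicit length: title[:-0] would yield '' when there is
    # no shared suffix at all.
    return [title[:len(title) - len(common_suffix)] for title in titles]


print(strip_common_suffix([
    'art gallery - museum and visits | expand knowledge',
    'lasergame - entertainment | expand knowledge',
]))
```

One thing to keep in mind: the match is character-based, so if every title happens to end with the same letters just before the real suffix, those letters are removed too. With a single title, the whole title counts as the common suffix.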

Solution 2:

If you actually know the suffix you want to strip, you could simply do:

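suffix = " | expand knowledge"
your_list = ['art gallery - museum and visits | expand knowledge',
 'lasergame - entertainment | expand knowledge',
 'coffee shop - confort and food | expand knowledge',
...]

# str.rstrip(suffix) would strip a set of characters, not the literal
# suffix (it turns "food" into "f"), so slice the suffix off instead.
new_list = [name[:-len(suffix)] if name.endswith(suffix) else name
            for name in your_list]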
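
Note that `str.rstrip` treats its argument as a set of characters rather than as a literal suffix, so it is not a safe way to drop a known suffix. A small sketch of the difference, assuming Python 3.9 or later for `str.removesuffix`:

```python
suffix = " | expand knowledge"
title = 'coffee shop - confort and food | expand knowledge'

# rstrip removes any trailing characters that appear in the suffix string,
# so the "ood" of "food" is eaten as well.
print(title.rstrip(suffix))        # coffee shop - confort and f

# removesuffix (Python 3.9+) removes the literal suffix only, and leaves
# the string untouched when it does not end with that suffix.
print(title.removesuffix(suffix))  # coffee shop - confort and food
```

On older Python versions, an `endswith` check followed by slicing gives the same result as `removesuffix`.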

Solution 3:

If you are certain that all strings have the common suffix, then this will do the trick:

strings = [
  'art gallery - museum and visits | expand knowledge',
  'lasergame - entertainment | expand knowledge']
suffixlen = len(" | expand knowledge")
print([s[:-suffixlen] for s in strings])

output:

['art gallery - museum and visits', 'lasergame - entertainment']
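
Slicing by a fixed length silently truncates any string that does not actually end with the suffix. If that certainty is in doubt, the assumption can be checked first. A minimal sketch reusing the data from this answer; the names `known_suffix` and `offenders` are introduced here for illustration only:

```python
strings = [
    'art gallery - museum and visits | expand knowledge',
    'lasergame - entertainment | expand knowledge',
]
known_suffix = " | expand knowledge"

# Collect any strings that break the assumption before slicing blindly.
offenders = [s for s in strings if not s.endswith(known_suffix)]
if offenders:
    raise ValueError("strings without the suffix: %r" % (offenders,))

stripped = [s[:-len(known_suffix)] for s in strings]
print(stripped)
```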
