Find A Repeating Pattern In A List Of Strings
I'm looking for a way to strip the longest repeating pattern from a set of strings. I have a list of approximately 1000 web page titles, and they all share a common suffix, which is the
Solution 1:
Here's a solution using the os.path.commonprefix function on the reversed titles:
import os

titles = ['art gallery - museum and visits | expand knowledge',
'lasergame - entertainment | expand knowledge',
'coffee shop - confort and food | expand knowledge',
]

# Find the longest common suffix by reversing the strings and using a
# library function to find the common "prefix".
common_suffix = os.path.commonprefix([title[::-1] for title in titles])[::-1]

# Strip the common suffix from every title. Slicing up to
# len(title) - len(common_suffix) also works when the suffix is empty,
# whereas title[:-0] would return an empty string.
stripped_titles = [title[:len(title) - len(common_suffix)] for title in titles]
Result:
['art gallery - museum and visits', 'lasergame - entertainment', 'coffee shop - confort and food']
Because it finds the common suffix by itself, it should work on any group of titles, even if you don't know the suffix.
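As a minimal sketch, the same idea can be wrapped in a reusable function. The name strip_common_suffix is hypothetical (it is not part of the original answer), and the function returns the detected suffix alongside the cleaned titles so you can inspect what was removed:

```python
import os


def strip_common_suffix(titles):
    """Return (stripped_titles, common_suffix) for a list of strings."""
    # os.path.commonprefix compares character by character, so applying it
    # to the reversed strings yields the reversed common suffix.
    reversed_titles = [title[::-1] for title in titles]
    common_suffix = os.path.commonprefix(reversed_titles)[::-1]
    # Slice by explicit end index so an empty suffix leaves titles unchanged.
    stripped = [title[:len(title) - len(common_suffix)] for title in titles]
    return stripped, common_suffix


titles = [
    'art gallery - museum and visits | expand knowledge',
    'lasergame - entertainment | expand knowledge',
    'coffee shop - confort and food | expand knowledge',
]
stripped, suffix = strip_common_suffix(titles)
print(stripped)
print(repr(suffix))
```

One caveat: a single title that lacks the shared suffix shortens the detected suffix for the whole list, possibly down to the empty string, so it is worth checking the returned suffix before trusting the result.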
Solution 2:
If you actually know the suffix you want to strip, you could simply do:
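suffix = " | expand knowledge"
your_list = ['art gallery - museum and visits | expand knowledge',
'lasergame - entertainment | expand knowledge',
'coffee shop - confort and food | expand knowledge',
# ...
]
# str.rstrip(suffix) would strip any trailing characters found in suffix
# (turning "food" into "f"), so check for the suffix and slice it off instead.
new_list = [name[:-len(suffix)] if name.endswith(suffix) else name for name in your_list]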
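Note that str.rstrip treats its argument as a set of characters rather than as a literal suffix, so an explicit endswith check is the safer way to remove a known suffix. A minimal sketch follows; the helper name strip_suffix is an assumption made for illustration:

```python
def strip_suffix(text, suffix):
    """Remove suffix from the end of text if present; otherwise return text unchanged."""
    if suffix and text.endswith(suffix):
        return text[:-len(suffix)]
    return text


suffix = " | expand knowledge"
title = 'coffee shop - confort and food | expand knowledge'

# rstrip keeps removing trailing characters that happen to appear in the
# suffix string, so it can cut into the title itself.
print(repr(title.rstrip(suffix)))
# The helper removes exactly one occurrence of the suffix, or nothing.
print(repr(strip_suffix(title, suffix)))
```

On Python 3.9 and later, the built-in str.removesuffix does the same job as this helper.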
Solution 3:
If you are certain that all strings have the common suffix, then this will do the trick:
strings = [
'art gallery - museum and visits | expand knowledge',
'lasergame - entertainment | expand knowledge']
suffixlen = len(" | expand knowledge")
print([s[:-suffixlen] for s in strings])
output:
['art gallery - museum and visits', 'lasergame - entertainment']
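If that certainty is in doubt, the assumption can be checked before slicing. This is only a sketch: the function name strip_fixed_suffix and the choice to raise ValueError are assumptions, not part of the original answer.

```python
def strip_fixed_suffix(strings, suffix):
    """Strip suffix from every string, raising if any string lacks it."""
    missing = [s for s in strings if not s.endswith(suffix)]
    if missing:
        raise ValueError("strings without the expected suffix: {!r}".format(missing))
    # Explicit end index so an empty suffix leaves the strings unchanged.
    return [s[:len(s) - len(suffix)] for s in strings]


strings = [
    'art gallery - museum and visits | expand knowledge',
    'lasergame - entertainment | expand knowledge',
]
print(strip_fixed_suffix(strings, " | expand knowledge"))
```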