How To Add The Characters Between The Strings With Conditions

If %20 appears in the string, it has to be replaced with OR. Example: abc %20 def. Expected output --> '*abc* OR *def*'. If , appears in the string, it has to be replaced with OR as well. Example: abc,def. Expected out...

Solution 1:

After looking at the awesome answers given by Kraigolas and Will, I tried a different approach which requires only one regex.

Input (stolen from Will's answer :D)

import re

test_cases = (
    'abc %20 def',
    'abc %20 def',
    'abc or def',
    'abc OR def',
    'abc+def',
    'abc + def',
    'abc&def',
    'abc & def',
    'abc AND def',
    'abc and def',
)

The pattern captures 5 groups, as described below.

group1: (\w+)\s? Captures the word characters before the operator (the optional space that follows is outside the group)

group2: ((or|OR|\+|%20)|(&|and|AND)) Wrapping group for groups 3 and 4 (this is what makes it possible to use a single regex)

group3: (or|OR|\+|%20) Captures or, OR, +, %20

group4: (&|and|AND) Captures &, and, AND

group5: \s?(\w+) Captures the word characters after the operator (the optional space before them is outside the group)

Note that \s? matches zero or one whitespace character.

pattern = re.compile(r'(\w+)\s?((or|OR|\+|%20)|(&|and|AND))\s?(\w+)')
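To see how the five groups described above are filled, here is a small sketch that prints them for one OR-style and one AND-style input. The helper name show_groups is made up for illustration; the pattern is repeated so the snippet runs on its own.

```python
import re

# Same pattern as above, repeated so this snippet is self-contained.
pattern = re.compile(r'(\w+)\s?((or|OR|\+|%20)|(&|and|AND))\s?(\w+)')

def show_groups(text):
    # Hypothetical helper: return all five groups, or None if nothing matches.
    match = pattern.match(text)
    return match.groups() if match else None

print(show_groups('abc %20 def'))  # ('abc', '%20', '%20', None, 'def')
print(show_groups('abc & def'))    # ('abc', '&', None, '&', 'def')
```

Only one of groups 3 and 4 is ever set; the other is None, which is what the formatting step relies on.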

Format the strings as follows. If group 3 exists, replace with OR; otherwise replace with AND. (Note that when group 3 is null, group 4 is non-null and vice versa.) If the pattern does not match at all, pattern.sub returns the text unchanged.

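def format_value(text):
    # Only the match at the start of the string decides between OR and AND.
    match = pattern.match(text)
    if match is not None and match.group(3):
        return pattern.sub(r'*\1* OR *\5*', text)
    else:
        # Group 4 matched, or nothing matched (sub then returns text as-is).
        return pattern.sub(r'*\1* AND *\5*', text)

for x in test_cases:
    print(format_value(x))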

Output

*abc* OR *def*
*abc* OR *def*
*abc* OR *def*
*abc* OR *def*
*abc* OR *def*
*abc* OR *def*
*abc* AND *def*
*abc* AND *def*
*abc* AND *def*
*abc* AND *def*
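One thing to be aware of: format_value looks only at the match at the start of the string (pattern.match) and then applies that one operator to every match in the text. If a single string could ever mix both kinds of operator, a possible variant is to pass a function to pattern.sub so that each match is decided on its own. This is only a sketch; replace_operator and format_value_per_match are names invented here, not part of the answer above.

```python
import re

pattern = re.compile(r'(\w+)\s?((or|OR|\+|%20)|(&|and|AND))\s?(\w+)')

def replace_operator(match):
    # Group 3 is set for OR-like operators, group 4 for AND-like ones.
    operator = 'OR' if match.group(3) else 'AND'
    return f'*{match.group(1)}* {operator} *{match.group(5)}*'

def format_value_per_match(text):
    # re.sub calls replace_operator once for every non-overlapping match.
    return pattern.sub(replace_operator, text)

print(format_value_per_match('abc %20 def'))       # *abc* OR *def*
print(format_value_per_match('abc & def'))         # *abc* AND *def*
print(format_value_per_match('abc+def; ghi&jkl'))  # *abc* OR *def*; *ghi* AND *jkl*
```

For the ten test cases above the result is the same as with format_value; the difference only shows up for inputs like the last one.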

EDIT: To handle abc def ghi, here is a small hack.

Create another pattern that matches a single whitespace character sitting directly between two word characters. It will not touch strings that are already formatted, because there the spaces are next to a * rather than a word character.

space_pattern = re.compile(r'(\w)(\s)(\w)')

Update format_value so that it no longer adds the leading and trailing asterisks.

def format_value(text):
    match = pattern.match(text)
    if match is not None and match.group(3):
        return pattern.sub(r'\1* OR *\5', text)
    else:
        return pattern.sub(r'\1* AND *\5', text)

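Reformat the string as follows and add the leading and trailing asterisks back. (The last two lines of the output below correspond to the inputs abc and abc def ghi, which are assumed to have been appended to test_cases.)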

for x in test_cases:
    formatted_value = format_value(x)
    print("*" + space_pattern.sub(r'\1* OR *\3', formatted_value) + "*")

Output

*abc* OR *def*
*abc* OR *def*
*abc* OR *def*
*abc* OR *def*
*abc* OR *def*
*abc* OR *def*
*abc* AND *def*
*abc* AND *def*
*abc* AND *def*
*abc* AND *def*
*abc*
*abc* OR *def* OR *ghi*
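Putting the pieces of the EDIT together, a self-contained sketch could look like this. The wrapper name format_with_spaces is made up for illustration, and the sample inputs abc and abc def ghi are inferred from the last two output lines rather than taken from the original test_cases.

```python
import re

pattern = re.compile(r'(\w+)\s?((or|OR|\+|%20)|(&|and|AND))\s?(\w+)')
space_pattern = re.compile(r'(\w)(\s)(\w)')

def format_value(text):
    # Inner asterisks only; the outer pair is added by the wrapper below.
    match = pattern.match(text)
    if match is not None and match.group(3):
        return pattern.sub(r'\1* OR *\5', text)
    return pattern.sub(r'\1* AND *\5', text)

def format_with_spaces(text):
    # Hypothetical wrapper: operators first, then bare spaces become OR.
    formatted = format_value(text)
    return '*' + space_pattern.sub(r'\1* OR *\3', formatted) + '*'

for sample in ('abc', 'abc def ghi', 'abc & def'):
    print(format_with_spaces(sample))
# *abc*
# *abc* OR *def* OR *ghi*
# *abc* AND *def*
```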

Solution 2:

Edit

This answer was written before the question was updated to show that the expected output needs to surround abc and def with asterisks. Feel free to borrow from it to create a more relevant answer to the new question.

Original Answer

This can actually be done in a couple of lines. Here, I'll substitute all matches (not just the first occurrence):

import re

text = """
abc %20 def
abc %20 def
abc or def
abc+def
abc + def

abc&def
abc & def
abc AND def
"""

# Raw strings keep the regex backslashes from being read as string escapes.
or_pattern = re.compile(r"\s*(%20|\+)\s*|\s+or\s+")
text = or_pattern.sub(" OR ", text)

and_pattern = re.compile(r"\s*&\s*|\s+AND\s+")
text = and_pattern.sub(" AND ", text)

The output for text is now:

abc OR def
abc OR def
abc OR def
abc OR def
abc OR def

abc AND def
abc AND def
abc AND def

or pattern

\s*(%20|\+)\s*|\s+or\s+

This is split into two parts separated by a regex "or" |:

\s*(%20|\+)\s*
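  • \s* says match 0 or more whitespace characters (to be more restrictive, you could use \s{0,1}, or equivalently \s?, to match 0 or 1 only)
  • (%20|\+) says match either %20 or a literal + between the two \s*. The parentheses group the alternatives so that the inner | ("or") applies only to them; the captured value itself is not used, so a non-capturing group (?:%20|\+) would work as well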

and

\s+or\s+

This part is kept separate because we need at least one whitespace character on each side of the or; otherwise door would be replaced with "do OR ".
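A quick sketch of the difference, comparing a deliberately loose pattern (spaces optional) against the one used above. The names loose and strict are only for this illustration.

```python
import re

loose = re.compile(r'\s*or\s*')   # whitespace optional: also hits "or" inside words
strict = re.compile(r'\s+or\s+')  # at least one whitespace character on each side

print(repr(loose.sub(' OR ', 'door')))         # 'do OR '
print(repr(strict.sub(' OR ', 'door')))        # 'door'
print(repr(strict.sub(' OR ', 'abc or def')))  # 'abc OR def'
```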

Case insensitivity

In your case, you might also want oR and Or to match, in which case you can use re.compile("pattern", re.IGNORECASE).
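For example, a minimal sketch with the or pattern compiled case-insensitively (the sample strings are made up):

```python
import re

# re.IGNORECASE lets "or" match in any mix of upper and lower case.
or_pattern = re.compile(r'\s*(%20|\+)\s*|\s+or\s+', re.IGNORECASE)

for sample in ('abc or def', 'abc Or def', 'abc oR def', 'abc OR def'):
    print(or_pattern.sub(' OR ', sample))  # abc OR def (for every sample)
```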


Solution 3:

Working off of what Kraigolas provided, here is a solution to your problem:

import re

or_pattern = re.compile(r'\s*(%20|\+)\s*|\s+or\s+', re.IGNORECASE)
and_pattern = re.compile(r'\s*&\s*|\s+and\s+', re.IGNORECASE)
operand_pattern = re.compile(r'(\w+)\s*(OR|AND)\s*(\w+)')

def format_search_value(search_value):
    search_value = or_pattern.sub(' OR ', search_value)
    search_value = and_pattern.sub(' AND ', search_value)
    return operand_pattern.sub(r'*\1* \2 *\3*', search_value)

It does all of what Kraigolas' answer does, and then additionally uses the operand_pattern to surround the operands with asterisks. It uses 3 capture groups in the pattern:

  1. The first operand: (\w+)
  2. The operator: (OR|AND)
  3. The second operand: (\w+)

These three captured values are then inserted into the substitution string with the asterisks using the backreferences \1, \2, and \3: *\1* \2 *\3*

Usage:

test_cases = (
    'abc %20 def',
    'abc %20 def',
    'abc or def',
    'abc OR def',
    'abc+def',
    'abc + def',
    'abc&def',
    'abc & def',
    'abc AND def',
    'abc and def',
)

for search_value in test_cases:
    print(format_search_value(search_value))

Output:

*abc* OR *def*
*abc* OR *def*
*abc* OR *def*
*abc* OR *def*
*abc* OR *def*
*abc* OR *def*
*abc* AND *def*
*abc* AND *def*
*abc* AND *def*
*abc* AND *def*
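Note that operand_pattern consumes both operands of each match, so in a chain such as abc + def & ghi the third operand is left without asterisks. If chains matter, one possible variation is to wrap every word that is not one of the normalized operators. This is only a sketch under that assumption; term_pattern and format_search_chain are names invented here, and bare spaces are not turned into OR.

```python
import re

or_pattern = re.compile(r'\s*(%20|\+)\s*|\s+or\s+', re.IGNORECASE)
and_pattern = re.compile(r'\s*&\s*|\s+and\s+', re.IGNORECASE)
# Hypothetical: any whole word except the normalized operators OR and AND.
term_pattern = re.compile(r'\b(?!(?:OR|AND)\b)\w+')

def format_search_chain(search_value):
    # Normalize the operators first, exactly as format_search_value does.
    search_value = or_pattern.sub(' OR ', search_value)
    search_value = and_pattern.sub(' AND ', search_value)
    # Then surround each remaining word with asterisks.
    return term_pattern.sub(lambda m: f'*{m.group(0)}*', search_value)

print(format_search_chain('abc %20 def'))      # *abc* OR *def*
print(format_search_chain('abc + def & ghi'))  # *abc* OR *def* AND *ghi*
```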
