How To Add The Characters Between The Strings With Conditions

If %20 is in the string, it has to be replaced with OR: abc %20 def. Expected output --> '*abc* OR *def*'. If , is in the string, it also has to be replaced with OR, e.g. abc,def.

Solution 1:

After looking at the awesome answers given by Kraigolas and Will, I tried a different approach which requires only one regex.

Input (stolen from Will's answer :D)

import re

test_cases = (
    'abc %20 def',
    'abc %20 def',
    'abc or def',
    'abc OR def',
    'abc+def',
    'abc + def',
    'abc&def',
    'abc & def',
    'abc AND def',
    'abc and def',
)

The pattern captures 5 groups, as described below.

group1: (\w+)\s? Captures the word characters before the operator

group2: ((or|OR|\+|%20)|(&|and|AND)) Wrapping group for groups 3 and 4 (this is what makes it possible to use a single regex)

group3: (or|OR|\+|%20) Captures or, OR, +, %20

group4: (&|and|AND) Captures &, and, AND

group5: \s?(\w+) Captures the word characters after the operator

Note that \s? matches zero or one whitespace character; it is not part of any capture group.

pattern = re.compile(r'(\w+)\s?((or|OR|\+|%20)|(&|and|AND))\s?(\w+)')
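
As a quick check of how the groups line up, the sketch below prints the captured groups for one OR-style and one AND-style input. The pattern is the one above; the show_groups helper and the sample strings are only illustrative and not part of the original answer.

```python
import re

pattern = re.compile(r'(\w+)\s?((or|OR|\+|%20)|(&|and|AND))\s?(\w+)')

def show_groups(text):
    # Hypothetical helper: return all five captured groups, or None if nothing matches.
    match = pattern.match(text)
    return match.groups() if match else None

print(show_groups('abc or def'))  # ('abc', 'or', 'or', None, 'def')
print(show_groups('abc & def'))   # ('abc', '&', None, '&', 'def')
```

Exactly one of groups 3 and 4 is filled for any match, which is what the formatting step relies on.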

Format the strings as follows. If group 3 exists, replace with OR; otherwise replace with AND. (Note that when group 3 is null, group 4 is non-null and vice versa.)

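def format_value(text):
    match = pattern.match(text)
    # Group 3 is set only when an OR-style operator matched.
    if match is not None and match.group(3):
        return pattern.sub(r'*\1* OR *\5*', text)
    else:
        return pattern.sub(r'*\1* AND *\5*', text)

for x in test_cases:
    print(format_value(x))

Output
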
*abc* OR *def*
*abc* OR *def*
*abc* OR *def*
*abc* OR *def*
*abc* OR *def*
*abc* OR *def*
*abc* AND *def*
*abc* AND *def*
*abc* AND *def*
*abc* AND *def*

EDIT: To handle input such as abc def ghi, where the words are separated only by spaces, here is a small hack.

Create another pattern to capture the spaces. It will not match already formatted strings with * on both sides, because it looks for a whitespace character surrounded by two word characters.

space_pattern = re.compile(r'(\w)(\s)(\w)')
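
To illustrate that claim, here is a small sketch that runs space_pattern on a bare string and on an already formatted one. The sample strings are only for illustration.

```python
import re

space_pattern = re.compile(r'(\w)(\s)(\w)')

# A bare space between two words is rewritten...
print(space_pattern.sub(r'\1* OR *\3', 'abc def ghi'))  # abc* OR *def* OR *ghi
# ...while an already formatted string has no word-space-word sequence.
print(space_pattern.search('abc* OR *def'))             # None
```

Because matches do not overlap, consecutive single-character words (for example a b c) would not all be rewritten; that case would likely need lookarounds instead of the two capturing \w groups.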

Update the format_value function by removing the leading and trailing asterisks from the replacement strings.

def format_value(text):
    match = pattern.match(text)
    # The outer asterisks are added back later, after the space pass.
    if match is not None and match.group(3):
        return pattern.sub(r'\1* OR *\5', text)
    else:
        return pattern.sub(r'\1* AND *\5', text)

Reformat the string as follows and add the leading and trailing asterisks back. (The last output line assumes 'abc def ghi' has been appended to test_cases.)

for x in test_cases:
    formatted_value = format_value(x)
    print("*" + space_pattern.sub(r'\1* OR *\3', formatted_value) + "*")

Output

*abc* OR *def*
*abc* OR *def*
*abc* OR *def*
*abc* OR *def*
*abc* OR *def*
*abc* OR *def*
*abc* AND *def*
*abc* AND *def*
*abc* AND *def*
*abc* AND *def*
*abc* OR *def* OR *ghi*

Solution 2:


This answer was created before the question was updated to show that the expected output needed to surround abc and def with asterisks. Feel free to borrow from it to create a more relevant answer to the new question.

Original Answer

This can actually be done in a couple lines. Here, I'll just substitute all matches (not just one occurrence):

import re

text = """
abc %20 def
abc %20 def
abc or def
abc+def
abc + def

abc&def
abc & def
abc AND def
"""

# Raw strings keep the regex backslashes intact.
or_pattern = re.compile(r"\s*(%20|\+)\s*|\s+or\s+")
text = or_pattern.sub(" OR ", text)

and_pattern = re.compile(r"\s*&\s*|\s+AND\s+")
text = and_pattern.sub(" AND ", text)

The output for text is now:

abc OR def
abc OR def
abc OR def
abc OR def
abc OR def

abc AND def
abc AND def
abc AND def

or_pattern

This is split into two parts separated by a regex "or", |.

The first part is \s*(%20|\+)\s*:

  • \s* says match 0 or more whitespace characters (to be more restrictive, you could use \s{0,1} to match 0 or 1 only)
  • (%20|\+) says match any one of %20 or + between the two \s*; the parentheses group the alternatives so that the | inside applies only to them

The second part is \s+or\s+.

This part is separate because we need at least one space on each side of the or; otherwise door would be replaced with do OR .
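
The door problem is easy to reproduce. The sketch below compares a loose variant that allows zero spaces with the stricter one used in the answer; the names loose and strict are only for illustration.

```python
import re

loose = re.compile(r'\s*or\s*')   # spaces optional: also matches "or" inside words
strict = re.compile(r'\s+or\s+')  # at least one space on each side

print(repr(loose.sub(' OR ', 'door')))   # 'do OR '
print(strict.sub(' OR ', 'door'))        # door
print(strict.sub(' OR ', 'abc or def'))  # abc OR def
```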

Case insensitivity

In your case, you might also want oR and Or to match, in which case you can use re.compile("pattern", re.IGNORECASE).
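
A minimal sketch of the case-insensitive variant, reusing the OR pattern from this answer; the sample strings are made up for illustration.

```python
import re

or_pattern = re.compile(r'\s*(%20|\+)\s*|\s+or\s+', re.IGNORECASE)

for sample in ('abc oR def', 'abc Or def', 'abc OR def'):
    # Every casing of "or" is normalised to upper case.
    print(or_pattern.sub(' OR ', sample))  # abc OR def
```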

Solution 3:

Working off of what Kraigolas provided, here is a solution to your problem:

import re

or_pattern = re.compile(r'\s*(%20|\+)\s*|\s+or\s+', re.IGNORECASE)
and_pattern = re.compile(r'\s*&\s*|\s+and\s+', re.IGNORECASE)
operand_pattern = re.compile(r'(\w+)\s*(OR|AND)\s*(\w+)')

def format_search_value(search_value):
    search_value = or_pattern.sub(' OR ', search_value)
    search_value = and_pattern.sub(' AND ', search_value)
    return operand_pattern.sub(r'*\1* \2 *\3*', search_value)

It does all of what Kraigolas' answer does, and then additionally uses the operand_pattern to surround the operands with asterisks. It uses 3 capture groups in the pattern:

  1. The first operand: (\w+)
  2. The operator: (OR|AND)
  3. The second operand: (\w+)

These three captured values are then inserted into the substitution string with the asterisks using the special values \1, \2, and \3: *\1* \2 *\3*


test_cases = (
    'abc %20 def',
    'abc %20 def',
    'abc or def',
    'abc OR def',
    'abc+def',
    'abc + def',
    'abc&def',
    'abc & def',
    'abc AND def',
    'abc and def',
)

for search_value in test_cases:
    print(format_search_value(search_value))

Output

*abc* OR *def*
*abc* OR *def*
*abc* OR *def*
*abc* OR *def*
*abc* OR *def*
*abc* OR *def*
*abc* AND *def*
*abc* AND *def*
*abc* AND *def*
*abc* AND *def*
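
The question also asks for , to become OR, and operand_pattern wraps only one pair per non-overlapping match, so abc OR def OR ghi would come out as *abc* OR *def* OR ghi. Here is a minimal sketch of one way to cover both cases, assuming a comma should behave like %20 and +. The names format_search_value_chained and word_pattern are hypothetical and not part of the original answers.

```python
import re

# Assumption: "," is treated like %20 and + (the question asks for "," -> OR).
or_pattern = re.compile(r'\s*(?:%20|\+|,)\s*|\s+or\s+', re.IGNORECASE)
and_pattern = re.compile(r'\s*&\s*|\s+and\s+', re.IGNORECASE)
# Any whole word that is not exactly OR or AND counts as an operand.
word_pattern = re.compile(r'\b(?!(?:OR|AND)\b)\w+')

def format_search_value_chained(search_value):
    # Normalise the operators first, then wrap every remaining word.
    search_value = or_pattern.sub(' OR ', search_value)
    search_value = and_pattern.sub(' AND ', search_value)
    return word_pattern.sub(lambda m: f'*{m.group(0)}*', search_value)

print(format_search_value_chained('abc,def'))             # *abc* OR *def*
print(format_search_value_chained('abc or def and ghi'))  # *abc* OR *def* AND *ghi*
```

Wrapping each word on its own, instead of matching operand-operator-operand triples, avoids the non-overlapping match problem. The trade-off is that a literal search term spelled OR or AND is left unwrapped.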

