How To Add The Characters Between The Strings With Conditions
Solution 1:
After looking at the awesome answers given by Kraigolas and Will, I tried a different approach which requires only one regex.
Input (stolen from Will's answer :D)
import re
test_cases = (
    'abc %20 def',
    'abc %20 def',
    'abc or def',
    'abc OR def',
    'abc+def',
    'abc + def',
    'abc&def',
    'abc & def',
    'abc AND def',
    'abc and def',
)
The pattern captures 5 groups, as described below.
group1: (\w+), followed by \s? outside the group. Captures the word characters before the first space.
group2: ((or|OR|\+|%20)|(&|and|AND)). Wrapping group for groups 3 and 4 (this is what makes it possible to use a single regex).
group3: (or|OR|\+|%20). Captures or, OR, +, %20.
group4: (&|and|AND). Captures &, and, AND.
group5: (\w+), preceded by \s? outside the group. Captures the word characters after the last space.
Note that \s? matches zero or one whitespace character.
pattern = re.compile(r'(\w+)\s?((or|OR|\+|%20)|(&|and|AND))\s?(\w+)')
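To make the group layout concrete, here is a small sketch that prints what each group holds for one OR-style and one AND-style input. The sample strings and the names or_groups and and_groups are only illustrations; the pattern is the one from this answer.

```python
import re

pattern = re.compile(r'(\w+)\s?((or|OR|\+|%20)|(&|and|AND))\s?(\w+)')

# groups() returns groups 1..5 in order; the alternative that did not
# take part in the match (group 3 or group 4) is reported as None.
or_groups = pattern.match('abc or def').groups()
and_groups = pattern.match('abc & def').groups()

print(or_groups)   # ('abc', 'or', 'or', None, 'def')
print(and_groups)  # ('abc', '&', None, '&', 'def')
```

This is why testing match.group(3) is enough to tell the two operator families apart.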
Format the strings as follows. If group 3 exists, replace with OR; otherwise replace with AND. (Note that when group 3 is null, group 4 is non-null, and vice versa.)
def format_value(text):
    match = pattern.match(text)
    if match is not None and match.group(3):
        return pattern.sub(r'*\1* OR *\5*', text)
    else:
        return pattern.sub(r'*\1* AND *\5*', text)

for x in test_cases:
    print(format_value(x))
Output
*abc* OR *def*
*abc* OR *def*
*abc* OR *def*
*abc* OR *def*
*abc* OR *def*
*abc* OR *def*
*abc* AND *def*
*abc* AND *def*
*abc* AND *def*
*abc* AND *def*
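One thing to keep in mind: format_value picks OR or AND from a single match at the start of the string, so a string that contains both kinds of operator gets the same replacement everywhere. A possible variation, sketched here under the assumption that each match should be judged on its own, passes a function to sub instead of a template string. The names replace_operator and format_value_per_match are made up for this illustration.

```python
import re

pattern = re.compile(r'(\w+)\s?((or|OR|\+|%20)|(&|and|AND))\s?(\w+)')

def replace_operator(match):
    # Group 3 is set only when the OR alternative matched;
    # otherwise group 4 (the AND alternative) matched.
    operator = 'OR' if match.group(3) else 'AND'
    return f'*{match.group(1)}* {operator} *{match.group(5)}*'

def format_value_per_match(text):
    # sub() calls replace_operator once for every match it finds.
    return pattern.sub(replace_operator, text)

print(format_value_per_match('abc+def'))                # *abc* OR *def*
print(format_value_per_match('abc & def, ghi or jkl'))  # *abc* AND *def*, *ghi* OR *jkl*
```

For the ten test cases above the result should be the same as before; the difference only shows up when one string mixes operators.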
EDIT
To handle input such as abc def ghi, here is a small hack.
Create another pattern to capture the spaces. It will not touch strings that are already formatted with * on both sides, because it looks for a whitespace character surrounded by two word characters.
space_pattern = re.compile(r'(\w)(\s)(\w)')
Update the format_value function by removing the leading and trailing asterisks from the replacement strings.
def format_value(text):
    match = pattern.match(text)
    if match is not None and match.group(3):
        return pattern.sub(r'\1* OR *\5', text)
    else:
        return pattern.sub(r'\1* AND *\5', text)
Reformat the string as follows and add the leading and trailing asterisks back.
for x in test_cases:
    formatted_value = format_value(x)
    print("*" + space_pattern.sub(r'\1* OR *\3', formatted_value) + "*")
Output
*abc* OR *def*
*abc* OR *def*
*abc* OR *def*
*abc* OR *def*
*abc* OR *def*
*abc* OR *def*
*abc* AND *def*
*abc* AND *def*
*abc* AND *def*
*abc* AND *def*
*abc*
*abc* OR *def* OR *ghi*
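The pieces of the EDIT can be put together into one function. The sketch below is a variation rather than the code above: it swaps the three-group space_pattern for lookarounds, an assumption made so that the neighbouring characters are checked without being consumed and consecutive single-letter words such as a b c also work. The function name format_with_implicit_or is hypothetical.

```python
import re

pattern = re.compile(r'(\w+)\s?((or|OR|\+|%20)|(&|and|AND))\s?(\w+)')
# Assumption: lookarounds instead of (\w)(\s)(\w), so the word characters
# on either side of the space are tested but not consumed.
space_pattern = re.compile(r'(?<=\w)\s(?=\w)')

def format_with_implicit_or(text):
    match = pattern.match(text)
    operator = 'OR' if match is not None and match.group(3) else 'AND'
    # Inner asterisks only; the outer pair is added at the end.
    formatted = pattern.sub(rf'\1* {operator} *\5', text)
    # A space between two words is treated as an implicit OR.
    return '*' + space_pattern.sub('* OR *', formatted) + '*'

print(format_with_implicit_or('abc def ghi'))  # *abc* OR *def* OR *ghi*
print(format_with_implicit_or('a b c'))        # *a* OR *b* OR *c*
print(format_with_implicit_or('abc & def'))    # *abc* AND *def*
```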
Solution 2:
Edit
This answer was created before the question was updated to show that the expected output needed to surround abc and def with *'s. Feel free to borrow from it to create a more relevant answer to the new question.
Original Answer
This can actually be done in a couple lines. Here, I'll just substitute all matches (not just one occurrence):
import re
text = """
abc %20 def
abc %20 def
abc or def
abc+def
abc + def
abc&def
abc & def
abc AND def
"""
or_pattern = re.compile(r"\s*(%20|\+)\s*|\s+or\s+")
text = or_pattern.sub(" OR ", text)
and_pattern = re.compile(r"\s*&\s*|\s+AND\s+")
text = and_pattern.sub(" AND ", text)
The output for text is now:
abc OR def
abc OR def
abc OR def
abc OR def
abc OR def
abc AND def
abc AND def
abc AND def
or pattern
\s*(%20|\+)\s*|\s+or\s+
This is split into two parts separated by a regex "or" (|):
\s*(%20|\+)\s*
\s* says match 0 or more whitespace characters (to be more restrictive, you could use \s{0,1} to match 0 or 1 only). (%20|\+) says match (and capture; the parentheses are needed to limit the scope of the |, which means "or") either %20 or + between the two \s*.
and
\s+or\s+
This part is separate because we need at least one space on each side of the or; otherwise door would be replaced with do OR .
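The door case can be checked directly. This sketch compares a looser pattern that allows zero spaces around or (written here only for contrast, it is not part of the answer) with the stricter one used above.

```python
import re

loose = re.compile(r'\s*or\s*')   # hypothetical: allows zero spaces around "or"
strict = re.compile(r'\s+or\s+')  # from the answer: needs a space on each side

# repr() makes the trailing space in the first result visible.
print(repr(loose.sub(' OR ', 'door')))         # 'do OR '
print(repr(strict.sub(' OR ', 'door')))        # 'door'
print(repr(strict.sub(' OR ', 'abc or def')))  # 'abc OR def'
```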
Case insensitivity
In your case, you might also want oR and Or to match, in which case you can use re.compile("pattern", re.IGNORECASE).
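As a quick check of the flag, the following sketch compiles the or pattern with re.IGNORECASE and runs it over a few mixed-case inputs. The sample strings are illustrative only.

```python
import re

or_pattern = re.compile(r'\s*(%20|\+)\s*|\s+or\s+', re.IGNORECASE)

samples = ('abc oR def', 'abc Or def', 'abc OR def')
results = [or_pattern.sub(' OR ', sample) for sample in samples]

for result in results:
    print(result)  # abc OR def, for every sample
```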
Solution 3:
Working off of what Kraigolas provided, here is a solution to your problem:
import re
or_pattern = re.compile(r'\s*(%20|\+)\s*|\s+or\s+', re.IGNORECASE)
and_pattern = re.compile(r'\s*&\s*|\s+and\s+', re.IGNORECASE)
operand_pattern = re.compile(r'(\w+)\s*(OR|AND)\s*(\w+)')
def format_search_value(search_value):
    search_value = or_pattern.sub(' OR ', search_value)
    search_value = and_pattern.sub(' AND ', search_value)
    return operand_pattern.sub(r'*\1* \2 *\3*', search_value)
It does all of what Kraigolas' answer does, and then additionally uses the operand_pattern to surround the operands with asterisks. It uses 3 capture groups in the pattern:
- The first operand: (\w+)
- The operator: (OR|AND)
- The second operand: (\w+)
These three captured values are then inserted into the substitution string with the asterisks using the special values \1, \2, and \3: *\1* \2 *\3*
Usage:
test_cases = (
    'abc %20 def',
    'abc %20 def',
    'abc or def',
    'abc OR def',
    'abc+def',
    'abc + def',
    'abc&def',
    'abc & def',
    'abc AND def',
    'abc and def',
)
for search_value in test_cases:
    print(format_search_value(search_value))
Output:
*abc* OR *def*
*abc* OR *def*
*abc* OR *def*
*abc* OR *def*
*abc* OR *def*
*abc* OR *def*
*abc* AND *def*
*abc* AND *def*
*abc* AND *def*
*abc* AND *def*
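Because operand_pattern consumes both operands of each match, a chain such as abc or def or ghi leaves the last operand without asterisks. If chains matter, one option (an assumption that goes beyond what the question asks) is to wrap every word that is not an operator instead. The names word_pattern and format_chain are hypothetical.

```python
import re

or_pattern = re.compile(r'\s*(%20|\+)\s*|\s+or\s+', re.IGNORECASE)
and_pattern = re.compile(r'\s*&\s*|\s+and\s+', re.IGNORECASE)
# Any whole word except the normalized operators OR and AND.
word_pattern = re.compile(r'\b(?!(?:OR|AND)\b)\w+')

def format_chain(search_value):
    # Normalize the operators first, exactly as in the answer above.
    search_value = or_pattern.sub(' OR ', search_value)
    search_value = and_pattern.sub(' AND ', search_value)
    # \g<0> refers to the whole match, i.e. the operand itself.
    return word_pattern.sub(r'*\g<0>*', search_value)

print(format_chain('abc or def or ghi'))  # *abc* OR *def* OR *ghi*
print(format_chain('abc&def + ghi'))      # *abc* AND *def* OR *ghi*
```

The trade-off is that a lone word with no operator is wrapped too, which may or may not be what you want.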