
Picking Multiple Values From String

I have data like the sample data below, and I'm trying to pattern match and parse it to create something like the output data. The idea is, if I have a string value that contains

Solution 1:

You can use str.extract with named capture groups to pull each pattern out of the string; every named group becomes its own column:

pd.concat([
        SampleDf, 
        SampleDf.OtherField.str.extract(r"Aggr\((?P<Part1>.*?)\),(?P<Part2>[^\(]*)", expand=True)
    ], axis=1)

#   ReportField                            OtherField      Part1       Part2
# 0         tom          words Aggr(stuff),something1      stuff  something1
# 1         bob  Morewords Aggr(Diffstuff),something2  Diffstuff  something2

The regex Aggr\((?P<Part1>.*?)\),(?P<Part2>[^\(]*) captures the two patterns you need. The first, Aggr\((?P<Part1>.*?)\), is named Part1 and matches the content of the first pair of parentheses after Aggr. The second, ,(?P<Part2>[^\(]*), is named Part2 and matches the text after the comma that follows the first pattern, up to the next opening parenthesis.
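For a version that runs end to end, here is a minimal sketch. The question's sample data is not reproduced here, so SampleDf is rebuilt from the two rows shown in the output above; treat that construction as an assumption about the original frame.

```python
import pandas as pd

# Assumed reconstruction of the sample data, based on the output shown above.
SampleDf = pd.DataFrame({
    "ReportField": ["tom", "bob"],
    "OtherField": [
        "words Aggr(stuff),something1",
        "Morewords Aggr(Diffstuff),something2",
    ],
})

# Part1: lazy match inside the parentheses after "Aggr".
# Part2: everything after the following comma, up to the next "(".
pattern = r"Aggr\((?P<Part1>.*?)\),(?P<Part2>[^\(]*)"

# str.extract returns one row per input row, one column per named group.
parts = SampleDf["OtherField"].str.extract(pattern, expand=True)

# Attach the extracted columns next to the original ones.
result = pd.concat([SampleDf, parts], axis=1)
print(result)
```

Because str.extract keeps the original index, rows that do not match the pattern simply get NaN in Part1 and Part2.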

Solution 2:

You can also use str.extractall with a regex pattern:

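# Raw string avoids invalid-escape warnings for \( and \).
# reset_index(drop=True) assumes exactly one match per row.
SampleDf[['Part1', 'Part2']] = SampleDf.OtherField.str.extractall(r'\((.*)\),(.*)').reset_index(drop=True)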

You get

    ReportField OtherField                              Part1       Part2
0   tom         words Aggr(stuff),something1            stuff       something1
1   bob         Morewords Aggr(Diffstuff),something2    Diffstuff   something2
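One thing to watch for: str.extractall returns a frame indexed by (original row, match number), and dropping that index only lines up with SampleDf when every row has exactly one match. The sketch below shows one way to keep the alignment explicit. The middle row ("ann") is a hypothetical addition, not part of the original data, included only to show what happens when a row has no match.

```python
import pandas as pd

# First and last rows come from the output above; the "ann" row is hypothetical.
df = pd.DataFrame({
    "ReportField": ["tom", "ann", "bob"],
    "OtherField": [
        "words Aggr(stuff),something1",
        "nothing to parse here",
        "Morewords Aggr(Diffstuff),something2",
    ],
})

# extractall yields a MultiIndex: (original row label, "match" number).
matches = df["OtherField"].str.extractall(r"\((.*)\),(.*)")

# Keep only the first match per row; the result is indexed by original row label.
first = matches.xs(0, level="match")
first.columns = ["Part1", "Part2"]

# A left join on the index leaves NaN where a row had no match.
result = df.join(first)
print(result)
```

With reset_index(drop=True) instead, the values for bob would be shifted onto the ann row, which is why the index-based join is the safer choice when matches may be missing.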
