
Split Sentences In Pandas Into Sentence Number And Words

I have a pandas dataframe like this:

           Text  start  end     entity   value
0  I love apple      7   11      fruit   apple
1  I ate potato      6   11  vegetable   po

Solution 1:

Use split, stack and map:

u = df.Text.str.split(expand=True).stack()

pd.DataFrame({
    'Sentence': u.index.get_level_values(0) + 1, 
    'Word': u.values, 
    'Entity': u.map(dict(zip(df.value, df.entity))).fillna('Object').values
})

   Sentence    Word     Entity
0         1       I     Object
1         1    love     Object
2         1   apple      fruit
3         2       I     Object
4         2     ate     Object
5         2  potato  vegetable

Side note: If running v0.24 or later, please use .to_numpy() instead of .values.
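
For anyone who wants to run Solution 1 end to end, here is a minimal sketch. The sample frame is an assumption reconstructed from the question and from the outputs shown in the answers, and the `lookup` and `result` names are introduced here for illustration only.

```python
import pandas as pd

# Sample data reconstructed from the question (assumed values).
df = pd.DataFrame({
    'Text': ['I love apple', 'I ate potato'],
    'start': [7, 6],
    'end': [11, 11],
    'entity': ['fruit', 'vegetable'],
    'value': ['apple', 'potato'],
})

# One row per word; the first index level is the original row position.
u = df.Text.str.split(expand=True).stack()

# Map each known value to its entity; any other word falls back to 'Object'.
lookup = dict(zip(df.value, df.entity))

result = pd.DataFrame({
    'Sentence': u.index.get_level_values(0) + 1,
    'Word': u.to_numpy(),
    'Entity': u.map(lookup).fillna('Object').to_numpy(),
})
print(result)
```

Note that the lookup is global across the whole frame: a word such as "apple" would be tagged "fruit" in every sentence where it appears, not only in the row that lists it as its value.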

Solution 2:

I am using unnesting (a custom helper function, not a pandas built-in) after str.split:

df.Text = df.Text.str.split(' ')
yourdf = unnesting(df, ['Text'])
yourdf.loc[yourdf.Text.values != yourdf.value.values, 'entity'] = 'object'
yourdf
     Text  start  end     entity   value
0       I      7   11     object   apple
0    love      7   11     object   apple
0   apple      7   11      fruit   apple
1       I      6   11     object  potato
1     ate      6   11     object  potato
1  potato      6   11  vegetable  potato
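
The unnesting helper is not defined anywhere in this answer. The sketch below is one possible implementation, written as an assumption about what such a helper does (one row per list element, original index labels repeated); it is not necessarily the author's exact function. The sample frame is reconstructed from the question.

```python
import numpy as np
import pandas as pd

def unnesting(df, explode):
    """Hypothetical helper: expand list-valued columns into one row per element."""
    # Repeat each original index label once per list element.
    idx = df.index.repeat(df[explode[0]].str.len())
    # Flatten every list column into a single long column.
    flat = pd.concat(
        [pd.DataFrame({col: np.concatenate(df[col].to_numpy())}) for col in explode],
        axis=1,
    )
    flat.index = idx
    # Re-attach the remaining columns, aligned on the repeated index.
    return flat.join(df.drop(columns=explode), how='left')

# Sample data reconstructed from the question (assumed values).
df = pd.DataFrame({
    'Text': ['I love apple', 'I ate potato'],
    'start': [7, 6],
    'end': [11, 11],
    'entity': ['fruit', 'vegetable'],
    'value': ['apple', 'potato'],
})

df['Text'] = df['Text'].str.split(' ')
out = unnesting(df, ['Text'])
# Words that differ from the row's value get the generic 'object' label.
out.loc[out['Text'].to_numpy() != out['value'].to_numpy(), 'entity'] = 'object'
print(out)
```

Because the original index labels are kept, the sentence number can be recovered as `out.index + 1`. On pandas 0.25 or later, the built-in `DataFrame.explode` covers the single-column case without a custom helper.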

Solution 3:

Using the expand function I posted in this thread (defined below), you can write:

df = expand(df, 'Text', sep=' ')

Then simply:

df['Tag'] = np.where(df.Text.ne(df.value), ['Object'], df.entity)


>>> df[['Text', 'Tag']]

     Text        Tag
0       I     Object
1    love     Object
2   apple      fruit
3       I     Object
4     ate     Object
5  potato  vegetable

def expand(df, col, sep=','):
    r = df[col].str.split(sep)
    d = {c: df[c].values.repeat(r.str.len(), axis=0) for c in df.columns}
    d[col] = [i for sub in r for i in sub]
    return pd.DataFrame(d)
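
Since expand builds a fresh frame with a new 0..n-1 index, the original row position is lost. A minimal sketch of one way to keep the sentence number, assuming it is acceptable to add a `Sentence` column before expanding (that column name, the `out` variable, and the sample frame are assumptions for illustration):

```python
import numpy as np
import pandas as pd

def expand(df, col, sep=','):
    # Split the target column, then repeat every column once per token.
    r = df[col].str.split(sep)
    counts = r.str.len().to_numpy()
    d = {c: df[c].to_numpy().repeat(counts, axis=0) for c in df.columns}
    # Overwrite the target column with the flattened tokens.
    d[col] = [i for sub in r for i in sub]
    return pd.DataFrame(d)

# Sample data reconstructed from the question (assumed values).
df = pd.DataFrame({
    'Text': ['I love apple', 'I ate potato'],
    'start': [7, 6],
    'end': [11, 11],
    'entity': ['fruit', 'vegetable'],
    'value': ['apple', 'potato'],
})

# Number the sentences first so the number is repeated along with each word.
df['Sentence'] = np.arange(1, len(df) + 1)

out = expand(df, 'Text', sep=' ')
out['Tag'] = np.where(out.Text.ne(out.value), 'Object', out.entity)
print(out[['Sentence', 'Text', 'Tag']])
```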
