Split Sentences In Pandas Into Sentence Number And Words
I have a pandas dataframe like this:

Text          start  end  entity     value
I love apple  7      11   fruit      apple
I ate potato  6      11   vegetable  po
Solution 1:
Use split, stack and map:
u = df.Text.str.split(expand=True).stack()
pd.DataFrame({
    'Sentence': u.index.get_level_values(0) + 1,
    'Word': u.values,
    'Entity': u.map(dict(zip(df.value, df.entity))).fillna('Object').values
})
   Sentence    Word     Entity
0         1       I     Object
1         1    love     Object
2         1   apple      fruit
3         2       I     Object
4         2     ate     Object
5         2  potato  vegetable
Side note: If running v0.24 or later, please use .to_numpy() instead of .values.
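For readers who want to run Solution 1 end to end, here is a minimal self-contained sketch. The sample frame is an assumption rebuilt from the question (the second `value` is taken to be `potato`, as the outputs in the answers suggest), and the names `lookup` and `result` are only illustrative.

```python
import pandas as pd

# Sample data assumed from the question; the second 'value' is taken to be 'potato'.
df = pd.DataFrame({
    'Text': ['I love apple', 'I ate potato'],
    'start': [7, 6],
    'end': [11, 11],
    'entity': ['fruit', 'vegetable'],
    'value': ['apple', 'potato'],
})

# split(expand=True) yields one column per word; stack() reshapes that into a
# Series indexed by (original row, word position).
u = df['Text'].str.split(expand=True).stack()

# Word -> entity lookup built from the annotated columns.
lookup = dict(zip(df['value'], df['entity']))

result = pd.DataFrame({
    # Level 0 of the index is the original row number; +1 makes it 1-based.
    'Sentence': u.index.get_level_values(0) + 1,
    'Word': u.to_numpy(),
    # Words missing from the lookup map to NaN and are then labelled 'Object'.
    'Entity': u.map(lookup).fillna('Object').to_numpy(),
})
```

Note that the lookup is keyed on the word alone, so a word annotated in one sentence receives the same entity wherever it appears in any other sentence.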
Solution 2:
I am using unnesting, a user-defined helper (not part of pandas), after str.split:
df.Text=df.Text.str.split(' ')
yourdf=unnesting(df,['Text'])
yourdf.loc[yourdf.Text.values!=yourdf.value.values,'entity']='object'
yourdf
     Text  start  end     entity   value
0       I      7   11     object   apple
0    love      7   11     object   apple
0   apple      7   11      fruit   apple
1       I      6   11     object  potato
1     ate      6   11     object  potato
1  potato      6   11  vegetable  potato
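The `unnesting` helper is not shown in this answer. As a swapped-in alternative rather than the answerer's own function, the built-in `DataFrame.explode` (available from pandas 0.25) should give the same row-per-word shape. A rough sketch, assuming the sample frame from the question with the second `value` taken to be `potato`:

```python
import pandas as pd

# Sample data assumed from the question; the second 'value' is taken to be 'potato'.
df = pd.DataFrame({
    'Text': ['I love apple', 'I ate potato'],
    'start': [7, 6],
    'end': [11, 11],
    'entity': ['fruit', 'vegetable'],
    'value': ['apple', 'potato'],
})

# Turn each sentence into a list of words, then emit one row per list element.
# explode() repeats the other columns and keeps the original row label.
out = df.assign(Text=df['Text'].str.split(' ')).explode('Text')

# Any word that is not the annotated value gets the generic 'object' label.
mask = out['Text'].to_numpy() != out['value'].to_numpy()
out.loc[mask, 'entity'] = 'object'
```

Because `explode` keeps the original index, the repeated labels (0, 0, 0, 1, 1, 1) can serve as the sentence number after adding 1.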
Solution 3:
Using the expand function I posted in this thread (defined below), you can do:
df = expand(df, 'Text', sep=' ')
Then simply:
df['Tag'] = np.where(df.Text.ne(df.value), ['Object'], df.entity)
>>> df[['Text', 'Tag']]
     Text        Tag
0       I     Object
1    love     Object
2   apple      fruit
3       I     Object
4     ate     Object
5  potato  vegetable
def expand(df, col, sep=','):
    r = df[col].str.split(sep)
    d = {c: df[c].values.repeat(r.str.len(), axis=0) for c in df.columns}
    d[col] = [i for sub in r for i in sub]
    return pd.DataFrame(d)
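Putting the pieces of Solution 3 together, the following sketch runs the helper against a sample frame. The data is an assumption rebuilt from the question (second `value` taken to be `potato`), and this variant of `expand` uses `.to_numpy()` in place of `.values`, in line with the side note in Solution 1; the name `tagged` is only illustrative.

```python
import numpy as np
import pandas as pd

def expand(df, col, sep=','):
    # Split the target column into lists of tokens.
    r = df[col].str.split(sep)
    # Repeat every column's values once per token in the matching row.
    counts = r.str.len().to_numpy()
    d = {c: df[c].to_numpy().repeat(counts) for c in df.columns}
    # Overwrite the target column with the flattened tokens.
    d[col] = [i for sub in r for i in sub]
    return pd.DataFrame(d)

# Sample data assumed from the question; the second 'value' is taken to be 'potato'.
df = pd.DataFrame({
    'Text': ['I love apple', 'I ate potato'],
    'start': [7, 6],
    'end': [11, 11],
    'entity': ['fruit', 'vegetable'],
    'value': ['apple', 'potato'],
})

tagged = expand(df, 'Text', sep=' ')
# Keep the entity where the word equals the annotated value, else 'Object'.
tagged['Tag'] = np.where(tagged['Text'].ne(tagged['value']), 'Object', tagged['entity'])
```

Unlike the `explode` route, this builds a fresh frame with a default 0..n-1 index, so the sentence number would have to be carried along as an extra column if it is needed.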