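How do I pass a DataFrame column to NLTK's wordnet.synsets()?
I have a DataFrame in which one column contains English words. I want to run every word in that column through NLTK's synsets() function. My problem is that synsets() only accepts one word at a time.
For example, wordnet.synsets('father')
Now suppose I have a DataFrame like this: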
import pandas as pd
from nltk.corpus import wordnet
dc = {'A':[0,9,4,5],'B':['father','mother','kid','sister']}
df = pd.DataFrame(dc)
df
   A       B
0  0  father
1  9  mother
2  4     kid
3  5  sister
I want to pass the words in column B through synsets() and store the results in a new column, without having to loop over the DataFrame row by row myself.
How can I do that?
1 Answer
You can use the apply method:
In [4]: df['C'] = df['B'].apply(wordnet.synsets)
In [5]: df
Out[5]:
A B C
0 0 father [Synset('father.n.01'), Synset('forefather.n.0...
1 9 mother [Synset('mother.n.01'), Synset('mother.n.02'),...
2 4 kid [Synset('child.n.01'), Synset('kid.n.02'), Syn...
3 5 sister [Synset('sister.n.01'), Synset('sister.n.02'),...
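To make it clearer what apply does here, the following is a minimal sketch under stated assumptions. `TOY_SYNSETS` and `toy_synsets` are hypothetical stand-ins for `wordnet.synsets` (they are not part of NLTK), used only so the snippet runs without the WordNet corpus installed. In real use you would pass `wordnet.synsets` instead.

```python
import pandas as pd

# Hypothetical stand-in for wordnet.synsets so the sketch runs without the
# WordNet corpus; swap in nltk.corpus.wordnet.synsets for real lookups.
TOY_SYNSETS = {
    "father": ["father.n.01", "forefather.n.01"],
    "mother": ["mother.n.01"],
}


def toy_synsets(word):
    """Return a list of sense names for word, or an empty list if unknown."""
    return TOY_SYNSETS.get(word, [])


df = pd.DataFrame({"A": [0, 9, 4, 5], "B": ["father", "mother", "kid", "sister"]})

# Series.apply calls the function once per element and collects the results
# into a new Series aligned with the original index.
df["C"] = df["B"].apply(toy_synsets)

# The same column built by hand with a list comprehension.
by_hand = [toy_synsets(word) for word in df["B"]]
assert df["C"].tolist() == by_hand
```

So apply does not make synsets() accept several words at once; it simply calls it per word for you and keeps the results aligned with the rows.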
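However, a column of lists is usually not a very practical data structure. It may be better to give each synset its own column. You can do that by having the callback return a pd.Series: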
In [29]: df.join(df['B'].apply(lambda word: pd.Series([w.name() for w in wordnet.synsets(word)])))
Out[29]:
A B 0 1 2 3 \
0 0 father father.n.01 forefather.n.01 father.n.03 church_father.n.01
1 9 mother mother.n.01 mother.n.02 mother.n.03 mother.n.04
2 4 kid child.n.01 kid.n.02 kyd.n.01 child.n.02
3 5 sister sister.n.01 sister.n.02 sister.n.03 baby.n.05
4 5 6 7 8
0 father.n.05 father.n.06 founder.n.02 don.n.03 beget.v.01
1 mother.n.05 mother.v.01 beget.v.01 NaN NaN
2 kid.n.05 pull_the_leg_of.v.01 kid.v.02 NaN NaN
3 NaN NaN NaN NaN NaN
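(I chose to show only the name of each Synset, which is the name() method in NLTK 3 and later and was a plain attribute in earlier versions. If you want the Synset objects themselves, you could of course use
df.join(df['B'].apply(lambda word: pd.Series(wordnet.synsets(word))))
instead.)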
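The one-column-per-synset approach can be sketched in a self-contained way as follows. This is an illustration under assumptions, not the only way to do it: `TOY_SYNSETS` and `toy_synsets` are hypothetical stand-ins for `wordnet.synsets`, and the `synset_` column prefix is an arbitrary choice made here to avoid bare integer column names.

```python
import pandas as pd

# Hypothetical stand-in for wordnet.synsets (not part of NLTK); it returns
# sense names directly so the sketch runs without the WordNet corpus.
TOY_SYNSETS = {
    "father": ["father.n.01", "forefather.n.01"],
    "mother": ["mother.n.01"],
}


def toy_synsets(word):
    """Return a list of sense names for word, or an empty list if unknown."""
    return TOY_SYNSETS.get(word, [])


df = pd.DataFrame({"A": [0, 9], "B": ["father", "mother"]})

# Returning a Series from the callback makes apply build a DataFrame:
# one column per list position, padded with NaN where a word has fewer senses.
wide = df["B"].apply(lambda word: pd.Series(toy_synsets(word), dtype="object"))

# Rename the integer columns 0, 1, ... to synset_0, synset_1, ...
wide = wide.add_prefix("synset_")

# join aligns on the index, so each word keeps its own row of senses.
result = df.join(wide)
print(result)
```

The number of extra columns is set by the word with the most senses, so a single highly ambiguous word can make the frame wide and mostly NaN.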