How do I pass a DataFrame column to NLTK's wordnet.synsets()?

Asked on 2025-04-18 09:10

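I have a data frame in which one column contains English words. I want to pass every word in that column through NLTK's synsets() function. My problem is that synsets() only accepts one word at a time.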

For example: wordnet.synsets('father')

Now suppose I have a data frame like this:

import pandas as pd
dc = {'A': [0, 9, 4, 5], 'B': ['father', 'mother', 'kid', 'sister']}
df = pd.DataFrame(dc)
df
   A       B
0  0  father
1  9  mother
2  4     kid
3  5  sister

I want to run every word in column B through synsets() and store the results in a new column, without having to loop over the whole data frame one row at a time.

How can I do this?

1 Answer


You can use the Series apply method:

In [4]: df['C'] = df['B'].apply(wordnet.synsets)

In [5]: df
Out[5]: 
   A       B                                                  C
0  0  father  [Synset('father.n.01'), Synset('forefather.n.0...
1  9  mother  [Synset('mother.n.01'), Synset('mother.n.02'),...
2  4     kid  [Synset('child.n.01'), Synset('kid.n.02'), Syn...
3  5  sister  [Synset('sister.n.01'), Synset('sister.n.02'),...
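The apply call above can be wrapped in a small reusable helper. This is a minimal sketch, assuming NLTK 3.x (where Synset.name is a method, not an attribute) and that the WordNet corpus has already been downloaded with nltk.download('wordnet'). The helper name add_synset_names and its lookup parameter are hypothetical: lookup defaults to wordnet.synsets and exists only so the function can be tried without the corpus.

```python
import pandas as pd


def add_synset_names(df: pd.DataFrame, src: str, dst: str, lookup=None) -> pd.DataFrame:
    """Return a copy of df with a new column dst holding, for each word in
    column src, the list of synset names returned by lookup(word).

    Hypothetical helper; lookup defaults to nltk.corpus.wordnet.synsets.
    """
    if lookup is None:
        # Imported lazily; requires nltk plus the WordNet corpus
        # (nltk.download('wordnet')).
        from nltk.corpus import wordnet
        lookup = wordnet.synsets
    out = df.copy()  # leave the caller's frame untouched
    # Series.apply calls the lambda once per element. In NLTK 3.x
    # Synset.name is a method, so it must be called as s.name().
    out[dst] = out[src].apply(lambda word: [s.name() for s in lookup(word)])
    return out
```

With the question's frame, add_synset_names(df, 'B', 'C') should add a column C holding lists of name strings such as 'father.n.01' instead of Synset objects.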

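That said, a column of lists is usually not a very convenient data structure. It may be better to put each synset in its own column, which you can do by having the callback return a pd.Series: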

In [29]: df.join(df['B'].apply(lambda word: pd.Series([w.name() for w in wordnet.synsets(word)])))
Out[29]: 
   A       B            0                1            2                   3  \
0  0  father  father.n.01  forefather.n.01  father.n.03  church_father.n.01   
1  9  mother  mother.n.01      mother.n.02  mother.n.03         mother.n.04   
2  4     kid   child.n.01         kid.n.02     kyd.n.01          child.n.02   
3  5  sister  sister.n.01      sister.n.02  sister.n.03           baby.n.05   

             4                     5             6         7           8  
0  father.n.05           father.n.06  founder.n.02  don.n.03  beget.v.01  
1  mother.n.05           mother.v.01    beget.v.01       NaN         NaN  
2     kid.n.05  pull_the_leg_of.v.01      kid.v.02       NaN         NaN  
3          NaN                   NaN           NaN       NaN         NaN  

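(I chose to show only the name of each Synset. In NLTK 3.x Synset.name is a method, so it has to be called as w.name(); older NLTK versions exposed it as an attribute. Of course, you could also use

df.join(df['B'].apply(lambda word: pd.Series(wordnet.synsets(word))))

if you want the Synset objects themselves.)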
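Returning a pd.Series from the callback works, but it builds a separate Series for every row. A common pandas alternative is to compute the list column once with apply and reshape it afterwards: expand it into one column per element with pd.DataFrame(col.tolist(), index=df.index), or produce one row per synset with DataFrame.explode (pandas 0.25 or later). This is a sketch on plain lists of names so it runs without NLTK; the function names expand_wide and explode_long and the syn_ column prefix are illustrative assumptions.

```python
import pandas as pd


def expand_wide(df: pd.DataFrame, list_col: str, prefix: str = "syn_") -> pd.DataFrame:
    """Replace list_col with one column per list element.

    Rows with shorter lists are padded with missing values (None/NaN).
    """
    # Building the frame from a list of lists is a single constructor call,
    # rather than one pd.Series per row.
    wide = pd.DataFrame(df[list_col].tolist(), index=df.index)
    wide.columns = [f"{prefix}{i}" for i in wide.columns]
    return df.drop(columns=list_col).join(wide)


def explode_long(df: pd.DataFrame, list_col: str) -> pd.DataFrame:
    """One row per list element; the original row labels are repeated."""
    return df.explode(list_col)
```

The wide layout resembles the output shown above, while the long layout from explode is often easier to filter or group by, for example to count how many synsets each word has.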
