WordNet selectional restrictions in NLTK

7 votes
2 answers
865 views
Asked 2025-04-16 14:46

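Is there a way to get the selectional restrictions in WordNet through NLTK, such as +animate (a living being), +human (a person), and so on? Or is there some other way to get semantic information about a synset? The closest thing I have found is the hypernym relation.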

2 Answers

0

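You could try using some of the similarity functions with hand-picked synsets and filter on the scores. However, this is basically the same as following the hypernym tree: as far as I know, all the WordNet similarity functions use hypernym distance in their computation. Synsets also have many optional attributes that are worth exploring, but whether they are present can be very inconsistent.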
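To make the point about hypernym distance concrete, here is a minimal sketch of a path-based score computed the way NLTK describes its `path_similarity`: 1 / (shortest path length + 1), where the path runs up to a shared hypernym and back down. The taxonomy `TOY_HYPERNYMS` is a hypothetical, hand-made miniature, not real WordNet data, and its keys are plain strings rather than synset IDs.

```python
from collections import deque

# Hypothetical toy taxonomy (NOT WordNet): each concept -> its direct hypernyms.
TOY_HYPERNYMS = {
    "dog": ["canine", "domestic_animal"],
    "canine": ["animal"],
    "domestic_animal": ["animal"],
    "animal": ["organism"],
    "writer": ["person"],
    "person": ["organism"],
    "organism": ["entity"],
    "entity": [],
}


def ancestor_distances(concept):
    """Map the concept and each of its ancestors to the fewest hypernym
    links needed to reach it (breadth-first search upward)."""
    dist = {concept: 0}
    queue = deque([concept])
    while queue:
        node = queue.popleft()
        for parent in TOY_HYPERNYMS[node]:
            if parent not in dist:
                dist[parent] = dist[node] + 1
                queue.append(parent)
    return dist


def shortest_path_distance(a, b):
    """Shortest path between a and b through a common hypernym, or None."""
    da, db = ancestor_distances(a), ancestor_distances(b)
    common = da.keys() & db.keys()
    if not common:
        return None
    return min(da[c] + db[c] for c in common)


def path_similarity(a, b):
    """1 / (distance + 1): shorter hypernym paths give higher scores."""
    d = shortest_path_distance(a, b)
    return None if d is None else 1.0 / (d + 1)
```

In this toy taxonomy, `path_similarity("dog", "animal")` is 1/3 (two hypernym links) while `path_similarity("dog", "writer")` is 1/6, so thresholding a score against a hand-picked synset is effectively a filter on hypernym distance.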

5

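It depends on what your "selectional restrictions" are, or what I would rather call semantic features. In classical semantics, there is a world of concepts, and to compare these concepts we need to find

  • discriminating features (i.e. the features that distinguish the concepts from one another), and
  • similarity features (i.e. the features the concepts have in common, which show why we need to discriminate between them in the first place)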

For example:

Man is [+HUMAN], [+MALE], [+ADULT]
Woman is [+HUMAN], [-MALE], [+ADULT]

[+HUMAN] and [+ADULT] = similarity features
[+-MALE] = discriminating feature

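A common question, both in classical semantics and in applying that theory to computational semantics, is:

"Is there a specific list of features that can be used to compare any concept?"

"If so, what are the features on this list?"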

(See www.acl.ldc.upenn.edu/E/E91/E91-1034.pdf for more details.)

Back to WordNet: I can suggest two ways to approach the "selectional restrictions" problem.

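First, check the hypernyms for discriminating features, but you have to decide up front what the discriminating features are. For example, to tell animals apart from humans, we can take [+-HUMAN] and [+-ANIMAL] as the discriminating features.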

from nltk.corpus import wordnet as wn

# Concepts to compare
dog_sense = wn.synsets('dog')[0]  # dog.n.01, see http://goo.gl/b9sg9X
jb_sense = wn.synsets('James_Baldwin')[0]  # see http://goo.gl/CQQIG9


# hypernym_paths() returns a list of lists because a synset can reach the
# root by more than one route (dog.n.01 goes through both canine.n.02 and
# domestic_animal.n.01), so collect the synsets on every path, not just [0].
def ancestors(synset):
    return {s for path in synset.hypernym_paths() for s in path}


dog_hypernyms = ancestors(dog_sense)
jb_hypernyms = ancestors(jb_sense)

# Discriminating features in terms of concepts in WordNet
human = wn.synset('person.n.01')  # i.e. [+human]
animal = wn.synset('animal.n.01')  # i.e. [+animal]

if human in jb_hypernyms and animal not in jb_hypernyms:
    print("James Baldwin is human")
else:
    print("James Baldwin is not human")

# For the dog, the test is [+animal] and [-human]
if animal in dog_hypernyms and human not in dog_hypernyms:
    print("Dog is an animal")
else:
    print("Dog is not an animal")
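The same hypernym check can be turned into a feature vector in the [+F]/[-F] notation used earlier. This is a sketch under the assumption that you pass in hypernym paths in the shape `Synset.hypernym_paths()` returns (a list of lists); the helper names `semantic_features` and `feature_string` are hypothetical, and the function does not import NLTK itself, so it also works on plain lists of strings.

```python
def semantic_features(hypernym_paths, feature_concepts):
    """Return {feature: True/False}, where True means [+feature].

    hypernym_paths: list of paths (lists of concepts), e.g. the value of
        an NLTK Synset's hypernym_paths().
    feature_concepts: feature name -> the concept whose presence among the
        ancestors marks [+feature].
    """
    ancestors = {c for path in hypernym_paths for c in path}
    return {name: concept in ancestors for name, concept in feature_concepts.items()}


def feature_string(features):
    """Render {"HUMAN": False, "ANIMAL": True} as "[-HUMAN], [+ANIMAL]"."""
    return ", ".join(("[+" if value else "[-") + name + "]"
                     for name, value in features.items())
```

With NLTK you would pass `wn.synset('dog.n.01').hypernym_paths()` as the paths and `{"HUMAN": wn.synset('person.n.01'), "ANIMAL": wn.synset('animal.n.01')}` as the feature map. Which concepts count as features is still your decision, exactly as in the classical-semantics question above.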

Second, check similarity measures, as @Jacob suggested.

from nltk.corpus import wordnet as wn

dog_sense = wn.synsets('dog')[0]  # dog.n.01, see http://goo.gl/b9sg9X
jb_sense = wn.synsets('James_Baldwin')[0]  # see http://goo.gl/CQQIG9

# Features to check against whether the 'dubious' concept is a human or an animal
human = wn.synset('person.n.01')  # i.e. [+human]
animal = wn.synset('animal.n.01')  # i.e. [+animal]

# Wu-Palmer similarity, computed once per feature
sim_animal = dog_sense.wup_similarity(animal)
sim_human = dog_sense.wup_similarity(human)

if sim_animal > sim_human:
    print("Dog is more of an animal than human")
elif sim_animal < sim_human:
    print("Dog is more of a human than animal")
else:
    print("Dog is equally similar to animal and human")
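The if/elif comparison above handles exactly two candidate features. A minimal generalization, sketched here with a hypothetical helper `closest_feature`, scores the sense against any number of feature concepts and picks the highest. It takes the similarity function as a parameter (for example `lambda a, b: a.wup_similarity(b)` with NLTK synsets) and treats a `None` score, which NLTK's similarity methods can return when no comparison is possible, as 0.0.

```python
def closest_feature(sense, feature_concepts, similarity):
    """Return (feature_name, score) for the feature concept most similar to sense.

    feature_concepts: feature name -> concept to compare against.
    similarity: callable (a, b) -> float or None; None is scored as 0.0.
    On ties, the feature listed first wins (max keeps the first maximum).
    """
    scores = {name: similarity(sense, concept) or 0.0
              for name, concept in feature_concepts.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

With the names from the code above, `closest_feature(dog_sense, {"HUMAN": human, "ANIMAL": animal}, lambda a, b: a.wup_similarity(b))` reproduces the two-way comparison, and adding entries to the dict extends it to more features.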
