在动词/名词/形容词形式间转换单词

36 投票
5 回答
28368 浏览
提问于 2025-04-17 13:27

我想要一个Python库的函数,可以在不同的词性之间进行转换。有时候它应该输出多个词,比如“coder”和“code”都是动词“to code”的名词,一个是主语,另一个是宾语。

# :: String => List of String
print verbify('writer') # => ['write']
print nounize('written') # => ['writer']
print adjectivate('write') # => ['written']

我主要关心动词和名词之间的转换,因为我想写一个记笔记的程序。也就是说,我可以写“caffeine antagonizes A1”或者“caffeine is an A1 antagonist”,然后通过一些自然语言处理(NLP),它能搞明白这两句话的意思是一样的。(我知道这并不简单,而且需要解析的NLP,而不仅仅是标记,但我想先做个原型出来。)

类似的问题有…… 将形容词和副词转换为名词形式 (这个答案只涉及到词性的根本形式。我想在不同的词性之间转换。)

顺便提一下,这在语言学中叫做转换,详细信息可以查看这个链接 http://en.wikipedia.org/wiki/Conversion_%28linguistics%29

5 个回答

4

我知道这并没有完全回答你的问题,但它确实解决了你问题中的一大部分。我建议你去看看 http://nodebox.net/code/index.php/Linguistics#verb_conjugation 这个网站上的Python库可以帮你变动动词的形式,还能识别一个词是动词、名词还是形容词。

示例代码

print en.verb.present("gave")
print en.verb.present("gave", person=3, negate=False)
>>> give
>>> gives

它还可以对单词进行分类。

print en.is_noun("banana")
>>> True

下载链接在页面的顶部。

15

这里有一个函数,理论上可以在名词、动词、形容词和副词之间转换。我是从这里更新过来的(最初是由bogs写的,我记得)以符合nltk 3.2.5的要求,因为现在synset.lemmassynset.name都是函数。

from nltk.corpus import wordnet as wn

# Just to make it a bit more readable
WN_NOUN = 'n'
WN_VERB = 'v'
WN_ADJECTIVE = 'a'
WN_ADJECTIVE_SATELLITE = 's'
WN_ADVERB = 'r'


def convert(word, from_pos, to_pos):    
    """ Transform words given from/to POS tags """

    synsets = wn.synsets(word, pos=from_pos)

    # Word not found
    if not synsets:
        return []

    # Get all lemmas of the word (consider 'a'and 's' equivalent)
    lemmas = []
    for s in synsets:
        for l in s.lemmas():
            if s.name().split('.')[1] == from_pos or from_pos in (WN_ADJECTIVE, WN_ADJECTIVE_SATELLITE) and s.name().split('.')[1] in (WN_ADJECTIVE, WN_ADJECTIVE_SATELLITE):
                lemmas += [l]

    # Get related forms
    derivationally_related_forms = [(l, l.derivationally_related_forms()) for l in lemmas]

    # filter only the desired pos (consider 'a' and 's' equivalent)
    related_noun_lemmas = []

    for drf in derivationally_related_forms:
        for l in drf[1]:
            if l.synset().name().split('.')[1] == to_pos or to_pos in (WN_ADJECTIVE, WN_ADJECTIVE_SATELLITE) and l.synset().name().split('.')[1] in (WN_ADJECTIVE, WN_ADJECTIVE_SATELLITE):
                related_noun_lemmas += [l]

    # Extract the words from the lemmas
    words = [l.name() for l in related_noun_lemmas]
    len_words = len(words)

    # Build the result in the form of a list containing tuples (word, probability)
    result = [(w, float(words.count(w)) / len_words) for w in set(words)]
    result.sort(key=lambda w:-w[1])

    # return all the possibilities sorted by probability
    return result


convert('direct', 'a', 'r')
convert('direct', 'a', 'n')
convert('quick', 'a', 'r')
convert('quickly', 'r', 'a')
convert('hunger', 'n', 'v')
convert('run', 'v', 'a')
convert('tired', 'a', 'r')
convert('tired', 'a', 'v')
convert('tired', 'a', 'n')
convert('tired', 'a', 's')
convert('wonder', 'v', 'n')
convert('wonder', 'n', 'a')

如你所见,它的效果不是很好。它无法在形容词和副词之间切换(这是我特别想要的),但在其他情况下,它确实给出了一些有趣的结果。

>>> convert('direct', 'a', 'r')
[]
>>> convert('direct', 'a', 'n')
[('directness', 0.6666666666666666), ('line', 0.3333333333333333)]
>>> convert('quick', 'a', 'r')
[]
>>> convert('quickly', 'r', 'a')
[]
>>> convert('hunger', 'n', 'v')
[('hunger', 0.75), ('thirst', 0.25)]
>>> convert('run', 'v', 'a')
[('persistent', 0.16666666666666666), ('executive', 0.16666666666666666), ('operative', 0.16666666666666666), ('prevalent', 0.16666666666666666), ('meltable', 0.16666666666666666), ('operant', 0.16666666666666666)]
>>> convert('tired', 'a', 'r')
[]
>>> convert('tired', 'a', 'v')
[]
>>> convert('tired', 'a', 'n')
[('triteness', 0.25), ('banality', 0.25), ('tiredness', 0.25), ('commonplace', 0.25)]
>>> convert('tired', 'a', 's')
[]
>>> convert('wonder', 'v', 'n')
[('wonder', 0.3333333333333333), ('wonderer', 0.2222222222222222), ('marveller', 0.1111111111111111), ('marvel', 0.1111111111111111), ('wonderment', 0.1111111111111111), ('question', 0.1111111111111111)]
>>> convert('wonder', 'n', 'a')
[('curious', 0.4), ('wondrous', 0.2), ('marvelous', 0.2), ('marvellous', 0.2)]

希望这个能帮到某些人,省点麻烦。

24

这其实是一种经验性的方法。我刚刚写了这个代码,所以风格可能不太好。它使用了wordnet中的derivationally_related_forms()函数。我实现了一个叫做nounify的功能。我想verbify的工作原理是类似的。从我测试的结果来看,效果还不错:

from nltk.corpus import wordnet as wn

def nounify(verb_word):
    """ Transform a verb to the closest noun: die -> death """
    verb_synsets = wn.synsets(verb_word, pos="v")

    # Word not found
    if not verb_synsets:
        return []

    # Get all verb lemmas of the word
    verb_lemmas = [l for s in verb_synsets \
                   for l in s.lemmas if s.name.split('.')[1] == 'v']

    # Get related forms
    derivationally_related_forms = [(l, l.derivationally_related_forms()) \
                                    for l in    verb_lemmas]

    # filter only the nouns
    related_noun_lemmas = [l for drf in derivationally_related_forms \
                           for l in drf[1] if l.synset.name.split('.')[1] == 'n']

    # Extract the words from the lemmas
    words = [l.name for l in related_noun_lemmas]
    len_words = len(words)

    # Build the result in the form of a list containing tuples (word, probability)
    result = [(w, float(words.count(w))/len_words) for w in set(words)]
    result.sort(key=lambda w: -w[1])

    # return all the possibilities sorted by probability
    return result

撰写回答