在动词/名词/形容词之间转换单词

2024-03-28 13:01:22 发布

您现在位置:Python中文网/ 问答频道 /正文

我想要一个python库函数,它可以跨语音的不同部分进行翻译/转换。有时它应该输出多个单词(例如,“coder”和“code”都是动词“to code”的名词,一个是主语,另一个是宾语)

# :: String => List of String
print verbify('writer') # => ['write']
print nounize('written') # => ['writer']
print adjectivate('write') # => ['written']

我最关心的是动词和名词,作为一个我想写的笔记程序。i、 我可以写“咖啡因拮抗剂A1”或者“咖啡因是A1拮抗剂”,用一些NLP,它可以发现它们的意思是一样的。(我知道这不容易,而且需要NLP来解析,而不仅仅是标记,但我想创建一个原型)。

类似的问题。。。 Converting adjectives and adverbs to their noun forms (这个答案只源于词根位置,我想在词根之间切换。)

语言学中的转换


Tags: tostringnlpa1code动词writewriter
3条回答

这是一个理论上能够在名词/动词/形容词/副词形式之间转换单词的函数,我从here(我相信最初由bogs编写)更新为符合nltk 3.2.5,现在synset.lemmassysnset.name是函数。

from nltk.corpus import wordnet as wn

# Just to make it a bit more readable
WN_NOUN = 'n'
WN_VERB = 'v'
WN_ADJECTIVE = 'a'
WN_ADJECTIVE_SATELLITE = 's'
WN_ADVERB = 'r'


def convert(word, from_pos, to_pos):    
    """ Transform words given from/to POS tags """

    synsets = wn.synsets(word, pos=from_pos)

    # Word not found
    if not synsets:
        return []

    # Get all lemmas of the word (consider 'a'and 's' equivalent)
    lemmas = []
    for s in synsets:
        for l in s.lemmas():
            if s.name().split('.')[1] == from_pos or from_pos in (WN_ADJECTIVE, WN_ADJECTIVE_SATELLITE) and s.name().split('.')[1] in (WN_ADJECTIVE, WN_ADJECTIVE_SATELLITE):
                lemmas += [l]

    # Get related forms
    derivationally_related_forms = [(l, l.derivationally_related_forms()) for l in lemmas]

    # filter only the desired pos (consider 'a' and 's' equivalent)
    related_noun_lemmas = []

    for drf in derivationally_related_forms:
        for l in drf[1]:
            if l.synset().name().split('.')[1] == to_pos or to_pos in (WN_ADJECTIVE, WN_ADJECTIVE_SATELLITE) and l.synset().name().split('.')[1] in (WN_ADJECTIVE, WN_ADJECTIVE_SATELLITE):
                related_noun_lemmas += [l]

    # Extract the words from the lemmas
    words = [l.name() for l in related_noun_lemmas]
    len_words = len(words)

    # Build the result in the form of a list containing tuples (word, probability)
    result = [(w, float(words.count(w)) / len_words) for w in set(words)]
    result.sort(key=lambda w:-w[1])

    # return all the possibilities sorted by probability
    return result


convert('direct', 'a', 'r')
convert('direct', 'a', 'n')
convert('quick', 'a', 'r')
convert('quickly', 'r', 'a')
convert('hunger', 'n', 'v')
convert('run', 'v', 'a')
convert('tired', 'a', 'r')
convert('tired', 'a', 'v')
convert('tired', 'a', 'n')
convert('tired', 'a', 's')
convert('wonder', 'v', 'n')
convert('wonder', 'n', 'a')

正如你在下面所看到的,它并没有那么好用。它不能在形容词和副词之间切换(我的具体目标),但在其他情况下它确实给出了一些有趣的结果。

>>> convert('direct', 'a', 'r')
[]
>>> convert('direct', 'a', 'n')
[('directness', 0.6666666666666666), ('line', 0.3333333333333333)]
>>> convert('quick', 'a', 'r')
[]
>>> convert('quickly', 'r', 'a')
[]
>>> convert('hunger', 'n', 'v')
[('hunger', 0.75), ('thirst', 0.25)]
>>> convert('run', 'v', 'a')
[('persistent', 0.16666666666666666), ('executive', 0.16666666666666666), ('operative', 0.16666666666666666), ('prevalent', 0.16666666666666666), ('meltable', 0.16666666666666666), ('operant', 0.16666666666666666)]
>>> convert('tired', 'a', 'r')
[]
>>> convert('tired', 'a', 'v')
[]
>>> convert('tired', 'a', 'n')
[('triteness', 0.25), ('banality', 0.25), ('tiredness', 0.25), ('commonplace', 0.25)]
>>> convert('tired', 'a', 's')
[]
>>> convert('wonder', 'v', 'n')
[('wonder', 0.3333333333333333), ('wonderer', 0.2222222222222222), ('marveller', 0.1111111111111111), ('marvel', 0.1111111111111111), ('wonderment', 0.1111111111111111), ('question', 0.1111111111111111)]
>>> convert('wonder', 'n', 'a')
[('curious', 0.4), ('wondrous', 0.2), ('marvelous', 0.2), ('marvellous', 0.2)]

希望这能帮别人省点麻烦

这更像是一种启发式方法。我刚刚把它编码成了这种风格。它使用wordnet中派生的相关形式。我已经实现了名词化。我想verbify的工作原理类似。从我的测试来看,效果相当不错:

from nltk.corpus import wordnet as wn

def nounify(verb_word):
    """ Transform a verb to the closest noun: die -> death """
    verb_synsets = wn.synsets(verb_word, pos="v")

    # Word not found
    if not verb_synsets:
        return []

    # Get all verb lemmas of the word
    verb_lemmas = [l for s in verb_synsets \
                   for l in s.lemmas if s.name.split('.')[1] == 'v']

    # Get related forms
    derivationally_related_forms = [(l, l.derivationally_related_forms()) \
                                    for l in    verb_lemmas]

    # filter only the nouns
    related_noun_lemmas = [l for drf in derivationally_related_forms \
                           for l in drf[1] if l.synset.name.split('.')[1] == 'n']

    # Extract the words from the lemmas
    words = [l.name for l in related_noun_lemmas]
    len_words = len(words)

    # Build the result in the form of a list containing tuples (word, probability)
    result = [(w, float(words.count(w))/len_words) for w in set(words)]
    result.sort(key=lambda w: -w[1])

    # return all the possibilities sorted by probability
    return result

我知道这并不能回答你的全部问题,但它确实回答了大部分问题。我要退房 http://nodebox.net/code/index.php/Linguistics#verb_conjugation 这个python库能够使动词共轭,并识别单词是动词、名词还是形容词。

示例代码

print en.verb.present("gave")
print en.verb.present("gave", person=3, negate=False)
>>> give
>>> gives

它还可以对单词进行分类。

print en.is_noun("banana")
>>> True

下载在链接的顶部。

相关问题 更多 >