Context of the most frequent words in a corpus in Python

0 votes

1 answer

729 views

Asked 2025-04-17 13:31

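After using the function below (written in Python) to find the 10 most frequently used words in a text, I want to compare the contexts in which these ten words occur across different subcategories.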

from collections import defaultdict
import string

def meest_freq(mycorpus):
    # stopList is assumed to be defined elsewhere as a collection of stop words.
    woorden = mycorpus.words()
    zonderhoofdletters = [word.lower() for word in woorden]
    filtered = [word for word in zonderhoofdletters if word not in stopList]
    # Strip punctuation characters; this form works on both Python 2 and 3.
    no_punct = [''.join(ch for ch in s if ch not in string.punctuation) for s in filtered]
    D = defaultdict(int)
    for word in no_punct:
        if word:  # skip tokens that consisted only of punctuation
            D[word] += 1
    popular_words = sorted(D, key=D.get, reverse=True)
    # Indexing starts at 0, so the ten most frequent words are popular_words[:10].
    top10 = popular_words[:10]
    print("De 10 meest frequente woorden zijn: " + ", ".join(top10[:-1]) + " en " + top10[-1])
    return popular_words

To do this I wanted to use the following code:

def context(cat):
    words = popular_words[:10]
    context = words.concordance()
    print(context)

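Unfortunately, I keep getting the error "AttributeError: 'str' object has no attribute 'concordance'". Does anyone know why I can't use the result of the first code block in the second function? I thought a return statement would make that work.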

Tags: error handling, natural language processing, text analysis, corpus, word frequency, context comparison

1 answer

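"Does anyone know why I can't use the result of the first code block in the second function? I thought a return statement would make that work."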

Because functions return values, not variables.

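The popular_words you use in context does not come from meest_freq; it comes from some global variable. The popular_words inside meest_freq is a local variable. The rule is: if you assign to a name inside a function, that name is local unless you declare it with a global statement. context never assigns to popular_words, so Python looks for a global of that name and finds something you don't expect, probably because you were testing these functions in the interpreter (perhaps an earlier version that you tested and fixed left that variable behind...).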
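To make the scoping rule concrete, here is a minimal sketch. The function names meest_freq_demo and context_demo and the leftover global string are made up for illustration; they only assume that a string named popular_words was left behind in the session, which would match the error in the question.

```python
# Hypothetical global left behind by an earlier experiment in the interpreter.
popular_words = "een oude waarde"

def meest_freq_demo():
    # Assigning inside the function creates a local name; the global is untouched.
    popular_words = ["de", "het", "een"]
    return popular_words

def context_demo():
    # No assignment here, so the name resolves to the global string.
    words = popular_words[:10]
    return words.concordance()

meest_freq_demo()  # the returned list is discarded, so nothing changes globally

try:
    context_demo()
    message = None
except AttributeError as err:
    message = str(err)

print(message)
```

Slicing a string gives another string, and strings have no concordance method, so the call fails in the same way as in the question.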

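Please don't try to use global variables. You have correctly learned that the way to get information out of a function is to return it. The counterpart, getting information into a function, is to pass it in as an argument. Just as meest_freq knows about the corpus (because you pass it in as mycorpus), context should be told about the popular words the same way.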

You must have code that calls both functions. That code should pass the value returned by meest_freq to context, just as it passes the corpus to meest_freq.
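A rough sketch of what that calling code could look like. TinyCorpus is a hypothetical stand-in that only offers a words() method, and meest_freq is reduced to its counting step; neither is your actual code.

```python
from collections import Counter

class TinyCorpus:
    """Hypothetical stand-in for a corpus reader; it only provides words()."""
    def __init__(self, text):
        self._words = text.split()

    def words(self):
        return self._words

def meest_freq(mycorpus):
    # Simplified counting step: all words, most frequent first.
    counts = Counter(w.lower() for w in mycorpus.words())
    return [word for word, _ in counts.most_common()]

def context(popular_words):
    # The list arrives as a parameter instead of being looked up as a global.
    return popular_words[:10]

corpus = TinyCorpus("de kat en de hond en de vis")
popular = meest_freq(corpus)   # capture the returned value
top = context(popular)         # hand it to the second function
print(top)
```

The name popular in the calling code is arbitrary. What matters is that the value returned by the first call is the argument of the second.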

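Alternatively, if you pass the corpus to context, you can make the call to meest_freq inside it. Because of your naming, it is hard for me to tell how this should be organized; I don't know what cat means, what context is the context of, or what concordance means here.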
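As an illustration of that second layout, here is a sketch in which context receives the words and calls meest_freq itself. It assumes, and this is only a guess at the intent, that "concordance" means showing each word with a few neighbouring words. The kwic helper is hypothetical and written in plain Python, and meest_freq is again reduced to its counting step.

```python
from collections import Counter

def meest_freq(words):
    # Simplified counting step: all words, most frequent first.
    counts = Counter(w.lower() for w in words)
    return [w for w, _ in counts.most_common()]

def kwic(words, target, window=2):
    """Hypothetical helper: each occurrence of target with `window` words on either side."""
    hits = []
    for i, w in enumerate(words):
        if w.lower() == target:
            left = max(0, i - window)
            hits.append(" ".join(words[left:i + window + 1]))
    return hits

def context(words, n=10):
    # The words come in as a parameter; the frequency call happens inside.
    return {word: kwic(words, word) for word in meest_freq(words)[:n]}

tokens = "de kat ziet de hond en de hond ziet de kat".split()
result = context(tokens, n=2)
for word, lines in result.items():
    print(word, lines)
```

To compare subcategories, the same context function could be called once per subcategory's word list and the results set side by side.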

Answered 2025-04-17 by Python大师
