Python bigram dictionary format

数据工程师

Asked 2025-04-17 21:47

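For a school assignment, I need to build a dictionary that records which words follow each other in a text file.

For every word in the file I need to create an entry whose key is the word itself and whose value is a list of the words that can follow it.

For example, this sentence:

"I think you think he will think it pretty."

would produce the following output:

{'': ['I'], 'I': ['think'], 'it': ['pretty.'] (...) 'think': ['you', 'he', 'it'], 'he': ['will']}

As you can see, the first entry, '', looks a bit odd, but that is intentional. I need to set this entry explicitly in my code, with a value that is a list containing only the first word of the text. Naturally, there is no entry with 'pretty.' as its key.

I'm not a strong programmer and have been stuck on this problem for over a day. This is pretty much all the code I have so far: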

def fill_up_dict(words):
    style_dict = {}
    prev_word = ''  #empty string
         for word in words
         style_dict[prev_word]
    #at a total loss here
    return style_dict

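As you can probably tell, I'm trying to build a list of keys from all the words and then assign each value to the word that comes before it. But nothing I try gets me anywhere.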


1 Answer

To fix your approach:

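def fill_up_dict(words):
    style_dict = {}
    prev_word = ''  # empty string marks the start of the text
    for word in words:
        # create the list the first time prev_word is seen
        if prev_word not in style_dict:
            style_dict[prev_word] = []
        style_dict[prev_word].append(word)
        prev_word = word  # the current word precedes the next one
    return style_dict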

Note that you need to create the list in style_dict before you can append words to it, and that prev_word has to be updated on every iteration of the loop.
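As a quick check, here is a minimal sketch that runs the loop-based version on the example sentence from the question. It assumes the words come from splitting the text on whitespace, which the question does not actually specify, so punctuation stays attached to the words (hence 'pretty.').

```python
def fill_up_dict(words):
    """Map each word to the list of words that follow it."""
    style_dict = {}
    prev_word = ''  # empty string marks the start of the text
    for word in words:
        if prev_word not in style_dict:
            style_dict[prev_word] = []
        style_dict[prev_word].append(word)
        prev_word = word
    return style_dict

# Assumption: the text is tokenized with a plain whitespace split.
words = "I think you think he will think it pretty.".split()
result = fill_up_dict(words)

print(result[''])          # ['I']
print(result['think'])     # ['you', 'he', 'it']
print('pretty.' in result) # False: the last word never becomes a key
```

The last word is only ever appended as a value, which is why it gets no entry of its own.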

That said, the simplest way to process consecutive words is to use zip:

def fill_up_dict(words):
    style_dict = {"": [words[0]]}
    for word1, word2 in zip(words, words[1:]):
        if word1 not in style_dict:
            style_dict[word1] = []
        style_dict[word1].append(word2)
    return style_dict
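To see why this works, it may help to look at what zip(words, words[1:]) produces: each word paired with its successor. One caveat is that words[0] raises an IndexError when the list is empty. The sketch below adds a guard for that case; the name fill_up_dict_safe and the choice to return an empty dict for empty input are assumptions made for illustration, chosen to match what the loop-based version returns.

```python
words = ["I", "think", "you", "think"]

# zip stops at the shorter sequence, so the last word has no partner.
pairs = list(zip(words, words[1:]))
print(pairs)  # [('I', 'think'), ('think', 'you'), ('you', 'think')]

def fill_up_dict_safe(words):  # hypothetical name, not from the question
    # Assumption: an empty word list should give an empty dict.
    if not words:
        return {}
    style_dict = {"": [words[0]]}
    for word1, word2 in zip(words, words[1:]):
        if word1 not in style_dict:
            style_dict[word1] = []
        style_dict[word1].append(word2)
    return style_dict

print(fill_up_dict_safe(words))
print(fill_up_dict_safe([]))  # {}
```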

You can also simplify this a little by using collections.defaultdict:

from collections import defaultdict

def fill_up_dict(words):
    style_dict = defaultdict(list)
    style_dict[""] = [words[0]]
    for word1, word2 in zip(words, words[1:]):
        style_dict[word1].append(word2)
    return style_dict
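One thing to keep in mind with this version: the function returns a defaultdict, so looking up a missing key with square brackets (for example the last word of the text) quietly creates an empty entry instead of raising KeyError. If that matters for the assignment, a possible variant is to convert to a plain dict before returning, as in this sketch:

```python
from collections import defaultdict

def fill_up_dict(words):
    style_dict = defaultdict(list)
    style_dict[""] = [words[0]]
    for word1, word2 in zip(words, words[1:]):
        style_dict[word1].append(word2)
    # Convert so that missing keys raise KeyError instead of being created.
    return dict(style_dict)

words = "I think you think he will think it pretty.".split()
result = fill_up_dict(words)

print(type(result) is dict)   # True
print(result["think"])        # ['you', 'he', 'it']
print(result.get("pretty."))  # None: the last word has no entry
```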

Answered 2025-04-17 by Python大师
