Implementing a Markov model in Python

Example

Input
File name: example.txt

I Love you I Miss you Miss you Baby You are the best I Miss you

Code

from collections import Counter
import pprint

class TextAnalyzer:
    text_file = 'example.txt'

    def __init__(self):
        self.raw_data = ''
        self.word_map = dict()
        self.prepare_data()
        self.analyze()
        pprint.pprint(self.word_map)

    def prepare_data(self):
        with open(self.text_file, 'r') as example:
            self.raw_data=example.read().replace('\n', ' ')
            example.close()

    def analyze(self):
        words = self.raw_data.split()
        word_pairs = [[words[i],words[i+1]] for i in range(len(words)-1)]
        self.word_map = dict()

        for word in list(set(words)):
            for pair in word_pairs:
                if word == pair[0]:
                    self.word_map.setdefault(word, []).append(pair[1])
        self.word_map[word] = Counter(self.word_map[word]).most_common(11)

TextAnalyzer()

Actual output

{'Baby': ['You'], 'I': ['Love', 'Miss', 'Miss'], 'Love': ['you'], 'Miss': ['you', 'you', 'you'], 'You': ['are'], 'are': ['the'], 'best': ['I'], 'the': ['best'], 'you': [('I', 1), ('Miss', 1), ('Baby', 1)]}

Expected output:

{'Miss': [('you',3)], 'I': [('Love',1), ('Miss',2)], 'Love': ['you',1], 'Baby': ['You',1], 'You': ['are',1], 'are': ['the',1], 'best': ['I',1], 'the': ['best'], 'you': [('I', 1), ('Miss', 1), ('Baby', 1)]}

I want the output sorted by highest frequency. How can I improve the code to produce that output?

1 Answer

User

#1 · Posted 2024-05-13 19:37:15

To get closer to the expected result, you can edit the analyze method:

def analyze(self):
    words = self.raw_data.split()
    word_pairs = [[words[i],words[i+1]] for i in range(len(words)-1)]
    self.word_map = dict()

    for word in list(set(words)):
        pairword = []
        for pair in word_pairs:
            if word == pair[0]:
                pairword.append(pair[1])
        self.word_map[word] = Counter(pairword).most_common()

This prints:

{'Baby': [('You', 1)],
 'I': [('Miss', 2), ('Love', 1)],
 'Love': [('you', 1)],
 'Miss': [('you', 3)],
 'You': [('are', 1)],
 'are': [('the', 1)],
 'best': [('I', 1)],
 'the': [('best', 1)],
 'you': [('I', 1), ('Miss', 1), ('Baby', 1)]}

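This is what you want, but the keys are not sorted. You need to write a custom print method that does the sorting for you.

For example, add the following method to the class: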

def printfreq(self):
    sortkeys = sorted(self.word_map, key=lambda k:max(self.word_map[k], key=lambda val:val[1], default=(None, 0))[1], reverse=True)
    for kk in sortkeys:
        pprint.pprint(f"{kk} : {self.word_map[kk]}")

Replacing the pprint.pprint(self.word_map) line with self.printfreq() prints:

"Miss : [('you', 3)]"
"I : [('Miss', 2), ('Love', 1)]"
"you : [('I', 1), ('Miss', 1), ('Baby', 1)]"
"Love : [('you', 1)]"
"the : [('best', 1)]"
"You : [('are', 1)]"
"best : [('I', 1)]"
"Baby : [('You', 1)]"
"are : [('the', 1)]"

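The long sort key orders the dictionary keys by the highest frequency found in each key's list.

Edit

I added a default argument to max. This avoids the ValueError: max() arg is an empty sequence that can occur when a word has no successors, for example when the last word of the input appears nowhere else in the text.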
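The sort key can also be tried outside the class. The following is a minimal sketch, not part of the answer above: it collects the successor counts in a single pass with collections.defaultdict and Counter, then orders the keys by their highest successor count. The function names build_word_map and sort_by_max_freq are hypothetical, and the text is passed in as a string instead of being read from example.txt.

```python
from collections import Counter, defaultdict


def build_word_map(text):
    """Map each word to a list of (next_word, count) pairs, most common first."""
    words = text.split()
    followers = defaultdict(Counter)
    # zip pairs every word with its successor; the final word gets no pair.
    for current, nxt in zip(words, words[1:]):
        followers[current][nxt] += 1
    return {word: counts.most_common() for word, counts in followers.items()}


def sort_by_max_freq(word_map):
    """Return the keys ordered by their highest successor count, descending."""
    return sorted(
        word_map,
        key=lambda k: max((count for _, count in word_map[k]), default=0),
        reverse=True,
    )


text = "I Love you I Miss you Miss you Baby You are the best I Miss you"
word_map = build_word_map(text)
for key in sort_by_max_freq(word_map):
    print(f"{key} : {word_map[key]}")
```

With the sample input, the first two lines printed are the entries for Miss and I. Unlike the class above, this sketch creates no entry at all for a word that only ever appears as the last word of the text.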
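The effect of the default argument can be checked in isolation. This is a small sketch under the assumption that a word has no successors at all, so its frequency list is empty; the variable names are illustrative only.

```python
from collections import Counter

# A word that is never followed by another word collects no successors.
pairword = []
freqs = Counter(pairword).most_common()  # an empty list

# Without default, max raises ValueError on an empty sequence.
raised = False
try:
    max(freqs, key=lambda val: val[1])
except ValueError as exc:
    raised = True
    print(exc)

# With default, max returns the fallback tuple instead of raising,
# so the sort key evaluates to a frequency of 0 for this word.
top = max(freqs, key=lambda val: val[1], default=(None, 0))
print(top[1])
```

The fallback (None, 0) has the same shape as a real (word, count) entry, so indexing it with [1] in the sort key keeps working and such words sort last.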
