Given a text, count the occurrences of every pair of consecutive words

Published 2024-03-28 23:32:43



Input:

Once upon a time a time this upon a


Expected output:

{'Once upon': 1, 'upon a': 2, 'a time': 2, 'time a': 1, 'time this': 1, 'this upon': 1}


Code:

def countTuples(path):
    dic = dict()
    with codecs.open(path, 'r', 'utf-8') as f:
        for line in f:
            s = line.split()
            for i in range (0, len(s)-1):
                dic[str(s[i]) + ' ' + str(s[i+1])] += 1
    return dic

I get this error:

File "C:/Users/user/Anaconda3/hw2.py", line 100, in countTuples
    dic[str(s[i]) + ' ' + str(s[i+1])] += 1
TypeError: list indices must be integers or slices, not str

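If I replace += 1 with = 1, everything works, so I guess the problem is that I am trying to read the value of an entry that does not exist yet?

What can I do to fix this?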


3 answers

It does not need to be that hard. Just use a Counter and feed it the bigrams with zip, like this:

import codecs
from collections import Counter

def countTuples(path):
    dic = Counter()
    with codecs.open(path, 'r', 'utf-8') as f:
        for line in f:
            s = line.split()
            # zip(s, s[1:]) pairs each word with the word that follows it
            dic.update('%s %s' % t for t in zip(s, s[1:]))
    return dic
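
To try the idea without a file, here is a minimal sketch that applies the same Counter and zip approach to the sample line from the question. The helper name count_bigrams is made up for this illustration and is not part of the answer above.

```python
from collections import Counter

def count_bigrams(line):
    """Count adjacent word pairs in a single line of text (illustrative helper)."""
    words = line.split()
    # zip(words, words[1:]) yields (word, next_word) for every adjacent pair
    return Counter(' '.join(pair) for pair in zip(words, words[1:]))

counts = count_bigrams('Once upon a time a time this upon a')
print(counts['upon a'])        # 2
print(counts['a time'])        # 2
print(counts['missing pair'])  # 0, a Counter returns 0 for keys it has not seen
```

Because a Counter returns 0 for unseen keys, no special handling is needed the first time a pair appears.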

A solution that needs only a minimal change to your code is to use defaultdict:

from collections import defaultdict

line = 'Once upon a time a time this upon a'

dic = defaultdict(int)

s = line.split()

for i in range(0, len(s)-1):
    dic[str(s[i]) + ' ' + str(s[i+1])] += 1

This produces:

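defaultdict(<class 'int'>, {'Once upon': 1, 'upon a': 2, 'a time': 2, 'time a': 1, 'time this': 1, 'this upon': 1})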

Your function would then be:

import codecs
from collections import defaultdict

def countTuples(path):
    dic = defaultdict(int)  # missing keys start at int() == 0
    with codecs.open(path, 'r', 'utf-8') as f:
        for line in f:
            s = line.split()
            for i in range(0, len(s)-1):
                dic[str(s[i]) + ' ' + str(s[i+1])] += 1
    return dic
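
If you would rather keep a plain dict, a different technique from the defaultdict used in this answer is dict.get with a default of 0. The sketch below is only an illustration on the sample line, and the name count_pairs_plain is hypothetical.

```python
def count_pairs_plain(line):
    """Count adjacent word pairs using an ordinary dict (illustrative helper)."""
    counts = {}
    words = line.split()
    for i in range(len(words) - 1):
        key = words[i] + ' ' + words[i + 1]
        # get() returns 0 when the key is absent, so the first increment works
        counts[key] = counts.get(key, 0) + 1
    return counts

result = count_pairs_plain('Once upon a time a time this upon a')
print(result)
```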

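You can use collections.defaultdict to make your solution work. With defaultdict you specify a factory for the default value of any missing key (here int, which gives 0). This lets you apply an operation such as += 1 to a key that has not been explicitly created yet: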

import codecs
from collections import defaultdict

def countTuples(path):
    dic = defaultdict(int)
    with codecs.open(path, 'r', 'utf-8') as f:
        for line in f:
            s = line.split()
            for i in range (0, len(s)-1):
                dic[str(s[i]) + ' ' + str(s[i+1])] += 1
    return dic

>>> {'Once upon': 1,
     'a time': 2,
     'this upon': 1,
     'time a': 1,
     'time this': 1,
     'upon a': 2}
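
To make the difference concrete, here is a small sketch of what happens when += 1 hits a key that does not exist yet, first on a plain dict and then on a defaultdict(int). The variable names are chosen for this illustration only.

```python
from collections import defaultdict

plain = {}
try:
    # += must read the old value first, and a plain dict has none for this key
    plain['upon a'] += 1
except KeyError as exc:
    error_key = exc.args[0]
    print('KeyError for', repr(error_key))

auto = defaultdict(int)
# The missing key is created with int() == 0 before the increment is applied
auto['upon a'] += 1
auto['upon a'] += 1
print(auto['upon a'])  # 2
```

A plain assignment such as plain['upon a'] = 1 never reads the old value, which is why replacing += with = avoids the error but does not count anything.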
