Python: Counting term frequencies and putting them into multiple dictionaries

# split the text file into 11 documents (paragraphs)
f = open('filename.txt', 'r')
data = f.read()
docs = data.split("\n\n")

# create 11 tf dictionaries
dictstr = 'tf'
dictlist = [dictstr + str(i) for i in range(10)]

for i in range(10):
    for line in docs[i]:
        tokens = line.split()
        for term in tokens:
            term = term.lower()
            term = term.replace(',', '')
            term = term.replace('"', '')
            term = term.replace('.', '')
            term = term.replace('/', '')
            term = term.replace('(', '')
            term = term.replace(')', '')
            if not term in dict['tfi']:
                dict['tfi'][term] = 1
            else:
                dict['tfi'][term] += 1

1 Answer

User

#1 · Posted on 2024-06-16 13:56:50

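This code reads in the file you provided, strips the unwanted characters in a single pass (instead of creating a new string with every .replace call), and stores the word counts in a dict named result. The keys are the document numbers ('XXX9' -> 'tf9'), and the values are collections.Counter objects holding the word counts.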

>>> import re
>>> from collections import Counter
>>>
>>> with open('filename.txt', 'r') as f:
...     data = f.read().lower()
...
>>> # remove all unwanted punctuation in a single pass
>>> clean_data = re.sub(r'[,"./()]', '', data)
>>>
>>> result = {}
>>> for line in clean_data.splitlines():
...     if not line:
...         continue  # skip blank lines
...     elif line.startswith('xxx'):
...         # a marker line such as 'xxx9' starts document 'tf9'
...         doc_num = 'tf{}'.format(line[3:])
...     else:
...         # accumulate, so a document spanning several lines is not overwritten
...         result.setdefault(doc_num, Counter()).update(line.split())
...
>>> list(result.keys())
['tf7', 'tf10', 'tf5', 'tf2', 'tf9', 'tf4', 'tf11', 'tf3', 'tf6', 'tf8', 'tf1']

>>> for k, v in list(result['tf1'].items())[:15]:
...     print("'{}': {}".format(k, v))
... 
'class': 1
'then': 1
'emerge': 1
'industry': 1
'common': 1
'ourselves': 2
'models': 1
'short': 1
'mgi': 1
'it': 1
'actionable': 1
'time': 1
'why': 1
'theory': 1
'equip': 2
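
If the file has no 'xxx' marker lines and the documents are simply paragraphs separated by blank lines, as the data.split("\n\n") in the question suggests, the same idea could be sketched as a function. This is only a sketch under that assumption: the name count_terms_by_paragraph and the sample text are made up for illustration, and numbering the keys from 'tf1' is a choice, not something the question specifies.

```python
import re
from collections import Counter


def count_terms_by_paragraph(text):
    """Return {'tf1': Counter, 'tf2': Counter, ...}, one entry per paragraph."""
    # Lowercase, then strip the same punctuation as the answer in one pass.
    cleaned = re.sub(r'[,"./()]', '', text.lower())
    # Assumption: paragraphs are separated by one or more blank lines.
    paragraphs = [p for p in re.split(r'\n\s*\n', cleaned) if p.strip()]
    return {
        'tf{}'.format(i): Counter(p.split())
        for i, p in enumerate(paragraphs, start=1)
    }


# Hypothetical sample input, two paragraphs.
sample = 'We equip ourselves. We equip others.\n\nModels emerge (in theory).'
tf = count_terms_by_paragraph(sample)
print(tf['tf1']['equip'])   # count of 'equip' in the first paragraph
print(tf['tf2']['theory'])  # count of 'theory' in the second paragraph
```

Keeping every Counter inside one dict avoids having to create eleven separately named variables; a missing word simply yields 0 when looked up in a Counter.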

If any changes are needed to help answer your question, let me know!
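
As a side note on removing the characters in one pass: if a regex feels like overkill, the standard str.maketrans and str.translate methods can probably do the same job. The sketch below compares that with the chained .replace approach from the question; the helper names and the sample word are hypothetical.

```python
# Characters to drop, the same set the question removes one by one.
UNWANTED = ',"./()'

# With a third argument, str.maketrans maps each listed character to None,
# which makes str.translate delete it.
DELETE_TABLE = str.maketrans('', '', UNWANTED)


def clean_chained(term):
    """The question's approach: one new string per .replace call."""
    for ch in UNWANTED:
        term = term.replace(ch, '')
    return term.lower()


def clean_single_pass(term):
    """Delete all unwanted characters with a single translate call."""
    return term.translate(DELETE_TABLE).lower()


word = '"Models/(Theory)."'
print(clean_single_pass(word))
print(clean_chained(word) == clean_single_pass(word))
```

Both helpers should give the same result; the translate version just avoids building an intermediate string for every character removed.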
