Accessing the most frequent sub-value in a multi-valued dictionary

Posted 2024-04-26 03:19:01

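I'm writing a script that finds all text files in a directory, then reports the number of lines in each file and its most frequently used word. I know this isn't the simplest or tidiest approach, but I'm still very new to Python (2 weeks).

One small problem I've run into is that I have two main dictionaries. One stores each file and its line count; the other stores each file, its line count, and the most frequent word with its count, like this: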

dict1_example = {'file': 'lines'}
dict2_example = {'file': ('lines', ('word', 'count'))}

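I'd like to be able to extract the most frequent word across all files, i.e. access the ('word', 'count') part of the second dictionary.

Is there a way to get the information from just this part, or do I need to use a function to create an additional dictionary?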
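The nested part can be reached with ordinary indexing or tuple unpacking, so a second function is not strictly required. Here is a minimal sketch, assuming `file_info` has the shape `{file: (lines, (word, count))}` as built in the script below; the file names and numbers are made-up sample data.

```python
# Sample data in the same shape the script builds: {file: (lines, (word, count))}
file_info = {
    'a.txt': (12, ('PYTHON', 7)),
    'b.txt': (30, ('DATA', 11)),
}

# item[1] is the (lines, (word, count)) value; item[1][1] is the (word, count)
# pair; item[1][1][1] is the count. Ranking on the count explicitly avoids
# comparing whole tuples, which would compare the word text first.
best_file, (lines, (word, count)) = max(
    file_info.items(), key=lambda item: item[1][1][1]
)

# If a separate mapping is still wanted, it can be derived without re-reading files.
top_words = {name: pair for name, (_, pair) in file_info.items()}

print('{} has the highest top-word count: {} x{}'.format(best_file, word, count))
```

Note that this picks the largest per-file maximum, which is not necessarily the most frequent word once all files are combined.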

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import glob
import os
from sys import argv
import re
from collections import Counter

script, directory = argv

def file_len2(filename2):
    with open(filename2) as f2:
        l2 = [x for x in f2.readlines() if x != "\n"]
    return len(l2)

def word_count(filename3):
    with open(filename3) as f3:
        passage = f3.read()

    stop_words = ("THE", "OF", "A", "TO", "AND", "IS", "IN", "YOU", "THAT", "IT", "THIS", "YOUR", "AS", "AN", "BUT", "FOR")
    words = re.findall(r'\w+', passage)
    cap_words = [word.upper() for word in words if word.upper() not in stop_words]
    word_counts = Counter(cap_words)
    # most_common(1) returns a one-element list holding the (word, count) pair
    return word_counts.most_common(1)[0]



files = glob.glob(directory + "/*.txt")


length = {}
file_info = {}

for file in files:
    lines = file_len2(file)
    length[file] = lines
    file_info[file] = lines, word_count(file)


for file, lines in length.iteritems():
    print '{}: {}'.format(os.path.basename(file), lines), word_count(file)




maximum_file = max(length, key=length.get)
minimum_file = min(length, key=length.get)

maximum_lines = os.path.basename(max(length, key=length.get))
minimum_lines = os.path.basename(min(length, key=length.get))


print "The file with the maximum number of lines:" 
print "%r lines in %r " % (length[maximum_file], maximum_lines)

print "The file with the minimum number of lines:" 
print "%r lines in %r" % (length[minimum_file], minimum_lines)

sum_lines = sum(length.values())
number_of_values = len(length)

average = sum_lines / number_of_values

print "The average number of lines in a text file in given directory: ", average, "- Rounded down"

1 Answer
User
#1 · Posted 2024-04-26 03:19:01

I seem to have solved my problem by creating another dict:

word_freq[file] = word_count(file)

by changing

def word_count(filename3)

I then used this to get the most common word:

print word_freq[max(word_freq, key=word_freq.get)]
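If the goal is the most frequent word across all files combined, one possible alternative is to keep a full `Counter` per file and merge them. The sketch below is only an illustration: `count_words` is a hypothetical helper (not the `word_count` from the question), and the `texts` dict stands in for file contents read from disk.

```python
import re
from collections import Counter

# Same stop words as in the question's script.
STOP_WORDS = {"THE", "OF", "A", "TO", "AND", "IS", "IN", "YOU", "THAT",
              "IT", "THIS", "YOUR", "AS", "AN", "BUT", "FOR"}


def count_words(text):
    """Hypothetical helper: Counter of upper-cased words, minus stop words."""
    words = re.findall(r'\w+', text)
    return Counter(w.upper() for w in words if w.upper() not in STOP_WORDS)


# Made-up contents standing in for the .txt files in the directory.
texts = {
    'a.txt': "the cat sat on the mat with the cat and another cat",
    'b.txt': "a dog and a cat chased the dog; the dog won",
}

# One Counter per file, keyed by file name.
per_file = {name: count_words(text) for name, text in texts.items()}

# Merge all per-file counts into one corpus-wide Counter.
total = Counter()
for counts in per_file.values():
    total.update(counts)

overall_word, overall_count = total.most_common(1)[0]
print('{}: {}'.format(overall_word, overall_count))
```

Keeping the per-file `Counter` objects means both the per-file winner (`per_file[name].most_common(1)`) and the overall winner come from a single pass over each file.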
