Python: find the maximum value and print the first 5 lines of the file

-1 votes

1 answer

819 views

Asked 2025-04-18 11:46

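I am trying to write a program that goes through a folder containing a number of text files. Whenever I find the string "color=" in a file, I compute a fuzzy value for that file and get the file's start line.

What I need to do is find the highest of these fuzzy values and print the first five lines of the file it belongs to.

I have written code that finds all the fuzzy values, but I don't know how to find the maximum and then print the first five lines of the file with the highest fuzzy value. Please help!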

import os
from fuzzywuzzy import fuzz

path = r'C:\Python27' 
data = {}


for dir_entry in os.listdir(path):
    dir_entry_path = os.path.join(path, dir_entry)
    if os.path.isfile(dir_entry_path):
        with open(dir_entry_path, 'r') as my_file:
            for line in my_file:
                for part in line.split():
                    if "color=" in part:
                        print part
                        string1= "Filename:", dir_entry_path
                        print(string1)
                        string2= "Start line of file:", list(my_file)[0]
                        print(string1)
                        string3=(fuzz.ratio(string1, string2))
                        print(string3)

My current output is:

"color="
('Filename:', 'C:\\Python27\\maybeee.py')
('Filename:', 'C:\\Python27\\maybeee.py')
20
"color="
('Filename:', 'C:\\Python27\\mayp.py')
('Filename:', 'C:\\Python27\\mayp.py')
28
part.startswith('color='):
('Filename:', 'C:\\Python27\\mayp1.py')
('Filename:', 'C:\\Python27\\mayp1.py')
29

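The output I want is this: assuming the maximum value here is 29, I need to print the first five lines of the file that this maximum belongs to. Please help! Thank you all very much for your answers.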


1 Answer

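Your code tries to read the whole file again (in the list(my_file)[0] line) while an iterator is already walking through that same file. That call consumes every remaining line, so the outer loop stops early and the value you get back is not the file's first line.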
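
To make the problem concrete, here is a minimal sketch of what happens when list() is called on a file object that is already being iterated. It uses io.StringIO as a stand-in for a real file, and the helper name demo_nested_consumption is made up for this illustration.

```python
import io


def demo_nested_consumption(text):
    """Return (lines seen by the outer loop, first item of each nested list())."""
    f = io.StringIO(text)  # stand-in for an open text file
    seen = []
    grabbed = []
    for line in f:
        seen.append(line)
        rest = list(f)  # consumes every line that has not been read yet
        grabbed.append(rest[0] if rest else None)
    return seen, grabbed


seen, grabbed = demo_nested_consumption("first\nsecond\nthird\n")
print(seen)     # ['first\n']
print(grabbed)  # ['second\n']
```

The outer loop runs only once, and the "first line" it picks up is really the second line of the file.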

A better approach is to store the first five lines (that is what you want, right?) in a variable and print them once the condition is met.

Also, you print string1 twice; the second print was presumably meant to be string2.

You could change the loop to:

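import os
from collections import defaultdict
from fuzzywuzzy import fuzz

path = r'C:\Python27'
# Maps each file path to the list of fuzzy ratios computed for it.
filenames2fuzz = defaultdict(list)

for dir_entry in os.listdir(path):
    dir_entry_path = os.path.join(path, dir_entry)
    if os.path.isfile(dir_entry_path):
        first5lines = []
        condition_matched_in_file = False
        with open(dir_entry_path, 'r') as my_file:
            for line_nbr, line in enumerate(my_file):
                # Remember the first five lines while iterating, so the file
                # never has to be read a second time.
                if line_nbr < 5:
                    first5lines.append(line)
                for part in line.split():
                    if "color=" in part:
                        print(part)
                        # A plain string rather than a tuple, so that
                        # fuzz.ratio compares two strings.
                        string1 = "Filename: " + dir_entry_path
                        print(string1)
                        condition_matched_in_file = True

                        fuzziness = fuzz.ratio(string1, first5lines[0])
                        filenames2fuzz[dir_entry_path].append(fuzziness)
                        print(fuzziness)
        if condition_matched_in_file:
            # Each stored line already ends with a newline.
            print(''.join(first5lines))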

# Now that you have a dictionary that holds all filenames with 
# their fuzziness values, you can easily find the first 5 lines again
# of the file that has the best fuzziness value.

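best_fuzzy_file = None
# As far as I can tell, the docs indicate the ratio is between 0 and 100,
# so -1 is lower than any real value.
best_fuzziness_ratio = -1
for k, v in filenames2fuzz.items():
    if max(v) > best_fuzziness_ratio:
        best_fuzzy_file = k
        best_fuzziness_ratio = max(v)

if best_fuzzy_file is None:
    print('No file contained "color=".')
else:
    message = ('File {} has the highest fuzzy value of {}.\n'
               'The first 5 lines are:\n')
    print(message.format(best_fuzzy_file, best_fuzziness_ratio))
    with open(best_fuzzy_file) as f:
        for i in range(5):
            # readline() keeps the trailing newline, so strip it before printing.
            print(f.readline().rstrip('\n'))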
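
The search for the best file can also be written with the built-in max() and a key function. This is only a sketch of an alternative: the helper names best_file and head are made up here, and the sample scores reuse the values from the question's output with the file paths shortened.

```python
from itertools import islice


def best_file(filenames2fuzz):
    """Return (path, ratio) for the file with the highest ratio, or None if empty."""
    if not filenames2fuzz:
        return None
    best = max(filenames2fuzz, key=lambda name: max(filenames2fuzz[name]))
    return best, max(filenames2fuzz[best])


def head(file_path, n=5):
    """Return up to the first n lines of a file, without trailing newlines."""
    with open(file_path) as f:
        return [line.rstrip('\n') for line in islice(f, n)]


# Hypothetical sample data in the same shape as filenames2fuzz.
scores = {'maybeee.py': [20], 'mayp.py': [28], 'mayp1.py': [29]}
print(best_file(scores))  # ('mayp1.py', 29)
```

With real data you would pass the returned path to head() and print each line. islice stops after n lines, so large files are not read in full.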

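There are some further optimizations possible (take a look at os.walk), but without a more detailed explanation of the problem (for example, what kind of files you are iterating over and roughly what they contain), this is as far as I can help.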
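
As a rough illustration of the os.walk suggestion, the sketch below yields every file under a folder, including files in subfolders, which os.listdir alone does not reach. The generator name iter_files is made up for this example, and whether subfolders should be searched at all is an assumption about the problem.

```python
import os


def iter_files(root):
    """Yield the path of every regular file under root, subfolders included."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(dirpath, name)
```

The loop over os.listdir(path) could then be replaced by a loop over iter_files(path), and the os.path.isfile check would no longer be needed because os.walk already separates files from directories.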

Answered 2025-04-18 by Python大师
