Python script to count the files that contain a matching string

Asked 2025-04-18 15:29

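I have a folder containing many files. I want to count how many of these files contain a specific string, such as "Pathology", or a pattern such as "ORC|||||xxxxxxxx||||||". I tried the following script: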

import re, os
import glob

list_of_files = glob.glob('./*.hl7')

for fileName in list_of_files:
    fin = open( fileName, "r" )
    count = 0

for line in fin:
    if re.match("Pathology", line):
            count +=1
fin.close()

print count

But when I run it, the result is 0. I'm using Python 2.6.6 and can't upgrade my Python version. Any suggestions on how to do this?

Tags: file-processing, pattern-matching, text-search, data-analysis, file-matching, string-counting

3 Answers

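The simplest way is the shell command grep --files-with-matches StringOrPattern *.hl7, or its short form grep -l StringOrPattern *.hl7, which lists the files that contain a match; pipe the output to wc -l to count them. If you want to do it in Python, you need to fix your indentation: as posted, the "for line in fin" loop sits outside the "for fileName" loop, so it runs only once, on the last file opened. Also note that re.match() only matches at the beginning of a line; re.search() finds the string anywhere in the line.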

import re
import glob

list_of_files = glob.glob('./*.hl7')
files_with_matches = 0

for fileName in list_of_files:
    fin = open(fileName, "r")
    count = 0

    # This loop must be indented inside the outer loop so that every file is read
    for line in fin:
        # re.search finds the string anywhere in the line; re.match only at its start
        if re.search("Pathology", line):
            count += 1
    fin.close()

    if count > 0:
        files_with_matches += 1
        print fileName, count

print "Done", files_with_matches, "Matches"
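A variant that runs unchanged on both Python 2.6 and Python 3 may also be useful. This is a minimal sketch, not the answerer's code: the helper name files_containing and the default './*.hl7' glob are illustrative assumptions. It tests for a plain substring with the in operator, so the | characters in a pattern like "ORC|||||" are taken literally, and it stops reading a file at its first match, so each file is counted at most once (similar to grep -l).

```python
from __future__ import print_function  # print() behaves the same on Python 2.6 and 3

import glob


def files_containing(needle, file_glob='./*.hl7'):
    """Return the sorted names of files that contain `needle` as a plain substring.

    `files_containing` is a hypothetical helper name used for illustration.
    """
    matched = []
    for file_name in sorted(glob.glob(file_glob)):
        with open(file_name, 'r') as fin:
            for line in fin:
                # Plain substring test: no regex, so '|' needs no escaping.
                if needle in line:
                    matched.append(file_name)
                    break  # one hit is enough; skip the rest of this file
    return matched


if __name__ == '__main__':
    names = files_containing('Pathology')
    for name in names:
        print(name)
    print('{0} files found'.format(len(names)))
```

Because only a substring is checked, the result does not depend on how the file is split into lines.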

Answered 2025-04-18 by Python大师

You can do this with grep and wc:

grep Pathology *.hl7 | wc -l

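This counts the matching lines across all files (not the number of files, and not individual occurrences within a line).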

grep -c Pathology *.hl7

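This prints each file name followed by the number of matching lines in that file; files without any match are listed with a count of 0.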
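A rough Python counterpart of grep -c is sketched below, assuming the same './*.hl7' files; the helper name matching_line_counts is hypothetical. One pitfall applies to the pattern from the question: Python's re treats | as alternation, so a literal pattern such as "ORC|||||" must go through re.escape(). Unescaped, it also matches the empty string, so every line of every file would appear to match. (GNU grep's default basic regular expressions, by contrast, treat a bare | as a literal character.)

```python
from __future__ import print_function  # print() behaves the same on Python 2.6 and 3

import glob
import re


def matching_line_counts(regex, file_glob='./*.hl7'):
    """Map each file name to the number of its lines that match `regex`.

    Like grep -c, files with no matching lines are included with a count of 0.
    `matching_line_counts` is a hypothetical helper name.
    """
    compiled = re.compile(regex)
    counts = {}
    for file_name in sorted(glob.glob(file_glob)):
        with open(file_name, 'r') as fin:
            # re.search finds the pattern anywhere in the line (re.match would
            # only find it at the start of the line).
            counts[file_name] = sum(1 for line in fin if compiled.search(line))
    return counts


if __name__ == '__main__':
    # re.escape turns each '|' into a literal instead of an alternation.
    literal = re.escape('ORC|||||')
    counts = matching_line_counts(literal)
    for file_name in sorted(counts):
        print('{0}:{1}'.format(file_name, counts[file_name]))
```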

Answered 2025-04-18 by Python大师

If you're open to a Perl solution, this one will do the job.

As written, the code prints the name of every matching file. If you only want the count, remove the line print $ARGV, "\n".

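use strict;
use warnings;

# Put the file names into @ARGV so that <> reads them one after another
local @ARGV = glob './*.hl7';

my $count = 0;    # start at 0 so the final print works even if nothing matches

while (<>) {
  next unless /Pathology/i;    # /i makes the match case-insensitive
  ++$count;
  print $ARGV, "\n";           # $ARGV holds the name of the file being read
  close ARGV;                  # skip the rest of this file so it is counted only once
}

print "\n\n$count files found\n";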
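If you would rather stay in Python, the behaviour of the Perl script (case-insensitive /Pathology/i, print each matching file name, then the total) can be approximated as follows. This is a sketch under stated assumptions, not a translation endorsed by the answerer: it reads each file whole, which assumes the .hl7 files are small enough to fit in memory, and the names count_files and show_names are hypothetical.

```python
from __future__ import print_function  # print() behaves the same on Python 2.6 and 3

import glob
import re

# re.IGNORECASE plays the role of Perl's /i modifier.
PATTERN = re.compile('Pathology', re.IGNORECASE)


def count_files(pattern, file_glob='./*.hl7', show_names=True):
    """Count files whose contents match `pattern` (a compiled regex).

    Reading each file in one go avoids any dependence on line endings.
    `count_files` and `show_names` are hypothetical names for illustration.
    """
    count = 0
    for file_name in sorted(glob.glob(file_glob)):
        with open(file_name, 'r') as fin:
            if pattern.search(fin.read()):
                count += 1
                if show_names:
                    print(file_name)
    return count


if __name__ == '__main__':
    total = count_files(PATTERN)
    print('\n{0} files found'.format(total))
```

Passing show_names=False gives just the count, mirroring the suggestion to drop the print $ARGV line.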

Answered 2025-04-18 by Python大师
