Python - Check which words are not in any file in a folder

0 votes

4 answers

1607 views

Asked 2025-04-17 03:37

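I am writing a script that checks whether a word appears in the files under a path.

The problem is that I cannot get one overall result; instead, a result is printed for every file.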

Example:
path = "/opt/webserver/logs/"

file1.txt
file2.txt
file3.txt
....
...
..
file10000.txt

Here is the code:

#checkWordinFiles.py
import os

words = [ "Apple", "Oranges", "Starfruit" ]
path = "/opt/webserver/logs"
files = os.listdir(path)
for infile in files:
    for word in words:
        if word not in infile:
            print(word)

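The problem is that not every file contains each word. The script prints a word whenever it is missing from a file, but I want a word printed only when none of the files contain it.

I want the script to print the words that do not appear in any file under the path.

Something like running "grep Apple *" for each word.

Any ideas?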


4 Answers

-1

#checkWordinFiles.py
import os

words = [ "Apple", "Oranges", "Starfruit" ]
path = "/opt/webserver/logs"
files = os.listdir(path)

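# Note: infile is a file name here, so this tests names, not file contents.
for word in words:
    for infile in files:
        if word in infile:
            break
    else:
        # for/else: runs only when the inner loop finished without a break
        print('word - %s not found in any of the files' % (word,))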

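Edit: I had not noticed the file-reading logic earlier. As @Karl mentioned, you should first read the files under the path and then search for the words in their contents. You can use os.walk() to get a list of all files under the path, including those in subdirectories.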
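The os.walk() suggestion in the edit above could be sketched roughly as follows. This is a minimal Python 3 sketch, not the answerer's own code: the function name `words_missing_everywhere` is hypothetical, it does a plain case-sensitive substring search (much like grep), and it assumes the files are text and small enough to read whole.

```python
import os


def words_missing_everywhere(root, words):
    """Return the words that appear in no file under root (hypothetical helper)."""
    # Collect the full path of every file, including those in subdirectories.
    paths = [
        os.path.join(dirpath, name)
        for dirpath, _dirnames, filenames in os.walk(root)
        for name in filenames
    ]
    missing = []
    for word in words:
        for path in paths:
            # Assumption: files are text and small enough to read in one go.
            with open(path, errors="ignore") as f:
                if word in f.read():
                    break  # found once, so this word is not missing
        else:
            # The inner loop never hit break: no file contains the word.
            missing.append(word)
    return missing
```

Calling `words_missing_everywhere("/opt/webserver/logs", ["Apple", "Oranges", "Starfruit"])` would return only the words that none of the files contain. Note that this rereads files once per word, which is simple but may be slow for a large directory.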

Answered 2025-04-17 by Python大师


Suppose you want to search the file at /path/to/file for the word "foo".

You can do it like this:

# Open the file in a with block so it is closed automatically.
with open("/path/to/file") as f:
    for line in f:
        if "foo" in line:
            print("hurray. you found it")

You can adapt this code to your needs. Use os.listdir() to get the file names, then run the check on each file.
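Putting the two pieces together, the line-by-line check can be wrapped in a helper and applied to every name that os.listdir() returns. This is a rough Python 3 sketch; `file_contains` and `found_in_directory` are names invented here for illustration, and subdirectories are simply skipped.

```python
import os


def file_contains(path, word):
    """Return True if any line of the file at path contains word."""
    with open(path, errors="ignore") as f:
        # Iterating over the file object reads one line at a time.
        return any(word in line for line in f)


def found_in_directory(directory, word):
    """Return True if word occurs in at least one regular file in directory."""
    for name in os.listdir(directory):
        full_path = os.path.join(directory, name)  # listdir gives bare names
        if os.path.isfile(full_path) and file_contains(full_path, word):
            return True
    return False
```

A word for which `found_in_directory(path, word)` returns False is one that no file directly inside the directory contains.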

Answered 2025-04-17 by Python大师


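The problem here is that os.listdir gives you a list of the names of the files in a folder, so you are actually searching for the words in the file names, not in the file contents. To fix this, you need to use each file name to open the file and read its contents.

Here is a fancy way to write it: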

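import os
from functools import reduce  # reduce is not a builtin in Python 3

def contents(filename):
    # file() existed only in Python 2; open() works in both versions.
    with open(filename) as f:
        return f.read()

words = set(["Apple", "Oranges", "Starfruit"])
path = "/opt/webserver/logs"
# os.listdir returns bare names, so join each with the directory to open it.
filenames = [os.path.join(path, name) for name in os.listdir(path)]
print(words.difference(
    reduce(lambda x, y: x.union(y), (
        # Note that the following assumes we really want to treat the file
        # as a sequence of words, and not do general substring searching.
        # For example, it will miss "apple" if the file contains "pineapples".
        set(contents(filename).split()).intersection(words)
        for filename in filenames
        if os.path.isfile(filename)  # skip subdirectories
        # In fact, the .intersection call there is redundant, but might improve
        # performance and will probably save memory at least.
    ), set())  # initial value keeps reduce from failing on an empty folder
))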
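If substring matching in the style of grep is wanted rather than whole-word matching, a plainer alternative to the set algebra above is a single pass that crosses words off as they turn up. This is only a sketch under stated assumptions: `unseen_words` is a hypothetical name, only files directly inside the directory are read, matching is case-sensitive, and each file is assumed small enough to read whole.

```python
import os


def unseen_words(directory, words):
    """Return the set of words that occur in no regular file in directory."""
    remaining = set(words)
    for name in os.listdir(directory):
        if not remaining:
            break  # every word has been found; no need to read more files
        full_path = os.path.join(directory, name)
        if not os.path.isfile(full_path):
            continue  # skip subdirectories and other non-files
        with open(full_path, errors="ignore") as f:
            text = f.read()
        # Substring test: "Apple" also matches a file that only says "Apples".
        remaining -= {word for word in remaining if word in text}
    return remaining
```

Each file is read at most once, and the loop stops early once every word has been seen, which may matter with thousands of log files.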

Answered 2025-04-17 by Python大师
