Sorting a large text file and running a binary search on it

Asked on 2025-04-17 22:45

Suppose there is a very large file containing some text.

Contents:

"Hello, How are you?
This is Bob
The contents of the file needs to be searched
and I'm a very huge file"

Search string:

Bob

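Now I want to find the word "Bob" in this file, and I would like to use binary search to do it. How should I go about it?

I tried sorting the file with the UNIX sort command and got the following output: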

and I'm a very huge file
How are you?
The contents of the file needs to be searched
This is Bob

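The file is sorted, but the word "Bob" ends up on the last line.

The problem is that I am not searching for a whole line; I want to find a single word within the file.

So what would be a more efficient way to do this?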


1 Answer

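A memory-efficient way to do this is to write a generator that yields the words one at a time, so you can compare each one against the word you are looking for.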

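def get_next_word():
    # Stream the file line by line so it is never loaded into memory all at once.
    with open("Input.txt") as in_file:
        for line in in_file:
            # split() with no arguments splits on any run of whitespace.
            for word in line.split():
                yield word

# any() stops consuming the generator at the first match.
print(any(word == "Bob" for word in get_next_word()))
# True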

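We use the any function, which stops as soon as it finds a matching word, so the rest of the file is not read once a match turns up. If the word is not in the file, the whole file is still scanned.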
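To make the early exit visible, here is a minimal sketch that records every word the generator hands out. The `words` helper, the `seen` list, and the in-memory `lines` sample are hypothetical stand-ins for the file-based generator above, used only for illustration.

```python
def words(lines, seen):
    """Yield words one at a time, recording each in `seen` (illustration only)."""
    for line in lines:
        for word in line.split():
            seen.append(word)
            yield word

# Sample text from the question, held in memory instead of a file.
lines = [
    "Hello, How are you?",
    "This is Bob",
    "The contents of the file needs to be searched",
    "and I'm a very huge file",
]

seen = []
found = any(word == "Bob" for word in words(lines, seen))

print(found)      # True
print(seen[-1])   # Bob
print(len(seen))  # 7
```

Only the seven words up to and including "Bob" are consumed; the last two lines are never split.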

Edit:

If you need to search many times, it is better to collect the words into a set and test membership with the in operator.

# Read the file once and keep every distinct word for repeated lookups.
words_set = set(get_next_word())

print("Bob" in words_set)
# True
print("the" in words_set)
# True
print("thefourtheye" in words_set)
# False
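If a binary search is specifically what you want, the thing to sort is the list of words rather than the lines. The following is only a sketch of that idea using the standard `bisect` module; `build_index` and `contains` are hypothetical names, and the sample lines are held in memory in place of the file. For plain membership tests the set above is usually simpler, so a sorted list mainly makes sense if you also need ordering.

```python
import bisect

def build_index(lines):
    """Return a sorted list of the distinct words found in `lines`."""
    return sorted({word for line in lines for word in line.split()})

def contains(index, word):
    """Binary-search the sorted `index` for an exact, case-sensitive match."""
    i = bisect.bisect_left(index, word)
    return i < len(index) and index[i] == word

# Sample text from the question, used here in place of the real file.
sample = [
    "Hello, How are you?",
    "This is Bob",
    "The contents of the file needs to be searched",
    "and I'm a very huge file",
]

index = build_index(sample)
print(contains(index, "Bob"))           # True
print(contains(index, "bob"))           # False
print(contains(index, "thefourtheye"))  # False
```

Each lookup takes O(log n) comparisons once the index is built, but building it still requires reading and sorting all the words once.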

Answered on 2025-04-17 by Python大师
