Python generator equivalent in C++ for buffered reading

Asked 2025-04-16 09:55

In this article, Guido van Rossum shows off Python's simplicity, using a function that does buffered reading of a file of unknown length:

import array

def intsfromfile(f):
    # f must be opened in binary mode ('rb')
    while True:
        a = array.array('i')
        a.frombytes(f.read(4000))  # fromstring() was removed in Python 3.9
        if not a:
            break
        for x in a:
            yield x
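
For comparison, a rough C++ counterpart of this generator is a small class that refills a fixed-size buffer with istream::read and hands out one value per call. This is only a sketch under stated assumptions: the name BufferedReader and its next() method are hypothetical (not a standard facility), and it assumes the stream holds raw values of type T in the machine's native byte order, much as array('i') assumes native ints.

```cpp
#include <cassert>   // only for the usage check
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <istream>
#include <sstream>   // only for the usage check
#include <string>
#include <vector>

// Hypothetical generator-like reader: refills a fixed-size byte buffer with
// istream::read and hands out one value of type T per call to next().
// Assumes the stream holds raw T values in native byte order.
template <typename T>
class BufferedReader {
public:
    // The buffer size is rounded down to a whole number of T values.
    explicit BufferedReader(std::istream& in, std::size_t buffer_bytes = 4000)
        : in_(in), buf_(buffer_bytes - buffer_bytes % sizeof(T)) {}

    // Stores the next value in `out` and returns true, or returns false at the end.
    bool next(T& out) {
        if (pos_ + sizeof(T) > len_) {
            // Refill, like f.read(4000) in the Python version.
            in_.read(buf_.data(), static_cast<std::streamsize>(buf_.size()));
            len_ = static_cast<std::size_t>(in_.gcount());
            pos_ = 0;
            if (len_ < sizeof(T)) return false;  // no complete value left
        }
        std::memcpy(&out, buf_.data() + pos_, sizeof(T));  // avoids aliasing issues
        pos_ += sizeof(T);
        return true;
    }

private:
    std::istream& in_;
    std::vector<char> buf_;
    std::size_t len_ = 0;  // bytes currently in the buffer
    std::size_t pos_ = 0;  // read position within the buffer
};
```

With the question's data this would be a BufferedReader<std::uint64_t> over an ifstream opened with std::ios::binary, calling next() in a while loop until it returns false. Only one buffer's worth of the file is in memory at a time.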

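I need to do the same thing in C++, because speed matters more here! I have many files containing sorted lists of unsigned 64-bit integers, and I need to merge them. I've found a nice piece of code for merging vectors.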
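
The question doesn't show the vector-merging code it mentions, so here is one standard way to merge many sorted inputs, sketched under the assumption of C++17 (for structured bindings): keep the current front element of each input in a min-heap and repeatedly take the smallest. The function name merge_sorted is hypothetical. For exactly two vectors, std::merge from <algorithm> already does this.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <queue>
#include <tuple>
#include <vector>

// Merges any number of individually sorted vectors into one sorted vector.
// A min-heap holds the current front element of each non-empty input.
std::vector<std::uint64_t> merge_sorted(
    const std::vector<std::vector<std::uint64_t>>& inputs) {
    // (value, which input, index within that input); tuples compare by value first
    using Entry = std::tuple<std::uint64_t, std::size_t, std::size_t>;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;

    std::size_t total = 0;
    for (std::size_t i = 0; i < inputs.size(); ++i) {
        total += inputs[i].size();
        if (!inputs[i].empty()) heap.emplace(inputs[i][0], i, 0);
    }

    std::vector<std::uint64_t> out;
    out.reserve(total);
    while (!heap.empty()) {
        auto [value, src, idx] = heap.top();  // copy, then pop
        heap.pop();
        out.push_back(value);
        // Advance within the input the smallest value came from.
        if (idx + 1 < inputs[src].size()) {
            heap.emplace(inputs[src][idx + 1], src, idx + 1);
        }
    }
    return out;
}
```

To merge files rather than in-memory vectors, each heap entry would point at a per-file buffered reader instead of an index into a vector, so the whole data set never has to be loaded at once.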

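The problem I have now is how to make an ifstream for a file of unknown length behave like a vector, so that I can keep iterating over it until I reach the end of the file. Any suggestions? Am I heading in the right direction with istreambuf_iterator?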

Tags: c++ iterator sorting algorithm file-processing unsigned-integer vector buffered-reading ifstream

1 Answer

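To make an ifstream (or any input stream, really) behave like an iterator, you can use the istream_iterator or istreambuf_iterator class templates. The former is useful for files where formatting matters. For example, a file full of whitespace-separated integers can be read into a vector through its iterator-range constructor like this: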

#include <fstream>
#include <vector>
#include <iterator> // needed for istream_iterator

using namespace std;

int main(int argc, char** argv)
{
    ifstream infile("my-file.txt");

    // It isn't customary to declare these as standalone variables,
    // but see below for why it's necessary when working with
    // initializing containers.
    istream_iterator<int> infile_begin(infile);
    istream_iterator<int> infile_end;

    vector<int> my_ints(infile_begin, infile_end);

    // You can also do stuff with the istream_iterator objects directly.
    // Careful! As written, this loop won't produce the real total: building
    // the vector already consumed the stream, so at most the one value that
    // infile_begin read when it was constructed gets added.

    int total = 0;
    while (infile_begin != infile_end) {
        total += *infile_begin;
        ++infile_begin;
    }

    return 0;
}
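
If all you need is the total, one option is to skip the vector and pass the iterator pair straight to std::accumulate, so the stream is consumed exactly once. A minimal sketch; sum_ints is a hypothetical name, and any std::istream (such as the ifstream above) can be passed in:

```cpp
#include <cassert>
#include <istream>
#include <iterator>
#include <numeric>
#include <sstream>

// Sums whitespace-separated ints straight from a stream, without first
// copying them into a vector (which would use up the stream).
long long sum_ints(std::istream& in) {
    return std::accumulate(std::istream_iterator<int>(in),
                           std::istream_iterator<int>(), 0LL);
}
```

The 0LL initial value makes the running sum a long long, so large files of ints are less likely to overflow the total.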

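istreambuf_iterator, on the other hand, reads the file one character at a time and ignores formatting entirely: it returns every character, including spaces, newlines, and so on. Depending on your application, that may be the better fit.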
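
A small sketch of the difference, using an in-memory stream in place of a file; the function names are hypothetical. istreambuf_iterator<char> returns the characters untouched, while istream_iterator<char> goes through operator>>, which skips whitespace by default:

```cpp
#include <cassert>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Copies every character of the stream, whitespace included.
std::string read_all_chars(std::istream& in) {
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// For contrast: istream_iterator<char> uses operator>>, which skips whitespace.
std::string read_formatted_chars(std::istream& in) {
    return std::string(std::istream_iterator<char>(in),
                       std::istream_iterator<char>());
}
```

Given the input "1 2\n3", the first function returns it unchanged while the second returns "123".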

Note: Scott Meyers explains in his book Effective STL why the istream_iterator variables above have to be declared separately. Normally you would write something like this:

ifstream infile("my-file.txt");
vector<int> my_ints(istream_iterator<int>(infile), istream_iterator<int>());

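However, C++ parses that second line in a very strange way (the so-called "most vexing parse"): it treats it as a declaration of a function named my_ints that takes two parameters and returns a vector<int>. The first parameter is named infile and has type istream_iterator<int> (the parentheses around infile are ignored). The second parameter is unnamed and has type pointer to a function that takes no arguments (that's what the empty parentheses mean) and returns an istream_iterator<int>.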

That's kind of cool, but it can also be really annoying if you're not paying attention.
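
Besides declaring the iterators as named variables, two other common workarounds are wrapping the first argument in an extra pair of parentheses (a trick Meyers also discusses) and, since C++11, using braces. A sketch of both; the function names are hypothetical and a std::istream parameter stands in for the ifstream:

```cpp
#include <cassert>
#include <istream>
#include <iterator>
#include <sstream>
#include <vector>

using namespace std;

vector<int> read_ints_extra_parens(istream& infile) {
    // Extra parentheses around the first argument make it an expression,
    // so the line can no longer be read as a function declaration.
    vector<int> my_ints((istream_iterator<int>(infile)), istream_iterator<int>());
    return my_ints;
}

vector<int> read_ints_braces(istream& infile) {
    // Brace initialization is never parsed as a function declaration.
    // The initializer_list<int> constructor isn't viable here (an
    // istream_iterator doesn't convert to int), so the range constructor is used.
    vector<int> my_ints{istream_iterator<int>(infile), istream_iterator<int>()};
    return my_ints;
}
```

The brace form is shorter, but it is worth remembering the initializer_list rule: with element types that the arguments do convert to, braces pick a different constructor.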

Edit

Here is an example that uses istreambuf_iterator to read a file of 64-bit numbers stored back to back in binary:

#include <cstdint>
#include <fstream>
#include <vector>
#include <iterator>

using namespace std;

int main(int argc, char** argv)
{
    // Open in binary mode so the raw bytes aren't altered by newline translation.
    ifstream input("my-file.bin", ios::binary);
    istreambuf_iterator<char> input_begin(input);
    istreambuf_iterator<char> input_end;

    // Fill a char vector with input file's contents:
    vector<char> char_input(input_begin, input_end);
    input.close();

    vector<uint64_t> long_input;
    if (!char_input.empty()) {
        // Convert it to an array of 64-bit integers with a cast. uint64_t is used
        // because unsigned long is only 32 bits on some platforms (e.g. Windows).
        // Strictly, this cast breaks the aliasing rules; memcpy is the safe option.
        const uint64_t* converted = reinterpret_cast<const uint64_t*>(char_input.data());
        size_t num_long_elements = char_input.size() / sizeof(uint64_t);

        // Put that information into a vector:
        long_input.assign(converted, converted + num_long_elements);
    }

    return 0;
}

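That said, I don't much like this solution myself (it uses reinterpret_cast and exposes char_input's underlying array), but I'm not familiar enough with istreambuf_iterator to risk using it with a 64-bit character type, which would make this a lot simpler.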
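
One way around the reinterpret_cast is to copy the bytes into a vector<uint64_t> with std::memcpy, which is well-defined regardless of alignment. A sketch, assuming the values are stored in the machine's native byte order; read_u64s is a hypothetical name, and for a real file the ifstream should be opened with std::ios::binary:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Reads the whole stream and reinterprets its bytes as native-endian
// uint64_t values via memcpy. Trailing bytes that don't fill a whole
// value are ignored.
std::vector<std::uint64_t> read_u64s(std::istream& in) {
    // Extra parentheses avoid the most vexing parse.
    std::vector<char> bytes((std::istreambuf_iterator<char>(in)),
                            std::istreambuf_iterator<char>());
    std::vector<std::uint64_t> values(bytes.size() / sizeof(std::uint64_t));
    if (!values.empty()) {
        std::memcpy(values.data(), bytes.data(),
                    values.size() * sizeof(std::uint64_t));
    }
    return values;
}
```

This still loads the whole file into memory, which is fine for modest files; for merging many large files, reading fixed-size chunks with istream::read keeps memory use bounded.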

Answered 2025-04-16 by Python大师