Using an in-memory filesystem in pyarrow tests

import pyarrow.parquet as pq

pq.write_to_dataset(
    score_table,
    root_path=AWS_ZEBRA_OUTPUT_S3_PREFIX,
    filesystem=filesystem,
    partition_cols=[
        EQF_SNAPSHOT_YEAR_PARTITION,
        EQF_SNAPSHOT_MONTH_PARTITION,
        EQF_SNAPSHOT_DAY_PARTITION,
        ZEBRA_COMPUTATION_TIMESTAMP
    ]
)

2 answers

User

Answer 1 · edited 2024-05-15 02:48:33

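In the end, I manually implemented an instance of the pyarrow.FileSystem ABC. Testing with a mock seems to fail because pyarrow checks the type of the filesystem argument passed to write_to_dataset (not in the most Pythonic way): https://github.com/apache/arrow/blob/5e201fed061f2a95e66889fa527ae8ef547e9618/python/pyarrow/filesystem.py#L383. I would suggest changing the logic in this method so that it does not check the type explicitly (even isinstance would be better!), which would make testing easier.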
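To illustrate why a mock can fail one kind of type check but pass another, here is a minimal sketch that uses only the standard library. The `FileSystem` class and both check functions are hypothetical stand-ins written for this example; they are not pyarrow's actual code. The sketch relies on the documented behavior that `Mock(spec=SomeClass)` passes `isinstance` checks against `SomeClass`.

```python
from unittest.mock import Mock


class FileSystem:
    """Hypothetical stand-in for an abstract filesystem base class."""

    def exists(self, path):
        raise NotImplementedError


def check_by_concrete_type(fs):
    # Inspects the concrete type of the object, so a Mock is rejected:
    # type(mock) is a Mock subclass, not a FileSystem subclass.
    return issubclass(type(fs), FileSystem)


def check_by_isinstance(fs):
    # isinstance also consults __class__, which Mock(spec=...) reports
    # as the spec class, so a spec'd mock is accepted.
    return isinstance(fs, FileSystem)


mock_fs = Mock(spec=FileSystem)
strict_result = check_by_concrete_type(mock_fs)
lenient_result = check_by_isinstance(mock_fs)
print(strict_result, lenient_result)
```

With a check of the first kind, the remaining options are a real subclass (as this answer did) or a real filesystem pointed at a temporary directory.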

User

Answer 2 · edited 2024-05-15 02:48:33

If filesystem is None, you can pass an in-memory file object to write_to_dataset.

So your call could become:

from io import BytesIO
import pyarrow.parquet as pq

with BytesIO() as f:
    pq.write_to_dataset(
        score_table,
        root_path=f,
        filesystem=None,
        partition_cols=[
            EQF_SNAPSHOT_YEAR_PARTITION,
            EQF_SNAPSHOT_MONTH_PARTITION,
            EQF_SNAPSHOT_DAY_PARTITION,
            ZEBRA_COMPUTATION_TIMESTAMP
        ]
    )

The relevant lines from the pyarrow source:

def resolve_filesystem_and_path(where, filesystem=None):
    """
    Return filesystem from path which could be an HDFS URI, a local URI,
    or a plain filesystem path.
    """
    if not _is_path_like(where):
        if filesystem is not None:
            raise ValueError("filesystem passed but where is file-like, so"
                             " there is nothing to open with filesystem.")
        return filesystem, where

https://github.com/apache/arrow/blob/207b3507be82e92ebf29ec7d6d3b0bb86091c09a/python/pyarrow/filesystem.py#L402-L411
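The branch quoted above can be exercised in isolation with a simplified stand-in. In this sketch, `is_path_like` and `resolve` are hypothetical names written for illustration, and the path test is an assumption (strings and `os.PathLike` objects count as paths); pyarrow's own helper may differ. It only shows that a file-like target is accepted when no filesystem is given and rejected when one is.

```python
import os
from io import BytesIO


def is_path_like(where):
    # Assumption for this sketch: strings and os.PathLike objects are paths.
    return isinstance(where, (str, os.PathLike))


def resolve(where, filesystem=None):
    """Simplified stand-in for the quoted branch, not pyarrow's code."""
    if not is_path_like(where):
        if filesystem is not None:
            raise ValueError("filesystem passed but where is file-like")
        # File-like target: there is no filesystem to resolve.
        return filesystem, where
    # Path-like targets are outside this sketch; return the pair unchanged.
    return filesystem, where


buffer = BytesIO()
fs, target = resolve(buffer)  # accepted: no filesystem, same buffer back

try:
    resolve(buffer, filesystem=object())
    rejected = False
except ValueError:
    rejected = True
```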
