Getting a view of a zarr array slice

import matplotlib.pyplot as plt
import numpy as np
import zarr

arr = zarr.open(
    'temp.zarr',
    mode='a',
    shape=(4, 32, 32),
    chunks=(1, 16, 16),
    dtype=np.float32,
)
arr[:] = np.random.random((4, 32, 32))

fig, ax = plt.subplots(1, 2)

arr[2, ...] = 0  # works fine, "wipes" slice 2
ax[0].imshow(arr[2])  # all 0s

arr_slice = arr[1]  # returns a NumPy array; loses ties to zarr on disk
arr_slice[:] = 0
ax[1].imshow(arr[1])  # no surprises; shows original random data

plt.show()

2 answers

User

#1 · edited 2024-06-02 08:59:35

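One approach is to use a custom store object. You can subclass DirectoryStore, or whichever other base store holds your data, and override its __getitem__/__setitem__ methods. This may be harder than you would hope.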
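To make the custom-store idea concrete, here is a minimal sketch of a wrapper that remaps chunk keys so that one slice along the first axis looks like a standalone array. It is not taken from zarr itself: the class name `FirstAxisSliceStore` is hypothetical, and the sketch assumes the zarr v2 layout (a `.zarray` metadata key and `.`-separated chunk keys) plus a chunk length of 1 along the first axis, as in the question's `chunks=(1, 16, 16)`. Whether a given zarr version accepts such a mapping as a store is also an assumption to check.

```python
import json
from collections.abc import MutableMapping


class FirstAxisSliceStore(MutableMapping):
    """Hypothetical wrapper: presents slice `index` along axis 0 of a
    zarr v2 array as an array with one fewer dimension.

    Assumes '.'-separated chunk keys and chunk length 1 along axis 0.
    """

    def __init__(self, base, index):
        self.base = base              # any mapping of str -> bytes
        self.prefix = f"{index}."     # chunk keys of the slice start with this

    def __getitem__(self, key):
        if key == ".zarray":
            # Drop the first axis from the array metadata.
            meta = json.loads(self.base[".zarray"])
            if meta["chunks"][0] != 1:
                raise ValueError("chunk length along axis 0 must be 1")
            meta["shape"] = meta["shape"][1:]
            meta["chunks"] = meta["chunks"][1:]
            return json.dumps(meta).encode()
        if key.startswith("."):
            return self.base[key]     # other metadata, e.g. .zattrs
        return self.base[self.prefix + key]

    def __setitem__(self, key, value):
        if key.startswith("."):
            raise PermissionError("metadata is read-only through this view")
        self.base[self.prefix + key] = value

    def __delitem__(self, key):
        if key.startswith("."):
            raise PermissionError("metadata is read-only through this view")
        del self.base[self.prefix + key]

    def __iter__(self):
        for key in self.base:
            if key.startswith("."):
                yield key
            elif key.startswith(self.prefix):
                yield key[len(self.prefix):]

    def __len__(self):
        return sum(1 for _ in self)


# Demo with a plain dict standing in for the real store. The metadata is
# abbreviated; a real .zarray also carries dtype, compressor, and so on.
base = {
    ".zarray": json.dumps({"shape": [4, 32, 32], "chunks": [1, 16, 16]}).encode(),
    "1.0.0": b"chunk-bytes",
}
view = FirstAxisSliceStore(base, 1)
view["0.1"] = b"new-chunk"  # stored in the base mapping under "1.0.1"
```

The key remapping only works because each chunk covers exactly one index along the first axis, so a chunk of the 3-D array holds the same bytes as the matching chunk of the 2-D slice. Other chunk shapes would require re-slicing chunk contents, which is where this approach gets hard.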

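A better option is to copy Xarray's LazilyIndexedArray type, a bit of magic written by Stephan Hoyer: https://github.com/pydata/xarray/blob/master/xarray/core/indexing.py#L516. I think it is exactly what you want. These classes are not part of Xarray's public API, but in my opinion they are useful enough that they really should live in a standalone package.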
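The core idea behind such a lazily indexed wrapper can be sketched in a few lines. This is not Xarray's implementation, which composes arbitrary indexers; the hypothetical `LazySliceView` below only pins a single integer index on the first axis and forwards every read and write to the underlying array. A NumPy array stands in for the zarr array so the sketch runs on its own; passing a zarr array instead should work as long as it accepts the same basic tuple indexing.

```python
import numpy as np


class LazySliceView:
    """Hypothetical write-through view of `base[index]` for an integer index.

    Nothing is read when the view is created; each access is forwarded
    to `base` with the fixed index prepended.
    """

    def __init__(self, base, index):
        self.base = base
        self.index = index

    def _full_key(self, key):
        # Normalize the key to a tuple and prepend the fixed index.
        if not isinstance(key, tuple):
            key = (key,)
        return (self.index,) + key

    def __getitem__(self, key):
        return self.base[self._full_key(key)]

    def __setitem__(self, key, value):
        self.base[self._full_key(key)] = value


# A NumPy array stands in for the on-disk array here.
base = np.ones((4, 32, 32), dtype=np.float32)
view = LazySliceView(base, 1)

view[:] = 0        # forwarded as base[1, :] = 0
view[0, :4] = 5    # forwarded as base[1, 0, :4] = 5
```

Unlike `arr_slice = arr[1]` in the question, assignments through `view` reach the backing array, because the wrapper never materializes a copy of the slice.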

There is also a related blog post here: https://medium.com/informatics-lab/creating-a-data-format-for-high-momentum-datasets-a394fa48b671

User

#2 · edited 2024-06-02 08:59:35

The TensorStore library is designed specifically for this. All indexing operations produce lazy views:

import tensorstore as ts
import numpy as np
arr = ts.open({
  'driver': 'zarr',
  'kvstore': {
    'driver': 'file',
    'path': '.',
  },
  'path': 'temp.zarr',
  'metadata': {
    'dtype': '<f4',
    'shape': [4, 32, 32],
    'chunks': [1, 16, 16],
    'order': 'C',
    'compressor': None,
    'filters': None,
    'fill_value': None,
  },
}, create=True).result()
arr[1] = 42  # Overwrites, just like numpy/zarr library
view = arr[1] # Returns a lazy view, no I/O performed
np.array(view) # Reads from the view
# Returns JSON spec that can be passed to `ts.open` to reopen the view.
view.spec().to_json()

You can read more about the "index transform" mechanism behind these lazy views here: https://google.github.io/tensorstore/index_space.html#index-transform https://google.github.io/tensorstore/python/indexing.html

Disclaimer: I am an author of TensorStore.
