Generating a NumPy array from a cStringIO object while avoiding a copy

    import numpy as np
    import cStringIO

    c = cStringIO.StringIO('\x01\x00\x00\x00\x01\x00\x00\x00')

    # Trying the iterator abstraction
    b = np.fromiter(c, int)
    # The above fails with: ValueError: setting an array element with a sequence.

    # Trying the file abstraction
    b = np.fromfile(c, int)
    # The above fails with: IOError: first argument must be an open file

    # Trying the sequence abstraction
    b = np.array(c, int)
    # The above fails with: TypeError: long() argument must be a string or a number

    # Trying the string abstraction
    b = np.fromstring(c)
    # The above fails with: TypeError: argument 1 must be string or read-only buffer

    b = np.fromstring(c.getvalue(), int)  # does work

2 Answers

Anonymous user

Answer 1 · edited 2024-06-08 02:42:47

Since cStringIO does not implement the buffer interface, there is no way to get at its data without copying if its getvalue returns a copy of the data.

If getvalue returns the buffer as a string without copying, numpy.frombuffer(x.getvalue(), dtype='S1') will give a (read-only) numpy array that refers to the string, with no additional copy.
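A minimal sketch of the frombuffer idea, written for Python 3 as an assumption, since cStringIO exists only in Python 2. Here io.BytesIO stands in for the cStringIO object, and the dtype '<i4' (4-byte little-endian integers) is chosen for illustration; the question used int, whose width depends on the platform.

```python
import io
import numpy as np

# io.BytesIO is a stand-in for the Python 2 cStringIO object (assumption).
c = io.BytesIO(b'\x01\x00\x00\x00\x01\x00\x00\x00')

# getvalue() returns an immutable bytes object. In Python 3 that object is
# itself a copy of the internal buffer, but frombuffer wraps it without
# copying a second time.
data = c.getvalue()
from_bytes = np.frombuffer(data, dtype='<i4')
print(from_bytes)                  # the payload seen as 32-bit integers
print(from_bytes.flags.writeable)  # read-only, because bytes is immutable
print(from_bytes.flags.owndata)    # the array borrows memory from `data`

# BytesIO.getbuffer() exposes the internal buffer as a memoryview,
# so this array is built without copying the payload at all.
from_view = np.frombuffer(c.getbuffer(), dtype='<i4')
print(from_view)
```

Note that while an array built from getbuffer() is alive, the BytesIO object cannot be resized or closed, because its buffer is still exported.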

The reason np.fromiter(c, int) and np.array(c, int) do not work is that cStringIO, when iterated, returns one line at a time, just like a file:

[code block not preserved in the source]

A string that long cannot be converted to a single integer.
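The line-at-a-time iteration described above can be reproduced with a small sketch. It assumes Python 3 and uses io.BytesIO as a stand-in for cStringIO; the second payload is made up purely to show where the splits happen.

```python
import io

# io.BytesIO is a stand-in for the Python 2 cStringIO object (assumption).
c = io.BytesIO(b'\x01\x00\x00\x00\x01\x00\x00\x00')

# Iterating a file-like object yields lines, split on the newline byte.
# This payload contains no newline, so the whole buffer comes back as
# a single 8-byte "line" rather than as eight separate values.
chunks = list(c)
print(chunks)

# A hypothetical payload with a newline in it is split after that byte.
lines = list(io.BytesIO(b'ab\ncd'))
print(lines)
```

Each item handed to NumPy is therefore a whole byte string, not a number, which is why the conversion to an integer fails.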

[code block not preserved in the source]

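It is best not to worry too much about the copy unless it is really a problem. The reason is that the extra overhead of using a generator and passing it to numpy.fromiter may actually be larger than the overhead of building a list and passing it to numpy.array; compared with the overhead of the Python runtime, the copy may well be cheap.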

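However, if memory is the problem, one solution is to put the items directly into the final NumPy array. If the size is known in advance, you can preallocate it. If the size is unknown, you can use the array's .resize() method to grow it as needed.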
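Both variants can be sketched roughly as follows. This assumes Python 3, io.BytesIO in place of cStringIO, and 4-byte little-endian integers; the helper names read_known and read_unknown are hypothetical, and doubling the capacity is just one possible growth policy.

```python
import io
import struct
import numpy as np

ITEM = struct.Struct('<i')  # assumption: 4-byte little-endian integers


def read_known(stream, count):
    """Fill a preallocated array when the item count is known up front."""
    out = np.empty(count, dtype=np.int32)
    for i in range(count):
        out[i] = ITEM.unpack(stream.read(ITEM.size))[0]
    return out


def read_unknown(stream, initial=1):
    """Grow the array in place with .resize() when the count is unknown."""
    out = np.empty(initial, dtype=np.int32)
    n = 0
    while True:
        chunk = stream.read(ITEM.size)
        if len(chunk) < ITEM.size:
            break  # end of stream (a trailing partial item is ignored)
        if n == out.size:
            # Double the capacity; refcheck=False skips the reference check.
            out.resize((2 * out.size,), refcheck=False)
        out[n] = ITEM.unpack(chunk)[0]
        n += 1
    out.resize((n,), refcheck=False)  # trim unused capacity
    return out


payload = b'\x01\x00\x00\x00\x01\x00\x00\x00'
known = read_known(io.BytesIO(payload), 2)
unknown = read_unknown(io.BytesIO(payload))
print(known, unknown)
```

Reading item by item in a Python loop is slow, so this only pays off when peak memory matters more than speed.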

Anonymous user

Answer 2 · edited 2024-06-08 02:42:47

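The problem seems to be that numpy does not like being given characters instead of numbers. Remember that in Python a single character and a string have the same type; numpy must be doing some kind of type detection behind the scenes, and it takes '\x01' to be a nested sequence.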

The other problem is that cStringIO iterates over its lines, not its characters.

The following iterator gets around both of these problems:

[code block not preserved in the source]

Use it like this (note the seek!):

[code block not preserved in the source]
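For illustration, an iterator of the kind this answer describes might look like the sketch below. It is not necessarily the answer's own code: it assumes Python 3 with io.BytesIO in place of cStringIO, and the name byte_iter is hypothetical.

```python
import io
import numpy as np


def byte_iter(filelike):
    """Yield the stream one byte at a time, as integers rather than strings."""
    octet = filelike.read(1)
    while octet:
        yield ord(octet)  # turn the 1-byte string into a number
        octet = filelike.read(1)


# io.BytesIO is a stand-in for the Python 2 cStringIO object (assumption).
c = io.BytesIO(b'\x01\x00\x00\x00\x01\x00\x00\x00')
c.read()   # simulate an earlier read that left the position at the end
c.seek(0)  # rewind, otherwise the iterator yields nothing

as_bytes = np.fromiter(byte_iter(c), dtype=np.uint8)
print(as_bytes)  # one array element per byte

# Reinterpret the same memory as 4-byte little-endian integers
# (the item width is an assumption for this example).
as_ints = as_bytes.view('<i4')
print(as_ints)
```

This yields one element per byte, so a final view (or an iterator that unpacks several bytes per step) is still needed to get multi-byte integers.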
