Automatic string length in a recarray

Asked 2025-04-15 15:37

If I create a record array like this:

In [29]: np.rec.fromrecords([(1,'hello'),(2,'world')],names=['a','b'])

the result looks fine:

Out[29]: 
rec.array([(1, 'hello'), (2, 'world')], 
      dtype=[('a', '<i8'), ('b', '|S5')])

But if I specify the data types:

In [32]: np.rec.fromrecords([(1,'hello'),(2,'world')],dtype=[('a',np.int8),('b',np.str)])

the string length is set to zero:

Out[32]: 
rec.array([(1, ''), (2, '')], 
      dtype=[('a', '|i1'), ('b', '|S0')])

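I need to specify the dtype of every numeric field, because I care whether it is int8, int16, int32 and so on. But I also want to keep the automatic string-length detection that works when I don't specify a dtype at all. I tried replacing np.str with None, with no success. I know I can specify something like '|S5', but I don't know in advance what the string length should be.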


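2 Answers

I don't know of a way to have numpy infer some properties of a dtype for you while you fix the others. But you can measure the string length yourself, for example: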

import numpy as np

data = [(1,'hello'),(2,'world')]
# The longest string in the second column sets the field width
dlen = max(len(s) for i, s in data)
st = '|S%d' % dlen
np.rec.fromrecords(data, dtype=[('a',np.int8), ('b',st)])
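
The same idea can be generalized to any number of string columns. The following is only a minimal sketch: `dtype_with_string_widths` and its `spec` argument are hypothetical names made up for illustration, not part of numpy, and it assumes every row is a tuple with the same column layout. A field type of `str` or `bytes` is treated as "measure the longest value in this column".

```python
import numpy as np

def dtype_with_string_widths(rows, spec):
    """Hypothetical helper: build a structured dtype from (name, type) pairs,
    sizing every `str`/`bytes` field from the longest value in `rows`."""
    fields = []
    for col, (name, typ) in enumerate(spec):
        if typ is str or typ is bytes:
            # Measure the longest value in this column; use at least width 1
            # so an all-empty column does not produce a zero-width field.
            width = max(len(row[col]) for row in rows)
            code = "U" if typ is str else "S"
            fields.append((name, "%s%d" % (code, max(width, 1))))
        else:
            # Numeric (or any other explicit) types are passed through as given.
            fields.append((name, typ))
    return np.dtype(fields)

data = [(1, "hello"), (2, "world!")]
dt = dtype_with_string_widths(data, [("a", np.int8), ("b", str)])
rec = np.rec.fromrecords(data, dtype=dt)
```

Here the integer field keeps the explicitly requested `np.int8`, while the width of `b` comes from the data, so neither string is cut short.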

Answered 2025-04-15 by Python大师


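If you don't need to manipulate the strings as bytes, you can represent them with the object dtype. This stores a pointer to each string rather than the actual bytes: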

In [38]: np.array(data, dtype=[('a', np.uint8), ('b', object)])
Out[38]: 
array([(1, 'hello'), (2, 'world')], 
      dtype=[('a', '|u1'), ('b', '|O8')])
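
As a small illustration of why the object dtype sidesteps the length problem, the sketch below stores strings of very different lengths in the same field. The sample data is made up for the example, and the builtin `object` is used for the field type, since the `np.object` alias was removed in NumPy 1.24.

```python
import numpy as np

data = [(1, "hello"), (2, "a much longer string")]
arr = np.array(data, dtype=[("a", np.uint8), ("b", object)])

# Each element of the 'b' field is a reference to a Python str,
# so no value is truncated, whatever its length.
lengths = [len(s) for s in arr["b"]]

# Assigning a longer string later also works, because only the
# reference stored in the array changes.
arr["b"][0] = "replaced with something longer still"
```

The trade-off is that the field no longer holds raw character data, so operations that rely on a fixed-width byte layout do not apply to it.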

Alternatively, Alex's idea would also work well:

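# Assumption: dt is the dtype numpy auto-detects for the data
dt = np.rec.fromrecords(data).dtype

new_dt = []

# For each field, check whether its detected type is an integer.
# If so, represent it as a byte; otherwise keep the detected type.
# (dt.fields maps each field name to a (dtype, byte offset) tuple.)

for f in dt.names:
    T = dt.fields[f][0]
    if np.issubdtype(T, np.integer):
        new_dt.append((f, np.uint8))
    else:
        new_dt.append((f, T))

new_dt = np.dtype(new_dt)
np.array(data, dtype=new_dt)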

This should give

array([(1, 'hello'), (2, 'world')], 
      dtype=[('f0', '|u1'), ('f1', '|S5')])

Answered 2025-04-15 by Python大师
