numpy - How do I add a value to every element in the first column of an array?
I have an array like this:
array([('6506', 4.6725971801473496e-25, 0.99999999995088695),
('6601', 2.2452745388799898e-27, 0.99999999995270605),
('21801', 1.9849650921836601e-31, 0.99999999997999001), ...,
('45164194', 1.0413482803123399e-24, 0.99999999997453404),
('45164198', 1.09470356446595e-24, 0.99999999997635303),
('45164519', 3.7521365799080699e-24, 0.99999999997453404)],
dtype=[('pos', '|S100'), ('par1', '<f8'), ('par2', '<f8')])
I want to turn it into this (every value in the first column gets the prefix '2R:'):
array([('2R:6506', 4.6725971801473496e-25, 0.99999999995088695),
('2R:6601', 2.2452745388799898e-27, 0.99999999995270605),
('2R:21801', 1.9849650921836601e-31, 0.99999999997999001), ...,
('2R:45164194', 1.0413482803123399e-24, 0.99999999997453404),
('2R:45164198', 1.09470356446595e-24, 0.99999999997635303),
('2R:45164519', 3.7521365799080699e-24, 0.99999999997453404)],
dtype=[('pos', '|S100'), ('par1', '<f8'), ('par2', '<f8')])
I have read a bit about nditer, but I want to support earlier versions of numpy. I have also been told that loops should be avoided.
3 Answers
1
Another, slightly faster solution is a list comprehension with the + operator. I am not quite sure why it is faster, but it does look elegant and basic.
a['pos'] = ["2R:" + x for x in a['pos']]
Timings:
%timeit a['pos'] = ["2R:" + x for x in a['pos']]
8.07 ms ± 64.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit a['pos'] = [''.join(('2R:',x)) for x in a['pos']]
9.53 ms ± 391 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit a['pos'] = add('2R:', a['pos'])
14.2 ms ± 337 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Note: I created the array a in a slightly different way:
a = np.empty(20000, dtype=[('pos', 'U5'), ('par1', '<f8'), ('par2', '<f8')])
because if I use an Sxxx type for pos, the concatenation raises a TypeError.
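The type error mentioned above can be avoided without changing the dtype. The following is a minimal sketch, assuming Python 3, where the elements of an S field are bytes objects, so the prefix has to be bytes as well. The sample rows are hypothetical and shortened from the question.

```python
import numpy as np

# Hypothetical two-row sample; 'S100' stores bytes (not str) on Python 3.
a = np.array([(b'6506', 4.67e-25, 0.5), (b'6601', 2.25e-27, 0.6)],
             dtype=[('pos', 'S100'), ('par1', '<f8'), ('par2', '<f8')])

# Each x is a bytes-like numpy scalar, so a bytes prefix concatenates cleanly.
a['pos'] = [b'2R:' + x for x in a['pos']]

print(a['pos'])
```

If str values are preferred, declaring the field as a U type (as in the answer above) keeps the plain "2R:" prefix working.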
2
A simple (though possibly not optimal) solution is just:
a = np.array([('6506', 4.6725971801473496e-25, 0.99999999995088695),
('6601', 2.2452745388799898e-27, 0.99999999995270605),
('21801', 1.9849650921836601e-31, 0.99999999997999001),
('45164194', 1.0413482803123399e-24, 0.99999999997453404),
('45164198', 1.09470356446595e-24, 0.99999999997635303),
('45164519', 3.7521365799080699e-24, 0.99999999997453404)],
dtype=[('pos', '|S100'), ('par1', '<f8'), ('par2', '<f8')])
a['pos'] = [''.join(('2R:',x)) for x in a['pos']]
In [11]: a
Out[11]:
array([('2R:6506', 4.67259718014735e-25, 0.999999999950887),
('2R:6601', 2.24527453887999e-27, 0.999999999952706),
('2R:21801', 1.98496509218366e-31, 0.99999999997999),
('2R:45164194', 1.04134828031234e-24, 0.999999999974534),
('2R:45164198', 1.09470356446595e-24, 0.999999999976353),
('2R:45164519', 3.75213657990807e-24, 0.999999999974534)],
dtype=[('pos', 'S100'), ('par1', '<f8'), ('par2', '<f8')])
I like @falsetru's answer because it uses the core numpy library, but surprisingly the list comprehension seems to be slightly faster:
In [19]: a = np.empty(20000, dtype=[('pos', 'S100'), ('par1', '<f8'), ('par2', '<f8')])
In [20]: %timeit a['pos'] = [''.join(('2R:',x)) for x in a['pos']]
100 loops, best of 3: 11.1 ms per loop
In [21]: %timeit a['pos'] = add('2R:', a['pos'])
100 loops, best of 3: 15.7 ms per loop
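That said, I recommend benchmarking on your own use case and hardware to see which approach suits your application. One thing I have learned is that, depending on the task, basic Python constructs can sometimes outperform numpy's built-in functions.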
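For readers without IPython's %timeit, here is one way such a benchmark could be sketched with the standard timeit module. The array contents, the helper names with_list_comprehension and with_char_add, and the repeat count are all assumptions made for illustration; the numbers it prints will differ from the ones quoted above.

```python
import timeit
import numpy as np

# Hypothetical test data: 20000 rows, with a unicode 'pos' field wide enough for the prefix.
a = np.zeros(20000, dtype=[('pos', 'U100'), ('par1', '<f8'), ('par2', '<f8')])
a['pos'] = [str(i) for i in range(20000)]

def with_list_comprehension(arr):
    # Plain Python concatenation, one element at a time.
    return ['2R:' + x for x in arr['pos']]

def with_char_add(arr):
    # Vectorized string concatenation from numpy.char.
    return np.char.add('2R:', arr['pos'])

# Check that both approaches agree before timing them.
assert with_char_add(a).tolist() == with_list_comprehension(a)

repeats = 5
for fn in (with_list_comprehension, with_char_add):
    seconds = timeit.timeit(lambda fn=fn: fn(a), number=repeats) / repeats
    print(f"{fn.__name__}: {seconds * 1e3:.2f} ms per call")
```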
6
Use numpy.core.defchararray.add:
>>> from numpy import array
>>> from numpy.core.defchararray import add
>>>
>>> xs = array([('6506', 4.6725971801473496e-25, 0.99999999995088695),
... ('6601', 2.2452745388799898e-27, 0.99999999995270605),
... ('21801', 1.9849650921836601e-31, 0.99999999997999001),
... ('45164194', 1.0413482803123399e-24, 0.99999999997453404),
... ('45164198', 1.09470356446595e-24, 0.99999999997635303),
... ('45164519', 3.7521365799080699e-24, 0.99999999997453404)],
... dtype=[('pos', '|S100'), ('par1', '<f8'), ('par2', '<f8')])
>>> xs['pos'] = add('2R:', xs['pos'])
>>> xs
array([('2R:6506', 4.67259718014735e-25, 0.999999999950887),
('2R:6601', 2.24527453887999e-27, 0.999999999952706),
('2R:21801', 1.98496509218366e-31, 0.99999999997999),
('2R:45164194', 1.04134828031234e-24, 0.999999999974534),
('2R:45164198', 1.09470356446595e-24, 0.999999999976353),
('2R:45164519', 3.75213657990807e-24, 0.999999999974534)],
dtype=[('pos', 'S100'), ('par1', '<f8'), ('par2', '<f8')])
Update: you can use numpy.char.add instead of numpy.core.defchararray.add (as @joel-buursma pointed out):
>>> import numpy
>>> numpy.char == numpy.core.defchararray
True
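One caveat that applies to every approach on this page: a fixed-width string field cannot grow, so a prefixed value that is longer than the field width is cut off on assignment. With S100 there is plenty of room, but a narrow field such as U5 or U8 would lose characters. The sketch below is one possible workaround, assuming a copy of the array is acceptable; the sample rows and the names prefixed, new_dtype and b are hypothetical.

```python
import numpy as np

# Hypothetical sample whose 'pos' field is only 8 characters wide.
a = np.array([('6506', 1.0, 0.5), ('45164519', 2.0, 0.6)],
             dtype=[('pos', 'U8'), ('par1', '<f8'), ('par2', '<f8')])

# np.char.add returns a new array whose width fits the longest result.
prefixed = np.char.add('2R:', a['pos'])

# Build a copy of the structured array with a wide enough 'pos' field.
new_dtype = [('pos', prefixed.dtype), ('par1', '<f8'), ('par2', '<f8')]
b = np.empty(a.shape, dtype=new_dtype)
b['pos'] = prefixed
b['par1'] = a['par1']
b['par2'] = a['par2']

print(b)
```

Declaring a generous width up front, as the question does with S100, avoids the copy altogether.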