Python numpy.vectorize: ValueError: Cannot construct a ufunc with more than 32 operands

import pandas as pd
import numpy as np

df = pd.DataFrame([[0] * 20], columns=
    ['a01', 'b02', 'c03', 'd04', 'e05', 'f06', 'g07', 'h08', 'i09', 'j10',
     'k11', 'l12', 'n13', 'n14', 'o15', 'p16', 'q17', 'r18', 's19', 't20'])

def func(a01, b02, c03, d04, e05, f06, g07, h08, i09, j10,
         k11, l12, n13, n14, o15, p16, q17, r18, s19, t20):
    # ... some complex logic here, if, for loops and so on
    return (a01, b02, c03, d04, e05, f06, g07, h08, i09, j10,
            k11, l12, n13, n14, o15, p16, q17, r18, s19, t20)

df['a21'], df['b22'], df['c23'], df['d24'], df['e25'], df['f26'], df['g27'], df['h28'], df['i29'], df['j30'], \
df['k31'], df['l32'], df['n33'], df['n34'], df['o35'], df['p36'], df['q37'], df['r38'], df['s39'], df['t40'], \
    = np.vectorize(func)(
    df['a01'], df['b02'], df['c03'], df['d04'], df['e05'], df['f06'], df['g07'], df['h08'], df['i09'], df['j10'],
    df['k11'], df['l12'], df['n13'], df['n14'], df['o15'], df['p16'], df['q17'], df['r18'], df['s19'], df['t20'])

Traceback (most recent call last):
  File "ufunc.py", line 18, in <module>
    = np.vectorize(func)(
  File "C:\Python\3.8.3\lib\site-packages\numpy\lib\function_base.py", line 2108, in __call__
    return self._vectorize_call(func=func, args=vargs)
  File "C:\Python\3.8.3\lib\site-packages\numpy\lib\function_base.py", line 2186, in _vectorize_call
    ufunc, otypes = self._get_ufunc_and_otypes(func=func, args=args)
  File "C:\Python\3.8.3\lib\site-packages\numpy\lib\function_base.py", line 2175, in _get_ufunc_and_otypes
    ufunc = frompyfunc(_func, len(args), nout)
ValueError: Cannot construct a ufunc with more than 32 operands (requested number were: inputs = 20 and outputs = 20)

1 answer

User

Answer #1 · posted 2024-06-06 15:22:31

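The basic purpose of np.vectorize is to make the full power of numpy broadcasting easy to apply to a function that only accepts scalar inputs. So, with a simple formatting function: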

In [28]: def foo(i,j): 
    ...:     return f'{i}:{j}' 
    ...:                                                                                             
In [29]: foo(1,2)                                                                                    
Out[29]: '1:2'
In [31]: f = np.vectorize(foo, otypes=['U5'])

With vectorize I can pass lists/arrays of matching shape:

In [32]: f([1,2,3],[4,5,6])                            
Out[32]: array(['1:4', '2:5', '3:6'], dtype='<U3')

Or use (3,1) and (3,) shapes to produce a (3,3) result:

In [33]: f(np.arange(3)[:,None], np.arange(4,7))                                                     
Out[33]: 
array([['0:4', '0:5', '0:6'],
       ['1:4', '1:5', '1:6'],
       ['2:4', '2:5', '2:6']], dtype='<U3')
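
To make the broadcasting behaviour concrete, here is a rough sketch of what happens in this example: the inputs are broadcast to a common shape, the scalar function is called once per element in a Python loop, and the results are reshaped. broadcast_apply is a hypothetical helper written only for illustration; it is not how np.vectorize is implemented internally.

```python
import numpy as np

def foo(i, j):
    return f"{i}:{j}"

def broadcast_apply(func, *args):
    """Hypothetical helper: call a scalar function over broadcast inputs."""
    # Expand every input to the common broadcast shape, e.g. (3,1) and (3,) -> (3,3)
    arrays = np.broadcast_arrays(*args)
    # Plain Python loop over matching elements of the flattened inputs
    flat_results = [func(*vals) for vals in zip(*(a.ravel() for a in arrays))]
    # Restore the broadcast shape
    return np.array(flat_results).reshape(arrays[0].shape)

x = np.arange(3)[:, None]   # shape (3, 1)
y = np.arange(4, 7)         # shape (3,)
result = broadcast_apply(foo, x, y)
print(result)
```

The point of the sketch is that the scalar function still runs once per output element, so this style of "vectorizing" buys convenience rather than speed.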

I haven't seen your error before, but I can guess where it comes from:

ufunc = frompyfunc(_func, len(args), nout)
ValueError: Cannot construct a ufunc with more than 32 operands 
(requested number were: inputs = 20 and outputs = 20)

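The actual work is done by np.frompyfunc which, as you can see, takes two numbers: the number of arguments and the number of return values. In your case that is 20 and 20. Apparently there is a limit of 32 in total. 32 is also the maximum number of dimensions a numpy array can have (in the NumPy 1.x series shown in the traceback). I've seen the same limit in a few other cases, such as np.select. In any case, the limit is deeply embedded in numpy, so you can't work around it within np.vectorize.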
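
The operand count can be inspected directly on a ufunc built with np.frompyfunc. The sketch below uses a hypothetical two-input, two-output function (pair) and a hypothetical helper (try_build) that only reports whether construction succeeds. Whether the 20-in/20-out case raises depends on the installed NumPy version; the traceback above comes from a build where the cap is 32.

```python
import numpy as np

def pair(a, b):
    # Hypothetical scalar function with 2 inputs and 2 outputs
    return a + b, a - b

uf = np.frompyfunc(pair, 2, 2)
# nargs is the total operand count: inputs plus outputs
print(uf.nin, uf.nout, uf.nargs)

summed, diffed = uf(np.array([1, 2]), np.array([3, 4]))

def try_build(nin, nout):
    """Hypothetical helper: report whether frompyfunc accepts this operand count."""
    try:
        # frompyfunc does not call the function at construction time
        np.frompyfunc(lambda *args: args, nin, nout)
        return "ok"
    except ValueError as exc:
        return f"ValueError: {exc}"

status_small = try_build(4, 4)     # 8 operands in total
status_large = try_build(20, 20)   # 40 operands, the case from the question
print(status_small)
print(status_large)
```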

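You haven't told us what the "complex logic" is, but apparently it takes a whole dataframe row and returns a row of the same size.

Let's try applying another function to a dataframe: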

In [41]: df = pd.DataFrame(np.arange(12).reshape(3,4),columns=['a','b','c','d'])                     
In [42]: df                                                                                          
Out[42]: 
   a  b   c   d
0  0  1   2   3
1  4  5   6   7
2  8  9  10  11

In [44]: def foo(a,b,c,d): 
    ...:     print(a,b,c,d) 
    ...:     return 2*a, str(b), c*d, c/d 
    ...:                                                                                             
In [45]: foo(1,2,3,4)                                                                                
1 2 3 4
Out[45]: (2, '2', 12, 0.75)

In [47]: f = np.vectorize(foo)                                                                       
In [48]: f(df['a'],df['b'],df['c'],df['d'])                                                          
0 1 2 3                                 # a trial run to determine return type
0 1 2 3
4 5 6 7
8 9 10 11
Out[48]: 
(array([ 0,  8, 16]),
 array(['1', '5', '9'], dtype='<U1'),
 array([  6,  42, 110]),
 array([0.66666667, 0.85714286, 0.90909091]))

vectorize returns a tuple of arrays, one for each returned value.
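
Since the result is a tuple of arrays, each one can be assigned to its own dataframe column. The sketch below reuses foo (without the print) and passes otypes so that vectorize does not need the extra trial call seen above. The new column names are made up for illustration.

```python
import numpy as np
import pandas as pd

def foo(a, b, c, d):
    return 2 * a, str(b), c * d, c / d

df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=["a", "b", "c", "d"])

# Declaring the four output dtypes up front avoids the trial run
f = np.vectorize(foo, otypes=[np.int64, object, np.int64, np.float64])

results = f(df["a"], df["b"], df["c"], df["d"])

# Hypothetical output column names, one per returned array
for name, values in zip(["a2", "b_str", "cd", "c_over_d"], results):
    df[name] = values

print(df)
```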

The same function can be applied with pandas apply:

In [80]: df.apply(lambda x:foo(*x),1)                                                                
0 1 2 3
4 5 6 7
8 9 10 11
Out[80]: 
0       (0, 1, 6, 0.6666666666666666)
1      (8, 5, 42, 0.8571428571428571)
2    (16, 9, 110, 0.9090909090909091)
dtype: object
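
The apply call above returns a Series of tuples. If separate columns are wanted instead, one option is the result_type='expand' argument of DataFrame.apply, sketched here with the same foo (minus the print). This is an illustration only, not something the original answer used.

```python
import numpy as np
import pandas as pd

def foo(a, b, c, d):
    return 2 * a, str(b), c * d, c / d

df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=["a", "b", "c", "d"])

# result_type='expand' turns each returned tuple into one row of a new DataFrame
expanded = df.apply(lambda row: foo(*row), axis=1, result_type="expand")
print(expanded)
```

The expanded frame gets default integer column labels, so it would still need renaming before being joined back onto df.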

A simple row iteration:

In [76]: for i in range(3): 
    ...:     print(foo(*df.iloc[i])) 
    ...:                                                                                             
0 1 2 3
(0, '1', 6, 0.6666666666666666)
4 5 6 7
(8, '5', 42, 0.8571428571428571)
8 9 10 11
(16, '9', 110, 0.9090909090909091)

Timings

Simplify foo for timing:

In [92]: def foo1(a,b,c,d): 
    ...:     return 2*a, str(b), c*d, c/d 
    ...:                                                                                             
In [93]: f = np.vectorize(foo1)

We'll also test applying it to the rows of an array:

In [97]: arr = df.to_numpy()                                                                         
In [99]: [foo1(*row) for row in arr]                                                                 
Out[99]: 
[(0, '1', 6, 0.6666666666666666),
 (8, '5', 42, 0.8571428571428571),
 (16, '9', 110, 0.9090909090909091)]

The vectorized function is noticeably faster than apply:

In [100]: timeit f(df['a'],df['b'],df['c'],df['d'])                                                  
237 µs ± 3.31 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [101]: timeit df.apply(lambda x:foo1(*x),1)                                                       
1.04 ms ± 2.51 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

It is even faster than a more direct iteration over the dataframe rows:

In [102]: timeit [foo1(*df.iloc[i]) for i in range(3)]                                               
528 µs ± 2.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

But applying foo1 to the rows of the array is faster still:

In [103]: timeit [foo1(*row) for row in arr]                                                         
17.5 µs ± 326 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [105]: timeit f(*arr.T)                                                                           
75.1 µs ± 81.9 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

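The last two examples show that np.vectorize is slow compared with direct iteration over an array. Iterating over the dataframe, in its various forms, adds even more computation time.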
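
Applied to the original question, direct iteration over array rows also sidesteps the operand limit, because no ufunc is constructed at all. The following is a minimal sketch under stated assumptions: the column names are hypothetical, and func is a placeholder that adds 1 to each value in place of the undisclosed "complex logic".

```python
import numpy as np
import pandas as pd

# Hypothetical column names: 20 inputs and 20 outputs
in_cols = [f"c{i:02d}" for i in range(1, 21)]
out_cols = [f"c{i:02d}" for i in range(21, 41)]

df = pd.DataFrame([[0] * 20, list(range(20))], columns=in_cols)

def func(*row):
    # Placeholder for the real per-row logic: 20 values in, 20 values out
    return tuple(v + 1 for v in row)

# Iterate over the rows of the underlying array, not over the dataframe
rows = [func(*row) for row in df[in_cols].to_numpy()]

# Build the 20 new columns in one step and attach them
out = pd.DataFrame(rows, columns=out_cols, index=df.index)
df = pd.concat([df, out], axis=1)
print(df.shape)
```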
