Pandas apply with an argument that varies by row

2 answers

User

Answer 1 · edited 2024-05-29 02:04:34

You can add the vector to the DataFrame as a column (and drop it again afterwards if needed):

varespec['size'] = size

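Then change the rare function:

def rare(x):
    size = x['size']     # per-row argument, read from the added column
    y = x.values[:-1]    # remaining values; assumes 'size' is the last column
    ...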

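Or, if you would rather not change rare, wrap it:

def rare_wrapper(x):
    size = x['size']     # per-row argument, read from the added column
    y = x.values[:-1]    # remaining values; assumes 'size' is the last column
    return rare(y, size)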
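
The wrapper approach can be tried end to end on a toy table. This is only a sketch: the DataFrame `df`, the `size` values, and the function `row_func` are hypothetical stand-ins for varespec, the real size vector, and rare. It uses `df.assign` rather than in-place assignment so the original frame is left without the helper column.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: 3 rows, 2 value columns.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]})
size = np.array([10, 20, 30])  # one argument per row

def row_func(y, size):
    # Hypothetical stand-in for rare(y, size).
    return y.sum() * size

def row_func_wrapper(x):
    size = x["size"]      # per-row argument, read from the helper column
    y = x.values[:-1]     # remaining values; relies on "size" being last
    return row_func(y, size)

# assign returns a copy with "size" appended as the last column.
with_size = df.assign(size=size)
result = with_size.apply(row_func_wrapper, axis=1)
```

Because the helper column only exists on the copy, there is nothing to delete afterwards.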

User

Answer 2 · edited 2024-05-29 02:04:34

Instead of calling rare once for every row of varespec, you can express the result as a computation on whole NumPy arrays:

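import pandas as pd
# pandas.rpy was removed from later pandas releases; this import needs an old pandas version.
import pandas.rpy.common as com
# comb lived in scipy.misc when this was written; scipy.special.comb is the current location.
from scipy.special import comb
import numpy as np
np.random.seed(1)

def rare(y, size):
    # Drop missing values before computing the row total.
    notabs = ~np.isnan(y)
    t = y[notabs]
    N = np.sum(t)
    diff = N - t
    return np.sum(1 - comb(diff, size) / comb(N, size))

def using_rare(size):
    # One call to rare per row, each with its own size.
    return np.array([rare(varespec.iloc[i, :], size[i]) for i in range(len(varespec))])

def using_arrays(size):
    # Row totals as a plain NumPy array so that N[:, np.newaxis] works.
    N = varespec.sum(axis='columns', skipna=True).values
    # Shape (n_columns, n_rows), which broadcasts against size of shape (n_rows,).
    diff = (N[:, np.newaxis] - varespec.values).T
    return np.sum(1 - comb(diff, size) / comb(N, size), axis=0)

varespec = com.load_data('BCI', 'vegan')
size = np.random.randint(varespec.shape[1], size=(varespec.shape[0],))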

This shows that using_rare and using_arrays produce the same result, and that using_arrays is considerably faster:

expected = using_rare(size)
result = using_arrays(size)
assert np.allclose(result, expected)

In [229]: %timeit using_rare(size)
10 loops, best of 3: 36.2 ms per loop

In [230]: %timeit using_arrays(size)
100 loops, best of 3: 2.89 ms per loop

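This relies on comb accepting NumPy arrays as input (the function was scipy.misc.comb when this was written and is scipy.special.comb in current SciPy). You can therefore call comb(diff, size), where diff is an array of shape (225, 50) and size is an array of shape (50,). Since size is used only in the calls to comb, the whole calculation takes just two calls to comb, with no per-row loop.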
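
The broadcasting described above can be checked on a table small enough to inspect by hand. This is a sketch with made-up numbers: the `counts` array and `size` values are hypothetical, and it assumes a SciPy version that provides `scipy.special.comb`.

```python
import numpy as np
from scipy.special import comb

# Hypothetical small counts table: 3 rows x 4 columns.
counts = np.array([[5, 0, 3, 2],
                   [1, 1, 1, 7],
                   [4, 4, 0, 2]], dtype=float)
size = np.array([2, 3, 4])             # one sample size per row, shape (3,)

N = counts.sum(axis=1)                 # row totals, shape (3,)
diff = (N[:, np.newaxis] - counts).T   # shape (4, 3): columns x rows

# size of shape (3,) broadcasts against diff of shape (4, 3),
# so a single comb call covers every cell of the table.
vectorized = np.sum(1 - comb(diff, size) / comb(N, size), axis=0)

# The same quantity with an explicit loop over rows, for comparison.
looped = np.array([
    np.sum(1 - comb(N[i] - counts[i], size[i]) / comb(N[i], size[i]))
    for i in range(len(counts))
])

print(np.allclose(vectorized, looped))
```

The transpose is what makes the shapes line up: after it, the last axis of diff indexes rows of the table, which matches the single axis of size.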
