Pandas Series.filter.values returns a different type than a numpy array

1 vote

1 answer

626 views

Asked 2025-04-18 17:04

I am trying to run scipy.stats.entropy on two arrays. The function is applied to every row through Pandas' apply:

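import numpy as np
from scipy import stats

def calculate_H(row):
    pk = np.histogram(row.filter(regex='stuff'), bins=16)[0]
    qk = row.filter(regex='other').values
    # Return the result; without this, apply would fill the column with None.
    return stats.entropy(pk, qk, base=2)

df['DKL'] = df.apply(calculate_H, axis=1)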

But I get the following error:

TypeError: ufunc 'xlogy' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

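(I also tried qk = row[row.filter(regex='other').index].values)

I know the problem is with qk: if I pass a different array as qk, everything works. What Pandas gives me claims to be a numpy array but apparently is not quite one. The following examples all work: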

qk1 = np.array([12024, 9643, 7681, 8193, 8012, 7846, 7615, 7484, 5966, 11484, 13627, 17749, 9820, 5336,4611, 3366])
qk2 = Series([12024, 9643, 7681, 8193, 8012, 7846, 7615, 7484, 5966, 11484, 13627, 17749, 9820, 5336,4611, 3366]).values
qk3 = df.filter(regex='other').iloc[0].values

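If I check the types, e.g. type(qk) == type(qk1), the result is True (both are numpy.ndarray). np.array_equal also returns True.

My only clue is what I see when I print a working array and the non-working one (the non-working one is the second):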

[12024  9643  7681  8193  8012  7846  7615  7484  5966 11484 13627 17749  9820  5336  4611  3366]
[12024 9643 7681 8193 8012 7846 7615 7484 5966 11484 13627 17749 9820 5336 4611 3366]

Note that the values in the first array are padded with extra spacing.

In summary: these two expressions return different things.

df.filter(regex='other').iloc[0].values
df.iloc[0].filter(regex='other').values
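One way to see the difference is to compare the dtype of each result rather than its type. The sketch below uses a small made-up DataFrame (the column names label, other_a and other_b are assumptions for illustration, not the original data) that mixes a string column with integer columns. That is one situation in which a row-wise Series falls back to object dtype.

```python
import pandas as pd

# Hypothetical frame: one string column plus two integer columns.
df = pd.DataFrame({
    "label": ["x", "y"],
    "other_a": [12024, 9643],
    "other_b": [7681, 8193],
})

# Filter columns first: only integer columns remain, so the row keeps
# an integer dtype.
col_first = df.filter(regex="other").iloc[0].values

# Take the row first: the row spans string and integer columns, so it is
# stored as object dtype, and filtering afterwards does not change that.
row_first = df.iloc[0].filter(regex="other").values

print(type(col_first) == type(row_first))  # both are numpy.ndarray
print(col_first.dtype, row_first.dtype)    # the dtypes are what differ
```

Both results are numpy.ndarray, so a type() comparison cannot tell them apart; the dtype attribute can.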

error-handling  data-structures  numpy  dtype  data-analysis  pandas  array-manipulation  apply

1 Answer

I suspect qk is an object array rather than an integer array. Inside calculate_H, try this:

qk = row.filter(regex='other').values.astype(int)

(that is, convert the values to an integer array).

Answered 2025-04-18 by Python大师
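As a rough end-to-end sketch of that suggestion, the example below builds a small hypothetical DataFrame (the column names, the values and the bin count of 2 are all assumptions chosen to keep it short) in which a string column makes every row an object-dtype Series. Both selections are cast explicitly before they reach NumPy and SciPy.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical data: the string column "name" forces each row passed to
# apply(axis=1) to be an object-dtype Series.
df = pd.DataFrame({
    "name": ["a", "b"],
    "stuff_1": [0.1, 0.9],
    "stuff_2": [0.4, 0.2],
    "stuff_3": [0.8, 0.5],
    "other_1": [3, 1],
    "other_2": [2, 5],
})

def calculate_H(row):
    # Cast explicitly so the arrays have numeric dtypes, not object.
    samples = row.filter(regex="stuff").values.astype(float)
    pk = np.histogram(samples, bins=2)[0]  # 2 bins to match the 2 "other" columns
    qk = row.filter(regex="other").values.astype(int)
    return stats.entropy(pk, qk, base=2)

df["DKL"] = df.apply(calculate_H, axis=1)
print(df["DKL"])
```

Note that astype(int) raises if any selected value cannot be converted, so it also serves as a check that the regex picked up only numeric columns.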
