What is the inverse of the quantile function on a pandas Series?


Asked 2025-04-28 19:32 by 数据工程师

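The quantile function gives the quantile of a given pandas Series s.

For example:

s.quantile(0.9) returns 4.2

Is there an inverse function (that is, the cumulative distribution function) that finds the value x such that:

s.quantile(x) = 4

Thanks!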


12 Answers

What fraction of the records in s are less than x:

# Find the percentile of `x` in `s`
(s<x).mean()  # i.e., (s<x).sum()/len(s)

That's it.

If s is sorted, you can also use pandas.Series.searchsorted:

s.searchsorted(x)/len(s)
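A quick check, on a small made-up Series, that the two forms above agree. Both count values strictly less than x, and searchsorted is only valid on sorted data, so this sketch sorts a copy first (the sample values are illustrative, not from the question).

```python
import numpy as np
import pandas as pd

# Illustrative data with a tie (two 1.0s); x is an arbitrary query value
s = pd.Series([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])
x = 4.0

# Fraction of values strictly below x; no sorting needed
frac_mean = (s < x).mean()

# Same fraction via binary search on a sorted copy; side="left" (the default)
# returns the number of elements strictly less than x
s_sorted = s.sort_values(ignore_index=True)
frac_search = s_sorted.searchsorted(x) / len(s_sorted)

print(frac_mean, frac_search)
```

The boolean mean is O(n) per query; the sorted approach pays O(n log n) once and then O(log n) per query, which helps when you look up many values against the same Series.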

Answered 2025-04-28 by Python大师

I don't know of a simple one-liner, but you can do it with scipy:

import pandas as pd
import numpy as np
from scipy.interpolate import interp1d

# set up a sample dataframe
df = pd.DataFrame(np.random.uniform(0, 1, 11), columns=['a'])
# sort by the desired column and calculate each value's percentile
sdf = df.sort_values('a').reset_index()
sdf['b'] = sdf.index / float(len(sdf) - 1)
# set up the interpolator, using the values as the x-axis
interp = interp1d(sdf['a'], sdf['b'])

# a is the value, b is the percentile
>>> sdf
    index         a    b
0      10  0.030469  0.0
1       3  0.144445  0.1
2       4  0.304763  0.2
3       1  0.359589  0.3
4       7  0.385524  0.4
5       5  0.538959  0.5
6       8  0.642845  0.6
7       6  0.667710  0.7
8       9  0.733504  0.8
9       2  0.905646  0.9
10      0  0.961936  1.0

Now we can see that the two functions are inverses of each other.

>>> df['a'].quantile(0.57)
0.61167933268395969
>>> interp(0.61167933268395969)
array(0.57)
>>> interp(df['a'].quantile(0.43))
array(0.43)

interp also accepts lists, NumPy arrays, or pandas Series; in fact, any array-like input works!
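A self-contained variant of the same idea, sketched under the assumption that s has no repeated values. It swaps in numpy.interp for scipy's interp1d (the same piecewise-linear interpolation, without the SciPy dependency) and shows the vectorized round trip. The helper name inverse_quantile is hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
s = pd.Series(rng.uniform(0, 1, 11))

# Sorted values and the probability levels that pandas' default "linear"
# quantile assigns to them: position i corresponds to p = i / (n - 1)
xs = np.sort(s.to_numpy())
ps = np.arange(len(xs)) / (len(xs) - 1)

def inverse_quantile(value):
    # Hypothetical helper: piecewise-linear inverse of s.quantile
    # (assumes distinct values); np.interp clips inputs outside
    # [min, max] to 0 or 1
    return np.interp(value, xs, ps)

probs = [0.1, 0.43, 0.57, 0.9]
recovered = inverse_quantile(s.quantile(probs).to_numpy())
print(recovered)
```

One practical difference: np.interp clips out-of-range inputs, whereas interp1d raises a ValueError by default (bounds_error) for values outside the data range.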

Answered 2025-04-28 by Python大师

Mathematically, what you are looking for is the CDF (cumulative distribution function): the probability that s is less than or equal to a given value q.

F(q) = Pr[s <= q]

You can compute it with numpy.mean; try this line:

np.mean(s.to_numpy() <= q)
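The choice between <= (as here) and < (as in the first answer) only matters when s contains values exactly equal to q. A tiny illustrative Series with a tie shows the difference:

```python
import numpy as np
import pandas as pd

# Illustrative data: q = 2 appears twice
s = pd.Series([1, 2, 2, 3, 5])
q = 2

# Empirical CDF F(q) = Pr[s <= q]: ties at q are included
cdf_le = np.mean(s.to_numpy() <= q)
# Strict version Pr[s < q]: ties at q are excluded
cdf_lt = np.mean(s.to_numpy() < q)

print(cdf_le, cdf_lt)
```

For continuous data without ties the two agree; with ties, pick the one that matches the definition you need.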

Answered 2025-04-28 by Python大师

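Sorting can be expensive. If you only need a single value, I think it's better to compute it like this:

import numpy as np
import pandas as pd
s = pd.Series(np.random.uniform(size=1000))
(s < 0.7).astype(int).mean()  # roughly 0.7

There may be other ways to avoid the clunky int(bool) cast.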
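As the first answer already hints, the cast isn't needed: pandas averages a boolean Series as 0/1 directly. This sketch compares the three spellings; Series.lt is simply the method form of the < operator, and the seed is an arbitrary choice for reproducibility.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.random.default_rng(42).uniform(size=1000))

# Booleans already average as 0/1, so astype(int) can be dropped
frac_bool = (s < 0.7).mean()
frac_int = (s < 0.7).astype(int).mean()

# Method form of the same comparison, handy inside method chains
frac_lt = s.lt(0.7).mean()

print(frac_bool, frac_int, frac_lt)
```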

Answered 2025-04-28 by Python大师

Use scipy.stats.percentileofscore:

# required libraries
from scipy import stats
import pandas as pd
import numpy as np

# generate random data with a fixed seed (so it is reproducible)
np.random.seed(seed=1)
df = pd.DataFrame(np.random.uniform(0, 1, 10), columns=['a'])

# quantile function
x = df['a'].quantile(0.5)

# inverse of quantile (returned as a percentile on a 0-100 scale)
stats.percentileofscore(df['a'], x)
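percentileofscore reports a percentage rather than a fraction, and its kind parameter controls how ties are handled. This sketch, on an illustrative Series with a tie, divides by 100 and compares the kinds: "weak" matches the <= CDF, "strict" matches the < fraction, and the default "rank" averages the percentage rankings of all matching values.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Illustrative data: q = 2 appears twice
s = pd.Series([1, 2, 2, 3, 5])
q = 2

# Divide by 100 to turn the 0-100 percentile into a fraction
weak = stats.percentileofscore(s, q, kind="weak") / 100      # counts s <= q
strict = stats.percentileofscore(s, q, kind="strict") / 100  # counts s < q
rank = stats.percentileofscore(s, q) / 100                   # default kind="rank"

print(weak, strict, rank)
```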

Answered 2025-04-28 by Python大师