Overriding nunique() or adding a wrapper to it

Posted 2024-04-25 15:14:52


I have a problem with nunique: when it is called on an empty groupby, it fails with an error.

>df
Empty DataFrame
Columns: [A, B]
Index: []
>df.groupby(['A'])['B'].nunique()
IndexError: index 0 is out of bounds for axis 0 with size 0

I would like to add a simple check so that, if the groupby is empty, it just returns an empty Series.

I changed the definition of nunique inside my portable Python installation and added a check there, which works:

def nunique(self, dropna=True):
    """ Returns number of unique elements in the group """
    ids, _, _ = self.grouper.group_info
    val = self.obj.get_values()

    try:
        sorter = np.lexsort((val, ids))
    except TypeError:  # catches object dtypes
        assert val.dtype == object, \
            'val.dtype must be object, got %s' % val.dtype
        val, _ = algos.factorize(val, sort=False)
        sorter = np.lexsort((val, ids))
        isnull = lambda a: a == -1
    else:
        isnull = com.isnull

    ids, val = ids[sorter], val[sorter]

    if ids.size == 0:  # this is the check I added: no rows, so no groups
        return Series(ids, index=self.grouper.result_index, name=self.name)

    # group boundaries are where group ids change
    # unique observations are where sorted values change
    idx = np.r_[0, 1 + np.nonzero(ids[1:] != ids[:-1])[0]]
    inc = np.r_[1, val[1:] != val[:-1]]

    # 1st item of each group is a new unique observation
    mask = isnull(val)
    if dropna:
        inc[idx] = 1
        inc[mask] = 0
    else:
        inc[mask & np.r_[False, mask[:-1]]] = 0
        inc[idx] = 1

    out = np.add.reduceat(inc, idx).astype('int64', copy=False)
    res = out if ids[0] != -1 else out[1:]
    ri = self.grouper.result_index

    # we might have duplications among the bins
    if len(res) != len(ri):
        res, out = np.zeros(len(ri), dtype=out.dtype), res
        res[ids] = out

    return Series(res,
                  index=ri,
                  name=self.name)

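The problem is that I cannot change the portable installation itself. I need to either override nunique or add a wrapper function that gets called whenever groupby(...).nunique() is called. I searched online but could not find anything (or did not understand what I found). Sorry if this is a simple question, but I am a novice programmer, so please go easy on me :)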

Thank you


1 answer

User

#1 · Posted 2024-04-25 15:14:52

How about using apply with a condition that checks the length of each group?

df.groupby(['A'])['B'].apply(lambda x: x.nunique() if len(x)>0 else 0)
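Another option along the same lines is to check for emptiness before grouping at all. The sketch below assumes a hypothetical helper named `safe_nunique`; it is not part of pandas, and the empty result it returns does not carry the group key as its index name.

```python
import pandas as pd


def safe_nunique(df, by, col):
    """Count unique values of `col` per group, tolerating an empty frame."""
    if df.empty:
        # Nothing to group: return an empty integer Series named like the
        # column that nunique() would normally report on.
        return pd.Series([], dtype="int64", name=col)
    return df.groupby(by)[col].nunique()


# Usage with a small non-empty frame and with an empty one.
full = pd.DataFrame({"A": [1, 1, 2], "B": ["x", "y", "x"]})
empty = pd.DataFrame({"A": [], "B": []})
counts = safe_nunique(full, ["A"], "B")
nothing = safe_nunique(empty, ["A"], "B")
```

This keeps the check in your own code and leaves pandas unmodified, at the cost of calling the helper instead of nunique directly.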
