When I pass a string as the func argument of DataFrame.agg(), how do I know which function gets called?

2 Answers

User

Answer 1 · edited 2024-05-23 22:07:46

@Ch3steR, thanks for helping me see the light. I'd like to elaborate on your answer, though.

The pandas source includes these relevant lines:

def aggregate(
    obj: AggObjType,
    arg: AggFuncType,
    *args,
    **kwargs,
):

    ...
    if isinstance(arg, str):
        return obj._try_aggregate_string_function(arg, *args, **kwargs), None

Then we follow _try_aggregate_string_function:

def _try_aggregate_string_function(self, arg: str, *args, **kwargs):
        """
        if arg is a string, then try to operate on it:
        - try to find a function (or attribute) on ourselves
        - try to find a numpy function
        - raise
        """
        assert isinstance(arg, str)

        f = getattr(self, arg, None)
        if f is not None:
            if callable(f):
                return f(*args, **kwargs)

            # people may try to aggregate on a non-callable attribute
            # but don't let them think they can pass args to it
            assert len(args) == 0
            assert len([kwarg for kwarg in kwargs if kwarg not in ["axis"]]) == 0
            return f

        f = getattr(np, arg, None)
        if f is not None:
            if hasattr(self, "__array__"):
                # in particular exclude Window
                return f(self, *args, **kwargs)

        raise AttributeError(
            f"'{arg}' is not a valid function for '{type(self).__name__}' object"
        )

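So when you make a call like df.agg('foo'), pandas first looks for an attribute named foo on the DataFrame. Only if no such attribute exists does it look for a NumPy function named foo. If neither lookup succeeds, it raises AttributeError.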
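The lookup order described above can be sketched without pandas. This is only an illustration under stated assumptions: ToyFrame and resolve_string_agg are hypothetical names, not pandas APIs, and the standard-library math module stands in for the NumPy namespace that the excerpt consults.

```python
import math


class ToyFrame:
    """Hypothetical stand-in for a DataFrame; not a pandas class."""

    def __init__(self, values):
        self.values = list(values)
        self.size = len(self.values)  # a non-callable attribute

    def total(self):
        # A method on the object: found by the first lookup step.
        return sum(self.values)

    def __array__(self):
        # Present only so the hasattr(obj, "__array__") guard passes.
        return self.values


def resolve_string_agg(obj, name, fallback=math):
    """Resolve the string `name` in the same order as the excerpt above."""
    attr = getattr(obj, name, None)
    if attr is not None:
        # Step 1: an attribute on the object itself wins.
        return attr() if callable(attr) else attr
    func = getattr(fallback, name, None)
    if func is not None and hasattr(obj, "__array__"):
        # Step 2: fall back to the other namespace (np in pandas, math here).
        # Assumption: the toy passes obj.values, whereas pandas passes self.
        return func(obj.values)
    # Step 3: nothing matched.
    raise AttributeError(
        f"'{name}' is not a valid function for '{type(obj).__name__}' object"
    )


frame = ToyFrame([1.0, 2.0, 3.0])
print(resolve_string_agg(frame, "total"))  # resolved as a method
print(resolve_string_agg(frame, "size"))   # resolved as a plain attribute
print(resolve_string_agg(frame, "fsum"))   # resolved in the fallback module
```

A name found in neither place, such as "foo", raises AttributeError with the same message format as the excerpt.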

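User

Answer 2 · edited 2024-05-23 22:07:46

This is an internal detail, and I don't think it will be documented.

The pandas devs handle strings such as 'sum' and 'mean' this way: they keep a mapping from each function to the name of its internal cythonized implementation.

Excerpt from the pandas source: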

_cython_table = {
        builtins.sum: "sum",
        builtins.max: "max",
        builtins.min: "min",
        np.all: "all",
        np.any: "any",
        np.sum: "sum",
        np.nansum: "sum",
        np.mean: "mean",
        np.nanmean: "mean",
        np.prod: "prod",
        np.nanprod: "prod",
        np.std: "std",
        np.nanstd: "std",
        np.var: "var",
        np.nanvar: "var",
        np.median: "median",
        np.nanmedian: "median",
        np.max: "max",
        np.nanmax: "max",
        np.min: "min",
        np.nanmin: "min",
        np.cumprod: "cumprod",
        np.nancumprod: "cumprod",
        np.cumsum: "cumsum",
        np.nancumsum: "cumsum",
    }

So Series.agg(sum), Series.agg('sum'), Series.agg(np.sum), and Series.agg(np.nansum) all call the same internal cythonized function.

Excerpt from the pandas source:

    def _get_cython_func(self, arg: Callable) -> Optional[str]:
        """
        if we define an internal function for this argument, return it
        """
        return self._cython_table.get(arg)

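You can see how these are handled in the pandas source: getattr is used there, so the cythonized functions appear to be defined as attributes on the class. I didn't find a good starting point, but the best place to look is probably aggregate: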

def aggregate(
    obj: AggObjType,
    arg: AggFuncType,
    *args,
    **kwargs,
):
    ...
    ...
    if callable(arg):
        f = obj._get_cython_func(arg)
        if f and not args and not kwargs:
            return getattr(obj, f)(), None
    ...
    ...
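The table lookup followed by getattr can be sketched in a few lines. This is a toy model, not pandas code: ToySeries and _func_table are hypothetical names, and standard-library functions stand in for the NumPy entries of _cython_table.

```python
import builtins
import statistics


class ToySeries:
    """Hypothetical stand-in for a Series; not a pandas class."""

    # Callable -> name of the method on this class that implements it.
    _func_table = {
        builtins.sum: "sum",
        builtins.max: "max",
        statistics.mean: "mean",
        statistics.fmean: "mean",
    }

    def __init__(self, values):
        self.values = list(values)

    def sum(self):
        return builtins.sum(self.values)

    def max(self):
        return builtins.max(self.values)

    def mean(self):
        return statistics.fmean(self.values)

    def agg(self, func, *args, **kwargs):
        if isinstance(func, str):
            # A string is looked up directly as a method name.
            return getattr(self, func)(*args, **kwargs)
        name = self._func_table.get(func)
        if name and not args and not kwargs:
            # A known callable is redirected to the method of that name.
            return getattr(self, name)()
        # Any other callable is applied to the raw values.
        return func(self.values, *args, **kwargs)


s = ToySeries([1, 2, 3])
print(s.agg(sum), s.agg("sum"))   # both go through ToySeries.sum
print(s.agg(statistics.mean))     # redirected to ToySeries.mean
print(s.agg(lambda v: len(v)))    # not in the table, called as-is
```

Because the table is keyed by function objects, a lambda that merely wraps sum is not in it and takes the generic path.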
