Why is the sum of strings converted to floats?

Posted 2024-05-15 09:00:53


Setup

Consider the following DataFrame (note the strings):

import pandas as pd

df = pd.DataFrame([['3', '11'], ['0', '2']], columns=list('AB'))
df


Question

I want to sum. I expect the strings to be concatenated.

df.sum()

A     30.0
B    112.0
dtype: float64

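It looks as though the strings were concatenated and the result was then converted to float. Is there a good reason for this? Is this a bug? Anything enlightening will be upvoted.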
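The arithmetic behind that guess can be checked in plain Python, without pandas. The sketch below is only a model of the suspected behavior, not pandas code: the helper name `concat_then_float` is hypothetical, and it assumes the values of each column are combined with `+` in row order.

```python
from functools import reduce
import operator

# Column values from the DataFrame above, as plain Python lists of strings.
columns = {"A": ["3", "0"], "B": ["11", "2"]}


def concat_then_float(values):
    """Hypothetical model: combine values with +, then cast the result to float."""
    combined = reduce(operator.add, values)  # '3' + '0' -> '30' (string concatenation)
    return float(combined)                   # '30' -> 30.0


result = {name: concat_then_float(values) for name, values in columns.items()}
print(result)  # {'A': 30.0, 'B': 112.0}
```

The numbers match the output of df.sum() above, which is consistent with a concatenate-then-cast sequence.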


1 Answer

Answer #1 · Posted 2024-05-15 09:00:53

Went with the good old stack trace. Learned a bit about pdb through PyCharm as well. It turns out that the following happens:

1)

cls.sum = _make_stat_function(
            'sum', name, name2, axis_descr,
            'Return the sum of the values for the requested axis',
            nanops.nansum)

Let's look at _make_stat_function.

2)

[source listing of _make_stat_function not preserved]

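The last line, the call to _reduce, is the key. It's kind of funny, because there are about 7 different _reduce functions within pandas.core. pdb says it's the one in pandas.core.frame. Let's take a look.

3)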

def _reduce(self, op, name, axis=0, skipna=True, numeric_only=None,
            filter_type=None, **kwds):
    axis = self._get_axis_number(axis)

    def f(x):
        return op(x, axis=axis, skipna=skipna, **kwds)

    labels = self._get_agg_axis(axis)

    # exclude timedelta/datetime unless we are uniform types
    if axis == 1 and self._is_mixed_type and self._is_datelike_mixed_type:
        numeric_only = True

    if numeric_only is None:
        try:
            values = self.values
            result = f(values)
        except Exception as e:

            # try by-column first
            if filter_type is None and axis == 0:
                try:

                    # this can end up with a non-reduction
                    # but not always. if the types are mixed
                    # with datelike then need to make sure a series
                    result = self.apply(f, reduce=False)
                    if result.ndim == self.ndim:
                        result = result.iloc[0]
                    return result
                except:
                    pass

            if filter_type is None or filter_type == 'numeric':
                data = self._get_numeric_data()
            elif filter_type == 'bool':
                data = self._get_bool_data()
            else:  # pragma: no cover
                e = NotImplementedError("Handling exception with filter_"
                                        "type %s not implemented." %
                                        filter_type)
                raise_with_traceback(e)
            result = f(data.values)
            labels = data._get_agg_axis(axis)
    else:
        if numeric_only:
            if filter_type is None or filter_type == 'numeric':
                data = self._get_numeric_data()
            elif filter_type == 'bool':
                data = self._get_bool_data()
            else:  # pragma: no cover
                msg = ("Generating numeric_only data with filter_type %s"
                       "not supported." % filter_type)
                raise NotImplementedError(msg)
            values = data.values
            labels = data._get_agg_axis(axis)
        else:
            values = self.values
        result = f(values)

    if hasattr(result, 'dtype') and is_object_dtype(result.dtype):
        try:
            if filter_type is None or filter_type == 'numeric':
                result = result.astype(np.float64)
            elif filter_type == 'bool' and notnull(result).all():
                result = result.astype(np.bool_)
        except (ValueError, TypeError):

            # try to coerce to the original dtypes item by item if we can
            if axis == 0:
                result = com._coerce_to_dtypes(result, self.dtypes)

    return Series(result, index=labels)

Holy smokes, talk about an out-of-control function. Someone needs to refactor this! Let's zoom in on the trouble lines:

if hasattr(result, 'dtype') and is_object_dtype(result.dtype):
    try:
        if filter_type is None or filter_type == 'numeric':
            result = result.astype(np.float64)

You'd better believe that last line gets executed. Here is some of the pdb trace:

> c:\users\matthew\anaconda2\lib\site-packages\pandas\core\frame.py(4801)_reduce()
-> result = result.astype(np.float64)
(Pdb) l
4796                result = f(values)
4797    
4798            if hasattr(result, 'dtype') and is_object_dtype(result.dtype):
4799                try:
4800                    if filter_type is None or filter_type == 'numeric':
4801 ->                     result = result.astype(np.float64)
4802                    elif filter_type == 'bool' and notnull(result).all():
4803                        result = result.astype(np.bool_)
4804                except (ValueError, TypeError):
4805    
4806                    # try to coerce to the original dtypes item by item if we can

If you're a non-believer, open up pandas/core/frame.py and put a print "OI" right above line 4801. It should show up in the console :). Note that I'm on Anaconda 2, on Windows.
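The effect of that branch can also be imitated in plain Python. This is a rough model, not the pandas implementation: `cast_object_result` is a hypothetical name, `float()` stands in for `astype(np.float64)`, and returning the values unchanged stands in for the `com._coerce_to_dtypes` fallback.

```python
def cast_object_result(values):
    """Hypothetical stand-in for the object-dtype branch of _reduce."""
    try:
        # Stand-in for result.astype(np.float64): succeeds when every
        # value parses as a number, e.g. the concatenated string '30'.
        return [float(v) for v in values]
    except (ValueError, TypeError):
        # Stand-in for the com._coerce_to_dtypes fallback; in this model
        # the values are simply returned unchanged.
        return list(values)


print(cast_object_result(["30", "112"]))  # [30.0, 112.0]
print(cast_object_result(["ab", "cd"]))   # ['ab', 'cd']
```

Under this model, digit-only strings survive the cast, which would explain why the concatenated sums come back as floats, while strings that do not parse as numbers take the except path.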

To answer your question, I'm going to go with "bug".
