What are all the dtypes that pandas recognizes?

Posted 2024-05-15 09:05:59


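For pandas, would anyone know whether any data type apart from

(i) float64, int64 (and other variants of np.number, such as float32, int8, etc.)

(ii) bool

(iii) datetime64, timedelta64

such as string columns, always has a dtype of object?

Alternatively, I want to know whether there is any data type, apart from (i), (ii) and (iii) in the list above, whose dtype pandas does not make object.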


3 answers

There is also uint8.

The pandas documentation has a lot of information on this.

The main types stored in pandas objects are float, int, bool, datetime64[ns], timedelta[ns], and object. In addition these dtypes have item sizes, e.g. int64 and int32.

By default integer types are int64 and float types are float64, REGARDLESS of platform (32-bit or 64-bit). The following will all result in int64 dtypes.

Numpy, however will choose platform-dependent types when creating arrays. The following WILL result in int32 on 32-bit platform.
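The examples that the quoted passage refers to are not reproduced in this answer. As a minimal sketch of the default-dtype behavior it describes (the variable names below are illustrative, not taken from the pandas docs):

```python
import numpy as np
import pandas as pd

# pandas infers 64-bit types from plain Python numbers,
# regardless of whether the platform is 32-bit or 64-bit.
int_series = pd.Series([1, 2, 3])
float_series = pd.Series([1.0, 2.0, 3.0])
mixed_series = pd.Series([1, 2.5])  # ints are upcast alongside the float

print(int_series.dtype)    # int64
print(float_series.dtype)  # float64
print(mixed_series.dtype)  # float64

# An explicit dtype overrides the default item size.
small_series = pd.Series([1, 2, 3], dtype=np.int32)
print(small_series.dtype)  # int32
```

Passing dtype explicitly is the usual way to get a smaller item size when memory matters.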

pandas borrows its dtypes from numpy. For a demonstration, see the following:

import pandas as pd

df = pd.DataFrame({'A': [1,'C',2.]})
df['A'].dtype

>>> dtype('O')

type(df['A'].dtype)

>>> numpy.dtype

You can find the list of valid numpy.dtypes in the documentation:

'?' boolean

'b' (signed) byte

'B' unsigned byte

'i' (signed) integer

'u' unsigned integer

'f' floating-point

'c' complex-floating point

'm' timedelta

'M' datetime

'O' (Python) objects

'S', 'a' zero-terminated bytes (not recommended)

'U' Unicode string

'V' raw data (void)

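pandas should support these types. Calling the astype method of a pandas.Series object with any of the above options as the input argument will result in pandas trying to convert the Series to that type (or at the very least falling back to object type); 'u' is the only one that I see pandas not understanding at all: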

df['A'].astype('u')

>>> TypeError: data type "u" not understood

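This is a numpy error, which arises because 'u' needs to be followed by a number specifying the number of bytes per item (and that number needs to be valid):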

import numpy as np

np.dtype('u')

>>> TypeError: data type "u" not understood

np.dtype('u1')

>>> dtype('uint8')

np.dtype('u2')

>>> dtype('uint16')

np.dtype('u4')

>>> dtype('uint32')

np.dtype('u8')

>>> dtype('uint64')

# testing another invalid argument
np.dtype('u3')

>>> TypeError: data type "u3" not understood

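In summary, the astype methods of pandas objects will try to do something sensible with any argument that is valid for numpy.dtype. Note that numpy.dtype('f') is the same as numpy.dtype('float32'), numpy.dtype('f8') is the same as numpy.dtype('float64'), and so on. The same goes for passing these arguments to the pandas astype methods.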
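A short sketch of that equivalence, assuming a small integer Series (the names here are illustrative only):

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3])

# NumPy type codes are accepted by astype just like the full dtype names.
as_f = s.astype('f')      # same as astype('float32')
as_f8 = s.astype('f8')    # same as astype('float64')
as_u1 = s.astype('u1')    # same as astype('uint8')

print(as_f.dtype, as_f8.dtype, as_u1.dtype)  # float32 float64 uint8

# The short codes and the long names compare equal as numpy dtypes.
print(np.dtype('f') == np.dtype('float32'))   # True
print(np.dtype('f8') == np.dtype('float64'))  # True
```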

To locate the corresponding data type classes in NumPy, the Pandas docs recommend the following:

def subdtypes(dtype):
    subs = dtype.__subclasses__()
    if not subs:
        return dtype
    return [dtype, [subdtypes(dt) for dt in subs]]

subdtypes(np.generic)

Output:

[numpy.generic,
 [[numpy.number,
   [[numpy.integer,
     [[numpy.signedinteger,
       [numpy.int8,
        numpy.int16,
        numpy.int32,
        numpy.int64,
        numpy.int64,
        numpy.timedelta64]],
      [numpy.unsignedinteger,
       [numpy.uint8,
        numpy.uint16,
        numpy.uint32,
        numpy.uint64,
        numpy.uint64]]]],
    [numpy.inexact,
     [[numpy.floating,
       [numpy.float16, numpy.float32, numpy.float64, numpy.float128]],
      [numpy.complexfloating,
       [numpy.complex64, numpy.complex128, numpy.complex256]]]]]],
  [numpy.flexible,
   [[numpy.character, [numpy.bytes_, numpy.str_]],
    [numpy.void, [numpy.record]]]],
  numpy.bool_,
  numpy.datetime64,
  numpy.object_]]

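Pandas accepts these classes as valid types. For example, dtype={'A': np.float64} (older code wrote np.float, an alias that was removed in NumPy 1.24).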
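As a hedged illustration of passing these classes as dtypes, the sketch below reads a tiny in-memory CSV (the data and column names are made up for the example). It uses np.float64 rather than the np.float alias, which NumPy 1.24 and later no longer provide.

```python
import io

import numpy as np
import pandas as pd

csv_text = "A,B\n1,2\n3,4\n"  # hypothetical sample data

# A NumPy scalar class can be used wherever pandas expects a dtype.
df = pd.read_csv(io.StringIO(csv_text), dtype={'A': np.float64})
print(df.dtypes)

# astype accepts the same classes, per column.
converted = df.astype({'B': np.int8})
print(converted.dtypes)

# The class hierarchy can also be used to test a column's dtype.
a_is_numeric = np.issubdtype(df['A'].dtype, np.number)
print(a_is_numeric)  # True
```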

The NumPy docs contain more details and a chart:

[Figure: dtypes]

Building on the other answers, pandas also includes a number of its own dtypes.

Pandas and third-party libraries extend NumPy’s type system in a few places. This section describes the extensions pandas has made internally. See Extension types for how to write your own extension that works with pandas. See Extension data types for a list of third-party libraries that have implemented an extension.

The following table lists all of pandas extension types. See the respective document

https://pandas.pydata.org/pandas-docs/stable/getting_started/basics.html#dtypes


In addition, pandas 1.0 will have a string dtype.
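A minimal sketch of a few of these pandas-specific dtypes, assuming pandas 1.0 or later is installed (the sample values are illustrative). None of these columns ends up with the NumPy object dtype:

```python
import pandas as pd

# Extension dtypes can be requested by their string aliases.
categories = pd.Series(["a", "b", "a"], dtype="category")
nullable_ints = pd.Series([1, None, 3], dtype="Int64")  # capital "I": nullable integer
strings = pd.Series(["x", None], dtype="string")

for series in (categories, nullable_ints, strings):
    # Each dtype is a pandas extension dtype rather than a numpy.dtype.
    print(type(series.dtype).__name__,
          pd.api.types.is_extension_array_dtype(series.dtype))

# Missing values are kept without falling back to float or object.
print(nullable_ints.isna().sum())  # 1
```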
