Strings in a DataFrame, but the dtype is object

<class 'pandas.core.frame.DataFrame'>
Int64Index: 56992 entries, 0 to 56991
Data columns (total 7 columns):
id       56992  non-null values
attr1    56992  non-null values
attr2    56992  non-null values
attr3    56992  non-null values
attr4    56992  non-null values
attr5    56992  non-null values
attr6    56992  non-null values
dtypes: int64(2), object(5)

2 Answers

User

Answer 1 · edited 2024-05-14 02:47:58

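The dtype object comes from NumPy; it describes the type of the elements in an ndarray. Every element in an ndarray must have the same size in bytes. For int64 and float64, that size is 8 bytes. Strings, however, do not have a fixed length. So instead of storing the bytes of the strings directly in the ndarray, pandas uses an object ndarray that holds pointers to the string objects, which is why the dtype of such an ndarray is object.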
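To make the fixed-size point concrete, here is a minimal sketch that prints the per-element size NumPy reports for each dtype. The array contents and variable names are made up for illustration; the 8 bytes reported for the object array is the pointer size on a typical 64-bit build and may differ on other platforms.

```python
import numpy as np

# Fixed-width numeric arrays: every element occupies the same number of bytes.
ints = np.array([1, 2, 3, 4], dtype=np.int64)
floats = np.array([1.0, 2.0], dtype=np.float64)

# Strings of different lengths stored as Python objects.
strings = np.array(["a", "bb", "a much longer string"], dtype=object)

print(ints.dtype, ints.itemsize)      # int64 8
print(floats.dtype, floats.itemsize)  # float64 8

# For an object array, itemsize is the size of one pointer,
# not the size of the string it points to.
print(strings.dtype, strings.itemsize)
```

Note that the itemsize of the object array does not change with the length of the strings, because only the pointers live inside the array.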

Here is an example:

The int64 array contains 4 int64 values.
The object array contains 4 pointers to 3 string objects.

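The layout described in the example can be reproduced in a few lines. This is only a sketch: the three strings are arbitrary placeholders, and the identity check relies on NumPy storing references to the original Python objects when dtype=object is requested.

```python
import numpy as np

# Three distinct string objects (placeholder values).
a, b, c = "apple", "banana", "cherry"

# Four slots, but the first and last slot point at the same object.
obj_arr = np.array([a, b, c, a], dtype=object)

print(obj_arr.dtype)                  # object
print(len(obj_arr))                   # 4
print(obj_arr[0] is obj_arr[3])       # True
print(len({id(x) for x in obj_arr}))  # 3
```

So the array holds 4 pointers, while only 3 string objects exist in memory.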

User

Answer 2 · edited 2024-05-14 02:47:58

The accepted answer is good. I just wanted to add an answer that points to the documentation, which says:

Pandas uses the object dtype for storing strings.

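As an earlier comment put it: "Don't worry about it; it's supposed to be like this." (The accepted answer does a good job of explaining the "why", though: strings are variable-length.)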

But for strings, the length of the string is not fixed.
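A quick way to see this on a small frame is sketched below. The column names mirror the question, but the values are invented. The object dtype is requested explicitly here as an assumption-proofing step, since newer pandas versions may infer a dedicated string dtype by default instead of object.

```python
import pandas as pd

# Small stand-in for the frame in the question (values are made up).
df = pd.DataFrame({
    "id": pd.Series([1, 2, 3], dtype="int64"),
    "attr1": pd.Series(["x", "yy", "zzz"], dtype=object),
})

print(df.dtypes)

# The column dtype is object, but each element is still an ordinary str.
first = df["attr1"].iloc[0]
print(type(first))  # <class 'str'>
```

In other words, an object column does not mean the data is broken; the values inside are plain Python strings.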
