Create a column with string dtype (not object) from a single string value, without casting

1 vote

1 answer

28 views

Asked 2025-04-14 15:46

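Is there a way to create a column from a single string value so that the column has the string dtype from the start, rather than the object dtype?

Object columns use too much memory, and I don't want to spend time casting an object column back to a string column.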

import pandas as pd

df = pd.DataFrame(dict(a=range(10)))
df["new"] = "my string"
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   a       10 non-null     int64 
 1   new     10 non-null     object
dtypes: int64(1), object(1)
memory usage: 288.0+ bytes
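A note on reading the figure above: the trailing + in "288.0+ bytes" means the number is a lower bound, because the contents of object columns are not inspected by default. A minimal sketch for measuring the real footprint, assuming two illustrative columns (as_object and as_string are hypothetical names, not part of the question):

```python
import pandas as pd

df = pd.DataFrame({"a": range(10)})

# Same values stored twice: once as object, once with the nullable string dtype.
values = ["my string"] * len(df)
df["as_object"] = pd.Series(values, dtype="object", index=df.index)
df["as_string"] = pd.Series(values, dtype="string", index=df.index)

# deep=True inspects the Python string objects themselves instead of
# counting only the pointers held by the column.
deep_bytes = df.memory_usage(deep=True)
print(deep_bytes)
```

The actual numbers depend on the pandas version and on the storage backend used for the string dtype, so it is worth running this on the real data before deciding that a conversion pays off.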

Even if I initialize an empty string column first, assigning the value afterwards turns it back into an object column.

df = pd.DataFrame(dict(a=range(10)))
df["new"] = pd.Series(dtype="string")
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   a       10 non-null     int64 
 1   new     0 non-null      string
dtypes: int64(1), string(1)
memory usage: 288.0 bytes

df["new"] = "my string"
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   a       10 non-null     int64 
 1   new     10 non-null     object
dtypes: int64(1), object(1)
memory usage: 288.0+ bytes

This is the only approach I have found that works, but it feels like a lot of code and effort for something that should be simple.

df = pd.DataFrame(dict(a=range(10)))
df["new"] = pd.Series(["my string"] * len(df), dtype="string", index=df.index)
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   a       10 non-null     int64 
 1   new     10 non-null     string
dtypes: int64(1), string(1)
memory usage: 288.0 bytes
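The workaround above can probably be shortened a little. The sketch below is not from the original post; it assumes that pd.Series broadcasts a scalar across a given index, and that pd.array accepts dtype="string", both of which are standard pandas constructors. The column name other is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": range(10)})

# A scalar plus an index is broadcast to every row, and dtype="string"
# is applied at construction time, so the column has the string dtype
# from the start.
df["new"] = pd.Series("my string", index=df.index, dtype="string")

# Same idea with an extension array; no index alignment is involved here.
df["other"] = pd.array(["my string"] * len(df), dtype="string")

print(df.dtypes)
```

Both forms still spell out the dtype explicitly, but they avoid a separate cast after the column has been created.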


1 Answer

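I think I found the answer. You need to assign through the loc indexer, which writes the value into the existing string column instead of replacing the column.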

df = pd.DataFrame(dict(a=range(10_000)))
df["new"] = pd.Series(dtype="string")
df.loc[:, "new"] = "my string"
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   a       10000 non-null  int64 
 1   new     10000 non-null  string
dtypes: int64(1), string(1)
memory usage: 156.4 KB
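To make the difference between the two assignment styles visible, here is a small side-by-side sketch. It assumes pandas 2.x semantics, where df.loc[:, col] = value sets values in the existing column; the variable names dtype_after_loc and dtype_after_setitem are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"a": range(5)})

# Start from an all-missing column that already has the nullable string dtype.
df["new"] = pd.Series(dtype="string")

# .loc writes the scalar into the existing column, so its dtype is kept
# (behavior assumed for pandas 2.x).
df.loc[:, "new"] = "my string"
dtype_after_loc = df["new"].dtype

# Plain item assignment replaces the column with a new one built from the
# scalar, so the dtype is inferred from scratch.
df["new"] = "my string"
dtype_after_setitem = df["new"].dtype

print(dtype_after_loc, dtype_after_setitem)
```

The dtype printed for the plain assignment depends on the pandas version and its string inference settings, which is why it is printed here rather than stated.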

Answered 2025-04-14 by Python大师
