Create a column with string dtype (not object) from a single string value, without casting

Asked 2025-04-14 15:46

Is there a way to create a column from a single string value so that the column has the string dtype by default, rather than object?

Object columns use too much memory, and I don't want to spend time converting an object column back to a string column.

import pandas as pd

df = pd.DataFrame(dict(a=range(10)))
df["new"] = "my string"
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   a       10 non-null     int64 
 1   new     10 non-null     object
dtypes: int64(1), object(1)
memory usage: 288.0+ bytes
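
As a side note on the memory concern: the `+` in `288.0+ bytes` indicates that `info()` reports only a lower bound, because the Python string objects behind an object column are not counted by default. A minimal sketch of measuring the column more fully with `DataFrame.memory_usage(deep=True)`; the variable names `shallow` and `deep` are only illustrative, and the exact byte counts depend on the pandas version and platform, so none are shown here.

```python
import pandas as pd

df = pd.DataFrame(dict(a=range(10)))
df["new"] = "my string"

# Shallow usage counts only the array that backs the column.
shallow = df.memory_usage(deep=False)
# Deep usage also inspects the Python objects the column refers to.
deep = df.memory_usage(deep=True)

print("shallow bytes for 'new':", shallow["new"])
print("deep bytes for 'new':   ", deep["new"])
```

Running `df.info(memory_usage="deep")` gives the same deep figure in the summary line.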

Even if I first initialize an empty string column, assigning the value afterwards still gives me an object column.

df = pd.DataFrame(dict(a=range(10)))
df["new"] = pd.Series(dtype="string")
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   a       10 non-null     int64 
 1   new     0 non-null      string
dtypes: int64(1), string(1)
memory usage: 288.0 bytes

df["new"] = "my string"
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   a       10 non-null     int64 
 1   new     10 non-null     object
dtypes: int64(1), object(1)
memory usage: 288.0+ bytes

This is the only approach I've found that works, but it feels like a lot of code and effort for something that should be simple.

df = pd.DataFrame(dict(a=range(10)))
df["new"] = pd.Series(["my string"] * len(df), dtype="string", index=df.index)
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   a       10 non-null     int64 
 1   new     10 non-null     string
dtypes: int64(1), string(1)
memory usage: 288.0 bytes
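
The working approach above can probably be shortened a little. A sketch, assuming the `pd.Series` constructor repeats a scalar across the given `index` (which is how it behaves in the pandas versions I am aware of), so the `["my string"] * len(df)` list is not needed:

```python
import pandas as pd

df = pd.DataFrame(dict(a=range(10)))

# A scalar plus an index asks pandas to repeat the scalar for every label;
# dtype="string" requests the string extension dtype instead of object.
df["new"] = pd.Series("my string", index=df.index, dtype="string")

print(df.dtypes)
```

This is still an explicit construction rather than a default, but it avoids building a temporary Python list of the same length as the frame.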

1 Answer

I think I found the answer. You need to assign through the loc indexer.

df = pd.DataFrame(dict(a=range(10_000)))
df["new"] = pd.Series(dtype="string")
df.loc[:, "new"] = "my string"
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   a       10000 non-null  int64 
 1   new     10000 non-null  string
dtypes: int64(1), string(1)
memory usage: 156.4 KB
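
A likely explanation for the difference, as far as I understand the behaviour of recent pandas versions: `df["new"] = value` replaces the whole column with a new one whose dtype is inferred from the value, while `df.loc[:, "new"] = value` tries to write into the existing column and so keeps its dtype. The two steps can be wrapped in a small helper; `add_string_column` is a hypothetical name used only for this sketch, not a pandas function.

```python
import pandas as pd


def add_string_column(df, name, value):
    """Hypothetical helper: add a string-dtype column filled with `value`."""
    # Step 1: create the column with the desired dtype (all values missing).
    df[name] = pd.Series(dtype="string")
    # Step 2: write through .loc so the existing column, and its dtype,
    # are reused instead of being replaced by a newly inferred column.
    df.loc[:, name] = value
    return df


df = add_string_column(pd.DataFrame(dict(a=range(10_000))), "new", "my string")
print(df["new"].dtype)
```

Note that the helper modifies the frame it receives and returns it only for convenience.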
