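Create a column with dtype string (not object) from a single string value, without casting
Is there a way to create a column from a single string value so that the column is a string column by default, rather than an object column?
Object columns use too much memory, and I don't want to spend time converting the object column back to a string column.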
import pandas as pd

df = pd.DataFrame(dict(a=range(10)))
df["new"] = "my string"  # the scalar is broadcast to every row
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   a       10 non-null     int64
 1   new     10 non-null     object
dtypes: int64(1), object(1)
memory usage: 288.0+ bytes
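A side note on the memory figure: the "+" in "288.0+ bytes" means info() only reports a lower bound for object columns, because it does not count the Python string objects themselves. Below is a minimal sketch of how the actual footprint could be checked, assuming the same toy frame as above. The names shallow and deep are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(dict(a=range(10)))
df["new"] = "my string"

# Default (shallow) accounting: for an object column this is only the
# array of references, not the strings they point to.
shallow = df["new"].memory_usage(index=False)

# deep=True also inspects the stored values, so object columns are
# reported with the size of their Python strings included.
deep = df["new"].memory_usage(index=False, deep=True)

print(f"shallow: {shallow} bytes, deep: {deep} bytes")
```

Calling df.info(memory_usage="deep") prints the deep figure directly in the summary.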
Even if I initialize an empty string column first, I still end up with an object column.
df = pd.DataFrame(dict(a=range(10)))
df["new"] = pd.Series(dtype="string")
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   a       10 non-null     int64
 1   new     0 non-null      string
dtypes: int64(1), string(1)
memory usage: 288.0 bytes
df["new"] = "my string"
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   a       10 non-null     int64
 1   new     10 non-null     object
dtypes: int64(1), object(1)
memory usage: 288.0+ bytes
This is the only approach I have found that works, but it feels like a lot of code and effort for something that should be simple.
df = pd.DataFrame(dict(a=range(10)))
df["new"] = pd.Series(["my string"] * len(df), dtype="string", index=df.index)
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   a       10 non-null     int64
 1   new     10 non-null     string
dtypes: int64(1), string(1)
memory usage: 288.0 bytes
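A slightly shorter variant of the same idea, as a sketch: the Series constructor accepts a scalar together with an index and repeats the scalar for every label, so the list multiplication is not needed. This assumes a pandas version where dtype="string" is available (1.0 or later).

```python
import pandas as pd

df = pd.DataFrame(dict(a=range(10)))

# A scalar passed together with an index is repeated for every label,
# so there is no need to build ["my string"] * len(df) by hand.
df["new"] = pd.Series("my string", index=df.index, dtype="string")

print(df.dtypes)
```

The column is still built explicitly here, so this only trims the boilerplate. It does not change what a plain df["new"] = "my string" does.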
1 Answer
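I think I found the answer. You need to assign through df.loc. A plain df["new"] = "my string" replaces the whole column, which is why the string dtype is lost, whereas df.loc[:, "new"] = "my string" writes into the existing column and keeps its dtype.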
df = pd.DataFrame(dict(a=range(10_000)))
df["new"] = pd.Series(dtype="string")
df.loc[:, "new"] = "my string"
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   a       10000 non-null  int64
 1   new     10000 non-null  string
dtypes: int64(1), string(1)
memory usage: 156.4 KB
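If this comes up often, the two steps could be wrapped in a small function. The sketch below is only an illustration: add_string_column is a made-up name, not a pandas API, and it relies on the behaviour shown in the output above, where assigning through .loc keeps the existing string dtype. That behaviour may differ on other pandas versions.

```python
import pandas as pd


def add_string_column(df: pd.DataFrame, name: str, value: str) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with a constant "string" column."""
    out = df.copy()
    # Step 1: create an all-missing column that already has the string dtype.
    out[name] = pd.Series(dtype="string")
    # Step 2: fill it through .loc, which writes into the existing column
    # instead of replacing it with a newly inferred one.
    out.loc[:, name] = value
    return out


df = pd.DataFrame(dict(a=range(10_000)))
result = add_string_column(df, "new", "my string")
print(result.dtypes)
```

Working on a copy leaves the caller's frame untouched. Drop the df.copy() line if in-place modification is preferred.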