How do I force pandas read_csv to use float32 for all float columns?

2 Answers

User

Answer 1 · edited 2024-06-16 12:43:46

Try:

import numpy as np
import pandas as pd

# Sample 100 rows of data to determine dtypes.
df_test = pd.read_csv(filename, nrows=100)

float_cols = [c for c in df_test if df_test[c].dtype == "float64"]
float32_cols = {c: np.float32 for c in float_cols}

df = pd.read_csv(filename, engine='c', dtype=float32_cols)

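This first reads a sample of 100 rows (adjust as needed) to determine the type of each column.

It then builds a list of the columns whose dtype is "float64" and uses a dictionary comprehension to create a dictionary with those columns as keys and np.float32 as the value for each key.

Finally, it reads the whole file with the 'c' engine (required for assigning dtypes to columns in older pandas releases; it is also the default engine) and passes the float32_cols dictionary as the dtype argument.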

>>> df = pd.read_csv(filename, nrows=100)
>>> df
   int_col  float1 string_col  float2
0        1     1.2          a     2.2
1        2     1.3          b     3.3
2        3     1.4          c     4.4

>>> df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 3 entries, 0 to 2
Data columns (total 4 columns):
int_col       3 non-null int64
float1        3 non-null float64
string_col    3 non-null object
float2        3 non-null float64
dtypes: float64(2), int64(1), object(1)

>>> df32 = pd.read_csv(filename, engine='c', dtype={c: np.float32 for c in float_cols})
>>> df32.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 3 entries, 0 to 2
Data columns (total 4 columns):
int_col       3 non-null int64
float1        3 non-null float32
string_col    3 non-null object
float2        3 non-null float32
dtypes: float32(2), int64(1), object(1)
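
The steps above can be wrapped in a small helper. This is only a minimal sketch: the function name read_csv_float32 and the inline CSV_TEXT sample (standing in for filename) are assumptions made for illustration, not part of the original answer.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the file on disk.
CSV_TEXT = (
    "int_col,float1,string_col,float2\n"
    "1,1.2,a,2.2\n"
    "2,1.3,b,3.3\n"
    "3,1.4,c,4.4\n"
)


def read_csv_float32(csv_text, sample_rows=100):
    """Read CSV text, storing columns inferred as float64 as float32."""
    # Pass 1: read a small sample so pandas can infer each column's dtype.
    sample = pd.read_csv(io.StringIO(csv_text), nrows=sample_rows)
    float_cols = [c for c in sample if sample[c].dtype == "float64"]

    # Pass 2: read everything, overriding only the float columns.
    float32_cols = {c: np.float32 for c in float_cols}
    return pd.read_csv(io.StringIO(csv_text), dtype=float32_cols)


df32 = read_csv_float32(CSV_TEXT)
print(df32.dtypes)
```

Note that the dtypes come from the sample only. A column that looks like integers in the first sample_rows rows but contains floats further down is not in the dictionary, so pandas will still infer float64 for it on the full read; increase the sample size if that is a concern.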

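User

Answer 2 · edited 2024-06-16 12:43:46

@Alexander's answer is great. Some columns may need to keep full precision. If so, you can add more conditions to the list comprehension to exclude those columns. The built-in any or all is handy for this: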

# Keep Latitude and Longitude as float64; convert every other float column.
float_cols = [c for c in df_test if all([df_test[c].dtype == "float64",
             c != 'Latitude', c != 'Longitude'])]
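
When the list of columns to protect grows, a set of names may be easier to maintain than one condition per column. The following is a sketch under assumptions: the KEEP_FLOAT64 name, the reading column, and the inline sample data are all invented for illustration.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample data; "reading" is an invented column name.
CSV_TEXT = (
    "Latitude,Longitude,reading\n"
    "51.5074,-0.1278,1.5\n"
    "48.8566,2.3522,2.5\n"
)

# Columns that should stay at float64 precision (assumed names).
KEEP_FLOAT64 = {"Latitude", "Longitude"}

df_test = pd.read_csv(io.StringIO(CSV_TEXT), nrows=100)

# Select float64 columns, skipping the ones that must stay precise.
float_cols = [
    c for c in df_test
    if df_test[c].dtype == "float64" and c not in KEEP_FLOAT64
]

df = pd.read_csv(io.StringIO(CSV_TEXT), dtype={c: np.float32 for c in float_cols})
print(df.dtypes)
```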
