Converting an Int64 column with <NA> values to an object column with nan values

Posted 2024-05-14 19:57:44


The sequels DataFrame in the tutorial looks like this:

              title sequel
id                        
19995        Avatar    nan
862       Toy Story    863
863     Toy Story 2  10193
597         Titanic    nan
24428  The Avengers    nan

<class 'pandas.core.frame.DataFrame'>
Index: 4803 entries, 19995 to 185567
Data columns (total 2 columns):
title     4803 non-null object
sequel    4803 non-null object
dtypes: object(2)
memory usage: 272.6+ KB

The tutorial provides a file, sequels.p. However, when I read the file in, my DataFrame differs from the one in the tutorial:

my_sequels = pd.read_pickle('data/pandas/sequels.p')
my_sequels.set_index('id', inplace=True)
my_sequels.head()
             title  sequel
id      
19995       Avatar  <NA>
862      Toy Story  863
863    Toy Story 2  10193
597        Titanic  <NA>
24428  The Avengers <NA>

my_sequels.info()
<class 'pandas.core.frame.DataFrame'>
Index: 4803 entries, 19995 to 185567
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   title   4803 non-null   object
 1   sequel  90 non-null     Int64 
dtypes: Int64(1), object(1)
memory usage: 117.3+ KB

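My question is: is there a way to manipulate my_sequels so that it resembles sequels, that is, so that my_sequels['sequel'] is an object column with 4803 non-null entries in which <NA> becomes nan?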

Edit: the reason I want my_sequels to match sequels is to avoid an error in a later step:

sequels_fin = my_sequels.merge(financials, on='id', how='left')

orig_seq = sequels_fin.merge(sequels_fin, how='inner', left_on='sequel', 
                             right_on='id', right_index=True,
                             suffixes=('_org','_seq'))

ValueError                                Traceback (most recent call last)
<ipython-input-5-7215de303684> in <module>
      3 orig_seq = sequels_fin.merge(sequels_fin, how='inner', left_on='sequel', 
      4                              right_on='id', right_index=True,
----> 5                              suffixes=('_org','_seq'))
ValueError: cannot convert to 'int64'-dtype NumPy array with missing values. Specify an appropriate 'na_value' for this dtype.
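
For context, the same message can be triggered without any merge. The sketch below is a minimal illustration, assuming a reasonably recent pandas with nullable integer support; the values come from the sample rows above and the variable names are made up for the example. Presumably the merge attempts a similar conversion of the Int64 key internally, but that is an assumption rather than something shown in the traceback.

```python
import pandas as pd

# A nullable integer array with one missing value, like the 'sequel' column.
arr = pd.array([863, None], dtype="Int64")

try:
    # A plain NumPy int64 array has no way to represent a missing value.
    arr.to_numpy(dtype="int64")
    raised = False
except ValueError as exc:
    raised = True
    print(exc)

# Asking for a float dtype with an explicit na_value avoids the error.
as_float = arr.to_numpy(dtype="float64", na_value=float("nan"))
print(as_float)
```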

2 Answers

First, set 'id' as the index:

sequels_fin = sequels_fin.set_index('id')

Then:

orig_seq = sequels_fin.merge(sequels_fin, how='inner', left_on='sequel', 
                             right_on='id', right_index=True,
                             suffixes=('_org','_seq'))
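
A self-contained sketch of that self-merge on the five sample rows from the question may help. It is not the tutorial's code: the financials columns are left out, the helper name left is made up, and only right_index=True is passed because some recent pandas releases may reject right_on together with right_index. Dropping the rows with a missing sequel first is an added assumption; it is safe for an inner join because those rows could never match anyway.

```python
import pandas as pd

# Toy stand-in for sequels_fin, built from the sample rows in the question.
sequels_fin = pd.DataFrame(
    {
        "id": [19995, 862, 863, 597, 24428],
        "title": ["Avatar", "Toy Story", "Toy Story 2", "Titanic", "The Avengers"],
        "sequel": pd.array([None, 863, 10193, None, None], dtype="Int64"),
    }
).set_index("id")

# Keep only rows that have a sequel and use a plain int64 key,
# so the key lines up with the int64 'id' index on the right side.
left = sequels_fin.dropna(subset=["sequel"]).astype({"sequel": "int64"})

orig_seq = left.merge(
    sequels_fin,
    how="inner",
    left_on="sequel",
    right_index=True,
    suffixes=("_org", "_seq"),
)
print(orig_seq[["title_org", "title_seq"]])
```

In this toy frame only Toy Story finds its sequel, since id 10193 is not among the five sample rows.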

I don't think you want to do that. The tutorial shows nan because it is based on an older version of pandas than the one you are using.

https://pandas.pydata.org/pandas-docs/stable/user_guide/integer_na.html

As you might expect, you can detect missing values and operate on them:

arr = pd.array([1, 2, None], dtype=pd.Int64Dtype())
arr.isna()
array([False, False,  True])
arr.fillna(0)
<IntegerArray>
[1, 2, 0]
Length: 3, dtype: Int64
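
If a column that holds nan instead of <NA> is still wanted, for example to mirror the tutorial's output, one possible conversion is sketched below. It assumes a pandas version that supports the na_value argument of to_numpy; the small frame is rebuilt from the sample rows in the question, and the names as_object and as_float are made up for the example.

```python
import numpy as np
import pandas as pd

# Toy stand-in for my_sequels, using three of the sample rows.
my_sequels = pd.DataFrame(
    {
        "title": ["Avatar", "Toy Story", "Toy Story 2"],
        "sequel": pd.array([None, 863, 10193], dtype="Int64"),
    },
    index=pd.Index([19995, 862, 863], name="id"),
)

sequel = my_sequels["sequel"]

# Option 1: object dtype, with every <NA> replaced by np.nan.
as_object = pd.Series(
    sequel.to_numpy(dtype=object, na_value=np.nan),
    index=sequel.index,
    name=sequel.name,
    dtype=object,
)

# Option 2: float64, where missing values are stored as np.nan natively.
as_float = sequel.astype("float64")

print(as_object.dtype, as_float.dtype)
```

Note that np.nan still counts as null in info(), so neither option raises the non-null count above the number of rows that actually have a sequel.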
