The DataFrame sequels from the tutorial looks like this:
              title sequel
id
19995        Avatar    nan
862       Toy Story    863
863     Toy Story 2  10193
597         Titanic    nan
24428  The Avengers    nan
<class 'pandas.core.frame.DataFrame'>
Index: 4803 entries, 19995 to 185567
Data columns (total 2 columns):
title 4803 non-null object
sequel 4803 non-null object
dtypes: object(2)
memory usage: 272.6+ KB
The tutorial provides a file, sequels.p. However, when I read the file in, my DataFrame differs from the one in the tutorial:
import pandas as pd

my_sequels = pd.read_pickle('data/pandas/sequels.p')
my_sequels.set_index('id', inplace=True)
my_sequels.head()
              title sequel
id
19995        Avatar   <NA>
862       Toy Story    863
863     Toy Story 2  10193
597         Titanic   <NA>
24428  The Avengers   <NA>
my_sequels.info()
<class 'pandas.core.frame.DataFrame'>
Index: 4803 entries, 19995 to 185567
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 title 4803 non-null object
1 sequel 90 non-null Int64
dtypes: Int64(1), object(1)
memory usage: 117.3+ KB
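My question is: is there a way to manipulate my_sequels so that it resembles sequels, that is, so that my_sequels['sequel'] is an object column with 4803 non-null values in which <NA> becomes nan?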
Edit: the reason I want my_sequels to match sequels is to avoid an error in a later step:
sequels_fin = my_sequels.merge(financials, on='id', how='left')
orig_seq = sequels_fin.merge(sequels_fin, how='inner', left_on='sequel',
right_on='id', right_index=True,
suffixes=('_org','_seq'))
ValueError Traceback (most recent call last)
<ipython-input-5-7215de303684> in <module>
3 orig_seq = sequels_fin.merge(sequels_fin, how='inner', left_on='sequel',
4 right_on='id', right_index=True,
----> 5 suffixes=('_org','_seq'))
ValueError: cannot convert to 'int64'-dtype NumPy array with missing values. Specify an appropriate 'na_value' for this dtype.
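One possible way around this error, offered only as a sketch: drop the rows whose sequel is missing before the self-merge, so the join key can be cast to plain int64. The miniature frame below is hypothetical (built from the five rows printed by head()), the merge with financials is left out because that frame is not shown, and only right_index=True is passed for the right side on the assumption that the index named id is the intended join target.

```python
import pandas as pd

# Hypothetical miniature of my_sequels, built from the rows shown by head()
my_sequels = pd.DataFrame(
    {
        "id": [19995, 862, 863, 597, 24428],
        "title": ["Avatar", "Toy Story", "Toy Story 2", "Titanic", "The Avengers"],
        "sequel": pd.array([pd.NA, 863, 10193, pd.NA, pd.NA], dtype="Int64"),
    }
).set_index("id")

# Keep only films that have a sequel, so the join key has no missing values,
# then cast the key to plain int64 to match the integer index
has_sequel = my_sequels[my_sequels["sequel"].notna()].astype({"sequel": "int64"})

# Match each film's 'sequel' value against the 'id' index of the full frame
orig_seq = has_sequel.merge(
    my_sequels,
    how="inner",
    left_on="sequel",
    right_index=True,
    suffixes=("_org", "_seq"),
)
print(orig_seq[["title_org", "title_seq"]])
```

Because the merge is an inner join, films without a sequel would not have matched anything anyway, so filtering them out first should not change which pairs are found.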
First, index by 'id':
Then:
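I don't think you want to do that. What you see differs from the tutorial because the tutorial is based on an older version of pandas than the one you are using.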
https://pandas.pydata.org/pandas-docs/stable/user_guide/integer_na.html
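If the older representation is still wanted for comparison, a minimal sketch of the conversion might look like the following. The sample values are hypothetical (taken from the rows shown in the question), and this is only one way to get from the nullable Int64 dtype back to NaN-based columns, not necessarily what the tutorial did.

```python
import numpy as np
import pandas as pd

# Hypothetical sample using the sequel values shown in the question
nullable = pd.Series([pd.NA, 863, 10193, pd.NA, pd.NA], dtype="Int64")

# Older-style numeric representation: float64 with NaN for missing entries
as_float = nullable.astype("float64")

# Object column holding integers where present and float NaN where missing
as_object = pd.Series(
    nullable.to_numpy(dtype=object, na_value=np.nan),
    index=nullable.index,
)

print(nullable.dtype, as_float.dtype, as_object.dtype)
```

Note that NaN still counts as missing in both converted columns, so info() would keep reporting the same non-null count for them.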
As you might expect, you can detect missing values and operate on them.
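As a small illustration of that point, the sketch below applies the usual missing-value methods to a nullable Int64 column. The sample values are hypothetical, and the -1 placeholder is an arbitrary choice made only for this example.

```python
import pandas as pd

# Hypothetical sample using the sequel values shown in the question
sequel = pd.Series([pd.NA, 863, 10193, pd.NA, pd.NA], dtype="Int64")

missing = sequel.isna()            # boolean mask, True where the value is <NA>
n_present = sequel.notna().sum()   # number of non-missing entries
present_only = sequel.dropna()     # keep only the rows that have a value
filled = sequel.fillna(-1)         # -1 is an arbitrary placeholder

print(missing.tolist())
print(n_present)
print(filled.dtype)
```

The column keeps its Int64 dtype after fillna, so the integers are not silently turned into floats.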