df.isna().sum() does not work on the Titanic dataset

Posted 2021-12-01 10:46:31

I am working on the Titanic model from Kaggle. Strangely, isna().sum() reports the wrong information.

import os
import pandas as pd 
import numpy as np
import statsmodels.api as sm

from google.colab import auth
auth.authenticate_user()

import gspread
from oauth2client.client import GoogleCredentials

gc = gspread.authorize(GoogleCredentials.get_application_default())

worksheet = gc.open('titanic_train').sheet1

titanic = worksheet.get_all_records()
titanic = pd.DataFrame(titanic)
titanic
titanic.info()
titanic.isna().sum()

The output looks like this:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   PassengerId  891 non-null    int64  
 1   Survived     891 non-null    int64  
 2   Pclass       891 non-null    int64  
 3   Name         891 non-null    object 
 4   Sex          891 non-null    object 
 5   Age          891 non-null    object 
 6   SibSp        891 non-null    int64  
 7   Parch        891 non-null    int64  
 8   Ticket       891 non-null    object 
 9   Fare         891 non-null    float64
 10  Cabin        891 non-null    object 
 11  Embarked     891 non-null    object 
dtypes: float64(1), int64(5), object(6)
memory usage: 83.7+ KB
PassengerId    0
Pclass         0
Name           0
Sex            0
Age            0
SibSp          0
Parch          0
Ticket         0
Fare           0
Cabin          0
Embarked       0
dtype: int64

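The output reports 0 NaN values for every column, but the Age column actually has several missing values. Why doesn't isna() detect them? Is it because of the data type?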
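
A likely cause, offered as a sketch rather than a confirmed diagnosis: Age shows up as object instead of a numeric dtype, which suggests the blank spreadsheet cells arrived as empty strings, and isna() does not count an empty string as missing. The example below assumes this is what worksheet.get_all_records() returned; the `records` list is a made-up stand-in for the sheet, not the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for worksheet.get_all_records(), ASSUMING blank
# cells come back as empty strings (this is not the real Titanic data).
records = [
    {"PassengerId": 1, "Age": 22, "Cabin": ""},
    {"PassengerId": 2, "Age": "", "Cabin": "C85"},
    {"PassengerId": 3, "Age": 26, "Cabin": ""},
]
df = pd.DataFrame(records)

# Empty strings are ordinary values, so isna() finds nothing missing.
before = df.isna().sum()
print(before)

# Turn empty strings into real NaN values, then make Age numeric.
cleaned = df.replace("", np.nan)
cleaned["Age"] = pd.to_numeric(cleaned["Age"], errors="coerce")

after = cleaned.isna().sum()
print(after)
print(cleaned.dtypes)
```

If the assumption holds, applying the same replace and to_numeric steps to the `titanic` DataFrame should make titanic.isna().sum() report the missing Age, Cabin, and Embarked values. Checking (titanic["Age"] == "").sum() first is a quick way to confirm whether empty strings are really the cause.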