df.isna().sum() is not working on titanic dataset

I tried the Titanic dataset from Kaggle, and strangely isna().sum() reports the wrong counts.

import os
import pandas as pd 
import numpy as np
import statsmodels.api as sm

from google.colab import auth
auth.authenticate_user()

import gspread
from oauth2client.client import GoogleCredentials

gc = gspread.authorize(GoogleCredentials.get_application_default())

worksheet = gc.open('titanic_train').sheet1

titanic = worksheet.get_all_records()
titanic = pd.DataFrame(titanic)
titanic

titanic.info()
titanic.isna().sum()

The output is shown below.

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   PassengerId  891 non-null    int64  
 1   Survived     891 non-null    int64  
 2   Pclass       891 non-null    int64  
 3   Name         891 non-null    object 
 4   Sex          891 non-null    object 
 5   Age          891 non-null    object 
 6   SibSp        891 non-null    int64  
 7   Parch        891 non-null    int64  
 8   Ticket       891 non-null    object 
 9   Fare         891 non-null    float64
 10  Cabin        891 non-null    object 
 11  Embarked     891 non-null    object 
dtypes: float64(1), int64(5), object(6)
memory usage: 83.7+ KB

PassengerId    0
Pclass         0
Name           0
Sex            0
Age            0
SibSp          0
Parch          0
Ticket         0
Fare           0
Cabin          0
Embarked       0
dtype: int64

It reports 0 NaN values, but there are several missing values in Age and Embarked. Why can't it detect NaN? Is it because of the Dtype?

Solution 1:^[1]

It is doing this because there are no NaNs in the DataFrame.

Notice that df.info() also reports no null values in any column.
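
One possible explanation, offered as an assumption rather than something the question confirms: when a sheet is read as a list of records, blank cells can arrive as empty strings, which pandas treats as ordinary values rather than missing ones. A minimal sketch with a hypothetical three-row sample (not the real data) shows the effect and one way to convert the blanks:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the records read from the worksheet.
# Assumption: blank cells come back as empty strings, not NaN.
records = [
    {"PassengerId": 1, "Age": 22, "Embarked": "S"},
    {"PassengerId": 2, "Age": "", "Embarked": "C"},
    {"PassengerId": 3, "Age": 26, "Embarked": ""},
]
sample = pd.DataFrame(records)

# Empty strings are regular values, so isna() does not count them.
before = sample.isna().sum()

# Count empty strings per column to check whether this is the cause.
blanks = sample.eq("").sum()

# Replace empty strings with NaN so that isna() can see them.
cleaned = sample.replace("", np.nan)
after = cleaned.isna().sum()

print(before, blanks, after, sep="\n\n")
```

If the blanks count is non-zero for Age, Cabin, or Embarked on the real data, the same replace call on titanic should make isna().sum() report them.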

Solution 2:^[2]

This is because your pandas version is 1.2.4. When I downgrade to 0.24 or another lower version, I get NaN values.

Solution 3:^[3]

I imported the data in Google Colab as well and got the following output when running df.isna().sum():

PassengerId      0
Survived         0
Pclass           0
Name             0
Sex              0
Age            177
SibSp            0
Parch            0
Ticket           0
Fare             0
Cabin          687
Embarked         2
dtype: int64

Did you make any column conversions? For instance, converting the Age column to strings with astype(str) turns every np.nan into the string "nan", which pandas does not recognise as a missing value.

df["Age"] = df["Age"].astype(str)

df["Age"].isna().sum()
# output: 0

You can check for any "nan" values with this:

df["Age"].str.contains("nan").any()
# output: True

Converting them back to np.nan will solve the issue:

# Assign the result back; inplace=True on a selected column may not update df
df["Age"] = df["Age"].replace("nan", np.nan)

df["Age"].isna().sum()
# output: 177
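
After this fix the column still has dtype object. As a sketch of one way to get a numeric column back (an alternative, not part of the original answer), pd.to_numeric with errors="coerce" turns anything it cannot parse into NaN. The Series below is hypothetical sample data:

```python
import pandas as pd

# Hypothetical Age column after a string conversion: numbers stored as
# text, with "nan" and "" standing in for missing values.
age = pd.Series(["22", "nan", "38", "", "26"], name="Age")

# errors="coerce" converts unparseable entries to NaN instead of raising.
age_numeric = pd.to_numeric(age, errors="coerce")

# Missing values are now real NaNs in a float column.
missing = int(age_numeric.isna().sum())
print(age_numeric.dtype, missing)
```

On the real data the same call would be applied to df["Age"]. Note that coercion also hides genuinely malformed entries, so it is worth inspecting the affected rows first.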

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	Divyessh
Solution 2	Mufseera
Solution 3	Akis Hadjimpalasis
