Convert a float column with NaN to int in pandas

I am trying to convert a float pandas column that contains NaNs to integers using apply. I would like to use something like this:

df.col = df.col.apply(to_integer)

where the function to_integer is given by

def to_integer(x):
    if np.isnan(x):
        return np.nan
    else:
        return int(x)

However, when I attempt to apply it, the column remains the same.

How could I achieve this without resorting to the standard dtype-based approach?



Solution 1:[1]

You can't have NaN in an int column: NaN is a float value, so any column containing it is stored as float (unless you use the object dtype, which is not a good idea since you lose most vectorized operations).
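To illustrate why the apply call from the question appears to do nothing, here is a minimal sketch. The sample Series is made up for illustration; to_integer follows the logic from the question. Because the returned values mix Python ints with NaN, pandas infers a float result again.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for df.col
s = pd.Series([1.0, 2.0, np.nan])

def to_integer(x):
    # Same logic as in the question: int for valid values, NaN otherwise
    if np.isnan(x):
        return np.nan
    return int(x)

result = s.apply(to_integer)

# The ints are upcast back to float so that NaN can live in the same column
print(result.dtype)
```

So the function does run on every element, but the mixed int/NaN result is stored as float64 and the column looks unchanged.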

You can, however, use the nullable integer dtype (Int64), which represents missing values as pd.NA.

Conversion can be done with convert_dtypes:

import pandas as pd

df = pd.DataFrame({'col': [1, 2, None]})
df = df.convert_dtypes()

# type(df.at[0, 'col'])
# numpy.int64

# type(df.at[2, 'col'])
# pandas._libs.missing.NAType

output:

    col
0     1
1     2
2  <NA>
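As a small sketch of why the nullable dtype is preferable to an object column, the converted column still supports vectorized operations, with missing values propagating as NA. The variable names incremented and total are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"col": [1, 2, None]}).convert_dtypes()

# Arithmetic stays vectorized and keeps the nullable integer dtype;
# the missing entry remains <NA>
incremented = df["col"] + 1
print(incremented)

# Reductions skip missing values by default
total = df["col"].sum()
print(total)
```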

Solution 2:[2]

Not sure how you would achieve this without using dtypes. Sometimes when loading data, you may end up with a column that contains mixed dtypes. Loading a column with one dtype and then attempting to turn it into mixed dtypes is not possible, though (at least, not that I know of).

So I will echo what @mozway said and suggest you use the nullable integer data type, e.g.:

df['col'] = df['col'].astype('Int64')

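(note the capital I: 'Int64' is the nullable pandas dtype, whereas lowercase 'int64' is the NumPy dtype and cannot hold missing values)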
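A short sketch of this cast on a float column with NaN, using made-up data. It also covers one case the question does not mention: if the floats have fractional parts, a direct cast to 'Int64' is expected to be rejected as unsafe, so this example truncates first with np.trunc (an assumption about the desired rounding, mirroring what int(x) does).

```python
import numpy as np
import pandas as pd

# Hypothetical float column with a missing value
df = pd.DataFrame({"col": [1.0, 2.0, np.nan]})
df["col"] = df["col"].astype("Int64")
print(df["col"])

# If values have fractional parts, drop them explicitly before casting;
# np.trunc rounds toward zero, like int(x) in the question
frac = pd.Series([1.7, 2.2, np.nan])
truncated = np.trunc(frac).astype("Int64")
print(truncated)
```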

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 skeuomorph