'Initialize a column with missing values and copy+transform another column of a dataframe into the initialized column
I have a messy column in a csv
file (column A of the dataframe).
using CSV, DataFrames
df = DataFrame(A = ["1", "3", "-", "4", missing, "9"], B = ["M", "F", "R", "G", "Z", "D"])
Want I want to do is:
- Transform the integer from string to numeric (e.g.
Float64
) - Transform the string
"-"
inmissing
The strategy would be to first define a new column vector filled with missing
df[:C] = fill(missing, size(df)[1])
and then perform the 2 transformations with for loops
for i in 1:size(df)[1]
if df[:A][i] == "-"
continue
else
df[:C][i] = parse(Float64,df[:A][i])
end
end
However, when looking at df[:C]
I have a column filled only with missing.
What am I doing wrong?
Solution 1:[1]
There are several issues with your code, but first let me show how I would write this transformation:
df.C = passmissing(parse).(Float64, replace(df.A, "-"=>missing))
It is not the most efficient way to do it but is simple to reason about.
An implementation using a loop could look like:
df.C = similar(df.A, Union{Float64, Missing});
for (i, a) in enumerate(df.A)
if !ismissing(a) && a != "-"
df.C[i] = parse(Float64, a)
else
df.C[i] = missing
end
end
Note that similar
by default will fill the df.C
with missing
so the else
part could be dropped, but this behavior is not documented so it is safer to write it.
You could also use a comprehension:
df. C = [ismissing(a) || a == "-" ? missing : parse(Float64, a) for a in df.A]
Now, to fix your code you could write:
# note a different initialization
# in your code df.C allowed only values of Missing type and disallows of Float64 type
df.C = Vector{Union{Float64, Missing}}(missing, size(df, 1))
for i in 1:size(df)[1]
# note that we need to handle missing value and "=" separately
if ismissing(df.A[i]) || df.A[i] == "-"
continue
else
df.C[i] = parse(Float64,df.A[i])
end
end
Finally note that it is preferred to write df.C
than df[:C]
to access a column in a data frame (currently both are equivalent but this might change in the future).
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |