Sklearn can't convert string to float
I'm using Sklearn as a machine learning tool, but every time I run my code, it gives this error:
Traceback (most recent call last):
File "C:\Users\FakeUserMadeUp\Desktop\Python\Machine Learning\MachineLearning.py", line 12, in <module>
model.fit(X_train, Y_train)
File "C:\Users\FakeUserMadeUp\AppData\Roaming\Python\Python37\site-packages\sklearn\tree\_classes.py", line 942, in fit
X_idx_sorted=X_idx_sorted,
File "C:\Users\FakeUserMadeUp\AppData\Roaming\Python\Python37\site-packages\sklearn\tree\_classes.py", line 166, in fit
X, y, validate_separately=(check_X_params, check_y_params)
File "C:\Users\FakeUserMadeUp\AppData\Roaming\Python\Python37\site-packages\sklearn\base.py", line 578, in _validate_data
X = check_array(X, **check_X_params)
File "C:\Users\FakeUserMadeUp\AppData\Roaming\Python\Python37\site-packages\sklearn\utils\validation.py", line 746, in check_array
array = np.asarray(array, order=order, dtype=dtype)
File "C:\Users\FakeUserMadeUp\AppData\Roaming\Python\Python37\site-packages\pandas\core\generic.py", line 1993, in __array__
return np.asarray(self._values, dtype=dtype)
ValueError: could not convert string to float: 'Paris'
Here is the code, and down below there's my dataset:
(I've tried several different datasets. Also, this dataset is a .txt file because I made it myself and am too dumb to convert it to CSV.)
import pandas as pd
from sklearn.tree import DecisionTreeClassifier as dtc
from sklearn.model_selection import train_test_split as tts
city_data = pd.read_csv('TimeZoneTable.txt')
X = city_data.drop(columns=['Country'])
Y = city_data['Country']
X_train, X_test, Y_train, Y_test = tts(X, Y, test_size = 0.2)
model = dtc()
model.fit(X_train, Y_train)
predictions = model.predict(X_test)
print(Y_test)
print(predictions)
Dataset:
CityName,Country,Latitude,Longitude,TimeZone
Moscow,Russia,55.45'N,37.37'E,3
Vienna,Austria,48.13'N,16.22'E,2
Barcelona,Spain,41.23'N,2.11'E,2
Madrid,Spain,40.25'N,3.42'W,2
Lisbon,Portugal,38.44'N,9.09'W,1
London,UK,51.30'N,0.08'W,1
Cardiff,UK,51.29'N,3.11'W,1
Edinburgh,UK,55.57'N,3.11'W,1
Dublin,Ireland,53.21'N,6.16'W,1
Paris,France,48.51'N,2.21'E,2
Solution 1:[1]
Most scikit-learn estimators, including the decision tree you are using, work only with numeric input, which is why the data is converted to float before fitting. For many models it is also recommended to scale the features, for example to the range [-1, 1], although tree-based models do not depend on this.
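In your case, the dataframe contains string columns: CityName, and also Latitude and Longitude, because values such as 48.51'N are read as text. Only TimeZone is already numeric. As Dilara Gokay said, you first need to transform your strings into numbers. For a categorical column such as CityName, use a one-hot encoder (OneHotEncoder in scikit-learn). You can follow this tutorial if you don't know how to do it.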
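As a rough illustration of the advice above, here is a minimal sketch that one-hot encodes CityName and converts the coordinate strings to numbers before fitting. It is not the only way to do it. The helper `to_decimal_degrees` is a hypothetical name, and it assumes that a value like 55.45'N means 55 degrees 45 minutes north; if your values are already decimal degrees, adjust it accordingly. The dataset is inlined so the snippet runs on its own.

```python
import io

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Inline copy of the dataset from the question (instead of TimeZoneTable.txt).
RAW = """CityName,Country,Latitude,Longitude,TimeZone
Moscow,Russia,55.45'N,37.37'E,3
Vienna,Austria,48.13'N,16.22'E,2
Barcelona,Spain,41.23'N,2.11'E,2
Madrid,Spain,40.25'N,3.42'W,2
Lisbon,Portugal,38.44'N,9.09'W,1
London,UK,51.30'N,0.08'W,1
Cardiff,UK,51.29'N,3.11'W,1
Edinburgh,UK,55.57'N,3.11'W,1
Dublin,Ireland,53.21'N,6.16'W,1
Paris,France,48.51'N,2.21'E,2
"""


def to_decimal_degrees(value):
    """Hypothetical helper: "3.42'W" -> -3.7.

    Assumes the format degrees.minutes'HEMISPHERE; S and W become negative.
    """
    coordinate, hemisphere = value.split("'")
    degrees, minutes = coordinate.split(".")
    decimal = int(degrees) + int(minutes) / 60
    return -decimal if hemisphere in ("S", "W") else decimal


city_data = pd.read_csv(io.StringIO(RAW))
city_data["Latitude"] = city_data["Latitude"].map(to_decimal_degrees)
city_data["Longitude"] = city_data["Longitude"].map(to_decimal_degrees)

X = city_data.drop(columns=["Country"])
Y = city_data["Country"]
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=0
)

# One-hot encode the text column and pass the numeric columns through as-is.
# handle_unknown="ignore" keeps predict() from failing on cities that were
# not seen during training.
preprocess = ColumnTransformer(
    [("city", OneHotEncoder(handle_unknown="ignore"), ["CityName"])],
    remainder="passthrough",
)
model = make_pipeline(preprocess, DecisionTreeClassifier(random_state=0))
model.fit(X_train, Y_train)

predictions = model.predict(X_test)
print(Y_test.tolist())
print(predictions.tolist())
```

Since every city name occurs only once in this dataset, the one-hot columns carry little information here, and dropping CityName altogether would probably work just as well. With only ten rows, don't expect meaningful predictions either way.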
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Nicolas Bzrd |