Sklearn can't convert string to float

I'm using Sklearn as a machine learning tool, but every time I run my code, it gives this error:

Traceback (most recent call last):
  File "C:\Users\FakeUserMadeUp\Desktop\Python\Machine Learning\MachineLearning.py", line 12, in <module>
    model.fit(X_train, Y_train)
  File "C:\Users\FakeUserMadeUp\AppData\Roaming\Python\Python37\site-packages\sklearn\tree\_classes.py", line 942, in fit
    X_idx_sorted=X_idx_sorted,
  File "C:\Users\FakeUserMadeUp\AppData\Roaming\Python\Python37\site-packages\sklearn\tree\_classes.py", line 166, in fit
    X, y, validate_separately=(check_X_params, check_y_params)
  File "C:\Users\FakeUserMadeUp\AppData\Roaming\Python\Python37\site-packages\sklearn\base.py", line 578, in _validate_data
    X = check_array(X, **check_X_params)
  File "C:\Users\FakeUserMadeUp\AppData\Roaming\Python\Python37\site-packages\sklearn\utils\validation.py", line 746, in check_array
    array = np.asarray(array, order=order, dtype=dtype)
  File "C:\Users\FakeUserMadeUp\AppData\Roaming\Python\Python37\site-packages\pandas\core\generic.py", line 1993, in __array__
    return np.asarray(self._values, dtype=dtype)
ValueError: could not convert string to float: 'Paris'

Here is the code, and down below there's my dataset:

(I've tried several different datasets. Also, this dataset is a .txt file because I made it myself and am too dumb to convert it to CSV.)

    import pandas as pd
    from sklearn.tree import DecisionTreeClassifier as dtc
    from sklearn.model_selection import train_test_split as tts

    city_data = pd.read_csv('TimeZoneTable.txt')
    X = city_data.drop(columns=['Country'])
    Y = city_data['Country']

    X_train, X_test, Y_train, Y_test = tts(X, Y, test_size = 0.2)

    model = dtc()
    model.fit(X_train, Y_train)
    predictions = model.predict(X_test)

    print(Y_test)
    print(predictions)

Dataset:

    CityName,Country,Latitude,Longitude,TimeZone
    Moscow,Russia,55.45'N,37.37'E,3
    Vienna,Austria,48.13'N,16.22'E,2
    Barcelona,Spain,41.23'N,2.11'E,2
    Madrid,Spain,40.25'N,3.42'W,2
    Lisbon,Portugal,38.44'N,9.09'W,1
    London,UK,51.30'N,0.08'W,1
    Cardiff,UK,51.29'N,3.11'W,1
    Edinburgh,UK,55.57'N,3.11'W,1
    Dublin,Ireland,53.21'N,6.16'W,1
    Paris,France,48.51'N,2.21'E,2


Solution 1:[1]

scikit-learn estimators, including the decision tree you are using, only accept numeric input. For many models it is also recommended to scale the features, for example to the range [-1, 1], which produces decimal numbers; hence the expectation of a float.
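To see where the message comes from, here is a minimal sketch that reproduces the failing step outside of scikit-learn. The two-row frame is made up for illustration; the conversion call mirrors the np.asarray line at the bottom of the traceback.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame with one string column and one numeric column.
frame = pd.DataFrame({"CityName": ["Paris", "London"], "TimeZone": [2, 1]})

message = None
try:
    # scikit-learn does essentially this before fitting: force X to floats.
    np.asarray(frame, dtype=float)
except ValueError as exc:
    message = str(exc)
    print(message)

# The same conversion works once only numeric columns are left.
numeric_only = np.asarray(frame[["TimeZone"]], dtype=float)
print(numeric_only.dtype)
```

So the error is not specific to the tree model: any column that still holds text when fit is called will trigger it.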

In your case, several columns of your dataframe (CityName, Latitude and Longitude) contain strings. As Dilara Gokay said, you first need to turn those strings into numbers. For a categorical column such as CityName, the usual tool is a one-hot encoder (OneHotEncoder in sklearn.preprocessing). I'll let you follow this tutorial if you don't know how to do it.
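As a rough sketch of what that could look like for the dataset in the question: the helper to_signed is a hypothetical name, and it assumes the number before the apostrophe can be read as a plain decimal, with S and W treated as negative. That is a simplification, since the values may really be degrees and minutes. The model is fitted on all ten rows here just to keep the example short.

```python
import io

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

RAW = """CityName,Country,Latitude,Longitude,TimeZone
Moscow,Russia,55.45'N,37.37'E,3
Vienna,Austria,48.13'N,16.22'E,2
Barcelona,Spain,41.23'N,2.11'E,2
Madrid,Spain,40.25'N,3.42'W,2
Lisbon,Portugal,38.44'N,9.09'W,1
London,UK,51.30'N,0.08'W,1
Cardiff,UK,51.29'N,3.11'W,1
Edinburgh,UK,55.57'N,3.11'W,1
Dublin,Ireland,53.21'N,6.16'W,1
Paris,France,48.51'N,2.21'E,2
"""


def to_signed(coord):
    """Turn a string such as "3.42'W" into -3.42 (assumption: S and W are negative)."""
    value, hemisphere = coord.split("'")
    sign = -1.0 if hemisphere in ("S", "W") else 1.0
    return sign * float(value)


city_data = pd.read_csv(io.StringIO(RAW))

# Coordinates are really numbers, so parse them instead of one-hot encoding them.
city_data["Latitude"] = city_data["Latitude"].map(to_signed)
city_data["Longitude"] = city_data["Longitude"].map(to_signed)

X = city_data.drop(columns=["Country"])
Y = city_data["Country"]

# One-hot encode the city name; pass the numeric columns through unchanged.
encode = ColumnTransformer(
    [("city", OneHotEncoder(handle_unknown="ignore"), ["CityName"])],
    remainder="passthrough",
)

model = make_pipeline(encode, DecisionTreeClassifier(random_state=0))
model.fit(X, Y)
predictions = model.predict(X)
print(predictions)
```

Putting the encoder inside a pipeline means the same transformation is applied at predict time, and handle_unknown="ignore" keeps a city that was not seen during training from raising an error. With real data you would still split into train and test sets as in the question.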

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Nicolas Bzrd