How can I convert a price given in lakh into the actual price with int datatype? [closed]
I am trying to convert the values in this column into actual numbers so that I can use them in a machine learning algorithm. This label is what I want to predict, so I need to give it to my model as training input before doing the actual price prediction. However, the price is given as a range, which I am finding difficult to convert. How can I convert this combination of numbers and text into a proper number with the int data type? (The column currently has the object data type.)
About this dataset: it lists used cars, the price at which each one was sold to a customer, and the price of the same car when bought new. I want to build a model in which the user supplies the new-car price range, the car company name, and several other fields, and the model returns the expected price of the used car.
But I am stuck on what to do with this field: it is a range, and I cannot drop it because it is one of the main factors that determine the used car price.
Rs means Indian Rupees (a currency, similar to the Dollar)
10 Lakh = 1 million, or equivalently
1 Lakh = 100 thousand (100,000)
Solution 1:[1]
I didn't have a minimal reproducible example, so I created a demo dataframe similar to yours.
import pandas as pd
df = pd.DataFrame({
    'selling_price': ['5.5 Lakh*', '5.7 Lakh*', '3.5 Lakh*', '3.15 Lakh*'],
    'new-price': ['Rs.7.11-7.48 Lakh*', 'Rs.10.14-13.79 Lakh*', 'Rs.5.16-6.94 Lakh*', 'Rs.6.54-6.63 Lakh*'],
})
# Convert the selling_price column to a list, strip the ' Lakh*' characters from each value,
# multiply by 100000 (1 Lakh), and build a new dataframe from the result.
# round() returns an int and avoids truncation caused by floating-point error.
pd.DataFrame({'selling_price': [round(float(str(x).strip(' Lakh*')) * 100000) for x in df['selling_price'].to_list()]})
# The new-price column can be cleaned in a similar way.
#output
   selling_price
0         550000
1         570000
2         350000
3         315000
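One caveat with the approach above: str.strip removes any of the listed characters from both ends of the string, not a literal suffix. A possible alternative, shown here only as a sketch, is to pull the number out with a regular expression and let pandas do the conversion on the whole column. The column name selling_price_rs and the constant LAKH are my own names for illustration, not part of the original dataset.

```python
import pandas as pd

df = pd.DataFrame({'selling_price': ['5.5 Lakh*', '5.7 Lakh*', '3.5 Lakh*', '3.15 Lakh*']})

LAKH = 100_000  # 1 Lakh = 100,000 rupees (hypothetical constant name)

# Extract the first decimal number from each string as a float Series.
number = df['selling_price'].str.extract(r'(\d+(?:\.\d+)?)', expand=False).astype(float)

# Round before casting so floating-point error cannot truncate the value.
# 'selling_price_rs' is an illustrative column name.
df['selling_price_rs'] = (number * LAKH).round().astype('int64')

print(df['selling_price_rs'].tolist())
```

This assumes every row contains a number; a row without one would produce NaN, and the cast to int64 would then fail, so such rows would need to be handled first.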
I removed 'Rs.' and ' Lakh*' from the new-price column as well:
[x.strip(' Lakh*').strip('Rs.') for x in df['new-price'].to_list()]
#output
['7.11-7.48', '10.14-13.79', '5.16-6.94', '6.54-6.63']
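The strings above are still text ranges, not integers. One way to finish the conversion, sketched here under assumptions, is to split each range into a lower and an upper bound in rupees. The column names new_price_min, new_price_max and new_price_mid are hypothetical, and using the midpoint as a single feature is my assumption, not something the dataset prescribes; keeping both bounds as separate features is equally reasonable.

```python
import pandas as pd

df = pd.DataFrame({'new-price': ['Rs.7.11-7.48 Lakh*', 'Rs.10.14-13.79 Lakh*',
                                 'Rs.5.16-6.94 Lakh*', 'Rs.6.54-6.63 Lakh*']})

LAKH = 100_000  # 1 Lakh = 100,000 rupees (hypothetical constant name)

# Capture the numbers on either side of the hyphen as named groups.
pattern = r'(?P<low>\d+(?:\.\d+)?)\s*-\s*(?P<high>\d+(?:\.\d+)?)'
bounds = df['new-price'].str.extract(pattern).astype(float)

# Convert each bound to whole rupees; round first to absorb float error.
df['new_price_min'] = (bounds['low'] * LAKH).round().astype('int64')
df['new_price_max'] = (bounds['high'] * LAKH).round().astype('int64')

# Assumption: summarise the range by its midpoint, using integer arithmetic.
df['new_price_mid'] = (df['new_price_min'] + df['new_price_max']) // 2

print(df[['new_price_min', 'new_price_max', 'new_price_mid']])
```

Rows that hold a single price instead of a range would not match this pattern and would come back as NaN, so they would need separate handling before the integer cast.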
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |