How can I convert prices given in lakh into actual numbers with an int datatype? [closed]

I am trying to convert the values in this column into actual numbers so that I can use them in a machine learning algorithm. This column is the label I want to predict, so I need to feed it to my model during training before doing any actual price prediction. The difficulty is that the price is given as a range. How can I convert this combination of numbers and text into a proper number with an int data type? (The column currently has the object data type.)

About this dataset: it lists used cars, the price each one was sold at, and the price of the same car when bought new. I want to build a model in which the user supplies the new-car price range, the car company name, and several other fields, and the model returns the expected price of the used car.

But I am stuck on what to do with this field: it is a range, and I cannot drop it because it is one of the main factors that determine the used-car price.

Rs means Indian Rupees (a currency, like the dollar).

10 Lakh = 1 million, or equivalently 1 Lakh = 100 thousand.

[Image: the two columns in question]



Solution 1:[1]

Since there was no minimal reproducible example, I created a demo dataframe similar to yours.

import pandas as pd
df = pd.DataFrame({'selling_price' : ['5.5 Lakh*', '5.7 Lakh*', '3.5 Lakh*', '3.15 Lakh*'],
                   'new-price':['Rs.7.11-7.48 Lakh*','Rs.10.14-13.79 Lakh*','Rs.5.16-6.94 Lakh*','Rs.6.54-6.63 Lakh*',]})

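# Convert the selling_price column to a list, strip ' Lakh*' from each value,
# scale from lakh to rupees, and build a new dataframe from the result.
# Note: strip(' Lakh*') removes any of the characters ' ', 'L', 'a', 'k', 'h', '*'
# from both ends of the string, which leaves just the number here.
# round() is applied before int() so that float error cannot truncate a value by one rupee.
pd.DataFrame({'selling_price': [int(round(float(str(x).strip(' Lakh*')) * 100000)) for x in df['selling_price'].to_list()]})

# You can do the same for the new-price column.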


#output

    selling_price
0   550000
1   570000
2   350000
3   315000
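
The same conversion can also be done without leaving pandas. This is only a sketch under the assumption that every value contains exactly one number followed by a lakh marker, as in the demo dataframe; the `LAKH` constant and the `selling_price_rs` column name are made up for illustration.

```python
import pandas as pd

LAKH = 100_000  # 1 lakh = 100,000 rupees (hypothetical helper constant)

df = pd.DataFrame({'selling_price': ['5.5 Lakh*', '5.7 Lakh*', '3.5 Lakh*', '3.15 Lakh*']})

# Extract the first number (integer or decimal) from each string.
# With a single capture group and expand=False, str.extract returns a Series.
lakhs = df['selling_price'].str.extract(r'(\d+(?:\.\d+)?)', expand=False).astype(float)

# Scale to rupees, round away float noise, then cast to an integer dtype.
df['selling_price_rs'] = (lakhs * LAKH).round().astype(int)

print(df['selling_price_rs'].tolist())
```

A regex is used here instead of `strip` because it picks out the number itself rather than relying on which characters happen to surround it. A value with no number in it would become NaN, and the final `astype(int)` would then raise, so such rows need handling first.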

I stripped 'Rs.' and ' Lakh*' from the new-price column as well, which leaves each range as a string:

[x.strip('Rs.') for x in [x.strip(' Lakh*') for x in df['new-price'].to_list()]]

#output

['7.11-7.48', '10.14-13.79', '5.16-6.94', '6.54-6.63']
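
The strings above are still ranges, not integers. One possible way to finish the job is to split each range into a lower and an upper bound and keep both (plus a midpoint) as integer columns. This is a sketch, not the only option: the column names `new_price_min`, `new_price_max` and `new_price_mid` are invented here, and it assumes every row has the form 'Rs.<low>-<high> Lakh*'.

```python
import pandas as pd

LAKH = 100_000  # 1 lakh = 100,000 rupees (hypothetical helper constant)

df = pd.DataFrame({'new-price': ['Rs.7.11-7.48 Lakh*', 'Rs.10.14-13.79 Lakh*',
                                 'Rs.5.16-6.94 Lakh*', 'Rs.6.54-6.63 Lakh*']})

# Capture the numbers on either side of the hyphen as named groups;
# str.extract returns a dataframe with one column per named group.
pattern = r'(?P<low>\d+(?:\.\d+)?)\s*-\s*(?P<high>\d+(?:\.\d+)?)'
bounds = df['new-price'].str.extract(pattern).astype(float)

# Scale each bound to rupees, round away float noise, and cast to int.
df['new_price_min'] = (bounds['low'] * LAKH).round().astype(int)
df['new_price_max'] = (bounds['high'] * LAKH).round().astype(int)

# The midpoint is one possible single-number summary of the range.
df['new_price_mid'] = ((bounds['low'] + bounds['high']) / 2 * LAKH).round().astype(int)

print(df[['new_price_min', 'new_price_max', 'new_price_mid']])
```

Whether the model should see the minimum, the maximum, the midpoint, or some combination is a modelling choice. Rows that hold a single price instead of a range would not match this pattern and would come out as NaN, so they would have to be handled before the cast to int.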

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
