Drop duplicates, excluding specific columns, and keep the lowest value
I have this example dataset:
CPU_Sub_Series  RAM  Screen_Size  Resolution  Price
Intel i5          8         15.6  1920x1080     699
Intel i5          8         15.6  1920x1080     569
Intel i5          8         15.6  1920x1080     789
Ryzen 5          16         16.0  2560x1600     999
Ryzen 5          32         16.0  2560x1600    1299
What I want to do is find the rows that are duplicates in every column except Price, drop them, and keep only the row with the lowest Price from each group.
So the expected output is:
CPU_Sub_Series  RAM  Screen_Size  Resolution  Price
Intel i5          8         15.6  1920x1080     569
Ryzen 5          16         16.0  2560x1600     999
Ryzen 5          32         16.0  2560x1600    1299
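For the "check" step, `DataFrame.duplicated` with a `subset` of every column except Price marks which rows count as duplicates before anything is dropped. A minimal sketch, assuming the column names from the example above (`key_cols` and `dup_mask` are just local names chosen here); `keep=False` flags every member of a duplicate group rather than only the repeats:

```python
import pandas as pd

df = pd.DataFrame({
    'CPU_Sub_Series': ['Intel i5', 'Intel i5', 'Intel i5', 'Ryzen 5', 'Ryzen 5'],
    'RAM': [8, 8, 8, 16, 32],
    'Screen_Size': [15.6, 15.6, 15.6, 16.0, 16.0],
    'Resolution': ['1920x1080', '1920x1080', '1920x1080', '2560x1600', '2560x1600'],
    'Price': [699, 569, 789, 999, 1299],
})

# Every column except Price defines "the same laptop"
key_cols = [c for c in df.columns if c != 'Price']

# keep=False marks all rows that belong to a duplicate group
dup_mask = df.duplicated(subset=key_cols, keep=False)
print(df[dup_mask])
```

Here only the three Intel i5 rows are flagged; the two Ryzen 5 rows differ in RAM, so they are not duplicates of each other.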
Should I sort it by price first with df.sort_values('Price')? And then what?
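Sorting first does work: once the rows are ordered by ascending Price, the cheapest row of each group comes first, so `drop_duplicates` with `keep='first'` and a `subset` of the other columns keeps exactly that row. A minimal sketch, assuming the column names from the example dataset (`key_cols` and `result` are local names chosen for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'CPU_Sub_Series': ['Intel i5', 'Intel i5', 'Intel i5', 'Ryzen 5', 'Ryzen 5'],
    'RAM': [8, 8, 8, 16, 32],
    'Screen_Size': [15.6, 15.6, 15.6, 16.0, 16.0],
    'Resolution': ['1920x1080', '1920x1080', '1920x1080', '2560x1600', '2560x1600'],
    'Price': [699, 569, 789, 999, 1299],
})

# All columns except Price decide whether two rows are duplicates
key_cols = [c for c in df.columns if c != 'Price']

result = (
    df.sort_values('Price')                          # cheapest row of each group first
      .drop_duplicates(subset=key_cols, keep='first')  # keep only that cheapest row
      .sort_index()                                  # restore the original row order
      .reset_index(drop=True)                        # renumber 0..n-1
)
print(result)
```

The `sort_index()` call is optional; it only puts the surviving rows back in the order they had in the original DataFrame.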
Solution 1:[1]
In addition to @Daniele Bianco's answer, you can also get the result like this (a very similar approach in a slightly different form):
import pandas as pd
df = pd.DataFrame({
    'CPU_Sub_Series': ['Intel i5', 'Intel i5', 'Intel i5', 'Ryzen 5', 'Ryzen 5'],
    'RAM': [8, 8, 8, 16, 32],
    'Screen_Size': [15.6, 15.6, 15.6, 16.0, 16.0],
    'Resolution': ['1920x1080', '1920x1080', '1920x1080', '2560x1600', '2560x1600'],
    'Price': [699, 569, 789, 999, 1299]
})
# Group by every column except Price and keep the minimum Price in each group
df = df.groupby(["CPU_Sub_Series", "RAM", "Screen_Size", "Resolution"])['Price'].min().reset_index()
print(df)
#  CPU_Sub_Series RAM Screen_Size Resolution Price
#0       Intel i5   8        15.6  1920x1080   569
#1        Ryzen 5  16        16.0  2560x1600   999
#2        Ryzen 5  32        16.0  2560x1600  1299
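One thing to keep in mind with the `groupby(...)['Price'].min()` form: the result contains only the grouping columns and Price. If the real data has further columns that should be excluded from the duplicate check but still kept in the output (a hypothetical `Model` column is used below), `idxmin` returns the index label of the cheapest row in each group, and `df.loc` then selects those whole rows. A sketch under that assumption:

```python
import pandas as pd

df = pd.DataFrame({
    'CPU_Sub_Series': ['Intel i5', 'Intel i5', 'Intel i5', 'Ryzen 5', 'Ryzen 5'],
    'RAM': [8, 8, 8, 16, 32],
    'Screen_Size': [15.6, 15.6, 15.6, 16.0, 16.0],
    'Resolution': ['1920x1080', '1920x1080', '1920x1080', '2560x1600', '2560x1600'],
    'Price': [699, 569, 789, 999, 1299],
    # Hypothetical extra column, NOT part of the duplicate check
    'Model': ['A', 'B', 'C', 'D', 'E'],
})

key_cols = ['CPU_Sub_Series', 'RAM', 'Screen_Size', 'Resolution']

# idxmin gives the index label of the lowest Price within each group
cheapest_idx = df.groupby(key_cols)['Price'].idxmin()

# Select those full rows (Model included) and renumber the index
result = df.loc[cheapest_idx].reset_index(drop=True)
print(result)
```

With the sample data this keeps the rows with prices 569, 999 and 1299, together with their `Model` values.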
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Park |