How can I find the "non-unique" rows?
I imported CSV files with over 500k rows each: one year of data, one row per minute. To merge two of these files, I want to resample the index to one-minute frequency:
```python
import pandas as pd

Temp = pd.read_csv("Temp.csv", sep=";", decimal=",", thousands=".", encoding="cp1252")
Temp["Time"] = pd.to_datetime(Temp["Time"], dayfirst=True)
Temp.set_index("Time", inplace=True)
Temp = Temp.resample("1Min").ffill()
```
But I got the error:

```
cannot reindex a non-unique index with a method or limit
```
How can I find the "non-unique" rows?
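For context, this error is easy to reproduce: upsampling with `resample(...).ffill()` has to reindex with a fill method, and that reindexing fails when the source index contains duplicate timestamps. A minimal sketch with made-up data (the exact error wording varies between pandas versions):

```python
import pandas as pd

# Made-up data: two rows share the timestamp 00:00, so the index is non-unique.
idx = pd.to_datetime(["2020-01-01 00:00", "2020-01-01 00:00", "2020-01-01 00:02"])
df = pd.DataFrame({"value": [1.0, 2.0, 3.0]}, index=idx)

# Upsampling reindexes with a fill method, which requires a unique index,
# so this raises a ValueError similar to the one above.
try:
    df.resample("1Min").ffill()
except ValueError as err:
    print(err)
```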
Solution 1:
My solution:
```python
import pandas as pd

Temp = pd.read_csv("Temp.csv", sep=";", decimal=",", thousands=".", encoding="cp1252")
# Drop fully identical rows before building the index, so that the
# timestamps become unique and resample().ffill() can reindex.
Temp.drop_duplicates(inplace=True)
Temp["Time"] = pd.to_datetime(Temp["Time"], dayfirst=True)
Temp.set_index("Time", inplace=True)
Temp = Temp.resample("1Min").ffill()
```
I used `len(Temp.index)` and `len(set(Temp.index))` to find out that there are duplicates.
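Comparing those two lengths only confirms that duplicates exist. To see which rows actually share a timestamp, a sketch along these lines should work; `Index.duplicated` is standard pandas, and the file layout is assumed from the question:

```python
import pandas as pd

Temp = pd.read_csv("Temp.csv", sep=";", decimal=",", thousands=".", encoding="cp1252")
Temp["Time"] = pd.to_datetime(Temp["Time"], dayfirst=True)
Temp.set_index("Time", inplace=True)

# Count total vs. unique timestamps to confirm duplicates exist.
print(len(Temp.index), len(set(Temp.index)))

# Index.duplicated(keep=False) marks every occurrence of a repeated
# timestamp, so this selects all rows whose index value is non-unique.
print(Temp[Temp.index.duplicated(keep=False)])
```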
Solution 2:
You can select all duplicated rows using `df.duplicated()`.
In your case:

```python
Temp[Temp.duplicated(subset=None, keep=False)]
```

where `subset` can be changed if you want to find duplicates only in specific columns, and `keep=False` means that all duplicated rows are returned, regardless of whether it is the first or a later occurrence.
Documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.duplicated.html
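To illustrate what the `subset` argument changes, here is a small sketch with made-up data (the column names are hypothetical): rows that share a timestamp but differ in other columns are only caught when the check is restricted to the time column, which is exactly the case a whole-row `drop_duplicates()` would miss.

```python
import pandas as pd

# Made-up data: two rows share a timestamp but carry different values.
Temp = pd.DataFrame({
    "Time": pd.to_datetime(["2020-01-01 00:00", "2020-01-01 00:00", "2020-01-01 00:01"]),
    "Value": [1.0, 2.0, 3.0],
})

# subset=None compares whole rows, so nothing is flagged here.
print(Temp[Temp.duplicated(subset=None, keep=False)])     # empty frame

# subset=["Time"] compares timestamps only, so both 00:00 rows show up.
print(Temp[Temp.duplicated(subset=["Time"], keep=False)])
```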
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution   | Source  |
|------------|---------|
| Solution 1 | kolja   |
| Solution 2 | Sherman |