How can I find the "non-unique" rows?

I imported CSV files with over 500k rows: one year of data, one row per minute. To merge two of these files, I want to re-sample the index to every minute:

import pandas as pd

Temp = pd.read_csv("Temp.csv", sep=";", decimal=",", thousands=".", encoding="cp1252")

Temp["Time"] = pd.to_datetime(Temp["Time"], dayfirst=True)
Temp.set_index(["Time"], inplace=True)
Temp = Temp.resample("1Min").ffill()

But I got the error:

cannot reindex a non-unique index with a method or limit

How can I find the "non-unique" rows?



Solution 1:[1]

My solution:

Temp = pd.read_csv("Temp.csv", sep=";", decimal=",", thousands=".", encoding="cp1252")
Temp.drop_duplicates(inplace=True)  # remove fully duplicated rows before setting the index
Temp["Time"] = pd.to_datetime(Temp["Time"], dayfirst=True)
Temp.set_index(["Time"], inplace=True)
Temp = Temp.resample("1Min").ffill()

I used:

len(Temp.index)

and

len(set(Temp.index))

to find out that there are duplicates.
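A minimal sketch of that check, using an invented example frame: if the two lengths differ, the index contains duplicates, and Index.duplicated() then shows exactly which timestamps repeat.

import pandas as pd

# Invented example: four rows, with the timestamp 00:01 appearing twice
idx = pd.to_datetime(["2020-01-01 00:00", "2020-01-01 00:01",
                      "2020-01-01 00:01", "2020-01-01 00:02"])
Temp = pd.DataFrame({"value": [1.0, 2.0, 2.0, 3.0]}, index=idx)

# If these two numbers differ, the index contains duplicates
print(len(Temp.index), len(set(Temp.index)))   # 4 3

# List the offending timestamps directly
print(Temp.index[Temp.index.duplicated()])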

Solution 2:[2]

You can select every duplicated row by masking with df.duplicated()

In your case:

Temp[Temp.duplicated(subset=None, keep=False)]

where subset can be restricted if you want to find duplicates only in specific columns, and keep=False marks every row that is duplicated, regardless of whether it is the first or a later appearance.
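As a minimal sketch (the two-column frame below is invented for illustration), keep=False flags both occurrences of the repeated row, whereas the default keep='first' would flag only the second one:

import pandas as pd

df = pd.DataFrame({"Time": ["00:00", "00:01", "00:01", "00:02"],
                   "value": [1.0, 2.0, 2.0, 3.0]})

# keep=False marks every appearance of a duplicated row as True
print(df[df.duplicated(subset=None, keep=False)])
#     Time  value
# 1  00:01    2.0
# 2  00:01    2.0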

Documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.duplicated.html

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: kolja
Solution 2: Sherman