'How to use pandas.to_sql but only add row if row doesn't exist yet
I have some experience with python but very new to the SQL thing and trying to use pandas.to_sql to add table data into my database, but when I add I want it to check if the data exists before append
This are my 2 dataframes
>>> df0.to_markdown()
| | Col1 | Col2 |
|---:|-------:|-------:|
| 0 | 0 | 00 |
| 1 | 1 | 11 |
>>> df1.to_markdown()
| | Col1 | Col2 |
|---:|-------:|-------:|
| 0 | 0 | 00 |
| 1 | 1 | 11 |
| 2 | 2 | 22 |
So here I use the pandas to_sql
>>> df0.to_sql(con=con, name='test_db', if_exists='append', index=False)
>>> df1.to_sql(con=con, name='test_db', if_exists='append', index=False)
Here I check my data inside the database file
>>> df_out = pd.read_sql("""SELECT * FROM test_db""", con)
>>> df_out.to_markdown()
| | Col1 | Col2 |
|---:|-------:|-------:|
| 0 | 0 | 0 |
| 1 | 1 | 11 |
| 2 | 0 | 0 | # Duplicate
| 3 | 1 | 11 | # Duplicate
| 4 | 2 | 22 |
But I want my database to look like this, so I don't want to add the duplicate data to my database
| | Col1 | Col2 |
|---:|-------:|-------:|
| 0 | 0 | 0 |
| 1 | 1 | 11 |
| 3 | 2 | 22 |
Is there any option I can set or some line of code to add to make this happend?
Thankyou!
edit: There are some SQL code to only pull unique data, but what I want to do is don't add the data to the database in the first place
Solution 1:[1]
Don't use to_sql a simple query can work
query = text(f""" INSERT INTO test_db VALUES {','.join([str(i) for i in list(df0.to_records(index=False))])} ON CONFLICT ON CONSTRAINT test_db_pkey DO NOTHING""")
self.engine.connect().execute(query)
For each DataFrame change df0 to df1
Follow these link for a better understanding
Solution 2:[2]
For example in my Sqlite 3 database I used temporary table:
I inserted all the dataframe data into temporary table:
df0.to_sql(con=con, name='temptable', if_exists='append', index=False)
df1.to_sql(con=con, name='temptable', if_exists='append', index=False)
Then I copy only new data and drop (delete) the table:
con.executescript('''
INSERT INTO test_db
SELECT test_db.* FROM temptable
LEFT JOIN test_db on
test_db.Col1 = temptable.Col1
WHERE test_db.Col1 IS NULL; -- only items, that not presented in 'test_db' table
DROP TABLE temptable;
''')
Solution 3:[3]
add this code to your function
remove_duplicate = 'EXEC remove_duplicate'
cursor.execute(remove_duplicate)
cursor.commit()
and create procedure in your db:
CREATE PROCEDURE remove_duplicate AS
BEGIN
;WITH duplicates as (SELECT col1, col2,
ROW_NUMBER() OVER (PARTITION BY col1, col2, ORDER BY col1) AS number_of_duplicates
FROM dbo.table)
DELETE FROM duplicates WHERE number_of_duplicates > 1
END
go
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Sanchit Singh |
Solution 2 | |
Solution 3 | Bartol |