Is there a way to insert data into an SQL table using Spark JDBC WITHOUT inserting duplicates AND WITHOUT losing already existing data?

I'm trying to write a Spark DataFrame into a PostgreSQL table using df.write.jdbc. The problem is that I want to make sure not to lose the data already in the table (hence using SaveMode.Append), while also avoiding inserting duplicates of data that is already in it.
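For context, this is roughly the call I'm making (the connection URL, table name, credentials, and source path below are just placeholders):

```scala
import java.util.Properties

import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().appName("JdbcAppendExample").getOrCreate()

// Placeholder DataFrame standing in for the new records I want to insert.
val df = spark.read.parquet("/path/to/new_records")

// Placeholder connection details for the PostgreSQL instance.
val url = "jdbc:postgresql://localhost:5432/mydb"
val props = new Properties()
props.setProperty("user", "postgres")
props.setProperty("password", "secret")
props.setProperty("driver", "org.postgresql.Driver")

// SaveMode.Append keeps the existing rows but re-inserts duplicates;
// with a primary key on the table it fails with a unique-constraint violation instead.
df.write
  .mode(SaveMode.Append)
  .jdbc(url, "public.my_table", props)
```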

  1. If I use SaveMode.Overwrite:
  • The table gets dropped, losing all previous data.
  2. If I use SaveMode.Append:
  • The table doesn't get dropped, but the duplicate records get inserted.
  • If I use this mode together with a primary key already in the database (which would provide the unique constraint), it returns an error.

Is there some kind of option to solve this? Thanks


