Python Postgres Package: psycopg2 copy_from vs copy_expert

Requirement: load millions of rows from S3 into a table using Python while avoiding memory issues.

I see that psycopg2 offers two methods: copy_from and copy_expert.

Which of these is more efficient and avoids memory issues?

Also, I see that Redshift (which is Postgres-based) supports a COPY command to load data directly from an S3 file, but I am not sure whether a plain Postgres database supports such a feature.



Solution 1:

My implementation changed from copy_from to copy_expert. An extensive analysis of loading data into PostgreSQL from Python can be found here: https://hakibenita.com/fast-load-data-python-postgresql.

COPY_FROM

import io
import pandas as pd
import psycopg2

def insert_with_string_io(df: pd.DataFrame, table_name: str):
    # Write the DataFrame to an in-memory CSV buffer, then COPY it in one round trip.
    buffer = io.StringIO()
    df.to_csv(buffer, index=False, header=False)
    buffer.seek(0)
    with conn.cursor() as cursor:  # assumes an open psycopg2 connection `conn`
        try:
            cursor.copy_from(file=buffer, table=table_name, sep=",", null="")
            conn.commit()  # persist the load (drop if the caller manages transactions)
        except (Exception, psycopg2.DatabaseError) as error:
            print("Error: %s" % error)

COPY_EXPERT

def insert_with_string_io(df: pd.DataFrame):
    # Write the DataFrame to an in-memory CSV buffer, then COPY it in one round trip.
    buffer = io.StringIO()
    df.to_csv(buffer, index=False, header=False)
    buffer.seek(0)
    with conn.cursor() as cursor:  # assumes an open psycopg2 connection `conn`
        try:
            # Replace <schema>.<table> with the target table; COPY always loads
            # into the database the connection is open on.
            cursor.copy_expert("COPY <schema>.<table> FROM STDIN (FORMAT 'csv', HEADER false)", buffer)
            conn.commit()  # persist the load (drop if the caller manages transactions)
        except (Exception, psycopg2.DatabaseError) as error:
            print("Error: %s" % error)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Hale4029