Inserting Data to SQL Server from a Python Dataframe Quickly

I have been trying to insert data from a dataframe in Python into a table that already exists in SQL Server. The dataframe has 90K rows, and I want the fastest possible way to insert the data into the table. I only have read, write, and delete permissions on the server, and I cannot create any tables.

Below is the code I am using to insert the data, but it is very slow. Please advise.

import pandas as pd
import pyodbc

df = pd.read_excel(r"Url path\abc.xlsx")
conn = pyodbc.connect('Driver={ODBC Driver 11 for SQL Server};'
                      'SERVER=Server Name;'
                      'Database=Database Name;'
                      'UID=User ID;'
                      'PWD=Password;'
                      'Trusted_Connection=no;')
cursor = conn.cursor()
# Delete the existing data in the SQL table
cursor.execute("DELETE FROM database.schema.TableName")
conn.commit()
# Insert the dataframe's rows one at a time (slow)
for index, row in df.iterrows():
    cursor.execute("INSERT INTO database.schema.TableName([A],[B],[C]) VALUES (?,?,?)", row['A'], row['B'], row['C'])
conn.commit()
cursor.close()
conn.close()


Solution 1:[1]

To insert data much faster, try using sqlalchemy and df.to_sql. This requires you to create an engine with sqlalchemy; to speed things up, pass the option fast_executemany=True.

import urllib.parse
import sqlalchemy

connect_string = urllib.parse.quote_plus(f'DRIVER={{ODBC Driver 11 for SQL Server}};Server=<Server Name>,<port>;Database=<Database name>')
engine = sqlalchemy.create_engine(f'mssql+pyodbc:///?odbc_connect={connect_string}', fast_executemany=True)

with engine.connect() as connection:
    # if_exists='append' keeps the existing table and appends the rows;
    # the default ('fail') errors out because the table already exists
    df.to_sql(<table name>, connection, index=False, if_exists='append')

Solution 2:[2]

This should do what you want; it's a very generic example.

# Insert from dataframe to table in SQL Server
import time
import pandas as pd
import pyodbc

# start timer
start_time = time.time()


df = pd.read_csv("C:\\your_path\\CSV1.csv")

conn_str = (
    r'DRIVER={SQL Server Native Client 11.0};'
    r'SERVER=Excel-PC\SQLEXPRESS;'
    r'DATABASE=NORTHWND;'
    r'Trusted_Connection=yes;'
)
cnxn = pyodbc.connect(conn_str)

cursor = cnxn.cursor()

for index, row in df.iterrows():
    cursor.execute('INSERT INTO dbo.Table_1([Name],[Address],[Age],[Work]) values (?,?,?,?)',
                   row['Name'],
                   row['Address'],
                   row['Age'],
                   row['Work'])
# commit once after the loop; committing inside the loop forces a
# round trip for every row and slows the insert considerably
cnxn.commit()
cursor.close()
cnxn.close()

# see total time to do insert
print("%s seconds ---" % (time.time() - start_time))

Try that and post back if you have additional questions/issues/concerns.

Solution 3:[3]

For one thing, replace df.iterrows() with df.apply(); df.iterrows() is notoriously slow. Better still, remove the row-by-row loop entirely for something much more efficient. A sketch of the df.apply() variant follows.
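
A minimal sketch of the df.apply() variant, assuming the cursor, connection, and column names from the question. Note that each row still triggers its own INSERT under the hood; the bigger win comes from eliminating the per-row statement entirely, e.g. with executemany as in Solution 4.

# Sketch only: `cursor`, `conn`, and the [A],[B],[C] columns are assumed
# from the question's code. df.apply pushes the loop into pandas, but
# each row still issues its own INSERT statement.
df.apply(
    lambda row: cursor.execute(
        "INSERT INTO database.schema.TableName([A],[B],[C]) VALUES (?,?,?)",
        row['A'], row['B'], row['C']
    ),
    axis=1,  # apply the function to each row
)
conn.commit()  # commit once at the end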

Solution 4:[4]

Here is the script; I hope this works for you.

import pandas as pd
import pyodbc as pc

connection_string = "Driver=SQL Server;Server=localhost;Database={0};Trusted_Connection=Yes;"
cnxn = pc.connect(connection_string.format("DataBaseNameHere"), autocommit=True)
cur = cnxn.cursor()
df = pd.read_csv("your_filepath_and_filename_here.csv").fillna('')
# Build the INSERT statement from the dataframe's own columns,
# with one '?' placeholder per column
query = 'insert into TableName({0}) values ({1})'
query = query.format(','.join(df.columns), ','.join('?' * len(df.columns)))
cur.fast_executemany = True
cur.executemany(query, df.values.tolist())
cnxn.close()

Solution 5:[5]

Try populating a temp table that has one index or none, then insert from it into your target table all at once. That might speed things up because the target table's indexes don't have to be updated after every single-row insert. A rough sketch follows.
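
A rough sketch of that idea with pyodbc, assuming a session-scoped #temp table is permitted even without CREATE TABLE rights on the database (temp tables live in tempdb), and reusing the question's placeholder table and column names:

# Sketch only: `conn` and `df` are assumed from the question's code.
cursor = conn.cursor()
cursor.fast_executemany = True  # optional; see Solution 4
# Create an empty, index-free temp table with the same column types
cursor.execute("SELECT TOP 0 [A],[B],[C] INTO #staging FROM database.schema.TableName")
# Load the dataframe into the temp table
cursor.executemany("INSERT INTO #staging([A],[B],[C]) VALUES (?,?,?)",
                   df[['A', 'B', 'C']].values.tolist())
# Move everything into the real table in a single statement
cursor.execute("INSERT INTO database.schema.TableName([A],[B],[C]) SELECT [A],[B],[C] FROM #staging")
conn.commit()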

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: NYC Coder
Solution 2: ASH
Solution 3: Matt
Solution 4: Jeremy Caney
Solution 5: Lenn Denis