Best strategy for massive insert/update using JDBC in MS SQL Server

Good day. I posted this question previously, but it seems I was not clear enough, so I will try to be as detailed as possible about my situation.

I need to implement a solution that does a daily extraction of data from some CSV files and, using only JDBC, inserts this data into production database tables.

I have to insert into two tables.

Tables:

    CREATE TABLE Table1 (
        [func] [varchar](8) NOT NULL,
        [Ver] [smallint] NOT NULL,
        [id] [varchar](32) NOT NULL,
        [desc] [varchar](300) NOT NULL,
        [value] [float] NOT NULL,
        [dtcreated] [date] NOT NULL,
        [dtloaded] [date] NULL,
        CONSTRAINT [Table1_PK] PRIMARY KEY CLUSTERED
        (
            [func] ASC,
            [Ver] ASC,
            [id] ASC,
            [desc] ASC,
            [dtcreated] ASC
        )
    );

    CREATE TABLE table2 (
        [id] [varchar](32) NOT NULL,
        [f1] [varchar](50) NOT NULL,
        [f2] [varchar](32) NOT NULL,
        [f3] [varchar](6) NULL,
        [f4] [varchar](3) NULL,
        [f5] [varchar](3) NULL,
        [f6] [varchar](32) NULL,
        [DtStart] [date] NOT NULL,
        [DtEnd] [date] NOT NULL,
        [dtcreated] [date] NOT NULL,
        [dtloaded] [date] NULL,
        CONSTRAINT [table2_PK] PRIMARY KEY CLUSTERED
        (
            [id] ASC,
            [DtStart] DESC,
            [DtEnd] DESC
        )
    );
Table1 is 400+ GB in size, with 6,500+ million (6.5 billion) records. Table2 is 30+ GB, with about 5 million records.

Into Table1 I need to process and insert 1.5 million records.

Into Table2 I need to process and insert/update 1.1 million records; this is done with a merge-when-matched (upsert) query.
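For reference, the merge-when-matched query looks roughly like the sketch below. The match condition (id + DtStart) and the exact column list are assumptions based on the table2 definition above, so treat it as illustrative only and adjust it to the real business key:

    // Sketch only: an upsert into table2, parameterized for JDBC.
    // The ON condition and column list are assumptions, not the actual query.
    String mergeSql =
        "MERGE INTO table2 AS t "
      + "USING (VALUES (?, ?, ?, ?, ?, ?)) AS s (id, f1, f2, DtStart, DtEnd, dtcreated) "
      + "ON t.id = s.id AND t.DtStart = s.DtStart "
      + "WHEN MATCHED THEN "
      + "  UPDATE SET t.f1 = s.f1, t.f2 = s.f2, t.DtEnd = s.DtEnd, t.dtloaded = GETDATE() "
      + "WHEN NOT MATCHED THEN "
      + "  INSERT (id, f1, f2, DtStart, DtEnd, dtcreated, dtloaded) "
      + "  VALUES (s.id, s.f1, s.f2, s.DtStart, s.DtEnd, s.dtcreated, GETDATE());";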

I need to be able to run these two processes without interrupting normal use of the tables.

My code does the following:

public void processFile(String fileLocation) throws IOException, SQLException {
    try {

        SqlClient sqlClient = SqlClient.from(DriverClassName.SQLSERVER, DriverConnectionString.barra());
        Connection connection = sqlClient.getConnection();

        PreparedStatement pstmt = connection.prepareStatement(getSql());

        File file = new File(fileLocation);

        try (BufferedReader br = new BufferedReader(new FileReader(file))) {
            int lnproc = 0;
            int batchCount = 0;
            String line;
            while ((line = br.readLine()) != null) {

                String[] parts = line.split(",");
                pstmt.clearParameters();
                // ... process parts and bind them to the PreparedStatement ...
                pstmt.addBatch();
                batchCount++;

                // flush the batch every batchSize rows
                if (batchCount >= batchSize) {
                    batchCount = 0;
                    try {
                        pstmt.executeBatch();
                    } catch (BatchUpdateException ex) {
                        // TODO: log/handle the rows that failed in this batch
                    }
                }
            }

            // flush whatever is left in the final, partial batch
            try {
                pstmt.executeBatch();
            } catch (BatchUpdateException ex) {
                // TODO: log/handle the rows that failed in this batch
            }
        }

        connection.commit(); // assumes auto-commit is disabled on this connection
        connection.close();
    } catch (ClassNotFoundException | InstantiationException | IllegalAccessException e) {
        // TODO: log/handle driver initialization failures
    }
}

Because of the huge number of records to insert into each table, I can generate locks on the tables that affect the production environment.

I have done some research, and there are multiple strategies I am thinking of using:

  • Create batches of at most 5,000 inserts and commit each batch, to prevent lock escalation (see the sketch after this list).
  • Commit after every record, to prevent locks and keep the transaction log small.
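A minimal sketch of the first strategy, assuming auto-commit is disabled on the connection and batchSize is around 5,000: committing right after each executeBatch() keeps every transaction small, so no single transaction accumulates enough row locks for SQL Server to escalate to a table lock.

    // Sketch only: flush and commit every batchSize rows (~5,000) so each
    // transaction stays small and lock escalation is avoided.
    connection.setAutoCommit(false);
    int batchCount = 0;
    String line;
    while ((line = br.readLine()) != null) {
        String[] parts = line.split(",");
        // ... bind parts to pstmt ...
        pstmt.addBatch();
        if (++batchCount >= batchSize) {
            pstmt.executeBatch();
            connection.commit();   // releases the locks held by this batch
            batchCount = 0;
        }
    }
    pstmt.executeBatch();          // flush the final, partial batch
    connection.commit();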

I would like to pick the community's brains about what you think would be the best strategy to use in this case.

Any recommendations you can make are welcome.



Solution 1:[1]

After looking into it, the best solution I found was the following.

First, as stated in the comments, I read the whole file and loaded it into memory in a Java structure.

After loading the file, I iterated over the Java structure and started adding each record to the batch, keeping a counter of the items added. When the counter hits 5,000, I execute and commit the batch, reset the counter to 0, and keep adding items until I either hit 5,000 again or reach the end of the iteration.

By doing this I prevent MSSQL from escalating to a table-level lock, and the table can still be used by the other processes and applications.
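A minimal sketch of this approach, assuming the CSV has already been parsed into an in-memory List<String[]> and that auto-commit is disabled (the names rows and bindRow below are placeholders, not part of the original code):

    // Sketch of the accepted approach: execute and commit every 5,000 rows so
    // no single transaction grows large enough to escalate to a table lock.
    final int COMMIT_EVERY = 5000;
    int counter = 0;
    for (String[] row : rows) {          // rows: List<String[]> loaded from the CSV
        bindRow(pstmt, row);             // placeholder: set this row's parameters
        pstmt.addBatch();
        if (++counter == COMMIT_EVERY) {
            pstmt.executeBatch();
            connection.commit();         // commit the 5,000-row batch, start a new one
            counter = 0;
        }
    }
    if (counter > 0) {                   // items left at the end of the iteration
        pstmt.executeBatch();
        connection.commit();
    }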

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1: rread