'Azure Synapse IDENTITY COLUMN; Wrong Id values based on seed

The goal is to create a table, insert some (3) dummy rows for technical reasons, then for any valid data, start using Ids above 100.

Script for creating (drop-create) the table:

IF OBJECT_ID(N'dbo.IdentityInsertTest') IS NOT NULL
  DROP TABLE dbo.IdentityInsertTest;
GO

CREATE TABLE dbo.IdentityInsertTest (
  Id BIGINT NOT NULL IDENTITY (100, 1) -- SEED is 100, not 1!
 ,Col1 INT
 ,Col2 INT
)
WITH (
  HEAP
 ,DISTRIBUTION = HASH(Col1)
)
;

Insert scripts:

-- 1) 1 row insert; this one is only "needed" to make sure valid data starts with >100 Id; see explanation later
INSERT INTO dbo.IdentityInsertTest(Col1, Col2)
SELECT -1 AS Col1
      ,-2 AS Col2
 WHERE 1 = 1
;

-- 2) Insert technical/dummy rows
SET IDENTITY_INSERT dbo.IdentityInsertTest ON;
INSERT INTO dbo.IdentityInsertTest(Id, Col1, Col2) VALUES(1, 1, 1);
INSERT INTO dbo.IdentityInsertTest(Id, Col1, Col2) VALUES(2, 2, 2);
INSERT INTO dbo.IdentityInsertTest(Id, Col1, Col2) VALUES(3, 3, 3);
SET IDENTITY_INSERT dbo.IdentityInsertTest OFF;

-- 3) Inserting "valid" rows
WITH
  t0(i) AS (SELECT 0 UNION ALL SELECT 0), --             2
  t1(i) AS (SELECT 0 FROM t0 a, t0 b),    --             4
  t2(i) AS (SELECT 0 FROM t1 a, t1 b),    --            16
  t3(i) AS (SELECT 0 FROM t2 a, t2 b),    --           256
  t4(i) AS (SELECT 0 FROM t3 a, t3 b),    --        65 536
  t5(i) AS (SELECT 0 FROM t4 a, t4 b),    -- 4 294 967 296
  tn(i) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) FROM t5)
INSERT INTO dbo.IdentityInsertTest(Col1, Col2)
SELECT tn.i % 180 AS Col1
      ,tn.i AS Col2
  FROM tn 
 WHERE i BETWEEN 1 AND 15
;

Normally, only 2) and 3) would be needed, but it seems that if I first insert rows manually then load valid data, the engine will give Ids to those rows <100, see image below.

Result of loading data into the tables, without and with the extra, not needed data (-1, -2)

(1) Shows what happens when I run only 2) and 3), in this order, not running 1), no extra row (-1, -2). First I insert the 3 technical rows I need by swithcing on and off the IDENTITY_INSERT, then load the valid data. Valid data starts with Id = 4, but the definition of the IDENTITY column is IDENTIY(100, 1).

(2) Shows that I do add an extra row (-1, -2) to the table first, then do the manual insert then loading the valid data. Valid data does get correct Id values (> 100).

Based on further observation, it seems to me that if I insert into an empty table manual rows first (IDENTITY_INSERT ... ON), then load additional row with IDENTITY_INSERT ... OFF, the system won't take the SEED value into account, so in order to make this to work, I need to:

1 - Insert a dummy, not needed row into the empty table **first** (which I'll delete later)
2 - Insert the technical manual rows I need
3 - Delete the extra row I added before (1st step)
4 - Now it's safe to add more rows to the table (with `IDENTITY_INSERT ... OFF`).

I tried to google about this unexpected behavior, but couldn't find anything. Is this a bug?



Solution 1:[1]

This is one of the limitations of Azure Synapse. You can use the following code to generate identity column explicitly:

Declare @IdentityColumnName BIGINT 

set @IdentityColumnName = (SELECT case when MAX(IdentityColumnName) is null then 0 else  MAX(IdentityColumnName)+1 end FROM dw.TableName) 

DBCC CHECKIDENT('dw.TableName', reseed, @IdentityColumnName) 

Solution 2:[2]

Unless I'm not understanding correctly, I think you should be able to run your identity inserts, then reseed the table to 100:

IF OBJECT_ID(N'dbo.IdentityInsertTest') IS NOT NULL
  DROP TABLE dbo.IdentityInsertTest;
GO

CREATE TABLE dbo.IdentityInsertTest (
  Id BIGINT NOT NULL IDENTITY (100, 1) -- SEED is 100, not 1!
 ,Col1 INT
 ,Col2 INT
)
WITH (
  HEAP
 ,DISTRIBUTION = HASH(Col1)
)
;
 
SET IDENTITY_INSERT dbo.IdentityInsertTest ON;
INSERT INTO dbo.IdentityInsertTest(Id, Col1, Col2) VALUES(1, 1, 1);
INSERT INTO dbo.IdentityInsertTest(Id, Col1, Col2) VALUES(2, 2, 2);
INSERT INTO dbo.IdentityInsertTest(Id, Col1, Col2) VALUES(3, 3, 3);
SET IDENTITY_INSERT dbo.IdentityInsertTest OFF;

DBCC CHECKIDENT('dbo.IdentityInsertTest', reseed, 100) ;
 
 WITH
  t0(i) AS (SELECT 0 UNION ALL SELECT 0), --             2
  t1(i) AS (SELECT 0 FROM t0 a, t0 b),    --             4
  t2(i) AS (SELECT 0 FROM t1 a, t1 b),    --            16
  t3(i) AS (SELECT 0 FROM t2 a, t2 b),    --           256
  t4(i) AS (SELECT 0 FROM t3 a, t3 b),    --        65 536
  t5(i) AS (SELECT 0 FROM t4 a, t4 b),    -- 4 294 967 296
  tn(i) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) FROM t5)
INSERT INTO dbo.IdentityInsertTest(Col1, Col2)
SELECT tn.i % 180 AS Col1
      ,tn.i AS Col2
  FROM tn 
 WHERE i BETWEEN 1 AND 15
;
 SELECT * FROM dbo.IdentityInsertTest
 ORDER BY id

Interestingly, the reseed doesn't work properly if the identity_insert is still on. So this doesn't work:

SET IDENTITY_INSERT dbo.IdentityInsertTest ON;
INSERT INTO dbo.IdentityInsertTest(Id, Col1, Col2) VALUES(1, 1, 1);
INSERT INTO dbo.IdentityInsertTest(Id, Col1, Col2) VALUES(2, 2, 2);
INSERT INTO dbo.IdentityInsertTest(Id, Col1, Col2) VALUES(3, 3, 3);
DBCC CHECKIDENT('dbo.IdentityInsertTest', reseed, 100) ;
SET IDENTITY_INSERT dbo.IdentityInsertTest OFF;

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 AnnuKumari-MSFT
Solution 2