'Guid.NewGuid() is returning duplicate values when used in a Parallel.For loop
I have an application that is hitting an API. As such, it does a query for all ID's in the object, and then has to query each item one at a time per ID. I'm doing this in a Parallel.For loop and adding each items data to a row in a datatable. Then I use sqlbulkcopy to send the datatable to a SQL server table.
If I do this without using Parallel.For, it works great. However, with Parallel.For, this line:
workrow["id"] = Guid.NewGuid();
is generating duplicate Guids. It's doing it often and causing the data to not load into the SQL server table because the id row in SQL is a Primary Key and doesn't allow duplicates. I tried locking:
lock (lockobject)
{
workrow["id"] = Guid.NewGuid();
}
This didn't help.
I tried not assigning an id to that field in hopes that SQL would generate it (it does have newid() on that field). That fails saying it can't insert a null.
I can't seem to just remove the id field from the datatable because then the columns don't align when I do the sqlbulkcopy.
Can someone help me here? I either need to figure out how to get Guid.NewGuid() to STOP producing duplicates OR I need to figure out a way to not pass in the id (always the first field in the datatable) so that SQL will generate the id.
Here is the code I use to generate one of tables:
public static DataTable MakeWorkflowTable()
{
DataTable Workflow = new DataTable("Workflow");
DataColumn id = new DataColumn("id", System.Type.GetType("System.Guid"));
Workflow.Columns.Add(id);
DataColumn OrgInfoID = new DataColumn("OrgInfoID", System.Type.GetType("System.Guid"));
Workflow.Columns.Add(OrgInfoID);
DataColumn Name = new DataColumn("Name", System.Type.GetType("System.String"));
Workflow.Columns.Add(Name);
DataColumn Active = new DataColumn("Active", System.Type.GetType("System.String"));
Workflow.Columns.Add(Active);
DataColumn Description = new DataColumn("Description", System.Type.GetType("System.String"));
Workflow.Columns.Add(Description);
DataColumn Object = new DataColumn("Object", System.Type.GetType("System.String"));
Workflow.Columns.Add(Object);
DataColumn Formula = new DataColumn("Formula", System.Type.GetType("System.String"));
Workflow.Columns.Add(Formula);
DataColumn ManageableState = new DataColumn("ManageableState", System.Type.GetType("System.String"));
Workflow.Columns.Add(ManageableState);
DataColumn NameSpacePrefix = new DataColumn("NameSpacePrefix", System.Type.GetType("System.String"));
Workflow.Columns.Add(NameSpacePrefix);
DataColumn TDACount = new DataColumn("TDACount", System.Type.GetType("System.Int32"));
Workflow.Columns.Add(TDACount);
DataColumn TriggerType = new DataColumn("TriggerType", System.Type.GetType("System.String"));
Workflow.Columns.Add(TriggerType);
DataColumn CreatedDate = new DataColumn("CreatedDate", System.Type.GetType("System.DateTime"));
Workflow.Columns.Add(CreatedDate);
DataColumn CreatedBy = new DataColumn("CreatedBy", System.Type.GetType("System.String"));
Workflow.Columns.Add(CreatedBy);
DataColumn LastModifiedDate = new DataColumn("LastModifiedDate", System.Type.GetType("System.DateTime"));
Workflow.Columns.Add(LastModifiedDate);
DataColumn LastModifiedBy = new DataColumn("LastModifiedBy", System.Type.GetType("System.String"));
Workflow.Columns.Add(LastModifiedBy);
return Workflow;
}
Here is the code I use to send it to the SQL server:
public static void SendDTtoDB(ref DataTable dt, ref SqlConnection cnn, string TableName)
{
using (SqlBulkCopy bulkCopy = new SqlBulkCopy(cnn))
{
bulkCopy.DestinationTableName =
TableName;
try
{
bulkCopy.WriteToServer(dt);
dt.Clear();
}
catch (Exception e)
{
logger.Warn("SendDTtoDB {TableName}: ORGID: {ORGID} : {Message}", TableName, dt.Rows[0]["OrgInfoID"], e.Message.ToString());
if (e.Message.ToString().Contains("PRIMARY KEY"))
{
foreach(DataRow row in dt.Rows)
{
logger.Warn("ID: {id}", row["id"]);
}
}
}
}
}
As you can see in the catch statement, I set it to write out the ID's to the log so I could see them for myself and, sure enough, there is a duplicate there. So frustrating! I really don't want to take out the Parallel.For and single thread it if I don't have to.
Per request, here is the code with the Parallel.For
if (qr.totalSize > 0)
{
object lockobject = new object();
Parallel.For(0, qr.records.Length, i =>
{
ToolingService.CustomTab1 vr = new ToolingService.CustomTab1();
vr = (ToolingService.CustomTab1)qr.records[i];
string mdSOQL = "Select FullName, description, ManageableState, MasterLabel, NamespacePrefix, Type, Url, CreatedDate, CreatedBy.Name, "
+ "LastModifiedDate, LastModifiedBy.Name From CustomTab where id='" + vr.Id + "'";
ToolingService.QueryResult mdqr = new ToolingService.QueryResult();
ToolingService.CustomTab1 vrmd = new ToolingService.CustomTab1();
mdqr = ts.query(mdSOQL);
vrmd = (ToolingService.CustomTab1)mdqr.records[0];
DataRow workrow = CustomTabs.NewRow();
lock (lockobject)
{
workrow["id"] = Guid.NewGuid();
}
workrow["OrgInfoID"] = _orgDBID;
workrow["FullName"] = vrmd.FullName;
workrow["Description"] = vrmd.Description ?? Convert.DBNull;
workrow["ManageableState"] = vrmd.ManageableState;
workrow["MasterLabel"] = vrmd.MasterLabel ?? Convert.DBNull;
workrow["NameSpacePrefix"] = vrmd.NamespacePrefix ?? Convert.DBNull;
workrow["Type"] = vrmd.Type ?? Convert.DBNull;
workrow["URL"] = vrmd.Url ?? Convert.DBNull;
workrow["CreatedDate"] = vrmd.CreatedDate ?? Convert.DBNull;
if (vrmd.CreatedBy == null)
{
workrow["CreatedBy"] = Convert.DBNull;
}
else
{
workrow["CreatedBy"] = vrmd.CreatedBy.Name;
}
workrow["LastModifiedDate"] = vrmd.LastModifiedDate ?? Convert.DBNull;
if (vrmd.LastModifiedBy == null)
{
workrow["LastModifiedBy"] = Convert.DBNull;
}
else
{
workrow["LastModifiedBy"] = vrmd.LastModifiedBy.Name;
}
lock (CustomTabs)
{
CustomTabs.Rows.Add(workrow);
}
});
OrgTables.SendDTtoDB(ref CustomTabs, ref _cnn, "OrgCustomTabs");
Solution 1:[1]
The thing is that having to use a lock
inside the Parallel.ForEach
in a DataTable, sort of defeats the purpose of using the Parallel.ForEach
in the first place; however, I am surprised that you don't get exceptions when you call DataRow workrow = CustomTabs.NewRow();
because on my test, I get an index corrupted exception. I had to actually wrap the call to NewRow
inside a lock. Something like this:
Parallel.ForEach(data, x =>
{
DataRow row = null;
lock (lockRow)
{
row = dt.NewRow();
row["Guid"] = Guid.NewGuid();
}
...
lock(lockObj)
dt.Rows.Add(row);
Where lockObj
and lockRow
are 2 separate static objects instantiated as
static object lockObj = new object();
static object lockRow = new object();
And that worked for me, adding 1 million rows to the DataTable
and making sure that all the Guids were unique.
With all the above said, I would strongly recommend writing the code as suggested by Julian or create a class that implements IDataReader
(which you can use with SQLBulkCopy) and upload the data using that.
Solution 2:[2]
I have seen this issue before. There is no problem with the Guid.NewGuid()
, but the DataTable
isn't threadsafe!
DataTable is simply not designed or intended for concurrent usage (in particular where there is any form of mutation involved).
See Thread safety for DataTable
Also related: c# DataGridView DataTable internal index corrupted in parallel loop
It's done in a loop because I have to query the API over and over to get each item in the object. I'm doing the Parallel.For loop so I can speed up the process. If I need to get 450 items or more one at a time, I'd like to multithread that for speed purposes. The database access isn't being done in the loop, just building the datatable is because once I get the data back from the API I need to store it.
You could create a type and add them multithreaded to a ConcurrentBag<T>
or ConcurrentQueue<T>
(or another concurrent collection, see MS Docs) - those are thread-safe :)
Then after it you could build the DataTable
with a single thread. Or maybe skip the whole DataTable if possible for your use case.
Solution 3:[3]
I used a guidgenerator that I lock every time I access it.
lock (guidGenerator)
{
entity.Id = guidGenerator.NewGuid();
}
worked fine for me. It's a different approach though.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | |
Solution 3 | Lukas Wessel |