'Are GUID collisions possible?
I'm working on a database in SQL Server 2000 that uses a GUID for each user that uses the app it's tied to. Somehow, two users ended up with the same GUID. I know that microsoft uses an algorithm to generate a random GUID that has an extremely low chance of causing collisons, but is a collision still possible?
Solution 1:[1]
Basically they are not possible !, the chances are astronomically low.
But... I'm the only person I the world that I know of, that had a GUID colision once (yep!).
And I'm sure of it, and that it wasn't a mistake.
How did it happen, in a small application that was running on Pocket PC, at the end of an operation a command that has an generated GUID must be issued. The command after it was executed on the server it was stored in a command table on the server along with the execution date. One day when I was debugging I issued the module command (with the newly generated GUID attached) and nothing happened. I did it again (with the same guid, because the guid was generated only once at the beginning of the operation), and again, and nothing, finally trying to find out why the command isn't executing, I checked the command table, and the same GUID as the current one was inserted 3 weeks ago. Not believing this, I restored a database from 2 weeks backup, and the guid was there. Checked the code, the new guid was freshly generated no doubt about it. Pow guid collision, happened only once, but I really wish I would have won at lotto instead,the chance is greater :).
Edit: there are some factors that could have greatly increased the chance of this happening, the application was running on the PocketPC emulator, and the emulator has a save state feature, which means that every time the state is restored the local time is restored also and the guid is based on on the internal timer....also the guid generating algorithm for compact framework might be less complete than for example the COM one...
Solution 2:[2]
They are theoretically possible, but with 3.4E38 possible numbers, if you create tens of trillions of GUIDs in a year the chance of having one duplicate is 0.00000000006 (Source).
If two users ended up with the same GUID, I would wager that there is a bug in the program which is causing the data to be copied or shared.
Solution 3:[3]
First lets look at the chance of collision of two GUIDs. It is not, as other answers have stated, 1 in 2^128 (10^38) because of the birthday paradox, which means that for a 50% chance of two GUIDs colliding the probability is actually 1 in 2^64 (10^19) which is a lot smaller. However, this is still a very large number, and as such the probability of collision assuming you are using a reasonable number of GUIDs is low.
Note also that GUIDs do not contain a timestamp or the MAC address as many people also seem to believe. This was true for v1 GUIDs but now v4 GUIDs are used, which are simply a pseudo-random number which means that possibility of collision is arguably higher because they are no longer unique to a time and a machine.
So essentially the answer is yes, collisions are possible. But they are highly unlikely.
Edit: fixed to say 2^64
Solution 4:[4]
The chances of two random GUIDs colliding (~1 in 10^38) is lower than the chance of not detecting a corrupt TCP/IP packet (~1 in 10^10). http://wwwse.inf.tu-dresden.de/data/courses/SE1/SE1-2004-lec12.pdf, page 11. This is also true of disk drives, cd drives, etc...
GUIDs are statistically unique and the data you read from the db is only statistically correct.
Solution 5:[5]
Are you a mathematician? Then yes.
Are you an engineer? Then no.
Solution 6:[6]
I would consider Occam's razor as a good guide in this case. It is incredibly unlikely that you have a GUID collision. It is much more likely you have a bug, or someone messing with your data.
Solution 7:[7]
See Wikipedia's Globally Unique Identifier article. There are several ways to generate GUIDs. Apparently the old (?) way used Mac address, a timestamp down to a very short unit and a unique counter (to manage fast generations on the same computer), so making them duplicate is nearly impossible. But these GUIDs were dropped because they could be used to track down users...
I am not sure of the new algorithm used by Microsoft (the article says a sequence of GUIDs can be predicted, looks like they no longer use timestamp? The Microsoft article linked above says something else...).
Now, GUIDs are carefully designed to be, by name, globally unique, so I will risk it is impossible, or of very very very low probability. I would look elsewhere.
Solution 8:[8]
Two Win95 machines that have ethernet cards with duplicate MAC addresses will issue duplicate GUIDS under tightly controlled conditions, especially if, for example, the power goes off in the building and they both boot at exactly the same time.
Solution 9:[9]
I'll preface this with "I'm not a networking person, so I may make completely incoherent sentences following.".
When I worked at Illinois State University, we had two Dell desktops, ordered at different times. We put the first one on the network, but when we tried to put the second one on the network we started receiving crazy errors. After much troubleshooting, it was determined that both machines were producing the same GUID (I'm not sure exactly what for, but it made them both unusable on the network). Dell actually replaced both machines as defective.
Solution 10:[10]
I know people like the feel-good answer that GUIDs are magical and guaranteed to be unique, but in reality, most GUIDs are just 121-bit random numbers (seven of the bits are wasted on formatting). If you wouldn't feel comfortable using a big random number, then you shouldn't feel comfortable using a GUID.
Solution 11:[11]
Could the code used to generate a GUID have a bug in it? Yes, of course it could. But the answer is the same as it would be for a compiler bug - your own code is orders of magnitude more likely to be buggy, so look there first.
Solution 12:[12]
Generalized formula
There's a formula that estimates how many values of size S to generate to get a collision between two of them with probability P.
Variables:
- bits - how many bits in your data type.
- probability - target probability for the collision.
To get a collision, you have to generate around:
Or in Python:
from math import sqrt, log
def how_many(bits, probability):
return 2 ** ((bits + 1) / 2) * sqrt(-log(1 - probability))
GUIDs
For GUIDs (128 bits), to get a collision with probability 1% (0.01), you'll need:
In [2]: how_many(bits=128, probability=0.01)
Out[2]: 2.6153210405530885e+18
...around 2.6 * 10^18 GUIDs (that's 42 exabytes of GUIDs).
Note that this probability grows rapidly. No matter the number of bits, for 99.99% probability you'll need only 30x more GUIDs than for 1%!
In [3]: how_many(bits=128, probability=0.9999)
Out[3]: 7.91721721556706e+19
Int64
Same numbers, but for the int64 datatype:
In [4]: how_many(bits=64, probability=0.01)
Out[4]: 608926881
In [5]: how_many(bits=64, probability=0.9999)
Out[5]: 18433707802
For 1% collision probability you'll need 5 gigabytes of int64-s. Still a lot but compared to the GUIDs that is a much more comprehensible number.
It's the so called birthday problem - and in this Wikipedia article you can find more precise estimation formulas than this one.
Solution 13:[13]
Of course its possible....Probable? Not likely, but it is possible.
Remember, the same machine is generating every GUID (the server), so a lot of the "randomness" that is based on machine specific information is lost.
Solution 14:[14]
Just for grins, try the following script... (works on SQL 2005, not sure about 2000)
declare @table table
(
column1 uniqueidentifier default (newid()),
column2 int,
column3 datetime default (getdate())
)
declare @counter int
set @counter = 1
while @counter <= 10000
begin
insert into @table (column2) values (@counter)
set @counter = @counter + 1
end
select * from @table
select * from @table t1 join @table t2 on t1.column1 = t2.column1 and t1.column2 != t2.column2
Running this repeatedly (takes less than a second) produces a fairly wide range from the first select, even with an EXTREMELY short time gap. So far the second select hasn't produced anything.
Solution 15:[15]
Impossible if the users have different machines with network cards, and even if not it is still an extremely marginal almost theoretical risk.
Personally I'd look elsewhere as it is more likely a bug rather than a GUID clash...
Providing of course that you don't chop bits off the GUID to make it shorter.
Solution 16:[16]
It's highly unlikely that you'll run into GUID collisions if you're generating them through something like the NEWID()
function in SQL Server (though of course possible, as other answers have emphasized). One thing they haven't pointed out is that it's actually quite likely that you'll run into collisions if you're generating GUIDs in JavaScript on browsers in the wild. Not only are there sometimes problems in the RNG in different browsers, but I've also run into problems where the Google spiders seem to cache the results of functions like that, and ended up repeatedly passing the same GUID up to our systems.
See the various answers here for more details:
Solution 17:[17]
Don’t worry about what it is. Make it impossible. Mix the improbability of GUID with the impossibility of sequential. Just add a database sequential I’d to the GUID and call it done. You may need to change the data type from GUID to String-ish but they are not that different storage wise.
Solution 18:[18]
Sure it's possible, and maybe even likely. It's not like each GUID is in a random portion of the possible number space. In the event that two threads attempted to generate one simultaneously, barring some kind of centralized GUID function with a semaphore around it, they could end up with the same value.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow