'How to find the winner(s) of Brazil's Mega Sena lottery most efficiently?

Mega Sena is Brazil's most famous lottery. The number set ranges from 1 to 60 and a single bet can contain each from 6 to 15 numbers selected (the more numbers selected the more expensive that bet is). Prizes are given, proportionally, for those who can correctly guess 4, 5 or 6 numbers, the latter being the jackpot.

Sometimes the total number of bets reach the hundred of millions. Drawings are televisioned live so, as a player, you can know almost instantaneously if you won or not. However, the lottery takes several hours to announce the number of winners and which cities they are from.

Given the rules above, how would you implement the computational logic that finds the winners? What data structures would you choose to store the data? Which sorting algorithms, if any, would you select? What's the Big O notation of the best solution you can think of?



Solution 1:[1]

I would allocate a 64-bit integer to represent which numbers have been chosen.

The first step would be expand all the 7-15 number tickets to tickets having only 6 bits, which could be done offline before the drawing. I don't know if the 15-number ticket contains all the (15 choose 6) = 5005 6-element tickets, or do they use another system. But since it's done offline, the complexity is delegated elsewhere.

There's even an algorithm (or bithack) called lexicographically next permutation, which is able to generate all those n choose k bit patterns efficiently, if it needs to be done in real time.

Mask all those tickets with the bit pattern of the winning row, and compute the number of bits left. This should be extremely efficient taking an order of one second for a billion tickets in a modern computer that has the popcount instruction.

The other issue is the validation, integrity and confidentiality of the data, when associated to the ticket holders. I would guess that this is the real issue and is probably addressed by implementing the whole thing by a database query.

Solution 2:[2]

You can hold the selection of each person within a single 64-bit word with each bit representing a selected number. The entire dataset could fit in memory in one long integer array.

If the array was sorted in the same order as your database, e.g., by ticket ID, then you could retrieve an associated record simply from knowing that the position in the array would be the same as the rownum of your query.

If you performed a bitwise AND operation of each value with the 64-bit representation of the winning numbers, you could count the bits and store the offsets of any matching 4, 5, or 6 bits into their own respective lists.

The whole operation would be pretty obviously linear O(n).

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Aki Suihkonen
Solution 2 phatfingers