'One-to-one integer mapping function

We are using MySQL and developing an application where we'd like the ID sequence not to be publicly visible... the IDs are hardly top secret and there is no significant issue if someone indeed was able to decode them.

So, a hash is of course the obvious solution, we are currently using MD5... 32bit integers go in, and we trim the MD5 to 64bits and then store that. However, we have no idea how likely collisions are when you trim like this (especially since all numbers come from autoincrement or the current time). We currently check for collisions, but since we may be inserting 100.000 rows at once the performance is terrible (can't bulk insert).

But in the end, we really don't need the security offered by the hashes and they consume unnecessary space and also require an additional index... so, is there any simple and good enough function/algorithm out there that guarantees one-to-one mapping for any number without obvious visual patterns for sequential numbers?

EDIT: I'm using PHP which does not support integer arithmetic by default, but after looking around I found that it could be cheaply replicated with bitwise operators. Code for 32bit integer multiplication can be found here: http://pastebin.com/np28xhQF



Solution 1:[1]

You could simply XOR with 0xDEADBEEF, if that's good enough.

Alternatively multiply by an odd number mod 2^32. For the inverse mapping just multiply by the multiplicative inverse

Example: n = 2345678901; multiplicative inverse (mod 2^32): 2313902621 For the mapping just multiply by 2345678901 (mod 2^32):

1 --> 2345678901 2 --> 396390506

For the inverse mapping, multiply by 2313902621.

Solution 2:[2]

If you want to ensure a 1:1 mapping then use an encryption (i.e. a permutation), not a hash. Encryption has to be 1:1 because it can be decrypted.

If you want 32 bit numbers then use Hasty Pudding Cypher or just write a simple four round Feistel cypher.

Here's one I prepared earlier:

import java.util.Random;

/**
 * IntegerPerm is a reversible keyed permutation of the integers.
 * This class is not cryptographically secure as the F function
 * is too simple and there are not enough rounds.
 *
 * @author Martin Ross
 */
public final class IntegerPerm {
    //////////////////
    // Private Data //
    //////////////////

    /** Non-zero default key, from www.random.org */
    private final static int DEFAULT_KEY = 0x6CFB18E2;

    private final static int LOW_16_MASK = 0xFFFF;
    private final static int HALF_SHIFT = 16;
    private final static int NUM_ROUNDS = 4;

    /** Permutation key */
    private int mKey;

    /** Round key schedule */
    private int[] mRoundKeys = new int[NUM_ROUNDS];

    //////////////////
    // Constructors //
    //////////////////

    public IntegerPerm() { this(DEFAULT_KEY); }

    public IntegerPerm(int key) { setKey(key); }

    ////////////////////
    // Public Methods //
    ////////////////////

    /** Sets a new value for the key and key schedule. */
    public void setKey(int newKey) {
        assert (NUM_ROUNDS == 4) : "NUM_ROUNDS is not 4";
        mKey = newKey;

        mRoundKeys[0] = mKey & LOW_16_MASK;
        mRoundKeys[1] = ~(mKey & LOW_16_MASK);
        mRoundKeys[2] = mKey >>> HALF_SHIFT;
        mRoundKeys[3] = ~(mKey >>> HALF_SHIFT);
    } // end setKey()

    /** Returns the current value of the key. */
    public int getKey() { return mKey; }

    /**
     * Calculates the enciphered (i.e. permuted) value of the given integer
     * under the current key.
     *
     * @param plain the integer to encipher.
     *
     * @return the enciphered (permuted) value.
     */
    public int encipher(int plain) {
        // 1 Split into two halves.
        int rhs = plain & LOW_16_MASK;
        int lhs = plain >>> HALF_SHIFT;

        // 2 Do NUM_ROUNDS simple Feistel rounds.
        for (int i = 0; i < NUM_ROUNDS; ++i) {
            if (i > 0) {
                // Swap lhs <-> rhs
                final int temp = lhs;
                lhs = rhs;
                rhs = temp;
            } // end if
            // Apply Feistel round function F().
            rhs ^= F(lhs, i);
        } // end for

        // 3 Recombine the two halves and return.
        return (lhs << HALF_SHIFT) + (rhs & LOW_16_MASK);
    } // end encipher()

    /**
     * Calculates the deciphered (i.e. inverse permuted) value of the given
     * integer under the current key.
     *
     * @param cypher the integer to decipher.
     *
     * @return the deciphered (inverse permuted) value.
     */
    public int decipher(int cypher) {
        // 1 Split into two halves.
        int rhs = cypher & LOW_16_MASK;
        int lhs = cypher >>> HALF_SHIFT;

        // 2 Do NUM_ROUNDS simple Feistel rounds.
        for (int i = 0; i < NUM_ROUNDS; ++i) {
            if (i > 0) {
                // Swap lhs <-> rhs
                final int temp = lhs;
                lhs = rhs;
                rhs = temp;
            } // end if
            // Apply Feistel round function F().
            rhs ^= F(lhs, NUM_ROUNDS - 1 - i);
        } // end for

        // 4 Recombine the two halves and return.
        return (lhs << HALF_SHIFT) + (rhs & LOW_16_MASK);
    } // end decipher()

    /////////////////////
    // Private Methods //
    /////////////////////

    // The F function for the Feistel rounds.
    private int F(int num, int round) {
        // XOR with round key.
        num ^= mRoundKeys[round];
        // Square, then XOR the high and low parts.
        num *= num;
        return (num >>> HALF_SHIFT) ^ (num & LOW_16_MASK);
    } // end F()

} // end class IntegerPerm

Solution 3:[3]

Do what Henrik said in his second suggestion. But since these values seem to be used by people (else you wouldn't want to randomize them). Take one additional step. Multiply the sequential number by a large prime and reduce mod N where N is a power of 2. But choose N to be 2 bits smaller than you can store. Next, multiply the result by 11 and use that. So we have:

Hash = ((count * large_prime) % 536870912) * 11

The multiplication by 11 protects against most data entry errors - if any digit is typed wrong, the result will not be a multiple of 11. If any 2 digits are transposed, the result will not be a multiple of 11. So as a preliminary check of any value entered, you check if it's divisible by 11 before even looking in the database.

Solution 4:[4]

You can use mod operation for big prime number.

your number * big prime number 1 / big prime number 2.

Prime number 1 should be bigger than second. Seconds should be close to 2^32 but less than it. Than it will be hard to substitute.

Prime 1 and Prime 2 should be constants.

Solution 5:[5]

For our application, we use bit shuffle to generate the ID. It is very easy to reverse back to the original ID.

func (m Meeting) MeetingCode() uint {
    hashed := (m.ID + 10000000) & 0x00FFFFFF
    chunks := [24]uint{}
    for i := 0; i < 24; i++ {
        chunks[i] = hashed >> i & 0x1
    }
    shuffle := [24]uint{14, 1, 15, 21, 0, 6, 5, 10, 4, 3, 20, 22, 2, 23, 8, 13, 19, 9, 18, 12, 7, 11, 16, 17}
    result := uint(0)
    for i := 0; i < 24; i++ {
        result = result | (chunks[shuffle[i]] << i)
    }
    return result
}

Solution 6:[6]

There is an exceedingly simple solution that none have posted, even though an answer has been selected I highly advise any visiting this question to consider the nature of binary representations, and the application of modulos arithmetic.

Given an finite range of integers, all the values can be permuted in any order through a simple addition over their index while bound by the range of the index through a modulos. You could even leverage simple integer overflow such that using the modulos operator is not even necessary.

Essentially, you'd have a static variable in memory, where a function when called increments the static variable by some constant, enforces the boundaries, and then returns the value. This output could be an index over a collection of desired outputs, or the desired output itself

The constant of the increment that defines the mapping may be several times the size in memory of the value being returned, but given any mapping there exists some finite constant that will achieve the mapping through a trivial modulos arithmetic.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 rossum
Solution 3 phkahler
Solution 4 varela
Solution 5 Ciel
Solution 6 SteamyThePunk