How to get original array from random shuffle of an array

I was asked the question below in an interview today. I gave an O(n log n) solution, but I was then asked for an O(n) solution, which I could not come up with. Can you help?

An input array is given, for example [1,2,4]. Every element of it is doubled and
the doubled values are appended to the array, so the array now looks like
[1,2,4,2,4,8]. Now this array is randomly shuffled. One possible random
arrangement is [4,8,2,1,2,4]. We are given this randomly shuffled array and we
want to get the original array [1,2,4] in O(n) time.
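
To make the setup concrete, here is a minimal Python sketch of how such an input could be generated. The helper name `make_changed` and the use of `random.shuffle` are illustrative assumptions, not part of the interview question.

```python
import random

def make_changed(original):
    """Double every element, append the doubles, then shuffle (hypothetical helper)."""
    changed = list(original) + [2 * x for x in original]
    random.shuffle(changed)  # any permutation is a valid input to the problem
    return changed

if __name__ == "__main__":
    print(make_changed([1, 2, 4]))  # some permutation of [1, 2, 4, 2, 4, 8]
```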

The original array can be returned in any order. How can I do it?


Solution 1:[1]

Here's an O(N) Java solution that could be improved by first making sure that the array is of the proper form. For example it shouldn't accept [0] as an input:

import java.util.*;

class Solution {
    public static int[] findOriginalArray(int[] changed) {
        if (changed.length % 2 != 0)
            return new int[] {};

        // set Map size to optimal value to avoid rehashes
        Map<Integer, Integer> count = new HashMap<>(changed.length * 100 / 75);
        int[] original = new int[changed.length / 2];
        int pos = 0;

        // count frequency for each number
        for (int n : changed) {
            count.put(n, count.getOrDefault(n, 0) + 1);
        }

        // now decide which go into the answer
        for (int n : changed) {

            // walk down the chain n, n/2, n/4, ... to its smallest member still present
            int smallest = n;
            for (int m = n; m > 0 && count.getOrDefault(m, 0) > 0; m = m / 2) {
                //System.out.println(m);
                smallest = m;
                if (m % 2 != 0) break;
            }

            // trickle up from smallest to largest while count > 0
            for (int m = smallest, mm = 2 * m; count.getOrDefault(mm, 0) > 0; m = mm, mm = 2 * mm) {
                int ct = count.getOrDefault(mm, 0);
                while (count.get(m) > 0 && ct > 0) {
                    //System.out.println("adding "+m);
                    original[pos++] = m;
                    count.put(mm, ct - 1);
                    count.put(m, count.get(m) - 1);
                    ct = count.getOrDefault(mm, 0);
                }
            }
        }

        // check for incorrect format
        if (count.values().stream().anyMatch(x -> x > 0)) {
            return new int[] {};
        }

        return original;
    }

    public static void main(String[] args) {
        int[] changed = {1, 2, 4, 2, 4, 8};
        System.out.println(Arrays.toString(changed));
        System.out.println(Arrays.toString(findOriginalArray(changed)));
    }
}

But I've tried to keep it simple.

The output is NOT guaranteed to be sorted. If you want it sorted it's going to cost O(NlogN) inevitably unless you use a Radix sort or something similar (which would make it O(NlogE) where E is the max value of the numbers you're sorting and logE the number of bits needed).
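
If a sorted result is needed, a least-significant-digit radix sort is one way to avoid a comparison sort. The Python sketch below assumes non-negative integers; the name `radix_sort` and the 8-bit digit size are illustrative choices, not something taken from the Java code above.

```python
def radix_sort(nums, bits_per_pass=8):
    """LSD radix sort for non-negative integers (illustrative sketch)."""
    out = list(nums)
    if not out:
        return out
    mask = (1 << bits_per_pass) - 1
    max_val = max(out)
    shift = 0
    # one stable bucketing pass per digit; the number of passes grows with log(max_val)
    while (max_val >> shift) > 0:
        buckets = [[] for _ in range(mask + 1)]
        for x in out:
            buckets[(x >> shift) & mask].append(x)
        out = [x for bucket in buckets for x in bucket]
        shift += bits_per_pass
    return out
```

Each pass is linear in the number of elements, and the number of passes depends only on the bit length of the largest value.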

Runtime

This may not look like O(N), but it is: each chain is walked down to its lowest number only ONCE and is then trickled up only ONCE. Said another way, an iteration that processes X elements does O(X) work, and only N-X elements remain for the later iterations. Therefore, even though there are for loops inside for loops, the total is still O(N).

An example execution can be seen with [64,32,16,8,4,2]. If this were not O(N), then printing each value that is traversed while finding the smallest would show the values appearing over and over again (for example N*(N+1)/2 times).

But instead you see them only once:

finding smallest 64
finding smallest 32
finding smallest 16
finding smallest 8
finding smallest 4
finding smallest 2
adding 2
adding 8
adding 32

If you're familiar with the Heapify algorithm you'll recognize the approach here.
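
The trace above can be reproduced with a rough Python port of the same idea. This is only a sketch: the function name `find_original_traced` and the counter `down_steps` are introduced here for illustration, and the final check uses `!= 0` rather than the Java version's `> 0`.

```python
from collections import Counter

def find_original_traced(changed):
    """Return (original, down_steps); original is [] if the input is malformed."""
    if len(changed) % 2 != 0:
        return [], 0
    count = Counter(changed)
    original = []
    down_steps = 0  # iterations spent searching for the smallest chain member

    for n in changed:
        # walk down n, n/2, n/4, ... while the value is still available
        smallest = n
        m = n
        while m > 0 and count[m] > 0:
            down_steps += 1
            smallest = m
            if m % 2 != 0:
                break
            m //= 2

        # trickle up, pairing each available m with an available 2*m
        m = smallest
        while count[2 * m] > 0:
            while count[m] > 0 and count[2 * m] > 0:
                original.append(m)
                count[2 * m] -= 1
                count[m] -= 1
            m *= 2

    # any non-zero leftover count means the input was not a doubled array
    if any(v != 0 for v in count.values()):
        return [], down_steps
    return original, down_steps
```

For [64,32,16,8,4,2] this sketch returns [2, 8, 32] with 6 walk-down steps in total, matching the six "finding smallest" lines: after the first element has been processed, every count is zero and the later elements cost no further chain steps.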

Solution 2:[2]

from typing import List

class Solution:
    def findOriginalArray(self, changed: List[int]) -> List[int]:
        size = len(changed)
        ans = []
        left_elements = size // 2

        # If the size is odd, no solution is possible: return []
        if size % 2 != 0:
            return ans

        # Frequency dictionary: for [0,0,2,1] the map is {0: 2, 2: 1, 1: 1}
        d = {}
        for i in changed:
            if i in d:
                d[i] += 1
            else:
                d[i] = 1

        # Edge case: 0 is its own double, so zeros must pair with each other
        if 0 in d:
            count = d[0]
            half = count // 2
            if (count % 2 != 0) or (half > left_elements):
                return ans
            left_elements -= half
            ans = [0 for i in range(half)]

        # Remaining cases, assuming the values are at most 10^5,
        # so a value in the original array is at most 50000
        for i in range(1, 50001):
            if i in d:
                if d[i] > 0:
                    count = d[i]
                    if count > left_elements:
                        ans = []
                        break
                    left_elements -= d[i]
                    for j in range(count):
                        ans.append(i)
                    if 2 * i in d:
                        if d[2 * i] < count:
                            ans = []
                            break
                        else:
                            d[2 * i] -= count
                    else:
                        ans = []
                        break

        # Values above 50000 that were never paired leave the answer short
        if len(ans) != size // 2:
            return []
        return ans

Solution 3:[3]

I have a simple idea which might not be the best, but I could not think of a case where it would not work. Given the array A containing the original and doubled elements in shuffled order, keep a helper map. Process each element i of the array and, each time you encounter a new key, add it to the map with the value 0. For each processed element, increment map[i] and decrement map[2*i]. Finally, iterate over the map and print the keys that have a value greater than zero.

A simple example, say that the vector is:

[1, 2, 3]

And the doubled/shuffled version is:

A = [3, 2, 1, 4, 2, 6]

When processing 3, first add the keys 3 and 6 to the map with value zero. Increment map[3] and decrement map[6]. This way, map[3] = 1 and map[6] = -1. Then for the next element map[2] = 1 and map[4] = -1 and so forth. The final state of the map in this example would be map[1] = 1, map[2] = 1, map[3] = 1, map[4] = -1, map[6] = 0, map[8] = -1, map[12] = -1.

Then you just process the keys of the map and, for each key with a value greater than zero, add it to the output. There are certainly more efficient solutions, but this one is O(n).
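
The description above can be sketched in Python as follows. This is a literal reading of the steps, not the answerer's own code; the function name `positive_balance_keys` and the use of `defaultdict` for the "add with value 0" step are assumptions made for illustration.

```python
from collections import defaultdict

def positive_balance_keys(a):
    """For each x in a, add 1 to balance[x] and subtract 1 from balance[2*x];
    return the keys whose final balance is greater than zero."""
    balance = defaultdict(int)  # a new key starts at 0
    for x in a:
        balance[x] += 1
        balance[2 * x] -= 1
    return [key for key, value in balance.items() if value > 0]
```

On the example A = [3, 2, 1, 4, 2, 6] this returns 1, 2 and 3 (in map order). On the question's input [4,8,2,1,2,4] the balance for 4 ends at 0, so the sketch returns only 1 and 2; inputs in which a value, its double and its quadruple all belong to the original seem to need extra handling.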

Solution 4:[4]

In C++, you can try this. The running time is O(N + KlogK), where N is the length of the input and K is the number of unique elements in the input.

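#include <algorithm>
#include <unordered_map>
#include <vector>
using namespace std;

class Solution {
public:
    vector<int> findOriginalArray(vector<int>& input) {
        if (input.size() % 2) return {};
        // count occurrences of each value
        unordered_map<int, int> m;
        for (int n : input) m[n]++;
        // collect the unique values and sort them in ascending order
        vector<int> nums;
        for (auto [n, cnt] : m) nums.push_back(n);
        sort(begin(nums), end(nums));
        vector<int> out;
        for (int n : nums) {
            // every remaining n needs an unused 2 * n as its partner
            if (m[2 * n] < m[n]) return {};
            for (int i = 0; i < m[n]; ++i, --m[2 * n]) out.push_back(n);
        }
        return out;
    }
};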

Solution 5:[5]

The question is not clear about the required space complexity, so this is my off-the-top-of-my-head attempt, assuming only O(n) time complexity is required.

  1. If the length of the input array is not even, then the input is invalid.
  2. Create a map and add the elements of the input array to it.
  3. Divide each element in the input array by 2 and check if that value exists in the map. If it exists, add it to the array (slice) orig.
  4. There is a chance that duplicate values were added to orig, so remove them.

Here is sample Go code: https://go.dev/play/p/w4mm-rloHyi

I am sure this code can be optimized in a lot of ways for space complexity, but its time complexity is O(n).

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2
Solution 3 luizbarcelos
Solution 4 Viettel Solutions
Solution 5 Dr.Mr.Dr