'Finding union of 2 sorted arrays (with duplicates)

I'm trying to find the union of 2 sorted arrays (with duplicates) but I feel I'm not coming up with the most elegant code (what I have works btw, I just feel I can reduce some lines of code). Lets say I have 2 vectors a = {1,3,3,4,4,4,5,7} and b = {1,3,3,3,5,5,5,6,8,9} and I want to store their union in a vector called unionVector (which will be 1,3,4,5,6,7,8,9)

Here's my code:

#include <iostream>
#include <vector>
using namespace std;

// Prints the contents of a vector
void printVector(vector<int> a){
  if(a.size() == 0)
    return;
  else{
    for(int i = 0; i < a.size(); i++)
      cout << a[i] << '\t';
  }
  cout << endl;
}

// Print the union of 2 sorted arrays with duplicates
void printUnion(int *a, int aSize, int *b, int bSize){
  if(aSize == 0 && bSize == 0)
    return;
  else{

    vector<int> unionVector;

    int i = 0;
    int j = 0;
    int last = 0;

    // insert the smaller of first element regardless
    if(a[i] < b[j]){
      unionVector.push_back(a[i]);
      i++;
    }
    else if (b[j] < a[i]){
      unionVector.push_back(b[j]);
      j++;
    }
    else{// both are equal numbers 
      unionVector.push_back(a[i]);
      i++;
      j++;
    }

    // now traverse both the loops one increment at a time
    while(i < aSize && j < bSize){
      last = unionVector[unionVector.size() - 1];

      if(a[i] < b[j]){
        if(last != a[i])
          unionVector.push_back(a[i]);
        i++; // increment i in either case
      }
      else if(b[j] < a[i]){
        if(last != b[j])
          unionVector.push_back(b[j]);
        j++;
      }
      else{
        // both of the numbers are equal
        if(last != a[i])
          unionVector.push_back(a[i]);
        i++;
        j++;
      }
    }

    // lets say if 1 array wasn't complete
    while(i < aSize){
      last = unionVector[unionVector.size() - 1];

      if(last != a[i])
        unionVector.push_back(a[i]);
      i++;
    }

    while(j < bSize){
      last = unionVector[unionVector.size() - 1];

      if(last != b[i])
        unionVector.push_back(b[j]);
      j++;
    }

    printVector(unionVector);
  }
}

int main(){
  int a[] = {1,3,3,4,4,4,5,7};
  int b[] = {1,3,3,3,5,5,5,6,7,7,8,9};

  printUnion(a,8,b,12);

  return 0;
}

Thing is since there can be duplicates I check the element that is to be inserted with the last element inserted in unionVector. I need to make sure that I don't try to get the 'last' element when the unionVector is empty, thats why I'm inserting 1 element in the unionVector anyways. I would really appreciate if anyone can suggest a way that I can do that check without needing to insert 1 element first (I was thinking of having a flag variable that checks if the unionVector is empty or not, but I feel that will be too messy)

Edit 1:

  • This isn't a homework problem. This is something I was practising for my interviews

Edit 2:

  • I also can't use any built in functions

Edit 3:

  • Some people were getting confused if this was for a C++ position. You can use any language you want


Solution 1:[1]

If both arrays are sorted, it's just a matter of skipping ahead one iterator or the other or both if there's a match.

So something like:

void printUnion(int* a, int aSize, int* b, int bSize)
{
    int *aEnd = a + aSize, *bEnd = b + bSize;
    std::vector<int> unionVec;

    for (; a != aEnd; ) {
        if (b == bEnd) {
            // copy all of a
            while (a != aEnd) {
                unionVec.push_back(*a);
                a = std::upper_bound(a + 1, aEnd, *a);
            }
            break;
        }

        if (*b < *a) {
            unionVec.push_back(*b);
            b = std::upper_bound(b + 1, bEnd, *b);
        } 
        else {
            unionVec.push_back(*a);
            if (*b == *a) {
                b = std::upper_bound(b + 1, bEnd, *b);
            }
            a = std::upper_bound(a + 1, aEnd, *a);
        }
    }

    // copy all of b
    while (b != bEnd) {
        unionVec.push_back(*b);
        b = std::upper_bound(b + 1, bEnd, *b);
    }

    printVector(unionVec);
}

If you can't use upper_bound directly, just implement that function yourself. Copying the implementation from this reference:

template<class ForwardIt, class T>
int* upper_bound(int* first, int* last, const int value)
{
    int* it;
    int count = last - first;
    int step;

    while (count > 0) {
        it = first; 
        step = count / 2; 
        it += step;
        if (value >= *it) {
            first = ++it;
            count -= step + 1;
        }
        else {
            count = step;
        }
    }

    return first;
}

Or a non-binary-search version:

int* upper_bound(int* first, int* last, const int value) {
    for (; first < last && *first == value; ++first) {
        ;
    }

    return first;
}

Now that's obviously pretty verbose, and that's why the standard actually provides an algorithm directly for you set_union:

void printUnion(int* a, int aSize, int* b, int bSize)
{
    std::vector<int> unionVec;

    // get the union
    std::set_union(a, a + aSize, b, b + bSize, std::back_inserter(unionVec));

    // remove the dupes
    unionVec.erase(std::unique(unionVec.begin(), unionVec.end()), unionVec.end());

    printVector(unionVec);
}

Solution 2:[2]

Here is one way. Elegance may vary!

void printUnion(int* a, int aSize, int* b, int bSize)
{
    std::multiset<int> x;
    x.insert(a, a + aSize);
    x.insert(b, b + bSize);

    for (auto y : x )
        cout << y << ",";
    cout << endl;
}

NB. consider having printUnion take iterator pairs. Use std::set to ignore duplicates, or std::multiset to retain duplicates.

Solution 3:[3]

def solve(a, b):
    c = []
    for i in range(len(a)):
        if a[i] not in c:
            c.append(a[i])
    for i in range(len(b)):
        if b[i] not in c:
            c.append(b[i])
    return c

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2
Solution 3 Tyler2P