'Mergesorting and Removing Duplicates in C (Any language will work)

This is my first question so I apologize in advance if I leave anything out or am ambiguous on an item.

Anyway, this is code I got form GeeksForGeeks.org (array modified for purpose of question) , but the concept is the same: How would I need to modify the mergesort code provided so that duplicates are removed while sorting. I figured I could change:

if (L[i] <= R[j])
{
    arr[k] = L[i];
    i++;
}
else
{
    arr[k] = R[j];
    j++;
}
k++;

to

if (L[i] < R[j])
{
    arr[k] = L[i];
    i++;
}
else if (L[i] > R[j])
{
    arr[k] = R[j];
    j++;
}
else
{
    arr[k] = L[i];
    i++;
    j++;
}
k++;

But it threw off the array being sorted. And yes, they must be removed while sorting and not in other steps. Here is the promised code:

int shift = 0;
int merge(int arr[], int l, int m, int r)
{
int i, j, k;
int n1 = m - l + 1;
int n2 =  r - m;

/* create temp arrays */
int L[n1], R[n2];

/* Copy data to temp arrays L[] and R[] */
for (i = 0; i < n1; i++)
    L[i] = arr[l + i];
for (j = 0; j < n2; j++)
    R[j] = arr[m + 1+ j];

/* Merge the temp arrays back into arr[l..r]*/
i = 0; // Initial index of first subarray
j = 0; // Initial index of second subarray
k = l; // Initial index of merged subarray
while (i < n1 && j < n2)
{
    if (L[i] <  R[j])
    {
        arr[k] = L[i];
        i++;
    }
    else if (L[i] >  R[j])
    {
        arr[k] = R[j];
        j++;
    }
    else
    {
        arr[k] = L[i];
        i++;
        j++;
        shift++;
    }
    k++;
} 
    /* Copy the remaining elements of L[], if there
   are any */
while (i < n1)
{
    arr[k] = L[i];
    i++;
    k++;
}

/* Copy the remaining elements of R[], if there
   are any */
while (j < n2)
{
    arr[k] = R[j];
    j++;
    k++;
}
return shift;
}

/* l is for left index and r is right index of the
sub-array of arr to be sorted */
void mergeSort(int arr[], int l, int r)
{
if (l < r)
{
    // Same as (l+r)/2, but avoids overflow for
    // large l and h
    int m = l+(r-l)/2;

    // Sort first and second halves
    mergeSort(arr, l, m);
    mergeSort(arr, m+1, r);

    merge(arr, l, m, r);
}
}

/* UTILITY FUNCTIONS */
/* Function to print an array */
void printArray(int A[], int size)
{
int i;
for (i=0; i < size; i++)
    printf("%d ", A[i]);
printf("\n");
}

/* Driver program to test above functions */
int main()
{
int arr[] = {4,5,7,2,7,0,1,2,4,7,3,4,7,5,6};
int arr_size = sizeof(arr)/sizeof(arr[0]);

printf("Given array is \n");
printArray(arr, arr_size);

mergeSort(arr, 0, arr_size - 1);

printf("\nSorted array is \n");
printArray(arr, arr_size);
return 0;
}

Thank you!



Solution 1:[1]

I'm not sure how to this in C, but if you are in C++ you could use unordered set (https://www.cplusplus.com/reference/unordered_set/unordered_set/), or if you are in Java you could use HashSet class (https://docs.oracle.com/javase/7/docs/api/java/util/HashSet.html). Using one of these will allow to throw your element while you are merge sorting and when you want to access element you first ask if it was already on the set.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Bandolero