Remove objects with a duplicate property from List

I have a List of objects in C#. All of the objects contain a property ID. There are several objects that have the same ID property.

How can I trim the List (or make a new List) where there is only one object per ID property?

[Any additional duplicates are dropped out of the List]



Solution 1:[1]

If you want to avoid using a third-party library, you could do something like:

var bar = fooArray.GroupBy(x => x.Id).Select(x => x.First()).ToList();

That will group the array by the Id property, then select the first entry in the grouping.
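For anyone who wants to try the GroupBy approach in isolation, here is a minimal sketch. The Foo class, its Name property, and the FirstPerId helper are hypothetical names made up for this illustration; only the GroupBy/Select/First chain comes from the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical item type used only for this illustration.
public class Foo
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class GroupByExample
{
    // Keeps the first object seen for each Id; later duplicates are dropped.
    public static List<Foo> FirstPerId(IEnumerable<Foo> items)
    {
        return items.GroupBy(x => x.Id)
                    .Select(g => g.First())
                    .ToList();
    }

    public static void Main()
    {
        var fooArray = new[]
        {
            new Foo { Id = 1, Name = "a" },
            new Foo { Id = 2, Name = "b" },
            new Foo { Id = 1, Name = "c" }, // duplicate Id, dropped
        };

        foreach (var foo in FirstPerId(fooArray))
            Console.WriteLine($"{foo.Id}: {foo.Name}");
    }
}
```

With LINQ to Objects, groups come back in the order their keys first appear, so the surviving object for each Id is the earliest one in the source.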

Solution 2:[2]

MoreLINQ's DistinctBy() will do the job; it lets you use an object property to determine distinctness. Unfortunately, the built-in LINQ Distinct() is not flexible enough for this.

var uniqueItems = allItems.DistinctBy(i => i.Id);

DistinctBy()

Returns all distinct elements of the given source, where "distinctness" is determined via a projection and the default equality comparer for the projected type.

PS: Credits to Jon Skeet for sharing this library with the community.
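To make the "projection plus default equality comparer" idea concrete, here is a rough sketch of how such an operator could be written by hand. This is not MoreLINQ's actual source; the name DistinctByKey is hypothetical and was chosen so it does not clash with the real DistinctBy.

```csharp
using System;
using System.Collections.Generic;

public static class DistinctBySketch
{
    // Hypothetical helper, not MoreLINQ's implementation: yields the first
    // element seen for each key, comparing keys with the key type's
    // default equality comparer.
    public static IEnumerable<TSource> DistinctByKey<TSource, TKey>(
        this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
    {
        var seenKeys = new HashSet<TKey>();
        foreach (var item in source)
        {
            // HashSet<T>.Add returns false when the key is already present.
            if (seenKeys.Add(keySelector(item)))
                yield return item;
        }
    }
}
```

Usage would look the same as in the answer, e.g. allItems.DistinctByKey(i => i.Id). Like the library version, the sketch is lazily evaluated, so add ToList() if you need a materialized list.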

Solution 3:[3]

var list = GetListFromSomeWhere();
var list2 = GetListFromSomeWhere();
list.AddRange(list2);

....
...
var distinctedList = list.DistinctBy(x => x.ID).ToList();

MoreLINQ on GitHub

Or, if you don't want to use external DLLs for some reason, you can use this Distinct overload:

public static IEnumerable<TSource> Distinct<TSource>(
    this IEnumerable<TSource> source, IEqualityComparer<TSource> comparer)

Usage:

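public class FooComparer : IEqualityComparer<Foo>
{
    // Two Foo objects are equal if their IDs are equal.
    public bool Equals(Foo x, Foo y)
    {
        // Check whether the compared objects reference the same data.
        if (Object.ReferenceEquals(x, y)) return true;

        // Check whether any of the compared objects is null.
        if (Object.ReferenceEquals(x, null) || Object.ReferenceEquals(y, null))
            return false;

        return x.ID == y.ID;
    }

    // Required by IEqualityComparer<Foo>. It must agree with Equals,
    // so objects with equal IDs produce equal hash codes.
    public int GetHashCode(Foo obj)
    {
        if (Object.ReferenceEquals(obj, null)) return 0;
        return obj.ID.GetHashCode();
    }
}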



var distinctList = list.Distinct(new FooComparer()).ToList();

Solution 4:[4]

Starting from .NET 6, a new DistinctBy LINQ operator is available:

public static IEnumerable<TSource> DistinctBy<TSource,TKey> (
    this IEnumerable<TSource> source,
    Func<TSource,TKey> keySelector);

Returns distinct elements from a sequence according to a specified key selector function.

Usage example:

List<Item> distinctList = listWithDuplicates
    .DistinctBy(i => i.Id)
    .ToList();

There is also an overload that has an IEqualityComparer<TKey> parameter.
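As an illustration of the comparer overload, the sketch below deduplicates on a string key while ignoring case. The Item class with its Code and Name properties, and the DistinctByCodeIgnoringCase helper, are hypothetical and exist only for this example; it needs .NET 6 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical item type with a string key, used only for this illustration.
public class Item
{
    public string Code { get; set; }
    public string Name { get; set; }
}

public static class DistinctByComparerExample
{
    // Treats "abc" and "ABC" as the same key by passing an
    // IEqualityComparer<string> to Enumerable.DistinctBy (.NET 6+).
    public static List<Item> DistinctByCodeIgnoringCase(IEnumerable<Item> items)
    {
        return items
            .DistinctBy(i => i.Code, StringComparer.OrdinalIgnoreCase)
            .ToList();
    }
}
```

Without the comparer argument, keys are compared with the default comparer for the key type, so "abc" and "ABC" would count as two different keys.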


Alternative: In case creating a new List<T> is not desirable, here is a RemoveDuplicates extension method for the List<T> class:

/// <summary>
/// Removes all the elements that are duplicates of previous elements,
/// according to a specified key selector function.
/// </summary>
/// <returns>
/// The number of elements removed.
/// </returns>
public static int RemoveDuplicates<TSource, TKey>(
    this List<TSource> source,
    Func<TSource, TKey> keySelector,
    IEqualityComparer<TKey> keyComparer = null)
{
    var hashSet = new HashSet<TKey>(keyComparer);
    return source.RemoveAll(item => !hashSet.Add(keySelector(item)));
}

This method is efficient (O(n)) but a bit dangerous, because it has the potential to corrupt the contents of the List<T> in case the keySelector lambda fails for some item. The same problem exists with the built-in RemoveAll method¹. So in case the keySelector lambda is not fail-proof, the RemoveDuplicates method should be invoked in a try block that has a catch block where the potentially corrupted list is discarded.

¹ The List<T> class is backed by an internal _items array. The RemoveAll method invokes the Predicate<T> match for each item in the list, moving values stored in the _items along the way (source code). In case of an exception the RemoveAll just exits immediately, leaving the _items in a corrupted state. I've posted an issue on GitHub regarding the corruptive behavior of this method, and the feedback that I've got was that neither the implementation should be fixed, nor the behavior should be documented.
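The try/catch advice could look roughly like the sketch below. TryRemoveDuplicates is a hypothetical wrapper, not part of any library; RemoveDuplicates is repeated from the answer so the snippet compiles on its own. Setting the reference to null is just one way to "discard" the list, and your code may prefer to reload it or rethrow instead.

```csharp
using System;
using System.Collections.Generic;

public static class ListDeduplication
{
    // Same method as in the answer, repeated so this sketch is self-contained.
    public static int RemoveDuplicates<TSource, TKey>(
        this List<TSource> source,
        Func<TSource, TKey> keySelector,
        IEqualityComparer<TKey> keyComparer = null)
    {
        var hashSet = new HashSet<TKey>(keyComparer);
        return source.RemoveAll(item => !hashSet.Add(keySelector(item)));
    }

    // Hypothetical wrapper: returns true on success. If keySelector throws,
    // the possibly corrupted list is discarded by nulling the caller's
    // reference, and false is returned.
    public static bool TryRemoveDuplicates<TSource, TKey>(
        ref List<TSource> list, Func<TSource, TKey> keySelector)
    {
        try
        {
            list.RemoveDuplicates(keySelector);
            return true;
        }
        catch (Exception)
        {
            // The list contents can no longer be trusted, so drop them.
            list = null;
            return false;
        }
    }
}
```

The caller then checks the return value and, on false, rebuilds the list from its original source rather than continuing with the old instance.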

Solution 5:[5]

Not sure if anyone is still looking for any additional ways to do this. But I've used this code to remove duplicates from a list of User objects based on matching ID numbers.

private ArrayList RemoveSearchDuplicates(ArrayList SearchResults)
{
    ArrayList TempList = new ArrayList();

    foreach (User u1 in SearchResults)
    {
        bool duplicatefound = false;
        foreach (User u2 in TempList)
            if (u1.ID == u2.ID)
                duplicatefound = true;

        if (!duplicatefound)
            TempList.Add(u1);
    }
    return TempList;
}

Call: SearchResults = RemoveSearchDuplicates(SearchResults);

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Daniel Lord
Solution 2 Kolappan N
Solution 3 Kolappan N
Solution 4
Solution 5 Nikita Popov