'Can I pop from a HashSet efficiently?

My algorithm needs to iteratively shrink a set by removing an element, and do something with the element removed and with the shrinking set in each iteration. And:

  • I need a genuine set with fast lookup, not just a vector containing unique elements.
  • The choice of element is arbitrary: the outcome of the algorithm doesn't depend on the order visited. The performance probably varies greatly with that choice, but let's say I want the simplest code and leave it up to the set itself to pick the element it can remove efficiently.
  • By the way, the algorithm is the basic form of the Bron–Kerbosch algorithm. Smarter versions of that algorithm work faster (mostly) because they don't leave the choice of element arbitrary and I'd like to learn how much that effort pays off.

Python sets have a pop member that pretty much does that. In Scala and Go, picking and removing the "first" element of a hash set seems to work fine (where "first" corresponds to the iterator). In Rust, that is something like:

// split off an arbitrary element from a (non-empty) set
pub fn pop<T>(set: &mut HashSet<T>) -> T
where
    T: Eq + Clone + std::hash::Hash,
{
    let elt = set.iter().next().cloned().unwrap();
    set.remove(&elt);
    elt
}

This seems to be a performance bottleneck compared to the other languages. I benchmarked some implementations of a pop-like function on the playground but none perform well. Apparently removing an element is not expensive, but picking one is: iter().next() costs a fortune (*). Avoiding that with retain understandably doesn't help: it always iterates the whole set. Are there any alternatives?

(*) On closer examination, iter().next() is quite cheap, as far as microbenchmarking can be trusted. Separate microbenchmarks say that picking an arbitrary element from a set costs (in nanoseconds on my system):

| Type of set      | Number of elements in set instance
|                  | 100 | 10,000 | 1,000,000
| Rust HashSet     |   2 |      2 |         2
| Rust BTreeSet    |  11 |     12 |        13
| Go map[]struct{} |  27 |     31 |        94
| Python set       | 125 |    125 |       125


Solution 1:[1]

I guess the same advice applies as in Can I randomly sample from a HashSet efficiently?: copy the set as a vector just to iterate over it as shown in the "sequenced" solution in the benchmark:

let seq: Vec<u32> = set.iter().cloned().collect();
for elt in seq {
    set.remove(&elt);

That means this answer is not applicable if you need to shrink the set (pick an arbitrary element) only once or a few times, or if the set contents cannot be cheaply cloned.

Solution 2:[2]

the set I'm using has integers

Don't use a HashSet; A BTreeSet has better and more consistent performance.

For N = 100000...

BTreeSet

sequenced : 3065.098µs
pop_1     : 2941.876µs
pop_2     : 2927.429µs

HashSet

sequenced : 3091.454µs
pop_1     : 172547.080µs
pop_2     : 807182.085µs

Solution 3:[3]

Your code can be simplified a bit:

let elt = set.iter().next().cloned().unwrap();
set.take(&elt).unwrap()

If you want to remove all elements from a HashSet then you should use the drain iterator - it is very efficient.

HashSet from the Rust standard library is not that fast. Try replacing it with one from the hashbrown crate.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Shepmaster
Solution 2 Shepmaster
Solution 3 Shepmaster