'What is the purpose of a Bloomier filter?

This question is about the Bloomier filter, which is not the same as a standard Bloom filter.

I'm learning about the Bloomier filter and I don't see the advantage of using one. As far as I'm concerned, a Bloomier filter is a generalization of a Bloom filter. It can return the specific items themselves.

However, you could accomplish this by simply using hash tables and they seems faster and more space-efficient.

Given this, what's the purpose of a Bloomier filter?



Solution 1:[1]

There is a talk by Michael Mitzenmacher available here. On slide 41 he mentions the following about Bloomier Filter:

Bloomier filter [Chazelle, Kilian, Rubinfeld, Tal]:

  • Extend to handle approximate functions.
  • Each element of set has associated function value.
  • Non-set elements should return null.
  • Want to always return correct function value for set elements.
  • A false positive returns a function value for a non-null element.

Solution 2:[2]

You might find the following analogy helpful:

Bloom filter is to set as Bloomier filter is to map.

A Bloom filter can be thought of as a way of storing a compressed version of a set of items in significantly less space than the original set would normally take. The tradeoff involved is that you might have some false positives - items that aren't actually in the set but which the Bloom filter says are in the set.

A Bloomier filter can be thought of as a way of storing a compressed version of a map (or associative array, or dictionary, depending on what terminology you're using). That is, it stores a compressed representation of an association from keys to values. The advantage of the Bloomier filter is that it uses significantly less space than what would normally be required to store the map. The drawback is that the Bloomier filter can have false positives - if you look something up that isn't in the map, the Bloomier filter might accidentally report that it's there and give back a nonsense associated value. (This is a bit different from what you described in your question. In your question, you characterized a Bloomier filter as a Bloom filter that can also hand back the items stored in the filter.)

So to directly answer your question, yes, you absolutely could use a hash table instead of a Bloomier filter. However, doing so would use considerably more space. In most cases, we don't care about that space, but there are applications where space is at a premium and reducing storage would be valuable. As an example, check out the paper Weightless: Lossy Weight Encoding For Deep Neural Network Compression, which uses Bloomier filters to reduce the storage space needed to store large neural networks without too much of a decrease in the quality of the network.

As for how a Bloomier filter works, it's based on a technique that's closely related to that of the XOR filter, and the linked question gives a rough overview of the key techniques.

Solution 3:[3]

On example of the use of a Bloomier filter is in an LSM tree. The Bloomier filter can store a map of each key and the run it is a member of, and this can fit in memory much easier than the full LSM tree, assuming the values are large. This reduces lookup time substantially, and industry LSM trees like levelDB and RocksDB do use Bloom-filter-like structures to help reduce lookup time.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Nick Dandoulakis
Solution 2 templatetypedef
Solution 3 jbapple