'Performance issue in Golang's key-value store (Badger DB)
In badgerDB, we have billions of keys of type string
and values of type HashValueList
. In our use case length of HashValueList
might be in millions. We have to encode key and value in []byte
before inserting into BadgerDb; we are using the encoding/gob
package. So, whenever we need values, we have to decode them again. This decoding process is causing overhead in our case.
type HashValue struct {
Row_id string
Address string
}
type HashValueList []HashValue
To mitigate decoding overhead, we change our design to prefix iteration. With prefix iteration, we stored each of the values from our collection as a distinct Badger KV pair, rather than a single key with a large value. The prefix of the key would be the original hash value key. we'd then need to add a suffix to provide uniqueness over the collection of values from the original collection. So in your original scheme have something like:
k1 -> [v1, v2, v3, ..., vn]
...
km -> [w1, ..., wm]
Now have something like:
k1@1 -> v1
k1@2 -> v2
k1@3 -> v2
...
k1@n -> vn
...
km@1 -> w1
...
km@m -> wm
To find the values from DB, we have n goroutines that are reading the KeyChan
channel and writing values to ValChan
.
func Get(db *badger.DB, KeyChan <-chan string, ValChan chan []byte) {
var val []byte
for key := range KeyChan {
txn := db.NewTransaction(false)
opts := badger.DefaultIteratorOptions
opts.Prefix = []byte(key)
it := txn.NewIterator(opts)
prefix := []byte(key)
for it.Rewind(); it.ValidForPrefix(prefix); it.Next() {
item := it.Item()
val, err := item.ValueCopy(val[:])
ValChan <- val
item = nil
if err != nil {
fmt.Println(err)
}
}
it.Close()
txn.Discard()
}
}
In Prefix iteration, Get func to gets very slow after a while. We collected a 5-second execution trace, and the results are below:
Here github.com/dgraph-io/badger/v3.(*Iterator).fill.func1 N=2033535
Creating huge number of Goroutine internally.
The significant time spent in Scheduler Wait.
How do we improve the prefix iteration performance? Is there any better approach for our use case?
Thanks
Solution 1:[1]
If you want to iterate over a prefix really fast and collect the results, you should consider using the Stream framework within Badger. It uses many goroutines to make iteration as fast as disk would allow.
https://pkg.go.dev/github.com/outcaste-io/badger/v3#Stream
Also, in general, using a Go channel to collect what could be millions of results would be really slow. Channels are best used by batching up the results and decreasing how many times you interact with them.
Stream Framework takes care of all of that by giving you a serially executed Stream.Send function. You can even keep the encoded data as it is, and decode it within the Send, or copy it over from Send.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | mrjn |