How to cache the compiled regex in Go
Below is my Go code. Each time the validate method is called, regexp.Compile is executed again. I want to compile only once, not on every call to validate.
1) How can I do that? 2) My idea was to create an instance variable that is nil at the start and is lazily initialized in validate.
if (a != nil) {
a, err := regexp.Compile(rras.Cfg.WhiteList)
}
However if I declare a variable as an instance variable,
var a *Regexp; // regexp.Compile returns *Regexp
my compiler underlines in red. How to fix it ?
type RRAS struct {
Cfg *RRAPIConfig
}
type RRAPIConfig struct {
WhiteList string
}
func (rras *RRAS) validate(ctx context.Context) error {
a, err := regexp.Compile(rras.Cfg.WhiteList)
}
Solution 1:[1]
Static initialization
var whitelistRegexp = regexp.MustCompile(Cfg.WhiteList)
func (rras *RRAS) validate(ctx context.Context) error {
if !whitelistRegexp.Match(...) {...}
}
This compiles the Regexp during package initialization, which happens at program startup, before any code in the main function executes.
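As a minimal runnable sketch of this approach, the following program compiles the pattern once in a package-level variable. The constant whiteListPattern and the string parameter of validate are assumptions made for illustration; in the original code the pattern comes from Cfg.WhiteList, which would have to be available at package initialization time.

```go
package main

import (
	"fmt"
	"regexp"
)

// whiteListPattern is a hypothetical stand-in for Cfg.WhiteList.
// It must be known at package initialization time for this approach to work.
const whiteListPattern = `^[a-z0-9_-]+$`

// whitelistRegexp is compiled exactly once, during package initialization.
// MustCompile panics on an invalid pattern, so a broken regex fails at startup.
var whitelistRegexp = regexp.MustCompile(whiteListPattern)

// validate returns an error if name does not match the whitelist.
func validate(name string) error {
	if !whitelistRegexp.MatchString(name) {
		return fmt.Errorf("%q is not whitelisted", name)
	}
	return nil
}

func main() {
	fmt.Println(validate("user_42"))  // <nil>
	fmt.Println(validate("bad name")) // "bad name" is not whitelisted
}
```

A compiled *regexp.Regexp is safe for concurrent use by multiple goroutines, so the shared variable needs no extra locking.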
Benefits
- Your program will crash immediately if the regex is broken, which helps to find bugs very quickly.
- Very small and clean code, without any pitfalls
- No need to worry about go-routines
Drawbacks
- Potentially slow compilation may slow down the startup of the whole program (or server)
- Only works if the regex is static and present at startup
- Only works if a single regex (or a few static regexes) is used for all cases
Synchronization and Caching
var whitelistR struct{
rex *regexp.Regexp
once sync.Once
err error
}
func (rras *RRAS) validate(ctx context.Context) error {
whitelistR.once.Do(func() {
whitelistR.rex, whitelistR.err = regexp.Compile(rras.Cfg.WhiteList)
})
if whitelistR.err != nil {
return fmt.Errorf("could not compile regex: %w", whitelistR.err)
}
if !whitelistR.rex.Match(...) {...}
}
This lazily compiles the Regexp on the first call to the method. The sync.Once is essential because it is a synchronization point that guarantees access to the regexp is free of data races. Every concurrent call to the method has to wait until the Regexp has been compiled for the first time. After that, the synchronization is very fast, because it uses only an atomic load.
You can also call go once.Do(...) in your main function to initialize the regexp in parallel, which speeds up the first call without blocking other code.
Benefits
- Program (or server) startup is not impacted by the compilation time
- Compilation is only done if it is actually needed
- You can create the String for the Regexp dynamically on demand, which can reduce binary file size and speed up your program
- Possible to cache many different Regexes in a Caching-Map
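The caching map mentioned in the last point could look roughly like the sketch below. The type regexCache and its methods are hypothetical names, and guarding the map with a sync.RWMutex is one possible design among several, not something prescribed by the answer.

```go
package main

import (
	"fmt"
	"regexp"
	"sync"
)

// regexCache is a hypothetical helper that caches compiled patterns,
// keyed by their source string.
type regexCache struct {
	mu    sync.RWMutex
	cache map[string]*regexp.Regexp
}

func newRegexCache() *regexCache {
	return &regexCache{cache: make(map[string]*regexp.Regexp)}
}

// get returns the compiled regexp for pattern, compiling it on first use.
// Failed compilations are not cached and are retried on the next call.
func (c *regexCache) get(pattern string) (*regexp.Regexp, error) {
	// Fast path: many readers may look up the map concurrently.
	c.mu.RLock()
	rex, ok := c.cache[pattern]
	c.mu.RUnlock()
	if ok {
		return rex, nil
	}

	c.mu.Lock()
	defer c.mu.Unlock()
	// Re-check: another goroutine may have stored it while we waited.
	if rex, ok := c.cache[pattern]; ok {
		return rex, nil
	}
	rex, err := regexp.Compile(pattern)
	if err != nil {
		return nil, fmt.Errorf("could not compile regex %q: %w", pattern, err)
	}
	c.cache[pattern] = rex
	return rex, nil
}

func main() {
	cache := newRegexCache()
	a, _ := cache.get(`^[a-z]+$`)
	b, _ := cache.get(`^[a-z]+$`)
	fmt.Println(a == b)                 // true: the same compiled instance is reused
	fmt.Println(a.MatchString("hello")) // true
}
```

This sketch never evicts entries, so it only suits a bounded set of patterns; if patterns come from untrusted or unbounded input, the map would need a size limit.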
Drawbacks
- Errors in the Regexp will only show up in tests which actually use this method, not on startup
- Code is more complex (10 lines instead of one)
- Another developer might forget to go through sync.Once in a different method that accesses the regexp and introduce a hard-to-catch race condition
- Someone might try to be clever and wrap the sync.Once call in an if, which introduces a hard-to-catch race condition
Conclusion
Almost always use the simple static initialization. Use the synchronized initialization only if benchmarking shows that static initialization has a real performance impact. When synchronizing access, always try to use the helpers Go provides (sync.Once, Mutex, RWMutex, ...) because they are optimized and less error-prone.
Recommended Reading:
The Go Memory Model: details about synchronization and best practices
Go Data Race Detector: you should race-test every complex multi-goroutine Go program
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Falco |