How to cache the compiled regex in Go

Below is my Go code. Each time the validate method is called, regexp.Compile runs again. I want to compile the regex only once, not on every call to validate.

1) How can I do this? 2) My idea was to create an instance variable that is nil at the start and is lazily initialized in validate.

if a == nil {
  a, err = regexp.Compile(rras.Cfg.WhiteList)
}

However, if I declare the variable as an instance variable,

var a *Regexp; // regexp.Compile returns *Regexp

my compiler underlines it in red. How do I fix it?

type RRAS struct {
    Cfg       *RRAPIConfig
}

type RRAPIConfig struct {
    WhiteList               string
}

func (rras *RRAS) validate(ctx context.Context) error {
    a, err := regexp.Compile(rras.Cfg.WhiteList)
}


Solution 1:^[1]

Static initialization

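// Assumes Cfg is a package-level variable that is already populated during package initialization.
var whitelistRegexp = regexp.MustCompile(Cfg.WhiteList)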

func (rras *RRAS) validate(ctx context.Context) error {
  if !whitelistRegexp.Match(...) {...}
}

This compiles the Regexp during package initialization, which happens at program startup, before any code in the main function executes.
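
As a minimal, self-contained sketch of this approach: the pattern constant and the isWhitelisted helper below are hypothetical stand-ins for Cfg.WhiteList and the matching logic in validate, which the question does not show.

```go
package main

import (
	"fmt"
	"regexp"
)

// whiteListPattern is a hypothetical pattern standing in for Cfg.WhiteList.
const whiteListPattern = `^[a-z0-9_]+$`

// Compiled once during package initialization, before main starts.
// MustCompile panics if the pattern is invalid.
var whitelistRegexp = regexp.MustCompile(whiteListPattern)

// isWhitelisted reuses the already compiled expression on every call.
func isWhitelisted(s string) bool {
	return whitelistRegexp.MatchString(s)
}

func main() {
	fmt.Println(isWhitelisted("user_42"))      // true
	fmt.Println(isWhitelisted("not allowed!")) // false
}
```

A compiled *regexp.Regexp is safe for concurrent use by multiple goroutines, so the package-level variable can be shared without extra locking.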

Benefits

Your program will crash immediately at startup if the regex is broken (MustCompile panics), which helps to find bugs very quickly
Very small and clean code, without any pitfalls
No need to worry about goroutines

Drawbacks

Potentially slow compilation may slow down the startup of the whole program (or server)
Only works if the regex is static and present at startup
Only works if a single regex (or a few static regexes) is used for all cases

Synchronization and Caching

var whitelistR struct{
  rex *regexp.Regexp
  once sync.Once
  err error
}

func (rras *RRAS) validate(ctx context.Context) error {
  whitelistR.once.Do(func() {
    whitelistR.rex, whitelistR.err = regexp.Compile(rras.Cfg.WhiteList)
  })

  if whitelistR.err != nil {
    return fmt.Errorf("could not compile regex: %w", whitelistR.err)
  }

  if !whitelistR.rex.Match(...) {...}
}

This will lazily compile the Regexp on the first call to the method. The sync.Once is very important because it is a synchronization point: it guarantees that access to the compiled regexp is free of data races. Every concurrent caller has to wait until the Regexp has been compiled for the first time. After that, the synchronization is very fast because it only needs an atomic load.

You can also run go once.Do(...) from your main function to compile the regexp in the background, which speeds up the first call without blocking other work.
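
One way to package the lazy compilation and the optional background warm-up is sketched below. The lazyRegexp type, the standalone validate function, and the pattern are hypothetical; they stand in for the whitelistR struct and rras.Cfg.WhiteList above.

```go
package main

import (
	"fmt"
	"regexp"
	"sync"
)

// lazyRegexp compiles its pattern at most once, on first use.
type lazyRegexp struct {
	pattern string
	once    sync.Once
	rex     *regexp.Regexp
	err     error
}

// get returns the compiled expression, compiling it on the first call.
// All access goes through once.Do, so concurrent callers are safe.
func (l *lazyRegexp) get() (*regexp.Regexp, error) {
	l.once.Do(func() {
		l.rex, l.err = regexp.Compile(l.pattern)
	})
	return l.rex, l.err
}

// whitelist uses a hypothetical pattern in place of rras.Cfg.WhiteList.
var whitelist = &lazyRegexp{pattern: `^[a-z0-9_]+$`}

// validate returns an error if the pattern is broken or the input does not match.
func validate(input string) error {
	rex, err := whitelist.get()
	if err != nil {
		return fmt.Errorf("could not compile regex: %w", err)
	}
	if !rex.MatchString(input) {
		return fmt.Errorf("%q is not whitelisted", input)
	}
	return nil
}

func main() {
	// Optional warm-up: start compiling in the background so the first
	// real call is less likely to pay the compilation cost.
	go whitelist.get()

	fmt.Println(validate("user_42"))
	fmt.Println(validate("not allowed!"))
}
```

Because every reader goes through get, there is no code path that touches rex or err without passing the sync.Once first, which is the property the paragraph above relies on.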

Benefits

Program (or server) startup is not impacted by the compilation time
Compilation is only done if it is actually needed
You can create the string for the Regexp dynamically on demand, which can reduce binary file size and speed up your program
Possible to cache many different regexes in a map that serves as a cache
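
The last benefit, caching many different regexes in a map, could look roughly like the sketch below. The regexCache type and its methods are hypothetical names, and this variant does not cache compile errors or bound the size of the map; both may matter if patterns come from untrusted input.

```go
package main

import (
	"fmt"
	"regexp"
	"sync"
)

// regexCache is a hypothetical helper that stores compiled expressions by pattern.
type regexCache struct {
	mu    sync.RWMutex
	byPat map[string]*regexp.Regexp
}

func newRegexCache() *regexCache {
	return &regexCache{byPat: make(map[string]*regexp.Regexp)}
}

// get returns the cached expression for pattern, compiling it on a miss.
func (c *regexCache) get(pattern string) (*regexp.Regexp, error) {
	c.mu.RLock()
	rex, ok := c.byPat[pattern]
	c.mu.RUnlock()
	if ok {
		return rex, nil
	}

	// Compile outside the lock so a slow pattern does not block readers.
	// Two goroutines may compile the same pattern concurrently; the
	// first one to store wins and the other result is discarded.
	compiled, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}

	c.mu.Lock()
	defer c.mu.Unlock()
	if existing, ok := c.byPat[pattern]; ok {
		return existing, nil
	}
	c.byPat[pattern] = compiled
	return compiled, nil
}

func main() {
	cache := newRegexCache()
	a, _ := cache.get(`^\d+$`)
	b, _ := cache.get(`^\d+$`)
	// The second lookup returns the same compiled instance.
	fmt.Println(a == b)
	fmt.Println(a.MatchString("12345"))
}
```

The read lock keeps the common cache-hit path cheap, while the re-check under the write lock ensures that all callers end up sharing a single compiled instance per pattern.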

Drawbacks

Errors in the Regexp will only show up in tests which actually use this method, not on startup
Code is more complex (10 lines instead of one)
Another developer might forget to go through sync.Once in a different method and introduce a hard-to-catch race condition
Someone might try to be clever and wrap the sync.Once call in an if, which also introduces a hard-to-catch race condition

Conclusion

Almost always use the simple static initialization. Use the synchronized initialization only if benchmarking shows a real performance impact. When synchronizing access, prefer the helpers that Go provides (sync.Once, sync.Mutex, sync.RWMutex, ...) because they are optimized and less error-prone.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	Falco
