'Modsecurity & Apache: How to limit access rate by header?

I have both Apache and Modsecurity working together. I'm trying to limit hit rate by request's header (like "facebookexternalhit"). And then return a friendly "429 Too Many Requests" and "Retry-After: 3".

I know I can read a file of headers like:

SecRule REQUEST_HEADERS:User-Agent "@pmFromFile ratelimit-bots.txt"

But I'm getting trouble building the rule.

Any help would be really appreciated. Thank you.



Solution 1:[1]

After 2 days of researching and understanding how Modsecurity works, I finally did it. FYI I'm using Apache 2.4.37 and Modsecurity 2.9.2 This is what I did:

In my custom file rules: /etc/modsecurity/modsecurity_custom.conf I've added the following rule:

# Limit client hits by user agent
SecRule REQUEST_HEADERS:User-Agent "@pm facebookexternalhit" \
    "id:400009,phase:2,nolog,pass,setvar:global.ratelimit_facebookexternalhit=+1,expirevar:global.ratelimit_facebookexternalhit=3"
SecRule GLOBAL:RATELIMIT_FACEBOOKEXTERNALHIT "@gt 1" \
    "chain,id:4000010,phase:2,pause:300,deny,status:429,setenv:RATELIMITED,log,msg:'RATELIMITED BOT'"
    SecRule REQUEST_HEADERS:User-Agent "@pm facebookexternalhit"
Header always set Retry-After "3" env=RATELIMITED
ErrorDocument 429 "Too Many Requests"

Explanation:

Note: I want to limit to 1 request every 3 seconds.

  1. The first rule matches the request header user agent against "facebookexternalhit". If the match was succesful, it creates the ratelimit_facebookexternalhit property in the global collection with the initial value of 1 (it will increment this value with every hit matching the user agent). Then, it sets the expiration time of this var in 3 seconds. If we receive a new hit matching "facebookexternalhit" it will sum 1 to ratelimit_facebookexternalhit. If we don't receive hits matching "facebookexternalhit" after 3 seconds, ratelimit_facebookexternalhit will be gone and this process will be restarted.
  2. If global.ratelimit_clients > 1 (we received 2 or more hits within 3 seconds) AND user agent matches "facebookexternalhit" (this AND condition is important because otherwise all requests will be denied if a match is produced), we set RATELIMITED=1, stop the action with a 429 http error, and log a custom message in Apache error log: "RATELIMITED BOT".
  3. RATELIMITED=1 is set just to add the custom header "Retry-After: 3". In this case, this var is interpreted by Facebook's crawler (facebookexternalhit) and will retry operation in the specified time.
  4. We map a custom return message (in case we want) for the 429 error.

You could improve this rule by adding @pmf and a .data file, then initializing global collection like initcol:global=%{MATCHED_VAR}, so you are not limited just to a single match by rule. I didn't test this last step (this is what I needed right now). I'll update my answer in case I do.

UPDATE:

I've adapted the rule to be able to have a file with all user agents I want to rate limit, so a single rule can be used across multiple bots/crawlers:

# Limit client hits by user agent
SecRule REQUEST_HEADERS:User-Agent "@pmf data/ratelimit-clients.data" \
    "id:100008,phase:2,nolog,pass,setuid:%{tx.ua_hash},setvar:user.ratelimit_client=+1,expirevar:user.ratelimit_client=3"

SecRule USER:RATELIMIT_CLIENT "@gt 1" \
    "chain,id:1000009,phase:2,deny,status:429,setenv:RATELIMITED,log,msg:'RATELIMITED BOT'"                                                                                     
    SecRule REQUEST_HEADERS:User-Agent "@pmf data/ratelimit-clients.data"

Header always set Retry-After "3" env=RATELIMITED

ErrorDocument 429 "Too Many Requests"

So, the file with user agents (one per line) is located inside a subdirectory under the same directory of this rule: /etc/modsecurity/data/ratelimit-clients.data. Then we use @pmf to read and parse the file (https://github.com/SpiderLabs/ModSecurity/wiki/Reference-Manual-(v2.x)#pmfromfile). We initialize the USER collection with the user agent: setuid:%{tx.ua_hash} (tx.ua_hash is in the global scope in /usr/share/modsecurity-crs/modsecurity_crs_10_setup.conf). And we simply use user as collection instead of global. That's all!

Solution 2:[2]

Might be better to use "deprecatevar", And you can allow a bit bigger burst leneanancy

# Limit client hits by user agent SecRule REQUEST_HEADERS:User-Agent "@pmf data/ratelimit-clients.data" \
    "id:100008,phase:2,nolog,pass,setuid:%{tx.ua_hash},setvar:user.ratelimit_client=+1,deprecatevar:user.ratelimit_client=3/1"

SecRule USER:RATELIMIT_CLIENT "@gt 1" \
    "chain,id:1000009,phase:2,deny,status:429,setenv:RATELIMITED,log,msg:'RATELIMITED BOT'"                                                                 

    SecRule REQUEST_HEADERS:User-Agent "@pmf data/ratelimit-clients.data"

Header always set Retry-After "6" env=RATELIMITED

ErrorDocument 429 "Too Many Requests"

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Sasf54