cert-manager HTTP01 certificate challenge is inaccessible when rewrite-target is enabled

We have a dozen services exposed through an ingress-nginx controller in GKE.

To route traffic correctly under a single domain name, we need to use a rewrite-target rule.
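For context, a rewrite-target setup that serves several services under one domain typically looks like the sketch below, modelled on the capture-group example in the ingress-nginx documentation. The post does not show the actual Ingress, so the host, paths, service names, secret name and the cert-manager.io/cluster-issuer annotation are all hypothetical, and the networking.k8s.io/v1 API (Kubernetes 1.19+) is assumed.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: services-ingress              # hypothetical name
  annotations:
    # Strip the service prefix: /service-a/foo reaches the backend as /foo.
    nginx.ingress.kubernetes.io/rewrite-target: /$2
    # Assumption: certificates are requested through cert-manager's ingress annotation.
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - example.com                     # hypothetical host
    secretName: example-com-tls       # hypothetical secret
  rules:
  - host: example.com
    http:
      paths:
      # $2 is the second capture group: everything after the prefix.
      - path: /service-a(/|$)(.*)
        pathType: ImplementationSpecific
        backend:
          service:
            name: service-a           # hypothetical service
            port:
              number: 80
      - path: /service-b(/|$)(.*)
        pathType: ImplementationSpecific
        backend:
          service:
            name: service-b           # hypothetical service
            port:
              number: 80
```

The annotation applies to every path in this Ingress, so if cert-manager inserts its challenge path into this same Ingress, that path is rewritten too.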

The services ran without any maintenance from their launch in 2019 until recently, when cert-manager suddenly stopped renewing the Let's Encrypt certificates. We "resolved" this by temporarily removing the "tls" section from the ingress definition, forcing our clients to use the HTTP version.

After that, we removed all traces of cert-manager in an attempt to set it up from scratch.

Now cert-manager creates the certificate signing request, spawns an ACME HTTP solver pod, and adds it to the ingress. However, when I access its URL, it returns an empty response instead of the expected token.

This is caused by the rewrite-target annotation, which interferes with the routing of the ACME challenge. What puzzles me most is that this used to work. (It was set up by a former employee.)

Unfortunately, disabling rewrite-target is not an option, because routing would then stop working correctly.

Using DNS01 won't work either, because our ISP does not support programmatic changes to DNS records.

Is there a way to make this work without disabling rewrite-target?

P.S. Here are a number of similar cases reported on GitHub:

None of them help.

Here's the definition of my ClusterIssuer:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    # The ACME server URL
    server: https://acme-v02.api.letsencrypt.org/directory
    # Email address used for ACME registration
    email: [email protected]
    # Name of a secret used to store the ACME account private key
    privateKeySecretRef:
      name: letsencrypt-prod
    # Enable the HTTP-01 challenge provider
    solvers:
    - http01:
        ingress:
          class: nginx


Solution 1:[1]

Please share the ClusterIssuer or Issuer you are using.

ingressClass

If the ingressClass field is specified, cert-manager will create new Ingress resources in order to route traffic to the acmesolver pods, which are responsible for responding to ACME challenge validation requests.

Ref: https://cert-manager.io/v0.12-docs/configuration/acme/http01/#ingressclass
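To make the distinction concrete, here is a sketch of the two HTTP01 solver modes in a cert-manager v1 ClusterIssuer. With class, cert-manager creates its own Ingress for each challenge; with name, it inserts the challenge path into an existing Ingress, whose annotations (including rewrite-target) then also apply to that path. The issuer names, e-mail address and the Ingress name services-ingress are hypothetical.

```yaml
# Mode 1: cert-manager creates a separate solver Ingress per challenge.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod-class        # hypothetical name
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com          # hypothetical address
    privateKeySecretRef:
      name: letsencrypt-prod-class
    solvers:
    - http01:
        ingress:
          class: nginx
---
# Mode 2: cert-manager edits an existing Ingress; that Ingress's
# annotations (such as rewrite-target) also affect the challenge path.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod-name         # hypothetical name
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com          # hypothetical address
    privateKeySecretRef:
      name: letsencrypt-prod-name
    solvers:
    - http01:
        ingress:
          name: services-ingress      # hypothetical existing Ingress
```

The ClusterIssuer in the question uses class: nginx, i.e. the first mode.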

Usually you won't even see the HTTP solver challenge: if DNS and HTTP are working correctly, it is created and then removed again.

Also, make sure your ingress doesn't have an SSL-redirect annotation; that could also be one reason the certificates are not being generated.
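If a redirect does turn out to interfere, cert-manager's HTTP01 ingress solver accepts an ingressTemplate whose annotations are copied onto the solver Ingress it creates (as far as I know this only applies in class mode, not when an existing Ingress is edited). A sketch, assuming cert-manager v1 and ingress-nginx's nginx.ingress.kubernetes.io/ssl-redirect annotation; the e-mail address is hypothetical, and whether this is needed depends on your setup:

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com          # hypothetical address
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
    - http01:
        ingress:
          class: nginx
          ingressTemplate:
            metadata:
              annotations:
                # Added to the solver Ingress only: serve the challenge over plain HTTP.
                nginx.ingress.kubernetes.io/ssl-redirect: "false"
```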

Did you check the other cert-manager objects, such as the Order and CertificateRequest status? Does kubectl describe challenge show a 404 there?

If you have been retrying repeatedly, you may have hit Let's Encrypt's rate limits for certificate requests.

Troubleshooting: https://cert-manager.io/docs/faq/troubleshooting/#troubleshooting-a-failed-certificate-request

Solution 2:[2]

When you configure an Issuer with http01, the default serviceType is NodePort. This means it won't even go through the ingress controller. From the docs:

By default, type NodePort will be used when you don't set HTTP01 or when you set serviceType to an empty string. Normally there's no need to change this.
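The serviceType field sits in the same http01.ingress block as class. A sketch of the solvers section of the question's ClusterIssuer (a fragment of spec.acme, assuming cert-manager v1) with the solver Service switched to ClusterIP, the other supported value:

```yaml
# Fragment of spec.acme in the ClusterIssuer; other fields unchanged.
solvers:
- http01:
    ingress:
      class: nginx
      # Defaults to NodePort when unset; ClusterIP is the alternative.
      serviceType: ClusterIP
```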

I'm not sure what the rest of your setup looks like, but HTTP01 causes the ACME server to make HTTP requests (not HTTPS). You need to make sure your nginx has a listener for HTTP (port 80). The ACME server does follow redirects, so you can listen on HTTP and redirect all traffic to HTTPS; this is legitimate and works.

cert-manager creates an ingress resource for validation that directs traffic to the temporary pod. This ingress has its own set of rules, and you can control it using this setting. You can try to disable or modify the rewrite-target annotation on this resource.

Another thing I would try is accessing this URL from inside the cluster (bypassing ingress-nginx). If it works directly, it's an ingress/networking problem; otherwise, it's something else.
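One way to run that check is a throwaway Pod that requests the challenge path directly from the solver Service. A sketch under several assumptions: the Pod runs in the same namespace as the Challenge; the Service name and token are placeholders (cert-manager names solver Services with a cm-acme-http-solver- prefix, and the token is in the Challenge's spec.token field); port 8089 is assumed to be the solver's listening port; and curlimages/curl is an assumed public image.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: acme-solver-check             # hypothetical name
spec:
  restartPolicy: Never                # run once, then stop
  containers:
  - name: curl
    image: curlimages/curl            # assumed public curl image
    command: ["curl"]
    args:
    - "-sv"                           # silent progress, verbose headers
    # Placeholders: substitute the real solver Service name and token.
    - "http://cm-acme-http-solver-xxxxx:8089/.well-known/acme-challenge/TOKEN"
```

Read the result with kubectl logs acme-solver-check. If the token comes back here but not through the public URL, the problem lies in the ingress path rather than in the solver pod.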

Please share the relevant nginx and cert-manager logs; they might help with debugging or with understanding where your problem lies.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution | Source
Solution 1 |
Solution 2 | Chen A.