AWS CloudFront + Lambda@Edge: modify HTML content (making all links absolute -> relative)

I (maybe falsely) assumed Lambda@Edge can modify the origin response content, so I wrote a Lambda function like this:

/* this does not work. response.Body is not defined */

'use strict';
exports.handler = (event, context, callback) => {
  var response = event.Records[0].cf.response;
  var data = response.Body.replace(/OLDTEXT/g, 'NEWTEXT');
  response.Body = data;
  callback(null, response);
};

This fails because the origin response body cannot be referenced with this syntax.

Can I modify this script to make it work as I intended, or should I consider using another service on AWS?

My background :

We are trying to set up an AWS CloudFront distribution that consolidates access to several websites, like this:

ttp://foo.com/ -> https:/newsite.com/foo/
ttp://bar.com/ -> https:/newsite.com/bar/
ttp://boo.com/ -> https:/newsite.com/boo/

The sites are currently managed by external parties. We want to disable direct public access to foo/bar/boo and leave newsite.com as the only site visible on the internet.

Mapping the origins into a single CloudFront distribution is relatively simple. However, doing so will break HTML content that references files by absolute URL once the current domain names are removed from the web.

ttp://foo.com/images/1.jpg
 -> (disable foo.com dns)
  -> image not found

To benefit from CloudFront caching and other advantages, I want to rewrite all absolute file references in the HTML files to relative URLs, so that

<img src="ttp://foo.com/images/1.jpg">

becomes

<img src="/foo/images/1.jpg">

//(accessed as https:/newsite.com/foo/images/1.jpg from a user)
//(maybe I should make it an absolute URL for SEO purposes)

("http" is written as "ttp" here because of a restriction on posting the banned domain name foo.com)
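
Wherever the rewriting ends up running, the transformation itself can be sketched as a plain string function. This is only a minimal regex-based sketch, not an HTML parser; the `PREFIXES` table and the function name `rewriteAbsoluteUrls` are made up for illustration, and `http://` is spelled out in full here.

```javascript
'use strict';

// Hypothetical mapping from legacy hosts to path prefixes on the new site.
const PREFIXES = {
  'foo.com': '/foo',
  'bar.com': '/bar',
  'boo.com': '/boo',
};

// Rewrite absolute URLs that point at a known legacy host into
// root-relative URLs under that host's prefix.
function rewriteAbsoluteUrls(html, prefixes = PREFIXES) {
  return html.replace(
    /\bhttps?:\/\/([a-z0-9.:-]+)(\/[^\s"'<>]*)?/gi,
    (match, host, path) => {
      const prefix = prefixes[host.toLowerCase()];
      // URLs for unknown hosts (or hosts with a port) are left untouched.
      return prefix === undefined ? match : prefix + (path || '/');
    }
  );
}

console.log(rewriteAbsoluteUrls('<img src="http://foo.com/images/1.jpg">'));
// -> <img src="/foo/images/1.jpg">
```

A regex like this will also rewrite URLs that appear in visible text or inline scripts, so it would need testing against the real pages before being relied on.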

(edit) I found this AWS blog post, which may be a good hint but feels more convoluted than I expected (set up a Linux container so I can just use sed to process the HTML files, maybe using S3 as temporary storage). I hope I can find a simpler way: https://aws.amazon.com/blogs/networking-and-content-delivery/resizing-images-with-amazon-cloudfront-lambdaedge-aws-cdn-blog/



Solution 1:[1]

From what I have just learnt myself, you unfortunately cannot modify the response body within Lambda@Edge. You can only wipe out or totally replace the body content. I was hoping to be able to clean up all responses from a legacy site, but CloudFront Lambda@Edge does not allow this.

As the AWS documentation states:

When you’re working with the HTTP response, Lambda@Edge does not expose the body that is returned by the origin server to the origin-response trigger. You can generate a static content body by setting it to the desired value, or remove the body inside the function by setting the value to be empty. If you don’t update the body field in your function, the original body returned by the origin server is returned back to viewer.
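
To illustrate the "generate a static content body" part of that passage, here is a minimal sketch, assuming a Node.js function attached to the origin-response trigger. It replaces the body of 5xx responses with fixed HTML; the replacement markup and the choice of status 503 are placeholders, not anything from the question.

```javascript
'use strict';

// Origin-response trigger: the origin's body is not readable here,
// but the function may replace it with content it generates itself.
const handler = async (event) => {
  const response = event.Records[0].cf.response;

  if (Number(response.status) >= 500) {
    response.status = '503';
    response.statusDescription = 'Service Unavailable';
    response.body = '<h1>Temporarily unavailable</h1>'; // placeholder content
    response.headers['content-type'] = [
      { key: 'Content-Type', value: 'text/html; charset=utf-8' },
    ];
  }

  // For other statuses the body field is left unset, so the origin's
  // original body is passed through to the viewer unchanged.
  return response;
};

exports.handler = handler;
```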

Solution 2:[2]

I ran into the same issue, and have been able to pull some info out of the request headers to piece together a URL from which I can fetch the original body.

Beware: I haven't yet been able to confirm that this is a "safe" method (it may be relying on undocumented behaviour, for example), but for now it DOES fetch the original body properly for me. Of course, it also takes another request / round trip, possibly incurring some extra transfer costs, execution time, etc.

const https = require('https');

// Rebuild the origin URL from the request's Host header and URI, then fetch it
const fetchOriginalBody = (request) => {
    const host = request['headers']['host'][0]['value']; // xxxx.yyy.com
    const uri = request['uri'];
    const fetchOriginalBodyUrl = 'https://' + host + uri;

    return httpsRequest(fetchOriginalBodyUrl);
};

// Helper that turns https.request into a promise
function httpsRequest(options) {
    return new Promise((resolve, reject) => {
        const req = https.request(options, (res) => {
            if (res.statusCode < 200 || res.statusCode >= 300) {
                return reject(new Error('statusCode=' + res.statusCode));
            }
            var body = [];
            res.on('data', function(chunk) {
                body.push(chunk);
            });
            res.on('end', function() {
                try {
                    body = Buffer.concat(body).toString();
                    // body = JSON.parse(Buffer.concat(body).toString());
                } catch(e) {
                    return reject(e);
                }
                resolve(body);
            });
        });

        req.on('error', (e) => {
            reject(e);
        });

        req.end();
    });
}

exports.handler = async (event, context, callback) => {
    const records = event.Records;
    if (records && records.length > 0) {
        const request = records[0].cf.request;

        const body = await fetchOriginalBody(request);
    }

    ...
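
One way the fetched body could then be used, sketched under assumptions rather than taken from the answer: rewrite the text and hand it back as a generated body from an origin-response trigger. `makeHandler`, `fetchBody`, and `rewrite` are hypothetical names, and the fetcher is injected so the flow can be tried without network access. Lambda@Edge limits the size of generated responses, so large pages may not fit.

```javascript
'use strict';

// Build an origin-response handler from two hypothetical hooks:
//   fetchBody(request) -> Promise<string>  (e.g. fetchOriginalBody above)
//   rewrite(html)      -> string           (the text transformation)
function makeHandler(fetchBody, rewrite) {
  return async (event) => {
    const { request, response } = event.Records[0].cf;
    const type = response.headers['content-type'];
    const isHtml = Boolean(type && type[0].value.startsWith('text/html'));

    // Only successful HTML responses are rewritten; everything else passes through.
    if (response.status !== '200' || !isHtml) {
      return response;
    }

    const original = await fetchBody(request);
    response.body = rewrite(original);
    // The generated body is plain text, so drop any stale encoding header.
    delete response.headers['content-encoding'];
    return response;
  };
}

// Example wiring (assumes fetchOriginalBody from the answer above is in scope):
// exports.handler = makeHandler(fetchOriginalBody,
//     (html) => html.replace(/OLDTEXT/g, 'NEWTEXT'));
```

If the Host header seen by the function is the distribution's own domain rather than the origin's, the extra fetch would go back through CloudFront, so the header value is worth checking before deploying something like this.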

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Matt Monty
Solution 2 Jonny