CloudFront cache with GraphQL?
At my company we're using GraphQL for production apps, but only for private resources.
For now our public APIs are REST APIs behind CloudFront for caching. We want to turn them into GraphQL APIs, but the question is: how do we handle caching properly with GraphQL?
We thought about using a GET GraphQL endpoint and caching on the query string, but we are a bit afraid of the size of the requested URL (as we support IE9+ and sell to schools that sometimes sit behind really crude proxies and firewalls).
So we would like to use a POST GraphQL endpoint, but... CloudFront cannot cache a request based on its body.
Does anyone have an idea or best practice to share? Thanks
Solution 1:[1]
The two best options today are:
- Use a specialized caching solution, like FastQL.io
- Use persisted queries with GET, where some queries are saved on your server and accessed by name via GET
*Full disclosure: I started FastQL after running into these issues without a good solution.
Solution 2:[2]
I am not sure if it has a specific name, but I've seen a pattern in the wild where the GraphQL queries themselves are hosted on the backend, each with a specific ID. It's much less flexible, as it requires the pre-defined queries to be baked in.
The client just sends the arguments/params and the ID of the pre-defined query to use, and that becomes your cache key, similar to how HTTP caching would work for an authenticated request to /my-profile with CloudFront serving different responses based on the auth token in the headers.
How the client sends them depends on your backend's implementation of GraphQL: you could pass them either as a whitelisted header or in the query string.
So if the backend has defined a query that looks like this (using pseudocode):
const MyQuery = gql`
  query HeroNameAndFriends($episode: Int) {
    hero(episode: $episode) {
      name
      friends {
        name
      }
    }
  }
`
Then your request would go to something like api.app.com/graphQL/MyQuery?episode=3.
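To make that concrete, here is a minimal sketch of what such a backend might look like, assuming an Express server and the graphql-js reference implementation (the route, schema import, and query names are illustrative, not part of the original answer):

const express = require('express')
const { graphql } = require('graphql')
const { schema } = require('./schema') // your existing GraphQL schema

// Only queries registered here can be executed; the query name becomes part
// of the URL and therefore part of the CloudFront cache key.
const predefinedQueries = {
  MyQuery: `
    query HeroNameAndFriends($episode: Int) {
      hero(episode: $episode) {
        name
        friends { name }
      }
    }
  `,
}

const app = express()

app.get('/graphQL/:queryName', async (req, res) => {
  const source = predefinedQueries[req.params.queryName]
  if (!source) return res.status(404).json({ errors: [{ message: 'Unknown query' }] })

  // Query-string values arrive as strings; naively coerce numbers/booleans/JSON
  const variableValues = {}
  for (const [key, value] of Object.entries(req.query)) {
    try { variableValues[key] = JSON.parse(value) } catch { variableValues[key] = value }
  }

  const result = await graphql({ schema, source, variableValues })
  res.json(result)
})

app.listen(3000)

A request like GET api.app.com/graphQL/MyQuery?episode=3 then resolves to the stored query, and CloudFront can cache the response on path plus query string as usual.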
That being said, have you actually measured that your queries wouldn't fit in a GET request? I'd say go with GET requests if CDN Caching is what you need and use the approach mentioned above for the requests that don't fit the limits.
Edit: Seems it has a name: Automatic Persisted Queries. https://www.apollographql.com/docs/apollo-server/performance/apq/
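If you go the APQ route, the request shape is standardized: instead of the full query text, the client sends a SHA-256 hash of it in the extensions parameter of a GET request, which keeps URLs short enough to use as a cache key. A rough sketch (assuming an APQ-enabled server such as Apollo Server at /graphql; the endpoint and query are placeholders):

const { createHash } = require('crypto')

const queryText =
  'query HeroNameAndFriends($episode: Int) { hero(episode: $episode) { name friends { name } } }'
const sha256Hash = createHash('sha256').update(queryText).digest('hex')

async function fetchHero(episode) {
  const params = new URLSearchParams({
    operationName: 'HeroNameAndFriends',
    variables: JSON.stringify({ episode }),
    // The hash stands in for the query text itself
    extensions: JSON.stringify({ persistedQuery: { version: 1, sha256Hash } }),
  })
  const res = await fetch(`https://api.app.com/graphql?${params}`)
  return res.json()
}

// If the server has never seen this hash, it answers with a PersistedQueryNotFound
// error and the client retries once, sending the full query text to register it.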
Another alternative that lets you keep POST requests is to use Lambda@Edge on your CloudFront distribution, with DynamoDB tables to store your cached responses, similar to how Cloudflare Workers do it:
// Cloudflare Workers-style sketch of the caching logic; on AWS the
// equivalent would live in Lambda@Edge backed by DynamoDB.
addEventListener('fetch', event => {
  event.respondWith(handleRequest(event))
})

async function handleRequest(event) {
  const cache = caches.default
  // Note: the Workers Cache API caches GET requests, so a POST GraphQL
  // request would typically be mapped to a synthetic GET cache key first.
  let response = await cache.match(event.request)
  if (!response) {
    // Cache miss: forward the request to the origin
    response = await fetch(event.request)
    if (response.ok) {
      // Store a copy of the successful response without delaying the reply
      event.waitUntil(cache.put(event.request, response.clone()))
    }
  }
  return response
}
Some reading material on that
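For the Lambda@Edge + DynamoDB variant itself, a rough, hypothetical sketch of the read path could look like the following. Assumptions not in the original answer: an origin-request trigger with the "include body" option enabled, a DynamoDB table named graphql-cache keyed by a hash of the request body, a Node.js runtime that bundles aws-sdk v2, and the table being populated elsewhere (for example by the origin or a response trigger):

'use strict'
const crypto = require('crypto')
const AWS = require('aws-sdk') // assumes a runtime that bundles aws-sdk v2
const db = new AWS.DynamoDB.DocumentClient({ region: 'us-east-1' })

// Origin-request trigger: look the POST body up in DynamoDB and, on a hit,
// generate the response at the edge instead of forwarding to the origin.
exports.handler = async (event) => {
  const request = event.Records[0].cf.request
  if (request.method !== 'POST' || !request.body || !request.body.data) {
    return request // not a GraphQL POST, let CloudFront handle it normally
  }

  // The body arrives base64-encoded when "include body" is enabled
  const body = Buffer.from(request.body.data, 'base64').toString('utf8')
  const cacheKey = crypto.createHash('sha256').update(body).digest('hex')

  const cached = await db.get({ TableName: 'graphql-cache', Key: { cacheKey } }).promise()
  if (cached.Item) {
    return {
      status: '200',
      statusDescription: 'OK',
      headers: { 'content-type': [{ key: 'Content-Type', value: 'application/json' }] },
      body: cached.Item.response,
    }
  }
  return request // cache miss: forward the original POST to the origin
}

Note that Lambda@Edge functions cannot use environment variables, so the table name has to be hardcoded.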
Solution 3:[3]
An option I've explored on paper but not yet implemented is to use Lambda@Edge in request trigger mode to transform a client POST to a GET, which can then result in a cache hit.
This way clients can still use POST to send GQL requests, and you're working with a small number of controlled services within AWS when trying to work out the max URL length for the converted GET request (and these limits are generally quite high).
There will still be a length limit, but once you have 16 kB+ GQL requests, it's probably time to take the other suggestion of using predefined queries on the server and just referencing them by name.
It does have the disadvantage that request-trigger Lambdas run on every request, even on a cache hit, so they will generate some cost, although the Lambda itself should be very fast/simple.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | |
| Solution 3 | |