'Why do I get ThrottlingException - Rate Exceeded status:400 when making AWS Athena API call from API server?
We have an S3 data lake in AWS (with Lake Formation, Glue etc.) The end goal is to query the S3 data sources using SQL in Athena.
- When making the query in the AWS Athena console - everything works fine, results are provided,see screenshot: https://share.getcloudapp.com/NQuNBr5g
- When making the query through the official API
application domain (Symfony5 RESTful api that uses aws-sdk-php vendor), the
query doesn't even get to Athena, error returned is 400: https://share.getcloudapp.com/xQuqQLrq
- in CloudTrail events, I can only see errorcode= ThrottlingException and errormessage='Rate exceeded', there's no query execution id.
- Weird thing I don't get is, when making the same call in my localhost setup of the API app, the call is again successful: https://share.getcloudapp.com/jkuv8ZGy
The call made is StartQueryExecution on Athena API, error as shown on the API app's side:
Error executing \"GetQueryExecution\" on \"https://athena.us-west-2.amazonaws.com\"; AWS HTTP error: Client error: `POST https://athena.us-west-2.amazonaws.com` resulted in a `400 Bad Request` response:\n{\"__type\":\"ThrottlingException\",\"message\":\"Rate exceeded\"}\n ThrottlingException (client): Rate exceeded - {\"__type\":\"ThrottlingException\",\"message\":\"Rate exceeded\"}", "class": "Aws\\Athena\\Exception\\AthenaException"
The API app server and the datalake etc. are on the same VPC, and I created a VPC endpoint from the server's VPC to athena us-west-2 endpoint, but it didn't help. I don't think it's Athena Quota issues, since on localhost the query works just fine. Any insight would be very helpful, thank you!
Solution 1:[1]
The solution was a combination of actions. Athena just doesn't work like that. So it's not okay to expect data from an Athena query over an S3 datalake as if querying a relational database. What helped get results consistently and not have this error was:
- update the PHP SDK AthenaClient constructor, and also pass config for retries.
... other AthenaClient constructor params...
'retries' => [
'mode' => 'standard',
'max_attempts' => 3
],
- Athena and other elastic services (e.g. dynamodb) work asynchronously. You issue the query, but the result will not be delivered synchronously. As example - I saw in my early tests always receiving the initial "throttlingException" but in Athena Query console, the result of that exact same query came slightly later, but successfully. It looks like the PHP SDK for aws is done with this in mind so doing retries and exponential backoff is also what AWS recommends: https://docs.aws.amazon.com/general/latest/gr/api-retries.html
Partition your data, and in a relevant way, in order to scan as less data as possible. Which helps with more consistent and faster results. - https://docs.aws.amazon.com/athena/latest/ug/partitions.html // either on the glue table directly, or via Glue ETL job where partitioning keys are specified. If your query on athena is looking for something where country={country}, a good partitioning scheme is per country.
avoid 'select *' - always name exactly the columns needed + add limit + queries over Athena should be relatively simple select queries, if you need joins or other more complex query types, Redshift is better suited for that.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Irina O. |