'Why is my AWS NACL only allowing HTTP access with 'All Traffic' or 'All TCP' inbound rules?
I've got an AWS VPC set up with 3 subnets - 1 public subnet and 2 private. I have an EC2 instance with an associated Elastic Block Store (the EBS contains my website) running in the public subnet, and a MySQL database in the private subnets. The security group attached to the EC2 instance allows inbound HTTP access from any source, and SSH access from my IP address only. The outbound security rule allows all traffic to all destinations. The security group associated with the database allows MySQL/Aurora access only for both inbound and outbound traffic, with the source and destination being the public access security group.
This has all been working perfectly well, but when I came to setting up the NACLs for the subnets I ran into a snag that I can't figure out. If I change the inbound rule on the public subnet's NACL to anything other than 'All Traffic' or 'All TCP', I get an error response from my website: Unable to connect to the database: Connection timed out. 2002.
I've tried using every option available and always get this result. I'm also getting an unexpected result from the NACL attached to the private subnets: If I deny all access (i.e. delete all rules other than the default 'deny all' rule) for both inbound and outbound traffic, the website continues to function correctly (provided the inbound rule on the public subnet's NACL is set to 'All Traffic' or 'All TCP').
A similar question has been asked here but the answer was essentially to not bother using NACLs, rather than an explanation of how to use them correctly. I'm studying for an AWS Solutions Architect certification so obviously need to understand their usage and in my real-world example, none of AWS' recommended NACL settings work.
Solution 1:[1]
I know this is super late but I found the answer to this because I keep running into the same issue and always try to solve it with the ALL TRAFFIC rule. However, no need to do that anymore; it's answered here. The Stack Overflow answer provides the link to an AWS primary source that actually answers your question.
Briefly, you need to add a Custom TCP Rule to your outbound NACL and add the port range 1024 - 65535. This will allow the clients requesting access through the various ports to receive the data requested. If you do not add this rule, the outbound traffic will not reach the requesting clients. I tested this through ICMP (ping), ssh (22) http (80) and https (443).
Why do the ports need to be added? Apparently, AWS sends out traffic through one of the ports between 1024 and 63535. Specifically, "When a client connects to a server, a random port from the ephemeral port range (1024-63535) becomes the client's source port." (See second link.)
The general convention around ACLs is that because they are stateless, incoming traffic is sent back out through the mandatory corresponding port, which is why most newbies (or non hands on practitioners like me) may miss the "ephemeral ports" part of building custom VPCs.
For what it's worth, I went on to remove all the outgoing ports and left just the ephemeral port range. No outgoing traffic was allowed. It seems like either the ACL still needs those ports listed so it can send traffic requested through those ports. Perhaps the outgoing data, first goes through the appropriate outgoing port and then is routed to the specific ephemeral port to which the client is connected. To verify that the incoming rules still worked, I was able to ssh into an EC2 within a public subnet in the VPC, but was not able ping google.com from same.
The alternative working theory for why outgoing traffic was not allowed is because the incoming and matching outgoing ports are all below 1024-63535. Perhaps that's why the outgoing data is not picked up by that range. I will get around to configuring the various protocol (ssh, http/s, imcp) to higher port numbers,, within the range of the ephemeral ports, to continue to verify this second point.
====== [Edited to add findings ======
As a follow up, I worked on the alternate theory and it is likely that the outgoing traffic was not sent through the ephemeral ports because the enabled ports (22, 80 and 443) do not overlap with the ephemeral port range (1024-63535).
I verified this by reconfiguring my ssh protocol to login through port 2222 by editing my sshd_config file on the EC2 (instructions here. I also reconfigured my http protocol to provide access through port 1888. You also need to edit the config file of your chosen webserver, which in my case was apache thus httpd. (You can extrapolate from this link). For newbies, the config files will be generally found in the etc folder. Be sure to restart each service on the EC2 ([link][8] <-- use convention to restart ssh)
Both of these reconfigured port choices was to ensure overlap with the ephemeral ports. Once I made the changes on the EC2, I then changed the security group inbound rule, removed 22, 80 and 443 and added 1888 and 2222. I then went to the NACL and removed the inbound rules 22, 80 and 443 and added 1888 and 2222. [![inbound][9]][9]For the NACL, I removed the outbound rules 22, 80 and 443 and just left the custom TCP rule and add the ephemeral ports 1024-63535.[![ephemeral onnly][10]][10]
I can ssh using - p 2222 and access the web server through 1888, both of which overlap with ephemeral ports.[![p 1888][11]][11][![p2222][12]][12]
[8]: https://(https://hoststud.com/resources/how-to-start-stop-or-restart-apache-server-on-centos-linux-server.191/ [9]: https://i.stack.imgur.com/65tHH.png [10]: https://i.stack.imgur.com/GrNHI.png [11]: https://i.stack.imgur.com/CWIkk.png [12]: https://i.stack.imgur.com/WnK6f.png
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |