AWS Systems Manager start-session: An error occurred (TargetNotConnected) when calling the StartSession operation: <instance_id> is not connected
Problem:
When I try to connect locally to a running EC2 instance using the AWS Systems Manager Session Manager CLI command: aws ssm start-session --target i-123456
I get the error:
An error occurred (TargetNotConnected) when calling the StartSession operation: i-123456 is not connected.
Background:
- Amazon Linux 2 instance hosted in a private subnet within a custom VPC
- VPC endpoints are used to connect Systems Manager to the managed instance without the need for a NAT gateway or internet gateway.
- Endpoint Service Names:
com.amazonaws.us-west-2.s3
com.amazonaws.us-west-2.ec2
com.amazonaws.us-west-2.ec2messages
com.amazonaws.us-west-2.ssm
com.amazonaws.us-west-2.ssmmessages
- AWS CLI == 2.0.40
- Python == 3.7.4
- Custom Terraform module that launches the Airflow instance within one of the private subnets (see module "airflow_aws_resources" below)
- The only .tf file relevant to this problem is airflow.tf within the "airflow_aws_resources" module. It contains the security group and instance profile configuration for the EC2 instance being connected to via SSM.
Reproduce with Terraform:
module "airflow_aws_resources" {
source = "github.com/marshall7m/tf_modules/airflow-aws-resources"
resource_prefix = "test"
vpc_id = module.vpc.vpc_id
env = "testing"
private_bucket = "test-bucket"
private_subnets_ids = module.vpc.private_subnets
private_subnets_cidr_blocks = module.vpc.private_subnets_cidr_blocks
create_airflow_instance = true
create_airflow_instance_sg = true
create_airflow_db = false
create_airflow_db_sg = false
airflow_instance_ssm_access = true
airflow_instance_ssm_region = "us-west-2"
airflow_instance_ami = "ami-0841edc20334f9287"
airflow_instance_type = "t2.micro"
}
resource "aws_security_group" "vpc_endpoints" {
name = "test-vpc-endpoint-sg"
description = "Default security group for vpc endpoints"
vpc_id = module.vpc.vpc_id
ingress {
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["10.0.0.32/28", "10.0.0.64/28"]
}
ingress {
from_port = 443
to_port = 443
protocol = "tcp"
#private subnet cidr blocks
cidr_blocks = ["10.0.0.32/28", "10.0.0.64/28"]
}
egress {
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = ["10.0.0.32/28", "10.0.0.64/28"]
}
egress {
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["10.0.0.32/28", "10.0.0.64/28"]
}
}
module "vpc" {
source = "terraform-aws-modules/vpc/aws"
version = "2.44.0"
name = "test-vpc"
cidr = "10.0.0.0/24"
azs = ["us-west-2a", "us-west-2b"]
private_subnets = ["10.0.0.32/28", "10.0.0.64/28"]
private_dedicated_network_acl = true
private_subnet_suffix = "private"
public_subnets = ["10.0.0.96/28", "10.0.0.128/28"]
public_dedicated_network_acl = true
public_subnet_suffix = "public"
enable_s3_endpoint = true
enable_ec2messages_endpoint = true
ec2messages_endpoint_security_group_ids = [aws_security_group.vpc_endpoints.id]
enable_ec2_endpoint = true
ec2_endpoint_security_group_ids = [aws_security_group.vpc_endpoints.id]
enable_ssm_endpoint = true
ssm_endpoint_security_group_ids = [aws_security_group.vpc_endpoints.id]
enable_ssmmessages_endpoint = true
ssmmessages_endpoint_security_group_ids = [aws_security_group.vpc_endpoints.id]
enable_nat_gateway = false
single_nat_gateway = false
enable_vpn_gateway = false
create_database_subnet_route_table = false
create_database_internet_gateway_route = false
create_database_subnet_group = false
manage_default_network_acl = false
enable_dns_hostnames = true
enable_dns_support = true
private_inbound_acl_rules = [
{
"description": "Allows inbound https traffic for aws s3 package requests"
"cidr_block": "0.0.0.0/0",
"from_port": 443,
"to_port": 443,
"protocol": "tcp",
"rule_action": "allow",
"rule_number": 101
},
{
"description": "Allows inbound http traffic for aws s3 package requests"
"cidr_block": "0.0.0.0/0",
"from_port": 80,
"to_port": 80,
"protocol": "tcp",
"rule_action": "allow",
"rule_number": 102
}
]
private_outbound_acl_rules = [
{
"description": "Allows outbound https traffic for aws s3 package requests"
"cidr_block": "0.0.0.0/0",
"from_port": 443,
"to_port": 443,
"protocol": "tcp",
"rule_action": "allow",
"rule_number": 101
},
{
"description": "Allows outbound http traffic for aws s3 package requests"
"cidr_block": "0.0.0.0/0",
"from_port": 80,
"to_port": 80,
"protocol": "tcp",
"rule_action": "allow",
"rule_number": 102
}
]
vpc_endpoint_tags = {
type = "vpc-endpoint"
}
}
Attempts:
#1
I tried the troubleshooting tips shown in the EC2 console's Session Manager connect tab (EC2 console >> instance ID >> Connect >> Session Manager):
The SSM agent is pre-installed on Amazon Linux AMIs, but I double-checked by accessing the instance via SSH and running
sudo status amazon-ssm-agent
which returned: amazon-ssm-agent start/running, process 1234
The EC2 instance profile displayed above includes the required AmazonSSMManagedInstanceCore policy. I completed the Session Manager prerequisites.
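For reference, this prerequisite amounts to an instance profile along the lines of the sketch below; the resource names here are hypothetical, and the actual configuration lives in airflow.tf inside the module.

# Sketch of an SSM-enabled instance profile (hypothetical names; the real
# configuration is in airflow.tf within the airflow_aws_resources module).
resource "aws_iam_role" "ssm_instance" {
  name = "test-airflow-ssm-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "ec2.amazonaws.com" }
      Action    = "sts:AssumeRole"
    }]
  })
}

resource "aws_iam_role_policy_attachment" "ssm_core" {
  role       = aws_iam_role.ssm_instance.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
}

resource "aws_iam_instance_profile" "ssm_instance" {
  name = "test-airflow-ssm-profile"
  role = aws_iam_role.ssm_instance.name
}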
#2
Attaching the AmazonSSMFullAccess policy to the IAM user that runs the command aws ssm start-session --target i-123456 (a Terraform sketch of this attachment follows the error below).
Same error when connecting to the instance via SSM:
An error occurred (TargetNotConnected) when calling the StartSession operation: i-123456 is not connected.
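A minimal sketch of that attachment, assuming a hypothetical IAM user name:

# Sketch only: attaches the AWS-managed AmazonSSMFullAccess policy to the
# IAM user that runs `aws ssm start-session` (the user name is hypothetical).
resource "aws_iam_user_policy_attachment" "ssm_full_access" {
  user       = "my-cli-user"
  policy_arn = "arn:aws:iam::aws:policy/AmazonSSMFullAccess"
}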
#3
Adding HTTPS inbound/outbound rules for the VPC endpoints' associated private subnets to the EC2 instance security group (see airflow.tf; a sketch of the kind of rule is shown after the error below)
Same error:
An error occurred (TargetNotConnected) when calling the StartSession operation: i-123456 is not connected.
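A sketch of the kind of rules this refers to, assuming a hypothetical security group reference (the actual group is defined in airflow.tf):

# Sketch: allow HTTPS between the endpoint-hosting private subnets and the
# instance security group (aws_security_group.airflow_instance is hypothetical).
resource "aws_security_group_rule" "airflow_https_ingress" {
  type              = "ingress"
  from_port         = 443
  to_port           = 443
  protocol          = "tcp"
  cidr_blocks       = ["10.0.0.32/28", "10.0.0.64/28"] # private subnet CIDRs
  security_group_id = aws_security_group.airflow_instance.id
}

resource "aws_security_group_rule" "airflow_https_egress" {
  type              = "egress"
  from_port         = 443
  to_port           = 443
  protocol          = "tcp"
  cidr_blocks       = ["10.0.0.32/28", "10.0.0.64/28"]
  security_group_id = aws_security_group.airflow_instance.id
}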
#4
Within the Systems Manager console I used the Quick Setup option, configured with the instance profile specified in airflow.tf and the default Systems Manager role. The EC2 instance successfully registered under "Managed instances" on the Quick Setup page.
Same error:
An error occurred (TargetNotConnected) when calling the StartSession operation: i-123456 is not connected.
#5
Given this is a test VPC and EC2 instance, I tried allowing all types of traffic from all IPv4 sources (0.0.0.0/0) for the following resources:
- Private subnets NACL
- EC2 instance security group
- The security group associated with the following interface/gateway endpoints:
com.amazonaws.us-west-2.s3
com.amazonaws.us-west-2.ec2
com.amazonaws.us-west-2.ec2messages
com.amazonaws.us-west-2.ssm
com.amazonaws.us-west-2.ssmmessages
Same error when connecting to the instance via SSM:
An error occurred (TargetNotConnected) when calling the StartSession operation: i-123456 is not connected.
Solution 1:[1]
I would refer here to make sure you have everything set up properly. I would first add the profile argument. If that still doesn't work: I ran into a similar issue when my profile's default region was not the same region in which I was trying to begin an active session, so I needed to use the region argument as well. Sample .ssh/config below:
Host i-abc123
    ProxyCommand sh -c "aws --region desired_region --profile my_profile ssm start-session --target %h --document-name AWS-StartSSHSession --parameters 'portNumber=%p'"
I would also encourage using AWS CLI v2. Once you configure your .ssh/config to look like the above, simply execute the following in a CLI:
ssh i-abc123
Solution 2:[2]
So you might need to use a profile. I am using the AWS CLI on OS X to connect via the terminal to a Linux host in a VPC. This is an account only accessible via SSO. I was able to create a profile, and after authenticating to SSO via the CLI I can establish a connection like this.
Do this once
aws sso login --profile my_customer
Then verify the SSO login was successful with a trivial command (in my OS X terminal):
aws s3 ls --profile my_customer custbucket-s3-sftp/rds/
Now establish the Session Manager connection:
aws ssm start-session --profile my_customer --target i-0012345abcdef890
I know you are using Python, but maybe this helps.
Solution 3:[3]
In some cases, you have to verify the following:
- AWS Account/Profile
- AWS region
In one case, I found that it was trying to connect using the wrong AWS profile.
In another case, I was connecting to a different region.
Solution 4:[4]
I was also getting the same error when I tried to connect from my Terminal: An error occurred (TargetNotConnected) when calling the StartSession operation: i-122334455 is not connected.
In my case, the issue was that the SSM agent installed on the target instance was out of date. I discovered this by trying to start the session from Systems Manager in the AWS console; basically going to Systems Manager -> Fleet Manager -> {INSTANCE_ID} -> Instance Actions -> Start Session. When I tried that, I got an error message saying the SSM agent on the target EC2 instance was out of date. After updating, I was able to log in successfully.
To update, you can either enable SSM agent auto-update for all managed instances, update the particular instance manually, or do a selective update of managed instances (a Terraform sketch of the auto-update approach follows the links below). See the following documentation for more info:
- https://docs.aws.amazon.com/systems-manager/latest/userguide/session-manager-troubleshooting.html#session-manager-troubleshooting-instances
- https://docs.aws.amazon.com/systems-manager/latest/userguide/ssm-agent-automatic-updates.html
- https://docs.aws.amazon.com/systems-manager/latest/userguide/sysman-manual-agent-install.html
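If the instance is already managed with Terraform, one way to keep the agent current is a State Manager association that runs the AWS-UpdateSSMAgent document on a schedule. This is only a sketch under that assumption; the instance ID and schedule are placeholders.

# Sketch: periodically run the AWS-UpdateSSMAgent document against the
# instance via SSM State Manager (instance ID and schedule are placeholders).
resource "aws_ssm_association" "update_ssm_agent" {
  name = "AWS-UpdateSSMAgent"

  targets {
    key    = "InstanceIds"
    values = ["i-123456"]
  }

  schedule_expression = "rate(14 days)"
}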
Solution 5:[5]
Explanation: Unfortunately, EC2 instances are not fault tolerant, and underneath your instance there is a physical host system. As a best practice you should add additional instances as backups to prevent a single point of failure.
When you try to SSM/SSH into your host and get TargetNotConnected, the issue can stem from several causes: host hardware failure, connectivity or power issues, a software memory leak (running out of memory), a full disk that is never cleaned up, or an application that cannot handle edge cases and crashes itself.
In some of these cases the EC2 instance state might still be "running" even though reachability fails.
When you run aws ec2 describe-instance-status --instance-ids <instance-id>, you might notice that the instance state is running even though the health check fails.
Example:
request: aws ec2 describe-instance-status --instance-ids i-abc123
response:
{
    "InstanceStatuses": [
        {
            "AvailabilityZone": "us-west-1b",
            "InstanceId": "i-abc123",
            "InstanceState": {
                "Code": 16,
                "Name": "running"
            },
            "InstanceStatus": {
                "Details": [
                    {
                        "ImpairedSince": "2020-10-10T12:10:00+00:00",
                        "Name": "reachability",
                        "Status": "failed"
                    }
                ],
                "Status": "impaired"
            },
            "SystemStatus": {
                "Details": [
                    {
                        "Name": "reachability",
                        "Status": "passed"
                    }
                ],
                "Status": "ok"
            }
        }
    ]
}
The solution would be to recreate the instance if it's a hardware issue (via IaC platforms such as Terraform/CloudFormation, or manually of course); if it's an application issue, connect to the machine and solve the specific problem.
Solution 6:[6]
Do your Interface-type VPC endpoints have private DNS enabled? Session Manager appears to need private_dns_enabled = true on Terraform VPC endpoints of type Interface in order to work.
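As a sketch, this is what the flag looks like on a standalone aws_vpc_endpoint resource; the terraform-aws-modules/vpc module used above also appears to expose per-endpoint inputs such as ssm_endpoint_private_dns_enabled, so check the inputs for your module version rather than relying on these exact names.

# Sketch: SSM interface endpoint with private DNS enabled; the subnet and
# security group references reuse the resources defined in the question.
resource "aws_vpc_endpoint" "ssm" {
  vpc_id              = module.vpc.vpc_id
  service_name        = "com.amazonaws.us-west-2.ssm"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = module.vpc.private_subnets
  security_group_ids  = [aws_security_group.vpc_endpoints.id]
  private_dns_enabled = true
}

The ssmmessages and ec2messages interface endpoints need the same flag so the instance can resolve the regional SSM service names from inside the private subnets.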
Solution 7:[7]
I ran into this after making some changes with Terraform that modified the EC2 instance in place. It turns out that all I needed to do was reboot the EC2 instance, and then it allowed me to connect again.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Dharman |
| Solution 2 | gpmilliken |
| Solution 3 | aspdeepak |
| Solution 4 | king |
| Solution 5 | |
| Solution 6 | Harry |
| Solution 7 | Dan O |