AWS ECS Fargate ResourceInitializationError: unable to pull secrets or registry auth
I am trying to run an image from a private repository on the aws-ecs-fargate-1.4.0 platform.
For private repository authentication, I have followed the docs and it was working well.
However, after updating the existing service many times, it now fails to run the task and reports an error like:
ResourceInitializationError: unable to pull secrets or registry auth: execution resource retrieval failed: unable to get registry auth from asm: service call has been retried 1 time(s): asm fetching secret from the service for <secretname>: RequestError: ...
I haven't changed the ecsTaskExecutionRole, and it contains all the policies required to fetch the secret value:
- AmazonECSTaskExecutionRolePolicy
- CloudWatchFullAccess
- AmazonECSTaskExecutionRolePolicy
- GetSecretValue
- GetSSMParamters
Solution 1:[1]
AWS employee here.
What you are seeing is due to a change in how networking works between Fargate platform version 1.3.0, and Fargate platform version 1.4.0. As part of the change from using Docker to using containerd we also made some changes to how networking works. In version 1.3.0 and below each Fargate task got two network interfaces:
- One network interface was used for the application traffic from your application container(s), as well as for logs and container image layer pulls.
- A secondary network interface was used by the Fargate platform itself, to get ECR authentication credentials, and fetch secrets.
This secondary network interface had some downsides though. This secondary traffic did not show up in your VPC flow logs. Also while most traffic stayed in the customer VPC, the secondary network interface was sending traffic outside of your VPC. A number of customers complained that they did not have the ability to specify network level controls on this secondary network interface and what it was able to connect to.
To make the networking model less confusing and give customers more control, we changed in Fargate platform version 1.4.0 to using a single network interface and keeping all traffic inside of your VPC, even the Fargate platform traffic. The Fargate platform traffic for fetching ECR authentication and task secrets now uses the same task network interface as the rest of your task traffic, and you can observe this traffic in VPC flow logs, and control this traffic using the routing table in your own AWS VPC.
However, with this increased ability to observe and control the Fargate platform networking, you also become responsible for ensuring that there is actually a network path configured in your VPC that allows the task to communicate with ECR and AWS Secrets Manager.
There are a few ways to solve this:
- Launch tasks into a public subnet, with a public IP address, so that they can communicate to ECR and other backing services using an internet gateway
- Launch tasks in a private subnet that has a VPC routing table configured to route outbound traffic via a NAT gateway in a public subnet. This way the NAT gateway can open a connection to ECR on behalf of the task.
- Launch tasks in a private subnet and make sure you have AWS PrivateLink endpoints configured in your VPC, for the services you need (ECR for image pull authentication, S3 for image layers, and AWS Secrets Manager for secrets).
You can read more about this change in this official blogpost, under the section "Task elastic network interface (ENI) now runs additional traffic flows"
https://aws.amazon.com/blogs/containers/aws-fargate-launches-platform-version-1-4/
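The three options above can be sketched as a small check. This is only an illustration, not an AWS API: the TaskNetworking shape, the short service labels, and the findNetworkPath function are hypothetical names, and the booleans stand in for what you would read from your own route tables.

```typescript
// Hypothetical description of where a Fargate task is launched.
interface TaskNetworking {
  assignPublicIp: "ENABLED" | "DISABLED";
  // True if the subnet's route table sends 0.0.0.0/0 to an internet gateway.
  routeToInternetGateway: boolean;
  // True if the subnet's route table sends 0.0.0.0/0 to a NAT gateway.
  routeToNatGateway: boolean;
  // Services reachable through VPC endpoints (labels are illustrative).
  vpcEndpoints: string[];
}

type NetworkPath = "internet-gateway" | "nat-gateway" | "vpc-endpoints" | "none";

// Returns the first option from the list above that gives the task a path
// to the required services, or "none" if the task would fail to start.
function findNetworkPath(task: TaskNetworking, requiredServices: string[]): NetworkPath {
  // Option 1: public subnet AND a public IP on the task.
  if (task.routeToInternetGateway && task.assignPublicIp === "ENABLED") {
    return "internet-gateway";
  }
  // Option 2: private subnet routed through a NAT gateway.
  if (task.routeToNatGateway) {
    return "nat-gateway";
  }
  // Option 3: every required service has a PrivateLink endpoint.
  const allCovered =
    requiredServices.length > 0 &&
    requiredServices.every(s => task.vpcEndpoints.indexOf(s) !== -1);
  return allCovered ? "vpc-endpoints" : "none";
}

// A public subnet without a public IP has no usable path.
const publicSubnetWithoutIp = findNetworkPath(
  { assignPublicIp: "DISABLED", routeToInternetGateway: true, routeToNatGateway: false, vpcEndpoints: [] },
  ["ecr.api", "ecr.dkr", "s3", "secretsmanager"]
);
```

Security groups, network ACLs and DNS are deliberately left out of this sketch; they can still block a path that the route table allows.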
Solution 2:[2]
I'm not completely sure about your setup but after I disabled the NAT-Gateways to save some $, I had a very similar error message on the aws-ecs-fargate-1.4.0 platform:
Stopped reason: ResourceInitializationError: unable to pull secrets or registry auth: execution resource retrieval failed: unable to retrieve ecr registry auth: service call has been retried 1 time(s): RequestError: send request failed caused by: Post https://api.ecr....
It turned out that I had to create VPC Endpoints to these Service names:
- com.amazonaws.REGION.s3
- com.amazonaws.REGION.ecr.dkr
- com.amazonaws.REGION.ecr.api
- com.amazonaws.REGION.logs
- com.amazonaws.REGION.ssm
And I had to downgrade to the aws-ecs-fargate-1.3.0 platform. After the downgrade the Docker images could be pulled from ECR and the deployments succeeded again.
If you are using Secrets Manager without a NAT gateway, you might have to create a VPC endpoint for com.amazonaws.REGION.secretsmanager.
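To avoid typing the service names by hand for each region, the list in this answer can be generated. This is a minimal sketch: the function name requiredEndpointServiceNames and its usesSecretsManager flag are hypothetical, and the list simply mirrors the endpoints named in this answer, which may be more or fewer than your task needs.

```typescript
// Builds the VPC endpoint service names listed in this answer for one region.
// The set of services is taken from the answer above, not from an AWS API.
function requiredEndpointServiceNames(region: string, usesSecretsManager: boolean): string[] {
  const services = ["s3", "ecr.dkr", "ecr.api", "logs", "ssm"];
  if (usesSecretsManager) {
    services.push("secretsmanager");
  }
  return services.map(service => `com.amazonaws.${region}.${service}`);
}

// Example: the names to look for (or create) in us-east-1.
const endpointNamesForUsEast1 = requiredEndpointServiceNames("us-east-1", true);
```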
Solution 3:[3]
Ensure internet connectivity via either an IGW or a NAT gateway. If it is an IGW, make sure the public IP is enabled in the Fargate task/service network configuration.
{
"awsvpcConfiguration": {
"subnets": ["string", ...],
"securityGroups": ["string", ...],
"assignPublicIp": "ENABLED"|"DISABLED"
}
}
Solution 4:[4]
This error occurs when the Fargate agent fails to create or bootstrap the resources required to start the container or the task it belongs to. It only occurs on platform version 1.4 or later, most likely because version 1.4 uses the task ENI (which is in your VPC) instead of the Fargate ENI (which is in AWS's VPC). I think it might be caused by a need for extra IAM permissions to pull the image from ECR. Are you using any PrivateLink? If yes, you might want to take a look at the policies for the ECR endpoint.
I'll try to replicate it, but I'd suggest opening a support ticket with AWS if you can, so they can take a closer look at your resources and give better suggestions.
Solution 5:[5]
If you are using a public subnet and select "Don't assign public address", this error can happen.
The same is applicable if you have a private subnet and do not have an internet gateway or NAT gateway in your VPC. It needs a route to the internet.
This is the same behaviour across all of AWS ecosystem. It would be great if AWS can display a large banner warning in such cases.
Solution 6:[6]
Since the ECS agent in Fargate platform version 1.4.0 uses the task ENI to retrieve information, requests to Secrets Manager go through this ENI.
You must ensure that traffic to the Secrets Manager API (secretsmanager.{region}.amazonaws.com) is open:
- If your task is private, you must have either a VPC endpoint (com.amazonaws.{region}.secretsmanager) or a NAT gateway, and the task ENI's security group must allow outbound HTTPS traffic to it.
- If your task is public, the security group must allow outbound HTTPS traffic to the outside (or to the AWS public CIDRs).
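The security group part of this can be checked mechanically. The sketch below is an assumption-laden simplification: EgressRule and allowsOutboundHttps are hypothetical names, and the check looks only at protocol and port range, not at the rule's destination.

```typescript
// Simplified view of a security group egress rule. Destination matching is
// intentionally not modeled here.
interface EgressRule {
  protocol: "tcp" | "udp" | "-1"; // "-1" means all protocols
  fromPort: number;
  toPort: number;
}

// True if at least one rule lets TCP traffic out on port 443 (HTTPS),
// which is what the calls to Secrets Manager and ECR use.
function allowsOutboundHttps(rules: EgressRule[]): boolean {
  return rules.some(rule => {
    if (rule.protocol === "-1") {
      return true; // all protocols, all ports
    }
    return rule.protocol === "tcp" && rule.fromPort <= 443 && 443 <= rule.toPort;
  });
}

// A group that only allows outbound HTTP would not be enough.
const httpOnlyIsEnough = allowsOutboundHttps([{ protocol: "tcp", fromPort: 80, toPort: 80 }]);
```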
Solution 7:[7]
I got this problem after translating my Cloudformation file to a Terraform file.
After struggling, I found out that I was missing an outbound rule in my Fargate security group. AWS automatically creates an "ALLOW ALL" egress rule, but Terraform removes it. You need to add it to your aws_security_group:
resource "aws_security_group" "example" {
  # ... other configuration ...

  # Re-create the "allow all" egress rule explicitly.
  egress {
    from_port        = 0
    to_port          = 0
    protocol         = "-1"
    cidr_blocks      = ["0.0.0.0/0"]
    ipv6_cidr_blocks = ["::/0"]
  }
}
You can check the doc here.
Solution 8:[8]
In my case I tried all of the above solutions and none seemed to work. It was a very simple mistake, but one that others might find useful if none of the other answers work for you.
The valueFrom in the containerDefinitions portion of the task definition JSON file needs :: at the end of the value when it references a specific JSON key.
In my case, I had:
{
  "containerDefinitions": [{
    "secrets": [{
      "name": "MY_SECRET",
      "valueFrom": "arn:aws:secretsmanager:<region>:<aws_account_id>:secret:<sm_resource_name>:MY_SECRET"
    }]
  }]
}
The correct format was:
{
  "containerDefinitions": [{
    "secrets": [{
      "name": "MY_SECRET",
      "valueFrom": "arn:aws:secretsmanager:<region>:<aws_account_id>:secret:<sm_resource_name>:MY_SECRET::"
    }]
  }]
}
Note the extra :: at the end of the correct valueFrom.
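The reason for the trailing colons is that the full valueFrom form has three optional fields after the secret ARN: json-key, version-stage and version-id. When any of them is given, all three positions must be present, even if empty. A minimal sketch of building the string, where SecretReference and buildValueFrom are hypothetical helper names and the account ID is a placeholder:

```typescript
// Pieces of a Secrets Manager reference in an ECS task definition.
interface SecretReference {
  secretArn: string; // arn:aws:secretsmanager:<region>:<account>:secret:<name>
  jsonKey?: string;
  versionStage?: string;
  versionId?: string;
}

// Builds the valueFrom string. With no optional fields, the plain ARN is used.
// Otherwise all three positions are emitted, so a JSON key alone ends in "::".
function buildValueFrom(ref: SecretReference): string {
  if (!ref.jsonKey && !ref.versionStage && !ref.versionId) {
    return ref.secretArn;
  }
  return [ref.secretArn, ref.jsonKey || "", ref.versionStage || "", ref.versionId || ""].join(":");
}

// Placeholder ARN for illustration only.
const exampleValueFrom = buildValueFrom({
  secretArn: "arn:aws:secretsmanager:us-east-1:123456789012:secret:my-secret",
  jsonKey: "MY_SECRET",
});
```

Generating the value this way makes it harder to drop the empty version-stage and version-id positions by accident.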
Solution 9:[9]
I was having the exact same issue using Fargate as the launch type with platform version 1.4.0. In the end, since I was using public subnets, all I needed to do was enable the assignment of a public IP to the tasks so that they had outbound network access to pull the image.
I got the hint to solve it when I tried to create the service using platform version 1.3.0, and the task creation failed with a similar but fortunately documented error.
Solution 10:[10]
I resolved a similar problem by updating the rules in the ECS service's security group. The rule configuration is below.
Inbound Rules:
* HTTP TCP 80 0.0.0.0/0
Outbound Rules:
* All traffic All All 0.0.0.0/0
Solution 11:[11]
This has burned me sufficiently well today that I figured I'd share my experience, since it differs from most all the above (AWS Employee's answer covers it technically, but doesn't spell the problem out).
If all the following are true:
- You're running platform 1.4.0 (or, newer presumably - at the time of writing, 1.4.0 is the latest)
- You're in a VPC environment
- Your VPC, for "reasons", runs its own DNS (i.e. not at VPC_BASE+2)
- For "reasons", you don't allow all outbound traffic, so you're setting egress rules on your task security group
And consequently, you have endpoints for all the things, then the following must also be true:
- Your homegrown DNS will need to be able to correctly resolve the private addresses of the endpoints (for instance, using VPC_BASE+2, but how doesn't matter)
- You will also need to make sure your task security group has rules allowing DNS traffic to your DNS server(s) <-- This one burned me.
To add insult to the injury, what little error information you get out of Fargate doesn't really indicate that you have a DNS issue, and naturally your CloudTrails won't show a damn thing either, since nothing ends up hitting the API to start with.
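For reference, VPC_BASE+2 is the Amazon-provided DNS resolver address: the base of the VPC's IPv4 range plus two. Below is a rough sketch of computing it and of the egress rules a custom DNS server needs. The names vpcBasePlusTwo, DnsEgressRule and dnsEgressRules are hypothetical, and the code assumes the CIDR is written with its network base address.

```typescript
// Returns the VPC base address plus two, e.g. "10.0.0.0/16" -> "10.0.0.2".
function vpcBasePlusTwo(vpcCidr: string): string {
  const base = vpcCidr.split("/")[0] || "";
  const octets = base.split(".").map(Number);
  const invalid = octets.some(o => isNaN(o) || o % 1 !== 0 || o < 0 || o > 255);
  if (octets.length !== 4 || invalid) {
    throw new Error(`Not an IPv4 CIDR: ${vpcCidr}`);
  }
  // Treat the address as one unsigned number, add two, then split it again.
  const asNumber = octets.reduce((acc, o) => acc * 256 + o, 0) + 2;
  return [24, 16, 8, 0]
    .map(shift => Math.floor(asNumber / Math.pow(2, shift)) % 256)
    .join(".");
}

interface DnsEgressRule {
  protocol: "udp" | "tcp";
  fromPort: number;
  toPort: number;
  cidr: string;
}

// DNS uses port 53 over both UDP and TCP, so the task security group needs
// an egress rule for each protocol toward every DNS server.
function dnsEgressRules(dnsServerIp: string): DnsEgressRule[] {
  const protocols: Array<"udp" | "tcp"> = ["udp", "tcp"];
  return protocols.map(protocol => ({
    protocol,
    fromPort: 53,
    toPort: 53,
    cidr: `${dnsServerIp}/32`,
  }));
}

const exampleResolver = vpcBasePlusTwo("10.0.0.0/16");
const exampleDnsRules = dnsEgressRules(exampleResolver);
```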
Solution 12:[12]
I had to auto-assign public IP.
To do so from the console, when running the task, I had to select "ENABLED" for "Auto-assign public IP".
Solution 13:[13]
The service's security group needs outbound access on port 443 (outbound access on all ports will work for this). Without this, it can't access Secrets Manager.
Solution 14:[14]
It is most likely due to the outbound restrictions in your security groups (in the case of a public subnet).
Opening the outbound TCP port (443, since the calls use HTTPS) will help you resolve it.
ResourceInitializationError: unable to pull secrets or registry auth: execution resource retrieval failed: unable to retrieve ecr registry auth
Solution 15:[15]
I had this issue, and eventually sorted it out.
My solution below is to:
- Set up the ECS in private subnet
- Add AWS PrivateLink endpoints in VPC
I'm posting my CDK code here for reference. I pasted some documentation links in the function comments to help you better understand their purpose.
This is the EcsStack:
export class EcsStack extends Stack {
  constructor(scope: cdk.App, id: string, props: EcsStackProps) {
    super(scope, id, props);
    this.createOrderServiceCluster(props.vpc);
  }
  private createOrderServiceCluster(serviceVpc: ec2.IVpc) {
    const ecsClusterName = "EcsClusterOfOrderService";
    const OrderServiceCluster = new ecs.Cluster(this, ecsClusterName, {
      vpc: serviceVpc,
      clusterName: ecsClusterName
    });
    // Currently ApplicationLoadBalancedFargateService just picks a random private subnet.
    // https://github.com/aws/aws-cdk/issues/8621
    new ecs_patterns.ApplicationLoadBalancedFargateService(this, "FargateOfOrderService", {
      cluster: OrderServiceCluster, // Required
      cpu: 512, // Default is 256
      desiredCount: 1, // Default is 1
      taskImageOptions: {
        image: ecs.ContainerImage.fromRegistry("12345.dkr.ecr.us-east-1.amazonaws.com/comics:user-service"),
        taskRole: this.createEcsTaskRole(),
        executionRole: this.createEcsExecutionRole(),
        containerPort: 8080
      },
      memoryLimitMiB: 2048, // Default is 512
      // creates a public-facing load balancer that we will be able to call
      // from curl or our web browser. This load balancer will forward calls
      // to our container on port 8080 running inside of our ECS service.
      publicLoadBalancer: true // Default is false
    });
  }
  /**
   * This IAM role is the set of permissions provided to the ECS Service Team to execute ECS Tasks on your behalf.
   * It is NOT the permissions your application will have while executing.
   * https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_execution_IAM_role.html
   * @private
   */
  private createEcsExecutionRole(): iam.IRole {
    const ecsExecutionRole = new iam.Role(this, 'EcsExecutionRole', {
      //assumedBy: new iam.ServicePrincipal(ecsTasksServicePrincipal),
      assumedBy: new iam.ServicePrincipal("ecs-tasks.amazonaws.com"),
      roleName: "EcsExecutionRole",
    });
    ecsExecutionRole.addManagedPolicy(iam.ManagedPolicy.fromAwsManagedPolicyName('AmazonEC2ContainerRegistryReadOnly'));
    ecsExecutionRole.addManagedPolicy(iam.ManagedPolicy.fromAwsManagedPolicyName('CloudWatchLogsFullAccess'));
    return ecsExecutionRole;
  }
  /**
   * Creates the IAM role (with all the required permissions) which will be used by the ECS tasks.
   * https://docs.aws.amazon.com/AmazonECS/latest/developerguide/instance_IAM_role.html
   * @private
   */
  private createEcsTaskRole(): iam.IRole {
    const ecsTaskRole = new iam.Role(this, 'OrderServiceEcsTaskRole', {
      //assumedBy: new iam.ServicePrincipal(ecsTasksServicePrincipal),
      assumedBy: new iam.ServicePrincipal("ecs-tasks.amazonaws.com"),
      roleName: "OrderServiceEcsTaskRole",
    });
    ecsTaskRole.addManagedPolicy(iam.ManagedPolicy.fromAwsManagedPolicyName('AmazonEC2ContainerRegistryReadOnly'));
    ecsTaskRole.addManagedPolicy(iam.ManagedPolicy.fromAwsManagedPolicyName('CloudWatchLogsFullAccess'));
    ecsTaskRole.addManagedPolicy(iam.ManagedPolicy.fromAwsManagedPolicyName('AmazonS3ReadOnlyAccess'));
    return ecsTaskRole;
  }
}
This is a code snippet of the VpcStack:
export class VpcStack extends Stack {
  readonly coreVpc: ec2.Vpc;
  constructor(scope: cdk.App, id: string) {
    super(scope, id);
    this.coreVpc = new ec2.Vpc(this, "CoreVpc", {
      cidr: '10.0.0.0/16',
      natGateways: 1,
      enableDnsHostnames: true,
      enableDnsSupport: true,
      maxAzs: 3,
      subnetConfiguration: [
        {
          cidrMask: 28,
          name: 'Public',
          subnetType: ec2.SubnetType.PUBLIC,
        },
        {
          cidrMask: 24,
          name: 'Private',
          subnetType: ec2.SubnetType.PRIVATE,
        }
      ]
    });
    this.setupInterfaceVpcEndpoints();
  }
  /**
   * Builds VPC endpoints to access AWS services without using NAT Gateway.
   * @private
   */
  private setupInterfaceVpcEndpoints(): void {
    // Allow ECS to pull Docker images without using NAT Gateway
    // https://docs.aws.amazon.com/AmazonECR/latest/userguide/vpc-endpoints.html
    this.addInterfaceEndpoint("ECRDockerEndpoint", ec2.InterfaceVpcEndpointAwsService.ECR_DOCKER);
    this.addInterfaceEndpoint("ECREndpoint", ec2.InterfaceVpcEndpointAwsService.ECR);
    this.addInterfaceEndpoint("SecretManagerEndpoint", ec2.InterfaceVpcEndpointAwsService.SECRETS_MANAGER);
    this.addInterfaceEndpoint("CloudWatchEndpoint", ec2.InterfaceVpcEndpointAwsService.CLOUDWATCH);
    this.addInterfaceEndpoint("CloudWatchLogsEndpoint", ec2.InterfaceVpcEndpointAwsService.CLOUDWATCH_LOGS);
    this.addInterfaceEndpoint("CloudWatchEventsEndpoint", ec2.InterfaceVpcEndpointAwsService.CLOUDWATCH_EVENTS);
    this.addInterfaceEndpoint("SSMEndpoint", ec2.InterfaceVpcEndpointAwsService.SSM);
  }
  private addInterfaceEndpoint(name: string, awsService: ec2.InterfaceVpcEndpointAwsService): void {
    const endpoint: ec2.InterfaceVpcEndpoint = this.coreVpc.addInterfaceEndpoint(`${name}`, {
      service: awsService
    });
    endpoint.connections.allowFrom(ec2.Peer.ipv4(this.coreVpc.vpcCidrBlock), endpoint.connections.defaultPort!);
  }
}
Solution 16:[16]
Go to Task Definitions > Update Task Definition. In the Task Role dropdown, select ecsTaskExecutionRole.
You need to modify this ecsTaskExecutionRole in the IAM settings to include the following permissions:
- SecretsManagerReadWrite
- CloudWatchFullAccess
- AmazonSSMFullAccess
- AmazonECSTaskExecutionRolePolicy
Then create your new task definition and it should work.
Solution 17:[17]
If you are placing the tasks in a private subnet, you might need to add inbound and outbound rules to the associated network ACL to allow the traffic.
Solution 18:[18]
If your Fargate task is running in a private subnet with no access to the internet, your VPC should already have the ECR dkr VPC endpoint in place so that Fargate (version 1.3 and below) can reach that endpoint and spin up the container. For version 1.4 of Fargate, you additionally need the ECR api endpoint.
https://aws.amazon.com/blogs/containers/aws-fargate-launches-platform-version-1-4/
Solution 19:[19]
I just had this issue and the reason I was getting it was because I forgot to add inbound and outbound rules to the security group associated with my service. (added inbound from my ALB and outbound *)
Solution 20:[20]
For me it was a combination of not having the SecretsManagerReadWrite policy attached to my IAM role (thanks Jinkko) and not having a public IP enabled on the compute instance (to get to the ECR repo).
Solution 21:[21]
In the ecsTaskExecutionRole => ECS-SecretsManager-Permission policy, make sure your region-specific secret is added with the correct access level. If you are working on a multi-region setup where the secret was created in one region and then cloned to another, you still have to add the cloned secret to ecsTaskExecutionRole => ECS-SecretsManager-Permission to make it accessible to your regional ECS.
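One way to spot this is to compare the secret ARNs referenced by the task definition against the resources listed in the policy. The sketch below is illustrative only: isCoveredBy and uncoveredSecrets are hypothetical names, the ARNs are placeholders, and wildcard handling is limited to a bare "*" or a single trailing "*", which is narrower than what IAM actually supports.

```typescript
// True if a policy resource entry covers the given secret ARN.
// Only "*" and a single trailing "*" are handled in this sketch.
function isCoveredBy(secretArn: string, policyResource: string): boolean {
  if (policyResource === "*") {
    return true;
  }
  if (policyResource.charAt(policyResource.length - 1) === "*") {
    const prefix = policyResource.slice(0, -1);
    return secretArn.indexOf(prefix) === 0;
  }
  return secretArn === policyResource;
}

// Returns the referenced secrets that no policy resource entry covers.
function uncoveredSecrets(referencedArns: string[], policyResources: string[]): string[] {
  return referencedArns.filter(arn => !policyResources.some(res => isCoveredBy(arn, res)));
}

// The secret cloned to eu-west-1 is not covered by a us-east-1-only policy.
const missingFromPolicy = uncoveredSecrets(
  [
    "arn:aws:secretsmanager:us-east-1:123456789012:secret:db-password-AbCdEf",
    "arn:aws:secretsmanager:eu-west-1:123456789012:secret:db-password-GhIjKl",
  ],
  ["arn:aws:secretsmanager:us-east-1:123456789012:secret:db-password-*"]
);
```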
Solution 22:[22]
For me it was incorrect secret ARNs referenced in my task role.
Solution 23:[23]
How to do "Launch tasks in a private subnet that has a VPC routing table configured to route outbound traffic via a NAT gateway in a public subnet. This way the NAT gateway can open a connection to ECR on behalf of the task" :
Assumptions of this solution:
- You have docker image in ECR repository
- You have an IAM role with the permissions, AmazonECSTaskExecutionRolePolicy
- You also want your task to use the same IP address. I have marked this optional if you do not need this part.
Solution:
1. Create new cluster
   - AWS > ECS > Clusters > Create cluster > Networking only > check box to create VPC > Create
2. Create new task definition
   - AWS > ECS > Task Definitions > Create new task definition > Fargate
     - Add container > Image* field should contain Image URI from ECR
3. Create Elastic IP address (OPTIONAL, ONLY IF YOU WANT CONSISTENT IP OUTPUT, LIKE IF USING PROXY SERVICE)
   - AWS > VPC > Elastic IPs > Allocate Elastic IP address > Create
   - Whitelist this IP on whatever service Fargate is going to try and access
4. Create NAT gateway
   - AWS > VPC > NAT Gateways > Create NAT gateway
     - Choose auto-created subnet
     - Connectivity type: Public
       - ^Since you made it public on a subnet this is what is meant by "NAT gateway in a public subnet"
     - (OPTIONAL) Select Elastic IP from dropdown
5. Route public subnets to use internet gateway
   - AWS > VPC > Route tables > find one w/ public subnets auto-created in step 1 > click on Route table ID > Edit routes > Add route > Destination is 0.0.0.0/0, Target is igw-{internet-gateway-autocreated-in-step-1}
     - ^This is what allows the VPC to actually access the internet at all
6. Create subnet
   - AWS > VPC > Subnets > Create subnet > select auto-created VPC in step 1, for IPv4 if you're confused just put 10.0.0.0/24 > Add new subnet
7. Route newly created subnet (in step 6) to use NAT
   - AWS > VPC > Route tables > find one w/ subnet created in step 6 > click on Route table ID > Edit routes > Add route > Destination: 0.0.0.0/0, Target: nat-{nat-gateway-created-in-step-4}
     - ^This is what is meant by "private subnet that has a VPC routing table configured to route outbound traffic via a NAT gateway"
8. Run the Fargate task
   - AWS > ECS > Clusters > your cluster > Run new Task
     - Launch type: Fargate
     - Task definition: your task
     - Cluster: your cluster
     - Cluster VPC: your VPC
     - Subnet: subnet you created, NOT the auto-created ones
     - Auto-assign public IP: this depends on if you are using an Elastic IP. If you did do that, then this should be disabled. If you did not allocate an Elastic IP address, then this should be enabled.
     - Run task
Solution 24:[24]
In my case, I have a VPC with public and private subnets and a NAT gateway between them. To access secrets, the service had to be launched in the private subnets. Secret retrieval doesn't work in the public subnets unless you have set up VPC endpoints. It works fine in the private subnets using Fargate version 1.4.
Solution 25:[25]
For me, my problem was that the NAT gateway I had configured for my private subnet was incorrectly configured as a private NAT gateway. Oops. Changing to a public NAT gateway and updating route tables resolved my problem
Solution 26:[26]
After checking everything on this AWS support page: https://aws.amazon.com/premiumsupport/knowledge-center/ecs-unable-to-pull-secrets/ and the other popular answers here, one more thing to check is that your secret that is being retrieved actually has a value set.
When using Secrets Manager, if your ECS Task is attempting to retrieve a secret that has been created but does not have a value set, then you will also receive this kind of error.
Setting a value for the secret will resolve this particular problem.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow