'Azure devops pipeline terraform error - 403 when attempting role assignment
I'm attempting to deploy an aks cluster and role assignment for the system assigned managed identity that is created via terraform but I'm getting a 403 response
azurerm_role_assignment.acrpull_role: Creating...
╷
│ Error: authorization.RoleAssignmentsClient#Create: Failure responding to request: StatusCode=403 -- Original Error: autorest/azure: Service returned an error. Status=403 Code="AuthorizationFailed" Message="The client '626eac40-c9dd-44cc-a528-3c3d3e069e85' with object id '626eac40-c9dd-44cc-a528-3c3d3e069e85' does not have authorization to perform action 'Microsoft.Authorization/roleAssignments/write' over scope '/subscriptions/7b73e02c-dbff-4eb7-9d73-e73a2a17e818/resourceGroups/myaks-rg/providers/Microsoft.ContainerRegistry/registries/aksmattcloudgurutest/providers/Microsoft.Authorization/roleAssignments/c144ad6d-946f-1898-635e-0d0d27ca2f1c' or the scope is invalid. If access was recently granted, please refresh your credentials."
│
│ with azurerm_role_assignment.acrpull_role,
│ on main.tf line 53, in resource "azurerm_role_assignment" "acrpull_role":
│ 53: resource "azurerm_role_assignment" "acrpull_role" {
│
╵
This is only occurring in an Azure Devops Pipeline. My pipeline looks like the following...
trigger:
- main
pool:
vmImage: ubuntu-latest
steps:
- task: TerraformInstaller@0
inputs:
terraformVersion: '1.0.7'
- task: TerraformCLI@0
inputs:
command: 'init'
workingDirectory: '$(System.DefaultWorkingDirectory)/Shared/Pipeline/Cluster'
backendType: 'azurerm'
backendServiceArm: 'Matt Local Service Connection'
ensureBackend: true
backendAzureRmResourceGroupName: 'tfstate'
backendAzureRmResourceGroupLocation: 'UK South'
backendAzureRmStorageAccountName: 'tfstateq7nqv'
backendAzureRmContainerName: 'tfstate'
backendAzureRmKey: 'terraform.tfstate'
allowTelemetryCollection: true
- task: TerraformCLI@0
inputs:
command: 'plan'
workingDirectory: '$(System.DefaultWorkingDirectory)/Shared/Pipeline/Cluster'
environmentServiceName: 'Matt Local Service Connection'
allowTelemetryCollection: true
- task: TerraformCLI@0
inputs:
command: 'validate'
workingDirectory: '$(System.DefaultWorkingDirectory)/Shared/Pipeline/Cluster'
allowTelemetryCollection: true
- task: TerraformCLI@0
inputs:
command: 'apply'
workingDirectory: '$(System.DefaultWorkingDirectory)/Shared/Pipeline/Cluster'
environmentServiceName: 'Matt Local Service Connection'
allowTelemetryCollection: false
I'm using the terraform tasks from here - https://marketplace.visualstudio.com/items?itemName=charleszipp.azure-pipelines-tasks-terraform
This is my terraform file
terraform {
required_providers {
azurerm = {
source = "hashicorp/azurerm"
version = "=2.46.0"
}
}
}
provider "azurerm" {
features {}
}
resource "azurerm_resource_group" "TerraformCluster" {
name = "terraform-cluster"
location = "UK South"
}
resource "azurerm_kubernetes_cluster" "TerraformClusterAKS" {
name = "terraform-cluster-aks1"
location = azurerm_resource_group.TerraformCluster.location
resource_group_name = azurerm_resource_group.TerraformCluster.name
dns_prefix = "terraform-cluster-aks1"
network_profile {
network_plugin = "azure"
}
default_node_pool {
name = "default"
node_count = 1
vm_size = "Standard_D2_v2"
}
identity {
type = "SystemAssigned"
}
tags = {
Environment = "Production"
}
}
data "azurerm_container_registry" "this" {
depends_on = [
azurerm_kubernetes_cluster.TerraformClusterAKS
]
provider = azurerm
name = "aksmattcloudgurutest"
resource_group_name = "myaks-rg"
}
resource "azurerm_role_assignment" "acrpull_role" {
scope = data.azurerm_container_registry.this.id
role_definition_name = "AcrPull"
principal_id = azurerm_kubernetes_cluster.TerraformClusterAKS.identity[0].principal_id
}
Where am I going wrong here?
Solution 1:[1]
The Service Principal in AAD associated with the your ADO Service Connection ('Matt Local Service Connection') will need to be assigned the Owner role at the scope of the resource, or above (depending on where else you will be assigning permissions). You can read details about the various roles here the two most commonly used roles are Owner and Contributor, the key difference being that Owner allows for managing role assignments.
As part of this piece of work, you should also familiarize yourself with the principle of least privilege (if you do not already know about it). How it would apply in this case would be; if the Service Principal only needs Owner at the Resource level, then don't assign it Owner at the Resource Group or Subscription Level just because that is more convenient, you can always update the scope later on but it is much harder to undo any damage (assuming a malicious or inexperienced actor) on overly permissive role assignment after it has been exploited.
Solution 2:[2]
I tried everything to get this existing storage service to re-connect to the Azure Devops Pipeline to enable Terraform deployments.
Attempted and did not work: Break lease on tf state, remove tf state, update lease on tfstate, inline commands in ADO via powershell and bash to Purge terraform, re-install the Terraform plugin etc. etc. etc.)
What worked: What ended up working is to create a new storage account with a new storage container and new SAS token.
This worked to overcome the 403 forbidden error on the access of the Blob containing the TFState in ADLS from Azure Devops for Terraform deployments. This does not explain the how or the why, access controls / iam / access policies did not change. A tear down and recreate of the storage containing the TFState with the exact same settings under a difference storage account name worked.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Matt Stannett |
Solution 2 | Julian Wise |