I’m currently running Concourse for my team at work. It’s a two-instance deployment: one web and one worker EC2 instance.

Most of the pipelines we write need permissions for various AWS services, so we had to figure out a way to grant those permissions to our pipelines.

Our first thought was to create an IAM role and assign that role to the Concourse Worker’s EC2 instance. Done! Every pipeline instantly had the permissions needed.

This worked as a proof of concept when I was initially getting Concourse set up at work and wanted to test out some simple workflows. It broke down in our production environment, though, because our production workloads are spread across multiple AWS accounts and the Concourse worker can only have one IAM role assigned to it.

At this point I thought of two ways to solve this multi-account permissions issue:

  1. Create a Concourse Worker in each AWS account
  2. Create an IAM role in each AWS account that tasks on the single Concourse worker can assume

We went with option 2 because it was:

  • The cheapest; spinning up a bunch of EC2 instances costs $$$$$
  • More modular; we can make more granular roles since our pipelines always have to assume roles
  • Less coupled; pipelines don’t depend on Workers that have certain attributes

To make it possible to assume IAM roles, we did assign the Concourse Worker a role that allows it to assume other IAM roles. This is a very minimal requirement in my opinion, though, and it still forces us to think about the permissions each pipeline needs rather than worry about the state of the Worker the pipeline is running on.

IAM Roles Setup

We created an IAM Role called concourse-worker. This role was assigned to the Concourse Worker’s EC2 instance. Its set of permissions looked like this:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "sts:AssumeRole",
      "Resource": ["arn:aws:iam::*:role/worker-permissions"]
    }
  ]
}

This role allows processes running on that worker to assume any IAM role called worker-permissions in any AWS account. Let’s look at the setup for that role next.

We called the next set of IAM roles worker-permissions. We created one in each AWS account that we want Concourse to have access to. Each worker-permissions role has a set of IAM permissions and, most importantly, a Trust Policy that only allows the concourse-worker role to assume it. The Trust Policy looks like this (where 111111111111 is the AWS Account ID that the concourse-worker role is in):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "sts:AssumeRole",
      "Principal": {
        "AWS": "arn:aws:iam::111111111111:role/concourse-worker"
      }
    }
  ]
}

Assuming Roles Inside Pipelines

Now that our IAM roles are set up, we can figure out how to assume them. On my team we mainly use bash and go, so we created some helper functions and pipeline semantics to make it easy to assume roles in any task that we create. First, the pipeline semantics.

To assume an IAM role we need one piece of information: The role’s ARN.

On any task that we make you can add an aws_assume_role_arn environment variable set to the ARN of one of the worker-permissions roles. That env var will then be picked up by the helper functions (see below) which will try to assume the role.

We also decided to expose a variable that sets how long the credentials for the assumed role stay valid. The defaults differ by tool: the aws CLI asks for one hour, while the Go SDK’s stscreds provider asks for 15 minutes. To keep things consistent, our helper functions explicitly set the duration to one hour. This can be overridden by setting the env var aws_assume_role_duration, which takes a value in seconds.

In practice, our pipeline YAML for a single task looks like this:

jobs:
- name: my-job
  plan:
  - ...<get steps>
  - task: install-foobar
    params:
      aws_assume_role_arn: arn:aws:iam::222222222222:role/worker-permissions

We don’t actually store the role ARN in the pipeline config, as shown above. We store the ARN in our secret manager for Concourse (Vault) and reference the Vault path in our pipelines, so the env var ends up looking like aws_assume_role_arn: ((roles/worker-permissions)). See the Concourse docs for how to reference Vault secrets in your pipelines: https://concourse-ci.org/vault-credential-manager.html

Assume Role Using bash

The aws CLI is required in order for this helper script to work.

In a file called aws-auth.sh we have the following script:

#!/usr/bin/env bash

set -euo pipefail

export AWS_PAGER=""

# The :- fallback keeps "set -u" from aborting before the message below is printed
if [[ -z "${aws_assume_role_arn:-}" ]]; then
    echo "aws_assume_role_arn not provided. Please provide an IAM role for the task to assume."
    exit 1
fi

if [[ -z "${aws_assume_role_duration:-}" ]]; then
    echo "aws_assume_role_duration not provided. Defaulting to 1hr (3600 seconds) session length"
fi

# auth and assume the role
export $(printf "AWS_ACCESS_KEY_ID=%s AWS_SECRET_ACCESS_KEY=%s AWS_SESSION_TOKEN=%s" \
    $(aws sts assume-role \
    --role-arn "${aws_assume_role_arn}" \
    --role-session-name "concourse-task" \
    --duration-seconds "${aws_assume_role_duration:-3600}" \
    --query "Credentials.[AccessKeyId,SecretAccessKey,SessionToken]" \
    --output text))

Near the top of all our bash scripts we then source this script in order to assume the role.

Assume Role Using go

We have a function, shown below, that assumes the role. It returns an aws.Config to the caller and also sets the standard AWS credential env vars in case we spawn a separate process that needs permissions as well.

package helpers

import (
	"context"
	"log"
	"os"
	"strconv"
	"time"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/credentials/stscreds"
	"github.com/aws/aws-sdk-go-v2/service/sts"
)

func AssumeIamRole() aws.Config {
	roleArn := os.Getenv("aws_assume_role_arn")
	if roleArn == "" {
		log.Fatal("aws_assume_role_arn not provided. Please provide an IAM role for the task to assume.")
	}

	// Default to a one-hour session unless aws_assume_role_duration (in seconds) is set.
	sessionDuration := 1 * time.Hour
	readDuration := os.Getenv("aws_assume_role_duration")
	if readDuration != "" {
		i, err := strconv.Atoi(readDuration)
		if err != nil {
			log.Fatal("error parsing aws_assume_role_duration, might not be an int: ", err)
		}
		sessionDuration = time.Duration(i) * time.Second
	} else {
		log.Println("aws_assume_role_duration not provided. Defaulting to 1hr (3600 seconds) session length")
	}

	// awsRegion is not defined in this snippet; it is assumed to be declared elsewhere in the package.
	cfg, err := config.LoadDefaultConfig(context.Background(),
		config.WithRegion(awsRegion),
	)
	if err != nil {
		log.Fatal("error loading aws config: ", err)
	}

	// Swap the config's credentials for ones obtained by assuming the role.
	stsClient := sts.NewFromConfig(cfg)
	provider := stscreds.NewAssumeRoleProvider(stsClient, roleArn,
		func(aro *stscreds.AssumeRoleOptions) {
			aro.Duration = sessionDuration
		},
	)

	cfg.Credentials = aws.NewCredentialsCache(provider)
	creds, err := cfg.Credentials.Retrieve(context.Background())
	if err != nil {
		log.Fatal("failed to authenticate to AWS with IAM role ", roleArn, ": ", err)
	}

	// in case we spawn processes that also need permissions
	os.Setenv("AWS_ACCESS_KEY_ID", creds.AccessKeyID)
	os.Setenv("AWS_SECRET_ACCESS_KEY", creds.SecretAccessKey)
	os.Setenv("AWS_SESSION_TOKEN", creds.SessionToken)

	return cfg
}

IMDSv2 And Hop Limit

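By default we’re using IMDSv2 on our EC2 instances. In order for processes running inside the containers that Concourse spins up to successfully access the IMDSv2 endpoints, we had to set the hop limit on the instance to 2. The container adds an extra network hop between the process and the metadata service, and with the default hop limit of 1 the token response never makes it back to the container.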