EKS fluent-bit unable to assume AWS role from service account - amazon-web-services

I'm going mad over a fluent bit DaemonSet installed via Helm in EKS on Account AWS yyyyyyy unable to send data to Kinesis in AWS account xxxxxxxxxx.
It looks like EKS does not have OIDC provider on IAM but it's false! Can you help?
fluent bit logs:
[2022/06/29 15:22:34] [debug] [output:kinesis_firehose:kinesis_firehose.0] firehose:PutRecordBatch: events=157, payload=71245 bytes
[2022/06/29 15:22:34] [debug] [output:kinesis_firehose:kinesis_firehose.0] Sending log records to delivery stream kinesis_backend
[2022/06/29 15:22:34] [debug] [http_client] not using http_proxy for header
[2022/06/29 15:22:34] [debug] [aws_credentials] Requesting credentials from the EC2 provider..
[2022/06/29 15:22:34] [debug] [input:tail:tail.0] inode=19100461 events: IN_MODIFY
[2022/06/29 15:22:34] [debug] [input chunk] update output instances with new chunk size diff=693
[2022/06/29 15:22:34] [debug] [input:tail:tail.0] inode=19100461 events: IN_MODIFY
[2022/06/29 15:22:34] [debug] [http_client] server firehose.eu-west-1.amazonaws.com:443 will close connection #74
[2022/06/29 15:22:34] [debug] [aws_client] firehose.eu-west-1.amazonaws.com: http_do=0, HTTP Status: 400
[2022/06/29 15:22:34] [error] [aws_client] auth error, refreshing creds
[2022/06/29 15:22:34] [debug] [aws_credentials] Refresh called on the env provider
[2022/06/29 15:22:34] [debug] [aws_credentials] Refresh called on the profile provider
[2022/06/29 15:22:34] [debug] [aws_credentials] Reading shared config file.
[2022/06/29 15:22:34] [debug] [aws_credentials] Shared config file /root/.aws/config does not exist
[2022/06/29 15:22:34] [debug] [aws_credentials] Reading shared credentials file.
[2022/06/29 15:22:34] [error] [aws_credentials] Shared credentials file /root/.aws/credentials does not exist
[2022/06/29 15:22:34] [debug] [aws_credentials] Refresh called on the EKS provider
[2022/06/29 15:22:34] [debug] [aws_credentials] Calling STS..
[2022/06/29 15:22:34] [debug] [http_client] not using http_proxy for header
[2022/06/29 15:22:34] [debug] [http_client] server sts.eu-west-1.amazonaws.com:443 will close connection #74
[2022/06/29 15:22:34] [debug] [aws_client] sts.eu-west-1.amazonaws.com: http_do=0, HTTP Status: 400
[2022/06/29 15:22:34] [debug] [aws_client] Unable to parse API response- response is not valid JSON.
[2022/06/29 15:22:34] [debug] [aws_credentials] STS raw response:
<ErrorResponse xmlns="https://sts.amazonaws.com/doc/2011-06-15/">
<Error>
<Type>Sender</Type>
<Code>InvalidIdentityToken</Code>
<Message>No OpenIDConnect provider found in your account for https://oidc.eks.eu-west-1.amazonaws.com/id/AAAAAAAAAAAAAAAAAA</Message>
</Error>
<RequestId>c517249d-c018-43c3-a712-d0e5080ded86</RequestId>
</ErrorResponse>
fluent-bit service account in namespace newrelic (created by fluentbit Helm chart)
kubectl -n newrelic describe sa fluent-bit
Name: fluent-bit
Namespace: newrelic
Labels: app.kubernetes.io/instance=fluent-bit
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=fluent-bit
app.kubernetes.io/version=1.9.4
helm.sh/chart=fluent-bit-0.20.2
Annotations: eks.amazonaws.com/role-arn: arn:aws:iam::xxxxxxxxxx:role/kinesis-write
meta.helm.sh/release-name: fluent-bit
meta.helm.sh/release-namespace: newrelic
Policy permissions attached to role arn:aws:iam::xxxxxxxxxx:role/kinesis-write
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "VisualEditor0",
"Effect": "Allow",
"Action": [
"firehose:PutRecord",
"firehose:PutRecordBatch"
],
"Resource": "arn:aws:firehose:region:xxxxxxxxxx:deliverystream/kinesis-backend"
}
]
}
Role arn:aws:iam::xxxxxxxxxx:role/kinesis-write trusted relationships (I included OIDC Provider for my EKS cluster on account yyyyyyyyyy)
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Federated": "arn:aws:iam::yyyyyyyyy:oidc-provider/oidc.eks.eu-west-1.amazonaws.com/id/AAAAAAAAAAAAAAAAAA"
},
"Action": [
"sts:AssumeRole",
"sts:AssumeRoleWithWebIdentity"
],
"Condition": {
"StringEquals": {
"oidc.eks.eu-west-1.amazonaws.com/id/AAAAAAAAAAAAAAAAAA:sub": "system:serviceaccount:newrelic:fluent-bit"
}
}
}
]
}

Related

Can not connect to AWS Amplify PubSub -> Socket error:undefined

I've been trying out all ways to get the Amplify/PubSub working without any luck. It seems all the documentation are rather outdated.
Here is what I have done so far. Please note all hashes are made up ;-)
Created a fresh React Native app
Installed amplify packages
Installed Amplify CLI
Invoked $ amplify configure
Invoked $ amplify init
Invoked $ amplify add auth
Invoked $ amplify push, which created the aws-exports.js object
Created a super simple component
import React from 'react';
import { View } from 'react-native';
import { withAuthenticator } from 'aws-amplify-react-native';
import Amplify, { Auth, PubSub } from 'aws-amplify';
import { AWSIoTProvider } from '#aws-amplify/pubsub/lib/Providers';
import awsmobile from './aws-exports';
Amplify.configure({
Auth: {
mandatorySignIn: true,
region: awsmobile.aws_cognito_region,
userPoolId: awsmobile.aws_user_pools_id,
identityPoolId: awsmobile.aws_cognito_identity_pool_id,
userPoolWebClientId: awsmobile.aws_user_pools_web_client_id,
},
Analytics: {
disabled: true,
},
});
Amplify.addPluggable(
new AWSIoTProvider({
aws_pubsub_region: 'ap-southeast-2',
aws_pubsub_endpoint:
'wss://a123456789d-ats.iot.ap-southeast-2.amazonaws.com/mqtt',
})
);
Amplify.Logger.LOG_LEVEL = 'DEBUG';
class App extends PureComponent {
componentDidMount() {
if (this.props.authState === 'signedIn') {
Auth.currentCredentials().then((creds) => {
// get the principal that needs to be attached to policy
console.log('principal to be attached', creds.identityId)
PubSub.subscribe('topic_1').subscribe({
next: (data) => console.log(JSON.stringify(data, null, 2)),
error: (msg) => console.log('ERROR: ', msg.error),
close: () => console.log('Done'),
});
});
}
}
render() {
return (
<View/>
);
}
}
export default withAuthenticator(App);
I attached AWS root certificate to my iPhone (see below)
10.Create a IAM policy for AWS IoT
IoTAppPolicy
iot:*
arn:aws:iot:ap-southeast-2:1234567890:*
11.Attach the principal I got from Auth.currentCredentials to the policy
aws iot attach-principal-policy --policy-name IoTAppPolicy --principal ap-southeast-2:db1234bc-5678-90123-4567-89ae0e123b4
12.Attach Policies to Auth Role
AWSIoTDataAccess
AWSIoTConfigAccess
Yet, when I run the app I get the following error log
[DEBUG] 51:42.745 SignIn - Sign In for test#test.com
[DEBUG] 51:44.616 AuthClass CognitoUserSession {idToken: CognitoIdToken, refreshToken: CognitoRefreshToken, accessToken: CognitoAccessToken, clockDrift: 0}
[DEBUG] 51:44.616 Credentials - set credentials from session
[DEBUG] 51:46.247 Credentials - Load credentials successfully
[DEBUG] 51:46.248 AuthClass - succeed to get cognito credentials
DEBUG] 51:47.150 Hub - Dispatching to auth with {event: "signIn", data: CognitoUser, message: "A user 2d...de has been signed in"}
[DEBUG] 51:47.151 SignIn CognitoUser {username: "2d...de", pool: CognitoUserPool, Session: null, client: Client, signInUserSession: CognitoUserSession, …}
[DEBUG] 51:47.152 AuthClass - Getting the session from this user: CognitoUser {username: "2d...de", pool: CognitoUserPool, Session: null, client: Client, signInUserSession: CognitoUserSession, …}
[DEBUG] 51:47.152 AuthClass - Succeed to get the user session CognitoUserSession {idToken: CognitoIdToken, refreshToken: CognitoRefreshToken, accessToken: CognitoAccessToken, clockDrift: 1}
[DEBUG] 51:47.574 AuthPiece - verified user attributes {verified: {…}, unverified: {…}}
[INFO] 51:47.575 Authenticator - Inside handleStateChange method current authState: signIn
[DEBUG] 51:47.575 VerifyContact - no unverified contact
[INFO] 51:47.578 Authenticator - authState has been updated to signedIn
[DEBUG] 51:47.581 AuthClass - getting current credentials
[DEBUG] 51:47.582 Credentials - getting credentials
[DEBUG] 51:47.582 Credentials - picking up credentials
[DEBUG] 51:47.582 Credentials - getting new cred promise
[DEBUG] 51:47.582 Credentials - checking if credentials exists and not expired
[DEBUG] 51:47.583 Credentials - credentials not changed and not expired, directly return
[DEBUG] 51:47.583 AnalyticsClass - Analytics has been disabled
[DEBUG] 51:47.584 PubSub - subscribe options undefined
[DEBUG] 51:47.584 MqttOverWSProvider - Subscribing to topic(s) topic1
[DEBUG] 51:47.584 Credentials - getting credentials
[DEBUG] 51:47.584 Credentials - picking up credentials
[DEBUG] 51:47.584 Credentials - getting new cred promise
[DEBUG] 51:47.585 Credentials - checking if credentials exists and not expired
[DEBUG] 51:47.585 Credentials - are these credentials expired?
[DEBUG] 51:47.585 Credentials - credentials not changed and not expired, directly return
[DEBUG] 51:47.586 Signer {region: "ap-southeast-2", service: "iotdevicegateway"}
[DEBUG] 51:47.590 MqttOverWSProvider - Creating new MQTT client cca4e07f-a15a-46ce-904d-483a83162018
[WARN] 52:50.152 MqttOverWSProvider - cca4e07f-a15a-46ce-904d-483a83162018 {
"errorCode": 7,
"errorMessage": "AMQJS0007E Socket error:undefined.",
"uri": "wss://a123456789d.iot.ap-southeast-2.amazonaws.com/mqtt?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=ASIAU7VGXF6UOWZIUFOM%2F20200502%2Fap-southeast-2%2Fiotdevicegateway%2Faws4_request&X-Amz-Date=20200502T055147Z&X-Amz-SignedHeaders=host&X-Amz-Signature=010d8..7dba&X-Amz-Security-Token=IQ..lk%3D"
}
ERROR: Disconnected, error code: 7
[DEBUG] 52:50.153 MqttOverWSProvider - Unsubscribing from topic(s) topic1
Any idea why I can't connect to the topic?
Who would have thought that a package was missing.
After installing
yarn add amazon-cognito-identity-js
everything worked fine on my physical iPhone.
I went to the AWS IoT Core page and published a test message from there and voilà I get the following object back
{
"message": "Hello from AWS IoT console"
}

AWS Cognito cookie storage

I'm trying to set up Cognito to use cookies instead of localStorage for credentials so that I can keep the user logged in between domains, e.g. x.foo.com and y.foo.com. The first step is to get it working on localhost but I'm stuck.
The documentation shows a simple config change should do the trick?
The following debug messages are comitted to the console:
[DEBUG] 37:08.223 AuthClass
Object { idToken: {…}, refreshToken: {…}, accessToken: {…}, clockDrift: 0 }
ConsoleLogger.js:87
[DEBUG] 37:08.228 Credentials - No Cache module registered in Amplify ConsoleLogger.js:84
[DEBUG] 37:08.230 Credentials - set credentials from session ConsoleLogger.js:84
[DEBUG] 37:08.230 Credentials - No Cognito Federated Identity pool provided ConsoleLogger.js:84
[DEBUG] 37:08.230 AuthClass - cannot get cognito credentials No Cognito Federated Identity pool provided ConsoleLogger.js:94
[DEBUG] 37:08.231 AuthClass - Failed to get user from user pool ConsoleLogger.js:84
[ERROR] 37:08.232 AuthClass - Failed to get the signed in user No current user
It seems when you specify the cookieStorage config you need to manually apply a cache instance? How do I do that and will it solve the problem?
This config works:
{
region: 'eu-west-1',
userPoolId: 'eu-west-1_XXXXXX',
userPoolWebClientId: 'XXXXXX',
mandatorySignIn: false,
cookieStorage: {
domain: 'localhost',
secure: false,
path: '/',
expires: 365,
},
}
In particular, secure must be false for localhost unless you are using https (Firefox ignores this for localhost, but Chrome and Safari don't).

Unable to create new s3 bucket in terraform

I'm attempting to create a new s3 bucket and getting a conflict though I know the bucket name is new, unique, and has been many hours (8+) since that name was in use. Details attached. I've even tried with a new name that I know was never a bucket in my account (and likely never a bucket).
The name in the logs below is made up and not the one I was using, which was unique and namespaced to my domain.
If I use the aws s3 cli to make the bucket (i.e. aws s3 mb s3://{same-bucket-name} --region us-east-2) where {same-bucket-name} is the name of the bucket I want to create, it works fine.
2019-07-07T00:12:19.463-0400 [DEBUG] plugin.terraform-provider-aws_v2.18.0_x4: 2019/07/07 00:12:19 [DEBUG] Trying to create new S3 bucket: "my-unique-s3-bucket-name"
2019-07-07T00:12:19.464-0400 [DEBUG] plugin.terraform-provider-aws_v2.18.0_x4: 2019/07/07 00:12:19 [DEBUG] [aws-sdk-go] DEBUG: Request s3/CreateBucket Details:
2019-07-07T00:12:19.464-0400 [DEBUG] plugin.terraform-provider-aws_v2.18.0_x4: ---[ REQUEST POST-SIGN ]-----------------------------
2019-07-07T00:12:19.464-0400 [DEBUG] plugin.terraform-provider-aws_v2.18.0_x4: PUT /my-unique-s3-bucket-name HTTP/1.1
2019-07-07T00:12:19.464-0400 [DEBUG] plugin.terraform-provider-aws_v2.18.0_x4: Host: s3.us-east-2.amazonaws.com
2019-07-07T00:12:19.464-0400 [DEBUG] plugin.terraform-provider-aws_v2.18.0_x4: User-Agent: aws-sdk-go/1.20.12 (go1.12.5; darwin; amd64) APN/1.0 HashiCorp/1.0 Terraform/0.12.2
2019-07-07T00:12:19.464-0400 [DEBUG] plugin.terraform-provider-aws_v2.18.0_x4: Content-Length: 153
2019-07-07T00:12:19.464-0400 [DEBUG] plugin.terraform-provider-aws_v2.18.0_x4: Authorization: AWS4-HMAC-SHA256 Credential=MYCREDS/20190707/us-east-2/s3/aws4_request, SignedHeaders=content-length;host;x-amz-acl;x-amz-content-sha256;x-amz-date, Signature=b5acd2dbcaf09eda51b4ea8448f1991d26c8eb8249a85e7ac28044864df377b9
2019-07-07T00:12:19.464-0400 [DEBUG] plugin.terraform-provider-aws_v2.18.0_x4: X-Amz-Acl: public-read
2019-07-07T00:12:19.464-0400 [DEBUG] plugin.terraform-provider-aws_v2.18.0_x4: X-Amz-Content-Sha256: 70cae86320841ea73b0bdc759f99920c7caa405e61af2742575750c6586272c9
2019-07-07T00:12:19.464-0400 [DEBUG] plugin.terraform-provider-aws_v2.18.0_x4: X-Amz-Date: 20190707T041219Z
2019-07-07T00:12:19.464-0400 [DEBUG] plugin.terraform-provider-aws_v2.18.0_x4: Accept-Encoding: gzip
2019-07-07T00:12:19.464-0400 [DEBUG] plugin.terraform-provider-aws_v2.18.0_x4:
2019-07-07T00:12:19.464-0400 [DEBUG] plugin.terraform-provider-aws_v2.18.0_x4: <CreateBucketConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><LocationConstraint>us-east-2</LocationConstraint></CreateBucketConfiguration>
2019-07-07T00:12:19.464-0400 [DEBUG] plugin.terraform-provider-aws_v2.18.0_x4: -----------------------------------------------------
2019-07-07T00:12:19.697-0400 [DEBUG] plugin.terraform-provider-aws_v2.18.0_x4: 2019/07/07 00:12:19 [DEBUG] [aws-sdk-go] DEBUG: Response s3/CreateBucket Details:
2019-07-07T00:12:19.697-0400 [DEBUG] plugin.terraform-provider-aws_v2.18.0_x4: ---[ RESPONSE ]--------------------------------------
2019-07-07T00:12:19.697-0400 [DEBUG] plugin.terraform-provider-aws_v2.18.0_x4: HTTP/1.1 409 Conflict
2019-07-07T00:12:19.697-0400 [DEBUG] plugin.terraform-provider-aws_v2.18.0_x4: Connection: close
2019-07-07T00:12:19.697-0400 [DEBUG] plugin.terraform-provider-aws_v2.18.0_x4: Transfer-Encoding: chunked
2019-07-07T00:12:19.697-0400 [DEBUG] plugin.terraform-provider-aws_v2.18.0_x4: Content-Type: application/xml
2019-07-07T00:12:19.697-0400 [DEBUG] plugin.terraform-provider-aws_v2.18.0_x4: Date: Sun, 07 Jul 2019 04:12:19 GMT
2019-07-07T00:12:19.697-0400 [DEBUG] plugin.terraform-provider-aws_v2.18.0_x4: Server: AmazonS3
2019-07-07T00:12:19.697-0400 [DEBUG] plugin.terraform-provider-aws_v2.18.0_x4: X-Amz-Id-2: v5M1x31BcVCS4DLIgqmCR4KRHipO3ZRbTSXF1PCS9+q9nyT8O5/3s04Z22o8t4x8JZ0HF9HWkO4=
2019-07-07T00:12:19.697-0400 [DEBUG] plugin.terraform-provider-aws_v2.18.0_x4: X-Amz-Request-Id: 835B636D828335A1
2019-07-07T00:12:19.697-0400 [DEBUG] plugin.terraform-provider-aws_v2.18.0_x4:
2019-07-07T00:12:19.697-0400 [DEBUG] plugin.terraform-provider-aws_v2.18.0_x4:
2019-07-07T00:12:19.698-0400 [DEBUG] plugin.terraform-provider-aws_v2.18.0_x4: -----------------------------------------------------
2019-07-07T00:12:19.698-0400 [DEBUG] plugin.terraform-provider-aws_v2.18.0_x4: 2019/07/07 00:12:19 [DEBUG] [aws-sdk-go] <?xml version="1.0" encoding="UTF-8"?>
2019-07-07T00:12:19.698-0400 [DEBUG] plugin.terraform-provider-aws_v2.18.0_x4: <Error><Code>OperationAborted</Code><Message>A conflicting conditional operation is currently in progress against this resource. Please try again.</Message><RequestId>835B636D828335A1</RequestId><HostId>v5M1x31BcVCS4DLIgqmCR4KRHipO3ZRbTSXF1PCS9+q9nyT8O5/3s04Z22o8t4x8JZ0HF9HWkO4=</HostId></Error>
2019-07-07T00:12:19.698-0400 [DEBUG] plugin.terraform-provider-aws_v2.18.0_x4: 2019/07/07 00:12:19 [DEBUG] [aws-sdk-go] DEBUG: Validate Response s3/CreateBucket failed, attempt 0/25, error OperationAborted: A conflicting conditional operation is currently in progress against this resource. Please try again.
2019-07-07T00:12:19.698-0400 [DEBUG] plugin.terraform-provider-aws_v2.18.0_x4: status code: 409, request id: 835B636D828335A1, host id: v5M1x31BcVCS4DLIgqmCR4KRHipO3ZRbTSXF1PCS9+q9nyT8O5/3s04Z22o8t4x8JZ0HF9HWkO4=
2019-07-07T00:12:19.698-0400 [DEBUG] plugin.terraform-provider-aws_v2.18.0_x4: 2019/07/07 00:12:19 [WARN] Got an error while trying to create S3 bucket my-unique-s3-bucket-name: OperationAborted: A conflicting conditional operation is currently in progress against this resource. Please try again.
2019-07-07T00:12:19.698-0400 [DEBUG] plugin.terraform-provider-aws_v2.18.0_x4: status code: 409, request id: 835B636D828335A1, host id: v5M1x31BcVCS4DLIgqmCR4KRHipO3ZRbTSXF1PCS9+q9nyT8O5/3s04Z22o8t4x8JZ0HF9HWkO4=
2019-07-07T00:12:19.698-0400 [DEBUG] plugin.terraform-provider-aws_v2.18.0_x4: 2019/07/07 00:12:19 [TRACE] Waiting 10s before next try
If the bucket did previously exist then there is an indeterminate amount of time before that bucket name is released.
Unfortunately the AWS docs aren't very specific here:
Important
If you want to continue to use the same bucket name, don't delete the
bucket. We recommend that you empty the bucket and keep it. After a
bucket is deleted, the name becomes available to reuse, but the name
might not be available for you to reuse for various reasons. For
example, it might take some time before the name can be reused, and
some other account could create a bucket with that name before you do.
You can talk to AWS support to confirm what's happening (and check that another AWS account doesn't have the bucket) but ultimately you just need to wait. If the S3 bucket matches a domain name that you control and you intend to use it for website hosting and someone else already has that S3 bucket then there is a process for getting that bucket name back to you, just as there is with CloudFront CNAMEs which are also globally unique.
You should also be able to check if the bucket name is available by running the following command:
aws s3api head-bucket --bucket [bucket name]
Ages back when we briefly tried deleting S3 buckets in test environments over night (along with everything else) we would occasionally see this error for over 48 hours while sometimes the bucket name was available again within a few hours. Unfortunately, AWS provide no guarantees here.

Dataflow setting Controller Service Account

I try to set up controller service account for Dataflow. In my dataflow options I have:
options.setGcpCredential(GoogleCredentials.fromStream(
new FileInputStream("key.json")).createScoped(someArrays));
options.setServiceAccount("xxx#yyy.iam.gserviceaccount.com");
But I'm getting:
WARNING: Request failed with code 403, performed 0 retries due to IOExceptions,
performed 0 retries due to unsuccessful status codes, HTTP framework says
request can be retried, (caller responsible for retrying):
https://dataflow.googleapis.com/v1b3/projects/MYPROJECT/locations/MYLOCATION/jobs
Exception in thread "main" java.lang.RuntimeException: Failed to create a workflow
job: (CODE): Current user cannot act as
service account "xxx#yyy.iam.gserviceaccount.com.
Causes: (CODE): Current user cannot act as
service account "xxx#yyy.iam.gserviceaccount.com.
at org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:791)
at org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:173)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:311)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:297)
...
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 403 Forbidden
{
"code" : 403,
"errors" : [ {
"domain" : "global",
"message" : "(CODE): Current user cannot act as service account
xxx#yyy.iam.gserviceaccount.com. Causes: (CODE): Current user
cannot act as service account xxx#yyy.iam.gserviceaccount.com.",
"reason" : "forbidden"
} ],
"message" : "(CODE): Current user cannot act as service account
xxx#yyy.iam.gserviceaccount.com. Causes: (CODE): Current user
cannot act as service account xxx#yyy.iam.gserviceaccount.com.",
"status" : "PERMISSION_DENIED"
}
Am I missing some Roles or permissions?
Maybe someone is going to find it helpful:
For controller it was: Dataflow Worker and Storage Object Admin (that was found in Google's documentation).
For executor it was: Service Account User.
I've been hitting this error and thought it worth sharing my experiences (partly because I suspect I'll encounter this again in the future).
The terraform code to create my dataflow job is:
resource "google_dataflow_job" "wordcount" {
# https://stackoverflow.com/a/59931467/201657
name = "wordcount"
template_gcs_path = "gs://dataflow-templates/latest/Word_Count"
temp_gcs_location = "gs://${local.name-prefix}-functions/temp"
parameters = {
inputFile = "gs://dataflow-samples/shakespeare/kinglear.txt"
output = "gs://${local.name-prefix}-functions/wordcount/output"
}
service_account_email = "serviceAccount:${data.google_service_account.sa.email}"
}
The error message:
Error: googleapi: Error 400: (c3c0d991927a8658): Current user cannot act as service account serviceAccount:dataflowdemo#redacted.iam.gserviceaccount.com., badRequest
was returned from running terraform apply. Checking out the logs provided a lot more info:
gcloud logging read 'timestamp >= "2020-12-31T13:39:58.733249492Z" AND timestamp <= "2020-12-31T13:45:58.733249492Z"' --format="csv(timestamp,severity,textPayload)" --order=asc
which returned various log records, including this:
Permissions verification for controller service account failed. IAM role roles/dataflow.worker should be granted to controller service account dataflowdemo#redacted.iam.gserviceaccount.com.
so I granted that missing role grant
gcloud projects add-iam-policy-binding $PROJECT \
--member="serviceAccount:dataflowdemo#${PROJECT}.iam.gserviceaccount.com" \
--role="roles/dataflow.worker"
and ran terraform apply again. This time I got the same error in the terraform output but there were no errors to be seen in the logs.
I then followed the advice given at https://cloud.google.com/dataflow/docs/concepts/access-control#creating_jobs to also grant the roles/dataflow.admin:
gcloud projects add-iam-policy-binding $PROJECT \
--member="serviceAccount:dataflowdemo#${PROJECT}.iam.gserviceaccount.com" \
--role="roles/dataflow.admin"
but there was no discernible difference from the previous attempt.
I then tried turning on terraform debug logging which provided this info:
2020-12-31T16:04:13.129Z [DEBUG] plugin.terraform-provider-google_v3.51.0_x5: ---[ REQUEST ]---------------------------------------
2020-12-31T16:04:13.129Z [DEBUG] plugin.terraform-provider-google_v3.51.0_x5: POST /v1b3/projects/redacted/locations/europe-west1/templates?alt=json&prettyPrint=false HTTP/1.1
2020-12-31T16:04:13.129Z [DEBUG] plugin.terraform-provider-google_v3.51.0_x5: Host: dataflow.googleapis.com
2020-12-31T16:04:13.129Z [DEBUG] plugin.terraform-provider-google_v3.51.0_x5: User-Agent: google-api-go-client/0.5 Terraform/0.14.2 (+https://www.terraform.io) Terraform-Plugin-SDK/2.1.0 terraform-provider-google/dev
2020-12-31T16:04:13.129Z [DEBUG] plugin.terraform-provider-google_v3.51.0_x5: Content-Length: 385
2020-12-31T16:04:13.129Z [DEBUG] plugin.terraform-provider-google_v3.51.0_x5: Content-Type: application/json
2020-12-31T16:04:13.129Z [DEBUG] plugin.terraform-provider-google_v3.51.0_x5: X-Goog-Api-Client: gl-go/1.14.5 gdcl/20201023
2020-12-31T16:04:13.129Z [DEBUG] plugin.terraform-provider-google_v3.51.0_x5: Accept-Encoding: gzip
2020-12-31T16:04:13.129Z [DEBUG] plugin.terraform-provider-google_v3.51.0_x5:
2020-12-31T16:04:13.129Z [DEBUG] plugin.terraform-provider-google_v3.51.0_x5: {
2020-12-31T16:04:13.129Z [DEBUG] plugin.terraform-provider-google_v3.51.0_x5: "environment": {
2020-12-31T16:04:13.129Z [DEBUG] plugin.terraform-provider-google_v3.51.0_x5: "serviceAccountEmail": "serviceAccount:dataflowdemo#redacted.iam.gserviceaccount.com",
2020-12-31T16:04:13.129Z [DEBUG] plugin.terraform-provider-google_v3.51.0_x5: "tempLocation": "gs://jamiet-demo-functions/temp"
2020-12-31T16:04:13.129Z [DEBUG] plugin.terraform-provider-google_v3.51.0_x5: },
2020-12-31T16:04:13.129Z [DEBUG] plugin.terraform-provider-google_v3.51.0_x5: "gcsPath": "gs://dataflow-templates/latest/Word_Count",
2020-12-31T16:04:13.129Z [DEBUG] plugin.terraform-provider-google_v3.51.0_x5: "jobName": "wordcount",
2020-12-31T16:04:13.129Z [DEBUG] plugin.terraform-provider-google_v3.51.0_x5: "parameters": {
2020-12-31T16:04:13.129Z [DEBUG] plugin.terraform-provider-google_v3.51.0_x5: "inputFile": "gs://dataflow-samples/shakespeare/kinglear.txt",
2020-12-31T16:04:13.129Z [DEBUG] plugin.terraform-provider-google_v3.51.0_x5: "output": "gs://jamiet-demo-functions/wordcount/output"
2020-12-31T16:04:13.129Z [DEBUG] plugin.terraform-provider-google_v3.51.0_x5: }
2020-12-31T16:04:13.129Z [DEBUG] plugin.terraform-provider-google_v3.51.0_x5: }
2020-12-31T16:04:13.129Z [DEBUG] plugin.terraform-provider-google_v3.51.0_x5:
2020-12-31T16:04:13.129Z [DEBUG] plugin.terraform-provider-google_v3.51.0_x5: -----------------------------------------------------
2020-12-31T16:04:14.647Z [DEBUG] plugin.terraform-provider-google_v3.51.0_x5: 2020/12/31 16:04:14 [DEBUG] Google API Response Details:
2020-12-31T16:04:14.647Z [DEBUG] plugin.terraform-provider-google_v3.51.0_x5: ---[ RESPONSE ]--------------------------------------
2020-12-31T16:04:14.647Z [DEBUG] plugin.terraform-provider-google_v3.51.0_x5: HTTP/1.1 400 Bad Request
2020-12-31T16:04:14.647Z [DEBUG] plugin.terraform-provider-google_v3.51.0_x5: Connection: close
2020-12-31T16:04:14.647Z [DEBUG] plugin.terraform-provider-google_v3.51.0_x5: Transfer-Encoding: chunked
2020-12-31T16:04:14.647Z [DEBUG] plugin.terraform-provider-google_v3.51.0_x5: Alt-Svc: h3-29=":443"; ma=2592000,h3-T051=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"
2020-12-31T16:04:14.647Z [DEBUG] plugin.terraform-provider-google_v3.51.0_x5: Cache-Control: private
2020-12-31T16:04:14.647Z [DEBUG] plugin.terraform-provider-google_v3.51.0_x5: Content-Type: application/json; charset=UTF-8
2020-12-31T16:04:14.647Z [DEBUG] plugin.terraform-provider-google_v3.51.0_x5: Date: Thu, 31 Dec 2020 16:04:15 GMT
2020-12-31T16:04:14.647Z [DEBUG] plugin.terraform-provider-google_v3.51.0_x5: Server: ESF
2020-12-31T16:04:14.647Z [DEBUG] plugin.terraform-provider-google_v3.51.0_x5: Vary: Origin
2020-12-31T16:04:14.647Z [DEBUG] plugin.terraform-provider-google_v3.51.0_x5: Vary: X-Origin
2020-12-31T16:04:14.647Z [DEBUG] plugin.terraform-provider-google_v3.51.0_x5: Vary: Referer
2020-12-31T16:04:14.647Z [DEBUG] plugin.terraform-provider-google_v3.51.0_x5: X-Content-Type-Options: nosniff
2020-12-31T16:04:14.647Z [DEBUG] plugin.terraform-provider-google_v3.51.0_x5: X-Frame-Options: SAMEORIGIN
2020-12-31T16:04:14.647Z [DEBUG] plugin.terraform-provider-google_v3.51.0_x5: X-Xss-Protection: 0
2020-12-31T16:04:14.647Z [DEBUG] plugin.terraform-provider-google_v3.51.0_x5:
2020-12-31T16:04:14.647Z [DEBUG] plugin.terraform-provider-google_v3.51.0_x5: 1f9
2020-12-31T16:04:14.647Z [DEBUG] plugin.terraform-provider-google_v3.51.0_x5: {
2020-12-31T16:04:14.647Z [DEBUG] plugin.terraform-provider-google_v3.51.0_x5: "error": {
2020-12-31T16:04:14.647Z [DEBUG] plugin.terraform-provider-google_v3.51.0_x5: "code": 400,
2020-12-31T16:04:14.647Z [DEBUG] plugin.terraform-provider-google_v3.51.0_x5: "message": "(dbacb1c39beb28c9): Current user cannot act as service account serviceAccount:dataflowdemo#redacted.iam.gserviceaccount.com.",
2020-12-31T16:04:14.647Z [DEBUG] plugin.terraform-provider-google_v3.51.0_x5: "errors": [
2020-12-31T16:04:14.647Z [DEBUG] plugin.terraform-provider-google_v3.51.0_x5: {
2020-12-31T16:04:14.647Z [DEBUG] plugin.terraform-provider-google_v3.51.0_x5: "message": "(dbacb1c39beb28c9): Current user cannot act as service account serviceAccount:dataflowdemo#redacted.iam.gserviceaccount.com.",
2020-12-31T16:04:14.647Z [DEBUG] plugin.terraform-provider-google_v3.51.0_x5: "domain": "global",
2020-12-31T16:04:14.647Z [DEBUG] plugin.terraform-provider-google_v3.51.0_x5: "reason": "badRequest"
2020-12-31T16:04:14.647Z [DEBUG] plugin.terraform-provider-google_v3.51.0_x5: }
2020-12-31T16:04:14.647Z [DEBUG] plugin.terraform-provider-google_v3.51.0_x5: ],
2020-12-31T16:04:14.647Z [DEBUG] plugin.terraform-provider-google_v3.51.0_x5: "status": "INVALID_ARGUMENT"
orm-provider-google_v3.51.0_x5: }
2020-12-31T16:04:14.647Z [DEBUG] plugin.terraform-provider-google_v3.51.0_x5: }
2020-12-31T16:04:14.647Z [DEBUG] plugin.terraform-provider-google_v3.51.0_x5:
2020-12-31T16:04:14.647Z [DEBUG] plugin.terraform-provider-google_v3.51.0_x5: 0
2020-12-31T16:04:14.647Z [DEBUG] plugin.terraform-provider-google_v3.51.0_x5:
2020-12-31T16:04:14.647Z [DEBUG] plugin.terraform-provider-google_v3.51.0_x5:
2020-12-31T16:04:14.647Z [DEBUG] plugin.terraform-provider-google_v3.51.0_x5: -----------------------------------------------------
The error being returned from dataflow.googleapis.com is clearly evident:
Current user cannot act as service account serviceAccount:dataflowdemo#redacted.iam.gserviceaccount.com
At this stage I am puzzled as to why I can see an error being returned from the Google's dataflow API but there is nothing in the GCP logs indicating that an error occurred.
Then tho I had a bit of a lightbulb moment. Why does that error message mention "service account serviceAccount"? Then it hit me, I'd defined the service account incorrectly. Terraform code should have been:
resource "google_dataflow_job" "wordcount" {
# https://stackoverflow.com/a/59931467/201657
name = "wordcount"
template_gcs_path = "gs://dataflow-templates/latest/Word_Count"
temp_gcs_location = "gs://${local.name-prefix}-functions/temp"
parameters = {
inputFile = "gs://dataflow-samples/shakespeare/kinglear.txt"
output = "gs://${local.name-prefix}-functions/wordcount/output"
}
service_account_email = data.google_service_account.sa.email
}
I corrected it and it worked straight away. User error!!!
I then set about removing the various permissions that I'd added:
gcloud projects remove-iam-policy-binding $PROJECT \
--member="serviceAccount:dataflowdemo#${PROJECT}.iam.gserviceaccount.com" \
--role="roles/dataflow.admin"
gcloud projects remove-iam-policy-binding $PROJECT \
--member="serviceAccount:dataflowdemo#${PROJECT}.iam.gserviceaccount.com" \
--role="roles/dataflow.worker"
and terraform apply still worked. However, after removing the grant of role roles/dataflow.worker the job failed with error:
Workflow failed. Causes: Permissions verification for controller service account failed. IAM role roles/dataflow.worker should be granted to controller service account dataflowdemo#redacted.iam.gserviceaccount.com.
so clearly the documentation regarding the appropriate roles to grant (https://cloud.google.com/dataflow/docs/concepts/access-control#creating_jobs) is spot on.
As may be apparent, I started writing this post before I knew what the problem was and I thought it might be useful to document my investigation somewhere. Now that I've finished the investigation and the problem turns out to be one of PEBCAK its probably not so relevant to this thread anymore, and certainly shouldn't be accepted as an answer. Nevertheless, there is probably some useful information in here about how to go about investigating issues with terraform calling Google APIs, and it also reiterates the required role grants, so I'll leave it here in case it ever turns out to be useful.
I just hit this problem again so posting my solution up here as I fully expect I'll get bitten by this again at some point.
I was getting error:
Error: googleapi: Error 403: (a00eba23d59c1fa3): Current user cannot act as service account dataflow-controller-sa#myproject.iam.gserviceaccount.com. Causes: (a00eba23d59c15ac): Current user cannot act as service account dataflow-controller-sa#myproject.iam.gserviceaccount.com., forbidden
I was deploying the dataflow job, via terraform, using a different service account, deployer#myproject.iam.gserviceaccount.com
The solution was to grant that service account the roles/iam.serviceAccountUser role:
gcloud projects add-iam-policy-binding myproject \
--member=serviceAccount:deployer#myproject.iam.gserviceaccount.com \
--role=roles/iam.serviceAccountUser
For those that prefer custom IAM roles over predefined IAM roles the specific permission that was missing was iam.serviceAccounts.actAs.
Issue Got Resolved!
Go to GCP -> Console -> IAM -> ServiceAccount Email -> Add Permission -> Service Account User. as below

Running on EC2 using an instance profile using Terraform

I'm trying to run Terraform on an AWS EC2 instance, which is setup with an instance profile. However, Terraform doesn't seem to be implicitly using the Instance Profile and as such, I'm getting an "access denied" error whenever it tries to access my S3 remote state.
From the docs, I'm having trouble telling if, I am required to specify an AWS_METADATA_URL, or if there's anything else I'm explicitly required to do to make this work.
Per the Terraform docs:
EC2 Role If you're running Terraform from an EC2 instance with IAM
Instance Profile using IAM Role, Terraform will just ask the metadata
API endpoint for credentials.
This is a preferred approach over any other when running in EC2 as you
can avoid hard coding credentials. Instead these are leased on-the-fly
by Terraform which reduces the chance of leakage.
You can provide the custom metadata API endpoint via the
AWS_METADATA_URL variable which expects the endpoint URL, including
the version, and defaults to http://169.254.169.254:80/latest
Here's an example of what I'm trying to run:
# main.tf
provider "aws" {
region = "${var.region}"
}
terraform {
backend "s3" {}
}
module "core" {
// ....
}
# init .sh
terraform init -force-copy -input=false \
-backend-config="bucket=$TERRAFORM_STATE_BUCKET" \
-backend-config="key=$ENVIRONMENT/$SERVICE" \
-backend-config="region=$REGION" \
-upgrade=true
# AWS policy
"Version": "2012-10-17",
"Statement": [
{
"Sid": "",
"Effect": "Allow",
"Action": [
"s3:*"
],
"Resource": [
"*"
]
},
]
}
UPDATE
It seems the s3 list-objects command is failing in Terraform, though my policies should be allowing this to go through
-----------------------------------------------------
2018/02/20 21:09:37 [DEBUG] [aws-sdk-go] DEBUG: Response s3/ListObjects Details:
---[ RESPONSE ]--------------------------------------
HTTP/1.1 403 Forbidden
Connection: close
Transfer-Encoding: chunked
Content-Type: application/xml
Date: Tue, 20 Feb 2018 21:09:36 GMT
Server: AmazonS3
X-Amz-Id-2: OVK5E3d5R+Jgj3if5lxAXkwuERPZWsJNFJ7NeMYFbSrhQ/h4FfpV4z2mlgXFKT1Hg7lsqJ/jE6Q=
X-Amz-Request-Id: FE6B77C5C74BCFFF
-----------------------------------------------------
2018/02/20 21:09:37 [DEBUG] [aws-sdk-go] <?xml version="1.0" encoding="UTF-8"?>
<Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>FE6B77C5C74BCFFF</RequestId><HostId>OVK5E3d5R+Jgj3if5lxAXkwuERPZWsJNFJ7NeMYFbSrhQ/h4FfpV4z2mlgXFKT1Hg7lsqJ/jE6Q=</HostId></Error>
2018/02/20 21:09:37 [DEBUG] [aws-sdk-go] DEBUG: Validate Response s3/ListObjects failed, not retrying, error AccessDenied: Access Denied
status code: 403, request id: FE6B77C5C74BCFFF, host id: OVK5E3d5R+Jgj3if5lxAXkwuERPZWsJNFJ7NeMYFbSrhQ/h4FfpV4z2mlgXFKT1Hg7lsqJ/jE6Q=
2018/02/20 21:09:37 [DEBUG] plugin: waiting for all plugin processes to complete...
[31mError inspecting state in "s3": AccessDenied: Access Denied
status code: 403, request id: FE6B77C5C74BCFFF, host id: OVK5E3d5R+Jgj3if5lxAXkwuERPZWsJNFJ7NeMYFbSrhQ/h4FfpV4z2mlgXFKT1Hg7lsqJ/jE6Q=