Google Cloud: "There were concurrent policy changes. Please retry the whole read-modify-write with exponential backoff" (google-cloud-platform)

POST method URL:
https://cloudresourcemanager.googleapis.com/v1/projects/project-name:setIamPolicy
Request:
{
  "resource": "projects/project-name",
  "policy": {
    "bindings": [
      {
        "role": "roles/resourcemanager.organizationAdmin",
        "members": [
          "user:test12345678@domain.com"
        ]
      }
    ],
    "etag": "BwWWja0YfJA=",
    "version": 3
  }
}
Response:
{
  "error": {
    "code": 409,
    "message": "There were concurrent policy changes. Please retry the whole read-modify-write with exponential backoff.",
    "status": "ABORTED"
  }
}

Documentation recommends using the read-modify-write pattern to update the policy for a resource:
1. Read the current policy by calling getIamPolicy().
2. Edit the returned policy, either with a text editor or programmatically, to add or remove the desired members and their role grants.
3. Write the updated policy by calling setIamPolicy().
It looks like in your case the policy you're trying to set and the policy that is currently active on the resource have diverged. One of the ways this can happen is if you did:
1. getIamPolicy() > policy.json
2. addIamPolicyBinding() or removeIamPolicyBinding()
3. setIamPolicy() policy.json
The policy version on the resource after #2 is out of sync with what #3 is trying to set, so it throws an exception. To confirm, compare the etag field in the policy you're trying to set with the etag currently on the resource; there should be a mismatch.

This means that more than one policy change was attempted at the same time. You should perform only one policy-changing request at a time.
Implementing exponential backoff should help you with this error: after the n-th failed attempt, wait roughly 2^n seconds plus a random number of milliseconds, then retry the whole read-modify-write cycle.
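As a rough illustration, here is a minimal Python sketch of that retry loop, assuming google-api-python-client with Application Default Credentials; the project ID, role and member values are placeholders taken from the question:

import random
import time

from googleapiclient import discovery
from googleapiclient.errors import HttpError

def add_binding_with_retry(project_id, role, member, max_attempts=5):
    crm = discovery.build("cloudresourcemanager", "v1")
    for attempt in range(max_attempts):
        # 1. Read the current policy (including its etag).
        policy = crm.projects().getIamPolicy(resource=project_id, body={}).execute()
        # 2. Modify the freshly read policy.
        policy.setdefault("bindings", []).append({"role": role, "members": [member]})
        try:
            # 3. Write it back; the server rejects it if the etag is stale.
            return crm.projects().setIamPolicy(
                resource=project_id, body={"policy": policy}).execute()
        except HttpError as err:
            if err.resp.status != 409:
                raise
            # Concurrent change detected: back off and retry the whole cycle.
            time.sleep(2 ** attempt + random.random())
    raise RuntimeError("policy update kept colliding with concurrent changes")

# Example call with the placeholder values from the question:
# add_binding_with_retry("project-name",
#                        "roles/resourcemanager.organizationAdmin",
#                        "user:test12345678@domain.com")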

I was able to fix this issue by removing
"etag": "BwWWja0YfJA=",
"version": 3
from the policy template when using the gcloud projects set-iam-policy command. It will ask you to confirm overwriting the existing policy before committing the changes.

Related

gcp: events are not consistent for resource type. "operation" is missing for buckets

According to the following GCP link, the standard event structure in JSON should include operation details. However, for storage buckets the operation entry is missing from the log, so it cannot be used to identify the last action that occurred.
"operation": {
object (LogEntryOperation)
}
Other resource.type = firewall rule
"operation": {
"id": "operation-xxxxxxxxxxxxxxxxxxxxxx",
"producer": "compute.googleapis.com",
"last": true
}
How can I get operation details as a mandatory object in events?
If GCP doesn't support operation:{} in events consistently, any evidence of that would be helpful.
"last" is an optional field and may not be populated. This cannot be enforced.
You can try enabling bucket logging to gather more information about events regarding your storage or create a Feature Request at Public Issue Tracker to have this option added in future.
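For the bucket-logging suggestion, a minimal sketch with the google-cloud-storage Python client; the bucket and log-bucket names are assumptions:

from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("my-data-bucket")  # assumed bucket name
# Write usage/access logs into a separate, pre-created log bucket.
bucket.enable_logging("my-log-bucket", object_prefix="my-data-bucket-logs")
bucket.patch()  # persist the updated logging configuration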

Error publishing ASP.NET Core Web API to AWS Serverless Lambda: 'AWSLambdaFullAccess' at 'policyArn' ... Member must have length greater than

For over a year I have been able to publish an ASP.NET Core Web API application using Visual Studio 2019 by selecting "Publish to AWS Lambda..." (via a right click on the project) without incident. Until yesterday. Now it consistently fails to publish and rolls back.
The following two reasons are given as to why it has failed.
1 validation error detected: Value 'AWSLambdaFullAccess' at 'policyArn' failed to satisfy constraint: Member must have length greater than or equal to 20 (Service: AmazonIdentityManagement; Status Code: 400; Error Code: ValidationError; Request ID: ...; Proxy: null)
The following resource(s) failed to create: [AspNetCoreFunctionRole, Bucket]. Rollback requested by user.
I have looked at AWSLambdaFullAccess and AWSLambda_FullAccess and related documentation, but I have no model to follow and can't even tell what the error is referring to, so I can't see a fruitful path forward. What exactly is the "Member" it is referring to? Extensive research has yielded nothing of use.
I want to successfully publish my Web API. What can I look into to proceed?
This may not be the correct or ideal solution, but I tried this approach and it worked.
Step 1:
Changed the policy from "AWSLambdaFullAccess" to "AWSLambda_FullAccess" in serverless.template:
"Resources": {
"AspNetCoreFunction": {
"Type": "AWS::Serverless::Function",
"Properties": {
"Handler": "SampleAPI::SampleAPI.LambdaEntryPoint::FunctionHandlerAsync",
"Runtime": "dotnetcore3.1",
"CodeUri": "",
"MemorySize": 256,
"Timeout": 30,
"Role": null,
"Policies": [
"AWSLambda_FullAccess"
],
"Environment": {
"Variables": {
"AppS3Bucket": {
Lambda publishing was successful after this step.
Step 2:
Then I faced an issue accessing the DynamoDB table. I went to the IAM role and added the DynamoDB execution role. (I don't remember adding this role explicitly before.)
According to https://docs.aws.amazon.com/lambda/latest/dg/access-control-identity-based.html the AWSLambdaFullAccess policy has just been deprecated, and as a result the stack I tried to update was stuck in UPDATE_ROLLBACK_FAILED.
To fix this I had to take the following steps:
Manually continue the rollback of the stack from the CloudFormation page, making sure to skip the role that references AWSLambdaFullAccess (see the boto3 sketch after these steps).
Change my AWSLambdaFullAccess reference to AWSLambda_FullAccess in the CloudFormation template
Update the stack using my newly updated CloudFormation template
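For the first step, a minimal boto3 sketch; the stack name is an assumption, and the logical resource ID comes from the error message above:

import boto3

cfn = boto3.client("cloudformation")
# Continue the rollback while skipping the role that references AWSLambdaFullAccess.
cfn.continue_update_rollback(
    StackName="my-serverless-stack",            # assumed stack name
    ResourcesToSkip=["AspNetCoreFunctionRole"],
)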
Hope this is able to help someone!

Why is my S3 lifecycle policy not taking effect?

I have an S3 lifecycle policy to delete objects after 3 days, and I am using a prefix. My problem is that the policy works for all but one sub-directory. For example, let's say my bucket looks like this:
s3://my-bucket/myPrefix/env=dev/
s3://my-bucket/myPrefix/env=stg/
s3://my-bucket/myPrefix/env=prod/
When I check the stg and prod directories, there are no objects older than 3 days. However, when I check the dev directory, there are objects a lot older than that.
Note - There is a huge difference between the volume of data in dev compared to the other 2. Dev holds a lot more logs than the others.
My initial thought was that it was taking longer for eventual consistency to reflect what had and hadn't been deleted, but that theory is gone considering the time that has passed.
The issue seems related to the amount of data in this location under the prefix compared to the others, but I'm not sure what I can do to resolve this. Should I have another policy specific to this location, or is there somewhere I can check to see what is causing the failure? I did not see anything in CloudTrail for this event.
Here is my policy:
{
  "Rules": [
    {
      "Expiration": {
        "Days": 3
      },
      "ID": "Delete Object When Stale",
      "Prefix": "myPrefix/",
      "Status": "Enabled"
    }
  ]
}

What's the most efficient way to determine the minimum AWS permissions necessary for a Terraform configuration?

I have a Terraform configuration targeting deployment on AWS. It applies beautifully when using an IAM user that has permission to do anything (i.e. {actions: ["*"], resources: ["*"]}).
In pursuit of automating the application of this Terraform configuration, I want to determine the minimum set of permissions necessary to apply the configuration initially and effect subsequent changes. I specifically want to avoid giving overbroad permissions in policy, e.g. {actions: ["s3:*"], resources: ["*"]}.
So far, I'm simply running terraform apply until an error occurs. I look at the output or at the Terraform log output to see which API call failed and then add it to the deployment user's policy. EC2 and S3 are particularly frustrating because the names of the actions don't necessarily align with the API method names. I'm several hours into this with no easy way to tell how far along I am.
Is there a more efficient way to do this?
It'd be really nice if Terraform advised me what permission/action I need but that's a product enhancement best left to Hashicorp.
Here is another approach, similar to what was said above, but without getting into CloudTrail:
1. Give full permissions to your IAM user.
2. Run TF_LOG=trace terraform apply --auto-approve &> log.log
3. Run cat log.log | grep "DEBUG: Request"
You will get a list of all AWS actions used.
While I still believe that such a super-strict policy will be a continuous pain and will likely kill productivity (though that might depend on the project), there is now a tool for this.
iamlive uses the Client Side Monitoring feature of the AWS SDK to create a minimal policy based on the executed API calls. As Terraform uses the AWS SDK, this works here as well.
In contrast to my previous (and accepted) answer, iamlive should even get the actual IAM actions right, which do not necessarily match the API calls 1:1 (and the API calls are what CloudTrail would log).
For this to work with Terraform, you should run export AWS_CSM_ENABLED=true.
An efficient way I followed:
The way I deal with this is to allow all permissions (*) for each service first, then deny some of them if they are not required.
For example:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowSpecifics",
      "Action": [
        "ec2:*",
        "rds:*",
        "s3:*",
        "sns:*",
        "sqs:*",
        "iam:*",
        "elasticloadbalancing:*",
        "autoscaling:*",
        "cloudwatch:*",
        "cloudfront:*",
        "route53:*",
        "ecr:*",
        "logs:*",
        "ecs:*",
        "application-autoscaling:*",
        "events:*",
        "elasticache:*",
        "es:*",
        "kms:*",
        "dynamodb:*"
      ],
      "Effect": "Allow",
      "Resource": "*"
    },
    {
      "Sid": "DenySpecifics",
      "Action": [
        "iam:*User*",
        "iam:*Login*",
        "iam:*Group*",
        "iam:*Provider*",
        "aws-portal:*",
        "budgets:*",
        "config:*",
        "directconnect:*",
        "aws-marketplace:*",
        "aws-marketplace-management:*",
        "ec2:*ReservedInstances*"
      ],
      "Effect": "Deny",
      "Resource": "*"
    }
  ]
}
You can easily adjust the list in the Deny section if Terraform doesn't need some AWS services or your company doesn't use them.
EDIT Feb 2021: there is a better way using iamlive and client side monitoring. Please see my other answer.
As I guess there's no perfect solution, treat this answer a bit as the result of my brainstorming. At least for the initial permission setup, I could imagine the following:
Allow everything first and then process the CloudTrail logs to see which API calls were made in a terraform apply / destroy cycle.
Afterwards, you update the IAM policy to include exactly these calls.
The tracking of minimum permissions is now provided by AWS itself. https://aws.amazon.com/blogs/security/iam-access-analyzer-makes-it-easier-to-implement-least-privilege-permissions-by-generating-iam-policies-based-on-access-activity/.
If you wanted to be picky about the minimum-viable-permission principle, you could use CloudFormation StackSets to deploy different roles with minimum permissions, so Terraform could assume them on each module call via different providers, i.e. if you have a module that deploys ASGs, LBs and EC2 instances, then:
include those actions in a role where the workload lives
add a Terraform AWS provider block that assumes that role
use that provider block within the module call.
The burden is managing possibly quite a few Terraform roles, but as I said, this is an option if you want to be picky or have customer requirements to shrink down the Terraform user's permissions.
You could also download the CloudTrail event history for the last X days (up to 90) and run the following:
cat event_history.json <(echo "]}") | jq '[.Records[] | .eventName] | unique'
The echo part is needed because the file is truncated (for an unknown reason, perhaps because it is too big) when downloaded from CloudTrail's page. You can see it below:
> jsonlint event_history.json
Error: Parse error on line 1:
...iam.amazonaws.com"}}
-----------------------^
Expecting ',', ']', got 'EOF'
at Object.parseError (/usr/local/Cellar/jsonlint/1.6.0/libexec/lib/node_modules/jsonlint/lib/jsonlint.js:55:11)
at Object.parse (/usr/local/Cellar/jsonlint/1.6.0/libexec/lib/node_modules/jsonlint/lib/jsonlint.js:132:22)
at parse (/usr/local/Cellar/jsonlint/1.6.0/libexec/lib/node_modules/jsonlint/lib/cli.js:82:14)
at main (/usr/local/Cellar/jsonlint/1.6.0/libexec/lib/node_modules/jsonlint/lib/cli.js:136:14)
at Object.<anonymous> (/usr/local/Cellar/jsonlint/1.6.0/libexec/lib/node_modules/jsonlint/lib/cli.js:178:1)
at Module._compile (node:internal/modules/cjs/loader:1097:14)
at Object.Module._extensions..js (node:internal/modules/cjs/loader:1149:10)
at Module.load (node:internal/modules/cjs/loader:975:32)
at Function.Module._load (node:internal/modules/cjs/loader:822:12)
at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:77:12)
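If you prefer Python over jq, a small sketch that does the same de-duplication; it assumes the usual {"Records": [...]} export shape and applies the same "]}"-append workaround when the download is truncated:

import json

with open("event_history.json") as fh:
    raw = fh.read()

try:
    records = json.loads(raw)["Records"]
except json.JSONDecodeError:
    # Mirror the `<(echo "]}")` workaround for the truncated download.
    records = json.loads(raw + "]}")["Records"]

for name in sorted({record["eventName"] for record in records}):
    print(name)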
As an addition to any of the TF_LOG=trace / iamlive / CloudTrail approaches suggested before, please also note that to capture the complete set of actions required to manage a configuration (create/update/delete resources), you need to actually apply three configurations:
1. The original one, to capture actions required to create resources.
2. A mutated one with as many resource arguments changed as possible, to capture actions required to update resources in place.
3. An empty one (applied last) or terraform destroy, to capture actions required to delete resources.
While configurations 1 and 3 are commonly considered, configuration 2 is sometimes overlooked, and it can be a tedious one to prepare. Without it, Terraform will fail to apply changes that modify resources instead of deleting and recreating them.
Here is an extension on AvnerSo's answer:
cat log.log | ack -o "(?<=DEBUG: Request )[^ ]*" | sort -u
This command outputs every unique AWS request that Terraform has logged.
The "(?<=DEBUG: Request )[^ ]*" pattern performs a negative lookahead to find the first word after the match.
The -o flag only shows the match in the output.
sort -u selects the unique values from the list and sorts them.
Another option in addition to the previous answers is:
give broad permissions "s3:*", ... as explained earlier
Check the Access Advisor tab in the AWS IAM console for the permissions actually used, and then trim down your permissions accordingly.
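The same Access Advisor data can also be pulled programmatically; a minimal boto3 sketch, assuming you know the ARN of the deployment user:

import time
import boto3

iam = boto3.client("iam")
job = iam.generate_service_last_accessed_details(
    Arn="arn:aws:iam::123456789012:user/terraform-deployer")  # assumed user ARN

# Poll until the report is ready, then print the services actually used.
while True:
    details = iam.get_service_last_accessed_details(JobId=job["JobId"])
    if details["JobStatus"] != "IN_PROGRESS":
        break
    time.sleep(2)

for service in details["ServicesLastAccessed"]:
    if service.get("LastAuthenticated"):
        print(service["ServiceNamespace"], service["LastAuthenticated"])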

S3 TVM Issue – getting access denied

I'm trying to let my iOS app upload to S3 using credentials it gets from a slightly modified anonymous token vending machine.
The policy statement my token vending machine returns is:
{"Statement":
[
{"Effect":"Allow",
"Action":"s3:*",
"Resource":"arn:aws:s3:::my-bucket-test",
"Condition": {
"StringLike": {
"s3:prefix": "66-*"
}
}
},
{"Effect":"Deny","Action":"sdb:*","Resource":["arn:aws:sdb:us-east-1:MYACCOUNTIDHERE:domain/__USERS_DOMAIN__","arn:aws:sdb:us-east-1:MYACCOUNTIDHERE:domain/TokenVendingMachine_DEVICES"]},
{"Effect":"Deny","Action":"iam:*","Resource":"*"}
]
}
The object I'm trying to put is in that same bucket, with the key 66-3315F11E-84FA-417F-9C32-AC4BE364AD99.natural.mp4.
As far as I understand, this should work fine, but it doesn't; it throws an access denied message. Is there anything wrong with my policy statement?
You don't need to use the prefix condition to refer to the resource for object operations; the object ARN itself includes the prefix. I'd also recommend restricting the S3 actions. Here is a recommended policy, based on the one from an article on an S3 personal file store. Feel free to remove the ListBucket statement if it doesn't make sense for your app.
{"Statement":
[
{"Effect":"Allow",
"Action":["s3:PutObject","s3:GetObject","s3:DeleteObject"],
"Resource":"arn:aws:s3:::my-bucket-test/66-*",
},
{"Effect":"Allow",
"Action":"s3:ListBucket",
"Resource":"arn:aws:s3:::my-bucket-test",
"Condition":{
"StringLike":{
"s3:prefix":"66-*"
}
}
},
{"Effect":"Deny","Action":"sdb:*","Resource":["arn:aws:sdb:us-east-1:MYACCOUNTIDHERE:domain/__USERS_DOMAIN__","arn:aws:sdb:us-east-1:MYACCOUNTIDHERE:domain/TokenVendingMachine_DEVICES"]},
{"Effect":"Deny","Action":"iam:*","Resource":"*"}
]
}
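The question is about an iOS app, so the following is only to illustrate the shape of the call that the Allow statement above permits: a minimal Python/boto3 sketch of the upload, with placeholder temporary credentials from the token vending machine and an assumed local file name.

import boto3

s3 = boto3.client(
    "s3",
    aws_access_key_id="<access-key-from-tvm>",
    aws_secret_access_key="<secret-key-from-tvm>",
    aws_session_token="<session-token-from-tvm>",
)

# The key starts with "66-", so it matches arn:aws:s3:::my-bucket-test/66-*.
with open("local-video.mp4", "rb") as body:  # assumed local file
    s3.put_object(
        Bucket="my-bucket-test",
        Key="66-3315F11E-84FA-417F-9C32-AC4BE364AD99.natural.mp4",
        Body=body,
    )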