AWS Elastic Beanstalk: cannot deploy to worker environment via EB CLI

I've created a worker environment for my EB application in order to take advantage of its "periodic tasks" capabilities using cron.yaml (located in the root of my application). It's a simple Sinatra app (for now) that I would like to use to issue requests to my corresponding web server environment.
However, I'm having trouble deploying via the EB CLI. Below is what happens when I run eb deploy.
╰─➤ eb deploy
Creating application version archive "4882".
Uploading myapp/4882.zip to S3. This may take a while.
Upload Complete.
INFO: Environment update is starting.
ERROR: Service:AmazonCloudFormation, Message:Stack named 'awseb-e-1a2b3c4d5e-stack'
aborted operation. Current state: 'UPDATE_ROLLBACK_IN_PROGRESS'
Reason: The following resource(s) failed to create: [AWSEBWorkerCronLeaderRegistry].
I've looked around the CloudFormation dashboard to check for possible errors. After reading a bit about what I could find regarding AWSEBWorkerCronLeaderRegistry, I found that it's most likely a DynamoDB table that gets created/updated. However, when I look in the DynamoDB dashboard, there are no tables listed.
As always, any help, feedback, or guidance is appreciated.

If you are reluctant to add full DynamoDB access (like I was), Beanstalk now provides a managed policy for worker environment permissions (AWSElasticBeanstalkWorkerTier). You can try attaching that managed policy to your instance profile role instead.
See http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/iam-instanceprofile.html
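If you prefer the CLI, a minimal sketch for attaching that managed policy, assuming the default instance profile role name aws-elasticbeanstalk-ec2-role (substitute your own role name if it differs):

# Attach the Beanstalk worker-tier managed policy to the instance profile role
aws iam attach-role-policy \
  --role-name aws-elasticbeanstalk-ec2-role \
  --policy-arn arn:aws:iam::aws:policy/AWSElasticBeanstalkWorkerTier

# Verify it is attached
aws iam list-attached-role-policies --role-name aws-elasticbeanstalk-ec2-role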

We had the same issue and fixed it by attaching AmazonDynamoDBFullAccess to the Elastic Beanstalk role (which was named aws-elasticbeanstalk-ec2-role in our case).

I was using Codepipeline to deploy my worker and was getting the same error. Eventually I tried giving AWS-CodePipeline-Service the AmazonDynamoDBFullAccess policy and that seemed to resolve the issue.

As Anthony suggested, when triggering the deploy from another service such as CodePipeline, its service role needs the dynamodb:CreateTable permission to create the Leader Registry table (more info below) in DynamoDB.
Adding full-access permissions is a bad practice and should be avoided. Also, the managed policy AWSElasticBeanstalkWorkerTier does not have the appropriate permissions, since it is meant for the workers themselves to access DynamoDB and check whether they are the current leader.
1. Find the Role that is trying to create the table:
Go to CloudTrail > Event History
Filter Event Name: CreateTable
Make sure the error code is AccessDenied
Locate the role name (e.g. AWSCodePipelineServiceRole-us-east-1-dev).
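If you prefer the CLI over the console for this lookup, a rough equivalent (only the event-name filter is required; adjust --max-results to taste):

# Find recent CreateTable calls and inspect who made them and whether they were denied
aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=EventName,AttributeValue=CreateTable \
  --max-results 10 \
  --query "Events[].CloudTrailEvent"
# Look for entries with "errorCode": "AccessDenied" and note the role in "userIdentity".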
2. Add the permissions:
Go to IAM > Roles
Find the role in the list
Attach a policy with:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "CreateCronLeaderTable",
      "Effect": "Allow",
      "Action": "dynamodb:CreateTable",
      "Resource": "arn:aws:dynamodb:*:*:table/*-stack-AWSEBWorkerCronLeaderRegistry*"
    }
  ]
}
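For reference, the same policy can be attached from the CLI as an inline policy; the role name comes from the CloudTrail step above and the policy file name is just an example:

# Save the JSON above as create-cron-leader-table.json, then attach it inline
aws iam put-role-policy \
  --role-name AWSCodePipelineServiceRole-us-east-1-dev \
  --policy-name CreateCronLeaderTable \
  --policy-document file://create-cron-leader-table.json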
3. Check results:
Redeploy by triggering the pipeline
Check Elastic Beanstalk for errors
Optionally, go to CloudTrail and make sure the request succeeded this time.
You may use this technique any time you are not sure which permission should be attached to which role.
About the Cron Leader Table
From the Periodic Tasks Documentation:
Elastic Beanstalk uses leader election to determine which instance in your worker environment queues the periodic task. Each instance attempts to become leader by writing to an Amazon DynamoDB table. The first instance that succeeds is the leader, and must continue to write to the table to maintain leader status. If the leader goes out of service, another instance quickly takes its place.
For those wondering, this DynamoDB table uses 10 RCU and 5 WCU, which is covered by the always-free tier.
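Once a deployment succeeds, a quick way to confirm the table exists and check its provisioned capacity (the table name pattern is the default one from the stack; substitute the full name returned by list-tables):

# Confirm the leader registry table was created
aws dynamodb list-tables --query "TableNames" | grep AWSEBWorkerCronLeaderRegistry

# Inspect its provisioned throughput
aws dynamodb describe-table \
  --table-name <full-table-name-from-above> \
  --query "Table.ProvisionedThroughput"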

Related

AWS CloudWatch Agent: NoCredentialsError: Unable to locate credentials

I am receiving the following errors in the EC2 CloudWatch Agent logs, /var/logs/awslogs.log:
I verified the EC2 has a role:
And the role has the correct policies:
I have set the correct region in /etc/awslogs/awscli.conf:
I noticed that running aws configure list in the EC2 gives this:
Is this incorrect? Should it list the profile (EC2_Cloudwatch_Profile) there?
I was using terraform and reprovisioning by doing:
terraform destroy && terraform apply
It looks like, because IAM is a global service, it is "eventually consistent" rather than "immediately consistent": when the instance profile was destroyed, the terraform apply began too quickly. Although the destroy had completed, the ARN for the previous instance profile was still around and was re-used, even though the underlying ID had changed to a new one.
Replacing the EC2 instance would bring it up to speed with the correct ID. However, my solution is simply to wait longer between terraform destroy and terraform apply.
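A minimal sketch of that workaround, assuming a fixed delay is enough for IAM to converge (the 60 seconds is an arbitrary guess, not a documented bound):

# Destroy, give IAM time to propagate the instance profile deletion, then re-apply
terraform destroy -auto-approve
sleep 60   # assumed delay; increase it if the stale instance-profile ARN still gets reused
terraform apply -auto-approve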

AWS on Terraform: Error deleting resource: timeout while waiting for state to become 'destroyed'

I'm using Terraform (v0.12.28) to launch my AWS environment (aws provider v2.70.0).
When I try to remove all resources with terraform destroy I'm facing the error below:
error deleting subnet (subnet-XXX): timeout while waiting for state to become 'destroyed' (last state: 'pending', timeout: 20m0s)
I can add my Terraform code, but I think there is nothing special in my resource stack, which basically includes:
VPC and Subnets.
Internet and NAT GTWs.
Application Load Balancers.
Route tables.
Auto-generated NACL and Elastic Network Interfaces (ENIs).
In my case, the problem seems to be related to the ENIs which are attached to the ALBs - as can be seen from the AWS console:
While searching for solutions I noticed that this is a common problem that can occur with different resources and types of dependencies.
In this question I'll focus on problems related to VPC components (subnets, ENIs, etc.) and resources that depend on them (load balancers, EC2 instances, Lambda functions, etc.) and that fail to be deleted, probably because a detach phase is required prior to deletion.
Any help will be highly appreciated.
(*) The Terraform user for this environment (DEV) has full Admin privileges:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "*",
      "Resource": "*"
    }
  ]
}
So this shouldn't be related to policies.
Examples for related issues:
Update: Issue affecting HashiCorp Terraform resource deletions after the VPC Improvements to AWS Lambda (the solution doesn't work; I have an updated version of the AWS provider).
AWS VPC - cannot detach "in use" AWS Lambda VPC ENI
Lambda Associated EC2 Subnet and Security Group Deletion Issues and Improvements
AWS: deletion of subnet times out because of scaling group
Error waiting for route table (rtb-xxxxxx) to become destroyed: timeout while waiting for state to become
Error waiting for internet gateway to detach / Cluster has node groups attached
I ran into this issue while trying to destroy an EKS cluster after I had already deployed services onto the cluster, specifically a load balancer. To fix this I manually deleted the load balancer and the security group associated with the load balancer.
Terraform is not aware of the resources provisioned by k8s and will not clean up dependent resources.
If you're unsure what resources are preventing Terraform from destroying infrastructure, you can try any of the following (a CLI sketch for locating blocking ENIs follows this list):
Use terraform apply to get back into a good state and then use kubectl to clean up resources before running terraform destroy again.
This knowledge base article includes a script you can run to identify dependencies: https://aws.amazon.com/premiumsupport/knowledge-center/troubleshoot-dependency-error-delete-vpc/
Review CloudTrail logs to see what resources were created. If this was an issue with EKS you can filter by username: AmazonEKS.
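When the blocker is a leftover ENI (as with the load balancer above), a hedged way to locate it from the CLI, with the subnet ID as a placeholder:

# List network interfaces still sitting in the subnet Terraform cannot delete
aws ec2 describe-network-interfaces \
  --filters Name=subnet-id,Values=subnet-XXX \
  --query "NetworkInterfaces[].{Id:NetworkInterfaceId,Desc:Description,Status:Status}"

# Once identified (and detached, if required), delete them so terraform destroy can proceed
# aws ec2 delete-network-interface --network-interface-id eni-xxxxxxxx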
Another variation of this issue is a DependencyViolation error. Ex:
Error deleting VPC: DependencyViolation: The vpc 'vpc-xxxxx' has dependencies and cannot be deleted. status code: 400
I ran into this issue just now…
One (hacky) solution is to attempt to delete the subnet through the AWS Console. AWS will then tell you what is preventing the subnet from being deleted—for me, it was two network interfaces that needed to be detached and then deleted before Terraform had the power to delete my subnets.
Like Mathew Tinsley says, there are sometimes associated resources created implicitly by AWS that Terraform can't destroy by itself.
I had a similar issue while destroying a Step Function, and the problem was that it had active executions (status: Running). I stopped them, and the step function was then deleted successfully.
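A minimal CLI sketch for that cleanup, with the state machine ARN left as a placeholder:

# List running executions for the state machine Terraform is trying to destroy
aws stepfunctions list-executions \
  --state-machine-arn arn:aws:states:REGION:ACCOUNT_ID:stateMachine:NAME \
  --status-filter RUNNING \
  --query "executions[].executionArn"

# Stop each one, then re-run terraform destroy
aws stepfunctions stop-execution --execution-arn <execution-arn-from-above>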

AWS Glue Job getting Access Denied when writing to S3

I have a Glue ETL job, created by CloudFormation. This job extracts data from RDS Aurora and writes to S3.
When I run this job, I get the error below.
The job has an IAM service role.
This service role:
allows the Glue and RDS services to assume it,
has arn:aws:iam::aws:policy/AmazonS3FullAccess and arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole attached, and
allows the full range of rds:*, kms:*, and s3:* actions on the corresponding RDS, KMS, and S3 resources.
I get the same error whether the S3 bucket is encrypted with AES256 or aws:kms.
I get the same error whether the job has a Security Configuration or not.
I have a job that does exactly the same thing, which I created manually, and it runs successfully without a Security Configuration.
What am I missing? Here's the full error log
"/mnt/yarn/usercache/root/appcache/application_1...5_0002/container_15...45_0002_01_000001/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o145.pyWriteDynamicFrame.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 2.0 failed 4 times, most recent failure: Lost task 3.3 in stage 2.0 (TID 30, ip-10-....us-west-2.compute.internal, executor 1): com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: F...49), S3 Extended Request ID: eo...wXZw=
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1588
Unfortunately the error doesn't tell us much except that it's failing during the write of your DynamicFrame.
There are only a handful of possible reasons for the 403; you can check whether you have covered them all:
Bucket Policy rules on the destination bucket.
The IAM Role needs permissions (although you mention having S3*)
If this is cross-account, then there is more to check with regard to things like allow policies on the bucket and user. (In general, a trust for the canonical account ID is simplest.)
I don't know how complicated your policy documents might be for the Role and Bucket, but remember that an explicit Deny statement takes precedence over an allow.
If the issue is KMS related, I would check to ensure your Subnet you select for the Glue Connection has a route to reach the KMS endpoints (You can add an Endpoint for KMS in VPC)
Make sure the issue is not with the Temporary Directory that is also configured for your job, or perhaps with write operations that are not your final one.
Check that your account is the "object owner" of the location you are writing to (normally an issue when read/writing data between accounts)
If none of the above works, you can shed some more light on your setup, perhaps the code for the write operation.
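A couple of those checks can be done quickly from the CLI; a rough sketch, with the bucket name as a placeholder:

# Inspect the destination bucket policy for explicit Deny statements
aws s3api get-bucket-policy --bucket my-target-bucket --query Policy --output text

# Check who owns the bucket (relevant for cross-account writes and the object-owner point)
aws s3api get-bucket-acl --bucket my-target-bucket --query "Owner"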
In addition to Lydon's answer, error 403 is also received if your Data Source location is the same as the Data Target, as defined when creating a job in Glue. Change either of them if they are identical and the issue will be resolved.
You should add a Security Configuration (found under the Security tab in the Glue console), providing an S3 encryption mode of either SSE-KMS or SSE-S3.
Security Configuration
Now select the above security configuration while creating your job, under Advanced Properties.
Also, duly verify your IAM role and S3 bucket policy.
It will work.
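If you prefer to script it, a hedged sketch for creating such a security configuration with SSE-S3 (the configuration name is made up; pick your own):

# Create a security configuration that encrypts the job's S3 output with SSE-S3
aws glue create-security-configuration \
  --name glue-sse-s3-config \
  --encryption-configuration '{"S3Encryption":[{"S3EncryptionMode":"SSE-S3"}]}'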
How are you providing the iam:PassRole permission for the Glue role?
{
  "Sid": "AllowAccessToRoleOnly",
  "Effect": "Allow",
  "Action": [
    "iam:PassRole",
    "iam:GetRole",
    "iam:GetRolePolicy",
    "iam:ListRolePolicies",
    "iam:ListAttachedRolePolicies"
  ],
  "Resource": "arn:aws:iam::*:role/<role>"
}
Usually we name roles using <project>-<role>-<env>, e.g. xyz-glue-dev, where the project name is xyz and the env is dev. In that case we use "Resource": "arn:aws:iam::*:role/xyz-*-dev".
For me it was two things.
The access policy for the bucket should be specified correctly, i.e. bucket/*; I was missing the * part.
A VPC endpoint must be created for Glue to access S3: https://docs.aws.amazon.com/glue/latest/dg/vpc-endpoints-s3.html
After these two settings, my Glue job ran successfully. Hope this helps.
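A hedged sketch of the second point, creating the S3 gateway endpoint (the VPC ID, region, and route table ID are placeholders):

# Gateway endpoint so Glue jobs running inside the VPC can reach S3
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-xxxxxxxx \
  --service-name com.amazonaws.us-east-1.s3 \
  --route-table-ids rtb-xxxxxxxx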
Make sure you have given the right policies.
I was facing the same issue and thought I had the role configured correctly.
But after I erased the role and followed this step, it worked ;]

How to use AWS ECS Task Role in Node AWS SDK code

Code that uses the AWS Node SDK doesn't seem to be able to gain the role permissions of the ECS task.
If I run the code on an EC2 ECS instance, the code seems to inherit the role on the instance, not of the task.
If I run the code on Fargate, the code doesn't get any permission.
By contrast, any bash scripts that run within the instance seem to have the proper permissions.
Indeed, the documentation doesn't mention this as an option for the node sdk, just:
Loaded from IAM roles for Amazon EC2 (if running on EC2),
Loaded from the shared credentials file (~/.aws/credentials),
Loaded from environment variables,
Loaded from a JSON file on disk,
Hardcoded in your application
Is there any way to have your node code gain the permissions of the ECS task?
This seems to be the logical way to pass permissions to your code. It works beautifully with code running on an instance.
The only workaround I can think of is to create one IAM user per ECS service and pass the API key/secret as environment variables in the task definition. However, that doesn't seem very secure, since it would be visible in plain text to anyone with access to the task definition.
Your question is missing a lot of details on how you set up your ECS cluster, and I am not sure whether the question is about ECS or Fargate specifically.
Make sure that you are using the latest version of the SDK. Javascript supports ECS and Fargate task credentials.
Often there is confusion about credentials on ECS. There is the IAM role that is assigned to the Cluster EC2 instances and the IAM role that is assigned to ECS tasks.
The most common problem is the "Trust Relationship" has not been setup on the ECS Task Role. Select your IAM role and then the "Trust Relationships" tab and make sure that it looks like this:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "ecs-tasks.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
In addition to the standard Amazon ECS permissions required to run tasks and services, IAM users also require iam:PassRole permissions to use IAM roles for tasks.
Next, verify that you are using the IAM role in the task definition. Specify the correct IAM role ARN in the Task Role field. Note that this is different from the Task Execution Role (which allows containers to pull images and publish logs).
Next make sure that your ECS Instances are using the latest version of the ECS Agent. The agent version is listed on the "ECS Instances" tab under the right hand side column "Agent version". The current version is 1.20.3.
Are you using an ECS optimized AMI? If not, add --net=host to your docker run command that starts the agent. Review this link for more information.
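A quick, hedged way to confirm from inside the container that the SDK will actually see task-role credentials is to hit the ECS credentials endpoint the SDK itself uses (the environment variable is injected by the ECS agent when a Task Role is set):

# Run inside the container: prints temporary credentials for the task role if everything is wired up
curl -s "http://169.254.170.2$AWS_CONTAINER_CREDENTIALS_RELATIVE_URI"

# If this variable is empty, the task role is not being exposed to the container at all
echo "$AWS_CONTAINER_CREDENTIALS_RELATIVE_URI"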
I figured it out. This was a weird one.
A colleague thought it would be "safer" if we called Object.freeze on process.env. This was somehow interfering with the SDK's ability to access the credentials.
Removed that "improvement" and all is fine again. I think the lesson is "do not mess with process.env".

GitHub AWS CodeDeploy deployment shows "AWS CodeDeploy doesn't support the push event"

Oops, we weren’t able to send the test payload: AWS Code Deploy doesn't support the push event.
The above error is shown to me when I try to test my hook service "Code Deploy For AWS". Also, when I commit my code it should automatically deploy the new code, but it fails.
Can you help me out with the above?
Several people have had this same issue, and there are a few things to double check and a few tricky parts in that AWS Blog post that aren't well explained.
Double check your IAM User that you created, and make sure it has the correct IAM policy. You can use the AWS-provided "AWSCodeDeployDeployerAccess" policy if you don't want to write your own
Check out this post in the AWS Developer Forum. The TL;DR is that the deployment group name must be all lower case. For some reason GitHub down-cases the deployment group name in the API call, which will cause a name mismatch with your deployment group in AWS (a CLI sketch for checking the exact names follows this list).
Make sure that you set your "environments" property to the name of your deployment group when you set up your "GitHub Auto-Deployment" service. The blog post doesn't say that they need to match, but if you look at the screenshots, the author does in fact use the same string for both the "environments" property in the Auto-Deployment service and the Deployment Group property in the AWS CodeDeploy service
If you're still having a hard time setting up the GitHub hook or CodeDeploy in general, I encourage you to take my AWS CodeDeploy course
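A hedged way to double-check the exact (case-sensitive) names CodeDeploy has on record, with the application name as a placeholder:

# The names returned here must match GitHub's "environments" value exactly, including case
aws deploy list-deployment-groups --application-name MyCodeDeployApp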
If possible, can you paste the permission policy for the AWS user that you use to call CodeDeploy from GitHub? Most commonly, a problem with the permission settings on that user raises this error.
Also, are you setting the aws_region configuration to the region where your CodeDeploy application exists? Otherwise GitHub uses 'us-east-1' by default. Please see https://github.com/github/github-services/pull/1014
Thanks,
Surya.
I was getting the same issue while testing the service hooks. Then I noticed that my deployment group name in AWS was different from the 'environments' value in GitHub; I changed them to have the same value in both places, and now it works.
Also make sure the IAM user you are using has CodeDeploy access permissions. In my case it is the policy below, or you can use the existing AWS managed policy for this, i.e. 'AWSCodeDeployDeployerAccess'.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "codedeploy:*",
      "Resource": "*"
    }
  ]
}
Though it still shows this error when I test the webhook service in GitHub, it really works when I push my code; some people mentioned the same in this post as well. So even though your webhook test shows an error, you can go ahead and test with a real git push.