CodeDeploy with S3 always fails after 5 minutes - amazon-web-services

I've spent the better part of the day trying to set up CodeDeploy, CodePipeline, S3 and EC2.
CodePipeline will successfully:
Pick up detected changes in GitHub
Push the ZIP file up to S3
Trigger CodeDeploy to begin deployment
Also
EC2 has list and read access to S3
S3 allows all actions from EC2
I've followed this outdated guide mostly: https://cloudacademy.com/blog/how-to-deploy-application-code-from-s3-using-aws-codedeploy/
appspec.yml
version: 0.0
os: linux
files:
  - source: /
    destination: /var/www
hooks:
  AfterInstall:
    - location: hooks/after-install.sh
      runas: root
I'm rather new to AWS and can't for the life of me find where the logs are that would tell me what's going on, nor do I get any error message that points me anywhere. I've been shooting blind, double-checking everything all day and trying again, and this is taunting me now:
Any help even if it's pointing me towards where I can actually find the error message would be tremendously appreciated, thanks for your time

This generally occurs for one of the following 3 reasons:
The CodeDeploy agent needs to be installed and running on the target instance.
The instance has no network access to the CodeDeploy and S3 services. Ensure you are either:
Running an instance in a public subnet with an internet gateway
Running an instance in a private subnet with a NAT gateway/NAT instance
The IAM permissions for the instance's IAM role are not sufficient. For sufficient permissions, attach the AmazonEC2RoleforAWSCodeDeploy managed policy to the instance role (AWSCodeDeployRole is meant for the CodeDeploy service role, not the instance).
As you have said your IAM role permissions are fine, you are left with one of the other 2 scenarios.
Once these are working you can generally see the logs within the /var/log/aws/codedeploy-agent location.
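If it helps, here is a minimal Python sketch for pulling the most recent problem lines out of the agent log on the instance. The directory comes from the answer above; the file name codedeploy-agent.log and the ERROR/WARN markers are assumptions about the usual agent setup, so adjust them if your install differs.

```python
from pathlib import Path

# Directory is the one named above; the file name is an assumption.
DEFAULT_LOG = "/var/log/aws/codedeploy-agent/codedeploy-agent.log"


def recent_errors(log_path=DEFAULT_LOG, limit=20):
    """Return the last `limit` log lines containing ERROR or WARN.

    Returns None when the log file does not exist, which usually
    means the agent is not installed or has never run.
    """
    path = Path(log_path)
    if not path.exists():
        return None
    lines = path.read_text(errors="replace").splitlines()
    hits = [line for line in lines if "ERROR" in line or "WARN" in line]
    return hits[-limit:] if limit > 0 else []
```

Calling recent_errors() on the instance and getting None back points at the first scenario (agent missing or not running) rather than a permissions problem.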

Related

AWS CloudWatch Agent: NoCredentialsError: Unable to locate credentials

I am receiving the following errors in the EC2 CloudWatch Agent logs, /var/log/awslogs.log:
I verified the EC2 has a role:
And the role has the correct policies:
I have set the correct region in /etc/awslogs/awscli.conf:
I noticed that running aws configure list in the EC2 gives this:
Is this incorrect? Should it list the profile (EC2_Cloudwatch_Profile) there?
I was using terraform and reprovisioning by doing:
terraform destroy && terraform apply
It looks like, because IAM is a global service, it is "eventually consistent" rather than "immediately consistent". When the instance profile was destroyed, the terraform apply began too quickly. Despite the "destroy" being complete, the ARN of the previous instance profile was still there and was re-used. However, the ID changed to a new ID.
Replacing the EC2 would bring it up to speed with the correct ID. However, my solution is to just wait longer between terraform destroy and apply.
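A rough sketch of "wait longer" as a script, in Python. The 60-second pause is an arbitrary assumption (the answer does not say how long is long enough), and the -auto-approve flag is only there so the script does not stop for confirmation.

```python
import subprocess
import time


def reprovision(wait_seconds=60, run=subprocess.run, sleep=time.sleep):
    """Run `terraform destroy`, pause, then run `terraform apply`.

    The pause gives IAM time to finish propagating the deletion of the
    instance profile. `run` and `sleep` are injectable for testing.
    """
    run(["terraform", "destroy", "-auto-approve"], check=True)
    sleep(wait_seconds)  # assumed delay; tune for your account
    run(["terraform", "apply", "-auto-approve"], check=True)
```

Because check=True is passed, a failed destroy raises and the apply never starts.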

CodeDeploy events not running

This is what my CodeDeploy status looks like:
This is the first time I'm trying to set this up. I created an EC2 instance and added the following policies to the attached IAM role:
and edited the trust relationships like this:
I also installed the CodeDeploy agent on the EC2 instance.
This is my appspec.yml:
version: 0.0
os: linux
files:
  - source: .
    destination: /home/ubuntu
hooks:
  ApplicationStop:
    - location: scripts/stop_server.sh
      timeout: 5
      runas: root
stop_server.sh is just an empty file.
Any ideas?
The most likely problem you're facing is that the agent either isn't installed or the instance doesn't have sufficient permissions. When no events are started on the instance for the deployment, it means that CodeDeploy couldn't talk to the host for some reason.
Here are the steps I would take:
Confirm that you installed the CodeDeploy agent
Confirm that you've created the IAM service role
Confirm that you have the IAM Instance Profile and that it's associated with the instance
Check that you can reach the CodeDeploy commands endpoint in your region from the box, e.g. ping codedeploy.us-east-1.amazonaws.com. If you can't, your networking setup might be too restrictive.
Look at the logs on the host to see what's going on
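For the endpoint check in the list above, ping can fail even when HTTPS traffic gets through, because ICMP is often blocked. A small Python sketch that tries a plain TCP connection on port 443 instead; the host name in the usage note is the one from the answer, so swap in your own region.

```python
import socket


def can_connect(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False
```

On the instance, can_connect("codedeploy.us-east-1.amazonaws.com") returning False would suggest a routing, security group or NACL problem rather than an IAM one.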

AWS CodeDeploy stuck in AllowTraffic step

I'm using AWS CodeDeploy to deploy my project (triggered by CodePipeline) to an autoscaling group (EC2 instances behind an ALB). This is my appSpec file:
version: 0.0
os: linux
files:
  - source: /
    destination: /var/www/html/test-deploy
    overwrite: true
permissions:
  - object: /var/www/html/test-deploy/codedeploy
    pattern: "*.sh"
    owner: root
    group: root
    mode: 755
    type:
      - file
hooks:
  BeforeInstall:
    - location: codedeploy/before_install.sh
      timeout: 180
  AfterInstall:
    - location: codedeploy/after_install.sh
      runas: centos
      timeout: 180
The files get deployed successfully to the EC2 instance, but for some reason nothing happens after "BeforeAllowTraffic": I waited 15 minutes and the next step was still "pending".
The two .sh files do nothing fancy (and codedeploy passed those steps so I don't think that's the problem).
Can anyone point me to a direction? I don't get any error messages, so I don't even know how to debug it.
Thanks
I had the same issue. After investigating, I found that my target group was "unhealthy". I added the health check path/file, i.e. "/robots.txt", rebooted the EC2 server, and that fixed the problem.
We also had an unhealthy target instance. The problem was hosting two applications on the same instance, where one (application A) was responsible for health checks and talking to the load balancer, and the other one (application B without any open network ports) was being deployed. One instance was always getting stuck in AllowTraffic during app B deployments. I found the root cause when I looked at the target group for app A and saw that same instance in the "unhealthy" status, so of course deploying app B wasn't going to fix that. After I re-deployed app A and restored the instance back to health, app B deployments were able to progress.
Check your logs on your target group instances. It may be caused by one of the following:
the application startup command did not finish successfully
the application is not running due to an error
your target group's health check is NOT configured with the endpoint you expect
your application is NOT responding at the endpoint you expect
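To test the last two points from the instance itself, here is a small Python sketch that requests the health check path and reports whether the status code would count as healthy. It assumes the target group's success matcher is 200 (adjust ok_codes if yours is configured differently), it ignores any query string, and like a load balancer health check it does not follow redirects. The URL in the usage note is hypothetical.

```python
import http.client
from urllib.parse import urlsplit


def probe(url, timeout=5.0, ok_codes=(200,)):
    """Send one GET and return (status, healthy).

    status is None when no HTTP response was received at all.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=timeout)
    try:
        conn.request("GET", parts.path or "/")
        status = conn.getresponse().status
    except (OSError, http.client.HTTPException):
        return None, False
    finally:
        conn.close()
    return status, status in ok_codes
```

For example, probe("http://localhost/robots.txt") run on the instance shows what the load balancer would see on that path, assuming the app listens on port 80.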

AWS CodeDeploy Agent Access denied from EC2 instance to S3

I have set up the CodeDeploy Agent, however when I run it, I get the error:
Error: HEALTH_CONSTRAINTS
Digging further, this is the entry in the CodeDeploy log from the EC2 instance:
InstanceAgent::Plugins::CodeDeployPlugin::CommandPoller: Cannot reach InstanceService: Aws::S3::Errors::AccessDenied - Access Denied
I have done a simple wget from the bucket and it results:
Connecting to s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)|xxxxxxxxx|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
By contrast, if I use the AWS CLI I can reach the S3 bucket correctly.
The EC2 instance is in a VPC, it has a role associated with full permission on S3, and the inbound and outbound firewall settings seem correct. So it is apparently something related to permissions when accessing the bucket over HTTPS.
The questions:
Under which credentials does the CodeDeploy Agent run?
What permissions or roles have to be set on the S3 bucket?
The EC2 instance's credentials (the instance role) will be used when pulling from S3.
To be clear, the Service Role that CodeDeploy needs does not need S3 permissions. The ServiceRole CodeDeploy needs allows CodeDeploy to call AutoScaling & EC2 APIs to describe the instances so CodeDeploy knows how to deploy to them.
That being said, for your AccessDenied issue for S3, there are 2 things you need to check:
The role attached to the EC2 instance(s) has s3:Get* and s3:List* (or more specific) permissions.
The S3 bucket you want to deploy from has a policy attached that allows the EC2 instance role to get the object.
Documentation for permissions: http://docs.aws.amazon.com/codedeploy/latest/userguide/instances-ec2-configure.html#instances-ec2-configure-2-verify-instance-profile-permissions
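As an illustration of the first check, this Python sketch builds an IAM policy document granting s3:Get* and s3:List* on a single bucket, which could then be attached to the instance role. The bucket name in the usage note is hypothetical, and scoping to one bucket is my choice for the example, not something the answer prescribes.

```python
import json


def instance_s3_read_policy(bucket):
    """Build a read-only S3 policy document for one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:Get*", "s3:List*"],
                "Resource": [
                    f"arn:aws:s3:::{bucket}",    # the bucket (for List*)
                    f"arn:aws:s3:::{bucket}/*",  # its objects (for Get*)
                ],
            }
        ],
    }


def to_json(policy):
    """Serialize a policy document for pasting into the IAM console."""
    return json.dumps(policy, indent=2)
```

Something like print(to_json(instance_s3_read_policy("my-deploy-bucket"))) gives JSON you can paste as an inline policy on the instance role.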
CodeDeploy uses "Service Roles" to access AWS resources. In the AWS console for CodeDeploy, look for "Service role". Assign the IAM role that you created for CodeDeploy in your application settings.
If you have not created an IAM role for CodeDeploy, do so and then assign it to your CodeDeploy application.

Elastic BeanStalk MultiContainer docker fails

I want to deploy a multi-container application in Elastic Beanstalk. I get the following error.
Error 1: The EC2 instances failed to communicate with AWS Elastic
Beanstalk, either because of configuration problems with the VPC or a
failed EC2 instance. Check your VPC configuration and try launching
the environment again.
I have set up the VPC with just the public subnet and the security group that allows all traffic both inbound and outbound. I know this is not encouraged for production level deployment, but I have reduced the complexity to find the cause of the error.
So, the load balancer and the EC2 instance are inside the same public subnet that is attached with the internet gateway. They both share the same security group allowing all the traffic.
Before the above error, I also get another error stating
Error 2: No ecs task definition (or empty definition file) found in environment
That said, I have bundled my Dockerrun.aws.json file with the .ebextensions folder inside the source bundle that Beanstalk uses for deployment.
After all these errors, drilling down to two questions:
I cannot understand why No ecs task error appears, when I have packaged my dockerrun.aws.json file containing containerDefinitions?
Since there is no ecs task running, there is nothing running in the instance. Is this why beanstalk and ELB cannot communicate to the instance? (Assuming my public subnet and all traffic security group is not a problem)
The problem was the VPC. Even though I had a simple VPC with just a public subnet, Beanstalk could not talk to the instance and so could not deploy the ECS task definition and Docker containers to the instance.
I created two subnets, public and private, with a NAT instance in the public subnet acting as the router for the instances in the private subnet. This setup worked for me, and I could deploy the ECS task definition successfully to the EC2 instance in the private subnet.
I found this question because I got the same error. Here are the steps that worked for me to actually deploy a multi-container app on Beanstalk:
To get past this particular error, I used the eb CLI tools. For some reason, using eb deploy instead of zipping and uploading myself fixed this. It didn't actually work, but it gave me a new error.
So, I changed my Dockerrun.aws.json, a file format that needs WAY more documentation, until I stopped getting errors about that.
Then, I got an even better error!
ERROR: [Instance: i-0*********0bb37cf] Command failed on instance.
Return code: 1 Output: (TRUNCATED)..._api_call
raise ClientError(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (AccessDenied) when
calling the GetObject operation: Access Denied
Failed to download authentication credentials [config file name] from [bucket name].
Hook /opt/elasticbeanstalk/hooks/appdeploy/enact/02update-
credentials.sh failed. For more detail, check /var/log/eb-activity.log
using console or EB CLI.
Per this part of the docs the way to solve this is to
Open the Roles page in the IAM console.
Choose aws-elasticbeanstalk-ec2-role.
On the Permissions tab, under Managed Policies, choose Attach Policy.
Select the managed policy for the additional services that your application uses. For example, AmazonS3FullAccess or AmazonDynamoDBFullAccess. (For our problem, the S3 one)
Choose Attach Policies.
This part got really exciting, because I got yet another error: Authentication credentials are not in JSON format as expected. Please generate the credentials using 'docker login'. (Keep in mind, I tried to follow the instructions on how to do this to the letter, but, oh well.) Turns out this one was on me: I had malformed JSON in my DockerHub auth file stored on S3. I renamed the file to dockercfg.json to get syntax checking, and it seems Beanstalk/ECS is okay with having the .json as part of the name, because this time... there was a different error: CannotPullContainerError: Error: image [DockerHub organization]/[repo name]:latest not found. Hmm, maybe there was a typo? Let's check:
$ docker run -it [DockerHub organization]/[repo name]:latest
Unable to find image '[DockerHub organization]/[repo name]:latest' locally
latest: Pulling from [DockerHub organization]/[repo name]
Ok, the repo is there. So... my auth is bad? Yup, turns out I followed an example in the DockerHub auth docs that was of what you shouldn't do. Your dockercfg.json should look like
{
  "https://index.docker.io/v1/": {
    "auth": "ZWpMQ=Vyd5zOmFsluMTkycN0ZGYmbn=WV2FtaGF2",
    "email": "your#email.com"
  }
}
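Since malformed JSON in this file cost me a deploy cycle, here is a small Python sketch that builds a file of the shape above and sanity-checks one before uploading it to S3. It assumes the "auth" value is the base64 encoding of username:password, which is how Docker stores it; the function names and the credentials in the usage note are made up for illustration.

```python
import base64
import json


def build_dockercfg(username, password, email,
                    registry="https://index.docker.io/v1/"):
    """Build the auth structure shown above for one registry."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return {registry: {"auth": token, "email": email}}


def check_dockercfg(text):
    """Return a list of problems; an empty list means the shape looks right."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err}"]
    if not isinstance(data, dict):
        return ["top level must be a JSON object"]
    problems = []
    for registry, entry in data.items():
        if not isinstance(entry, dict) or not isinstance(entry.get("auth"), str):
            problems.append(f"{registry}: missing string field 'auth'")
            continue
        try:
            decoded = base64.b64decode(entry["auth"], validate=True).decode()
        except ValueError:  # bad base64 or undecodable bytes
            problems.append(f"{registry}: 'auth' is not valid base64 text")
            continue
        if ":" not in decoded:
            problems.append(f"{registry}: 'auth' does not decode to user:password")
    return problems
```

For example, check_dockercfg(json.dumps(build_dockercfg("user", "secret", "me@example.com"))) returns an empty list, while a file with a stray comma comes back with a "not valid JSON" message.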
There were a few more errors (volume sourcePath has to be an absolute path! That's what the message invalid characters for a local volume name, only "[a-zA-Z0-9][a-zA-Z0-9_.-]" are allowed means), but it eventually deployed. Sorry for the novel; hoping it helps someone.