aws: Networking issue - amazon-web-services

While I was running an ECS Fargate task, AWS automatically stopped it with this error:
There was an error while describing network interfaces.
The networkInterface ID 'eni-0c21gdfgerg' does not exist
My task was running for more than a day but now it suddenly stopped.
I looked up that ENI and it no longer exists.
How can I troubleshoot it?

It seems an ECS Fargate task goes through several lifecycle stages, and in the DEPROVISIONING stage it deletes all the networking-related resources, including the network interface.
I was also viewing this task while it was in the STOPPED stage, which is why I was getting the error.

I faced exactly the same issue: "networkInterface ID 'eni-xx' does not exist". My exit code was 1, which means an application error.
I have a Python project, and it turned out I had missed one dependency in my requirements.txt.
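Both answers suggest the ENI message is a side effect of the task having already stopped, so the useful information is the stop reason and the container exit code. A minimal sketch of pulling those out, assuming `task` is one element of the "tasks" list returned by the ECS DescribeTasks API; the helper name and the sample values are hypothetical:

```python
from typing import Any


def summarize_stopped_task(task: dict[str, Any]) -> dict[str, Any]:
    """Collect the fields that usually explain why a task stopped.

    `task` is assumed to be one element of the "tasks" list in an
    ECS DescribeTasks response.
    """
    containers = [
        {
            "name": c.get("name"),
            # exitCode can be missing if the container never started
            "exitCode": c.get("exitCode"),
            "reason": c.get("reason"),
        }
        for c in task.get("containers", [])
    ]
    return {
        "lastStatus": task.get("lastStatus"),
        "stopCode": task.get("stopCode"),
        "stoppedReason": task.get("stoppedReason"),
        "containers": containers,
    }


# Hypothetical response fragment, for illustration only.
example_task = {
    "lastStatus": "STOPPED",
    "stopCode": "EssentialContainerExited",
    "stoppedReason": "Essential container in task exited",
    "containers": [{"name": "app", "exitCode": 1}],
}
summary = summarize_stopped_task(example_task)
print(summary["stoppedReason"], summary["containers"][0]["exitCode"])
```

With boto3, the input would come from something like `boto3.client("ecs").describe_tasks(cluster=..., tasks=[...])["tasks"]`. Stopped tasks are only retrievable for a limited time, so this has to be run soon after the task stops.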

Related

AWS CodeDeploy detected that the replacement task set is unhealthy?

I have an ECS Fargate app up and running on AWS, and I was deploying a newer version of my code through a CodeDeploy blue-green deployment. I have been using this method for quite some time and have never encountered any problems unless there was actually a problem with the app itself. As always, I initiated the deployment, waited until all the tasks were running, and checked that traffic had been rerouted to the newer task set. I tested the app on a couple of devices and made sure that it was working correctly. However, after around 20 minutes, my service was down for a few minutes and I got this error message in CodeDeploy: CodeDeploy detected that the replacement task set is unhealthy. I expected CodeDeploy to roll back the deployment automatically, but it was still the newer task set that was receiving traffic, and it was working fine! I did see a couple of stopped tasks, but I no longer have access to their logs, since stopped tasks are not accessible after some time. I re-ran the deployment with the exact same task definition, and that worked fine too. Does anyone have any idea what might cause a task set to be in an unhealthy state? Thanks so much!

AWS Fargate clusters stop logging shortly after being started up, although still being healthy and running

We have a blue/green deployment in AWS using Fargate.
In this deployment we run 2 tasks. They start up and become healthy, and we see the first logs, both in the ECS tasks themselves and in CloudWatch. After a few minutes, though, we no longer get any new log messages in either place.
The instances keep running, however, and everything looks normal.
I found this post here that describes a very similar issue, but does not have any answers yet: https://forums.aws.amazon.com/thread.jspa?messageID=987634&tstart=0
Has anybody experienced something similar, or can anyone advise where to check for possible causes?

How to debug "Resource creation timed out waiting for completion" in AWS CloudFormation?

I'm brand new to AWS and I have a script which I believe should create an ECS cluster.
When I run the script, my stack hangs in the CREATE_IN_PROGRESS state for over an hour. Eventually, it fails and goes into ROLLBACK_COMPLETE.
When I'm in CloudFormation in the AWS console, I can go to "Events" and see that two Services which I'm trying to create are causing stack creation to fail. However, the only error message is Resource creation timed out waiting for completion.
I've tried the steps outlined here, including going into CloudTrail, but I'm not really sure what to look for and haven't found anything to help me resolve the issue. Again, I'm an AWS noob.
What are some steps I can take to get a more detailed error message? How do I go about debugging in AWS?
Any help is appreciated, let me know if I need to provide more info.
I was running into the same situation with CDK, where my ECS deployment would fail after 3 hours of CREATE_IN_PROGRESS. A big obstacle to debugging and troubleshooting is that when the ROLLBACK happens it wipes your ECS cluster and the event history. However, if you go to the ECS console's Task list before the rollback, you should see a task, and I bet it's stuck in a PENDING state. There are a lot of possible reasons for this. When the task fails to reach the desired state, the reason it failed is added to the Service's Events. To get there:
On the service's page there's an Events tab.
Select a task and it shows that it STOPPED. In my case it looks like it couldn't find the ECS container template image.
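The same service events can be read programmatically while the stack is still in CREATE_IN_PROGRESS, before a rollback removes them. A minimal sketch, assuming `service` is one element of the "services" list returned by the ECS DescribeServices API; the helper name and the sample events are hypothetical:

```python
from datetime import datetime
from typing import Any


def recent_service_events(service: dict[str, Any], limit: int = 5) -> list[str]:
    """Return the newest event messages for one ECS service.

    `service` is assumed to be one element of the "services" list in a
    DescribeServices response; each event has "createdAt" and "message".
    """
    events = sorted(
        service.get("events", []),
        key=lambda e: e["createdAt"],
        reverse=True,  # newest first
    )
    return [e["message"] for e in events[:limit]]


# Hypothetical events, for illustration only.
example_service = {
    "events": [
        {"createdAt": datetime(2021, 1, 1, 10, 0), "message": "has started 1 tasks"},
        {"createdAt": datetime(2021, 1, 1, 10, 5), "message": "was unable to place a task"},
    ]
}
messages = recent_service_events(example_service, limit=1)
print(messages)
```

With boto3, the input would come from something like `boto3.client("ecs").describe_services(cluster=..., services=[...])["services"]`.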

delete ECS task for old revision - only use new revision

As you can see, I have a task definition for revision 4 and a task definition for revision 5. I want to permanently stop running revision 4 and only run revision 5:
So in other words, the task that is PROVISIONING - I only want that one. The task that is RUNNING - I don't want that one to run anymore. How to achieve this?
I tried to replicate the scenario and it worked for me, so I think you need to dig further under the hood.
Your task is in the PROVISIONING state, which I believe is related to your environment rather than to your task, service or cluster.
From the AWS documentation:
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-lifecycle.html
**PROVISIONING**
Amazon ECS has to perform additional steps before the task is launched. For example, for tasks that use the awsvpc network mode, the elastic network interface needs to be provisioned.
You might want to check the following to start debugging:
The CloudFormation template that ECS uses to provision your resources.
Your VPC, to see whether anything has changed since the last deployment.
Security groups and IAM roles, to find out whether anything is blocking your resource creation.
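Since PROVISIONING for awsvpc tasks involves creating the elastic network interface, one more thing to look at is the ENI attachment the task reports. A minimal sketch, assuming `task` is one element of the "tasks" list from the ECS DescribeTasks API and that attachments carry "type", "status" and a "details" list of name/value pairs; the helper name and the sample values are hypothetical:

```python
from typing import Any


def eni_attachments(task: dict[str, Any]) -> list[dict[str, Any]]:
    """List the network interface attachments of a task with their status.

    `task` is assumed to be one element of the "tasks" list in an
    ECS DescribeTasks response.
    """
    result = []
    for attachment in task.get("attachments", []):
        if attachment.get("type") != "ElasticNetworkInterface":
            continue
        # "details" is a list of {"name": ..., "value": ...} pairs
        details = {d["name"]: d["value"] for d in attachment.get("details", [])}
        result.append({"status": attachment.get("status"), **details})
    return result


# Hypothetical response fragment, for illustration only.
example_task = {
    "lastStatus": "PROVISIONING",
    "attachments": [
        {
            "type": "ElasticNetworkInterface",
            "status": "PRECREATED",
            "details": [{"name": "subnetId", "value": "subnet-example"}],
        }
    ],
}
enis = eni_attachments(example_task)
print(enis)
```

The subnet shown for an attachment that never leaves its initial status is a reasonable starting point for the VPC and security group checks listed above.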

S3 to EC2 problems

I'm trying to transfer a project from TeamCity to my EC2 server using CodeDeploy. Something goes wrong in the process and the file isn't being transferred from S3 to our EC2 instance. The error message is:
The overall deployment failed because too many individual instances
failed deployment, too few healthy instances are available for
deployment, or some instances in your deployment group are
experiencing problems. (Error code: HEALTH_CONSTRAINTS)
The team decided that the best way to solve this problem was to read the server log, but in the process we noticed that the server keeps shutting down on its own, which is a huge problem. We thought the logs could help with that too, so we tried to collect them using CloudWatch (our team created the correct IAM role and installed the agent on the server with a Run Command). Sadly, this only gave us shutdown and startup information in the logs.
At the moment we don't know how to solve this problem, but our plan was to get all the logs and then see what is wrong.
I've been stuck on this since September. Can someone help me?