Debugging Pulumi "ResourceNotReady: exceeded wait attempts"

I am trying to deploy a Fargate service on AWS ECS with Pulumi as IaC.
Everything works as expected when I deploy the service with:
deploymentController: {
    type: "ECS"
},
But changing it to:
deploymentController: {
    type: "CODE_DEPLOY"
},
makes the deployment fail with the error message: "ResourceNotReady: exceeded wait attempts"
Is there any way to debug this that would help me find out which resource Pulumi is waiting for?
Are there any hidden dependencies for Blue/Green deployment on ECS that are not obvious when deploying with Pulumi?

Are you deploying to an ECS cluster that lives in a different stack than your Fargate service stack?
If so, that is the reason behind the timeout error: the stack cannot ping the service to confirm that it has reached a steady state, because the service lives in a different stack.

Cancel AWS CDK deployment after X failed task

I am deploying a service on AWS using an ApplicationLoadBalancedEc2Service.
Sometimes, while testing, I deploy a configuration that results in errors. The problem is that instead of canceling the deployment, the CDK just hangs for hours. The reason is that AWS keeps trying to spin up a task, which fails because of my wrong configuration.
Right now I have to set the task count to 0 through the AWS console. This lets the deployment complete successfully and allows me to roll out a new version.
Is there a way to cancel the deployment and just roll back after X failed tasks?
One way is to configure CodeDeploy to roll back the service to its previous version if the new deployment fails. This won't "cancel the CDK deployment", but will stabilize the service.
Another way is to add a Custom Resource with an asynchronous provider to poll the ECS service status, signaling CloudFormation if your success condition is not met. This will revert the CDK deployment itself.
You're looking for the Circuit Breaker feature:
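// Import paths assume CDK v2 (aws-cdk-lib)
import * as ecs from 'aws-cdk-lib/aws-ecs';
import * as ecsPatterns from 'aws-cdk-lib/aws-ecs-patterns';
declare const cluster: ecs.Cluster;
const loadBalancedEcsService = new ecsPatterns.ApplicationLoadBalancedEc2Service(this, 'Service', {
  cluster,
  memoryLimitMiB: 1024,
  taskImageOptions: {
    image: ecs.ContainerImage.fromRegistry('test'),
  },
  desiredCount: 2,
  // Stop the deployment after repeated task failures and roll back to the previous version
  circuitBreaker: { rollback: true },
});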
The circuit breaker allows a number of failed task launches equal to 0.5 times your desired task count, clamped to a minimum of 10 and a maximum of 200, before it cancels the deployment. Setting rollback to true relaunches tasks with the previous task definition.
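To make the threshold rule in the answer above concrete, here is a minimal Python sketch of the arithmetic. The function name is hypothetical, the bounds of 10 and 200 are simply the values quoted in the answer (check the current ECS documentation before relying on them), and rounding the half count up is an assumption.

```python
import math


def circuit_breaker_threshold(desired_count: int,
                              minimum: int = 10,
                              maximum: int = 200) -> int:
    """Failed-task threshold: 0.5 x desired count, clamped to [minimum, maximum].

    The default bounds come from the answer above; rounding up is an assumption.
    """
    raw = math.ceil(0.5 * desired_count)
    return max(minimum, min(raw, maximum))


# A small service hits the lower bound, a very large one hits the upper bound.
for count in (2, 100, 1000):
    print(count, circuit_breaker_threshold(count))
```

With desiredCount: 2, as in the CDK snippet, the lower bound applies, so under these assumptions the deployment is only canceled after 10 failed task launches.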

AWS CLI ecs run-task CannotPullContainerError: inspect image has been retried 5 time(s): failed to resolve ref

I'm trying to move from the Console to the CLI.
I have an ECS Cluster and a Task Definition. From the Console, I can run a task without any issue. The task turns green and I can use the public IP to access my service.
Now I'd like to do the same, but using the AWS CLI instead of the Console.
I thought this was enough:
aws ecs run-task --cluster my-cluster \
--task-definition ecs-task-def:9 \
--launch-type FARGATE \
--network-configuration '{ "awsvpcConfiguration": { "subnets": ["subnet-XX1","subnet-XX2"], "securityGroups": ["sg-XXX"],"assignPublicIp": "ENABLED" }}'
However, the task gets stuck in PENDING state and after a while is STOPPED with the following error message:
CannotPullContainerError: inspect image has been retried 5 time(s): failed to resolve ref "docker.io/username/container:latest": failed to do request: Head https://registry-1.docker.io/v2/username/container/manifests/latest: dial tcp x.x.x.x:443: i/o timeout
What concerns me is that I can run tasks from the Console using the same arguments (VPC, subnets, security group, etc.) but I cannot make it work using the CLI.
If the issue were missing or wrong rules, neither the Console nor the CLI should work.
Does anyone know why?
It looks like ECS cannot pull the image from the registry. The error
CannotPullContainerError: inspect image has been retried 5 time(s): failed to resolve ref "docker.io/username/container:latest": failed to do request: Head https://registry-1.docker.io/v2/username/container/manifests/latest: dial tcp x.x.x.x:443: i/o timeout
suggests that outbound traffic on port 443 is blocked, so the image cannot be pulled. Have you tried allowing all inbound and outbound traffic on the attached security group, and checking network connectivity from within the attached subnets?
You can create a simple Lambda function attached to the same subnets and security groups, then make a telnet/curl-style request to the registry endpoint to check connectivity.
Example:
import urllib3
def test_book():
    # Use a short timeout so a blocked route fails fast instead of hanging
    http = urllib3.PoolManager(timeout=5.0)
    url = 'https://your-endpoint-here'
    headers = {
        "Accept": "application/json"
    }
    r = http.request(method='GET', url=url, headers=headers)
    print(f'response_status: {r.status}\nresponse_headers: {r.headers}\nresponse_data: {r.data}')

"DeploymentLimitExceededException" on ECS Service (AWS)

I got the error below when I created a service in ECS.
As the message says, the failure happens in CodeDeploy.
The CodeDeploy deployment was not successful.
CodeDeploy The blue/green deployment was not successfully started
for the service: The Deployment Group 'DgpECS-blogClu-test' already
has an active Deployment 'd-6C9HNEPDA' (Service: AmazonCodeDeploy;
Status Code: 400; Error Code: DeploymentLimitExceededException;
Request ID: 5d4984d5-29fa-4681-97e4-acfa54b55e2b; Proxy: null)
How can I solve it?
Go to CodeDeploy, open your application, and check "Deployment group deployment history".
A deployment is already active in that deployment group, so you got the error: a deployment group cannot run more than one deployment at the same time.
You can stop the active deployment or wait for it to finish. Then you can start another deployment.
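The same check can be scripted instead of done in the console. The sketch below is a minimal example that assumes a boto3 CodeDeploy client is passed in. The helper names are hypothetical, the set of statuses treated as "active" is an assumption, and pagination of the results is ignored for brevity.

```python
# Statuses treated as "still active" here (an assumption, adjust as needed).
ACTIVE_STATUSES = ["Created", "Queued", "InProgress", "Baking", "Ready"]


def active_deployments(codedeploy, application_name, deployment_group_name):
    """Return the IDs of deployments that are still active in the deployment group."""
    response = codedeploy.list_deployments(
        applicationName=application_name,
        deploymentGroupName=deployment_group_name,
        includeOnlyStatuses=ACTIVE_STATUSES,
    )
    return response.get("deployments", [])


def stop_active_deployments(codedeploy, application_name, deployment_group_name):
    """Stop every active deployment in the group and return the IDs that were stopped."""
    deployment_ids = active_deployments(
        codedeploy, application_name, deployment_group_name
    )
    for deployment_id in deployment_ids:
        codedeploy.stop_deployment(deploymentId=deployment_id)
    return deployment_ids
```

A call could look like stop_active_deployments(boto3.client("codedeploy"), "my-app", "DgpECS-blogClu-test"), where "my-app" is a placeholder for your CodeDeploy application name. If you would rather wait than stop, call active_deployments repeatedly until it returns an empty list.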

AWS Delete ECS Service with Bamboo - The service cannot be stopped

I am trying to stop a service on AWS with the Bamboo ECS Service Delete task. However, I got the following error:
Deleting service 'my-service' on cluster 'my-cluster':
Service request rejected by AWS!
com.amazonaws.services.ecs.model.InvalidParameterException: The service cannot be stopped while it is scaled above 0. (Service: AmazonECS; Status Code: 400; Error Code: InvalidParameterException; Request ID: 03dab8da-xyz)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1695)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1350)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1101)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:758)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:732)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:714)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:674)
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:656)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:520)
at com.amazonaws.services.ecs.AmazonECSClient.doInvoke(AmazonECSClient.java:3289)
at com.amazonaws.services.ecs.AmazonECSClient.invoke(AmazonECSClient.java:3256)
at com.amazonaws.services.ecs.AmazonECSClient.invoke(AmazonECSClient.java:3245)
at com.amazonaws.services.ecs.AmazonECSClient.executeDeleteService(AmazonECSClient.java:859)
at com.amazonaws.services.ecs.AmazonECSClient.deleteService(AmazonECSClient.java:831)
at net.utoolity.atlassian.bamboo.taws.ECSServiceTask.executeDelete(ECSServiceTask.java:344)
at net.utoolity.atlassian.bamboo.taws.ECSServiceTask.execute(ECSServiceTask.java:141)
at net.utoolity.atlassian.bamboo.taws.AWSTask.execute(AWSTask.java:164)
at com.atlassian.bamboo.task.TaskExecutorImpl.lambda$executeTasks$3(TaskExecutorImpl.java:319)
at com.atlassian.bamboo.task.TaskExecutorImpl.executeTaskWithPrePostActions(TaskExecutorImpl.java:252)
at com.atlassian.bamboo.task.TaskExecutorImpl.executeTasks(TaskExecutorImpl.java:319)
at com.atlassian.bamboo.task.TaskExecutorImpl.execute(TaskExecutorImpl.java:112)
at com.atlassian.bamboo.build.pipeline.tasks.ExecuteBuildTask.call(ExecuteBuildTask.java:73)
at com.atlassian.bamboo.v2.build.agent.DefaultBuildAgent.executeBuildPhase(DefaultBuildAgent.java:203)
at com.atlassian.bamboo.v2.build.agent.DefaultBuildAgent.build(DefaultBuildAgent.java:175)
at com.atlassian.bamboo.v2.build.agent.BuildAgentControllerImpl.lambda$waitAndPerformBuild$0(BuildAgentControllerImpl.java:129)
at com.atlassian.bamboo.variable.CustomVariableContextImpl.withVariableSubstitutor(CustomVariableContextImpl.java:185)
at com.atlassian.bamboo.v2.build.agent.BuildAgentControllerImpl.waitAndPerformBuild(BuildAgentControllerImpl.java:123)
at com.atlassian.bamboo.v2.build.agent.DefaultBuildAgent$1.run(DefaultBuildAgent.java:126)
at com.atlassian.bamboo.utils.BambooRunnables$1.run(BambooRunnables.java:48)
at com.atlassian.bamboo.security.ImpersonationHelper.runWith(ImpersonationHelper.java:26)
at com.atlassian.bamboo.security.ImpersonationHelper.runWithSystemAuthority(ImpersonationHelper.java:17)
at com.atlassian.bamboo.security.ImpersonationHelper$1.run(ImpersonationHelper.java:41)
at java.lang.Thread.run(Thread.java:745)
Finished task 'delete ecs service' with result: Error
I assume the reason is that a task is still running in this ECS service. However, when I use the aws-cli command, the service is deleted without any problems:
aws ecs delete-service --cluster my-cluster --service my-service --force
Maybe there is no force option in the Bamboo task. Any ideas?
The solution was to first update the ECS service to set the desired count to 0, and then delete the service.
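Outside Bamboo, the same two-step sequence can be sketched with boto3. This is a minimal illustration that assumes a boto3 ECS client is passed in; the function name is hypothetical, and the cluster and service names are the ones from the question.

```python
def scale_down_and_delete(ecs, cluster, service):
    """Scale the service to zero tasks, then delete it without the force flag."""
    # Step 1: set the desired count to 0 so the delete request is no longer rejected.
    ecs.update_service(cluster=cluster, service=service, desiredCount=0)
    # Step 2: delete the now-empty service.
    return ecs.delete_service(cluster=cluster, service=service)
```

For example: scale_down_and_delete(boto3.client("ecs"), "my-cluster", "my-service"). This produces the same end result as the --force flag in the CLI command from the question, but as two explicit calls.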

ECS logs: Fargate vs EC2

Usually, when I run a task in ECS using Fargate, STDOUT is redirected automatically to CloudWatch, and the application logs can be found without any complication.
To clarify, for example, in C#:
Console.WriteLine("log to write to CloudWatch");
That output is automatically redirected to CloudWatch Logs when I use ECS with Fargate or Lambda functions.
I would like to do the same using EC2.
My first impression of ECS with EC2 is that this is not as automatic as with Fargate. Am I right?
Looking for information, I found the following (apart from other older questions and posts):
This question refers to an old post from the AWS blog, so it could be obsolete.
This AWS page describes a few steps where you need to install some utilities on your EC2 instance.
So, to summarize, is there any way to see STDOUT in CloudWatch when I use ECS with EC2, in the same way Fargate does?
If you mean logging from EC2 as easily as from Fargate, without any extra configuration, then no. You need to provide some configuration and utilities on your EC2 instances to allow logging to CloudWatch. Like any EC2 instance we launch, an ECS instance is just a virtual machine running an operating system with a default configuration, in this case an Amazon ECS-optimized AMI. Any other services and configuration we have to provide ourselves.
Besides the link you provided above, I found this CloudFormation template, which configures an EC2 Spot Fleet to log to CloudWatch in the same way your second link describes.
I don't think you're correct. The StdOut logs from an ECS task are just as easily written and accessed when running under EC2 as under Fargate.
You just need this in your task definition which, as far as I can tell, is the same as in Fargate:
"containerDefinitions": [
    {
        "dnsSearchDomains": null,
        "logConfiguration": {
            "logDriver": "awslogs",
            "options": {
                "awslogs-group": "my-log-family",
                "awslogs-region": "us-east-1",
                "awslogs-stream-prefix": "my-stream-name"
            }
        }
        ...
After it launches, you should see your logs under my-log-family
If you are trying to put application logs in CloudWatch, that's another matter... this is typically done using the CloudWatch logs agent which you'd have to install into the container, but the above will capture the StdOut.
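To find the output inside the log group, it helps to know how the awslogs driver names its streams when awslogs-stream-prefix is set: the prefix, the container name, and the ECS task ID, joined by slashes. The sketch below only builds that name; the function name, the container name "web", and the task ID are hypothetical placeholders.

```python
def awslogs_stream_name(prefix: str, container_name: str, task_id: str) -> str:
    """Build the CloudWatch log stream name: prefix/container-name/ecs-task-id."""
    return f"{prefix}/{container_name}/{task_id}"


# Placeholder container name and task ID, for illustration only.
print(awslogs_stream_name("my-stream-name", "web", "0123456789abcdef"))
```

With the configuration above, this stream would appear inside the my-log-family log group in us-east-1.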
This is how I did it.
Using the NuGet package AWS.Logger.AspNetCore.
An example of use:
// Logger is the author's own helper class that contains CreateLogger (below)
static async Task Main(string[] args)
{
    Logger loggerObj = new Logger();
    ILogger<Program> logger = await loggerObj.CreateLogger("test", "eu-west-1");
    logger.LogInformation("test info");
    logger.LogError("test error");
}
public async Task<ILogger<Program>> CreateLogger(string logGroup, string region)
{
    // Point the AWS logger at the target CloudWatch log group and region
    AWS.Logger.AWSLoggerConfig config = new AWS.Logger.AWSLoggerConfig();
    config.Region = region;
    config.LogStreamNameSuffix = "";
    config.LogGroup = logGroup;
    // Register the AWS provider so ILogger output is sent to CloudWatch Logs
    LoggerFactory logFactory = new LoggerFactory();
    logFactory.AddAWSProvider(config);
    return logFactory.CreateLogger<Program>();
}