Useless Amazon ECS Error Message when creating tasks - amazon-web-services

Using the ecs agent container on an Ubuntu instance, I am able to register the agent with my cluster.
I also have a service created in that cluster and task definitions as well. When I try to add a task to the cluster I get the useless error message:
Run tasks failed
Reasons : ["ATTRIBUTE"]
The ecs agent log has no related error message. Any thoughts on how I can get better debugging or what the issue might be?
The CLI also returns the same useless error message:
{
  "tasks": [],
  "failures": [
    {
      "arn": "arn:aws:ecs:us-east-1:sssssss:container-instance/sssssssssssss",
      "reason": "ATTRIBUTE"
    }
  ]
}

From the troubleshooting guide:
ATTRIBUTE (container instance ID)
Your task definition contains a parameter that requires a specific container instance attribute that is not available on your container instances. For more information on which attributes are required for specific task definition parameters and agent configuration variables, see Task Definition Parameters and Amazon ECS Container Agent Configuration.
You can find the attributes required for your task definition by looking at the requiredAttributes field. You can find the attributes that are present for your container instances in the result of the DescribeContainerInstances API call.
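The comparison the guide describes can be sketched as follows. This is a minimal illustration, assuming the task definition and the container instance have already been fetched (for example via DescribeTaskDefinition and DescribeContainerInstances) and are available as plain dictionaries. The function name missing_attributes and the sample data are hypothetical, and the check compares attribute names only, which is a simplification.

```python
def missing_attributes(task_definition, container_instance):
    """Return the required attribute names that the instance does not offer."""
    required = {a["name"] for a in task_definition.get("requiredAttributes", [])}
    present = {a["name"] for a in container_instance.get("attributes", [])}
    return sorted(required - present)


# Sample data shaped like the API responses; the contents are made up.
task_definition = {
    "requiredAttributes": [
        {"name": "com.amazonaws.ecs.capability.logging-driver.awslogs"},
        {"name": "com.amazonaws.ecs.capability.docker-remote-api.1.19"},
    ]
}
container_instance = {
    "attributes": [
        {"name": "com.amazonaws.ecs.capability.docker-remote-api.1.19"},
    ]
}

# Every name printed here is a candidate cause of the ATTRIBUTE failure.
print(missing_attributes(task_definition, container_instance))
```

In this made-up case the instance lacks the awslogs logging-driver capability, which would point at the agent configuration on that instance rather than at the task definition.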

The ECS console does not provide enough information, but you can connect to the EC2 instance to retrieve more logs.
You can try manually restarting the ECS agent daemon and the ECS agent Docker container.
Sometimes you also need to manually delete the agent's checkpoint file.
A cheatsheet with log locations and commands can be found at
ecs-agent troubleshoot

Related

How and why of awslogs on ECS (fargate)

I am struggling to get a task running using ECS Fargate, and launched (ecs.runTask) from an AWS SDK script (JS/Node).
My current struggle is to get logs from the containers so that I can troubleshoot why they are stopping. I can't seem to get the Task Definition right so that they will be generated.
logConfiguration: {
  logDriver: 'awslogs',
  options: {
    "awslogs-region": 'us-west-2',
    "awslogs-group": 'myTask',
    "awslogs-stream-prefix": "myTask",
    "awslogs-create-group": "true"
  }
}
I have set the log driver for them to awslogs, but when I try to view the logs in CloudWatch, I get various kinds of nothing:
If I specify the awslogs-create-group as "true" (it requires a string, rather than a Boolean, which is strange; I assume case doesn't matter), I nevertheless find that the group is not created.
If I create the group manually, I find that the log stream is not created.
I suspect that there may be an error in my permissions, though of course there is no error messaging to confirm. The docs (here) indicate that I need to attach certain policies to ecsInstanceRole, which seems to be a placeholder for a role that is used somewhere in the process.
But I have attached such a policy to my ECS executionRole, to the role that executes my API call to runTask, and I have looked for any other role that might be involved (an actual "instanceRole" doesn't seem to exist in the Task Def), and nothing is improving my situation.
I'd be happy to supply more information, but at this point I'm not sure where my blind spot is.
Can anyone see it?
Go to your Task Definition. You should find a section called "Task execution IAM role". The description says -
This role is required by tasks to pull container images and publish container logs to Amazon CloudWatch.
The role you attach here needs a policy like AmazonECSTaskExecutionRolePolicy (AWS managed policy), and the Trusted Entity is ecs-tasks.amazonaws.com.
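Also, I think the awslogs option awslogs-create-group is not needed if the log group already exists. If you do keep it, the execution role additionally needs the logs:CreateLogGroup permission, which AmazonECSTaskExecutionRolePolicy does not include.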
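As a rough aid, the points above can be checked against a task definition before calling runTask. The sketch below is only an assumption about what is worth checking: it looks for an execution role and for the awslogs options that Fargate requires. The helper name awslogs_problems and the sample task definition are hypothetical, and the check does not verify the permissions attached to the role.

```python
REQUIRED_AWSLOGS_OPTIONS = ("awslogs-group", "awslogs-region", "awslogs-stream-prefix")


def awslogs_problems(task_definition):
    """List likely reasons why awslogs output would not reach CloudWatch."""
    problems = []
    if not task_definition.get("executionRoleArn"):
        problems.append("executionRoleArn is not set")
    for container in task_definition.get("containerDefinitions", []):
        log_config = container.get("logConfiguration") or {}
        if log_config.get("logDriver") != "awslogs":
            continue  # only awslogs containers are checked here
        options = log_config.get("options") or {}
        for key in REQUIRED_AWSLOGS_OPTIONS:
            if key not in options:
                name = container.get("name", "?")
                problems.append(f"{name}: missing option {key}")
    return problems


# Hypothetical task definition with two deliberate mistakes.
task_definition = {
    "executionRoleArn": "",
    "containerDefinitions": [
        {
            "name": "app",
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {"awslogs-region": "us-west-2", "awslogs-group": "myTask"},
            },
        }
    ],
}

for problem in awslogs_problems(task_definition):
    print(problem)
```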

Amazon ECS Service configuration return exactly 1 result, but got '0'

I am trying to update an ECS service with bamboo and get the following error:
Failed to fetch resource from AWS!
java.lang.RuntimeException: Expected DescribeServiceRequest for service 'my-service' to return exactly 1 result, but got '0'
    at net.utoolity.atlassian.bamboo.taws.aws.ECS.getSingleService(ECS.java:674)
    at net.utoolity.atlassian.bamboo.taws.ECSServiceTask.executeUpdate(ECSServiceTask.java:311)
    at net.utoolity.atlassian.bamboo.taws.ECSServiceTask.execute(ECSServiceTask.java:133)
    at net.utoolity.atlassian.bamboo.taws.AWSTask.execute(AWSTask.java:164)
    at com.atlassian.bamboo.task.TaskExecutorImpl.lambda$executeTasks$3(TaskExecutorImpl.java:319)
    at com.atlassian.bamboo.task.TaskExecutorImpl.executeTaskWithPrePostActions(TaskExecutorImpl.java:252)
    at com.atlassian.bamboo.task.TaskExecutorImpl.executeTasks(TaskExecutorImpl.java:319)
    at com.atlassian.bamboo.task.TaskExecutorImpl.execute(TaskExecutorImpl.java:112)
    at com.atlassian.bamboo.build.pipeline.tasks.ExecuteBuildTask.call(ExecuteBuildTask.java:73)
    at com.atlassian.bamboo.v2.build.agent.DefaultBuildAgent.executeBuildPhase(DefaultBuildAgent.java:203)
    at com.atlassian.bamboo.v2.build.agent.DefaultBuildAgent.build(DefaultBuildAgent.java:175)
    at com.atlassian.bamboo.v2.build.agent.BuildAgentControllerImpl.lambda$waitAndPerformBuild$0(BuildAgentControllerImpl.java:129)
    at com.atlassian.bamboo.variable.CustomVariableContextImpl.withVariableSubstitutor(CustomVariableContextImpl.java:185)
    at com.atlassian.bamboo.v2.build.agent.BuildAgentControllerImpl.waitAndPerformBuild(BuildAgentControllerImpl.java:123)
    at com.atlassian.bamboo.v2.build.agent.DefaultBuildAgent$1.run(DefaultBuildAgent.java:126)
    at com.atlassian.bamboo.utils.BambooRunnables$1.run(BambooRunnables.java:48)
    at com.atlassian.bamboo.security.ImpersonationHelper.runWith(ImpersonationHelper.java:26)
    at com.atlassian.bamboo.security.ImpersonationHelper.runWithSystemAuthority(ImpersonationHelper.java:17)
    at com.atlassian.bamboo.security.ImpersonationHelper$1.run(ImpersonationHelper.java:41)
    at java.lang.Thread.run(Thread.java:745)
I am using the Force new deployment setting.
Any ideas what is the issue?
We have not been able to identify a bug in our code base right away; here's what's seemingly happening:
In order to append progress messages to the Bamboo build log, we need to call the DescribeServices API action before the call to the actual UpdateService API action, and the exception is thrown if and only if the targeted service cannot be found.
So at first glance there may be a subtle configuration issue, which happens to me every now and then when using Bamboo variables to reference resources from a preceding task, where it is easy to accidentally copy and paste the wrong variable name for example.
An incorrect reference in any of the following parameters of the Amazon ECS Service task's Update Service action would cause the respective task action to fail with the error message at hand, because the DescribeServices API call in itself would succeed, yet fail to identify the target service:
Connector
Region
Service Name
For example, I've just reproduced the problem by using a nonexistent service name:
24-Oct-2019 17:37:05 Starting task 'Update sample ECS service (w/ ELB) - 2 instances' of type 'net.utoolity.atlassian.bamboo.tasks-for-aws:aws.ecs.service'
24-Oct-2019 17:37:05 Setting maxErrorRetry=7 and awaitTransitionInterval=15000
24-Oct-2019 17:37:05 Using session credentials provided by Identity Federation for AWS app (connector variable: 6f6fc85d-4ea5-43ce-8e70-25aba33a5fda).
24-Oct-2019 17:37:05 Selecting region eu-west-1
24-Oct-2019 17:37:05 Updating service 'NOT-A-SERVICE' on cluster 'TAWS-IT270-100-ubot':
24-Oct-2019 17:37:06 Failed to fetch resource from AWS!
24-Oct-2019 17:37:06 java.lang.RuntimeException: Expected DescribeServiceRequest for service 'NOT-A-SERVICE' to return exactly 1 result, but got '0'
...
Granted, the error message is not exactly helpful here, and we need to think about how to better handle this log pattern across our various tasks. The actual UpdateService API action would yield the much more appropriate ServiceNotFoundException in this scenario.
So assuming 'my-service' has been up and running before calling the 'Update Service' task action, can you please check whether the log from your failing Bamboo build may indicate this particular problem, for example by targeting another region by chance?
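To illustrate the lookup described above, here is a minimal sketch of a pre-update check that reports which connector, region and service name were actually used. It is not the add-on's code: the function name find_service and its parameters are hypothetical, and it only assumes a DescribeServices response parsed into a dictionary with "services" and "failures" lists.

```python
def find_service(response, service_name, cluster, region):
    """Return the single matching service or raise a descriptive error."""
    services = response.get("services", [])
    if len(services) == 1:
        return services[0]
    # DescribeServices succeeds even when nothing matches, so the reason has
    # to be read from the "failures" list instead of from an exception.
    reasons = [f.get("reason", "unknown") for f in response.get("failures", [])]
    raise LookupError(
        f"Service {service_name!r} not found on cluster {cluster!r} "
        f"in region {region!r} (results: {len(services)}, failures: {reasons}); "
        "check the connector, region and service name"
    )


# A response shaped like the one returned for a service that does not exist.
empty_response = {"services": [], "failures": [{"reason": "MISSING"}]}

try:
    find_service(empty_response, "NOT-A-SERVICE", "TAWS-IT270-100-ubot", "eu-west-1")
except LookupError as error:
    print(error)
```

Including the region and cluster in the message makes the "wrong region by chance" case visible directly in the build log.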
I could solve the issue by using a Shell Script Task and writing an AWS CLI command after exporting the keys. This workaround solved the issue:
aws ecs update-service --cluster my-cluster --service my-service --task-definition my-task-definition
So AWS ECS itself is working fine, and the cause is likely a bug or misconfiguration in the Bamboo module.
But as mentioned in the other answer, the best approach would be to check if the configuration is correct.

Where to store AWS credentials in ECS service

I have an ECS service, which requires AWS credentials. I use ECR to store docker images and jenkins visible only for VPN connections to build images.
I see 2 possibilities to provide AWS credentials to the service:
Store them as Jenkins secret and insert into the docker image during build
Make them a part of the environment when creating ECS Task definition
What is more secure? Are there other possibilities?
First of all, you should not use AWS credentials while working inside AWS. Assign an IAM role to the task definition or service instead of passing credentials to the docker build or the task definition.
With IAM roles for Amazon ECS tasks, you can specify an IAM role that
can be used by the containers in a task. Applications must sign their
AWS API requests with AWS credentials, and this feature provides a
strategy for managing credentials for your applications to use,
similar to the way that Amazon EC2 instance profiles provide
credentials to EC2 instances
Sometimes the underlying application is not designed in a way that can use a role. In that case I recommend storing the values as environment variables in the task definition. But where should the values come from?
A task definition supports two ways of setting environment variables:
Plain text as a direct value
Use the ‘valueFrom’ attribute of the ECS task definition
The following is a snippet of a task definition showing the format when referencing a Systems Manager Parameter Store parameter.
{
  "containerDefinitions": [{
    "secrets": [{
      "name": "environment_variable_name",
      "valueFrom": "arn:aws:ssm:region:aws_account_id:parameter/parameter_name"
    }]
  }]
}
This is the most secure method and the one recommended by the AWS documentation, so it is better than plain-text ENV values in the task definition or ENV in the Dockerfile.
You can read more here and systems-manager-parameter-store.
But to use these, you must grant the task execution role permission to access Systems Manager Parameter Store.
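The permission mentioned above could look roughly like the policy generated below. This is a sketch, assuming the secrets are plain Parameter Store parameters read by the task execution role; the helper name parameter_read_policy is hypothetical, and the ARN is the same placeholder used in the snippet above. A parameter encrypted with a custom KMS key would additionally need kms:Decrypt, which is not shown.

```python
import json


def parameter_read_policy(parameter_arns):
    """Build an IAM policy document that allows reading the given parameters."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["ssm:GetParameters"],
                "Resource": list(parameter_arns),
            }
        ],
    }


# Placeholder ARN; replace region, account ID and parameter name with real ones.
policy = parameter_read_policy(
    ["arn:aws:ssm:region:aws_account_id:parameter/parameter_name"]
)
print(json.dumps(policy, indent=2))
```

Listing the parameter ARNs explicitly, rather than using a wildcard resource, keeps the role limited to the secrets this task actually needs.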

ECS logs: Fargate vs EC2

Usually, when I run a task in ECS using Fargate, STDOUT is redirected automatically to CloudWatch and the application logs can be found without any complication.
To clarify, for example, in C#:
Console.WriteLine("log to write to CloudWatch")
That output is automatically redirected to CloudWatch Logs when I use ECS with Fargate or Lambda functions.
I would like to do the same using EC2.
The first impression using ECS with EC2 is that this is not as automatic as Fargate. Am I right?
Looking for information, I have found the following (apart from other, older questions and posts):
This question refers to an old post from the AWS blog, so this could be obsolete.
This AWS page describes a few steps where you need to install some utilities on your EC2 instance.
So, summarizing, is there any way to see the STDOUT in cloudwatch when I use ECS with EC2 in the same way Fargate does?
If you mean EC2 logging as easily as Fargate does, without any complex configuration, then no. You need to provide some configuration and utilities on your EC2 instances to allow logging to CloudWatch. Like any EC2 instance we launch, an ECS instance is just a virtual machine with an operating system in its default configuration, in this case the Amazon ECS-optimized AMI. Other services and configuration we have to provide ourselves.
Besides the link above you provided, I found this CloudFormation template which configures EC2 Spot Fleet to log to CloudWatch in the same way your second link describes.
I don't think you're correct. The StdOut logs from the ECS task launch are just as easily written and accessed running under EC2 as under Fargate.
You just have this in your task definition which, as far as I can tell, is the same as in Fargate:
"containerDefinitions": [
  {
    "dnsSearchDomains": null,
    "logConfiguration": {
      "logDriver": "awslogs",
      "options": {
        "awslogs-group": "my-log-family",
        "awslogs-region": "us-east-1",
        "awslogs-stream-prefix": "my-stream-name"
      }
    }
    ...
After it launches, you should see your logs under my-log-family.
If you are trying to put application logs in CloudWatch, that's another matter... this is typically done using the CloudWatch logs agent which you'd have to install into the container, but the above will capture the StdOut.
This is how I did it.
Using the NuGet package: AWS.Logger.AspNetCore
An example of use:
static async Task Main(string[] args)
{
    Logger loggerObj = new Logger();
    ILogger<Program> logger = await loggerObj.CreateLogger("test", "eu-west-1");
    logger.LogInformation("test info");
    logger.LogError("test error");
}
public async Task<ILogger<Program>> CreateLogger(string logGroup, string region)
{
    AWS.Logger.AWSLoggerConfig config = new AWS.Logger.AWSLoggerConfig();
    config.Region = region;
    config.LogStreamNameSuffix = "";
    config.LogGroup = logGroup;
    LoggerFactory logFactory = new LoggerFactory();
    logFactory.AddAWSProvider(config);
    return logFactory.CreateLogger<Program>();
}

Elastic BeanStalk MultiContainer docker fails

I want to deploy a multi-container application in Elastic Beanstalk. I get the following error.
Error 1: The EC2 instances failed to communicate with AWS Elastic
Beanstalk, either because of configuration problems with the VPC or a
failed EC2 instance. Check your VPC configuration and try launching
the environment again.
I have set up the VPC with just the public subnet and the security group that allows all traffic both inbound and outbound. I know this is not encouraged for production level deployment, but I have reduced the complexity to find the cause of the error.
So, the load balancer and the EC2 instance are inside the same public subnet that is attached with the internet gateway. They both share the same security group allowing all the traffic.
Before the above error, I also get another error stating:
Error 2: No ecs task definition (or empty definition file) found in environment
That said, I have bundled my Dockerrun.aws.json file together with the .ebextensions folder inside the source bundle that Beanstalk uses for deployment.
After all these errors, it comes down to two questions:
Why does the "No ecs task definition" error appear when I have packaged my Dockerrun.aws.json file containing containerDefinitions?
Since there is no ECS task running, nothing is running on the instance. Is this why Beanstalk and the ELB cannot communicate with the instance? (Assuming my public subnet and all-traffic security group are not the problem.)
The problem was the VPC. Even though I had a simple VPC with just a public subnet, Beanstalk could not talk to the instance and so could not deploy the ECS task definition and Docker containers to it.
What worked for me was creating two subnets, one public and one private, with a NAT instance in the public subnet acting as the router for the instances in the private subnet. With this setup I could deploy the ECS task definition successfully to the EC2 instance in the private subnet.
I found this question because I got the same error. Here are the steps that worked for me to actually deploy a multi-container app on Beanstalk:
To get past this particular error, I used the eb CLI tools. For some reason, using eb deploy instead of zipping and uploading myself fixed this. It didn't actually work, but it gave me a new error.
So, I changed my Dockerrun.aws.json, a file format that needs WAY more documentation, until I stopped getting errors about that.
Then, I got an even better error!
ERROR: [Instance: i-0*********0bb37cf] Command failed on instance.
Return code: 1 Output: (TRUNCATED)..._api_call
raise ClientError(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (AccessDenied) when
calling the GetObject operation: Access Denied
Failed to download authentication credentials [config file name] from [bucket name].
Hook /opt/elasticbeanstalk/hooks/appdeploy/enact/02update-credentials.sh failed.
For more detail, check /var/log/eb-activity.log using console or EB CLI.
Per this part of the docs, the way to solve this is to:
Open the Roles page in the IAM console.
Choose aws-elasticbeanstalk-ec2-role.
On the Permissions tab, under Managed Policies, choose Attach Policy.
Select the managed policy for the additional services that your application uses. For example, AmazonS3FullAccess or AmazonDynamoDBFullAccess. (For our problem, the S3 one)
Choose Attach Policies.
This part got really exciting, because I got yet another error: Authentication credentials are not in JSON format as expected. Please generate the credentials using 'docker login'. (Keep in mind, I tried to follow the instructions on how to do this to the letter, but, oh well.) Turns out this one was on me: I had malformed JSON in my DockerHub auth file stored on S3. I renamed the file to dockercfg.json to get syntax checking, and it seems Beanstalk/ECS is okay with having the .json as part of the name, because this time... there was a different error: CannotPullContainerError: Error: image [DockerHub organization]/[repo name]:latest not found. Hmm, maybe there was a typo? Let's check:
$ docker run -it [DockerHub organization]/[repo name]:latest
Unable to find image '[DockerHub organization]/[repo name]:latest' locally
latest: Pulling from [DockerHub organization]/[repo name]
Ok, the repo is there. So... my auth is bad? Yup, turns out I followed an example in the DockerHub auth docs that was of what you shouldn't do. Your dockercfg.json should look like
{
  "https://index.docker.io/v1/": {
    "auth": "ZWpMQ=Vyd5zOmFsluMTkycN0ZGYmbn=WV2FtaGF2",
    "email": "your@email.com"
  }
}
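If you prefer to generate this file rather than copy it from a docker login, a minimal sketch could look like the following. It assumes the auth value is the Base64 encoding of "username:password", which is how Docker stores basic credentials; the helper name dockercfg_entry and the example user, password and email are made up.

```python
import base64
import json


def dockercfg_entry(username, password, email,
                    registry="https://index.docker.io/v1/"):
    """Build the auth entry in the format shown above for one registry."""
    # The auth token is base64("username:password").
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {registry: {"auth": token, "email": email}}


# Made-up credentials; never commit real ones to source control.
entry = dockercfg_entry("example-user", "example-password", "user@example.com")
print(json.dumps(entry, indent=2))
```

Writing the file with json.dumps also avoids the malformed-JSON problem described earlier.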
There were a few more errors (volume sourcePath has to be an absolute path! That's what the message invalid characters for a local volume name, only "[a-zA-Z0-9][a-zA-Z0-9_.-]" are allowed means), but it eventually deployed. Sorry for the novel; hoping it helps someone.