ECS Fargate with consul-agent and proxy as sidecar containers

I am trying to run consul-agent and a Connect proxy as sidecar containers inside my ECS Fargate service, so the task runs three containers:
core-business-service-container
consul-agent-container
core-business-consul-proxy-container
All containers are up and running in the ECS task, and the node is registered in the Consul UI as well. However, the service is not shown as up in the Consul UI.
Digging into the logs of consul-agent-container, I found this error:
2021/03/12 03:33:14 [ERR] http: Request PUT /v1/agent/check/pass/greeting-fargate-proxy-ttl?note=, error: CheckID "greeting-fargate-proxy-ttl" does not have associated TTL from=127.0.0.1:43252
Here are the commands I used to connect to Consul.
consul-agent-container:
"exec consul agent -ui -data-dir /consul/data -client="127.0.0.1" -bind="{{ GetPrivateIP }}" -retry-join "172.31.79.139""
core-business-consul-proxy-container:
"exec consul connect proxy -register -service greeting-fargate -http-addr 127.0.0.1:8500 -listen 127.0.0.1:8080 -service-addr 127.0.0.1:3000"

HashiCorp recently announced support for running Consul service mesh on ECS using Terraform to deploy the agent and sidecar components. You might want to consider this as an alternative solution to your existing workflow.
This solution is currently in tech preview. You can find more information in the blog post https://www.hashicorp.com/blog/announcing-consul-service-mesh-for-amazon-ecs.

I did not manage to get my proxy working with the same method you used, but I remember reading that you should declare the Connect proxy inside the service registration config:
{
"service": {
"name": "web",
"port": 8080,
"connect": { "sidecar_service": {} }
}
}
After that, I think you could simply launch the proxy with:
consul connect proxy -sidecar-for <service-id>
I did not verify this, because the application I was working on used Spring Cloud Consul to register its services and I could not find where to register a proxy there, but maybe this helps you further.
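The same registration can also be sent to the local agent over its HTTP API instead of a config file. A minimal, untested sketch, assuming the agent listens on 127.0.0.1:8500 (the question's -http-addr) and reusing the question's service name greeting-fargate and port 3000. The endpoint is Consul's PUT /v1/agent/service/register, which uses CamelCase keys (Connect / SidecarService); the helper names build_registration and register are hypothetical.

```python
import json
import urllib.request

# Assumption: the Consul agent's HTTP address, as passed via -http-addr in the question.
CONSUL_ADDR = "http://127.0.0.1:8500"


def build_registration(name: str, port: int) -> dict:
    """Hypothetical helper: a service definition with an empty Connect sidecar block.

    An empty SidecarService asks the agent to register a sidecar proxy
    service for this service using default settings.
    """
    return {
        "Name": name,
        "ID": name,  # the ID is what `consul connect proxy -sidecar-for` expects
        "Port": port,
        "Connect": {"SidecarService": {}},
    }


def register(payload: dict, consul_addr: str = CONSUL_ADDR) -> int:
    """PUT the definition to the agent and return the HTTP status code."""
    req = urllib.request.Request(
        f"{consul_addr}/v1/agent/service/register",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status


if __name__ == "__main__":
    status = register(build_registration("greeting-fargate", 3000))
    print("registered, HTTP", status)
```

With the service registered this way, the proxy container would be started with `consul connect proxy -sidecar-for greeting-fargate` instead of `-register -service greeting-fargate`.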

Related

AWS ECS with capacityProviderStrategy | can deploymentController be updated once the ECS service is created?

I created the ECS service with deployment controller type ECS and a capacityProviderStrategy, and then wanted to change the controller to CODE_DEPLOY. Is this restricted for some reason?
I do not see an option for it when modifying the service in the UI.
I tried the command:
aws ecs update-service --cluster=my-cluster --service=my-service --region=us-east-2 --deploymentController=CODE_DEPLOY
and got the error below:
To see help text, you can run:
aws help
aws <command> help
aws <command> <subcommand> help
Unknown options: --deployment_controller={type=CODE_DEPLOY}
I also tried the variation --deployment_controller=CODE_DEPLOY, but the CLI does not recognize the option itself.
Update:
While creating the service, I noticed that the CODE_DEPLOY controller is disabled when I choose a capacityProviderStrategy. It looks like a limitation by design. Is that true?
ref: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/service_definition_parameters.html
Also, in the AWS console the deployment controller does not appear to be editable (the service has to be recreated): https://i.stack.imgur.com/mRMv3.png
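If the service does have to be recreated, as the update suggests, the controller is set at creation time. A minimal, untested boto3 sketch of the create_service arguments; cluster and service names come from the question, while the task definition, target group ARN, container name/port, subnets and security groups are placeholders. It assumes a FARGATE launchType in place of capacityProviderStrategy (because of the console restriction described above) and includes a load balancer because CodeDeploy blue/green deployments shift traffic through one. The existing service would need to be deleted (or a new name used) first. On the CLI, create-service accepts the same setting as --deployment-controller type=CODE_DEPLOY.

```python
def build_create_service_args(cluster, service, task_definition, target_group_arn,
                              container_name, container_port, subnets, security_groups):
    """Hypothetical helper: keyword arguments for ECS create_service with CODE_DEPLOY."""
    return {
        "cluster": cluster,
        "serviceName": service,
        "taskDefinition": task_definition,
        "desiredCount": 1,
        "launchType": "FARGATE",  # assumption: launch type instead of capacityProviderStrategy
        "deploymentController": {"type": "CODE_DEPLOY"},
        "loadBalancers": [{
            "targetGroupArn": target_group_arn,  # placeholder ARN
            "containerName": container_name,
            "containerPort": container_port,
        }],
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": subnets,
                "securityGroups": security_groups,
                "assignPublicIp": "DISABLED",
            }
        },
    }


def recreate_service(args, region="us-east-2"):
    """Create the service; boto3 is imported lazily so the builder works without it."""
    import boto3
    ecs = boto3.client("ecs", region_name=region)
    return ecs.create_service(**args)
```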

AWS CLI ecs run-task CannotPullContainerError: inspect image has been retried 5 time(s): failed to resolve ref

I'm trying to move from the Console to the CLI.
I have an ECS cluster and a task definition. From the console, I can run a task WITHOUT any issue. The task turns green and I can use the public IP to access my service.
Now I'd like to do the same, but with the AWS CLI instead of the console.
I thought this was enough:
aws ecs run-task --cluster my-cluster \
--task-definition ecs-task-def:9 \
--launch-type FARGATE \
--network-configuration '{ "awsvpcConfiguration": { "subnets": ["subnet-XX1","subnet-XX2"], "securityGroups": ["sg-XXX"],"assignPublicIp": "ENABLED" }}'
However, the task gets stuck in PENDING state and after a while is STOPPED with the following error message:
CannotPullContainerError: inspect image has been retried 5 time(s): failed to resolve ref "docker.io/username/container:latest": failed to do request: Head https://registry-1.docker.io/v2/username/container/manifests/latest: dial tcp x.x.x.x:443: i/o timeout
What concerns me is that I can run tasks from the console with the same arguments (VPC, subnets, security group, etc.), but I cannot make it work with the CLI.
If the problem were missing or wrong rules, neither the console nor the CLI should work.
Does anyone know why?
It looks like ECS cannot pull the image from the registry. The "dial tcp x.x.x.x:443: i/o timeout" part of the error suggests that traffic on port 443 is blocked, so the image cannot be pulled. Have you tried allowing all inbound and outbound traffic on the attached security group, and checking network connectivity from within the attached subnets?
You can create a simple Lambda function attached to the same subnets and security groups, then send a request to the registry endpoint from it (as you would with telnet or curl) to check connectivity.
For example:
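import urllib3

# Lambda entry point; the handler name must match the function's handler setting
def lambda_handler(event, context):
    # Runs inside the VPC, so it tests the same network path the task uses
    http = urllib3.PoolManager()
    url = 'https://your-endpoint-here'
    headers = {"Accept": "application/json"}
    r = http.request(method='GET', url=url, headers=headers)
    print(f'response_status: {r.status}\nresponse_headers: {r.headers}\nresponse_data: {r.data}')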
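A plain TCP check gives the same signal as telnet and is enough to tell a network problem from a registry problem. A minimal sketch, assuming it runs somewhere attached to the same subnets and security groups as the task (for example inside the Lambda handler); registry-1.docker.io:443 is the endpoint from the error message, and the helper name can_connect is hypothetical.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers timeouts, refused connections and DNS failures
        return False


if __name__ == "__main__":
    print("registry reachable:", can_connect("registry-1.docker.io", 443))
```

A False result from inside the VPC while the same check succeeds elsewhere would point at routing (public IP, NAT, route tables) or security group egress rather than at the task definition.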

Where to store AWS credentials in ECS service

I have an ECS service that requires AWS credentials. I use ECR to store Docker images, and a Jenkins server reachable only over VPN to build them.
I see two ways to provide AWS credentials to the service:
Store them as a Jenkins secret and insert them into the Docker image during the build
Make them part of the environment when creating the ECS task definition
Which is more secure? Are there other options?
First, you should not use static AWS credentials when running inside AWS. Assign an IAM role to the task definition or service instead of passing credentials to the Docker build or the task definition.
With IAM roles for Amazon ECS tasks, you can specify an IAM role that can be used by the containers in a task. Applications must sign their AWS API requests with AWS credentials, and this feature provides a strategy for managing credentials for your applications to use, similar to the way that Amazon EC2 instance profiles provide credentials to EC2 instances.
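With a task role in place nothing has to be baked into the image: the AWS SDKs' default credential chain fetches the role's temporary credentials from the ECS container credentials endpoint. A small sketch for checking this from inside a container, assuming the standard AWS_CONTAINER_CREDENTIALS_RELATIVE_URI variable that ECS sets when the task has a role; the function name has_ecs_task_role is hypothetical.

```python
import os


def has_ecs_task_role() -> bool:
    """True if ECS injected task-role credential settings into this container.

    SDKs such as boto3 read AWS_CONTAINER_CREDENTIALS_RELATIVE_URI automatically,
    so the application can create clients without passing access keys.
    """
    return bool(os.environ.get("AWS_CONTAINER_CREDENTIALS_RELATIVE_URI"))


if __name__ == "__main__":
    if has_ecs_task_role():
        print("Using the ECS task role; no static keys needed.")
    else:
        print("No task role detected; check taskRoleArn in the task definition.")
```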
Sometimes the application is not designed to use a role. In that case I recommend passing environment variables through the task definition, but then the question is where their values come from.
Task definitions support two ways to set environment variables:
As plain text, with a direct value
Using the 'valueFrom' attribute in the ECS task definition
The following snippet of a task definition shows the format for referencing a Systems Manager Parameter Store parameter:
{
"containerDefinitions": [{
"secrets": [{
"name": "environment_variable_name",
"valueFrom": "arn:aws:ssm:region:aws_account_id:parameter/parameter_name"
}]
}]
}
This is the method recommended by the AWS documentation, and it is more secure than plain-text environment variables in the task definition or ENV in the Dockerfile.
You can read more in the ECS documentation and in the Systems Manager Parameter Store documentation.
To use this, the task execution role must have permission to read the parameters from Systems Manager Parameter Store (ssm:GetParameters).
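A sketch of the permission side: a Python helper that builds the IAM policy document to attach to the task execution role, which is the role ECS uses to resolve secrets before starting the container. The parameter ARN format follows the task definition snippet above; region, account ID and parameter name are placeholders, the helper name is hypothetical, and SecureString parameters encrypted with a customer managed KMS key would additionally need kms:Decrypt.

```python
import json


def secrets_read_policy(parameter_arns: list[str]) -> dict:
    """Hypothetical helper: IAM policy letting the execution role read the given parameters."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["ssm:GetParameters"],
                "Resource": parameter_arns,
            }
        ],
    }


if __name__ == "__main__":
    arn = "arn:aws:ssm:region:aws_account_id:parameter/parameter_name"  # placeholder
    print(json.dumps(secrets_read_policy([arn]), indent=2))
```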

Register AWS ECS task in service discovery namespace (private hosted zone)

I'm quite bad at using AWS, but I'm trying to automate the setup of an ECS cluster with private DNS names in Route 53, using the new service discovery mechanism. I can click my way through the AWS UI to get a DNS entry to show up in a private hosted zone, but I cannot figure out which parameters to add to the JSON input of the command below to accomplish the same thing.
aws ecs create-service --cli-input-json file://aws/createService.json
Below are the approximate contents of the createService.json referenced above:
"cluster": "clustername",
"serviceName": "servicename",
"taskDefinition": "taskname",
"desiredCount": 1,
// here is where I'm guessing there should be some DNS config referencing some
// namespace or similar that I cannot figure out...
"networkConfiguration": {
"awsvpcConfiguration": {
"subnets": [
"subnet-11111111"
],
"securityGroups": [
"sg-111111111"
],
"assignPublicIp": "DISABLED"
}
}
I'd be grateful for any ideas, since my googling skills apparently aren't good enough for this problem. Many thanks!
To have an ECS service automatically register instances in a service discovery service, use the serviceRegistries attribute. Add the following to the ECS service definition JSON:
{
...
"serviceRegistries": [
{
"registryArn": "arn:aws:servicediscovery:region:aws_account_id:service/srv-utcrh6wavdkggqtk"
}
]
}
The attribute contains a list of service discovery services that ECS should update whenever it creates or destroys a task as part of the service. Each registry is referenced by the ARN of its service discovery service.
To get the ARN, use the AWS CLI command aws servicediscovery list-services.
Strangely, the documentation of the ECS service definition does not contain information about this attribute. However, the tutorial about service discovery does.
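The same two steps can be scripted with boto3: look the ARN up with the list_services paginator (the API behind aws servicediscovery list-services), then pass it to create_service as serviceRegistries. A minimal, untested sketch; the service discovery service is assumed to exist already, the argument dictionary mirrors the question's createService.json, and the helper names are hypothetical.

```python
def find_service_arn(pages, name):
    """Return the ARN of the service discovery service called `name`, or None."""
    for page in pages:
        for svc in page.get("Services", []):
            if svc.get("Name") == name:
                return svc["Arn"]
    return None


def with_service_registry(service_args, registry_arn):
    """Return a copy of the create_service arguments with serviceRegistries added."""
    args = dict(service_args)
    args["serviceRegistries"] = [{"registryArn": registry_arn}]
    return args


def create_registered_service(service_args, discovery_name):
    import boto3  # lazy import: the helpers above need no AWS access
    sd = boto3.client("servicediscovery")
    pages = sd.get_paginator("list_services").paginate()
    arn = find_service_arn(pages, discovery_name)
    if arn is None:
        raise ValueError(f"no service discovery service named {discovery_name!r}")
    ecs = boto3.client("ecs")
    return ecs.create_service(**with_service_registry(service_args, arn))
```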
As it turns out, aws ecs create-service has no support for adding the service to the service registry, i.e. the Route 53 private hosted zone. Instead, I had to use aws servicediscovery create-service and then aws servicediscovery register-instance to finally get an entry in my private hosted zone.
This became quite a complicated solution, so I'll give Terraform a shot instead, since I found it recently added support for ECS service discovery, and see where that takes me...

Useless Amazon ECS Error Message when creating tasks

Using the ECS agent container on an Ubuntu instance, I am able to register the agent with my cluster.
I have also created a service and task definitions in that cluster. When I try to add a task to the cluster, I get this useless error message:
Run tasks failed
Reasons : ["ATTRIBUTE"]
The ECS agent log has no related error message. Any thoughts on how I can get better debugging output, or what the issue might be?
The CLI returns the same useless error message:
{
"tasks": [],
"failures": [
{
"arn": "arn:aws:ecs:us-east-1:sssssss:container-instance/sssssssssssss",
"reason": "ATTRIBUTE"
}
]
}
From the troubleshooting guide:
ATTRIBUTE (container instance ID)
Your task definition contains a parameter that requires a specific container instance attribute that is not available on your container instances. For more information on which attributes are required for specific task definition parameters and agent configuration variables, see Task Definition Parameters and Amazon ECS Container Agent Configuration.
You can find the attributes required for your task definition by looking at the requiresAttributes field. You can find the attributes that are present for your container instances in the result of the DescribeContainerInstances API call.
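That comparison can be scripted. A minimal sketch, assuming the dictionaries come from boto3's describe_task_definition (whose taskDefinition carries the list under the key requiresAttributes) and describe_container_instances (each instance's attributes list); the helper names are hypothetical, and the check only compares attribute names and, where a value is given, values.

```python
def missing_attributes(task_definition: dict, container_instance: dict) -> list[str]:
    """Names of attributes the task requires that the container instance lacks."""
    have = {(a["name"], a.get("value")) for a in container_instance.get("attributes", [])}
    have_names = {name for name, _ in have}
    missing = []
    for req in task_definition.get("requiresAttributes", []):
        name, value = req["name"], req.get("value")
        # An attribute without a value only needs to exist; one with a value must match.
        ok = (name, value) in have if value is not None else name in have_names
        if not ok:
            missing.append(name)
    return missing


def check(cluster, task_def_arn, container_instance_arn):
    import boto3  # lazy import so missing_attributes can be used offline
    ecs = boto3.client("ecs")
    td = ecs.describe_task_definition(taskDefinition=task_def_arn)["taskDefinition"]
    ci = ecs.describe_container_instances(
        cluster=cluster, containerInstances=[container_instance_arn]
    )["containerInstances"][0]
    return missing_attributes(td, ci)
```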
The ECS console webpage does not provide enough information, but you can connect to the EC2 instance to retrieve more logs.
You can try manually restarting the ECS agent, either the daemon or the ecs-agent Docker container.
Sometimes you also need to manually delete the agent's checkpoint file.
A cheat sheet with log locations and commands can be found at ecs-agent troubleshoot.