Streaming Cloudwatch Logs to Amazon ES - amazon-web-services

I'm using Fargate to deploy my application, with awslogs as the log driver for the container logs. Now I want to ship those logs to the Amazon ES service. While going through the docs for streaming, I encountered a note that says:
Streaming large amounts of CloudWatch Logs data to other destinations might result in high usage charges.
I want to understand exactly what I will be billed for while shipping the logs to ELK, and how "large amounts" is defined.
Will I be billed for
a) CloudWatch?
b) The log driver?
c) The Lambda function? Does every log line trigger a Lambda invocation?
Lastly, is there still a possibility to lower the cost further?

Personally I would look at running Fluentd or Fluent Bit in another container alongside your application: https://docs.fluentbit.io/manual/pipeline/outputs/elasticsearch
You can then send your logs directly to ES without any CloudWatch costs. A minimal Fluent Bit output block is sketched below.
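As a rough sketch of that idea (the host, index, and credentials here are placeholders, not values from the question), the Fluent Bit es output section would look something like this:
# fluent-bit.conf (sketch) -- host and index are placeholders
[OUTPUT]
    Name  es
    Match *
    Host  YOUR_ES_DOMAIN_URL
    Port  443
    tls   On
    Index INDEX_NAME
The FireLens task definition in the edit below ends up passing these same options to the es output plugin for you.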
EDIT
Here's the final solution, just in case someone is looking for a cheaper option.
Run Fluentd/Fluent Bit in another container alongside your application.
Using the config from GitHub as a starting point, I was able to forward the logs to ES with the task definition below.
{
  "family": "workflow",
  "cpu": "256",
  "memory": "512",
  "containerDefinitions": [
    {
      "name": "log_router",
      "image": "docker.io/amazon/aws-for-fluent-bit:latest",
      "essential": true,
      "firelensConfiguration": {
        "type": "fluentbit",
        "options": {
          "enable-ecs-log-metadata": "true"
        }
      },
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-create-group": "true",
          "awslogs-group": "your_log_group",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs"
        }
      },
      "memoryReservation": 50
    },
    {
      "name": "ContainerName",
      "image": "YourImage",
      "cpu": 0,
      "memoryReservation": 128,
      "portMappings": [
        {
          "containerPort": 5005,
          "protocol": "tcp"
        }
      ],
      "essential": true,
      "command": [
        "YOUR COMMAND"
      ],
      "environment": [],
      "logConfiguration": {
        "logDriver": "awsfirelens",
        "secretOptions": [],
        "options": {
          "Name": "es",
          "Host": "YOUR_ES_DOMAIN_URL",
          "Port": "443",
          "tls": "On",
          "Index": "INDEX_NAME",
          "Type": "TYPE"
        }
      },
      "resourceRequirements": []
    }
  ]
}
The log_router container collects the logs and ships them to ES. For more info, refer to Custom Log Routing.
Please note that the log_router container is required on Fargate, but not on ECS with the EC2 launch type.
This is the cheapest solution I know of that does not involve CloudWatch, Lambdas, or Kinesis. A registration command is sketched below.
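For completeness, a sketch of how you might register this definition and roll a service onto it; the file name, cluster, and service names are placeholders, only the "workflow" family comes from the JSON above:
# Register the task definition and redeploy the service onto the new revision.
# "task-def.json", "my-cluster" and "my-service" are placeholder names.
aws ecs register-task-definition --cli-input-json file://task-def.json
aws ecs update-service --cluster my-cluster --service my-service \
    --task-definition workflow --force-new-deployment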

Like every resource, AWS charges for use and for maintenance. Therefore, the charges will be for executing the Lambda function and for storing the data in CloudWatch. The reason they mention that "streaming large amounts of CloudWatch Logs data to other destinations might result in high usage charges" is that it takes time for the Lambda function to process a log and insert it into ES; when you stream a large number of logs, the Lambda function runs for longer.
Lambda function? Does every log line trigger a Lambda function?
Yes, when you enable streaming from CloudWatch to ES, every log inserted into CloudWatch triggers the Lambda function.
(Screenshot from a demonstration showing the CloudWatch Logs trigger on the Lambda function omitted.)
Is there still a possibility to lower the cost more?
The only way to lower the cost (when using this implementation) is to write your own Lambda function that is triggered every X seconds/minutes and inserts the logs into ES in batches. As far as I can tell, the cost gap will be negligible.
More information:
Lambda code.
How this works behind the scenes.
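For context, as far as I can tell the console's "stream to Amazon ES" setup is essentially a subscription filter on the log group that invokes the generated Lambda for every log event; something along these lines, with placeholder names for the group, account, and function:
# Sketch: subscription filter that forwards every event in the log group
# to the ES-indexing Lambda. Group, account and function names are placeholders.
aws logs put-subscription-filter \
    --log-group-name your_log_group \
    --filter-name stream-to-es \
    --filter-pattern "" \
    --destination-arn arn:aws:lambda:us-east-1:123456789012:function:LogsToElasticsearch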

Related

AWS Batch Job stuck RUNNABLE when Launch template is configured

I have configured a Step Function with AWS Batch jobs. All of the configuration works well, but I need to customize the starting instance. For this purpose I use the Launch Template service and built a simple (empty) configuration based on the instance type used in the AWS Batch configuration. When the compute environment is built with the launch template, the Batch job is stuck in the RUNNABLE stage. When I run the AWS Batch job without the launch template, everything works OK. Launching an instance from the template also works OK. Could anyone give me any advice on what is wrong or missing? Below are the definitions of all the stack elements.
Launch Template definition
Compute environment details Overview
Compute environment name senet-cluster-r5ad-2xlarge-v3-4
Compute environment ARN arn:aws:batch:eu-central-1:xxxxxxxxxxx:compute-environment/senet-cluster-r5ad-2xlarge-v3-4
ECS Cluster name arn:aws:ecs:eu-central-1:xxxxxxxxxxxx:cluster/senet-cluster-r5ad-2xlarge-v3-4_Batch_3323aafe-d7a4-3cfe-91e5-c1079ee9d02e
Type MANAGED
Status VALID
State ENABLED
Service role arn:aws:iam::xxxxxxxxxxx:role/service-role/AWSBatchServiceRole
Compute resources
Minimum vCPUs 0
Desired vCPUs 0
Maximum vCPUs 25
Instance types r5ad.2xlarge
Allocation strategy BEST_FIT
Launch template lt-023ebdcd5df6073df
Launch template version $Default
Instance role arn:aws:iam::xxxxxxxxxxx:instance-profile/ecsInstanceRole
Spot fleet role
EC2 Keypair senet-test-keys
AMI id ami-0b418580298265d5c
vpcId vpc-0917ea63
Subnets subnet-49332034, subnet-8902a7e3, subnet-9de503d1
Security groups sg-cdbbd9af, sg-047ea19daf36aa269
AWS Batch Job Definition
{
  "jobDefinitionName": "senet-cluster-job-def-3",
  "jobDefinitionArn": "arn:aws:batch:eu-central-1:xxxxxxxxxxxxxx:job-definition/senet-cluster-job-def-3:9",
  "revision": 9,
  "status": "ACTIVE",
  "type": "container",
  "parameters": {},
  "containerProperties": {
    "image": "xxxxxxxxxxx.dkr.ecr.eu-central-1.amazonaws.com/senet/batch-process:latest",
    "vcpus": 4,
    "memory": 60000,
    "command": [],
    "jobRoleArn": "arn:aws:iam::xxxxxxxxxxxxx:role/AWSS3BatchFullAccess-senet",
    "volumes": [],
    "environment": [
      {
        "name": "BATCH_FILE_S3_URL",
        "value": "s3://senet-batch/senet_jobs.sh"
      },
      {
        "name": "AWS_DEFAULT_REGION",
        "value": "eu-central-1"
      },
      {
        "name": "BATCH_FILE_TYPE",
        "value": "script"
      }
    ],
    "mountPoints": [],
    "ulimits": [],
    "user": "root",
    "resourceRequirements": [],
    "linuxParameters": {
      "devices": []
    }
  }
}
For those of you who had the same problem, here is the solution that worked for me. It took me a few days to figure it out.
The default AWS AMI snapshot needs at least 30 GB of storage. When you do not have a launch template, CloudFormation uses the correct storage size.
In my case, I defined only 8 GB of storage in my launch template, and when the launch template was used, the jobs got stuck in RUNNABLE.
Simply change the storage in your launch template to anything bigger than 30 GB and it should work.
Also, do not forget that IamInstanceProfile and SecurityGroupIds are required in the launch template for the job to get started. A minimal sketch is shown below.
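A minimal sketch of launch template data covering those points, to be passed via aws ec2 create-launch-template --launch-template-data file://lt.json; the device name assumes the ECS-optimized Amazon Linux AMI, and the profile ARN and security group ID are placeholders:
{
  "BlockDeviceMappings": [
    {
      "DeviceName": "/dev/xvda",
      "Ebs": { "VolumeSize": 30, "VolumeType": "gp2" }
    }
  ],
  "IamInstanceProfile": { "Arn": "arn:aws:iam::123456789012:instance-profile/ecsInstanceRole" },
  "SecurityGroupIds": [ "sg-0123456789abcdef0" ]
}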

After updating Fargate TaskDefinition, CloudWatch events that trigger tasks fail because of inactive task definitions

I have a series of tasks defined in ECS that run on a recurring schedule. I recently made a minor change to update my task definition in Terraform to change default environment variables for my container (from DEBUG to PRODUCTION):
"environment": [
{"name": "ENVIRONMENT", "value": "PRODUCTION"}
]
I had this task running using the Scheduled Tasks feature of Fargate, setting it at a rate of every 4 hours. However, after updating my task definition, I began to see that the tasks were not being triggered by CloudWatch, since my last container log was from several days ago.
I dug deeper into the issue using CloudTrail, and noticed one particular part of the entry for a RunTask event:
"eventTime": "2018-12-10T17:26:46Z",
"eventSource": "ecs.amazonaws.com",
"eventName": "RunTask",
"awsRegion": "us-east-1",
"sourceIPAddress": "events.amazonaws.com",
"userAgent": "events.amazonaws.com",
"errorCode": "InvalidParameterException",
"errorMessage": "TaskDefinition is inactive",
Further down in the log, I noticed that the task definition ECS was attempting to run was
"taskDefinition": "arn:aws:ecs:us-east-1:XXXXX:task-
definition/important-task-name:2",
However, in my ECS task definitions, the latest version of important-task-name was 3. So it looks like the events are not triggering because I am using an "inactive" version of my task definition.
Is there any way for me to schedule tasks in AWS Fargate without having to manually go through the console and stop/restart/update each cluster's scheduled update? Isn't there any way to simply ask CloudWatch to pull the latest active task definition?
You can use CloudWatch Event Rules to control scheduled tasks and whenever you update a task definition you can also update your rule. Say you have two files:
myRule.json
{
  "Name": "run-every-minute",
  "ScheduleExpression": "cron(0/1 * * * ? *)",
  "State": "ENABLED",
  "Description": "a task that will run every minute",
  "RoleArn": "arn:aws:iam::${IAM_NUMBER}:role/ecsEventsRole",
  "EventBusName": "default"
}
myTargets.json
{
  "Rule": "run-every-minute",
  "Targets": [
    {
      "Id": "scheduled-task-example",
      "Arn": "arn:aws:ecs:${REGION}:${IAM_NUMBER}:cluster/mycluster",
      "RoleArn": "arn:aws:iam::${IAM_NUMBER}:role/ecsEventsRole",
      "Input": "{\"containerOverrides\":[{\"name\":\"myTask\",\"environment\":[{\"name\":\"ENVIRONMENT\",\"value\":\"production\"},{\"name\":\"foo\",\"value\":\"bar\"}]}]}",
      "EcsParameters": {
        "TaskDefinitionArn": "arn:aws:ecs:${REGION}:${IAM_NUMBER}:task-definition/myTaskDefinition",
        "TaskCount": 1,
        "LaunchType": "FARGATE",
        "NetworkConfiguration": {
          "awsvpcConfiguration": {
            "Subnets": [
              "subnet-xyz1",
              "subnet-xyz2"
            ],
            "SecurityGroups": [
              "sg-xyz"
            ],
            "AssignPublicIp": "ENABLED"
          }
        },
        "PlatformVersion": "LATEST"
      }
    }
  ]
}
Now, whenever there's a new revision of myTaskDefinition you can update your rule and targets, e.g.:
aws events put-rule --cli-input-json file://myRule.json --region $REGION
aws events put-targets --cli-input-json file://myTargets.json --region $REGION
echo 'done'
But of course, replace IAM_NUMBER and REGION with your account number and region. A small deployment script along these lines is sketched below.
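As a sketch of gluing this into a deployment step, resolving the newest ACTIVE revision explicitly and patching it into the targets file (assumes jq is installed; the task definition and file names are the ones used above):
#!/bin/bash
# Sketch: look up the most recent ACTIVE revision of the task definition and
# point the scheduled-task target at it before re-applying the rule.
set -euo pipefail
LATEST_ARN=$(aws ecs describe-task-definition \
  --task-definition myTaskDefinition \
  --query 'taskDefinition.taskDefinitionArn' --output text --region "$REGION")
# Patch the resolved ARN into the targets file (requires jq).
jq --arg arn "$LATEST_ARN" '.Targets[0].EcsParameters.TaskDefinitionArn = $arn' \
  myTargets.json > myTargets.patched.json
aws events put-rule --cli-input-json file://myRule.json --region "$REGION"
aws events put-targets --cli-input-json file://myTargets.patched.json --region "$REGION"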
Cloud Map seems like a solution for these types of problems.
https://aws.amazon.com/about-aws/whats-new/2018/11/aws-fargate-and-amazon-ecs-now-integrate-with-aws-cloud-map/

ECS task_definition environment variable needs IP address

So I have two container definitions for a service that I am trying to run on ECS. One of the services (Kafka) requires the IP address of the other service (Zookeeper). In the pure Docker world we can achieve this using the name of the container; however, in AWS the container name has a suffix appended by AWS to create a unique name, so how do we achieve the same behaviour?
Currently my Terraform task definitions look like:
[
  {
    "name": "${service_name}",
    "image": "zookeeper:latest",
    "cpu": 1024,
    "memory": 1024,
    "essential": true,
    "portMappings": [
      { "containerPort": ${container_port}, "protocol": "tcp" }
    ],
    "networkMode": "awsvpc"
  },
  {
    "name": "kafka",
    "image": "ches/kafka:latest",
    "environment": [
      { "name": "ZOOKEEPER_IP", "value": "${service_name}" }
    ],
    "cpu": 1024,
    "memory": 1024,
    "essential": true,
    "networkMode": "awsvpc"
  }
]
I don't know enough about the rest of the setup to give really concrete advice, but there are a few options:
Put both containers in the same task and use links between them.
Use Route 53 auto naming to get DNS names for each service task and specify those in the task definition environment; this is also described as ECS service discovery (see the Terraform sketch at the end of this answer).
Put the service tasks behind a load balancer, use DNS names from Route 53 and possibly host matching on the load balancer, and specify those DNS names in the task definition environment.
Consider using some kind of service discovery / service mesh framework (Consul, for instance).
There are posts describing some of the alternatives. Here's one:
How to setup service discovery in Amazon ECS
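A rough Terraform sketch of the Route 53 auto naming option, under the assumption that a task definition resource and the vpc/cluster/subnet variables already exist in your configuration (they are placeholders here):
# Sketch: ECS service discovery so Kafka can reach Zookeeper at a stable
# DNS name such as zookeeper.local instead of a hard-coded IP.
resource "aws_service_discovery_private_dns_namespace" "local" {
  name = "local"
  vpc  = var.vpc_id
}

resource "aws_service_discovery_service" "zookeeper" {
  name = "zookeeper"
  dns_config {
    namespace_id = aws_service_discovery_private_dns_namespace.local.id
    dns_records {
      ttl  = 10
      type = "A"
    }
  }
}

resource "aws_ecs_service" "zookeeper" {
  name            = "zookeeper"
  cluster         = var.cluster_id
  task_definition = aws_ecs_task_definition.zookeeper.arn
  desired_count   = 1

  network_configuration {
    subnets         = var.subnet_ids
    security_groups = [var.security_group_id]
  }

  service_registries {
    registry_arn = aws_service_discovery_service.zookeeper.arn
  }
}
With something like this in place, the Kafka container's ZOOKEEPER_IP environment variable could point at zookeeper.local rather than an IP address.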

AWS ECS Service for Wordpress

I created a service for wordpress on AWS ECS with the following container definitions
{
  "containerDefinitions": [
    {
      "name": "wordpress",
      "links": [
        "mysql"
      ],
      "image": "wordpress",
      "essential": true,
      "portMappings": [
        {
          "containerPort": 0,
          "hostPort": 80
        }
      ],
      "memory": 250,
      "cpu": 10
    },
    {
      "environment": [
        {
          "name": "MYSQL_ROOT_PASSWORD",
          "value": "password"
        }
      ],
      "name": "mysql",
      "image": "mysql",
      "cpu": 10,
      "memory": 250,
      "essential": true
    }
  ],
  "family": "wordpress"
}
Then I went over to the public IP and completed the WordPress installation. I also added a few posts.
But now, when I update the service to use an updated task definition (updated mysql container image)
"image": "mysql:latest"
I lose all the created posts and data, and WordPress prompts me to install again.
What am I doing wrong?
I also tried to use host volumes, but to no avail: it creates a bind mount and a Docker-managed volume (I did a docker inspect on the container).
So, every time I update the task it resets WordPress.
If your container needs access to the original data each time it starts, you require a file system that your containers can connect to regardless of which instance they're running on. That's where EFS comes in.
EFS allows you to persist data onto a durable shared file system that all of the ECS container instances in the ECS cluster can use. A sketch of wiring an EFS volume into the task definition follows below.
Step-by-step Instructions to Setup an AWS ECS Cluster
Using Data Volumes in Tasks
Using Amazon EFS to Persist Data from Amazon ECS Containers
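As a rough sketch of that approach (the file system ID and root directory are placeholders, and it assumes an ECS agent version that supports efsVolumeConfiguration), the MySQL data directory could be mapped onto EFS like this:
{
  "family": "wordpress",
  "volumes": [
    {
      "name": "mysql-data",
      "efsVolumeConfiguration": {
        "fileSystemId": "fs-12345678",
        "rootDirectory": "/mysql"
      }
    }
  ],
  "containerDefinitions": [
    {
      "name": "mysql",
      "image": "mysql",
      "essential": true,
      "memory": 250,
      "mountPoints": [
        { "sourceVolume": "mysql-data", "containerPath": "/var/lib/mysql" }
      ]
    }
  ]
}
With the data on EFS, replacing the mysql container image no longer wipes the WordPress database.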

Where are the volumes located when using ECS and Fargate?

I have the following setup (I've stripped out the non-important fields):
{
  "ECSTask": {
    "Type": "AWS::ECS::TaskDefinition",
    "Properties": {
      "ContainerDefinitions": [
        {
          "Name": "mysql",
          "Image": "mysql",
          "MountPoints": [
            { "SourceVolume": "mysql", "ContainerPath": "/var/lib/mysql" }
          ]
        }
      ],
      "RequiresCompatibilities": ["FARGATE"],
      "Volumes": [{ "Name": "mysql" }]
    }
  }
}
It seems to work (the container does start properly), but I'm not quite sure where exactly this volume is being saved. I assumed it would be an EBS volume, but I don't see it there. I guess it's internal to my task, but in that case, how do I access it? How can I control its limits (min/max size, etc.)? How can I create a backup for this volume?
Thanks.
Fargate does not support persistent volumes. Any volumes attached to Fargate tasks are ephemeral and, sadly, cannot be initialized from an external source or backed up.
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/using_data_volumes.html