I would like to programmatically determine, at runtime, the log group where the logs from my Batch jobs are sent.
This page indicates that one can do so with a JSON object specifying:
"logConfiguration": {
"logDriver": "awslogs",
"options": {
"awslogs-group": "awslogs-wordpress",
"awslogs-stream-prefix": "awslogs-example"
}
}
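That block normally lives inside containerProperties when the job definition is registered, so for context, here is a minimal boto3 sketch of where it would sit (the job definition name, image and resource values are placeholders):
import boto3

batch = boto3.client('batch')

# Hypothetical job definition; the logConfiguration block from the docs
# sits inside containerProperties.
batch.register_job_definition(
    jobDefinitionName='my-job-def',
    type='container',
    containerProperties={
        'image': 'my-image',
        'vcpus': 1,
        'memory': 2048,
        'logConfiguration': {
            'logDriver': 'awslogs',
            'options': {
                'awslogs-group': 'awslogs-wordpress',
                'awslogs-stream-prefix': 'awslogs-example'
            }
        }
    }
)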
However, the syntax for submit_job seems to only allow string:string parameters to be sent to the job definition:
jobDefinition='string',
parameters={
'string': 'string'
},
How do I send the nested LogConfiguration options into my submitted job? My current command is:
response = client.submit_job(
jobDefinition=job_def,
jobName=options['file_suffix'],
jobQueue='HLM2',
arrayProperties={'size': job_array_size},
tags={'OutputBucket': options['output_bucket'],
'SubFolder': options['subfolder']},
containerOverrides={
'environment': [
{
'name': 'COMMAND_FILE',
'value': file_key
}
]
},
propagateTags=True)
If someone could provide a specific example of how to add the LogConfiguration override, I'd appreciate it.
Related
Currently, I am working on an ASP.NET 6 Web API, and we use Serilog as the logger to send all the necessary details to CloudWatch, which works fine. Now I need to also send library logs, such as AWS errors, to CloudWatch. There is currently an option for that in the config file, but it only saves the logs to a file, which eventually results in a No space left on device : '/app/Logs/serilog-aws-errors.txt' error, and the details in that file never appear in the CloudWatch logs.
This is the appsettings data I use,
"Serilog": {
"Using": [ "AWS.Logger.SeriLog", "Serilog.Sinks.Console", "Serilog.Sinks.File" ],
"MinimumLevel": "Debug",
"WriteTo": [
{ "Name": "AWSSeriLog" },
{ "Name": "Console" },
{
"Name": "File",
"Args": {
"path": "Logs/webapi-.txt",
"rollingInterval": "Day"
}
}
],
"Region": "eu-west-2",
"LogGroup": "/development/serilog",
"LibraryLogFileName": "Logs/serilog-aws-errors.txt"
}
Is there a way to send the details in serilog-aws-errors.txt to AWS CloudWatch or to an S3 bucket?
This depends a lot on where in AWS you are trying to deploy your service. In ECS or Fargate you can log directly to the console. Here is a snippet of the container definition:
"containerDefinitions": [
{
"logConfiguration": {
"logDriver": "awslogs",
"options": {
"awslogs-group": "/dev/ecs/my-api-logs-here",
"awslogs-region": "us-east-1",
"awslogs-stream-prefix": "ecs"
}
},
With the configuration above, you only need Serilog.Sinks.Console and everything will be logged without a special AWS sink. To write to the console you can just use:
loggerConfiguration.WriteTo.Async(a =>
{
a.Console(new JsonFormatter());
});
When deployed to Fargate or ECS, these console logs will appear in your CloudWatch logs. No additional sink is necessary. Lambda logs have a similar setup. See: https://docs.aws.amazon.com/lambda/latest/dg/csharp-logging.html for more details.
If you want to use Serilog.Sinks.AwsCloudWatch, it does have some nice features, but the setup is a little different. You probably won't want to log to the console or your file sink at all; instead, you'll log directly to CloudWatch. You'll want to set it up according to the instructions on GitHub: https://github.com/Cimpress-MCP/serilog-sinks-awscloudwatch. You can get this up and running in your local environment, then set your app settings up so that this only runs when deployed to AWS, while locally you still use the console or file settings.
var options = new CloudWatchSinkOptions
{
// the name of the CloudWatch Log group for logging
LogGroupName = logGroupName,
// the main formatter of the log event
TextFormatter = formatter,
// other defaults
MinimumLogEventLevel = LogEventLevel.Information,
BatchSizeLimit = 100,
QueueSizeLimit = 10000,
Period = TimeSpan.FromSeconds(10),
CreateLogGroup = true,
LogStreamNameProvider = new DefaultLogStreamProvider(),
RetryAttempts = 5
};
// setup AWS CloudWatch client
var client = new AmazonCloudWatchLogsClient(myAwsRegion);
// Attach the sink to the logger configuration
Log.Logger = new LoggerConfiguration()
.WriteTo.AmazonCloudWatch(options, client)
.CreateLogger();
I have built, using boto3, a workflow that creates a compute environment, creates a job queue, registers a job definition, and finally submits a job. Running the 'ls' command works fine; however, the command 'docker run hello-world' does not work.
Code to create the compute environment:
response = client.create_compute_environment(
computeEnvironmentName=com_env_name,
type='MANAGED',
state='ENABLED',
computeResources={
'type': 'EC2',
'allocationStrategy': 'BEST_FIT',
'minvCpus': 0,
'maxvCpus': 5,
'instanceTypes': [
'c3.large',
],
'ec2Configuration': [{
'imageType': 'ECS_AL2',
}],
'subnets': [
subnet_id,
],
'securityGroupIds': [
sec_gr_id,
],
'instanceRole': 'ecsInstanceRole',
},
serviceRole = 'arn:aws:iam::blabla
)
The job queue is defined as:
response = batch_client.create_job_queue(
jobQueueName=queue_name,
state='ENABLED',
priority=1,
computeEnvironmentOrder=[
{
'order': 1,
'computeEnvironment': com_env_name
},
],
)
My goal is to run 'docker run hello-world'. The job definition is defined as follows:
response = batch.register_job_definition(
jobDefinitionName=job_def_name,
type='container',
containerProperties={
'image': 'custom-image',
'memory': 2048,
'vcpus': 2,
'command': ['ls'],
'environment': [
{
'name': "DOCKER_HOST",
'value': "unix:///var/run/docker.sock"
},
],
'volumes': [
{
'host': {
'sourcePath': '//var/run/docker.sock'
},
'name': 'docker'
}],
'mountPoints': [
{
'containerPath': '/var/run/docker.sock',
'sourceVolume': 'docker'
}],
},
)
Are the volumes and mount points properly set? What is missing? Is there a connection between the Docker daemons that needs to be established? The output error after submitting the job is:
CannotStartContainerError: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: exec: "docker run hello-world": executable file not found in $PATH: unknown
The code for job submission is:
response = batch.submit_job(
jobDefinition=job_def_name,
jobName=job_nom,
jobQueue=job_queue_name,
containerOverrides={
'command': ['docker run hello-world',]
    }
)
I notice two things:
You have an extra slash in your sourcePath
The error message you get seems to indicate that the docker executable doesn't exist in the image you're running. You'll need to use an image that supports Docker-in-Docker, such as the standard docker image.
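For what it's worth, here is a rough boto3 sketch of a definition and submission with those two fixes applied - the docker:stable image is an assumption (any image that ships the docker CLI should do), and note that the command override is a list of separate arguments, which is why the single string "docker run hello-world" was looked up as one executable:
import boto3

batch = boto3.client('batch')

# Sketch only: an image that contains the docker CLI, and a single-slash
# socket path on the host.
batch.register_job_definition(
    jobDefinitionName='dind-test',
    type='container',
    containerProperties={
        'image': 'docker:stable',
        'vcpus': 2,
        'memory': 2048,
        'volumes': [{
            'host': {'sourcePath': '/var/run/docker.sock'},
            'name': 'docker'
        }],
        'mountPoints': [{
            'containerPath': '/var/run/docker.sock',
            'sourceVolume': 'docker'
        }],
    },
)

# Each argument is its own list element, not one big string.
batch.submit_job(
    jobDefinition='dind-test',
    jobName='hello-world-test',
    jobQueue='my-queue',
    containerOverrides={'command': ['docker', 'run', 'hello-world']},
)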
I was able to setup AutoScaling events as rules in EventBridge to trigger SSM Commands, but I've noticed that with my chosen Target Value the event is passed to all my active EC2 Instances. My Target key is a tag shared by those instances, so my mistake makes sense now.
I'm pretty new to EventBridge, so I was wondering if there's a way to actually target the instance that triggered the AutoScaling event (as in extracting the "InstanceId" that's present in the event data and use that as my new Target Value). I saw the Input Transformer, but I think that just transforms the event data to pass to the target.
Thanks!
EDIT - help with js code for Lambda + SSM RunCommand
I realize I can achieve this by setting EventBridge to invoke a Lambda function instead of the SSM RunCommand directly. Can anyone help with the JavaScript code to call a shell command on the EC2 instance specified in the event data (event.detail.EC2InstanceId)? I can't seem to find a relevant and up-to-date base template online, and I'm not familiar enough with JS or Lambda. Any help is greatly appreciated! Thanks
Sample of event data, as per the AWS docs:
{
"version": "0",
"id": "12345678-1234-1234-1234-123456789012",
"detail-type": "EC2 Instance Launch Successful",
"source": "aws.autoscaling",
"account": "123456789012",
"time": "yyyy-mm-ddThh:mm:ssZ",
"region": "us-west-2",
"resources": [
"auto-scaling-group-arn",
"instance-arn"
],
"detail": {
"StatusCode": "InProgress",
"Description": "Launching a new EC2 instance: i-12345678",
"AutoScalingGroupName": "my-auto-scaling-group",
"ActivityId": "87654321-4321-4321-4321-210987654321",
"Details": {
"Availability Zone": "us-west-2b",
"Subnet ID": "subnet-12345678"
},
"RequestId": "12345678-1234-1234-1234-123456789012",
"StatusMessage": "",
"EndTime": "yyyy-mm-ddThh:mm:ssZ",
"EC2InstanceId": "i-1234567890abcdef0",
"StartTime": "yyyy-mm-ddThh:mm:ssZ",
"Cause": "description-text"
}
}
Edit 2 - my Lambda code so far
'use strict'
const ssm = new (require('aws-sdk/clients/ssm'))()
exports.handler = async (event) => {
const instanceId = event.detail.EC2InstanceId
var params = {
DocumentName: "AWS-RunShellScript",
InstanceIds: [ instanceId ],
TimeoutSeconds: 30,
Parameters: {
commands: ["/path/to/my/ec2/script.sh"],
workingDirectory: [],
executionTimeout: ["15"]
}
};
const data = await ssm.sendCommand(params).promise()
const response = {
statusCode: 200,
body: "Run Command success",
};
return response;
}
Yes, but through Lambda
EventBridge -> Lambda (using SSM api) -> EC2
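If it helps, here is a rough boto3 sketch of wiring the Auto Scaling launch event to the Lambda function (the rule name, function name and ARNs are placeholders):
import json
import boto3

events = boto3.client('events')
lambda_client = boto3.client('lambda')

# Match successful instance launches from the Auto Scaling group.
events.put_rule(
    Name='asg-launch-success',
    EventPattern=json.dumps({
        'source': ['aws.autoscaling'],
        'detail-type': ['EC2 Instance Launch Successful'],
        'detail': {'AutoScalingGroupName': ['my-auto-scaling-group']}
    }),
    State='ENABLED',
)

# Point the rule at the Lambda function that calls SSM SendCommand.
events.put_targets(
    Rule='asg-launch-success',
    Targets=[{
        'Id': 'run-command-lambda',
        'Arn': 'arn:aws:lambda:REGION:ACCOUNT_ID:function:my-function'
    }],
)

# Allow EventBridge to invoke that function.
lambda_client.add_permission(
    FunctionName='my-function',
    StatementId='allow-eventbridge',
    Action='lambda:InvokeFunction',
    Principal='events.amazonaws.com',
    SourceArn='arn:aws:events:REGION:ACCOUNT_ID:rule/asg-launch-success',
)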
Thank you @Sándor Bakos for helping me out!! My JavaScript ended up not working for some reason, so I ended up just using part of the Python code linked in the comments.
1. Add ssm:SendCommand permission:
After I let Lambda create a basic role during function creation, I added an inline policy to allow Systems Manager's SendCommand action. This needs access to your documents/*, instances/* and managed-instances/* resources.
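In case it's useful, this is roughly what attaching that inline policy looks like with boto3 (the role name, policy name, region and account id are placeholders; the console works just as well):
import json
import boto3

iam = boto3.client('iam')

# Inline policy allowing the Lambda role to call ssm:SendCommand on the
# document, instance and managed-instance resources mentioned above.
iam.put_role_policy(
    RoleName='my-lambda-basic-role',
    PolicyName='allow-ssm-send-command',
    PolicyDocument=json.dumps({
        'Version': '2012-10-17',
        'Statement': [{
            'Effect': 'Allow',
            'Action': 'ssm:SendCommand',
            'Resource': [
                'arn:aws:ssm:REGION:ACCOUNT_ID:document/*',
                'arn:aws:ec2:REGION:ACCOUNT_ID:instance/*',
                'arn:aws:ssm:REGION:ACCOUNT_ID:managed-instance/*'
            ]
        }]
    }),
)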
2. Code - Python 3.9
import boto3
import botocore
import time
def lambda_handler(event=None, context=None):
    try:
        client = boto3.client('ssm')
        instance_id = event['detail']['EC2InstanceId']
        command = '/path/to/my/script.sh'
        client.send_command(
            InstanceIds=[instance_id],
            DocumentName='AWS-RunShellScript',
            Parameters={
                'commands': [command],
                'executionTimeout': ['60']
            }
        )
    except botocore.exceptions.ClientError as error:
        # Surface SSM errors in the Lambda logs
        raise error
You can do this without using Lambda, as I just did, by using EventBridge's input transformers.
I specified a new automation document that called the document I was trying to use (AWS-ApplyAnsiblePlaybooks).
My document takes the InstanceId as a parameter and is passed this by the input transformer from EventBridge. I had to pass the event into Lambda just to see how to parse the JSON event object for the desired instance ID - this ended up being
$.detail.EC2InstanceId
(it was coming from an Auto Scaling group).
I then passed it into a template that was used for the runbook
{"InstanceId":[<instance>]}
This template was read in my runbook as a parameter.
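For reference, a sketch of what that mapping can look like when adding the automation document as an EventBridge target with boto3 (the rule name, runbook name, role and ARNs are placeholders):
import boto3

events = boto3.client('events')

# The input transformer pulls the instance id out of the event and feeds it
# to the automation document's InstanceId parameter.
events.put_targets(
    Rule='asg-launch-success',
    Targets=[{
        'Id': 'run-ansible-automation',
        'Arn': 'arn:aws:ssm:REGION:ACCOUNT_ID:automation-definition/MyAnsibleRunbook:$DEFAULT',
        'RoleArn': 'arn:aws:iam::ACCOUNT_ID:role/my-eventbridge-automation-role',
        'InputTransformer': {
            'InputPathsMap': {'instance': '$.detail.EC2InstanceId'},
            'InputTemplate': '{"InstanceId":[<instance>]}'
        },
    }],
)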
These were the SSM playbook inputs I used to run the AWS-ApplyAnsiblePlaybooks document; I just mapped each parameter to the corresponding parameter in the nested playbook:
"inputs": {
"InstanceIds": ["{{ InstanceId }}"],
"DocumentName": "AWS-ApplyAnsiblePlaybooks",
"Parameters": {
"SourceType": "S3",
"SourceInfo": {"path": "https://testansiblebucketab.s3.amazonaws.com/"},
"InstallDependencies": "True",
"PlaybookFile": "ansible-test.yml",
"ExtraVariables": "SSM=True",
"Check": "False",
"Verbose": "-v",
"TimeoutSeconds": "3600"
}
}
See the document below for reference; they used a document that was already set up to receive the variable:
https://docs.aws.amazon.com/systems-manager/latest/userguide/automation-tutorial-eventbridge-input-transformers.html
This is the full automation playbook I used, most of the parameters are defaults from the nested playbook:
{
"description": "Runs Ansible Playbook on Launch Success Instances",
"schemaVersion": "0.3",
"assumeRole": "<Place your automation role ARN here>",
"parameters": {
"InstanceId": {
"type": "String",
"description": "(Required) The ID of the Amazon EC2 instance."
}
},
"mainSteps": [
{
"name": "RunAnsiblePlaybook",
"action": "aws:runCommand",
"inputs": {
"InstanceIds": ["{{ InstanceId }}"],
"DocumentName": "AWS-ApplyAnsiblePlaybooks",
"Parameters": {
"SourceType": "S3",
"SourceInfo": {"path": "https://testansiblebucketab.s3.amazonaws.com/"},
"InstallDependencies": "True",
"PlaybookFile": "ansible-test.yml",
"ExtraVariables": "SSM=True",
"Check": "False",
"Verbose": "-v",
"TimeoutSeconds": "3600"
}
}
}
]
}
Similar to this question, How to get Task ID from within ECS container?, but I want to get the TaskId for my Fargate task. How can you do this? Like others, I want this for logging information.
I'm running a Spring app with the ELK stack for logging and would like, if possible, to include the TaskId in the logs.
Edit
I actually never got this to work by the way, here is my code:
private String getTaskIdInternal() {
    String metadataUri = System.getenv("ECS_CONTAINER_METADATA_URI_V4");
    if (metadataUri == null) {
        throw new RuntimeException("ECS_CONTAINER_METADATA_URI_V4 env variable not defined");
    }
    String url = metadataUri + "/task";
    logger.info("Getting ecsMetaDataURL={}", url);
    RestTemplate restTemplate = new RestTemplate();
    ResponseEntity<JsonNode> response = restTemplate.getForEntity(url, JsonNode.class);
    logger.info("ecsMetaData={}", response);
    JsonNode map = response.getBody();
    String taskArn = map.get("TaskARN").asText();
    String[] splitTaskArn = taskArn.split("/");
    String taskId = splitTaskArn[splitTaskArn.length - 1];
    logger.info("ecsTaskId={}", taskId);
    return taskId;
}
But I always get this stack trace:
Could not get the taskId from ECS. exception=org.springframework.web.client.HttpClientErrorException: 403 Forbidden
at org.springframework.web.client.DefaultResponseErrorHandler.handleError(DefaultResponseErrorHandler.java:118)
at org.springframework.web.client.DefaultResponseErrorHandler.handleError(DefaultResponseErrorHandler.java:103)
at org.springframework.web.client.ResponseErrorHandler.handleError(ResponseErrorHandler.java:63)
at org.springframework.web.client.RestTemplate.handleResponse(RestTemplate.java:732)
at org.springframework.web.client.RestTemplate.doExecute(RestTemplate.java:690)
at org.springframework.web.client.RestTemplate.execute(RestTemplate.java:646)
at org.springframework.web.client.RestTemplate.getForEntity(RestTemplate.java:325)
If you're trying to get the task id in Fargate for ECS, you can make use of the metadata endpoints.
Assuming you're using platform version 1.4.0 of Fargate, you can get this via an HTTP request to ${ECS_CONTAINER_METADATA_URI_V4}/task.
An example response from this endpoint is below
{
"Cluster": "arn:aws:ecs:us-west-2:&ExampleAWSAccountNo1;:cluster/default",
"TaskARN": "arn:aws:ecs:us-west-2:&ExampleAWSAccountNo1;:task/default/febee046097849aba589d4435207c04a",
"Family": "query-metadata",
"Revision": "7",
"DesiredStatus": "RUNNING",
"KnownStatus": "RUNNING",
"Limits": {
"CPU": 0.25,
"Memory": 512
},
"PullStartedAt": "2020-03-26T22:25:40.420726088Z",
"PullStoppedAt": "2020-03-26T22:26:22.235177616Z",
"AvailabilityZone": "us-west-2c",
"Containers": [
{
"DockerId": "febee046097849aba589d4435207c04aquery-metadata",
"Name": "query-metadata",
"DockerName": "query-metadata",
"Image": "mreferre/eksutils",
"ImageID": "sha256:1b146e73f801617610dcb00441c6423e7c85a7583dd4a65ed1be03cb0e123311",
"Labels": {
"com.amazonaws.ecs.cluster": "arn:aws:ecs:us-west-2:&ExampleAWSAccountNo1;:cluster/default",
"com.amazonaws.ecs.container-name": "query-metadata",
"com.amazonaws.ecs.task-arn": "arn:aws:ecs:us-west-2:&ExampleAWSAccountNo1;:task/default/febee046097849aba589d4435207c04a",
"com.amazonaws.ecs.task-definition-family": "query-metadata",
"com.amazonaws.ecs.task-definition-version": "7"
},
"DesiredStatus": "RUNNING",
"KnownStatus": "RUNNING",
"Limits": {
"CPU": 2
},
"CreatedAt": "2020-03-26T22:26:24.534553758Z",
"StartedAt": "2020-03-26T22:26:24.534553758Z",
"Type": "NORMAL",
"Networks": [
{
"NetworkMode": "awsvpc",
"IPv4Addresses": [
"10.0.0.108"
],
"AttachmentIndex": 0,
"IPv4SubnetCIDRBlock": "10.0.0.0/24",
"MACAddress": "0a:62:17:7a:36:68",
"DomainNameServers": [
"10.0.0.2"
],
"DomainNameSearchList": [
"us-west-2.compute.internal"
],
"PrivateDNSName": "ip-10-0-0-108.us-west-2.compute.internal",
"SubnetGatewayIpv4Address": ""
}
]
}
]
}
As you can see, you would need to parse the TaskARN to get the TaskID (it is the last part of the ARN if you split it by "/").
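The parsing itself is simple in any language; for example, a small Python sketch of hitting the endpoint and taking the last ARN segment:
import json
import os
import urllib.request

def get_task_id():
    # Available inside a running Fargate task on platform version 1.4.0+.
    metadata_uri = os.environ.get('ECS_CONTAINER_METADATA_URI_V4')
    if metadata_uri is None:
        raise RuntimeError('ECS_CONTAINER_METADATA_URI_V4 is not set')
    with urllib.request.urlopen(metadata_uri + '/task') as resp:
        metadata = json.load(resp)
    # The task id is the last segment of the TaskARN.
    return metadata['TaskARN'].split('/')[-1]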
Amazon does note the following in the documentation:
For tasks using the Fargate launch type and platform versions prior to 1.4.0, the task metadata version 3 and 2 endpoints are supported. For more information, see Task Metadata Endpoint version 3 or Task Metadata Endpoint version 2.
The link in the accepted answer is for EC2 launch type. The direct doc link for Fargate is: https://docs.aws.amazon.com/AmazonECS/latest/userguide/task-metadata-endpoint-v4-fargate.html. The json content seems to be pretty much the same though.
How can I work around the following error in Amazon Athena?
HIVE_INVALID_METADATA: com.facebook.presto.hive.DataCatalogException: Error: : expected at the position 8 of 'struct<x-amz-request-id:string,action:string,label:string,category:string,when:string>' but '-' is found. (Service: null; Status Code: 0; Error Code: null; Request ID: null)
When looking at position 8 in the database table (generated by AWS Glue and connected to Athena), I can see that it has a column named attributes with a corresponding struct data type:
struct <
x-amz-request-id:string,
action:string,
label:string,
category:string,
when:string
>
My guess is that the error occurs because the attributes field is not always populated (cf. the _session.start event below) and does not always contain all fields (e.g. the DocumentHandling event below does not contain the attributes.x-amz-request-id field). What is the appropriate way to address this problem? Can I make a column optional in Glue? Can (should?) Glue fill the struct with empty strings? Other options?
Background: I have the following backend structure:
Amazon PinPoint Analytics collects metrics from my application.
The PinPoint event stream has been configured to forward the events to an Amazon Kinesis Firehose delivery stream.
Kinesis Firehose writes data to S3
Use AWS Glue to crawl S3
Use Athena to write queries based on the databases and tables generated by AWS Glue
I can see PinPoint events successfully being added to json files in S3, e.g.
First event in a file:
{
"event_type": "_session.start",
"event_timestamp": 1524835188519,
"arrival_timestamp": 1524835192884,
"event_version": "3.1",
"application": {
"app_id": "[an app id]",
"cognito_identity_pool_id": "[a pool id]",
"sdk": {
"name": "Mozilla",
"version": "5.0"
}
},
"client": {
"client_id": "[a client id]",
"cognito_id": "[a cognito id]"
},
"device": {
"locale": {
"code": "en_GB",
"country": "GB",
"language": "en"
},
"make": "generic web browser",
"model": "Unknown",
"platform": {
"name": "macos",
"version": "10.12.6"
}
},
"session": {
"session_id": "[a session id]",
"start_timestamp": 1524835188519
},
"attributes": {},
"client_context": {
"custom": {
"legacy_identifier": "50ebf77917c74f9590c0c0abbe5522d2"
}
},
"awsAccountId": "672057540201"
}
Second event in the same file:
{
"event_type": "DocumentHandling",
"event_timestamp": 1524835194932,
"arrival_timestamp": 1524835200692,
"event_version": "3.1",
"application": {
"app_id": "[an app id]",
"cognito_identity_pool_id": "[a pool id]",
"sdk": {
"name": "Mozilla",
"version": "5.0"
}
},
"client": {
"client_id": "[a client id]",
"cognito_id": "[a cognito id]"
},
"device": {
"locale": {
"code": "en_GB",
"country": "GB",
"language": "en"
},
"make": "generic web browser",
"model": "Unknown",
"platform": {
"name": "macos",
"version": "10.12.6"
}
},
"session": {},
"attributes": {
"action": "Button-click",
"label": "FavoriteStar",
"category": "Navigation"
},
"metrics": {
"details": 40.0
},
"client_context": {
"custom": {
"legacy_identifier": "50ebf77917c74f9590c0c0abbe5522d2"
}
},
"awsAccountId": "[aws account id]"
}
Next, AWS Glue has generated a database and a table. Specifically, I see that there is a column named attributes that has the value of
struct <
x-amz-request-id:string,
action:string,
label:string,
category:string,
when:string
>
However, when I attempt to Preview table from Athena, i.e. execute the query
SELECT * FROM "pinpoint-test"."pinpoint_testfirehose" limit 10;
I get the error message described earlier.
Side note: I have tried to remove the attributes field (by editing the database table from Glue), but that results in an Internal error when executing the SQL query from Athena.
This is a known limitation: Athena table and database names allow only the underscore special character.
Athena table and database names cannot contain special characters, other than underscore (_).
Source: http://docs.aws.amazon.com/athena/latest/ug/known-limitations.html
Use backticks (`) when the table name has a - in it.
Example:
SELECT * FROM `pinpoint-test`.`pinpoint_testfirehose` limit 10;
Make sure you select "default" database on the left pane.
I believe the problem is your struct element name, x-amz-request-id - specifically, the "-" in the name.
I'm currently dealing with a similar issue since my elements in my struct have "::" in the name.
Sample data:
some_key: {
"system::date": date,
"system::nps_rating": 0
}
Glue-derived struct schema (it tried to escape them with \):
struct <
system\:\:date:String
system\:\:nps_rating:Int
>
But that still gives me an error in Athena.
I don't have a good solution for this other than changing Struct to STRING and trying to process the data that way.