How do I consume JSON logs inside Fargate using CDK?

I have a docker container running in Fargate that emits json logs to the console using log4j-layout-template.
The logs emitted look like this:
{"@timestamp":"2022-03-22T09:08:16.838Z","ecs.version":"1.2.0","log.level":"INFO","message":"Server version name: Apache Tomcat/8.5.76","process.thread.name":"main","log.logger":"org.apache.catalina.startup.VersionLoggerListener"}
{"@timestamp":"2022-03-22T09:08:16.838Z","ecs.version":"1.2.0","log.level":"INFO","message":"Server built: Feb 23 2022 17:59:11 UTC","process.thread.name":"main","log.logger":"org.apache.catalina.startup.VersionLoggerListener"}
I configure my CDK with the following:
var def = ingestGatewayTaskDefinition.addContainer(
    id + "Container",
    ContainerDefinitionOptions
        .builder()
        .image(fromEcrRepository(ecrRepository))
        .memoryLimitMiB(memory)
        .cpu(cpu)
        .environment(environment)
        .secrets(secrets)
        .logging(
            LogDriver.awsLogs(
                AwsLogDriverProps
                    .builder()
                    .logGroup(
                        LogGroup.Builder
                            .create(this, props.getServiceName())
                            .logGroupName("dev/" + props.getServiceName())
                            .retention(RetentionDays.ONE_DAY)
                            .build()
                    )
                    .streamPrefix("dev/" + props.getServiceName())
                    //.datetimeFormat("%Y-%m-%dT%H:%M:%SZ") //??
                    .build()
            )
        )
        .build()
);
But in CloudWatch the message portion is the raw JSON: it is not parsed, even though the fields should be discoverable.
How do I parse these fields?
This is what it ends up looking like: the whole JSON document appears as a single message field.
What I am looking for in Cloud Watch is this:
@timestamp               | ecs.version | log.level | message                 | log.logger
2022-03-22T09:08:16.838Z | 1.2.0       | INFO      | Server version name:... | org.apache...
2022-03-22T09:08:16.838Z | 1.2.0       | INFO      | Server built:...        | org.apache...

There's nothing wrong with the parsing; your events are being parsed correctly.
The following query should work correctly:
fields @timestamp, @message
| filter log.level="INFO"
| sort @timestamp desc
The Log Stream UI does not show the inferred nested structure, but it's still available for querying.
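If you want to run that query programmatically rather than in the console, here is a minimal boto3 sketch (the log group name and the one-hour time window are assumptions, substitute your own):
import time
import boto3

logs = boto3.client("logs")

# Log group name is an assumption; use the one created in your CDK stack.
query_id = logs.start_query(
    logGroupName="dev/my-service",
    startTime=int(time.time()) - 3600,   # last hour
    endTime=int(time.time()),
    queryString='fields @timestamp, log.level, log.logger, message '
                '| filter log.level = "INFO" | sort @timestamp desc',
)["queryId"]

# Poll until CloudWatch Logs Insights finishes the query.
results = logs.get_query_results(queryId=query_id)
while results["status"] in ("Scheduled", "Running"):
    time.sleep(1)
    results = logs.get_query_results(queryId=query_id)

for row in results["results"]:
    print({col["field"]: col["value"] for col in row})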

Related

Packer: Receiving ID not implemented for builder when using build.ID

When trying to pass build.ID through to a shell-local post-processor, the evaluated string in the post-processor is ERR_ID_NOT_IMPLEMENTED_BY_BUILDER. I am using vsphere-iso.
The docs mention
Here is the list of available build variables:
ID: Represents the VM being provisioned. For example, in Amazon it is the instance ID; in DigitalOcean, it is the Droplet ID; in VMware, it is the VM name.
So I assumed it was supported with vsphere-iso?
Basically I am trying to pass the evaluated VM/template name through to a PowerShell shell-local post-processor.
Here is the post processor config:
post-processor "shell-local" {
  environment_vars = [
    "VCENTER_USER=${var.vsphere_username}",
    "VCENTER_PASSWORD=${var.vsphere_password}",
    "VCENTER_SERVER=${var.vsphere_endpoint}",
    "TEMPLATE_NAME=${build.ID}",
    "TEMPLATE_UUID=${local.build_uuid}",
  ]
  env_var_format  = "$env:%s=\"%s\"; "
  execute_command = ["${var.common_post_processor_cli}.exe", "{{.Vars}} {{.Script}}"]
  script          = "scripts/windows/cleanup.ps1"
}
Here is the post processor script
param(
    [string]
    $TemplateName = $env:TEMPLATE_NAME
)
Write-Host $TemplateName
Here is the result logged to the console
==> vsphere-iso.windows-server-standard-dexp (shell-local): Running local shell script: scripts/windows/cleanup.ps1
vsphere-iso.windows-server-standard-dexp (shell-local): ERR_ID_NOT_IMPLEMENTED_BY_BUILDER

AWS Lambda reading Athena database file and writing S3 to no avail

I need help because the file is not being written to the S3 bucket.
What I did:
import time
import boto3

query = 'SELECT * FROM db_lambda.tb_inicial limit 10'
DATABASE = 'db_lambda'
output = 's3://bucket-lambda-test1/result/'

def lambda_handler(event, context):
    client = boto3.client('athena')
    # Execution
    response = client.start_query_execution(
        QueryString=query,
        QueryExecutionContext={
            'Database': DATABASE
        },
        ResultConfiguration={
            'OutputLocation': output,
        }
    )
    return response
IAM role created with:
AmazonS3FullAccess
AmazonAthenaFullAccess
CloudWatchLogsFullAccess
AmazonVPCFullAccess
AWSLambda_FullAccess
When running the Lambda, the message is:
Response:
{
  "statusCode": 200,
  "body": "\"Hello from Lambda!\""
}
Request ID:
"f2dd5cd2-070c-41ea-939f-d4909ce39fd0"
Function logs:
START RequestId: f2dd5cd2-070c-41ea-939f-d4909ce39fd0 Version: $LATEST
END RequestId: f2dd5cd2-070c-41ea-939f-d4909ce39fd0
REPORT RequestId: f2dd5cd2-070c-41ea-939f-d4909ce39fd0 Duration: 0.84 ms Billed Duration: 1 ms Memory Size: 128 MB Max Memory Used: 52 MB
How I did the test:
Configure test event > Create new test event > Test event saved, with an empty event:
{
}
The "Hello from Lambda" message is the default code in a Lambda function. It would appear that you did not click 'Deploy' before testing the function. Clicking Deploy will save the Lambda code.
Also, once you get it running, please note that start_query_execution() will simply start the Athena query. You will need to use get_query_results() to obtain the results.
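For example, a minimal sketch of a handler that waits for the query and reads the results back (the query, database and bucket are taken from the question; the simple polling loop is just an illustration, keep your Lambda timeout in mind):
import time
import boto3

athena = boto3.client("athena")

def lambda_handler(event, context):
    execution = athena.start_query_execution(
        QueryString="SELECT * FROM db_lambda.tb_inicial limit 10",
        QueryExecutionContext={"Database": "db_lambda"},
        ResultConfiguration={"OutputLocation": "s3://bucket-lambda-test1/result/"},
    )
    query_id = execution["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(1)

    if state != "SUCCEEDED":
        raise RuntimeError("Athena query ended in state " + state)

    # At this point the CSV result file is in the S3 output location;
    # get_query_results returns the same rows through the API.
    results = athena.get_query_results(QueryExecutionId=query_id)
    return {"statusCode": 200, "rowCount": len(results["ResultSet"]["Rows"])}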

GCP Dataflow extract JOB_ID

For a Dataflow job, I need to extract the JOB_ID from the JOB_NAME. I have the below command and the corresponding output. Can you please guide me on how to extract the JOB_ID from the response below?
$ gcloud dataflow jobs list --region=us-central1 --status=active --filter="name=sample-job"
JOB_ID NAME TYPE CREATION_TIME STATE REGION
2020-10-07_10_11_20-15879763245819496196 sample-job Streaming 2020-10-07 17:11:21 Running us-central1
If we can use Python script to achieve it, even that will be fine.
gcloud dataflow jobs list --region=us-central1 --status=active --filter="name=sample-job" --format="value(JOB_ID)"
You can use standard command line tools to parse the response of that command, for example
gcloud dataflow jobs list --region=us-central1 --status=active --filter="name=sample-job" | tail -n 1 | cut -f 1 -d " "
Alternatively, if this is from a Python program already, you can use the Dataflow API directly instead of using the gcloud tool, like in How to list down all the dataflow jobs using python API
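For reference, a rough sketch of that approach with the google-api-python-client Discovery client (assumes the library is installed and Application Default Credentials are configured; the project ID and region are placeholders):
from googleapiclient.discovery import build   # pip install google-api-python-client

# build() picks up Application Default Credentials, e.g. from
# `gcloud auth application-default login`.
dataflow = build("dataflow", "v1b3")

response = (
    dataflow.projects()
    .locations()
    .jobs()
    .list(projectId="<MY_PROJECT_ID>", location="us-central1", filter="ACTIVE")
    .execute()
)

for job in response.get("jobs", []):
    if job["name"] == "sample-job":
        print(job["id"])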
With Python, you can retrieve the jobs list with a REST request to the Dataflow method https://dataflow.googleapis.com/v1b3/projects/{projectId}/jobs
Then, the JSON response can be parsed to filter for the job name you are searching for by using an if clause:
if job["name"] == 'sample-job'
I tested this approach and it worked:
import requests
import json

base_url = 'https://dataflow.googleapis.com/v1b3/projects/'
project_id = '<MY_PROJECT_ID>'
location = '<REGION>'

# <BEARER_TOKEN_HERE> can be retrieved with 'gcloud auth print-access-token',
# obtained with an account that has access to Dataflow jobs.
# Another authentication mechanism can be found in the link provided by danielm.
response = requests.get(f'{base_url}{project_id}/locations/{location}/jobs',
                        headers={'Authorization': 'Bearer <BEARER_TOKEN_HERE>'})

jobslist = response.json()

for key, jobs in jobslist.items():
    for job in jobs:
        if job["name"] == 'beamapp-0907191546-413196':
            print(job["name"], " Found, job ID:", job["id"])
        else:
            print(job["name"], " Not matched")

# Output:
# windowedwordcount-0908012420-bd342f98  Not matched
# beamapp-0907200305-106040  Not matched
# beamapp-0907192915-394932  Not matched
# beamapp-0907191546-413196  Found, job ID: 2020-09-07...154989572
Created my GIST with Python script to achieve it.

CloudWatch logs acting weird

I have two log files with multi-line log statements. Both of them have the same datetime format at the beginning of each log statement. The configuration looks like this:
state_file = /var/lib/awslogs/agent-state
[/opt/logdir/log1.0]
datetime_format = %Y-%m-%d %H:%M:%S
file = /opt/logdir/log1.0
log_stream_name = /opt/logdir/logs/log1.0
initial_position = start_of_file
multi_line_start_pattern = {datetime_format}
log_group_name = my.log.group
[/opt/logdir/log2-console.log]
datetime_format = %Y-%m-%d %H:%M:%S
file = /opt/logdir/log2-console.log
log_stream_name = /opt/logdir/log2-console.log
initial_position = start_of_file
multi_line_start_pattern = {datetime_format}
log_group_name = my.log.group
The CloudWatch Logs agent is sending the log1.0 logs correctly to my log group on CloudWatch; however, it's not sending the logs for log2-console.log.
awslogs.log says:
2016-11-15 08:11:41,308 - cwlogs.push.batch - WARNING - 3593 - Thread-4 - Skip event: {'timestamp': 1479196444000, 'start_position': 42330916L, 'end_position': 42331504L}, reason: timestamp is more than 2 hours in future.
2016-11-15 08:11:41,308 - cwlogs.push.batch - WARNING - 3593 - Thread-4 - Skip event: {'timestamp': 1479196451000, 'start_position': 42331504L, 'end_position': 42332092L}, reason: timestamp is more than 2 hours in future.
The server time is correct, though. Another weird thing is that the line numbers mentioned in start_position and end_position do not exist in the actual log file being pushed.
Anyone else experiencing this issue?
I was able to fix this.
The state of awslogs was broken. The state is stored in a sqlite database in /var/awslogs/state/agent-state. You can access it via
sudo sqlite3 /var/awslogs/state/agent-state
sudo is needed to have write access.
List all streams with
select * from stream_state;
Look up your log stream and note the source_id which is part of a json data structure in the v column.
Then, list all records with this source_id (in my case it was 7675f84405fcb8fe5b6bb14eaa0c4bfd) in the push_state table
select * from push_state where k="7675f84405fcb8fe5b6bb14eaa0c4bfd";
The resulting record has a JSON data structure in the v column which contains a batch_timestamp. And this batch_timestamp seems to be wrong. It was in the past, and any newer (more than 2 hours) log entries were not processed anymore.
The solution is to update this record. Copy the v column, replace the batch_timestamp with the current timestamp and update with something like
update push_state set v='... insert new value here ...' where k='7675f84405fcb8fe5b6bb14eaa0c4bfd';
Restart the service with
sudo /etc/init.d/awslogs restart
I hope it works for you!
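If you prefer not to edit the JSON value by hand, here is a rough sketch of the same fix using Python's sqlite3 and json modules (it assumes the table and column names described above; stop the agent and back up the state file first, and mirror the format of the existing batch_timestamp value, epoch milliseconds is only an assumption here):
import json
import sqlite3
import time

SOURCE_ID = "7675f84405fcb8fe5b6bb14eaa0c4bfd"   # the source_id found in stream_state

conn = sqlite3.connect("/var/awslogs/state/agent-state")
(value,) = conn.execute("select v from push_state where k = ?", (SOURCE_ID,)).fetchone()

state = json.loads(value)
print("old batch_timestamp:", state["batch_timestamp"])
state["batch_timestamp"] = int(time.time() * 1000)   # assumption: epoch millis, match the old format

conn.execute("update push_state set v = ? where k = ?", (json.dumps(state), SOURCE_ID))
conn.commit()
conn.close()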
We had the same issue and the following steps fixed it.
If log groups are not updating with the latest events, run these steps:
Stop the awslogs service
Delete the file /var/awslogs/state/agent-state
Update the /var/awslogs/etc/awslogs.conf configuration from hostname to instance ID, e.g.:
log_stream_name = {hostname} to log_stream_name = {instance_id}
Start the awslogs service
I was able to resolve this issue on Amazon Linux by:
sudo yum reinstall awslogs
sudo service awslogs restart
This method retained my config files in /var/awslogs/, though you may wish to back them up before a reinstall.
Note: In my troubleshooting, I had also deleted my Log Group via the AWS Console. The restart fully reloaded all historical logs, but at the present timestamp, which is of less value. I'm unsure whether deleting the Log Group was necessary for this method to work. You might want to look at setting the initial_position config to end_of_file before you restart.
I found the reason. The time zone in my Docker container was inconsistent with the time zone of my host computer. After setting the two time zones to be consistent, the problem was solved.

AWS logs agent setup

We have recently set up the AWS logs agent on one of our test servers. Our log files usually contain multi-line events, e.g. one of our log events is:
[10-Jun-2016 07:30:16 UTC] SQS Post Response: Array
(
[Status] => 200
[ResponseBody] => <?xml version="1.0"?><SendMessageResponse xmlns="http://queue.amazonaws.com/doc/2009-02-01/"><SendMessageResult><MessageId>053c7sdf5-1e23-wa9d-99d8-2a0cf9eewe7a</MessageId><MD5OfMessageBody>8e542d2c2a1325a85eeb9sdfwersd58f</MD5OfMessageBody></SendMessageResult><ResponseMetadata><RequestId>4esdfr30-c39b-526b-bds2-14e4gju18af</RequestId></ResponseMetadata></SendMessageResponse>
)
The log agent reference documentation says to use the 'multi_line_start_pattern' option for such logs. Our AWS logs agent config is as follows:
[httpd_info.log]
file = /var/log/httpd/info.log*
log_stream_name = info.log
initial_position = start_of_file
log_group_name = test.server.name
multi_line_start_pattern = '(\[)+\d{2}-[a-zA-Z]{3}+-\d{4}'
However, the logs agent reporting breaks on aforementioned and similar events. The way it is being reported to CloudWatch Logs is as follows:
Event 1:
[10-Jun-2016 11:21:26 UTC] SQS Post Response: Array
Event 2:
( [Status] => 200 [ResponseBody] => <?xml version="1.0"?><SendMessageResponse xmlns="http://queue.amazonaws.com/doc/2009-02-01/"><SendMessageResult><MessageId>053c7sdf5-1e23-wa9d-99d8-2a0cf9eewe7a</MessageId><MD5OfMessageBody>8e542d2c2a1325a85eeb9sdfwersd58f</MD5OfMessageBody></SendMessageResult><ResponseMetadata><RequestId>4esdfr30-c39b-526b-bds2-14e4gju18af</RequestId></ResponseMetadata></SendMessageResponse>
Event 3:
)
This happens despite the fact that it's only a single event. Any clue what's going on here?
I think all you need to add is the following to your awslogs.conf
datetime_format = %d-%b-%Y %H:%M:%S UTC
time_zone = UTC
multi_line_start_pattern = {datetime_format}
http://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/AgentReference.html
multi_line_start_pattern
Specifies the pattern for identifying the start of a log message. A log message is made of a line that matches the pattern and any following lines that don't match the pattern. The valid values are regular expression or {datetime_format}. When using {datetime_format}, the datetime_format option should be specified. The default value is '^[^\s]' so any line that begins with a non-whitespace character closes the previous log message and starts a new log message.
If that datetime format didn't work, you would need to update your regex to actually match your specific datetime. I don't think the one you have listed above actually works for your given format.
You could try this for instance:
\[\d{2}-[\w]{3}-\d{4}\s{1}\d{2}:\d{2}:\d{2}\s{1}\w+\]
does match
[10-Jun-2016 11:21:26 UTC]
See here: http://www.regexpal.com/?fam=96811
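You can also sanity-check both the pattern and the datetime format locally with a quick Python snippet before touching the agent (the sample line is taken from the question):
import re
from datetime import datetime

line = "[10-Jun-2016 11:21:26 UTC] SQS Post Response: Array"

# The brackets are escaped so they are matched literally instead of opening a character class.
pattern = r"\[\d{2}-[\w]{3}-\d{4}\s{1}\d{2}:\d{2}:\d{2}\s{1}\w+\]"
print(re.match(pattern, line))   # a match object, so this line would start a new event

# The suggested datetime_format parses the same timestamp.
print(datetime.strptime("10-Jun-2016 11:21:26 UTC", "%d-%b-%Y %H:%M:%S UTC"))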
Once completed, issue a restart of the service and check to see if its parsing correctly.
$ sudo service awslogs restart