I have an EC2 instance with a Docker container running on it. The container currently uses the awslogs driver to push logs to CloudWatch. When I go to the CloudWatch console, I see one very large log stream (named after the container ID) that contains all logs from the last 16 days (since I created the container). It seems that if this container runs for a year, the log stream will keep a full year of logs. I am not sure what the maximum size of a CloudWatch log stream is, but I assume there must be a limit.
So my questions are:
How can I split this huge log stream into chunks, ideally by date, e.g. something like {{.ContainerId}}{{.CurrentDate}}?
What is the maximum size limit of a CloudWatch log stream?
Is it good practice to keep appending to a single huge log stream?
The following is the definition of a CloudWatch log stream, as given in the docs here:
Log streams
A log stream is a sequence of log events that share the same source. More specifically, a log stream is generally intended to represent the sequence of events coming from the application instance or resource being monitored. For example, a log stream may be associated with an Apache access log on a specific host. When you no longer need a log stream, you can delete it using the aws logs delete-log-stream command.
Unfortunately, what you want is not possible at the moment. I am not sure exactly what your use case is, but you can filter log events by time, so splitting the stream is not really necessary. See --start-time and --end-time in filter-log-events.
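If you prefer to do this programmatically, here is a minimal boto3 sketch of time-based filtering; the log group and log stream names below are hypothetical placeholders for your own:

import time
import boto3

logs = boto3.client("logs")

# filter_log_events takes timestamps in milliseconds since the epoch;
# this covers the last 24 hours.
end = int(time.time() * 1000)
start = end - 24 * 60 * 60 * 1000

response = logs.filter_log_events(
    logGroupName="my-log-group",            # hypothetical log group name
    logStreamNames=["my-container-id"],     # hypothetical log stream name
    startTime=start,
    endTime=end,
)
for event in response["events"]:
    print(event["timestamp"], event["message"])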
You might also want to set the awslogs driver option awslogs-stream-prefix (e.g. --log-opt awslogs-stream-prefix=myapp) to get a more readable stream name; see the docs.
Related
I'm using an AWS Glue job to move and transform data across S3 buckets, and I'd like to build custom accumulators to monitor the number of rows I'm receiving and sending, along with other custom metrics. What is the best way to monitor these metrics? According to this document: https://docs.aws.amazon.com/glue/latest/dg/monitoring-awsglue-with-cloudwatch-metrics.html I can keep track of general metrics on my Glue job, but there doesn't seem to be a good way to send custom metrics through CloudWatch.
I have done lots of similar projects like this. Each micro-batch can be:
a file or a bunch of files
a time interval of data from an API
a partition of records from a database
etc.
Your use case can be broken down into four questions:
given a bunch of inputs, how do you define a task_id
how do you want to define the metrics for your task (you need a simple dictionary structure for this metrics data)
which backend data store should store the metrics data
how do you query the metrics data
In some business use cases, you also need to store status information to track each input: did it succeed, fail, is it in progress, or is it stuck? You may also want to control retries and concurrency (to avoid multiple workers working on the same input).
DynamoDB is the perfect backend for this type of use case. It is a super-fast, no-ops, pay-as-you-go, automatically scaling key-value store.
There's a Python library that implemented this pattern https://github.com/MacHu-GWU/pynamodb_mate-project/blob/master/examples/patterns/status-tracker.ipynb
Here's an example:
Put your Glue ETL job's main logic in a function:
def glue_job() -> dict:
    ...
    return your_metrics
Given an input, calculate its task_id identifier; then you just need:
tracker = Tracker.new(task_id)

# start the job; it will succeed
with tracker.start_job():
    # do some work
    your_metrics = glue_job()
    # save your metrics in DynamoDB
    tracker.set_data(your_metrics)
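If you would rather not take on a library dependency, the same idea can be sketched with plain boto3; the table name glue_task_metrics, its task_id partition key, and the helper functions below are all hypothetical:

import boto3

dynamodb = boto3.resource("dynamodb")
# Hypothetical table with task_id as its partition key.
table = dynamodb.Table("glue_task_metrics")

def save_metrics(task_id: str, metrics: dict) -> None:
    # Store one item per micro-batch run; metrics is the dict returned by glue_job().
    # Note: non-integer numbers must be passed as decimal.Decimal for DynamoDB.
    table.put_item(Item={"task_id": task_id, **metrics})

def load_metrics(task_id: str) -> dict:
    # Read the metrics back later, e.g. for reporting or troubleshooting.
    response = table.get_item(Key={"task_id": task_id})
    return response.get("Item", {})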
Consider enabling continuous logging on your AWS Glue job. This will allow you to do custom logging via CloudWatch. Custom logging can include information such as row counts.
More specifically:
Enable continuous logging for your Glue job.
Add logger = glueContext.get_logger() at the beginning of your Glue job.
Add logger.info("Custom logging message that will be sent to CloudWatch") wherever you want to log information to CloudWatch. For example, if I have a data frame named df, I could log its row count to CloudWatch by adding logger.info("Row count of df " + str(df.count())) (a short sketch putting these steps together follows below).
Your log messages will be located in the CloudWatch log group /aws-glue/jobs/logs-v2, under the log stream named glue_run_id-driver.
You can also reference the "Logging Application-Specific Messages Using the Custom Script Logger" section of the AWS documentation Enabling Continuous Logging for AWS Glue Jobs for more information on application specific logging.
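Putting those steps together, a minimal sketch of a PySpark Glue script might look like this; the S3 path and the DataFrame are hypothetical:

from pyspark.context import SparkContext
from awsglue.context import GlueContext

glueContext = GlueContext(SparkContext.getOrCreate())
logger = glueContext.get_logger()

# Hypothetical input; any DataFrame works the same way.
df = glueContext.spark_session.read.json("s3://my-bucket/input/")

# With continuous logging enabled, this ends up in the driver log stream in CloudWatch.
logger.info("Row count of df " + str(df.count()))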
I have CloudWatch set up on my EC2 instance to transfer logs to specific log groups.
Over time, those logs can grow quite big, so I wanted to delete them, for example, on a weekly basis.
I was wondering if there is any option for setting up auto-cleanup of the transferred logs on the EC2 instance, using CloudWatch?
What would be the best way to achieve that?
To remove the log files from an EC2 instance running Linux, you have two choices:
If your log files already rotate based on time or some other value, you can use the auto_removal option to delete them after the log agent is finished with them. See the docs.
If you're using a file that's constantly updated, you'll need to use logrotate, a program invoked by cron that renames, compresses, and deletes old files. There's a good intro doc here.
If you use logrotate, here's an example config that I've found useful for high-volume log sources. It rotates the file when it reaches 100 megabytes, rather than only once a day (you'll need to run it from cron.hourly for that to be useful). Most importantly, it enables copytruncate, which truncates the file in place, allowing the program to continue writing to it.
/var/log/filename.log {
    rotate 7
    daily
    maxsize 100M
    nodateext
    missingok
    notifempty
    copytruncate
    compress
    delaycompress
}
Setup
I use Azure Stream Analytics to stream data into an Azure data warehouse staging table.
The input source of the job is an Event Hub stream.
I notice that when I update the job, the job's input event backlog goes up massively after the restart.
It looks like the job starts processing the complete Event Hub queue again from the beginning.
Questions
How is stream position management organised in Stream Analytics?
Is it possible to define the stream position where the job starts (for example, only events queued after a specific point in time)?
What I have done so far
I noticed a similar question here on StackOverflow.
There, a variable named "eventStartTime" is mentioned.
But since I use an "asaproj" project within Visual Studio to create, update and deploy the job, I don't know where to set this before deploying.
When you update a job without stopping it, it keeps the previous "Job output start time" setting, which is why the job can start processing data from the beginning.
You can stop the job first, then choose the "Job output start time" before you start the job again.
You can refer to this document https://learn.microsoft.com/en-us/azure/stream-analytics/start-job for detailed information on each mode. For your scenario, the "When last stopped" mode is probably the one you need; it will not process data from the beginning of the Event Hub queue.
I am working on a project where I am streaming my data to AWS CloudWatch Logs via the AWS CloudWatch Logs agent. I can see that log groups are getting created and there are logs inside them, yet CloudWatch shows 0 bytes stored. Why is that?
This is an issue, as I am also streaming the data to our ES domain, and this stored data might be stopping it. Or am I doing something wrong? Here is a screenshot of the info:
What am I doing wrong?
storedBytes -> (long)
The number of bytes stored.
IMPORTANT: Starting on June 17, 2019, this parameter will be deprecated for log streams, and will be reported as zero. This change applies only to log streams. The storedBytes parameter for log groups is not affected.
Source
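In other words, the per-stream figure is simply no longer reported; the log-group-level figure still is. If you want to check it programmatically, here is a minimal boto3 sketch (the log group name prefix is hypothetical):

import boto3

logs = boto3.client("logs")

# storedBytes is still reported at the log group level.
response = logs.describe_log_groups(logGroupNamePrefix="my-app")  # hypothetical prefix
for group in response["logGroups"]:
    print(group["logGroupName"], group.get("storedBytes", 0))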
I have a Lambda function and its logs are in CloudWatch (log group and log stream). Is it possible to filter (in the CloudWatch Management Console) all logs that contain "error"? For example, logs containing "Process exited before completing request".
In Log Groups there is a button "Search Events". You must click on it first.
Then it changes to "Filter Streams":
Now you just type your filter and select the start date and time.
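If you would rather do the same filtering from code, here is a minimal boto3 sketch; the log group name is hypothetical, and wrapping the term in double quotes matches events containing that exact phrase:

import boto3

logs = boto3.client("logs")

response = logs.filter_log_events(
    logGroupName="/aws/lambda/my-function",  # hypothetical log group
    filterPattern='"Process exited before completing request"',
)
for event in response["events"]:
    print(event["logStreamName"], event["message"])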
So this is kind of a side issue, but it was relevant for us. (I posted this as another answer on StackOverflow but thought it would be relevant to this conversation too.)
We've noticed that tailing and searching logs gets really slow once a log group has a lot of log streams in it, for example when an AWS Lambda function has had a lot of invocations. This is because "tail"-type utilities and searches need to connect to each log stream. Log events get expired and deleted by the retention policy you set on the log group itself, but the log streams never get cleaned up. I made a few little utility scripts to help with that:
https://github.com/four43/aws-cloudwatch-log-clean
Hopefully that saves you some agony waiting for those logs to get searched.
You can also use CloudWatch Logs Insights (https://aws.amazon.com/about-aws/whats-new/2018/11/announcing-amazon-cloudwatch-logs-insights-fast-interactive-log-analytics/), an AWS extension to CloudWatch Logs that provides a pretty powerful query and analytics tool. However, it can be slow; some of my queries take up to a minute. That's okay if you really need the data.
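As a sketch of what that looks like programmatically (the log group name and the query are hypothetical; Insights queries run asynchronously, so you poll for the result):

import time
import boto3

logs = boto3.client("logs")

query = logs.start_query(
    logGroupName="/aws/lambda/my-function",  # hypothetical log group
    startTime=int(time.time()) - 3600,       # last hour, in seconds since the epoch
    endTime=int(time.time()),
    queryString="fields @timestamp, @message | filter @message like /error/ | sort @timestamp desc | limit 50",
)

# Poll until the query finishes.
while True:
    result = logs.get_query_results(queryId=query["queryId"])
    if result["status"] in ("Complete", "Failed", "Cancelled"):
        break
    time.sleep(1)

for row in result["results"]:
    print({field["field"]: field["value"] for field in row})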
You could also use a tool I created called SenseLogs. It downloads CloudWatch data to your browser, where you can run queries like the one you ask about. You can use full-text search for "error", or if your log data is structured (JSON), you can use a JavaScript-like expression language to filter by field, e.g.:
error == 'critical'
Posting an update, as CloudWatch has changed since 2016:
In Log Groups there is a Search all button for a full-text search.
Then just type your search: