I've looked on the AWS forums and elsewhere but haven't found a solution. I have a Lambda function that, when invoked, creates a log stream which populates with log events. After about 12 hours or so, the log stream is still present, but when I open it, I see the following:
The link explains how to start sending event data, but I already have that set up, and I am sending event data; it just disappears after a certain time period.
I'm guessing there is some setting somewhere (either for max storage allowed or for whether logs get purged), but if there is, I haven't found it.
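One thing I have been checking is whether a retention policy is set on the log group, since that is what controls purging; a minimal boto3 sketch (the log group name is a placeholder):
# Sketch: print the retention setting for a log group.
# "retentionInDays" is absent when events are kept forever.
import boto3
logs = boto3.client("logs")
for group in logs.describe_log_groups(logGroupNamePrefix="/aws/lambda/my-function")["logGroups"]:
    print(group["logGroupName"], group.get("retentionInDays", "never expires"))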
Another reason for missing data in the log stream might be a corrupted agent-state file. First, check your logs:
vim /var/log/awslogs.log
If you find something like "Caught exception: An error occurred (InvalidSequenceTokenException) when calling the PutLogEvents operation: The given sequenceToken is invalid. The next expected sequenceToken is:" you can regenerate the agent-state file as follows:
sudo rm /var/lib/awslogs/agent-state
sudo service awslogs stop
sudo service awslogs start
TL;DR: Just use the CLI. See Update 2 below.
This is really bizarre but I can replicate it...
I unchecked the "Expire Events After" box, and lo and behold I was able to open older log streams. What seems REALLY odd is that if I choose to display the "Stored Bytes" data, many of the streams are listed at 0 bytes even though they have log events:
Update 1:
This solution no longer works, as I can only view the log events in the first two log streams. What's more, the Stored Bytes column now displays different (and more accurate) data:
This leads me to believe that AWS made some kind of update.
UPDATE 2:
Just use the CLI. I've verified that I can retrieve log events from the CLI that I cannot retrieve via the web console.
First install the CLI (if you haven't already) and use the following command:
aws logs get-log-events --log-group-name NAME-OF-LOGGROUP --log-stream-name LOG-STREAM-NAME
Be sure to escape (or quote) special characters such as /, [ and $ in the group and stream names.
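Note that get-log-events returns at most one page of events per call, so for large streams you need to follow the pagination token. A rough boto3 equivalent (group and stream names are placeholders):
# Sketch: page through an entire log stream.
import boto3
logs = boto3.client("logs")
kwargs = {
    "logGroupName": "NAME-OF-LOGGROUP",
    "logStreamName": "LOG-STREAM-NAME",
    "startFromHead": True,
}
while True:
    resp = logs.get_log_events(**kwargs)
    for event in resp["events"]:
        print(event["timestamp"], event["message"])
    # The stream is exhausted once the forward token stops changing.
    if resp["nextForwardToken"] == kwargs.get("nextToken"):
        break
    kwargs["nextToken"] = resp["nextForwardToken"]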
Related
I currently have my application running in ECS. I have enabled the awslogs log driver, specifying the log group and the region. Everything works great: the logs are sent to the log group and a log stream is created. However, every time I restart the container, it creates a new log stream.
Is there a way to have everything go into a single log stream instead of a new one being created each time the container restarts?
I've been looking for a solution for a long time and I haven't found anything.
For example, instead of there being two log streams after a restart, there would only be one.
Something like this:
The simplest way is to use the PutLogEvents API directly. Beyond that you can get as fancy as you want. You could use a FireLens sidecar container in your task to handle all events using a logging API that writes directly to CloudWatch.
For example, you can do this in Python with the boto3 CloudWatch Logs client's put_log_events:
import time
import boto3

logs = boto3.client("logs")
response = logs.put_log_events(
    logGroupName="your-log-group",
    logStreamName="your-log-stream",
    logEvents=[
        # timestamp is milliseconds since the Unix epoch
        {"timestamp": int(time.time() * 1000), "message": "log message"},
    ],
)
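To keep everything in a single stream across restarts, the stream name just has to stay fixed, and the stream has to exist before the first write. A rough sketch of that part (group and stream names are placeholders):
# Sketch: create the fixed-name stream once; ignore the error if it already exists.
import boto3
from botocore.exceptions import ClientError
logs = boto3.client("logs")
try:
    logs.create_log_stream(logGroupName="your-log-group", logStreamName="your-log-stream")
except ClientError as err:
    if err.response["Error"]["Code"] != "ResourceAlreadyExistsException":
        raise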
A task runs for a few seconds before terminating (I don't know why), and it's not pushing any logs.
I'm using the "awslogs" driver and the log group exists in CloudWatch.
The "Logs" tab is empty. The log stream is created in CloudWatch but it's devoid of actual log events. There are also no results under Insights for that stream.
The task role has the permissions mentioned at https://docs.aws.amazon.com/AmazonECS/latest/developerguide/using_cloudwatch_logs.html .
Any idea what the deal is with the logs?
The container command wasn't valid, nor was it comma-separated. The task was terminating too early in the workflow to log anything, yet late enough that any other deployment issue would already have been surfaced. So it looked successful when in reality it never even ran. Interestingly, it would still take around a minute to terminate, so maybe this includes the overhead of pulling the image.
Timestamps indicate that the task started and exited after a few seconds. The awslogs driver will only send logs once the container has started successfully, so in this case it may not help. You can follow step 6 of the documentation to diagnose: if you have a container that has stopped, expand the container and inspect the Status reason row to see what caused the task state to change. In most cases, that will lead you to the actual cause.
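If you'd rather not dig through the console, you can pull the same stopped reason via the API. A rough boto3 sketch (cluster name and task ARN are placeholders):
# Sketch: inspect why a stopped ECS task exited.
import boto3
ecs = boto3.client("ecs")
resp = ecs.describe_tasks(cluster="my-cluster", tasks=["TASK-ARN"])
for task in resp["tasks"]:
    print("task stopped reason:", task.get("stoppedReason"))
    for container in task.get("containers", []):
        print(container["name"], container.get("exitCode"), container.get("reason"))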
I am trying to run a very simple custom command, "echo helloworld", in GoCD as per the Getting Started Guide Part 2. However, the job does not finish: the console says "Waiting for console logs" and the raw output says "Console log for this job is unavailable as it may have been purged by Go or deleted externally".
My job looks like the following, which was taken from typing "echo" in the Lookup Command (this differs from the Getting Started example, which I tried first with the same result):
Judging from the screenshot, the problem seems to be that no agent is assigned to the task. For an agent to be assigned, it must satisfy all of these conditions:
An agent must be running, and connected to the server
The agent must be enabled on the "Agents" page
If you use environments, the job and the agent need to be in the same environment
The agent needs to have all of the resources assigned that are configured in the job
Found the issue.
The Pipelines have to be in the same Environment to work.
Found the solution after searching, but I'm leaving this here in case somebody runs into a similar kind of confusion. See the resolution at the end.
I'm trying to figure out why AWS CloudWatch log service fails to understand the right timestamp for my log events. Currently all my events are being saved under Time 2017-01-01 no matter what the actual timestamp in the event is.
I'm feeding the log from syslog, where Docker is saving the logged events, and I have configured Docker to write the timestamp in the format:
170105/103242 (%y%m%d/%H%M%S)
I configured awslogs service with parameters:
datetime_format = %y%m%d/%H%M%S
I restarted the service and hit the server, but when I go to CloudWatch and look at the log entries, even entries that do start with a timestamp like 170105/103242 are saved as events belonging to the date 2017-01-01, which contains all events between 01-01 and 01-05.
When I look at the awslogs.log I can see following lines:
2017-01-05 11:05:28,633 - cwlogs.push - INFO - 29223 - MainThread - Missing or invalid value for use_gzip_http_content_encoding config. Defaulting to using gzip encoding.
2017-01-05 11:05:28,633 - cwlogs.push - INFO - 29223 - MainThread - Using default logging configuration.
This makes me think that the agent probably isn't actually reading/using the datetime_format, but I don't understand why it ends up using the default. I tried to put
use_gzip_http_content_encoding = true
under general settings, but it doesn't change the errors.
I am running out of ideas - has anyone managed to configure the awslogs agent in a way where the datetime_format is actually used correctly?
Edit:
I'm currently adding more console logging to the local python2.7 push.py to see what is going on :)
RESOLVED:
Ok, the problem was that I came into this project after the initial setup had been done, and I was under the impression that the logger was configured to use the .conf file at this location:
/etc/awslogs/awslogs.conf
that was dynamically populated.
The environment had a script that passed this location to awslogs-agent-setup.py, which was supposed to make the agent read its configuration from there.
However, this script didn't actually do what it was supposed to do, and when the service started, it actually read the config from
/var/awslogs/etc/awslogs.conf
which contained the default values.
So the actual resolution was to change the datetime_format parameter in the default config and forget about the config I thought the service was using.
Add logging to /var/awslogs/lib/python2.7/site-packages/cwlogs/push.py and see how the actual config parameters are interpreted.
You will probably find out that the service is actually using the configuration file at the default location:
/var/awslogs/etc/awslogs.conf
and hence you have to edit the configuration values there for them to actually be read.
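For reference, a minimal stanza in that file could look something like the following; the monitored file, log group name, and state file path are placeholders for whatever your setup uses:
[general]
state_file = /var/awslogs/state/agent-state

[/var/log/syslog]
file = /var/log/syslog
log_group_name = my-log-group
log_stream_name = {instance_id}
datetime_format = %y%m%d/%H%M%S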
I'm a newbie to CentOS and wanted to know the best way to get journal logs into CloudWatch Logs.
My thought processes so far are:
Use a FIFO to parse the journal logs and ingest them into CloudWatch Logs. It looks like this could come with drawbacks, where logs could be dropped if we hit buffering limits.
Forward journal logs to syslog and send the syslogs to CloudWatch Logs.
The idea is essentially to have everything logging to journald as JSON and then forward this across to CloudWatch Logs.
What is the best way to do this? How have others solved this problem?
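For context, this is roughly what I mean by logging to journald as JSON; a minimal sketch assuming the python-systemd package, with a made-up payload:
# Sketch: emit a JSON-encoded message (plus an identifier field) to journald.
import json
from systemd import journal
payload = {"event": "user_signup", "user_id": 42}
journal.send(json.dumps(payload), SYSLOG_IDENTIFIER="my-app")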
Take a look at https://github.com/advantageous/systemd-cloud-watch
We had problems with journald-cloudwatch-logs. It just did not work for us at all.
It does not limit the size of the message or commandLine that it sends to CloudWatch, and CloudWatch sends back an error that journald-cloudwatch-logs cannot handle, which makes it get out of sync.
systemd-cloud-watch is stateless and it asks CloudWatch where it left off.
systemd-cloud-watch also creates the log-group if missing.
systemd-cloud-watch also uses the name tag and the private ip address so that you can easily find the log you are looking for.
We also include a Packer file to show you how to build and configure a systemd-cloud-watch image with EC2/CentOS/systemd. There is no question about how to configure systemd because we have a working example.
Take a look at https://github.com/saymedia/journald-cloudwatch-logs by Martin Atkins.
This open source project creates a binary that does exactly what you want - ship your (systemd) journald logs to AWS CloudWatch Logs.
The project depends on libsystemd to forward directly to CloudWatch. It does not rely on forwarding to syslog. This is a good thing.
The project appears to use Go's concurrent channels to read the logs and batch the writes.
Vector can be used to ship logs from journald to AWS CloudWatch Logs.
journald can be used as a source and AWS CloudWatch Logs as a sink.
I'm working on integrating this with an existing deployment of about 6 EC2 instances that generate about 30 GB of logs daily. I'll update this answer with any caveats or gotchas after we've used Vector in production for a few weeks.
EDIT 8/17/2020
A few things to be aware of. The max batch size for PutLogEvents is 1 MB and there is a max of 5 requests per second per stream. See the limits here.
To help with that, in my setup each journald unit has its own log stream. Also, there are a lot of fields that the Vector journald source includes; I used a Vector transform to remove all the ones I didn't need. However, I'm still running into rate limits.
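For the field removal, a transform along these lines works; the field list here is only an example, and the exact transform name and options may differ between Vector versions:
[transforms.drop_journald_fields]
type = "remove_fields"
inputs = ["journald_logs"]
fields = ["_BOOT_ID", "_MACHINE_ID", "_CAP_EFFECTIVE", "SYSLOG_FACILITY"]
(The sink's inputs would then point at this transform instead of the source.)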
EDIT 10/6/2020
I have this running in production now. I had to update the version of Vector I was using from 0.8.1 to 0.10.0 to take care of an issue with Vector not respecting the max-bytes-per-batch requirement for AWS CloudWatch Logs. As for the rate limit issues I was experiencing, it turns out I wasn't actually having any. I was getting this message in the Vector logs: tower_limit::rate::service: rate limit exceeded, disabling service. What that actually means is that Vector temporarily pauses sending logs in order to respect the rate limit of the sink. Also, each CloudWatch log stream can consume up to 18 GB per hour, which is fine for my 30 GB per day requirement spread over 30+ services on 6 VMs.
One issue I did run into was Vector causing the CPU to spike on our main API service. I had a source for each service unit to tail the journald logs; I believe this somehow blocked our API from being able to write to journald (not 100% sure, though). What I did was use one source and specify multiple units to follow, so there was only one command tailing the logs, and I increased the batch size since each service generates a lot of logs. I then used Vector's template syntax to split the log group and log stream based on the service name. Below is an example configuration:
[sources.journald_logs]
type = "journald"
units = ["api", "sshd", "vector", "review", "other-service"]
batch_size = 100
[sinks.cloud_watch_logs]
type = "aws_cloudwatch_logs"
inputs = ["journald_logs"]
group_name = "/production/{{host}}/{{_SYSTEMD_UNIT}}"
healthcheck = true
region = "${region}"
stream_name = "{{_SYSTEMD_UNIT}}"
encoding = "json"
I have one final issue I need to iron out, but it's not related to this question. I'm using a file source for nginx since it writes to an access log file. Vector is consuming 80% of the CPU on that machine reading the logs and sending them to AWS CloudWatch. Filebeat also runs on the same box sending the logs to Logstash, but it has never caused any issues. Once we get Vector working reliably we'll retire the Elastic Stack, but for now we have them running side by side.