Customizing EC2 log rotation to S3 on Elastic Beanstalk

I have an AWS Elastic Beanstalk environment with some Amazon Linux instances running a Tomcat 8 server. I have enabled log rotation from the Beanstalk console, and I can see the logs getting published to S3 every hour.
I would like to reduce the frequency of the rotation from 1 hour to maybe 12 hours (or something customizable that I can decide later; if customization is limited, I can fall back to daily). The only related pointers I've found in the documentation are that the logrotate configuration lives at /etc/logrotate.elasticbeanstalk.hourly/ and that the cron job that drives it runs hourly from /etc/cron.hourly/.
The default logrotate configuration for Tomcat is set to rotate at a size of 10 MB, but the force flag in the cron task basically ignores this and ends up rotating the log file much sooner (I don't have a whole lot of traffic). Too many log files make it very annoying to do any sort of debugging later.
How can I go about changing the logrotate configuration and overriding the cron job? Is the recommended option to overwrite these config files via a script in the .ebextensions folder?
When an instance is terminated (replaced by another one during a rolling update, or for any other reason), does Elastic Beanstalk automatically back up the pending logs to S3, or do we lose them? What changes should I make (or avoid) in the above configuration to ensure that all logs end up in S3?
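On the override question, here is one hedged sketch of how an .ebextensions config could swap the hourly forced rotation for a daily one. The exact file names under /etc/logrotate.elasticbeanstalk.hourly/ and /etc/cron.hourly/ are assumptions that vary by platform version, so verify them on a running instance first:

# .ebextensions/logrotate-daily.config -- untested sketch, file names assumed
files:
  "/etc/cron.daily/cron.logrotate.elasticbeanstalk.tomcat8.conf":
    mode: "000755"
    owner: root
    group: root
    content: |
      #!/bin/sh
      # Force-rotate once a day instead of once an hour.
      test -x /usr/sbin/logrotate || exit 0
      /usr/sbin/logrotate -f /etc/logrotate.elasticbeanstalk.hourly/logrotate.elasticbeanstalk.tomcat8.conf
commands:
  remove_hourly_rotation:
    # Drop the stock hourly job so the daily one is the only driver.
    command: rm -f /etc/cron.hourly/cron.logrotate.elasticbeanstalk.tomcat8.conf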

Related

Python pipeline on AWS Cloud

I have a few Python scripts that need to be executed in sequence on AWS, so what are the best and simplest options? These scripts are proof-of-concept work, so they're a little rough, but they need to run overnight. Most of them finish within 10 minutes, but a couple can take up to an hour running on a single core.
We do not have any servers like Jenkins, Airflow, etc.; we are planning to use existing AWS services.
Please let me know. Thanks.
1) EC2 instance (manually controlled)
- Upload your scripts to an S3 bucket
- Use the default VPC and launch an EC2 instance
- Use an SSM remote session to log in
- Run the AWS CLI (aws s3 sync) to download the scripts from S3
- Run them manually
- Stop the instance when done
To keep this clean, make a shell script (or a master .py file) that does the work; a minimal sketch follows below. If you want the instance to stop charging you money afterwards, add a command at the end that stops it when complete.
Least amount of work.
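A minimal sketch of that master script. The bucket name, script names, and region are placeholders, and stopping the instance from inside assumes an instance role that allows ec2:StopInstances:

#!/bin/sh
# Fetch the scripts, run them in order, then stop this instance.
set -e                                    # abort the whole run if any step fails
aws s3 sync s3://my-poc-bucket/scripts/ /home/ec2-user/scripts/
cd /home/ec2-user/scripts
python3 step1.py
python3 step2.py
python3 step3.py
# Look up our own instance ID from instance metadata (IMDSv1 shown; IMDSv2
# needs a session token first) and stop the instance so it stops billing.
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
aws ec2 stop-instances --instance-ids "$INSTANCE_ID" --region us-east-1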
2) If you want to run the scripts daily
- Script out the work above (including a step at the end that sets the Auto Scaling group back to zero instances)
- Create an EC2 Auto Scaling group and launch an instance on a cron schedule
It will start up, do the work, and then shut down and stop charging you.
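The scheduling half of option 2 could look like this with the AWS CLI; the group name, action name, and times are placeholders:

# Bring the group up to one instance every night at 02:00 UTC.
aws autoscaling put-scheduled-update-group-action \
  --auto-scaling-group-name poc-batch-asg \
  --scheduled-action-name nightly-run \
  --recurrence "0 2 * * *" \
  --desired-capacity 1
# The final step of the master script then scales the group back to zero:
aws autoscaling set-desired-capacity \
  --auto-scaling-group-name poc-batch-asg \
  --desired-capacity 0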
3) Lambda
Pretty much like option 2, but AWS will do most of the work for you.
Either put all your scripts into one Lambda, or put each script into its own Lambda and have a master function synchronously invoke each one in the order you want. Note that a single Lambda invocation is capped at 15 minutes, so the scripts that run for up to an hour would need to be split up or run elsewhere.
Have a CloudWatch Events (EventBridge) rule trigger the work daily.
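The daily trigger might be wired up like this; the rule name, function name, account ID, and ARN are all placeholders:

# Fire a rule every night at 02:00 UTC (EventBridge cron uses six fields).
aws events put-rule \
  --name nightly-pipeline \
  --schedule-expression "cron(0 2 * * ? *)"
# Allow EventBridge to invoke the master function, then attach it as the target.
aws lambda add-permission \
  --function-name pipeline-master \
  --statement-id nightly-pipeline \
  --action lambda:InvokeFunction \
  --principal events.amazonaws.com
aws events put-targets \
  --rule nightly-pipeline \
  --targets "Id"="1","Arn"="arn:aws:lambda:us-east-1:123456789012:function:pipeline-master"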
I would say that if you are in POC mode, option 1 is the best decision. It is likely the closest to how you currently run the scripts. This is what #jarmod recommended already.
You didn't mention which AWS resources your Python scripts need to access, or even what the scripts are for, so it is difficult to recommend a specific solution.
However, a good option is AWS Batch.

How can I configure Elastic Beanstalk to show me only the relevant log file(s)?

I'm an application developer with very limited knowledge of infrastructure. At my last job we frequently deployed Java web services (built as WAR files) to Elastic Beanstalk, and much of the infrastructure had already been set up before I ever started there, so I got to focus primarily on the code and not how things were tied together. One feature of Elastic Beanstalk that often came in handy was the button to "Request Logs," where you can select either the "Last 100 Lines" or the "Full Logs." What I'm used to seeing when clicking this button is to directly view the logs generated by my web service.
Now, at the new job, the infrastructure requirements are a little different, as we have to Dockerize everything before deploying it. I've been trying to stand up a Spring Boot web app inside a Docker container in Elastic Beanstalk, and have been running into trouble with that. And I also noticed a bizarre difference in behavior when I went to "Request Logs." Now when I choose one of those options, instead of dropping me into the relevant log file directly, it downloads a ZIP file containing the entire /var/log directory, with quite a number of disparate and irrelevant log files in there. I understand that there's no way for Amazon to know, necessarily, that I don't care about X log file but do care about Y log file, but was surprised that the behavior is different from what I was used to. I'm assuming this means the EB configuration at the last job was set up in a specific way to filter the one relevant log file, or something like that.
Is there some way to configure an Elastic Beanstalk application to only return one specific log file when you "Request Logs," rather than a ZIP file of the /var/log directory? Is this done with ebextensions or something like that? How can I do this?
Not too sure about the Beanstalk console, but using the EB CLI, if you enable CloudWatch log streaming for your Beanstalk instances (note that storing logs in CloudWatch costs extra), you can run:
eb logs --stream --log-group <CloudWatch logGroup name>
The above command gives you the logs for your instance, limited to the file/log group you specified. For it to work, you first need to enable CloudWatch log streaming:
eb logs --stream enable
As an aside, to determine which log groups your environment currently has, run:
aws logs describe-log-groups --region <region> | grep <beanstalk environment name>
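If you have AWS CLI v2, a related option (not part of the answer above, just an aside) is to tail a single group directly; the group name here is an assumption, so look up the real one with the describe-log-groups command first:

# Follow one Beanstalk log group in near real time.
aws logs tail /aws/elasticbeanstalk/my-env/var/log/web.stdout.log --follow --region us-east-1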

How to keep logs in AWS if application restarts?

I run a Spring Boot application in AWS with Docker. Sometimes Amazon has to restart the underlying hardware; the Environment Health of the instance in Beanstalk then goes to Degraded, then Warning, and the instance restarts.
I want my app logs from the last 7 days, but the instance was restarted due to unforeseen AWS hardware issues, so I lost that information. How can I avoid this and make AWS keep all my logs even after a restart?
It is true that archiving logs to S3 would work for the most part, but you may want to consider installing and configuring the CloudWatch Logs agent - http://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/QuickStartEC2Instance.html
This streams logs directly to CloudWatch, so they survive instance termination. You could also consider numerous other solutions for this, such as Sumo Logic, ELK, Splunk, etc.
You should always build solutions so they are ready even when hardware crashes. One possible approach is to ship log files to an S3 bucket as they are rotated; a cron job or a logrotate hook can do this, as sketched below.
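A sketch of that ship-on-rotate idea as a logrotate stanza. The log path and bucket name are placeholders, and the instance profile needs s3:PutObject on the bucket:

/var/app/current/log/*.log {
    daily
    rotate 7
    compress
    missingok
    sharedscripts
    postrotate
        # Copy the freshly compressed logs to S3, keyed by hostname.
        aws s3 cp /var/app/current/log/ s3://my-log-archive/$(hostname)/ --recursive --exclude "*" --include "*.gz"
    endscript
}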

Managing/deleting/rotating/streaming Elastic Beanstalk Logs

I am using Amazon EB for the first time. I've set up a Rails app running on Linux and Puma.
So far, I've been viewing logs through the eb logs command. I know that we can set EB to rotate the logs to S3 or stream them to CloudWatch.
My question revolves around the deletion of the various log files.
Will the various logs, such as puma.log, be deleted automatically, or must I do it myself?
If I set up log rotation to S3, will the log files on the EC2 instance be deleted (and fresh copies created in their place) when they get rotated to S3? Or do they just keep growing indefinitely?
If I stream them to CloudWatch, will the same copy of the log be kept on the EC2 instance and grow indefinitely?
I've googled around but can't seem to find any notion of "Log management" or "log deletion" in the docs or on SO.
I'm using Beanstalk on a LAMP project and I can answer a few of your questions.
You have to set up your own log rotation policy, at least for your app logs. Check whether your base image already rotates these logs for you; on Linux the config should be in /etc/logrotate.conf.
When you use S3 log rotation with Beanstalk, it already tails and deletes the logs after 15 minutes. http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/using-features.logging.html#health-logs-s3location
With CloudWatch streaming, the same copy of the log will be kept on your EC2 instance; your log rotation policy in /etc/logrotate.conf is what will delete it. awslogs keeps some metadata recording which chunk of the logs it has already processed, so it does not create duplicates.
If you want an example on how to use cloudwatch logs with elasticbeanstalk check: http://www.albertsola.pro/store-aws-beanstalk-symfony-and-apache-logs-in-cloudwatch-logs/

Easier way to access ElasticBeanstalk EC2 Log files

I am programming a Jersey service on Tomcat via Elastic Beanstalk with a load balancer. I am finding it very cumbersome to retrieve each EC2 instance's catalina files from S3. Currently I need to determine the EC2 instance(s), then work my way to each of the S3 locations, download the files, and only then can I diagnose.
The snapshot doesn't help: given the volume of requests coming in, it doesn't hold enough information, and by the time I get the snapshot, what I need has already "rolled" off it.
Two questions:
1) Is there an easier approach to log files via AWS? (Increasing the time before rotation, which I don't believe is supported as of now; scripts; etc.)
2) Is there any software or script to access all the logs under a load balancer? I basically want to say "give me all logs for this EB environment" and have it fetch all logs for that day from every server under that load balancer, up or down. The clincher is "down": the problem becomes more complex when the load balancer takes an instance down right when the issue occurs.
Thanks!
As an immediate solution to your problem, you can follow the approach suggested in this answer: modify the logrotate configuration to rotate at a bigger log size using .ebextensions.
Then snapshot logs should work for you.
Let me know if you need more clarification on this approach.
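A guess at what that .ebextensions change could look like; the config file path and the current size value are assumptions, so check them on a running instance:

# .ebextensions/rotate-size.config -- untested sketch
commands:
  bump_rotate_size:
    # Raise the rotation threshold from 10M to 100M in the platform's logrotate config.
    command: sed -i 's/size 10M/size 100M/' /etc/logrotate.elasticbeanstalk.hourly/logrotate.elasticbeanstalk.tomcat8.conf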
AWS has released CloudWatch Logs just last week, which enables you to monitor and troubleshoot your systems and applications using your existing system, application, and custom log files:
You can send your existing system, application, and custom log files to CloudWatch Logs and monitor these logs in near real-time. [...] you can store your logs using highly durable, low-cost storage for later access.
See the introductory blog post Store and Monitor OS & Application Log Files with Amazon CloudWatch for an illustrated walk through, which touches on using Elastic Beanstalk and CloudWatch Logs already - this is further detailed in Using AWS Elastic Beanstalk with Amazon CloudWatch Logs.
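As a rough illustration of the agent setup described in those links, an awslogs stanza for a Tomcat log might look like the following; the file path, group name, and timestamp format are assumptions:

[/var/log/tomcat8/catalina.out]
file = /var/log/tomcat8/catalina.out
log_group_name = my-eb-env-catalina
log_stream_name = {instance_id}
datetime_format = %d-%b-%Y %H:%M:%S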