AWS - Log aggregation and visualization - amazon-web-services

We have a couple of applications running on AWS. Currently we redirect all our logs to a single bucket. However, to make access easier for users, I am thinking of installing the ELK Stack on an EC2 instance.
I would like to check whether there is an alternative where I don't have to maintain this stack.
Scaling won't be an issue, as this is only for logs generated by applications running on AWS, so no ingestion or processing is required. They are mostly log4j logs.

You can go for either the managed Elasticsearch service available in AWS or set up your own on an EC2 instance.
It usually comes down to the price involved and the amount of time you have for setting up and maintaining your own installation.
With your own setup, you can do a lot more configuration than the managed service allows, and it can also help reduce cost.
You can find more info on this blog
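Whichever option you pick, the log4j lines have to become structured documents before Kibana can filter on them. Here is a minimal Python sketch of that step. The line layout in the regex is an assumption on my part (your actual log4j pattern is not shown in the question), and the function names are only illustrative.

```python
import json
import re

# Assumed line layout (hypothetical; adjust to your own log4j pattern):
#   2019-03-01 12:00:00,123 ERROR [main] com.example.OrderService - message text
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<thread>[^\]]+)\]\s+"
    r"(?P<logger>\S+)\s+-\s+"
    r"(?P<message>.*)$"
)


def parse_log4j_lines(lines):
    """Turn raw log lines into dicts; continuation lines (stack traces)
    are appended to the message of the preceding event."""
    events = []
    for raw in lines:
        line = raw.rstrip("\n")
        match = LINE_RE.match(line)
        if match:
            events.append(match.groupdict())
        elif events and line:
            events[-1]["message"] += "\n" + line
    return events


def to_ndjson(events):
    """Serialize events as newline-delimited JSON, one document per line."""
    return "\n".join(json.dumps(e, sort_keys=True) for e in events)
```

The newline-delimited output is only an intermediate form; how you ship it into Elasticsearch (Logstash, a bulk request, or something else) depends on the setup you choose.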

Related

Centralized Logs Storage for Load Balanced Managed Instance Group

I have set up a Managed Instance Group with 3 initial instances (I installed Lumen on them, and the web server starts automatically) to be used with the GCP load balancer. The LB works great.
However, whenever I need to trace the Lumen logs, I have to SSH into every single instance to view them. Is there a best practice for a centralized log store that I can refer to?
Can I mount the Lumen logs on a centralized disk, e.g. a GCP Filestore volume or a Google Cloud Storage bucket, or use Fluentd to send my logs to GCP Logging?
Please, I need to know the best industry practice. Thanks.
Stackdriver is the right option for your case.
https://cloud.google.com/logging/docs/agent/installation#joint-install
Install the Stackdriver Logging agent on the Compute Engine instances. You can follow your logs live, and you can also build visualizations and analyses from them. Stackdriver is the standard choice for people who are using GCP. Keep the pricing in mind and check the pricing details.
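A logging agent generally gets more out of your logs when each line is a self-contained JSON object instead of free text. The sketch below shows that idea in Python, purely as an illustration (Lumen itself is PHP). It assumes the agent is configured to tail the output and parse each line as JSON, and the field names, including "severity", are my own choice here.

```python
import io
import json
import logging


class JsonLineFormatter(logging.Formatter):
    """Format each record as a single JSON object on one line."""

    def format(self, record):
        payload = {
            "severity": record.levelname,  # assumed field name
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def build_logger(stream, name="app"):
    """Return a logger that writes JSON lines to the given stream."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonLineFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

In practice you would pass an open log file (or sys.stdout) instead of an in-memory stream.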

Exploring tools to trigger build script to rollout specific git branch to a subset of the amazon ec2 instances

We have multiple amazon ec2 instances behind a load balancer. Our build script is written in phing and is integrated with git.
We are looking for a tool (like Jenkins or AWS CodeDeploy) that can display all the active instances currently behind the load balancer, let us select some of them (or select a previously defined group), and then trigger either of the following (whichever is better):
a build script hosted on the same dedicated server where the tool is hosted.
or the respective build scripts hosted on the selected ec2 instances.
We should be able to do the following -
specify a git branch name, optionally, when we trigger the build script for any group of instances.
be able to roll out in batches of boxes, so that we get some time to monitor load and then move on to the next batch if all is good. The best way, I guess, would be to specify a batch size (e.g. 10), so that the process waits for a user prompt after the rollout of each batch completes.
So, if we have to roll out two different git branches to two groups of instances, we should be able to run them in two steps (if we do not specify a batch size).
Would like to know about experiences of people who dealt with something similar.
CodeDeploy supports Git (more precisely, GitHub). It also allows you to deploy only to tagged EC2 instances. If you combine that with a custom DeploymentConfig (http://docs.aws.amazon.com/codedeploy/latest/userguide/how-to-create-deployment-configuration.html), you can also control how fast the deployment proceeds (the size of the batch).
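To make the batching idea concrete, here is a small tool-agnostic Python sketch of "deploy N boxes, pause for a go/no-go, continue". It is not the CodeDeploy API: the deploy and confirm callbacks are hypothetical stand-ins for whatever your tool does to push a branch to one instance and to ask the operator whether to proceed.

```python
def plan_batches(instance_ids, batch_size=None):
    """Split instance IDs into ordered rollout batches.
    With no batch_size, everything goes out in a single batch."""
    ids = list(instance_ids)
    if not ids:
        return []
    if batch_size is None:
        return [ids]
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [ids[i:i + batch_size] for i in range(0, len(ids), batch_size)]


def roll_out(instance_ids, deploy, confirm, batch_size=None):
    """Deploy batch by batch; stop when confirm() returns False.
    Returns the list of instance IDs that were deployed to."""
    done = []
    batches = plan_batches(instance_ids, batch_size)
    for index, batch in enumerate(batches):
        for instance_id in batch:
            deploy(instance_id)
            done.append(instance_id)
        is_last = index == len(batches) - 1
        # Ask for a go/no-go between batches, but not after the final one.
        if not is_last and not confirm(batch):
            break
    return done
```

Rolling out two branches to two groups is then just two calls to roll_out with different instance lists and a deploy callback bound to the right branch.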
I would re-structure the question:
The choices you have for application deployment
and whether the tool has option to perform rolling deployments.
Jenkins is software for CI/CD; for the deployments themselves it has to rely on plugins, custom scripting, or an existing orchestration setup.
For software orchestration you have many choices; some of the better-known tools are Chef, Puppet, and Ansible. All of these require you to manage some kind of centralized setup, and all of them support application deployment.
You need to make a decision on whether you would want to invest in maintaining such a setup.
If you decide against such a setup, you have the option of using managed services such as AWS OpsWorks, AWS CodeDeploy, hosted Chef, etc.
In choosing any of these services, you delegate the management of orchestration software to a vendor, which will ensure the service is up all the time.
AWS CodeDeploy and AWS OpsWorks are managed services on AWS and work pretty well with AWS setups.
AWS OpsWorks uses Chef under the hood.
AWS CodeDeploy provides only a subset of what OpsWorks provides and is responsible only for deployments. With CodeDeploy you get a convenient view of your software deployments in the AWS console.
With CodeDeploy, you can achieve the goal of a partial rollout to EC2 instances.
You can do the same with other tools as well, but CodeDeploy in an AWS environment will take the least amount of work.
CodeDeploy also allows you to deploy from Git. Please refer to the following AWS documentation:
http://docs.aws.amazon.com/codedeploy/latest/userguide/github-integ-tutorial.html
The pitfall with CodeDeploy is that the agent that runs on the instances has been tested on, and is supported for, only a limited number of operating systems (http://docs.aws.amazon.com/codedeploy/latest/userguide/how-to-run-agent.html#how-to-run-agent-supported-oses).
Also, if you decide to move away from AWS in the future, you will have to redo the deployment-related work.
The CodeDeploy service only charges you for the underlying AWS resources.
Please find the link to pricing documentation below:
https://aws.amazon.com/codedeploy/pricing/

Amazon ec2 set up best practices for Rails app with mysql or postgres

I have to set up EC2 for a medium-sized Rails app running on apache2 and MySQL, deployed with Capistrano, plus a few background services. I would like to know the best practices developers usually follow when setting up a Rails app, and what kind of setup is easy to scale and covers at least:
auto deployment
security
regular data backup and an easy and quick way to restore the data
server recovery
fault tolerance
I am also interested in how to monitor the server status and performance and other kind of best practice would be also helpful.
PS: also take into account that my app's database will grow fast.
I think a good look into the AWS docs and in particular the architecture center would be the best place to start. However, let me address as many of your questions as I can.
Database
The easiest way to get a scalable, fault-tolerant database on AWS is to use the Relational Database Service (RDS). You should read the docs and best practices to ensure you get the most out of it, e.g. by deploying across multiple AZs.
EC2 Servers
The most commonly recommended way to structure your servers is to decouple them into web servers (which serve HTML to users) and app servers (which hold the application logic and usually return JSON, XML, etc.). See this architecture example.
However, the key is to use an AutoScaling group behind an Elastic Load Balancer.
Automation
If you want to use Capistrano, just install it on your servers. You could create a pre-configured AMI with it installed along with whatever else you want. Alternatively, you could install it in a deployment script. However, the most commonly recommended method for this kind of thing is the AWS OpsWorks service, which is Chef in the cloud.
Server Recovery & Fault Tolerance
If you use EC2 AutoScaling and a server becomes unavailable, i.e. the hardware fails or it stops responding to EC2 health checks, AutoScaling will automatically terminate it and launch a replacement.
With the addition of the ELB and ELB health checks, instances that stop responding to web requests can be taken out of service by the ELB.
You need to read the docs for more info on this.
Backup and Recovery
For backing up data on EBS volumes attached to EC2 instances, use EBS Snapshots. However, the best architectures keep EC2 instances stateless: they don't store anything except application code, so if they died it wouldn't matter. In these situations all data, including user files, can be stored on S3. On S3, you have a number of backup options, such as Cross-Region Replication and/or archiving data to Glacier.
Monitoring
AWS provides CloudWatch, which gives you hypervisor-visible metrics such as network in and out, CPU utilization, and more. If you want more data, you can use custom metrics and push things like memory usage yourself. In addition to CloudWatch, you could use a server-level monitoring tool.
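As a rough sketch of the custom-metric idea, the Python below builds one data point for memory usage and sends it with a boto3 CloudWatch client that the caller supplies. The metric name and namespace are hypothetical choices of mine, and reading the actual memory figure from the OS is left out.

```python
import datetime


def build_memory_metric(instance_id, used_percent, timestamp=None):
    """Build one MetricData entry for CloudWatch's PutMetricData call."""
    if not 0 <= used_percent <= 100:
        raise ValueError("used_percent must be between 0 and 100")
    return {
        "MetricName": "MemoryUtilization",  # hypothetical metric name
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Timestamp": timestamp or datetime.datetime.now(datetime.timezone.utc),
        "Value": float(used_percent),
        "Unit": "Percent",
    }


def publish(cloudwatch_client, metric, namespace="Custom/System"):
    """Send the metric using a boto3 CloudWatch client passed in by the caller.
    The namespace is a hypothetical example."""
    return cloudwatch_client.put_metric_data(Namespace=namespace, MetricData=[metric])
```

You would typically run something like this from cron on each instance, with the client created via boto3.client("cloudwatch").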
Deployment
I recommend AWS Code Deploy.
Security
Use Security Groups to open only the ports you want users to be able to connect on. Also use security groups to lock down important ports, e.g. 22, to a specific set of IPs. You can also use network ACLs to block undesired traffic. AWS provides more information and suggestions here.
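The security group advice above could look roughly like this as data. The sketch builds the rule structures in the shape boto3 expects for IpPermissions; the helper names and descriptions are made up for illustration, and which ports you open publicly is your call.

```python
def ingress_rule(port, cidr_blocks, description=""):
    """Build one IpPermissions entry allowing TCP on a single port."""
    ranges = []
    for cidr in cidr_blocks:
        entry = {"CidrIp": cidr}
        if description:
            entry["Description"] = description
        ranges.append(entry)
    return {"IpProtocol": "tcp", "FromPort": port, "ToPort": port, "IpRanges": ranges}


def web_server_rules(admin_cidrs):
    """HTTP/HTTPS open to everyone; SSH limited to the given admin ranges."""
    return [
        ingress_rule(80, ["0.0.0.0/0"], "public http"),
        ingress_rule(443, ["0.0.0.0/0"], "public https"),
        ingress_rule(22, admin_cidrs, "admin ssh"),
    ]
```

The result can be passed as IpPermissions to the EC2 client's authorize_security_group_ingress call together with your group ID, for example with web_server_rules(["203.0.113.0/24"]) where the CIDR is a placeholder for your office range.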
I also recommend you read this Whitepaper.

updating all files on AWS EC2

I'm trying to determine the "best" way for a small company to keep web app EC2 instances in sync with current files while using autoscaling.
From my research, CloudFormation, Chef, Puppet, OpsWorks, and others seem like the tools to do so. All of them seem to have a decent learning curve, so I am hoping someone can point me in the right direction and I'll learn one.
The initial setup I am after is:
Route53
1x Load Balancer
2x EC2 (different AZ) - Apache/PHP
1x ElastiCache Redis
2x EC2 (different AZ) w/ MySQL
Email thru Google Apps
Customer File/Image Storage via S3
CloudFront for CDN
The only major challenge I can see is versioning/syncing the web/app servers. We're small now, so I could probably just update the EBS volumes manually or even use rsync, but I would rather automate it and be set up for autoscaling.
This is probably too broad of a question and may be closed, but let me give you a few thoughts.
Why not use RDS for MySQL?
You need to get into the mindset of making and promoting disk images. In the cloud world, you don't want to be rsyncing a bunch of files around from server to server. When you are ready to publish a revised set of code, just make an image from your staging environment, start new EC2 instances behind your ELB based on that image, and turn off the old instances. You may have a slightly different deployment sequence if you need to coordinate with DB schema changes, but that is a pretty straightforward approach.
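The "new image in, old instances out" sequence can be sketched like this. It is only an outline of the ordering: launch, is_healthy, and terminate are hypothetical callbacks that you would back with your own EC2/ELB calls, and the all-or-nothing health gate is one possible policy, not the only one.

```python
def replace_fleet(old_instances, image_id, launch, is_healthy, terminate):
    """Launch one replacement per old instance from image_id, and terminate
    the old instances only if every replacement reports healthy."""
    new_instances = [launch(image_id) for _ in old_instances]
    if not all(is_healthy(i) for i in new_instances):
        # Roll back: discard the replacements and keep the old fleet serving.
        for instance in new_instances:
            terminate(instance)
        return list(old_instances)
    for instance in old_instances:
        terminate(instance)
    return new_instances
```

The point of the ordering is that the old instances keep serving traffic until the new ones have proven themselves.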
You should still seek to automate some of your activities using tools such as those you mentioned. You don't need to do this all at once. Just figure out a manual part in your process that you want to automate and do it.

Easier way to access ElasticBeanstalk EC2 Log files

I am programming a Jersey service on Tomcat via Elastic Beanstalk with a load balancer. I find getting at the EC2 instances' catalina files in S3 very cumbersome. Currently I need to determine the EC2 instance(s), then work my way to each of the S3 locations and download the files before I can diagnose anything.
The snapshot doesn't help: because of the number of requests that come in, it doesn't hold enough information, and by the time I get the snapshot, the relevant entries have "rolled" off it.
Two questions:
1) Is there an easier approach to log files via AWS? (Increasing the time before rotation, which I don't believe is supported as of now, scripts, etc.)
2) Is there any software or script to access all the logs under a load balancer? I basically want to say "give me all logs for this Elastic Beanstalk environment" and have it get all logs for that day from all servers under that load balancer (up or down). The clincher is "down": the problem becomes more complex when the load balancer takes down an instance right when the issue occurs.
Thanks!
As an immediate solution to your problem, you can follow the approach suggested in this answer. Essentially, you can use ebextensions to modify the logrotate configuration so that logs are rotated at a larger size.
Then snapshot logs should work for you.
Let me know if you need more clarifications on this approach.
AWS released CloudWatch Logs just last week, which enables you to monitor and troubleshoot your systems and applications using your existing system, application, and custom log files:
You can send your existing system, application, and custom log files to CloudWatch Logs and monitor these logs in near real-time. [...] you can store your logs using highly durable, low-cost storage for later access.
See the introductory blog post Store and Monitor OS & Application Log Files with Amazon CloudWatch for an illustrated walk through, which touches on using Elastic Beanstalk and CloudWatch Logs already - this is further detailed in Using AWS Elastic Beanstalk with Amazon CloudWatch Logs.
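Once the logs are in CloudWatch Logs, "give me all logs for that day across all servers" becomes a time-range query on one log group. The sketch below does that with a boto3 CloudWatch Logs client supplied by the caller. It assumes all instances write to streams in the same log group; the group name in the usage line is hypothetical. Events already shipped to CloudWatch Logs should stay queryable after the instance itself is gone, subject to the group's retention settings.

```python
import datetime


def day_window_ms(day):
    """Return (start, end) of a UTC calendar day as epoch milliseconds."""
    start = datetime.datetime(day.year, day.month, day.day, tzinfo=datetime.timezone.utc)
    end = start + datetime.timedelta(days=1)
    return int(start.timestamp() * 1000), int(end.timestamp() * 1000) - 1


def fetch_day(logs_client, log_group, day):
    """Collect every event in log_group for the given day, across all streams."""
    start_ms, end_ms = day_window_ms(day)
    kwargs = {"logGroupName": log_group, "startTime": start_ms, "endTime": end_ms}
    events = []
    while True:
        page = logs_client.filter_log_events(**kwargs)
        events.extend(page.get("events", []))
        token = page.get("nextToken")
        if not token:
            return events
        # More pages remain; repeat the same query with the continuation token.
        kwargs["nextToken"] = token
```

Usage would be along the lines of fetch_day(boto3.client("logs"), "/hypothetical/tomcat/catalina", datetime.date.today()); each returned event carries a logStreamName, so you can still tell which instance it came from.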