We use Airflow to orchestrate AWS ETL jobs. Some of these jobs start an EMR cluster and add EMR steps to it. Airflow pulls logs from CloudWatch. Logs created during an EMR step are not directly available in CloudWatch; they are only found buried in the cluster logs. If an EMR step fails, the logs needed to identify the error are therefore not readily available. We would like logs created by EMR steps to be displayed in CloudWatch. We have tried installing the CloudWatch agent, but that does not seem to work. How can we get log streams from EMR steps into CloudWatch?
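One workaround we are experimenting with is to have the Airflow task itself copy a finished step's stderr from the cluster's S3 LogUri into a CloudWatch Logs stream. Below is a minimal boto3 sketch of that idea; the bucket name, log group, and the <cluster-id>/steps/<step-id>/stderr.gz key layout are assumptions about our setup (EMR also ships logs to S3 with a delay), and put_log_events batching limits are ignored for brevity:

# Rough sketch: push an EMR step's stderr from the S3 LogUri into CloudWatch Logs.
# Bucket, log group, and key layout are placeholders, not a tested configuration.
import gzip
import time

import boto3

s3 = boto3.client("s3")
logs = boto3.client("logs")

LOG_BUCKET = "my-emr-logs"        # placeholder: bucket behind the cluster's LogUri
LOG_GROUP = "/emr/step-logs"      # placeholder: log group our Airflow tasks read

def ship_step_stderr(cluster_id, step_id):
    # EMR step logs typically land under <LogUri>/<cluster-id>/steps/<step-id>/stderr.gz
    key = f"{cluster_id}/steps/{step_id}/stderr.gz"
    raw = s3.get_object(Bucket=LOG_BUCKET, Key=key)["Body"].read()
    text = gzip.decompress(raw).decode("utf-8", errors="replace")

    stream = f"{cluster_id}/{step_id}"
    try:
        logs.create_log_stream(logGroupName=LOG_GROUP, logStreamName=stream)
    except logs.exceptions.ResourceAlreadyExistsException:
        pass

    now = int(time.time() * 1000)
    events = [{"timestamp": now, "message": line} for line in text.splitlines() if line]
    # Note: put_log_events has batch size limits; this is fine only for short stderr files.
    logs.put_log_events(logGroupName=LOG_GROUP, logStreamName=stream, logEvents=events)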
It is not clear to me whether application logging inside a Spark app itself, running on AWS EMR and executed via spark-shell or EMR Steps, will end up in CloudWatch Logs for reporting if the CloudWatch agent is installed on the EMR cluster. Will it or not?
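For context, by "application logging" I mean something like the following (shown here as a PySpark equivalent of what the spark-shell/Step code does), where the app logs through the cluster's log4j configuration rather than to a file I control:

# Illustration only: driver-side application logging through log4j in PySpark.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("logging-demo").getOrCreate()

# Log via the JVM's log4j, so messages go wherever the cluster's log4j
# configuration sends them (on EMR that is normally the container/step logs).
log4j_logger = spark._jvm.org.apache.log4j.LogManager.getLogger("my.app")
log4j_logger.info("application-level log line from inside the Spark app")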
I was reading the Datadog docs on how to monitor AWS Elastic MapReduce with Datadog because I need to get metrics for failed EMR steps.
I think the most accurate metric is aws.elasticmapreduce.jobs_failed, but according to the docs it is only available for Hadoop v1, and I'm using Hadoop v2, so I don't see it in my Datadog Metric Explorer.
Can someone help me?
Is there a replacement for that metric in Hadoop v2?
Is there another way to monitor failed EMR steps with Datadog?
The steps are Sqoop jobs.
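One workaround I'm considering in case there is no direct replacement metric: poll the EMR API for failed steps and publish a custom CloudWatch metric that Datadog's AWS integration can pick up. A rough boto3 sketch; the cluster ID, namespace, and metric name are made-up placeholders:

# Hypothetical workaround: count failed EMR steps and publish a custom
# CloudWatch metric that Datadog's AWS integration can ingest.
import boto3

emr = boto3.client("emr")
cloudwatch = boto3.client("cloudwatch")

CLUSTER_ID = "j-XXXXXXXXXXXX"   # placeholder cluster ID

def publish_failed_step_count():
    failed = emr.list_steps(ClusterId=CLUSTER_ID, StepStates=["FAILED"])["Steps"]
    cloudwatch.put_metric_data(
        Namespace="Custom/EMR",
        MetricData=[{
            "MetricName": "FailedSteps",
            "Dimensions": [{"Name": "JobFlowId", "Value": CLUSTER_ID}],
            "Value": float(len(failed)),
            "Unit": "Count",
        }],
    )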
I'm trying to automate the process of turning a Redis cluster in AWS on and off. I saw the following thread for reference: https://forums.aws.amazon.com/thread.jspa?threadID=149772. Is there a way to do it via CloudWatch?
I am very new to the AWS platform.
Check the documentation regarding scale in/out:

1. https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/redis-cluster-resharding-online.html describes online resharding and also has the commands to reshard a cluster manually.
2. Check the CloudWatch metrics from the Redis cluster (https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/CacheMetrics.HostLevel.html and https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/CacheMetrics.Redis.html) and choose the metrics that will trigger autoscaling.
3. You can trigger an AWS Lambda on some event for a metric: https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/RunLambdaSchedule.html
4. From the Lambda you can call the AWS CLI to reshard the cluster as described in 1. Example: https://alestic.com/2016/11/aws-lambda-awscli/

If you need to turn off the cluster completely, instead of the resharding commands just use https://docs.aws.amazon.com/cli/latest/reference/elasticache/delete-cache-cluster.html (see the Lambda sketch after this list).
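For the turn-off/turn-on case specifically, a minimal Lambda sketch using boto3 instead of shelling out to the AWS CLI might look like the following. The cluster ID, node type, and the {"action": ...} event shape are assumptions for illustration, and a cluster-mode (replication group) setup would use the replication group APIs instead:

# Minimal sketch, assuming a single-node (non cluster-mode) Redis cluster.
# CLUSTER_ID, node type, and the event shape are placeholders, not a tested setup.
import boto3

elasticache = boto3.client("elasticache")

CLUSTER_ID = "my-redis"

def handler(event, context):
    # Assumption: two CloudWatch Events rules invoke this Lambda with
    # {"action": "stop"} and {"action": "start"} respectively.
    action = event.get("action")
    if action == "stop":
        # "Turning off" ElastiCache means deleting the cluster; data is lost
        # unless you snapshot it first.
        elasticache.delete_cache_cluster(CacheClusterId=CLUSTER_ID)
    elif action == "start":
        elasticache.create_cache_cluster(
            CacheClusterId=CLUSTER_ID,
            Engine="redis",
            CacheNodeType="cache.t3.micro",
            NumCacheNodes=1,
        )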
I use an AWS EMR cluster to run Hive queries. For query optimization purposes, I sometimes need to kill a long-running step but keep the EMR cluster alive so I can keep using it. Is there a way to do this, either from the Hive CLI or the AWS console?
Please refer here for the details. To cancel steps using the AWS CLI:
aws emr cancel-steps --cluster-id j-2QUAJ7T3OTEI8 --step-ids s-3M8DKCZYYN1QE
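If you would rather do the same thing from code (for example from a monitoring script), the equivalent boto3 call is roughly the following; the cluster and step IDs are just the ones from the CLI example above:

# boto3 equivalent of the CLI command above; IDs copied from that example.
import boto3

emr = boto3.client("emr")
response = emr.cancel_steps(
    ClusterId="j-2QUAJ7T3OTEI8",
    StepIds=["s-3M8DKCZYYN1QE"],
)
print(response["CancelStepsInfoList"])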
I am running a Spark cluster on AWS EMR. How do I get all the details of the jobs and executors that are running on AWS EMR without using the Spark UI? I am going to use this for monitoring and optimization.
You can check out Nagios or Ganglia for cluster health, but you can't see the jobs running on Spark with these tools.
If you are using AWS EMR you can do that using the lynx text browser, something like below.
Log in to the master node of the cluster and try the command below:
lynx http://localhost:4040
Note: before you run the command, make sure a job is running.
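If you want the same job and executor details programmatically rather than through lynx, the Spark UI on port 4040 also exposes a REST API. A small sketch, run on the master node while an application is active:

# Query the Spark UI's REST API (same port 4040) instead of browsing it with lynx.
import requests

BASE = "http://localhost:4040/api/v1"

for app in requests.get(f"{BASE}/applications").json():
    app_id = app["id"]
    jobs = requests.get(f"{BASE}/applications/{app_id}/jobs").json()
    executors = requests.get(f"{BASE}/applications/{app_id}/executors").json()
    print(app_id, "jobs:", len(jobs), "executors:", len(executors))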