I was reading the Datadog docs on monitoring AWS Elastic MapReduce because I need to get metrics for failed EMR steps.
I think the most accurate metric is aws.elasticmapreduce.jobs_failed, but according to the docs it is only available for Hadoop v1, and I'm using Hadoop v2... so I don't see it in my Datadog Metric Explorer.
Can someone help me?
Is there a replacement for that metric in Hadoop v2?
Is there another way to monitor failed EMR steps with Datadog?
The steps are Sqoop jobs.
Related
We use Airflow to orchestrate AWS ETL jobs. Some of these jobs start an EMR cluster and add EMR steps to that cluster. Airflow pulls logs from CloudWatch, but logs created during an EMR step are not directly available in CloudWatch; they are only found buried in the cluster logs. If an EMR step fails, the logs needed to identify the error are therefore not readily available. We would like logs created by EMR steps to be displayed in CloudWatch. We have tried installing the CloudWatch agent, but that does not seem to work. How can we get log streams from EMR steps into CloudWatch?
Using Grafana's CloudWatch data source and a little InfluxDB magic, I can pull many metrics from my live environment: CPU utilisation, memory utilisation, host count, thread count, etc.
These metrics would make more sense if I could spot the moments of live deployments on the graph. The ELB Healthy Host Count metric kind of helps, but it shows auto-scaling activity rather than deployments.
I can't find any metrics for CodeDeploy in the AWS CloudWatch data source. Does anybody have a way of doing this?
(My env: Spring Boot app in Docker containers deployed on AWS Fargate using CodeDeploy)
You can push data points into a CloudWatch metric using the "put-metric-data" AWS CLI call [1]. You can call this command from AppSpec file hooks such as BeforeInstall and AfterInstall [2]. Make sure the EC2 instance role has the requisite permissions.
[1] https://docs.aws.amazon.com/cli/latest/reference/cloudwatch/put-metric-data.html
[2] https://docs.aws.amazon.com/codedeploy/latest/userguide/reference-appspec-file-example.html#appspec-file-example-server
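A minimal sketch of what that could look like in the AppSpec file [2]. The hook names are standard CodeDeploy hooks, but the script path is a made-up example; the script itself would wrap the put-metric-data call from [1]:

```yaml
# appspec.yml -- sketch only; scripts/mark_deployment.sh is a
# hypothetical script of your own, not something CodeDeploy provides
version: 0.0
os: linux
files:
  - source: /
    destination: /opt/myapp
hooks:
  AfterInstall:
    - location: scripts/mark_deployment.sh
      timeout: 60
```

Inside scripts/mark_deployment.sh, something like `aws cloudwatch put-metric-data --namespace "Deployments" --metric-name "DeploymentMarker" --value 1` (the namespace and metric name are illustrative) would push one data point per deployment, which you can then overlay on your Grafana graphs.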
I'm trying to automate turning a Redis cluster on and off in AWS. I saw the following thread for reference (https://forums.aws.amazon.com/thread.jspa?threadID=149772). Is there a way to do it via CloudWatch?
I am very new to the AWS platform.
1. Check the documentation regarding scale in/out: https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/redis-cluster-resharding-online.html It also has commands to reshard a cluster manually.
2. Check the CloudWatch metrics from the Redis cluster: https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/CacheMetrics.HostLevel.html and https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/CacheMetrics.Redis.html Choose the metrics that will trigger autoscaling.
3. You can trigger an AWS Lambda on some event for a metric: https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/RunLambdaSchedule.html
4. From the Lambda you can call the AWS CLI to reshard the cluster as described in step 1. Example: https://alestic.com/2016/11/aws-lambda-awscli/
If you need to turn off the cluster completely, instead of the resharding commands just use https://docs.aws.amazon.com/cli/latest/reference/elasticache/delete-cache-cluster.html
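For the complete-shutdown case, you can also call ElastiCache from boto3 inside the Lambda instead of shelling out to the AWS CLI. A minimal sketch, assuming the triggering event carries a cluster_id field (that field name and the handler structure are assumptions of this example):

```python
def build_delete_request(cluster_id):
    """Parameters for ElastiCache delete_cache_cluster (pure helper for clarity)."""
    return {"CacheClusterId": cluster_id}

def lambda_handler(event, context):
    # boto3 ships with the Lambda Python runtime; imported inside the
    # handler so the module stays importable without it elsewhere
    import boto3
    client = boto3.client("elasticache")
    # event["cluster_id"] is an assumed field of the triggering event
    return client.delete_cache_cluster(**build_delete_request(event["cluster_id"]))
```

The handler needs an IAM role allowing elasticache:DeleteCacheCluster; turning the cluster back "on" would be a similar handler around create_cache_cluster.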
I am running a Spark cluster on AWS EMR. How do I get all the details of the jobs and executors that are running on AWS EMR without using the Spark UI? I am going to use it for monitoring and optimization.
You can check out Nagios or Ganglia for cluster health, but you can't see the jobs running on Spark with those tools.
If you are using AWS EMR you can do that with the lynx text browser, something like below:
Log in to the master node of the cluster.
Try the command below:
lynx http://localhost:4040
Note: before you run the command, make sure a job is running.
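Besides browsing the UI with lynx, the same driver serves JSON through Spark's monitoring REST API on that port, which is handier for automated monitoring. A minimal sketch; the executor_summary helper is my own, not part of Spark:

```python
import json
from urllib.request import urlopen

SPARK_UI = "http://localhost:4040"  # default driver UI port, as above

def fetch_json(path):
    """GET a path of Spark's monitoring REST API and parse the JSON."""
    with urlopen(SPARK_UI + path) as resp:
        return json.load(resp)

def executor_summary(executors):
    """Totals over the /executors payload (pure helper, easy to test)."""
    return {
        "executors": len(executors),
        "active_tasks": sum(e.get("activeTasks", 0) for e in executors),
        "failed_tasks": sum(e.get("failedTasks", 0) for e in executors),
    }

def main():
    # list running applications, then summarize the executors of each
    for app in fetch_json("/api/v1/applications"):
        execs = fetch_json("/api/v1/applications/%s/executors" % app["id"])
        print(app["name"], executor_summary(execs))
```

Run it on the master node while a job is active, just like the lynx approach; the /applications and /executors endpoints expose the same data the UI renders.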
What would be a suitable configuration to set up a 2-3 node Hadoop cluster on AWS?
I want to set up Hive, HBase, Solr, and Tomcat on the Hadoop cluster for the purpose of doing small POCs.
Also, please suggest whether to go with EMR or with EC2 and manually set up the cluster on that.
Amazon EMR can deploy a multi-node cluster with Hadoop and various applications (e.g. Hive, HBase) within a few minutes. It is much easier to deploy and manage than trying to run your own Hadoop cluster on Amazon EC2.
See: Getting Started: Analyzing Big Data with Amazon EMR
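If you do go with EMR, a sketch of launching a small 3-node POC cluster with boto3. The release label, instance types, and cluster name are illustrative assumptions, and the default EMR roles must already exist in your account:

```python
def build_cluster_config(name="poc-cluster"):
    """Parameters for EMR run_job_flow; a small 3-node POC layout."""
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",  # assumed release; pick a current one
        "Applications": [{"Name": "Hive"}, {"Name": "HBase"}],
        "Instances": {
            "MasterInstanceType": "m5.xlarge",   # illustrative sizing
            "SlaveInstanceType": "m5.xlarge",
            "InstanceCount": 3,                  # 1 master + 2 core nodes
            "KeepJobFlowAliveWhenNoSteps": True, # keep the POC cluster up
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }

def launch():
    import boto3  # imported here so the sketch stays importable without boto3
    emr = boto3.client("emr")
    return emr.run_job_flow(**build_cluster_config())["JobFlowId"]
```

Note that Solr and Tomcat are not EMR-managed applications; you would install those yourself, e.g. via a bootstrap action.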