Error while submitting AWS EMR job from command line

I am getting the below error from AWS EMR. I have submitted a job from the CLI, and the job status is pending.
A client error (ThrottlingException) occurred when calling the ListSteps operation: Rate exceeded
How can I see all the active jobs in the EMR cluster, and how can I kill them from the CLI and also from the AWS console?

AWS APIs are rate limited. According to the AWS docs, the recommended approach to dealing with a throttling response is to implement exponential backoff in your retry logic: when you get a ThrottlingException, catch it, sleep for some time (say, half a second), then retry, doubling the delay on each further throttled attempt.
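As a rough sketch of that pattern (assuming boto3; the cluster ID and the call_with_backoff helper are placeholders made up for illustration), you can combine the backoff with listing and cancelling the cluster's active steps, which also covers the "see and kill" part of the question:

import time
import boto3
from botocore.exceptions import ClientError

emr = boto3.client("emr")
CLUSTER_ID = "j-XXXXXXXX"  # placeholder: your cluster ID

def call_with_backoff(fn, max_retries=5, **kwargs):
    # Retry an EMR call, sleeping 0.5s, 1s, 2s, ... when throttled.
    delay = 0.5
    for _ in range(max_retries):
        try:
            return fn(**kwargs)
        except ClientError as err:
            if err.response["Error"]["Code"] != "ThrottlingException":
                raise
            time.sleep(delay)
            delay *= 2
    raise RuntimeError("still throttled after {} retries".format(max_retries))

# See the active (pending or running) steps on the cluster.
steps = call_with_backoff(
    emr.list_steps, ClusterId=CLUSTER_ID, StepStates=["PENDING", "RUNNING"]
)["Steps"]
for step in steps:
    print(step["Id"], step["Name"], step["Status"]["State"])

# Kill them. cancel_steps handles pending steps; running steps may need
# a newer EMR release or cluster termination, depending on your setup.
if steps:
    call_with_backoff(
        emr.cancel_steps, ClusterId=CLUSTER_ID, StepIds=[s["Id"] for s in steps]
    )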

Related

AWS Batch: cancel job not working for jobs in RUNNABLE state

Summary
I am using AWS Batch in order to run Monte Carlo simulations. Occasionally I realise that a group of jobs that I have submitted to my queue are incorrect in some way and I wish to clean up the queue before more jobs start running.
When I try to cancel a job through the AWS Console I get the notification "Job cancellation request completed successfully". However, the job remains in the queue, even after waiting for multiple hours. I don't know how to cancel these jobs.
What I've tried
Cancelling jobs in the RUNNABLE state through the AWS Console manually. I get a "Job cancellation request completed successfully" notification, but no change.
Terminating jobs in the RUNNABLE state through the AWS Console manually, instead of cancelling. No change either.
Cancelling jobs through the AWS CLI with the aws batch cancel-job command, as described in https://docs.aws.amazon.com/cli/latest/reference/batch/cancel-job.html
Terminating jobs through the AWS CLI with the aws batch terminate-job command, as described in https://docs.aws.amazon.com/cli/latest/reference/batch/terminate-job.html (the boto3 equivalents of these two calls are sketched after this list)
For all of the previous cases, the job remained in the queue afterwards, with the same status (RUNNABLE).
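For reference, the boto3 equivalents of the two CLI calls tried above are a couple of one-liners; this is only a sketch, with a made-up job ID and reason string:

import boto3

batch = boto3.client("batch")
JOB_ID = "00000000-0000-0000-0000-000000000000"  # placeholder job ID

# cancel_job only affects jobs that have not yet reached STARTING or RUNNING;
# terminate_job also stops jobs that are already running.
batch.cancel_job(jobId=JOB_ID, reason="Incorrect simulation parameters")
batch.terminate_job(jobId=JOB_ID, reason="Incorrect simulation parameters")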

Dataflow job throws error when publishing to Pub/Sub topic

I have a streaming Dataflow job that sinks its output data to a Pub/Sub topic, but the job logs randomly throw this error:
There were errors attempting to publish to topic projects/my_project/topics/my_topics. Recent publish calls completed successfully: 574, recent publish calls completed with error: 1.
There is no stack trace provided by Dataflow, and from the job metrics the error type is "unavailable". After some time the errors stop and the pipeline keeps running as usual. Does this error occur because of an internal error in the GCP service, or because of a quota issue? The output request rate peaked at 10 req/s.
I found a similar issue; it was resolved by adding the "Pub/Sub Admin" role to the Compute Engine default service account under IAM permissions.

AWS CloudFormation Rate Exceeded

I am running a multi-branch pipeline in Jenkins for CI/CD that deploys a CloudFormation stack to my AWS account. Occasionally, when multiple developers push to their branches at the same time, I receive this error on one or more branches:
com.amazonaws.services.cloudformation.model.AmazonCloudFormationException: Rate exceeded (Service: AmazonCloudFormation; Status Code: 400; Error Code: Throttling;
This seems to be a rate limit that Amazon has imposed on the number of requests to CloudFormation within a specified time frame.
What is the request limit of CloudFormation, and can I request a limit increase?
No - the limit you are hitting is not about your own requests to the CloudFormation API.
Most likely the issue is the Jenkins pipeline polling for updates every few seconds to get the current stack status; when you are deploying multiple stacks at once, you will hit this error.
This is probably a bug in the CloudFormation plugin for Jenkins - you'll need to raise a ticket and ask them to implement a backoff of requests when the CFN stack is taking longer than expected, so that the plugin doesn't keep requesting the stack status as often.
You could also change your Jenkinsfiles to use the AWS CLI, which does a better job of managing requests to AWS during CloudFormation updates.
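If the deployment is driven from a script rather than the plugin, the same idea is available through a boto3 waiter, which polls at a fixed, configurable interval rather than every few seconds; a minimal sketch, with a made-up stack name and template path:

import boto3

cfn = boto3.client("cloudformation")
STACK_NAME = "my-branch-stack"  # placeholder stack name

with open("template.yaml") as f:  # placeholder template path
    template_body = f.read()

cfn.update_stack(
    StackName=STACK_NAME,
    TemplateBody=template_body,
    Capabilities=["CAPABILITY_IAM"],
)

# Poll every 30 seconds for up to an hour, instead of every few seconds.
waiter = cfn.get_waiter("stack_update_complete")
waiter.wait(StackName=STACK_NAME, WaiterConfig={"Delay": 30, "MaxAttempts": 120})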

Throttling, Resource limit exceeded when I try to get a credential report in AWS

I am trying to generate a credential report. I get the following error:
aws iam generate-credential-report
An error occurred (Throttling) when calling the GenerateCredentialReport operation (reached max retries: 4): Rate exceeded
Also, from the boto3 API, I am not getting the report.
Is there any way to set the limit?
I opened a support case with AWS about it, here is their response:
Thank you for contacting AWS about your GetCredentialReport issue.
According to our IAM team, we have observed an increase in the call
volume of the IAM GenerateCredentialReport API. In order to avoid any
impact that increase in call volume might have on the service and our
customers, we blocked that API. Callers will receive LimitExceeded
exception. We are actively investigating a solution that will lead to
unblocking the API.
The API seems to be working now. This is the latest response from AWS Support regarding the issue:
"We have deployed a fix to the GenerareCredentialReport API issue
which will protect the IAM service from elevated latencies and error
rates. We are going to ramp up the traffic to the API over the next
few days. In the meanwhile, clients calling the API might receive
“LimitExceed Exception”. In this case, we recommend that the clients
retry with exponential back off."
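The "reached max retries: 4" part of the original error is the client's default retry budget. If you call the API from boto3, one way to follow that advice is to raise the retry count and let botocore handle the exponential backoff itself; a minimal sketch, assuming botocore's standard retry mode:

import boto3
from botocore.config import Config

# Let botocore retry throttled calls with exponential backoff,
# using up to 10 attempts instead of the default.
iam = boto3.client(
    "iam",
    config=Config(retries={"max_attempts": 10, "mode": "standard"}),
)

iam.generate_credential_report()
# Note: if the report is still being generated, get_credential_report can fail
# with a "report in progress" error, so in practice you may need to poll briefly here.
report = iam.get_credential_report()
print(report["Content"].decode("utf-8"))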

Limits for AWS Batch job details retention

I'm trying to understand how long the details associated with an AWS Batch job are retained. For example, the Kinesis limits page describes how each stream defaults to a 24 hour retention period that is extendable up to 7 days.
The AWS Batch limits page does not include any details about either the maximum time or count allowed for jobs. It does say that one million is the limit for SUBMITTED jobs, but it's unclear if that is exclusively for SUBMITTED or includes other states as well.
Does anybody know the details of batch job retention?
Job metadata for SUCCEEDED and FAILED jobs is retained for 24 hours. Metadata for jobs in the SUBMITTED, PENDING, RUNNABLE, STARTING, and RUNNING states remains in the queue until the job completes. Your AWS Batch jobs also log STDERR/STDOUT to CloudWatch Logs, where you control the retention policy.
From the official AWS Batch documentation - https://docs.aws.amazon.com/batch/latest/userguide/batch_user.pdf
Under Jobs -> Job States (Page 23)
FAILED
The job has failed all available attempts. The job state for FAILED jobs is persisted in AWS Batch for at least 24 hours.
Note
Logs for FAILED jobs are available in CloudWatch Logs; the log group is /aws/batch/job, and the log stream name format is first200CharsOfJobDefinitionName/default/ecs_task_id (this format may change in the future). After a job reaches the RUNNING status, you can programmatically retrieve its log stream with the DescribeJobs API operation. For more information, see View Log Data Sent to CloudWatch Logs in the Amazon CloudWatch Logs User Guide. By default, these logs are set to never expire, but you can modify the retention period. For more information, see Change Log Data Retention in CloudWatch Logs in the Amazon CloudWatch Logs User Guide.
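A short sketch of the two programmatic hooks that note mentions, namely looking up a job's log stream with DescribeJobs and changing the retention of the /aws/batch/job log group, assuming boto3 and a made-up job ID:

import boto3

batch = boto3.client("batch")
logs = boto3.client("logs")
JOB_ID = "00000000-0000-0000-0000-000000000000"  # placeholder job ID

# DescribeJobs exposes the CloudWatch log stream once the job has reached RUNNING.
job = batch.describe_jobs(jobs=[JOB_ID])["jobs"][0]
stream_name = job["container"]["logStreamName"]

# Read the job's STDOUT/STDERR from the /aws/batch/job log group.
events = logs.get_log_events(logGroupName="/aws/batch/job", logStreamName=stream_name)
for event in events["events"]:
    print(event["message"])

# Change the log group's retention from "never expire" to, for example, 30 days.
logs.put_retention_policy(logGroupName="/aws/batch/job", retentionInDays=30)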