I have a workflow that contains a bunch of activities. I store each activity's response in an S3 bucket.
I pass the S3 key as an input to each activity. Inside the activity, I have a method that retrieves the data from S3 and performs some operation. But my last activity failed and threw this error:
Caused by: com.amazonaws.AmazonServiceException: Request entity too large (Service: AmazonSimpleWorkflow; Status Code: 413; Error Code: Request entity too large; Request ID: null)
at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:820)
at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:439)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:245)
at com.amazonaws.services.simpleworkflow.AmazonSimpleWorkflowClient.invoke(AmazonSimpleWorkflowClient.java:3173)
at com.amazonaws.services.simpleworkflow.AmazonSimpleWorkflowClient.respondActivityTaskFailed(AmazonSimpleWorkflowClient.java:2878)
at com.amazonaws.services.simpleworkflow.flow.worker.SynchronousActivityTaskPoller.respondActivityTaskFailed(SynchronousActivityTaskPoller.java:255)
at com.amazonaws.services.simpleworkflow.flow.worker.SynchronousActivityTaskPoller.respondActivityTaskFailedWithRetry(SynchronousActivityTaskPoller.java:246)
at com.amazonaws.services.simpleworkflow.flow.worker.SynchronousActivityTaskPoller.execute(SynchronousActivityTaskPoller.java:208)
at com.amazonaws.services.simpleworkflow.flow.worker.ActivityTaskPoller$1.run(ActivityTaskPoller.java:97)
... 3 more
I know AWS SWF has some limits on data size, but I am only passing an S3 key to the activity. Inside the activity, it reads from S3 and processes the data. I am not sure why I am getting this error. If anyone knows, please help! Thanks a lot!
Your activity failed, and the respondActivityTaskFailed SWF API call appears in the stack trace. So my guess is that the exception message plus the stack trace being reported back exceeded the maximum size the SWF service allows for that call.
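If that is the cause, one mitigation is to trim what gets reported back to SWF as the failure reason and details. Your code uses the Java Flow framework, but here is a minimal sketch of the idea with boto3, assuming you respond to the activity task yourself; the helper name and limits are illustrative (SWF caps reason at 256 characters and details at 32,768).
import boto3
import traceback

swf = boto3.client("swf")

# Illustrative caps: SWF limits reason to 256 characters and details to 32,768
MAX_REASON = 256
MAX_DETAILS = 32768

def report_failure(task_token, exc):
    # Truncate before responding so the failure report itself stays under the limits
    swf.respond_activity_task_failed(
        taskToken=task_token,
        reason=str(exc)[:MAX_REASON],
        details=traceback.format_exc()[:MAX_DETAILS],
    )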
Related
I have a requirement to read a folder in an S3 bucket that contains exactly 13.7 TB of data into AWS Glue...
I used the code below to read all of that data from the folder:
# s3_options is built elsewhere, e.g. connection options like {"paths": ["s3://..."]}
datasource0 = glueContext.create_dynamic_frame_from_options(connection_type='s3',
                                                             connection_options=s3_options,
                                                             format="json")
Then under Job Details, I set the Worker Type to G.2X with the requested number of workers as 50, which is its maximum capacity of 100 DPUs.
But it ran for 13 hours trying to read that data and failed with the error below:
An error occurred while calling o94.getDynamicFrame. Cannot call methods on a stopped SparkContext. caused by Unable to execute HTTP request: Request did not complete before the request timeout configuration.
So, is it possible for AWS Glue to read data of this size from S3?
Thanks in advance...
When I deploy the Serverless Framework to an AWS CloudFormation stack, I get this error message:
Rate exceeded (Service: AWSSimpleSystemManagement; Status Code: 400; Error Code: ThrottlingException; Request ID: ....)
Do you have any idea how to resolve it?
I'm not sure exactly what returns this to you, but it looks like you are deploying too fast or querying AWS Systems Manager (SSM) too fast, since you are getting a throttling exception.
Double-check your code for bugs (maybe you are performing an action N times instead of once). If you really need to interact with SSM at this rate, you can probably get the request limit increased via a ticket to AWS.
If that's not the case, open a support ticket with them.
I am facing an issue while invoking a PyTorch model endpoint. Please see the error below for details.
Error Message:
An error occurred (InternalFailure) when calling the InvokeEndpoint operation (reached max retries: 4): An exception occurred while sending request to model. Please contact customer support regarding request 9d4f143b-497f-47ce-9d45-88c697c4b0c4.
The endpoint restarted automatically after this error. There is no specific log in CloudWatch.
There may be a few issues here; let's explore the possible causes and ways to resolve them.
Inference Code Error
Sometimes these errors occur when the payload you're feeding your endpoint is not in the appropriate format. When invoking the endpoint, you want to make sure your data is in the correct format and encoded properly. For this you can use the serializers SageMaker provides when creating the endpoint; the serializer takes care of the encoding for you and sends the data in the appropriate format. Look at the following code snippet.
# SageMaker Python SDK v1 style: attach a CSV serializer when deploying the estimator
from sagemaker.predictor import csv_serializer
# Deploy to one ml.m4.xlarge instance; the serializer encodes the request payload as CSV
rf_pred = rf.deploy(1, "ml.m4.xlarge", serializer=csv_serializer)
print(rf_pred.predict(payload).decode('utf-8'))
For more information about the different serializers, based on the type of data you are feeding in, check the following link:
https://sagemaker.readthedocs.io/en/stable/api/inference/serializers.html
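If you are on version 2 of the SageMaker Python SDK, the snippet above would look roughly like this instead (same hypothetical rf estimator and payload as before):
from sagemaker.serializers import CSVSerializer

# v2 uses serializer objects rather than the v1 csv_serializer constant
rf_pred = rf.deploy(
    initial_instance_count=1,
    instance_type="ml.m4.xlarge",
    serializer=CSVSerializer(),
)
print(rf_pred.predict(payload).decode("utf-8"))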
Throttling Limits Reached
Sometimes the payload you are feeding in may be too large, or the API request rate for the endpoint may have been exceeded, so experiment with a larger instance or increase the retries in your boto3 configuration. Here is a link with an example of what retries are and how to configure them for your endpoint:
https://aws.amazon.com/premiumsupport/knowledge-center/sagemaker-python-throttlingexception/
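As a rough sketch of the retry side, you can raise the retry budget on the sagemaker-runtime client so throttled calls are retried automatically; the endpoint name, content type, and payload below are placeholders, not taken from the question:
import boto3
from botocore.config import Config

# Retry throttled/transient failures up to 10 times with standard backoff
config = Config(retries={"max_attempts": 10, "mode": "standard"})
runtime = boto3.client("sagemaker-runtime", config=config)

response = runtime.invoke_endpoint(
    EndpointName="my-pytorch-endpoint",  # placeholder endpoint name
    ContentType="text/csv",
    Body="1.0,2.0,3.0",                  # placeholder payload
)
print(response["Body"].read())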
I work for AWS & my opinions are my own
I'm currently using a Lambda function written in JavaScript that is set up with an SQS event source to automatically pull messages from an SQS queue and do some basic processing on the message contents. I cannot show the code, but the summary of the Lambda function's execution is basically:
For each message in the batch it receives as part of the event:
It parses the body, which is a JSON string, into a Javascript object.
It reads an object from S3 that is listed in the parsed object, using getObject.
It puts a record into a DynamoDB table using put.
If there were no errors, it deletes the individual SQS message that was processed from the Queue using deleteMessage.
This SQS queue is high-volume and receives messages in bulk, regularly building up a backlog of millions of messages. The Lambda is normally able to scale to process hundreds of thousands of messages concurrently. This solution has worked well for me with other applications in the past, but I'm now encountering the following intermittent error that reliably begins to appear as the Lambda scales up:
[ERROR] [#############] LAMBDA_RUNTIME Failed to post handler success response. Http response code: 400.
I've been unable to find any information anywhere about what this error means and what causes it. There appears to be no discernible pattern as to which executions encounter it. The function is usually able to run for a brief period without encountering the error and scale to expected levels. But then, as you can see, the error starts to appear quite suddenly and completely destroys the Lambda throughput by forcing it to auto-scale down:
Does anyone know what this "LAMBDA_RUNTIME" error means and what might cause it? My Lambda Function runtime is Node v12.
Your function is being invoked asynchronously, so when it finishes it signals the caller whether it was successful.
You should have an error some milliseconds earlier, probably an unhandled exception that is not being logged. If that's the case, your function ends without knowing about the exception and tries to post a success response.
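To surface that earlier error, wrap the handler body so the exception is logged before the runtime reports a result. The question's runtime is Node, but here is the idea as a minimal Python sketch; the processing function is just a placeholder for the real per-message logic:
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def process_records(records):
    # Placeholder for the real S3 + DynamoDB work described in the question
    return {"processed": len(records)}

def handler(event, context):
    try:
        return process_records(event["Records"])
    except Exception:
        # Log the full traceback, then re-raise so the invocation is marked as failed
        logger.exception("Unhandled error while processing batch")
        raise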
I have this same error, only the one I get is:
[ERROR] [1638918279694] LAMBDA_RUNTIME Failed to post handler success response. Http response code: 413.
I went to the Lambda function in the AWS console and ran a test with a custom event I built, and the error I got there was:
{
"errorMessage": "Response payload size exceeded maximum allowed payload size (6291556 bytes).",
"errorType": "Function.ResponseSizeTooLarge"
}
So this is the actual error, which CloudWatch doesn't show but the testing section of the Lambda function console does.
I think I'll have to write the info to an S3 file or something instead, but that's another matter.
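In case it helps, that workaround is usually just writing the large result to S3 and returning its key; a minimal sketch (the bucket name and key scheme are made up):
import json
import uuid
import boto3

s3 = boto3.client("s3")
BUCKET = "my-results-bucket"  # made-up bucket name

def handler(event, context):
    # Pretend this is the oversized result you would normally return
    result = {"data": "..."}
    key = f"results/{uuid.uuid4()}.json"
    s3.put_object(Bucket=BUCKET, Key=key, Body=json.dumps(result))
    # Return a small pointer instead of the >6 MB payload
    return {"result_s3_key": key}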
I am calling the AWS Storage Gateway refreshCache method too frequently, I guess (as the message suggests), but I am not sure how long I need to wait before I can call it again. Any help will be appreciated.
AWSStorageGateway gatewayClient = AWSStorageGatewayClientBuilder.standard().build();
RefreshCacheRequest cacheRequest = new RefreshCacheRequest();
cacheRequest.setFileShareARN(this.fileShareArn);
gatewayClient.refreshCache(cacheRequest);
com.amazonaws.services.storagegateway.model.InvalidGatewayRequestException: Too many requests have been sent to server. (Service: AWSStorageGateway; Status Code: 400; Error Code: InvalidGatewayRequestException; Request ID: f1ffa249-6908-4ae1-9f71-93fe7f26b2af)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1712)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1367)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1113)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:770)
I think you can refer to the official documentation: https://docs.aws.amazon.com/storagegateway/latest/APIReference/API_RefreshCache.html
As it says,
When this API is called, it only initiates the refresh operation. When the API call completes and returns a success code, it doesn't necessarily mean that the file refresh has completed. You should use the refresh-complete notification to determine that the operation has completed before you check for new files on the gateway file share.
So I guess that after you call the AWS Storage Gateway refreshCache method, you must wait until the refresh action has completed. If you call the method again during this period, exceptions like this will be raised.
For a solution, you can refer to Monitoring Your File Share to set up a refresh-complete notification.
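If you need a stopgap while you set up the notification, a simple option is to back off and retry when the gateway rejects the call. The question's code is Java, but here is a rough boto3 sketch of the idea; the wait times are guesses, not a documented interval:
import time
import boto3

client = boto3.client("storagegateway")

def refresh_with_backoff(file_share_arn, max_attempts=5):
    delay = 60  # guessed starting wait in seconds, not an official figure
    for attempt in range(max_attempts):
        try:
            return client.refresh_cache(FileShareARN=file_share_arn)
        except client.exceptions.InvalidGatewayRequestException:
            # A refresh is likely still in progress; wait and try again
            time.sleep(delay)
            delay *= 2
    raise RuntimeError("refresh_cache kept being rejected; rely on the refresh-complete notification instead")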