Amazon S3 Client setReadLimit - amazon-web-services

While uploading a file to S3, we intermittently get this error message for one particular case:
"If the request involves an input stream, the maximum stream buffer size can be configured via request.getRequestClientOptions().setReadLimit(int)"
Source: https://github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/AmazonS3Client.java
As of AWS SDK for Java 1.8.10, the maximum stream buffer size can be configured per request via
request.getRequestClientOptions().setReadLimit(int)
We are using com.amazonaws.services.s3.AmazonS3 object to upload data.
Can anyone suggest how we can call setReadLimit() when uploading through a com.amazonaws.services.s3.AmazonS3 object?
https://aws.amazon.com/releasenotes/0167195602185387

It sounds like you're uploading data from an InputStream, but some sort of transient error is interrupting the upload. The SDK isn't able to retry the request because InputStreams aren't mark/resettable by default. The error message is trying to give guidance on buffer sizes, but for large data you probably don't want to buffer it all in memory anyway.
If you're able to upload from a File source, you shouldn't see this error again. Because a File can always be re-read from the beginning, the SDK is able to retry your request if it encounters an error during the first attempt.

A bit of necroposting, but you need to create a PutObjectRequest and call setReadLimit on that:
PutObjectRequest putObjectRequest = new PutObjectRequest(bucketName, key, fileInputStream, objectMetadata);
putObjectRequest.getRequestClientOptions().setReadLimit(xxx);
s3Client.putObject(putObjectRequest);
If you look at the implementation of putObject(String, String, InputStream, ObjectMetadata), you can see that it just creates a PutObjectRequest and passes that to putObject(PutObjectRequest).
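On choosing a value for setReadLimit: the SDK buffers up to readLimit bytes of the stream so it can reset and retry after a transient failure, and the AWS docs suggest setting it at least one byte larger than the expected content length so the whole body is retryable. A small sketch of that rule of thumb (the helper name is ours; 128 KiB is the SDK's assumed default buffer size, so double-check against your SDK version):

```python
def read_limit_for(content_length: int, default_limit: int = 128 * 1024) -> int:
    """Pick a readLimit for a streamed PutObjectRequest.

    One byte more than the body guarantees the whole upload can be
    reset and retried; never go below the (assumed) 128 KiB default.
    """
    return max(content_length + 1, default_limit)
```

You would pass the result to putObjectRequest.getRequestClientOptions().setReadLimit(...). Keep in mind this much data may be held in memory per request.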

Related

LAMBDA_RUNTIME Failed to post handler success response. Http response code: 413

I have a node/express + Serverless backend API which I deploy as a Lambda function.
When I call the API, the request goes through API Gateway to Lambda; Lambda connects to S3, reads a large bin file, parses it, and generates an output JSON object.
The response JSON object is around 8.55 MB (I verified using Postman, running the node/express code locally). The size can vary with the bin file size.
When I make an API request, it fails with the following message in CloudWatch:
LAMBDA_RUNTIME Failed to post handler success response. Http response code: 413
I can't/don't want to change this pipeline : HTTP API Gateway + Lambda + S3.
What should I do to resolve the issue ?
AWS Lambda functions have hard limits on the sizes of the request and response payloads. These limits cannot be increased.
The limits are:
6 MB for synchronous invocations
256 KB for asynchronous invocations
You can find additional information in the official documentation here:
https://docs.aws.amazon.com/lambda/latest/dg/gettingstarted-limits.html
You have a few possible solutions:
use EC2 or ECS/Fargate
use the Lambda to parse and transform the bin file into the desired JSON, then save that JSON directly to a public S3 bucket. In the Lambda response, return the public URL/URI/filename of the created JSON to the client.
For the last solution, if you don't want to make the JSON file visible to the whole world, you might consider using AWS Amplify in your client and/or AWS Cognito in order to give only an authorised user access to the file that they have just created.
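As a sketch of that second option, the handler writes the JSON to S3 and returns only a link. In the snippet below, put_json and presign are stand-ins for the real S3 client calls (e.g. boto3's put_object and generate_presigned_url); they are injected so the flow is shown without any AWS dependency:

```python
import json

def respond_with_link(put_json, presign, bucket, key, payload):
    """Store the large result in S3 and return a small JSON response.

    put_json/presign are placeholders for the real S3 client calls.
    """
    put_json(bucket, key, json.dumps(payload))
    url = presign(bucket, key)
    # The response body is now tiny, well under the 6 MB sync limit.
    return {"statusCode": 200, "body": json.dumps({"resultUrl": url})}
```

The client then fetches the result from the returned URL in a second request, bypassing the Lambda payload limit entirely.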
As noted in other questions, API Gateway/Lambda has limits on response sizes. From the discussion I read that latency is an additional concern.
With these two requirements Lambda is mostly out of the question, as functions need some time to start up (which can be reduced with provisioned concurrency) and only have standard network connections (whereas EC2/EKS can use enhanced networking).
Given these requirements, it would be better (from an AWS point of view) to move away from Lambda.
Looking further we could also question the application itself:
Large JSON objects need to be generated on demand. Why can't they be pre-generated asynchronously and then downloaded from S3 directly? That would give you the best latency and speed, and can be combined with CloudFront.
Why does the JSON need to be so large? Large JSON payloads also have to be parsed on the client side, requiring more CPU. Maybe it can be split and/or compressed?
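On the compression point: parsed-record JSON tends to be highly repetitive and compresses well, which may be enough to get under the limit (note the 6 MB limit applies to the encoded payload actually returned). A quick stdlib illustration of the size win on repetitive records:

```python
import gzip
import json

# Repetitive records, as typically produced from a parsed bin file.
records = [{"id": i, "status": "ok", "value": 42} for i in range(1000)]
raw = json.dumps(records).encode("utf-8")
packed = gzip.compress(raw)
# gzip shrinks this kind of payload substantially.
```

The client would need to honour a Content-Encoding: gzip header (or decompress explicitly) on its side.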

how to stream microphone audio from browser to S3

I want to stream the microphone audio from the web browser to AWS S3.
I got the recording part working:
this.recorder = new window.MediaRecorder(...);
this.recorder.addEventListener('dataavailable', (e) => {
this.chunks.push(e.data);
});
and then, when the user clicks stop, upload the chunks (new Blob(this.chunks, { type: 'audio/wav' })) to AWS S3 as a multipart upload.
But the problem is that if the recording is 2-3 hours long, the upload might take exceptionally long, and the user might close the browser before the upload completes.
Is there a way we can stream the web audio directly to S3 while it's going on?
Things I tried but can't get a working example:
Kinesis Video Streams: it looks like it's only for real-time streaming between multiple clients, and I would have to write my own client to save the stream to S3.
I thought of using Kinesis Data Firehose, but couldn't find any client data producer for the browser.
I even tried to find a resource using AWS Lex or AWS IVS, but I think they are over-engineered for my use case.
Any help will be appreciated.
You can set the timeslice parameter when calling start() on the MediaRecorder. The MediaRecorder will then emit chunks which roughly match the length of the timeslice parameter.
You could upload those chunks using S3's multipart upload feature as you already mentioned.
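One wrinkle when combining timeslice chunks with multipart upload: S3 requires every part except the last to be at least 5 MiB, so the small chunks MediaRecorder emits have to be accumulated before each UploadPart call. A sketch of that bookkeeping (class and method names are ours; a real browser uploader would do the same accounting in JavaScript):

```python
MIN_PART_SIZE = 5 * 1024 * 1024  # S3 minimum for all parts except the last

class PartAccumulator:
    """Collect MediaRecorder-style chunks into S3-sized parts."""

    def __init__(self):
        self._buffer = bytearray()
        self.parts = []  # each entry would become an UploadPart call

    def add_chunk(self, chunk: bytes) -> None:
        self._buffer.extend(chunk)
        if len(self._buffer) >= MIN_PART_SIZE:
            self._flush()

    def finish(self) -> None:
        # The final part is allowed to be smaller than 5 MiB.
        self._flush()

    def _flush(self) -> None:
        if self._buffer:
            self.parts.append(bytes(self._buffer))
            self._buffer = bytearray()
```

This way an interrupted session loses at most the unflushed buffer, and CompleteMultipartUpload stitches the parts together server-side.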
Please note that you need a library like extendable-media-recorder if you want to record a WAV file since no browser supports that out of the box.

Kinesis put records not returned in response from get records request

I have a Scala app that uses aws-java-sdk-kinesis to issue a series of putRecord requests to a local Kinesis stream.
The response returned after each putRecord request indicates that the records were successfully put into the stream.
The Scala code making the PutRecordRequest:
def putRecord(kinesisClient: AmazonKinesis, value: Array[Byte], streamName: String): Try[PutRecordResult] = Try {
  val putRecordRequest = new PutRecordRequest()
  putRecordRequest.setStreamName(streamName)
  putRecordRequest.setData(ByteBuffer.wrap(value))
  putRecordRequest.setPartitionKey("integrationKey")
  kinesisClient.putRecord(putRecordRequest)
}
To confirm this, I have a small Python app that consumes from the stream (initialStreamPosition: LATEST) and prints the records it finds by iterating through the shard iterators. Unexpectedly, however, it returns an empty set of records for each shard iterator obtained.
Trying this with the AWS CLI tool, I do get records returned for the same shard iterator. I am confused: how can that be?
Running the python consumer (with LATEST), returns:
Shard-iterators: ['AAAAAAAAAAH9AUYVAkOcqkYNhtibrC9l68FcAQKbWfBMyNGko1ypHvXlPEuQe97Ixb67xu4CKzTFFGoLVoo8KMy+Zpd+gpr9Mn4wS+PoX0VxTItLZXxalmEfufOqnFbz2PV5h+Wg5V41tST0c4X0LYRpoPmEnnKwwtqwnD0/VW3h0/zxs7Jq+YJmDvh7XYLf91H/FscDzFGiFk6aNAVjyp+FNB3WHY0d']
Records: []
If doing the "same" with the aws cli tool however I get:
> aws kinesis get-records --shard-iterator AAAAAAAAAAH9AUYVAkOcqkYNhtibrC9l68FcAQKbWfBMyNGko1ypHvXlPEuQe97Ixb67xu4CKzTFFGoLVoo8KMy+Zpd+gpr9Mn4wS+PoX0VxTItLZXxalmEfufOqnFbz2PV5h+Wg5V41tST0c4X0LYRpoPmEnnKwwtqwnD0/VW3h0/zxs7Jq+YJmDvh7XYLf91H/FscDzFGiFk6aNAVjyp+FNB3WHY0d --endpoint-url http://localhost:4567
Returns:
{"Records":[{"SequenceNumber":"49625122979782922897342908653629584879579547704307482626","ApproximateArrivalTimestamp":1640263797.328,"Data":{"type":"Buffer","data":[123,34,116,105,109,101,115,116,97,109,112,34,58,...]},"PartitionKey":"integrationKey"},{"SequenceNumber":"49625122979782922897342908653630793805399163707871723522","ApproximateArrivalTimestamp":1640263817.338,"Data":{"type":"Buffer","data":[...]},"PartitionKey":"integrationKey"},{"SequenceNumber":"49625122979782922897342908653632002731218779711435964418","ApproximateArrivalTimestamp":1640263837.347,"Data":{"type":"Buffer","data":[...]},"PartitionKey":"integrationKey"}],"NextShardIterator":"AAAAAAAAAAE+9W/bI4CsDfzvJGN3elplafFFBw81/cVB0RjojS39hpSglW0ptfsxrO6dCWKEJWu1f9BxY7OZJS9uUYyLn+dvozRNzKGofpHxmGD+/1WT0MVYMv8tkp8sdLdDNuVaq9iF6aBKma+e+iD079WfXzW92j9OF4DqIOCWFIBWG2sl8wn98figG4x74p4JuZ6Q5AgkE41GT2Ii2J6SkqBI1wzM","MillisBehindLatest":0}
I have used this same Python consumer in many other settings to introspect other Kinesis streams, and it works as expected. But for some reason it's not working here.
Does anyone have a clue what might be going on here?
So I was finally able to identify the issue; perhaps it will be useful for someone else with a similar problem.
In my setup, I am using a local Kinesis stream (kinesalite), which doesn't support CBOR. You have to disable CBOR explicitly, otherwise you'll see the following error when trying to deserialize the received record:
Unable to unmarshall response (We expected a VALUE token but got: START_OBJECT). Response Code: 200, Response Text: OK
In my case, setting the environment variable AWS_CBOR_DISABLE=1 did the trick.
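For a JVM producer (like the Scala app above), the same switch can also be flipped per process via a system property; the property name below is the one used by aws-sdk-java v1, so double-check it against your SDK version:

```shell
# Either of these disables CBOR in the v1 Java SDK:
export AWS_CBOR_DISABLE=1
# or, as a JVM flag (my-app.jar is a placeholder):
java -Dcom.amazonaws.sdk.disableCbor=true -jar my-app.jar
```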

PUT Presigned URL expires before download of file completes

A bit of an odd case: my API is sent a presigned URL to write to, and a presigned URL to download the file from.
The problem is that if they send a very large file, the presigned URL we need to write to can expire before we get to that step (some processing happens between the read and the write).
Is it possible to 'open' the connection for writing early to make sure it doesn't expire, and then start writing once the earlier process is done? Or maybe there is a better way of handling this.
The order goes:
Receive API request with a downloadUrl and an uploadUrl
Download the file
Process the file
Upload the file to the uploadUrl
TL;DR: How can I ensure the url for #4 doesn't expire before I get to it?
When generating the pre-signed URL, you have complete control over its duration. For example, this AWS SDK for Java v2 code shows how to set the duration when building a GetObjectPresignRequest:
GetObjectPresignRequest getObjectPresignRequest = GetObjectPresignRequest.builder()
.signatureDuration(Duration.ofMinutes(10))
.getObjectRequest(getObjectRequest)
.build();
So you can increase the time limit in such situations.
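If you would rather detect the problem than just widen the window: a SigV4 query-string presigned URL carries its signing time and lifetime in the X-Amz-Date and X-Amz-Expires query parameters, so the worker can compute the expiry before attempting step 4 and request a fresh URL if needed. A stdlib sketch (helper name is ours):

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlparse

def presigned_url_expiry(url: str) -> datetime:
    """Expiry time of a SigV4 query-string presigned URL."""
    params = parse_qs(urlparse(url).query)
    signed_at = datetime.strptime(
        params["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ"
    ).replace(tzinfo=timezone.utc)
    return signed_at + timedelta(seconds=int(params["X-Amz-Expires"][0]))
```

Checking this up front avoids uploading a large body only to get a 403 on an expired signature.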

How to download and process large S3 file?

I have a large JSON file (100 MB to 3 GB) in S3. How should I process it?
Today, I am using s3client.getObjectContent() to get the input stream and trying to process it.
As I stream it, I pass the input stream to a Jackson JsonParser, fetch each JSON object, and call another microservice to process the JSON object retrieved from the S3 input stream.
Problem:
As I am processing the JSON objects, the S3 stream gets closed before the entire payload has been processed.
I am getting this warning:
S3AbortableInputStream: Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection
I am looking for a way to handle large S3 payload without the S3 client closing the stream before I process the entire payload. Any best practices or insights are appreciated.
Constraints: I need to process this as a stream or with a minimal memory footprint.
Can you please make the following change in your code and check?
FROM:
if (s3ObjectInputStream != null) {
s3ObjectInputStream.abort();
}
TO:
if (s3ObjectInputStream == null) {
s3ObjectInputStream.abort();
}