Unable to PUT big file (2 GB) to AWS S3 bucket (Node.js) | RangeError: data is too long

I scoured through all of the internet and everybody gives different advice, but none of it helped me.
I'm currently trying to simply send the file.buffer that arrives at my endpoint directly to an AWS bucket.
I'm using PutObjectCommand and have entered all the details correctly, but there's apparently a problem with me using a simple await s3.send(command), because my 2.2 GB video is way too big.
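For reference, the upload code looks roughly like this (a minimal sketch; the bucket name, key and region below are placeholders, and file is the multer file object kept in memory):

const { S3Client, PutObjectCommand } = require("@aws-sdk/client-s3");

const s3 = new S3Client({ region: "eu-central-1" });   // placeholder region

const command = new PutObjectCommand({
  Bucket: "my-bucket",          // placeholder bucket name
  Key: file.originalname,       // placeholder key, taken from the multer file object
  Body: file.buffer,            // the whole 2.2 GB video as a single Buffer in memory
});

await s3.send(command);         // fails: SigV4 tries to SHA-256 hash the entire Buffer at once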
I get this error when attempting to upload the file to the cloud:
RangeError: data is too long
    at Hash.update (node:internal/crypto/hash:113:22)
    at Hash.update (C:\Users\misop\Desktop\sebi\sebi-auth\node_modules\@aws-sdk\hash-node\dist-cjs\index.js:12:19)
    at getPayloadHash (C:\Users\misop\Desktop\sebi\sebi-auth\node_modules\@aws-sdk\signature-v4\dist-cjs\getPayloadHash.js:18:18)
    at SignatureV4.signRequest (C:\Users\misop\Desktop\sebi\sebi-auth\node_modules\@aws-sdk\signature-v4\dist-cjs\SignatureV4.js:96:71)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5) {
  code: 'ERR_OUT_OF_RANGE',
  '$metadata': { attempts: 1, totalRetryDelay: 0 }
}
I browsed quite a lot, and there are lots of people saying that I should be using a presigned URL. I did try: if I do await getSignedUrl(s3, putCommand, { expiresIn: 3600 }); I do get a generated URL, but no PUT is sent to the cloud. When I read a little more into it, getSignedUrl is just for generating the signed URL, so there's no way for me to use the Put command there, and I'm not sure how to approach this situation.
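As far as I understand it, getSignedUrl only produces the URL; the actual PUT is a separate plain HTTP request that whoever holds the URL has to make. A minimal sketch of that second step (the bucket, key and region are placeholders, and file here is assumed to be a File/Blob in the browser):

const { S3Client, PutObjectCommand } = require("@aws-sdk/client-s3");
const { getSignedUrl } = require("@aws-sdk/s3-request-presigner");

const s3 = new S3Client({ region: "eu-central-1" });   // placeholder region
const putCommand = new PutObjectCommand({ Bucket: "my-bucket", Key: "video.mp4" }); // placeholders

// 1) Generate the URL on the server; nothing is sent to S3 yet.
const url = await getSignedUrl(s3, putCommand, { expiresIn: 3600 });

// 2) The upload itself is an ordinary HTTP PUT against that URL, usually made
//    from the browser so the large body never passes through the Node server.
await fetch(url, { method: "PUT", body: file });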
I'm currently working with:
"@aws-sdk/client-s3": "^3.238.0",
"@aws-sdk/s3-request-presigner": "^3.238.0",
Honestly, I've been testing lots of different ways I saw online, but I wasn't successful even following Amazon's official documentation where they mention these things, and I truly don't want to implement multipart upload for videos smaller than 4-5 GB.
I'd be honored to hear any advice on this topic, thank you.
I'm looking for advice on how to implement a simple video upload to AWS S3 after my many failed attempts at doing so; there's lots of information out there and the vast majority doesn't work.

The solution to my problem was essentially using multer's S3 "addon", multer-s3, which takes an s3 property and is a ready-made solution.
"multer-s3": "^3.0.1" worked even with files of 5 GB and such. Solutions such as using the PutObject command with the presigned URL method, or presigned-post methods, were unable to work with the multer file.buffer that the Node server receives after the form is submitted.
If you experienced the same problem and want a quick and easy solution, use the multer-s3 npm package.
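For reference, a minimal multer-s3 setup looks roughly like this (the region, bucket name, route and form field name are placeholders; multer-s3 streams the incoming file to S3 instead of holding the whole thing in a Buffer):

const express = require("express");
const multer = require("multer");
const multerS3 = require("multer-s3");
const { S3Client } = require("@aws-sdk/client-s3");

const app = express();
const s3 = new S3Client({ region: "eu-central-1" });      // placeholder region

const upload = multer({
  storage: multerS3({
    s3,
    bucket: "my-bucket",                                  // placeholder bucket
    key: (req, file, cb) => cb(null, file.originalname),  // object key in S3
  }),
});

// "video" is the name of the multipart form field (placeholder)
app.post("/upload", upload.single("video"), (req, res) => {
  res.json({ location: req.file.location });              // URL of the uploaded object
});

app.listen(3000);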

Related

Correct way to fetch data from an AWS server into a Flutter app?

I have a general understanding question. I am building a Flutter app that relies on a content library containing text files, LaTeX equations, images, PDFs, videos etc.
The content lies on an AWS Amplify backend. Depending on the navigation of the user in the app, the corresponding data is fetched and displayed.
I am not sure about the correct way of fetching the data. The current method (which works) is that the data is stored in an S3 bucket. When data is requested, the data is downloaded to a temporary directory and then opened and processed in the app. This is actually not slow, but I feel that it is not the way it should be done.
When data is downloaded, a file transfer notification pops up, which bothers me because it is shown all the time. Also, I would like to read the data directly with something like a GET request, without downloading the file first (especially for text files, which I would like to read directly into a String). But here I don't know how it works, because I don't see that you can store data in a file system with the other Amplify services like DataStore or the REST API. The S3 bucket is an intuitive way of storing data that is easy to use for the content creators of my company, so to me it seems that the S3 bucket is the way to go. However, with S3 I have only figured out the download method to fetch data.
Could someone give me a hint on what is the correct approach for this use case? Thank you very much!

S3 direct upload with multipart fails after some parts

I am trying to use Direct Multipart upload, from the browser to S3 buckets.
I have CORS enabled on the said bucket.
The problem:
After initiating the multipart upload, some parts get uploaded fine, but one of the parts will fail randomly. I am attaching the screenshot below for reference.
P.S.
Part size is 8 MB
And there are 6 parallel uploads
I have tried with a queue size of 4 and a chunk size of 5 MB. Nothing seems to work so far.
As you can see in the screenshot, randomly the OPTIONS call to S3 bucket times out for a specific part, hence the upload for the same part also fails.
Any idea how to fix this?
I have tried twice, as seen in both screenshots, it fails for different part numbers, for the same file though. Any help is highly appreciated.
Thanks!
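For reference, the CORS rules on the bucket look roughly like this, set via the SDK (a sketch rather than the exact configuration; the origin, bucket and region are placeholders):

const { S3Client, PutBucketCorsCommand } = require("@aws-sdk/client-s3");

const s3 = new S3Client({ region: "us-east-1" });        // placeholder region

await s3.send(new PutBucketCorsCommand({
  Bucket: "my-bucket",                                   // placeholder bucket
  CORSConfiguration: {
    CORSRules: [
      {
        AllowedOrigins: ["https://app.example.com"],     // placeholder origin
        AllowedMethods: ["GET", "PUT", "POST"],
        AllowedHeaders: ["*"],
        ExposeHeaders: ["ETag"],                         // the browser needs each part's ETag to complete the upload
        MaxAgeSeconds: 3000,                             // lets the browser cache the OPTIONS preflight
      },
    ],
  },
}));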

SageMaker, get Spark dataframe from image data URLs on S3

I am trying to obtain a Spark dataframe which contains the paths and images for all images in my data. The data is stored as follows:
folder/image_category/image_n.jpg
I worked in a local Jupyter notebook and had no problem using the following code:
dataframe = spark.read.format("image").load(path)
I need to do the same exercise using AWS SageMaker and S3. I created a bucket following the same pattern:
s3://my_bucket/folder/image_category/image_n.jpg
I've tried a lot of possible solutions I found online, based on boto3, s3fs and other stuff, but unfortunately I am still unable to make it work (and I am starting to lose faith...).
Would anyone have something reliable I could base my work on?

How to facilitate downloading both CSV and PDF from API Gateway connected to S3

In the app I'm working on, we have a process whereby a user can download a CSV or PDF version of their data. The generation works great, but I'm trying to get it to download the file and am running into all sorts of problems. We're using API Gateway for all the requests, and the generation happens inside a Lambda on a POST request. The GET endpoint takes in a file_name parameter and then constructs the path in S3 and then makes the request directly there. The problem I'm having is when I'm trying to transform the response. I get a 500 error and when I look at the logs, it says Execution failed due to configuration error: Unable to transform response. So, clearly that's where I've spent most of my time. I've tried at least 50 different iterations of templates and combinations with little success. The closest I've gotten is the following code, where the CSV downloads fine, but the PDF is not a valid PDF anymore:
CSV:
#set($contentDisposition = "attachment;filename=${method.request.querystring.file_name}")
$input.body
#set($context.responseOverride.header.Content-Disposition = $contentDisposition)
PDF:
#set($contentDisposition = "attachment;filename=${method.request.querystring.file_name}")
$util.base64Encode($input.body)
#set($context.responseOverride.header.Content-Disposition = $contentDisposition)
where contentHandling = CONVERT_TO_TEXT. My binaryMediaTypes just has application/pdf and that's it. My goal is to get this working without having to offload the problem into a Lambda so we don't have that overhead at the download step. Any ideas how to do this right?
Just as another comment, I've tried CONVERT_TO_BINARY and just leaving it as Passthrough. I've tried it with text/csv as another binary media type and I've tried different combinations of encoding and decoding base64 and stuff. I know the data is coming back right from S3, but the transformation is where it's breaking. I am happy to post more logs if need be. Also, I'm pretty sure this makes sense on StackOverflow, but if it would fit in another StackExchange site better, please let me know.
Resources I've looked at:
https://docs.aws.amazon.com/apigateway/latest/developerguide/request-response-data-mappings.html
https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-mapping-template-reference.html#util-template-reference
https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-payload-encodings-workflow.html
https://docs.amazonaws.cn/en_us/apigateway/latest/developerguide/api-gateway-payload-encodings-configure-with-control-service-api.html
(But they're all so confusing...)
EDIT: One idea I've had is to do CONVERT_TO_BINARY and somehow base64 encode the CSVs in the transformation, but I can't figure out how to do it right. I keep feeling like I'm misunderstanding the order of things, specifically when the "CONVERT" part happens. If that makes any sense.
EDIT 2: So, I got rid of the $util.base64Encode in the PDF one and now I have a PDF that's empty. The actual file in S3 definitely has things in it, but for some reason CONVERT_TO_TEXT is not handling it right, or I'm still not understanding how this all works.
Had similar issues. One major thing is the Accept header. I was testing in Chrome, which sends the Accept header as text/html,application/xhtml.... API Gateway ignores everything except the first one (text/html). It will then convert any response from S3 to base64 to try and conform to text/html.
At last, after trying everything else, I tried via Postman, which defaults the Accept header to */*. Also set your content handling on the Integration Response to Passthrough. And everything was working!
One other thing is to pass the Content-Type and Content-Length headers through (add them in the Method Response first and then in the Integration Response):
Content-Length integration.response.header.Content-Length
Content-Type integration.response.header.Content-Type
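In other words, the client has to ask for the binary type explicitly (or for */*). A quick way to reproduce that outside Postman, as a sketch (the endpoint URL and query string are placeholders):

// Node 18+ has fetch built in; request the file with an explicit Accept header
const fs = require("node:fs/promises");

const res = await fetch(
  "https://abc123.execute-api.us-east-1.amazonaws.com/prod/export?file_name=report.pdf", // placeholder URL
  { headers: { Accept: "*/*" } }
);
await fs.writeFile("report.pdf", Buffer.from(await res.arrayBuffer()));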

CSV export using API Gateway and Lambda

What I would like to do:
What I would like to do is have a URL which would return to the caller a CSV file which is essentially an export of data. I would like this to remain a serverless solution.
What I have done:
I have created an AWS API Gateway with the URL I want. I have created a Lambda that will query the database and create a CSV string of that data. That data is placed in a JSON object and returned. API Gateway then gets the CSV data from the JSON object and returns CSV to the caller with the appropriate headers to indicate that it is a CSV and an attachment. Testing from the browser, I get the download automatically, just like I intended.
The problem I see:
This works well until there is a sizable amount of data at which point I start getting "body size is too long".
My attempts to resolve:
I did some googling around and I see others have had similar issues. In one solution I saw that they return a link to the file that they created. That solution seems viable for them because they had a server. For my serverless architecture it seems to be a little trickier. I could store the file in S3, but then I would have to return a link to S3. That seems like it could work, but it doesn't feel right, like I'm missing a configuration option. It also feels like I'm exposing the implementation by returning the S3 URLs.
I have looked around for tutorials and examples of people doing similar things and I haven't found any.
My Questions:
Is there a way to do this?
Is there another solution that I don't know of?
How do I return a file, in this case a CSV, of a larger size from API Gateway?
There is a limit of 6 MB for AWS Lambda response payloads. If the files you need to serve are larger than that, you won't be able to serve them directly from Lambda.
Using S3 to store and serve the files is the standard way of doing something like this. I would leave the S3 bucket private and generate S3 Pre-signed URLs in the Lambda function. That will limit the time that the CSV file is available for download, and it will prevent people from being able to guess the URLs of files you are serving. You would use an S3 Lifecycle Policy to archive or delete the files after a period of time.
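A minimal sketch of that flow in a Node.js Lambda (the bucket name, key scheme and expiry are placeholders, and the CSV string stands in for the real query result):

const { S3Client, PutObjectCommand, GetObjectCommand } = require("@aws-sdk/client-s3");
const { getSignedUrl } = require("@aws-sdk/s3-request-presigner");

const s3 = new S3Client({});                             // region comes from the Lambda environment

exports.handler = async () => {
  const csv = "id,name\n1,example\n";                    // placeholder for the real export data
  const key = `exports/${Date.now()}.csv`;               // placeholder key scheme

  // Store the export in a private bucket; a lifecycle rule can expire old objects.
  await s3.send(new PutObjectCommand({
    Bucket: "my-export-bucket",                          // placeholder bucket
    Key: key,
    Body: csv,
    ContentType: "text/csv",
  }));

  // Return a short-lived download link instead of the file body,
  // so the 6 MB Lambda response limit no longer applies.
  const url = await getSignedUrl(
    s3,
    new GetObjectCommand({ Bucket: "my-export-bucket", Key: key }),
    { expiresIn: 300 }                                   // link valid for 5 minutes
  );

  return { statusCode: 200, body: JSON.stringify({ downloadUrl: url }) };
};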