What I would like to do:
What I would like to do is have a URL which would return to the caller a CSV file which is essentially an export of data. I would like this to remain a serverless solution.
What I have done:
I have created an AWS API Gateway with the URL I want. I have created a Lambda that queries the database and creates a CSV string of that data. That data is placed in a JSON object and returned. API Gateway then gets the CSV data from the JSON object and returns CSV to the caller with appropriate headers to indicate that it is a CSV and an attachment. Testing from the browser, I get the download automatically, just like I intended.
The problem I see:
This works well until there is a sizable amount of data at which point I start getting "body size is too long".
My attempts to resolve:
I did some googling around and I see others have had similar issues. In one solution I saw, they return a link to the file that they created. This solution seems viable for them because they had a server. For my serverless architecture it seems to be a little trickier. I could store the file in S3, but then I would have to return a link to S3. That seems like it could work, but it doesn't feel right, as if I'm missing a configuration option. It also feels like I'm exposing the implementation by returning the S3 URLs.
I have looked around for tutorials and examples of people doing similar things and I haven't found any.
My Questions:
Is there a way to do this?
Is there another solution that I don't know of?
How do I return a larger file, in this case a CSV, from API Gateway?
There is a limit of 6 MB for AWS Lambda response payloads. If the files you need to serve are larger than that, you won't be able to serve them directly from Lambda.
Using S3 to store and serve the files is the standard way of doing something like this. I would leave the S3 bucket private and generate S3 Pre-signed URLs in the Lambda function. That will limit the time that the CSV file is available for download, and it will prevent people from being able to guess the URLs of files you are serving. You would use an S3 Lifecycle Policy to archive or delete the files after a period of time.
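As a rough sketch of the pre-signed URL approach described above (not a definitive implementation): the function below writes the CSV to a private bucket and answers with a redirect instead of the file itself. The names `rows_to_csv` and `export_handler`, the bucket and key arguments, and the 300 second expiry are assumptions made for illustration. The `s3_client` argument is assumed to behave like `boto3.client("s3")`, whose `put_object` and `generate_presigned_url` calls are used here.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize an iterable of row sequences into one CSV string."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


def export_handler(s3_client, bucket, key, rows, expires_in=300):
    """Upload the CSV to a private bucket and redirect to a pre-signed URL.

    s3_client is assumed to behave like boto3.client("s3").
    """
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=rows_to_csv(rows).encode("utf-8"),
        ContentType="text/csv",
        # Ask the browser to download the object rather than display it.
        ContentDisposition='attachment; filename="export.csv"',
    )
    url = s3_client.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,  # seconds the link stays valid
    )
    # Lambda proxy style response: the body is empty, so its size no longer
    # depends on how much data the export contains.
    return {"statusCode": 302, "headers": {"Location": url}, "body": ""}
```

Because the caller still requests the same API Gateway URL and is only redirected, the browser download behaves much as before, and the S3 link stops working once it expires.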
Related
I have a general understanding question. I am building a flutter app that relies on a content library containing text files, latex equations, images, pdfs, videos etc.
The content lies on an aws amplify backend. Depending on the navigation of the user in the app, the corresponding data is fetched and displayed.
I am not sure about the correct way of fetching the data. The current method (which works) is that the data is stored in an S3 bucket. When data is requested, the data is downloaded to a temporary directory and then opened and processed in the app. This is actually not slow, but I feel that it is not the way it should be done.
When data is downloaded, a file transfer notification pops up, which bothers me because it is shown all the time. Also, I would like to read the data directly with something like a GET request, without downloading the file first (especially for text files, which I would like to read directly into a String). But I don't know how that would work, because I don't see a way to store data in a file system with the other Amplify services like DataStore or the REST API. Also, the S3 bucket is an intuitive way of storing data that is easy to use for the content creators at my company, so to me the S3 bucket seems like the way to go. However, with S3 I have only figured out the download method to fetch data.
Could someone give me a hint on what is the correct approach for this use case? Thank you very much!
I need a solution for adding new data to a CSV file that is stored in an S3 bucket in AWS.
At this point we are downloading the file, editing it and then uploading it to S3 again, and we would like to automate this process.
We need to add one row to a three-column file.
Thank you in advance!
I think you will be able to do that using Lambda functions. You will need to make the modifications to the CSV programmatically, but there are multiple programming languages that allow you to do that. One quick example is using Python and the csv library.
Then you can invoke that lambda or add more logic to the operations you want to do using an AWS API Gateway.
You can access the CSV file (object) inside the S3 bucket from the Lambda code using the AWS SDK and append the new rows with data you pass as parameters to the function.
There is no way to directly modify the csv stored in S3 (if that is what you're asking). The process will always entail some version of download, modify, upload. There are many examples of how you can do this, for example here
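A minimal sketch of that download, modify, upload cycle in Python, using the standard csv module. The function name `append_csv_row` and its parameters are hypothetical, and `s3_client` is assumed to behave like `boto3.client("s3")` (only `get_object` and `put_object` are used).

```python
import csv
import io


def append_csv_row(s3_client, bucket, key, new_row):
    """Download a CSV object, append one row, and upload it back.

    s3_client is assumed to behave like boto3.client("s3").
    """
    # Download: get_object returns the content as a readable stream of bytes.
    original = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")

    # Modify: make sure the existing text ends with a newline, then append.
    buffer = io.StringIO()
    buffer.write(original)
    if original and not original.endswith("\n"):
        buffer.write("\n")
    csv.writer(buffer, lineterminator="\n").writerow(new_row)

    # Upload: overwrite the object with the extended content.
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=buffer.getvalue().encode("utf-8"),
        ContentType="text/csv",
    )
```

Note that the whole object is rewritten on every call, so two invocations running at the same time can overwrite each other's row. If that matters, the calls would need to be serialized in some way.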
I am trying to find possible orphans in an S3 bucket. What I mean is that we might delete something from the DB and, for whatever reason, it doesn't get cleared from S3. This can be a bug in our system or something of that nature. I want to double-check against our API that the object in S3 maps to something that exists; the naming convention lets us map things together like that.
Scraping an entire bucket every X days seems unscalable. I was thinking that for each object in the bucket, it can add itself to an SQS queue for the relevant checking to happen, every 30 days or so.
I've only found events around uploads and specific modifications over at https://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html. Is there anything more generalized I can't find? Any creative solutions to this problem?
You should activate Amazon S3 Inventory, which can provide a regular CSV file (as often as daily) that contains a list of every object in the Amazon S3 bucket.
You could then trigger some code that compares the contents of the CSV file against the database to find 'orphan' objects.
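The comparison step could be sketched roughly as follows. This assumes the inventory file has already been downloaded and decompressed into a string, that the first two CSV columns are the bucket name and the object key, and that keys are URL-encoded in the CSV. All of these should be checked against your own inventory configuration. The function name `find_orphans` and the `known_keys` argument (the keys your database or API still knows about) are hypothetical.

```python
import csv
import io
from urllib.parse import unquote


def find_orphans(inventory_csv_text, known_keys):
    """Return keys listed in the inventory that the database does not know."""
    known = set(known_keys)
    orphans = []
    for row in csv.reader(io.StringIO(inventory_csv_text)):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        # Assumed layout: column 0 is the bucket, column 1 the URL-encoded key.
        key = unquote(row[1])
        if key not in known:
            orphans.append(key)
    return orphans
```

For a large bucket, `known_keys` would more realistically be checked in batches against the database rather than loaded into one set, but the shape of the comparison stays the same.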
I want to do the following: a user in a browser types some text and after he presses a 'Save' button, the text should be saved in a file (for example: content.txt) in a folder (for example: /username_text) on the root of an S3 bucket.
Also, I want the user to be able, when he/she visits the same page, to load the content from S3 and continue working on the file. Then, when he/she is done, save the file to S3 again.
Probably important to mention, but I plan on using NodeJS for my back-end...
My question now is: What is the best way to set this storing-and-retrieving thing up? Do I create an API gateway + Lambda function to GET and POST files through that? Or do I for example use the aws-sdk in Node to directly push and pull files from S3? Or is there a better way to do this?
I looked at the following two guides:
Using AWS S3 Buckets in a NodeJS App – Codebase – Medium
Image Upload and Retrieval from S3 Using AWS API Gateway and Lambda
Welcome to StackOverflow!
I think you are worrying too much about the not-so-important stuff. S3 is nothing but a storage system. You could have decided to store the content of these files in DynamoDB, RDS, etc. What would you do if you stored the contents in one of these real databases? You'd fetch the data and display it to the user, wouldn't you?
This is what you need to do with S3! S3 is a smart choice in your scenario because your "file" can grow very big and S3 is a great place for storing files. However, apparently, you're not actually storing files (think of .pdf, .mp4, .mov, etc.), you're essentially only storing human-readable text.
So here's one approach on how to solve your problem:
FETCHING FILE CONTENT
User logs in
You fetch the user's personal information based on some token. You can store all the metadata in DynamoDB so that, given a user_id, you can fetch all the "files" belonging to this user. These "files" (metadata only) would be the bucket and key for the actual file on S3.
You use the getObject API from S3 to fetch the file based on your query and display the body of your file to your user in a RESTful way. Your response should look something like this:
{
  "content": "some content"
}
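A minimal sketch of the fetch step in Python (the answer's context is NodeJS, but the calls map one to one). Here `files_by_id` is a plain dict standing in for the DynamoDB metadata table, the function name `get_file_content` is hypothetical, and `s3_client` is assumed to behave like `boto3.client("s3")`.

```python
import json


def get_file_content(s3_client, files_by_id, user_id, file_id):
    """Return an API Gateway proxy style response carrying the file's text.

    files_by_id maps file_id -> {"user_id", "bucket", "key"} and stands in
    for the DynamoDB lookup; s3_client is assumed to be boto3.client("s3").
    """
    meta = files_by_id.get(file_id)
    if meta is None or meta["user_id"] != user_id:
        # Unknown file, or a file that belongs to someone else.
        return {"statusCode": 404, "body": json.dumps({"error": "not found"})}

    obj = s3_client.get_object(Bucket=meta["bucket"], Key=meta["key"])
    content = obj["Body"].read().decode("utf-8")  # bytes from S3 -> str
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"content": content}),
    }
```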
SAVING FILE CONTENT
User logs in
The user writes anything in a form and submits it. In your Lambda function, you grab the content of this form and process it. This request should look something like this:
{
  "file_id": "some-id",
  "user_id": "some-id",
  "content": "some-content"
}
If the file_id exists, update the content in S3. Otherwise, upload a new file to S3 and then create a new entry in DynamoDB. You'd also, of course, have to check that the user submitting the changes actually owns the file. If you're using UUIDs this shouldn't be much of a problem, but it is still worth checking in case an ID is leaked somehow.
This way, you don't need to worry about uploading/downloading files, as these are CPU-intensive tasks, so you can keep your costs low and use very little RAM in your functions (128MB should be more than enough). After all, you're now only serving text. Not only will this simplify your design, it will also make things simpler both in API Gateway and in your code, as you won't have to deal with binary types. The most you'll have to do is convert the buffer from S3 to a String when serving some content, but this should be completely fine.
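The save step could be sketched along these lines, again in Python. As before, `files_by_id` is a plain dict standing in for the DynamoDB table, the key layout `user_id/file_id.txt` and the function name `save_file_content` are made up for illustration, and `s3_client` is assumed to behave like `boto3.client("s3")`.

```python
import json
import uuid


def save_file_content(s3_client, files_by_id, bucket, request):
    """Create or update a text file in S3 from a request dict.

    request carries "user_id", "content" and optionally "file_id".
    """
    user_id = request["user_id"]
    file_id = request.get("file_id")
    existing = files_by_id.get(file_id) if file_id else None

    # Ownership check: only the owner may overwrite an existing file.
    if existing is not None and existing["user_id"] != user_id:
        return {"statusCode": 403, "body": json.dumps({"error": "forbidden"})}

    if existing is None:
        file_id = str(uuid.uuid4())
        meta = {"user_id": user_id, "bucket": bucket, "key": f"{user_id}/{file_id}.txt"}
    else:
        meta = existing

    s3_client.put_object(
        Bucket=meta["bucket"],
        Key=meta["key"],
        Body=request["content"].encode("utf-8"),
        ContentType="text/plain; charset=utf-8",
    )
    if existing is None:
        # Record the metadata only after the upload has succeeded.
        files_by_id[file_id] = meta
    return {"statusCode": 200, "body": json.dumps({"file_id": file_id})}
```

Returning the `file_id` lets the client send it back on the next save, so the same object is updated instead of a new one being created each time.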
EDIT
On your question regarding whether you should upload it from the browser or not, I suggest you take a look into this answer where I cover the pros/cons of doing it via API Gateway vs from the Browser.
Looks like Parse.com stores the PFFile objects on AWS S3 and only stores a reference to the actual files on S3 in Parse for the PFFile object types.
So my problem here is that I only get an AWS S3 link for my PFFile if I export the data using the out-of-the-box Parse.com export functionality. After I import the same data into my Parse application, for some reason the security setting on those PFFiles on S3 is changed in such a way that none of the PFFiles are accessible to me after the import, due to a security error.
My question is, does anyone know how the security is being set on the PFFiles? Here's a link to PFFile https://parse.com/docs/osx/api/Classes/PFFile.html but I guess this is rather an advanced topic and wasn't revealed on this page.
I'm also looking for a solution to this. All I found is this from their forum:
In this case, the PFFiles are stored in a different app. You might
need to download these files and upload them again to the new app and
update the pointers. I know this is not a great answer but we're
working on making this process more straightforward.
https://www.parse.com/questions/import-pffile-object-not-working-in-iphone-application