AWS S3 filename - amazon-web-services

I’m trying to build application with backend in java that allows users to create a text with images in it (something like a a personal blog). I’m planning to store these images to s3 bucket. When uploading image files to bucket i’m hashing the original name and store the hashed one in the bucket. Images are for display purpose only, no user will be able to download them. Frontend displays these images by getting a path to them from the server. So the question is, is there any need to store original name of the image file in the database? And what are the reasons, if any, of doing so?

I guess in general it is not needed because what is more important is how these resources are used or managed in the system.
Assuming your service is something like data access (similar to google drive), I don't think it's necessary to store it in DB, unless you want to make faster search queries.

Related

Correct way to fetch data from an aws server into a flutter app?

I have a general understanding question. I am building a flutter app that relies on a content library containing text files, latex equations, images, pdfs, videos etc.
The content lies on an aws amplify backend. Depending on the navigation of the user in the app, the corresponding data is fetched and displayed.
I am not sure about the correct way of fetching the data. The current method (which works) is that the data is stored in an S3 bucket. When data is requested, the data is downloaded to a temporary directory and then opened and processed in the app. This is actually not slow, but I feel that it is not the way it should be done.
When data is downloaded a file transfer notification pops up, which bothers me because it is shown all the time. Also I would like to read the data directly with something like a get request, without downloading the file first (specially for text files, which I would like to read directly into a String). But here I don't know how it works, because I don't see that you can save data in a file system with the other amplify services like data store or the rest api. Also, the S3 bucket is an intuitive way of storing data that is easy to use for the content creators of my company, for me it seems that the S3 bucket is the way to go. However with S3 I have only figured out the download method to fetch data.
Could someone give me a hint on what is the correct approach for this use case? Thank you very much!

Storing S3 Urls vs calling listObjects

I have an app that has an attachments feature for users. They can upload documents to S3 and then revisit and preview and/or Download said attachments.
I was planning on storing the S3 urls in DB and then pre-signing them when the User needs them. I'm finding a caveat here is that this can lead to edge cases between S3 and the DB.
I.e. if a file gets removed from S3 but its url does not get removed from DB (or vice-versa). This can lead to data inconsistency and may mislead users.
I was thinking of just getting the urls via the network by using listObjects in the s3 client SDK. I don't really need to store the urls and this guarantees the user gets what's actually in S3.
Only con here is that it makes 1 API request (as opposed to DB hit)
Any insights?
Thanks!
Using a database to store an index to files is a good idea, especially once the volume of objects increases. The ListObjects() API only returns 1000 objects per call. This might be okay if every user has their own path (so you can use ListObjects(Prefix='user1/'), but that's not ideal if you want to allow document sharing between users.
Using a database will definitely be faster to obtain a listing, and it has the advantage that you can filter on attributes and metadata.
The two systems will only get "out of sync" if objects are created/deleted outside of your app, or if there is an error in the app. If this concerns you, then use Amazon S3 Inventory, to provide a regular listing of objects in the bucket and write some code to compare it against the database entries. This will highlight if anything is going wrong.
While Amazon S3 is an excellent NoSQL database (Key = filename, Value = contents), it isn't good for searching/listing a large quantity of objects.

Storing of S3 Keys vs URLs

I have some functionality that uploads Documents to an S3 Bucket.
The key names are programmatically generated via some proprietary logic for the layout/naming convention needed.
The results of my S3 upload command is the actual url itself. So, it's in the format of
REGION/BUCKET/KEY
I was planning on storing that full url into my DB so that users can access their uploads.
Given that REGION and BUCKET probably wouldn't change, does it make sense to just store the KEY - and then dynamically generate the full url when the client needs it?
Just want to know what the desired pattern here is and what others do. Thanks!
Storing the full URL is a bad idea. As you said in the question, the region and bucket are already known, so storing the full URL is a waste of disk space. Also, if in the future say, you want to migrate your assets to a different bucket may be in a different region, having full URLs stored in the DB just make things harder.

how to restrict google cloud storage upload

I have a mobile application that uses Google Cloud Storage. The application allows each registered user to upload a specific number of files.
My question is, is there a way to do some kind of checks before the storage upload? Or do I need to implement a separate reservation API of sorts that OKs an upload step?
Any alternative suggestions are welcome too, of course.
warning: Not an authoritative answer. Happy to accept removal or update requests.
I am not aware of any GCS or Firebase Cloud Storage mechanisms that will inherently limit the number of files (objects) that a given user can create. If it were me, this is how I would approach the puzzle.
I would create a database (eg. Firestore / Datastore) that has a key for each user and a value which is the number of files they have uploaded. When a user wants to upload a new file, it would first make a REST call to a Cloud Function that I would write. This Cloud Function would implicitly know the identity of the calling user. It would look up the record in the database and determine if we are allowed to upload a new file. If no, then return an error and end of story. If yes, then increment the value in the database. Next I would create a GCS "signed URL" that can be used to permit an upload. It would be that signed URL that the Cloud Function would return. The app that now wishes to upload can use that signed URL to perform the actual upload.
I would also add metadata to each file uploaded to identify the logical uploader (user) of the file. That can be then used for reconciliation if needed. We could examine all the files in the bucket and re-build the database of how many files each user had uploaded.
A possible alternative to this story is for the Cloud Function to not return a signed-url but instead receive the data to be uploaded in the same request. If the check on number of files passes, then the Cloud Function could be a proxy to a GCS write to create the file directly. This alternative needs to be carefully examined as a function of the sizes of the files to be uploaded. If the files are large this may be a very poor solution. We want to be in and out of Cloud Functions as quickly as possible and holding a Cloud Function "around" to service data pass through isn't great. We may want to look at Cloud Run in that case as it supports concurrency in the instance without increasing the cost per call.

What is the correct way to set up S3 for loading content in the browser?

I want to do the following: a user in a browser types some text and after he presses a 'Save' button, the text should be saved in a file (for example: content.txt) in a folder (for example: /username_text) on the root of an S3 bucket.
Also, I want the user to be able, when he visits the same page, load the content from S3 and continue working on the file. Then, if he/she is done, save the file to S3 again.
Probably important to mention, but I plan on using NodeJS for my back-end...
My question now is: What is the best way to set this storing-and-retrieving thing up? Do I create an API gateway + Lambda function to GET and POST files through that? Or do I for example use the aws-sdk in Node to directly push and pull files from S3? Or is there a better way to do this?
I looked at the following two guides:
Using AWS S3 Buckets in a NodeJS App – Codebase – Medium
Image Upload and Retrieval from S3 Using AWS API Gateway and Lambda
Welcome to StackOverflow!
I think you are worrying too much about the not-so-important stuff. S3 is nothing but a storage system. You could have decided to store the content of these files on DynamoDB, RDS, etc. What would you do if you stored its contents on these real databases? You'd fetch for data and display it to the user, wouldn't you?
This is what you need to do with S3! S3 is a smart choice on your scenario because your "file" can grow very big and S3 is a great place for storing files. However, apparently, you're not actually storing files (think of .pdf, .mp4, .mov, etc.), you're essentially only storing human-readable text.
So here's one approach on how to solve your problem:
FETCHING FILE CONTENT
User logs in
You fetch the user's personal information based on some token. You can store all the metadata in DynamoDB, where given a user_id, fetch all the "files" from this user. These "files" (metadata only) would be the bucket and key for the actual file on S3.
You use the getObject API from S3 to fetch the file based on your query and display the body of your file to your user in a RESTful way. Your response should look something like this:
{
"content": "some content"
}
SAVING FILE CONTENT
User logs in
The user writes anything in a form and submits it. In your Lambda function, you grab the content of this form and process it. This request should look something like this:
{
"file_id": "some-id",
"user_id": "some-id",
"content": "some-content"
}
If the file_id exists, update the content in S3. Otherwise, upload a new file in S3 and then create a new entry in DynamoDB. You'd then, of course, have to handle if the user submitting the changes actually owns the file, but if you're using UUIDs it shouldn't be too much of a problem, but still worth checking in case an ID is leaked somehow.
This way, you don't need to worry about uploading/downloading files as these are CPU intensive tasks, so you can keep your costs low as well as using very little RAM in your functions (128MB should be more than enough), after all, you're now only serving text. Not only this will simplify your way of designing it, but will also make things simpler both in API Gateway and in your code as you won't have to deal with binary types. The maximum you'll do is convert the buffer from S3 to a String when serving some content, but this should be completely fine.
EDIT
On your question regarding whether you should upload it from the browser or not, I suggest you take a look into this answer where I cover the pros/cons of doing it via API Gateway vs from the Browser.