Using Google Cloud Storage to host images for an image sharing site [closed] - google-cloud-platform

I am building an image sharing site and I want the images uploaded to be saved on Cloud Storage. I'm at the POC stage and would like to know the following:
Once an image is uploaded, could I generate an image URL that could be sent to my UI service and used to render the image on the front end?
I want to prevent the user from downloading the image through the usual methods (right click > save as, right click > open in new tab). Could this be done from Cloud Storage itself, or should it be implemented on the front end using overlays, watermarks, etc.?
In a scenario where we have a specific download button for the image, what is the best way to implement this? Do I download the image on the backend server and then send it to the front end, using something like gsutil? Or can the front end request the image directly from Cloud Storage?
Also open to any other alternatives that accomplish the above. Thanks!

Your question requires real cloud engineering and architecture work, so I can give you some insight, but you will need to go deeper into each part to build your site correctly.
First, users must not access the Cloud Storage bucket directly; otherwise you would have to make it public, and anyone could access all of its content. When a user needs to read or write a file, use the signed URL mechanism.
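For illustration, generating a time-limited read URL with the google-cloud-storage Python client might look like this (the bucket and object names are placeholders):

```python
# Minimal sketch: generate a V4 signed URL for reading an object.
# Assumes the google-cloud-storage package and service account credentials;
# the bucket and object names below are placeholders.
from datetime import timedelta
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-image-bucket")        # placeholder
blob = bucket.blob("thumbnails/photo-123.jpg")   # placeholder

url = blob.generate_signed_url(
    version="v4",
    expiration=timedelta(minutes=15),  # link stops working after this
    method="GET",
)
# Hand `url` to the UI service; the browser can fetch the image directly.
```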
When a new image is uploaded, trigger a Cloud Function (an event is emitted when the file is uploaded, and you can attach a function, or a Pub/Sub topic, to this event). Why? Because the overlay/watermark/low-resolution version needs to be generated server side. You could instead generate it when the picture is displayed on the site, but that would increase latency for the user. That's why I recommend producing this new image version when the file is uploaded, with a Cloud Function, and storing it in Cloud Storage (in another directory, e.g. thumbnails).
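A minimal sketch of such a function (Python runtime, triggered on the google.storage.object.finalize event; Pillow is assumed to be in requirements.txt, and the directory layout is an assumption):

```python
# main.py -- fires when a new object is finalized in the upload bucket,
# writes a 512px thumbnail back under a "thumbnails/" prefix.
import tempfile

from google.cloud import storage
from PIL import Image

client = storage.Client()

def make_thumbnail(event, context):
    bucket = client.bucket(event["bucket"])
    name = event["name"]
    if name.startswith("thumbnails/"):
        return  # don't re-trigger on our own output

    with tempfile.NamedTemporaryFile(suffix=".jpg") as tmp:
        bucket.blob(name).download_to_filename(tmp.name)
        img = Image.open(tmp.name)
        img.thumbnail((512, 512))  # low-resolution display version
        img.convert("RGB").save(tmp.name, "JPEG")
        bucket.blob(f"thumbnails/{name}").upload_from_filename(tmp.name)
```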
You then need to save two paths in your database: the original image and the processed image. On the site, you display the processed image; when the download button is clicked, you generate a signed URL to download the original image.
ACLs are going to be deprecated, or at least are no longer recommended by Google; a uniform authorization policy (based on IAM) is Google's recommended best practice, and you can achieve the same things with it.
However, in that case you can't limit access separately to the original-image directory and the thumbnail directory: users have free access to everything and can download, upload, and delete whatever they want.
If that fits your use case, perfect; otherwise... use signed URLs!

Related

Streaming media to files in AWS S3

My problem:
I want to stream media I record on the client (TypeScript code) to my AWS storage. (Services like YouTube, Twitch, Zoom, and Google Meet can record live and save the recording to their cloud; some of them even have host-failure tolerance and still produce a file if the host disconnects.)
I want each stream to have a different file name so that future triggers will be able to work with it.
I tried to save the stream into S3, but maybe there are storage solutions better suited to my problem.
What services I tried:
S3: I tried to stream directly into S3, but it doesn't really support updating existing objects.
I tried multipart uploads, but they are not host-failure tolerant.
I also tried to upload each part separately and have a Lambda merge them (yes, it is very dirty and resource-consuming), but I sometimes had ordering problems (see the multipart sketch after this list).
Kinesis Video Streams: I tried to use Kinesis Video Streams but couldn't enable the saving feature with the SDK.
Testing by hand, I saw that it saved a new file after a period of time or once a size threshold was reached, so it may not be the solution I want.
Amazon IVS: I tried it because Twitch recommends it, although it is well beyond my requirements.
I couldn't find a code example of what I want to do with the SDK (only manual, console-based examples).
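For what it's worth, a minimal boto3 multipart sketch looks like the following (the bucket, key, and chunk source are placeholders). Each part carries an explicit PartNumber, which avoids ordering problems, and because uploaded parts persist server side until the upload is completed or aborted, a reconnecting host can list what was already received and resume:

```python
# Sketch: resumable multipart upload with boto3 (names are placeholders).
import boto3

s3 = boto3.client("s3")
bucket, key = "my-recordings", "streams/session-123.webm"  # placeholders

mpu = s3.create_multipart_upload(Bucket=bucket, Key=key)
upload_id = mpu["UploadId"]

parts = []
for number, chunk in enumerate(chunks_from_client(), start=1):  # hypothetical chunk source
    resp = s3.upload_part(
        Bucket=bucket, Key=key, UploadId=upload_id,
        PartNumber=number,  # explicit ordering, 1..10000
        Body=chunk,         # every part except the last must be >= 5 MiB
    )
    parts.append({"PartNumber": number, "ETag": resp["ETag"]})

# After a crash, already-uploaded parts can be recovered:
# parts = s3.list_parts(Bucket=bucket, Key=key, UploadId=upload_id)["Parts"]

s3.complete_multipart_upload(
    Bucket=bucket, Key=key, UploadId=upload_id,
    MultipartUpload={"Parts": parts},
)
```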
Questions
Am I looking at the right services?
What can I do with the AWS SDK to make this work?
Is there a good place with code examples for future problems? Or maybe a way to search for solutions?
Thank you for your help.

Make static file accessible to only my app hosted on either AWS or Google Cloud

I have a static site (purely HTML, JS & CSS) that I will be hosting on either AWS or Google Cloud.
The site pulls data from a CSV that can either be located as a local file on the site, or preferably, on another endpoint.
My issue is that the client does not want the CSV file to be publicly accessible (people shouldn't be able to navigate directly to it and download it).
The file needs to live on AWS S3 or Google Cloud Storage, as the client will be updating it periodically.
However, I can't work out how to make it visible to my app but not to someone visiting the file's URL directly. I can either make it public, so my app can see it, but then so can everyone else; or make it private, so it can't be downloaded, but then my app can't see it either.
My question is - is what I'm trying to achieve even possible? Or does the CSV have to either be public or not?
My ideal option would be two separate buckets: one with my static site on it, the other with the CSV files.
Any suggestions would be most welcome.
What you're asking for isn't really possible in the way you've described it. If the CSV is to be consumed directly by the web page, then the web page needs to be able to get it; and if the web page can get it, then so can anyone who can view the page. The same is true of any data on a web page. Is there a particular reason why you don't want the CSV to be accessed directly? It wouldn't be something a visitor could just stumble onto: they would have to know the URI (which would be easy to find, but most users wouldn't bother). Are there things in the data that shouldn't be exposed? If so, you need to rethink your approach entirely.
AWS recently released new S3 features that can help with your needs. One is Amazon S3 Access Points (https://aws.amazon.com/s3/features/access-points). The other is Amazon S3 Object Lambda (https://aws.amazon.com/es/blogs/aws/introducing-amazon-s3-object-lambda-use-your-code-to-process-data-as-it-is-being-retrieved-from-s3/).
It looks like you can put Lambdas in front of an S3 bucket to process and transform requests for its files (also known as "objects"). Unfortunately, I do not have a precise answer to your question, as I have never implemented this solution, but I believe you might be able to store the CSV file in S3 and grant access only to your static site by using an S3 Access Point.
Alternatively, you might be able to use CloudFront with Lambda@Edge associations to restrict access to your CSV files to specific origins.
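If you do go the Access Point route, the client side barely changes: with boto3 you pass the access point ARN where the bucket name would normally go. A minimal sketch (the ARN, account ID, and key are made up):

```python
# Sketch: fetch the private CSV through an S3 Access Point.
# The access point policy (configured separately) controls who may do this.
import boto3

s3 = boto3.client("s3")
ACCESS_POINT_ARN = "arn:aws:s3:us-east-1:111122223333:accesspoint/site-data"  # placeholder

resp = s3.get_object(Bucket=ACCESS_POINT_ARN, Key="data.csv")
csv_bytes = resp["Body"].read()
```

Note that the caller still needs AWS credentials the access point policy trusts, so a purely client-side static site would typically put this behind a small backend endpoint, or have that backend mint short-lived presigned URLs instead.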

Django: Best Practice for Storing Images (URLField vs ImageField)

There are cases in a project where I'd like to store images on a model.
For example:
Company Logos
Profile Pictures
Programming Languages
Etc.
Recently I've been using AWS S3 for file storage (primarily hosting on Heroku) via ImageField uploads.
I feel like there's a better way to store files than what I've been doing.
For some things (like the examples above) I think it would make sense to just reference an image URL from a more publicly available source rather than take up space in my own storage.
For the experts in the Django community who have built and deployed really professional projects: do you typically store files directly in the Django media folder via ImageField?
Or do you normally use a URLField and pull a URL from an API or an image link from the web (e.g., go to any Google image, right click, copy, and paste the image URL)?
Bonus: What does your image storing setup look like?
Hope this makes sense.
Thanks in advance!
The standard is what you've described: using something like AWS S3 to store the actual image and keeping the URL in your database. Here are a few reasons why:
It's cheap. Like, really cheap.
Instead of making your web server serve the files, you're offloading that onto the client (e.g. their browser grabbing the file from S3)
If you're using an ephemeral filesystem (like Heroku's), your only option is something like S3.
Control. Sure, you can pull an image link from somewhere that isn't managed by you, but that doesn't scale. What happens if that server goes offline? What if they take the image down? This way, you control what happens to the objects.
An example of a decently large internet company, though not large enough to run its own infrastructure (like Facebook/Instagram or Google do), is VSCO. They take in a decent number of photo uploads every day and handle them with AWS.
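As a concrete illustration, a minimal django-storages setup might look like the following (the bucket name and region are placeholders, and credentials are assumed to come from the environment):

```python
# settings.py -- send ImageField uploads to S3 via django-storages
# (pip install django-storages boto3)
INSTALLED_APPS += ["storages"]
DEFAULT_FILE_STORAGE = "storages.backends.s3boto3.S3Boto3Storage"
AWS_STORAGE_BUCKET_NAME = "my-media-bucket"  # placeholder
AWS_S3_REGION_NAME = "us-east-1"             # placeholder

# models.py -- the database row stores only the S3 key, not the bytes
from django.db import models

class Company(models.Model):
    name = models.CharField(max_length=100)
    logo = models.ImageField(upload_to="logos/")  # file itself goes to S3

# In templates, company.logo.url then resolves to the S3 (or CDN) URL.
```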

Returning images through AWS API Gateway

I'm trying to use AWS API Gateway as a proxy in front of an image service.
I'm able to get the image to come through, but it gets displayed as a big chunk of ASCII because the Content-Type is getting set to "application/json".
Is there a way to tell the gateway NOT to change the source Content-Type at all?
I just want "image/jpeg", "image/png", etc. to come through.
I was trying to format a string to be returned without quotes when I discovered the Integration Response functionality. I haven't tried this fix myself, but something along these lines should work:
Go to the Method Execution page of your Resource,
click on Integration Response,
expand Method Response Status 200,
expand Mapping Templates,
click "application/json",
click the pencil next to Output Passthrough,
change "application/json" to "image/png"
Hope it works!
I apologize in advance for giving an answer that does not directly answer the question and instead suggests a different approach... but based on the question and comments, and my own experience with what I believe to be a similar application, it seems like you may be using the wrong tool for the problem, or at least a tool that is not the optimal choice within the AWS ecosystem.
If your image service was running inside Amazon Lambda, the need for API Gateway would be more apparent. Absent that, I don't see it.
Amazon CloudFront can fetch content from a back-end server, cache it at over 50 "edge" locations globally (with no charge for storing the cached content), and serve up to 100 distinct hostnames pointing to a single CloudFront distribution, in addition to the default xxxxxxxx.cloudfront.net hostname. It also supports SSL. This seems like what you are trying to do, and then some.
I use it, quite successfully, for exactly the scenario you describe: "a proxy in front of an image service." Exactly what my image service and your image service do may differ (mine is a resizer that can look up the source URL of missing/never-before-requested images, fetch them, and resize), but fundamentally it seems like we're accomplishing a similar purpose.
Curiously, the pricing structure of CloudFront in some regions (such as us-east-1 and us-west-2) is such that it's not only cost-effective, but using CloudFront can actually be almost $0.005 per gigabyte downloaded cheaper than not using it.
In my case, in addition to the back-end image service, I also have an S3 bucket with a single file in it, attached to a single path in the CloudFront distribution (as a second "custom origin"), for the sole purpose of serving up /robots.txt, to control direct access to my images by well-behaved crawlers. This allows the robots.txt file to be managed separately from the image service itself.
If this doesn't seem to address your need, feel free to comment and I will clarify or withdraw this answer.
@kjsc: we finally figured out how to get this working on an alternate question involving base64-encoded image data, which you may find helpful in your solution:
AWS Gateway API base64Decode produces garbled binary?
To answer your question: to get the Content-Type to come through as a hard-coded value, you would first go into the Method Response screen and add a Content-Type header with whatever content type you want.
Then you'd go into the Integration Response screen and set the Content-Type to your desired value (image/png in this example). Wrap 'image/png' in single quotes.
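The same console steps can also be scripted; a sketch with boto3 (all IDs are placeholders, and registering image/png as a binary media type is usually needed as well so the payload isn't re-encoded):

```python
# Sketch: hard-code Content-Type: image/png on a GET method via boto3.
import boto3

apigw = boto3.client("apigateway")
REST_API_ID, RESOURCE_ID = "abc123", "def456"  # placeholders

# 1. Declare the header on the method response.
apigw.put_method_response(
    restApiId=REST_API_ID, resourceId=RESOURCE_ID,
    httpMethod="GET", statusCode="200",
    responseParameters={"method.response.header.Content-Type": True},
)

# 2. Map a hard-coded value in the integration response (note the single quotes).
apigw.put_integration_response(
    restApiId=REST_API_ID, resourceId=RESOURCE_ID,
    httpMethod="GET", statusCode="200",
    responseParameters={"method.response.header.Content-Type": "'image/png'"},
    contentHandling="CONVERT_TO_BINARY",  # return the payload as raw bytes
)

# 3. Treat image/png as binary at the API level ("/" is escaped as "~1").
apigw.update_rest_api(
    restApiId=REST_API_ID,
    patchOperations=[{"op": "add", "path": "/binaryMediaTypes/image~1png"}],
)
```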

Managing temp files in web development

I have a question regarding web architecture. I am planning to build a website for uploading photos (this is a personal project). Users can upload multiple photos by zipping them and uploading the archive. Photos can be any resolution at upload time, but once basic processing is complete, all photos will be stored in a standard-resolution JPEG format.
Once the zipped photos are uncompressed, they will be presented to the user on a web page as thumbnails, where users can do their last touch-ups (once photos are saved, no modifications are allowed).
My question is this: how can I refer to the original file when the user selects a thumbnail? How can I best associate the temp file with the thumbnail presented? I know I can store the image in a DB and use that, but the original file will only be there until the user saves the images, and once saved it will be a standard-size image.
Even though I am using Python/Django, I think this is a general web programming question.
thanks,
Dan
It's certainly reasonable to have a temp_file_location type attribute (or even model) and store the intermediate files in a temporary location. Cron jobs or the like can then be used to clean up both the filesystem and the database.
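A minimal sketch of that idea in Django (the model and field names are made up): the thumbnail record keeps a pointer to its temp file, and a periodic job purges whatever was never saved:

```python
# models.py -- hypothetical model tying a thumbnail to its original temp file
import os
from datetime import timedelta

from django.db import models
from django.utils import timezone

class UploadedPhoto(models.Model):
    thumbnail = models.ImageField(upload_to="thumbnails/")
    temp_path = models.CharField(max_length=255)  # original, pre-processing
    created_at = models.DateTimeField(auto_now_add=True)
    saved = models.BooleanField(default=False)    # True once the user finalizes

# e.g. called from a management command scheduled with cron
def purge_stale_uploads(max_age_hours=24):
    cutoff = timezone.now() - timedelta(hours=max_age_hours)
    for photo in UploadedPhoto.objects.filter(saved=False, created_at__lt=cutoff):
        if os.path.exists(photo.temp_path):
            os.remove(photo.temp_path)  # remove the orphaned temp file
        photo.delete()                  # and its database row
```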