Managing temp files in web development - django

I have a question regarding web architecture. I am planning to build a website for uploading photos (this is a personal project). Users can upload multiple photos by zipping them and uploading the archive. Photos can be any resolution when uploaded, but once basic processing is complete, all photos will be stored in a standard-resolution JPEG format.
Once the zipped photos are uncompressed, they will be presented to the user on a web page as thumbnails, where users can do their last touch-ups (once photos are saved, no modifications are allowed).
My question is this: how can I refer to the original file when the user selects a thumbnail? How can I best associate the temp file with the thumbnail presented? I know I can store the image in a DB and use it, but the original file will only be there until the user saves the images, and once saved it will be a standard-size image.
Even though I am using Python/Django, I think this is a general web programming question.
thanks,
Dan

It's certainly reasonable to have a temp_file_location-type attribute (or even a dedicated model) and store the intermediate files in a temporary place. Cron jobs or the like can then be used to clean up both the filesystem and the database.
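For example, a minimal sketch of that idea (the model, field, and path names are hypothetical, not from the answer above): each uploaded original gets a row with a token, the thumbnail page embeds the token, and the save view looks the row up by it.

    import os
    import uuid
    from datetime import timedelta

    from django.db import models
    from django.utils import timezone

    class PendingPhoto(models.Model):
        """Tracks an unzipped original while the user is still editing.

        The token is embedded in the thumbnail markup (e.g. a data
        attribute on the <img>), so selecting a thumbnail maps straight
        back to its temp file.
        """
        token = models.UUIDField(default=uuid.uuid4, unique=True)
        temp_file_location = models.CharField(max_length=255)  # e.g. /tmp/uploads/<token>.jpg
        created_at = models.DateTimeField(auto_now_add=True)

    def reap_stale_uploads(max_age=timedelta(days=1)):
        # Run from a cron-driven management command: remove temp files
        # (and their rows) that the user abandoned without saving.
        cutoff = timezone.now() - max_age
        for pending in PendingPhoto.objects.filter(created_at__lt=cutoff):
            if os.path.exists(pending.temp_file_location):
                os.remove(pending.temp_file_location)
            pending.delete()

Once the user saves, the view would convert the original at temp_file_location into the standard-size JPEG, store that permanently, and delete the PendingPhoto row.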

Related

Retrieving data from AWS S3 too slow in Shiny app

I know that this question can mostly be answered generally for any web app, but because I am specifically using Shiny I figured your answers may be considerably more useful.
I have made a relatively complex app. The data is not complex, but the user interface is.
I am storing the data in S3 using the aws.s3 package, and have built my app using golem. Because most shiny apps are used to analyse or enter some data, they usually deal with a couple of datasets, and a relational database is very useful and fast for that type of app.
However, my app is quite UI/UX extensive. Users can have their own/shared whiteboard space(s) where they drag items around. The coordinates of the items are stored in .rds files in my S3 bucket, for each user. Users can customise many aspects of the app just for themselves: font size, colours of various experimental groups (it's a research app), and experimental visits that store PDF files, .html files and .rds files.
The stored .rds files can contain variables, lists, data.frames, reactiveValues, renderUI() objects, etc., so they are widely different.
As such I have dozens of .rds files stored in a bucket, and every time the app loads, each of these files needs to be read one by one in order to recreate the environment appropriate for each user. The number of files/folders in directories is queried to know how many divs need to be generated for the user to click inside their files, and so on.
The range of objects stored is too wide for me to use a relational database, but my app takes at least 40 seconds to load. It is also generally slow when submitting data, mostly because the data entered often modifies many UI elements that need to be pushed to S3 again. Because I have no background in proper web dev, I have no idea of the best way to store user-related UX/UI elements and retrieve them seamlessly.
Could anyone please recommend me to appropriate resources for me to learn more about it?
Am I doing it completely wrong? I honestly do not know how else to store and retrieve all these R objects.
Thank you in advance for your help with the above.
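One common fix for the sequential-read bottleneck described above is to fetch the objects concurrently rather than one by one, so the load time is bounded by the slowest object instead of the sum of all of them. A sketch in Python with boto3 for illustration (the bucket and prefix names are hypothetical, and the same pattern applies from R):

    from concurrent.futures import ThreadPoolExecutor

    import boto3

    s3 = boto3.client("s3")
    BUCKET = "my-shiny-app-data"  # hypothetical bucket name

    def fetch_one(key):
        # Each GET is independent, so the calls can overlap in threads.
        return key, s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()

    def fetch_user_state(prefix):
        # List every object under the user's prefix, then download them
        # concurrently instead of one by one at app start-up.
        listing = s3.list_objects_v2(Bucket=BUCKET, Prefix=prefix)
        keys = [obj["Key"] for obj in listing.get("Contents", [])]
        with ThreadPoolExecutor(max_workers=16) as pool:
            return dict(pool.map(fetch_one, keys))

    # state = fetch_user_state("users/alice/")  # N overlapping GETs, not N serial ones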

Using Google Cloud Storage to host images for an image sharing site [closed]

I am building an image sharing site and I want the images uploaded to be saved on Cloud Storage. I'm at the POC stage and would like to know the following:
Once an image is uploaded, could I generate an image URL that could be sent to my UI service and used to render the image on the front end?
I want to prevent the user from downloading the image through the usual methods (right click > save as / right click > open in new tab). Can this be done from Cloud Storage itself, or should it be implemented on the front end using overlays, watermarks, etc.?
In the scenario where we have a specific download button for the image, what is the best way to implement this? Do I download the image on the backend server and then send it to the front end using something like gsutil? Or can the front end directly request the image from Cloud Storage?
Also open to any other alternatives that accomplish the above. Thanks!
Your question requires cloud engineering and architecture work. I can give you some insight, but you will need to go deeper into each part to build your site correctly.
Firstly, users must not access the Cloud Storage bucket directly; otherwise you would need to make it public, and anyone could access all of its content. When a user needs to read or write a file, use the signed URL mechanism.
When a new image is uploaded, you should trigger a Cloud Function (an event is emitted when the file is uploaded, and you can plug a function (or a Pub/Sub topic) onto this event). Why? Because the overlay/watermark/low-resolution version needs to be generated server side. You could generate it each time the picture is displayed on the site, but doing it once at upload time reduces the latency for the user. That's why I recommend producing the new image version with a Cloud Function when the file is uploaded, and storing it in Cloud Storage (in another directory, such as thumbnail/).
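A sketch of what that trigger could look like as a Python background Cloud Function; the bucket name, the thumbnail/ output prefix, and the use of Pillow for the low-resolution version are illustrative assumptions, not part of the answer:

    # main.py - deploy with e.g.:
    #   gcloud functions deploy make_thumbnail --runtime python311 --trigger-bucket my-uploads
    # (the bucket name "my-uploads" is hypothetical)
    import io

    from google.cloud import storage
    from PIL import Image

    client = storage.Client()

    def make_thumbnail(event, context):
        # Fires once per object finalized in the bucket.
        name = event["name"]
        if name.startswith("thumbnail/"):
            return  # skip our own output
        bucket = client.bucket(event["bucket"])

        # Download the original, produce a low-resolution JPEG, and store
        # it under a separate prefix, as suggested above.
        img = Image.open(io.BytesIO(bucket.blob(name).download_as_bytes()))
        img.thumbnail((800, 800))
        out = io.BytesIO()
        img.convert("RGB").save(out, format="JPEG")
        bucket.blob(f"thumbnail/{name}").upload_from_string(
            out.getvalue(), content_type="image/jpeg")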
You then need to save two paths in your database: the original image and the processed image. On the site you display the processed image; when the download button is clicked, you generate a signed URL to download the original image.
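Generating such a signed URL is short with the google-cloud-storage client; a sketch with placeholder bucket and object names:

    from datetime import timedelta

    from google.cloud import storage

    def original_download_url(bucket_name, object_name):
        # V4 signed URL, valid for 15 minutes; the bucket itself stays private.
        blob = storage.Client().bucket(bucket_name).blob(object_name)
        return blob.generate_signed_url(version="v4",
                                        expiration=timedelta(minutes=15),
                                        method="GET")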
ACLs are going to be deprecated, or at least are no longer recommended by Google; a uniform bucket-level policy (based on IAM) is the recommended best practice. You can achieve similar things with ACLs, but in that case you can't limit access to the original-version directory and the thumbnail directory separately: users would have free access to everything, to download, upload and delete whatever they want.
If that's your use case, perfect; otherwise, use signed URLs!

Django: Best Practice for Storing Images (URLField vs ImageField)

There are cases in a project where I'd like to store images on a model.
For example:
Company Logos
Profile Pictures
Programming Languages
Etc.
Recently I've been using AWS S3 for file storage (primarily hosting on Heroku) via ImageField uploads.
I feel like there's a better way to store files than what I've been doing.
For some things (like the examples above) I think it would make sense to just get an image URL from a more publicly available source rather than take up space in my own database.
For the experts in the Django community who have built and deployed really professional projects, do you typically store files directly into the Django media folder via ImageField?
or do you normally use a URLField and then pull a url from an API or an image link from the web (e.g., go on any Google image, right click and copy then paste image URL)?
Bonus: What does your image storing setup look like?
Hope this makes sense.
Thanks in advance!
The standard is what you've described: using something like AWS S3 to store the actual image and keeping the URL in your database. Here are a few reasons why:
It's cheap. Like, really cheap.
Instead of making your web server serve the files, you're offloading that onto the client (e.g. their browser grabbing the file from S3)
If you're using an ephemeral system (like Heroku), your only option is to use something like S3.
Control. Sure, you can pull an image link from somewhere else that isn't managed by you. But this does not scale. What happens if that server goes offline? What if they take that image down? This way, you control what happens to the objects.
An example of a decently large internet company, though not large enough to run its own infrastructure the way Facebook/Instagram or Google do, is VSCO. They take in a decent number of photo uploads every day and handle them with AWS.
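As a concrete sketch of that setup, here is roughly what an S3-backed ImageField looks like with django-storages (the settings keys are the library's own; the bucket name and the model are illustrative):

    # settings.py - route default file storage to S3 via django-storages
    DEFAULT_FILE_STORAGE = "storages.backends.s3boto3.S3Boto3Storage"
    AWS_STORAGE_BUCKET_NAME = "my-app-media"  # illustrative bucket name
    AWS_S3_REGION_NAME = "us-east-1"

    # models.py - the database row stores only the object key; S3 holds the bytes
    from django.db import models

    class Company(models.Model):
        name = models.CharField(max_length=100)
        logo = models.ImageField(upload_to="logos/")  # company.logo.url resolves to S3

The browser then fetches company.logo.url straight from S3, which is the offloading point made in the list above.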

Is there an implementation of a single instance blob store for Django?

I am new to Django, so I apologize if I missed something. I would like a library that gives me a single-instance data store for blob/binary data: one that masks whether the files are stored in the database, the file system, or a backend like Amazon S3. I want a single API that lets me add files and get back URLs to serve those files. It would also be nice if the implementation supported some kind of migration: if I had blobs in a database when a site started out, I could later move those blobs to an S3 bucket behind the scenes without changing how my application stores and serves the data.
An important sub-aspect of this is that the files have to be only shown to properly authorized users (i.e. just putting them in an open /media/ folder as files is not sufficient).
Perhaps I am asking too much - but I find this kind of service very useful in my applications. The main reason that I am asking is that unless I find such a thing - I will wander off and build my own library - I just don't want to waste the time if this kind of thing already exists.
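For reference (not from the thread): Django's pluggable storage API already covers a good part of this, since default_storage hides whether files live on disk, in S3 (via django-storages), or elsewhere, and a view can gate access. A minimal sketch, with the permission check left as a hypothetical placeholder:

    from django.contrib.auth.decorators import login_required
    from django.core.files.base import ContentFile
    from django.core.files.storage import default_storage
    from django.http import FileResponse, HttpResponseForbidden

    def store_blob(name, data):
        # default_storage resolves to whatever backend settings configure
        # (filesystem, S3, ...), so call sites don't change when you
        # migrate backends.
        return default_storage.save(name, ContentFile(data))

    @login_required
    def serve_blob(request, name):
        # Serving through a view enforces authorization, instead of
        # exposing files from an open /media/ folder.
        if not user_may_read(request.user, name):  # hypothetical permission check
            return HttpResponseForbidden()
        return FileResponse(default_storage.open(name, "rb"))

What this does not give you out of the box is the behind-the-scenes migration between backends; that part you would still have to script.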

Sitecore media items and race conditions

How does Sitecore deal with race conditions when publishing media items?
Scenario:
A non-versioned media item with a 500 MB .mpg file (stored as a blob) is being downloaded by a site visitor.
The download will take a few minutes at best, and at worst could be measured in hours (if they're on a low-bandwidth connection).
While the user is downloading, an author uploads a new version of the .mpg on the media item and publishes.
What happens, and why?
Other variations include:
The security settings on the media item change to block access from the visitor downloading
The media item is deleted and the change published
I'm guessing that in all these cases the download is aborted, but if so, what response does the server send?
I don't have an exact answer, but Sitecore caches blob assets on the file system under /App_Data/MediaCache/, so perhaps the existing asset is still in that cache. I'm not sure how Sitecore's media caching mechanism works, but I bet it purges and re-caches on the next request once the new asset is completely there.
Just a guess. Maybe decompile the kernel to find the code that handles caching media.
(Not really an answer.. just comment was too big for the box :P)
This is a very interesting question. A lot of Sitecore's media performance comes from caching a copy to disk and then delivering it from there on subsequent requests (and from caching scaled copies of originals, such as thumbnails). The cached file is flushed once the original item is edited in some way and then re-published.
I am uncertain (and intrigued) how this would affect a large file. I think a lot of people assume media means smaller files, such as images or PDFs, that a user would simply re-request if broken, rather than a file being streamed while the item itself is updated. I'm sure a lot of the work at that point is IIS/ASP.NET streaming rather than Sitecore itself.
I'm not sure whether Sitecore's cache would protect against that, but it should be simple enough to test with a larger media file. I'm interested in the results (the larger files I've delivered personally have gone through a CDN or a dedicated streaming partner).
This is not a definitive answer, and I agree with Stephen about a dedicated streaming partner. I wonder how such systems handle this.
It seems that Sitecore creates a new media cache file for each published and accessed revision, so the HTTP transfer can continue reading the old file while the system writes the new one. I'm not sure if or how that works if you disable caching (I didn't try). Otherwise, trying to write while reading could block or interfere with the read.
Note that you get a new revision ID even if you don't version. And it might be the publication that causes a new cache entry, not the occurrence of a new revision.
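To illustrate the mechanism guessed at above in general terms (this is not Sitecore's actual code): if each revision is written to a fresh file and atomically swapped into place, an in-flight reader keeps streaming the old bytes. A small POSIX-flavoured Python sketch, with an illustrative cache path:

    import os
    import tempfile

    CACHE_PATH = "/var/cache/media/video.mpg"  # illustrative path

    def publish_new_revision(data):
        # Write the new revision to a separate temp file, then atomically
        # swap it into place. Open handles on the old file keep reading
        # the old inode; only new opens see the new revision.
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(CACHE_PATH))
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp, CACHE_PATH)  # atomic rename

    # A download in progress:
    #   reader = open(CACHE_PATH, "rb")    # holds the old inode open
    #   publish_new_revision(new_bytes)    # swap happens mid-download
    #   reader.read()                      # still the *old* file's bytes (POSIX)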