I'm developing a Django project where I need to serve temporary images, which are generated online. The sessions should be anonymous; anyone should be able to use the service. The images should be destroyed when the session expires or closes.
I'm not sure, however, what the best approach is. For instance, I could use file-based sessions and have the images generated in the session folder, so they would (or at least should) be destroyed along with the session. I suppose I could do something similar with database sessions, perhaps saving the images in the database or simply removing them when the session ends; still, the file-based solution sounds more reliable to me.
Is it a good solution, or are there more solid alternatives?
I'd name the temporary images based on a hash of the session key and then create a management command that:
builds a list of potential temp filename hashes for all the current sessions;
grabs a list of all the filenames currently in your temporary directory;
deletes any filename that doesn't have a matching entry in the hash list.
Since there's no failsafe way to know whether a session has "closed", you should run the session cleanup management command first - either before this one, or implicitly as part of this new command by using the call_command() function.
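A rough, untested sketch of such a command, assuming database-backed sessions, temp images named after a SHA-1 of the session key, and a made-up TEMP_IMAGE_DIR; adjust it to however you actually name and store the files:

import hashlib
import os

from django.contrib.sessions.models import Session
from django.core.management import call_command
from django.core.management.base import BaseCommand

TEMP_IMAGE_DIR = '/tmp/temp-images'  # hypothetical location of the generated images


class Command(BaseCommand):
    help = "Delete temporary images whose session no longer exists."

    def handle(self, *args, **options):
        # Expire stale sessions first ('cleanup' on old Django, 'clearsessions' on newer).
        call_command('clearsessions')

        # Hashes for every session that is still alive.
        valid_hashes = set(
            hashlib.sha1(key.encode()).hexdigest()
            for key in Session.objects.values_list('session_key', flat=True)
        )

        # Remove any temp file that no longer maps to a live session.
        for filename in os.listdir(TEMP_IMAGE_DIR):
            stem, _ext = os.path.splitext(filename)
            if stem not in valid_hashes:
                os.remove(os.path.join(TEMP_IMAGE_DIR, filename))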
I have an app that has an attachments feature for users. They can upload documents to S3 and then revisit them to preview and/or download said attachments.
I was planning on storing the S3 URLs in the DB and then pre-signing them when the user needs them. The caveat I'm finding is that this can lead to edge cases where S3 and the DB disagree.
I.e. if a file gets removed from S3 but its URL does not get removed from the DB (or vice versa). This can lead to data inconsistency and may mislead users.
I was thinking of just getting the urls via the network by using listObjects in the s3 client SDK. I don't really need to store the urls and this guarantees the user gets what's actually in S3.
The only con here is that it makes an API request (as opposed to a DB hit).
Any insights?
Thanks!
Using a database to store an index of files is a good idea, especially once the volume of objects increases. The ListObjects() API only returns 1000 objects per call. This might be okay if every user has their own path (so you could call ListObjects(Prefix='user1/')), but it's not ideal if you want to allow document sharing between users.
Using a database will definitely be faster to obtain a listing, and it has the advantage that you can filter on attributes and metadata.
The two systems will only get "out of sync" if objects are created/deleted outside of your app, or if there is an error in the app. If this concerns you, use Amazon S3 Inventory to get a regular listing of the objects in the bucket, and write some code to compare it against the database entries. This will highlight if anything is going wrong.
While Amazon S3 is an excellent NoSQL database (Key = filename, Value = contents), it isn't good for searching/listing a large quantity of objects.
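For comparison, if you do go the ListObjects() route, a minimal boto3 sketch would have to paginate past the 1000-object limit and pre-sign keys on demand (the bucket name and prefix here are placeholders):

import boto3

s3 = boto3.client('s3')


def list_user_files(bucket, prefix):
    # Paginate, because list_objects_v2 returns at most 1000 keys per call.
    paginator = s3.get_paginator('list_objects_v2')
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get('Contents', []):
            keys.append(obj['Key'])
    return keys


def presigned_url(bucket, key, expires=3600):
    # Pre-sign a key only when the user actually asks for it.
    return s3.generate_presigned_url(
        'get_object',
        Params={'Bucket': bucket, 'Key': key},
        ExpiresIn=expires,
    )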
In Postman, once you have created a request or collection and have fine-tuned it, is there any way to lock it so as to make it read-only, so that it can't be accidentally altered?
Obviously I would need something to toggle it back to editable again!
Thanks in advance
I don't think there is an option to make a collection read-only for the admin (the creator of the collection). A few ways of avoiding unnecessary changes are:
If you can edit the rights of other users within the workspace, make it view-only for selected users inside the workspace.
Create a fork of the collection so that you can revert back.
Create a copy of the collection.
Download the collection as a JSON file.
Personally I prefer downloading the collection as JSON, as this keeps the workspace clean and tidy.
Otherwise:
I prefer creating an in-progress workspace and a final workspace, sharing each completed collection to the final workspace and deleting it from the in-progress workspace every time I finish something.
If changes are required, I work in the in-progress workspace by creating a copy.
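If you want an automated JSON backup rather than exporting by hand each time, the Postman API can return a collection as JSON; a minimal sketch in Python (the API key and collection UID are placeholders you'd supply):

import json
import requests

API_KEY = 'PMAK-xxxxxxxx'  # placeholder Postman API key
COLLECTION_UID = '12345678-aaaa-bbbb-cccc-1234567890ab'  # placeholder collection UID

resp = requests.get(
    'https://api.getpostman.com/collections/%s' % COLLECTION_UID,
    headers={'X-Api-Key': API_KEY},
)
resp.raise_for_status()

# Write the exported collection to disk as a read-only reference copy.
with open('collection-backup.json', 'w') as out:
    json.dump(resp.json(), out, indent=2)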
Does Django have a method of storage like HTML5's localStorage or sessionStorage?
I want to use the Django/Django-Rest-Framework as the backend of my project.
But does Django have a convenient storage method to serve my project? In HTML5 there are localStorage and sessionStorage, which are very useful.
EDIT
I want a simple way to store my temporary data, for example when there is a requirement to share data.
For example, I have 3 providers (a_provider, b_provider, c_provider) that can each process an origin_data.
In a function:
def process_data():
    a_provider(get_data())  # process a
    b_provider(get_data())  # process b
    c_provider(get_data())  # process c
The get_data() call can fetch the shared data, rather than having every step return the processed data as a parameter to pass into the next provider.
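(For illustration only: the get_data()/set_data() pair I have in mind could be backed by something like Django's cache framework; the cache key and timeout below are made up.)

from django.core.cache import cache

SHARED_KEY = 'origin_data'  # hypothetical cache key


def set_data(value):
    # Store the shared data for five minutes so each provider can read it.
    cache.set(SHARED_KEY, value, timeout=300)


def get_data():
    return cache.get(SHARED_KEY)

process_data() above could then call get_data() in each step without passing the result around.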
There are some 'Offline Solutions' you can check out here
However, if you are trying to run completely in local storage, Django probably isn't your choice. Some new development on this particular topic is being explored by the awesome team at BeeWare.
Hope this helps.
I am working on doing some simple analytics on a Django website (v1.4.1). Seeing as this data will be gathered on pretty much every server request, I figured the right way to do this would be with a piece of custom middleware.
One important metric for the site is how often given images are accessed. Since each image is its own object, I thought about using django-hitcount, but figured that was unnecessary for what I was trying to do. If it proves easier, I may use it though.
The current conundrum I face is that I don't want to query the database and look for a given object for every HttpRequest that occurs. Instead, I would like to wait until a successful response (indicated by an HttpResponse.status_code of 200 or whatever), and then query the database and update a hit field for the corresponding image. The trouble is that the only way to access the path of the image is in process_request, while the only way to access the status code is in process_response.
So, what do I do? Is it as simple as creating a variable that can hold the path and then looking up the file once a response code of 200 is returned, or should I just use django-hitcount?
Thanks for your help
Set up a cron task to parse your Apache/Nginx/whatever access logs on a regular basis, perhaps with something like pylogsparser.
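If you go this route, here is a minimal sketch using only the standard library instead of pylogsparser (whose API I won't assume); the '/images/' prefix is illustrative:

import re
from collections import Counter

# Matches the request and status fields of the common/combined log formats.
LOG_LINE = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})')


def count_image_hits(log_path):
    hits = Counter()
    with open(log_path) as log:
        for line in log:
            match = LOG_LINE.search(line)
            if not match:
                continue
            path, status = match.group('path'), match.group('status')
            if status == '200' and path.startswith('/images/'):
                hits[path] += 1
    return hits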
You could use memcache to store the counters and then periodically persist them to the database. There are risks that memcache will evict the value before it's been persisted but this could be acceptable to you.
This article provides more information and highlights a risk arising when using hosted memcache with keys distributed over multiple servers. http://bjk5.com/post/36567537399/dangers-of-using-memcache-counters-for-a-b-tests
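Tying the two ideas together (remember the path in process_request, count only on a 200 in process_response), an old-style Django 1.4 middleware sketch backed by the cache framework could look like this; the '/images/' prefix and key format are made up:

from django.core.cache import cache


class ImageHitMiddleware(object):
    """Sketch: stash the requested image path on the request, then bump a
    cache counter only when the response comes back with a 200."""

    def process_request(self, request):
        # Store the path on the request object (not on the middleware class,
        # which is shared between requests).
        if request.path.startswith('/images/'):
            request._image_hit_path = request.path
        return None

    def process_response(self, request, response):
        path = getattr(request, '_image_hit_path', None)
        if path is not None and response.status_code == 200:
            key = 'image-hits:%s' % path
            # add() only sets the key if it is missing; incr() then counts atomically.
            if not cache.add(key, 1):
                try:
                    cache.incr(key)
                except ValueError:
                    cache.set(key, 1)
        return response

A periodic task can then read those counters and persist them to the hit field on the image model, accepting that a cache eviction may lose some counts.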
Background
I'm doing fairly big file uploads on Django. File size is generally 10MB-100MB.
I'm on Heroku and I've been hitting the request timeout of 30 seconds.
The Beginning
In order to get around the limit, Heroku's recommendation is to upload from the browser DIRECTLY to S3.
Amazon documents this by showing you how to write an HTML form to perform the upload.
Since I'm on Django, rather than write the HTML by hand, I'm using django-uploadify-s3 (example). This provides me with an SWF object, wrapped in JS, that performs the actual upload.
This part is working fine! Hooray!
The Problem
The problem is in tying that data back to my Django model in a sane way.
Right now the data comes back as a simple URL string, pointing to the file's location.
However, I was previously using S3 Boto from django-storages to manage all of my files as FileFields, backed by the delightful S3BotoStorageFile.
To reiterate, S3 Boto is working great in isolation, Uploadify is working great in isolation, the problem is in putting the two together.
My understanding is that the only way to populate the FileField is by providing both the filename AND the file content. When you're uploading files from the browser to Django, this is no problem, as Django has the file content in a buffer and can do whatever it likes with it. However, when doing direct-to-S3 uploads like me, Django only receives the file name and URL, not the binary data, so I can't properly populate the FieldFile.
Cry For Help
Anyone know a graceful way to use S3Boto's FileField in conjunction with direct-to-S3 uploading?
Else, what's the best way to manage an S3 file just based on its URL? Including setting expiration, key id, etc.
Many thanks!
Use a URLField.
I had a similar issue where I wanted to either store the file on S3 directly using a FileField, or give the user the option to input a URL directly. To handle that, I used 2 fields in my model, one FileField and one URLField. In the template I could then use 'or' to see which one exists and use that, like {{ instance.filefield or instance.url }}.
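A minimal sketch of that two-field model (the field names are illustrative):

from django.db import models


class Attachment(models.Model):
    # Set when the file is uploaded through Django and stored via django-storages.
    filefield = models.FileField(upload_to='attachments/', blank=True, null=True)
    # Set when only a direct-to-S3 URL is known.
    url = models.URLField(blank=True)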
This is untested, but you should be able to use:
from django.core.files.storage import default_storage
f = default_storage.open('name_you_expect_in_s3', 'r')
# f is an instance of S3BotoStorageFile, and can be assigned to a field
obj, created = YourObject.objects.get_or_create(**stuff_you_know)
obj.s3file_field = f
obj.save()
I think this should set up the local pointer to S3 and save it, without overwriting the content.
ETA: You should do this only after the upload completes on S3 and you know the key in s3.
Check out django-filetransfers. Looks like it plays nicely with django-storages.
I've never used Django, so YMMV :) but why not just write a single byte to populate the content? That way, you can still use FieldFile.
I'm thinking that writing actual SQL may be the easiest solution here. Alternatively, you could subclass S3BotoStorage, override the _save method, and allow an optional filepath kwarg which sidesteps all the other saving logic and just returns the cleaned name.
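A rough, untested sketch of that second idea, assuming the old django-storages S3BotoStorage backend (the filepath kwarg is this answer's invention, not part of the library):

from storages.backends.s3boto import S3BotoStorage


class DirectUploadStorage(S3BotoStorage):
    def _save(self, name, content, filepath=None):
        # The file was already uploaded straight from the browser to S3, so
        # skip the normal upload and just return the key the FileField should store.
        if filepath is not None:
            return self._clean_name(filepath)
        return super(DirectUploadStorage, self)._save(name, content)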