Django: Save ContentFile (or some kind of virtual file) to database - django

In my django app I create a string which I have to save to my database as a File.
If i understand correctly, django is supposed to automatically upload the file when I save the entry:
class Foo(models.Model):
bar1=models.ForeignKey(Table1)
bar2=models.ForeignKey(Table2)
res=models.FileField(upload_to='MYPATH')
The problem is that to create an instance of Foo, I have to first create a physical file on the server's disk which would mean two copies would be created (one by me in order to create a database entry, one by django when saving the entry).
As far as I can see, I must create the file myself in 'MYPATH' and instead of using a FileField, I have to save a reference in the database (essentially what django is doing ????). However I have doubts that this is the best method as
It doesn't strike me as Pythonesque.
I won't have access to the same methods as when using a real FileField. For instance when calling it, I won't have a FieldFile but just a reference string.
Basically, what I wanted to do was: String --> ContentFile (or some form of "virtual" file) --> Physical File handled by Django when saving entry in the database.
entry = Foo(bar1=var1, bar2=var2, res=ContentFile(XMLSTRING))
entry.save()
This doesn't work, but it shows what I want to achieve.
So, please show me one of the following three:
How to save a file to the database without physically creating it (using a ContentFile doesn't create a physical file after saving the entry which is what I want to do)
Make django not upload the given file but use the already uploaded version whilst maintaining all the methods provided by FileField
What I don't understand.
I apologize for [my english, my lack of understanding, the lack of clarity]
Anything you need to know, I'd happy to specify.
EDIT: I looked at this thread, but there, the urlretrieve creates a temp file, which is something I don't really want to do. Maybe I should be doing that, but is there a better way?

Related

Delete Django ImageField and FileField records whose files are not found on the server

I've recently lost some files in my media folder, and I want to delete the image field/FileField objects whose files have been removed, across all models of my application.
I've tried django-cleanup, but it appears to be doing the inverse operation, i.e. deleting files on the server whose objects have been removed from the database.
You can write a management command for this, here is a way how to handle this
Class cleanup(BaseCommand):
def handle(self,options):
for obj in Files.objects.all():
if os.path.exists(settings.MEDIA_DIR+ obj.filename): continue
obj.delete()
Note that a FileField will be falsy if the file is not there, so you can use this simple solution:
for instance in ModelWithFileField.objects.all():
if bool(instance.file_field):
continue
instance.delete()
You can even do it in a django shell, so you do not have to write a script for it.

Selectively deleting images uploaded to Azure blob (Django/Python project)

In a Django (Python) project, I'm using Azure blobs to store photos uploaded by users. The code simply goes something like this:
from azure.storage.blob import BlobService
blob_service = BlobService(account_name=accountName, account_key=accountKey)
blob_service.put_blob(
'pictures',
name, # including folder
content_str, # image as stream
x_ms_blob_type='BlockBlob',
x_ms_blob_content_type=content_type,
x_ms_blob_cache_control ='public, max-age=3600, s-maxage=86400'
)
My question is: what's the equivalent method to delete an uploaded photo in my particular scenario? I'm writing a task to periodically clean up my data models, and so want to get rid of images associated to them as well.
You should be able to use:
blob_service.delete_blob(container_name, blob_name)
You can also delete an entire container:
blob_service.delete_container(container_name)
There are a few extra parameters which will be helpful to you if you're trying to delete snapshots, deal with leases, etc.
Note that put_blob() is defined in blockblobservice.py, while delete_blob() is defined in baseblobservice.py (deletes are going to be the same, whether page, block, or append blob).

How to create separate python script for uploading data into ndb

Can anyone guide me towards the right direction as to where I should place a script solely for loading data into ndb. As I wish to upload all data into the gae ndb so that the application could perform query on it.
Right now, the loading of data is in my application. I wish to placed it separately from the main application.
Should it be edited in the yaml file?
EDITED
This is a snippet of the entity and the handler to upload the data into GAE ndb.
I wish to placed this chunk of code separately from my main application .py. Reason being the uploading of this data won't be done frequently and to keep the codes in the main application "cleaner".
class TagTrend_refine(ndb.Model):
tag = ndb.StringProperty()
trendData = ndb.BlobProperty(compressed=True)
class MigrateData(webapp2.RequestHandler):
def get(self):
listOfEntities = []
f = open("tagTrend_refine.txt")
lines = f.readlines()
f.close()
for line in lines:
temp = line.strip().split("\t")
data = TagTrend_refine(
tag = temp[0],
trendData = temp[1]
)
listOfEntities.append(data)
ndb.put_multi(listOfEntities)
For example if I placed the above code in a file called dataLoader.py, where should I call this script to invoke?
In app.yaml alongside my main application(knowledgeGraph.application)?
- url: /.*
script: knowledgeGraph.application
You don't show us the application object (no doubt a WSGI app) in your knowledge.py module, so I can't know what URL you want to serve with the MigrateData handler -- I'll just guess it's something like /migratedata.
So the class TagTrend_refine should be in a separate file (usually called models.py) so that both your dataloader.py, and your knowledge.py, can import models to access it (and models.py will need its own import of ndb of course). (Then of course access to the entity class will be as models.TagTrend_refine -- very basic Python).
Next, you'll complete dataloader.py by defining a WSGI app, e.g, at end of file,
app = webapp2.WSGIApplication(routes=[('/migratedata', MigrateData)])
(of course this means this module will need to import webapp2 as well -- can I take for granted a knowledge of super-elementary Python?).
In app.yaml, as the first URL, before that /.*, you'll have:
url: /migratedata
script: dataloader.app
Given all this, when you visit '/migratedata', your handler will read the "tagTrend_refine.txt" file that you uploaded together with your .py, .yaml, and so on, files in your overall GAE application, and unconditionally create one entity per line of that file (assuming you fix the multiple indentation problems in your code as displayed above, but, again, this is just super-elementary Python -- presumably you've used both tabs and spaces and they show up OK in your editor, but not here on SO... I recommend you use strictly, only spaces, never tabs, in Python code).
However this does seem to be a peculiar task. If /migratedata gets visited twice, it will create duplicates of all entities. If you change the tagTrend_refine.txt and deploy a changed variation, then visit /migratedata... all old entities will stick around and all the new entities will join them. And so forth.
Moreover -- /migratedata is NOT idempotent (if visited more than once it does not produce the same state as running it just once) so it shouldn't be a GET (and now we're on to super-elementary HTTP for a change!-) -- it should be a POST.
In fact I suspect (but I'm really flying blind here, since you see fit to give such tiny amounts of information) that you in fact want to upload a .txt file to a POST handler and do the updates that way (perhaps avoiding duplicates...?). However, I'm no mind reader, so this is about as far as I can go.
I believe I have fully answered the question you posted (though perhaps not the one you meant but didn't express:-) and by SO's etiquette it would be nice to upvote and accept this answer, then, if needed, post another question, expressing MUCH more clearly and completely what you're trying to achieve, your current .py and .yaml (ideally with correct indentation), what they actually do and why you'd like to do something different. For POST vs GET in particular, just study When should I use GET or POST method? What's the difference between them? ...
Alex's solution will work, as long as all you data can be loaded in under 1 minute, as that's the timeout for an app engine request.
For larger data, consider calling the datastore API directly from your own computer where you have the source. It's a bit of a hassle because it's a different API; it's not ndb. But it's still a pretty simple API. Here's some code that calls the API:
https://github.com/GoogleCloudPlatform/getting-started-python/blob/master/2-structured-data/bookshelf/model_datastore.py
Again, this code can run anywhere. It doesn't need to be uploaded to app engine to run.

Use validation to prevent duplicate file _name_ being uploaded

How can I detect that the name of a file that a user has provided for upload (via a django.forms.ModelForm using a FileField field) is a duplicate of one that exists, and thus decide to fail validation on the form?
I'm finding this particularly challenging, because from within the form, I don't see how I can find out what the value of upload_to is for this FileField, so I can't go looking myself in the file system to see if that file is there already.
As i see it you have 2 options:
Set a value in your settings.py to hold your 'upload_to' and then use it to check when you are validating.
Something like this to verify would work (you need to change your upload_to ofc):
from django.conf import settings
if settings.UPLOAD_TO:
# Do something
Issue with that is that you can't have subfolders or anything complex there.
A second option would be, as mentioned in your comments, to add a new column to your model that holds a hash for your file. This approach should work better. As someone mentioned in your comments, to avoid uploading a big file, checking, failing, uploading another big file, etc, you can try to hash it in the client and verify it via ajax first (you will verify it again in the server, but this can make things go faster for your users).
Older question, but Django 1.11 now supports the unique option on FileField. Set unique=True on your field declaration on your model.
It shouldn't matter what you are setting upload_to to. The file name will still be stored in the database.
Changed in Django 1.11:
In older versions, unique=True can’t be used on FileField.
https://docs.djangoproject.com/en/1.11/ref/models/fields/#unique

Django 1.3 removing images

Django 1.3 do not removes file which was deleted from database. I'm not found how to set Django remove deleted files. Is it possible? If so, how?
Simple:
Override the delete and save and method in your model. Remember that a file can both be dereferenced by the deletion of the object, but also by uploading a new file. But beware that the delete method is not called when you delete in bulk, ie. QuerySet.delete().
https://docs.djangoproject.com/en/dev/topics/db/models/#overriding-model-methods
You can also use signals:
https://code.djangoproject.com/wiki/Signals
But beware! The reason why Django does not automatically delete the file, is that it cannot guarantee that the file is not refered to by some other application or model. But if you can guarantee that as a programmer, go ahead.
This blog article gives you the best possible info, I think:
http://haineault.com/blog/147/