Selectively deleting images uploaded to Azure blob (Django/Python project)

In a Django (Python) project, I'm using Azure blobs to store photos uploaded by users. The code simply goes something like this:
from azure.storage.blob import BlobService

blob_service = BlobService(account_name=accountName, account_key=accountKey)
blob_service.put_blob(
    'pictures',
    name,         # including folder
    content_str,  # image as stream
    x_ms_blob_type='BlockBlob',
    x_ms_blob_content_type=content_type,
    x_ms_blob_cache_control='public, max-age=3600, s-maxage=86400'
)
My question is: what's the equivalent method to delete an uploaded photo in my particular scenario? I'm writing a task to periodically clean up my data models, and want to get rid of the images associated with them as well.

You should be able to use:
blob_service.delete_blob(container_name, blob_name)
You can also delete an entire container:
blob_service.delete_container(container_name)
There are a few extra parameters that will be helpful if you need to delete snapshots, deal with leases, etc.
Note that put_blob() is defined in blockblobservice.py, while delete_blob() is defined in baseblobservice.py (deletes work the same way whether the blob is a page, block, or append blob).
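For the scenario in the question, a minimal sketch of the cleanup call might look like this (the container and variable names mirror the upload snippet; the rest is illustrative):
from azure.storage.blob import BlobService

blob_service = BlobService(account_name=accountName, account_key=accountKey)

# Delete the photo that was uploaded to the 'pictures' container;
# 'name' is the same blob name (including folder) that was passed to put_blob.
blob_service.delete_blob('pictures', name)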

Related

Delete Django ImageField and FileField records whose files are not found on the server

I've recently lost some files in my media folder, and I want to delete the ImageField/FileField objects whose files have been removed, across all models of my application.
I've tried django-cleanup, but it appears to be doing the inverse operation, i.e. deleting files on the server whose objects have been removed from the database.
You can write a management command for this; here is one way to handle it:
class Command(BaseCommand):  # e.g. in yourapp/management/commands/cleanup.py
    def handle(self, *args, **options):
        for obj in Files.objects.all():
            # Keep the record if its file is still present under MEDIA_ROOT.
            if os.path.exists(os.path.join(settings.MEDIA_ROOT, obj.filename)):
                continue
            obj.delete()
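With that in place, running python manage.py cleanup (the command name comes from the file name, which is an assumption here) removes the stale records.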
Note that a FileField evaluates as falsy when it has no file associated with it, so you can use this simple solution:
for instance in ModelWithFileField.objects.all():
    if bool(instance.file_field):
        continue
    instance.delete()
You can even do it in a django shell, so you do not have to write a script for it.
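If the records still hold a file name but the file itself has gone missing from disk (the situation in the question), a variant of the same shell loop can ask the storage backend directly; a sketch, with placeholder model and field names:
for instance in ModelWithFileField.objects.all():
    f = instance.file_field
    # Keep the record only if a name is stored and the storage backend
    # still has the file; otherwise delete the record.
    if f and f.storage.exists(f.name):
        continue
    instance.delete()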

How to log images in the middle of `train`

It seems like you can only log data via return values from train. In many workflows, it might make more sense to directly save images in the middle of a train function (e.g. save images sampled by a generative model or from a vision-based MDP).
Is there a simple way to do this? One idea would be to try to find the log-directory and write to it directly, but would this have issues?
I'm guessing you're asking about logging images in Trainable._train().
If you're not in local mode, you can access the self.logdir attribute from within the trainable and write images there. These files should automatically be synced back to your head node (if you're running remotely).
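A rough sketch of that pattern, assuming the older class-based Trainable API with _train() (the image-producing helper and file names are hypothetical):
import os
from ray import tune

class MyTrainable(tune.Trainable):
    def _train(self):
        img = self._sample_image()  # hypothetical helper returning a PIL.Image
        # self.logdir points at this trial's log directory; files written here
        # are synced back to the head node when running remotely.
        img.save(os.path.join(self.logdir, "sample_%d.png" % self.iteration))
        return {"done": False}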

Django: Save ContentFile (or some kind of virtual file) to database

In my Django app I create a string which I have to save to my database as a file.
If I understand correctly, Django is supposed to automatically upload the file when I save the entry:
class Foo(models.Model):
    bar1 = models.ForeignKey(Table1)
    bar2 = models.ForeignKey(Table2)
    res = models.FileField(upload_to='MYPATH')
The problem is that to create an instance of Foo, I would first have to create a physical file on the server's disk, which means two copies get created (one by me in order to create the database entry, and one by Django when saving the entry).
As far as I can see, I would have to create the file myself in 'MYPATH' and, instead of using a FileField, save a reference in the database (which is essentially what Django is doing anyway?). However, I have doubts that this is the best method, because:
It doesn't strike me as Pythonic.
I won't have access to the same methods as with a real FileField. For instance, when accessing it I would get just a reference string instead of a FieldFile.
Basically, what I wanted to do was: String --> ContentFile (or some form of "virtual" file) --> Physical File handled by Django when saving entry in the database.
entry = Foo(bar1=var1, bar2=var2, res=ContentFile(XMLSTRING))
entry.save()
This doesn't work, but it shows what I want to achieve.
So, please show me one of the following three:
How to save a file to the database without physically creating it (using a ContentFile doesn't create a physical file after saving the entry which is what I want to do)
Make django not upload the given file but use the already uploaded version whilst maintaining all the methods provided by FileField
What I don't understand.
I apologize for [my English, my lack of understanding, the lack of clarity].
I'd be happy to specify anything you need to know.
EDIT: I looked at this thread, but there, the urlretrieve creates a temp file, which is something I don't really want to do. Maybe I should be doing that, but is there a better way?
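For what it's worth, Django's FieldFile.save() accepts a ContentFile directly and writes it to the field's storage, which matches the String --> ContentFile --> physical file flow described above; a minimal sketch using the names from the question (the target file name is illustrative):
from django.core.files.base import ContentFile

entry = Foo(bar1=var1, bar2=var2)
# Saving through the field writes the content under upload_to='MYPATH'
# and (with save=True) saves the model instance as well.
entry.res.save('result.xml', ContentFile(XMLSTRING), save=True)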

How do I import globals in Postman

My team has just started using Team Syncing in Postman which seems to be a great feature, but we want to be able to share the large set of global variables we use within our collections.
These are not synced to the cloud server and there doesn't seem to be a way to import them.
Has anyone got a good way to share these throughout the team without everyone manually entering each one?
To share globals in Postman:
Export as JSON, share JSON file
Import globals from JSON
The same steps in more detail:
To export as JSON:
a) Go to the gear icon in the upper right-hand corner and choose Manage Environments from the dropdown
b) Click Globals button
c) Choose Download as JSON
To import from JSON:
a) Choose Import from upper left of Postman window
b) Select your JSON file or drag it into the resulting window:
NOTE: Even though this window says it only imports collections, environments, data dumps, curl commands, and RAML/WADL/Swagger/Runscope, it will also work for globals.
c) Click Open in the system dialog box (after choosing the file). Your globals will be imported. You may receive an error along with the confirmation message, but the globals are still imported.
You can also take a backup of the collections that Postman saves locally.
To find the Postman storage location, search for
chrome-extension://fhbjgbiflinjbdggehcddcbncdddomop/
The location given under Paths is the one that contains all the data, e.g. /home/xyz/.config/chromium/Default/Storage/ext/fhbjgbiflinjbdggehcddcbncdddomop/def
Using this, collections can also be transferred from one machine to another.
In the current version (9.31.0), steps 1 and 2 depicted below are the way to export the environment fully to a JSON file, and steps 3 and 4 are for importing. Watch out for duplicates after importing: existing entries aren't replaced, new ones are created, so you may want to delete older or unused versions of the same environment.

How to create separate python script for uploading data into ndb

Can anyone point me in the right direction as to where I should place a script solely for loading data into ndb? I want to upload all the data into the GAE ndb so that the application can run queries on it.
Right now, the loading of data is done in my application. I want to place it separately from the main application.
Should it be edited in the yaml file?
EDITED
This is a snippet of the entity and the handler that uploads the data into the GAE ndb.
I want to place this chunk of code separately from my main application .py, because the data upload won't be done frequently and I'd like to keep the code in the main application cleaner.
class TagTrend_refine(ndb.Model):
    tag = ndb.StringProperty()
    trendData = ndb.BlobProperty(compressed=True)

class MigrateData(webapp2.RequestHandler):
    def get(self):
        listOfEntities = []
        f = open("tagTrend_refine.txt")
        lines = f.readlines()
        f.close()
        for line in lines:
            temp = line.strip().split("\t")
            data = TagTrend_refine(
                tag=temp[0],
                trendData=temp[1]
            )
            listOfEntities.append(data)
        ndb.put_multi(listOfEntities)
For example, if I placed the above code in a file called dataLoader.py, where should I invoke this script?
In app.yaml, alongside my main application (knowledgeGraph.application)?
- url: /.*
  script: knowledgeGraph.application
You don't show us the application object (no doubt a WSGI app) in your knowledge.py module, so I can't know what URL you want to serve with the MigrateData handler -- I'll just guess it's something like /migratedata.
So the class TagTrend_refine should be in a separate file (usually called models.py) so that both your dataloader.py, and your knowledge.py, can import models to access it (and models.py will need its own import of ndb of course). (Then of course access to the entity class will be as models.TagTrend_refine -- very basic Python).
Next, you'll complete dataloader.py by defining a WSGI app, e.g, at end of file,
app = webapp2.WSGIApplication(routes=[('/migratedata', MigrateData)])
(of course this means this module will need to import webapp2 as well -- can I take for granted a knowledge of super-elementary Python?).
In app.yaml, as the first URL, before that /.*, you'll have:
- url: /migratedata
  script: dataloader.app
Given all this, when you visit '/migratedata', your handler will read the "tagTrend_refine.txt" file that you uploaded together with your .py, .yaml, and so on, files in your overall GAE application, and unconditionally create one entity per line of that file (assuming you fix the multiple indentation problems in your code as displayed above, but, again, this is just super-elementary Python -- presumably you've used both tabs and spaces and they show up OK in your editor, but not here on SO... I recommend you use strictly, only spaces, never tabs, in Python code).
However this does seem to be a peculiar task. If /migratedata gets visited twice, it will create duplicates of all entities. If you change the tagTrend_refine.txt and deploy a changed variation, then visit /migratedata... all old entities will stick around and all the new entities will join them. And so forth.
Moreover -- /migratedata is NOT idempotent (if visited more than once it does not produce the same state as running it just once) so it shouldn't be a GET (and now we're on to super-elementary HTTP for a change!-) -- it should be a POST.
In fact I suspect (but I'm really flying blind here, since you see fit to give such tiny amounts of information) that you in fact want to upload a .txt file to a POST handler and do the updates that way (perhaps avoiding duplicates...?). However, I'm no mind reader, so this is about as far as I can go.
I believe I have fully answered the question you posted (though perhaps not the one you meant but didn't express:-) and by SO's etiquette it would be nice to upvote and accept this answer, then, if needed, post another question, expressing MUCH more clearly and completely what you're trying to achieve, your current .py and .yaml (ideally with correct indentation), what they actually do and why you'd like to do something different. For POST vs GET in particular, just study When should I use GET or POST method? What's the difference between them? ...
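Putting those pieces together, a hedged sketch of what dataloader.py might look like with the handler as a POST (the models import path and file layout are assumptions):
# dataloader.py -- illustrative only
import webapp2
from google.appengine.ext import ndb
from models import TagTrend_refine  # assumes the entity class was moved to models.py

class MigrateData(webapp2.RequestHandler):
    def post(self):
        entities = []
        with open("tagTrend_refine.txt") as f:
            for line in f:
                tag, trend = line.strip().split("\t")
                entities.append(TagTrend_refine(tag=tag, trendData=trend))
        ndb.put_multi(entities)

app = webapp2.WSGIApplication(routes=[('/migratedata', MigrateData)])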
Alex's solution will work, as long as all your data can be loaded in under 1 minute, since that's the timeout for an App Engine request.
For larger data, consider calling the datastore API directly from your own computer where you have the source. It's a bit of a hassle because it's a different API; it's not ndb. But it's still a pretty simple API. Here's some code that calls the API:
https://github.com/GoogleCloudPlatform/getting-started-python/blob/master/2-structured-data/bookshelf/model_datastore.py
Again, this code can run anywhere. It doesn't need to be uploaded to app engine to run.
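As a rough illustration of that approach, using the google-cloud-datastore client library (the kind and property names mirror the question; the project ID is a placeholder):
from google.cloud import datastore

client = datastore.Client(project='your-project-id')

entities = []
with open('tagTrend_refine.txt') as f:
    for line in f:
        tag, trend = line.strip().split('\t')
        entity = datastore.Entity(key=client.key('TagTrend_refine'))
        entity.update({'tag': tag, 'trendData': trend})
        entities.append(entity)

# put_multi writes in batches; very large files may need chunking.
client.put_multi(entities)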