django update_or_create(), see what got updated - django

My django app uses update_or_create() to update a bunch of records. In some cases, updates are really few within a ton of records, and it would be nice to know what got updated within those records. Is it possible to know what got updated (i.e fields whose values got changed)? If not, does any one has ideas of workarounds to achieve that?
This will be invoked from the shell, so ideally it would be nice to be prompted for confirmation just before a value is being changed within update_or_create(), but if not that, knowing what got changed will also help.
Update (more context): Thought I'd give more context here. The data in this Django app gets updated through various means (through users coming on the web site, through the admin page, through scripts (run from the shell) that populate data from a csv etc.). The above question is important mostly for the shell scripts that update data from csvs, hence a solution at the database/trigger/signal level may not be helpful here (I guess).

This is what I ended up doing:
for row in reader:
school_obj0, created = Org.objects.get_or_create(school_id = row[0])
if (school_obj0.name != row[1]):
print (school_obj0.name, '==>', row[1])
confirmation = input('proceed? [y/n]: ')
if (confirmation == 'y'):
school_obj1, created = Org.objects.update_or_create(
school_id = row[0], defaults={"name": row[1],})
Happy to know about improvements to this approach (please see the update in the question with more context)

This will be invoked from the shell, so ideally it would be nice to be
prompted for confirmation just before a value is being changed
Unfortunately, databases don't work like that. It's the responsibility of applications to provide this functionality. And django isn't an application. You can however use django to write an application that provides this functionality.
As for finding out whether an object was updated or created, that's what the return value gives you. A tuple where the second value is a flag for update or create

Related

What is the proper way to create draft page programatically in the Wagtail?

Just to save page I do:
parent_page = Page.objects.get(title="parent")
page = Page(title="child", owner=request.user)
parent_page.add_child(instance=page)
page.save() # not sure if this required
What if I need to save Draft Page? This code seems to be working but I am not sure if it's correct:
parent_page = Page.objects.get(title="parent")
page = Page(title="child", live=False, owner=request.user)
parent_page.add_child(instance=page)
page.save_revision()
page.save()
Do I need call page.save() after page.save_revision()?
With this code everything works ok in my case but I have noticed that PageRevision.user_id is empty. How it can affect my code in future?
From wagtail code I see that I can save revision with user parameter:
page.save_revision(user=request.user, log_action=True)
I have saved revisions without user and don't see any side effect. How this field is critical? Do I need to fill user_id field for all existing revisions?
P.S. looks like it would be better to wrap this code in transaction because when it fails due to validation error manage.py fixtree show me error Incorrect numchild value.
You don't need page.save() in either case - parent_page.add_child handles the saving. (This might be what's causing the manage.py fixtree issue - django-treebeard doesn't update in-memory instances after database operations, so fields like numchild may be getting saved with incorrect values.)
The user field on revisions is optional, because changes to pages are not always performed by users - they might be done programmatically, like you are doing. It's just there for logging purposes (e.g. to show in the revision history of a page in the Wagtail admin).

How can I disable a Django cache?

I have written a small django app and in it I can upload some documents and at another point I can make a reference from a payment to the corresponding document. But my problem is that the list of documents is always outdated. But when I kill my gunicorn and start the webserver again, then the dropdown list with documents loads correctly and also shows the newest file uploads.
Do you have an idea why Django behaves like that?
Best regards
Edit: I just saw that I have to edit this question.
So I have this in my forms.py:
dms_objects = dmsdok.objects.all().order_by('-id')
choices_dms = []
for dms_obj in dms_objects:
choices_dms.append((dms_obj.id, dms_obj.dms_dok_titel))
And also I have this in the form:
zahlung_dok_pseudo_fk = forms.IntegerField(required=False, widget=forms.Select(attrs={'class':'form-control',}, choices= choices_dms), label='Zugehörigkeit Dokument')
I guess that this is what Willem means, but I do not know how I must change that.

Django - How to pass dynamic models between pages

I have made a django app that creates models and database tables on the fly. This is, as far as I can tell, the only viable way of doing what I need. The problem arises of how to pass a dynamically created model between pages.
I can think of a few ways of doing such but they all sound horrible. The methods I can think of are:
Use global variables within views.py. This seems like a horrible hack and likely to cause conflicts if there are multiple simultaneous users.
Pass a reference in the URL and use some eval hackery to try and refind the model. This is probably stupid as the model could potentially be garbage collected en route.
Use a place-holder app. This seems like a bad idea due to conflicts between multiple users.
Having an invisible form that posts the model when a link is clicked. Again very hacky.
Is there a good way of doing this, and if not, is one of these methods more viable than the others?
P.S. In case it helps my app receives data (as a json string) from a pre-existing database, and then caches it locally (i.e. on the webserver) creating an appropriate model and table on the fly. The idea is then to present this data and do various filtering and drill downs on it with-out placing undue strain on the main database (as each query returns a few hundred results out of a database of hundreds of millions of data points.) W.R.T. 3, the tables are named based on a hash of the query and time stamp, however a place-holder app would have a predetermined name.
Thanks,
jhoyla
EDITED TO ADD: Thanks guys, I have now solved this problem. I ended up using both answers together to give a complete answer. As I can only accept one I am going to accept the contenttypes one, sadly I don't have the reputation to give upvotes yet, however if/when I ever do I will endeavor to return and upvote appropriately.
The solution in it's totality,
from django.contrib.contenttypes.models import ContentType
view_a(request):
model = create_model(...)
request.session['model'] = ContentType.objects.get_for_model(model)
...
view_b(request):
ctmodel = request.session.get('model', None)
if not ctmodel:
return Http404
model = ctmodel.model_class()
...
My first thought would be to use content types and to pass the type/model information via the url.
You could also use Django's sessions framework, e.g.
def view_a(request):
your_model = request.session.get('your_model', None)
if type(your_model) == YourModel
your_model.name = 'something_else'
request.session['your_model'] = your_model
...
def view_b(request):
your_model = request.session.get('your_model', None)
...
You can store almost anything in the session dictionary, and managing it is also easy:
del request.session['your_model']

Mediawiki mass user delete/merge/block

I have 500 or so spambots and about 5 actual registered users on my wiki. I have used nuke to delete their pages but they just keep reposting. I have spambot registration under control using reCaptcha. Now, I just need a way to delete/block/merge about 500 users at once.
You could just delete the accounts from the user table manually, or at least disable their authentication info with a query such as:
UPDATE /*_*/user SET
user_password = '',
user_newpassword = '',
user_email = '',
user_token = ''
WHERE
/* condition to select the users you want to nuke */
(Replace /*_*/ with your $wgDBprefix, if any. Oh, and do make a backup first.)
Wiping out the user_password and user_newpassword fields prevents the user from logging in. Also wiping out user_email prevents them from requesting a new password via email, and wiping out user_token drops any active sessions they may have.
Update: Since I first posted this, I've had further experience of cleaning up large numbers of spam users and content from a MediaWiki installation. I've documented the method I used (which basically involves first deleting the users from the database, then wiping out up all the now-orphaned revisions, and finally running rebuildall.php to fix the link tables) in this answer on Webmasters Stack Exchange.
Alternatively, you might also find Extension:RegexBlock useful:
"RegexBlock is an extension that adds special page with the interface for blocking, viewing and unblocking user names and IP addresses using regular expressions."
There are risks involved in applying the solution in the accepted answer. The approach may damage your database! It incompletely removes users, doing nothing to preserve referential integrity, and will almost certainly cause display errors.
Here a much better solution is presented (a prerequisite is that you have installed the User merge extension):
I have a little awkward way to accomplish the bulk merge through a
work-around. Hope someone would find it useful! (Must have a little
string concatenation skills in spreadsheets; or one may use a python
or similar script; or use a text editor with bulk replacement
features)
Prepare a list of all SPAMuserIDs, store them in a spreadsheet or textfile. The list may be
prepared from the user creation logs. If you do have the
dB access, the Wiki_user table can be imported into a local list.
The post method used for submitting the Merge & Delete User form (by clicking the button) should be converted to a get method. This
will get us a long URL. See the second comment (by Matthew Simoneau)
dated 13/Jan/2009) at
http://www.mathworks.com/matlabcentral/newsreader/view_thread/242300
for the method.
The resulting URL string should be something like below:
http: //(Your Wiki domain)/Special:UserMerge?olduser=(OldUserNameHere)&newuser=(NewUserNameHere)&deleteuser=1&token=0d30d8b4033a9a523b9574ccf73abad8%2B\
Now, divide this URL into four sections:
A: http: //(Your Wiki domain)/Special:UserMerge?olduser=
B: (OldUserNameHere)
C: &newuser=(NewUserNameHere)&deleteuser=1
D: &token=0d30d8b4033a9a523b9574ccf73abad8%2B\
Now using a text editor or spreadsheet, prefix each spam userIDs with part A and Suffix each with Part C and D. Part C will include the
NewUser(which is a specially created single dummy userID). The Part D,
the Token string is a session-dependent token that will be changed per
user per session. So you will need to get a new token every time a new
session/batch of work is required.
With the above step, you should get a long list of URLs, each good to do a Merge&Delete operation for one user. We can now create a
simple HTML file, view it and use a batch downloader like DownThemAll
in Firefox.
Add two more pieces " Linktext" to each line at
beginning and end. Also add at top and at
bottom and save the file as (for eg:) userlist.html
Open the file in Firefox, use DownThemAll add-on and download all the files! Effectively, you are visiting the Merge&Delete page for
each user and clicking the button!
Although this might look a lengthy and tricky job at first, once you
follow this method, you can remove tens of thousands of users without
much manual efforts.
You can verify if the operation is going well by opening some of the
downloaded html files (or by looking through the recent changes in
another window).
One advantage is that it does not directly edit the
MySQL pages. Nor does it require direct database access.
I did a bit of rewriting to the quoted text, since the original text contains some flaws.

Django: How can I protect against concurrent modification of database entries

If there a way to protect against concurrent modifications of the same data base entry by two or more users?
It would be acceptable to show an error message to the user performing the second commit/save operation, but data should not be silently overwritten.
I think locking the entry is not an option, as a user might use the "Back" button or simply close his browser, leaving the lock for ever.
This is how I do optimistic locking in Django:
updated = Entry.objects.filter(Q(id=e.id) && Q(version=e.version))\
.update(updated_field=new_value, version=e.version+1)
if not updated:
raise ConcurrentModificationException()
The code listed above can be implemented as a method in Custom Manager.
I am making the following assumptions:
filter().update() will result in a single database query because filter is lazy
a database query is atomic
These assumptions are enough to ensure that no one else has updated the entry before. If multiple rows are updated this way you should use transactions.
WARNING Django Doc:
Be aware that the update() method is
converted directly to an SQL
statement. It is a bulk operation for
direct updates. It doesn't run any
save() methods on your models, or emit
the pre_save or post_save signals
This question is a bit old and my answer a bit late, but after what I understand this has been fixed in Django 1.4 using:
select_for_update(nowait=True)
see the docs
Returns a queryset that will lock rows until the end of the transaction, generating a SELECT ... FOR UPDATE SQL statement on supported databases.
Usually, if another transaction has already acquired a lock on one of the selected rows, the query will block until the lock is released. If this is not the behavior you want, call select_for_update(nowait=True). This will make the call non-blocking. If a conflicting lock is already acquired by another transaction, DatabaseError will be raised when the queryset is evaluated.
Of course this will only work if the back-end support the "select for update" feature, which for example sqlite doesn't. Unfortunately: nowait=True is not supported by MySql, there you have to use: nowait=False, which will only block until the lock is released.
Actually, transactions don't help you much here ... unless you want to have transactions running over multiple HTTP requests (which you most probably don't want).
What we usually use in those cases is "Optimistic Locking". The Django ORM doesn't support that as far as I know. But there has been some discussion about adding this feature.
So you are on your own. Basically, what you should do is add a "version" field to your model and pass it to the user as a hidden field. The normal cycle for an update is :
read the data and show it to the user
user modify data
user post the data
the app saves it back in the database.
To implement optimistic locking, when you save the data, you check if the version that you got back from the user is the same as the one in the database, and then update the database and increment the version. If they are not, it means that there has been a change since the data was loaded.
You can do that with a single SQL call with something like :
UPDATE ... WHERE version = 'version_from_user';
This call will update the database only if the version is still the same.
Django 1.11 has three convenient options to handle this situation depending on your business logic requirements:
Something.objects.select_for_update() will block until the model become free
Something.objects.select_for_update(nowait=True) and catch DatabaseError if the model is currently locked for update
Something.objects.select_for_update(skip_locked=True) will not return the objects that are currently locked
In my application, which has both interactive and batch workflows on various models, I found these three options to solve most of my concurrent processing scenarios.
The "waiting" select_for_update is very convenient in sequential batch processes - I want them all to execute, but let them take their time. The nowait is used when an user wants to modify an object that is currently locked for update - I will just tell them it's being modified at this moment.
The skip_locked is useful for another type of update, when users can trigger a rescan of an object - and I don't care who triggers it, as long as it's triggered, so skip_locked allows me to silently skip the duplicated triggers.
For future reference, check out https://github.com/RobCombs/django-locking. It does locking in a way that doesn't leave everlasting locks, by a mixture of javascript unlocking when the user leaves the page, and lock timeouts (e.g. in case the user's browser crashes). The documentation is pretty complete.
You should probably use the django transaction middleware at least, even regardless of this problem.
As to your actual problem of having multiple users editing the same data... yes, use locking. OR:
Check what version a user is updating against (do this securely, so users can't simply hack the system to say they were updating the latest copy!), and only update if that version is current. Otherwise, send the user back a new page with the original version they were editing, their submitted version, and the new version(s) written by others. Ask them to merge the changes into one, completely up-to-date version. You might try to auto-merge these using a toolset like diff+patch, but you'll need to have the manual merge method working for failure cases anyway, so start with that. Also, you'll need to preserve version history, and allow admins to revert changes, in case someone unintentionally or intentionally messes up the merge. But you should probably have that anyway.
There's very likely a django app/library that does most of this for you.
Another thing to look for is the word "atomic". An atomic operation means that your database change will either happen successfully, or fail obviously. A quick search shows this question asking about atomic operations in Django.
The idea above
updated = Entry.objects.filter(Q(id=e.id) && Q(version=e.version))\
.update(updated_field=new_value, version=e.version+1)
if not updated:
raise ConcurrentModificationException()
looks great and should work fine even without serializable transactions.
The problem is how to augment the deafult .save() behavior as to not have to do manual plumbing to call the .update() method.
I looked at the Custom Manager idea.
My plan is to override the Manager _update method that is called by Model.save_base() to perform the update.
This is the current code in Django 1.3
def _update(self, values, **kwargs):
return self.get_query_set()._update(values, **kwargs)
What needs to be done IMHO is something like:
def _update(self, values, **kwargs):
#TODO Get version field value
v = self.get_version_field_value(values[0])
return self.get_query_set().filter(Q(version=v))._update(values, **kwargs)
Similar thing needs to happen on delete. However delete is a bit more difficult as Django is implementing quite some voodoo in this area through django.db.models.deletion.Collector.
It is weird that modren tool like Django lacks guidance for Optimictic Concurency Control.
I will update this post when I solve the riddle. Hopefully solution will be in a nice pythonic way that does not involve tons of coding, weird views, skipping essential pieces of Django etc.
To be safe the database needs to support transactions.
If the fields is "free-form" e.g. text etc. and you need to allow several users to be able to edit the same fields (you can't have single user ownership to the data), you could store the original data in a variable.
When the user committs, check if the input data has changed from the original data (if not, you don't need to bother the DB by rewriting old data),
if the original data compared to the current data in the db is the same you can save, if it has changed you can show the user the difference and ask the user what to do.
If the fields is numbers e.g. account balance, number of items in a store etc., you can handle it more automatically if you calculate the difference between the original value (stored when the user started filling out the form) and the new value you can start a transaction read the current value and add the difference, then end transaction. If you can't have negative values, you should abort the transaction if the result is negative, and tell the user.
I don't know django, so I can't give you teh cod3s.. ;)
From here:
How to prevent overwriting an object someone else has modified
I'm assuming that the timestamp will be held as a hidden field in the form you're trying to save the details of.
def save(self):
if(self.id):
foo = Foo.objects.get(pk=self.id)
if(foo.timestamp > self.timestamp):
raise Exception, "trying to save outdated Foo"
super(Foo, self).save()