Django and Dynamic Example Data - django

I'm trying to find a way to easily generate an example/demonstration data set from initial_data.json in Django.
Essentially, the fixtures and initial_data.json do exactly what I need, except that the dates are static....
My app uses dates to display/sort otherwise easily generated information (comments, scores etc) and I'd like to create a thorough data set in order to be able to demonstrate the app's functions to prospective clients; the problem arises with the dates. Even if I run syncdb (which automatically includes my initial_data.json), the dates are static, so all the information will relate to those specific dates, rather than to today. As time passes, that data will become less visible in the app and will therefore not fully demonstrate it's abilities to potential clients.
Is there an easy way to update date information in initial_data.json so that dates remain relevant to the current real date and I can then run syncdb again with those new dates? (Assume that this is all on a local machine merely as a demonstration to clients... Not on a server, production or otherwise).
I hope this makes sense?!

you might be better off writing a function (maybe a management command) to generate some dummy data and save to your (temporary?) database

OK, my solution was to use django-mockups: https://github.com/sorl/django-mockups
It adds random data to your tables (all of them or only those specified by the user) by obeying the field types (text, email, url etc) and the max_length specified in those fields. Inserts Lorem Ipsum and inserts correctly formatted email address etc etc
Very easy to use, can be set to run through a cron job, or can be run manually as and when required. Perfect.

Related

How to change Sitecore Template field Item Id without data loss?

I recently noticed there is a difference in Item Id for a Sitecore template field between 2 environments (Source and Target). Due to this, any data changes to the field value for the dataitem using the template is not reflecting to target Sitecore database.
Hence, we manually copy the value from source to target and which takes lot of time to sync the 2 environments. Any idea how to change the template field Item Id in Sitecore without data loss in target instance?
Thanks
The template fields have most likely been created manually on the different servers, as #AdrianIorgu has suggested. I am going to suggest that you don't worry about merging fields and tools.
What you really care about is the content on the PRODUCTION instance of your site (assuming that this is Target). In any other environment, content should be regarded throwaway.
With that in mind, create a package of the template from your PRODUCTION instance and the install that in the other environments, deleting the duplicate field from the Source instance. The GUIDs of the field should now match across all environments. Check this into your source control (using TDS or Unicorn or whatever). You can then correctly update any standard values and that will be reflect through the server when you deploy again.
If your other environments (dev/qa/pre-prod) result in data loss for that field then don't worry about it, restore a backup from PROD.
Most likely that happened because the field or the template was added manually on the second environment, without migrating the items using packages, serialization or a third-party tool like TDS or Unicorn.
As #SitecoreClimber mentioned above, you can use Razl to sync the two environments and see the differences, but I don't think you will be able to change the field's GUID, to have the two environments consistent, without any data loss. Depending on the volume of your data, fixing this can be tricky.
What I would do:
make sure the target instance has the right template by installing a package with the correct template from source (with a MERGE-MERGE operation), which will end up having a duplicate field name
write a SQL query to get a list of all the items that have value for that field and update the value to the new field
Warning: this SQL query below is just a sample to get you started, make sure you extend and test this properly before running on a CD instance
use YOUR_DATABASE
begin tran
Declare #oldFieldId nvarchar(100), #newFieldId nvarchar(100), #previousValue nvarchar(100), #newValue nvarchar(100)
set #oldFieldID = '75577384-3C97-45DA-A847-81B00500E250' //old field ID
set #newFieldID = 'A2F96461-DE33-4CC6-B758-D5183676509B' //new field ID
/* versionedFields */
Select itemId, fieldid, value
from [dbo].[versionedFields] f with (nolock)
where f.FieldId like #oldFieldID
For this kind of stuff I sugest you to use Sitecore Razl.
It's a tool for comparing and merging sitecore databases.
Razl allows developers to have a complete side by side comparison between two Sitecore databases; highlighting features that are missing or not up to date. Razl also gives developers the ability to simply move the item from one database to another.
Whether it's finding that one missing template, moving your entire database or just one item, Razl allows you to do it seamlessly and worry free.
It's not a free tool, you can check here how you can buy it:
https://www.razl.net/purchase.aspx

Mediawiki mass user delete/merge/block

I have 500 or so spambots and about 5 actual registered users on my wiki. I have used nuke to delete their pages but they just keep reposting. I have spambot registration under control using reCaptcha. Now, I just need a way to delete/block/merge about 500 users at once.
You could just delete the accounts from the user table manually, or at least disable their authentication info with a query such as:
UPDATE /*_*/user SET
user_password = '',
user_newpassword = '',
user_email = '',
user_token = ''
WHERE
/* condition to select the users you want to nuke */
(Replace /*_*/ with your $wgDBprefix, if any. Oh, and do make a backup first.)
Wiping out the user_password and user_newpassword fields prevents the user from logging in. Also wiping out user_email prevents them from requesting a new password via email, and wiping out user_token drops any active sessions they may have.
Update: Since I first posted this, I've had further experience of cleaning up large numbers of spam users and content from a MediaWiki installation. I've documented the method I used (which basically involves first deleting the users from the database, then wiping out up all the now-orphaned revisions, and finally running rebuildall.php to fix the link tables) in this answer on Webmasters Stack Exchange.
Alternatively, you might also find Extension:RegexBlock useful:
"RegexBlock is an extension that adds special page with the interface for blocking, viewing and unblocking user names and IP addresses using regular expressions."
There are risks involved in applying the solution in the accepted answer. The approach may damage your database! It incompletely removes users, doing nothing to preserve referential integrity, and will almost certainly cause display errors.
Here a much better solution is presented (a prerequisite is that you have installed the User merge extension):
I have a little awkward way to accomplish the bulk merge through a
work-around. Hope someone would find it useful! (Must have a little
string concatenation skills in spreadsheets; or one may use a python
or similar script; or use a text editor with bulk replacement
features)
Prepare a list of all SPAMuserIDs, store them in a spreadsheet or textfile. The list may be
prepared from the user creation logs. If you do have the
dB access, the Wiki_user table can be imported into a local list.
The post method used for submitting the Merge & Delete User form (by clicking the button) should be converted to a get method. This
will get us a long URL. See the second comment (by Matthew Simoneau)
dated 13/Jan/2009) at
http://www.mathworks.com/matlabcentral/newsreader/view_thread/242300
for the method.
The resulting URL string should be something like below:
http: //(Your Wiki domain)/Special:UserMerge?olduser=(OldUserNameHere)&newuser=(NewUserNameHere)&deleteuser=1&token=0d30d8b4033a9a523b9574ccf73abad8%2B\
Now, divide this URL into four sections:
A: http: //(Your Wiki domain)/Special:UserMerge?olduser=
B: (OldUserNameHere)
C: &newuser=(NewUserNameHere)&deleteuser=1
D: &token=0d30d8b4033a9a523b9574ccf73abad8%2B\
Now using a text editor or spreadsheet, prefix each spam userIDs with part A and Suffix each with Part C and D. Part C will include the
NewUser(which is a specially created single dummy userID). The Part D,
the Token string is a session-dependent token that will be changed per
user per session. So you will need to get a new token every time a new
session/batch of work is required.
With the above step, you should get a long list of URLs, each good to do a Merge&Delete operation for one user. We can now create a
simple HTML file, view it and use a batch downloader like DownThemAll
in Firefox.
Add two more pieces " Linktext" to each line at
beginning and end. Also add at top and at
bottom and save the file as (for eg:) userlist.html
Open the file in Firefox, use DownThemAll add-on and download all the files! Effectively, you are visiting the Merge&Delete page for
each user and clicking the button!
Although this might look a lengthy and tricky job at first, once you
follow this method, you can remove tens of thousands of users without
much manual efforts.
You can verify if the operation is going well by opening some of the
downloaded html files (or by looking through the recent changes in
another window).
One advantage is that it does not directly edit the
MySQL pages. Nor does it require direct database access.
I did a bit of rewriting to the quoted text, since the original text contains some flaws.

Django: Possible to load fixtures with date fields based on the current date?

I want to load fixtures into django. The data has some date fields - is it possible to create these data so they will always be e.g. yesterday or tomorrow? I want to make sure certain data is always fresh, but also so I can easily test edge cases (e.g. whether an object is enabled if the publishing date is today, etc).
Fixtures just load text data files (in JSON/XML/YAML) so, there's no real way to insert dynamically generated data by just loading a fixture. On the other hand, you can get around this using other methods.
One option is the package django-fixture-generator where you can write python/django code to create data and it will be inserted before your tests are called.
Another option is a previous SO question: How to load sql fixture in Django for User model?. This has some code on using SQL files for fixtures, where you can use a SQL expression for your date requirements (e.g. GETDATE()+1 or similar in your SQL dialect).

What is a sane way to perform a radical Django Model migration in a production environment?

I have an existing django web app that is in use. I have to radically migrate one key model in my design to a completely new design, but I want to cache all of the existing data for that model and migrate them to the new records in production when ready to deploy.
I can afford to bring my website down for a few hours one night and do whatever I need to do to migrate. What are some sane ways I can do this migration?
It seems any migration would need to:
1) Dump all of the existing data into some format, such as SQL, JSON, XML
2) Migrate the model to the new format
3) Reload the data into the new model using a conversion script
I also thought of trying to store all of the existing data in some other model called "OldModel" (if Model is the name of the existing model) and then migrating the data live.
There is a project to help with migrations that I've heard of: South.
Having said that, I admit we've not used it. We still plan our migrations using a file of SQL statements. Madness, I know, but it has the advantage of testability. You can run it as many times as necessary during development and staging testing before the "big deploy". It can be source controlled, diffed, etc. It can also, therefore, be called from a larger deployment script. Of course, we back up production before running it :-)
If your database does journaling, using the old-fashioned method has the added advantage that there is a transaction history that can be rolled back.
Experiments we've run with JSON, XML and "OldModel" -> "NewModel" style dumps have scaled pretty poorly. Mind you, YMMV... we have quite a large database. By using a script, you can run on your production database without having to offload or reload vast amounts of data. This way even a complicated migration can take seconds, rather than hours.
There are around 5 or 6 tools to help automate some portion of migrations. Several of them are listed in this question and I'll add the others just for completeness.
Next, see S. Lott's answer to this question about migration workflows for a great idea on using version numbers in the model name to make migrations easier, including structuring a standalone script to properly convert the tables. To my mind this is vastly superior to serializing the data for export and then trying to build your new tables by importing.
Finally, I haven't been able to think of a way to do a hot migration properly and haven't seen any hints from anywhere else either, so maintenance downtime is inevitable.
Make all migrations in steps!
If you need to add a field, go ahead and add it, with a default value or being optional. This is safe.
If you need to make an existing optional field required, give it a default first.
If you need to make an existing field with a default not have a default, drop the default after fixing all the code that creates instances.
If you need to change the type of a field, add a new field that inherits the value from the current one, first. Then, run a script to update the existing instances to populate the new field. Thirdly, Remove all the code that uses the old field to use the new one. Finally, which no code is left using the original, you can drop it.
For every situation there is a small step you can make. For every bigger change, you can break it down into little ones. This is one place iterative development pays off. Keep good backups in place and don't be afraid to push often! Make the small changes quickly to see if they work.
If you are more comfortable with the Django ORM than with raw SQL, you might consider using Model -> BackupModel -> TestModel -> Model, where all but the last step can be performed without dropping data.
def backup(InModel,OutModel):
in_objs = InModel.objects.all()
for obj in in_objs:
out_obj = OutModel.convert_from(InModel,obj)
out_obj.save()
Here, you would just make sure that all your models have convert_from methods implemented. These should all be trivial conversions except for BackupModel -> TestModel. In the other cases, nothing but the class would change, all data being identically preserved.
The advantage to this is that before you go rewriting all your interfaces, you can play around with TestModel and make sure that your conversions were what you thought they'd be. If everything goes wrong, you convert from BackupModel->Model, and everything is okay. In a worst-case scenario, you give up on Django's ORM, run back to SQL, and simply rename all your tables that begin with backupmodel__* to model__* in your database.
Disclaimer: I've never done this.

Django: How can I protect against concurrent modification of database entries

If there a way to protect against concurrent modifications of the same data base entry by two or more users?
It would be acceptable to show an error message to the user performing the second commit/save operation, but data should not be silently overwritten.
I think locking the entry is not an option, as a user might use the "Back" button or simply close his browser, leaving the lock for ever.
This is how I do optimistic locking in Django:
updated = Entry.objects.filter(Q(id=e.id) && Q(version=e.version))\
.update(updated_field=new_value, version=e.version+1)
if not updated:
raise ConcurrentModificationException()
The code listed above can be implemented as a method in Custom Manager.
I am making the following assumptions:
filter().update() will result in a single database query because filter is lazy
a database query is atomic
These assumptions are enough to ensure that no one else has updated the entry before. If multiple rows are updated this way you should use transactions.
WARNING Django Doc:
Be aware that the update() method is
converted directly to an SQL
statement. It is a bulk operation for
direct updates. It doesn't run any
save() methods on your models, or emit
the pre_save or post_save signals
This question is a bit old and my answer a bit late, but after what I understand this has been fixed in Django 1.4 using:
select_for_update(nowait=True)
see the docs
Returns a queryset that will lock rows until the end of the transaction, generating a SELECT ... FOR UPDATE SQL statement on supported databases.
Usually, if another transaction has already acquired a lock on one of the selected rows, the query will block until the lock is released. If this is not the behavior you want, call select_for_update(nowait=True). This will make the call non-blocking. If a conflicting lock is already acquired by another transaction, DatabaseError will be raised when the queryset is evaluated.
Of course this will only work if the back-end support the "select for update" feature, which for example sqlite doesn't. Unfortunately: nowait=True is not supported by MySql, there you have to use: nowait=False, which will only block until the lock is released.
Actually, transactions don't help you much here ... unless you want to have transactions running over multiple HTTP requests (which you most probably don't want).
What we usually use in those cases is "Optimistic Locking". The Django ORM doesn't support that as far as I know. But there has been some discussion about adding this feature.
So you are on your own. Basically, what you should do is add a "version" field to your model and pass it to the user as a hidden field. The normal cycle for an update is :
read the data and show it to the user
user modify data
user post the data
the app saves it back in the database.
To implement optimistic locking, when you save the data, you check if the version that you got back from the user is the same as the one in the database, and then update the database and increment the version. If they are not, it means that there has been a change since the data was loaded.
You can do that with a single SQL call with something like :
UPDATE ... WHERE version = 'version_from_user';
This call will update the database only if the version is still the same.
Django 1.11 has three convenient options to handle this situation depending on your business logic requirements:
Something.objects.select_for_update() will block until the model become free
Something.objects.select_for_update(nowait=True) and catch DatabaseError if the model is currently locked for update
Something.objects.select_for_update(skip_locked=True) will not return the objects that are currently locked
In my application, which has both interactive and batch workflows on various models, I found these three options to solve most of my concurrent processing scenarios.
The "waiting" select_for_update is very convenient in sequential batch processes - I want them all to execute, but let them take their time. The nowait is used when an user wants to modify an object that is currently locked for update - I will just tell them it's being modified at this moment.
The skip_locked is useful for another type of update, when users can trigger a rescan of an object - and I don't care who triggers it, as long as it's triggered, so skip_locked allows me to silently skip the duplicated triggers.
For future reference, check out https://github.com/RobCombs/django-locking. It does locking in a way that doesn't leave everlasting locks, by a mixture of javascript unlocking when the user leaves the page, and lock timeouts (e.g. in case the user's browser crashes). The documentation is pretty complete.
You should probably use the django transaction middleware at least, even regardless of this problem.
As to your actual problem of having multiple users editing the same data... yes, use locking. OR:
Check what version a user is updating against (do this securely, so users can't simply hack the system to say they were updating the latest copy!), and only update if that version is current. Otherwise, send the user back a new page with the original version they were editing, their submitted version, and the new version(s) written by others. Ask them to merge the changes into one, completely up-to-date version. You might try to auto-merge these using a toolset like diff+patch, but you'll need to have the manual merge method working for failure cases anyway, so start with that. Also, you'll need to preserve version history, and allow admins to revert changes, in case someone unintentionally or intentionally messes up the merge. But you should probably have that anyway.
There's very likely a django app/library that does most of this for you.
Another thing to look for is the word "atomic". An atomic operation means that your database change will either happen successfully, or fail obviously. A quick search shows this question asking about atomic operations in Django.
The idea above
updated = Entry.objects.filter(Q(id=e.id) && Q(version=e.version))\
.update(updated_field=new_value, version=e.version+1)
if not updated:
raise ConcurrentModificationException()
looks great and should work fine even without serializable transactions.
The problem is how to augment the deafult .save() behavior as to not have to do manual plumbing to call the .update() method.
I looked at the Custom Manager idea.
My plan is to override the Manager _update method that is called by Model.save_base() to perform the update.
This is the current code in Django 1.3
def _update(self, values, **kwargs):
return self.get_query_set()._update(values, **kwargs)
What needs to be done IMHO is something like:
def _update(self, values, **kwargs):
#TODO Get version field value
v = self.get_version_field_value(values[0])
return self.get_query_set().filter(Q(version=v))._update(values, **kwargs)
Similar thing needs to happen on delete. However delete is a bit more difficult as Django is implementing quite some voodoo in this area through django.db.models.deletion.Collector.
It is weird that modren tool like Django lacks guidance for Optimictic Concurency Control.
I will update this post when I solve the riddle. Hopefully solution will be in a nice pythonic way that does not involve tons of coding, weird views, skipping essential pieces of Django etc.
To be safe the database needs to support transactions.
If the fields is "free-form" e.g. text etc. and you need to allow several users to be able to edit the same fields (you can't have single user ownership to the data), you could store the original data in a variable.
When the user committs, check if the input data has changed from the original data (if not, you don't need to bother the DB by rewriting old data),
if the original data compared to the current data in the db is the same you can save, if it has changed you can show the user the difference and ask the user what to do.
If the fields is numbers e.g. account balance, number of items in a store etc., you can handle it more automatically if you calculate the difference between the original value (stored when the user started filling out the form) and the new value you can start a transaction read the current value and add the difference, then end transaction. If you can't have negative values, you should abort the transaction if the result is negative, and tell the user.
I don't know django, so I can't give you teh cod3s.. ;)
From here:
How to prevent overwriting an object someone else has modified
I'm assuming that the timestamp will be held as a hidden field in the form you're trying to save the details of.
def save(self):
if(self.id):
foo = Foo.objects.get(pk=self.id)
if(foo.timestamp > self.timestamp):
raise Exception, "trying to save outdated Foo"
super(Foo, self).save()