Django processes concurrency

I am running a Django app with 2 processes (Apache + mod_wsgi).
When a certain view is called, the content of a folder is read and the process adds entries to my database based on what files are new/updated in the folder.
When 2 such views execute at the same time, both see the new file and both want to create a new entry. I cannot manage to have only one of them write the new entry.
I tried to use select_for_update, transaction.atomic(), and get_or_create, but without any success (maybe I used them wrongly?).
What is the proper way of locking to avoid writing an entry with the same content twice with get_or_create?

I ended up enforcing uniqueness at the database (model) level, and catching the resulting IntegrityError in the code.
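For illustration, here is a minimal sketch of that approach; the model and field names are made up:

```python
from django.db import IntegrityError, models, transaction

class FileEntry(models.Model):
    # unique=True makes the database reject a second row for the same path,
    # no matter how many processes race on the same file
    path = models.CharField(max_length=255, unique=True)
    size = models.BigIntegerField()

def register_file(path, size):
    try:
        # the atomic block keeps the failed INSERT from poisoning the
        # surrounding transaction when the constraint fires
        with transaction.atomic():
            return FileEntry.objects.create(path=path, size=size)
    except IntegrityError:
        # the other process won the race; use the row it created
        return FileEntry.objects.get(path=path)
```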

Related

What is the best method to initialize or store a lookup dictionary that will be used in Django views?

I'm reviving an old Django 1.2 app. Most of the steps have been taken.
I have views in my Django app that will reference a simple dictionary of only 1300-ish key-value pairs.
Basically the view will query the dictionary a few hundred to a few thousand times for user-supplied values. The dictionary data may change twice a year or so.
FWIW: Django served by Gunicorn, DB = Postgres, Apache as a proxy, no Redis available on the server yet.
I thought of a few options here:
A table in the database that will be queried, letting caching do its job (at the expense of a few hundred SQL queries).
Simply define the dictionary in the settings file (ugly, and how many times is it read? Every time you do a 'from django.conf import settings'?). This is how it was coded in the Django 1.2 predecessor of this app many years ago.
Read a tab-delimited file using Pandas in the Django settings and make that available. The advantage is that I can do some Pandas magic in the view. (How efficient is this? Will the file be read many times for different users, or just once during server startup?)
Prepopulate a Redis cache from a file as part of the startup process (complicates things on the server side, and we want it to be simple, but it's fast).
List the items in a tab-delimited file and read it in in the view (my least popular option since it seems rather slow).
What are your thoughts on this? Any other options?
Let me give a few options, from simple to more involved:
Hold it in memory
Basic flat file
SQLite file
Redis
DB
I wouldn't bring Redis in for 1300 key-value pairs that don't even get mutated all that much.
I would put a file alongside the code that gets slurped into memory at startup, or do a single SQL query at startup, grab the entire thing, and keep it in memory to use throughout the application.
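As a rough sketch of the slurp-it-into-memory option (the file name and format are assumptions; a tab-delimited file sitting next to the code):

```python
# lookups.py -- module-level code runs only on first import, so each
# Gunicorn worker reads the file once at startup, not once per request
import csv
from pathlib import Path

_DATA_FILE = Path(__file__).with_name("lookup.tsv")  # hypothetical file name

def _load():
    with _DATA_FILE.open(newline="") as fh:
        return {row[0]: row[1] for row in csv.reader(fh, delimiter="\t")}

LOOKUP = _load()  # ~1300 entries kept in memory for the life of the worker
```

A view then just does `from myapp.lookups import LOOKUP` and calls `LOOKUP.get(user_value)` as many times as it likes, with no per-request I/O.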

How can I add new models and do migrations without restarting the server manually?

For the app I'm building, I need to be able to create a new data model in models.py as quickly as possible, automatically.
I created a way to do this by making a separate Python program that opens models.py, edits it, closes it, and runs the migrations automatically, but there must be a better way.
Edit: my method works on my local server but not on PythonAnywhere.
In the Django documentation, I found SchemaEditor, which is exactly what you're looking for. Using the SchemaEditor, you can create models, delete models, add fields, delete fields, etc.
Here's an excerpt:
Django’s migration system is split into two parts; the logic for calculating and storing what operations should be run (django.db.migrations), and the database abstraction layer that turns things like “create a model” or “delete a field” into SQL - which is the job of the SchemaEditor.
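For a flavor of what that looks like, here is a minimal sketch using the schema editor directly; the model and app label are made up, and in a real dynamic setup the class would more likely be built with type() at runtime:

```python
from django.db import connection, models

class Report(models.Model):  # hypothetical model
    title = models.CharField(max_length=200)

    class Meta:
        app_label = "myapp"  # hypothetical app label

# create (or later drop) the backing table without writing a migration file
with connection.schema_editor() as schema_editor:
    schema_editor.create_model(Report)
    # schema_editor.delete_model(Report) would drop the table again
```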
Don't rewrite your models.py file automatically; that is not how it's meant to work. When you need more flexibility in the way you store data, you should do the following:
Think hard about what kind of data you want to store and, if needed, make your data model more abstract so it fits more cases.
Use JSON fields to store arbitrary JSON data with your model (e.g. for the Postgres database); see the sketch after this list.
If neither fits, don't use Django's ORM; use a different store (e.g. Redis for key-value data or MongoDB for JSON documents).
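A minimal sketch of the JSON-field option (the model is hypothetical; models.JSONField is the current cross-database field, while older Postgres-only setups used django.contrib.postgres.fields.JSONField):

```python
from django.db import models

class Record(models.Model):
    # fixed columns for the attributes every record is known to have
    name = models.CharField(max_length=100)
    # everything that varies from record to record goes into a JSON column,
    # so new "fields" never require editing models.py or migrating
    attributes = models.JSONField(default=dict, blank=True)
```

Creating and querying then works along the lines of Record.objects.create(name="x", attributes={"color": "red"}) and Record.objects.filter(attributes__color="red").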

Django: Can I run syncdb on user demand? (to keep model in sync with file system)

I have a Django model that represents data files on a server, with some metadata about each file. These files are generated by an instrument and can appear at any time throughout the day. I would like the Django table to reflect the files that are actually available for the user to select.
Here is what I have so far:
I have a Python script that scans the directory, produces an initial_data.json file and puts it in the app/fixtures directory. (The script pulls out important metadata from each file to make it easy for the user to make selections.)
I have fixtures working so that when I run syncdb, it loads the data into the model.
My question is, how do I do this repeatedly (hourly? on-demand? -- for example, triggered by clicking a button on the page?)
My impression is that syncdb is only meant to be run occasionally, like, for a data migration. Am I wrong - can I run it "at the click of a button"?
Is there a better way of keeping my table in sync with the file system? I have considered using FileField or FilePathField but these seem not workable, because I want to pre-load the table with the file metadata.
I don't understand why you want to use syncdb for this. That's really for creating tables. If all you're doing is loading a fixture, why don't you do that directly? You can use django.core.serializers to parse and load your JSON file (and I'd recommend calling it something other than initial_data).
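A rough sketch of loading a fixture-format JSON file on demand, for example from a view or a management command (the file path is made up):

```python
from django.core import serializers

def load_file_metadata(fixture_path="data/file_metadata.json"):
    with open(fixture_path) as fh:
        # deserialize() yields DeserializedObject wrappers; save() writes
        # each one to the database, inserting or updating by primary key
        for deserialized in serializers.deserialize("json", fh):
            deserialized.save()
```

That function can then be called from a button-triggered view or a cron-driven management command, with no syncdb involved.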

Django - logging each view's action

I'm thinking of creating a logging system for my Django web application. The web application is quite comprehensive in its use (it covers all aspects of a business's processes), so I'd like to track every event that happens. Specifically, I'd like to log every view that runs, not just the "main" ones, and, potentially, log what is happening within each view as it's executed.
While I'm in the "idea" stage of the logging system, I've quickly hit a few questions that leave me unsure how to proceed. Here are the main questions I have:
I'm thinking of logging all of the events in the same MySQL database that the main web app holds its data in. The concern I have is bloating the MySQL database into a massive DB. Also, if the DB crashes or is destroyed somehow (yes, I have backups), I'll lose my log too, which blows away any ability to track down the problem. Do I use a separate DB or just go with text files?
How granular do I go? Initially I was thinking of simply logging things like "Date - In view myView". However, as I think about it, it would be nice to log all the stuff that happens within the view. Doing this could make the log massive, and it would also make my code ugly with so many log-entry lines mixed into it. This kind of detail:
Date - entered view myView
Date - in view myView, retrieved object myObject from the DB
Date - in view myView, setting myObject field myField to myNewValue
Date - leaving myView
Those are my main thoughts at this point. Any advice on this front?
Thanks
I think the best and most correct way is to create your own custom middleware, where you can log practically everything you need (a rough sketch follows the links below).
Here are some links on the subject:
middleware snippets
http://djangosnippets.org/snippets/2624/
http://djangosnippets.org/snippets/290/
http://djangosnippets.org/snippets/264/
django-logging-middleware (pretty old, but may give you an idea)
django-request
django.db.backends logging
Is there a Django middleware/plugin that logs all my requests in a organized fashion?
Django verbose request logging
log all sql queries
django orm, how to view (or log) the executed query?
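As a rough sketch of such a middleware, in the new-style (Django 1.10+) form; the logger name and log format are made up, and older releases would use the MIDDLEWARE_CLASSES style instead:

```python
import logging
import time

logger = logging.getLogger("request_audit")  # hypothetical logger name

class RequestLogMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        start = time.monotonic()
        response = self.get_response(request)
        duration_ms = int((time.monotonic() - start) * 1000)
        # one line per request: user, path, status code, timing
        logger.info(
            "%s %s %s %dms",
            getattr(request, "user", "-"),
            request.path,
            response.status_code,
            duration_ms,
        )
        return response
```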
Also, consider using the Sentry error logging and aggregation platform instead of writing logs into the database. FYI, see "using a database for logging".
If you want to log every action run in every view, you can, for example, replace "entered view A" and "exited view A" with a single line along these lines: view A - 147ms.
As alecxe stated, you can log requests/SQL; there are plenty of ways to do it with middleware. As for database (object) actions, you can tie logging of individual saves, updates, and deletes to signals.
For bulk updates and deletes, you could (it's not a clean way but it would work) monkey-patch manager and queryset methods to add logging.
This way you can log actions rather than SQL.
I would see lines like this:
[2013/09/11 15:11:12.0153] view app.module.view 200 148ms
[2013/09/11 15:11:12.0189] orm save:auth.User,id=1 3ms
This is a quick and dirty proposal, but, maybe it's worth it.
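For the signal part, a bare-bones sketch (the logger name is made up; connecting with no sender means it fires for every model, which you may want to narrow down):

```python
import logging

from django.db.models.signals import post_delete, post_save
from django.dispatch import receiver

logger = logging.getLogger("orm_audit")  # hypothetical logger name

@receiver(post_save)
def log_save(sender, instance, created, **kwargs):
    action = "create" if created else "update"
    logger.info("orm %s:%s.%s,id=%s",
                action, sender._meta.app_label, sender.__name__, instance.pk)

@receiver(post_delete)
def log_delete(sender, instance, **kwargs):
    logger.info("orm delete:%s.%s,id=%s",
                sender._meta.app_label, sender.__name__, instance.pk)
```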

Django: Handling an uploaded SQLite file

I'm working on a simple Django application in which the user uploads a SQLite file; the data is read and added to the main database (PostgreSQL).
My idea is to use two databases, one for the main application and the other to manage the uploaded file (the structure is always the same so I can create models for it).
What do you think about this solution? Is it possible to dynamically change the settings.py file for the second database so I can modify the path and easily read data inside it?
Thanks!
Django supports multiple databases in one project; you can set up Postgres as the default DB and SQLite as a secondary one (used just for the upload).
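A minimal sketch of that setup (the engine names are the stock Django backends; the alias and paths are made up):

```python
# settings.py
DATABASES = {
    "default": {                                  # main application data
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "mainapp",
    },
    "uploaded": {                                 # the user-supplied SQLite file
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": "/srv/uploads/current.sqlite3",   # the upload view writes here
    },
}
```

A view would then save the uploaded file to that path, read it with YourUploadModel.objects.using("uploaded"), and copy the cleaned rows into the default (PostgreSQL) database.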