I came across this article
https://medium.com/#hakibenita/how-to-manage-concurrency-in-django-models-b240fed4ee2
Which describes how a request can change a record that another request is currently working with.
Now this article is from 2017 and I haven't found anything about django cobcurrency since.
Also manage.py is single threaded.
Does this mean the issue is now managed by django internally or do I have to manage concurrency manually still when I deploy it with apache?
Requests are usually handled by multiple workers which are in fact different processes. This means that you will be handling simultaneous requests accessing the database so yes, you have to watch for race conditions and the framework will not do that for you.
Related
I am currently writing a django App which will have multiple deployments. Each deployment will have a sub-domain. That is, the same app will be running over different virtualenvs and will share a MySQL database (multiple databases in the same installation) and a Redis Server.
The point where I am stuck is managing the tasks in a manner that they don't get mixed up. I have been reading on this all day and could't find a solution which was accepted as a standard.
Some of the solutions proposed were,
Having separate queues for apps and have a worker listen to each queue.
Having separate database numbers in the CELERY_BROKER_URL.
Regarding solution 2, I couldn't understand how it will work and how the consistency is preserved.
Regarding solution 1, I am worried that a single mistake in the worker-queue mapping could be disastrous to the server as all the databases will have the same schema and would accept any data send even if it is supposed to go to the database of some other app instance.
Please provide some insight on this issue. Thank You!
We're looking into implementing audit logs in our application and we're not sure how to do it correctly.
I know that django-reversion works and works well but there's a cost of using it.
The web server will have to make two roundtrips to the database when saving a record even if the save is in the same transaction because at least in postgres the changes are written to the database and comitting the transaction makes the changes visible.
So this will block the web server until the revision is saved to the database if we're not using async I/O which is currently the case. Even if we would use async I/O generating the revision's data takes CPU time which again blocks the web server from handling other requests.
We can use database triggers instead but our DBA claims that offloading this sort of work to the database will use resources that are meant for handling more transactions.
Is using database triggers for this sort of work a bad idea?
We can scale both the web servers using a load balancer and the database using read/write replicas.
Are there any tradeoffs we're missing here?
What would help us decide?
You need to think about the pattern of db usage in your website.
Which may be unique to you, however most web apps read much more often than they write to the db. In fact it's fairly common to see optimisations done, to help scaling a web app, which trade off more complicated 'save' operations to get faster reads. An example would be denormalisation where some data from related records is copied to the parent record on each save so as to avoid repeatedly doing complicated aggregate/join queries.
This is just an example, but unless you know your specific situation is different I'd say don't worry about doing a bit of extra work on save.
One caveat would be to consider excluding some models from the revisioning system. For example if you are using Django db-backed sessions, the session records are saved on every request. You'd want to avoid doing unnecessary work there.
As for doing it via triggers vs Django app... I think the main considerations here are not to do with performance:
Django app solution is more 'obvious' and 'maintainable'... the app will be in your pip requirements file and Django INSTALLED_APPS, it's obvious to other developers that it's there and working and doesn't need someone to remember to run the custom SQL on the db server when you move to a new server
With a db trigger solution you can be certain it will run whenever a record is changed by any means... whereas with Django app, anyone changing records via a psql console will bypass it. Even in the Django ORM, certain bulk operations bypass the model save method/save signals. Sometimes this is desirable however.
Another thing I'd point out is that your production webserver will be multiprocess/multithreaded... so although, yes, a lengthy db write will block the webserver it will only block the current process. Your webserver will have other processes which are able to server other requests concurrently. So it won't block the whole webserver.
So again, unless you have a pattern of usage where you anticipate a high frequency of concurrent writes to the db, I'd say probably don't worry about it.
Running a Django application on Appengine we need to make a query that returns approx. 450 rows per request including joins M2M prefetch_related and select_related.
When we make many concurrent requests, the query time for each request goes up in a way that all requests end simultaneously.
Running the same concurrent requests on a non-appengine Django installation or in an appengine instance that has threading set to false do not show this behavior.
There is also a slight improvement when the requests are separated to different appengine instances.
Has anyone seen this before?
Sounds like your database backend is too heavily loaded by your query. Have you tried upgrading to a higher tier?
The basic tier only handles 25 concurrent queries. You said "many" in your question, so if "many" > 25 that's the source of your problem:
https://developers.google.com/cloud-sql/pricing
I am relatively new to Django and this is a more general 'concept' question.
For a client I need to construct an expansive database holding data returned from a series of questionnaires as well as some basic biological data. The idea is to move away from the traditional tools (i.e. Microsoft Access) and manage the data in a mysql database using a basic CRUD interface. Initially the project doesn't need to live on the web, but the next phase will to be to have a centralized db with login and admin page.
I have started building the db with Django models which is great, and I want to use the Django admin for the management of the data.
My question is: Is this a good use of Django? Is there anything I should consider before relying on django for the whole process? And is it advisable to us the Django runserver for db admin on a client's local machine (before we get to the web phase).
Any advice would be much appreciated.
Actually, your description sounds exactly like the sort of thing for which Django is an ideal solution. It sounds more complex and customized than a CMS, and if it's as straightforward as your description then the ORM is definitely a good tool for this. Then again, this sounds exactly like an appserver-ready problem, so Rails, Express for Node.js, or even ChicagoBoss (if you're brave) would be good platforms for this kind of application.
And sure, Django is solid enough you can run it with the test server for local clients before you go whole-hog and run the thing on the web. For that, though, I recommend Apache/mod_wsgi, and if you're going to be fault tolerant there are diamond architectures (one front end proxy with monitoring failover, two or more appserver machines, one database with hot spare) and more complex (see: sharding) architectural layouts you can approach later.
If you're going to run it in a client's local setting, and you're not running Windows, I recommend looking into the screen program. It will allow you to detach the running job into the background while making diagnostics accessible in an ongoing fashion.
I am writing a web application with django on the server side. It takes ~4 seconds for server to generate a response to the user. It makes use of a weather api. My application has to make ~50 query to that api for each user request.
Server side uses urllib of python for using the weather api. I used pythons threading to speed up the process because urllib is synchronous. I am using wsgi with apache. The problem is wsgi stack is fully synchronous and when many users use my application, they have to wait for one anothers request to finish. Since each request takes ~4 seconds, this is unacceptable.
I am kind of stuck, what can I do?
Thanks
If you are using mod_wsgi in a multithreaded configuration, or even a multi process configuration, one request should not block another from being able to do something. They should be able to run concurrently. If using a multithreaded configuration, are you sure that you aren't using some locking mechanism on some resource within your own application which precludes requests running through the same section of code? Another possibility is that you have configured Apache MPM and/or mod_wsgi daemon mode poorly so as to preclude concurrent requests.
Anyway, as mentioned in another answer, you are much better off looking at caching strategies to avoid the weather lookups in the first place, or offloading to client.
50 queries to an outside resource per request is probably a bad place to be, and probably not neccesary at all.
The weather doesn't change all that quickly, and so you can probably benefit enormously by just caching results for a while. Then it doesn't matter how many requests you're getting, you don't need to do more than a few queries per day
If that's not your situation, you might be able to get the client to do the work for you. Refactor the code so that the weather api aggregation happens on the client in javascript, rather than funneling it all through the server.
Edit: based on comments you've posted, what you are asking for probably cannot be optimized within the constraints of the API you are using. The problem is that the service is doing a good job of abstracting away the differences in the many sources of weather information they aggregate into a nearest location query. after all, weather stations provide only point data.
If you talk directly to the technical support people that provide the API, you might find that they are willing to support more complex queries (bounding box), for which they will give you instructions. More likely, though, they abstract that away because they don't want to actually reveal the resolution that their API actually provides, or because there is some technical reason in the way that they model their data or perform their calculations that would make such queries too difficult to support.
Without that or caching, you are just out of luck.