Django: 2 connections to default DB?

In a long-running management command I'd like to have two connections to the same DB. One connection will hold a transaction to lock a certain row (select for update) and the other connection will record some processing info. If the process crashes, a new run of the management command can use the processing info to skip/simplify some processing steps, hence the need to record it in a different connection.
How do I go about creating a second connection in the same thread? My first thought was to add a default2 entry to DATABASES with the same connection info as default and use .using("default2") in one of the queries, but I wasn't sure whether that would cause issues in Django.
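For example, a sketch of what I have in mind (the alias name, models and helper are illustrative, not actual project code):

# settings.py: a second alias with the same connection settings as "default"
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",  # whatever backend is already in use
        "NAME": "mydb",
        # ... USER, PASSWORD, HOST ...
    },
}
DATABASES["default2"] = dict(DATABASES["default"])

# inside the management command
from django.db import transaction

with transaction.atomic(using="default"):
    job = Job.objects.select_for_update().get(pk=job_id)  # row stays locked for the whole block
    process(job)                                          # long-running work
    # Written through the second connection, so it commits immediately,
    # independently of the locking transaction above:
    ProgressLog.objects.using("default2").create(job_id=job.pk, step="step-1")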


SQLite with in-memory and isolation

I want to create an in-memory SQLite DB and make two connections to it: one to make modifications and the other to read the DB. The modifier connection would open a transaction and continue to make modifications to the DB until a specific event occurs, at which point it would commit the transaction. The other connection would run SELECT queries reading the DB. I do not want the changes that are being made by the modifier connection to be visible to the reader connection until the modifier has committed (the specified event has occurred). In other words, I would like to isolate the reader's connection from the writer's connection.
I am writing my application in C++. I have tried opening two connections like the following:
int rc1 = sqlite3_open_v2("file:db1?mode=memory", pModifyDb, SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE | SQLITE_OPEN_FULLMUTEX | SQLITE_OPEN_URI, NULL);
int rc2 = sqlite3_open_v2("file:db1?mode=memory", pReaderDb, SQLITE_OPEN_READONLY | SQLITE_OPEN_FULLMUTEX | SQLITE_OPEN_URI, NULL);
I have created a table, added some rows and committed the transaction to the DB using 'pModifyDb'. When I try to retrieve the values using the second connection 'pReaderDb' by calling sqlite3_exec(), I receive a return code of 1 (SQLITE_ERROR).
I've also tried specifying the URI as "file:db1?mode=memory&cache=shared". I am not sure whether the cache=shared option preserves isolation at all, but it did not work either: when the reader connection tries to exec a SELECT query, the return code is 6 (SQLITE_LOCKED). Maybe that is because the shared-cache option unifies both connections under the hood?
If I remove the in-memory requirement from the URI, by using "file:db1" instead, everything works fine. I do not want to use a file-based DB because I require high throughput, and the DB won't be very large (~10MB).
So I would like to know how to set up two isolated connections to a single SQLite in-memory DB?
Thanks in advance,
kris
This is not possible with an in-memory DB.
You have to use a database file.
To speed it up, put it on a RAM disk (if possible), and disable synchronous writes (PRAGMA synchronous=off) in every connection.
To allow a reader and a writer at the same time, you have to put the DB file into WAL mode.
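To illustrate the file-plus-WAL approach, here is a minimal sketch in Python rather than C++ (the path is arbitrary and could point at a RAM disk):

import sqlite3

DB_PATH = "/tmp/demo.db"   # could live on a RAM disk

writer = sqlite3.connect(DB_PATH, isolation_level=None)   # autocommit; BEGIN issued manually
reader = sqlite3.connect(DB_PATH, isolation_level=None)

for conn in (writer, reader):
    conn.execute("PRAGMA journal_mode=WAL")    # one writer plus readers at the same time
    conn.execute("PRAGMA synchronous=OFF")     # trade durability for speed

writer.execute("CREATE TABLE IF NOT EXISTS t (x INTEGER)")

writer.execute("BEGIN")
writer.execute("INSERT INTO t VALUES (1)")
print(reader.execute("SELECT COUNT(*) FROM t").fetchone())   # (0,) -- uncommitted change not visible

writer.execute("COMMIT")                                     # the "specific event"
print(reader.execute("SELECT COUNT(*) FROM t").fetchone())   # (1,) -- now visible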
This seems to be possible since version 3.7.13 (2012-06-11):
Enabling shared-cache for an in-memory database allows two or more database connections in the same process to have access to the same in-memory database. An in-memory database in shared cache is automatically deleted and memory is reclaimed when the last connection to that database closes.
Docs

mysql lost connection error

Currently, I am working on a project that integrates MySQL with an IOCP server to collect sensor data and verify the collected data from the client.
However, MySQL sometimes loses the connection.
The queries themselves are simple: each one inserts a single row of records or gets the average value between date intervals.
The data from each sensor flows into the DB at the same time, every 5 seconds. When sensor messages occasionally arrive at the same time or overlap with a message from the client, the connection is dropped with:
Lost connection to MySQL server during query
In relation to that message I have already changed max_allowed_packet and the interactive_timeout, net_read_timeout, net_write_timeout and wait_timeout values.
It seems that the error occurs when queries overlap.
Please let me know if you know the solution.
I had a similar issue in a MySQL server with very simple queries where the number of concurrent queries were high. I had to disable the query cache to solve the issue. You could try disabling the query cache using following statements.
SET GLOBAL query_cache_size = 0;
SET GLOBAL query_cache_type = 0;
Please note that a server restart will enable the query cache again; put the settings in the MySQL configuration file if you need them preserved across restarts.
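For example (assuming MySQL 5.x; the query cache was removed entirely in MySQL 8.0), the corresponding lines in my.cnf would be something like:

[mysqld]
query_cache_type = 0
query_cache_size = 0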
Can you run below command and check the current timeouts?
SHOW VARIABLES LIKE '%timeout';
You can change the timeout, if needed -
SET GLOBAL <timeout_variable>=<value>;

Caching in Django's object model

I'm running a system with a few workers that take jobs from a message queue, all using Django's ORM.
In one case I'm actually passing a message along from one worker to another in another queue.
It works like this:
Worker1 in queue1 creates an object (MySQL INSERT) and pushes a message to queue2
Worker2 accepts the new message in queue2 and retrieves the object (MySQL SELECT), using Django's objects.get(pk=object_id)
This works for the first message. But for the second message, worker 2 always fails because it can't find an object with id object_id (it gets a Django DoesNotExist exception).
This works seamlessly in my local setup with Django 1.2.3 and MySQL 5.1.66, the problem occurs only in my test environment which runs Django 1.3.1 and MySQL 5.5.29.
If I restart worker2 every time before worker1 pushes a message, it works fine. This makes me believe there's some kind of caching going on.
Is there any caching involved in Django's objects.get() that differs between these versions? If that's the case, can I clear it in some way?
The issue is likely related to the use of MySQL transactions. On the sender's side, the transaction must be committed to the database before notifying the receiver of an item to read. On the receiver's side, the transaction level used for a session must be set such that the new data becomes visible in the session after the sender's commit.
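On the sender's side that can be as simple as making sure the commit happens before the push; a sketch for the Django 1.2/1.3 transaction API (MyModel and push_to_queue2 are illustrative):

from django.db import transaction

@transaction.commit_on_success        # atomic() in modern Django
def create_object(data):
    return MyModel.objects.create(**data)

obj = create_object(data)             # the INSERT is committed when this returns
push_to_queue2(obj.pk)                # only notify worker2 after the commit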
By default, MySQL uses the REPEATABLE READ isolation level. This poses problems where there are more than one process reading/writing to the database. One possible solution is to set the isolation level in the Django settings.py file using a DATABASES option like the following:
'OPTIONS': {'init_command': 'SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED'},
Note however that changing the transaction isolation level may have other side effects, especially when using statement based replication.
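In context, the receiver's DATABASES entry might look like this (a sketch assuming the MySQL backend and placeholder credentials):

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'mydb',
        'USER': 'myuser',
        'PASSWORD': '...',
        'OPTIONS': {
            'init_command': 'SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED',
        },
    },
}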
The following links provide more useful information:
How do I force Django to ignore any caches and reload data?
Django ticket#13906

Django Celery FIFO

So I have these two applications connected with a REST API (JSON messages), one written in Django and the other in PHP. I have an exact database replica on both sides (using MySQL).
When I press "submit" on one of them, I want that data to be saved in the current app's database, and a background job started with Celery/Redis to update the remote database for the other app via REST.
My question is: how do I assign the same worker to my tasks in order to keep a FIFO order?
I need my data to be consistent and FIFO is really important.
OK, I am going to detail what I want to do a little further:
So I have this Django app, and when I press submit after filling in the form, my Celery worker wakes up and takes care of posting the submitted data to a remote server. This I can do without problems.
Now, imagine that my internet connection goes down at that exact time: my Celery worker keeps retrying to send until it succeeds. But if I do another submit before my previous data has been sent, my data won't be consistent on the other remote server.
That is my problem. I am not able to make these requests FIFO with the retry option given by Celery, so that's where I need some help figuring it out.
This is the answer I got from another forum:
Use named queues with celery:
http://docs.celeryproject.org/en/latest/userguide/workers.html#queues
Start a worker process with a single worker:
http://docs.celeryproject.org/en/latest/django/first-steps-with-django.html#starting-the-worker-process
Set this worker to consume from the appropriate queue:
http://docs.celeryproject.org/en/latest/userguide/workers.html#queues-adding-consumers
For the FIFO part, I can sort my Celery broker in FIFO order before sending my requests.
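Putting those three pieces together, a minimal sketch (the app, queue, task and endpoint names are all illustrative):

# tasks.py
import requests
from celery import Celery

app = Celery("sync", broker="redis://localhost:6379/0")

@app.task
def push_to_remote(payload):
    # Hypothetical REST call to the other application.
    requests.post("https://remote.example.com/api/records", json=payload, timeout=10)

# When the form is submitted, enqueue onto the dedicated queue:
#   push_to_remote.apply_async(args=[payload], queue="sync_fifo")

# Start exactly one worker that consumes only that queue, one task at a time:
#   celery -A tasks worker -Q sync_fifo --concurrency=1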

Using Django ORM in threads and avoiding "too many clients" exception by using BoundedSemaphore

I'm working on a manage.py command which creates about 200 threads to check remote hosts. My database setup allows me to use 120 connections, so I need some kind of pooling. I've tried using a separate thread, like this:
import threading
from threading import Thread

class Pool(Thread):
    def __init__(self):
        Thread.__init__(self)
        self.semaphore = threading.BoundedSemaphore(10)

    def give(self, trackers):
        self.semaphore.acquire()
        data = ... some ORM (not lazy, query triggered here) ...
        self.semaphore.release()
        return data
I pass an instance of this object to every check thread, but I still get "OperationalError: FATAL: sorry, too many clients already" inside the Pool object after initializing 120 threads.
I expected that only 10 database connections would be opened and that the threads would wait for a free semaphore slot. I can check that the semaphore works by commenting out release(); in that case only 10 threads work and the others wait until the app terminates.
As far as I understand, every thread opens a new connection to the database even if the actual call is inside a different thread, but why? Is there any way to perform all database queries inside only one thread?
Django's ORM manages database connections in thread-local variables. So each different thread accessing the ORM will create its own connection. You can see that in the first few lines of django/db/backends/__init__.py.
If you want to limit the number of database connections made, you must limit the number of different threads that actually access the ORM. A solution could be to implement a service that delegates ORM requests to a pool of dedicated ORM threads. To transmit the requests and their results from and to other threads you will have to implement some sort of message passing mechanism. Since this is a typical producer/consumer problem, the Python docs about threading should give some hints how to achieve this.
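One possible shape for that delegation service, using a small pool of dedicated ORM threads (pool size, model and helper names are illustrative, not part of Django):

from concurrent.futures import ThreadPoolExecutor

# Ten dedicated threads means at most ten ORM connections stay open.
orm_pool = ThreadPoolExecutor(max_workers=10)

def fetch_trackers(host_id):
    # Force evaluation with list() so the query really runs inside the pool thread.
    return list(Tracker.objects.filter(host_id=host_id))

# From each of the ~200 checker threads:
#   trackers = orm_pool.submit(fetch_trackers, host_id).result()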
Edit: I've just googled for "django connection pooling". There are many people who complain that Django does not provide a proper connection pool. Some of them managed to integrate a separate pooling package. For PostgreSQL, I would take a look at the pgpool middleware.