We are planning to use SQLite in our project, and I am a bit confused about its concurrency model (I have read many different views on this in the community), so I am putting down my questions hoping to clear up my apprehensions.
We plan to prepare all statements during application startup, with multiple connections for reads and one connection for writes. This means we create the connections and prepare the statements on one thread, then use another thread to bind and execute the statements.
I am using the C APIs on Windows and Linux.
Does creating a connection on one thread and using it on another pose any problem?
Should I use shared-cache mode?
I am thinking of using one lock to synchronize between reads and writes, with no synchronization between reads. Should I synchronize between reads as well?
Do concurrent reads work on the same connection?
Do concurrent reads work on different connections?
EDIT: One more question: how do we validate the connection? We open the connection at application startup and the same connection is used until the application exits, so how do we validate the connection before using it?
Thanks in Advance
Regards
DEE
1) I do not think SQLite uses any thread-specific data, so creating a connection on one thread and using it on another should be fine (the documentation says this holds from version 3.5 onwards).
2) I don't think shared-cache mode will bring any significant performance benefit; experiment and see. It takes only a single statement per thread to enable it.
3) You need a single-writer/multiple-reader kind of lock; using simple exclusive locks will serialize all reads and writes and nullify any performance benefit of using multiple threads.
4 & 5) Read operations should work concurrently without any problem.
The SQLite FAQ covers threading in detail; see the specific FAQ entry on threads. As of 3.3.1 it is safe to do what you describe, under certain conditions (see the FAQ).
From SQLite compile time options:
... SQLITE_THREADSAFE=1 sets the default threading mode to Serialized. SQLITE_THREADSAFE=2 sets the default threading mode to Multi-threaded ...
It further states:
Multi-thread. In this mode, SQLite can be safely used by multiple
threads provided that no single database connection is used
simultaneously in two or more threads.
Serialized. In serialized mode, SQLite can be safely used by multiple
threads with no restriction.
It's not clear what the use of "Multi-thread" (=2) is if "Serialized" (=1) can do the same without restrictions. The literal meanings of these two quoted terms are also not very clear.
Is using a single DB connection in multiple threads disallowed only for the =2 option, or for =1 as well? Is it undefined behaviour if done anyway?
The reason for the second question is that I have a requirement where several DB files are opened at the same time. They are read on worker threads and written on a single DB thread. If I created two connections per DB file, the OS's file-descriptor limit could soon be exhausted.
Though we haven't faced any major problem before, we recently came across a situation where SQLite was accessed simultaneously from both a worker thread and the DB thread. A long delay of 20 s blocked the worker thread. This issue reproduces consistently.
This led me to believe that threading could be the issue. In my setup, the default =1 (Serialized) option is set at compile time.
Clarifications:
Environment: Qt/C++. For threading we use QThreads. IMO, this should not affect the behaviour.
Threading: There is a main thread, a "database" thread, and 4 worker threads. Every user sits on a particular worker thread for its socket connection, but their DBs are always handled on the common "database" thread.
DB connections: There can be hundreds of different DBs open at a time, depending on the number of users connected to the server. Since every OS limits how many files can be open at once, I use one connection per DB file.
Connection sharing: Every user's DB connection is shared between its worker thread for reading (SELECT) and the common DB thread for writing (INSERT/DELETE/UPDATE). I assumed that for =1 the connection could be shared.
Suspicion: There is one table with 10k+ rows which also contains huge data in its columns; the total DB size goes up to 300-400 MB, mainly because of it. When a SELECT is run against this table based on its "id" field (a 30-character string), the first time it takes up to 20 s; the next time, it takes a few milliseconds.
Well, I'm no SQLite expert, but the following sounds quite clear to me:
Multi-thread. In this mode, SQLite can be safely used by multiple threads provided that no single database connection is used simultaneously in two or more threads.
Serialized. In serialized mode, SQLite can be safely used by multiple threads with no restriction.
From my understanding this means:
If you use SQLITE_THREADSAFE=2 (multi-threaded), you have to make sure that each thread uses its own database connection. Sharing a single database connection among multiple threads isn't safe.
If you use SQLITE_THREADSAFE=1 (serialized), you can even safely share a single database connection among multiple threads.
I currently have two different processes: one writes into a database, and the other reads and updates the records that the first process inserted. Every time I try to start both processes concurrently the database gets locked, which makes sense, due to the simultaneous reads and writes. I came across WAL, but I haven't been able to find any example of how to enable WAL from C++ code. Any help would be appreciated. Thanks.
To activate it, as per the documentation you just need to run the following statement:
PRAGMA journal_mode=WAL
This is persistent and only needs to be applied once.
This question already has answers here:
Multiple SQLite database instances open at the same time on different Threads (QT)
I am writing a Qt application which monitors some statistics. In the main window you choose one or more items and open a graph window for them. Each item is polled from a different thread.
Every time I get data, it is written to a SQLite database, but I have encountered a problem:
I am developing this application on a computer with an SSD and it runs fine, but when I run it on a computer with an HDD the application crashes (the crash happens inside Qt's SQLite plugin, qsqlite.dll). I googled the problem and found many people saying that SQLite is not recommended for this kind of usage. I have also used QMutex to lock/unlock the sections where I write to the DB, but unfortunately that does not fix the problem.
Is there a way I can use SQLite for this, or must I look at a different database such as PostgreSQL?
Thank you for your time!
I've never been a fan of multiple database connections within one application.
In your situation, I would implement a queue (FIFO) for all write SQL statements and send them to the database through only one connection. Since you will be writing to the queue from several threads, you will have to protect the queue with a mutex or similar.
This way, the only thread contention is in your code and the SQLite drivers don't have to work too hard.
I would also consider caching the data from read queries, so you only request data you don't already have. That is, request, say, the most recent 100 values and keep them in an array or list; then on the next request, only fetch the values newer than the last one in the array.
I have a problem with an SQLite 3 database which remains locked/inaccessible after a certain access pattern.
The behaviour occurs so far on Ubuntu 10.04 and on a custom (OpenEmbedded) Linux.
The SQLite version is 3.7.7.1. The DB is a local file.
One C++ application accesses the DB periodically (every 5 s). Each time, several INSERT statements are executed, wrapped in a deferred transaction. This happens in one thread only. The connection to the DB is held over the whole lifetime of the application. The statements are also persistent and reused via sqlite3_reset. SQLITE_THREADSAFE is set to 1 (serialized), and journaling is set to WAL.
Then I open the DB in parallel with the sqlite command-line tool. I enter BEGIN IMMEDIATE;, wait more than 5 s, and commit with END;.
After this, the application's DB access fails: the BEGIN TRANSACTION returns code 1 ("SQL error or missing database"). If I execute a ROLLBACK TRANSACTION right before the BEGIN, just to be sure there is no active transaction already, it fails with code 5 ("The database file is locked").
Does anyone have an idea how to approach this problem, or what may cause it?
EDIT: There is a workaround: if the described error occurs, I close and reopen the DB connection. This fixes the problem, but I'm currently at a loss as to why.
SQLite is a serverless database. As far as I know, concurrent access to the same file is coordinated purely through file locks, and by design only one writer can be active at a time. You are accessing the same backing file from both your application and the command-line tool, so the tool's open transaction holds a lock that blocks your application. That is why it is failing.
SQLite connections should generally be used from a single thread at a time; among other things, they contain mutexes that are used to ensure correct concurrent access. (Be aware that SQLite also only ever supports a single updating thread at once, and in rollback-journal mode readers are blocked while a write is in progress; that's a consequence of being a serverless DB.)
Luckily, SQLite connections are relatively cheap when they're not doing anything and the cost of things like cached prepared statements is actually fairly small; open up as many as you need.
[EDIT]:
Moreover, this would explain why closing and reopening the connection works: it builds the connection in the new thread (and finalizes all the locks etc. from the old one).
I have two applications running on my machine. One is supposed to hand in the work and the other is supposed to do the work. How can I make sure that the first application/process is in a wait state? I can check the resources it is consuming, but that does not guarantee it. What tools should I use?
Your two applications should communicate. There are a lot of ways to do that:
Send messages through sockets. This way the two processes can run on different machines if you use normal network sockets instead of local ones.
If you are using C you can use System V semaphores via semget/semop/semctl. There should be interfaces for them in other languages.
Named pipes block until both a read and a write operation are in progress. You can use that for synchronisation.
Signals are also good for this. In C you send them with kill and handle them with sigaction.
D-Bus can also be used and has bindings for various languages.
Update: If you can't modify the processing application, it is harder. You have to rely on signs that indicate progress. (I am assuming your processing application reads a file, does some processing, then writes the result to an output file.) Do you know the final size the result should be? If so, you can check the size repeatedly (or whenever it changes).
If you don't know the size but you know how the processing works, you may be able to use that. For example, suppose processing is done when the output file is closed. You can use strace to see all the system calls, including close. You can also interpose your own close() via the LD_PRELOAD environment variable (on Windows you would have to replace DLLs). This way you can in effect modify the processing program without recompiling it or even having access to its source.
You can use named pipes: the first app reads from the pipe, and since the pipe is empty the read blocks, so the app waits. The second app writes into the pipe when it wants the first one to continue.
Nothing can guarantee that your application is in a waiting state. You have to pass it some work and get back a response. It might be transactional or not: the application can confirm that it got the message before it starts processing, or after processing completes (successfully or not). If it is not waiting, passing it a piece of work should fail, whether by a failed write to a TCP/IP socket (or other means) or by a timeout. This depends on the implementation, the kind of transport you are using, and other requirements.
There is actually a way of figuring out whether a process (thread) is in a blocked state waiting for data on a socket (or another source), but that requires the client to be on the same computer and to have the necessary access privileges, and it makes no sense outside of debugging, which you can do with any debugger anyway.
Overall, the idea of making sure the application is waiting for data before trying to pass it that data smells bad. Not to mention the race condition: what if you checked and it was OK, but when you actually tried to send the data, the application was no longer waiting (even if only microseconds had passed)?