How to access MySQL from multiple threads concurrently - c++

We're doing a small benchmark of MySQL where we want to see how it performs for our data.
Part of that test is to see how it works when multiple concurrent threads hammer the server with various queries.
The MySQL documentation (5.0) isn't really clear about multithreaded clients. I should point out that I do link against the thread-safe library (libmysqlclient_r.so).
I'm using prepared statements and do both read (SELECT) and write (UPDATE, INSERT, DELETE).
Should I open one connection per thread? And if so, how do I even do this? It seems mysql_real_connect() returns the original DB handle that I got when I called mysql_init().
If not, how do I make sure results and functions such as mysql_affected_rows() return the correct value instead of colliding with other threads' calls? (Mutexes/locks could work, but it feels wrong.)

As maintainer of a fairly large C application that makes MySQL calls from multiple threads, I can say I've had no problems with simply making a new connection in each thread. Some caveats that I've come across:
Like you say you're already doing, link against libmysqlclient_r. (Edit: it seems this bullet only applies to versions < 5.5; see this page for your appropriate version.)
Call mysql_library_init() (once, from main()). Read the docs about use in multithreaded environments to see why it's necessary.
Make a new MYSQL structure using mysql_init() in each thread. This has the side effect of calling mysql_thread_init() for you. Then call mysql_real_connect() as usual inside each thread, with its thread-specific MYSQL struct.
If you're creating/destroying lots of threads, you'll want to use mysql_thread_end() at the end of each thread (and mysql_library_end() at the end of main()). It's good practice anyway.
Basically, don't share MYSQL structs or anything created specific to that struct (i.e. MYSQL_STMTs) and it'll work as you expect.
This seems like less work than making a connection pool to me.
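The shape of that advice (one connection per thread, nothing shared) can be sketched as follows. This is only a structural sketch: Connection is a hypothetical stand-in so the example compiles without the MySQL client library, and the comments mark where the MySQL C API calls named above would go.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a MYSQL handle; not part of the MySQL API.
struct Connection {
    static std::atomic<int> open_count;   // connections currently open
    Connection()  { ++open_count; }       // real code: mysql_init() + mysql_real_connect()
    ~Connection() { --open_count; }       // real code: mysql_close(), then mysql_thread_end()
    Connection(const Connection&) = delete;
    Connection& operator=(const Connection&) = delete;
    int rows_affected = 0;                // per-connection state, never shared
    void execute() { ++rows_affected; }   // real code: run a prepared statement
};
std::atomic<int> Connection::open_count{0};

// Each worker creates, uses and destroys its own connection.
int worker(int statements) {
    Connection conn;                      // created inside the thread that uses it
    for (int i = 0; i < statements; ++i) conn.execute();
    return conn.rows_affected;            // no other thread ever touched conn
}

// Runs n_threads workers and returns the per-thread results.
std::vector<int> run_workers(int n_threads, int statements) {
    assert(n_threads >= 0);
    // real code: mysql_library_init() once, before any thread starts
    std::vector<int> results(static_cast<std::size_t>(n_threads), 0);
    std::vector<std::thread> threads;
    for (int t = 0; t < n_threads; ++t)
        threads.emplace_back([&results, t, statements] {
            results[static_cast<std::size_t>(t)] = worker(statements);
        });
    for (auto& th : threads) th.join();
    // real code: mysql_library_end() once, after all threads have finished
    return results;
}
```

Because every thread reads its own rows_affected, there is nothing to collide with and no mutex is needed around the queries themselves.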

You could create a connection pool. Each thread that needs a connection could request a free one from the pool. If there's no connection available then you either block, or grow the pool by adding a new connection to it.
There's an article here describing the pros and cons of a connection pool (though it is Java-based).
Edit: Here's a SO question / answer about connection pools in C
Edit2: Here's a link to a sample Connection Pool for MySQL written in C++. (you should probably ignore the goto statements when you implement your own.)
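For illustration, here is a minimal sketch of the blocking variant described above (a thread that finds no free connection waits until one is returned). ConnectionPool and Lease are names made up for this sketch; Conn stands for whatever connection type you use, and the pool does not grow.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Blocking pool. It never uses the connections itself; it only hands out
// exclusive ownership of one connection at a time.
template <typename Conn>
class ConnectionPool {
public:
    // RAII lease: gives the connection back to the pool when destroyed.
    class Lease {
    public:
        Lease(ConnectionPool& pool, std::unique_ptr<Conn> conn)
            : pool_(&pool), conn_(std::move(conn)) {}
        Lease(Lease&&) = default;
        Lease(const Lease&) = delete;
        ~Lease() { if (conn_) pool_->release(std::move(conn_)); }
        Conn& operator*() { return *conn_; }
        Conn* operator->() { return conn_.get(); }
    private:
        ConnectionPool* pool_;
        std::unique_ptr<Conn> conn_;
    };

    // Takes ownership of already-opened connections.
    explicit ConnectionPool(std::vector<std::unique_ptr<Conn>> conns)
        : free_(std::move(conns)) {
        for (const auto& c : free_) assert(c);  // no null entries
    }

    // Blocks until a connection is free, then hands it out.
    Lease acquire() {
        std::unique_lock<std::mutex> lock(mutex_);
        available_.wait(lock, [this] { return !free_.empty(); });
        std::unique_ptr<Conn> conn = std::move(free_.back());
        free_.pop_back();
        return Lease(*this, std::move(conn));
    }

    std::size_t free_count() {
        std::lock_guard<std::mutex> lock(mutex_);
        return free_.size();
    }

private:
    void release(std::unique_ptr<Conn> conn) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            free_.push_back(std::move(conn));
        }
        available_.notify_one();  // wake one waiting thread, if any
    }

    std::mutex mutex_;
    std::condition_variable available_;
    std::vector<std::unique_ptr<Conn>> free_;
};
```

A thread would hold the Lease for the duration of one unit of work (for example, one transaction) and let it go out of scope afterwards. The pool must outlive every Lease it has handed out.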

It seems clear to me from the MySQL docs that any specific MYSQL structure can be used in a thread without difficulty. Using the same MYSQL structure in different threads simultaneously is clearly going to give you extremely unpredictable results, as state is stored within the MYSQL connection.
Thus either create a connection per thread, or use a pool of connections as suggested above and protect access to that pool (i.e. reserving or releasing a connection) using some kind of mutex.

MySQL Threaded Clients in C
It states that mysql_real_connect() is not thread safe by default. The client library needs to be compiled for threaded access.

Related

SQLite - multithread read and write in C++

I have a C++11 app that uses multiple threads. Each thread can read or write the database, e.g. doing INSERT, UPDATE, DELETE, SELECT.
I have enabled serialized mode for SQLite, so a connection can be shared between threads.
However, I don't know how to run queries. Can I just run a single query and create a statement via sqlite3_prepare_v2? Or should I add my own locks via std::lock_guard<std::mutex> and do something like:
Thread #1
db.lock()
db.query("....").execute()
db.unlock()
Thread #2
db.lock()
res = db.query("....").select()
while(res) res.row()
db.unlock()
Or is there any other way? I have been looking for some sample code, but found nothing.
In the Serialized mode you don't need manual locking to execute queries. According to the SQLite documentation:
In this mode (which is the default when SQLite is compiled with SQLITE_THREADSAFE=1) the SQLite library will itself serialize access to database connections and prepared statements so that the application is free to use the same database connection or the same prepared statement in different threads at the same time.
You can also consider using WAL to get more concurrency. In this mode, readers and a writer can proceed concurrently. To enable it, run PRAGMA journal_mode=WAL on the connection; if your wrapper takes a connection string, include journal mode=WAL there.
"Thread-safe" means that individual function calls are safe. However, when function calls from multiple threads are interleaved, this does not prevent one thread from modifying data that another thread is currently reading with multiple steps; this can result in nonsensical data.
You have to ensure that multiple threads do not attempt to use the same connection object for different transactions at the same time. Either use your own lock (as shown in the code), or use separate connection objects for each thread.
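The "use your own lock" option might look like the sketch below. The point it tries to show is that the lock is held for the whole multi-step unit (begin through commit, or prepare through the last step), not per call. Connection is a hypothetical stand-in rather than a real SQLite handle, and LockedConnection is a name invented for this sketch.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a shared connection with multi-step state.
struct Connection {
    bool in_transaction = false;
    int committed = 0;
    void begin()  { assert(!in_transaction); in_transaction = true; }
    void commit() { assert(in_transaction); in_transaction = false; ++committed; }
};

// Owns one connection and runs each unit of work under a single lock.
class LockedConnection {
public:
    // fn runs with the mutex held from its first use of the connection to its last.
    template <typename Fn>
    auto with(Fn&& fn) -> decltype(fn(std::declval<Connection&>())) {
        std::lock_guard<std::mutex> lock(mutex_);
        return fn(conn_);
    }
private:
    std::mutex mutex_;
    Connection conn_;
};

// Several threads run begin..commit units on the same shared connection.
int run_transactions(int n_threads, int per_thread) {
    LockedConnection db;
    std::vector<std::thread> threads;
    for (int t = 0; t < n_threads; ++t)
        threads.emplace_back([&db, per_thread] {
            for (int i = 0; i < per_thread; ++i)
                db.with([](Connection& c) {
                    c.begin();
                    // ... the statements of this transaction would go here ...
                    c.commit();
                });
        });
    for (auto& th : threads) th.join();
    return db.with([](Connection& c) { return c.committed; });
}
```

If the lock were taken around each individual call instead, another thread's begin() could land between this thread's begin() and commit(), which is exactly the interleaving described above.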

Managing sqlite database in a multithreaded environment

When using the sqlite C++ library, I can use sqlite3_open_v2 to open a database. This will produce a handle to the database, and a pointer to that handle will be set.
Using that pointer, I can call sqlite3_prepare_v2 to prepare a sqlite statement, then I can use sqlite3_step to step through the results of the query.
Now, I am working in an environment where multiple threads continuously get created and destroyed (it is a server application that spawns new threads to serve incoming, possibly concurrent connections). As far as I understand, I should be creating new handles to the same database with a call to sqlite3_open_v2 every time a new thread is created. However, this adds a significant computational overhead, since it can take a while to create a new connection to the database and I need to handle a lot of connections.
So I was wondering if there was a more efficient way to achieve this. Is there a way, for example, to just mutex everything to solve my problems? I can mutex my calls to the only connection object I have: this serializes my communications with the database.
Would this work? Or is there a reason why I can't use the same connection object from several different threads, even if I avoid any form of concurrency?
And if this can work, should I just serialize my calls to sqlite3_prepare_v2, or my first call to sqlite3_step, or all my calls to sqlite3_step? I mean: when I call step for the first time, do all the results get loaded, or does communication with the actual database file take place every time I call step?
The difference would be between mutexing only the call to prepare, and locking everything until I have finished stepping through the results.
Is something like this feasible, should I just create new connections to the database every time and let sqlite handle all of the concurrency, or am I missing something important that trivially solves my problem?
You can just let sqlite3 handle all of this for you, and by default it should. The sqlite3 libraries should use SQLITE_THREADSAFE=1 by default (emphasis mine):
SQLITE_THREADSAFE=<0 or 1 or 2>
This option controls whether or not code is included in SQLite to enable it to operate safely in a multithreaded environment. The default is SQLITE_THREADSAFE=1 which is safe for use in a multithreaded environment.
And SQLITE_CONFIG_SERIALIZED should also be used by default (emphasis mine):
SQLITE_CONFIG_SERIALIZED
There are no arguments to this option. This option sets the threading mode to Serialized. In other words, this option enables all mutexes including the recursive mutexes on database connection and prepared statement objects. In this mode (which is the default when SQLite is compiled with SQLITE_THREADSAFE=1) the SQLite library will itself serialize access to database connections and prepared statements so that the application is free to use the same database connection or the same prepared statement in different threads at the same time.
However, you can also change it yourself with a call to sqlite3_config before initialisation:
sqlite3_config(SQLITE_CONFIG_SERIALIZED);
You should then be able to open your database using SQLITE_OPEN_FULLMUTEX:
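sqlite3* pDatabase;
// The flags must also include a read/write mode; FULLMUTEX alone is not a valid combination.
sqlite3_open_v2("MyDatabase.db", &pDatabase, SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE | SQLITE_OPEN_FULLMUTEX, nullptr);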
You can also use a std::mutex to guard your sqlite3 calls, but this shouldn't be necessary since sqlite3 handles it for you (though if for some reason you have built the library differently, this would be viable).
I think you should check whether you call sqlite3_config() after sqlite3_initialize(). If so, sqlite3_config() returns SQLITE_MISUSE.
Here is the part of the sqlite3_config() documentation that concerns the SQLITE_MISUSE error code:
The sqlite3_config() interface may only be invoked prior to library initialization using sqlite3_initialize() or after shutdown by sqlite3_shutdown(). If sqlite3_config() is called after sqlite3_initialize() and before sqlite3_shutdown() then it will return SQLITE_MISUSE. Note, however, that sqlite3_config() can be called as part of the implementation of an application-defined sqlite3_os_init().
Source: http://www.sqlite.org/c3ref/config.html

MultiThreaded Libcurl

I need to execute parallel HTTP requests using Libcurl.
From what I understand I need to create a new handle for each thread
and use CURLOPT_WRITEDATA with some kind of Thread Local Storage.
Does the multi interface make this task a little easier?
I'm also using cookies. Will using CURLOPT_COOKIEFILE and CURLOPT_COOKIEJAR make
libcurl load the cookie file for each thread?
As you probably know, libcurl is not thread safe, so you should ensure that the libcurl handle is never shared between multiple threads. Each thread will need to store local data (among other things, the connection handle).
It follows that for each handle, i.e. for each thread, libcurl will need to read the cookie file from scratch, since this information cannot be shared. This is not a problem, in my opinion, although there could be issues when updating it (you will have multiple threads attempting it).
About the multi interface, it allows you to multiplex multiple transfers, so it is another approach to what you are trying to do but in a single thread.
UPDATE March 2013
libcurl is now thread-safe.
"libcurl is free, thread-safe, IPv6 compatible, feature rich, well supported, fast, thoroughly documented and is already used by many known, big and successful companies and numerous applications."
This is not a direct answer, but why do you need multithreading for parallel HTTP requests?
The multi interface is designed for this purpose: you add multiple handles and then process all of them with one call, all in the same thread. From the documentation:
Enable multiple simultaneous transfers in the same thread without
making it complicated for the application.
If you want multiple threads, I suggest you use the easy interface in each thread, and forget about the multi interface.
Sharing simply shares data between easy handles; you can use the share interface with or without the multi interface. If you do use multiple threads, you have to provide your own locking.
Also check out the libcurl share interface. It was designed for this purpose, i.e. to share data between requests:
You can have multiple easy handles share data between them. Have them
update and use the same cookie database, DNS cache, TLS session cache!
This way, each single transfer will take advantage from data updates
made by the other transfer(s). The sharing interface, however, does
not share active or persistent connections between different easy
handles.

Is Perforce's C++ P4API thread-safe?

Simple question - is the C++ API provided by Perforce thread-safe? There is no mention of it in the documentation.
By "thread-safe" I mean for server requests from the client. Obviously there will be issues if I have multiple threads trying to set client names and such on the same connection.
But given a single connection object, can I have multiple threads fetching changelists, getting status, translating files through a p4 map, etc.?
Late answer, but... From the release notes themselves:
Known Limitations
The Perforce client-server protocol is not designed to support
multiple concurrent queries over the same connection. For this
reason, multi-threaded applications using the C++ API or the
derived APIs (P4API.NET, P4Perl, etc.) should ensure that a
separate connection is used for each thread or that only one
thread may use a shared connection at a time.
It does not look like the client object has thread affinity, so in order to share a connection between threads, one just has to use a mutex to serialize the calls.
If the documentation doesn't mention it, then it is not safe.
Making something thread-safe in any sense is often difficult and may result in a performance penalty because of the addition of locks. It wouldn't make sense to go through the trouble and then not mention it in the documentation.

How to use SQLite in a multi-threaded application?

I'm developing an application with SQLite as the database, and am having a little trouble understanding how to go about using it in multiple threads (none of the other Stack Overflow questions really helped me, unfortunately).
My use case: The database has one table, let's call it "A", which has different groups of rows (based on one of their columns). I have the "main thread" of the application which reads the contents from table A. In addition, I decide, once in a while, to update a certain group of rows. To do this, I want to spawn a new thread, delete all the rows of the group, and re-insert them (that's the only way to do it in the context of my app). This might happen to different groups at the same time, so I might have 2+ threads trying to update the database.
I'm using different transactions from each thread, i.e. at the start of every thread's update cycle, I have a begin. In fact, what each thread actually does is call "BEGIN", delete from the database all the rows it needs to "update", and insert them again with the new values (this is the way it must be done in the context of my application).
Now, I'm trying to understand how I go about implementing this. I've tried reading around (other answers on Stack Overflow, the SQLite site) but I haven't found all the answers. Here are some things I'm wondering about:
Do I need to call "open" and create a new sqlite structure from each thread?
Do I need to add any special code for all of this, or is it enough to spawn different threads, update the rows, and that's fine (since I'm using different transactions)?
I saw something talking about the different lock types there are, and the fact that I might receive "SQLite busy" from calling certain APIs, but honestly I didn't see any reference that completely explained when I need to take all this into account. Do I need to?
If anyone can answer the questions/point me in the direction of a good resource, I'd be very grateful.
UPDATE 1: From all that I've read so far, it seems like you can't have two threads writing to a database file at the same time anyway.
See: http://www.sqlite.org/lockingv3.html. In section 3.0: A RESERVED lock means that the process is planning on writing to the database file at some point in the future but that it is currently just reading from the file. Only a single RESERVED lock may be active at one time, though multiple SHARED locks can coexist with a single RESERVED lock.
Does this mean that I may as well only spawn off a single thread to update a group of rows each time? I.e., have some kind of poller thread which decides that I need to update some of the rows, and then creates a new thread to do it, but never more than one at a time? Since it looks like any other thread I create will just get SQLITE_BUSY until the first thread finishes, anyway.
Have I understood things correctly?
BTW, thanks for the answers so far, they've helped a lot.
Some steps when starting out with SQLite for multithreaded use:
Make sure SQLite is compiled with the multithreading flag.
You must call open on your SQLite file to create a connection on each thread; don't share connections between threads.
SQLite has a very conservative threading model: when you do a write operation, which includes opening transactions that are about to do an INSERT/UPDATE/DELETE, other threads will be blocked until this operation completes.
If you don't use a transaction, then transactions are implicit, so if you start an INSERT/DELETE/UPDATE, SQLite will try to acquire an exclusive lock and complete the operation before releasing it.
If you do a BEGIN EXCLUSIVE statement, it will acquire an exclusive lock before doing operations in that transaction. A COMMIT or ROLLBACK will release the lock.
Your sqlite3_step, sqlite3_prepare and some other calls may return SQLITE_BUSY or SQLITE_LOCKED. SQLITE_BUSY usually means that sqlite needs to acquire the lock. The biggest difference between the two return values:
SQLITE_LOCKED: if you get this from a sqlite3_step statement, you MUST call sqlite3_reset on the statement handle. You should only get this on the first call to sqlite3_step, so once reset is called you can actually "retry" your sqlite3_step call. On other operations, it's the same as SQLITE_BUSY
SQLITE_BUSY : There is no need to call sqlite3_reset, just retry your operation after waiting a bit for the lock to be released.
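The "retry after waiting a bit" advice for SQLITE_BUSY can be sketched as a small helper. This is an illustration under assumptions: retry_while_busy is a made-up name, and kOk / kBusy are local stand-ins (using the numeric values of SQLITE_OK and SQLITE_BUSY) so the sketch compiles without sqlite3.h. Real code would use the constants from that header and pass a callable that performs the actual sqlite3 call.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Stand-ins so the sketch is self-contained; real code would use
// SQLITE_OK and SQLITE_BUSY from sqlite3.h instead.
constexpr int kOk = 0;    // numeric value of SQLITE_OK
constexpr int kBusy = 5;  // numeric value of SQLITE_BUSY

// Calls op() until it returns something other than kBusy, or until
// max_attempts calls have been made. The wait doubles after each busy result.
template <typename Op>
int retry_while_busy(Op op, int max_attempts,
                     std::chrono::milliseconds first_delay = std::chrono::milliseconds(1)) {
    assert(max_attempts >= 1);
    std::chrono::milliseconds delay = first_delay;
    int rc = kBusy;
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        rc = op();
        if (rc != kBusy) return rc;          // success or a different error
        if (attempt + 1 < max_attempts) {    // no point sleeping after the last try
            std::this_thread::sleep_for(delay);
            delay *= 2;
        }
    }
    return rc;  // still busy: the caller decides what to do (e.g. roll back)
}
```

For SQLITE_LOCKED from sqlite3_step, the callable would additionally have to call sqlite3_reset on the statement before the next attempt, as described above. SQLite also offers sqlite3_busy_timeout(), which makes the library itself wait and retry for a given number of milliseconds before returning SQLITE_BUSY.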
Check out this link. The easiest way is to do the locking yourself, and to avoid sharing the connection between threads. Another good resource can be found here, and it concludes with:
Make sure you're compiling SQLite with -DTHREADSAFE=1.
Make sure that each thread opens the database file and keeps its own sqlite structure.
Make sure you handle the likely possibility that one or more threads collide when they access the db file at the same time: handle SQLITE_BUSY appropriately.
Make sure you enclose within transactions the commands that modify the database file, like INSERT, UPDATE, DELETE, and others.
I realize this is an old thread and the responses are good but I've been looking into this recently and came across an interesting analysis of some different implementations. Mainly it goes over the strengths and weaknesses of connection sharing, message passing, thread-local connections and connection pooling. Take a look at it here: http://dev.yorhel.nl/doc/sqlaccess
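Of the approaches listed there, message passing is perhaps the least obvious, so here is a rough sketch of the idea: a single thread owns the only connection, and all other threads post jobs to it and wait on a future. DbWorker and Connection are hypothetical names for this sketch; Connection stands in for a real database handle.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical stand-in for the single database connection.
struct Connection {
    int rows = 0;
};

// One thread owns the connection; every other thread posts jobs to it.
class DbWorker {
public:
    DbWorker() : thread_([this] { run(); }) {}
    ~DbWorker() {
        { std::lock_guard<std::mutex> lock(mutex_); stopping_ = true; }
        wake_.notify_one();
        thread_.join();  // remaining jobs are drained before the thread exits
    }

    // Queues a job and returns a future for its result.
    std::future<int> post(std::function<int(Connection&)> job) {
        assert(static_cast<bool>(job));  // job must be callable
        auto task = std::make_shared<std::packaged_task<int(Connection&)>>(std::move(job));
        std::future<int> result = task->get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push([task](Connection& c) { (*task)(c); });
        }
        wake_.notify_one();
        return result;
    }

private:
    void run() {
        Connection conn;  // only this thread ever touches it
        for (;;) {
            std::function<void(Connection&)> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // stopping and nothing left to do
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job(conn);  // runs outside the queue lock
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::queue<std::function<void(Connection&)>> jobs_;
    bool stopping_ = false;
    std::thread thread_;  // declared last so the members above exist before run() starts
};
```

Jobs run one at a time in the order they were posted, so a job that wraps a whole transaction cannot be interleaved with another one. The trade-off is that all database work is serialized through one thread.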
Modern versions of SQLite have thread safety enabled by default. The SQLITE_THREADSAFE compilation flag controls whether or not code is included in SQLite to enable it to operate safely in a multithreaded environment. The default value is SQLITE_THREADSAFE=1, which means Serialized mode. From the documentation:
In this mode (which is the default when SQLite is compiled with SQLITE_THREADSAFE=1) the SQLite library will itself serialize access to database connections and prepared statements so that the application is free to use the same database connection or the same prepared statement in different threads at the same time.
Use the sqlite3_threadsafe() function to check the SQLITE_THREADSAFE setting the SQLite library was compiled with.
Default library thread safety behavior can be changed via sqlite3_config(). Use SQLITE_OPEN_NOMUTEX and SQLITE_OPEN_FULLMUTEX flags at sqlite3_open_v2() to adjust the threading mode of individual database connections.
Check this code from the SQLite wiki.
I have done something similar with C and I uploaded the code here.
I hope it's useful.
Summary
Transactions in SQLite are SERIALIZABLE.
Changes made in one database connection are invisible to all other database connections prior to commit.
A query sees all changes that are completed on the same database connection prior to the start of the query, regardless of whether or not those changes have been committed.
If changes occur on the same database connection after a query starts running but before the query completes, then it is undefined whether or not the query will see those changes.
If changes occur on the same database connection after a query starts running but before the query completes, then the query might return a changed row more than once, or it might return a row that was previously deleted.
For the purposes of the previous four items, two database connections that use the same shared cache and which enable PRAGMA read_uncommitted are considered to be the same database connection, not separate database connections.
In addition to the above information on multi-threaded access, it might be worth taking a look at this page on isolation, as many things have changed since this original question and the introduction of the write-ahead log (WAL).
It seems a hybrid approach of having several connections open to the database provides adequate concurrency guarantees, trading off the expense of opening a new connection with the benefit of allowing multi-threaded write transactions.
If you use connection pooling, as in a Java EE web application, set the connection pool's maximum size to 1. Access will then be serialized.