Sqlite threaded design - c++

I am creating an application which uses SQLite to store some key-value pairs. I don't want to block the main thread while performing SQLite operations, so I have created a separate thread for SQLite operations with a queue for all operations. The main thread tells the SQLite thread to do various operations; for each operation the SQLite thread creates a task and adds it to its queue. The main loop of the SQLite thread takes tasks from the queue and processes them.
Now the issue is that the main thread cannot proceed until it gets data from SQLite. So does it make sense to have a separate thread for SQLite operations?
Can I do this in some better way, so that my main thread remains unblocked and can still get SQLite data?

Yes, you can. For example, you can have a function getResult() that returns a concurrent container, which you then use in your main thread. The worker adds new data as soon as it is extracted from the DB, and on the other side the main thread can take data as soon as it is available instead of waiting until the whole result is ready.
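A minimal sketch of that idea in standard C++17, assuming a hand-rolled mutex-protected queue rather than the PPL/TBB/Boost containers named below. ConcurrentQueue and stream_rows_demo are hypothetical names, and the loop that pushes strings stands in for reading rows with sqlite3_step():

```cpp
#include <cassert>
#include <deque>
#include <mutex>
#include <optional>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical container returned by getResult(): the DB thread pushes rows
// as soon as they are read; the main thread takes whatever is ready.
template <typename T>
class ConcurrentQueue {
public:
    void push(T value) {
        std::lock_guard<std::mutex> lock(mutex_);
        items_.push_back(std::move(value));
    }

    // Non-blocking: returns std::nullopt if no row is ready yet.
    std::optional<T> try_pop() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.empty()) return std::nullopt;
        T value = std::move(items_.front());
        items_.pop_front();
        return value;
    }

    // Called by the DB thread after the last row has been pushed.
    void close() {
        std::lock_guard<std::mutex> lock(mutex_);
        closed_ = true;
    }

    // True once the producer has closed the queue and every row was taken.
    bool finished() {
        std::lock_guard<std::mutex> lock(mutex_);
        return closed_ && items_.empty();
    }

private:
    std::mutex mutex_;
    std::deque<T> items_;
    bool closed_ = false;
};

std::vector<std::string> stream_rows_demo() {
    ConcurrentQueue<std::string> rows;
    std::thread db([&rows] {
        for (int i = 0; i < 5; ++i) rows.push("row " + std::to_string(i));  // stand-in for sqlite3_step()
        rows.close();
    });
    std::vector<std::string> received;
    while (!rows.finished()) {
        if (auto row = rows.try_pop()) received.push_back(*row);
        // ... the main thread is free to do other work here ...
        std::this_thread::yield();
    }
    db.join();
    return received;
}
```

Because close() is only called after the last push, finished() cannot become true while rows are still in flight, so the main thread never misses data.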
For concurrent containers you can try the following libraries: PPL, TBB or Boost.Lockfree.
You can also use event-driven programming by sending events from your worker thread to your main thread. Boost.Signals could help in such a case.
One more thing: you can use asynchronous programming, for example with PPL's concurrency::task. You create a task to get the result from the DB and attach a continuation (task::then) to handle this result. Nothing blocks.
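PPL is Microsoft-specific, and standard std::future has no then() continuation, so the sketch below swaps in a related portable technique: start the query with std::async and have the main loop check the future with a zero timeout, running the "continuation" only when the result is ready. load_value is a hypothetical stand-in for the SQLite lookup:

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <string>
#include <thread>

// Hypothetical query standing in for a real SQLite lookup.
std::string load_value(const std::string& key) {
    std::this_thread::sleep_for(std::chrono::milliseconds(20));  // pretend DB latency
    return "value-for-" + key;
}

// Non-blocking readiness check: a zero timeout never waits.
template <typename T>
bool is_ready(const std::future<T>& f) {
    return f.wait_for(std::chrono::seconds(0)) == std::future_status::ready;
}

// Starts the lookup, keeps doing other work, and handles the result
// (the "continuation") only once it has arrived.
std::string async_lookup_demo(int& other_work_done) {
    std::future<std::string> pending =
        std::async(std::launch::async, load_value, std::string("answer"));
    other_work_done = 0;
    while (!is_ready(pending)) {
        ++other_work_done;  // the main loop keeps running meanwhile
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    return "handled " + pending.get();  // continuation runs on the main thread
}
```

Unlike a PPL continuation, the handler here runs on the main thread, which is usually what you want when the result feeds the UI.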
I believe there are other variants as well, so it is up to you to choose what is best suited for the task.

Related

How to execute code asynchronously without creating new threads

I am using Qt SQL, which is a blocking API, so I have to execute SQL code in a separate thread (QtConcurrent::run) and return a (Q)Future.
Something like this:
QFuture<QString> future = QtConcurrent::run([]() -> QString { /* some SQL code */ return QString(); });
auto watcher = new QFutureWatcher<QString>();
// Connect before setFuture() so that the finished() signal cannot be missed
connect(watcher, &QFutureWatcher<QString>::finished,
        [watcher]() { /* code to execute after the future is finished, e.g. use watcher->result() */ watcher->deleteLater(); });
watcher->setFuture(future);
But I learned that threading is costly: every context switch is expensive. So it looks like a waste of CPU to create a new thread just to wait for a result from the MySQL server. My application is going to run on a single-core virtual machine on Google Cloud anyway. Is there any way I can execute Qt SQL code asynchronously without creating a new thread?
I was also wondering how other APIs like Qt Networking implement an asynchronous API without creating a new thread. Or am I wrong, and they do create new threads under the hood?
Many threaded applications run on a single core. Flushing cache to run on a separate core is also expensive. Use the right tool for the job. There's nothing wrong with threads.
That said, if you really want to run on a single thread use a workqueue to keep track of async task progress. The libevent library does this for you, but there are others. You just run a polling loop adding work onto the queue and executing callbacks when a task needs attention or completes.
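A minimal single-threaded sketch of such a work queue, assuming each task exposes a hypothetical poll() function that reports whether it has finished (a library like libevent would instead wake you on real I/O readiness). Task, WorkQueue and work_queue_demo are illustrative names:

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// One task: polled repeatedly until it reports completion, then its
// callback runs once. Everything happens on the calling thread.
struct Task {
    std::function<bool()> poll;         // returns true when the task is done
    std::function<void()> on_complete;  // callback invoked after completion
};

class WorkQueue {
public:
    void add(Task t) { tasks_.push_back(std::move(t)); }

    // Round-robin polling loop; returns when every task has completed.
    void run() {
        while (!tasks_.empty()) {
            Task t = std::move(tasks_.front());
            tasks_.pop_front();
            if (t.poll()) {
                t.on_complete();
            } else {
                tasks_.push_back(std::move(t));  // not done yet: check again later
            }
        }
    }

private:
    std::deque<Task> tasks_;
};

// Two hypothetical operations: A needs three polls, B finishes at once.
std::vector<std::string> work_queue_demo() {
    std::vector<std::string> log;
    int a = 0, b = 0;
    WorkQueue q;
    q.add({[&a] { return ++a >= 3; }, [&log] { log.push_back("A done"); }});
    q.add({[&b] { return ++b >= 1; }, [&log] { log.push_back("B done"); }});
    q.run();
    return log;
}
```

In a real program poll() would check a non-blocking socket or driver state, and the loop would sleep in select/epoll between rounds instead of spinning.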
By using QtConcurrent::run you have already solved one problem, the cost of creating threads, because it uses a thread pool.
When it comes to context switches, you could first try to measure them with perf stat, and then optimize depending on the situation. If it's just simple queries, the vast majority of context switches probably come from the system, not from your app.
Doing something asynchronously means that you can start a task and move forward with your current code without waiting for its result. But usually such a task, e.g. an SQL query, will spawn a thread/process or make a request to the OS.
Qt Networking, for example, issues a read request and the OS signals (via epoll) when data arrives. But on a single core the OS will interrupt your thread anyway.
If you have many small queries, you could try to optimize them so that you make fewer queries, or add caching.

Can I use QTimer to replace QThread?

More precisely, the question should be:
What's the difference between connecting the signal QTimer::timeout to my working function and creating a worker thread with QThread?
I am writing a program which receives streaming data in the main thread (the signal is generated by QIODevice::readyRead()) and processes it concurrently. For now I start a QTimer that constantly fires QTimer::timeout, and the signal is connected to a working function in the main thread which does the data processing. This is how I achieve the concurrency.
I wonder if this approach is different from creating another thread with QThread, since the idea I've found in this topic is very similar to what I've done. The only difference is that the accepted answer creates another thread and moves the timer and worker class onto it. Apart from that, I can't see any need for a thread in my case.
In my case (receiving data in the main thread and processing it concurrently), am I doing OK using QTimer, or should I create a QThread? I am quite new to multi-threading, so if I misunderstand something, please correct me. Thank you.
[Edit]:
I don't know what the difference/advantage is of creating a new thread to process the data. For now, everything is done in one thread: I keep storing data in a queue and dequeue it item by item in a function triggered by QTimer::timeout.
What's the difference between connecting the signal QTimer::timeout to my working function and creating a worker thread with QThread?
When you connect a signal/slot pair between objects that have the same thread affinity, the connection is direct (with the default Qt::AutoConnection). In your case, that means the main thread creates the timer and also owns the slot, so the signal is emitted in the main thread and also processed in the main thread.
When you connect a signal/slot pair between objects that have different thread affinity, the connection is queued. That means signal emission and slot execution run in different threads.
You are not really achieving concurrency: the timer signal and the processing slot execute sequentially in the main thread.
So here are your options:
If you want to process data in the main thread, the current code is OK.
If you want to emit timeout in the main thread and process data in a different thread, create a new class with the processing method and call moveToThread() on an object of that class.
The link you provided really describes a different situation. In your case (correct me if I am wrong), you process data only when data is available, not just after a specified time. Your situation is much like the traditional producer/consumer problem. My proposal is to not use QTimer at all. Instead, create a new class with a slot which will process the data. Then emit a signal from the main thread when data is available, and connect it to the processing slot. You will achieve real concurrency. In this case you will need to implement locking for shared data access; that is easy in Qt, you can just use QMutexLocker.
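The same producer/consumer shape can be sketched with standard C++ primitives instead of Qt: std::lock_guard/std::unique_lock play the role of QMutexLocker, and a std::condition_variable lets the consumer sleep until data arrives, so no timer is needed. DataQueue and producer_consumer_demo are hypothetical names, and summing integers stands in for the real processing:

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>

// Producer/consumer queue: the consumer blocks until data arrives.
class DataQueue {
public:
    void push(int chunk) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push_back(chunk);
        }
        cv_.notify_one();
    }

    void close() {
        {
            std::lock_guard<std::mutex> lock(m_);
            closed_ = true;
        }
        cv_.notify_all();
    }

    // Blocks until a chunk is available; returns false once closed and drained.
    bool pop(int& out) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty() || closed_; });
        if (q_.empty()) return false;
        out = q_.front();
        q_.pop_front();
        return true;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<int> q_;
    bool closed_ = false;
};

// The "main thread" produces chunks (as readyRead() would); a worker consumes them.
long long producer_consumer_demo() {
    DataQueue queue;
    long long sum = 0;  // only touched by the worker until join()
    std::thread worker([&] {
        int chunk;
        while (queue.pop(chunk)) sum += chunk;  // stand-in for real processing
    });
    for (int i = 1; i <= 100; ++i) queue.push(i);
    queue.close();
    worker.join();
    return sum;
}
```

In the Qt version, the queued signal/slot connection does the queueing for you; an explicit queue like this is mainly useful when the worker must batch or prioritize data.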
First, a little background:
One of the fundamental ideas behind threads is that a thread can only do one thing at a time. It may be updating the GUI, or processing data, or communicating with a remote server, but it can't be doing all those things at once.
That's where multi-threading comes in. You probably want your computer to be doing many things at once (watching videos, browsing the web, listening to music, and writing code all at the same time). The computer allows you to do that by scheduling each of these tasks on separate threads and switching between them at periodic intervals.
In the old days, before multi-core processors, this was achieved solely by multitasking (the processor would interrupt the currently executing thread, switch to another thread context and execute the other thread for a while before switching again). With modern processors, you can have several threads executing at the EXACT same time, one on each core. This is typically referred to as multiprocessing.
Now, back to your question:
A thread can only do one thing at a time and, if you use a timer, you are using the main (AKA GUI) thread to process your data. This thread is typically responsible for responding to OS events and updating the GUI (hence GUI thread). If you don't have a lot of data to process, it's typically OK to do so on the GUI thread. However, if the data processing time has a chance of growing, it is recommended to execute such processing on a separate thread to make sure that the UI remains responsive (and so that you don't get the annoying "Your program is not responding" message from the OS). Basically, if data processing can take longer than ~200ms, it is recommended to execute the processing on a separate thread so that the user doesn't feel like the GUI is "stuck".

Question on using multithreading to periodically and forcefully check for updates on software

I'm working on an application that has a main thread performing some work (message loop of the UI etc.), but I would also like a second thread, which would periodically test if there are any updates available to download. I would also like the possibility for the main thread to ask the secondary thread to force checking for updates, and for the secondary thread to ask the main thread for confirmation on downloading updates.
I don't have that much experience with IPC and multithreading in real life situations, so I'm not sure how I should go about designing this. I would like to eventually have this work on both Windows and POSIX, but let us focus on POSIX for now. Here's my idea so far:
Secondary thread pseudocode:
repeat forever:
    check_for_updates()
    if (are_any_updates()) {
        put the list of available updates on some message queue
        send signal SIGUSR1 to main thread
        wait for response from that message queue
        if (response is positive) download_updates()
    }
    unblock signal SIGUSR1 on secondary thread
    Sleep(one hour)
    block signal SIGUSR1
    if (any_signal_was_received_while_sleeping)
        any_signal_was_received_while_sleeping := false
        Sleep(one more hour)
SIGUSR1 handler on secondary thread (main thread has requested us to check for updates):
    block signal SIGUSR1 (making sure we don't get a signal inside the signal handler)
    any_signal_was_received_while_sleeping := true
    check_for_updates()
    ...
    unblock signal SIGUSR1
Basically, the main thread uses SIGUSR1 to ask the secondary thread to force a check for updates, while the secondary thread uses SIGUSR1 to ask the main thread to look into the message queue for the available updates and to confirm whether they should be downloaded.
I'm not sure if this is a good design or if it would even work properly. One of my problems is handling SIGUSR1 received in the main thread: it's a pretty big application and I'm not really sure when the right time is to block and unblock it (I assume it should be somewhere in the message loop).
Any opinion is appreciated, including advice on which IPC features I should use on Windows (maybe RPC instead of signals?). I could completely remove the use of the message queue if I settled on threads, but I might consider using processes instead. I'll clearly use threads on Windows, but I'm not sure about POSIX yet.
You should strongly consider using boost::thread to solve your problem. It is far more comprehensible than directly using posix and is cross platform. Take the time to use a better tool and you will end up saving yourself a great deal of effort.
In particular I think you will find that a condition variable would neatly facilitate your simple interaction.
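A minimal sketch of the update thread built on a condition variable instead of signals, using C++11 std::thread/std::condition_variable (boost::thread offers equivalents). The thread sleeps with a timeout (one hour in practice) but wakes immediately when the main thread calls forceCheck(). UpdateChecker is a hypothetical class and checkForUpdates() a placeholder for the real network check:

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

class UpdateChecker {
public:
    explicit UpdateChecker(std::chrono::milliseconds interval)
        : interval_(interval), thread_([this] { run(); }) {}

    ~UpdateChecker() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stop_ = true;
        }
        cv_.notify_one();
        thread_.join();
    }

    // Called from the main thread to force an immediate check.
    void forceCheck() {
        {
            std::lock_guard<std::mutex> lock(m_);
            force_ = true;
        }
        cv_.notify_one();
    }

    int checks() const { return checks_.load(); }

private:
    void run() {
        std::unique_lock<std::mutex> lock(m_);
        while (!stop_) {
            // Wakes early if forced or stopped; otherwise times out after interval_.
            cv_.wait_for(lock, interval_, [this] { return force_ || stop_; });
            if (stop_) break;
            force_ = false;
            lock.unlock();
            checkForUpdates();  // slow work runs without holding the lock
            lock.lock();
        }
    }

    void checkForUpdates() { ++checks_; }  // placeholder for the real check

    std::chrono::milliseconds interval_;
    std::mutex m_;
    std::condition_variable cv_;
    bool force_ = false;
    bool stop_ = false;
    std::atomic<int> checks_{0};
    std::thread thread_;  // declared last so it starts after the other members exist
};
```

A forced request that arrives while a check is already running is simply remembered in force_ and handled on the next loop iteration, so no request is lost and no signal masking is needed.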
EDIT:
You can do almost anything with the correct use of mutexes and condition variables. Another piece of advice: encapsulate your threads inside class objects. This allows you to write functions that act on the thread and its data. In your case, the main thread could have a method like requestUpdateConfirmation(); inside it you can block the calling thread and wait for the main thread to deal with the request before releasing the caller.
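One possible shape for requestUpdateConfirmation(), sketched with std::promise/std::future: the update thread queues a request and blocks on the future, while the main thread answers from its message loop without ever blocking. MainThreadMailbox, processPending and confirmation_demo are hypothetical names, and the polling loop stands in for a real UI message loop:

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <deque>
#include <functional>
#include <future>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

class MainThreadMailbox {
public:
    // Called from the update thread; blocks only that thread until answered.
    bool requestUpdateConfirmation(std::string updates) {
        std::promise<bool> answer;
        std::future<bool> result = answer.get_future();
        {
            std::lock_guard<std::mutex> lock(m_);
            pending_.emplace_back(std::move(updates), std::move(answer));
        }
        return result.get();
    }

    // Called periodically from the main thread's message loop; never blocks.
    void processPending(const std::function<bool(const std::string&)>& decide) {
        std::deque<std::pair<std::string, std::promise<bool>>> batch;
        {
            std::lock_guard<std::mutex> lock(m_);
            batch.swap(pending_);
        }
        for (auto& [updates, answer] : batch) answer.set_value(decide(updates));
    }

private:
    std::mutex m_;
    std::deque<std::pair<std::string, std::promise<bool>>> pending_;
};

bool confirmation_demo() {
    MainThreadMailbox mailbox;
    std::atomic<bool> finished{false};
    bool downloaded = false;
    std::thread updater([&] {
        if (mailbox.requestUpdateConfirmation("v2.0")) downloaded = true;  // would download here
        finished = true;
    });
    while (!finished) {  // stand-in for the main thread's message loop
        mailbox.processPending([](const std::string& u) { return u == "v2.0"; });
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    updater.join();
    return downloaded;
}
```

The decide callback is where a real application would show its confirmation dialog; the update thread stays parked on the future until that returns.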

I want to wait on both a file descriptor and a mutex, what's the recommended way to do this?

I would like to spawn off threads to perform certain tasks, and use a thread-safe queue to communicate with them. I would also like to be doing IO to a variety of file descriptors while I'm waiting.
What's the recommended way to accomplish this? Do I have to create an inter-thread pipe and write to it when the queue goes from no elements to some elements? Isn't there a better way?
And if I have to create the inter-thread pipe, why don't more libraries that implement shared queues allow you to create the shared queue and inter-thread pipe as a single entity?
Does the fact I want to do this at all imply a fundamental design flaw?
I'm asking this about both C++ and Python. And I'm mildly interested in a cross-platform solution, but primarily interested in Linux.
For a more concrete example...
I have some code which will be searching for stuff in a filesystem tree. I have several communications channels open to the outside world through sockets. Requests that may (or may not) result in a need to search for stuff in the filesystem tree will be arriving.
I'm going to isolate the code that searches for stuff in the filesystem tree in one or more threads. I would like to take requests that result in a need to search the tree and put them in a thread-safe queue of things to be done by the searcher threads. The results will be put into a queue of completed searches.
I would like to be able to service all the non-search requests quickly while the searches are going on. I would like to be able to act on the search results in a timely fashion.
Servicing the incoming requests would generally imply some kind of event-driven architecture that uses epoll. The queue of disk-search requests and the return queue of results would imply a thread-safe queue that uses mutexes or semaphores to implement the thread safety.
The standard way to wait on an empty queue is to use a condition variable. But that won't work if I need to service other requests while I'm waiting. Either I end up polling the results queue all the time (and delaying the results by half the poll interval, on average), or blocking and not servicing requests.
Whenever one uses an event-driven architecture, one is required to have a single mechanism to report event completion. On Linux, if one is using files, one is required to use something from the select or poll family, meaning that one is stuck with using a pipe to initiate all non-file-related events.
Edit: Linux has eventfd and timerfd. These can be added to your epoll list and used to break out of the epoll_wait when either triggered from another thread or on a timer event respectively.
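A Linux-only sketch of the eventfd approach: the eventfd is registered with epoll next to the sockets, and a worker thread wakes the loop by writing to it. eventfd_wakeup_demo is a hypothetical name, and the comment marks where a real program would drain its result queue:

```cpp
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>
#include <cassert>
#include <cstdint>
#include <thread>

// Returns the counter value read from the eventfd, or -1 on error.
long long eventfd_wakeup_demo() {
    int efd = eventfd(0, EFD_CLOEXEC);
    if (efd < 0) return -1;
    int ep = epoll_create1(EPOLL_CLOEXEC);
    if (ep < 0) { close(efd); return -1; }

    epoll_event ev{};
    ev.events = EPOLLIN;
    ev.data.fd = efd;
    epoll_ctl(ep, EPOLL_CTL_ADD, efd, &ev);  // sockets would be added the same way

    std::thread worker([efd] {
        uint64_t one = 1;  // "a result is ready": adds 1 to the eventfd counter
        ssize_t written = write(efd, &one, sizeof one);
        (void)written;
    });

    long long value = -1;
    epoll_event ready[8];
    int n = epoll_wait(ep, ready, 8, 5000);  // blocks until woken (5 s safety timeout)
    for (int i = 0; i < n; ++i) {
        if (ready[i].data.fd == efd) {
            uint64_t count = 0;  // reading resets the counter to zero
            if (read(efd, &count, sizeof count) == static_cast<ssize_t>(sizeof count))
                value = static_cast<long long>(count);
            // ... drain the worker's result queue here ...
        }
    }
    worker.join();
    close(ep);
    close(efd);
    return value;
}
```

Several writes before the loop wakes are coalesced into one counter value, so the reader should always drain the whole result queue rather than assume one item per wakeup.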
There is another option, and that is signals. One can use fcntl to modify the file descriptor so that a signal is emitted when it becomes active (O_ASYNC together with F_SETOWN, and on Linux optionally F_SETSIG). The signal handler may then push a file-ready message onto any type of queue of your choosing. This may be a simple semaphore or a mutex/condvar-driven queue. Since one is no longer using select/poll, one no longer needs a pipe to queue non-file-based messages.
Health warning: I have not tried this and although I cannot see why it will not work, I don't really know the performance implications of the signal approach.
Edit: Manipulating a mutex in a signal handler is probably a very bad idea.
I've solved this exact problem using what you mention, pipe() and libevent (which wraps epoll). The worker thread writes a byte to its pipe FD when its output queue goes from empty to non-empty. That wakes up the main I/O thread, which can then grab the worker thread's output. This works great and is actually very simple to code.
You have the Linux tag so I am going to throw this out: POSIX Message Queues do all this, which should fulfill your "built-in" request if not your less desired cross-platform wish.
The thread-safe synchronization is built in. You can have your worker threads block on reading the queue. Alternatively, MQs can use mq_notify() to spawn a new thread (or signal an existing one) when a new item is put in the queue. And since it looks like you are going to be using select(): on Linux, an MQ's identifier (mqd_t) is a file descriptor and can be used with select (this is not portable).
It seems nobody has mentioned this option yet:
Don't run select/poll/etc. in your "main thread". Start a dedicated secondary thread which does the I/O and pushes notifications into your thread-safe queue (the same queue which your other threads use to communicate with the main thread) when I/O operations complete.
Then your main thread just needs to wait on the notification queue.
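A sketch of this layout in C++: a dedicated I/O thread waits with poll() and turns readiness into messages on the same condition-variable queue that worker threads use, so the main thread waits on a single queue. A pipe stands in for a real socket; Notifications and io_thread_demo are hypothetical names:

```cpp
#include <poll.h>
#include <unistd.h>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// The one queue the main thread blocks on.
class Notifications {
public:
    void push(std::string note) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push_back(std::move(note));
        }
        cv_.notify_one();
    }
    std::string wait_pop() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty(); });
        std::string note = std::move(q_.front());
        q_.pop_front();
        return note;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<std::string> q_;
};

std::string io_thread_demo() {
    int fds[2];
    if (pipe(fds) != 0) return "";
    Notifications inbox;

    // Dedicated I/O thread: converts fd readiness into queue notifications.
    std::thread io([&inbox, rfd = fds[0]] {
        pollfd pfd{rfd, POLLIN, 0};
        if (poll(&pfd, 1, 5000) == 1) {
            char buf[64];
            ssize_t n = read(rfd, buf, sizeof buf);
            if (n > 0) inbox.push("io: " + std::string(buf, static_cast<std::size_t>(n)));
        }
    });
    std::thread searcher([&inbox] { inbox.push("search done"); });  // a worker result

    ssize_t w = write(fds[1], "ping", 4);  // simulate data arriving on the socket
    (void)w;

    // The main thread only ever waits on the queue.
    std::string a = inbox.wait_pop();
    std::string b = inbox.wait_pop();
    io.join();
    searcher.join();
    close(fds[0]);
    close(fds[1]);
    return a < b ? a + "|" + b : b + "|" + a;  // order-independent result
}
```

The trade-off is one extra thread hop for every I/O event, in exchange for a main thread that needs no file-descriptor wakeup tricks at all.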
Duck's and twk's are actually better answers than doron's (the one selected by the OP), in my opinion. doron suggests writing to a message queue from within the context of a signal handler, and states that the message queue can be "any type of queue." I would strongly caution you against this since many C library/system calls cannot safely be called from within a signal handler (see async-signal-safe).
In particular, if you choose a queue protected by a mutex, you should not access it from a signal handler. Consider this scenario: your consumer thread locks the queue to read it. Immediately after, the kernel delivers the signal to notify you that a file descriptor now has data on it. Your signal handler runs in the consumer thread (not necessarily, but it can) and tries to put something on your queue. To do this, it first has to take the lock. But that thread already holds the lock, so you are now deadlocked.
select/poll is, in my experience, the only viable solution for an event-driven program on UNIX/Linux. I wish there were a better way inside a multithreaded program, but you need some mechanism to "wake up" your consumer thread. I have yet to find a method that does not involve a system call (since the consumer thread is on a wait queue inside the kernel during any blocking call such as select).
EDIT: I forgot to mention one Linux-specific way to handle signals when using select/poll: signalfd(2). You get a file descriptor you can select/poll on, and your handling code runs normally instead of in a signal handler's context.
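A Linux-only sketch of signalfd(2): SIGUSR1 is blocked for normal delivery and read from a descriptor that poll() watches, so the handling code is ordinary code where taking a mutex is safe. signalfd_demo is a hypothetical name, and raise() stands in for a signal arriving from elsewhere:

```cpp
#include <sys/signalfd.h>
#include <poll.h>
#include <pthread.h>
#include <signal.h>
#include <unistd.h>
#include <cassert>

// Returns the received signal number, or -1 on error.
int signalfd_demo() {
    sigset_t mask;
    sigemptyset(&mask);
    sigaddset(&mask, SIGUSR1);
    // Block normal delivery; in a multithreaded program do this before spawning threads.
    if (pthread_sigmask(SIG_BLOCK, &mask, nullptr) != 0) return -1;

    int sfd = signalfd(-1, &mask, SFD_CLOEXEC);
    if (sfd < 0) return -1;

    raise(SIGUSR1);  // stays pending because it is blocked

    int result = -1;
    pollfd pfd{sfd, POLLIN, 0};  // would sit in the same poll set as the sockets
    if (poll(&pfd, 1, 1000) == 1 && (pfd.revents & POLLIN)) {
        signalfd_siginfo info{};
        if (read(sfd, &info, sizeof info) == static_cast<ssize_t>(sizeof info))
            result = static_cast<int>(info.ssi_signo);  // normal code: locking is fine here
    }
    close(sfd);
    pthread_sigmask(SIG_UNBLOCK, &mask, nullptr);  // the signal was consumed by read()
    return result;
}
```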
This is a very common problem, especially when you are developing server-side network programs. The main loop of most Linux server-side programs looks like this:
epoll_add(serv_sock);
while(1){
    ret = epoll_wait();
    foreach(ret as fd){
        req = fd.read();
        resp = proc(req);
        fd.send(resp);
    }
}
It is a single-threaded (the main thread), epoll-based server framework. The problem is that it is single-threaded, not multi-threaded: it requires that proc() never blocks or runs for a significant time (say 10 ms in common cases).
If proc() can ever run for a long time, WE NEED MULTIPLE THREADS, executing proc() in a separate thread (the worker thread).
We can submit tasks to the worker thread without blocking the main thread by using a mutex-based message queue; it is fast enough.
epoll_add(serv_sock);
while(1){
    ret = epoll_wait();
    foreach(ret as fd){
        req = fd.read();
        queue.add_job(req); // fast, non-blocking
    }
}
Then we need a way to obtain the task result from a worker thread. How? One option is to just check the message queue directly, before or after epoll_wait():
epoll_add(serv_sock);
while(1){
    ret = epoll_wait(); // may block for up to 10 ms
    resp = queue.check_result(); // fast, non-blocking
    foreach(ret as fd){
        req = fd.read();
        queue.add_job(req); // fast, non-blocking
    }
}
However, the check only runs after epoll_wait() returns, and epoll_wait() blocks for its whole timeout (e.g. 10 ms in common cases) if none of the file descriptors it waits on are active.
For a server, 10 ms is quite a long time! Can we signal epoll_wait() to end immediately when task result is generated?
Yes! I will describe how it is done in one of my open source projects:
Create a pipe for all worker threads, and have epoll wait on that pipe as well. Once a task result is generated, the worker thread writes one byte into the pipe, and epoll_wait() returns almost immediately (a Linux pipe has 5 us to 20 us latency).
In my project SSDB (a Redis-protocol-compatible on-disk NoSQL database), I created a SelectableQueue for passing messages between the main thread and worker threads. As its name suggests, SelectableQueue has a file descriptor that epoll can wait on.
SelectableQueue: https://github.com/ideawu/ssdb/blob/master/src/util/thread.h#L94
Usage in main thread:
epoll_add(serv_sock);
epoll_add(queue->fd());
while(1){
    ret = epoll_wait();
    foreach(ret as fd){
        if(fd is queue){
            sock, resp = queue->pop_result();
            sock.send(resp);
        }
        if(fd is client_socket){
            req = fd.read();
            queue->add_task(fd, req);
        }
    }
}
Usage in worker thread:
fd, req = queue->pop_task();
resp = proc(req);
queue->add_result(fd, resp);
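A sketch of a queue with those properties, written from the description above and NOT taken from SSDB's source (see the link for the real class): a mutex-protected deque plus a pipe whose read end becomes readable whenever an item is pushed. SelectableQueue here and selectable_queue_demo are hypothetical:

```cpp
#include <poll.h>
#include <unistd.h>
#include <cassert>
#include <deque>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

template <typename T>
class SelectableQueue {
public:
    SelectableQueue() {
        if (pipe(fds_) != 0) fds_[0] = fds_[1] = -1;
    }
    ~SelectableQueue() {
        if (fds_[0] >= 0) { close(fds_[0]); close(fds_[1]); }
    }
    SelectableQueue(const SelectableQueue&) = delete;
    SelectableQueue& operator=(const SelectableQueue&) = delete;

    int fd() const { return fds_[0]; }  // register this with epoll/poll

    // Worker side: enqueue first, then write one byte, so a readable byte
    // always means at least one item is waiting.
    void push(T item) {
        {
            std::lock_guard<std::mutex> lock(m_);
            items_.push_back(std::move(item));
        }
        char byte = 1;
        ssize_t n = write(fds_[1], &byte, 1);  // blocks only if the pipe buffer is full
        (void)n;
    }

    // Main-thread side, after fd() is readable: one byte consumed, one item taken.
    bool pop(T& out) {
        char byte;
        if (read(fds_[0], &byte, 1) != 1) return false;
        std::lock_guard<std::mutex> lock(m_);
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop_front();
        return true;
    }

private:
    int fds_[2];
    std::mutex m_;
    std::deque<T> items_;
};

std::string selectable_queue_demo() {
    SelectableQueue<std::string> results;
    std::thread worker([&results] { results.push("resp for req1"); });  // stand-in for proc(req)
    pollfd pfd{results.fd(), POLLIN, 0};  // the real loop would also watch client sockets
    std::string resp;
    if (poll(&pfd, 1, 5000) == 1) results.pop(resp);
    worker.join();
    return resp;
}
```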
C++11 has std::mutex and std::condition_variable. The two can be used to have one thread signal another when a certain condition is met. It sounds to me like you will need to build your solution out of these primitives. If your environment does not yet support these C++11 library features, you can find very similar ones in Boost. Sorry, I can't say much about Python.
One way to accomplish what you're looking to do is by implementing the Observer pattern.
You would register your main thread as an observer with all your spawned threads, and have them notify it when they were done doing what they were supposed to (or updating during their run with the info you need).
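A minimal Observer sketch in C++. One caveat worth keeping in mind: the registered callbacks run on the worker's thread, not the main thread, so here the observer only appends the result to a mutex-protected inbox for the main thread to read later. SearchWorker and observer_demo are hypothetical names, and the "search" is a placeholder string:

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

class SearchWorker {
public:
    using Observer = std::function<void(const std::string&)>;

    void addObserver(Observer obs) {
        std::lock_guard<std::mutex> lock(m_);
        observers_.push_back(std::move(obs));
    }

    void run(const std::string& query) {  // hypothetical search job
        std::string result = "found: " + query;
        notify(result);
    }

private:
    void notify(const std::string& result) {
        std::vector<Observer> copy;
        {
            std::lock_guard<std::mutex> lock(m_);
            copy = observers_;  // don't hold the lock while calling out
        }
        for (auto& obs : copy) obs(result);  // runs on the worker thread
    }

    std::mutex m_;
    std::vector<Observer> observers_;
};

std::vector<std::string> observer_demo() {
    std::mutex inbox_mutex;
    std::vector<std::string> inbox;  // the main thread reads this later
    SearchWorker worker;
    worker.addObserver([&](const std::string& r) {
        std::lock_guard<std::mutex> lock(inbox_mutex);
        inbox.push_back(r);
    });
    std::thread t([&worker] { worker.run("*.txt"); });
    t.join();
    return inbox;
}
```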
Basically, you want to change your approach to an event-driven model.

Processing messages is too slow, resulting in a jerky, unresponsive UI - how can I use multiple threads to alleviate this?

I'm having trouble keeping my app responsive to user actions. Therefore, I'd like to split message processing between multiple threads.
Can I simply create several threads, all reading from the same message queue, and let whichever one is available process each message?
If so, how can this be accomplished?
If not, can you suggest another way of resolving this problem?
You cannot have more than one thread which interacts with the message pump or any UI elements. That way lies madness.
If there are long processing tasks which can be farmed out to worker threads, you can do it that way, but you'll have to use another thread-safe queue to manage them.
If this were later in the future, I would say use the Asynchronous Agents APIs (plug for what I'm working on) in the yet-to-be-released Visual Studio 2010. Given today's tools, however, I would say separate the work: in your message pump you want to do as little work as possible, just enough to identify the message and pass it along to another thread that will process the work (hopefully no thread-local information is needed). Passing it along to another thread means inserting it into a thread-safe queue of some sort, either locked or lock-free, and then setting an event that other threads can watch to pull items from the queue (or just having them pull items directly). You can look at using a 'work stealing queue' with a thread pool for efficiency.
This gets the work off the UI thread. To have the UI thread do additional work (like painting the results of that work), you need to generate a Windows message to wake up the UI thread and check for the results. An easy way to do this is to have another 'work ready' queue of work objects to execute on the UI thread; imagine a queue that looks like this: threadsafe_queue<function<void(void)>>. You can check whether it is non-empty on the UI thread, and if there are work items, execute them inline. You'll want the work objects to be as short-lived as possible and preferably not do any blocking at all.
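A portable sketch of that 'work ready' queue of std::function<void()> objects. Workers post small callables, and the UI thread drains them inline. In a Win32 program the worker would also post a custom window message (for example with PostMessage) to wake the UI thread; that wake-up is left out here, and UiWorkQueue and ui_queue_demo are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

class UiWorkQueue {
public:
    // Called from worker threads.
    void post(std::function<void()> fn) {
        std::lock_guard<std::mutex> lock(m_);
        items_.push_back(std::move(fn));
    }

    // Called on the UI thread; runs everything queued so far. Items run
    // outside the lock, so they may post more work without deadlocking.
    std::size_t drain() {
        std::deque<std::function<void()>> batch;
        {
            std::lock_guard<std::mutex> lock(m_);
            batch.swap(items_);
        }
        for (auto& fn : batch) fn();
        return batch.size();
    }

private:
    std::mutex m_;
    std::deque<std::function<void()>> items_;
};

std::vector<int> ui_queue_demo() {
    UiWorkQueue ui;
    std::vector<int> painted;  // only touched on the "UI thread"
    std::thread worker([&ui, &painted] {
        for (int i = 0; i < 3; ++i) {
            int result = i * i;  // heavy work happens off the UI thread
            ui.post([&painted, result] { painted.push_back(result); });  // short UI-side step
        }
    });
    worker.join();
    ui.drain();  // the UI thread would do this when the wake-up message arrives
    return painted;
}
```

Swapping the whole batch out under the lock keeps the critical section tiny, which matters when the UI thread must stay within a frame budget.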
Another technique that can help if you are still seeing jerky movement or poor responsiveness is to ensure that your thread callback isn't executing longer than 16 ms and that you aren't taking any locks or doing any sort of I/O on the UI thread. There's a series of tools that can help identify these operations; the most freely available is the Windows Performance Toolkit.
Create a separate thread for the long operation, i.e. keep it simple: the issue is some code you are running that takes too long, and that is the code that should run on a separate thread.