OpenCV -- safe to enqueue the same function with different data on same Stream? - c++

I'm trying to optimise my OpenCV code to run on the GPU. The problem is that there seem to be conflicting opinions on what is and isn't safe to run on the GPU.
In the thread "how to use gpu::Stream in OpenCV?", the answer states:
Currently, you may face problems if same operation is enqueued twice with different data to different streams.
I would be happy to solve this by enqueuing these operations onto the same stream. However, in the document here http://on-demand.gputechconf.com/gtc/2013/webinar/gtc-express-itseez-opencv-webinar.pdf the author writes (slide 28):
Current limitation:
– Unsafe to enqueue the same GPU operation multiple times
And he shows an example in which he states that it's unsafe to enqueue the same operation on the same stream, too.
I am confused -- would it be safe for me to enqueue the same operation on the same Stream, or not? Does anyone know?
Intuitively, I would have thought it to be OK, since the same stream would, I imagine, run in serial, so the two functions would never try to concurrently access the same data. But I'd really like confirmation before I implement something.
Thank you for your help!

Related

mutex vs. lock-free when using threads for audio signal processing

I'm trying to build an Audio Spectrum Analyzer in C++. I'm trying to avoid frameworks (e.g. JUCE); however, I will use FFTW3 to calculate the FFT I need. I use WASAPI to obtain audio data.
What I've got so far:
I use WASAPI to get raw data (from the audio buffer) of the default output device via loopback mode.
I wrote some OpenGL code that lets me illustrate data in different ways.
What's missing:
I still need to perform the FFT and merge everything together.
What my problem is:
I'm not sure how to handle thread communication, or more precisely how to deal with thread safety in this application. My idea for the setup is this: I use a FIFO queue with fixed length (circular buffer?) and fill it with the WASAPI data whenever it's ready (Thread 1). Another thread (Thread 2) reads the queue and performs the FFT. Frequencies are stored in an array and drawn by a third thread that runs the OpenGL code (Thread 3).
Does it make sense to use lock-free implementations of a FIFO queue and array, or should I use mutexes to guarantee thread safety? Would it make sense to lock while performing read/write operations on these data structures, and to block Threads 2 and 3 while no data is available?
I'm sorry if this is trivial to you, but I'm kind of new to multithreading.
Thank you very much.
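
For the hand-off from Thread 1 to Thread 2 described above, one lock-free option is a fixed-size ring buffer with two atomic indices. The following is a minimal sketch, not a definitive implementation. It assumes exactly one writing thread and one reading thread, and the names `SpscRing`, `try_push`, `try_pop` and `spsc_demo` are hypothetical; nothing here comes from WASAPI or FFTW3.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>

// Hypothetical single-producer/single-consumer ring buffer.
// Safe only with exactly one pushing thread and one popping thread.
// One slot is always left unused, so it holds at most N - 1 items.
template <typename T, std::size_t N>
class SpscRing {
public:
    // Called by the producer only. Returns false when the buffer is full.
    bool try_push(const T& value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t next = (head + 1) % N;
        if (next == tail_.load(std::memory_order_acquire)) return false;  // full
        slots_[head] = value;
        head_.store(next, std::memory_order_release);  // publish the slot
        return true;
    }

    // Called by the consumer only. Returns false when the buffer is empty.
    bool try_pop(T& out) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire)) return false;  // empty
        out = slots_[tail];
        tail_.store((tail + 1) % N, std::memory_order_release);  // free the slot
        return true;
    }

private:
    std::array<T, N> slots_{};
    std::atomic<std::size_t> head_{0};  // next slot to write
    std::atomic<std::size_t> tail_{0};  // next slot to read
};

// Demo: one thread pushes the values 1..count, the caller pops and sums them.
long long spsc_demo(int count) {
    SpscRing<int, 64> ring;
    std::thread producer([&] {
        for (int i = 1; i <= count; ++i) {
            while (!ring.try_push(i)) std::this_thread::yield();  // buffer full
        }
    });
    long long sum = 0;
    int value = 0;
    for (int received = 0; received < count;) {
        if (ring.try_pop(value)) {
            assert(value == received + 1);  // FIFO order is preserved
            sum += value;
            ++received;
        } else {
            std::this_thread::yield();  // buffer empty
        }
    }
    producer.join();
    return sum;
}
```

The demo spins when the buffer is full or empty. A real capture thread would more likely drop or overwrite samples than wait, and a mutex with a condition variable is usually the simpler choice for the FFT-to-OpenGL hand-off, where blocking is acceptable.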

how to synchronize three dependent threads

If I have
1. mainThread: write data A,
2. Thread_1: read A and write it into a Buffer;
3. Thread_2: read from the Buffer.
How can I synchronize these three threads safely, without much performance loss? Is there an existing solution to use? I use C/C++ on Linux.
IMPORTANT: the goal is to know the synchronization mechanism or algorithms for this particular case, not how mutex or semaphore works.
First, I'd consider the possibility of building this as three separate processes, using pipes to connect them. A pipe is (in essence) a small buffer with locking handled automatically by the kernel. If you do end up using threads for this, most of your time/effort will be spent on creating nearly an exact duplicate of the pipes that are already built into the kernel.
Second, if you decide to build this all on your own, I'd give serious consideration to following a similar model anyway. You don't need to be slavish about it, but I'd still think primarily in terms of a data structure to which one thread writes data, and from which another reads the data. By strong preference, all the necessary thread locking would be built into that data structure, so most of the code in the thread is quite simple: reading, processing, and writing data. The main difference from using normal Unix pipes would be that in this case you can maintain the data in a more convenient format, instead of all the reading and writing being in text.
As such, what I think you're looking for is basically a thread-safe queue. With that, nearly everything else involved borders on trivial (at least the threading part of it does -- the processing involved may not be, but at least building it with multiple threads isn't adding much to the complexity).
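
A thread-safe queue of this kind can be sketched with a mutex and a condition variable. This is only an illustration under stated assumptions: `BlockingQueue`, its `close()` method and `three_stage_demo` are hypothetical names, and the `int` payload stands in for whatever "data A" really is.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical blocking queue: all locking lives inside the class.
template <typename T>
class BlockingQueue {
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(value));
        }
        ready_.notify_one();
    }

    // Marks the end of input; blocked consumers wake up and drain what is left.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Blocks until an item is available. Returns false once the queue is
    // closed and empty.
    bool pop(T& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return closed_ || !items_.empty(); });
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> items_;
    bool closed_ = false;
};

// Demo of the three threads from the question: the caller plays mainThread,
// Thread_1 copies items into the buffer, Thread_2 reads and sums them.
int three_stage_demo(int count) {
    assert(count >= 0);
    BlockingQueue<int> a;       // mainThread -> Thread_1
    BlockingQueue<int> buffer;  // Thread_1 -> Thread_2
    int sum = 0;

    std::thread thread1([&] {
        int item = 0;
        while (a.pop(item)) buffer.push(item);
        buffer.close();
    });
    std::thread thread2([&] {
        int item = 0;
        while (buffer.pop(item)) sum += item;
    });

    for (int i = 1; i <= count; ++i) a.push(i);
    a.close();
    thread1.join();
    thread2.join();
    return sum;  // safe to read: thread2 has been joined
}
```

The thread bodies contain no locking at all, which is the point made above: each stage just reads, processes, and writes.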
It's hard to say how much experience with C/C++ threads you have. I hate to just point to a link but have you read up on pthreads?
https://computing.llnl.gov/tutorials/pthreads/
And for a shorter example with code and simple mutex'es (lock object you need to sync data):
http://students.cs.byu.edu/~cs460ta/cs460/labs/pthreads.html
I would suggest Boost.Thread for this purpose. It is quite a good framework with mutexes and semaphores, and it is multiplatform. Here you can find a very good tutorial about this.
Exactly how to synchronize these threads is another problem and needs more information about your problem.
Edit The simplest solution would be to put two mutexes -- one on A and second on Buffer. You don't have to worry about deadlocks in this particular case. Just:
Enter mutex_A from MainThread; Thread1 waits for mutex to be released.
Leave mutex from MainThread; Thread1 enters mutex_A and mutex_Buffer, starts reading from A and writes it to Buffer.
Thread1 releases both mutexes. MainThread can enter mutex_A and write data, and Thread2 can enter mutex_Buffer and safely read data from Buffer.
This is obviously the simplest solution, and probably can be improved, but without more knowledge about the problem, this is the best I can come up with.
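
The two-mutex scheme above can be sketched as follows. It is an illustration only: the `Sample` struct and the function `two_mutex_demo` are hypothetical, and the two fields exist just so that a torn read would be detectable. Note that the mutexes give mutual exclusion but not ordering, so Thread1 may copy the same value of A twice or miss a value; a condition variable or a queue is needed if every write must be handed over exactly once.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

struct Sample { int x = 0; int y = 0; };  // invariant: x == y unless mid-write

// Returns true if Thread2 never saw a half-written sample in Buffer.
bool two_mutex_demo(int rounds) {
    assert(rounds > 0);
    std::mutex mutex_A;
    std::mutex mutex_Buffer;
    Sample A;
    std::vector<Sample> Buffer;
    bool consistent = true;

    // Thread1: holds both mutexes while copying A into Buffer.
    std::thread thread1([&] {
        for (int i = 0; i < rounds; ++i) {
            std::unique_lock<std::mutex> lock_a(mutex_A, std::defer_lock);
            std::unique_lock<std::mutex> lock_b(mutex_Buffer, std::defer_lock);
            std::lock(lock_a, lock_b);  // acquires both without lock-order deadlock
            Buffer.push_back(A);
        }
    });

    // Thread2: holds mutex_Buffer while checking each copied sample.
    std::thread thread2([&] {
        std::size_t seen = 0;
        while (seen < static_cast<std::size_t>(rounds)) {
            {
                std::lock_guard<std::mutex> lock(mutex_Buffer);
                for (; seen < Buffer.size(); ++seen) {
                    if (Buffer[seen].x != Buffer[seen].y) consistent = false;
                }
            }
            std::this_thread::yield();
        }
    });

    // mainThread: holds mutex_A while writing both fields of A.
    for (int i = 1; i <= rounds; ++i) {
        std::lock_guard<std::mutex> lock(mutex_A);
        A.x = i;
        A.y = i;
    }

    thread1.join();
    thread2.join();
    return consistent;
}
```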

Issue with Mutual Execution of Concurrent Go Routines

In my code there are three concurrent routines. I try to give a brief overview of my code,
Routine 1 {
do something
*Send int to Routine 2
Send int to Routine 3
Print Something
Print Something*
do something
}
Routine 2 {
do something
*Send int to Routine 1
Send int to Routine 3
Print Something
Print Something*
do something
}
Routine 3 {
do something
*Send int to Routine 1
Send int to Routine 2
Print Something
Print Something*
do something
}
main {
routine1
routine2
routine3
}
I want control not to pass to the other goroutines while the code between the two "do something" lines (the code between the two star marks) is executing. For example, while routine 1 is executing the events between the two stars (the sending and printing events), routines 2 and 3 must be blocked, meaning the flow of execution does not pass from routine 1 to routine 2 or 3. After the last print event completes, the flow of execution may pass to routine 2 or 3. Can anybody help me by specifying how I can achieve this? Is it possible to implement the above specification with a WaitGroup? Can anybody show me a simple example of how to implement it using a WaitGroup? Thanks.
NB: This may be a repeat of this question. I tried using that sync lock mechanism; however, perhaps because my code is large, I could not place the lock and unlock calls properly, and it creates a deadlock (or maybe my method is producing the error). Can anybody help me with a simple procedure to achieve this? I give a simple example of my code here, where I want to put the two prints and the sending events inside a mutex (for routine 1) so that routine 2 can't interrupt it. Can you help me with how this is possible? One possible solution was given,
http://play.golang.org/p/-uoQSqBJKS which gives an error.
Why do you want to do this?
The deadlock problem is, if you don't allow other goroutines to be scheduled, then your channel sends can't proceed, unless there's buffering. Go's channels have finite buffering, so you end up with a race condition on draining before they get sent on while full. You could introduce infinite buffering, or put each send in its own goroutine, but it again comes down to: why are you trying to do this; what are you trying to achieve?
Another thing: if you only want to ensure mutual exclusion of the three sets of code between *s, then yes, you can use mutexes. If you want to ensure that no code interrupts your block, regardless of where it was suspended, then you might need to use runtime.LockOSThread and runtime.UnlockOSThread. These are fairly low level and you need to know what you're doing, and they're rarely needed. If you want there to be no other goroutines running, you'll have to have runtime.GOMAXPROCS(1), which was the default before Go 1.5.
The problem in answering your question is that it seems no one understands what your problem really is. I see you're asking repeatedly about roughly the same thing, though no progress has been made. There's no offense in saying this. It's an attempt to help you by suggesting that you reformulate your problem in a way comprehensible to others. As a possible nice side effect, some problems do solve themselves while being explained to others in an understandable way. I've experienced that many times myself.
Another hint could be in the suspicious mix of explicit syncing and channel communication. That doesn't mean the design is necessarily broken. It just doesn't happen in a typical/simple case. Once again, your problem might be atypical/non trivial.
Perhaps it's somehow possible to redesign your problem using only channels. Actually I believe that every problem involving explicit synchronization (in Go) could be coded using only channels. That said, it is true that some problems are written with explicit synchronization very easily. Also, channel communication, as cheap as it is, is not as cheap as most synchronization primitives. But that can be looked at later, when the code works. If the "pattern" for, say, sync.Mutex visibly emerges in the code, it should be possible to switch to it, and that is much easier to do when the code already works and hopefully has tests to watch your steps while making the adjustments.
Try to think about your goroutines like independently acting agents which:
Exclusively own the data received from the channel. The language will not enforce this; you must apply your own discipline.
No longer touch the data they've sent to a channel. This follows from the first rule, but it is important enough to be explicit.
Interact with other agents (goroutines) through data types which encapsulate a whole unit of workflow/computation. This eliminates e.g. your earlier struggle with getting the right number of channel messages before the "unit" is complete.
For every channel they use, it must be absolutely clear beforehand whether the channel must be unbuffered, must be buffered for a fixed number of items, or may be unbounded.
Don't have to think (know) about what other agents are doing beyond getting a message from them, if that is needed for the agent to do its own task - part of the bigger picture.
Using even these few rules of thumb should hopefully produce code which is easier to reason about and which usually doesn't require any other synchronization. (I'm intentionally ignoring performance issues of mission critical applications now.)

Testing concurrent data structure

What are some methods for testing concurrent data structures to make sure they behave correctly when accessed from multiple threads?
All of the other answers have focused on actually testing the code by putting it through its paces and actually running it in one form or another or politely saying "don't do it yourself, use an existing library".
This is great and all, but IMO, the most important (practical tests are important too) test is to look at the code line by line and for every line of code ask "what happens if I get interrupted by another thread here?" Imagine another thread, running just about any of the other lines/functions during this interruption. Do things still stay consistent? When competing for resources, does the other thread[s] block or spin?
This is what we did in school when learning about concurrency and it is a surprisingly effective approach. Bottom line, I feel that taking the time to prove to yourself that things are consistent and work as expected in all states is the first technique you should use when dealing with this stuff.
Concurrent systems are probabilistic and errors are often difficult to replicate. Therefore you need to run various input/output cases, each tested over time (hours, days, etc) in order to detect possible errors.
Testing a concurrent data structure involves examining the container's state before and after expected events such as insert and delete.
Use a pre-existing, pre-tested library that meets your needs if possible.
Make sure that the code has appropriate self-consistency checks (preferably fast sanity checks), and run your code on as many different types of hardware as possible to help narrow down interesting timing problems.
Have multiple people peer review the code, preferably without a pre-explanation of how it's supposed to work. That way they have to grok the code which should help catch more bugs.
Set up a bunch of threads that do nothing but random operations on the data structures and check for consistency at some rate.
Start with the assumption that your calls to access/modify data are not thread safe and use locks to ensure only a single thread can access/modify any part of the data at a time. Only after you can prove to yourself that a specific type of access is safe outside of the lock by multiple threads at once should you move that code outside of the lock.
Assume worst case scenarios, e.g. that your code will stop right in the middle of some pointer manipulation or another critical point, and that another thread will encounter that data in mid-transition. If that would have a bad result, leave it within the lock.
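
The "bunch of threads doing random operations" idea above can be sketched like this. It is a minimal illustration under stated assumptions: `LockedStack` is a hypothetical stand-in for the structure under test, and the consistency check used (final size equals pushes minus successful pops) is just one possible invariant.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <random>
#include <thread>
#include <vector>

// Hypothetical structure under test: a mutex-guarded stack.
class LockedStack {
public:
    void push(int v) {
        std::lock_guard<std::mutex> lock(mutex_);
        data_.push_back(v);
    }
    bool pop(int& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (data_.empty()) return false;
        out = data_.back();
        data_.pop_back();
        return true;
    }
    std::size_t size() {
        std::lock_guard<std::mutex> lock(mutex_);
        return data_.size();
    }

private:
    std::mutex mutex_;
    std::vector<int> data_;
};

// Runs `threads` workers doing random pushes and pops, then checks that
// the final size equals pushes minus successful pops.
bool stress_test(int threads, int ops_per_thread) {
    assert(threads > 0);
    LockedStack stack;
    std::atomic<long> pushes{0};
    std::atomic<long> pops{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&, t] {
            std::mt19937 rng(static_cast<unsigned>(t) + 1);  // per-thread seed
            for (int i = 0; i < ops_per_thread; ++i) {
                int out = 0;
                if (rng() % 2 == 0) {
                    stack.push(i);
                    ++pushes;
                } else if (stack.pop(out)) {
                    ++pops;
                }
            }
        });
    }
    for (auto& w : workers) w.join();
    return stack.size() == static_cast<std::size_t>(pushes.load() - pops.load());
}
```

A passing run does not prove the structure correct; it only fails to find a bug on this machine and schedule, which is why the advice above pairs such tests with line-by-line reasoning.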
I normally test these kinds of things by interjecting sleep() calls at appropriate places in the distributed threads/processes.
For instance, to test a lock, put sleep(2) in all your threads at the point of contention, and spawn two threads roughly 1 second apart. The first one should obtain the lock, and the second should have to wait for it.
Most race conditions can be tested by extending this method, but if your system has too many components it may be difficult or impossible to know every possible condition that needs to be tested.
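
The sleep-based lock test described above might look like the sketch below. The delays are scaled down from the 2 s and 1 s in the text to 200 ms and 100 ms, which is an assumption made to keep the run short; `sleep_lock_demo` is a hypothetical name. The function records an event when a thread enters the critical section (its id) and when it is about to leave (the negated id).

```cpp
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>
#include <vector>

// Returns the sequence of enter (+id) and leave (-id) events.
// With a working lock the two threads must not interleave.
std::vector<int> sleep_lock_demo() {
    std::mutex lock;
    std::vector<int> events;  // only touched while holding `lock`

    auto worker = [&](int id) {
        std::lock_guard<std::mutex> guard(lock);
        events.push_back(id);  // entered the critical section
        // Hold the lock long enough for the other thread to contend for it.
        std::this_thread::sleep_for(std::chrono::milliseconds(200));
        events.push_back(-id);  // about to leave
    };

    std::thread first(worker, 1);
    // Stagger the start so the first thread reaches the lock first.
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    std::thread second(worker, 2);

    first.join();
    second.join();
    return events;
}
```

If the lock were missing or broken, the second thread would enter while the first is still sleeping and the sequence would come out as 1, 2, -1, -2 instead of 1, -1, 2, -2. The ordering relies on timing, so on a heavily loaded machine the stagger may need to be longer.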
Run your concurrent threads for one or a few days and see what happens. (It sounds strange, but finding race conditions is such a complex topic that simply trying it is the best approach.)

How do I concurrently download and convert a binary file using threads?

I have a program that downloads a binary file, from another PC.
I also have a another standalone program that can convert this binary file to a human readable CSV.
I would like to bring the conversion tool "into" the download tool, creating a thread in the download tool that kicks off the conversion code (so it can start converting while it is downloading, reducing the total time of download and convert independently).
I believe I can successfully kick off another thread but how do I synchronize the conversion thread with the main download?
i.e. The conversion catches up with the download, needs to wait for more to download, then start converting again, etc.
Is this similar to the Synchronizing Execution of Multiple Threads ? If so does this mean the downloaded binary needs to be a resource accessed by semaphores?
Am I on the right path, or should I be pointed in another direction before I start?
Any advice is appreciated.
Thank You.
This is a classic case of the producer-consumer problem with the download thread as the producer and the conversion thread as the consumer.
Google around and you'll find an implementation for your language of choice. Here are some from MSDN: How to: Implement Various Producer-Consumer Patterns.
Instead of downloading to a file, you should write the downloaded data to a pipe. The convert thread can read from the pipe and then write the converted output to a file. That will automatically synchronize them.
If you need the original file as well as the converted one, just have the download thread write the data to the file then write the same data to the pipe.
Yes, you undoubtedly need semaphores (or something similar such as an event or critical section) to protect access to the data.
My immediate reaction would be to think primarily in terms of a sequence of blocks though, not an entire file. Second, I almost never use a semaphore (or anything similar) directly. Instead, I would normally use a thread-safe queue, so when the network thread has read a block, it puts a structure into the queue saying where the data is and such. The processing thread waits for an item in the queue, and when one arrives it pops and processes that block.
When it finishes processing a block, it'll typically push the result onto another queue for the next stage of processing (e.g., writing to a file), and (quite possibly) put a descriptor for the processed block onto another queue, so the memory can be re-used for reading another block of input.
At least in my experience, this type of design eliminates a large percentage of thread synchronization issues.
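
The block-based design above, including the second queue that returns buffers for reuse, can be sketched as follows. Everything here is illustrative: `Channel`, `Block` and `pipeline_demo` are hypothetical names, the "download" just fills each block with its index, the "conversion" prints bytes as comma-separated numbers, and using an empty block as the end-of-stream marker is an assumption of this sketch.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal blocking queue (mutex + condition variable).
template <typename T>
class Channel {
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(value));
        }
        ready_.notify_one();
    }
    T pop() {  // blocks until an item arrives
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !items_.empty(); });
        T value = std::move(items_.front());
        items_.pop();
        return value;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> items_;
};

using Block = std::vector<unsigned char>;

// "Downloads" `blocks` blocks of 4 bytes and converts each to one CSV line.
std::string pipeline_demo(int blocks) {
    assert(blocks >= 0);
    Channel<Block> filled;     // download -> convert
    Channel<Block> free_list;  // convert -> download (buffer reuse)
    for (int i = 0; i < 2; ++i) free_list.push(Block(4));  // two reusable buffers

    std::thread downloader([&] {
        for (int b = 0; b < blocks; ++b) {
            Block buf = free_list.pop();  // wait for a free buffer
            buf.assign(4, static_cast<unsigned char>(b));  // stand-in for network data
            filled.push(std::move(buf));
        }
        filled.push(Block{});  // empty block marks end of stream
    });

    std::string csv;
    for (;;) {
        Block buf = filled.pop();  // wait for the next downloaded block
        if (buf.empty()) break;
        for (std::size_t i = 0; i < buf.size(); ++i) {
            csv += std::to_string(static_cast<int>(buf[i]));
            csv += (i + 1 < buf.size()) ? ',' : '\n';
        }
        free_list.push(std::move(buf));  // hand the buffer back for reuse
    }
    downloader.join();
    return csv;
}
```

With only two buffers in circulation, the downloader blocks on `free_list` whenever the converter falls behind, so the reuse queue also acts as back-pressure.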
Edit: I'm not sure about guidelines about how to design a thread-safe queue, but I've posted code for a simple one in a previous answer.
As far as design patterns go, I've seen this called at least "pipeline" and "production line" (though I'm not sure I've seen the latter in much literature).