Image elaboration and saving using multithreading

I've written software in C++ that processes the video stream coming from a camera, using the OpenCV libraries.
I would like to save the video frames while processing them, so that I can rerun the code offline many times using exactly the same video as input.
I was thinking of using multithreading with the producer/consumer pattern.
My idea is to have one producer (the frame grabber) and two consumers: one that processes the image and one that saves the frames to a file (as video).
I don't have experience with multithreaded programming, so I searched for tutorials on the internet.
All the tutorials I found cover one producer and one consumer, but what I need is slightly different: a producer that sends the same image to both consumers and, after both consumers have finished their work, goes ahead with the next frame. The point is that the producer would have one queue where it stores the frames, while both consumers would need to read each element exactly once from that same queue.
Do you have any suggestions?
Do you think the pattern I've chosen fits my needs?
Thanks.

The producer-consumer pattern works here. In your case, the producer can "produce" each frame twice: first placing it in the processing queue, then placing a second copy in the saving queue, so that each consumer reads from its own queue.
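As a rough illustration of the two-queue idea, here is a minimal C++17 sketch. The names `Frame`, `BlockingQueue` and `run_capture` are made up for this example, and `Frame` is only a stand-in for a real `cv::Mat`; grabbing, processing and saving are reduced to placeholders so the threading structure stays visible.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a captured frame (cv::Mat in the real program).
struct Frame {
    int index;
};

// Minimal unbounded blocking queue (illustrative, not a library type).
template <typename T>
class BlockingQueue {
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(!closed_ && "push after close");
            items_.push(std::move(value));
        }
        ready_.notify_one();
    }

    // Called by the producer when no more items will arrive.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Blocks until an item is available; returns std::nullopt once the
    // queue is closed and drained.
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !items_.empty() || closed_; });
        if (items_.empty()) return std::nullopt;
        T value = std::move(items_.front());
        items_.pop();
        return value;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> items_;
    bool closed_ = false;
};

// Grabs `frame_count` frames and hands a copy of each to both consumers.
// Returns the frame indices seen by the processing and saving consumers.
std::pair<std::vector<int>, std::vector<int>> run_capture(int frame_count) {
    BlockingQueue<Frame> processing_queue;
    BlockingQueue<Frame> saving_queue;
    std::vector<int> processed, saved;

    std::thread processor([&] {
        while (auto frame = processing_queue.pop())
            processed.push_back(frame->index);  // image processing goes here
    });
    std::thread saver([&] {
        while (auto frame = saving_queue.pop())
            saved.push_back(frame->index);  // e.g. write to a video file
    });

    for (int i = 0; i < frame_count; ++i) {
        Frame frame{i};                // frame grabbing goes here
        processing_queue.push(frame);  // first copy
        saving_queue.push(frame);      // second copy
    }
    processing_queue.close();
    saving_queue.close();
    processor.join();
    saver.join();
    return {std::move(processed), std::move(saved)};
}
```

Note that copying a `cv::Mat` only copies the header and shares the pixel data, so if the processing consumer modifies the image in place, the copy handed to the saver should probably be made with `clone()`. The queues here are unbounded; if one consumer is much slower than the camera, a bounded queue would be needed to keep memory in check.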

Related

Tee/passthrough DirectShow data as video source

I have an application that gets video samples from a frame grabber card via DirectShow. The application then does some processing and sends the video signal over a network. I now want to duplicate this video signal such that another DirectShow-enabled software (like Skype) can use the original input signal, too.
I know that you can create Tee filters in DirectShow like the one used to split a video signal for recording and preview. However, as I understand it, this filter is only useful within a single graph, i.e. I cannot use it to forward the video from my process to, e.g., Skype.
I also know that I could write my own video source, but this would run in the process of the consuming application. The problem is that I cannot put the logic of my original application in such a video source filter.
The only solution I could think of is my application writing the frames to a shared memory block and a video source filter reading them from there. Synchronisation would be done using a shared mutex or similar. Could that work? I specifically do not like the synchronisation part.
And more importantly, is there a better solution to solve this problem?

The APIs work as you identified: a video capture application such as Skype requests a video stream without interprocess communication in mind, so there is no IPC involved that would let it consume output generated in another process. Your challenge is to provide this IPC yourself: one application generates the data, and another component extends the existing API (as a virtual video source device), picks up the existing data and delivers it as it is generated.
With video, you have a relatively big stream of data and you want to avoid copying it excessively. File mappings (AKA shared memory) are the right thing to use: you put bytes in one process and they are immediately visible in another. You can synchronize access to the data using named events and mutexes that both processes use collaboratively, for example to signal that a new buffer of data is available or to indicate that a used buffer is no longer in use.

Design pattern(s) or best practice for image acquisition and image processing in a real-time application

First, by "real-time" I mean that processing one image should take 0.1 second or less in this application.
In our application, three threads are running in addition to the main thread: one for image acquisition, a second for image processing, and a third for the robot. The threads share image queues: the camera thread enqueues images, the image processor dequeues them and enqueues the processed images, and the robot thread dequeues the processed images. One restriction you might have noticed is that the processed images must stay in sequence, i.e. in the same order in which they were acquired.
Is there any design pattern or best practice to be applied for this architecture?

The pipes and filters pattern is well suited to this.
The acquisition filter needs to be serial in order.
The processing filter can run in parallel.
The transport-to-robot filter needs to be serial in order.
To accomplish this with pre-existing technology, I have seen real-time applications that process large amounts of data use Intel's Threading Building Blocks (TBB). In the Threading Building Blocks Tutorial, the "Working on the Assembly Line: pipeline" section describes a similar problem:
"A simple text processing example will be used to demonstrate the usage of pipeline and filter to perform parallel formatting. The example reads a text file, squares each decimal numeral in the text, and writes the modified text to a new file. [...] Assume that the raw file I/O is sequential. The squaring filter can be done in parallel. That is, if you can serially read n chunks very quickly, you can transform each of the n chunks in parallel, as long as they are written in the proper order to the output file."
And the accompanying code:
    void RunPipeline( int ntoken, FILE* input_file, FILE* output_file ) {
        tbb::parallel_pipeline(
            ntoken,
            tbb::make_filter<void,TextSlice*>(
                tbb::filter::serial_in_order, MyInputFunc(input_file) )
            & tbb::make_filter<TextSlice*,TextSlice*>(
                tbb::filter::parallel, MyTransformFunc() )
            & tbb::make_filter<TextSlice*,void>(
                tbb::filter::serial_in_order, MyOutputFunc(output_file) ) );
    }
Regardless of whether or not TBB is used, it can serve as a great implementation reference for a pipe and filter pattern that decouples the pattern from the algorithms, while providing the ability to control data order/threading for the filters.
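For comparison, the same serial-in, parallel-middle, serial-out shape can be sketched with the standard library alone. This is not TBB and not how TBB is implemented; it is a minimal illustration that keeps one `std::future` per image in acquisition order, so results come out in the order the images went in. `process_image`, `run_pipeline` and `max_in_flight` are hypothetical names, and an `int` stands in for an image.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <future>
#include <vector>

// Hypothetical processing step; stands in for the real image algorithm.
int process_image(int image) { return image * image; }

// Serial input, parallel processing, serial in-order output.
// At most `max_in_flight` images are being processed at any time,
// which plays a role similar to the token limit of a pipeline.
std::vector<int> run_pipeline(const std::vector<int>& images,
                              std::size_t max_in_flight) {
    assert(max_in_flight > 0);
    std::deque<std::future<int>> in_flight;  // kept in acquisition order
    std::vector<int> output;

    for (int image : images) {  // serial "acquisition" stage
        if (in_flight.size() == max_in_flight) {
            // Serial output stage: always take the oldest result first,
            // so output order matches acquisition order.
            output.push_back(in_flight.front().get());
            in_flight.pop_front();
        }
        // Parallel stage: each image is processed on its own thread.
        in_flight.push_back(
            std::async(std::launch::async, process_image, image));
    }
    while (!in_flight.empty()) {  // drain the remaining results
        output.push_back(in_flight.front().get());
        in_flight.pop_front();
    }
    return output;
}
```

Calling `run_pipeline(images, 4)` processes up to four images concurrently while the caller still receives them in sequence. Starting a thread per image through `std::async` is the simplest option but not the cheapest; a thread pool or TBB itself is likely a better fit for a sustained 10 frames per second.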

I think your approach is correct.
A possible improvement is to use a thread pool, for example for the image processing (especially if it takes much longer than acquisition). You could consider OpenMP, Boost threadpool, or boost::asio::io_service.

Use Post or PostAndAsyncReply with F#'s MailboxProcessor?

I've seen different snippets demonstrating a Put message that returns unit with F#'s MailboxProcessor. In some, only the Post method is used while others use PostAndAsyncReply, with the reply channel immediately replying once the message is being processed. In doing some testing, I found a significant time lag when awaiting the reply, so it seems that unless you need a real reply, you should use Post.
Note: I started asking this in another thread but thought it useful to post as a full question. In the other thread, Tomas Petricek mentioned that the reply channel could be used as a wait mechanism to ensure the caller is delayed until the Put message has been processed.
Does using PostAndAsyncReply help with message ordering, or is it just to force a pause until the first message is processed? In terms of performance Post appears the right solution. Is that accurate?
Update:
I just thought of a reason why PostAndAsyncReply might be necessary in the BlockingQueueAgent example: Scan is used to find Get messages when the queue is full, so you don't want to Put and then Get before the previous Put has completed.

I think I generally agree with your summary - it makes sense that PostAndAsyncReply is slower than Post, so if the caller doesn't need to get a notification from the agent when the operation (such as putting value into the queue) completes, it should definitely expose a way to do that using just Post. The fact that PostAndAsyncReply is a lot slower probably means that some agents should expose both options and let the caller decide.
Regarding the specific example of BlockingQueueAgent (or a similar one that I used to implement one-place buffer), the typical application of the agent is to solve the consumer-producer problem. In consumer-producer problem, we want to block the producer when the queue is full and block the consumer when it is empty. The .NET BlockingCollection supports only synchronous blocking, which is a bit bad (i.e. it can block the whole thread pool).
Then, using a BlockingQueueAgent that sends the Put message using PostAndAsyncReply, we can wait asynchronously until the element is added to the queue (so it blocks the producer, but without blocking threads!). An example of typical usage is the image processing pipeline that I wrote some time ago. Here is one snippet from that:
    // Phase 2: Scale to a thumbnail size and add frame
    let scalePipelinedImages = async {
      while true do
        let! info = loadedImages.AsyncGet()
        scaleImage info
        do! scaledImages.AsyncAdd(info) }
This loop repeatedly gets an image from the loadedImages queue, does some processing and writes the result to scaledImages. Blocking on the queue (both when reading and when writing) controls the parallelism: the steps of the pipeline run in parallel, but the pipeline does not keep loading more and more images if it cannot handle them at the required speed.
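The back-pressure effect described here does not depend on F#. A rough C++ analogue is a bounded queue whose `add` blocks when the queue is full and whose `get` blocks when it is empty. Unlike the agent-based version, this sketch blocks the calling thread rather than waiting asynchronously, and the names `BoundedQueue` and `run_bounded_stage` are invented for the example.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Fixed-capacity blocking queue (illustrative, not a library type).
template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Blocks the caller while the queue is full.
    void add(T value) {
        std::unique_lock<std::mutex> lock(mutex_);
        not_full_.wait(lock, [this] { return items_.size() < capacity_; });
        items_.push(std::move(value));
        not_empty_.notify_one();
    }

    // Blocks the caller while the queue is empty.
    T get() {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return !items_.empty(); });
        T value = std::move(items_.front());
        items_.pop();
        not_full_.notify_one();
        return value;
    }

private:
    const std::size_t capacity_;
    std::mutex mutex_;
    std::condition_variable not_full_;
    std::condition_variable not_empty_;
    std::queue<T> items_;
};

// One "load" stage feeding one "scale" stage through a bounded queue.
// The loader can never be more than `capacity` items ahead of the scaler.
std::vector<int> run_bounded_stage(int count, std::size_t capacity) {
    BoundedQueue<int> loaded(capacity);
    std::thread loader([&] {
        for (int i = 0; i < count; ++i) loaded.add(i);  // blocks when full
    });
    std::vector<int> scaled;
    for (int i = 0; i < count; ++i)
        scaled.push_back(loaded.get() * 2);  // placeholder for scaling
    loader.join();
    return scaled;
}
```

With a capacity of 4, for instance, the loader stalls as soon as four unprocessed items are waiting, which is the same throttling the F# pipeline gets from AsyncAdd.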

My advice is to design your system so you can use Post as much as possible.
This technology was designed for asynchronous concurrency where the objective is to fire-and-forget messages. The idea of waiting for a response goes directly against the grain of this.

Multiple Producers Single Consumer Queue

I am new to multithreading and have designed a program that receives data from two microcontrollers measuring various temperatures (ambient and water) and draws the data to the screen. Right now the program is single-threaded and its performance is terrible.
I understand the basic design approaches of multithreading well enough to create a thread that does a task, but what I don't get is how to make threads perform separate tasks and place their data into a shared data pool. I figure that I need a queue with one consumer and multiple producers (I would like to use std::queue). I have seen code in the gtkmm threading docs that shows a single-consumer/single-producer queue: the producer locks the queue object, produces data, signals the sleeping thread that it is finished, and then sleeps itself. For what I need, do I have to put a thread to sleep? Would there be data conflicts if I didn't sleep any of the threads? And would sleeping a thread cause a significant data delay (I need real-time data drawn at 30 frames per second)?
How would I go about coding such a queue using the gtkmm/glibmm library?

Here's a suggestion:
1. Have two threads that are responsible for obtaining data and placing it into a buffer. Each thread has its own (circular) buffer.
2. There will be a third thread that is responsible for getting data from the buffers and displaying it on the screen.
3. The screen thread sends messages to the data threads requesting some data, then displays the data. The messages help synchronize execution and avoid deadlocks.
4. None of the threads should "wait on single or multiple objects"; they should poll for events instead.
Think of this scenario using people. One person is delivering water temperature readings. Another person is delivering ambient temperature readings. A third person receives or asks for the data and displays it (on a white board). The objective is to keep everybody operating at maximum efficiency without any collisions.

If you're looking for a lock-free way to share a std::queue, you won't find one: std::queue itself is not thread-safe, so while it is being written to, something (typically a mutex) needs to keep two threads from updating it simultaneously and corrupting it.
Is there any reason you can't have each thread collect on its own, with its own structure, and then combine the results at the end?
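For the shared-queue variant the question asks about, a mutex-protected std::queue with a condition variable is probably the simplest starting point. The sketch below uses only the C++ standard library rather than glibmm (an assumption made to keep it self-contained), and `Reading`, `ReadingQueue` and `collect_readings` are hypothetical names. Two producer threads stand in for the two microcontrollers; the calling thread plays the drawing consumer.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical sample: which sensor produced it and the measured value.
struct Reading {
    int sensor;      // 0 = ambient, 1 = water
    double celsius;
};

// std::queue guarded by a mutex; safe for any number of producers.
class ReadingQueue {
public:
    void push(Reading reading) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(reading);
        }
        ready_.notify_one();  // wake the consumer if it is waiting
    }

    // The consumer sleeps only while the queue is empty.
    Reading pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !items_.empty(); });
        Reading reading = items_.front();
        items_.pop();
        return reading;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<Reading> items_;
};

// Two producers push `per_sensor` readings each; the caller consumes all.
std::vector<Reading> collect_readings(int per_sensor) {
    assert(per_sensor >= 0);
    ReadingQueue queue;
    auto produce = [&queue, per_sensor](int sensor) {
        for (int i = 0; i < per_sensor; ++i)
            queue.push({sensor, 20.0 + i});  // placeholder measurement
    };
    std::thread ambient(produce, 0);
    std::thread water(produce, 1);

    std::vector<Reading> drawn;
    for (int i = 0; i < 2 * per_sensor; ++i)
        drawn.push_back(queue.pop());  // drawing would happen here
    ambient.join();
    water.join();
    return drawn;
}
```

The producers never sleep deliberately; they hold the lock only for the duration of one push. The consumer waits on the condition variable only when there is nothing to draw, and it is woken as soon as a producer pushes, so the wait should not by itself add a noticeable delay at 30 frames per second.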

Looking for design advice - Statistics reporter

I need to implement a statistics reporter - an object that prints a bunch of statistics to the screen.
This info is updated by 20 threads.
The reporter must be a thread itself that wakes up every second, reads the info and prints it to the screen.
My design so far: InfoReporterElement - one element of info. It has two functions, PrintInfo and UpdateData.
InfoReporterRow - one row on screen. A row holds a vector of InfoReporterElement.
InfoReporterModule - a module composed of a header and vector of rows.
InfoReporter - the reporter, composed of a vector of modules and a header. The reporter exports the function 'PrintData', which goes over all modules/rows/basic elements and prints the data to the screen.
I think I should have an object responsible for receiving updates from the threads and updating the basic info elements.
The main problem is how to update the info - should I use one mutex for the object or a mutex per basic element?
Also, which object should be a thread - the reporter itself, or the one that receives updates from the threads?

I would say that, first of all, the Reporter itself should be a thread. In terms of decoupling, it is basic practice to isolate the drawing part from the active code (MVC).
The structure itself is of little use here. When you reason in terms of multithreading, it's not so much the structure as the flow of information that you should check.
Here you have 20 active threads that will update the information, and 1 passive thread that will display it.
The problem here is that you risk introducing delay in the work to be done, because an active thread cannot acquire the lock (held for display). Reporting (or logging) should never block (or as little as possible).
I propose to introduce an intermediate structure (and thread), to separate the GUI and the work: a queuing thread.
1. The active threads post events to the queue.
2. The queuing thread updates the structure above.
3. The displaying thread shows the current state.
You can avoid some synchronization issues by using the same idea that is used for Graphics. Use 2 buffers: the current one (that is displayed by the displaying thread) and the next one (updated by the queuing thread). When the queuing thread has processed a batch of events (up to you to decide what a batch is), it asks to swap the 2 buffers, so that next time the displaying thread will display fresh info.
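The two-buffer idea can be sketched as follows. This is one possible reading of the paragraph above, not the answerer's code: the class name `DoubleBufferedStats` and its methods are invented, statistics are reduced to a vector of counters, and it assumes exactly one queuing thread calls `add` and `swap_buffers` while any displaying thread calls `snapshot`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

class DoubleBufferedStats {
public:
    explicit DoubleBufferedStats(std::size_t size)
        : buffers_{std::vector<long>(size), std::vector<long>(size)} {
        assert(size > 0);
    }

    // Queuing thread only: apply one event to the hidden "next" buffer.
    // No lock is needed because nobody else touches that buffer.
    void add(std::size_t index, long delta) {
        buffers_[1 - front_].at(index) += delta;
    }

    // Queuing thread only: make the finished batch visible.
    void swap_buffers() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            front_ = 1 - front_;
        }
        // Bring the new "next" buffer up to date before the next batch.
        buffers_[1 - front_] = buffers_[front_];
    }

    // Displaying thread: copy the current buffer under a short lock.
    std::vector<long> snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return buffers_[front_];
    }

private:
    std::array<std::vector<long>, 2> buffers_;
    std::size_t front_ = 0;  // index of the buffer being displayed
    mutable std::mutex mutex_;
};
```

The only contention left is between the swap and the snapshot, both of which are brief; the 20 worker threads never touch this lock at all, since they only post events to the queue.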
Note: On a more personal note, I don't like your structure. The working thread has to know exactly where on the screen the element it should update is displayed; this is a clear breach of encapsulation.
Once again, look up MVC.
And since I am neck deep in patterns: look up Observer too ;)

"The main problem is how to update the info - should I use one mutex for the object or use a mutex per basic element?"
Put a mutex around the basic unit of update. If that unit is an InfoReporterElement object, you need a mutex per such object. Otherwise, if a whole row is updated at a time by any one of the threads, put the mutex around the row, and so on.
"Also, which object should be a thread - the reporter itself, or the one that receives updates from the threads?"
You can put all of them in separate threads: multiple writer threads that update the information and one reader thread that reads the values.

You seem to have a pretty good grasp of the basics of concurrency.
My initial thought would be a queue protected by a mutex that is locked for writes and deletes. If you have the time, I would look at lock-free access.
For your second concern, I would have just one reader thread.

A piece of code would be nice to work with.
Attach a mutex to every InfoReporterElement. As you've written in a comment, you need not only to get and set an element's value but also to increment it or possibly do other things, so what I'd do is write a mutex-protected member function for every interlocked operation I need.
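A minimal sketch of that per-element approach might look like this. The question only names PrintInfo and UpdateData, so the `Set`, `Increment` and `Get` members below are assumptions about which operations are needed, and `count_with_threads` is just a small harness for the example.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// One statistic with its own mutex; every operation locks internally,
// so callers never see a half-finished update.
class InfoReporterElement {
public:
    void Set(long value) {
        std::lock_guard<std::mutex> lock(mutex_);
        value_ = value;
    }

    // Read-modify-write done under one lock, so increments are not lost.
    void Increment(long delta = 1) {
        std::lock_guard<std::mutex> lock(mutex_);
        value_ += delta;
    }

    long Get() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return value_;
    }

private:
    mutable std::mutex mutex_;
    long value_ = 0;
};

// Several worker threads increment the same element concurrently.
long count_with_threads(int thread_count, int increments_each) {
    assert(thread_count > 0);
    InfoReporterElement element;
    std::vector<std::thread> workers;
    for (int t = 0; t < thread_count; ++t) {
        workers.emplace_back([&element, increments_each] {
            for (int i = 0; i < increments_each; ++i) element.Increment();
        });
    }
    for (std::thread& worker : workers) worker.join();
    return element.Get();
}
```

For a single integer counter like this one, `std::atomic<long>` would also do the job without a mutex; the mutex version generalizes more easily to elements that hold several fields which must change together.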
