I would like to ask for advice, as I am not a very experienced C++ user. I lack some knowledge about threads, but I've done a lot of Android development, so I know the general idea.
I need to write 8 *.wav files at once. I have a callback, called very often, with the incoming signal from 8 input channels. I need to save the data from each channel to a *.wav file (recording). This requires me to open each file every time I get new data and append another 256 samples of data to the end of it.
Doing so with 2 inputs is fine, but with 3 or more my input latency starts to increase. The processor is lagging, so I probably have to do this in some kind of thread.
I think this should be quite a common problem, but I haven't yet learned how to handle it. Can someone explain the right way to do it? Is it necessary to use std::thread (http://www.cplusplus.com/reference/thread/thread/), or are there other simple/elegant patterns?
You need to record, or save, the data from 8 input channels.
I highly recommend at least 8 large buffers to contain the data.
When there is a pause in the input, or in the background, you can write the data to the files. Wait for a large amount of data to accumulate before writing the buffers out. File I/O handles large blocks of data much better than many small ones. You can always flush the output streams, which tells the OS to write to the file.
If you want to play with threads, I recommend at least three.
Thread 1 -- read input channels and store in buffers.
Thread 2 -- GUI
Thread 3 -- Writes buffers to files.
Remember that thread 1 is the highest priority. When it sees a low amount of space remaining in the buffer, it should wake up thread 3 to write out the buffers to the files.
You should maintain at least 2 buffers for each input channel. This is called double buffering and allows thread 3 to write the buffer to file while thread 1 is reading the input channel data into another buffer.
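The double-buffering idea can be sketched roughly as below. This is a minimal illustration, not a production design; the class name DoubleBuffer and the flush threshold are made up for the example. The audio callback fills the "front" buffer, and when it is full the buffers are swapped and the writer thread is woken to flush the "back" buffer to disk.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <vector>

// One channel's pair of buffers: the audio callback fills "front"
// while the writer thread drains "back".
struct DoubleBuffer {
    std::vector<float> front, back;
    std::mutex m;
    std::condition_variable cv;
    bool backReady = false;

    // Called from the audio thread with each block of samples.
    void push(const float* samples, std::size_t n, std::size_t flushThreshold) {
        front.insert(front.end(), samples, samples + n);
        if (front.size() >= flushThreshold) {
            std::lock_guard<std::mutex> lk(m);
            front.swap(back);   // hand the full buffer to the writer
            backReady = true;
            cv.notify_one();    // wake the writer thread
        }
    }

    // Called from the writer thread; blocks until a full buffer is ready,
    // then returns it so the writer can append it to the .wav file.
    std::vector<float> take() {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [this] { return backReady; });
        backReady = false;
        return std::move(back);
    }
};
```

With one such object per channel, the writer thread loops over all channels, takes whichever buffers are ready, and appends them to the corresponding files in large writes.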
Racket has notions of both pipes and channels.
In the case of a pipe (created with make-pipe), any data written to the output port can be read from the associated input port.
Channels are similar, but with one major difference: writing something to the input blocks until the output is simultaneously read. This is particularly useful for concurrency as it can be used for inter-thread communication and synchronization.
Racket also has a notion of asynchronous channels. These are similar to plain channels, but additionally have a buffer. If data is written to the buffer and it is not full, then the writing thread continues. The reading thread will block if the queue is empty, but otherwise it can read the latest data and continue on.
The question is, what is the difference between a pipe and an asynchronous channel? Clearly asynchronous channels were created with threads in mind, while pipes are independent of threading. But both APIs seem to serve a near identical purpose:
Provide a (possibly infinite) buffer where some producer can put input.
Provide an output for some consumer to get the data on the buffer.
Allow the consumer to wait until data is available.
Allow the producer to place input and continue execution.
The main difference between the two seems to be the items placed in each. Pipes seem designed mostly to handle text (and bytes), and their size is measured accordingly, whereas channels count the items placed in the queue rather than the size of those items.
For example, a buffer size of '2' could hold a string with 2 bytes in it, while an asynchronous channel with a buffer size of '2' can hold 2 items, however large those items are.
This would lead one to think that maybe pipes are only used for text, where channels are more general. However, non-textual items can still be written to pipes, as shown with make-pipe-with-specials.
So, what is the different uses between asynchronous channels and pipes?
Pipes are ports, and so carry bytes. Channels carry arbitrary values.
Certainly you can write some nontrivial value to a pipe and read it back on the other side. But fundamentally it's being converted into bytes, sent through the pipe, and turned back into a value on the other end. Channels skip that.
This means you can send values through channels that would not survive the trip through write and read, and so can't be sent through a pipe: unreadable values like structures that have a custom-write procedure and, well, ports!
This is my understanding. I haven't read the code. Just the documentation. I learned from the links you gave as well as the documentation for print-unreadable and The Printer.
Thanks for reading my post.
I have a problem with multithreading an opencv application I was hoping you guys could help me out with.
My aim is to save 400 frames (in JPEG) from the middle of a video sequence for further examination.
I have the code running fine single threaded, but the multithreading is causing quite a lot of issues so I’m wondering if I have got the philosophy all wrong.
In terms of a schematic of what I should do, would I be best to:
Option 1 : somehow simultaneously access the single video file (or make copies?), then with individual threads cycling through the video frame by frame, save each frame when it is between predetermined limits? E.g. thread 1 saves frames 50 to 100, thread 2 saves frames 101 to 150 etc.
Option 2 : open the file once, cycle through frame by frame then pass an individual frame to a series of unique threads to carry out a saving operation. E.g. frame 1 passed to thread 1 for saving, frame 2 to thread 2 for saving, frame 3 to thread 1, frame 4 to thread 2 etc etc.
Option 3 : some other buffer/thread arrangement which is a better idea than the above!
I'm using visual C++ with the standard libs.
Many thanks for your help on this,
Cheers, Kay
Option 1 is what I have tried to do thus far, but because of the errors, I was wondering if it was even possible to do this! Can threads usually access the same file? How do I find out how many threads I can have?
Certainly, different threads can access the same file, but it's really a question of whether the supporting libraries support that. For reading a video stream, you can use either OpenCV or ffmpeg (you can use both in the same app: ffmpeg for reading and OpenCV for processing, for example). I haven't looked at the docs, so I'm guessing here: either lib should allow multiple readers on the same file.
To find out the number of cores:
SYSTEM_INFO sysinfo;
GetSystemInfo(&sysinfo);
int numCPU = sysinfo.dwNumberOfProcessors;
from this post. You would create one thread per core as a starting point, then adjust the number based on your performance needs and on actual testing.
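Since that snippet is Windows-only, a portable C++11 alternative is std::thread::hardware_concurrency(), which works on any platform with a standard library. Note it may return 0 when the count cannot be determined, so a fallback is needed (the default of 2 below is an arbitrary choice for illustration):

```cpp
#include <thread>

// Portable core count (C++11). hardware_concurrency() may return 0
// when the runtime cannot tell, so fall back to a small default.
unsigned coreCount() {
    unsigned n = std::thread::hardware_concurrency();
    return n ? n : 2;  // assumed fallback when the count is unknown
}
```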
I am working on a problem where I want to replay data stored in a file at a specified rate.
For Eg: 25,000 records/second.
The file is in ASCII format. Currently, I read each line of the file and apply a regex to extract the data. 2-4 lines make up a record. I timed this operation and it takes close to 15 microseconds to generate each record.
The time taken to publish each record is 6 microseconds.
If I perform the reading and writing sequentially, then I would end up with 21 microseconds to publish each record. So effectively, this means my upper bound is ~47K records per second.
If I decide to multithread the reading and writing, then I will be able to send out a packet every 9 microseconds (neglecting the locking penalty, since the reader and writer share the same queue), which gives a throughput of 110K ticks per second.
Is my previous design correct ?
What kind of Queue and locking construct has minimum penalty when a single producer and consumer share a queue ?
If I would like to scale beyond this what's the best approach ?
My application is in C++
If it takes 15 µs to read/prepare a record, then your maximum throughput will be about 1 sec / 15 µs ≈ 67k/sec. You can ignore the 6 µs part, as the single thread reading the file cannot generate records any faster than that. (Try it: change the program to only read/process and discard the output.) I'm not sure how you got 9 µs.
To make this fly beyond 67k/sec ...
A) Estimate the maximum records per second you can read from the disk to be formatted. While this depends a lot on hardware, a figure of 20 MB/sec is typical for an average laptop. This number gives you the upper bound to aim for, and as you get close you can ease off trying.
B) Create a single thread just to read the file and absorb the I/O delay. This thread should write into large preallocated buffers, say 4 MB each. See http://en.wikipedia.org/wiki/Circular_buffer for a way of managing these. You are looking to hold maybe 1000 records per buffer (a guess, but far more than just 8 or so!). Pseudo code:
while not EOF
    allocate a big buffer
    while not EOF and buffer not full
        read a line with fgets() or whatever
        apply only very small preprocessing, ideally none
        save into buffer
    release the buffer to the other threads
C) Create another thread (or several, if the order of records is not important) to process a ring buffer when it is full: your regex step. This thread in turn writes to another set of output ring buffers (tip: keep the ring-buffer control structures apart in memory).
while program running
    wait for / get an input buffer to process (semaphores/mutex/whatever you prefer)
    allocate an output buffer
    process records from the input buffer, placing results in the output buffer
    release the output buffer for the next thread
    release the input buffer back to the reading thread
D) Create your final thread to consume the data. It isn't clear if this output is being written to disk or network, so this might affect the disk-reading thread.
wait for / get an input buffer from the processed-records pool
output the records to wherever they go
return the buffer to the processed-records pool
Notes.
Preallocate all buffers and pass them back to where they came from. E.g. you might have 4 buffers between the file-reading thread and the processing threads; when all 4 are in use, the file reader waits for one to be free rather than allocating new buffers.
Try not to memset() buffers if you can avoid it, waste of memory bandwidth.
You won't need many buffers; maybe 6 per ring buffer?
The system will auto tune to slowest thread ( http://en.wikipedia.org/wiki/Theory_of_constraints ) so if you can read and prepare data faster than you want to output it, all the buffers will fill up and everything will pause except the output.
As the threads are passing reasonable amounts of data each sync point, the overhead of this will not matter too much.
The above design is how some of my code reads CSV files as quickly as possible; basically it all comes down to input I/O bandwidth as the limiting factor.
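The hand-off between the reading thread (B) and a processing thread (C) can be sketched with a bounded queue of filled buffers. This is a mutex/condition-variable version for clarity (the class name BufferQueue and the record type are made up for the example); a lock-free ring buffer, as in the Wikipedia link above, would reduce the locking overhead further:

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <vector>

using Buffer = std::vector<std::string>;  // e.g. ~1000 records per buffer

// Bounded hand-off queue between the reading thread and a processing
// thread. The capacity caps memory use, so the reader pauses
// automatically when processing falls behind.
class BufferQueue {
public:
    explicit BufferQueue(std::size_t capacity) : cap_(capacity) {}

    void put(Buffer b) {                    // called by the reading thread
        std::unique_lock<std::mutex> lk(m_);
        notFull_.wait(lk, [this] { return q_.size() < cap_; });
        q_.push_back(std::move(b));
        notEmpty_.notify_one();
    }

    Buffer get() {                          // called by a processing thread
        std::unique_lock<std::mutex> lk(m_);
        notEmpty_.wait(lk, [this] { return !q_.empty(); });
        Buffer b = std::move(q_.front());
        q_.pop_front();
        notFull_.notify_one();
        return b;
    }

private:
    std::size_t cap_;
    std::deque<Buffer> q_;
    std::mutex m_;
    std::condition_variable notFull_, notEmpty_;
};
```

Because each sync point hands over a whole buffer of ~1000 records, the per-record locking cost is amortized to almost nothing, which is the point made in the notes above.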
I am working on a project where we can have input data stream with 100 Mbps.
My program can be used overnight for capturing this data and thus will generate a huge data file. My program logic that interprets this data is complex and can process only 1 MB of data per second.
We also dump the bytes to a log file after they are processed. We do not want to lose any incoming data, and at the same time we want the program to work in real time. So we are maintaining a circular buffer which acts like a cache.
Right now only way to save incoming data from getting lost is to increase size of this buffer.
Please suggest a better way to do this, and also: what alternative ways of caching can I try?
Stream the input to a file. Really, there is no other choice. It comes in faster than you can process it.
You could create one file per second of input data. That way you can directly start processing old files while new files are being streamed on the disk.
I am writing an application that needs to use large audio multi-samples, usually around 50 MB in size. One file contains approximately 80 individual short sound recordings, which can be played back by my application at any time. For this reason all the audio data gets loaded into memory for quick access.
However, loading one of these files into memory can take many seconds, during which my program is temporarily frozen. What is a good way to avoid this happening? It must be compatible with Windows and OS X. It freezes at this: myMultiSampleClass->open(); which has to do a lot of dynamic memory allocation and reading from the file using ifstream.
I have thought of two possible options:
Open the file and load it into memory in another thread so my application process does not freeze. I have looked into the Boost library to do this but need to do quite a lot of reading before I am ready to implement. All I would need to do is call the open() function in the thread then destroy the thread afterwards.
Come up with a scheme to make sure I don't load the entire file into memory at any one time; I just load on the fly, so to speak. The problem is that any sample could be triggered at any time. I know some other software has this kind of system in place, but I'm not sure how it works. It also depends a lot on individual computer specifications: it could work great on my computer, but someone with a slow HDD/memory could get very bad results. One idea I had was to load the first x samples of each audio recording into memory; then, if I need to play, begin playback of the samples that already exist whilst loading the rest of the audio into memory.
Any ideas or criticisms? Thanks in advance :-)
Use a memory mapped file. Loading time is initially "instant", and the overhead of I/O will be spread over time.
I like solution 1 as a first attempt -- simple & to the point.
If you are under Windows, you can do asynchronous file operations -- what they call OVERLAPPED -- to tell the OS to load a file & let you know when it's ready.
I think the best solution is to load a small chunk or a single sample of wave data at a time during playback, using asynchronous I/O (as John Dibling mentioned) into a fixed-size playback buffer.
The strategy is to fill the playback buffer first and then play (this adds a small delay but guarantees continuous playback). While playing one buffer, you can refill another playback buffer on a different thread (overlapped). You need at least two playback buffers, one for playing and one to refill in the background, and then switch them in real time.
Later you can set the playback buffer size based on the client PC's performance (a trade-off between memory size and processing power: a faster CPU allows a smaller buffer and thus lower delay).
You might want to consider a producer-consumer approach. This basically involves reading the sound data into a buffer using one thread, and streaming the data from the buffer to your sound card using another thread.
The data reader is the producer, and streaming the data to the sound card is the consumer. You need high-water and low-water marks so that, if the buffer gets full, the producer stops reading, and if the buffer gets low, the producer starts reading again.
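The high-water/low-water idea can be sketched as below. The class name WatermarkBuffer and the sample type are made up for illustration; the key point is the hysteresis: the producer pauses at the high-water mark and resumes only once the consumer has drained the buffer down to the low-water mark, rather than toggling on every sample:

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>

// Sample buffer with high/low water marks: the producer (disk reader)
// pauses when the buffer reaches highWater and resumes only once the
// consumer (audio output) has drained it to lowWater.
class WatermarkBuffer {
public:
    WatermarkBuffer(std::size_t lowWater, std::size_t highWater)
        : low_(lowWater), high_(highWater) {}

    void produce(short sample) {            // disk-reading thread
        std::unique_lock<std::mutex> lk(m_);
        if (q_.size() >= high_) paused_ = true;
        canProduce_.wait(lk, [this] { return !paused_; });
        q_.push_back(sample);
        canConsume_.notify_one();
    }

    short consume() {                       // audio-streaming thread
        std::unique_lock<std::mutex> lk(m_);
        canConsume_.wait(lk, [this] { return !q_.empty(); });
        short s = q_.front();
        q_.pop_front();
        if (paused_ && q_.size() <= low_) {
            paused_ = false;                // hysteresis: resume at low water
            canProduce_.notify_one();
        }
        return s;
    }

private:
    std::size_t low_, high_;
    bool paused_ = false;
    std::deque<short> q_;
    std::mutex m_;
    std::condition_variable canProduce_, canConsume_;
};
```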
A C++ Producer-Consumer Concurrency Template Library
http://www.bayimage.com/code/pcpaper.html
EDIT: I should add that this sort of thing is tricky. If you are building a sample player, the load on the system varies continuously as a function of which keys are being played, how many sounds are playing at once, how long the duration of each sound is, whether the sustain pedal is being pressed, and other factors such as hard disk speed and buffering, and amount of processor horsepower available. Some programming optimizations that you eventually employ will not be obvious at first glance.