Recording both the rendering and recording device - C++

I'm writing a program in C++, on Windows. I need to support Windows Vista+.
I want to record both the microphone and speaker simultaneously.
I'm using WASAPI and can record the microphone and the speakers separately, but I would like a single stream supplying the input from both (for example, to record a client playing guitar along with the music he hears on his headphones), instead of merging the two buffers together somehow (which I suspect will lead to timing issues).
Is there a way to do this?

I'm actually working on a library which can do exactly that, merge streams from multiple devices. You might want to give it a try: see xt-audio.com. If you're implementing this yourself, here are some things to consider:
If you're capturing the speakers through the WASAPI loopback interface, you're operating in shared mode, and in that case latency might be unacceptable for live performance. If possible, stick to exclusive mode and use a loopback cable or a hardware loopback device if you have one (e.g. the old-fashioned "stereo mix" devices).
If you're merging buffers then yes, you're going to have timing issues. This is generally unavoidable when syncing independent devices. Pops/clicks can largely be avoided using a secondary intermediate buffer which introduces additional latency, but eventually you're going to have to pad/drop some samples to keep streams in sync.
Do NOT use separate threads for each independent stream. This will increase context switches and thereby increase the minimum achievable latency. Instead, designate one device as the master device, wait for that device's event to be raised, then read input from all devices whether they are "ready" or not (this is where dropping/padding comes into play).
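A rough sketch of that loop, assuming both streams were already initialized and started elsewhere (event-driven mode on the microphone client, AUDCLNT_STREAMFLAGS_LOOPBACK on the render-side client) and that the actual merge/pad/drop step is stubbed out:

```cpp
// Sketch only: micCapture/loopCapture are IAudioCaptureClient* obtained from two
// already-initialized, already-started IAudioClient instances; micEvent is the
// event handle registered on the master (microphone) client.
#include <windows.h>
#include <audioclient.h>

void captureLoop(IAudioCaptureClient* micCapture,
                 IAudioCaptureClient* loopCapture,
                 HANDLE micEvent,
                 bool& keepRunning)
{
    while (keepRunning)
    {
        // Wait on the master device only; never block on the loopback stream.
        WaitForSingleObject(micEvent, 2000);

        BYTE* data = nullptr;
        UINT32 frames = 0;
        DWORD flags = 0;

        // Drain everything the microphone has ready.
        while (SUCCEEDED(micCapture->GetBuffer(&data, &frames, &flags, nullptr, nullptr))
               && frames > 0)
        {
            // ... append 'frames' frames of mic data to the mix buffer ...
            micCapture->ReleaseBuffer(frames);
        }

        // Read whatever the loopback stream happens to have, ready or not.
        while (SUCCEEDED(loopCapture->GetBuffer(&data, &frames, &flags, nullptr, nullptr))
               && frames > 0)
        {
            // ... append 'frames' frames of speaker data to the mix buffer ...
            loopCapture->ReleaseBuffer(frames);
        }

        // ... pad with silence or drop samples so both streams advance by the same
        //     frame count, then hand the merged block to the consumer ...
    }
}
```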
In general you can get really decent performance from WASAPI exclusive mode, even running multiple streams together. But for something as critical as live performance you might want to consider a pro audio interface with ASIO drivers where everything just ticks off the same clock, or synchronization is at least handled at the driver level.

Related

Tee/passthrough DirectShow data as video source

I have an application that gets video samples from a frame grabber card via DirectShow. The application then does some processing and sends the video signal over a network. I now want to duplicate this video signal such that another DirectShow-enabled software (like Skype) can use the original input signal, too.
I know that you can create Tee filters in DirectShow like the one used to split a video signal for recording and preview. However, as I understand it, this filter is only useful within a single graph, i.e. I cannot use it to forward the video from my process to, e.g., Skype.
I also know that I could write my own video source, but this would run in the process of the consuming application. The problem is that I cannot put the logic of my original application in such a video source filter.
The only solution I could think of is my application writing the frames to a shared memory block and a video source filter reading them from there. Synchronisation would be done using a shared mutex or similar. Could that work? I specifically do not like the synchronisation part.
And more importantly, is there a better solution to solve this problem?
The APIs work as you identified: a video capture application such as Skype requests a video stream without interprocess communication in mind; there is no IPC built in for consuming output generated in another process. Your challenge is to provide this IPC yourself, so that one application generates the data while another extends the existing API (a virtual video source device), picks up that data, and delivers it as if it were generated locally.
With video, you have a relatively large stream of data and you want to avoid copying it excessively. File mappings (a.k.a. shared memory) are the right tool: you put bytes in one process and they are immediately visible in the other. You can synchronize access to the data using named events and mutexes that both processes use cooperatively, to signal that a new buffer of data is available, to indicate that a used buffer is free again, and so on.
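A bare-bones sketch of the producer side using a named file mapping and a named event (the object names and the fixed frame size are assumptions; the virtual source filter in the other process opens the same objects by name and does the mirror-image wait/copy):

```cpp
// Producer-side sketch: writes one video frame into a named file mapping and
// signals a named event. Names, frame size, and pacing are placeholders.
#include <windows.h>

int main()
{
    const DWORD frameBytes = 640 * 480 * 4; // hypothetical RGB32 frame size

    HANDLE mapping = CreateFileMappingW(INVALID_HANDLE_VALUE, nullptr,
                                        PAGE_READWRITE, 0, frameBytes,
                                        L"Local\\MyAppVideoFrame");
    HANDLE frameReady = CreateEventW(nullptr, FALSE, FALSE,
                                     L"Local\\MyAppFrameReady");
    if (!mapping || !frameReady)
        return 1;

    void* view = MapViewOfFile(mapping, FILE_MAP_WRITE, 0, 0, frameBytes);
    if (!view)
        return 1;

    for (;;)
    {
        // ... obtain the next processed frame from the capture pipeline ...
        // memcpy(view, frameData, frameBytes);

        SetEvent(frameReady);   // tell the consumer a new frame is available
        Sleep(33);              // stand-in for waiting on the next processed frame
    }

    // UnmapViewOfFile(view); CloseHandle(frameReady); CloseHandle(mapping);
}
```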

What is an acceptable MIDI bandwidth over USB?

I am working on a MIDI effect (a VST plugin which modifies incoming MIDI, generates new data, and forwards it on) using the JUCE framework in C++. I see that it is technically possible to generate a new MIDI message with EVERY sample, making my stream of MIDI flow at 16 to 24 bits, 41,000 times a second or more. This seems entirely impossible for MIDI hardware to handle.
Is there any guideline or rule I must adhere to when I decide on my bandwidth for MIDI over USB to hardware synths, new and old?
EDIT:
I should add that for what I am trying to do, higher bandwidth would help, but should work with hardware such as the Arturia Minibrute. I am attempting to do novel things like apply envelopes and LFOs to the modulation and pitch wheel.
MIDI over DIN cables runs at 31250 bits/s; with one start bit and one stop bit per byte on the wire (10 bits per data byte), that is 3125 bytes/s.
The USB MIDI specification does not specify any bandwidth, but the underlying USB bulk transfer protocol implicitly allows the receiving device to decide when to accept new packets.
In other words, the USB MIDI device can decide how fast it runs, but there is no easy mechanism to determine this limit (especially if your OS just drops MIDI messages that the device driver cannot deliver fast enough).
USB/MIDI interfaces run at exactly 3125 bytes/s.
USB MIDI devices where no 'real' MIDI interface is involved might be able to run faster; for example, my SC-8820 can process about 10 KB/s.
In practice, you cannot know what hardware sits behind some generic MIDI port.
You should use the 3125 bytes/s limit unless you have special knowledge about the device.
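As a back-of-the-envelope check against that limit, assuming 3-byte messages and roughly one message per sample at 44.1 kHz as described in the question:

```cpp
// Back-of-the-envelope check: how far a per-sample MIDI stream overshoots
// the classic DIN-rate budget of 3125 bytes/s.
#include <cstdio>

int main()
{
    const double wireBytesPerSec = 31250.0 / 10.0; // 1 start + 8 data + 1 stop bits per byte
    const double msgBytes        = 3.0;            // e.g. a pitch-bend message: status + 2 data bytes
    const double maxMsgsPerSec   = wireBytesPerSec / msgBytes;   // ~1041 messages/s

    const double sampleRate      = 44100.0;        // one message per sample, as in the question
    std::printf("DIN budget: %.0f messages/s; per-sample generation asks for %.0f messages/s (%.0fx too much)\n",
                maxMsgsPerSec, sampleRate, sampleRate / maxMsgsPerSec);
}
```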

Getting notification that the serial port is ready to be read from

I have to write a C++ application that reads from the serial port byte by byte. This is an important requirement, because the application receives messages over a radio link using Modbus, and the end of a transmission is defined by a silent period of 3.5 character lengths, so I MUST be able to get the message byte by byte. The current system does this under DOS using hardware interrupts. We wish to move to Linux as the OS for this software, but we lack expertise in this area.

I have tried a number of things: polling with a non-blocking read, using select with very short timeout values, setting the size of the serial port's read buffer to one byte, and even using a signal handler on SIGIO, but none of these provide quite what I require.

My boss informs me that the DOS application we currently run uses hardware interrupts to get notified when there is something available to read from the serial port, and that the hardware is accessible directly. Is there any way I can get this functionality from a user-space Linux application? Could I do it by writing a custom driver (despite never having done this before and having close to zero knowledge of how the kernel works)?

I have heard that Linux is a very popular OS for hardware control and embedded devices, so I am guessing that this kind of thing must be possible somehow, but I have spent literally weeks on this so far and still have no concrete idea of how best to proceed.
I'm not quite sure how reading byte by byte helps you with fractional-character reception, unless the point is that information is encoded in the duration of the intervals between characters, so you need to know the timing of when each byte is received.
At any rate, I do suspect you are going to need to make custom modifications to the serial port kernel driver; that's really not all that bad as projects go, and you will learn a lot. You will probably also need to change the configuration of the UART "chip" (really just a tiny corner of some larger device) to make it interrupt after a single byte (i.e. emulate a 16450) instead of when its typically 16-byte buffer (emulating a 16550) is partway full. The code of the DOS program might actually be a help there. An alternative, if the baud rate is not too fast, would be to poll the hardware from the kernel or a realtime extension (or, if it is really slow, as it might be on an HF radio link, maybe even from userspace).
If I'm right about needing to know the timing of character reception, another option would be to offload the reception to a microcontroller with dual UARTs (or, even better, one UART and one USB interface). You could then have the micro watch the serial stream and output to the PC (either on the other serial port at a much faster baud rate, or over USB) little packages of data that each include one received character and a timestamp, or even have it decode the protocol for you. The nice thing about this is that it gives you operating-system independence and works on legacy-free machines (byte-by-byte access is probably going to fail with an off-the-shelf USB-serial dongle). You can probably even build it from a cheap eval board rather than having to manufacture any custom hardware.
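If you do end up staying in userspace, the baseline setup for byte-at-a-time reads is non-canonical termios with VMIN = 1; a minimal sketch follows (device path and baud rate are just examples), though as discussed it will not give you hardware-interrupt-level timing:

```cpp
// Sketch: open a serial port in raw, non-canonical mode so read() returns
// as soon as a single byte is available. Device path and baud rate are examples.
#include <fcntl.h>
#include <termios.h>

int openSerialByteByByte(const char* device)   // e.g. "/dev/ttyS0"
{
    int fd = open(device, O_RDWR | O_NOCTTY);
    if (fd < 0)
        return -1;

    termios tio{};
    tcgetattr(fd, &tio);
    cfmakeraw(&tio);                 // raw mode: no line buffering, no translation
    cfsetispeed(&tio, B9600);
    cfsetospeed(&tio, B9600);
    tio.c_cc[VMIN]  = 1;             // read() returns once at least 1 byte arrives
    tio.c_cc[VTIME] = 0;             // no inter-byte timer
    tcsetattr(fd, TCSANOW, &tio);
    return fd;
}
```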

USB disk write latency (Windows)

I am writing to a USB disk from a lowest-priority thread, using chunked buffered writes, and still, from time to time, the system as a whole lags during this operation. If I disable only the writing to disk, everything works fine. I can't use Windows file operations API calls, only C write. So I thought maybe there is a WinAPI function to turn USB disk write caching on/off, which I could use in conjunction with FlushBuffers or similar alternatives? The number of drives involved is undefined.
Ideally I would like the write calls to never cause lag; caching is fine too, as long as it is performed transparently.
EDIT: would the _O_SEQUENTIAL flag on write-only operations be of any use here?
Try to reduce I/O priority for the thread.
See this article: http://msdn.microsoft.com/en-us/library/windows/desktop/ms686277(v=vs.85).aspx
In particular use THREAD_MODE_BACKGROUND_BEGIN for your IO thread.
Warning: this doesn't work in Windows XP
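A minimal sketch of what that looks like around the writer thread's work (the actual chunked writes are elided):

```cpp
// Sketch: lower the calling thread's I/O priority for the duration of the
// USB writes (Vista and later only, as noted above).
#include <windows.h>

void writeAllChunks(/* ... */)
{
    // Begin background mode: lowers both CPU and I/O priority for this thread.
    if (!SetThreadPriority(GetCurrentThread(), THREAD_MODE_BACKGROUND_BEGIN))
    {
        // Pre-Vista Windows or the call failed; proceed at normal priority.
    }

    // ... perform the chunked write() calls to the USB disk here ...

    SetThreadPriority(GetCurrentThread(), THREAD_MODE_BACKGROUND_END);
}
```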
The thread priority won't affect the delay that happens in the process of writing the media, because it's done in the kernel mode by the file system/disk drivers that don't pay attention to the priority of the calling thread.
You might try to use "T" flag (_O_SHORTLIVED) and flush the buffers at the end of the operation, also try to decrease the buffer size.
There are different types of data transfer over USB; for data there are three:
1. Bulk Transfer,
2. Isochronous Transfer, and
3. Interrupt Transfer.
Bulk transfers provide:
Used to transfer large bursty data.
Error detection via CRC, with guarantee of delivery.
No guarantee of bandwidth or minimum latency.
Stream Pipe - Unidirectional
Full & high speed modes only.
Bulk transfer is good for data that does not require delivery in a guaranteed amount of time. The USB host controller gives bulk transfers a lower priority than the other transfer types.
Isochronous transfers provide:
Guaranteed access to USB bandwidth.
Bounded latency.
Stream Pipe - Unidirectional
Error detection via CRC, but no retry or guarantee of delivery.
Full & high speed modes only.
No data toggling.
Isochronous transfers occur continuously and periodically. They typically contain time sensitive information, such as an audio or video stream. If there were a delay or retry of data in an audio stream, then you would expect some erratic audio containing glitches. The beat may no longer be in sync. However if a packet or frame was dropped every now and again, it is less likely to be noticed by the listener.
Interrupt transfers provide:
Guaranteed Latency
Stream Pipe - Unidirectional
Error detection and next period retry.
Interrupt transfers are typically non-periodic, small device "initiated" communication requiring bounded latency. An Interrupt request is queued by the device until the host polls the USB device asking for data.
From the above, it seems that you want guaranteed latency, so you should use isochronous mode. There are libraries you can use, such as libusb, or you can read more on MSDN.
To find out what is making your system hang, you first need to drill down into the Windows hang itself: what was Windows doing while you experienced the hang?
To find this out you can take a kernel dump. How to get and analyze a kernel dump is described here.
Depending on the findings you get there, you then need to decide whether there is anything under your control you can do about it. Since you are using a third-party library to do the writing, there is little you can do except set the I/O priority and thread priority at the thread or process level. If the library you were given links against a specific CRT, you could try to build your own customized version of it that, for example, flushes after every write, to prevent the OS from combining writes and flushing data back to disc only in big chunks.
Edit1
Your best bet would be to flush the device after every write. This forces the OS to write the currently pending data to disc instead of caching writes until they reach a certain amount.
The second-best thing would be to simply wait after each write, giving the OS a chance to write the pending changes, small as they are, back to disc after a certain time interval.
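Assuming the writes go through the low-level CRT functions (since the Windows file API is not an option here), flushing each chunk could look roughly like this sketch; the pacing delay is optional and the values are placeholders:

```cpp
// Sketch: write a chunk with the CRT's low-level _write, then force it to the
// device with _commit instead of letting the OS pile writes up, optionally
// pacing between chunks.
#include <io.h>
#include <windows.h>

bool writeChunkFlushed(int fd, const char* data, unsigned int bytes)
{
    if (_write(fd, data, bytes) != static_cast<int>(bytes))
        return false;

    _commit(fd);     // flush this chunk to the device now
    Sleep(5);        // optional pacing so only small amounts are written at a time
    return true;
}
```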
If you want to dig deeper into performance, you should try out XPerf, which has a nice GUI and even shows you the call stack where your process hung. The Windows team and many other teams at MS use this tool to troubleshoot hang experiences. The latest edition, with many more features, comes with the Windows 8 SDK. But beware that XPerf only works on Windows Vista and later.

Loading large multi-sample audio files into memory for playback - how to avoid temporary freezing

I am writing an application that needs to use large audio multi-samples, usually around 50 MB in size. One file contains approximately 80 individual short sound recordings, which can be played back by my application at any time. For this reason all the audio data gets loaded into memory for quick access.
However, loading one of these files can take many seconds, during which my program is temporarily frozen. What is a good way to avoid this? It must be compatible with Windows and OS X. It freezes at this call: myMultiSampleClass->open(), which has to do a lot of dynamic memory allocation and reading from the file using ifstream.
I have thought of two possible options:
Open the file and load it into memory in another thread so my application process does not freeze. I have looked into the Boost library to do this but need to do quite a lot of reading before I am ready to implement. All I would need to do is call the open() function in the thread then destroy the thread afterwards.
Come up with a scheme to make sure I don't load the entire file into memory at any one time, I just load on the fly so to speak. The problem is any sample could be triggered at any time. I know some other software has this kind of system in place but I'm not sure how it works. It depends a lot on individual computer specifications, it could work great on my computer but someone with a slow HDD/Memory could get very bad results. One idea I had was to load x samples of each audio recording into memory, then if I need to play, begin playback of the samples that already exist whilst loading the rest of the audio into memory.
Any ideas or criticisms? Thanks in advance :-)
Use a memory mapped file. Loading time is initially "instant", and the overhead of I/O will be spread over time.
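A rough sketch of the POSIX side (OS X and Linux); on Windows the equivalent calls are CreateFileMapping/MapViewOfFile. The path is a placeholder:

```cpp
// Sketch: map the multi-sample file into memory; pages are faulted in lazily,
// so this returns almost immediately and the I/O cost is paid as samples are touched.
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

const void* mapSampleFile(const char* path, size_t& sizeOut)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return nullptr;

    struct stat st{};
    fstat(fd, &st);
    sizeOut = static_cast<size_t>(st.st_size);

    void* data = mmap(nullptr, sizeOut, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);   // the mapping stays valid after closing the descriptor
    return data == MAP_FAILED ? nullptr : data;
}
```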
I like solution 1 as a first attempt -- simple & to the point.
If you are under Windows, you can do asynchronous file operations -- what they call OVERLAPPED -- to tell the OS to load a file & let you know when it's ready.
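For the cross-platform case, a minimal sketch of solution 1 using std::thread (C++11); the class here is just a stand-in for the question's myMultiSampleClass:

```cpp
// Sketch: run the blocking open() on a worker thread so the UI/audio thread
// stays responsive; playback code checks 'loaded' before triggering samples.
#include <atomic>
#include <thread>

struct MultiSampleClass { void open() { /* slow load from the question */ } }; // stand-in type

std::atomic<bool> loaded{false};

void startLoading(MultiSampleClass* sampler)
{
    std::thread([sampler] {
        sampler->open();      // the blocking call that currently freezes the app
        loaded = true;        // signal that all samples are in memory
    }).detach();
}
```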
I think the best solution is to load a small chunk or a single sample of wave data at a time during playback, using asynchronous I/O (as John Dibling mentioned) into a fixed-size playback buffer.
The strategy is to fill the playback buffer first and then play (this adds a small amount of delay but guarantees continuous playback). While one buffer is playing, you refill another playback buffer on a different thread (overlapped); you need at least two playback buffers, one playing and one being refilled in the background, and you switch between them in real time.
Later you can tune the playback buffer size based on the client PC's performance (it is a trade-off between memory and processing power; a faster CPU allows a smaller buffer and thus lower delay).
You might want to consider a producer-consumer approach. This basically involves reading the sound data into a buffer using one thread, and streaming the data from the buffer to your sound card using another thread.
The data reader is the producer, and streaming the data to the sound card is the consumer. You need high-water and low-water marks so that, if the buffer gets full, the producer stops reading, and if the buffer gets low, the producer starts reading again.
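A compact sketch of that buffering with high/low water marks (block type and watermark sizes are placeholders; on the audio callback side you would prefer a lock-free queue to the mutex shown here):

```cpp
// Sketch: the producer (disk reader) fills a FIFO of audio blocks; it pauses at
// the high-water mark and resumes at the low-water mark while the consumer
// (sound-card streaming thread) drains it.
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <vector>

using Block = std::vector<float>;

class SampleFifo {
public:
    void push(Block b) {                       // producer: disk-reading thread
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [&] { return q_.size() < kHighWater; });
        q_.push_back(std::move(b));
    }
    bool pop(Block& out) {                     // consumer: streaming thread
        std::lock_guard<std::mutex> lk(m_);
        if (q_.empty()) return false;          // underrun: caller outputs silence
        out = std::move(q_.front());
        q_.pop_front();
        if (q_.size() <= kLowWater) cv_.notify_one();   // wake the producer again
        return true;
    }
private:
    static constexpr size_t kHighWater = 32;   // stop reading when this full
    static constexpr size_t kLowWater  = 8;    // resume reading when this empty
    std::deque<Block> q_;
    std::mutex m_;
    std::condition_variable cv_;
};
```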
A C++ Producer-Consumer Concurrency Template Library
http://www.bayimage.com/code/pcpaper.html
EDIT: I should add that this sort of thing is tricky. If you are building a sample player, the load on the system varies continuously as a function of which keys are being played, how many sounds are playing at once, how long the duration of each sound is, whether the sustain pedal is being pressed, and other factors such as hard disk speed and buffering, and amount of processor horsepower available. Some programming optimizations that you eventually employ will not be obvious at first glance.