kAudioUnitSubType_VoiceProcessingIO causes volume drop - audiounit

I'm using an Audio Unit to play and record. When I use kAudioUnitSubType_VoiceProcessingIO, the sound is lower than with RemoteIO. Why, and how can I fix this?

It's lower because the voice processing algorithm and audio filters need some dynamic range, or headroom, to turn the volume up and down, or to allow frequency-response peaks above the average level. This processing therefore needs to start with the volume lower so there is room to go up.
The way to change it is to not use Audio Unit voice processing.

The way we got around the low volume of VoiceProcessingIO is to use an additional Compressor Audio Unit and control the gain from there. If you do so, don't forget to disable the AGC property of kAudioUnitSubType_VoiceProcessingIO.
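For reference, here is a minimal sketch of the AGC part; it assumes voiceUnit is an already-created VoiceProcessingIO unit, and the element to set the property on may differ in your setup. The gain make-up itself would then come from a separate compressor effect (kAudioUnitType_Effect / kAudioUnitSubType_DynamicsProcessor) placed after the voice-processing input in your graph.

    #include <AudioUnit/AudioUnit.h>

    // Sketch: disable the built-in AGC of a VoiceProcessingIO unit so a
    // downstream compressor can control the gain instead. 'voiceUnit' is
    // assumed to be created and configured elsewhere.
    OSStatus disableVoiceProcessingAGC(AudioUnit voiceUnit) {
        UInt32 disable = 0;  // 0 = AGC off, 1 = AGC on
        // Global scope, element 0 here; some projects set this on the
        // input element (1) instead - match your unit's configuration.
        return AudioUnitSetProperty(voiceUnit,
                                    kAUVoiceIOProperty_VoiceProcessingEnableAGC,
                                    kAudioUnitScope_Global,
                                    0,
                                    &disable,
                                    sizeof(disable));
    }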

Related

How to measure high-resolution keypress time in C++?

I have a Millikey Response Box with a 1 000 Hz sampling rate and a light sensor with a 10 000 Hz sampling rate. I would like to measure the end-to-end response time from the moment of a button press to the change on the screen triggered by that press in my C++ program, but I'm struggling to understand how to do it.
My idea was that, ideally, the button press would create a keypress event holding a timestamp of when it was created, at a 1 000 Hz sampling rate. My C++ program would handle this event at its own frequency, record the high-resolution timestamp, and trigger the brightness change on the screen. Next, the light sensor would pick up the change and generate its own keypress event at a 10 000 Hz sampling rate. At the next cycle, my C++ program would pick up this event and compute the actual end-to-end time by comparing the two high-resolution timestamps.
Is this idea sensible? If yes, how can I implement it?
So far I have used GLFW to capture keyboard events and to get a timestamp in the key callback, but as far as I understand, that is not exactly the time when the key was pressed.
Trying to do all that with a computer and its peripherals is likely to lead you nowhere, because of the unknown and probably erratic latency in various parts of the chain.
So here is a trick I have used successfully to measure latency. I connected the light sensor to a sound generator, then recorded the sound of the button being hit together with the sound from the sensor. Recorded at 96 kHz, this gives you a very accurate and precise reading. You just have to measure the delay in an audio editor, or you could write a standalone program to analyse the audio.
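To illustrate that analysis step, here is a rough C++ sketch of what such a standalone program could do. It assumes the stereo recording has already been decoded into two deinterleaved float channels (button click on one, sensor tone on the other); the function name and threshold value are just illustrative.

    #include <cmath>
    #include <cstddef>
    #include <vector>

    // Returns the latency in milliseconds between the first onset in the
    // button channel and the first onset in the sensor channel, or a
    // negative value if no onset was found in one of the channels.
    double measureLatencyMs(const std::vector<float>& buttonChannel,
                            const std::vector<float>& sensorChannel,
                            double sampleRate,        // e.g. 96000.0
                            float threshold = 0.1f) { // onset amplitude threshold
        auto firstOnset = [threshold](const std::vector<float>& ch) -> std::ptrdiff_t {
            for (std::size_t i = 0; i < ch.size(); ++i)
                if (std::fabs(ch[i]) > threshold)
                    return static_cast<std::ptrdiff_t>(i);
            return -1;
        };

        const std::ptrdiff_t buttonIdx = firstOnset(buttonChannel);
        const std::ptrdiff_t sensorIdx = firstOnset(sensorChannel);
        if (buttonIdx < 0 || sensorIdx < 0) return -1.0;

        // At 96 kHz one sample is roughly 0.01 ms, far finer than the
        // latencies being measured.
        return (sensorIdx - buttonIdx) * 1000.0 / sampleRate;
    }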
The answer for me was to use LabStreamingLayer. I use App-Input to capture keyboard events, LabRecorder to capture the stream of these events, and then the Python importer to parse the resulting XDF file. All of the above runs and captures events in the background while the keypress triggers the screen change in my C++ program, which is then detected by the light sensor.
I'm sure the other answers and comments make good suggestions for when one wants to implement this at a low level oneself. My question was also not well refined, since my understanding was limited, so thank you for the contributions anyway!

How to detect camera frame loss using Windows media API like Media Foundation or DirectShow?

I am writing an application for Windows that runs a CUDA accelerated HDR algorithm. I've set up an external image signal processor device that presents as a UVC device, and delivers 60 frames per second to the Windows machine over USB 3.0.
Every "even" frame is a more underexposed frame, and every "odd" frame is a more overexposed frame, which allows my CUDA code perform a modified Mertens exposure fusion algorithm to generate a high quality, high dynamic range image.
A very abstract example of the Mertens exposure fusion algorithm is here.
My only problem is that I don't know how to tell when I'm missing frames, since the only camera API I have interfaced with on Windows (Media Foundation) doesn't make it obvious that a frame I grab with IMFSourceReader::ReadSample isn't the frame received immediately after the last one I grabbed.
Is there any way that I can guarantee that I am not missing frames, or at least easily and reliably detect when I have, using a Windows available API like Media Foundation or DirectShow?
It wouldn't be such a big deal to miss a frame and then purposefully "skip" the next frame in order to grab the next overexposed or underexposed frame to pair with the last frame we grabbed, but I would need to know how many frames were actually missed since the last grab.
Thanks!
There is an IAMDroppedFrames::GetNumDropped method in DirectShow, and chances are it can be retrieved through Media Foundation as well (never tried; it is possibly obtainable with a method similar to this).
The GetNumDropped method retrieves the total number of frames that the filter has dropped since it started streaming.
However, I would question its reliability. The reason is that with both of these APIs the attribute that is more or less reliable is the frame's time stamp. Capture devices can flexibly reduce the frame rate for a few reasons, both external (such as low light conditions) and internal (such as slow, blocking processing downstream in the pipeline). This makes it hard to distinguish between odd and even frames, but the time stamp remains accurate, and you can apply frame-rate math to convert it to a frame index.
In your scenario, however, I would rather detect large gaps in frame times to identify possible drops and continuity loss, and from there run an algorithm that compares exposure across the next few consecutive frames to get back in sync with the under-/overexposure alternation (see the time-stamp sketch below). That sounds like a more reliable way out.
After all, this exposure problem is quite likely to be specific to the hardware you are using.
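As an illustration of the time-stamp approach, here is a rough sketch assuming a fixed nominal frame rate and samples delivered by IMFSourceReader; FrameGapDetector is just an illustrative name, and sample times are in the usual 100-ns units.

    #include <mfidl.h>
    #include <cmath>

    // Estimates how many frames were dropped between consecutive samples by
    // comparing the time-stamp gap to the nominal frame duration.
    class FrameGapDetector {
    public:
        explicit FrameGapDetector(double fps) : frameDuration_(10'000'000.0 / fps) {}

        // Returns the number of frames missed since the previous sample (0 if none).
        int OnSample(IMFSample* sample) {
            LONGLONG t = 0;
            if (FAILED(sample->GetSampleTime(&t))) return 0;
            int missed = 0;
            if (havePrev_) {
                const double gap = static_cast<double>(t - prevTime_);
                missed = static_cast<int>(std::lround(gap / frameDuration_)) - 1;
                if (missed < 0) missed = 0;
            }
            prevTime_ = t;
            havePrev_ = true;
            return missed;
        }

    private:
        double   frameDuration_; // nominal frame duration in 100-ns units
        LONGLONG prevTime_ = 0;
        bool     havePrev_ = false;
    };

If the estimated number of missed frames is odd, the even/odd (under-/overexposed) pairing has flipped, and you would skip one frame to realign.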
Normally MFSampleExtension_Discontinuity is there for this. When you use IMFSourceReader::ReadSample, check this attribute on the returned sample.
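A minimal sketch of that check, assuming you already have the IMFSample returned by ReadSample; note the flag only tells you that a gap occurred, not how many frames were lost, so you would still combine it with the time-stamp math above.

    #include <mfapi.h>   // MFSampleExtension_Discontinuity
    #include <mfidl.h>   // IMFSample

    // Returns true if the pipeline flagged a break in the stream before this sample.
    bool HasDiscontinuity(IMFSample* sample) {
        UINT32 discontinuity = 0;
        // The attribute is optional; if it is absent, treat it as "no gap".
        if (SUCCEEDED(sample->GetUINT32(MFSampleExtension_Discontinuity,
                                        &discontinuity))) {
            return discontinuity != 0;
        }
        return false;
    }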

GStreamer total latency

Goal:
Measure the total pipeline time a frame needs to travel from the stream source to the sink. The source is an IP camera, and we want to detect how long a frame takes from the camera to the sink; if that time is too high, we should show something on the display.
Can you explain how this measurement is possible in GStreamer?
Our GStreamer application is written in C++, so any hints or code examples are welcome.
Thank you so much, guys.
You can perhaps do this with pad probes:
https://gstreamer.freedesktop.org/documentation/application-development/advanced/pipeline-manipulation.html#using-probes
Depending on your pipeline's behavior, you would choose the earliest element that can access reasonable data (not sure what the camera delivers as samples in your case), record the current system time against the sample's DTS/PTS (frame reordering may be a pitfall here), and do the same thing at the last pad you have access to.
Then compare the system times of the samples with the same PTS/DTS, and you have the time delta the sample spent in the pipeline. Depending on your required accuracy, this may be a good enough estimate.
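Here is a rough sketch of that approach, assuming you can get a pad near the source (srcPad) and one just before the sink (sinkPad); the PTS-keyed map and the use of g_get_monotonic_time() are illustrative choices, not the only way to do it.

    #include <gst/gst.h>
    #include <mutex>
    #include <unordered_map>

    // Arrival time (microseconds) of each buffer at the source-side pad, keyed by PTS.
    static std::mutex g_mapLock;
    static std::unordered_map<GstClockTime, gint64> g_ptsToArrival;

    static GstPadProbeReturn on_src_buffer(GstPad*, GstPadProbeInfo* info, gpointer) {
        GstBuffer* buf = GST_PAD_PROBE_INFO_BUFFER(info);
        if (buf && GST_BUFFER_PTS_IS_VALID(buf)) {
            std::lock_guard<std::mutex> lock(g_mapLock);
            g_ptsToArrival[GST_BUFFER_PTS(buf)] = g_get_monotonic_time();
        }
        return GST_PAD_PROBE_OK;
    }

    static GstPadProbeReturn on_sink_buffer(GstPad*, GstPadProbeInfo* info, gpointer) {
        GstBuffer* buf = GST_PAD_PROBE_INFO_BUFFER(info);
        if (buf && GST_BUFFER_PTS_IS_VALID(buf)) {
            std::lock_guard<std::mutex> lock(g_mapLock);
            auto it = g_ptsToArrival.find(GST_BUFFER_PTS(buf));
            if (it != g_ptsToArrival.end()) {
                gint64 latency_us = g_get_monotonic_time() - it->second;
                g_print("pipeline latency: %" G_GINT64_FORMAT " ms\n", latency_us / 1000);
                g_ptsToArrival.erase(it);
            }
        }
        return GST_PAD_PROBE_OK;
    }

    // Attaching the probes (pads obtained e.g. via gst_element_get_static_pad):
    //   gst_pad_add_probe(srcPad,  GST_PAD_PROBE_TYPE_BUFFER, on_src_buffer,  nullptr, nullptr);
    //   gst_pad_add_probe(sinkPad, GST_PAD_PROBE_TYPE_BUFFER, on_sink_buffer, nullptr, nullptr);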

GOP size for realtime video stream

I'm working on a kind of rich remote desktop system, with a video stream of the desktop encoded using avcodec/x264. I have to set the GOP size for the stream manually, and so far I have been using a size of fps/2.
But I've just read the following on Wikipedia:
This structure [Group of Pictures] suggests a problem because the fourth frame (a P-frame) is needed in order to predict the second and the third (B-frames). So we need to transmit the P-frame before the B-frames and it will delay the transmission (it will be necessary to keep the P-frame).
It means I'm creating a lot of latency, since the client needs to receive at least half of the GOP to output the first frame following the I-frame. What is the best strategy for the GOP size if I want the smallest latency possible? A GOP of one picture?
If you want to minimize latency with H.264, you should generally avoid B-frames. That way the decoder at least has a chance to emit decoded frames early, which prevents decoder-induced latency.
You may also want to tune the encoder for latency by reducing or disabling look-ahead. x264 has a "zerolatency" tune which should be a good starting point for finding your optimal settings.
The "GOP" size (which afaik is not really defined for h264; I'll just assume you mean the I(DR)-frame interval) does not necessarily affect the latency. This parameter only affects how long a client will have to wait until it can "sync" on the stream (time-to-first-picture).

C++ socket server to Flash/Adobe AIR?

I am using AIR to do some augmented reality using fiducial marker tracking. I am using FLARToolkit and it works fine, except that the frame rate drops to ridiculous lows in certain lighting conditions. This is because Flash only uses the CPU for processing: every frame it applies filters, adjusts thresholds, and analyzes the pixels to find the marker pattern. Without any hardware acceleration, it can get really slow.
I did some searching and it looks like the fastest and most stable tracking library is Studierstube ( http://handheldar.icg.tugraz.at/stbtracker.php and http://studierstube.icg.tugraz.at/download.php ). Unfortunately, I am not a C++ developer. But it seems that the tracking is insanely fast using this tracker (especially since it isn't all CPU processing like Flash is).
So my plan is to build (or rather have someone build) a small C++ program that leverages this tracker and sends the marker position data every frame (I only need 30 FPS) to my Flash client application, which displays the video and some augmented reality experiences. I believe this would be done through a socket server or something, right? Is this possible and fairly easy for a decent C++ developer? I would ask him/her, but I am still searching for such a person.
Maybe this link will be helpful:
http://www.adobe.com/devnet/air/flex/quickstart/articles/interacting_with_native_process.html
As mentioned here, it can be done with NativeProcess...
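For what it's worth, here is a hypothetical sketch of the native side when going the NativeProcess route: the tracker writes one line of marker data per frame to stdout, and the AIR application reads it from the process's standardOutput stream. The marker fields are stand-ins for whatever Studierstube actually reports.

    #include <cstdio>

    int main() {
        for (;;) {
            // ... grab a camera frame and run the tracker here (paced by the
            // camera, roughly 30 iterations per second) ...
            int    markerId = 0;                   // placeholder values
            double x = 0.0, y = 0.0, angle = 0.0;  // placeholder values

            // One line per frame, simple to parse on the ActionScript side.
            std::printf("%d %.2f %.2f %.2f\n", markerId, x, y, angle);
            std::fflush(stdout);  // NativeProcess delivers data as it arrives
        }
        return 0;
    }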