iOS waveform generator connected via AUGraph - C++

I have created a simple waveform generator which is connected to an AUGraph. I have reused some sample code from Apple to set up the AudioStreamBasicDescription, like this:
void SetCanonical(UInt32 nChannels, bool interleaved)
// note: leaves sample rate untouched
{
    mFormatID = kAudioFormatLinearPCM;
    int sampleSize = SizeOf32(AudioSampleType);
    mFormatFlags = kAudioFormatFlagsCanonical;
    mBitsPerChannel = 8 * sampleSize;
    mChannelsPerFrame = nChannels;
    mFramesPerPacket = 1;
    if (interleaved)
        mBytesPerPacket = mBytesPerFrame = nChannels * sampleSize;
    else {
        mBytesPerPacket = mBytesPerFrame = sampleSize;
        mFormatFlags |= kAudioFormatFlagIsNonInterleaved;
    }
}
In my class I call this function:
mClientFormat.SetCanonical(2, true);
mClientFormat.mSampleRate = kSampleRate;
where the sample rate is
#define kSampleRate 44100.0f
The other settings are taken from the sample code as well:
// output unit
CAComponentDescription output_desc(kAudioUnitType_Output, kAudioUnitSubType_RemoteIO, kAudioUnitManufacturer_Apple);
// iPodEQ unit
CAComponentDescription eq_desc(kAudioUnitType_Effect, kAudioUnitSubType_AUiPodEQ, kAudioUnitManufacturer_Apple);
// multichannel mixer unit
CAComponentDescription mixer_desc(kAudioUnitType_Mixer, kAudioUnitSubType_MultiChannelMixer, kAudioUnitManufacturer_Apple);
Everything works fine, but the problem is that I am not getting stereo sound, and my callback function fails (bad access) when I try to reach the second buffer:
Float32 *bufferLeft  = (Float32 *)ioData->mBuffers[0].mData;
Float32 *bufferRight = (Float32 *)ioData->mBuffers[1].mData;
// Generate the samples
for (UInt32 frame = 0; frame < inNumberFrames; frame++)
{
    switch (generator.soundType) {
        case 0: //Sine
            bufferLeft[frame] = sinf(thetaLeft) * amplitude;
            bufferRight[frame] = sinf(thetaRight) * amplitude;
            break;
So it seems I am getting mono instead of stereo. The pointer bufferRight is empty, but I don't know why.
Any help will be appreciated.

I can see two possible errors. First, as @invalidname pointed out, recording in stereo probably isn't going to work on a mono device such as the iPhone. Well, it might work, but if it does, you're just going to get back dual-mono stereo streams anyway, so why bother? You might as well configure your stream to work in mono and spare yourself the CPU overhead.
The second problem is probably the source of your sound distortion. Your stream description format flags should be:
kAudioFormatFlagIsSignedInteger |
kAudioFormatFlagsNativeEndian |
kAudioFormatFlagIsPacked
Also, don't forget to set the mReserved field to 0. While the value of this field is probably being ignored, it doesn't hurt to explicitly set it to 0 just to make sure.
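For reference, a hand-filled ASBD with those flags might look like this (just a sketch for interleaved 16-bit stereo at 44.1 kHz; the concrete values are assumptions, not the asker's code):
AudioStreamBasicDescription asbd = {0};
asbd.mSampleRate       = 44100.0;
asbd.mFormatID         = kAudioFormatLinearPCM;
asbd.mFormatFlags      = kAudioFormatFlagIsSignedInteger |
                         kAudioFormatFlagsNativeEndian |
                         kAudioFormatFlagIsPacked;
asbd.mBitsPerChannel   = 16;
asbd.mChannelsPerFrame = 2;
asbd.mFramesPerPacket  = 1;
asbd.mBytesPerFrame    = asbd.mChannelsPerFrame * sizeof(SInt16); // 4 bytes per frame
asbd.mBytesPerPacket   = asbd.mBytesPerFrame;
asbd.mReserved         = 0; // explicitly zeroed, as noted above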
Edit: Another more general tip for debugging audio on the iPhone -- if you are getting distortion, clipping, or other weird effects, grab the data payload from your phone and look at the recording in a wave editor. Being able to zoom down and look at the individual samples will give you a lot of clues about what's going wrong.
To do this, you need to open up the "Organizer" window, click on your phone, and then expand the little arrow next to your application (in the same place where you would normally uninstall it). Now you will see a little downward pointing arrow, and if you click it, Xcode will copy the data payload from your app to somewhere on your hard drive. If you are dumping your recordings to disk, you'll find the files extracted here.

I'm guessing the problem is that you're specifying an interleaved format, but then accessing the buffers as if they were non-interleaved in your IO callback. ioData->mBuffers[1] is invalid because all the data, both left and right channels, is interleaved in ioData->mBuffers[0].mData. Check ioData->mNumberBuffers. My guess is it is set to 1. Also, verify that ioData->mBuffers[0].mNumberChannels is set to 2, which would indicate interleaved data.
Also check out the Core Audio Public Utility classes to help with things like setting up formats; they make it so much easier. Your code for setting up the format could be reduced to one line, and you'd be more confident it is right (though to me your format looks set up correctly, if what you want is interleaved 16-bit int):
CAStreamBasicDescription myFormat(44100.0, 2, CAStreamBasicDescription::kPCMFormatInt16, true);
Apple used to package these classes up in the SDK that was installed with Xcode, but now you need to download them here: https://developer.apple.com/library/mac/samplecode/CoreAudioUtilityClasses/Introduction/Intro.html
Anyway, it looks like the easiest fix for you is to just change the format to non-interleaved. So in your code: mClientFormat.SetCanonical(2, false);
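Alternatively, if you keep the interleaved format, the callback has to write both channels into mBuffers[0]. A rough sketch using the question's variables (and assuming the 16-bit canonical sample type):
SInt16 *samples = (SInt16 *)ioData->mBuffers[0].mData; // mNumberBuffers == 1
for (UInt32 frame = 0; frame < inNumberFrames; frame++)
{
    samples[2 * frame]     = (SInt16)(sinf(thetaLeft)  * amplitude * 32767.0f); // left
    samples[2 * frame + 1] = (SInt16)(sinf(thetaRight) * amplitude * 32767.0f); // right
    // ... advance thetaLeft / thetaRight here, as in the original callback
}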

Related

How to copy every N-th byte(s) of a C array

I am writing a bit of code in C++ where I want to play a .wav file and perform an FFT (with fftw) on it as it comes in (and eventually display that FFT on screen with ncurses). This is mainly just a "for giggles/to see if I can" project, so I have no restrictions on what I can or can't use, aside from wanting to keep the result fairly lightweight and cross-platform (I'm doing this on Linux for the moment). I'm also trying to do this "right" and not just hack it together.
I'm using SDL2_audio to achieve the playback, which is working fine. The callback is called at some interval requesting N bytes (seems to be desiredSamples*nChannels). My idea is that at the same time I'm copying the memory from my input buffer to SDL I might as well also copy it in to fftw3's input array to run an FFT on it. Then I can just set ncurses to refresh at whatever rate I'd like separate from the audio callback frequency and it'll just pull the most recent data from the output array.
The catch is that in the input file the channels are packed together, i.e. "(LR) (LR) (LR) ...". So while SDL expects this, I need a way to get just one channel to send to FFTW.
The audio callback format from SDL looks like so:
void myAudioCallback(void* userdata, Uint8* stream, int len) {
    SDL_memset(stream, 0, len);   // note: sizeof(stream) would only zero the size of the pointer
    SDL_memcpy(stream, audio_pos, len);
    audio_pos += len;
}
where userdata is (currently) unused, stream is the array that SDL wants filled, and len is the length of stream (i.e. the number of bytes SDL is looking for).
As far as I know there's no way to get memcpy to copy just every other sample (read: copy N bytes, skip M, copy N, etc.). My current best idea is a brute-force for loop a la...
// pseudocode
for (int i = 0; i < len/2; i++) {
    fftw_in[i] = audio_pos + 2*i*sizeof(sample);
}
or even more brute force by just reading the file a second time and only taking every other byte or something.
Is there another way to go about accomplishing this, or is one of these my best option? It feels kind of kludgey to go from a nice one-line memcpy to send the data to SDL to some sort of weird loop to send it to fftw.
The OP's brute-force solution can be simplified (for copying bytes):
// pseudocode
const char* s = audio_pos;
for (int d = 0; s < audio_pos + len; d++, s += 2*sizeof(sample)) {
    fftw_in[d] = *s;
}
If I knew what fftw_in is, I would memcpy blocks of sizeof(*fftw_in).
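In sample terms rather than bytes, the copy might look like this (a sketch that assumes 16-bit signed interleaved stereo, i.e. SDL's AUDIO_S16 format, and an fftw_in of doubles; only the names come from the question):
const Sint16* samples = (const Sint16*)audio_pos;
const int frames = len / (2 * (int)sizeof(Sint16)); // len bytes -> stereo frames
for (int i = 0; i < frames; ++i)
{
    fftw_in[i] = (double)samples[2 * i]; // even indices are the left channel
}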
Please check the assembly generated by @S.M.'s solution.
If the code is not vectorized, I would use intrinsics (depending on your hardware support) like _mm_mask_blend_epi8.

AudioBufferList for AUHAL unit whose output stream format is compressed.

ACKNOWLEDGEMENTS
I know this post is quite long, but I try to contextualise my issue as much as possible because I reckon it's quite unique (I couldn't find any related issue apart from this one). The final question is at the very end of the post, and here's the complete code.
First of all, a bit of context. I am using the CoreAudio and AudioToolbox libraries, more precisely Audio Units. I am on macOS. My ultimate goal is to record audio from any input device (hence the use of Audio Units over a simple AudioQueueBuffer) and write it to an audio file. I reckon the trickiest part of my program is to convert from LPCM to AAC (in my case) within a single audio unit, hence no AUGraph is used.
My program is basically just one Audio Unit, encapsulated in a class, AudioUnit mInputUnit, which is an AUHAL unit. Hence, I have followed this technical note to set it up. Basically, I link the input scope of the input element (since the output element is disabled) to an Audio Device, i.e. my built-in microphone.
Then I update the AudioFormat of the output scope of the unit accordingly.
...
inputStream.mFormatID = kAudioFormatMPEG4AAC;
inputStream.mFormatFlags = 0;
inputStream.mBitsPerChannel = 0;
checkError(
    AudioUnitSetProperty(
        mInputUnit,
        kAudioUnitProperty_StreamFormat,
        kAudioUnitScope_Output,
        1,
        &inputStream,
        propertySize
    ),
    "Couldn't set output stream format."
);
Therefore, at this point, the Audio Unit should work as follows:
Record from input device in LPCM [INPUT SCOPE] ==> Convert from LPCM to AAC ==> Render in AAC.
Please note that each stream format (input and output) uses 2 channels. Neither the input nor the output stream has its mFormatFlags set to kAudioFormatFlagIsNonInterleaved, so both of them are interleaved.
In fact, I think this is where the issue comes from, but I can't see why.
At this point, everything seems to work right. The issue arises when I try to render the audio unit after having set the input callback.
I have found a note that says the following:
“By convention, AUHAL deinterleaves multichannel audio. This means that you set up two AudioBuffers of one channel each instead of setting up one AudioBuffer with mNumberChannels==2. A common cause of paramErr (-50) problems in AudioUnitRender() calls is having AudioBufferLists whose topology (or arrangement of buffers) doesn’t match what the unit is prepared to produce. When dealing at the unit level, you almost always want to do noninterleaved like this.”
Excerpt From: Chris Adamson & Kevin Avila. “Learning Core Audio: A Hands-On Guide to Audio Programming for Mac and iOS.” iBooks.
Hence, I have followed the appropriate code structure to render the audio.
OSStatus Recorder::inputProc(
    void *inRefCon,
    AudioUnitRenderActionFlags *ioActionFlags,
    const AudioTimeStamp *inTimeStamp,
    UInt32 inBusNumber,
    UInt32 inNumberFrames,
    AudioBufferList *ioData
)
{
    Recorder *This = (Recorder *) inRefCon;
    CAStreamBasicDescription outputStream;
    This->getStreamBasicDescription(kAudioUnitScope_Output, outputStream);
    UInt32 bufferSizeBytes = inNumberFrames * sizeof(Float32);
    UInt32 propertySize = offsetof(AudioBufferList, mBuffers[0]) + (sizeof(AudioBuffer) * outputStream.mChannelsPerFrame);
    auto bufferList = (AudioBufferList*) malloc(propertySize);
    bufferList->mNumberBuffers = outputStream.mChannelsPerFrame;
    for (UInt32 i = 0; i < bufferList->mNumberBuffers; ++i)
    {
        bufferList->mBuffers[i].mNumberChannels = 1;
        bufferList->mBuffers[i].mDataByteSize = bufferSizeBytes;
        bufferList->mBuffers[i].mData = malloc(bufferSizeBytes);
    }
    checkError(
        AudioUnitRender(
            This->mInputUnit,
            ioActionFlags,
            inTimeStamp,
            inBusNumber,
            inNumberFrames,
            bufferList
        ),
        "Couldn't render audio unit."
    );
    // free the per-channel buffers as well, not just the list itself
    for (UInt32 i = 0; i < bufferList->mNumberBuffers; ++i)
        free(bufferList->mBuffers[i].mData);
    free(bufferList);
    return noErr;
}
And then, when I try to render the audio, I get the following error: Error: Couldn't render audio unit. (-50), which is actually the one that should have been fixed by following the note, which confuses me even more.
THE question
At this point, I don't know if this has something to do with my overall architecture, i.e. should I use an AUGraph and add an output unit instead of trying to convert from the canonical format to a compressed format WITHIN a single AUHAL unit?
Or is this something to do with the way I pre-allocate my AudioBufferList?
I have managed to fix this issue by redesigning the whole process. To make it short, I still have a single AUHAL unit, but instead of doing the format conversion within the AUHAL unit, I do it in the render callback with an Extended Audio File, which takes a source format and a destination format.
The whole challenge is to find the right format description, which is basically just testing different values for mFormatID, mFormatFlags, etc.
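The gist of that approach, as a rough sketch rather than my actual code (fileURL, lpcmFormat and the omitted error handling are placeholders): create an ExtAudioFile with the compressed destination format, declare the AUHAL unit's LPCM format as the client format so the built-in converter does the LPCM-to-AAC encoding, and write the rendered buffers from the input callback:
AudioStreamBasicDescription aacFormat = {0};
aacFormat.mFormatID         = kAudioFormatMPEG4AAC;
aacFormat.mSampleRate       = lpcmFormat.mSampleRate;
aacFormat.mChannelsPerFrame = lpcmFormat.mChannelsPerFrame;

ExtAudioFileRef audioFile;
ExtAudioFileCreateWithURL(fileURL, kAudioFileM4AType, &aacFormat,
                          NULL, kAudioFileFlags_EraseFile, &audioFile);

// The client format is what we hand to ExtAudioFileWrite*: the unit's LPCM format.
ExtAudioFileSetProperty(audioFile, kExtAudioFileProperty_ClientDataFormat,
                        sizeof(lpcmFormat), &lpcmFormat);

// ...then, inside the input callback, after AudioUnitRender succeeds:
ExtAudioFileWriteAsync(audioFile, inNumberFrames, bufferList);

// ...and when recording stops:
ExtAudioFileDispose(audioFile);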

OpenCV VideoCapture: Howto get specific frame correctly?

I am trying to get a specific frame from a video file using OpenCV 2.4.11.
I have tried to follow the documentation and online tutorials of how to do it correctly and have now tested two approaches:
1) The first method is brute force, reading each frame using video.grab() until I reach the specific frame (timestamp) I want. This method is slow if the specific frame is late in the video sequence!
string videoFile(videoFilename);
VideoCapture video(videoFile);
double videoTimestamp = video.get(CV_CAP_PROP_POS_MSEC);
int videoFrameNumber = static_cast<int>(video.get(CV_CAP_PROP_POS_FRAMES));
while (videoTimestamp < targetTimestamp)
{
    videoTimestamp = video.get(CV_CAP_PROP_POS_MSEC);
    videoFrameNumber = static_cast<int>(video.get(CV_CAP_PROP_POS_FRAMES));
    // Grab frame (but don't decode it, as we are only "fast forwarding")
    video.grab();
}
// Get and save frame
if (video.retrieve(frame))
{
    char txtBuffer[100];
    sprintf(txtBuffer, "Video1Frame_Target_%f_TS_%f_FN_%d.png", targetTimestamp, videoTimestamp, videoFrameNumber);
    string imgName = txtBuffer;
    imwrite(imgName, frame);
}
2) The second method uses video.set(...). This method is faster and doesn't get any slower when the specific frame is late in the video sequence.
string videoFile(videoFilename);
VideoCapture video2(videoFile);
videoTimestamp = video2.get(CV_CAP_PROP_POS_MSEC);
videoFrameNumber = static_cast<int>(video2.get(CV_CAP_PROP_POS_FRAMES));
video2.set(CV_CAP_PROP_POS_MSEC, targetTimestamp);
while (videoTimestamp < targetTimestamp)
{
    videoTimestamp = video2.get(CV_CAP_PROP_POS_MSEC);
    videoFrameNumber = (int)video2.get(CV_CAP_PROP_POS_FRAMES);
    // Grab frame (but don't decode it, as we are only "fast forwarding")
    video2.grab();
}
// Get and save frame
if (video2.retrieve(frame))
{
    char txtBuffer[100];
    sprintf(txtBuffer, "Video2Frame_Target_%f_TS_%f_FN_%d.png", targetTimestamp, videoTimestamp, videoFrameNumber);
    string imgName = txtBuffer;
    imwrite(imgName, frame);
}
Problem) Now the issue is that the two methods end up at the same frame number, but the content of the target image frames is not equal?!?
I am tempted to conclude that Method 1 is the correct one and there is something wrong with the OpenCV video.set(...) method. But if I use the VLC player to find the approximate target frame position, it is actually Method 2 that is closest to a "correct" result?
As some extra info: I have tested the same video sequence in two different video files, encoded with the 'avc1' MPEG-4 codec and the 'wmv3' WMV codec respectively.
Using the WMV file the two found frames are way off?
Using the MPG4 file the two found frames are only slightly off?
Does anybody have experience with this who can explain my findings and tell me the correct way to get a specific frame from a video file?
Obviously there's still a bug in OpenCV/FFmpeg.
FFmpeg doesn't deliver the frames that are wanted and/or OpenCV doesn't handle this. See here and here.
[Edit:
Until that bug is fixed (either in FFmpeg or, as a work-around, in OpenCV), the only way to get an exact frame by number is to "fast forward" as you did.
(Concerning the VLC player: I suspect that it uses that buggy set()-interface. For a player it is usually not too important to seek frame-exact. But for an editor it is.)]
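Such a fast-forward seek can be wrapped in a small helper, for example (a sketch against the OpenCV 2.4 C++ API; the function name is made up):
#include <opencv2/opencv.hpp>

// Frame-exact "seek" by grabbing forward from the current position instead of
// relying on the unreliable set(CV_CAP_PROP_POS_*) interface.
bool getFrameByNumber(cv::VideoCapture& video, int targetFrame, cv::Mat& frame)
{
    // CV_CAP_PROP_POS_FRAMES is the 0-based index of the frame to be grabbed next.
    int next = static_cast<int>(video.get(CV_CAP_PROP_POS_FRAMES));
    if (next > targetFrame)
        return false;              // can't step backwards; reopen the file instead
    while (next <= targetFrame)
    {
        if (!video.grab())
            return false;          // ran past the end of the video
        next = static_cast<int>(video.get(CV_CAP_PROP_POS_FRAMES));
    }
    return video.retrieve(frame);  // decode only the frame we actually want
}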
I think that OpenCV uses FFmpeg for video decoding.
We once had a similar problem but used FFmpeg directly. By default, random (but exact) frame access isn't guaranteed. The WMV decoder was particularly fuzzy.
Newer versions of FFmpeg give you access to lower-level routines which can be used to build a retrieval function for frames. This solution was a little involved and nothing I can remember off the top of my head right now. I'll try to find some more details later.
As a quick work-around, I would suggest decoding your videos off-line and then working on sequences of images. Though this increases the amount of storage needed, it guarantees exact random frame access. You can use FFmpeg to convert your video file into a sequence of images like this:
ffmpeg -i "input.mov" -an -f image2 "output_%05d.png"
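Once the frames are on disk, frame-exact access is just a matter of building the right filename; note that ffmpeg's image2 muxer numbers the output images from 1, so (with frameIndex as a placeholder for your 0-based frame number):
char name[64];
sprintf(name, "output_%05d.png", frameIndex + 1);
cv::Mat frame = cv::imread(name);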

Saving output frame as an image file CUDA decoder

I am trying to save the decoded image file back as a BMP image using the code in the CUDA Decoder project.
if (g_bReadback && g_ReadbackSID)
{
    CUresult result = cuMemcpyDtoHAsync(g_bFrameData[active_field], pDecodedFrame[active_field], (nDecodedPitch * nHeight * 3 / 2), g_ReadbackSID);
    long padded_size = (nWidth * nHeight * 3);
    CString output_file;
    output_file.Format(_T("image/sample_45.BMP"));
    SaveBMP(g_bFrameData[active_field], nWidth, nHeight, padded_size, output_file);
    if (result != CUDA_SUCCESS)
    {
        printf("cuMemAllocHost returned %d\n", (int)result);
    }
}
But the saved image looks like this
Can anybody help me out here, what am I doing wrong? Thank you.
After investigating further, there were several modifications I made to your approach.
pDecodedFrame is actually in a non-RGB format; I think it is the NV12 format, which I believe is a particular YUV variant.
pDecodedFrame gets converted to an RGB format on the GPU using a particular CUDA kernel
the target buffer for this conversion will either be a surface provided by OpenGL if g_bUseInterop is specified, or else an ordinary region allocated by the driver API version of cudaMalloc if interop is not specified.
The target buffer mentioned above is pInteropFrame (even in the non-interop case). So to make an example for you, for simplicity I chose to only use the non-interop case, because it's much easier to grab the RGB buffer (pInteropFrame) in that case.
The method here copies pInteropFrame back to the host, after it has been populated with the appropriate RGB image by cudaPostProcessFrame. There is also a routine to save the image as a bitmap file. All of my modifications are delineated with comments that include RMC so search for that if you want to find all the changes/additions I made.
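In rough outline it boils down to something like this (the real, working file is linked below; buffer names follow the sample, and the 4-bytes-per-pixel size after the NV12-to-RGB conversion is an assumption, so don't treat this as drop-in code):
size_t frameBytes = (size_t)nWidth * nHeight * 4;  // assumed 32-bit pixels after conversion
unsigned char *hostFrame = (unsigned char *)malloc(frameBytes);
CUresult result = cuMemcpyDtoH(hostFrame, pInteropFrame[active_field], frameBytes);
if (result != CUDA_SUCCESS)
{
    printf("cuMemcpyDtoH returned %d\n", (int)result);
}
else
{
    CString output_file;
    output_file.Format(_T("image/framecap.BMP"));
    SaveBMP(hostFrame, nWidth, nHeight, (long)frameBytes, output_file);
}
free(hostFrame);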
To use it, drop this file in the cudaDecodeGL project as a replacement for the videoDecodeGL.cpp source file. Then rebuild the project. Then run the executable normally to display the video. To capture a specific frame, run the executable with the nointerop command-line switch, e.g. cudaDecodeGL nointerop, and the video will not display, but the decode operation and frame capture will take place, and the frame will be saved in a framecap.bmp file. If you want to change the specific frame number that is captured, modify the g_FrameCapSelect = 37; variable to some other number besides 37, and recompile.
Here is the replacement for videoDecodeGL.cpp. I used pastebin because SO has a limit on the number of characters that can be entered in a question body.
Note that my approach is independent of whether readback is specified. I would recommend not using readback for this sequence.

Converting PDF to JPG like Photoshop quality - Commercial C++ / Delphi library

For the implementation of a Windows-based page-flip application I need to be able to convert a large number of PDF pages into good-quality JPGs, not just thumbnails.
The aim is to achieve the best quality / file size for that, similar to what Photoshop's Save for Web does.
Currently I'm using the Datalogics Adobe PDF Library SDK, which does not seem to be able to fulfil that task. I am thus looking for an alternative commercial C++ or Delphi library which provides good quality / size / speed.
After doing some searching here, I noticed that most posts are about GS & ImageMagick, which I have also tested, but I am not satisfied with the output and the speed.
The target is to import the PDFs at 300 dpi and convert them to JPG with quality 50, 1500 px height, and an output size of 300-500 KB.
If anyone could point out a good library for that task, I would be most grateful.
The Gnostice PDFtoolKit VCL may be a candidate. Convert to JPEG is one of the options.
I always recommend Graphics32 for all your image manipulation needs; you have several resamplers to choose from. However, I don't think it can read PDF files as images. But if you can generate the big image yourself, it may be a good choice.
Atalasoft DotImage (with the PDF rasterizer add-on) will do that (I work on PDF technologies there). You'd be working in C# (or another .NET language):
void ConvertToJpegs(string outFileStem, Stream pdf)
{
    JpegEncoder encoder = new JpegEncoder();
    encoder.Quality = 50;
    int page = 1;
    PdfImageSource source = new PdfImageSource(pdf);
    source.Resolution = 300; // sets the rendering resolution to 300 dpi
    // larger numbers mean better resolution in the image, but will cost in
    // terms of output file size - as resolution increases, memory used increases
    // as a function of the square of the resolution, whereas compression only
    // saves maybe a flat 30% of the total image size, depending on the Quality
    // setting on the encoder.
    while (source.HasMoreImages()) {
        AtalaImage image = source.AcquireNext();
        // this image will be in either 8 bit gray or 24 bit rgb depending
        // on the page contents.
        try {
            string path = String.Format("{0}{1}.jpg", outFileStem, page++);
            // if you need to resample the image, this is the place to do it
            image.Save(path, encoder, null);
        }
        finally {
            source.Release(image);
        }
    }
}
There is also Quick PDF Library
Have a look at DynaPDF. I know it's pretty expensive but you can try the starter pack.
P.S.: before buying a product, please make sure it meets your needs.