Visualize stream from a PMD Camboard Nano in Qt - c++

EDIT: honest recommendation
If you want to stream from a PMD in realtime, use C#. Any UI is simple to create and there is quite a powerful library, MetriCam by Metrilus AG, which supports streaming for a variety of 3D cameras. I am able to get a stable 45 fps with that.
ORIGINAL:
I've been trying to get depth information from a PMD camboard nano and visualize it in a GUI. The information is delivered as a 165x120 float array.
As I also want to use the data for analysis purposes (image quality, white noise etc.), I need to grab the frames at a specific framerate. The problem is that the SDK which PMD ships with its camera (for MATLAB & C) only lets you grab single frames by calling
pmdUpdate(hnd);
so the framerate is dependent on how often you poll the image data.
I initially tried to do the analysis in MATLAB, but I couldn't get more than 30 fps out of the camera, and adding further code to the loop made it impossible to work with (I need at least a reliable 25 fps).
I then switched to C, where I got rates of up to 70 fps, but could not visualize the data.
Then I tried Qt, which is based on C++ and should therefore be fast at polling the image data, and where I could easily include the libraries of the PMDSDK. As I am new to Qt, though, I do not know much about the UI elements.
So my question:
Is there any performant way to visualize a 2D-float-array on a Qt-GUI? If not, how about anything useful in Visual Studio with C++?
(I know that drawing every pixel one by one on a QGraphicsView is dumb, but I tried it, and I get a whopping framerate of 0.4 fps...)
Thanks for any helpful suggestions!
Jannik

The QImage class actually has a constructor that accepts a uchar pointer/array. You only need to map the float values to RGB values in uchar format.
pmdGetDistances(hnd, dist, dd.img.numColumns*dd.img.numRows*sizeof(float));
uchar *imagemap = new uchar[dd.img.numColumns*dd.img.numRows*3];
for (int i = 0; i < 165; i++){
    for (int j = 0; j < 120; j++){
        // keep the scaled value as an int so the range check can actually fail;
        // a uchar can never be outside 0..255
        int value = (int)std::floor(40*dist[j*165+i]);
        if(value > 255 || value < 0){
            value = 0;
        }
        //colorscaling integrated
        imagemap[3*(j*165+i)] = (uchar)std::floor((255-value)*(255-value)/255.0);
        imagemap[3*(j*165+i)+1] = (uchar)std::abs(std::floor((value-127)/1.5));
        imagemap[3*(j*165+i)+2] = (uchar)std::floor(value*value/255.0);
    }
}
The QImage can then be converted to a QPixmap and displayed in the QGraphicsView. This worked for me, but the framerate does not seem really stable.
QImage image(imagemap, 165, 120, 165*3, QImage::Format_RGB888);
QPixmap pmap(QPixmap::fromImage(image));
scene->clear(); // remove the previous frame's item, otherwise one more item piles up per frame
scene->addPixmap(pmap.scaled(165,120));
ui->viewCamera->update();
It could be worth a try to put the thread to sleep until the desired time has elapsed:
QThread::msleep(msec);
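The sleep-until-the-next-frame idea can be sketched in plain C++ as below. This is only a sketch under assumptions: grabFrame is a hypothetical callback standing in for pmdUpdate(hnd) plus the display code, and pollAtFixedRate is a name made up for this example. It sleeps until an absolute deadline instead of for a fixed duration, so the time spent grabbing a frame does not accumulate as drift.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Calls grabFrame() `frames` times at a fixed rate. grabFrame is a
// hypothetical stand-in for pmdUpdate(hnd) plus the display code.
// Returns the number of frames whose work overran the frame interval.
int pollAtFixedRate(int fps, int frames, const std::function<void()>& grabFrame)
{
    using clock = std::chrono::steady_clock;
    const auto interval = std::chrono::nanoseconds(1000000000LL / fps);
    auto deadline = clock::now() + interval;
    int overruns = 0;
    for (int i = 0; i < frames; ++i) {
        grabFrame();
        if (clock::now() > deadline)
            ++overruns;                              // too slow for this rate
        else
            std::this_thread::sleep_until(deadline); // wait out the rest of the frame
        deadline += interval;                        // absolute deadlines do not drift
    }
    return overruns;
}
```

A non-zero return value means the polling plus drawing code cannot keep up with the requested rate. Inside a Qt GUI thread a QTimer would be the more usual way to schedule the same work, since sleeping there blocks the event loop.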

Related

Audio Visualizer from wav looks wrong

I'm having trouble making an audio visualizer look accurate. The bins that have a significant amount of sound tend to draw correctly, but the problem I'm having is that all the frequencies with no significant sound seem to be coming back with a value that usually bounces between -60dB and -40dB. This forms a flat bouncing line (usually in the higher frequencies).
I want to display 512 bins or less at 30 frames per second. I've been reading up on FFT and audio non stop for a couple weeks now, and so far my process has been:
Load pcm data from wav file. This comes in as 44100 samples per second that have a range of -/+ 32767. I'm assuming I treat these as real numbers when passing them to the FFT.
Divide these samples up into 1470 per frame. (446 are ignored)
Take 1024 samples and apply a Hann window.
Pass the samples to FFT as an array of real[1024] as well as another array of the same size filled with zeros for the imaginary part.
Get the magnitude by looping through the (samples/2) bins and do a sqrt(real[i]*real[i] + img[i]*img[i]).
Taking 20 * log(magnitude) to get the decibel level of each bin
Draw a rectangle for each bin. Draw these bins for each frame.
I've tested it with a couple songs, and a wav file I generated that just plays a tone at 440Hz. With the wav file, I do get a spike at the 440 bin, but all the other bins form a line that isn't much shorter than the 440 bin. Also every other frame, the bins apart from 440 look like a graphed log function with a dip on some other bin.
I'm writing this in c++. Using STK to only load left channel from the audio file:
//put every sample in the song into a temporary vector
for (int i = 0; i < stkObject->getSize(); i++)
{
standardVector.push_back(stkObject->tick(LEFT));
}
I'm using FFTReal to perform the FFT:
std::vector<std::vector <double> > leftChannelData;
int numberOfFrames = stkObject->getSize()/samplesPerFrame;
leftChannelData.resize(numberOfFrames);
for(int i = 0; i < numberOfFrames; i++)
{
for(int j = 0; j < FFT_SAMPLE_LENGTH; j++)
{
real[j] = standardVector[j + (i*samplesPerFrame)];
}
applyHannWindow(real, FFT_SAMPLE_LENGTH);
fft_object.do_fft(imaginary,real);
//FFTReal instructions say to run this after an fft
fft_object.rescale(real);
leftChannelData[i].resize(FFT_SAMPLE_LENGTH/2);
for (int j = 0; j < FFT_SAMPLE_LENGTH/2; j++)
{
double magnitude = sqrt(real[j]*real[j] + imaginary[j]*imaginary[j]);
double dbValue = 20 * log(magnitude/maxMagnitude);
leftChannelData[i].at(j) = dbValue;
}
}
I'm at a loss as to what's causing this. I've tried various ways to pull those 446 samples I'm ignoring, but the results don't seem to change. I think I may be doing something fundamentally wrong. I've tried normalizing the pcm data before handing it to the fft and I've tried normalizing the magnitude before finding the decibels, but it doesn't seem to be working. Any thoughts?
EDIT: I don't see any difference between log(magnitude) and log(magnitude/maxMagnitude). All it seems to do is shift all of the bin's values evenly downwards.
EDIT2:
Here's what they look like, to give a visual:
Song playing low sounds - with log(mag)
Song playing low sounds - same but with log(mag/maxMag)
Again, log(mag) and log(mag/maxMag) generally look the same, but with values extending into the negative range. Like MSalters said, the decibel value can approach negative infinity, so I can clamp those values to -100dB. Then take log(mag/maxMag) and add 100. That way the rectangles' heights range from 0 to 100 instead of -100 to 0.
Is this what I should do? I've tried this, but it still looks wrong. Maybe it's just a scaling issue? When I do this, a lot of the bars don't make it above the line when it sounds like they should. And if they do make it above 0, they do so just barely.
You have to understand that you're not taking the Fourier Transform of an infinite signal, but the FT of a windowed version thereof. And your window isn't even a plain Hann window. Discarding 446 points is effectively a rectangular window function. The FTs of both window functions will show up in your output.
Secondly, the dB scale is logarithmic. That indeed means it can go quite low in the absence of a signal. You mention -60 dB, but it could in fact hit minus infinity. The only thing that would save you from that is the window function, which will introduce smear at about -110 dB.
The noise (stop band ripple) produced by a quantized Von Hann window of length 1024 could well be around -40 to -60 dB. So one strategy is to just set a threshold, and ignore (don't plot) all values below that threshold.
Also, try removing the rescale(real) function, as that could distort your complex vector before you take the log magnitude.
Also, make sure you are actually loading the audio samples into your real vector correctly (sign, number of bits and endianness).
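The windowing and thresholding advice can be sketched as follows. One detail worth checking along the way: in C++, log is the natural logarithm, while decibels are defined with the base-10 logarithm, so 20 * log10(x) is needed for a true dB scale. The names applyHann and toDecibels and the -100 dB default floor are assumptions made for this example, not part of FFTReal or STK.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Multiplies the samples in place by a Hann window.
void applyHann(std::vector<double>& samples)
{
    const std::size_t n = samples.size();
    if (n < 2) return;
    const double pi = std::acos(-1.0);
    for (std::size_t i = 0; i < n; ++i)
        samples[i] *= 0.5 * (1.0 - std::cos(2.0 * pi * i / (n - 1)));
}

// Converts a magnitude to decibels relative to `reference`.
// Values below floorDb (including silence) are clamped to floorDb.
double toDecibels(double magnitude, double reference, double floorDb = -100.0)
{
    if (magnitude <= 0.0 || reference <= 0.0) return floorDb;
    return std::max(floorDb, 20.0 * std::log10(magnitude / reference));
}
```

With a floor of -100 dB, a bar height of toDecibels(mag, maxMag) + 100 runs from 0 to 100; raising the floor to something like -60 dB hides the window's noise floor, at the cost of also hiding genuinely quiet content.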

plotting real time Data on (qwt) Oscilloscope

I'm trying to create a program using Qt (C++) that records audio from my microphone using QAudioInput and QIODevice.
Now I want to visualize my signal.
Any help would be appreciated. Thanks
[Edit1] - copied from your comment (by Spektre)
I have only one buffer for both channels.
I use Qt; the channel values are interleaved in the buffer.
This is how I separate the values:
for ( int i = 0, j = 0; i < countSamples ; ++j)
{
YVectorRight[j]=Samples[i++];
YVectorLeft[j] =Samples[i++];
}
Afterwards I plot YVectorRight and YVectorLeft. I don't see how to trigger on only one channel.
Hehe, I did this a few years back for students during class. I hope you know how oscilloscopes work, so here are just the basics:
timebase
fsmpl is input signal sampling frequency [Hz]
Use as high a rate as possible (44100, 48000, ???). The maximum detectable frequency is then fsmpl/2, which gives you the top of your timebase axis. The low limit is given by your buffer length.
draw
Create function that will render your sampling buffer from specified start address (inside buffer) with:
Y-scale ... amplitude setting
Y-offset ... Vertical beam position
X-offset ... Time shift or horizontal position
This can be done by modification of start address or by just X-offsetting the curve
Level
Create a function that emulates the Level functionality: search the buffer from the start address and stop when the amplitude crosses Level. You can have more modes, but these are the basics you should implement:
amplitude: ( < lvl ) -> ( > lvl )
amplitude: ( > lvl ) -> ( < lvl )
There are many other possibilities for level like glitch,relative edge,...
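The two basic level modes could look like the sketch below. It assumes 16-bit samples in a std::vector; the names Edge and findTrigger are made up for this example.

```cpp
#include <cstddef>
#include <vector>

enum class Edge { Rising, Falling };

// Returns the index of the first sample after `start` where the signal
// crosses `level` in the given direction, or -1 if there is no crossing.
long findTrigger(const std::vector<short>& buf, std::size_t start,
                 short level, Edge edge)
{
    for (std::size_t i = start + 1; i < buf.size(); ++i) {
        const bool rising  = buf[i - 1] <  level && buf[i] >= level;
        const bool falling = buf[i - 1] >= level && buf[i] <  level;
        if ((edge == Edge::Rising && rising) || (edge == Edge::Falling && falling))
            return static_cast<long>(i);
    }
    return -1;
}
```

Drawing always from the returned index makes a periodic signal stand still on screen, which is the whole point of the Level knob.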
Preview
You can put all this together like this, for example: keep a start address variable, sample data into a buffer continuously, and on a timer call level with the start address (and update it). Then call draw with the new start address and add the timebase period to the start address (in terms of your samples, of course).
multichannel
I use Line IN so I have stereo input (A,B = left,right) therefore I can add some other stuff like:
Level source (A,B,none)
render mode (timebase,Chebyshev (Lissajous curve if closed))
Chebyshev = x axis is A, y axis is B this creates famous Chebyshev images which are good for dependent sinusoidal signals. Usually forming circles,ellipses,distorted loops ...
miscel stuff
You can add filters for channels emulating capacitance or grounding of input and much more
GUI
You need many settings I prefer analog knobs instead of buttons/scrollbars/sliders just like on real Oscilloscope
(semi)Analog values: Amplitude,TimeBase,Level,X-offset,Y-offset
discrete values: level mode(/,),level source(A,B,-),each channel (direct on,ground,off,capacity on)
Here are some screenshots of my oscilloscope:
Here is screenshot of my generator:
And finally after adding some FFT also Spectrum Analyser
PS.
I started with DirectSound but it sucks a lot because of buggy/non-functional buffer callbacks
I use WinAPI WaveIn/Out for all sound in my Apps now. After few quirks with it, is the best for my needs and has the best latency (Directsound is too slow more than 10 times) but for oscilloscope it has no merit (I need low latency mostly for emulators)
Btw. I have these three apps as linkable C++ subwindow classes (Borland)
and last used with my ATMega168 emulator for my sensor-less BLDC driver debugging
here you can try my Oscilloscope,generator and Spectrum analyser If you are confused with download read the comments below this post btw password is: "oscill"
Hope it helps if you need help with anything just comment me
[Edit1] trigger
You trigger all channels at once, but the trigger condition is usually checked on just one of them. The implementation is simple. For example, let the trigger condition be the A (left) channel rising above the level:
First make continuous playback with no trigger. You wrote that it looks like this:
for ( int i = 0, j = 0; i < countSamples ; ++j)
{
YVectorRight[j]=Samples[i++];
YVectorLeft[j] =Samples[i++];
}
// here draw or FFT,draw buffers YVectorRight,YVectorLeft
Add trigger
To add a trigger condition, find the sample that meets it and start drawing from there, so you change the code to something like this:
// static or global variables
static int i0=0; // actual start for drawing
static bool _copy_data=true; // flag that new samples need to be copied
static int level=35; // trigger level value, datatype should be the same as your samples...
int i,j;
for (;;)
    {
    // copy new samples to buffer if needed
    if (_copy_data)
     for (_copy_data=false,i=0,j=0;i<countSamples;++j)
        {
        YVectorRight[j]=Samples[i++];
        YVectorLeft[j] =Samples[i++];
        }
    // now search for new start
    for (i=i0+1;i<countSamples>>1;i++)
     if (YVectorLeft[i-1]<level) // lower than level before i
      if (YVectorLeft[i]>=level) // higher than (or equal to) level at i
        {
        i0=i;
        break;
        }
    if (i0>=(countSamples>>1)-view_samples) { i0=0; _copy_data=true; continue; }
    break;
    }
// here draw or FFT,draw buffers YVectorRight,YVectorLeft from i0 position
view_samples is the viewed/processed size of the data (for one or more screens); it should be a few times less than (countSamples>>1).
This code can lose one screen in the border area. To avoid that you need to implement cyclic buffers (rings), but for starters even this is OK.
Just encode all trigger conditions through some ifs or a switch statement.
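A cyclic buffer of the kind mentioned above can be sketched like this. SampleRing is a hypothetical class written for this example (one instance per channel); it keeps only the most recent samples, so a trigger search never has to stop at the border between two capture blocks. The capacity must be greater than zero.

```cpp
#include <cstddef>
#include <vector>

// Fixed-capacity ring buffer that always holds the most recent samples.
class SampleRing
{
public:
    explicit SampleRing(std::size_t capacity)
        : data_(capacity), head_(0), count_(0) {}

    // Appends one sample, overwriting the oldest once the buffer is full.
    void push(short s)
    {
        data_[head_] = s;
        head_ = (head_ + 1) % data_.size();
        if (count_ < data_.size()) ++count_;
    }

    std::size_t size() const { return count_; }

    // Returns the i-th stored sample, where 0 is the oldest one.
    short at(std::size_t i) const
    {
        const std::size_t oldest = (head_ + data_.size() - count_) % data_.size();
        return data_[(oldest + i) % data_.size()];
    }

private:
    std::vector<short> data_;
    std::size_t head_;   // next write position
    std::size_t count_;  // number of valid samples
};
```

The audio callback pushes de-interleaved samples into the ring, and the trigger search plus drawing read through at(), so the viewed window can span what used to be two separate buffers.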

OpenCV Image Manipulation

I am trying to find out the difference in 2 images.
Scenario: Suppose that I have 2 images, one of a background and the other of a person in front of the background. I want to subtract the two images in such a way that I get the position of the person, that is, the program can detect where the person was standing and give the subtracted image as the output.
The code that I have managed to come up with is taking two images from the camera and re-sizing them and is converting both the images to gray scale. I wanted to know what to do after this. I checked the subtract function provided by OpenCV but it takes arrays as inputs so I don't know how to progress.
The code that I have written is:
cap>>frame; //gets the first image
cv::cvtColor(frame,frame,CV_BGR2GRAY); //converts it to gray scale (captured frames are in BGR order)
cv::resize(frame,frame,Size(30,30)); //re-sizes it
cap>>frame2;//gets the second image
cv::cvtColor(frame2,frame2,CV_BGR2GRAY); //converts it to gray scale
cv::resize(frame2,frame2,Size(30,30)); //re-sizes it
Now do I simply use the subtract function like:
cv::subtract(frame_gray,frame,frame);
or do I apply some filters first and then use the subtract function?
As others have noticed, it's a tricky problem: easy to come up with a hack that will work sometimes, hard to come up with a solution that will work most of the time with minimal human intervention. Also, much easier to do if you can control tightly the material and illumination of the background. The professional applications are variously known as "chromakeying" (esp. in the TV industry), "bluescreening", "matting" or "traveling matte" (in cinematography), "background removal" in computer vision.
The groundbreaking work for matting quasi-uniform backdrops was done by Petro Vlahos many years ago. The patents on its basic algorithms have already expired, so you can go to town with them (and find open source implementations of various quality). Needless to say, IANAL, so do your homework on the patent subject.
Matting out more complex backgrounds is still an active research area, especially for the case when no 3D information is available. You may want to look into a few research papers that have come out of MS Research in the semi-recent past (A. Criminisi did some work in that area).
Using subtract on its own would not be appropriate, because some differences would be negative (for 8-bit images cv::subtract saturates those to 0), so it is only useful for checking whether there is any difference at all (a boolean true/false).
If you need the pixels where the images differ, you should do a pixel-by-pixel comparison, something like:
int rows = frame.rows;
int cols = frame.cols;
cv::Mat diffImage = cv::Mat::zeros(rows, cols, CV_8UC1);
for(int i = 0; i < rows; ++i)
{
for(int j = 0; j < cols; ++j)
{
if(frame.at<uchar>(i,j) != frame2.at<uchar>(i,j))
diffImage.at<uchar>(i, j) = 255;
}
}
Now you can either show or save diffImage. All pixels that differ will be white, while the identical ones will be black.
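With real camera frames an exact != comparison tends to mark almost every pixel, because sensor noise changes values by a few grey levels between frames. A thresholded absolute difference is more tolerant. The sketch below works on plain byte buffers so that it stands alone; diffMask and the threshold value are assumptions for this example. On cv::Mat the same idea is cv::absdiff followed by cv::threshold.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Builds a binary mask: 255 where two grayscale images differ by more than
// `threshold` grey levels, 0 elsewhere. Both buffers should have the same size.
std::vector<std::uint8_t> diffMask(const std::vector<std::uint8_t>& background,
                                   const std::vector<std::uint8_t>& current,
                                   int threshold)
{
    std::vector<std::uint8_t> mask(background.size(), 0);
    for (std::size_t i = 0; i < background.size() && i < current.size(); ++i) {
        const int d = std::abs(static_cast<int>(background[i]) -
                               static_cast<int>(current[i]));
        if (d > threshold) mask[i] = 255;
    }
    return mask;
}
```

A threshold somewhere around 20 to 30 is a common starting point for 8-bit images, but the right value depends on the camera and lighting and has to be tuned.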

Draw sound wave with possibility to zoom in/out

I'm writing a sound editor for my graduation project. I'm using BASS to extract samples from MP3, WAV, OGG etc. files and to add DSP effects like echo, flanger etc. Simply put, I made a framework that applies an effect from position1 to position2 and handles cut/paste.
Now my problem is that I want to create a control similar to the one in Cool Edit Pro that draws a waveform representation of the song and can zoom in/out, select portions of the waveform etc. After a selection I can do something like:
TInterval EditZone = WaveForm->GetSelection();
where TInterval has this form:
struct TInterval
{
    long Start;
    long End;
};
I'm a beginner when it comes to sophisticated drawing so any hint on how to create a wave form representation of a song, using sample data returned by BASS, with ability to zoom in/out would be appreciated.
I'm writing my project in C++ but I can understand C#, Delphi code so if you want you can post snippets in last two languages as well :)
Thanx DrOptix
By zoom, I presume you mean horizontal zoom rather than vertical. The way audio editors do this is to scan the waveform, breaking it up into time windows where each pixel in X represents some number of samples. It can be a fractional number, but you can get away with disallowing fractional zoom ratios without annoying the user too much. Once you zoom out a bit, the max value is always a positive integer and the min value is always a negative integer.
For each pixel on the screen, you need to know the minimum sample value for that pixel and the maximum sample value. So you need a function that scans the waveform data in chunks and keeps track of the accumulated max and min for that chunk.
This is a slow process, so professional audio editors keep a pre-calculated table of min and max values at some fixed zoom ratio. It might be at 512/1 or 1024/1. When you are drawing with a zoom ratio of > 1024 samples/pixel, you use the pre-calculated table. If you are below that ratio, you get the data directly from the file. If you don't do this, you will find that your drawing code gets too slow when you zoom out.
It's worthwhile to write code that handles all of the channels of the file in a single pass when doing this scanning, because slowness here will make your whole program feel sluggish. It's the disk IO that matters here; the CPU has no trouble keeping up, so straightforward C++ code is fine for building the min/max tables, but you don't want to go through the file more than once and you want to do it sequentially.
Once you have the min/max tables, keep them around. You want to go back to the disk as little as possible and many of the reasons for wanting to repaint your window will not require you to rescan your min/max tables. The memory cost of holding on to them is not that high compared to the disk io cost of building them in the first place.
Then you draw the waveform by drawing a series of 1 pixel wide vertical lines between the max value and the min value for the time represented by that pixel. This should be quite fast if you are drawing from pre built min/max tables.
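A minimal sketch of the min/max table and of reusing it for coarser zoom levels, assuming 16-bit mono samples already in memory. MinMax, buildMinMax and coarsen are names invented for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct MinMax { short min; short max; };

// Reduces `samples` to one min/max pair per block of `samplesPerPixel` samples.
std::vector<MinMax> buildMinMax(const std::vector<short>& samples,
                                std::size_t samplesPerPixel)
{
    std::vector<MinMax> columns;
    if (samplesPerPixel == 0) return columns;
    for (std::size_t start = 0; start < samples.size(); start += samplesPerPixel) {
        const std::size_t end = std::min(start + samplesPerPixel, samples.size());
        const auto range = std::minmax_element(samples.begin() + start,
                                               samples.begin() + end);
        columns.push_back({*range.first, *range.second});
    }
    return columns;
}

// Merges every `factor` entries of a pre-calculated table into one entry,
// which gives a coarser zoom level without touching the sample data again.
std::vector<MinMax> coarsen(const std::vector<MinMax>& table, std::size_t factor)
{
    std::vector<MinMax> out;
    if (factor == 0) return out;
    for (std::size_t start = 0; start < table.size(); start += factor) {
        MinMax m = table[start];
        const std::size_t end = std::min(start + factor, table.size());
        for (std::size_t i = start + 1; i < end; ++i) {
            m.min = std::min(m.min, table[i].min);
            m.max = std::max(m.max, table[i].max);
        }
        out.push_back(m);
    }
    return out;
}
```

For example, a table built once at 1024 samples per entry can serve a 4096 samples/pixel view through coarsen(table, 4); each resulting entry is then drawn as one vertical line from min to max.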
I've recently done this myself. As Marius suggests you need to work out how many samples are at each column of pixels. You then work out the minimum and maximum and then plot a vertical line from the maximum to the minimum.
As a first pass this seemingly works fine. The problem you'll get is that as you zoom out it will start to take too long to retrieve the samples from disk. As a solution to this I built a "peak" file alongside the audio file. The peak file stores the minimum/maximum pairs for groups of n samples. Playing with n until you get the right amount is up to you. Personally I found 128 samples to be a good tradeoff between size and speed. It's also worth remembering that, unless you are drawing a control larger than 65536 pixels in size, you needn't store this peak information as anything more than 16-bit values, which saves a bit of space.
Wouldn't you just plot the sample points on a 2D canvas? You should know how many samples there are per second for a file (read it from the header), and then plot the value on the y axis. Since you want to be able to zoom in and out, you need to control the number of samples per pixel (the zoom level). Next you take the average of those sample points per pixel (for example, take the average of every 5 points if you have 5 samples per pixel). Then you can use a 2D drawing API to draw lines between the points.
Using the open source NAudio package:
using System;
using System.Collections.Generic;
using System.Linq;
using NAudio.Wave;
public class WavReader2
{
private readonly WaveFileReader _objStream;
public WavReader2(String sPath)
{
_objStream = new WaveFileReader(sPath);
}
public List<SampleRangeValue> GetPixelGraph(int iSamplesPerPixel)
{
List<SampleRangeValue> colOutputValues = new List<SampleRangeValue>();
if (_objStream != null)
{
_objStream.Position = 0;
int iBytesPerSample = (_objStream.WaveFormat.BitsPerSample / 8) * _objStream.WaveFormat.Channels;
int iNumPixels = (int)Math.Ceiling(_objStream.SampleCount/(double)iSamplesPerPixel);
byte[] aryWaveData = new byte[iSamplesPerPixel * iBytesPerSample];
_objStream.Position = 0; // startPosition + (e.ClipRectangle.Left * iBytesPerSample * iSamplesPerPixel);
for (int iPixelNum = 0; iPixelNum < iNumPixels; iPixelNum++)
{
short iCurrentLowValue = 0;
short iCurrentHighValue = 0;
int iBytesRead = _objStream.Read(aryWaveData, 0, iSamplesPerPixel * iBytesPerSample);
if (iBytesRead == 0)
break;
List<short> colValues = new List<short>();
for (int n = 0; n < iBytesRead; n += 2)
{
short iSampleValue = BitConverter.ToInt16(aryWaveData, n);
colValues.Add(iSampleValue);
}
float fLowPercent = (float)((float)colValues.Min() /ushort.MaxValue);
float fHighPercent = (float)((float)colValues.Max() / ushort.MaxValue);
colOutputValues.Add(new SampleRangeValue(fHighPercent, fLowPercent));
}
}
return colOutputValues;
}
}
public struct SampleRangeValue
{
public float HighPercent;
public float LowPercent;
public SampleRangeValue(float fHigh, float fLow)
{
HighPercent = fHigh;
LowPercent = fLow;
}
}

Plotting waveform of the .wav file

I wanted to plot the wave-form of the .wav file for the specific plotting width.
Which method should I use to display correct waveform plot ?
Any Suggestions , tutorial , links are welcomed....
Basic algorithm:
Find number of samples to fit into draw-window
Determine how many samples should be presented by each pixel
Calculate RMS (or peak) value for each pixel from a sample block. Averaging does not work for audio signals.
Draw the values.
Let's assume that n(number of samples)=44100, w(width)=100 pixels:
then each pixel should represent 44100/100 == 441 samples (blocksize)
for (x = 0; x < w; x++)
draw_pixel(x_offset + x,
y_baseline - rms(&mono_samples[x * blocksize], blocksize));
Things to try for a different visual appearance:
rms vs max value from block
overlapping blocks (blocksize x but advance x/2 for each pixel etc)
Downsampling would probably not work, as you would lose peak information.
Either use RMS (BlockSize depends on how far you are zoomed in):
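The rms helper used in the loop above is not shown there, so here is one possible version, assuming 16-bit mono samples. rmsColumns is a hypothetical wrapper that produces one value per pixel column.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Root mean square of `count` samples starting at `block`.
double rms(const short* block, std::size_t count)
{
    if (count == 0) return 0.0;
    double sum = 0.0;
    for (std::size_t i = 0; i < count; ++i)
        sum += static_cast<double>(block[i]) * block[i];
    return std::sqrt(sum / count);
}

// One RMS value per pixel column. Trailing samples that do not fill a
// whole block are ignored.
std::vector<double> rmsColumns(const std::vector<short>& samples, std::size_t width)
{
    std::vector<double> columns;
    if (width == 0) return columns;
    const std::size_t blockSize = samples.size() / width;
    if (blockSize == 0) return columns;
    for (std::size_t x = 0; x < width; ++x)
        columns.push_back(rms(&samples[x * blockSize], blockSize));
    return columns;
}
```

This also shows why plain averaging fails for audio: the block {3, -3, 3, -3} has a mean of 0 but an RMS of 3, because the samples swing symmetrically around zero.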
float RMS = 0;
for (int a = 0; a < BlockSize; a++)
{
RMS += Samples[a]*Samples[a];
}
RMS = sqrt(RMS/BlockSize);
or Min/Max (this is what Cool Edit/Audition uses):
float Max = -10000000;
float Min = 1000000;
for (int a = 0; a < BlockSize; a++)
{
if (Samples[a] > Max) Max = Samples[a];
if (Samples[a] < Min) Min = Samples[a];
}
Almost any kind of plotting is platform specific. That said, .wav files are most commonly used on Windows, so it's probably a fair guess that you're interested primarily (or exclusively) in code for Windows as well. In this case, it mostly depends on your speed requirements. If you want a fairly static display, you can just draw with MoveTo and (mostly) LineTo. If that's not fast enough, you can gain a little speed by using something like PolyLine.
If you want it substantially faster, chances are that your best bet is to use something like OpenGL or DirectX graphics. Either of these does the majority of real work on the graphics card. Given that you're talking about drawing a graph of sound waves, even a low-end graphics card with little or no work on optimizing the drawing will probably keep up quite easily with almost anything you're likely to throw at it.
Edit: As far as reading the .wav file itself goes, the format is pretty simple. Most .wav files are uncompressed PCM samples, so drawing them is a simple matter of reading the headers to figure out the sample size and number of channels, then scaling the data to fit in your window.
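Reading the header fields mentioned above can be sketched like this. It assumes the canonical 44-byte PCM layout, where the "fmt " chunk directly follows the RIFF/WAVE tag; real files may carry extra chunks, in which case a reader has to walk the chunk list instead. WavFormat, readLE and parseWavHeader are names made up for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct WavFormat {
    std::uint16_t channels;
    std::uint32_t sampleRate;
    std::uint16_t bitsPerSample;
};

// Reads a little-endian unsigned integer of `bytes` bytes (at most 4).
static std::uint32_t readLE(const std::uint8_t* p, int bytes)
{
    std::uint32_t v = 0;
    for (int i = bytes - 1; i >= 0; --i) v = (v << 8) | p[i];
    return v;
}

// Parses the format fields of a canonical PCM WAV header.
// Returns false if the data is too short, not RIFF/WAVE, or not plain PCM.
bool parseWavHeader(const std::vector<std::uint8_t>& bytes, WavFormat& out)
{
    if (bytes.size() < 44) return false;
    if (std::memcmp(&bytes[0], "RIFF", 4) != 0) return false;
    if (std::memcmp(&bytes[8], "WAVE", 4) != 0) return false;
    if (std::memcmp(&bytes[12], "fmt ", 4) != 0) return false;
    if (readLE(&bytes[20], 2) != 1) return false;   // 1 = uncompressed PCM
    out.channels      = static_cast<std::uint16_t>(readLE(&bytes[22], 2));
    out.sampleRate    = readLE(&bytes[24], 4);
    out.bitsPerSample = static_cast<std::uint16_t>(readLE(&bytes[34], 2));
    return true;
}
```

In this layout the sample data starts at byte 44, interleaved by channel, so a stereo 16-bit file stores left and right samples alternately as little-endian signed shorts.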
Edit2: You have a couple of choices for handling left and right channels. One is to draw them in two separate plots, typically one above the other. Another is to draw them superimposed, but in different colors. Which is more suitable depends on what you're trying to accomplish -- if it's mostly to look cool, a superimposed, multi-color plot will probably work nicely. If you want to allow the user to really examine what's there in detail, you'll probably want two separate plots.
What exactly do you mean by a waveform? Are you trying to plot the level of the frequency components in the signal, a.k.a. the spectrum, most commonly seen in music visualizers, car stereos and boomboxes? If so, you should use the Fast Fourier Transform. FFT is a standard technique to split a time domain signal into its individual frequencies. There are tons of good FFT library routines available.
In C++, you can use the openFrameworks library to set up a music player for wav, extract the FFT and draw it.
You can also use Processing with the Minim library to do the same. I have tried it and it is pretty straightforward.
Processing even has support for OpenGL and it is a snap to use.