Plotting waveform of the .wav file - c++

I want to plot the waveform of a .wav file for a specific plotting width.
Which method should I use to display a correct waveform plot?
Any suggestions, tutorials, or links are welcome.

Basic algorithm:
Find the number of samples that fit into the draw window.
Determine how many samples should be represented by each pixel.
Calculate the RMS (or peak) value for each pixel from its sample block. Plain averaging does not work for audio signals, because the positive and negative samples cancel toward zero.
Draw the values.
Let's assume that n (number of samples) = 44100 and w (width) = 100 pixels:
then each pixel should represent 44100/100 == 441 samples (blocksize)

for (int x = 0; x < w; x++)
    draw_pixel(x_offset + x,
               y_baseline - rms(&mono_samples[x * blocksize], blocksize));
Stuff to try for a different visual appearance:
rms vs. max value from block
overlapping blocks (blocksize x but advance x/2 for each pixel, etc.)
Downsampling would probably not work, as you would lose peak information.
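A minimal sketch of that loop as a function, assuming mono float samples in [-1, 1] and a draw_pixel(x, y) primitive supplied by whatever toolkit you draw with:

#include <cmath>
#include <cstddef>

void draw_pixel(int x, int y); // assumption: provided by your drawing toolkit

// RMS of one block of mono samples.
float rms(const float* block, size_t blocksize)
{
    float sum = 0.0f;
    for (size_t i = 0; i < blocksize; i++)
        sum += block[i] * block[i];
    return std::sqrt(sum / blocksize);
}

void draw_waveform(const float* mono_samples, size_t n, int w,
                   int x_offset, int y_baseline, int height)
{
    size_t blocksize = n / w;              // samples represented by each pixel
    for (int x = 0; x < w; x++)
    {
        float v = rms(&mono_samples[x * blocksize], blocksize);
        draw_pixel(x_offset + x, y_baseline - (int)(v * height));
    }
}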

Either use RMS (the BlockSize depends on how far you are zoomed in!):
float RMS = 0;
for (int a = 0; a < BlockSize; a++)
{
    RMS += Samples[a] * Samples[a];
}
RMS = sqrt(RMS / BlockSize);
or Min/Max (this is what Cool Edit/Audition uses):
float Max = -FLT_MAX; // from <cfloat>; symmetric limits are safer than arbitrary sentinels
float Min = FLT_MAX;
for (int a = 0; a < BlockSize; a++)
{
    if (Samples[a] > Max) Max = Samples[a];
    if (Samples[a] < Min) Min = Samples[a];
}

Almost any kind of plotting is platform specific. That said, .wav files are most commonly used on Windows, so it's probably a fair guess that you're interested primarily (or exclusively) in code for Windows. In that case, it mostly depends on your speed requirements. If you want a fairly static display, you can just draw with MoveTo and (mostly) LineTo. If that's not fast enough, you can gain a little speed by using something like Polyline.
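For instance, a minimal GDI sketch of that MoveTo/LineTo approach, assuming you have already computed per-pixel sample extremes into two hypothetical arrays:

#include <windows.h>

// Call from your WM_PAINT handler between BeginPaint and EndPaint.
// pixel_min/pixel_max are assumed per-pixel sample extremes in pixel units.
void DrawWaveform(HDC hdc, const int* pixel_min, const int* pixel_max,
                  int width, int y_baseline)
{
    for (int x = 0; x < width; x++)
    {
        MoveToEx(hdc, x, y_baseline - pixel_max[x], NULL); // top of this column
        LineTo(hdc, x, y_baseline - pixel_min[x]);         // down to the bottom
    }
}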
If you want it substantially faster, chances are that your best bet is to use something like OpenGL or DirectX graphics. Either of these does the majority of real work on the graphics card. Given that you're talking about drawing a graph of sound waves, even a low-end graphics card with little or no work on optimizing the drawing will probably keep up quite easily with almost anything you're likely to throw at it.
Edit: As far as reading the .wav file itself goes, the format is pretty simple. Most .wav files are uncompressed PCM samples, so drawing them is a simple matter of reading the headers to figure out the sample size and number of channels, then scaling the data to fit in your window.
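A sketch of that header parsing for the common case (little-endian, canonical PCM; it skips unknown chunks but omits error handling and the odd-size padding rule):

#include <cstdio>
#include <cstdint>
#include <cstring>

// Minimal PCM .wav header reader: fills in format info and the size of
// the raw sample data, leaving the file positioned at the samples.
bool read_wav_info(const char* path, uint16_t& channels,
                   uint32_t& sampleRate, uint16_t& bitsPerSample,
                   uint32_t& dataBytes)
{
    FILE* f = fopen(path, "rb");
    if (!f) return false;
    char id[4];
    uint32_t size;
    fread(id, 1, 4, f);            // "RIFF"
    fread(&size, 4, 1, f);         // file size - 8
    fread(id, 1, 4, f);            // "WAVE"
    while (fread(id, 1, 4, f) == 4 && fread(&size, 4, 1, f) == 1)
    {
        if (!memcmp(id, "fmt ", 4)) {
            uint16_t fmt;
            fread(&fmt, 2, 1, f);            // 1 == uncompressed PCM
            fread(&channels, 2, 1, f);
            fread(&sampleRate, 4, 1, f);
            fseek(f, 6, SEEK_CUR);           // skip byte rate + block align
            fread(&bitsPerSample, 2, 1, f);
            fseek(f, size - 16, SEEK_CUR);   // skip any extra fmt bytes
        } else if (!memcmp(id, "data", 4)) {
            dataBytes = size;                // PCM samples follow from here
            fclose(f);
            return true;
        } else {
            fseek(f, size, SEEK_CUR);        // skip unknown chunk
        }
    }
    fclose(f);
    return false;
}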
Edit2: You have a couple of choices for handling left and right channels. One is to draw them in two separate plots, typically one above the other. Another is to draw them superimposed, but in different colors. Which is more suitable depends on what you're trying to accomplish -- if it's mostly to look cool, a superimposed, multi-color plot will probably work nicely. If you want to allow the user to really examine what's there in detail, you'll probably want two separate plots.

What exactly do you mean by a waveform? Are you trying to plot the level of the frequency components in the signal, a.k.a. the spectrum, most commonly seen in music visualizers, car stereos, and boomboxes? If so, you should use the Fast Fourier Transform. The FFT is a standard technique for splitting a time-domain signal into its individual frequencies. There are tons of good FFT library routines available.
In C++, you can use the openFrameworks library to set up a music player for wav files, extract the FFT data, and draw it.
You can also use Processing with the Minim library to do the same. I have tried it and it is pretty straightforward.
Processing even has support for OpenGL and it is a snap to use.
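As a rough sketch of the openFrameworks route (API names are from memory and differ between openFrameworks versions, so treat this as an outline rather than verified code; player is assumed to be an ofSoundPlayer member declared in ofApp.h):

#include "ofApp.h"

void ofApp::setup() {
    player.load("song.wav");   // "loadSound" in older openFrameworks versions
    player.play();
}

void ofApp::draw() {
    const int nBands = 128;
    float* spectrum = ofSoundGetSpectrum(nBands); // smoothed FFT levels, roughly 0..1
    for (int i = 0; i < nBands; i++) {
        // one bar per frequency band, growing upward from the bottom edge
        ofDrawRectangle(i * 5, ofGetHeight(), 4, -spectrum[i] * 200);
    }
}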

Visualize stream from a PMD Camboard Nano in Qt

EDIT: honest recommendation
If you want to stream from a PMD in realtime, use C#. Any UI is simple to create, and there is quite a mighty library, MetriCam by Metrilus AG, which supports streaming for a variety of 3D cameras. I am able to get a stable 45 fps with it.
ORIGINAL:
I've been trying to get depth information from a PMD camboard nano and visualize it in a GUI. The information is delivered as a 165x120 float array.
As I also want to use the data for analysis purposes (image quality, white noise, etc.), I need to grab frames at a specific framerate. The problem is that the SDK which PMD delivers with its camera (for MATLAB & C) only provides the possibility to grab single frames by calling
pmdUpdate(hnd);
so the framerate is dependent on how often you poll the image data.
I initially tried to do the analysis in MATLAB, but I couldn't get more than 30 fps out of the camera, and adding further code to the loop made it impossible to work with (I need at least a reliable 25 fps).
I then switched to C, where I got rates of up to 70 fps, but could not visualize the data.
Then I tried Qt, which is based on C/C++ - it should therefore be fast at polling the image data - and where I could easily include the libraries of the PMDSDK. As I am new to Qt, though, I do not know much about the UI elements.
So my question:
Is there any performant way to visualize a 2D float array in a Qt GUI? If not, what about anything useful in Visual Studio with C++?
(I know that drawing every pixel one by one on a QGraphicsView is dumb, but I tried it, and I get a whopping framerate of 0.4 fps...)
Thanks for any helpful suggestions!
Jannik
The QImage class actually has a constructor that accepts a uchar pointer/array. You only need to map the float values to RGB values in uchar format.
pmdGetDistances(hnd, dist, dd.img.numColumns * dd.img.numRows * sizeof(float));
uchar *imagemap = new uchar[dd.img.numColumns * dd.img.numRows * 3];
for (int i = 0; i < 165; i++) {
    for (int j = 0; j < 120; j++) {
        // Range-check before the uchar cast; a uchar can never be
        // outside 0..255, so testing it after the cast does nothing.
        float scaled = std::floor(40 * dist[j * 165 + i]);
        uchar value = (scaled < 0 || scaled > 255) ? 0 : (uchar)scaled;
        // colorscaling integrated
        imagemap[3 * (j * 165 + i)]     = (uchar)std::floor((255 - value) * (255 - value) / 255.0);
        imagemap[3 * (j * 165 + i) + 1] = (uchar)std::abs(std::floor((value - 127) / 1.5));
        imagemap[3 * (j * 165 + i) + 2] = (uchar)std::floor(value * value / 255.0);
    }
}
The QImage can then be converted to a QPixmap and displayed in the QGraphicsView. This worked for me, but the framerate does not seem really stable.
QImage image(imagemap, 165, 120, 165*3, QImage::Format_RGB888);
QPixmap pmap(QPixmap::fromImage(image));
scene->addPixmap(pmap.scaled(165,120));
ui->viewCamera->update();
It could be worth a try to send the thread to sleep until the desired time has elapsed: QThread::msleep(msec);
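A minimal sketch of pacing the polling loop that way, targeting 25 fps; grabFrame() is a hypothetical helper standing in for pmdUpdate plus the QImage conversion above:

#include <QElapsedTimer>
#include <QThread>

void grabFrame(); // assumption: your capture + convert + display step

void pollLoop(bool& running)
{
    const qint64 framePeriodMs = 40; // 1000 ms / 25 fps
    QElapsedTimer timer;
    while (running) {
        timer.start();
        grabFrame();
        qint64 elapsed = timer.elapsed();
        if (elapsed < framePeriodMs)
            QThread::msleep(framePeriodMs - elapsed); // sleep off the remainder
    }
}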

Runtime Sound Generation in C++ on Windows

How might one generate audio at runtime using C++? I'm just looking for a starting point. Someone on a forum suggested I try to make a program play a square wave of a given frequency and amplitude.
I've heard that modern computers encode audio using PCM samples: at a given rate (e.g. 48 kHz), the amplitude of the sound is recorded at a given resolution (e.g. 16 bits). If I generate such samples, how do I get my speakers to play them? I'm currently using Windows. I'd prefer to avoid any additional libraries if at all possible, but I'd settle for a very light one.
Here is my attempt to generate a square wave sample using this principle:
signed short* Generate_Square_Wave(
    signed short a_amplitude,
    signed short a_frequency,
    signed short a_sample_rate)
{
    signed short* sample = new signed short[a_sample_rate];
    for (signed short c = 0; c == a_sample_rate; c++)
    {
        if (c % a_frequency < a_frequency / 2)
            sample[c] = a_amplitude;
        else
            sample[c] = -a_amplitude;
    }
    return sample;
}
Am I doing this correctly? If so, what do I do with the generated sample to get my speakers to play it?
Your loop has to use c < a_sample_rate to avoid a buffer overrun.
To output the sound you call waveOutOpen and other waveOut... functions. They are all listed here:
http://msdn.microsoft.com/en-us/library/windows/desktop/dd743834(v=vs.85).aspx
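A minimal sketch of that waveOut call sequence, playing a buffer of 16-bit mono samples synchronously (error handling omitted; link against winmm.lib):

#include <windows.h>
#include <mmsystem.h>
#pragma comment(lib, "winmm.lib")

void Play_Samples(signed short* samples, int count, int sample_rate)
{
    WAVEFORMATEX wfx = {};
    wfx.wFormatTag      = WAVE_FORMAT_PCM;
    wfx.nChannels       = 1;
    wfx.nSamplesPerSec  = sample_rate;
    wfx.wBitsPerSample  = 16;
    wfx.nBlockAlign     = wfx.nChannels * wfx.wBitsPerSample / 8;
    wfx.nAvgBytesPerSec = wfx.nSamplesPerSec * wfx.nBlockAlign;

    HWAVEOUT hwo;
    waveOutOpen(&hwo, WAVE_MAPPER, &wfx, 0, 0, CALLBACK_NULL);

    WAVEHDR hdr = {};
    hdr.lpData         = (LPSTR)samples;
    hdr.dwBufferLength = count * sizeof(signed short);
    waveOutPrepareHeader(hwo, &hdr, sizeof(hdr));
    waveOutWrite(hwo, &hdr, sizeof(hdr));

    while (!(hdr.dwFlags & WHDR_DONE))   // busy-wait until playback finishes
        Sleep(10);

    waveOutUnprepareHeader(hwo, &hdr, sizeof(hdr));
    waveOutClose(hwo);
}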
The code you are using generates a wave that is truly square (binary square, in short): the type of waveform that does not exist in real life. In reality, most (pretty sure all) of the sounds you hear are a combination of sine waves at different frequencies.
Because your samples are created the way they are, they will produce aliasing, where a higher frequency masquerades as a lower frequency, causing audio artefacts. To demonstrate this to yourself, write a little program that sweeps the frequency of your code from 20 Hz to 20,000 Hz. You will hear that the sound does not rise smoothly as the frequency increases; you will hear artefacts.
Wikipedia has an excellent article on square waves: https://en.m.wikipedia.org/wiki/Square_wave
One way to generate a square wave is to perform an inverse Fast Fourier Transform, which transforms a series of frequency measurements into a series of time-based samples. Generating a square wave is then a matter of supplying the routine with the measurements of the sine waves at different frequencies that make up a square wave; the output is a buffer with a single cycle of the waveform.
Generating audio waves is computationally expensive, so what is often done is to generate arrays of audio samples once and play them back at varying speeds to produce different frequencies. This is called wavetable synthesis. See the sketch after the links below.
Have a look at the following link:
https://www.earlevel.com/main/2012/05/04/a-wavetable-oscillator%E2%80%94part-1/
And some more about band limiting a signal and why it’s necessary:
https://dsp.stackexchange.com/questions/22652/why-band-limit-a-signal
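To illustrate the band-limited alternative to the binary square wave: a square wave is the sum of its odd sine harmonics with amplitude 1/k, so summing only the harmonics below the Nyquist frequency (sample_rate / 2) produces an alias-free cycle. A minimal sketch:

#include <cmath>
#include <vector>

// One band-limited square-wave cycle via additive synthesis:
// square(t) = (4/pi) * sum over odd k of sin(2*pi*k*t)/k,
// truncated so k * frequency stays below Nyquist.
std::vector<float> band_limited_square(int table_size, float frequency, float sample_rate)
{
    const float pi = 3.14159265358979f;
    std::vector<float> table(table_size, 0.0f);
    for (int k = 1; k * frequency < sample_rate / 2; k += 2)
        for (int i = 0; i < table_size; i++)
            table[i] += std::sin(2 * pi * k * i / table_size) / k;
    for (int i = 0; i < table_size; i++)
        table[i] *= 4 / pi;                 // scale to roughly -1..1
    return table;
}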

OpenCV Image Manipulation

I am trying to find the difference between 2 images.
Scenario: suppose that I have 2 images, one of a background and the other of a person in front of the background. I want to subtract the two images in such a way that I get the position of the person, that is, the program can detect where the person was standing and give the subtracted image as the output.
The code that I have managed to come up with takes two images from the camera, re-sizes them, and converts both to grayscale. I wanted to know what to do after this. I checked the subtract function provided by OpenCV, but it takes arrays as inputs, so I don't know how to proceed.
The code that I have written is:
cap>>frame; //gets the first image
cv::cvtColor(frame,frame,CV_RGB2GRAY); //converts it to gray scale
cv::resize(frame,frame,Size(30,30)); //re-sizes it
cap>>frame2;//gets the second image
cv::cvtColor(frame2,frame2,CV_RGB2GRAY); //converts it to gray scale
cv::resize(frame2,frame2,Size(30,30)); //re-sizes it
Now do I simply use the subtract function like:
cv::subtract(frame, frame2, frame);
or do I apply some filters first and then use the subtract function?
As others have noticed, it's a tricky problem: easy to come up with a hack that will work sometimes, hard to come up with a solution that will work most of the time with minimal human intervention. Also, much easier to do if you can control tightly the material and illumination of the background. The professional applications are variously known as "chromakeying" (esp. in the TV industry), "bluescreening", "matting" or "traveling matte" (in cinematography), "background removal" in computer vision.
The groundbreaking work for matting quasi-uniform backdrops was done by Petro Vlahos many years ago. The patents on his basic algorithms have already expired, so you can go to town with them (and find open source implementations of various quality). Needless to say, IANAL, so do your homework on the patent subject.
Matting out more complex backgrounds is still an active research area, especially for the case when no 3D information is available. You may want to look into a few research papers that have come out of MS Research in the semi-recent past (A. Criminisi did some work in that area).
Using subtract would not be appropriate because it might result in some values becoming negative, and it will work only if you are trying to see whether there is a difference or not (a boolean true/false).
If you need to get the pixels where the images differ, you should do a pixel-by-pixel comparison, something like:
int rows = frame.rows;
int cols = frame.cols;
cv::Mat diffImage = cv::Mat::zeros(rows, cols, CV_8UC1);
for (int i = 0; i < rows; ++i)
{
    for (int j = 0; j < cols; ++j)
    {
        if (frame.at<uchar>(i, j) != frame2.at<uchar>(i, j))
            diffImage.at<uchar>(i, j) = 255;
    }
}
Now you can either show or save diffImage. All pixels that differ will be white, while the similar ones will be black.
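For example, to display and save the result (the window and file names here are arbitrary):

cv::imshow("difference", diffImage);
cv::imwrite("difference.png", diffImage);
cv::waitKey(0); // keep the window open until a key is pressed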

Confusion regarding Image compression algorithms

I had been reading a webpage on image compression (lossy and non-lossy).
Now this is my problem: I was successful in making a project on face detection using OpenCV. However, my project guide is not satisfied; my project simply captures frames from a capture device [webcam] and passes them to a function that detects the faces in those frames and outputs the detected frames in windows.
My project guide wants me to implement some algorithm, either image compression or morphing, etc., but was not happy on seeing such heavy usage of the library.
So what I would like to know: is it possible to code image compression algorithms in C or C++? If yes, would the code size not be huge? (My project is supposed to be a minor one.)
Please help me out. Suppose I want to use RLE compression in C++, how should I go about it?
Do you want to invent your own image compression or implement one of the standard ones?
(I assume this is for some sort of class/assignment; you wouldn't do this in the real world!)
You can compress simple images a little using something like run-length encoding, especially if you can reduce the number of colours, i.e. a cartoon or graphic, but for a real photo-style image it isn't going to work; that's why complex lossy techniques like JPEG or wavelets were invented.
It's very possible, and RLE compression is quite easy. If you want a relatively straightforward approach to RLE that won't use a lot of code, look at implementing a version of PackBits.
Here's another link as well: http://michael.dipperstein.com/rle/index.html (includes an implementation with source code for both traditional RLE and PackBits)
BTW, keep in mind that with noisy data you could actually end up with more data than the uncompressed original when using RLE schemes. For most "real-world" images that have some form of low-pass filtering applied and a relatively good signal-to-noise ratio (i.e., above 40 dB), you should expect around 1.5:1 to 1.7:1 compression ratios.
Another option for lossless compression would be Huffman encoding; that algorithm is more tolerant of noisy images, in that it generally prevents the data expansion that could occur when such images are encoded with an RLE compression algorithm.
Finally, you didn't mention whether you are working with color or grayscale images. If it's a color image, remember that you will find much greater redundancy if you compress each color channel in a planar layout, rather than trying to compress contiguous RGB data.
RLE is the best way to go here. Even the "simplest" compression algorithms are non-trivial and require in-depth knowledge of color space transforms, discrete sine/cosine transforms, entropy, etc.
Back to RLE... to loop through pixels, use something like this:
cv::Mat img = cv::imread("lenna.png");
for (int i = 0; i < img.rows; i++)
    for (int j = 0; j < img.cols; j++)
        // You can now access the pixel value with cv::Vec3b
        std::cout << (int)img.at<cv::Vec3b>(i, j)[0] << " "
                  << (int)img.at<cv::Vec3b>(i, j)[1] << " "
                  << (int)img.at<cv::Vec3b>(i, j)[2] << std::endl;
Count the number of similar pixels in a row and store them in any data structure (maybe a <#occurrences, Vec3b> pair in a vector?). Once you have your final vector, don't forget to store the size of your image somewhere with the aforementioned vector (maybe in a simple compressedImage struct), and voilà, you just compressed an image. To store it in a file, I suggest you use Boost.Serialization or something similar; a sketch of the counting loop follows after the struct.
Your final struct may look something like:
struct compressedImage {
    int height;
    int width;
    vector< pair<int, Vec3b> > data;
};
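For illustration, a minimal row-major run-length scan into that struct might look like the sketch below (it assumes a non-empty image and the using-directives implied by the struct, i.e. std::vector, std::pair, and cv::Vec3b; serialization is left out):

compressedImage rleCompress(const cv::Mat& img)
{
    compressedImage out;
    out.height = img.rows;
    out.width  = img.cols;
    cv::Vec3b current = img.at<cv::Vec3b>(0, 0);
    int run = 0;
    for (int i = 0; i < img.rows; i++) {
        for (int j = 0; j < img.cols; j++) {
            cv::Vec3b px = img.at<cv::Vec3b>(i, j);
            if (px == current) {
                run++;                                    // extend the current run
            } else {
                out.data.push_back(make_pair(run, current)); // flush finished run
                current = px;
                run = 1;
            }
        }
    }
    out.data.push_back(make_pair(run, current));          // flush the final run
    return out;
}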
Happy coding!
You may want to implement a compression based on colour reduction combined with a space-filling curve or a spatial index. A spatial index reduces the 2D complexity to a 1D complexity; it looks like a quadtree and a bit like a fractal. Have a look for Nick's hilbert curve quadtree spatial index blog!
Here is another interesting RLE encoding idea: Lossless hierarchical run length encoding. Maybe that's something for you?
If you need to abstract the raster type, you can use the GDAL C++ library. Here is the list of raster formats supported by default or on request:
http://gdal.org/formats_list.html

Draw sound wave with possibility to zoom in/out

I'm writing a sound editor for my graduation project. I'm using BASS to extract samples from MP3, WAV, OGG, etc. files and to add DSP effects like echo, flanger, etc. Simply speaking, I made a framework that applies an effect from position1 to position2, with cut/paste management.
Now my problem is that I want to create a control, similar to the one from Cool Edit Pro, that draws a waveform representation of the song and has the ability to zoom in/out, select portions of the waveform, etc. After a selection I can do something like:
TInterval EditZone = WaveForm->GetSelection();
where TInterval has this form:
struct TInterval
{
    long Start;
    long End;
};
I'm a beginner when it comes to sophisticated drawing, so any hint on how to create a waveform representation of a song, using the sample data returned by BASS, with the ability to zoom in/out, would be appreciated.
I'm writing my project in C++, but I can understand C# and Delphi code, so if you want you can post snippets in the last two languages as well :)
Thanks, DrOptix
By zoom, I presume you mean horizontal zoom rather than vertical. The way audio editors do this is to scan the waveform, breaking it up into time windows where each pixel in X represents some number of samples. It can be a fractional number, but you can get away with disallowing fractional zoom ratios without annoying the user too much. Once you zoom out a bit, the max value is always a positive integer and the min value is always a negative integer.
For each pixel on the screen, you need to know the minimum sample value for that pixel and the maximum sample value. So you need a function that scans the waveform data in chunks and keeps track of the accumulated max and min for each chunk.
This is a slow process, so professional audio editors keep a pre-calculated table of min and max values at some fixed zoom ratio. It might be at 512:1 or 1024:1. When you are drawing with a zoom ratio of more than 1024 samples/pixel, you use the pre-calculated table; if you are below that ratio you get the data directly from the file. If you don't do this, you will find that your drawing code gets too slow when you zoom out.
It's worthwhile to write code that handles all of the channels of the file in a single pass when doing this scanning; slowness here will make your whole program feel sluggish. It's the disk I/O that matters here; the CPU has no trouble keeping up, so straightforward C++ code is fine for building the min/max tables, but you don't want to go through the file more than once, and you want to do it sequentially.
Once you have the min/max tables, keep them around. You want to go back to the disk as little as possible, and many of the reasons for wanting to repaint your window will not require you to rescan the min/max tables. The memory cost of holding on to them is low compared to the disk I/O cost of building them in the first place.
Then you draw the waveform by drawing a series of 1-pixel-wide vertical lines between the max value and the min value for the time represented by each pixel. This should be quite fast if you are drawing from pre-built min/max tables.
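A sketch of building such a pre-calculated table, assuming the samples are already in memory as 16-bit values and a fixed ratio such as 1024 samples per entry:

#include <vector>
#include <algorithm>
#include <cstdint>

struct PeakPair { int16_t min; int16_t max; };

// Build one min/max pair per `ratio` samples (e.g. ratio = 1024).
std::vector<PeakPair> buildPeakTable(const std::vector<int16_t>& samples, size_t ratio)
{
    std::vector<PeakPair> table;
    for (size_t i = 0; i < samples.size(); i += ratio) {
        size_t end = std::min(i + ratio, samples.size());
        PeakPair p = { samples[i], samples[i] };
        for (size_t j = i + 1; j < end; j++) {
            p.min = std::min(p.min, samples[j]);
            p.max = std::max(p.max, samples[j]);
        }
        table.push_back(p);
    }
    return table;
}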
I've recently done this myself. As Marius suggests, you need to work out how many samples fall under each column of pixels. You then work out the minimum and maximum and plot a vertical line from the maximum to the minimum.
As a first pass this seemingly works fine. The problem you'll get is that as you zoom out, it will start to take too long to retrieve the samples from disk. As a solution to this, I built a "peak" file alongside the audio file. The peak file stores the minimum/maximum pairs for groups of n samples. Playing with n till you get the right amount is up to you; personally I found 128 samples to be a good trade-off between size and speed. It's also worth remembering that, unless you are drawing a control larger than 65536 pixels in size, you needn't store this peak information as anything more than 16-bit values, which saves a bit of space.
Wouldn't you just plot the sample points on a 2D canvas? You should know how many samples there are per second for a file (read it from the header), and then plot the value on the y axis. Since you want to be able to zoom in and out, you need to control the number of samples per pixel (the zoom level). Next you take the average of the sample points per pixel (for example, take the average of every 5 points if you have 5 samples per pixel). Then you can use a 2D drawing API to draw lines between the points.
Using the open source NAudio package:
public class WavReader2
{
    private readonly WaveFileReader _objStream;

    public WavReader2(String sPath)
    {
        _objStream = new WaveFileReader(sPath);
    }

    public List<SampleRangeValue> GetPixelGraph(int iSamplesPerPixel)
    {
        List<SampleRangeValue> colOutputValues = new List<SampleRangeValue>();
        if (_objStream != null)
        {
            _objStream.Position = 0;
            int iBytesPerSample = (_objStream.WaveFormat.BitsPerSample / 8) * _objStream.WaveFormat.Channels;
            int iNumPixels = (int)Math.Ceiling(_objStream.SampleCount / (double)iSamplesPerPixel);
            byte[] aryWaveData = new byte[iSamplesPerPixel * iBytesPerSample];
            for (int iPixelNum = 0; iPixelNum < iNumPixels; iPixelNum++)
            {
                int iBytesRead = _objStream.Read(aryWaveData, 0, iSamplesPerPixel * iBytesPerSample);
                if (iBytesRead == 0)
                    break;
                List<short> colValues = new List<short>();
                for (int n = 0; n < iBytesRead; n += 2)
                {
                    short iSampleValue = BitConverter.ToInt16(aryWaveData, n);
                    colValues.Add(iSampleValue);
                }
                // Dividing by ushort.MaxValue maps the full 16-bit range onto
                // a total span of 1.0 (roughly -0.5 to +0.5).
                float fLowPercent = (float)colValues.Min() / ushort.MaxValue;
                float fHighPercent = (float)colValues.Max() / ushort.MaxValue;
                colOutputValues.Add(new SampleRangeValue(fHighPercent, fLowPercent));
            }
        }
        return colOutputValues;
    }
}
public struct SampleRangeValue
{
    public float HighPercent;
    public float LowPercent;

    public SampleRangeValue(float fHigh, float fLow)
    {
        HighPercent = fHigh;
        LowPercent = fLow;
    }
}