Plot an audio waveform in C++/Qt

I have a university assignment which consists of displaying the waveform of an audio file using C++/Qt. We should be able to modify the scale used to display it (expressed in audio samples per screen pixel).
So far, I am able to:
open the audio file
read the samples
plot the samples at a given scale
To plot the samples at a given scale, I have tried two strategies. Let's assume that N is the value of the scale:
for i going from 0 to the width of my window, plot the i * Nth audio sample at the screen pixel i. This is very fast and constant in time because we always access the same amount of audio data points.
However, it does not represent the waveform correctly, as we use the value of only 1 point to represent N points.
for i going from 0 to N * width, plot the ith audio sample at the screen position i / (N * width) and let Qt figure out how to represent that correctly on physical screen pixels.
That plots very beautiful waveforms, but accessing the data takes a huge amount of time. For instance, if I want to display 500 samples per pixel and the width of my window is 100px, I have to access 50,000 points, which are then plotted by Qt as 100 physical points (pixels).
So, how can I get a correct plot of my audio data, which can be calculated fast? Should I calculate the average of N samples for each physical pixel? Should I do some curve fitting?
In other words, what kind of operation is involved when Qt/Matplotlib/Matlab/etc. plot thousands of data points onto a very limited number of physical pixels?

Since I do know how to do this and have already asked something similar on Stack Overflow, I'll reference that here. I'll provide code later.
Drawing waveforms is a real problem. I tried to figure this out for more than half a year!
To sum this up:
According to the Audacity Documentation:
The waveform view uses two shades of blue, one darker and one lighter.
The dark blue part of the waveform displays the tallest peak in the area that pixel represents. At default zoom level Audacity will display many samples within that pixel width, so this pixel represents the value of the loudest sample in the group.
The light blue part of the waveform displays the average RMS (Root Mean Square) value for the same group of samples. This is a rough guide to how loud this area might sound, but there is no way to extract or use this RMS part of the waveform separately.
So you simply try to get the important information out of each chunk of data. If you do this over and over, you'll have multiple stages (levels of detail) which can be used for drawing.
I'll provide some code here; please bear with me, it's still in development:
template<typename T>
class CacheHandler {
public:
    std::vector<T> data;
    int sampleSizeInBits;
    // vector2d<T> is presumably a 2D container (e.g. a std::vector<std::vector<T>>),
    // holding one reduced level per zoom stage.
    vector2d<T> min, max, rms;

    CacheHandler(std::vector<T>& data, int sampleSizeInBits) throw(std::exception);

    void addData(std::vector<T>& samples);

    /*
     * Irreversibly removes data.
     * Fails if the end index is greater than the data length.
     */
    void removeData(int endIndex);
    void removeData(int startIndex, int endIndex);
};
constructed like this:
template<typename T>
inline WaveformPane::CacheHandler<T>::CacheHandler(std::vector<T>& data, int sampleSizeInBits) throw(std::exception)
{
    this->data = data;
    this->sampleSizeInBits = sampleSizeInBits;
    // Number of reduction levels: each level halves the number of values of the previous one.
    int N = log(data.size()) / log(2);
    rms.resize(N); min.resize(N); max.resize(N);
    // Level 0 reduces the raw samples in pairs; every further level reduces the previous level again.
    rms[0] = calcRMSSegments(data, 2);
    min[0] = getMinPitchSegments(data, 2);
    max[0] = getMaxPitchSegments(data, 2);
    for (int i = 1; i < N; i++) {
        rms[i] = calcRMSSegments(rms[i - 1], 2);
        min[i] = getMinPitchSegments(min[i - 1], 2);
        max[i] = getMaxPitchSegments(max[i - 1], 2);
    }
}
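The reduction helpers (calcRMSSegments, getMinPitchSegments, getMaxPitchSegments) aren't shown; as a rough sketch of what one of them might look like under the assumptions above (reduce the input by taking the RMS of each group of segmentSize samples; the min/max variants would be analogous, using std::min_element/std::max_element per group):

// Sketch only (assumes <vector>, <cmath>, <algorithm>): reduce 'samples' by
// computing the RMS of every group of 'segmentSize' consecutive values.
template<typename T>
std::vector<T> calcRMSSegments(const std::vector<T>& samples, size_t segmentSize)
{
    std::vector<T> result;
    result.reserve(samples.size() / segmentSize + 1);
    for (size_t i = 0; i < samples.size(); i += segmentSize) {
        const size_t count = std::min(segmentSize, samples.size() - i);
        double sumSquares = 0.0;
        for (size_t j = 0; j < count; ++j)
            sumSquares += double(samples[i + j]) * double(samples[i + j]);
        result.push_back(static_cast<T>(std::sqrt(sumSquares / count)));
    }
    return result;
}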

What I'd suggest is something like this:
Given totalNumSamples audio samples in your audio file, and widgetWidth pixels of width in your display widget, you can calculate which samples are to be represented by each pixel:
// Given an x value (in pixels), returns the appropriate corresponding
// offset into the audio-samples array that represents the
// first sample that should be included in that pixel.
int GetFirstSampleIndexForPixel(int x, int widgetWidth, int totalNumSamples)
{
    return (totalNumSamples*x)/widgetWidth;
}
virtual void paintEvent(QPaintEvent * e)
{
    QPainter p(this);
    for (int x=0; x<widgetWidth; x++)
    {
        const int firstSampleIndexForPixel    = GetFirstSampleIndexForPixel(x, widgetWidth, totalNumSamples);
        const int lastSampleIndexForPixel     = GetFirstSampleIndexForPixel(x+1, widgetWidth, totalNumSamples)-1;
        const int largestSampleValueForPixel  = GetMaximumSampleValueInRange(firstSampleIndexForPixel, lastSampleIndexForPixel);
        const int smallestSampleValueForPixel = GetMinimumSampleValueInRange(firstSampleIndexForPixel, lastSampleIndexForPixel);

        // draw a vertical line spanning all sample values that are contained in this pixel
        p.drawLine(x, GetYValueForSampleValue(largestSampleValueForPixel), x, GetYValueForSampleValue(smallestSampleValueForPixel));
    }
}
Note that I didn't include source code for GetMinimumSampleValueInRange(), GetMaximumSampleValueInRange(), or GetYValueForSampleValue(), since hopefully what they do is obvious from their names, but if not, let me know and I can explain them.
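In case the names alone aren't enough, here is roughly what those helpers might look like (a sketch only; it assumes the samples live in a member std::vector<int> called samples, that widgetHeight holds the widget's height, and that sample values are signed 16-bit):

// Uses std::min/std::max from <algorithm>.
int GetMaximumSampleValueInRange(int firstSampleIndex, int lastSampleIndex) const
{
    int maxVal = samples[firstSampleIndex];
    for (int i = firstSampleIndex+1; i <= lastSampleIndex; ++i)
        maxVal = std::max(maxVal, samples[i]);
    return maxVal;
}

int GetMinimumSampleValueInRange(int firstSampleIndex, int lastSampleIndex) const
{
    int minVal = samples[firstSampleIndex];
    for (int i = firstSampleIndex+1; i <= lastSampleIndex; ++i)
        minVal = std::min(minVal, samples[i]);
    return minVal;
}

// Maps a sample value (assumed signed 16-bit, i.e. -32768..32767) to a
// y pixel coordinate, with sample value 0 at the vertical centre of the widget.
int GetYValueForSampleValue(int sampleValue) const
{
    const int halfHeight = widgetHeight/2;
    return halfHeight - (sampleValue*halfHeight)/32768;
}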
Once you have the above working reasonably well (i.e. drawing a waveform that shows the entire file in your widget), you can start working on adding zoom-and-pan functionality. Horizontal zooming can be implemented by modifying the behavior of GetFirstSampleIndexForPixel(), e.g.:
int GetFirstSampleIndexForPixel(int x, int widgetWidth, int sampleIndexAtLeftEdgeOfWidget, int sampleIndexAfterRightEdgeOfWidget)
{
    int numSamplesToDisplay = sampleIndexAfterRightEdgeOfWidget-sampleIndexAtLeftEdgeOfWidget;
    return sampleIndexAtLeftEdgeOfWidget+((numSamplesToDisplay*x)/widgetWidth);
}
With that, you can zoom/pan simply by passing in different values for sampleIndexAtLeftEdgeOfWidget and sampleIndexAfterRightEdgeOfWidget that together indicate the subrange of the file you want to display.
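For example (hypothetical values, purely to illustrate how the two indices drive the view):

// Zoom: show only the middle half of the file.
int sampleIndexAtLeftEdgeOfWidget     = totalNumSamples/4;
int sampleIndexAfterRightEdgeOfWidget = (3*totalNumSamples)/4;

// Pan: shift the visible window right by a quarter of its width,
// clamping so we never run past the end of the file (std::min from <algorithm>).
int numVisible = sampleIndexAfterRightEdgeOfWidget - sampleIndexAtLeftEdgeOfWidget;
int shift = std::min(numVisible/4, totalNumSamples - sampleIndexAfterRightEdgeOfWidget);
sampleIndexAtLeftEdgeOfWidget     += shift;
sampleIndexAfterRightEdgeOfWidget += shift;

update();   // QWidget::update() schedules a repaint, so paintEvent() redraws with the new range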

Related

Adding graphs together in c++ (to generate fractal noise)

I am trying to make a 1D fractal noise function. I have a function generating every single individual graph, but am struggling with how to add them together. I am following this tutorial
https://web.archive.org/web/20160530124230/http://freespace.virgin.net/hugo.elias/models/m_perlin.htm
Here is my code for my final noise function
(I am using SFML, which is where sf::Vector2f comes from. It's just a vector of two floats, representing a coordinate.)
void fractalNoise() {
    std::vector<sf::Vector2f> allGraphs;
    std::vector<sf::Vector2f> singleNoise;
    float persistance = 0.8; // represents the decrease of amplitude with frequency.
                             // The closer to one, the less the amplitude decreases each iteration
    int nOOPM1 = 10; // number of iterations
    for (int i = 0; i < nOOPM1; i++) {
        float frequency = pow(2, i);
        float amplitude = pow(persistance, i);
        // generate random points of noise, equidistant on the x, and random on the y.
        // the 3 is the interpolation method (ignore this), and the 1000 is how many
        // points to draw between each point.
        singleNoise = this->interpolateNoise(
            this->generateNoise(frequency, 300 * amplitude), 3, 1000);
        allGraphs.insert(allGraphs.end(), singleNoise.begin(), singleNoise.end());
    }
    this->noiseGenerated = allGraphs;
    // every pixel stored in noiseGenerated is rendered to a window
}
I understand that the allGraphs.insert is just putting the next graph after the current one, but I am unsure how to add the graphs together. Because of the nature of fractal noise, and the fact that my frequencies are always changing, I can't just add the noise points before interpolating them, as they will mostly have different x values.
Any help would be appreciated

How to quickly scan and analyze large groups of pixels?

I am trying to build an autoclicker using C++ to beat a 2D videogame in which the following situation appears:
The main character is in the center of the screen, the background is completely black and enemies are coming from all directions. I want my program to be capable of clicking on enemies just as they appear on the screen.
What I came up with at first is that the enemies have a minimum size of 15px, so I tried doing a search every 15 pixels and analyzing whether any pixel differs from the background's RGB, using GetPixel(). It looks something like this:
COLORREF color;
int R, G, B;
for(int i=0; i<SCREEN_SIZE_X; i+=15){ // These SCREEN_SIZE values are #defined with the ones of my screen
    for(int j=0; j<SCREEN_SIZE_Y; j+=15){
        // The following conditional excludes the center, which is the player's position
        if((i<PLAYER_MIN_EDGE_X or i>PLAYER_MAX_EDGE_X) and (j<PLAYER_MIN_EDGE_Y or j>PLAYER_MAX_EDGE_Y)){
            color = GetPixel(GetDC(nullptr), i, j);
            R = GetRValue(color);
            G = GetGValue(color);
            B = GetBValue(color);
            if(R!=0 or G!=0 or B!=0) cout<<"Enemy Found"<<endl;
        }
    }
}
It turns out that, as expected, the GetPixel() function is extremely slow, as it has to verify about 4000 pixels to cover just one screen scan. I was thinking about a way to solve this faster, and while looking at the keyboard I noticed the "Prt Scr" button, and realized that whatever that button does, it is able to almost instantly save the information of millions of pixels.
I am sure there is a proper, different technique to approach this kind of problem.
What kind of theory or technique for pixel analysis should I investigate and read about, so that this can be considered respectable code, actually works, and runs much faster?
The GetPixel() routine is slow because it's fetching the data from the videocard (device) memory one by one. So to optimize your loop, you have to fetch the entire screen at once, and put it into an array of pixels. Then, you can iterate over that array of pixels much faster, because it'll be operating over the data in your RAM (host memory).
For a better optimization, I also recommend clearing the pixels of your player (in the center of the screen) after fetching the screen into your pixel array. This way, you can eliminate that if((i<PLAYER_MIN_EDGE_X or i>PLAYER_MAX_EDGE_X) and (j<PLAYER_MIN_EDGE_Y or j>PLAYER_MAX_EDGE_Y)) condition inside the loop.
CImage image;
// Save DC to image
int R, G, B;
BYTE *pRealData = (BYTE*)image.GetBits();
int pit = image.GetPitch();
int bitCount = image.GetBPP()/8;
int w = image.GetWidth();
int h = image.GetHeight();
for (int i=0; i<h; i++)
{
    for (int j=0; j<w; j++)
    {
        B = *(pRealData + pit*i + j*bitCount);
        G = *(pRealData + pit*i + j*bitCount + 1);
        R = *(pRealData + pit*i + j*bitCount + 2);
    }
}
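The "// Save DC to image" step above is left as a placeholder; one possible way to fill it in (a sketch, assuming the ATL/MFC CImage class and plain Win32 GDI) is to grab the whole screen with a single BitBlt:

HDC screenDC = GetDC(nullptr);                      // DC for the whole screen
int screenW  = GetSystemMetrics(SM_CXSCREEN);
int screenH  = GetSystemMetrics(SM_CYSCREEN);

CImage image;
image.Create(screenW, screenH, 32);                 // 32-bit pixel buffer in host memory
HDC imageDC = image.GetDC();
BitBlt(imageDC, 0, 0, screenW, screenH, screenDC, 0, 0, SRCCOPY);
image.ReleaseDC();
ReleaseDC(nullptr, screenDC);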

Fast, good quality pixel interpolation for extreme image downscaling

In my program, I am downscaling an image of 500px or larger to an extreme level of approx 16px-32px. The source image is user-specified so I do not have control over its size. As you can imagine, few pixel interpolations hold up and inevitably the result is heavily aliased.
I've tried bilinear, bicubic and square average sampling. The square average sampling actually provides the most decent results but the smaller it gets, the larger the sampling radius has to be. As a result, it gets quite slow - slower than the other interpolation methods.
I have also tried an adaptive square average sampling so that the smaller it gets the greater the sampling radius, while the closer it is to its original size, the smaller the sampling radius. However, it produces problems and I am not convinced this is the best approach.
So the question is: What is the recommended type of pixel interpolation that is fast and works well on such extreme levels of downscaling?
I do not wish to use a library so I will need something that I can code by hand and isn't too complex. I am working in C++ with VS 2012.
Here's some example code I've tried as requested (hopefully without errors from my pseudo-code cut and paste). This performs a 7x7 average downscale and although it's a better result than bilinear or bicubic interpolation, it also takes quite a hit:
// Sizing control
ctl(0): "Resize",Range=(0,800),Val=100

// Variables
float fracx,fracy;
int Xnew,Ynew,p,q,Calc;
int x,y,z,p1,q1,i,j;

// New image dimensions
Xnew=image->width*ctl(0)/100;
Ynew=image->height*ctl(0)/100;

for (y=0; y<image->height; y++){ // rows
    for (x=0; x<image->width; x++){ // columns
        p1=(int)x*image->width/Xnew;
        q1=(int)y*image->height/Ynew;
        for (z=0; z<3; z++){ // channels
            Calc=0; // reset the accumulator for each channel
            for (i=-3;i<=3;i++) {
                for (j=-3;j<=3;j++) {
                    Calc += (int)(src(p1-i,q1-j,z));
                } //j
            } //i
            Calc /= 49;
            pset(x, y, z, Calc);
        } // channels
    } // columns
} // rows
Thanks!
The first point is to use pointers to your data. Never use indexes at every pixel. When you write src(p1-i,q1-j,z) or pset(x, y, z, Calc), how much computation is being done? Use pointers to the data and manipulate those directly.
Second: your algorithm is wrong. You don't want an average filter, but you want to make a grid on your source image and for every grid cell compute the average and put it in the corresponding pixel of the output image.
The specific solution should be tailored to your data representation, but it could be something like this:
std::vector<uint32_t> accum(Xnew);
std::vector<uint32_t> count(Xnew);
uint32_t *paccum, *pcount;
uint8_t* pin = /*pointer to input data*/;
uint8_t* pout = /*pointer to output data*/;
for (int dr = 0, sr = 0, w = image->width, h = image->height; sr < h; ++dr) {
    memset(paccum = accum.data(), 0, Xnew*4);
    memset(pcount = count.data(), 0, Xnew*4);
    while (sr * Ynew / h == dr) {
        paccum = accum.data();
        pcount = count.data();
        for (int dc = 0, sc = 0; sc < w; ++sc) {
            *paccum += *pin;
            *pcount += 1;
            ++pin;
            if (sc * Xnew / w > dc) {
                ++dc;
                ++paccum;
                ++pcount;
            }
        }
        sr++;
    }
    std::transform(begin(accum), end(accum), begin(count), pout, std::divides<uint32_t>());
    pout += Xnew;
}
This was written using my own library (still in development) and it seems to work, but I later changed the variable names to make it simpler here, so I don't guarantee anything!
The idea is to have a local buffer of 32-bit ints which can hold the partial sums of all the pixels in the source rows that fall into one row of the output image. Then you divide by the cell counts and save the output to the final image.
The first thing you should do is set up a performance evaluation system to measure how much any change impacts performance.
As said previously, you should not use indexes but pointers, for a (probably) substantial speed-up, and you should not simply average, since basic averaging of pixels is essentially a blur filter.
I would highly advise you to rework your code to use "kernels". A kernel is the matrix representing the weight of each pixel used. That way, you will be able to test different strategies and optimize quality.
Example of kernels:
https://en.wikipedia.org/wiki/Kernel_(image_processing)
Upsampling/downsampling kernel:
http://www.johncostella.com/magic/
Note: from the code it seems you apply a 3x3 kernel, though initially it was done as a 7x7 kernel. The equivalent 3x3 kernel, as posted, would be:
[1 1 1]
[1 1 1] * 1/9
[1 1 1]
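For illustration, a rough sketch of applying such a kernel to one channel of one output pixel, written against src()/pset()-style accessors like the ones in the question (hypothetical helpers, not a library API):

// 3x3 box kernel: every weight is 1/9.
const float kernel[3][3] = {
    { 1.0f/9, 1.0f/9, 1.0f/9 },
    { 1.0f/9, 1.0f/9, 1.0f/9 },
    { 1.0f/9, 1.0f/9, 1.0f/9 },
};

// Weighted sum of the 3x3 neighbourhood around (p1, q1) for channel z.
float sum = 0.0f;
for (int i = -1; i <= 1; ++i)
    for (int j = -1; j <= 1; ++j)
        sum += kernel[i+1][j+1] * src(p1+i, q1+j, z);
pset(x, y, z, (int)(sum + 0.5f));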

Color balance in an image using C++ and OpenCV

I'm trying to score the color balance of an image using C++ and OpenCV.
The easiest way to do this is to count the number of pixels of each color and then see whether one of the colors is more prevalent.
I figured I should probably use calcHist, and with the split function I can split an image into R, G, and B histograms. However, I am unsure about what to do next. I could probably walk through all the bins and just see how many pixels are in each, but this seems like a lot of work (I currently use 256 bins).
Is there a faster way to count the pixels in a color range? Also, I am not sure how it would work if white or black is the more prevalent color?
An automatic color balance algorithm is described at this link: http://web.stanford.edu/~sujason/ColorBalancing/simplestcb.html
For C++ code you can refer to this link: https://www.morethantechnical.com/2015/01/14/simplest-color-balance-with-opencv-wcode/
/// perform the Simplest Color Balancing algorithm
void SimplestCB(Mat& in, Mat& out, float percent) {
    assert(in.channels() == 3);
    assert(percent > 0 && percent < 100);
    float half_percent = percent / 200.0f;

    vector<Mat> tmpsplit; split(in,tmpsplit);
    for(int i=0;i<3;i++) {
        //find the low and high percentile values (based on the input percentile)
        Mat flat; tmpsplit[i].reshape(1,1).copyTo(flat);
        cv::sort(flat,flat,CV_SORT_EVERY_ROW + CV_SORT_ASCENDING);
        int lowval = flat.at<uchar>(cvFloor(((float)flat.cols) * half_percent));
        int highval = flat.at<uchar>(cvCeil(((float)flat.cols) * (1.0 - half_percent)));
        cout << lowval << " " << highval << endl;

        //saturate below the low percentile and above the high percentile
        tmpsplit[i].setTo(lowval,tmpsplit[i] < lowval);
        tmpsplit[i].setTo(highval,tmpsplit[i] > highval);

        //scale the channel
        normalize(tmpsplit[i],tmpsplit[i],0,255,NORM_MINMAX);
    }
    merge(tmpsplit,out);
}

// Usage example
int main() {
    Mat tmp, im = imread("lily.png");
    SimplestCB(im, tmp, 1);
    imshow("orig", im);
    imshow("balanced", tmp);
    waitKey(0);
    return 0;
}
Colour balance normally means looking at a white (or grey) surface and checking the ratios of red/blue to green. A perfectly balanced system would have equal signal levels in red and blue.
You can then simply work out the average red/blue scaling from the test grey-card image and apply the same scaling to your real image.
Doing it on a live image with no reference is trickier; you have to find areas that are probably white (i.e. bright and nearly r=g=b) and use them as the reference.
There's no definitive algorithm for colour balance, so anything you might implement, however good it is, will probably fail in some conditions.
One of the simplest algorithms is called Grey World, and assumes that statistically the average colour of a scene should be grey. And if it isn't, it means that it needs to be corrected to grey. So, very simply (in pseudo-python), if you have an image RGB:
cc[0] = np.mean(RGB[:,0])  # calculating channel-wise average
cc[1] = np.mean(RGB[:,1])
cc[2] = np.mean(RGB[:,2])

cc = cc / np.sqrt((cc**2).sum())  # normalise the light (you might want to
                                  # play with this a bit)

RGB /= cc  # divide every pixel by the estimated light
Note that here I'm assuming that RGB is an array of floats with values between 0 and 1. Something else that helps is to exclude from the average pixels that contain values below and above certain thresholds (e.g., below 0.05 and above 0.95). This way you ignore pixels whose value is heavily influenced by noise (small values) and pixels that saturated the camera sensor and whose colour may not be reliable (large values).
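For reference, a rough C++/OpenCV equivalent of the same Grey World idea, including the threshold-based exclusion described above (a sketch only; src is assumed to be an 8-bit, 3-channel BGR image):

#include <opencv2/opencv.hpp>

cv::Mat greyWorld(const cv::Mat& src)   // src: CV_8UC3 (BGR)
{
    cv::Mat img;
    src.convertTo(img, CV_32FC3, 1.0/255.0);

    // Only average pixels whose channels are all in [0.05, 0.95]:
    // very dark pixels are noisy, very bright ones may have clipped the sensor.
    cv::Mat mask;
    cv::inRange(img, cv::Scalar(0.05, 0.05, 0.05), cv::Scalar(0.95, 0.95, 0.95), mask);

    cv::Scalar cc = cv::mean(img, mask);                      // channel-wise average
    double norm = std::sqrt(cc[0]*cc[0] + cc[1]*cc[1] + cc[2]*cc[2]);

    std::vector<cv::Mat> ch;
    cv::split(img, ch);
    for (int i = 0; i < 3; ++i)
        ch[i] *= norm / cc[i];                                // divide by the estimated light
    cv::merge(ch, img);
    return img;                                               // float image, values roughly in 0..1
}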

Vertically flipping a char array: is there a more efficient way?

Let's start with some code:
QByteArray OpenGLWidget::modifyImage(QByteArray imageArray, const int width, const int height){
    if (vertFlip){
        /* Each pixel consists of four unsigned chars: Red Green Blue Alpha.
         * The field is normally 640*480, which means the whole picture is in fact 640*4 uChars wide.
         * The whole ByteArray is one-dimensional, so 640*4 is the red of the first pixel of the second row.
         * This function is EXTREMELY SLOW
         */
        QByteArray tempArray = imageArray;
        for (int h = 0; h < height; ++h){
            for (int w = 0; w < width/2; ++w){
                for (int i = 0; i < 4; ++i){
                    imageArray.data()[h*width*4 + 4*w + i] = tempArray.data()[h*width*4 + (4*width - 4*w) + i];
                    imageArray.data()[h*width*4 + (4*width - 4*w) + i] = tempArray.data()[h*width*4 + 4*w + i];
                }
            }
        }
    }
    return imageArray;
}
This is the code I use right now to vertically flip an image which is 640*480 (the image is actually not guaranteed to be 640*480, but it mostly is). The color encoding is RGBA, which means that the total array size is 640*480*4. I receive the images at 30 FPS, and I want to show them on the screen at the same FPS.
On an older CPU (Athlon X2) this code is just too much: the CPU is racing to keep up with the 30 FPS, so the question is: can I do this more efficiently?
I am also working with OpenGL; does that have a gimmick I am not aware of that can flip images with relatively low CPU/GPU usage?
According to this question, you can flip an image in OpenGL by scaling it by (1,-1,1). This question explains how to do transformations and scaling.
You can improve at least by doing it blockwise, making use of the cache architecture. In your example one of the accesses (either the read OR the write) will be off-cache.
For a start it can help to "capture scanlines" if you're using two loops to loop through the pixels of an image, like so:
for (int y = 0; y < height; ++y)
{
    // Capture scanline.
    char* scanline = imageArray.data() + y*width*4;
    for (int x = 0; x < width/2; ++x)
    {
        const int flipped_x = width - x-1;
        for (int i = 0; i < 4; ++i)
            swap(scanline[x*4 + i], scanline[flipped_x*4 + i]);
    }
}
Another thing to note is that I used swap instead of a temporary image. That'll tend to be more efficient since you can just swap using registers instead of loading pixels from a copy of the entire image.
But it also generally helps to use a 32-bit integer instead of working one byte at a time if you're going to be doing anything like this. If you're working with pixels with 8-bit types but know that each pixel is 32 bits, as in your case, you can generally get away with a cast to uint32_t*, e.g.
for (int y = 0; y < height; ++y)
{
    uint32_t* scanline = (uint32_t*)imageArray.data() + y*width;
    std::reverse(scanline, scanline + width);
}
At this point you might parallelize the y loop. Flipping an image horizontally (it should be "horizontal" if I understood your original code correctly) in this way is a little bit tricky with the access patterns, but you should be able to get quite a decent boost using the above techniques.
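For instance (a sketch, assuming your compiler has OpenMP enabled; each scanline is independent, so splitting the y loop across threads is safe):

char* base = imageArray.data();   // detach the QByteArray once, before the parallel loop
#pragma omp parallel for
for (int y = 0; y < height; ++y)
{
    uint32_t* scanline = (uint32_t*)base + y*width;
    std::reverse(scanline, scanline + width);
}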
I am also working with OpenGL, does that have a gimmick I am not aware of that can flip images with relatively low CPU/GPU usage?
Naturally the fastest way to flip images is to not touch their pixels at all and just save the flipping for the final part of the pipeline when you render the result. For this you might render a texture in OGL with negative scaling instead of modifying the pixels of a texture.
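For example, with the legacy fixed-function pipeline (a sketch only; textureId stands for whatever texture already holds the frame, and this would live in your widget's paint/draw function), you could draw the frame with a negative Y scale instead of flipping the pixels:

glMatrixMode(GL_MODELVIEW);
glPushMatrix();
glScalef(1.0f, -1.0f, 1.0f);   // mirror about the X axis: the quad is drawn upside down

glBindTexture(GL_TEXTURE_2D, textureId);
glBegin(GL_QUADS);
glTexCoord2f(0.0f, 0.0f); glVertex2f(-1.0f, -1.0f);
glTexCoord2f(1.0f, 0.0f); glVertex2f( 1.0f, -1.0f);
glTexCoord2f(1.0f, 1.0f); glVertex2f( 1.0f,  1.0f);
glTexCoord2f(0.0f, 1.0f); glVertex2f(-1.0f,  1.0f);
glEnd();

glPopMatrix();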
Another thing that's really useful in video and image processing is to represent an image to process like this for all your image operations:
struct Image32
{
    uint32_t* pixels;
    int32_t width;
    int32_t height;
    int32_t x_stride;
    int32_t y_stride;
};
The stride fields are what you use to get from one scanline (row) of an image to the next vertically, and from one column to the next horizontally. When you use this representation, you can use negative values for the strides and offset the pixels pointer accordingly. You can also use the stride fields to, say, render only every other scanline of an image for fast interactive half-res previews, by using y_stride=width*2 and height/=2. You can quarter-res an image by setting the x stride to 2 and the y stride to 2*width and then halving the width and height. You can render a cropped image, without making your blit functions accept a boatload of parameters, by just modifying these fields and keeping the y stride at the original width to get from one row of the cropped section of the image to the next:
// Using the stride representation of Image32, this can now
// blit a cropped source, a horizontally flipped source,
// a vertically flipped source, a source flipped both ways,
// a half-res source, a quarter-res source, a quarter-res
// source that is horizontally flipped and cropped, etc,
// and all without modifying the source image in advance
// or having to accept all kinds of extra drawing parameters.
void blit(int dst_x, int dst_y, Image32 dst, Image32 src);
// We don't have to do things like this (and I think I lost
// some capabilities with this version below but it hurts my
// brain too much to think about what capabilities were lost):
void blit_gross(int dst_x, int dst_y, int dst_w, int dst_h, uint32_t* dst,
int src_x, int src_y, int src_w, int src_h,
const uint32_t* src, bool flip_x, bool flip_y);
By using negative values and passing it to an image operation (ex: a blit operation), the result will naturally be flipped without having to actually flip the image. It'll end up being "drawn flipped", so to speak, just as with the case of using OGL with a negative scaling transformation matrix.
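As a concrete illustration of that last point (a sketch against the hypothetical Image32 above, with y_stride measured in pixels):

// Build a vertically flipped "view" of an image without copying any pixels:
// point at the last scanline and walk backwards with a negative y stride.
Image32 make_vertically_flipped_view(Image32 src)
{
    Image32 flipped = src;
    flipped.pixels   = src.pixels + (src.height - 1) * src.y_stride;
    flipped.y_stride = -src.y_stride;
    return flipped;
}

// blit(x, y, dst, make_vertically_flipped_view(src)) then draws src upside down.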