I am trying to develop a simple evolution algorithm in C++. To make my calculations faster I decided to use async functions to run multiple calculations at once:
std::vector<std::future<int> > compute(8);
unsigned nptr = 0;
int syncp = 0;
while(nptr != network::networks.size()){
compute.at(syncp) = std::async(&network::analyse, &network::networks.at(nptr), data, width, height, sw, dFnum.at(idx));
syncp++;
if(syncp == 8){
syncp = 0;
for(unsigned i = 0; i < 8; i++){
compute.at(i).get();
}
}
nptr++;
}
This is how I start my calculating function. The function is called analyse, and for each "network" it assigns a score depending on how well it identifies the image.
This is part of the analyse function:
for(unsigned i = 0; i < entry.size(); i++){
double sum = 0;
data * d = &entry.at(i);
pattern * p = &pattern::patterns.at(d->patNo);
int sx = iWidth;
int sy = iHeight;
if(d->xPercentage*iWidth + d->xSpan*iWidth < sx) sx = d->xPercentage*iWidth + d->xSpan*iWidth;
if(d->yPercentage*iHeight + d->xSpan*iWidth < sy) sy = d->yPercentage*iHeight + d->xSpan*iWidth;
int xdisp = sx-d->xPercentage*iWidth;
int ydisp = sy-d->yPercentage*iHeight;
for(int x = d->xPercentage*iWidth; x < sx; x++){
for(int y = d->yPercentage*iHeight; y < sy; y++){
double xpl = x-d->xPercentage*iWidth;
double ypl = y-d->yPercentage*iHeight;
xpl /= xdisp;
ypl /= ydisp;
unsigned idx = (unsigned)(xpl*(p->width) + ypl*(p->height)*(p->width));
if(idx >= p->lweight.size()) idx = p->lweight.size()-1;
double weight = p->lweight.at(idx) - 5;
if(imageData[y*iWidth+x])
sum += weight;
else
sum -= 2*weight;
}
}
digitWeight[d->digit-1] += sum;
}
}
Now, there is no need to analyse the function itself - I'm sure it works, I have tested it on a single thread, and it runs just fine. The only problem is, after some time of execution, I get errors like segmentation fault, or vector range check error.
They mostly happen at this line:
digitWeight[d->digit-1] += sum;
Now, you can be sure that d->digit-1 is a valid range for this array.
The problem is that the value of the d pointer is different than it was here:
data * d = &entry.at(i);
It magically changes during the execution of the function and starts pointing to different data, leading to errors. I tried saving the value of d->digit in a local variable and using that variable later; it worked for just a while longer before crashing on another shared resource, imageData this time.
I'm thinking this might be related to data sharing: all the async functions share the same array of data, which is a static vector. But this data is only read, never written, so why would it stop working? I know of something called mutex locking, but it would make no sense to lock these async functions, as the program would then run just as slowly as a single-threaded one.
I have also tried running the functions like this:
std::vector<std::thread*> threads(8);
unsigned nptr = 0;
int threadp = 0;
while(nptr != network::networks.size()){
threads.at(threadp) = new std::thread(&network::analyse, &network::networks.at(nptr), data, width, height, sw, dFnum.at(idx));
threadp++;
if(threadp == 8){
threadp = 0;
for(unsigned i = 0; i < 8; i++){
if(threads.at(i)->joinable()) threads.at(i)->join();
delete threads.at(i);
}
}
nptr++;
}
and it did work for a second, but after some time a very similar error appeared.
Data is a structure containing 7 integers, one of which is the ID of a pattern, and pattern is a class that contains two integers (width and height) and a vector of chars.
Why does it happen on read-only data and how can I prevent it?
Here is an example of what happens:
I'm trying to draw a Mandelbrot set and want to use 4 threads to do the calculation at the same time, each on a different part of the image. Here are the functions:
void Mandelbrot(int x_min,int x_max,int y_min,int y_max,Image &im)
{
for (int i = y_min; i < y_max; i++)
{
for (int j = x_min; j < x_max; j++)
{
//scaled x and y coordinates
double x0 = mape(j, 0, W, MinX, MaxX);
double y0 = mape(i, 0, H, MinY, MaxY);
double x = 0.0f;
double y = 0.0f;
int iteration = 0;
double z = 0;
while (abs(z)<2.0f && iteration < maxIteration)// && iteration < maxIteration)
{
double xtemp = x * x - y * y + x0;
y = 2 * x * y + y0;
x = xtemp;
iteration++;
z = x * x + y * y;
if (z > 10)//must be 10
break;
}
int b =mape(iteration, 0, maxIteration, 0, 255);
if (iteration == maxIteration)
b = 0;
im.setPixel(j, i, Color(b,b,0));
}
}
}
The mape function just converts a number from one range to another.
Here is the thread function:
void th(Image& im)
{
float size = (float)im.getSize().x / num_th;
int x_min = 0, x_max = size, y_min = 0, y_max = im.getSize().y;
thread t[num_th];
for (size_t i = 0; i < num_th; i++)
{
t[i] = thread(Mandelbrot, x_min, x_max, y_min, y_max, ref(im));
x_min = x_max;
x_max += size;
}
for (size_t i = 0; i<num_th; i++)
{
t[i].join();
}
}
The main function looks like this
int main()
{
Image img;
while(1)//here is while window.open()
{
th(img);
//here im drawing
}
}
So I am not getting any performance boost; it even gets slower. Can anyone tell me what I'm doing wrong? This has happened to me before too.
I saw a question asking what Image is: it's a class from the SFML library, in case that helps.
Your code is too incomplete to answer concretely, but here are a few suspicions:
Thread spawn overhead: Spawning a thread has non-trivial overhead. If the amount of work performed by the thread is not large enough, the overhead of launching it may cost more than any gains you would get through parallelism.
Excessive locking and contention: This does not look like a problem in your code, as you don't seem to use any locks at all. Be careful, though: without locks, the code is only correct as long as the threads don't write to the same addresses.
False sharing: A possible problem in your code. Cache lines tend to be 64 bytes. A write to any portion of a cache line invalidates the copies of that whole line held by other cores. If two threads are using the same cache line and one of them writes to it, the others will have their copy invalidated and will have to re-fetch it, even if they use a different part of the line. This can cause significant slowdowns when multiple threads work on non-overlapping data that share a cache line. If they iterate at the same rate through the same data, the invalidations recur over and over. This is always worth considering.
Memory layout causing your cache to be thrashed: While walking through an array, going "across" may align with the actual memory layout, reading one full cache line after another, but scanning "vertically" touches one portion of a cache line and then jumps to the corresponding portion of another cache line. If this happens in many threads and you have a lot of memory to churn through, your cache can be vastly underutilized. Be aware of whether your data is stored in row-major or column-major order, write code to match it, and avoid jumping around in memory.
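To make the last two points concrete, here is a minimal sketch (not the asker's SFML code) that splits the image into horizontal bands of rows instead of vertical strips, so each thread writes one contiguous block of a row-major buffer. The names `escapeIterations`, `renderRows` and `renderParallel`, the plain `std::vector` pixel buffer, and the fixed coordinate ranges are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for the per-pixel Mandelbrot iteration count.
int escapeIterations(double x0, double y0, int maxIteration) {
    double x = 0.0, y = 0.0;
    int iteration = 0;
    while (x * x + y * y < 4.0 && iteration < maxIteration) {
        const double xtemp = x * x - y * y + x0;
        y = 2.0 * x * y + y0;
        x = xtemp;
        ++iteration;
    }
    return iteration;
}

// Fill rows [rowBegin, rowEnd) of a row-major width*height buffer.
// Each thread touches one contiguous block of memory.
void renderRows(std::vector<std::uint8_t>& pixels, int width, int height,
                int rowBegin, int rowEnd, int maxIteration) {
    for (int row = rowBegin; row < rowEnd; ++row) {
        for (int col = 0; col < width; ++col) {
            const double x0 = -2.0 + 3.0 * col / width;   // assumed range [-2, 1)
            const double y0 = -1.0 + 2.0 * row / height;  // assumed range [-1, 1)
            const int it = escapeIterations(x0, y0, maxIteration);
            const int shade = (it == maxIteration) ? 0 : 255 * it / maxIteration;
            pixels[static_cast<std::size_t>(row) * width + col] =
                static_cast<std::uint8_t>(shade);
        }
    }
}

// Split the image into bands of rows, one band per thread, then join them all.
std::vector<std::uint8_t> renderParallel(int width, int height,
                                         int maxIteration, int numThreads) {
    assert(width > 0 && height > 0 && maxIteration > 0 && numThreads > 0);
    std::vector<std::uint8_t> pixels(static_cast<std::size_t>(width) * height);
    std::vector<std::thread> workers;
    const int rowsPerThread = (height + numThreads - 1) / numThreads;
    for (int t = 0; t < numThreads; ++t) {
        const int rowBegin = std::min(height, t * rowsPerThread);
        const int rowEnd = std::min(height, rowBegin + rowsPerThread);
        if (rowBegin == rowEnd) break;  // no rows left for this thread
        workers.emplace_back(renderRows, std::ref(pixels), width, height,
                             rowBegin, rowEnd, maxIteration);
    }
    for (auto& w : workers) w.join();
    return pixels;
}
```

Calling `renderParallel(width, height, maxIteration, 1)` gives a single-threaded baseline to compare timings and output against. Whether this actually helps depends on how expensive each pixel is relative to the cost of spawning threads every frame.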
The code below is a demonstration of what I'm trying to do and it has the same problem as my original code (which is not included here). I have spectrogram code and I'm trying to improve its performance by using multiple threads (my computer has 4 cores). The spectrogram code basically computes an FFT over many overlapping frames (these frames correspond to sound samples at a particular time).
As an example let's say that we have 1000 frames which overlap by 50%.
If we're using 4 threads, then each thread should handle 250 frames. Overlapping frames just means that if our frames are 1024 samples in length, the first frame has the range 0-1023, the second frame 512-1535, the third 1024-2047, etc. (an overlap of 512 samples).
The code creating and using the threads
void __fastcall TForm1::Button1Click(TObject *Sender)
{
numThreads = 4;
fftLen = 1024;
numWindows = 10000;
int startTime = GetTickCount();
numOverlappingWindows = numWindows*2;
overlap = fftLen/2;
const unsigned numElem = fftLen*numWindows+overlap;
rx = new float[numElem];
for(int i=0; i<numElem; i++) {
rx[i] = rand();
}
useThreads = true;
vWThread.reserve(numOverlappingWindows);
if(useThreads){
for(int i=0;i<numThreads;i++){
TWorkerThread *pWorkerThread = new TWorkerThread(true);
pWorkerThread->SetWorkerMethodCallback(&CalculateWindowFFTs);//this is called in TWorkerThread::Execute
vWThread.push_back(pWorkerThread);
}
pLock = new TCriticalSection();
for(int i=0;i<numThreads;i++){ //start the threads
vWThread.at(i)->Resume();
}
while(TWorkerThread::GetNumThreads()>0);
}else CalculateWindowFFTs();
int endTime = GetTickCount();
Label1->Caption = IntToStr(endTime-startTime);
}
void TForm1::CalculateWindowFFTs(){
unsigned startWnd = 0, endWnd = numOverlappingWindows, threadId;
if(useThreads){
threadId = TWorkerThread::GetCurrentThreadId();
unsigned wndPerThread = numOverlappingWindows/numThreads;
startWnd = (threadId-1)*wndPerThread;
endWnd = threadId*wndPerThread;
if(numThreads==threadId){
endWnd = numOverlappingWindows;
}
}
float *pReal, *pImg;
for(unsigned i=startWnd; i<endWnd; i++){
pReal = new float[fftLen];
pImg = new float[fftLen];
memcpy(pReal, &rx[i*overlap], fftLen*sizeof(float));
memset(pImg, 0, fftLen*sizeof(float)); //zero the whole imaginary buffer (size is in bytes)
FFT(pReal, pImg, fftLen); //perform an in place FFT
pLock->Acquire();
vWndFFT.push_back(pReal);
vWndFFT.push_back(pImg);
pLock->Release();
}
}
void TForm1::FFT(float *rx, float *ix, int fftSize)
{
int i, j, k, m;
float rxt, ixt;
m = log(fftSize)/log(2);
int fftSizeHalf = fftSize/2;
j = k = fftSizeHalf;
for (i = 1; i < (fftSize-1); i++){
if (i < j) {
rxt = rx[j];
ixt = ix[j];
rx[j] = rx[i];
ix[j] = ix[i];
rx[i] = rxt;
ix[i] = ixt;
}
k = fftSizeHalf;
while (k <= j){
j = j - k;
k = k/2;
}
j = j + k;
} //end for
int le, le2, l, ip;
float sr, si, ur, ui;
for (k = 1; k <= m; k++) {
le = pow(2, k);
le2 = le/2;
ur = 1;
ui = 0;
sr = cos(PI/le2);
si = -sin(PI/le2);
for (j = 1; j <= le2; j++) {
l = j - 1;
for (i = l; i < fftSize; i += le) {
ip = i + le2;
rxt = rx[ip] * ur - ix[ip] * ui;
ixt = rx[ip] * ui + ix[ip] * ur;
rx[ip] = rx[i] - rxt;
ix[ip] = ix[i] - ixt;
rx[i] = rx[i] + rxt;
ix[i] = ix[i] + ixt;
} //end for
rxt = ur;
ur = rxt * sr - ui * si;
ui = rxt * si + ui * sr;
}
}
}
While it's easy to divide this process over multiple threads, the performance is only marginally improved compared to the single-threaded version (<10%).
Interestingly, if I increase the number of threads to, say, 100, I do get an increase in speed of about 25%, which is surprising because I'd expect thread context-switching overhead to be a factor in this case.
At first I thought that the main reason for the poor performance was the lock on writing to a vector object, so I experimented with an array of vectors (a vector per thread), thus eliminating the need for the locks, but the performance remained pretty much the same.
pVfft = new vector<float*>[numThreads];//create an array of vectors
//and then in CalculateWindowFFTs, do something like
vector<float*> &vThr = pVfft[threadId-1];
for(unsigned i=startWnd; i<endWnd; i++){
pReal = new float[fftLen];
pImg = new float[fftLen];
memcpy(pReal, &rx[i*overlap], fftLen*sizeof(float));
memset(pImg, 0, fftLen*sizeof(float)); //zero the whole imaginary buffer (size is in bytes)
FFT(pReal, pImg, fftLen); //perform an in place FFT
vThr.push_back(pReal);
}
I think I'm running into caching problems here though I'm not certain how to go about changing my design in order to have a solution that scales well.
I can also provide the code for TWorkerThread if you think that's important.
Any help is much appreciated.
Thanks
UPDATE:
As suggested by 1201ProgramAlarm I removed that while loop and got about 15-20% speed improvement on my system. Now my main thread is not actively waiting for the threads to finish; instead, TWorkerThread executes code on the main thread via TThread::Synchronize after all the worker threads have finished (i.e. when numThreads has reached 0).
While this is looking better now, it's still far from being optimal.
The locks to write to vWndFFT will hurt, as will the repeated (leaking) calls to new assigned to pReal and pImg (these should be outside the for loop).
But the real performance killer is probably your loop waiting for the threads to finish: while(TWorkerThread::GetNumThreads()>0);. This will consume one available thread in a very unfriendly way.
One quick fix (not recommended) would be to add a sleep(1) (or 2, 5, or 10) so the loop is not continuous.
A better solution would be to have the main thread be one of your calculation threads, and have a way for that thread (once it is done with all its processing) to simply wait for the other threads to finish without consuming a core, using something like WaitForMultipleObjects, which is available on Windows.
One simple way to try out your threaded code is simply to run threaded, but only use one thread. Performance should be about the same as the non-threaded version, and the results should match.
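As a rough illustration of these suggestions in portable C++ (not the VCL `TWorkerThread` code from the question), the sketch below has the calling thread process one slice itself and then block in `join()` instead of spinning. Each window writes to its own preallocated slot, so no lock is needed. `processWindow` is a hypothetical stand-in for the FFT, and `processRange`, `processAll` and `hop` are names invented for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical stand-in for the per-window work (the FFT in the question).
float processWindow(const float* samples, std::size_t len) {
    return std::accumulate(samples, samples + len, 0.0f);
}

// Process windows [first, last); every window writes only to its own slot.
void processRange(const std::vector<float>& rx, std::size_t fftLen, std::size_t hop,
                  std::size_t first, std::size_t last, std::vector<float>& out) {
    for (std::size_t w = first; w < last; ++w)
        out[w] = processWindow(&rx[w * hop], fftLen);
}

std::vector<float> processAll(const std::vector<float>& rx, std::size_t fftLen,
                              std::size_t hop, unsigned numThreads) {
    assert(numThreads > 0 && hop > 0 && fftLen > 0 && rx.size() >= fftLen);
    const std::size_t numWindows = (rx.size() - fftLen) / hop + 1;
    std::vector<float> out(numWindows);  // preallocated, so no push_back and no lock
    const std::size_t perThread = (numWindows + numThreads - 1) / numThreads;

    std::vector<std::thread> workers;
    // Slices 1..numThreads-1 go to worker threads.
    for (unsigned t = 1; t < numThreads; ++t) {
        const std::size_t first = std::min(numWindows, t * perThread);
        const std::size_t last = std::min(numWindows, first + perThread);
        if (first < last)
            workers.emplace_back(processRange, std::cref(rx), fftLen, hop,
                                 first, last, std::ref(out));
    }
    // The calling thread handles slice 0 instead of idling.
    processRange(rx, fftLen, hop, 0, std::min(numWindows, perThread), out);

    // join() blocks without burning a core, unlike a busy-wait loop.
    for (auto& w : workers) w.join();
    return out;
}
```

Passing `numThreads = 1` runs everything on the calling thread, which gives the single-thread comparison mentioned above.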
I'm trying to increase the width of an image array and return it as an OpenCV Mat. The problem is speed when the temp_mat column index needs to be shifted by a certain amount as the image increases in size. See the function below.
This line will run with good speed:
//temp_mat[height][width] = in_mat[i][j];
But the speed decreases by a lot when changed to:
temp_mat[height][width + int(((width - middle_point) * -1) * FLOAT_HERE)] = in_mat[i][j];
The loop takes many milliseconds longer to run. Here is the complete function, variable names have been changed.
#define D_HEIGHT 1000
#define D_WIDTH 1200
int DEFAULT_HEIGHT = 1000;
int DEFAULT_WIDTH = 1200;
float FLOAT_HERE = .04;
static int temp_mat[D_HEIGHT][D_WIDTH];
cv::Mat get_mat(int in_mat[D_HEIGHT][300]){
int height = 0;
int width = 0;
int middle_point = DEFAULT_WIDTH/2;
for(int i=0;i < DEFAULT_HEIGHT;i++){
width = 0;
for(int j =0;j < DEFAULT_WIDTH / 4;j++){
for(int il = 0; il < DEFAULT_WIDTH / (DEFAULT_WIDTH/4); il++){
//This is too slow, but it is what I need
temp_mat[height][width + int(((width - middle_point) * -1) * FLOAT_HERE)] = in_mat[i][j];
//This is ok
//temp_mat[height][width] = in_mat[i][j];
width++;
}
}
height++;
}
return cv::Mat(D_HEIGHT,D_WIDTH,CV_8UC4,temp_mat);
}
Any ideas to make it faster are welcome. I am hoping to avoid a new thread.
You are doing this the wrong way. Just use an affine transformation and OpenCV will do it in the fastest possible way.
Even though DEFAULT_WIDTH is not declared const, it appears to be used as a constant, and the naming of the variable suggests it as well. You should probably make it constant, even though that in itself will not improve performance. I say this because you are calculating a middle_point that is then also constant and can be precalculated. The same goes for FLOAT_HERE, which also appears to be constant.
Having made those constant, the only variable in the calculation, which you perform many times, is width. Since you are always looping the same number of iterations, you might consider precalculating the different values, simply creating a cache of values instead of calculating on the fly.
For each value of width you can create a corresponding calculated value and store it in an array where the index is the width and the value is the result of the calculation:
int width_cache[DEFAULT_WIDTH];
...
for (int i = 0; i < DEFAULT_WIDTH; ++i) {
width_cache[i] = i + int(((i - middle_point) * -1) * FLOAT_HERE);
}
In your loop, you could then do:
temp_mat[height][width_cache[width]] = in_mat[i][j];
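Put together as a self-contained sketch, it could look like the following. The constants mirror the values in the question; the names `kWidth`, `kMiddle`, `kFactor`, `shiftedColumn`, `buildWidthCache` and `cachedColumn` are made up for this example.

```cpp
#include <array>
#include <cassert>

constexpr int kWidth = 1200;         // DEFAULT_WIDTH in the question
constexpr int kMiddle = kWidth / 2;  // middle_point
constexpr float kFactor = 0.04f;     // FLOAT_HERE

// The index expression from the question, computed directly.
int shiftedColumn(int width) {
    return width + int(((width - kMiddle) * -1) * kFactor);
}

// Build the lookup table once; it can then be reused for every row.
std::array<int, kWidth> buildWidthCache() {
    std::array<int, kWidth> cache{};
    for (int i = 0; i < kWidth; ++i) cache[i] = shiftedColumn(i);
    return cache;
}

// Table lookup that replaces the float multiply and int conversion in the loop.
int cachedColumn(const std::array<int, kWidth>& cache, int width) {
    assert(width >= 0 && width < kWidth);
    return cache[width];
}
```

Building the table once, for example in a function-local `static const auto cache = buildWidthCache();`, keeps the per-pixel work down to a single array read. Whether that is measurably faster than the original expression is something to confirm with a profiler.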
I wrote a program that loads, saves, and performs the fft and ifft on black and white png images. After much debugging headache, I finally got some coherent output only to find that it distorted the original image.
[Images omitted: input, fft, ifft]
As far as I have tested, the pixel data in each array is stored and converted correctly. Pixels are stored in two arrays, 'data' which contains the b/w value of each pixel and 'complex_data' which is twice as long as 'data' and stores real b/w value and imaginary parts of each pixel in alternating indices. My fft algorithm operates on an array structured like 'complex_data'. After code to read commands from the user, here's the code in question:
if (cmd == "fft")
{
if (height > width) size = height;
else size = width;
N = (int)pow(2.0, ceil(log((double)size)/log(2.0)));
temp_data = (double*) malloc(sizeof(double) * width * 2); //array to hold each row of the image for processing in FFT()
for (i = 0; i < (int) height; i++)
{
for (j = 0; j < (int) width; j++)
{
temp_data[j*2] = complex_data[(i*width*2)+(j*2)];
temp_data[j*2+1] = complex_data[(i*width*2)+(j*2)+1];
}
FFT(temp_data, N, 1);
for (j = 0; j < (int) width; j++)
{
complex_data[(i*width*2)+(j*2)] = temp_data[j*2];
complex_data[(i*width*2)+(j*2)+1] = temp_data[j*2+1];
}
}
transpose(complex_data, width, height); //tested
free(temp_data);
temp_data = (double*) malloc(sizeof(double) * height * 2);
for (i = 0; i < (int) width; i++)
{
for (j = 0; j < (int) height; j++)
{
temp_data[j*2] = complex_data[(i*height*2)+(j*2)];
temp_data[j*2+1] = complex_data[(i*height*2)+(j*2)+1];
}
FFT(temp_data, N, 1);
for (j = 0; j < (int) height; j++)
{
complex_data[(i*height*2)+(j*2)] = temp_data[j*2];
complex_data[(i*height*2)+(j*2)+1] = temp_data[j*2+1];
}
}
transpose(complex_data, height, width);
free(temp_data);
free(data);
data = complex_to_real(complex_data, image.size()/4); //tested
image = bw_data_to_vector(data, image.size()/4); //tested
cout << "*** fft success ***" << endl << endl;
void FFT(double* data, unsigned long nn, int f_or_b){ // f_or_b is 1 for fft, -1 for ifft
unsigned long n, mmax, m, j, istep, i;
double wtemp, w_real, wp_real, wp_imaginary, w_imaginary, theta;
double temp_real, temp_imaginary;
// reverse-binary reindexing to separate even and odd indices
// and to allow us to compute the FFT in place
n = nn<<1;
j = 1;
for (i = 1; i < n; i += 2) {
if (j > i) {
swap(data[j-1], data[i-1]);
swap(data[j], data[i]);
}
m = nn;
while (m >= 2 && j > m) {
j -= m;
m >>= 1;
}
j += m;
};
// here begins the Danielson-Lanczos section
mmax = 2;
while (n > mmax) {
istep = mmax<<1;
theta = f_or_b * (2 * M_PI/mmax);
wtemp = sin(0.5 * theta);
wp_real = -2.0 * wtemp * wtemp;
wp_imaginary = sin(theta);
w_real = 1.0;
w_imaginary = 0.0;
for (m = 1; m < mmax; m += 2) {
for (i = m; i <= n; i += istep) {
j = i + mmax;
temp_real = w_real * data[j-1] - w_imaginary * data[j];
temp_imaginary = w_real * data[j] + w_imaginary * data[j-1];
data[j-1] = data[i-1] - temp_real;
data[j] = data[i] - temp_imaginary;
data[i-1] += temp_real;
data[i] += temp_imaginary;
}
wtemp = w_real;
w_real += w_real * wp_real - w_imaginary * wp_imaginary;
w_imaginary += w_imaginary * wp_real + wtemp * wp_imaginary;
}
mmax=istep;
}}
My ifft is the same only with the f_or_b set to -1 instead of 1. My program calls FFT() on each row, transposes the image, calls FFT() on each row again, then transposes back. Is there maybe an error with my indexing?
Not an actual answer, as this question is debug-only, so here are some hints instead:
your results are really bad
it should look like this:
first line is the actual DFFT result
Re,Im,Power is amplified by a constant otherwise you would see a black image
the last image is the IDFFT of the original, non-amplified Re,Im result
the second line is the same, but the DFFT result is wrapped by half the image size in both x and y to match the common results in most DIP/CV texts
As you can see, if you IDFFT the wrapped results back, the result is not correct (checkerboard mask)
You have just single image as DFFT result
is it power spectrum?
or did you forget to include the imaginary part? Only for viewing, or perhaps also somewhere in the computation?
is your 1D DFFT working?
for real data the result should be symmetric
check the links from my comment and compare the results for some sample 1D array
debug/repair your 1D FFT first and only then move to the next level
do not forget to test Real and complex data ...
your IDFFT looks BW (no gray) saturated
so did you amplify the DFFT results to see the image and used that for IDFFT instead of the original DFFT result?
also check if you do not round to integers somewhere along the computation
beware of (I)DFFT overflows/underflows
If your image pixel intensities are big and the resolution of the image is too, then your computation could lose precision. I have never seen this in images, but if your image is HDR then it is possible. This is a common problem with convolution computed by DFFT for big polynomials.
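The symmetry hint can be turned into a quick automated check on the 1D transform. A minimal sketch, assuming the spectrum is available as a `std::vector<std::complex<double>>` (the question stores it as interleaved doubles, so a conversion step would be needed); the function name and tolerance are invented for illustration:

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// For real input of length N, the DFT satisfies X[k] == conj(X[N-k]).
// Returns true if the given spectrum has that property within tol.
bool isConjugateSymmetric(const std::vector<std::complex<double>>& spectrum,
                          double tol = 1e-9) {
    assert(tol >= 0.0);
    const std::size_t n = spectrum.size();
    for (std::size_t k = 1; k < n; ++k) {
        if (std::abs(spectrum[k] - std::conj(spectrum[n - k])) > tol) return false;
    }
    // The DC term X[0] must be purely real for real input.
    return n == 0 || std::abs(spectrum[0].imag()) <= tol;
}
```

For example, the DFT of the real sequence 1, 2, 3, 4 is 10, -2+2i, -2, -2-2i, which passes the check. A 1D FFT that fails it on real input has a bug before the 2D code even comes into play.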
Thank you everyone for your opinions. All that stuff about memory corruption, while it makes a point, is not the root of the problem. The sizes of data I'm mallocing are not overly large, and I am freeing them in the right places. I had a lot of practice with this while learning C. The problem was not the fft algorithm either, nor even my 2D implementation of it.
All I missed was the scaling by 1/(M*N) at the very end of my ifft code. Because the image is 512x512, I needed to scale my ifft output by 1/(512*512). Also, my fft looks like white noise because the pixel data was not rescaled to fit between 0 and 255.
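The effect of the missing scale factor is easy to reproduce in 1D. The sketch below uses a naive O(N^2) DFT rather than the FFT from the question, with a `direction` parameter playing the role of `f_or_b` (the sign convention here is an assumption). A forward transform followed by an inverse one returns the input multiplied by N until it is divided by the number of points, which is M*N for an M x N image.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<double>;

// Unnormalized naive DFT; direction is +1 (forward) or -1 (inverse).
std::vector<Complex> dft(const std::vector<Complex>& in, int direction) {
    assert(direction == 1 || direction == -1);
    const double pi = std::acos(-1.0);
    const std::size_t n = in.size();
    std::vector<Complex> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        Complex sum(0.0, 0.0);
        for (std::size_t t = 0; t < n; ++t) {
            const double angle = -direction * 2.0 * pi *
                                 static_cast<double>(k * t) / static_cast<double>(n);
            sum += in[t] * std::polar(1.0, angle);
        }
        out[k] = sum;
    }
    return out;
}

// Divide every sample by the total number of points (M*N for an M x N image).
void scaleBy(std::vector<Complex>& data, std::size_t totalPoints) {
    assert(totalPoints > 0);
    for (auto& v : data) v /= static_cast<double>(totalPoints);
}
```

With the input 1, 2, 3, 4, the unscaled round trip `dft(dft(x, 1), -1)` gives 4, 8, 12, 16, and `scaleBy(y, 4)` restores the original values.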
I suggest you look at the article http://www.yolinux.com/TUTORIALS/C++MemoryCorruptionAndMemoryLeaks.html
Christophe has a good point, but he is wrong about it not being related to the problem, because it seems that in modern times using malloc instead of new()/free() does not initialise memory or select the best data type, which could result in all the problems listed below:
Possible causes are:
Sign of a number changing somewhere, I have seen similar issues when a platform invoke has been used on a dll and a value is passed by value instead of reference. It is caused by memory not necessarily being empty so when your image data enters it will have boolean maths performed on its values. I would suggest that you make sure memory is empty before you put your image data there.
Memory rotating right (ROR in assembly language) or left (ROL). This will occur if data types are being used which do not necessarily match, e.g. a signed value entering an unsigned data type, or if the number of bits in one variable is different from another.
Data being lost due to an unsigned value entering a signed variable. Outcomes are 1 bit being lost because it will be used to determine negative or positive, or at extremes if twos complement takes place the number will become inverted in meaning, look for twos complement on wikipedia.
Also see how memory should be cleared/assigned before use. http://www.cprogramming.com/tutorial/memory_debugging_parallel_inspector.html
I'm writing a renderer using low-level SDL functions to learn how it all works. I am now trying to do polygon drawing, but I run into errors possibly due to my inexperience with C++. When running the code I get a munmap_chunk() - Invalid pointer error. Searching reveals that it is most likely due to free()-ing the memory twice. The error happens when returning from the function. I realize that the error comes from automatically free()ing memory which has been automatically free()d before, but I'm not experienced enough with C++ to spot the error. Any clues?
My code:
void DrawPolygon (const vector<vec3> & verts, vec3 color){
// 0. Project to the screen
vector<ivec2> vertices(verts.size());
for(int i = 0; i < verts.size(); i++){
VertexShader(verts.at(i), vertices.at(i));
}
// 1. Find max and min y-value of the polygon
// and compute the number of rows it occupies.
int miny = vertices[0].y;
int maxy = vertices[0].y;
for (int i = 1; i < 3; i++){
if (vertices[i].y < miny){
miny = vertices[i].y;
}
if (vertices[i].y > maxy){
maxy = vertices[i].y;
}
}
int rows = abs(maxy - miny) + 1;
// 2. Resize leftPixels and rightPixels
// so that they have an element for each row.
vector<ivec2> leftPixels(rows);
vector<ivec2> rightPixels(rows);
// 3. Initialize the x-coordinates in leftPixels
// to some really large value and the x-coordinates
// in rightPixels to some really small value.
for(int i = 0; i < rows; i++){
leftPixels[i].x = std::numeric_limits<int>::max();
rightPixels[i].x = std::numeric_limits<int>::min();
leftPixels[i].y = miny + i;
rightPixels[i].y = miny + i;
}
// 4. Loop through all edges of the polygon and use
// linear interpolation to find the x-coordinate for
// each row it occupies. Update the corresponding
// values in rightPixels and leftPixels.
for(int i = 0; i < 3; i++){
ivec2 a = vertices[i];
ivec2 b = vertices[(i+1)%3];
// find the number of pixels to draw
ivec2 delta = glm::abs(a - b);
int pixels = glm::max(delta.x, delta.y) + 1;
// interpolate to find the pixels
vector<ivec2> line (pixels);
Interpolate(a, b, line);
for(int j = 0; j < pixels; j++){
ivec2 p = line[j];
ivec2 cmpl = leftPixels[p.y - miny];
ivec2 cmpr = rightPixels[p.y - miny];
if(p.x < cmpl.x){
leftPixels[p.y - miny].x = p.x;
//leftPixels[p.y - miny] = cmpl;
}
if(p.x > cmpr.x){
rightPixels[p.y - miny].x = p.x;
//cmpr.x = p.x;
//rightPixels[p.y - miny] = cmpr;
}
}
}
for(int i = 0; i < leftPixels.size(); i++){
ivec2 l = leftPixels.at(i);
ivec2 r = rightPixels.at(i);
// y coord the same, iterate over x
int y = l.y;
for(int x = l.x; x <= r.x; x++){
PutPixelSDL(screen, x, y, color);
}
}
}
Using valgrind gives me this output (this is the first error it reports). Weirdly, the program recovers and keeps running with the expected result, apparently not getting the same error again:
==5706== Invalid write of size 4
==5706== at 0x40AD61: DrawPolygon(std::vector<glm::detail::tvec3<float>, std::allocator<glm::detail::tvec3<float> > > const&, glm::detail::tvec3<float>) (in /home/actimia/prog/dgi14/lab3/ThirdLab)
==5706== by 0x409C78: Draw() (in /home/actimia/prog/dgi14/lab3/ThirdLab)
==5706== by 0x409668: main (in /home/actimia/prog/dgi14/lab3/ThirdLab)
I think my previous post on a similar topic would be useful:
https://stackoverflow.com/a/22658693/2724703
From your Valgrind report, it looks like your program is corrupting memory due to an overflow (an out-of-bounds write). This does not seem to be a "double free" error. You have mentioned that Valgrind sometimes does not report any error, which makes the problem more difficult to track down. However, there is certainly memory corruption, and you must fix it. Memory errors sometimes occur intermittently for various reasons (different input parameters, multithreading, a change in execution sequence).
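One way to locate such an out-of-bounds write is to route every row update through a small checked helper and log or break when it reports failure. The sketch below is only an illustration: `Span`, `makeRows` and `updateRow` are hypothetical names that stand in for the `leftPixels`/`rightPixels` bookkeeping in the question, and it does not claim to identify which index actually overflows there.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical stand-in for one row of leftPixels/rightPixels.
struct Span {
    int left;
    int right;
};

// One Span per row, starting with an "empty" interval (left > right).
std::vector<Span> makeRows(int rowCount) {
    assert(rowCount >= 0);
    return std::vector<Span>(
        static_cast<std::size_t>(rowCount),
        Span{std::numeric_limits<int>::max(), std::numeric_limits<int>::min()});
}

// Widen the span of row y to include x.
// Returns false, without writing anything, if y is outside [miny, miny + rows.size()).
bool updateRow(std::vector<Span>& rows, int miny, int x, int y) {
    const long long index = static_cast<long long>(y) - miny;
    if (index < 0 || index >= static_cast<long long>(rows.size())) return false;
    Span& s = rows[static_cast<std::size_t>(index)];
    if (x < s.left) s.left = x;
    if (x > s.right) s.right = x;
    return true;
}
```

If `updateRow` ever returns false while drawing, an interpolated `y` lies outside the range computed in step 1, which would be a likely candidate for the invalid write. Using `.at()` instead of `operator[]` on the vectors achieves a similar effect by throwing `std::out_of_range`.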