First off, let me say that everything works like a charm except the memory deallocation. Maybe it's pure coincidence that I do not have any problems and my basic understanding is wrong.
Also, please refrain from telling me "Use std::vector instead" or the like, as I want to learn and understand the basics first!
Now to the code:
class Test
{
public:
Test(const unsigned int xDim, const unsigned int yDim, const unsigned int zDim);
~Test();
private:
unsigned int xDim, yDim, zDim;
TestStructure **testStructure;
void init();
int to1D(int x, int y, int z) { return x + (y * xDim) + (z * xDim * yDim); }
};
cpp:
Test::Test(const unsigned int xDim, const unsigned int yDim, const unsigned int zDim)
{
this->xDim = xDim;
this->yDim = yDim;
this->zDim = zDim;
this->testStructure = new TestStructure *[xDim * yDim * zDim];
init();
}
void Test::init()
{
for(int x = 0; x < xDim; x++)
for(int y = 0; y < yDim; y++)
for(int z = 0; z < zDim; z++)
{
this->testStructure[to1D(x, y, z)] = new TestStructure(
x - xDim / 2,
y - yDim / 2,
z - zDim / 2);
...
}
}
Now for the destructor I tried two ways:
Test::~Test()
{
for(int x = 0; x < xDim; x++)
for(int y = 0; y < yDim; y++)
for(int z = 0; z < zDim; z++) {
free(this->testStructure[to1D(x, y, z)]);
}
free(this->testStructure);
}
and
Test::~Test()
{
for(int a = 0; a < xDim * yDim * zDim; a++) {
free(this->testStructure[a]);
}
free(this->testStructure);
}
I am pretty sure the second is equivalent to the first, but in both cases I get:
==3854==ERROR: AddressSanitizer: alloc-dealloc-mismatch (operator new vs free) on 0x62100008e900
Where are my mistakes? Throw them at my face and let me learn!
edit: Basic mistake: use delete (for new) instead of free (for malloc).
Replacing the frees above with deletes leads to:
==4076==ERROR: AddressSanitizer: new-delete-type-mismatch on 0x60600002c300 in thread T0:
edit2:
Pardon me, with delete[] for the array part it leads to:
==4138==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x62100008e8f8 at pc 0x000000480b3d bp 0x7ffc5bcbc0f0 sp 0x7ffc5bcbc0e0
edit3/SOLUTION: Did it the wrong way. This seems to work:
Test::~Test()
{
for(int a = 0; a < xDim * yDim * zDim; a++) {
delete(this->testStructure[a]);
}
delete[] this->testStructure;
}
Big thanks @SomeProgrammerDude
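To make the pairing rule from the edits explicit: each pointer obtained from new is released with delete, and the array obtained from new[] is released with delete[]. Below is a minimal sketch of that pattern. Cell and Grid are hypothetical stand-ins for TestStructure and Test (they are not from the original code), and the instance counter exists only so the cleanup can be checked.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for TestStructure; counts live instances.
struct Cell {
    static int live;
    int x, y, z;
    Cell(int x_, int y_, int z_) : x(x_), y(y_), z(z_) { ++live; }
    ~Cell() { --live; }
};
int Cell::live = 0;

// Hypothetical stand-in for Test: owns an array of owning pointers.
class Grid {
public:
    Grid(std::size_t xDim, std::size_t yDim, std::size_t zDim)
        : count_(xDim * yDim * zDim),
          cells_(new Cell*[count_])                 // array form: new[]
    {
        std::size_t i = 0;                          // same layout as to1D()
        for (std::size_t z = 0; z < zDim; ++z)
            for (std::size_t y = 0; y < yDim; ++y)
                for (std::size_t x = 0; x < xDim; ++x)
                    cells_[i++] = new Cell(int(x), int(y), int(z)); // scalar form: new
    }
    ~Grid() {
        for (std::size_t i = 0; i < count_; ++i)
            delete cells_[i];                       // scalar new  -> delete
        delete[] cells_;                            // new[]       -> delete[]
    }
    // Copying would make two objects delete the same pointers, so forbid it.
    Grid(const Grid&) = delete;
    Grid& operator=(const Grid&) = delete;
    std::size_t size() const { return count_; }
private:
    std::size_t count_;   // declared first, so it is initialized before cells_
    Cell** cells_;
};

// Returns the number of Cell objects still alive after a Grid went out of scope.
int liveCellsAfterGridScope() {
    {
        Grid g(2, 3, 4);
        assert(Cell::live == 24);
    }
    return Cell::live;
}
```

The copy operations are deleted because the compiler-generated ones would copy the raw pointers and lead to a double delete.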
This is the relevant code:
void init(int f, int h, std::vector<std::vector<double> > **offsets) {
for (int i = 0; i < 2 * h + 1; i++) { // for all frames
// vector of (x, y, d)'s
for (int y = 0; y < H; y++) {
for (int x = 0; x < W; x++) {
random_offsets(offsets[y][x]);
print_vecs(offsets[y][x]);
}
}
}
}
int main(int argc, char **argv) {
int h = 6;
Halide::Runtime::Buffer<uint8_t> frame[2*h + 1];
int f = 6;
for (int i = -h; i < h; i++) {
frame[i + h] = load_image("test1/small/" + std::to_string(f + i) + ".png"); // shift by h so the index starts at 0
}
W = frame[0].width();
H = frame[0].height();
std::vector<std::vector<double> > offsets[H][W];
init(6, h, offsets);
return 0;
}
I create an offsets 2d array and pass it into the init function. But this gives the following error upon compilation:
note: candidate function not viable: no known conversion from 'std::vector<std::vector<double> > [H][W]' to
'std::vector<std::vector<double> > **' for 3rd argument
void init(int f, int h, std::vector<std::vector<double> > **offsets) {
How can I fix this?
After following one of the comments and changing to vector ...(H, vector...)...
I have the following code altogether:
void random_offsets(std::vector<std::vector<double> > &offsets) {
for (int i = 0; i < k; i++) {
std::vector<double> offset;
offset.push_back(W/3 * ((double) rand() / (RAND_MAX)) * 2 - 1);
offset.push_back(W/3 * ((double) rand() / (RAND_MAX)) * 2 - 1);
offsets.push_back(offset);
}
}
void init(int f, int h, std::vector<std::vector<double> > &offsets) {
for (int i = 0; i < 2 * h + 1; i++) { // for all frames
// vector of (x, y, d)'s
for (int y = 0; y < H; y++) {
for (int x = 0; x < W; x++) {
random_offsets(offsets[y][x]);
print_vecs(offsets[y][x]);
}
}
}
}
int main(int argc, char **argv) {
int h = 6;
Halide::Runtime::Buffer<uint8_t> frame[2*h + 1];
int f = 6;
for (int i = -h; i < h; i++) {
frame[i + h] = load_image("test1/small/" + std::to_string(f + i) + ".png"); // shift by h so the index starts at 0
}
W = frame[0].width();
H = frame[0].height();
std::vector<std::vector<double>> offsets(H, std::vector<double>(W));
init(6, h, offsets);
return 0;
}
But I get the following error:
init.cpp:43:2: error: no matching function for call to 'random_offsets'
random_offsets(offsets[y][x]);
^~~~~~~~~~~~~~
init.cpp:14:6: note: candidate function not viable: no known conversion from 'std::__1::__vector_base<double, std::__1::allocator<double> >::value_type'
(aka 'double') to 'std::vector<std::vector<double> > &' for 1st argument
void random_offsets(std::vector<std::vector<double> > &offsets) {
^
It seems that what I really needed, assuming that a 2D array of 2D vectors is not going to work, was:
std::vector<std::vector<std::vector<std::vector<double> > > > offsets(H, std::vector<std::vector<std::vector<double> > > (W, std::vector<std::vector<double> > (k, std::vector<double> (3))));
The parameter type std::vector<std::vector<double> > **offsets doesn't do what you think it does: it is a pointer to a pointer to a std::vector<std::vector<double> >. What you are likely looking for is pass by reference:
void init(int f, int h, std::vector<std::vector<double> > &offsets)
This fits how you are calling this function and how you use the object internally.
There is a pretty good overview of this here: What's the difference between passing by reference vs. passing by value? I would also highly recommend that you pick up a good C++ book from this list: The Definitive C++ Book Guide and List
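As a rough illustration of how the pieces fit together, here is a sketch that keeps one list of offsets per pixel and passes everything by reference. The type aliases, the function names fillPixel, initGrid and makeGrid, and the placeholder offset values are assumptions made for this example; the original uses the globals W, H and k and random values instead.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Offset     = std::vector<double>;                   // one (x, y) offset
using OffsetList = std::vector<Offset>;                   // k offsets for one pixel
using OffsetGrid = std::vector<std::vector<OffsetList>>;  // H rows of W pixels

// Appends k placeholder offsets to one pixel's list (taken by reference).
void fillPixel(OffsetList &offsets, int k) {
    for (int i = 0; i < k; ++i)
        offsets.push_back(Offset{double(i), double(-i)});
}

// Takes the whole grid by reference, so the caller's data is modified in place.
void initGrid(OffsetGrid &grid, int k) {
    for (std::size_t y = 0; y < grid.size(); ++y)
        for (std::size_t x = 0; x < grid[y].size(); ++x)
            fillPixel(grid[y][x], k);   // grid[y][x] is an OffsetList, as fillPixel expects
}

// Builds an H x W grid of empty offset lists and fills each with k entries.
OffsetGrid makeGrid(int H, int W, int k) {
    OffsetGrid grid(H, std::vector<OffsetList>(W));
    initGrid(grid, k);
    return grid;
}
```

The point is that the element type of the innermost indexing step has to match the parameter type: with this layout grid[y][x] is a vector of offsets, not a single double.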
I have two overloaded functions: "ChooseElements", which chooses elements from passed array, and "SortElements", which sorts elements of passed array. One pair works with INT data, and another one with FLOAT.
int * ChooseElements(int * X, int n, int & m)
{
int * Y = NULL;
for (int i = 0; i < n; i++)
{
if (X[i] > 0)
{
if (Y == NULL)
{
m = 1;
Y = new int[1];
Y[0] = X[i];
}
else
{
m++;
Y = (int *)realloc(Y, sizeof(int) * m);
Y[m - 1] = X[i];
}
}
}
return Y;
}
float * ChooseElements(float * X, int n, int & m)
{
float * Y = NULL;
for (int i = 0; i < n; i++)
{
if (X[i] > 0)
{
if (Y == NULL)
{
m = 1;
Y = new float[1];
Y[0] = X[i];
}
else
{
m++;
Y = (float *)realloc(Y, sizeof(float) * m);
Y[m - 1] = X[i];
}
}
}
return Y;
}
and
int * SortElements(int m, int *& Y)
{
for (int i = 1; i < m; i++)
{
for (int j = 0; j < m - i; j++)
{
if (Y[j] > Y[j + 1])
{
int Temp = Y[j];
Y[j] = Y[j + 1];
Y[j + 1] = Temp;
}
}
}
return Y;
}
float * SortElements(int m, float *& Y)
{
for (int i = 1; i < m; i++)
{
for (int j = 0; j < m - i; j++)
{
if (Y[j] > Y[j + 1])
{
float Temp = Y[j];
Y[j] = Y[j + 1];
Y[j + 1] = Temp;
}
}
}
return Y;
}
What I want to do is pass the result of the first function as an argument to the second one, like this:
int n, m;
int * X = NULL, * Y = NULL;
/* ...
Some code in which n and X are initialized
... */
Y = SortElements(m, ChooseElements(X, n, m));
However, when I try to do that, Visual Studio 2017 tells me:
no instance of overloaded function "SortElements" matches the argument list
argument types are: (int, int *)
If I do this instead:
Y = ChooseElements(X, n, m);
Y = SortElements(m, Y);
everything works fine.
If I remove overloads and leave only INT pair and once again try
int n, m;
int * X = NULL, * Y = NULL;
/* ...
Some code in which n and X are initialized
... */
Y = SortElements(m, ChooseElements(X, n, m));
I get another problem:
int *ChooseElements(int *X, int n, int &m)
initial value of reference to non-const value must be an lvalue
What am I doing wrong? My teacher asks for a function which uses another function as an argument. What I have written does not work, and I have no idea what could be done here.
In your int * SortElements(int m, int *& Y) function, the parameter int *& Y is a reference to an int pointer. My guess is that you don't need that.
A non-const reference such as int *& Y can only bind to an lvalue (like your variable Y), but ChooseElements returns its pointer by value, so the call expression is a temporary (an rvalue).
As a solution, you can just use int * Y as the parameter.
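A minimal sketch of that change, for the int pair only. Two things here go beyond the answer above and are my own assumptions. First, m is taken by const reference, because the order in which function arguments are evaluated is unspecified, so a by-value m could be copied before ChooseElements has set it. Second, ChooseElements counts first and allocates once with new[], so the result can be released with delete[]. The helper nestedCallSortsPositives is hypothetical and only exercises the nested call.

```cpp
#include <cassert>
#include <cstddef>

// Copies the positive elements of X[0..n) into a new[] array; m receives the count.
// Returns nullptr when there are none. The caller releases the result with delete[].
int *ChooseElements(const int *X, int n, int &m) {
    m = 0;
    for (int i = 0; i < n; ++i)
        if (X[i] > 0) ++m;
    if (m == 0) return nullptr;
    int *Y = new int[m];
    int k = 0;
    for (int i = 0; i < n; ++i)
        if (X[i] > 0) Y[k++] = X[i];
    return Y;
}

// Y is taken by value, so a temporary (rvalue) pointer is accepted.
// m is a const reference, so its value is read only inside the body,
// after ChooseElements has already run.
int *SortElements(const int &m, int *Y) {
    for (int i = 1; i < m; ++i)
        for (int j = 0; j < m - i; ++j)
            if (Y[j] > Y[j + 1]) {
                int tmp = Y[j];
                Y[j] = Y[j + 1];
                Y[j + 1] = tmp;
            }
    return Y;
}

// Runs the nested call on a small fixed input and reports whether it sorted correctly.
bool nestedCallSortsPositives() {
    int X[] = {5, -1, 3, 0, 4};
    int m = 0;
    int *Y = SortElements(m, ChooseElements(X, 5, m));
    bool ok = (m == 3) && Y[0] == 3 && Y[1] == 4 && Y[2] == 5;
    delete[] Y;
    return ok;
}
```

The float overloads would change in the same way.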
This is a part of my original code; the full code is too big to post here.
My question only concerns the Sads 4D matrix.
I don't want to use the int**** any more, as was suggested to me in my previous question.
int main()
{
//4D matrix
int**** Sads = new int***[inputImage->HeightLines];
for (size_t i = 0; i < inputImage->HeightLines; i++)
{
Sads[i] = new int**[inputImage->WidthColumns];
for (size_t j = 0; j < inputImage->WidthColumns; j++)
{
Sads[i][j] = new int*[W_SIZE];
for (size_t k = 0; k < W_SIZE; k++)
{
Sads[i][j][k] = new int[W_SIZE];
}
}
}
ProcessRowsLoop(20, 1904, Sads);
}
void ProcessRowsLoop(int m_support, int m_height, int**** sads)
{
for (int row_in = m_support - 1; row_in < m_Height_in; row_in += BNLM_OUT_SZ)
{
ProcessRow( &Sads[indexRow]);
}
}
void ProcessRow(int**** sads)
{
int m_SAD_00[W_SIZE][W_SIZE];
int m_SAD_01[W_SIZE][W_SIZE];
int m_SAD_10[W_SIZE][W_SIZE];
int m_SAD_11[W_SIZE][W_SIZE];
RunAlgo(m_support, m_SAD_00, m_SAD_01, m_SAD_10, m_SAD_11, m_CP_00, m_CP_01, m_CP_10, m_CP_11, m_ColumnSADUp, m_ColumnSADDown);
for (size_t i = 0; i < W_SIZE; i++)
{
for (size_t j = 0; j < W_SIZE; j++)
{
Sads[0][m_col_out][i][j] = (m_SAD_00[i][j] + color_penalty_weight * m_CP_00[i][j]) / (sqrt(m_sigma_patch[0][0] / pow(mnm, 2)));
Sads[0][m_col_out + 1][i][j] = (m_SAD_01[i][j] + color_penalty_weight * m_CP_01[i][j]) / (sqrt(m_sigma_patch[0][1] / pow(mnm, 2)));
Sads[1][m_col_out][i][j] = (m_SAD_10[i][j] + color_penalty_weight * m_CP_10[i][j]) / (sqrt(m_sigma_patch[1][0] / pow(mnm, 2)));
Sads[1][m_col_out + 1][i][j] = (m_SAD_11[i][j] + color_penalty_weight * m_CP_11[i][j]) / (sqrt(m_sigma_patch[1][1] / pow(mnm, 2)));
}
}
}
In my new code I would like to replace the 4D matrix int**** Sads, with
struct VectorFourD
{
private:
int _width, _height;
int _w_size;
std::vector<int> _vec;
public:
VectorFourD(int width, int height, int size) : _width(width), _height(height), _w_size(size), _vec(totalSize())
{
}
auto totalSize() const-> int
{
return _width * _height * _w_size * _w_size;
}
int* at(int a)
{
return _vec.data() + (a * _height * _w_size * _w_size);
}
int* at(int a, int b)
{
return at(a) + (b * _w_size * _w_size);
}
int *at(int a, int b, int c)
{
return at(a, b) + (c* _w_size);
}
int& at(int a, int b, int c, int d)
{
return *(at(a, b, c) + d);
}
};
You can see that I'm processing two rows at the same time in ProcessRow(), writing to
Sads[0][m_col_out][i][j], Sads[0][m_col_out + 1][i][j],
Sads[1][m_col_out][i][j], Sads[1][m_col_out + 1][i][j]
together. My question is how to change my code to work with the new VectorFourD, for example:
int main()
{
VectorFourD SadsVec = VectorFourD(inputImage->HeightLines, inputImage->WidthColumns, W_SIZE);
ProcessRowsLoop(20, 1904, SadsVec);
}
I would also change the function signature to void ProcessRowsLoop(int m_support, int m_height, VectorFourD* SadsVec),
but I don't know how to continue from here. Can you please help?
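One possible direction, sketched under assumptions: pass the VectorFourD by reference and carry the row and column explicitly, so that Sads[0][c][i][j] and Sads[1][c][i][j] become at(row, c, i, j) and at(row + 1, c, i, j). The struct below is a trimmed copy of the one above with only the four-index accessor. The simplified ProcessRow and ProcessRowsLoop signatures, the step of 2, and the values written are placeholders, not the original algorithm.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Trimmed copy of VectorFourD: flat storage plus the four-index accessor.
struct VectorFourD {
    VectorFourD(int width, int height, int size)
        : _width(width), _height(height), _w_size(size),
          _vec(static_cast<std::size_t>(width) * height * size * size) {}

    // Same layout as the original: a is the slowest index, d the fastest.
    int &at(int a, int b, int c, int d) {
        return _vec[((static_cast<std::size_t>(a) * _height + b) * _w_size + c) * _w_size + d];
    }
private:
    int _width, _height, _w_size;
    std::vector<int> _vec;
};

// Hypothetical stand-in for ProcessRow: fills the 2x2 block of windows whose
// top-left pixel is (row, col). The constants replace the real SAD formulas.
void ProcessRow(VectorFourD &sads, int row, int col, int wSize) {
    for (int i = 0; i < wSize; ++i)
        for (int j = 0; j < wSize; ++j) {
            sads.at(row,     col,     i, j) = 1;  // was Sads[0][m_col_out][i][j]
            sads.at(row,     col + 1, i, j) = 2;  // was Sads[0][m_col_out + 1][i][j]
            sads.at(row + 1, col,     i, j) = 3;  // was Sads[1][m_col_out][i][j]
            sads.at(row + 1, col + 1, i, j) = 4;  // was Sads[1][m_col_out + 1][i][j]
        }
}

// Hypothetical stand-in for ProcessRowsLoop: walks the image two rows and
// two columns at a time and hands the shared container on by reference.
void ProcessRowsLoop(VectorFourD &sads, int rows, int cols, int wSize) {
    for (int row = 0; row + 1 < rows; row += 2)
        for (int col = 0; col + 1 < cols; col += 2)
            ProcessRow(sads, row, col, wSize);
}
```

Passing by reference (rather than by pointer or by value) avoids copying the underlying vector and keeps the call site as ProcessRowsLoop(SadsVec, ...).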
My problem is as follows:
I am designing a small game; however, I have run into a very large problem, which I have been trying to fix for some time now. Essentially, I want to upgrade buildings if the user has enough points, but the data in the Building objects is being corrupted. The only object that is as it is 'supposed' to be is the first object allocated in the buildings vector.
When I run the program I am faced with a black screen (meaning it started properly), and when I debug I get an error such as: Access violation reading location 0x00000008, which suggests a member was read through a null pointer.
Building Class:
class Building {
public:
int x = 0, y = 0;
vector<int> buildingID;
vector<int> upgradeCost;
int size = 4;
Building(vector<int> buildingID, int x, int y, vector<int> upgradeCost)
: buildingID(buildingID), x(x), y(y), upgradeCost(upgradeCost) { }
virtual void upgrade();
void drawTile(SDL_Rect, SDL_Surface*);
int buildingLevel = 1;
protected:
};
void Building::upgrade() {
if((buildingLevel+1) <= size)buildingLevel += 1;
}
void Building::drawTile(SDL_Rect drawRect, SDL_Surface* drawnTo) {
Tile::Tiles.at(buildingID[buildingLevel - 1]).drawTile(drawRect, drawnTo);
}
The function which generates the buildings:
void Level::generateTerrain() {
for (int i = 0; i < width; i++)
for (int j = 0; j < height; j++) {
int tile = rand()%100;
if (tile >= 25 && tile <= 28) this->tiles.at(i + (j*this->width)) = 2;
else if (tile < 24) this->tiles.at(i + (j*this->width)) = 1;
else if (tile == 29) {
this->addBuilding(Building(vector<int>{4, 3, 2, 1}, i * 75, j * 75, vector<int>{1, 1, 1, 1}), i, j);
}
else this->tiles.at(i + (j*this->width)) = 0;
}
}
The function which adds buildings:
void Level::addBuilding(Building building, int x, int y) {
buildings.push_back(building);
tiles.at(x + (y*this->width)) = buildID(building.buildingID[building.buildingLevel-1], &buildings.at(buildings.size()-1));
}
And lastly the function which draws the Tiles/Buildings:
void Level::drawLevel(int x, int y, int width, int height, SDL_Surface* drawnTo, int beginningX, int beginningY) {
SDL_Rect tempRect;
tempRect.w = 75;
tempRect.h = 75;
for (int i = x; i <= (x + width); i++)
for (int j = y; j <= (y + height); j++) {
if (tiles.at(i + (j*this->width)).id == 999999) continue;
tempRect.x = (i*Tile::Tiles.at(tiles.at(i + (j*this->width)).id).tileSurface->w) + beginningX;
tempRect.y = (j*Tile::Tiles.at(tiles.at(i + (j*this->width)).id).tileSurface->h) + beginningY;
Tile::Tiles.at(tiles.at(i + (j*this->width)).id).drawTile(tempRect, drawnTo);
}
}
If you require any more pieces of the code, please just ask.
Thanks for any help given.
The only part of your code that looks suspect to me is that addBuilding() passes the address of a vector element as the second argument to buildID(). std::vector reallocates its storage when it needs to grow in capacity, so existing elements will likely no longer be at the addresses your pointers hold once that happens.
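One way around this, sketched below, is to have each tile remember the index of its building instead of its address, since an index stays valid when the vector reallocates. Everything here is a simplified assumption: Building is cut down to a single field, and TileRef, Level::buildingFor and upgradeFirstAfterGrowth are hypothetical names that do not appear in the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, trimmed-down Building: only the field needed for the demo.
struct Building { int level = 1; };

// A tile remembers its building by index, not by address.
struct TileRef { std::size_t buildingIndex; };

struct Level {
    std::vector<Building> buildings;
    std::vector<TileRef> tiles;

    void addBuilding(const Building &b) {
        buildings.push_back(b);                          // may reallocate the storage
        tiles.push_back(TileRef{buildings.size() - 1});  // an index survives reallocation
    }

    // Looks the building up at the moment it is needed.
    Building &buildingFor(const TileRef &t) { return buildings[t.buildingIndex]; }
};

// Adds many buildings (forcing several reallocations), then upgrades the
// first one through its tile and returns its resulting level.
int upgradeFirstAfterGrowth() {
    Level level;
    for (int i = 0; i < 1000; ++i)
        level.addBuilding(Building{});
    level.buildingFor(level.tiles[0]).level += 1;
    return level.buildings[0].level;
}
```

Other options would be to reserve() enough capacity before taking any addresses, or to store the buildings behind std::unique_ptr so the objects themselves never move.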
I've modified a raytracer I wrote a while ago for educational purposes to take advantage of multiprocessing using OpenMP. However, I'm not seeing any profit from the parallelization.
I've tried 3 different approaches: a task-pooled environment (the draw_pooled() function), a standard OMP parallel nested for loop with image row-level parallelism (draw_parallel_for()), and another OMP parallel for with pixel-level parallelism (draw_parallel_for2()). The original, serial drawing routine is also included for reference (draw_serial()).
I'm running a 2560x1920 render on an Intel Core 2 Duo E6750 (2 cores @ 2.67GHz each w/Hyper-Threading) and 4GB of RAM under Linux, binary compiled by gcc with libgomp. The scene takes an average of:
120 seconds to render in series,
but 196 seconds (sic!) to do so in parallel in 2 threads (the default - number of CPU cores), regardless of which of the three particular methods above I choose,
if I override OMP's default thread number with 4 to take HT into account, the parallel render times drop to 177 seconds.
Why is this happening? I can't see any obvious bottlenecks in the parallel code.
EDIT: Just to clarify - the task pool is only one of the implementations, please do read the question - scroll down to see the parallel fors. Thing is, they are just as slow as the task pool!
void draw_parallel_for(int w, int h, const char *fname) {
unsigned char *buf;
buf = new unsigned char[w * h * 3];
Scene::GetInstance().PrepareRender(w, h);
for (int y = 0; y < h; ++y) {
#pragma omp parallel for num_threads(4)
for (int x = 0; x < w; ++x)
Scene::GetInstance().RenderPixel(x, y, buf + (y * w + x) * 3);
}
write_png(buf, w, h, fname);
delete [] buf;
}
void draw_parallel_for2(int w, int h, const char *fname) {
unsigned char *buf;
buf = new unsigned char[w * h * 3];
Scene::GetInstance().PrepareRender(w, h);
int x, y;
#pragma omp parallel for private(x, y) num_threads(4)
for (int xy = 0; xy < w * h; ++xy) {
x = xy % w;
y = xy / w;
Scene::GetInstance().RenderPixel(x, y, buf + (y * w + x) * 3);
}
write_png(buf, w, h, fname);
delete [] buf;
}
void draw_parallel_for3(int w, int h, const char *fname) {
unsigned char *buf;
buf = new unsigned char[w * h * 3];
Scene::GetInstance().PrepareRender(w, h);
#pragma omp parallel for num_threads(4)
for (int y = 0; y < h; ++y) {
for (int x = 0; x < w; ++x)
Scene::GetInstance().RenderPixel(x, y, buf + (y * w + x) * 3);
}
write_png(buf, w, h, fname);
delete [] buf;
}
void draw_serial(int w, int h, const char *fname) {
unsigned char *buf;
buf = new unsigned char[w * h * 3];
Scene::GetInstance().PrepareRender(w, h);
for (int y = 0; y < h; ++y) {
for (int x = 0; x < w; ++x)
Scene::GetInstance().RenderPixel(x, y, buf + (y * w + x) * 3);
}
write_png(buf, w, h, fname);
delete [] buf;
}
std::queue< std::pair<int, int> * > task_queue;
void draw_pooled(int w, int h, const char *fname) {
unsigned char *buf;
buf = new unsigned char[w * h * 3];
Scene::GetInstance().PrepareRender(w, h);
bool tasks_issued = false;
#pragma omp parallel shared(buf, tasks_issued, w, h) num_threads(4)
{
#pragma omp master
{
for (int y = 0; y < h; ++y) {
for (int x = 0; x < w; ++x)
task_queue.push(new std::pair<int, int>(x, y));
}
tasks_issued = true;
}
while (true) {
std::pair<int, int> *coords;
#pragma omp critical(task_fetch)
{
if (task_queue.size() > 0) {
coords = task_queue.front();
task_queue.pop();
} else
coords = NULL;
}
if (coords != NULL) {
Scene::GetInstance().RenderPixel(coords->first, coords->second,
buf + (coords->second * w + coords->first) * 3);
delete coords;
} else {
#pragma omp flush(tasks_issued)
if (tasks_issued)
break;
}
}
}
write_png(buf, w, h, fname);
delete [] buf;
}
You have a critical section inside your innermost loop. In other words, you're hitting a synchronization primitive per pixel. That's going to kill performance.
Better split the scene in tiles and work one on each thread. That way, you have a longer time (a whole tile's worth of processing) between synchronizations.
If the pixels are independent you don't actually need any locking. You can just divide up the image into rows or columns and let the threads work on their own. For example, you could have each thread operate on every nth row (pseudocode):
for(int y = THREAD_NUM; y < h; y += THREAD_COUNT)
for(int x = 0; x < w; ++x)
render_pixel(x,y);
Where THREAD_NUM is a unique number for each thread such that 0 <= THREAD_NUM < THREAD_COUNT. Then after you join your threadpool, perform the png conversion.
There is always a performance overhead when creating threads. An OMP parallel region inside a for loop will generate a lot of overhead. For example, in your code
void draw_parallel_for(int w, int h, const char *fname) {
for (int y = 0; y < h; ++y) {
// Here There is a lot of overhead
#pragma omp parallel for num_threads(4)
for (int x = 0; x < w; ++x)
Scene::GetInstance().RenderPixel(x, y, buf + (y * w + x) * 3);
}
}
It can be re-written as
void draw_parallel_for(int w, int h, const char *fname) {
#pragma omp parallel for num_threads(4)
for (int y = 0; y < h; ++y) {
for (int x = 0; x < w; ++x)
Scene::GetInstance().RenderPixel(x, y, buf + (y * w + x) * 3);
}
}
or
void draw_parallel_for(int w, int h, const char *fname) {
#pragma omp parallel num_threads(4)
for (int y = 0; y < h; ++y) {
#pragma omp for
for (int x = 0; x < w; ++x)
Scene::GetInstance().RenderPixel(x, y, buf + (y * w + x) * 3);
}
}
This way, you eliminate the overhead of starting a parallel region for every row.
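For reference, here is a self-contained sketch of the row-level variant that can be checked against a serial run. renderPixel is a hypothetical stand-in for Scene::GetInstance().RenderPixel and is assumed to be a pure function that writes only its own three bytes; if the real RenderPixel touches shared state (caches, counters, allocations), that contention could still dominate and this sketch would not show it. The schedule(dynamic, 8) clause is my own choice, so that expensive rows do not all land on one thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for RenderPixel: depends only on (x, y) and writes
// only the three bytes it is given.
void renderPixel(int x, int y, unsigned char *rgb) {
    rgb[0] = static_cast<unsigned char>(x & 0xFF);
    rgb[1] = static_cast<unsigned char>(y & 0xFF);
    rgb[2] = static_cast<unsigned char>((x ^ y) & 0xFF);
}

// Reference implementation: plain nested loops.
std::vector<unsigned char> renderSerial(int w, int h) {
    std::vector<unsigned char> buf(static_cast<std::size_t>(w) * h * 3);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            renderPixel(x, y, buf.data() + (static_cast<std::size_t>(y) * w + x) * 3);
    return buf;
}

// One parallel region for the whole image. Each row is written by exactly
// one thread into its own slice of the buffer, so no locking is needed.
std::vector<unsigned char> renderParallel(int w, int h) {
    std::vector<unsigned char> buf(static_cast<std::size_t>(w) * h * 3);
    unsigned char *out = buf.data();
    #pragma omp parallel for schedule(dynamic, 8)
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            renderPixel(x, y, out + (static_cast<std::size_t>(y) * w + x) * 3);
    return buf;
}
```

With gcc this needs -fopenmp to actually run in parallel; without the flag the pragma is ignored and the loop simply runs serially, which is handy for comparing timings of the same binary source.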