AVX512 illegal instruction - c++

In my previous post I explained that I am starting with AVX to speed up my code (please note that, although the two posts have parts in common, this one refers to AVX512 and the previous one to AVX2, which as far as I know are slightly different and need different compiler flags). After experimenting with AVX2 I decided to try AVX512 and changed my AVX2 function:
void getDataAVX2(u_char* data, size_t cols, std::vector<double>& info)
{
__m256d dividend = _mm256_set_pd(1 / 64.0, 1 / 64.0, 1 / 64.0, 1 / 64.0);
info.resize(cols);
__m256d result;
for (size_t i = 0; i < cols / 4; i++)
{
__m256d divisor = _mm256_set_pd((double(data[4 * i + 3 + cols] << 8) + double(data[4 * i + 2 * cols + 3])),
(double(data[4 * i + 2 + cols] << 8) + double(data[4 * i + 2 * cols + 2])),
(double(data[4 * i + 1 + cols] << 8) + double(data[4 * i + 2 * cols + 1])),
(double(data[4 * i + cols] << 8) + double(data[4 * i + 2 * cols])));
result = _mm256_sqrt_pd(_mm256_mul_pd(divisor, dividend));
info[size_t(4 * i)] = result[0];
info[size_t(4 * i + 1)] = result[1];
info[size_t(4 * i + 2)] = result[2];
info[size_t(4 * i + 3)] = result[3];
}
}
for what I think should be its equivalent:
void getDataAVX512(u_char* data, size_t cols, std::vector<double>& info)
{
__m512d dividend = _mm512_set_pd(1 / 64.0, 1 / 64.0, 1 / 64.0, 1 / 64.0, 1 / 64.0, 1 / 64.0, 1 / 64.0, 1 / 64.0);
info.resize(cols);
__m512d result;
for (size_t i = 0; i < cols / 8; i++)
{
__m512d divisor = _mm512_set_pd((double(data[8 * i + 7 + cols] << 8) + double(data[8 * i + 2 * cols + 7])),
(double(data[8 * i + 6 + cols] << 8) + double(data[8 * i + 2 * cols + 6])),
(double(data[8 * i + 5 + cols] << 8) + double(data[8 * i + 2 * cols + 5])),
(double(data[8 * i + 4 + cols] << 8) + double(data[8 * i + 2 * cols + 4])),
(double(data[8 * i + 3 + cols] << 8) + double(data[8 * i + 2 * cols + 3])),
(double(data[8 * i + 2 + cols] << 8) + double(data[8 * i + 2 * cols + 2])),
(double(data[8 * i + 1 + cols] << 8) + double(data[8 * i + 2 * cols + 1])),
(double(data[8 * i + cols] << 8) + double(data[8 * i + 2 * cols])));
result = _mm512_sqrt_pd(_mm512_mul_pd(divisor, dividend));
// Each iteration produces 8 doubles, so the output index advances by 8 per step.
info[size_t(8 * i)] = result[0];
info[size_t(8 * i + 1)] = result[1];
info[size_t(8 * i + 2)] = result[2];
info[size_t(8 * i + 3)] = result[3];
info[size_t(8 * i + 4)] = result[4];
info[size_t(8 * i + 5)] = result[5];
info[size_t(8 * i + 6)] = result[6];
info[size_t(8 * i + 7)] = result[7];
}
}
which in a non AVX form is:
void getData(u_char* data, size_t cols, std::vector<double>& info)
{
info.resize(cols);
for (size_t i = 0; i < cols; i++)
{
info[i] = sqrt((double(data[cols + i] << 8) + double(data[2 * cols + i])) / 64.0);
}
}
After compiling and running the code I get the following error:
Illegal instruction (core dumped)
To my surprise, this error occurs at the call to sqrt in the getData function. If I remove the sqrt call then the error appears later, at the line __m512d divisor = _mm512_set_pd((d..... Any ideas on what is happening?
Here is the full example.
Thank you very much.
I am compiling with c++ (7.3.0) with the following options: -std=c++17 -Wall -Wextra -O3 -fno-tree-vectorize -mavx512f. I have checked as explained here, and my CPU (Intel(R) Core(TM) i7-4710HQ CPU @ 2.50GHz) supports AVX2. Should the list also show AVX-512 if the CPU supported it?

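I don't think AVX-512 instructions are supported on your CPU. The official documentation for the i7-4710HQ lists only AVX2 under the "Instruction Set Extensions" section; a newer CPU with AVX-512 support would list it in that same section. Because -mavx512f allows the compiler to emit AVX-512 instructions anywhere in the program, not only where you use the intrinsics, that would also explain why the crash first shows up at the plain sqrt call in getData.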
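A minimal sketch of checking for support at run time instead of reading the spec sheet. It assumes GCC or Clang on x86, where the `__builtin_cpu_init` and `__builtin_cpu_supports` builtins are available; the names `Kernel`, `pickKernel` and `kernelName` are made up for this example and are not part of the question's code.

```cpp
// Hypothetical names for this sketch; not part of the question's code.
enum class Kernel { Scalar, Avx2, Avx512 };

// Asks the CPU the program is actually running on which extensions it has.
Kernel pickKernel()
{
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f")) return Kernel::Avx512;
    if (__builtin_cpu_supports("avx2"))    return Kernel::Avx2;
#endif
    return Kernel::Scalar; // assumption: anything unknown falls back to plain code
}

// Human-readable label for logging which path was chosen.
const char* kernelName(Kernel k)
{
    switch (k) {
        case Kernel::Avx512: return "AVX-512F";
        case Kernel::Avx2:   return "AVX2";
        default:             return "scalar";
    }
}
```

Printing `kernelName(pickKernel())` on a Haswell part such as the i7-4710HQ should report AVX2. For the check to be useful, the file containing it has to be compiled without -mavx512f, otherwise the dispatcher itself may contain AVX-512 instructions. One option is to keep the AVX-512 routine in its own translation unit built with that flag, or to mark it with `__attribute__((target("avx512f")))`, and call it only when `pickKernel()` returns `Kernel::Avx512`.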

Related

What is wrong with my perlin noise generator?

I am trying to generate Perlin noise for a math essay for school, and I am having difficulty figuring out the math behind it. This is my perlin class. The Perlin noise function generates (or should generate) a number between 0 and 1, which I then multiply by 255 to color every pixel on the screen. Please help!
#include "perlinnoise.h"
perlinnoise::perlinnoise()
{
srand(time(NULL));
double random = rand() % 1000;
for (int i = 0; i < (651 * 2); i = i + 2)
{
random = (rand() % 1000);
vecGrad[i] = random / 1000;
vecGrad[i + 1] = vecGrad[i];
vecGrad[i] = cos(vecGrad[i] * 2 * 3.1416);
vecGrad[i + 1] = sin(vecGrad[i + 1] * 2 * 3.1416);
}
}
int perlinnoise::perlinNoise(int x, int y)
{
//20 pixel in each case
//30 boxes in width and 20 boxes in height
//651 vectors to create
sf::Vector2i boxXY;
boxXY.x = ((x / 20));
boxXY.y = ((y / 20));
sf::Vector2i displacement1; displacement1.x = x - boxXY.x * 20; displacement1.y = y - boxXY.y * 20;
sf::Vector2i displacement2; displacement2.x = x - (boxXY.x * 20 + 20); displacement2.y = y - boxXY.y * 20;
sf::Vector2i displacement3; displacement3.x = x - boxXY.x * 20; displacement3.y = y - (boxXY.y * 20 + 20);
sf::Vector2i displacement4; displacement4.x = x - (boxXY.x * 20 + 20); displacement4.y = y - (boxXY.y * 20 + 20);
/*std::cout << displacement1.x << std::endl; std::cout << displacement1.y << std::endl;
std::cout << displacement2.x << std::endl; std::cout << displacement2.y << std::endl;
std::cout << displacement3.x << std::endl; std::cout << displacement3.y << std::endl;
std::cout << displacement4.x << std::endl; std::cout << displacement4.y << std::endl;*/
double dotP1 = (vecGrad[((boxXY.y * 30) + boxXY.x)] * displacement1.x) + (vecGrad[(boxXY.y * 30) + boxXY.x + 1] * displacement1.y);
double dotP2 = (vecGrad[((boxXY.y * 30) + boxXY.x + 3)] * displacement2.x) + (vecGrad[(boxXY.y * 30) + boxXY.x + 4] * displacement2.y);
double dotP3 = (vecGrad[((boxXY.y * 30 + 1) + boxXY.x)] * displacement3.x) + (vecGrad[(boxXY.y * 30) + boxXY.x + 1] * displacement3.y);
double dotP4 = (vecGrad[((boxXY.y * 30 + 1) + boxXY.x + 3)] * displacement4.x) + (vecGrad[(boxXY.y * 30) + boxXY.x + 4] * displacement4.y);
This is where I think I have some trouble:
int intensity = 0;
double Sx = (3 * (x - boxXY.x * 20) * (x - boxXY.x * 20)) - (2 * (x - boxXY.x * 20) * (x - boxXY.x * 20) * (x - boxXY.x * 20));
double Sy = (3 * (y - boxXY.y * 20) * (y - boxXY.y * 20)) - (2 * (y - boxXY.y * 20) * (y - boxXY.y * 20) * (y - boxXY.y * 20));
double a = dotP1 + (Sx * (dotP2 - dotP1));
double b = dotP3 + (Sx * (dotP4 - dotP3));
double aa = dotP1 + (Sy * (dotP2 - dotP1));
double bb = dotP3 + (Sy * (dotP4 - dotP3));
intensity = (a+b+aa+bb)/4;
//Should generate number between 0 and 1, but doesn't :/
return intensity;
}
perlinnoise::~perlinnoise()
{
}
I've been reading lots of articles, and they are all very unclear about the math used. I ended up generating a grid of cells of 20*20 pixels each, with a randomly generated gradient vector at each grid intersection. I then calculate the displacement vectors and take the dot product of the displacement and gradient vectors at the four corners. This first part is a bit messy as I am not very experienced, but the last part is more straightforward: I use a smoothing function on the x and y axes and use that number to generate a, b, aa and bb, and I then take the average of those. This is what I thought I understood from the articles I read, but apparently it's wrong. Any help, please?
Thanks in advance!
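For comparison, here is a hedged sketch of the usual 2D gradient-noise recipe that the description above is aiming at. It differs from the posted code in three ways: displacements are expressed as fractions of a cell (0 to 1) rather than in pixels, the smoothing polynomial is applied to those fractions, and the four dot products are blended along x and then along y instead of averaging four separate interpolations. The class name `PerlinSketch` and its parameters are invented for this example. Note also that the posted function returns an int, so any value between 0 and 1 would be truncated to 0.

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper class for this sketch; not the asker's perlinnoise class.
class PerlinSketch {
public:
    // cellsX x cellsY cells need (cellsX + 1) x (cellsY + 1) lattice gradients,
    // e.g. 30 x 20 cells -> 31 * 21 = 651 gradients.
    PerlinSketch(int cellsX, int cellsY, int cellSize, unsigned seed)
        : stride_(cellsX + 1), cellSize_(cellSize),
          gx_((cellsX + 1) * (cellsY + 1)), gy_(gx_.size())
    {
        std::mt19937 rng(seed);
        std::uniform_real_distribution<double> angle(0.0, 2.0 * 3.14159265358979323846);
        for (std::size_t i = 0; i < gx_.size(); ++i) {
            const double a = angle(rng);   // one random direction per lattice point
            gx_[i] = std::cos(a);
            gy_[i] = std::sin(a);
        }
    }

    // Noise for pixel (x, y), remapped from a signed value to [0, 1].
    double at(int x, int y) const
    {
        const int bx = x / cellSize_, by = y / cellSize_;
        // Position inside the cell as a fraction in [0, 1).
        const double fx = double(x - bx * cellSize_) / cellSize_;
        const double fy = double(y - by * cellSize_) / cellSize_;
        // Dot product of each corner gradient with the vector from that corner.
        const double d00 = dot(bx,     by,     fx,       fy);
        const double d10 = dot(bx + 1, by,     fx - 1.0, fy);
        const double d01 = dot(bx,     by + 1, fx,       fy - 1.0);
        const double d11 = dot(bx + 1, by + 1, fx - 1.0, fy - 1.0);
        const double sx = smooth(fx), sy = smooth(fy);
        const double top    = d00 + sx * (d10 - d00);  // blend along x, upper edge
        const double bottom = d01 + sx * (d11 - d01);  // blend along x, lower edge
        const double n = top + sy * (bottom - top);    // blend the two along y
        return 0.5 * n + 0.5;
    }

private:
    static double smooth(double t) { return t * t * (3.0 - 2.0 * t); } // 3t^2 - 2t^3

    double dot(int ix, int iy, double dx, double dy) const
    {
        const std::size_t i = std::size_t(iy) * stride_ + ix;
        return gx_[i] * dx + gy_[i] * dy;
    }

    int stride_;
    int cellSize_;
    std::vector<double> gx_, gy_;
};
```

With unit-length gradients the blended value stays within roughly ±0.71, so `0.5 * n + 0.5` lands inside [0, 1] and can be multiplied by 255 for a gray level. Exactly on a lattice point the displacement is zero, so the result there is 0.5.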

C++/3D Terrain: std::vector pushback() crashes with c0000374

When I attempt to push back into a vector of UINT, the program crashes with Critical error detected c0000374. Below is the initial code:
void Terrain::CreateIndexList(UINT Width, UINT Height){
UINT sz_iList = (Width - 1)*(Height - 1) * 6;
UINT *iList = new UINT[sz_iList];
for (int i = 0; i < Width; i++){
for (int j = 0; j < Height; j++){
iList[(i + j * (Width - 1)) * 6] = ((UINT)(2 * i));
iList[(i + j * (Width - 1)) * 6 + 1] = (UINT)(2 * i + 1);
iList[(i + j * (Width - 1)) * 6 + 2] = (UINT)(2 * i + 2);
iList[(i + j * (Width - 1)) * 6 + 3] = (UINT)(2 * i + 2);
iList[(i + j * (Width - 1)) * 6 + 4] = (UINT)(2 * i + 1);
iList[(i + j * (Width - 1)) * 6 + 5] = (UINT)(2 * i + 3);
}
}
for (int i = 0; i < sz_iList; i++){
Geometry.IndexVertexData.push_back(iList[i]);
}
delete[] iList;
}
The goal is to take the generated indices from the iList array and fill the Geometry.IndexVertexData vector array. While debugging this, I've created several other implementations of this:
//After creating the iList array:
Geometry.IndexVertexData.resize(sz_iList); //Fails with "Vector subscript out of range?"
UINT in = 0;
for (int i = 0; i < Width; i++){
for (int j = 0; j < Height; j++){
Geometry.IndexVertexData[(i + j*(Width - 1)) * 6] = iList[in];
in++;
Geometry.IndexVertexData[(i + j*(Width - 1)) * 6 + 1] = iList[in];
in++;
Geometry.IndexVertexData[(i + j*(Width - 1)) * 6 + 2] = iList[in];
in++;
Geometry.IndexVertexData[(i + j*(Width - 1)) * 6 + 3] = iList[in];
in++;
Geometry.IndexVertexData[(i + j*(Width - 1)) * 6 + 4] = iList[in];
in++;
Geometry.IndexVertexData[(i + j*(Width - 1)) * 6 + 5] = iList[in];
in++;
}
}
And a final, direct to vector implementation:
Geometry.IndexVertexData.reserve(sz_iList);
for (int index = 0; index < sz_iList; index+=6) {
Geometry.IndexVertexData[(i + j*(Width - 1)) * 6] = ((UINT)(2 * i));
Geometry.IndexVertexData[(i + j*(Width - 1)) * 6 + 1] = (UINT)(2 * i + 1);
Geometry.IndexVertexData[(i + j*(Width - 1)) * 6 + 2] = (UINT)(2 * i + 2);
Geometry.IndexVertexData[(i + j*(Width - 1)) * 6 + 3] = (UINT)(2 * i + 2);
Geometry.IndexVertexData[(i + j*(Width - 1)) * 6 + 4] = (UINT)(2 * i + 1);
Geometry.IndexVertexData[(i + j*(Width - 1)) * 6 + 5] = (UINT)(2 * i + 3);
}
sz_iList has a final value of 2166, resulting from a 20x20 grid (400 points in total), and is used to initialize the sizes. In all cases the vector does not fill completely, and the program crashes with Critical error detected c0000374. Am I doing something wrong?
Your sz_iList doesn't appear to be big enough. Let's use a simple example of Width = Height = 2; then sz_iList = (2 - 1) * (2 - 1) * 6 = 6, right? But in your nested loops, the last iteration occurs when i = j = 1 (i is one less than Width and j is one less than Height), where, in the last line of your loop, you try to access element (i + j * (Width - 1)) * 6 + 5 = (1 + 1 * (2 - 1)) * 6 + 5 = (1 + 1 * 1) * 6 + 5 = 2 * 6 + 5 = 17, which is well past the end of your 6-element array. Writing there is undefined behavior, and in this case it corrupts the heap, which is what c0000374 reports.
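Following that arithmetic, one way to keep every write in range is to loop over cells, of which there are (Width - 1) x (Height - 1), rather than over vertices. The sketch below assumes vertices are numbered row by row with Width vertices per row, which is a common layout but not necessarily the one used in the question (it does not reproduce the question's 2 * i numbering). The function name `createIndexList` and the `UINT` alias are stand-ins for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using UINT = std::uint32_t; // stand-in for the Windows UINT typedef

// Builds two triangles (6 indices) per grid cell, assuming row-major vertex numbering.
std::vector<UINT> createIndexList(UINT width, UINT height)
{
    std::vector<UINT> indices;
    if (width < 2 || height < 2) return indices;        // no cells, no triangles
    indices.reserve(std::size_t(width - 1) * (height - 1) * 6);

    for (UINT j = 0; j + 1 < height; ++j) {              // rows of cells
        for (UINT i = 0; i + 1 < width; ++i) {           // columns of cells
            const UINT topLeft     = j * width + i;
            const UINT topRight    = topLeft + 1;
            const UINT bottomLeft  = topLeft + width;
            const UINT bottomRight = bottomLeft + 1;
            // First triangle of the cell.
            indices.push_back(topLeft);
            indices.push_back(bottomLeft);
            indices.push_back(topRight);
            // Second triangle of the cell.
            indices.push_back(topRight);
            indices.push_back(bottomLeft);
            indices.push_back(bottomRight);
        }
    }
    return indices;
}
```

Because the vector grows through push_back after a reserve, there is no manual offset arithmetic left to get wrong, and the temporary new[]/delete[] buffer is no longer needed. For a 20x20 grid this yields 19 * 19 * 6 = 2166 indices, matching sz_iList.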

CUDA: working with arrays of different sizes

In this example, I am trying to create a 10x8 array using values from a 10x9 array. It looks like I am accessing memory incorrectly, but I am not sure where my error is.
The code in C++ would be something like
for (int h = 0; h < height; h++){
for (int i = 0; i < (width-2); i++)
dd[h*(width-2)+i] = hi[h*(width-1)+i] + hi[h*(width-1)+i+1];
}
This is what I am trying in CUDA:
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <stdio.h>
#include <stdint.h>
#include <iostream>
#define TILE_WIDTH 4
using namespace std;
__global__ void cudaOffsetArray(int height, int width, float *HI, float *DD){
int x = blockIdx.x * blockDim.x + threadIdx.x; // Col // width
int y = blockIdx.y * blockDim.y + threadIdx.y; // Row // height
int grid_width = gridDim.x * blockDim.x;
//int index = y * grid_width + x;
if ((x < (width - 2)) && (y < (height)))
DD[y * (grid_width - 2) + x] = (HI[y * (grid_width - 1) + x] + HI[y * (grid_width - 1) + x + 1]);
}
int main(){
int height = 10;
int width = 10;
float *HI = new float [height * (width - 1)];
for (int i = 0; i < height; i++){
for (int j = 0; j < (width - 1); j++)
HI[i * (width - 1) + j] = 1;
}
float *gpu_HI;
float *gpu_DD;
cudaMalloc((void **)&gpu_HI, (height * (width - 1) * sizeof(float)));
cudaMalloc((void **)&gpu_DD, (height * (width - 2) * sizeof(float)));
cudaMemcpy(gpu_HI, HI, (height * (width - 1) * sizeof(float)), cudaMemcpyHostToDevice);
dim3 dimGrid((width - 1) / TILE_WIDTH + 1, (height - 1)/TILE_WIDTH + 1, 1);
dim3 dimBlock(TILE_WIDTH, TILE_WIDTH, 1);
cudaOffsetArray<<<dimGrid,dimBlock>>>(height, width, gpu_HI, gpu_DD);
float *result = new float[height * (width - 2)];
cudaMemcpy(result, gpu_DD, (height * (width - 2) * sizeof(float)), cudaMemcpyDeviceToHost);
for (int i = 0; i < height; i++){
for (int j = 0; j < (width - 2); j++)
cout << result[i * (width - 2) + j] << " ";
cout << endl;
}
cudaFree(gpu_HI);
cudaFree(gpu_DD);
delete[] result;
delete[] HI;
system("pause");
}
I've also tried this in the global function:
if ((x < (width - 2)) && (y < (height)))
DD[y * (grid_width - 2) + (blockIdx.x - 2) * blockDim.x + threadIdx.x] =
(HI[y * (grid_width - 1) + (blockIdx.x - 1) * blockDim.x + threadIdx.x] +
HI[y * (grid_width - 1) + (blockIdx.x - 1) * blockDim.x + threadIdx.x + 1]);
To "fix" your code, change each use of grid_width to width in this line in your kernel:
DD[y * (grid_width - 2) + x] = (HI[y * (grid_width - 1) + x] + HI[y * (grid_width - 1) + x + 1]);
Like this:
DD[y * (width - 2) + x] = (HI[y * (width - 1) + x] + HI[y * (width - 1) + x + 1]);
Explanation:
Your grid_width:
dim3 dimGrid((width * 2 - 1) / TILE_WIDTH + 1, (height - 1)/TILE_WIDTH + 1, 1);
dim3 dimBlock(TILE_WIDTH, TILE_WIDTH, 1);
doesn't actually correspond to your array size (10x10, or 10x9, or 10x8). I'm not sure why you're launching 2*width threads in the x dimension, but this means that your thread array is considerably larger than your data array.
So when you use grid_width in the kernel:
DD[y * (grid_width - 2) + x] = (HI[y * (grid_width - 1) + x] + HI[y * (grid_width - 1) + x + 1]);
the indexing will be a problem. If you instead change each instance of grid_width above to just width (which corresponds to the actual width of your data array) you'll get better indexing, I think. Normally it's not an issue to launch "extra threads" because you have a thread check line in your kernel:
if ((x < (width - 2)) && (y < (height)))
but when you launch extra threads, it is making your grid larger, and so you can't use grid dimensions to index properly into your data array.
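The effect can be checked without a GPU. The following is a CPU-side sketch in plain C++ that emulates the launch grid with two loops; `offsetArrayCell` and `runOffsetArray` are hypothetical names, and `x` and `y` stand in for the global thread coordinates computed in the kernel. With the launch configuration from the question's main (width 10, TILE_WIDTH 4), gridDim.x * blockDim.x is 3 * 4 = 12, so grid_width - 2 would be 10 while each DD row is only 8 elements long. Using width for the row pitch avoids that mismatch.

```cpp
#include <cstddef>
#include <vector>

// CPU stand-in for the kernel body: the row pitch comes from the data width,
// not from the size of the launch grid.
void offsetArrayCell(int x, int y, int height, int width,
                     const std::vector<float>& HI, std::vector<float>& DD)
{
    if (x < width - 2 && y < height)   // same guard as in the kernel
        DD[y * (width - 2) + x] = HI[y * (width - 1) + x] + HI[y * (width - 1) + x + 1];
}

// Emulates launching gridX x gridY blocks of tile x tile threads.
std::vector<float> runOffsetArray(int height, int width, int tile,
                                  const std::vector<float>& HI)
{
    std::vector<float> DD(std::size_t(height) * (width - 2), 0.0f);
    const int gridX = (width - 1) / tile + 1;
    const int gridY = (height - 1) / tile + 1;
    // Every "thread" in the padded grid runs; the guard discards the extras.
    for (int y = 0; y < gridY * tile; ++y)
        for (int x = 0; x < gridX * tile; ++x)
            offsetArrayCell(x, y, height, width, HI, DD);
    return DD;
}
```

Feeding it a 10x9 input filled with ones gives a 10x8 output in which every element is 2, which is what the reference C++ loop in the question computes.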

Help with Diamond Square algorithm implementation

I'm trying to implement the Diamond-square algorithm, but the problem is only part of the bitmap is being filled and I'm not sure what's wrong. I'm doing it recursively:
GLuint CreateDsquare()
{
std::vector<GLubyte> pdata(256 * 256 * 4);
vector2i loc;
vector2i sz;
GLubyte val;
sz.x = 256;
sz.y = 256;
val = rand() % 255;
loc = vector2i(0,0);
pdata[loc.y * 4 * sz.x + loc.x * 4 + 0] = val;
pdata[loc.y * 4 * sz.x + loc.x * 4 + 1] = val;
pdata[loc.y * 4 * sz.x + loc.x * 4 + 2] = val;
pdata[loc.y * 4 * sz.x + loc.x * 4 + 3] = 255;
loc.x = sz.x - 1;
loc.y = 0;
val = rand() % 255;
pdata[loc.y * 4 * sz.x + loc.x * 4 + 0] = val;
pdata[loc.y * 4 * sz.x + loc.x * 4 + 1] = val;
pdata[loc.y * 4 * sz.x + loc.x * 4 + 2] = val;
pdata[loc.y * 4 * sz.x + loc.x * 4 + 3] = 255;
loc.x = sz.x - 1;
loc.y = sz.y - 1;
val = rand() % 255;
pdata[loc.y * 4 * sz.x + loc.x * 4 + 0] = val;
pdata[loc.y * 4 * sz.x + loc.x * 4 + 1] = val;
pdata[loc.y * 4 * sz.x + loc.x * 4 + 2] = val;
pdata[loc.y * 4 * sz.x + loc.x * 4 + 3] = 255;
loc.x = 0;
loc.y = sz.y - 1;
val = rand() % 255;
pdata[loc.y * 4 * sz.x + loc.x * 4 + 0] = val;
pdata[loc.y * 4 * sz.x + loc.x * 4 + 1] = val;
pdata[loc.y * 4 * sz.x + loc.x * 4 + 2] = val;
pdata[loc.y * 4 * sz.x + loc.x * 4 + 3] = 255;
RescursiveDiamond(pdata,sz,vector2i(0,0));
return CreateTexture(pdata,256,256);
}
void RescursiveDiamond(std::vector<GLubyte> &pdata,vector2i psz, vector2i offset)
{
int val;
int newnum;
if(psz.x < 2 && psz.y < 2)
{
return;
}
vector2i loc;
vector2i sz = psz;
std::vector<int> pvertz(4,0);
loc = offset;
pvertz[0] = pdata[loc.y * 4 * sz.x + loc.x * 4 + 0];
loc.x = offset.x + (psz.x - 1);
loc.y = offset.y;
pvertz[1] = pdata[loc.y * 4 * sz.x + loc.x * 4 + 0];
loc.x = offset.x + (psz.x - 1);
loc.y = offset.y + (psz.y - 1);
pvertz[2] = pdata[loc.y * 4 * sz.x + loc.x * 4 + 0];
loc.x = offset.x;
loc.y = offset.y + (psz.y - 1);
pvertz[3] = pdata[loc.y * 4 * sz.x + loc.x * 4 + 0];
val = (pvertz[0] + pvertz[1]) / 2;
val += 255;
loc.x = (offset.x + (sz.x - 1)) / 2;
loc.y = offset.y;
pdata[loc.y * 4 * sz.x + loc.x * 4 + 0] = val;
pdata[loc.y * 4 * sz.x + loc.x * 4 + 1] = val;
pdata[loc.y * 4 * sz.x + loc.x * 4 + 2] = val;
pdata[loc.y * 4 * sz.x + loc.x * 4 + 3] = 255;
val = (pvertz[1] + pvertz[2]) / 2;
val += 255;
loc.x = (offset.x + (sz.x)) - 1;
loc.y = ((offset.y + (sz.y)) / 2) - 1;
pdata[loc.y * 4 * sz.x + loc.x * 4 + 0] = val;
pdata[loc.y * 4 * sz.x + loc.x * 4 + 1] = val;
pdata[loc.y * 4 * sz.x + loc.x * 4 + 2] = val;
pdata[loc.y * 4 * sz.x + loc.x * 4 + 3] = 255;
val = (pvertz[3] + pvertz[2]) / 2;
val += 255;
loc.x = ((offset.x + (sz.x)) / 2) - 1;
loc.y = (offset.y + (sz.y)) - 1 ;
pdata[loc.y * 4 * sz.x + loc.x * 4 + 0] = val;
pdata[loc.y * 4 * sz.x + loc.x * 4 + 1] = val;
pdata[loc.y * 4 * sz.x + loc.x * 4 + 2] = val;
pdata[loc.y * 4 * sz.x + loc.x * 4 + 3] = 255;
val = (pvertz[0] + pvertz[3]) / 2;
val += 255;
loc.x = offset.x;
loc.y = (offset.y + (sz.y)) - 1 ;
pdata[loc.y * 4 * sz.x + loc.x * 4 + 0] = val;
pdata[loc.y * 4 * sz.x + loc.x * 4 + 1] = val;
pdata[loc.y * 4 * sz.x + loc.x * 4 + 2] = val;
pdata[loc.y * 4 * sz.x + loc.x * 4 + 3] = 255;
//center
val = (pdata[(offset.y) * 4 * sz.x + ((offset.x + (sz.x - 1)) / 2) * 4 + 0] +
pdata[(offset.y + (sz.y - 1)) * 4 * sz.x + ((offset.x + (sz.x - 1)) / 2) * 4 + 0]) / 2;
int ad = (rand() % 12) - 6;
if(val + ad < 0)
{
val = 0;
}
else
{
val += ad;
}
val += 255;
loc.x = ((offset.x + (sz.x) ) / 2) - 1;
loc.y = ((offset.y + (sz.y)) / 2) - 1;
pdata[loc.y * 4 * sz.x + loc.x * 4 + 0] = val;
pdata[loc.y * 4 * sz.x + loc.x * 4 + 1] = val;
pdata[loc.y * 4 * sz.x + loc.x * 4 + 2] = val;
pdata[loc.y * 4 * sz.x + loc.x * 4 + 3] = 255;
vector2i newoffset;
vector2i newparentsz;
newoffset = offset;
newparentsz = (psz / 2);
RescursiveDiamond(pdata,newparentsz,newoffset);
newoffset.x = offset.x + (newparentsz.x);
newoffset.y = offset.y;
RescursiveDiamond(pdata,newparentsz,newoffset);
newoffset.x = offset.x;
newoffset.y = offset.y + (newparentsz.y);
RescursiveDiamond(pdata,newparentsz,newoffset);
newoffset.x = offset.x + (newparentsz.x);
newoffset.y = offset.y + (newparentsz.y);
RescursiveDiamond(pdata,newparentsz,newoffset);
}
I suspect that I might be calling the function recursively with the wrong offset or something.
offset is the top-left corner, and together with the size it defines the square.
What could be wrong here?
Thanks
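Ok, first, let's start by cleaning up the violations of D-R-Y (Don't Repeat Yourself); your code should read more along the lines of this:
// Index of channel _offset of pixel (_x, _y); _pitch is the full image width in pixels.
int position(int _x, int _y, int _offset, int _pitch){
return _y * _pitch * 4 + _x * 4 + _offset;
}
// Writes one random gray value to R, G and B, and full opacity to A.
void adjust(std::vector<GLubyte> &pdata, int _x, int _y, int _pitch){
GLubyte val = rand() % 255;
for(int j = 0; j < 3; ++j){
pdata[ position( _x, _y, j, _pitch ) ] = val;
}
pdata[ position( _x, _y, 3, _pitch ) ] = 255;
}
GLuint CreateDsquare(){
std::vector<GLubyte> pdata(256 * 256 * 4);
vector2i sz;
sz.x = 256;
sz.y = 256;
adjust( pdata, 0, 0, sz.x );
adjust( pdata, sz.x - 1, 0, sz.x );
adjust( pdata, sz.x - 1, sz.y - 1, sz.x );
adjust( pdata, 0, sz.y - 1, sz.x );
RescursiveDiamond(pdata,sz,vector2i(0,0));
return CreateTexture(pdata,256,256);
}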
Can you format the rest of it down so it's more readable/understandable? Then I'll update so that I can better answer your question (if someone hasn't beaten me to it or the woman decides I've had enough computer time.)
When you are calculating the offset into your height map for each row (y * pitch) you are using the current size of the square you are calculating instead of the actual pitch which is 256. The deeper you go into the recursion you are writing into your height map as if it was smaller and smaller until the last step of your recursion is writing into pixel (0, 0).
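To make the pitch point concrete, here is a small sketch with a hypothetical helper, `pixelIndex`, that takes the row length as an explicit argument. It assumes the RGBA layout of four bytes per pixel used in the question.

```cpp
#include <cstddef>

// Offset of channel c of pixel (x, y) in an RGBA buffer whose rows are `pitch` pixels wide.
constexpr std::size_t pixelIndex(int x, int y, int c, int pitch)
{
    return (std::size_t(y) * pitch + x) * 4 + c;
}

// Pixel (128, 128) of the 256-wide image:
//   with the real pitch (256):        pixelIndex(128, 128, 0, 256) == 131584
//   with the sub-square size (128):   pixelIndex(128, 128, 0, 128) == 66048
// 66048 / 4 = 16512 = 64 * 256 + 128, so the second form actually lands on pixel (128, 64).
static_assert(pixelIndex(128, 128, 0, 256) == 131584, "real pitch");
static_assert(pixelIndex(128, 128, 0, 128) == 66048, "sub-square size used as pitch");
```

In the recursive function this would mean passing the full image width (256) down unchanged and using it for every row offset, while psz is used only to locate the corners and midpoints of the current square.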