Why doesn't my OpenCL 3D image lookup work? (C++)

I have been having trouble with an OpenCL kernel which I've written producing incorrect results (compared to a reference brute-force CPU implementation).
I tracked the problem down to a 3D lookup table I'm using which seems to be returning garbage results, rather than the values which I passed in.
I have the following (simplified) OpenCL kernel for reading a precomputed function from a 3D image type:
__constant sampler_t legSampler = CLK_NORMALIZED_COORDS_TRUE | CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_LINEAR;

inline float normalizedLegendre(int n, int m, float z, image3d_t legendreLUT)
{
    float nCoord = (((float) n) / get_image_width(legendreLUT));
    float mCoord = (((float) m) / get_image_height(legendreLUT));
    float zCoord = ((z + 1.0f) / 2.0f);
    float4 coord = (float4)(floor(nCoord) + 0.5f, floor(mCoord) + 0.5f, zCoord, 0.0f);
    return read_imagef(legendreLUT, legSampler, coord).x;
}

__kernel void noiseMain(__read_only image3d_t legendreLUT, __global float* outLegDump)
{
    //k is the linear index into the array.
    int k = get_global_id(0);
    if(k < get_image_depth(legendreLUT))
    {
        float z = ((float) k / (float) get_image_depth(legendreLUT)) * 2.0 - 1.0;
        float legLookup = normalizedLegendre(5, 4, z, legendreLUT);
        float texCoord = ((float) k / 1024.0) * 2 - 1;
        outLegDump[k] = legLookup;
    }
}
On the host side, I generate the 3D image, legendreLUT, using the following code:
static const size_t NLEGPOLYBINS = 1024;
static const size_t NLEGPOLYORDERS = 16;

boost::scoped_array<float> legendreHostBuffer(new float[NLEGPOLYORDERS * NLEGPOLYORDERS * NLEGPOLYBINS]);

float stepSize = 1.0 / (((float) NLEGPOLYBINS / 2.0) - 0.5);
float z = -1.0;

std::cout << "Generating legendre polynomials..." << std::endl;
for(size_t n = 0; n < NLEGPOLYORDERS; n++)
{
    for(size_t m = 0; m < NLEGPOLYORDERS; m++)
    {
        for(size_t zI = 0; zI < NLEGPOLYBINS; zI++)
        {
            using namespace boost::math;
            size_t index = (n * NLEGPOLYORDERS * NLEGPOLYBINS) + (m * NLEGPOLYBINS) + zI;
            //-1..1 in NLEGPOLYBINS steps...
            float val;
            if(m > n)
            {
                legendreHostBuffer[index] = 0;
                continue;
            }
            else
            {
                //boost::math::legendre_p
                val = legendre_p<float>(n, m, z);
            }
            float nPm = n + m;
            float nMm = n - m;
            float factNum;
            float factDen;
            factNum = factorial<float>(n - m);
            factDen = factorial<float>(n + m);
            float nrmTerm;
            //normalization term (computed here, but not applied to val below)
            nrmTerm = pow(-1.0, m) * sqrt((n + 0.5) * (factNum / factDen));
            legendreHostBuffer[index] = val;
            z += stepSize;
            if(z > 1.0) z + 1.0; //note: this statement has no effect
        }
        z = -1.0;
    }
}
//DEBUGGING STEP: Dump everything we've just generated for m = 4, n = 5, z = -1..1
std::ofstream legDump("legDump.txt");
for(size_t i = 0; i < NLEGPOLYBINS; i++)
{
    int n = 5; int m = 4;
    size_t index = (n * NLEGPOLYORDERS * NLEGPOLYBINS) + (m * NLEGPOLYBINS) + i;
    float texCoord = ((float) i / (float) NLEGPOLYBINS) * 2 - 1;
    legDump << i << " " << texCoord << " " << legendreHostBuffer[index] << std::endl;
}
legDump.close();

std::cout << "Creating legendre polynomial look up table image..." << std::endl;
cl::ImageFormat legFormat(CL_R, CL_FLOAT);
//Generate our legendre polynomials image...
m_legendreTable = cl::Image3D(m_clContext,
                              CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                              legFormat,
                              NLEGPOLYORDERS,
                              NLEGPOLYORDERS,
                              NLEGPOLYBINS,
                              0,
                              0,
                              legendreHostBuffer.get());
Other than the index, the actual generation of the values is more or less irrelevant, but I've included it here for completeness.
And here is how I execute the kernel and read back the results:
cl::Buffer outLegDump = cl::Buffer(m_clContext, CL_MEM_WRITE_ONLY, NLEGPOLYBINS * sizeof(float));
//Create our kernel...
cl::Kernel kernel(m_program, "noiseMain");
kernel.setArg(0, m_legendreTable);
kernel.setArg(1, outLegDump);
size_t kernelSize = 1024;
cl::NDRange globalRange(kernelSize);
cl::NDRange localRange(1);
m_commandQueue.enqueueNDRangeKernel(kernel, cl::NullRange, globalRange, cl::NullRange);
m_commandQueue.finish();
boost::scoped_array<float> legDumpHost(new float[NLEGPOLYBINS]);
m_commandQueue.enqueueReadBuffer(outLegDump, CL_TRUE, 0, NLEGPOLYBINS * sizeof(float), legDumpHost.get());
std::ofstream legreadback("legreadback.txt");
for(size_t i = 0; i < NLEGPOLYBINS; i++)
{
    legreadback << i << " " << legDumpHost[i] << std::endl;
}
legreadback.close();
When I look at the dumped data (i.e. what was written to legDump.txt from the host-side buffer), I get the expected values. However, when I compare it to the data read back from the device side (i.e. what the kernel looked up and wrote to legreadback.txt), the values are incorrect.
Since I'm calculating 1024 values in both cases, I'll spare everyone the whole dump; here are the first and last few values of each:
legDump.txt (host-side sanity check):
0 -0
1 -0.0143913
2 -0.0573401
3 -0.12851
4 -0.227566
5 -0.354175
..
..
1020 0.12859
1021 0.0144185
1022 0.0144185
1023 1.2905e-8
legreadback.txt (device-side lookup and readback):
0 1
1 1
2 1
3 1
4 0.5
5 0
..
..
1020 7.74249e+11
1021 -1.91171e+15
1022 -3.81029e+15
1023 -1.91173e+15
Note that these values are the same across multiple runs of the code, so I don't think it's an initialization problem.
I can only assume that I'm calculating indices wrong somewhere, but I don't know where. I've checked the calculation of the Z coordinate (which naturally is defined on -1..1), its conversion to texture coordinates (0..1 range), and the conversion of M and N to texture coordinates (which should be done without interpolation), and found nothing to be wrong.
So my question is thus:
What is the proper way to create and index a 3D lookup table in OpenCL?

As expected, the problem turned out to be in the host-side indexing used to generate the lookup table.
The previous index calculation:
size_t index = (n * NLEGPOLYORDERS * NLEGPOLYBINS) + (m * NLEGPOLYBINS) + zI;
was based on standard C++ 3D array indexing, which is not how addressing works for an OpenCL 3D image. A 3D image can be thought of as a "stack" of 2D images, where the depth coordinate (Z in this case) selects the slice, and the horizontal and vertical coordinates (m and n in this case) select the pixel within that slice.
The correct indexing calculation is:
size_t index = m * NLEGPOLYORDERS + n + (zI * NLEGPOLYORDERS * NLEGPOLYORDERS);
As one can see, this new approach fits the "stacked image" layout described previously, whereas the previous calculation does not.
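For illustration, here is a minimal sketch (not the original code) of the generation loop with the corrected ordering, assuming the same constants and the boost::math::legendre_p call from the question; n runs along the image's X axis (fastest varying), m along Y, and the z bin selects the slice:
for(size_t zI = 0; zI < NLEGPOLYBINS; zI++)
{
    //map the bin index back to z in [-1, 1]
    float z = -1.0f + 2.0f * (float) zI / (float)(NLEGPOLYBINS - 1);
    for(size_t m = 0; m < NLEGPOLYORDERS; m++)       //Y axis (image height)
    {
        for(size_t n = 0; n < NLEGPOLYORDERS; n++)   //X axis (image width, fastest varying)
        {
            size_t index = (zI * NLEGPOLYORDERS * NLEGPOLYORDERS) + (m * NLEGPOLYORDERS) + n;
            legendreHostBuffer[index] = (m > n) ? 0.0f
                : boost::math::legendre_p<float>((int) n, (int) m, z);
        }
    }
}
With a tightly packed buffer in this order, the width/height/depth passed to cl::Image3D (and row/slice pitches of 0) match what the kernel samples.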

Related

How to fix my block and grid layout to handle large data?

I'm trying to implement a naive closest-pair algorithm on 3D coordinates.
The input is two files, each containing 3 floats per line.
I read the inputs into float3* variables:
float3* teamA;
float3* teamB;
float3* results;
handleFileInput(argv[1], argv[2], teamA, teamB, numPoints);
results = new float3[numPoints[0]];
After this, I allocated device memory and copied the host data to the device like this:
#define CHECKERROR(val) { if (val != cudaSuccess) {fprintf(stderr, "Error %s at line %d in file %s\n", cudaGetErrorString(val), __LINE__, __FILE__); exit(1);} }
CHECKERROR(cudaMalloc(&d_tA, sizeof(float3) * numPoints[0]));
CHECKERROR(cudaMemset(d_tA, 0, sizeof(float3) * numPoints[0]));
CHECKERROR(cudaMalloc(&d_tB, sizeof(float3) * numPoints[1]));
CHECKERROR(cudaMemset(d_tB, 0, sizeof(float3) * numPoints[1]));
CHECKERROR(cudaMalloc(&d_results, sizeof(float3) * numPoints[0]));
CHECKERROR(cudaMemset(d_results, 0, sizeof(float3) * numPoints[0]));
CHECKERROR(cudaMemcpy(d_tA, teamA, sizeof(float3) * numPoints[0], cudaMemcpyHostToDevice));
CHECKERROR(cudaMemcpy(d_tB, teamB, sizeof(float3) * numPoints[1], cudaMemcpyHostToDevice));
I set up my block and grid like this:
dim3 block(512);
dim3 grid(ceil((float)numPoints[0] / 512));
naive_algorithm <<< block, grid >>> (d_tA, d_tB, d_results, numPoints[0], numPoints[1]);
My kernel code is simple, like this:
__global__ void naive_algorithm(float3* d_tA, float3* d_tB, float3* d_r, int a_size, int b_size)
{
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    if (idx < a_size)
    {
        float min_distance = -1;
        for (int y = 0; y < b_size; y++)
        {
            float i = MUL(SUB(d_tA[idx].x, d_tB[y].x), SUB(d_tA[idx].x, d_tB[y].x));
            float j = MUL(SUB(d_tA[idx].y, d_tB[y].y), SUB(d_tA[idx].y, d_tB[y].y));
            float k = MUL(SUB(d_tA[idx].z, d_tB[y].z), SUB(d_tA[idx].z, d_tB[y].z));
            float distance = SQRT(ADD(ADD(i, j), k));
            if (min_distance > distance || min_distance == -1)
            {
                d_r[idx].x = (float)idx;
                d_r[idx].y = (float)y;
                d_r[idx].z = distance;
                min_distance = distance;
            }
        }
        __syncthreads();
    }
}
Environment : RTX 2080Ti
There are five different data samples:
Team A - 1000000 points / Team B - 500000 points -> Test Failed
Team A - 700000 points / Team B - 500000 points -> Test Failed
Team A - 500000 points / Team B - 300000 points -> Test OK!
Team A - 500000 points / Team B - 100000 points -> Test OK!
Team A - 300000 points / Team B - 100000 points -> Test OK!
In my opinion this is caused by the thread layout.
Do I have to change the block / grid layout from 1D to 2D?
If so, how should I set up my grid?
As Robert Crovella said, this was just a typo on my part: the kernel launch arguments are in the wrong order, <<<block, grid>>> instead of <<<grid, block>>>.
Since one block can hold at most 1024 threads (and a grid's y and z dimensions are limited to 65535), the launch fails as soon as numPoints[0] / 512, which I was accidentally passing as the block size, exceeds 1024.
Thanks a lot for checking my code!
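For reference, a corrected launch would look something like this (a sketch using the names from the question):
dim3 block(512);                                   // threads per block
dim3 grid((numPoints[0] + block.x - 1) / block.x); // enough blocks to cover all of Team A
//CUDA launch syntax is <<<gridDim, blockDim>>>, so the grid comes first:
naive_algorithm<<<grid, block>>>(d_tA, d_tB, d_results, numPoints[0], numPoints[1]);
CHECKERROR(cudaGetLastError());        // catches launch-configuration errors
CHECKERROR(cudaDeviceSynchronize());   // catches errors that occur during execution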

Why isn't my 4 thread implementation faster than the single thread one?

I don't know much about multi-threading and I have no idea why this is happening so I'll just get to the point.
I'm processing an image: I divide it into 4 parts and pass each part to a thread (essentially, I pass the indices of the first and last pixel rows of each part). For example, if the image has 1000 rows, each thread will process 250 of them. I can go into detail about my implementation and what I'm trying to achieve in case it helps. For now I provide the code executed by the threads in case you can spot why this is happening. I don't know if it's relevant, but in both cases (1 thread or 4 threads) the process takes around 15 ms, and pfUMap and pbUMap are unordered maps.
void jacobiansThread(int start, int end, vector<float> &sJT, vector<float> &sJTJ) {
    uchar* rgbPointer;
    float* depthPointer;
    float* sdfPointer;
    float* dfdxPointer; float* dfdyPointer;
    float fov = radians(45.0);
    float aspect = 4.0 / 3.0;
    float focal = 1 / (glm::tan(fov / 2));
    float fu = focal * cols / 2 / aspect;
    float fv = focal * rows / 2;
    float strictFu = focal / aspect;
    float strictFv = focal;
    vector<float> pixelJacobi(6, 0);
    for (int y = start; y < end; y++) {
        rgbPointer = sceneImage.ptr<uchar>(y);
        depthPointer = depthBuffer.ptr<float>(y);
        dfdxPointer = dfdx.ptr<float>(y);
        dfdyPointer = dfdy.ptr<float>(y);
        sdfPointer = sdf.ptr<float>(y);
        for (int x = roiX.x; x < roiX.y; x++) {
            float deltaTerm;// = deltaPointer[x];
            float raw = sdfPointer[x];
            if (raw > 8.0) continue;
            float dirac = (1.0f / float(CV_PI)) * (1.2f / (raw * 1.44f * raw + 1.0f));
            deltaTerm = dirac;
            vec3 rgb(rgbPointer[x * 3], rgbPointer[x * 3 + 1], rgbPointer[x * 3 + 2]);
            vec3 bin = rgbToBin(rgb, numberOfBins);
            int indexOfColor = bin.x * numberOfBins * numberOfBins + bin.y * numberOfBins + bin.z;
            float s3 = glfwGetTime();
            float pF = pfUMap[indexOfColor];
            float pB = pbUMap[indexOfColor];
            float heavisideTerm;
            heavisideTerm = HEAVISIDE(raw);
            float denominator = (heavisideTerm * pF + (1 - heavisideTerm) * pB) + 0.000001;
            float commonFirstTerm = -(pF - pB) / denominator * deltaTerm;
            if (pF == pB) continue;
            vec3 pixel(x, y, depthPointer[x]);
            float dfdxTerm = dfdxPointer[x];
            float dfdyTerm = -dfdyPointer[x];
            if (pixel.z == 1) {
                cv::Point c = findClosestContourPoint(cv::Point(x, y), dfdxTerm, -dfdyTerm, abs(raw));
                if (c.x == -1) continue;
                pixel = vec3(c.x, c.y, depthBuffer.at<float>(cv::Point(c.x, c.y)));
            }
            vec3 point3D = pixel;
            pixelToViewFast(point3D, cols, rows, strictFu, strictFv);
            float Xc = point3D.x; float Xc2 = Xc * Xc; float Yc = point3D.y; float Yc2 = Yc * Yc; float Zc = point3D.z; float Zc2 = Zc * Zc;
            pixelJacobi[0] = dfdyTerm * ((fv * Yc2) / Zc2 + fv) + (dfdxTerm * fu * Xc * Yc) / Zc2;
            pixelJacobi[1] = -dfdxTerm * ((fu * Xc2) / Zc2 + fu) - (dfdyTerm * fv * Xc * Yc) / Zc2;
            pixelJacobi[2] = -(dfdyTerm * fv * Xc) / Zc + (dfdxTerm * fu * Yc) / Zc;
            pixelJacobi[3] = -(dfdxTerm * fu) / Zc;
            pixelJacobi[4] = -(dfdyTerm * fv) / Zc;
            pixelJacobi[5] = (dfdyTerm * fv * Yc) / Zc2 + (dfdxTerm * fu * Xc) / Zc2;
            float weightingTerm = -1.0 / log(denominator);
            for (int i = 0; i < 6; i++) {
                pixelJacobi[i] *= commonFirstTerm;
                sJT[i] += pixelJacobi[i];
            }
            for (int i = 0; i < 6; i++) {
                for (int j = i; j < 6; j++) {
                    sJTJ[i * 6 + j] += weightingTerm * pixelJacobi[i] * pixelJacobi[j];
                }
            }
        }
    }
}
This is the part where I call each thread:
vector<std::thread> myThreads;
float step = (roiY.y - roiY.x) / numberOfThreads;
vector<vector<float>> tsJT(numberOfThreads, vector<float>(6, 0));
vector<vector<float>> tsJTJ(numberOfThreads, vector<float>(36, 0));
for (int i = 0; i < numberOfThreads; i++) {
    int start = roiY.x + i * step;
    int end = start + step;
    if (end > roiY.y) end = roiY.y;
    myThreads.push_back(std::thread(&pwp3dV2::jacobiansThread, this, start, end, std::ref(tsJT[i]), std::ref(tsJTJ[i])));
}
vector<float> sJT(6, 0);
vector<float> sJTJ(36, 0);
for (int i = 0; i < numberOfThreads; i++) myThreads[i].join();
Other Notes
To measure time I used glfwGetTime() before and right after the second code snippet. The measurements vary but the average is about 15ms as I mentioned, for both implementations.
Starting a thread has significant overhead, which might not be worth the time if you have only 15 milliseconds worth of work.
The common solution is to keep threads running in the background and send them data when you need them, instead of calling the std::thread constructor to create a new thread every time you have some work to do.
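A minimal sketch of that idea (not from the original code; C++11 or later assumed): a tiny pool that keeps its worker threads alive and receives jobs, e.g. one job per horizontal strip of the image. Completion signalling (futures, an atomic counter, etc.) is omitted for brevity.
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class WorkerPool {
public:
    explicit WorkerPool(unsigned threadCount) {
        for (unsigned i = 0; i < threadCount; ++i)
            workers.emplace_back([this] { run(); });
    }
    ~WorkerPool() {
        {
            std::lock_guard<std::mutex> lock(mtx);
            stopping = true;
        }
        cv.notify_all();
        for (auto& w : workers) w.join();
    }
    //queue one job, e.g. "process rows start..end of the image"
    void submit(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mtx);
            jobs.push(std::move(job));
        }
        cv.notify_one();
    }
private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mtx);
                cv.wait(lock, [this] { return stopping || !jobs.empty(); });
                if (stopping && jobs.empty()) return;
                job = std::move(jobs.front());
                jobs.pop();
            }
            job(); //runs on an already-existing thread, no per-call thread creation
        }
    }
    std::vector<std::thread> workers;
    std::queue<std::function<void()>> jobs;
    std::mutex mtx;
    std::condition_variable cv;
    bool stopping = false;
};
The pool is constructed once; each frame you submit the four row-range jobs and wait for them to finish (e.g. with an atomic counter or std::promise), so the per-frame cost is a few lock/notify operations instead of four thread creations.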
Pure speculation, but two things might be preventing you from getting the full benefit of parallelization:
Processing speed is limited by the memory bus; the cores wait until data is loaded before continuing.
Data sharing between cores: some caches are core-specific, so if data is shared between cores it has to travel through the shared cache levels first.
On Linux you can use Perf to check for cache misses.
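For example (assuming your binary is called app):
perf stat -e cache-references,cache-misses ./app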
If you want better times, you need to split the loop iterations out from the counter, which takes some preprocessing: something fast, like building an array of structures with a header for each segment, or, if you can't think of anything better, simply a vector<int> holding the counter values. Then run for_each(std::execution::par, ...) over that; it is much faster.
For timings there's std::chrono:
auto t1 = std::chrono::system_clock::now();
//... the work being measured ...
auto t2 = std::chrono::system_clock::now();
std::chrono::milliseconds f = std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1);
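A minimal sketch of that suggestion applied to the row loop (assumes C++17 <execution>; processRow is a hypothetical per-row function whose outputs are not shared between rows):
#include <algorithm>
#include <execution>
#include <numeric>
#include <vector>

std::vector<int> rows((int)(roiY.y - roiY.x));
std::iota(rows.begin(), rows.end(), (int)roiY.x);   //fill with roiY.x, roiY.x + 1, ...
std::for_each(std::execution::par, rows.begin(), rows.end(), [&](int y) {
    processRow(y);   //hypothetical: must only write to per-row (non-shared) outputs
});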

Alpha tested text rendering of signed distance field creates wiggly edges

I am trying to render the text in my application using a glyph atlas represented as a signed distance field texture, i.e. a texture that stores, in each pixel, the distance to the nearest glyph outline.
This distance field texture is generated from the original binary glyph atlas texture (0 = outside the glyph outline, 1 = inside the glyph outline) with the algorithm below, which searches an increasing radius around every pixel until a pixel of the opposite state is found and stores the distance at which it was found. Afterwards, all distances are mapped to the range 0 to 1.
//set size of signed distance field atlas
atlas.width = binaryAtlasWidth * pDistanceFieldResolution;
atlas.height = binaryAtlasHeight * pDistanceFieldResolution;
const unsigned int atlasPixelCount = (atlas.width * atlas.height);
atlas.buffer.resize(atlasPixelCount);

//temporary buffer for the distances of each pixel
std::vector<float> unmappedBuffer;
unmappedBuffer.resize(atlasPixelCount);

//support multisampling
unsigned int samplesPerOutPixel = ceil(1.0 / pDistanceFieldResolution);

//for mapping the values between 0 and 1 later
float maxDistance = 0.0;
float minDistance = 0.0;

for (unsigned int outPixel = 0; outPixel < atlasPixelCount; ++outPixel) {
    //coordinate of the input sample
    unsigned int outPixelX = outPixel % atlas.width;
    unsigned int outPixelY = outPixel / atlas.width;
    float distanceSum = 0.0f;
    for (unsigned int sampleY = 0; sampleY < samplesPerOutPixel; ++sampleY) {
        for (unsigned int sampleX = 0; sampleX < samplesPerOutPixel; ++sampleX) {
            glm::uvec2 sampleCoord = glm::uvec2(outPixelX * samplesPerOutPixel + sampleX, outPixelY * samplesPerOutPixel + sampleY);
            unsigned int samplePos = sampleCoord.x + sampleCoord.y * binaryAtlasWidth;
            unsigned char sampleVal = buffer[samplePos];
            //initial distance is maximum search radius (outside of glyph)
            float dist = spread;
            int found = 0;
            unsigned int rad = 0;
            while (!found && (rad * (!sampleVal)) < spread_pixels) {
                //if sampleVal is 1 (inside), search until found
                float radius = (float)rad + 1.0f;
                unsigned int compareCount = round(2.0f * radius * M_PI);
                float step = 1.0 / (float)compareCount;
                for (unsigned int t = 0; t < compareCount && !found; ++t) {
                    float theta = step * (float)t * 360.0f;
                    glm::vec2 compareLocalCoord = glm::vec2(std::cos(theta), std::sin(theta)) * radius;
                    glm::uvec2 compareCoord = sampleCoord + glm::uvec2(compareLocalCoord);
                    int comparePos = compareCoord.x + compareCoord.y * binaryAtlasWidth;
                    if (compareCoord.x >= 0 && compareCoord.x < binaryAtlasWidth && compareCoord.y >= 0 && compareCoord.y < binaryAtlasHeight) {
                        unsigned char compareVal = buffer[comparePos];
                        if (compareVal != sampleVal) {
                            float distance = sqrt(pow(compareLocalCoord.x, 2) + pow(compareLocalCoord.y, 2));
                            found = 1;
                            dist = std::min(distance * (1 - (sampleVal * 2)), dist);
                        }
                    }
                }
                ++rad;
            }
            distanceSum += dist;
        }
    }
    float avgDistance = distanceSum / (float)(samplesPerOutPixel * samplesPerOutPixel);
    printf("pixel %i of %i has %f distance\n", outPixel, atlasPixelCount, avgDistance);
    unmappedBuffer[outPixel] = avgDistance;
    maxDistance = std::max(maxDistance, avgDistance);
    minDistance = std::min(minDistance, avgDistance);
}

minDistance *= -1.0;
float diff = maxDistance + minDistance;

//map all values between 0 and 255
for (unsigned int p = 0; p < atlasPixelCount; ++p) {
    float toMap = unmappedBuffer[p];
    float mappedDistance = 1.0f - (toMap + minDistance) / diff;
    atlas.buffer[p] = mappedDistance * 255;
}
This algorithm creates these results:
266 x 183 input texture
SDF result without downsampling (still 266 x 183)
SDF result with downsampling (106 x 73)
Render results with alpha testing on (pass when alpha is greater than 0.5):
no downsampling, nearest filtering
downsampled, nearest filtering
no downsampling, linear filtering
downsampled, linear filtering
I am getting there, but I expected accurate edges as shown in Valve's paper. What am I missing to get accurate edges?
PS: my fragment shader currently only uses the distance texture value as an alpha value. (color = vec4(1, 1, 1, distance);)

DirectX/C++: Marching Cubes Indexing

I've implemented the Marching Cubes algorithm in a DirectX environment (to test and have fun). Upon completion, I noticed that the resulting model looks heavily distorted, as if the indices were off.
I've attempted to extract the indices, but I think the vertices are already ordered correctly by the lookup tables (examples at http://paulbourke.net/geometry/polygonise/). The current build uses a 15^3 volume.
Marching cubes iterates over the array as normal:
for (float iX = 0; iX < CellFieldSize.x; iX++){
    for (float iY = 0; iY < CellFieldSize.y; iY++){
        for (float iZ = 0; iZ < CellFieldSize.z; iZ++){
            MarchCubes(XMFLOAT3(iX*StepSize, iY*StepSize, iZ*StepSize), StepSize);
        }
    }
}
The MarchCube function is called:
void MC::MarchCubes(){
    ...
    int Corner, Vertex, VertexTest, Edge, Triangle, FlagIndex, EdgeFlags;
    float Offset;
    XMFLOAT3 Color;
    float CubeValue[8];
    XMFLOAT3 EdgeVertex[12];
    XMFLOAT3 EdgeNorm[12];
    //Local copy
    for (Vertex = 0; Vertex < 8; Vertex++) {
        CubeValue[Vertex] = (this->*fSample)(
            in_Position.x + VertexOffset[Vertex][0] * Scale,
            in_Position.y + VertexOffset[Vertex][1] * Scale,
            in_Position.z + VertexOffset[Vertex][2] * Scale
        );
    }
    FlagIndex = 0;
Intersection calculations:
    ...
    //Test vertices for intersection.
    for (VertexTest = 0; VertexTest < 8; VertexTest++){
        if (CubeValue[VertexTest] <= TargetValue)
            FlagIndex |= 1 << VertexTest;
    }
    //Find which edges are intersected by the surface.
    EdgeFlags = CubeEdgeFlags[FlagIndex];
    if (EdgeFlags == 0){
        return;
    }
    for (Edge = 0; Edge < 12; Edge++){
        if (EdgeFlags & (1 << Edge)) {
            Offset = GetOffset(CubeValue[EdgeConnection[Edge][0]], CubeValue[EdgeConnection[Edge][1]], TargetValue); // Get offset function definition. Needed!
            EdgeVertex[Edge].x = in_Position.x + VertexOffset[EdgeConnection[Edge][0]][0] + Offset * EdgeDirection[Edge][0] * Scale;
            EdgeVertex[Edge].y = in_Position.y + VertexOffset[EdgeConnection[Edge][0]][1] + Offset * EdgeDirection[Edge][1] * Scale;
            EdgeVertex[Edge].z = in_Position.z + VertexOffset[EdgeConnection[Edge][0]][2] + Offset * EdgeDirection[Edge][2] * Scale;
            GetNormal(EdgeNorm[Edge], EdgeVertex[Edge].x, EdgeVertex[Edge].y, EdgeVertex[Edge].z); //Need normal values
        }
    }
And the original implementation gets pushed into a holding struct for DirectX.
    for (Triangle = 0; Triangle < 5; Triangle++) {
        if (TriangleConnectionTable[FlagIndex][3 * Triangle] < 0) break;
        for (Corner = 0; Corner < 3; Corner++) {
            Vertex = TriangleConnectionTable[FlagIndex][3 * Triangle + Corner];
            GetColor(Color, EdgeVertex[Vertex], EdgeNorm[Vertex]);
            Data.VertexData.push_back(XMFLOAT3(EdgeVertex[Vertex].x, EdgeVertex[Vertex].y, EdgeVertex[Vertex].z));
            Data.NormalData.push_back(XMFLOAT3(EdgeNorm[Vertex].x, EdgeNorm[Vertex].y, EdgeNorm[Vertex].z));
            Data.ColorData.push_back(XMFLOAT4(Color.x, Color.y, Color.z, 1.0f));
        }
    }
(This is the same ordering as the original GL implementation)
Turns out, I had missed a parenthesis and got the operator precedence wrong: the per-edge offset has to be added to the vertex offset before multiplying by Scale.
EdgeVertex[Edge].x = in_Position.x + (VertexOffset[EdgeConnection[Edge][0]][0] + Offset * EdgeDirection[Edge][0]) * Scale;
EdgeVertex[Edge].y = in_Position.y + (VertexOffset[EdgeConnection[Edge][0]][1] + Offset * EdgeDirection[Edge][1]) * Scale;
EdgeVertex[Edge].z = in_Position.z + (VertexOffset[EdgeConnection[Edge][0]][2] + Offset * EdgeDirection[Edge][2]) * Scale;
Corrected, obtained Visine; resumed fun.

Using glColorPointer with glDrawElements results in nothing being drawn

I'm working on making uniformly colored spheres for a project and I'm running into an issue. The spheres render fine, but when I try to color them with glColorPointer they stop appearing. OpenGL isn't reporting any errors when I call glGetError, so I'm at a loss as to why this is happening.
The code to generate the vertices, colors etc:
void SphereObject::setupVertices()
{
    //determine the array sizes
    //vertices per row (+1 for the repeated one at the end) * three for each coordinate
    //times the number of rows
    int arraySize = myNumVertices * 3;
    myNumIndices = (myVerticesPerRow + 1) * myRows * 2;
    myVertices = new GLdouble[arraySize];
    myIndices = new GLuint[myNumIndices];
    myNormals = new GLdouble[arraySize];
    myColors = new GLint[myNumVertices * 4];

    //use spherical coordinates to calculate the vertices
    double phiIncrement = 360 / myVerticesPerRow;
    double thetaIncrement = 180 / (double)myRows;
    int arrayIndex = 0;
    int colorArrayIndex = 0;
    int indicesIndex = 0;
    double x, y, z = 0;

    for(double theta = 0; theta <= 180; theta += thetaIncrement)
    {
        //loop including the repeat for the last vertex
        for(double phi = 0; phi <= 360; phi += phiIncrement)
        {
            //make sure that the last vertex is repeated
            if(360 - phi < phiIncrement)
            {
                x = myRadius * sin(radians(theta)) * cos(radians(0));
                y = myRadius * sin(radians(theta)) * sin(radians(0));
                z = myRadius * cos(radians(theta));
            }
            else
            {
                x = myRadius * sin(radians(theta)) * cos(radians(phi));
                y = myRadius * sin(radians(theta)) * sin(radians(phi));
                z = myRadius * cos(radians(theta));
            }
            myColors[colorArrayIndex] = myColor.getX();
            myColors[colorArrayIndex + 1] = myColor.getY();
            myColors[colorArrayIndex + 2] = myColor.getZ();
            myColors[colorArrayIndex + 3] = 1;
            myVertices[arrayIndex] = x;
            myVertices[arrayIndex + 1] = y;
            myVertices[arrayIndex + 2] = z;
            if(theta <= 180 - thetaIncrement)
            {
                myIndices[indicesIndex] = arrayIndex / 3;
                myIndices[indicesIndex + 1] = (arrayIndex / 3) + myVerticesPerRow + 1;
                indicesIndex += 2;
            }
            arrayIndex += 3;
            colorArrayIndex += 4;
        }
    }
}
And the code to actually render the thing
void SphereObject::render()
{
    glPushMatrix();
    glPushClientAttrib(GL_CLIENT_VERTEX_ARRAY_BIT);

    glEnableClientState(GL_COLOR_ARRAY);
    glColorPointer(4, GL_INT, 0, myColors);

    glEnableClientState(GL_VERTEX_ARRAY);
    glVertexPointer(3, GL_DOUBLE, 0, myVertices);

    glDrawElements(GL_QUAD_STRIP, myNumIndices, GL_UNSIGNED_INT, myIndices);

    glDisableClientState(GL_VERTEX_ARRAY);
    glDisableClientState(GL_COLOR_ARRAY);

    glPopClientAttrib();
    glPopMatrix();
}
Any and all help would be appreciated. I'm really having a hard time for some reason.
When you use GL_INT (or any integer type) for color pointer, it linearly maps the largest possible integer value to 1.0f (maximum color), and 0 to 0.0f (minimum color).
Therefore unless your values of RGB and A are in the billions, they will likely appear completely black (or transparent if that's enabled). I see that you've got alpha = 1, which will essentially be zero after conversion to float.
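One way to fix it (a sketch based on the code above, assuming myColor's components are already in the 0..1 range; scale them first if they are 0..255) is to store the colors as floats and pass GL_FLOAT:
//Declaration: GLfloat* myColors; (instead of GLint*)
myColors = new GLfloat[myNumVertices * 4];

//In the vertex loop:
myColors[colorArrayIndex]     = (GLfloat) myColor.getX();
myColors[colorArrayIndex + 1] = (GLfloat) myColor.getY();
myColors[colorArrayIndex + 2] = (GLfloat) myColor.getZ();
myColors[colorArrayIndex + 3] = 1.0f;

//In render():
glColorPointer(4, GL_FLOAT, 0, myColors);
Alternatively, keep integer storage but switch to GLubyte with GL_UNSIGNED_BYTE, where 255 maps to full intensity.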