Anyone offer me an example using glMultiDrawArraysIndirect with JOGL? - opengl

I learned about glMultiDrawArraysIndirect() from OpenGL, and I want to call it from JOGL. The sample code uses a C++ struct and stores the struct's data in a buffer through glMapBufferRange(). How is this done with JOGL? An example would be great.
Here is the sample code using OpenGL:
struct DrawArraysIndirectCommand
{
GLuint count;
GLuint primCount;
GLuint first;
GLuint baseInstance;
};
load_shaders();
object.load("media/objects/asteroids.sbm");
glGenBuffers(1, &indirect_draw_buffer);
glBindBuffer(GL_DRAW_INDIRECT_BUFFER, indirect_draw_buffer);
glBufferData(GL_DRAW_INDIRECT_BUFFER,
NUM_DRAWS * sizeof(DrawArraysIndirectCommand),
NULL,
GL_STATIC_DRAW);
DrawArraysIndirectCommand * cmd = (DrawArraysIndirectCommand *)
glMapBufferRange(GL_DRAW_INDIRECT_BUFFER,
0,
NUM_DRAWS * sizeof(DrawArraysIndirectCommand),
GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT);
for (i = 0; i < NUM_DRAWS; i++)
{
object.get_sub_object_info(i % object.get_sub_object_count(),
cmd[i].first,
cmd[i].count);
cmd[i].primCount = 1;
cmd[i].baseInstance = i;
}
glUnmapBuffer(GL_DRAW_INDIRECT_BUFFER);

As you can see, the struct is just four GLuint values one after the other, so you need to imitate that layout:
ByteBuffer cmd = gl.glMapBufferRange(GL_DRAW_INDIRECT_BUFFER,
0,
NUM_DRAWS * 4 * 4,
GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT);
for (int i = 0; i < NUM_DRAWS; i++)
{
int first = object.get_sub_object_info_first(i % object.get_sub_object_count());
int count = object.get_sub_object_info_count(i % object.get_sub_object_count());
int primCount = 1;
int baseInstance = i;
cmd.putInt(count);
cmd.putInt(primCount);
cmd.putInt(first);
cmd.putInt(baseInstance);
}
gl.glUnmapBuffer(GL_DRAW_INDIRECT_BUFFER);
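To see where the `NUM_DRAWS * 4 * 4` size comes from, here is a minimal C++ sketch of the layout being imitated. The struct mirrors the one in the question; `appendCommand` is a hypothetical helper (not part of OpenGL or JOGL) that writes the four fields in the same order as the `putInt()` calls above, assuming the ByteBuffer uses native byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Same field order as the DrawArraysIndirectCommand struct in the question.
struct DrawArraysIndirectCommand {
    std::uint32_t count;
    std::uint32_t primCount;
    std::uint32_t first;
    std::uint32_t baseInstance;
};

// Four tightly packed 32-bit fields: 16 bytes per command, no padding.
static_assert(sizeof(DrawArraysIndirectCommand) == 16, "unexpected padding");
static_assert(offsetof(DrawArraysIndirectCommand, count) == 0, "count at byte 0");
static_assert(offsetof(DrawArraysIndirectCommand, primCount) == 4, "primCount at byte 4");
static_assert(offsetof(DrawArraysIndirectCommand, first) == 8, "first at byte 8");
static_assert(offsetof(DrawArraysIndirectCommand, baseInstance) == 12, "baseInstance at byte 12");

// Hypothetical helper: appends one command to a raw byte array field by field,
// the way four consecutive putInt() calls would on a native-order buffer.
void appendCommand(std::vector<unsigned char>& bytes,
                   std::uint32_t count, std::uint32_t primCount,
                   std::uint32_t first, std::uint32_t baseInstance) {
    // The array is expected to hold only whole commands.
    assert(bytes.size() % sizeof(DrawArraysIndirectCommand) == 0);
    const std::uint32_t fields[4] = {count, primCount, first, baseInstance};
    const std::size_t oldSize = bytes.size();
    bytes.resize(oldSize + sizeof(fields));
    std::memcpy(bytes.data() + oldSize, fields, sizeof(fields));
}
```

Copying 16 bytes back out of such an array into a `DrawArraysIndirectCommand` gives the same four values, which is why writing the integers in struct order on the Java side is enough.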

Related

OpenGL buffer problem when adding >= 2^16 numbers

I'm facing some strange difficulties with an OpenGL buffer. I tried to shrink the problem to minimal source code, so I created a program that increments each number in the FloatBuffer on every iteration. When I add fewer than 2^16 float numbers to the FloatBuffer, everything works just fine, but when I add >= 2^16 numbers, the numbers are not incremented and stay the same in each iteration.
Renderer:
public class Renderer extends AbstractRenderer {
int computeShaderProgram;
int[] locBuffer = new int[2];
FloatBuffer data;
int numbersCount = 65_536, round = 0; // 65_535 - OK, 65_536 - wrong
@Override
public void init() {
computeShaderProgram = ShaderUtils.loadProgram(null, null, null, null, null,
"/main/computeBuffer");
glGenBuffers(locBuffer);
// dataSizeInBytes = count of numbers to sort * (float=4B + padding=3*4B)
int dataSizeInBytes = numbersCount * (1 + 3) * 4;
data = ByteBuffer.allocateDirect(dataSizeInBytes)
.order(ByteOrder.nativeOrder())
.asFloatBuffer();
initBuffer();
printBuffer(data);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, locBuffer[0]);
glBufferData(GL_SHADER_STORAGE_BUFFER, data, GL_DYNAMIC_DRAW);
glShaderStorageBlockBinding(computeShaderProgram, 0, 0);
glViewport(0, 0, width, height);
}
private void initBuffer() {
data.rewind();
Random r = new Random();
for (int i = 0; i < numbersCount; i++) {
data.put(i*4, r.nextFloat());
}
}
@Override
public void display() {
if (round < 5) {
glUseProgram(computeShaderProgram);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, locBuffer[0]);
glDispatchCompute(numbersCount, 1, 1);
glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT);
glGetBufferSubData(GL_SHADER_STORAGE_BUFFER, 0, data);
printBuffer(data);
round++;
}
}
...
}
Compute shader:
#version 430
#extension GL_ARB_compute_shader : enable
#extension GL_ARB_shader_storage_buffer_object : enable
layout (local_size_x = 1, local_size_y = 1, local_size_z = 1) in;
layout(binding = 0) buffer Input {
float elements[];
}input_data;
void main () {
input_data.elements[gl_WorkGroupID.x ] = input_data.elements[gl_WorkGroupID.x] + 1;
}
glDispatchCompute(numbersCount, 1, 1);
You must not dispatch a compute shader with a work group count exceeding the corresponding GL_MAX_COMPUTE_WORK_GROUP_COUNT for each dimension. The spec guarantees that limit to be at least 65535, so it is very likely that you just exceed the limit on your implementation. Actually, you should be getting a GL_INVALID_VALUE error for that call, and you should really consider using a debug context and debug message callback so that such obvious errors are easily spotted during development.
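One common way to stay under the limit is to give the shader a larger local size and dispatch fewer groups. The sketch below only shows the arithmetic. The helper names are hypothetical, the local size of 64 in the usage note is an arbitrary illustration, and 65535 is just the guaranteed minimum; the real limit can be queried with glGetIntegeri_v(GL_MAX_COMPUTE_WORK_GROUP_COUNT, 0, &value).

```cpp
#include <cassert>
#include <cstdint>

// Number of work groups needed to cover itemCount invocations when each
// group runs localSize invocations (the shader's local_size_x), rounded up.
std::uint32_t workGroupCount(std::uint32_t itemCount, std::uint32_t localSize) {
    assert(localSize > 0);
    return itemCount / localSize + (itemCount % localSize != 0 ? 1u : 0u);
}

// True if the group count for one dimension is within the implementation's
// limit for that dimension.
bool fitsLimit(std::uint32_t groups, std::uint32_t maxGroups) {
    return groups <= maxGroups;
}
```

With local_size_x = 1, 65_536 numbers need 65_536 groups, which is over a limit of 65535. With local_size_x = 64 they need only 1024 groups. The shader would then index with gl_GlobalInvocationID.x instead of gl_WorkGroupID.x, and skip invocations past the end of the array when the count is not a multiple of the local size.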

glMultiDrawElementsIndirect - draws broken mesh... offsets issue?

I'm trying to build the command array, but I keep getting "broken" mesh draws. Here is the struct I'm trying to populate:
My vertices and indices are stored in buffers as:
QVector<QVector3D> mVertex; // all meshes in 1 vector
QVector<unsigned int> mIndices; // all meshes in 1 vector
int mIndicesCount = mIndices.size(); // this is per mesh accessible
int mVertexCount = mVertex.size(); // this is per mesh accessible
The loop:
int skip =0;
int offset =0;
for (size_t u = 0; u < jobSize; ++u) {
DrawElementsIndirectCommand *cmd = &dstCmds[u];
cmd->count = mNodeList[u]->mIndicesCount;
cmd->instanceCount = 1;
cmd->firstIndex = skip;
cmd->baseVertex = offset;
cmd->baseInstance = 1;
skip += (mNodeList[u]->mIndicesCount * sizeof(unsigned int));
offset += (mNodeList[u]->mVertexCount / sizeof(unsigned int));
}
Does anyone see any errors here? I'm lost.
I also tried this:
skip += (mNodeList[u]->mIndicesCount / sizeof(unsigned int));
offset += (mNodeList[u]->mVertexCount);
based on the question "OpenGL glMultiDrawElementsIndirect with Interleaved Buffers".
EDIT 2
I could not get it to work with the suggestions in the comments, or I did something wrong... Here is the main code responsible for building the buffers and commands.
PS. This exercise is about trying to follow AZDO -
https://github.com/nvMcJohn/apitest/blob/master/src/solutions/untexturedobjects/gl/bufferstorage.cpp
int jobSize = mNodeList.size();
QVector<QVector3D> mVertex;
QVector<QVector3D> mNormals;
QVector<unsigned int> mIndices;
for (auto &node:mNodeList) {
mVertex.append(node->mVertex);
mNormals.append(node->mVertexNormal);
mIndices.append(node->mIndices);
}
glBindVertexArray(m_varray);
glBindBuffer(GL_ARRAY_BUFFER, m_vb);
glBufferData(GL_ARRAY_BUFFER, mVertex.size() * sizeof(QVector3D), &mVertex[0], GL_STATIC_DRAW);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, m_ib);
glBufferData(GL_ELEMENT_ARRAY_BUFFER, mIndices.size() * sizeof(unsigned int), &mIndices[0], GL_STATIC_DRAW);
glEnableVertexAttribArray(0);
mShader->enableAttributeArray("In_v3Pos");
mShader->setAttributeBuffer("In_v3Pos", GL_FLOAT, 0, 3, sizeof(QVector3D));
glBindBuffer(GL_ARRAY_BUFFER, m_vn);
glBufferData(GL_ARRAY_BUFFER, mNormals.size() * sizeof(QVector3D), &mNormals[0], GL_STATIC_DRAW);
mShader->enableAttributeArray("In_v3Color");
mShader->setAttributeBuffer("In_v3Color", GL_FLOAT, 0, 3, sizeof(QVector3D));
const GLbitfield mapFlags = GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT;
const GLbitfield createFlags = mapFlags | GL_DYNAMIC_STORAGE_BIT;
mCommands.Destroy();
mCommands.Create(BufferStorage::PersistentlyMappedBuffer, GL_DRAW_INDIRECT_BUFFER, 3 * jobSize, createFlags, mapFlags);
mTransformBuffer.Destroy();
mTransformBuffer.Create(BufferStorage::PersistentlyMappedBuffer, GL_SHADER_STORAGE_BUFFER, 3 * jobSize, createFlags, mapFlags);
glBindVertexArray(0);
DrawElementsIndirectCommand *dstCmds = mCommands.Reserve(jobSize);
int skip = 0;
int offset = 0;
for (size_t u = 0; u < jobSize; ++u) {
DrawElementsIndirectCommand *cmd = &dstCmds[u];
cmd->count = mNodeList[u]->mIndicesCount;
cmd->instanceCount = 1;
cmd->firstIndex = skip*sizeof(unsigned int);
cmd->baseVertex = offset;
cmd->baseInstance = 0;
skip += mNodeList[u]->mIndicesCount ;
offset += mNodeList[u]->mVertexCount;
}
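For comparison, here is a minimal sketch of how the offsets are usually accumulated. In DrawElementsIndirectCommand, firstIndex is counted in indices (not bytes) and baseVertex is counted in vertices and added to every index fetched. `MeshCounts` and `buildCommands` are hypothetical names for illustration, and the sketch assumes each mesh's indices start at zero relative to its own vertices; whether this is the only problem in the code above is not certain.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Field order as defined by OpenGL for indexed indirect draws.
struct DrawElementsIndirectCommand {
    std::uint32_t count;
    std::uint32_t instanceCount;
    std::uint32_t firstIndex;    // offset into the index buffer, in indices
    std::int32_t  baseVertex;    // value added to each index, in vertices
    std::uint32_t baseInstance;
};

// Hypothetical per-mesh description: how many indices and vertices it has.
struct MeshCounts {
    std::uint32_t indexCount;
    std::uint32_t vertexCount;
};

// Builds one command per mesh for meshes packed back to back into a single
// index buffer and a single vertex buffer.
std::vector<DrawElementsIndirectCommand> buildCommands(
        const std::vector<MeshCounts>& meshes) {
    std::vector<DrawElementsIndirectCommand> cmds;
    cmds.reserve(meshes.size());
    std::uint32_t indexOffset = 0;   // running total of indices so far
    std::uint32_t vertexOffset = 0;  // running total of vertices so far
    for (const MeshCounts& mesh : meshes) {
        assert(mesh.indexCount > 0);
        DrawElementsIndirectCommand cmd;
        cmd.count = mesh.indexCount;
        cmd.instanceCount = 1;
        cmd.firstIndex = indexOffset;  // no sizeof() scaling here
        cmd.baseVertex = static_cast<std::int32_t>(vertexOffset);
        cmd.baseInstance = 0;
        cmds.push_back(cmd);
        indexOffset += mesh.indexCount;
        vertexOffset += mesh.vertexCount;
    }
    return cmds;
}
```

Neither running total is multiplied or divided by sizeof(unsigned int); the driver applies the index size itself when it turns firstIndex into a byte offset.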

Ray Tracing using Nvidia Optix with Open Asset Import Library (assimp) - rendering multiple meshes

I'm trying to combine the versatility of Open Asset Import Library (reading in a variety of 3D model filetypes) with NVidia Optix ray tracing to render the models.
So far, it works whenever the model I'm rendering is made up of a single mesh. When I try to render a file with more than one mesh, I get only partial results. I can't narrow down where the issue is, so I'm looking for some insight. Relevant code here:
Loading a file using assimp importer and creating the optix buffers:
int loadAsset(const char* path)
{
Assimp::Importer importer;
scene = importer.ReadFile(
path,
aiProcess_Triangulate
//| aiProcess_JoinIdenticalVertices
| aiProcess_SortByPType
| aiProcess_ValidateDataStructure
| aiProcess_SplitLargeMeshes
| aiProcess_FixInfacingNormals
);
if (scene) {
getBoundingBox(&scene_min, &scene_max);
scene_center.x = (scene_min.x + scene_max.x) / 2.0f;
scene_center.y = (scene_min.y + scene_max.y) / 2.0f;
scene_center.z = (scene_min.z + scene_max.z) / 2.0f;
float3 optixMin = { scene_min.x, scene_min.y, scene_min.z };
float3 optixMax = { scene_max.x, scene_max.y, scene_max.z };
aabb.set(optixMin, optixMax);
unsigned int numVerts = 0;
unsigned int numFaces = 0;
if (scene->mNumMeshes > 0) {
printf("Number of meshes: %d\n", scene->mNumMeshes);
// get the running total number of vertices & faces for all meshes
for (unsigned int i = 0; i < scene->mNumMeshes; i++) {
numVerts += scene->mMeshes[i]->mNumVertices;
numFaces += scene->mMeshes[i]->mNumFaces;
}
printf("Found %d Vertices and %d Faces\n", numVerts, numFaces);
// set up buffers
optix::Buffer vertices = context->createBuffer(RT_BUFFER_INPUT, RT_FORMAT_FLOAT3, numVerts);
optix::Buffer normals = context->createBuffer(RT_BUFFER_INPUT, RT_FORMAT_FLOAT3, numVerts);
optix::Buffer faces = context->createBuffer(RT_BUFFER_INPUT, RT_FORMAT_UNSIGNED_INT3, numFaces);
optix::Buffer materials = context->createBuffer(RT_BUFFER_INPUT, RT_FORMAT_UNSIGNED_INT, numVerts);
// unused buffer
Buffer tbuffer = context->createBuffer(RT_BUFFER_INPUT, RT_FORMAT_FLOAT2, 0);
// create material
std::string defaultPtxPath = "C:\\ProgramData\\NVIDIA Corporation\\OptiX SDK 4.1.0\\SDK\\build\\lib\\ptx\\";
Program phong_ch = context->createProgramFromPTXFile(defaultPtxPath + "optixPrimitiveIndexOffsets_generated_phong.cu.ptx", "closest_hit_radiance");
Program phong_ah = context->createProgramFromPTXFile(defaultPtxPath + "optixPrimitiveIndexOffsets_generated_phong.cu.ptx", "any_hit_shadow");
Material matl = context->createMaterial();
matl->setClosestHitProgram(0, phong_ch);
matl->setAnyHitProgram(1, phong_ah);
matl["Kd"]->setFloat(0.7f, 0.7f, 0.7f);
matl["Ka"]->setFloat(1.0f, 1.0f, 1.0f);
matl["Kr"]->setFloat(0.0f, 0.0f, 0.0f);
matl["phong_exp"]->setFloat(1.0f);
std::string triangle_mesh_ptx_path(ptxPath("triangle_mesh.cu"));
Program meshIntersectProgram = context->createProgramFromPTXFile(triangle_mesh_ptx_path, "mesh_intersect");
Program meshBboxProgram = context->createProgramFromPTXFile(triangle_mesh_ptx_path, "mesh_bounds");
optix::float3 *vertexMap = reinterpret_cast<optix::float3*>(vertices->map());
optix::float3 *normalMap = reinterpret_cast<optix::float3*>(normals->map());
optix::uint3 *faceMap = reinterpret_cast<optix::uint3*>(faces->map());
unsigned int *materialsMap = static_cast<unsigned int*>(materials->map());
context["vertex_buffer"]->setBuffer(vertices);
context["normal_buffer"]->setBuffer(normals);
context["index_buffer"]->setBuffer(faces);
context["texcoord_buffer"]->setBuffer(tbuffer);
context["material_buffer"]->setBuffer(materials);
Group group = createSingleGeometryGroup(meshIntersectProgram, meshBboxProgram, vertexMap,
normalMap, faceMap, materialsMap, matl);
context["top_object"]->set(group);
context["top_shadower"]->set(group);
vertices->unmap();
normals->unmap();
faces->unmap();
materials->unmap();
}
return 0;
}
return 1;
}
And the relevant function for creating the geometries and filling the buffers:
Group createSingleGeometryGroup(Program meshIntersectProgram, Program meshBboxProgram, optix::float3 *vertexMap,
optix::float3 *normalMap, optix::uint3 *faceMap, unsigned int *materialsMap, Material matl) {
Group group = context->createGroup();
optix::Acceleration accel = context->createAcceleration("Trbvh");
group->setAcceleration(accel);
std::vector<GeometryInstance> gis;
unsigned int vertexOffset = 0u;
unsigned int faceOffset = 0u;
for (unsigned int m = 0; m < scene->mNumMeshes; m++) {
aiMesh *mesh = scene->mMeshes[m];
if (!mesh->HasPositions()) {
throw std::runtime_error("Mesh contains zero vertex positions");
}
if (!mesh->HasNormals()) {
throw std::runtime_error("Mesh contains zero vertex normals");
}
printf("Mesh #%d\n\tNumVertices: %d\n\tNumFaces: %d\n", m, mesh->mNumVertices, mesh->mNumFaces);
// add points
for (unsigned int i = 0u; i < mesh->mNumVertices; i++) {
aiVector3D pos = mesh->mVertices[i];
aiVector3D norm = mesh->mNormals[i];
vertexMap[i + vertexOffset] = optix::make_float3(pos.x, pos.y, pos.z) + aabb.center();
normalMap[i + vertexOffset] = optix::normalize(optix::make_float3(norm.x, norm.y, norm.z));
materialsMap[i + vertexOffset] = 0u;
}
// add faces
for (unsigned int i = 0u; i < mesh->mNumFaces; i++) {
aiFace face = mesh->mFaces[i];
// add triangles
if (face.mNumIndices == 3) {
faceMap[i + faceOffset] = optix::make_uint3(face.mIndices[0], face.mIndices[1], face.mIndices[2]);
}
else {
printf("face indices != 3\n");
faceMap[i + faceOffset] = optix::make_uint3(-1);
}
}
// create geometry
optix::Geometry geometry = context->createGeometry();
geometry->setPrimitiveCount(mesh->mNumFaces);
geometry->setIntersectionProgram(meshIntersectProgram);
geometry->setBoundingBoxProgram(meshBboxProgram);
geometry->setPrimitiveIndexOffset(faceOffset);
optix::GeometryInstance gi = context->createGeometryInstance(geometry, &matl, &matl + 1);
gis.push_back(gi);
vertexOffset += mesh->mNumVertices;
faceOffset += mesh->mNumFaces;
}
printf("VertexOffset: %d\nFaceOffset: %d\n", vertexOffset, faceOffset);
// add all geometry instances to a geometry group
GeometryGroup gg = context->createGeometryGroup();
gg->setChildCount(static_cast<unsigned int>(gis.size()));
for (unsigned i = 0u; i < gis.size(); i++) {
gg->setChild(i, gis[i]);
}
Acceleration a = context->createAcceleration("Trbvh");
gg->setAcceleration(a);
group->setChildCount(1);
group->setChild(0, gg);
return group;
}
Running the above code on a sample file from assimp (using the dwarf.x, file contains 2 meshes) yields this result:
You can see only part of the second mesh (the dwarf's body) is rendered. I tried rendering each mesh separately, one at a time, and they render in full. But when putting them together I get this.
I'm thinking the issue is either with creating the geometry, perhaps I have these lines wrong:
geometry->setPrimitiveCount(mesh->mNumFaces);
geometry->setPrimitiveIndexOffset(faceOffset);
or the assimp postprocessing flags
scene = importer.ReadFile(
path,
aiProcess_Triangulate
//| aiProcess_JoinIdenticalVertices
| aiProcess_SortByPType
| aiProcess_ValidateDataStructure
| aiProcess_SplitLargeMeshes
| aiProcess_FixInfacingNormals
);
(note above, I had to comment out JoinIdenticalVertices because it gave me a horribly wrong result shown below):
Has anyone been able to successfully combine nvidia optix with open asset import library for rendering files with multiple meshes?
I found a solution, although I'm not sure how optimal it is.
Each mesh still gets its own geometry, however instead of creating single vertex, index and normal buffers which are shared among all geometries, I create separate buffers for each geometry.
Then, instead of
context["vertex_buffer"]->setBuffer(vertices);
context["normal_buffer"]->setBuffer(normals);
context["index_buffer"]->setBuffer(faces);
context["texcoord_buffer"]->setBuffer(tbuffer);
context["material_buffer"]->setBuffer(materials);
I use
geometry["vertex_buffer"]->setBuffer(vertices);
geometry["normal_buffer"]->setBuffer(normals);
geometry["index_buffer"]->setBuffer(faces);
geometry["texcoord_buffer"]->setBuffer(tbuffer);
geometry["material_buffer"]->setBuffer(materials);
The result: (image not included)
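A possible reason why per-geometry buffers help: assimp face indices refer to the vertex array of the mesh they belong to, so when several meshes are copied into one shared vertex buffer, the indices of every mesh after the first point at the wrong vertices unless they are shifted. The sketch below shows that alternative in plain C++, with no OptiX or assimp calls. `Mesh` and `mergeMeshes` are hypothetical names, and it assumes the intersection program looks vertices up directly through the index buffer.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one imported mesh.
struct Mesh {
    std::vector<std::array<float, 3>> vertices;
    // Triangles; each index refers to this mesh's own vertices.
    std::vector<std::array<std::uint32_t, 3>> faces;
};

// Concatenates meshes into one shared vertex/face list, shifting every face
// index by the number of vertices that were already in the shared list.
Mesh mergeMeshes(const std::vector<Mesh>& meshes) {
    Mesh merged;
    for (const Mesh& mesh : meshes) {
        const std::uint32_t vertexOffset =
            static_cast<std::uint32_t>(merged.vertices.size());
        merged.vertices.insert(merged.vertices.end(),
                               mesh.vertices.begin(), mesh.vertices.end());
        for (const std::array<std::uint32_t, 3>& face : mesh.faces) {
            std::array<std::uint32_t, 3> rebased;
            for (std::size_t k = 0; k < 3; ++k) {
                assert(face[k] < mesh.vertices.size());
                rebased[k] = face[k] + vertexOffset;
            }
            merged.faces.push_back(rebased);
        }
    }
    return merged;
}
```

In the code from the question this would correspond to adding vertexOffset to each of face.mIndices[0..2] when filling faceMap. Separate buffers per geometry avoid the shift altogether, because each index buffer is then only ever paired with its own vertex buffer.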

CImg Image is Colorless

Currently I am in the process of refining a function in my basic level editor program that allows me to save the maps I create. It spits out a .bmp image of the map produced. It does this through a library I've just discovered called CImg, which I know next to nothing about. Everything seems to work, but the resulting .bmp image is not colored, appearing instead in different shades of black and white. Like I said, I know basically nothing about the library, so if you know what the problem could be here, I would appreciate some help.
Here's the save function:
void Map::Save() {
Vertex top_left_most, top_right_most, bottom_left_most;
int img_w = 0, img_h = 0;
std::vector<std::pair<GLuint, GLuint>>::iterator tl = bufferIDs.begin(); //This little block gives the _most variables valid starting vals
glBindBuffer(GL_ARRAY_BUFFER, tl->second);
glGetBufferSubData(GL_ARRAY_BUFFER, sizeof(TextureCoord), sizeof(Vertex), &top_left_most);
top_right_most = bottom_left_most = top_left_most;
for (auto i = bufferIDs.begin(); i != bufferIDs.end(); ++i) { //SEEKS TOP LEFT MOST TILE ON MAP
Vertex current_coord;
glBindBuffer(GL_ARRAY_BUFFER, i->second);
glGetBufferSubData(GL_ARRAY_BUFFER, sizeof(TextureCoord), sizeof(Vertex), &current_coord);
if ((current_coord.x < top_left_most.x && current_coord.y < top_left_most.y) ||
(current_coord.x == top_left_most.x && current_coord.y < top_left_most.y) ||
(current_coord.x < top_left_most.x && current_coord.y == top_left_most.y)) {
top_left_most = current_coord;
}
}
for (auto i = bufferIDs.begin(); i != bufferIDs.end(); ++i) { //SEEKS TOP RIGHT MOST TILE ON MAP
Vertex current_coord;
glBindBuffer(GL_ARRAY_BUFFER, i->second);
glGetBufferSubData(GL_ARRAY_BUFFER, sizeof(TextureCoord), sizeof(Vertex), &current_coord);
if ((current_coord.x > top_right_most.x && current_coord.y < top_right_most.y) ||
(current_coord.x == top_right_most.x && current_coord.y < top_right_most.y) ||
(current_coord.x > top_right_most.x && current_coord.y == top_right_most.y)) {
top_right_most = current_coord;
}
}
for (auto i = bufferIDs.begin(); i != bufferIDs.end(); ++i) { //SEEKS BOTTOM LEFT MOST TILE ON MAP
Vertex current_coord;
glBindBuffer(GL_ARRAY_BUFFER, i->second);
glGetBufferSubData(GL_ARRAY_BUFFER, sizeof(TextureCoord), sizeof(Vertex), &current_coord);
if ((current_coord.x < bottom_left_most.x && current_coord.y > bottom_left_most.y) ||
(current_coord.x == bottom_left_most.x && current_coord.y > bottom_left_most.y) ||
(current_coord.x < bottom_left_most.x && current_coord.y == bottom_left_most.y)) {
bottom_left_most = current_coord;
}
}
img_w = (top_right_most.x + 64) - top_left_most.x; //Calculating image dimensions for the buffer
img_h = (bottom_left_most.y + 64) - top_left_most.y;
GLuint *image = new GLuint[img_w * img_h]; //Creating the image buffer
int int_start_x = 0; //start_x and y that will be used in buffer pointer positioning computations
int int_start_y = 0;
//these nested fors fill the buffer
for (GLfloat start_y = top_left_most.y; start_y != bottom_left_most.y + 64; start_y += 64) {
for (GLfloat start_x = top_left_most.x; start_x != top_right_most.x + 64; start_x += 64) {
bool in_map = false;
std::vector<std::pair<GLuint, GLuint>>::iterator valid_tile;
for (auto i = bufferIDs.begin(); i != bufferIDs.end(); ++i) { //This for checks to see if tile corresponding to start_x & y is present in map
Vertex current_tile_pos;
glBindBuffer(GL_ARRAY_BUFFER, i->second);
glGetBufferSubData(GL_ARRAY_BUFFER, sizeof(TextureCoord), sizeof(Vertex), &current_tile_pos);
if (current_tile_pos.x == start_x && current_tile_pos.y == start_y) {
in_map = true;
valid_tile = i;
break;
}
}
GLuint *imagepos = image; //Repositioning the pointer into the final image's buffer
imagepos += int_start_x + (int_start_y * img_w);
if (in_map) { //if in map, that tile's texture is used to fill the corresponding part of the image buffer
GLuint *texture = new GLuint[64 * 64];
glBindTexture(GL_TEXTURE_2D, valid_tile->first);
glGetTexImage(GL_TEXTURE_2D, 0, GL_RGBA, GL_UNSIGNED_BYTE, texture);
GLuint *texturepos = texture;
for (GLuint ypos = 0; ypos != 64; ++ypos) {
std::memcpy(imagepos, texturepos, 64 * 4);
texturepos += 64;
imagepos += img_w;
}
if (texture)
delete[] texture;
}
else { //otherwise, a default all-black array is used to fill the corresponding untiled part of the image buffer
GLuint *black_buffer = new GLuint[64 * 64];
GLuint *blackpos = black_buffer;
GLuint solid_black;
char *p = (char *)&solid_black;
p[0] = 0;
p[1] = 0;
p[2] = 0;
p[3] = 255;
for (GLuint i = 0; i != 64 * 64; ++i) {
black_buffer[i] = solid_black;
}
for (GLuint ypos = 0; ypos != 64; ++ypos) {
std::memcpy(imagepos, blackpos, 64 * 4);
blackpos += 64;
imagepos += img_w;
}
if (black_buffer)
delete[] black_buffer;
}
int_start_x += 64;
}
int_start_x = 0;
int_start_y += 64;
}
cimg_library::CImg<GLuint> final_image(image, img_w, img_h); //no color!!
final_image.save_bmp("map.bmp");
if (image)
delete[] image;
}
In case some explanation would be helpful, Vertex is a simple struct of two GLfloats (as is TextureCoord), bufferIDs is an std::vector of std::pairs of GLuints, the first representing a texture ID, and the second representing a VBO ID.
Here are the requested sample images:
what the image should look like (this is in monochrome)
Same exact image as above, but created using the reinterpret_cast method
Your line
cimg_library::CImg<GLuint> final_image(image, img_w, img_h);
is wrong if you are expecting a colour image, because it creates a single-channel image. You need to pass a depth of 1 and a channel count of 3 after the height (i.e. img_w, img_h, 1, 3) to make 3 channels - one for Red, one for Green and one for Blue.
Also, your data is stored in GLuint which means that a 4x2 pixel image will be stored like this, i.e. band-interleaved-by-pixel:
RGBA RGBA RGBA RGBA
RGBA RGBA RGBA RGBA
whereas CImg wants to store that in a band-interleaved-by-plane fashion:
RRRRRRRR
GGGGGGGG
BBBBBBBB
AAAAAAAA
The CImg documentation explains the layout of CImg memory buffers.
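The rearrangement described above can be sketched without CImg itself. This is a minimal version under assumptions: the source holds 8-bit RGBA bytes in row order (as glGetTexImage with GL_RGBA and GL_UNSIGNED_BYTE produces), the alpha channel is dropped, and `interleavedRgbaToPlanarRgb` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Converts RGBA RGBA ... (one byte per component) into three consecutive
// planes: all R values, then all G values, then all B values.
std::vector<unsigned char> interleavedRgbaToPlanarRgb(
        const std::vector<unsigned char>& rgba,
        std::size_t width, std::size_t height) {
    const std::size_t pixelCount = width * height;
    assert(rgba.size() == pixelCount * 4);
    std::vector<unsigned char> planar(pixelCount * 3);
    for (std::size_t i = 0; i < pixelCount; ++i) {
        for (std::size_t c = 0; c < 3; ++c) {
            // Component c of pixel i goes into plane c at position i.
            planar[c * pixelCount + i] = rgba[i * 4 + c];
        }
    }
    return planar;
}
```

The planar array could then be handed to something like cimg_library::CImg<unsigned char>(planar.data(), img_w, img_h, 1, 3) before calling save_bmp(); treat that call as a sketch to check against the CImg headers you are using.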

How do I create a cudaTextureObject_t from linear memory?

I cannot get bindless textures referencing linear memory to work -- the result is always a zero/black read. My initialization code:
The buffer:
int const num = 4 * 16;
int const size = num * sizeof(float);
cudaMalloc(buffer, size);
auto b = new float[num];
for (int i = 0; i < num; ++i)
{
b[i] = i % 4 == 0 ? 1 : 1;
}
cudaMemcpy(*buffer, b, size, cudaMemcpyHostToDevice);
The texture object:
cudaTextureDesc td;
memset(&td, 0, sizeof(td));
td.normalizedCoords = 0;
td.addressMode[0] = cudaAddressModeClamp;
td.addressMode[1] = cudaAddressModeClamp;
td.addressMode[2] = cudaAddressModeClamp;
td.readMode = cudaReadModeElementType;
td.sRGB = 0;
td.filterMode = cudaFilterModePoint;
td.maxAnisotropy = 16;
td.mipmapFilterMode = cudaFilterModePoint;
td.minMipmapLevelClamp = 0;
td.maxMipmapLevelClamp = 0;
td.mipmapLevelBias = 0;
struct cudaResourceDesc resDesc;
memset(&resDesc, 0, sizeof(resDesc));
resDesc.resType = cudaResourceTypeLinear;
resDesc.res.linear.devPtr = *buffer;
resDesc.res.linear.sizeInBytes = size;
resDesc.res.linear.desc.f = cudaChannelFormatKindFloat;
resDesc.res.linear.desc.x = 32;
resDesc.res.linear.desc.y = 32;
resDesc.res.linear.desc.z = 32;
resDesc.res.linear.desc.w = 32;
checkCudaErrors(cudaCreateTextureObject(texture, &resDesc, &td, nullptr));
The kernel:
__global__ void
d_render(uchar4 *d_output, uint imageW, uint imageH, float* buffer, cudaTextureObject_t texture)
{
uint x = blockIdx.x * blockDim.x + threadIdx.x;
uint y = blockIdx.y * blockDim.y + threadIdx.y;
if ((x < imageW) && (y < imageH))
{
// write output color
uint i = y * imageW + x;
//auto f = make_float4(buffer[0], buffer[1], buffer[2], buffer[3]);
auto f = tex1D<float4>(texture, 0);
d_output[i] = to_uchar4(f * 255);
}
}
The texture object is initialized with something sensible (4099) when given to the kernel. The Buffer version works flawlessly.
Why does the texture object return zero/black?
As per the CUDA programming reference guide, you need to use tex1Dfetch() to read from one-dimensional textures bound to linear memory, and tex1D() to read from one-dimensional textures bound to CUDA arrays. This applies to both CUDA texture references and CUDA textures passed by object. In the kernel above, that means calling tex1Dfetch<float4>(texture, 0) instead of tex1D<float4>(texture, 0).
The difference between the two APIs is the coordinate argument. Textures bound to linear memory can only be addressed in integer texel coordinates (hence the integer coordinate argument in tex1Dfetch()), whereas arrays support both texel and normalised coordinates (thus the float coordinate argument in tex1D()).