I'm hoping to create a simple computer vision library in C++/CUDA C++ that allows me to do the following:
Grab some RGB data from the host memory. This data will come in a BGR byte array, 8 bits per channel per pixel.
Process that data in a CUDA kernel.
Write the output of that kernel back into some host memory.
Render the output in an OpenGL texture for easy viewing.
These functions would go inside a class like so:
class Processor{
public:
void setInput(const byte* data, int imageWidth, int imageHeight);
void processData();
GLuint getInputTexture();
GLuint getOutputTexture();
void writeOutputTo(byte* destination);
};
setInput() is going to be called with every frame of a video (hundreds or thousands of images of the same dimensions).
How can I write the Processor class so that setInput() can efficiently update an instance's internal CUDA array and processData() can synchronize the CUDA array with the OpenGL texture?
Below is my attempt at implementing such a class, contained in one CUDA C++ file along with a simple test. (Requires GLFW and GLAD.) With this implementation, I can provide some input image data, run a CUDA kernel that produces an output image, and visualize both with OpenGL textures. But it's extremely inefficient because every time setInput() is called, two OpenGL textures and two CUDA surface objects need to be created. And if more than one image is processed, two OpenGL textures and two CUDA surface objects also have to be destroyed.
#include <glad/glad.h>
#include <GLFW/glfw3.h>
#include <cudaGL.h>
#include <cuda_gl_interop.h>
#include <iostream>
/** Macro for checking if CUDA has problems */
#define cudaCheckError() { \
cudaError_t err = cudaGetLastError(); \
if(err != cudaSuccess) { \
printf("Cuda error: %s:%d: %s\n", __FILE__, __LINE__, cudaGetErrorString(err)); \
exit(1); \
} \
}
/*Window dimensions*/
const int windowWidth = 1280, windowHeight = 720;
/*Window address*/
GLFWwindow* currentGLFWWindow = 0;
/**
* A simple image processing kernel that copies the inverted data from the input surface to the output surface.
*/
__global__ void kernel(cudaSurfaceObject_t input, cudaSurfaceObject_t output, int width, int height) {
//Get the pixel index
unsigned int xPx = threadIdx.x + blockIdx.x * blockDim.x;
unsigned int yPx = threadIdx.y + blockIdx.y * blockDim.y;
//Don't do any computation if this thread is outside of the surface bounds.
if (xPx >= width || yPx >= height) return;
//Copy the contents of input to output.
uchar4 pixel = { 255,128,0,255 };
//Read a pixel from the input. Disable to default to the flat orange color above
surf2Dread<uchar4>(&pixel, input, xPx * sizeof(uchar4), yPx, cudaBoundaryModeClamp);
//Invert the color
pixel.x = ~pixel.x;
pixel.y = ~pixel.y;
pixel.z = ~pixel.z;
//Write the new pixel color to the output surface
surf2Dwrite(pixel, output, xPx * sizeof(uchar4), yPx);
}
class Processor {
public:
void setInput( uint8_t* const data, int imageWidth, int imageHeight);
void processData();
GLuint getInputTexture();
GLuint getOutputTexture();
void writeOutputTo(uint8_t* destination);
private:
/**
* @brief True if the textures and surfaces are initialized.
*
* Prevents memory leaks
*/
bool surfacesInitialized = false;
/**
* @brief The width and height of a texture/surface pair.
*
*/
struct ImgDim { int width, height; };
/**
* @brief Creates a CUDA surface object, CUDA resource, and OpenGL texture from some data.
*/
void createTextureSurfacePair(const ImgDim& dimensions, uint8_t* const data, GLuint& textureOut, cudaGraphicsResource_t& graphicsResourceOut, cudaSurfaceObject_t& surfaceOut);
/**
* @brief Destroys every CUDA surface object, CUDA resource, and OpenGL texture created by this instance.
*/
void destroyEverything();
/**
* @brief The dimensions of an image and its corresponding texture.
*
*/
ImgDim imageInputDimensions, imageOutputDimensions;
/**
* @brief A CUDA surface that can be read from, written to, or synchronized with a Mat or
* OpenGL texture
*
*/
cudaSurfaceObject_t d_imageInputTexture = 0, d_imageOutputTexture = 0;
/**
* @brief A CUDA resource that's bound to an array in CUDA memory
*/
cudaGraphicsResource_t d_imageInputGraphicsResource, d_imageOutputGraphicsResource;
/**
* @brief A renderable OpenGL texture that is synchronized with the CUDA data
* @see d_imageInputTexture, d_imageOutputTexture
*/
GLuint imageInputTexture = 0, imageOutputTexture = 0;
/** Returns true if nothing can be rendered */
bool empty() { return imageInputTexture == 0; }
};
void Processor::setInput(uint8_t* const data, int imageWidth, int imageHeight)
{
//Same-size images don't need texture regeneration, so skip that.
if (imageHeight == imageInputDimensions.height && imageWidth == imageInputDimensions.width) {
/*
Possible shortcut: we know the input is the same size as the texture and CUDA surface object.
So instead of destroying the surface and texture, why not just overwrite them?
That's what I try to do in the following block, but because "data" is BGR and the texture
is RGBA, the channels get all messed up.
*/
/*
//Use the input surface's CUDAResourceDesc to gain access to the surface data array
struct cudaResourceDesc resDesc;
memset(&resDesc, 0, sizeof(resDesc));
cudaGetSurfaceObjectResourceDesc(&resDesc, d_imageInputTexture);
cudaCheckError();
//Copy the data from the input array to the surface
cudaMemcpyToArray(resDesc.res.array.array, 0, 0, input.data, imageInputDimensions.width * imageInputDimensions.height * 3, cudaMemcpyHostToDevice);
cudaCheckError();
//Set status flags
surfacesInitialized = true;
return;
*/
}
//Clear everything that originally existed in the texture/surface
destroyEverything();
//Get the size of the image and place it here.
imageInputDimensions.width = imageWidth;
imageInputDimensions.height = imageHeight;
imageOutputDimensions.width = imageWidth;
imageOutputDimensions.height = imageHeight;
//Create the input surface/texture pair
createTextureSurfacePair(imageInputDimensions, data, imageInputTexture, d_imageInputGraphicsResource, d_imageInputTexture);
//Create the output surface/texture pair
uint8_t* outData = new uint8_t[imageOutputDimensions.width * imageOutputDimensions.height * 3];
createTextureSurfacePair(imageOutputDimensions, outData, imageOutputTexture, d_imageOutputGraphicsResource, d_imageOutputTexture);
delete[] outData;
//Set status flags
surfacesInitialized = true;
}
void Processor::processData()
{
const int threadsPerBlock = 128;
//Call the algorithm
//Set the number of blocks to call the kernel with.
dim3 blocks((unsigned int)ceil((float)imageInputDimensions.width / threadsPerBlock), imageInputDimensions.height);
kernel <<<blocks, threadsPerBlock >>> (d_imageInputTexture, d_imageOutputTexture, imageInputDimensions.width, imageInputDimensions.height);
//Sync the surface with the texture
cudaDeviceSynchronize();
cudaCheckError();
}
GLuint Processor::getInputTexture()
{
return imageInputTexture;
}
GLuint Processor::getOutputTexture()
{
return imageOutputTexture;
}
void Processor::writeOutputTo(uint8_t* destination)
{
//Haven't figured this out yet
}
void Processor::createTextureSurfacePair(const Processor::ImgDim& dimensions, uint8_t* const data, GLuint& textureOut, cudaGraphicsResource_t& graphicsResourceOut, cudaSurfaceObject_t& surfaceOut) {
// Create the OpenGL texture that will be displayed with GLAD and GLFW
glGenTextures(1, &textureOut);
// Bind to our texture handle
glBindTexture(GL_TEXTURE_2D, textureOut);
// Set texture interpolation methods for minification and magnification
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
// Set texture clamping method
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP);
// Create the texture and its attributes
glTexImage2D(GL_TEXTURE_2D, // Type of texture
0, // Pyramid level (for mip-mapping) - 0 is the top level
GL_RGBA, // Internal color format to convert to
dimensions.width, // Image width i.e. 640 for Kinect in standard mode
dimensions.height, // Image height i.e. 480 for Kinect in standard mode
0, // Border width in pixels (can either be 1 or 0)
GL_BGR, // Input image format (i.e. GL_RGB, GL_RGBA, GL_BGR etc.)
GL_UNSIGNED_BYTE, // Image data type.
data); // The actual image data itself
//Note that the type of this texture is an RGBA UNSIGNED_BYTE type. When CUDA surfaces
//are synchronized with OpenGL textures, the surfaces will be of the same type.
//They won't know or care about their data types though, for they are all just byte arrays
//at heart. So be careful to ensure that any CUDA kernel that handles a CUDA surface
//uses it as an appropriate type. You will see that the image processing kernel (defined
//above) treats each pixel as four unsigned bytes along the X-axis: one for red, green, blue,
//and alpha respectively.
//Create the CUDA array and texture reference
cudaArray* bitmap_d;
//Register the GL texture with the CUDA graphics library. A new cudaGraphicsResource is created, and its address is placed in cudaTextureID.
//Documentation: https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__OPENGL.html#group__CUDART__OPENGL_1g80d12187ae7590807c7676697d9fe03d
cudaGraphicsGLRegisterImage(&graphicsResourceOut, textureOut, GL_TEXTURE_2D,
cudaGraphicsRegisterFlagsNone);
cudaCheckError();
//Map graphics resources for access by CUDA.
//Documentation: https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__INTEROP.html#group__CUDART__INTEROP_1gad8fbe74d02adefb8e7efb4971ee6322
cudaGraphicsMapResources(1, &graphicsResourceOut, 0);
cudaCheckError();
//Get the location of the array of pixels that was mapped by the previous function and place that address in bitmap_d
//Documentation: https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__INTEROP.html#group__CUDART__INTEROP_1g0dd6b5f024dfdcff5c28a08ef9958031
cudaGraphicsSubResourceGetMappedArray(&bitmap_d, graphicsResourceOut, 0, 0);
cudaCheckError();
//Create a CUDA resource descriptor. This is used to get and set attributes of CUDA resources.
//This one will tell CUDA how we want the bitmap_surface to be configured.
//Documentation for the struct: https://docs.nvidia.com/cuda/cuda-runtime-api/structcudaResourceDesc.html#structcudaResourceDesc
struct cudaResourceDesc resDesc;
//Clear it with 0s so that some flags aren't arbitrarily left at 1s
memset(&resDesc, 0, sizeof(resDesc));
//Set the resource type to be an array for convenient processing in the CUDA kernel.
//List of resTypes: https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1g067b774c0e639817a00a972c8e2c203c
resDesc.resType = cudaResourceTypeArray;
//Bind the new descriptor with the bitmap created earlier.
resDesc.res.array.array = bitmap_d;
//Create a new CUDA surface ID reference.
//This is really just an unsigned long long.
//Documentation: https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1gbe57cf2ccbe7f9d696f18808dd634c0a
surfaceOut = 0;
//Create the surface with the given description. That surface ID is placed in bitmap_surface.
//Documentation: https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__SURFACE__OBJECT.html#group__CUDART__SURFACE__OBJECT_1g958899474ab2c5f40d233b524d6c5a01
cudaCreateSurfaceObject(&surfaceOut, &resDesc);
cudaCheckError();
}
void Processor::destroyEverything()
{
if (surfacesInitialized) {
//Input image CUDA surface
cudaDestroySurfaceObject(d_imageInputTexture);
cudaGraphicsUnmapResources(1, &d_imageInputGraphicsResource);
cudaGraphicsUnregisterResource(d_imageInputGraphicsResource);
d_imageInputTexture = 0;
//Output image CUDA surface
cudaDestroySurfaceObject(d_imageOutputTexture);
cudaGraphicsUnmapResources(1, &d_imageOutputGraphicsResource);
cudaGraphicsUnregisterResource(d_imageOutputGraphicsResource);
d_imageOutputTexture = 0;
//Input image GL texture
glDeleteTextures(1, &imageInputTexture);
imageInputTexture = 0;
//Output image GL texture
glDeleteTextures(1, &imageOutputTexture);
imageOutputTexture = 0;
surfacesInitialized = false;
}
}
/** A way to initialize OpenGL with GLFW and GLAD */
void initGL() {
// Setup window
if (!glfwInit())
return;
// Decide GL+GLSL versions
#if __APPLE__
// GL 3.2 + GLSL 150
const char* glsl_version = "#version 150";
glfwWindowHint(GLFW_CONTEXT_VERSION_MAJOR, 3);
glfwWindowHint(GLFW_CONTEXT_VERSION_MINOR, 2);
glfwWindowHint(GLFW_OPENGL_PROFILE, GLFW_OPENGL_CORE_PROFILE); // 3.2+ only
glfwWindowHint(GLFW_OPENGL_FORWARD_COMPAT, GL_TRUE); // Required on Mac
#else
// GL 3.0 + GLSL 130
const char* glsl_version = "#version 130";
glfwWindowHint(GLFW_CONTEXT_VERSION_MAJOR, 3);
glfwWindowHint(GLFW_CONTEXT_VERSION_MINOR, 0);
//glfwWindowHint(GLFW_OPENGL_PROFILE, GLFW_OPENGL_CORE_PROFILE); // 3.2+ only
//glfwWindowHint(GLFW_OPENGL_FORWARD_COMPAT, GL_TRUE); // 3.0+ only
#endif
// Create window with graphics context
currentGLFWWindow = glfwCreateWindow(windowWidth, windowHeight, "Output image (OpenGL + GLFW)", NULL, NULL);
if (currentGLFWWindow == NULL)
return;
glfwMakeContextCurrent(currentGLFWWindow);
glfwSwapInterval(3); // Enable vsync
if (!gladLoadGL()) {
// GLAD failed
printf( "GLAD failed to initialize :(" );
return;
}
//Change GL settings
glViewport(0, 0, windowWidth, windowHeight); // use a screen size of WIDTH x HEIGHT
glMatrixMode(GL_PROJECTION); // Make a simple 2D projection on the entire window
glLoadIdentity();
glOrtho(0.0, windowWidth, windowHeight, 0.0, 0.0, 100.0);
glMatrixMode(GL_MODELVIEW); // Set the matrix mode to object modeling
glClearColor(0.0f, 0.0f, 0.0f, 0.0f);
glClearDepth(0.0f);
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT); // Clear the window
}
/** Renders the textures on the GLFW window and requests GLFW to update */
void showTextures(GLuint top, GLuint bottom) {
// Clear color and depth buffers
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
glMatrixMode(GL_MODELVIEW); // Operate on model-view matrix
glBindTexture(GL_TEXTURE_2D, top);
/* Draw top quad */
glEnable(GL_TEXTURE_2D);
glBegin(GL_QUADS);
glTexCoord2i(0, 0); glVertex2i(0, 0);
glTexCoord2i(0, 1); glVertex2i(0, windowHeight/2);
glTexCoord2i(1, 1); glVertex2i(windowWidth, windowHeight / 2);
glTexCoord2i(1, 0); glVertex2i(windowWidth, 0);
glEnd();
glDisable(GL_TEXTURE_2D);
/* Draw bottom quad */
glBindTexture(GL_TEXTURE_2D, bottom);
glEnable(GL_TEXTURE_2D);
glBegin(GL_QUADS);
glTexCoord2i(0, 0); glVertex2i(0, windowHeight / 2);
glTexCoord2i(0, 1); glVertex2i(0, windowHeight);
glTexCoord2i(1, 1); glVertex2i(windowWidth, windowHeight);
glTexCoord2i(1, 0); glVertex2i(windowWidth, windowHeight / 2);
glEnd();
glDisable(GL_TEXTURE_2D);
glfwSwapBuffers(currentGLFWWindow);
glfwPollEvents();
}
int main() {
initGL();
int imageWidth = windowWidth;
int imageHeight = windowHeight / 2;
uint8_t* imageData = new uint8_t[imageWidth * imageHeight * 3];
Processor p;
while (!glfwWindowShouldClose(currentGLFWWindow))
{
//Process the image here
p.setInput(imageData, imageWidth, imageHeight);
p.processData();
showTextures(p.getInputTexture(), p.getOutputTexture());
}
}
TL;DR: I can see at least 2 ways forward here, either convert your data to 4 byte pixels (somehow) and use cudaMemcpy2DToArray, or allow the CUDA kernel to take in raw data (instead of using a surface as input). I'll try to demonstrate both, although I don't wish to put in a large effort at polishing this, so really just demonstrating ideas.
This answer works off the code you provided in an edit, which is not your latest. In the subsequent edits you mainly seem to be ripping out OpenCV, which I would normally applaud; however, since I've worked off your edit that had OpenCV in it, I've elected to use an OpenCV "test case" of my own.
Option 1: Using 4-byte-per-pixel data and cudaMemcpy2DToArray. This seems to adhere most closely to what you have demonstrated, albeit commented out. The idea is that we will access the input data by copying it directly to the CUDA array (acquired from the interop mechanism). As you had previously pointed out, cudaMemcpyToArray is deprecated, so we won't use that. Furthermore, our data format (bytes per pixel) has to match what is in the array. I think there are a number of ways to solve this, depending on your overall pipeline, but the approach I show here isn't efficient; it's just to demonstrate that the method is "workable". If there is a way to use 4-byte-per-pixel data in your pipeline, however, you may be able to get rid of the "inefficiency" here. To use this method, compile the code with the -DUSE_1 switch.
Option 2: Input of the data through the kernel. We can skip the inefficiency of the first case by allowing the kernel to do the 3-byte-to-4-byte conversion of the data on the fly. Either way, there is a copy of data from host to device, but this method doesn't require 4-byte-per-pixel input data.
Here is code demonstrating both options:
//nvcc -arch=sm_35 -o t19 glad/src/glad.c t19.cu -lGL -lGLU -I./glad/include -lglfw -std=c++11 -lopencv_core -lopencv_highgui -lopencv_imgcodecs -Wno-deprecated-gpu-targets
#include <glad/glad.h>
#include <GLFW/glfw3.h>
#include <cudaGL.h>
#include <cuda_gl_interop.h>
#include <iostream>
#include <opencv2/highgui.hpp>
/** Macro for checking if CUDA has problems */
#define cudaCheckError() { \
cudaError_t err = cudaGetLastError(); \
if(err != cudaSuccess) { \
printf("Cuda error: %s:%d: %s\n", __FILE__, __LINE__, cudaGetErrorString(err)); \
exit(1); \
} \
}
/*Window dimensions*/
//const int windowWidth = 1280, windowHeight = 720;
/*Window address*/
GLFWwindow* currentGLFWWindow = 0;
/**
* A simple image processing kernel that copies the inverted data from the input surface to the output surface.
*/
__global__ void kernel(cudaSurfaceObject_t input, cudaSurfaceObject_t output, int width, int height, uint8_t *data) {
//Get the pixel index
unsigned int xPx = threadIdx.x + blockIdx.x * blockDim.x;
unsigned int yPx = threadIdx.y + blockIdx.y * blockDim.y;
//Don't do any computation if this thread is outside of the surface bounds.
if (xPx >= width || yPx >= height) return;
//Copy the contents of input to output.
#ifdef USE_1
uchar4 pixel = { 255,128,0,255 };
//Read a pixel from the input. Disable to default to the flat orange color above
surf2Dread<uchar4>(&pixel, input, xPx * sizeof(uchar4), yPx, cudaBoundaryModeClamp);
#else
uchar4 pixel;
pixel.x = data[(xPx+yPx*width)*3 + 0];
pixel.y = data[(xPx+yPx*width)*3 + 1];
pixel.z = data[(xPx+yPx*width)*3 + 2];
pixel.w = 255;
surf2Dwrite(pixel, input, xPx * sizeof(uchar4), yPx);
#endif
//Invert the color
pixel.x = ~pixel.x;
pixel.y = ~pixel.y;
pixel.z = ~pixel.z;
//Write the new pixel color to the output surface
surf2Dwrite(pixel, output, xPx * sizeof(uchar4), yPx);
}
class Processor {
public:
void setInput( uint8_t* const data, int imageWidth, int imageHeight);
void processData(uint8_t *data, uint8_t *d_data);
GLuint getInputTexture();
GLuint getOutputTexture();
void writeOutputTo(uint8_t* destination);
private:
/**
* @brief True if the textures and surfaces are initialized.
*
* Prevents memory leaks
*/
bool surfacesInitialized = false;
/**
* @brief The width and height of a texture/surface pair.
*
*/
struct ImgDim { int width, height; };
/**
* @brief Creates a CUDA surface object, CUDA resource, and OpenGL texture from some data.
*/
void createTextureSurfacePair(const ImgDim& dimensions, uint8_t* const data, GLuint& textureOut, cudaGraphicsResource_t& graphicsResourceOut, cudaSurfaceObject_t& surfaceOut);
/**
* @brief Destroys every CUDA surface object, CUDA resource, and OpenGL texture created by this instance.
*/
void destroyEverything();
/**
* @brief The dimensions of an image and its corresponding texture.
*
*/
ImgDim imageInputDimensions, imageOutputDimensions;
/**
* @brief A CUDA surface that can be read from, written to, or synchronized with a Mat or
* OpenGL texture
*
*/
cudaSurfaceObject_t d_imageInputTexture = 0, d_imageOutputTexture = 0;
/**
* @brief A CUDA resource that's bound to an array in CUDA memory
*/
cudaGraphicsResource_t d_imageInputGraphicsResource, d_imageOutputGraphicsResource;
/**
* @brief A renderable OpenGL texture that is synchronized with the CUDA data
* @see d_imageInputTexture, d_imageOutputTexture
*/
GLuint imageInputTexture = 0, imageOutputTexture = 0;
/** Returns true if nothing can be rendered */
bool empty() { return imageInputTexture == 0; }
};
void Processor::setInput(uint8_t* const data, int imageWidth, int imageHeight)
{
//Same-size images don't need texture regeneration, so skip that.
if (imageHeight == imageInputDimensions.height && imageWidth == imageInputDimensions.width) {
/*
Possible shortcut: we know the input is the same size as the texture and CUDA surface object.
So instead of destroying the surface and texture, why not just overwrite them?
That's what I try to do in the following block, but because "data" is BGR and the texture
is RGBA, the channels get all messed up.
*/
//Use the input surface's CUDAResourceDesc to gain access to the surface data array
#ifdef USE_1
struct cudaResourceDesc resDesc;
memset(&resDesc, 0, sizeof(resDesc));
cudaGetSurfaceObjectResourceDesc(&resDesc, d_imageInputTexture);
cudaCheckError();
uint8_t *data4 = new uint8_t[imageInputDimensions.width*imageInputDimensions.height*4];
for (int i = 0; i < imageInputDimensions.width*imageInputDimensions.height; i++){
data4[i*4+0] = data[i*3+0];
data4[i*4+1] = data[i*3+1];
data4[i*4+2] = data[i*3+2];
data4[i*4+3] = 255;}
//Copy the data from the input array to the surface
// cudaMemcpyToArray(resDesc.res.array.array, 0, 0, data, imageInputDimensions.width * imageInputDimensions.height * 3, cudaMemcpyHostToDevice);
cudaMemcpy2DToArray(resDesc.res.array.array, 0, 0, data4, imageInputDimensions.width*4, imageInputDimensions.width*4, imageInputDimensions.height, cudaMemcpyHostToDevice);
cudaCheckError();
delete[] data4;
#endif
//Set status flags
surfacesInitialized = true;
return;
}
//Clear everything that originally existed in the texture/surface
destroyEverything();
//Get the size of the image and place it here.
imageInputDimensions.width = imageWidth;
imageInputDimensions.height = imageHeight;
imageOutputDimensions.width = imageWidth;
imageOutputDimensions.height = imageHeight;
//Create the input surface/texture pair
createTextureSurfacePair(imageInputDimensions, data, imageInputTexture, d_imageInputGraphicsResource, d_imageInputTexture);
//Create the output surface/texture pair
uint8_t* outData = new uint8_t[imageOutputDimensions.width * imageOutputDimensions.height * 3];
createTextureSurfacePair(imageOutputDimensions, outData, imageOutputTexture, d_imageOutputGraphicsResource, d_imageOutputTexture);
delete[] outData;
//Set status flags
surfacesInitialized = true;
}
void Processor::processData(uint8_t *data, uint8_t *d_data)
{
const int threadsPerBlock = 128;
//Call the algorithm
//Set the number of blocks to call the kernel with.
dim3 blocks((unsigned int)ceil((float)imageInputDimensions.width / threadsPerBlock), imageInputDimensions.height);
#ifndef USE_1
cudaMemcpy(d_data, data, imageInputDimensions.width*imageInputDimensions.height*3, cudaMemcpyHostToDevice);
#endif
kernel <<<blocks, threadsPerBlock >>> (d_imageInputTexture, d_imageOutputTexture, imageInputDimensions.width, imageInputDimensions.height, d_data);
//Sync the surface with the texture
cudaDeviceSynchronize();
cudaCheckError();
}
GLuint Processor::getInputTexture()
{
return imageInputTexture;
}
GLuint Processor::getOutputTexture()
{
return imageOutputTexture;
}
void Processor::writeOutputTo(uint8_t* destination)
{
//Haven't figured this out yet
}
void Processor::createTextureSurfacePair(const Processor::ImgDim& dimensions, uint8_t* const data, GLuint& textureOut, cudaGraphicsResource_t& graphicsResourceOut, cudaSurfaceObject_t& surfaceOut) {
// Create the OpenGL texture that will be displayed with GLAD and GLFW
glGenTextures(1, &textureOut);
// Bind to our texture handle
glBindTexture(GL_TEXTURE_2D, textureOut);
// Set texture interpolation methods for minification and magnification
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
// Set texture clamping method
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP);
// Create the texture and its attributes
glTexImage2D(GL_TEXTURE_2D, // Type of texture
0, // Pyramid level (for mip-mapping) - 0 is the top level
GL_RGBA, // Internal color format to convert to
dimensions.width, // Image width i.e. 640 for Kinect in standard mode
dimensions.height, // Image height i.e. 480 for Kinect in standard mode
0, // Border width in pixels (can either be 1 or 0)
GL_BGR, // Input image format (i.e. GL_RGB, GL_RGBA, GL_BGR etc.)
GL_UNSIGNED_BYTE, // Image data type.
data); // The actual image data itself
//Note that the type of this texture is an RGBA UNSIGNED_BYTE type. When CUDA surfaces
//are synchronized with OpenGL textures, the surfaces will be of the same type.
//They won't know or care about their data types though, for they are all just byte arrays
//at heart. So be careful to ensure that any CUDA kernel that handles a CUDA surface
//uses it as an appropriate type. You will see that the image processing kernel (defined
//above) treats each pixel as four unsigned bytes along the X-axis: one for red, green, blue,
//and alpha respectively.
//Create the CUDA array and texture reference
cudaArray* bitmap_d;
//Register the GL texture with the CUDA graphics library. A new cudaGraphicsResource is created, and its address is placed in cudaTextureID.
//Documentation: https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__OPENGL.html#group__CUDART__OPENGL_1g80d12187ae7590807c7676697d9fe03d
cudaGraphicsGLRegisterImage(&graphicsResourceOut, textureOut, GL_TEXTURE_2D,
cudaGraphicsRegisterFlagsNone);
cudaCheckError();
//Map graphics resources for access by CUDA.
//Documentation: https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__INTEROP.html#group__CUDART__INTEROP_1gad8fbe74d02adefb8e7efb4971ee6322
cudaGraphicsMapResources(1, &graphicsResourceOut, 0);
cudaCheckError();
//Get the location of the array of pixels that was mapped by the previous function and place that address in bitmap_d
//Documentation: https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__INTEROP.html#group__CUDART__INTEROP_1g0dd6b5f024dfdcff5c28a08ef9958031
cudaGraphicsSubResourceGetMappedArray(&bitmap_d, graphicsResourceOut, 0, 0);
cudaCheckError();
//Create a CUDA resource descriptor. This is used to get and set attributes of CUDA resources.
//This one will tell CUDA how we want the bitmap_surface to be configured.
//Documentation for the struct: https://docs.nvidia.com/cuda/cuda-runtime-api/structcudaResourceDesc.html#structcudaResourceDesc
struct cudaResourceDesc resDesc;
//Clear it with 0s so that some flags aren't arbitrarily left at 1s
memset(&resDesc, 0, sizeof(resDesc));
//Set the resource type to be an array for convenient processing in the CUDA kernel.
//List of resTypes: https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1g067b774c0e639817a00a972c8e2c203c
resDesc.resType = cudaResourceTypeArray;
//Bind the new descriptor with the bitmap created earlier.
resDesc.res.array.array = bitmap_d;
//Create a new CUDA surface ID reference.
//This is really just an unsigned long long.
//Documentation: https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1gbe57cf2ccbe7f9d696f18808dd634c0a
surfaceOut = 0;
//Create the surface with the given description. That surface ID is placed in bitmap_surface.
//Documentation: https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__SURFACE__OBJECT.html#group__CUDART__SURFACE__OBJECT_1g958899474ab2c5f40d233b524d6c5a01
cudaCreateSurfaceObject(&surfaceOut, &resDesc);
cudaCheckError();
}
void Processor::destroyEverything()
{
if (surfacesInitialized) {
//Input image CUDA surface
cudaDestroySurfaceObject(d_imageInputTexture);
cudaGraphicsUnmapResources(1, &d_imageInputGraphicsResource);
cudaGraphicsUnregisterResource(d_imageInputGraphicsResource);
d_imageInputTexture = 0;
//Output image CUDA surface
cudaDestroySurfaceObject(d_imageOutputTexture);
cudaGraphicsUnmapResources(1, &d_imageOutputGraphicsResource);
cudaGraphicsUnregisterResource(d_imageOutputGraphicsResource);
d_imageOutputTexture = 0;
//Input image GL texture
glDeleteTextures(1, &imageInputTexture);
imageInputTexture = 0;
//Output image GL texture
glDeleteTextures(1, &imageOutputTexture);
imageOutputTexture = 0;
surfacesInitialized = false;
}
}
/** A way to initialize OpenGL with GLFW and GLAD */
void initGL(int windowWidth, int windowHeight) {
// Setup window
if (!glfwInit())
return;
// Decide GL+GLSL versions
#if __APPLE__
// GL 3.2 + GLSL 150
const char* glsl_version = "#version 150";
glfwWindowHint(GLFW_CONTEXT_VERSION_MAJOR, 3);
glfwWindowHint(GLFW_CONTEXT_VERSION_MINOR, 2);
glfwWindowHint(GLFW_OPENGL_PROFILE, GLFW_OPENGL_CORE_PROFILE); // 3.2+ only
glfwWindowHint(GLFW_OPENGL_FORWARD_COMPAT, GL_TRUE); // Required on Mac
#else
// GL 3.0 + GLSL 130
//const char* glsl_version = "#version 130";
glfwWindowHint(GLFW_CONTEXT_VERSION_MAJOR, 3);
glfwWindowHint(GLFW_CONTEXT_VERSION_MINOR, 0);
//glfwWindowHint(GLFW_OPENGL_PROFILE, GLFW_OPENGL_CORE_PROFILE); // 3.2+ only
//glfwWindowHint(GLFW_OPENGL_FORWARD_COMPAT, GL_TRUE); // 3.0+ only
#endif
// Create window with graphics context
currentGLFWWindow = glfwCreateWindow(windowWidth, windowHeight, "Output image (OpenGL + GLFW)", NULL, NULL);
if (currentGLFWWindow == NULL)
return;
glfwMakeContextCurrent(currentGLFWWindow);
glfwSwapInterval(3); // Enable vsync
if (!gladLoadGL()) {
// GLAD failed
printf( "GLAD failed to initialize :(" );
return;
}
//Change GL settings
glViewport(0, 0, windowWidth, windowHeight); // use a screen size of WIDTH x HEIGHT
glMatrixMode(GL_PROJECTION); // Make a simple 2D projection on the entire window
glLoadIdentity();
glOrtho(0.0, windowWidth, windowHeight, 0.0, 0.0, 100.0);
glMatrixMode(GL_MODELVIEW); // Set the matrix mode to object modeling
glClearColor(0.0f, 0.0f, 0.0f, 0.0f);
glClearDepth(0.0f);
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT); // Clear the window
}
/** Renders the textures on the GLFW window and requests GLFW to update */
void showTextures(GLuint top, GLuint bottom, int windowWidth, int windowHeight) {
// Clear color and depth buffers
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
glMatrixMode(GL_MODELVIEW); // Operate on model-view matrix
glBindTexture(GL_TEXTURE_2D, top);
/* Draw top quad */
glEnable(GL_TEXTURE_2D);
glBegin(GL_QUADS);
glTexCoord2i(0, 0); glVertex2i(0, 0);
glTexCoord2i(0, 1); glVertex2i(0, windowHeight/2);
glTexCoord2i(1, 1); glVertex2i(windowWidth, windowHeight / 2);
glTexCoord2i(1, 0); glVertex2i(windowWidth, 0);
glEnd();
glDisable(GL_TEXTURE_2D);
/* Draw bottom quad */
glBindTexture(GL_TEXTURE_2D, bottom);
glEnable(GL_TEXTURE_2D);
glBegin(GL_QUADS);
glTexCoord2i(0, 0); glVertex2i(0, windowHeight / 2);
glTexCoord2i(0, 1); glVertex2i(0, windowHeight);
glTexCoord2i(1, 1); glVertex2i(windowWidth, windowHeight);
glTexCoord2i(1, 0); glVertex2i(windowWidth, windowHeight / 2);
glEnd();
glDisable(GL_TEXTURE_2D);
glfwSwapBuffers(currentGLFWWindow);
glfwPollEvents();
}
int main() {
using namespace cv;
using namespace std;
// initGL();
std::string filename = "./lena.pgm";
Mat image;
image = imread(filename, CV_LOAD_IMAGE_COLOR); // Read the file
if(! image.data ) // Check for invalid input
{
cout << "Could not open or find the image" << std::endl ;
return -1;
}
int windoww = 1280;
int windowh = 720;
initGL(windoww,windowh);
uint8_t *d_data;
cudaMalloc(&d_data, image.cols*image.rows*3);
Processor p;
for (int i = 0; i < image.cols; i++)
{
image.data[i*3+0] = 0;
image.data[i*3+1] = 0;
image.data[i*3+2] = 0;
//Process the image here
p.setInput(image.data, image.cols, image.rows);
p.processData(image.data, d_data);
showTextures(p.getInputTexture(), p.getOutputTexture(), windoww, windowh);
}
}
Notes:
The compilation command is given in the comment in the first line
I created a "video" of sorts using a single image. The "video" will show the image with a black or white line moving horizontally from left to right in the top pixel row of the image. The input image is lena.pgm which can be found in the CUDA samples (for example, at /usr/local/cuda-10.1/samples/3_Imaging/SobelFilter/data/lena.pgm).
It looks to me like you are "sharing" resources between OpenGL and CUDA. This doesn't look like the right map/unmap sequence to me, but it seems to be working, and it doesn't seem to be the focus of your question. I haven't spent any time investigating. I may have missed something. (A sketch of the map/unmap sequence I would expect follows after these notes.)
I'm not suggesting this code is defect free or suitable for any particular purpose. It is mostly your code. I've modified it slightly to demonstrate some ideas described in the text.
There shouldn't be any visual difference in the output whether you compile with -DUSE_1 or not.
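For reference, the per-frame map/unmap pattern I would normally expect looks roughly like the sketch below. This is a minimal sketch, assuming graphicsResource has already been registered with cudaGraphicsGLRegisterImage and that the surface object creation and kernel launch happen in the commented spot; it is not a drop-in replacement for the code above.
// Map the registered resource so CUDA may touch the texture's storage
cudaGraphicsMapResources(1, &graphicsResource, 0);
cudaArray_t array;
cudaGraphicsSubResourceGetMappedArray(&array, graphicsResource, 0, 0);
// ... build a surface object over 'array' and launch the kernel here ...
// Unmap before OpenGL samples or renders the texture again
cudaGraphicsUnmapResources(1, &graphicsResource, 0);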
This is a useful feature that I first came across in https://www.3dgep.com/opengl-interoperability-with-cuda/, and I have improved upon it to use the latest CUDA APIs and flow. You can refer to these 2 functions in cudammf:
https://github.com/prabindh/cudammf/blob/5f93358784fcbaae7eea0850424c59d2ed057dab/cuda_postproces.cu#L119
https://github.com/prabindh/cudammf/blob/5f93358784fcbaae7eea0850424c59d2ed057dab/decoder3.cpp#L507
The basic workflow is as below:
Create a regular GL texture (GLTextureId) and map it for CUDA access via cudaGraphicsGLRegisterImage
Do some CUDA processing; the result ends up in a CUDA buffer
Use cudaMemcpyToArray to transfer between the above two device memories (a minimal sketch follows below)
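A minimal sketch of those three steps, where glTextureId is an existing GL texture and d_result is assumed to be a device buffer holding a width x height RGBA8 image (both are placeholder names). Note that cudaMemcpyToArray is deprecated on recent toolkits, so cudaMemcpy2DToArray is used here instead:
// 1. Register the GL texture for CUDA access (once)
cudaGraphicsResource_t resource;
cudaGraphicsGLRegisterImage(&resource, glTextureId, GL_TEXTURE_2D, cudaGraphicsRegisterFlagsNone);
// 2. CUDA processing has left an RGBA8 result in the device buffer d_result
// 3. Copy the device buffer into the texture's backing CUDA array (per frame)
cudaGraphicsMapResources(1, &resource, 0);
cudaArray_t array;
cudaGraphicsSubResourceGetMappedArray(&array, resource, 0, 0);
cudaMemcpy2DToArray(array, 0, 0, d_result, width * 4, width * 4, height, cudaMemcpyDeviceToDevice);
cudaGraphicsUnmapResources(1, &resource, 0);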
If your output is coming from an Nvidia codec, you should also refer to the AppDecGL sample in the Nvidia Video Codec SDK (https://developer.nvidia.com/nvidia-video-codec-sdk).
I'm trying to colour terrain points based on texture colour (currently hard-coded to vec2(0.5, 0.5) for test purposes, which should be light blue), but all the points are grey. glGetError returns 0 throughout the whole process. I think I might be doing the render process wrong or have a problem with my shaders(?)
Vertex Shader:
void main(){
gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;
}
Fragment Shader:
uniform sampler2D myTextureSampler;
void main(){
gl_FragColor = texture2D(myTextureSampler, vec2(0.5, 0.5));
}
Terrain Class:
class Terrain
{
public:
Terrain(GLuint pProgram, char* pHeightmap, char* pTexture){
if(!LoadTerrain(pHeightmap))
{
OutputDebugString("Loading terrain failed.\n");
}
if(!LoadTexture(pTexture))
{
OutputDebugString("Loading terrain texture failed.\n");
}
mProgram = pProgram;
mTextureLocation = glGetUniformLocation(pProgram, "myTextureSampler");
};
~Terrain(){};
void Draw()
{
glEnableClientState(GL_VERTEX_ARRAY); // Uncommenting this causes me to see nothing at all
glBindBuffer(GL_ARRAY_BUFFER, mVBO);
glVertexPointer(3, GL_FLOAT, 0, 0);
glEnable( GL_TEXTURE_2D );
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, mBMP);
glProgramUniform1i(mProgram, mTextureLocation, 0);
GLenum a = glGetError();
glPointSize(5.0f);
glDrawArrays(GL_POINTS, 0, mNumberPoints);
a = glGetError();
glBindBuffer(GL_ARRAY_BUFFER, 0);
glDisable( GL_TEXTURE_2D );
glDisableClientState(GL_VERTEX_ARRAY);
}
private:
GLuint mVBO, mBMP, mUV, mTextureLocation, mProgram;
int mWidth;
int mHeight;
int mNumberPoints;
bool LoadTerrain(char* pFile)
{
/* Definitely no problem here - Vertex data is fine and rendering nice and dandy */
}
// TEXTURES MUST BE POWER OF TWO!!
bool LoadTexture(char *pFile)
{
unsigned char header[54]; // Each BMP file begins by a 54-bytes header
unsigned int dataPos; // Position in the file where the actual data begins
unsigned int width, height;
unsigned int imageSize;
unsigned char * data;
FILE * file = fopen(pFile, "rb");
if(!file)
return false;
if(fread(header, 1, 54, file) != 54)
{
fclose(file);
return false;
}
if ( header[0]!='B' || header[1]!='M' )
{
fclose(file);
return false;
}
// Read ints from the byte array
dataPos = *(int*)&(header[0x0A]);
imageSize = *(int*)&(header[0x22]);
width = *(int*)&(header[0x12]);
height = *(int*)&(header[0x16]);
// Some BMP files are misformatted, guess missing information
if (imageSize==0) imageSize=width*height*3; // 3 : one byte for each Red, Green and Blue component
if (dataPos==0) dataPos=54; // The BMP header is done that way
// Create a buffer
data = new unsigned char [imageSize];
// Read the actual data from the file into the buffer
fread(data,1,imageSize,file);
//Everything is in memory now, the file can be closed
fclose(file);
// Create one OpenGL texture
glGenTextures(1, &mBMP);
// "Bind" the newly created texture : all future texture functions will modify this texture
glBindTexture(GL_TEXTURE_2D, mBMP);
// Give the image to OpenGL
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, width, height, 0, GL_BGR, GL_UNSIGNED_BYTE, data);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexEnvf( GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_MODULATE );
glTexParameterf( GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_REPEAT );
glTexParameterf( GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_REPEAT );
delete [] data;
data = 0;
return true;
}
};
Answering my own question in case anyone has a similar problem:
I had tested this with multiple images, but it turns out there's a bug in my graphics application of choice, which has been exporting 8-bit bitmaps even though I explicitly told it to export 24-bit bitmaps. So basically, reverting to MS Paint solved my problem. Three cheers for MS Paint.
I'm rendering a scene using OpenGL, with textures loaded using some sample code and the FreeImage API.
Here is a link to what I'm seeing:
[Image Removed]
I can confirm that all texture coordinates are provided to glTexCoord2f between 0.0f and 1.0f as required along with the vertex coordinates.
Each of the rendered triangles appears to have the full texture pasted across it (and repeated) not the area of the texture specified by the coordinates.
The texture is 1024x1024.
This is the function for loading the texture
bool loadImageToTexture(const char* image_path, unsigned int &handle)
{
if(!image_path)
return false;
FREE_IMAGE_FORMAT fif = FIF_UNKNOWN;
FIBITMAP* dib(0);
BYTE* bits(0);
unsigned int width(0), height(0);
//open the file
fif = FreeImage_GetFileType(image_path, 0);
if(fif == FIF_UNKNOWN)
fif = FreeImage_GetFIFFromFilename(image_path);
if(fif == FIF_UNKNOWN)
return false;
if(FreeImage_FIFSupportsReading(fif))
dib = FreeImage_Load(fif, image_path);
if(!dib)
return false;
//is the file of the correct type
FREE_IMAGE_COLOR_TYPE type = FreeImage_GetColorType(dib);
if(FIC_RGBALPHA != type)
{
//convert to type
FIBITMAP* ndib = FreeImage_ConvertTo32Bits(dib);
dib = ndib;
}
//get data for glTexImage2D
bits = FreeImage_GetBits(dib);
width = FreeImage_GetWidth(dib);
height = FreeImage_GetHeight(dib);
if((bits == 0) || (width == 0) || (height == 0))
return false;
//create the texture in open gl
glGenTextures(1, &handle);
glBindTexture(GL_TEXTURE_2D, handle);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_REPEAT);
glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_REPEAT);
glPixelStorei(GL_TEXTURE_2D, 4);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height,
0, GL_RGBA, GL_UNSIGNED_BYTE, bits);
//unload the image now loaded
FreeImage_Unload(dib);
return true;
}
These are the functions for rendering
inline void glTexture(float textureSize, float t, float u)
{
glTexCoord2f(u/textureSize, (textureSize - t)/textureSize);
}
void processTriangle(const btVector3* triangle, const textureCoord* tc, const btVector3& normal,
const int partId, const int triangleIndex, const bool wireframe)
{
if(wireframe)
{
glBegin(GL_LINES);
glColor3f(1, 0, 0);
glVertex3d(triangle[0].getX(), triangle[0].getY(), triangle[0].getZ());
glVertex3d(triangle[1].getX(), triangle[1].getY(), triangle[1].getZ());
glColor3f(0, 1, 0);
glVertex3d(triangle[2].getX(), triangle[2].getY(), triangle[2].getZ());
glVertex3d(triangle[1].getX(), triangle[1].getY(), triangle[1].getZ());
glColor3f(0, 0, 1);
glVertex3d(triangle[2].getX(), triangle[2].getY(), triangle[2].getZ());
glVertex3d(triangle[0].getX(), triangle[0].getY(), triangle[0].getZ());
glEnd();
}
else
{
glBegin(GL_TRIANGLES);
glColor3f(1, 1, 1);
//Normals are per triangle
glNormal3f(normal.getX(), normal.getY(), normal.getZ());
glTexture(1024.0f, tc[0].t, tc[0].u);
glVertex3d(triangle[0].getX(), triangle[0].getY(), triangle[0].getZ());
glTexture(1024.0f, tc[1].t, tc[1].u);
glVertex3d(triangle[1].getX(), triangle[1].getY(), triangle[1].getZ());
glTexture(1024.0f, tc[2].t, tc[2].u);
glVertex3d(triangle[2].getX(), triangle[2].getY(), triangle[2].getZ());
glEnd();
}
}
void processAllTriangles(const btVector3& worldBoundsMin, const btVector3& worldBoundsMax)
{
btVector3 triangle[3];
textureCoord tc[3];
//go through the index list build triangles and draw them
unsigned int k = 0;
for(unsigned int i = 0; i<dVertices.size(); i+=3, k++)
{
//first vertex
triangle[0] = dVertices[i];
tc[0] = dTexCoords[i];
//second vertex
triangle[1] = dVertices[i+1];
tc[1] = dTexCoords[i+1];
//third vertex
triangle[2] = dVertices[i+2];
tc[2] = dTexCoords[i+2];
processTriangle(triangle, tc, dNormals[k], 0, 0, false);
}
}
//draw the world given the players position
void render(btScalar* m, const btCollisionShape* shape, const btVector3& color, int debugMode,
const btVector3& worldBoundsMin, const btVector3& worldBoundsMax)
{
//render the world using the generated OpenGL lists
//enable and specify pointers to vertex arrays
glPushMatrix();
glMultMatrixf(m);
glMatrixMode(GL_TEXTURE);
glLoadIdentity();
glMatrixMode(GL_MODELVIEW);
glEnable(GL_TEXTURE_2D);
glShadeModel(GL_SMOOTH);
glEnable(GL_BLEND);
glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
glBindTexture(GL_TEXTURE_2D, m_texturehandle);
processAllTriangles(worldBoundsMin, worldBoundsMax);
glPopMatrix();
}
I'm working in a larger code base where texturing is being done differently in different areas, meaning state from those other textured objects was not being disabled.
Before per-vertex texturing is done, make sure to turn off other varieties of texturing:
glDisable(GL_TEXTURE_GEN_S);
glDisable(GL_TEXTURE_GEN_T);
glDisable(GL_TEXTURE_GEN_R);
I want to draw a 2D array of pixel data (RGB / grayscale values) on the screen as fast as possible, using OpenGL. The pixel data changes frequently.
I had hoped that I would find a simple function that would let me push in a pointer to an array representing the pixel data, since this is probably the fastest approach. Unfortunately, I have found no such function.
What is the best way to accomplish this task?
Maybe glDrawPixels is the function you are looking for? Though if the data is static it would be better to create a texture with it, and then draw that each frame.
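A minimal glDrawPixels sketch, assuming width, height and pixels (placeholder names) describe a tightly packed 8-bit RGB buffer and the context still has the default projection and modelview matrices:
glPixelStorei(GL_UNPACK_ALIGNMENT, 1);                          // rows are tightly packed
glRasterPos2i(-1, -1);                                          // start drawing at the lower-left corner
glDrawPixels(width, height, GL_RGB, GL_UNSIGNED_BYTE, pixels);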
I recently had a similar problem, as I am trying to render a video to the screen (i.e. repeatedly upload pixel data to VRAM). My approach is:
use glTexImage2D and glTexSubImage2D to upload the data to the texture (i.e. bind the texture, and texture unit if applicable, before calling them); see the sketch after this list
in my case as the video frame rate (usually about 24 fps) is lower than the framerate of my application (aimed at 60 fps), in order to avoid uploading the same data again I use a framebuffer object (check out glGenFramebuffers/glBindFramebuffer/glDeleteFramebuffers) and link my texture with the framebuffer (glFramebufferTexture2D). I then upload that texture once, and draw the same frame multiple times (just normal texture access with glBindTexture)
I don't know which platform you are using, but as I am targeting Mac I use some Apple extensions to ensure the data transfer to the VRAM happens through DMA (i.e. make glTexSubImage2D return immediately to let the CPU do other work) - please feel free to ask me for more info if you are using Mac too
also as you are using just grayscale, you might want to consider just using a GL_LUMINANCE texture (ie 1 byte per pixel) rather than RGB based format to make the upload faster (but that depends on the size of your texture data, I was streaming HD 1920x1080 video so I needed to make sure to keep it down)
also be aware of the format your hardware is using to avoid unnecessary data conversions (ie normally it seems better to use BGRA data than for example just RGB)
finally, in my code I replaced all the fixed pipeline functionality with shaders (in particular the conversion of the data from grayscale or YUV format to RGB), but again all that depends on the size of your data, and the workload of your CPU or GPU
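To make the first bullet above concrete, here is a minimal per-frame upload sketch, assuming texId (a placeholder name) was allocated once with glTexImage2D at the video dimensions and frameData points at the new frame in BGRA order:
glBindTexture(GL_TEXTURE_2D, texId);
// Re-specify only the pixel data; the storage allocated earlier by glTexImage2D is reused
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, frameWidth, frameHeight, GL_BGRA, GL_UNSIGNED_BYTE, frameData);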
Hope this helps, feel free to message me if you need further info
I would think the fastest way would be to draw a screen sized quad with ortho projection and use a pixel shader and Texture Buffer Object to draw directly to the texture in the pixel shader. Due to latency transferring to/from the TBO you may want to see if double buffering would help.
If speed isn't much of a concern (you just need fairly interactive framerates) glDrawPixels is easy to use and works well enough for many purposes.
My solution for getting dynamically changing image data to the screen in OpenGL:
#define WIN32_LEAN_AND_MEAN
#include "wx/wx.h"
#include "wx/sizer.h"
#include "wx/glcanvas.h"
#include "BasicGLPane.h"
// include OpenGL
#ifdef __WXMAC__
#include "OpenGL/glu.h"
#include "OpenGL/gl.h"
#else
#include <GL/glu.h>
#include <GL/gl.h>
#endif
#include "ORIScanMainFrame.h"
BEGIN_EVENT_TABLE(BasicGLPane, wxGLCanvas)
EVT_MOTION(BasicGLPane::mouseMoved)
EVT_LEFT_DOWN(BasicGLPane::mouseDown)
EVT_LEFT_UP(BasicGLPane::mouseReleased)
EVT_RIGHT_DOWN(BasicGLPane::rightClick)
EVT_LEAVE_WINDOW(BasicGLPane::mouseLeftWindow)
EVT_SIZE(BasicGLPane::resized)
EVT_KEY_DOWN(BasicGLPane::keyPressed)
EVT_KEY_UP(BasicGLPane::keyReleased)
EVT_MOUSEWHEEL(BasicGLPane::mouseWheelMoved)
EVT_PAINT(BasicGLPane::render)
END_EVENT_TABLE()
// Test data for image generation. floats range 0.0 to 1.0, in RGBRGBRGB... order.
// Array is 1024 * 3 long. Note that 32 * 32 is 1024 and is the largest image we can randomly generate.
float* randomFloatRGB;
float* randomFloatRGBGrey;
BasicGLPane::BasicGLPane(wxFrame* parent, int* args) :
wxGLCanvas(parent, wxID_ANY, args, wxDefaultPosition, wxDefaultSize, wxFULL_REPAINT_ON_RESIZE)
{
m_context = new wxGLContext(this);
randomFloatRGB = new float[1024 * 3];
randomFloatRGBGrey = new float[1024 * 3];
// In GL images 0,0 is in the lower left corner so the draw routine does a vertical flip to get 'regular' images right side up.
for (int i = 0; i < 1024; i++) {
// Red
randomFloatRGB[i * 3] = static_cast <float> (rand()) / static_cast <float> (RAND_MAX);
// Green
randomFloatRGB[i * 3 + 1] = static_cast <float> (rand()) / static_cast <float> (RAND_MAX);
// Blue
randomFloatRGB[i * 3 + 2] = static_cast <float> (rand()) / static_cast <float> (RAND_MAX);
// Telltale 2 white pixels in 0,0 corner.
if (i < 2) {
randomFloatRGB[i * 3] = randomFloatRGB[i * 3 + 1] = randomFloatRGB[i * 3 + 2] = 1.0f;
}
randomFloatRGBGrey[i * 3] = randomFloatRGB[i * 3];
randomFloatRGBGrey[i * 3 + 1] = randomFloatRGB[i * 3];
randomFloatRGBGrey[i * 3 + 2] = randomFloatRGB[i * 3];
}
// To avoid flashing on MSW
SetBackgroundStyle(wxBG_STYLE_CUSTOM);
}
BasicGLPane::~BasicGLPane()
{
delete m_context;
}
void BasicGLPane::resized(wxSizeEvent& evt)
{
// wxGLCanvas::OnSize(evt);
Refresh();
}
int BasicGLPane::getWidth()
{
return GetSize().x;
}
int BasicGLPane::getHeight()
{
return GetSize().y;
}
void BasicGLPane::render(wxPaintEvent& evt)
{
assert(GetParent());
assert(GetParent()->GetParent());
ORIScanMainFrame* mf = dynamic_cast<ORIScanMainFrame*>(GetParent()->GetParent());
assert(mf);
switch (mf->currentMainView) {
case ORIViewSelection::ViewCamera:
renderCamera(evt);
break;
case ORIViewSelection::ViewDepth:
renderDepth(evt);
break;
case ORIViewSelection::ViewPointCloud:
renderPointCloud(evt);
break;
case ORIViewSelection::View3DModel:
render3DModel(evt);
break;
default:
renderNone(evt);
}
}
void BasicGLPane::renderNone(wxPaintEvent& evt) {
if (!IsShown())
return;
SetCurrent(*(m_context));
glPushAttrib(GL_ALL_ATTRIB_BITS);
glClearColor(0.08f, 0.11f, 0.15f, 1.0f);
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
glFlush();
SwapBuffers();
glPopAttrib();
}
GLuint makeOpenGlTextureFromDataLuninanceFloats(int width, int height, float* f) {
GLuint textureID;
glEnable(GL_TEXTURE_2D);
glGenTextures(1, &textureID);
// "Bind" the newly created texture : all future texture functions will modify this texture
glBindTexture(GL_TEXTURE_2D, textureID);
// Give the image to OpenGL
glTexImage2D(GL_TEXTURE_2D, 0, GL_LUMINANCE, width, height, 0, GL_LUMINANCE, GL_FLOAT, f);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
return textureID;
}
GLuint makeOpenGlTextureFromRGBInts(int width, int height, unsigned int* f) {
GLuint textureID;
glEnable(GL_TEXTURE_2D);
glGenTextures(1, &textureID);
// "Bind" the newly created texture : all future texture functions will modify this texture
glBindTexture(GL_TEXTURE_2D, textureID);
// Give the image to OpenGL
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_INT, f);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
return textureID;
}
/// <summary>
/// Range of each float is 0.0f to 1.0f
/// </summary>
/// <param name="width"></param>
/// <param name="height"></param>
/// <param name="floatRGB"></param>
/// <returns></returns>
GLuint makeOpenGlTextureFromRGBFloats(int width, int height, float* floatRGB) {
GLuint textureID;
// 4.6.0 NVIDIA 457.30 (R Keene machine, 11/25/2020)
// auto sss = glGetString(GL_VERSION);
glGenTextures(1, &textureID);
// "Bind" the newly created texture : all future texture functions will modify this texture
glBindTexture(GL_TEXTURE_2D, textureID);
// Give the image to OpenGL
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, width, height, 0, GL_RGB, GL_FLOAT, floatRGB);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
return textureID;
}
void BasicGLPane::DrawTextureToScreenFloat(int w, int h, float* floatDataPtr, GLuint (*textureFactory)(int width, int height, float* floatRGB)) {
if (w <= 0 || h <= 0 || floatDataPtr == NULL || w > 5000 || h > 5000) {
assert(false);
return;
}
SetCurrent(*(m_context));
glPushAttrib(GL_ALL_ATTRIB_BITS);
glPushMatrix();
glPushClientAttrib(GL_CLIENT_ALL_ATTRIB_BITS);
glClearColor(0.15f, 0.11f, 0.02f, 1.0f);
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
glEnable(GL_TEXTURE_2D);
glLoadIdentity();
glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
// 4.6.0 NVIDIA 457.30 (R Keene machine, 11/25/2020)
// auto sss = glGetString(GL_VERSION);
float onePixelW = (float)getWidth() / (float)w;
float onePixelH = (float)getHeight() / (float)h;
float orthoW = w;
float orthoH = h;
if (onePixelH > onePixelW) {
orthoH = h * onePixelH / onePixelW;
}
else {
orthoW = w * onePixelW / onePixelH;
}
// We want the image at the top of the window, not the bottom if the window is too tall.
int topOfScreen = (float)getHeight() / onePixelH;
// If the winjdow resizes after creation you need to change the viewport.
glViewport(0, 0, getWidth(), getHeight());
gluOrtho2D(0.0, orthoW, (double)topOfScreen - (double)orthoH, topOfScreen);
GLuint myTextureName = textureFactory(w, h, floatDataPtr);
glBegin(GL_QUADS);
{
// This order of UV coords and verticies will do the vertical flip of the image to get the 'regular' image 0,0
// in the top left corner.
glTexCoord2f(0.0f, 1.0f); glVertex3f(0.0f, 0.0f, 0.0f);
glTexCoord2f(1.0f, 1.0f); glVertex3f(0.0f + w, 0.0f, 0.0f);
glTexCoord2f(1.0f, 0.0f); glVertex3f(0.0f + w, 0.0f + h, 0.0f);
glTexCoord2f(0.0f, 0.0f); glVertex3f(0.0f, 0.0f + h, 0.0f);
}
glEnd();
glDeleteTextures(1, &myTextureName);
glFlush();
SwapBuffers();
glPopClientAttrib();
glPopMatrix();
glPopAttrib();
}
void BasicGLPane::DrawTextureToScreenMat(wxPaintEvent& evt, cv::Mat m, float brightness) {
m.type();
if (m.empty()) {
renderNone(evt);
return;
}
if (m.type() == CV_32FC1) { // Grey scale.
DrawTextureToScreenFloat(m.cols, m.rows, (float*)m.data, makeOpenGlTextureFromDataLuninanceFloats);
}
else if (m.type() == CV_32FC3) { // Color.
DrawTextureToScreenFloat(m.cols, m.rows, (float*)m.data, makeOpenGlTextureFromRGBFloats);
}
else {
renderNone(evt);
}
}
void BasicGLPane::renderCamera(wxPaintEvent& evt) {
if (!IsShown())
return;
DrawTextureToScreenMat(evt, ORITopControl::Instance->im_white);
}
void BasicGLPane::renderDepth(wxPaintEvent& evt) {
if (!IsShown())
return;
DrawTextureToScreenMat(evt, ORITopControl::Instance->depth_map);
}
void BasicGLPane::render3DModel(wxPaintEvent& evt) {
if (!IsShown())
return;
SetCurrent(*(m_context));
glPushAttrib(GL_ALL_ATTRIB_BITS);
glPushMatrix();
glClearColor(0.08f, 0.11f, 0.15f, 1.0f);
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
glFlush();
SwapBuffers();
glPopMatrix();
glPopAttrib();
}
void BasicGLPane::renderPointCloud(wxPaintEvent& evt) {
if (!IsShown())
return;
boost::unique_lock<boost::mutex> lk(ORITopControl::Instance->pointCloudCacheMutex);
SetCurrent(*(m_context));
glPushAttrib(GL_ALL_ATTRIB_BITS);
glPushMatrix();
glLoadIdentity();
glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
glViewport(0, 0, getWidth(), getHeight());
glClearColor(0.08f, 0.11f, 0.15f, 1.0f);
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
if (ORITopControl::Instance->pointCloudCache.size() > 0) {
glMatrixMode(GL_PROJECTION);
gluPerspective( /* field of view in degree */ 40.0,
/* aspect ratio */ 1.0,
/* Z near */ 1.0, /* Z far */ 500.0);
glMatrixMode(GL_MODELVIEW);
gluLookAt(100, 70, 200, // Eye
25, 25, 25, // Look at pt
0, 0, 1); // Up Vector
glPointSize(2.0);
glBegin(GL_POINTS);
// Use explicit for loop because pointCloudFragments can grow asynchronously.
for (int i = 0; i < ORITopControl::Instance->pointCloudCache.size(); i++) {
auto frag = ORITopControl::Instance->pointCloudCache[i];
auto current_point_cloud_ptr = frag->cloud;
glPushMatrix();
// glMultMatrixf(frag->xform.data());
for (size_t n = 0; n < current_point_cloud_ptr->size(); n++) {
glColor3ub(255, 255, 255);
glVertex3d(current_point_cloud_ptr->points[n].x, current_point_cloud_ptr->points[n].y, current_point_cloud_ptr->points[n].z);
}
glPopMatrix();
}
glEnd();
}
glFlush();
SwapBuffers();
glPopMatrix();
glPopAttrib();
}
I've been trying to develop a simple 2D game engine and I need to be able to draw text, so I decided to use the famous freetype library.
My current goal is to render every character from 0 to 255 and store the appropriate information in a glyph object: the top and left positions of the character, its width and height, and an OpenGL texture. (In the future I wish to stop having one texture per glyph, but before working on that I need my characters to display correctly.)
First I got all characters garbled, but then I checked out this question and figured out I had to copy the bitmap buffer into a new one with the texture sizes OpenGL requires (i.e. a power of two for both width and height). From there on every character looked great, so I tried different font sizes to see if everything was working out OK. It wasn't: at small font sizes (i.e. < 30), every character with a small width ('l', 'i', ';', ':') showed up garbled.
http://img17.imageshack.us/img17/2963/helloworld.png
All distorted characters had a width of 1, which came out to two because that's the first power of two. Here's the code for rendering each character.
bool Font::Render(int PixelSize)
{
if( FT_Set_Pixel_Sizes( mFace, 0, PixelSize ) != 0)
return false;
for(int i = 0; i < 256; ++i)
{
FT_UInt GlyphIndex;
GlyphIndex = FT_Get_Char_Index( mFace, i );
if( FT_Load_Glyph( mFace, GlyphIndex, FT_LOAD_RENDER ) != 0)
return false;
int BitmapWidth = mFace->glyph->bitmap.width;
int BitmapHeight = mFace->glyph->bitmap.rows;
int TextureWidth = NextP2(BitmapWidth);
int TextureHeight= NextP2(BitmapHeight);
printf("Glyph is %c, BW: %i, BH: %i, TW: %i, TH: %i\n", i, BitmapWidth, BitmapHeight, TextureWidth, TextureHeight);
mGlyphs[i].SetAdvance(mFace->glyph->advance.x >> 6);
mGlyphs[i].SetLeftTop(mFace->glyph->bitmap_left, mFace->glyph->bitmap_top);
GLubyte * TextureBuffer = new GLubyte[ TextureWidth * TextureHeight ];
for(int j = 0; j < TextureHeight; ++j)
{
for(int i = 0; i < TextureWidth; ++i)
{
TextureBuffer[ j*TextureWidth + i ] = (j >= BitmapHeight || i >= BitmapWidth ? 0 : mFace->glyph->bitmap.buffer[ j*BitmapWidth + i ]);
}
}
for(int k = 0; k < TextureWidth * TextureHeight; ++k)
printf("Buffer is %i\n", TextureBuffer[k]);
Texture GlyphTexture;
GLuint Handle;
glGenTextures( 1, &Handle );
GlyphTexture.SetHandle(Handle);
GlyphTexture.SetWidth(TextureWidth);
GlyphTexture.SetHeight(TextureHeight);
glBindTexture(GL_TEXTURE_2D, Handle);
glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_REPEAT);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_REPEAT);
glTexEnvf(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_MODULATE);
glTexImage2D(GL_TEXTURE_2D, 0, GL_ALPHA, TextureWidth, TextureHeight, 0, GL_ALPHA, GL_UNSIGNED_BYTE, TextureBuffer);
mGlyphs[i].SetTexture(GlyphTexture);
delete [] TextureBuffer;
}
return true;
}
The only solution I could find was a little if statement, changing the TextureWidth to 4 if it has the value of 2, and that way the text looks fine in every font size. But this doesn't make any sense to me: why would OpenGL reject textures with a width of 2? Oh, and I've determined that the problem does not lie within the buffer; I printed it out and it looks just fine.
Can you think of any reason why this is happening?
Here's the rest of the (extremely) messy code, in case the problem does not lie within the Render function.
int NextP2(int Val)
{
int RVal = 2;
while(RVal < Val) RVal <<= 1;
return RVal;
}
class Texture
{
public:
void Free() { glDeleteTextures(1, &mHandle); }
void SetHandle(GLuint Handle) { mHandle = Handle; }
void SetWidth(int Width) { mWidth = Width; }
void SetHeight(int Height) { mHeight = Height; }
inline GLuint Handle() const { return mHandle; }
inline int Width() const { return mWidth; }
inline int Height() const { return mHeight; }
private:
GLuint mHandle;
int mWidth;
int mHeight;
};
class Glyph
{
public:
void SetAdvance(int Advance) { mAdvance = Advance; }
void SetLeftTop(int Left, int Top) { mLeft = Left; mTop = Top; }
void SetTexture(Texture texture) { mTexture = texture; }
Texture & GetTexture() { return mTexture; }
int GetAdvance() const { return mAdvance; }
int GetLeft() const { return mLeft; }
int GetTop() const { return mTop; }
private:
int mAdvance;
int mLeft;
int mTop;
Texture mTexture;
};
class Font
{
public:
~Font() { for(int i = 0; i < 256; ++i) mGlyphs[i].GetTexture().Free(); }
bool Load(const std::string & File);
bool Render(int PixelSize);
Glyph & GetGlyph(unsigned char CharCode) { return mGlyphs[CharCode]; }
private:
FT_Face mFace;
Glyph mGlyphs[256];
};
I think that's about everything, please let me know if you've found the problem.
Best Regards and Thanks in Advance,
I bet I stumbled upon exactly the same problem with freetype and OpenGL last week.
OpenGL's default is to consider that texture data rows are aligned on 4-byte boundaries. This causes problems when the texture width becomes 2 or 1, which are valid power-of-two sizes.
Does setting the row alignment to 1-byte boundaries fix your problem? Using
glPixelStorei(GL_UNPACK_ALIGNMENT, 1)
HTH
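For context, a minimal sketch of where that call goes relative to the glyph upload in the question (TextureWidth, TextureHeight and TextureBuffer are the names from the Render function above):
glPixelStorei(GL_UNPACK_ALIGNMENT, 1);  // glyph rows are tightly packed, not padded to 4-byte boundaries
glTexImage2D(GL_TEXTURE_2D, 0, GL_ALPHA, TextureWidth, TextureHeight, 0, GL_ALPHA, GL_UNSIGNED_BYTE, TextureBuffer);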
Prepare yourself to support more than 256 characters (256-character sets are history), as well as right-to-left and left-to-right text flow and top-down flow.
Just a hint on a different topic for English speakers.
You probably ran into a texture-wrapping issue.
The pixels rendered on one border of your texture 'bleed' over to the other side in OpenGL's attempt to tile the texture correctly.
If I am correct, changing your wrapping parameters to
glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP);
glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP);
should make OpenGL clamp your texture at its edges and draw your small textures correctly.
Update
I found something on a tutorial page:
If a small font's bitmaps are not aligned with the screen pixels, bilinear interpolation will cause artifacts. An easy fix for this is to change the interpolation method from GL_LINEAR to GL_NEAREST, though this may lead to additional artifacts in the case of rotated text. (If rotating text is important to you, you may want to try telling OpenGL to super-sample from larger font bitmaps.)
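In code, that change is just the two filter parameters set on the glyph texture:
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);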
Use Freetype with ICU. Freetype alone can't render the glyphs properly. Use ICU for that.