I am trying to create an OpenCL raycaster, so I am drawing to an OpenGL texture many times per second. However, queue.enqueueNDRangeKernel eventually returns -9999. If I remove write_imagef from my kernel code it works, so I figured that call is causing the problem.
OpenCL kernel (broken down)
__kernel void main(__write_only image2d_t screen)
{
    unsigned int x = get_global_id(0);
    unsigned int y = get_global_id(1);
    int2 coords = (int2)(x, y);
    write_imagef(screen, coords, (float4)(1, 0, 1, 1));
}
This is the code that runs once, in C++:
cl::Program::Sources sources;
string code = ResourceLoader::loadFile(filename);
sources.push_back({ code.c_str(), code.length() });
program = cl::Program(OpenCL::context, sources);
if (program.build({ OpenCL::default_device }) != CL_SUCCESS)
{
    cout << "Could not build program \"" << filename << "\"! Error:" << endl;
    cout << "OpenCL: Error building: " << program.getBuildInfo<CL_PROGRAM_BUILD_LOG>(OpenCL::default_device) << "\n";
    system("PAUSE");
    exit(1);
}
queue = CommandQueue(OpenCL::context, OpenCL::default_device);
kernel = Kernel(program, "main");
// OpenGL texture
ImageGL b(OpenCL::context, CL_MEM_READ_WRITE, GL_TEXTURE_2D, 0, argument, &error);
if (error != 0)
{
    cout << "CL Error: " << OpenCL::get_cl_error_string(error) << endl;
    system("PAUSE");
    exit(error);
}
kernel.setArg(0, b);
This code runs every frame:
glFinish();
queue.enqueueAcquireGLObjects(&this->buffersGL);

NDRange range;
if (lengthZ <= 0 && lengthY <= 0)
    range = NDRange(lengthX);
else if (lengthZ <= 0)
    range = NDRange(lengthX, lengthY);
else
    range = NDRange(lengthX, lengthY, lengthZ);

cl::Event wait;
cl_int run_err = queue.enqueueNDRangeKernel(kernel, NDRange(), range, NullRange, NULL, &wait);
if (run_err != 0)
{
    cout << OpenCL::get_cl_error_string(run_err) << " (" << run_err << ")" << endl;
    system("PAUSE");
}
queue.enqueueReleaseGLObjects(&this->buffersGL);
What could be causing the -9999 error and how can I fix it? Also, there are often big chunks of "dead pixels" that have not been drawn to in the texture...
You enqueue the release of GL buffers, but do not wait for it to complete.
queue.enqueueReleaseGLObjects(&this->buffersGL);
Either get the finish event out of this (watch out for leaks!), or wait for the command queue to finish all tasks before releasing the GL objects. When one thing in a queue depends on another, you are supposed to arrange their ordering yourself.
You also queue a bunch of tasks that depend on the GL objects. Either wait for them to complete (finish the queue), or take their events and feed them to the enqueueReleaseGLObjects call as prerequisites.
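A minimal sketch of that event chaining, assuming the same cl.hpp C++ wrapper and variable names as in the question:

// Chain acquire -> kernel -> release explicitly via events, then wait
// for the release before OpenGL touches the texture again.
std::vector<cl::Event> acquired(1), ran(1);
cl::Event released;
queue.enqueueAcquireGLObjects(&buffersGL, NULL, &acquired[0]);
queue.enqueueNDRangeKernel(kernel, cl::NullRange, range, cl::NullRange,
                           &acquired, &ran[0]);
queue.enqueueReleaseGLObjects(&buffersGL, &ran, &released);
released.wait(); // or simply queue.finish()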
As an aside:
Using fewer kernels might be a good idea, instead of one per pixel.
Thanks a lot, Yakk! I tried that by first simply using a smaller screen size, and it suddenly worked again! As it turns out, though, the texture I was drawing to was the problem: it was not actually 600x600 pixels, and that is what caused the crash. Apparently OpenCL can write to pixels that "don't actually exist" a couple of times before crashing. It is still weird behaviour...
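If the global range can end up larger than the texture, a bounds guard in the kernel avoids those out-of-range writes; a sketch using the standard get_image_dim built-in:

__kernel void main(__write_only image2d_t screen)
{
    int2 coords = (int2)(get_global_id(0), get_global_id(1));
    int2 dims = get_image_dim(screen); // actual texture size

    // Skip work-items that fall outside the texture.
    if (coords.x >= dims.x || coords.y >= dims.y)
        return;

    write_imagef(screen, coords, (float4)(1, 0, 1, 1));
}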
Related
I have written a small C++ program that receives data from the USRP. The program can receive the I/Q data and show it on a spectrum analyzer. The receiver LED is not always green, though; it sort of blinks and dims. I suspect there is a rate mismatch between the computer and the USRP. Could this be the case? How does one make sure that the computer consumes the samples at the same rate as the USRP acquires them? Below is the thread function I use for the I/Q signal acquisition.
void
USRPDriver::RxEventLoop()
{
    uhd::rx_metadata_t md;
    uhd::stream_cmd_t stream_cmd(uhd::stream_cmd_t::STREAM_MODE_NUM_SAMPS_AND_DONE);
    stream_cmd.stream_now = true;
    stream_cmd.num_samps = 1024;
    //std::cout << "Maximum num samps = " << rx_stream->get_max_num_samps() << std::endl;
    std::vector<std::complex<float> > fcpxIQ;
    fcpxIQ.resize(1024);
    usrp->issue_stream_cmd(stream_cmd);
    while (true)
    {
        usrp->issue_stream_cmd(stream_cmd);
        size_t num_rx_samps = rx_stream->recv(&fcpxIQ[0], 1024, md);
        emit ReceiveIQ(fcpxIQ);
        //std::cout << "Rx rate = " << usrp->get_rx_rate(0) << std::endl;
        //fcpxIQ.clear();
    }
}
You should not use NUM_SAMPS_AND_DONE if you want continuous streaming. That is exactly not the use case it is for: it tells the USRP to stop receiving once 1024 samples have been received.
Simply don't use that mode.
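A sketch of the continuous alternative, reusing the usrp, rx_stream, and fcpxIQ objects from the question (the running flag is a hypothetical shutdown signal):

// Start a continuous stream once, before the receive loop.
uhd::stream_cmd_t stream_cmd(uhd::stream_cmd_t::STREAM_MODE_START_CONTINUOUS);
stream_cmd.stream_now = true;
usrp->issue_stream_cmd(stream_cmd);

uhd::rx_metadata_t md;
std::vector<std::complex<float> > fcpxIQ(1024);
while (running) // hypothetical stop flag
{
    size_t num_rx_samps = rx_stream->recv(&fcpxIQ[0], fcpxIQ.size(), md);
    emit ReceiveIQ(fcpxIQ);
}

// Tell the USRP to stop streaming on shutdown.
usrp->issue_stream_cmd(uhd::stream_cmd_t::STREAM_MODE_STOP_CONTINUOUS);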
After generating a set of data using a compute shader and storing it in a Shader Storage buffer, I am attempting to read from that buffer to print out the data using the code:
#define INDEX_AT(x,y,z,i) (xyzToId(Vec3i((x), (y), (z)),\
Vec3i(NUM_RAYS_X,\
NUM_RAYS_Y,\
POINTS_ON_RAY))\
* 3 + (i))
PRINT_GL_ERRORS();
glBindBuffer(GL_SHADER_STORAGE_BUFFER, dPositionBuffer);
float* data_ptr = NULL;
for (int ray_i = 0; ray_i < POINTS_ON_RAY; ray_i++)
{
    for (int y = 0; y < NUM_RAYS_Y; y++)
    {
        int x = 0;
        data_ptr = NULL;
        data_ptr = (float*)glMapBufferRange(
            GL_SHADER_STORAGE_BUFFER,
            INDEX_AT(x, y, ray_i, 0) * sizeof(float),
            3 * (NUM_RAYS_X) * sizeof(float),
            GL_MAP_READ_BIT);
        if (data_ptr == NULL)
        {
            PRINT_GL_ERRORS();
            return false;
        }
        else
        {
            for (int x = 0; x < NUM_RAYS_X; x++)
            {
                std::cout << "("
                          << data_ptr[x * 3 + 0] << ","
                          << data_ptr[x * 3 + 1] << ","
                          << data_ptr[x * 3 + 2] << ") , ";
            }
        }
        glUnmapBuffer(GL_SHADER_STORAGE_BUFFER);
        PRINT_GL_ERRORS();
        std::cout << std::endl;
    }
    std::cout << "\n" << std::endl;
}
where the function xyzToId converts three dimensional coordinates into a one-dimensional index.
When I attempt to run this, however, the program crashes at the call to glMapBufferRange, giving the error message:
The NVIDIA OpenGL driver lost connection with the display driver due to exceeding the Windows Time-Out limit and is unable to continue.
The application must close.
Error code: 7
Would you like to visit
http://nvidia.custhelp.com/cgi-bin/nvidia.cfg/php/enduser/std_adp.php?p_faqid=3007
for help?
The buffer that I am mapping is not very large at all, only 768 floats, and previous calls to glMapBuffer on a different shader storage buffer (of only two floats) completed with no problems. I can't seem to find any information relevant to this error online, and everything I have read about the speed of glMapBufferRange indicates that a buffer of this size should take only on the order of tens of milliseconds to map, not the two seconds of the timeout the program is crashing on.
Am I missing something about how glMapBufferRange should be used?
It was an unrelated error. Today I learned that OpenGL sometimes buffers commands, and several actions (like mapping a buffer) force it to finish all the commands in its queue. In this case, it was the dispatch of the compute shader itself that had yet to complete.
Today I also learned that indexing a shader storage buffer out of bounds will cause the OpenGL driver to freeze up just as it would if a command were taking too long to complete.
All in all, this was largely a case of errors masquerading as different errors and popping up in the wrong spot.
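To make that ordering explicit, here is a sketch of the dispatch/readback sequence (computeProgram, groupsX, groupsY, and bufferSizeInBytes are placeholder names, not from the original code):

glUseProgram(computeProgram);
glDispatchCompute(groupsX, groupsY, 1); // work may still be queued here

// Make the shader's SSBO writes visible to later buffer mapping:
glMemoryBarrier(GL_BUFFER_UPDATE_BARRIER_BIT);

// Optionally force the queued commands to complete now, so any stall
// (or driver timeout) happens here rather than inside glMapBufferRange:
glFinish();

glBindBuffer(GL_SHADER_STORAGE_BUFFER, dPositionBuffer);
float* data_ptr = (float*)glMapBufferRange(
    GL_SHADER_STORAGE_BUFFER, 0, bufferSizeInBytes, GL_MAP_READ_BIT);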
I'm wondering what the best way to detect a high-DPI display is. Currently I'm trying to use SDL_GetDisplayDPI(int, float*, float*, float*), however this has only returned errors on the two different computers I tested with (a MacBook Pro running OS X 10.11.5 and an iMac running macOS 10.12 Beta (16A238m)). For reference, my code is below.
float diagDPI = -1;
float horiDPI = -1;
float vertDPI = -1;
int dpiReturn = SDL_GetDisplayDPI (0, &diagDPI, &horiDPI, &vertDPI);
std::cout << "GetDisplayDPI() returned " << dpiReturn << std::endl;
if (dpiReturn != 0)
{
std::cout << "Error: " << SDL_GetError () << std::endl;
}
std::cout << "DDPI: " << diagDPI << std::endl << "HDPI: " << horiDPI << std::endl << "VDPI: " << vertDPI << std::endl;
Unfortunately, this is only giving me something like this:
/* Output */
GetDisplayDPI() returned -1
Error:
DDPI: -1
HDPI: -1
VDPI: -1
Not Retina
I also tried comparing the OpenGL drawable size with the SDL window size, but SDL_GetWindowSize(SDL_Window*, int*, int*) is returning 0s, too. That code is below, followed by the output.
int gl_w;
int gl_h;
SDL_GL_GetDrawableSize (window, &gl_w, &gl_h);
std::cout << "GL_W: " << gl_w << std::endl << "GL_H: " << gl_h << std::endl;
int sdl_w;
int sdl_h;
SDL_GetWindowSize (window, &sdl_w, &sdl_h);
std::cout << "SDL_W: " << sdl_w << std::endl << "SDL_H: " << sdl_h << std::endl;
/* Output */
GL_W: 1280
GL_H: 720
SDL_W: 0
SDL_H: 0
It's entirely possible that I'm doing something wrong here, or making these calls in the wrong place, but I think it's more likely that I'm on the wrong track entirely. There's a hint to disallow high-DPI canvases, so there's probably a simple bool somewhere, or something else I'm missing. I have certainly looked through the wiki and checked Google, but I can't really find any help for this. Any suggestions or feedback are welcome!
Thank you for your time!
I know I'm not answering your question directly, but I want to reiterate one thing you tried.
On a MacBook Pro, when an SDL window is on an external display, SDL_GetWindowSize and SDL_GL_GetDrawableSize return the same values. If the window is on a Retina screen, they differ; specifically, the drawable size is 2x larger in each dimension.
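A small sketch of that check, assuming an existing SDL_Window* named window:

int winW = 0, winH = 0, drawW = 0, drawH = 0;
SDL_GetWindowSize(window, &winW, &winH);        // size in screen points
SDL_GL_GetDrawableSize(window, &drawW, &drawH); // size in pixels

// On a Retina screen the drawable is larger (typically 2x per axis).
bool isHighDPI = (drawW > winW) || (drawH > winH);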
I was using a .framework installation of SDL when I encountered this issue. For an unrelated reason, I trashed the .framework SDL files (image and ttf as well) and built SDL from source (thus transitioning to a "unix-style" SDL installation). To my surprise, SDL_GetDisplayDPI() now returns 0, setting the values of DDPI, HDPI, and VDPI to 109 on a non-Retina iMac and 113.5 on a Retina MacBook Pro. I'm not certain these values are correct/accurate, but they are consistent between launches, so I'll work with them.
At this point, I'm not sure if it was a bug that has been fixed in the repo, or an issue with my .framework files. On a somewhat unrelated note, SDL_GetBasePath() and SDL_GetPrefPath(), which also weren't working, now return the expected values. If you're also experiencing any of these problems on macOS, try compiling and installing SDL from source (https://hg.libsdl.org/SDL).
Thanks for your input, everyone!
Maybe I'm confusing myself with threads, but different parts of my understanding of threading conflict with each other.
I’ve created a program which uses POSIX pthreads. Without using these threads the program takes 0.061723 seconds to run, and with threads takes 0.081061 seconds to run.
At first I thought this was to be expected, since threads allow one thing to happen while other things are happening, e.g. processing a lot of data on one thread while keeping the UI responsive on another. That would mean the data processing takes longer, as the CPU divides its time between the UI and the data.
However, surely the point of multithreading is to make the program take advantage of multiple CPUs/cores?
As you can tell I’m something of an intermediate so excuse me if it’s a simple question.
But what should I expect the program to do?
I'm running this on a mid-2012 MacBook Pro 13" base model. The CPU is a 22 nm "Ivy Bridge" 2.5 GHz Intel Core i5 (3210M), with two independent processor cores on a single silicon chip.
UPDATED WITH CODE
This is in the main function. I didn't include the variable declarations, for convenience, but I'm sure you can work out what each does from its name:
// Loop through all items we need to process
//
while (totalNumberOfItemsToProcess > 0 && numberOfItemsToProcessOnEachIteration > 0 && startingIndex <= totalNumberOfItemsToProcess)
{
    // As long as we have items to process...
    //
    // Align the index with number of items to process per iteration
    //
    const uint endIndex = startingIndex + (numberOfItemsToProcessOnEachIteration - 1);

    // Create range
    //
    Range range = RangeMake(startingIndex, endIndex);
    rangesProcessed[i] = range;

    // Create a thread identifier, 'newThread'
    //
    pthread_t newThread;

    // Create thread with range
    //
    int threadStatus = pthread_create(&newThread, NULL, processCoordinatesInRangePointer, &rangesProcessed[i]);
    if (threadStatus != 0)
    {
        std::cout << "Failed to create thread" << std::endl;
        exit(1);
    }

    // Add thread to threads
    //
    threadIDs.push_back(newThread);

    // Setup next iteration
    //
    // Realign the starting index with number of items to process per iteration
    //
    startingIndex = (endIndex + 1);

    // Number of items to process on each iteration
    //
    if (startingIndex > (totalNumberOfItemsToProcess - numberOfItemsToProcessOnEachIteration))
    {
        // If the total number of items to process is less than the number of items to process on each iteration
        //
        numberOfItemsToProcessOnEachIteration = totalNumberOfItemsToProcess - startingIndex;
    }

    // Increment index
    //
    i++;
}

std::cout << "Number of threads: " << threadIDs.size() << std::endl;

// Loop through all threads, rejoining them back up
//
for (size_t i = 0; i < threadIDs.size(); i++)
{
    // Wait for each thread to finish before returning
    //
    pthread_t currentThreadID = threadIDs[i];
    int joinStatus = pthread_join(currentThreadID, NULL);
    if (joinStatus != 0)
    {
        std::cout << "Thread join failed" << std::endl;
        exit(1);
    }
}
The processing functions:
void processCoordinatesAtIndex(uint index)
{
    const int previousIndex = (index - 1);

    // Get coordinates from terrain
    //
    Coordinate3D previousCoordinate = terrain[previousIndex];
    Coordinate3D currentCoordinate = terrain[index];

    // Calculate...
    //
    // Euclidean distance
    //
    double euclideanDistance = Coordinate3DEuclideanDistanceBetweenPoints(previousCoordinate, currentCoordinate);
    euclideanDistances[index] = euclideanDistance;

    // Angle of slope
    //
    double slopeAngle = Coordinate3DAngleOfSlopeBetweenPoints(previousCoordinate, currentCoordinate, false);
    slopeAngles[index] = slopeAngle;
}

void processCoordinatesInRange(Range range)
{
    for (uint i = range.min; i <= range.max; i++)
    {
        processCoordinatesAtIndex(i);
    }
}

void *processCoordinatesInRangePointer(void *threadID)
{
    // Cast the pointer to the right type
    //
    struct Range *range = (struct Range *)threadID;
    processCoordinatesInRange(*range);
    return NULL;
}
UPDATE:
Here are my global variables, which are only global for simplicity (don't have a go!):
std::vector<Coordinate3D> terrain;
std::vector<double> euclideanDistances;
std::vector<double> slopeAngles;
std::vector<Range> rangesProcessed;
std::vector<pthread_t> threadIDs;
Correct me if I'm wrong, but I think the issue was with how the elapsed time was measured. Instead of using clock_t I've moved to gettimeofday(), and that reports a shorter time: from a non-threaded time of 22.629000 ms down to a threaded time of 8.599000 ms.
Does this seem right to people?
Of course, my original question was based on whether or not a multithreaded program SHOULD be faster or not, so I won’t mark this answer as the correct one for that reason.
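For what it's worth, that difference is expected: clock() accumulates CPU time across all threads of the process, while gettimeofday() measures elapsed wall-clock time. A minimal sketch contrasting the two (the workload itself is a placeholder):

#include <sys/time.h>
#include <ctime>
#include <cstdio>

int main()
{
    std::clock_t cpuStart = std::clock();
    timeval wallStart;
    gettimeofday(&wallStart, NULL);

    // ... run the threaded workload here ...

    std::clock_t cpuEnd = std::clock();
    timeval wallEnd;
    gettimeofday(&wallEnd, NULL);

    // CPU time is summed over every core that was busy, so a program
    // saturating two cores can report roughly 2x the wall-clock time.
    double cpuMs = 1000.0 * (cpuEnd - cpuStart) / CLOCKS_PER_SEC;
    double wallMs = 1000.0 * (wallEnd.tv_sec - wallStart.tv_sec)
                  + (wallEnd.tv_usec - wallStart.tv_usec) / 1000.0;
    std::printf("CPU time: %.3f ms, wall time: %.3f ms\n", cpuMs, wallMs);
    return 0;
}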
I am doing a benchmark project comparing two graphical libraries (SDL, SFML) for my final CS project. I have it almost finished; however, when I benchmark the speed of playing sounds, it always reports a time taken of 0, no matter how many loops it does. Do you know what's wrong with my code? The sound actually plays, but I should probably use some other approach.
void playSound()
{
    Mix_PlayChannel(-1, sound, 0);
}

void soundBenchmark(int numOfCycles)
{
    int time = SDL_GetTicks(), timeRequired;

    for (int i = 0; i < numOfCycles; i++) playSound();

    // Note: SDL_GetTicks() returns milliseconds, not seconds.
    timeRequired = SDL_GetTicks() - time;
    cout << "Time required for " << numOfCycles << " cycles: " << timeRequired << " milliseconds.\n";
}
The function Mix_PlayChannel() does not block the execution of the code. It just sends the data to the sound card (or equivalent) and returns.
You are going to have to remember the channel you used with Mix_PlayChannel(), then check periodically with Mix_Playing() whether that channel is still playing, and look at the time.
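A sketch of that polling approach (SDL_Delay is used only to avoid a busy spin; the sound variable is the one from the question):

int channel = Mix_PlayChannel(-1, sound, 0); // remember the channel used
Uint32 start = SDL_GetTicks();

while (Mix_Playing(channel)) // non-zero while the channel is playing
    SDL_Delay(1);            // yield instead of busy-waiting

Uint32 elapsedMs = SDL_GetTicks() - start;
std::cout << "Playback took " << elapsedMs << " ms\n";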