OpenGL fragment shader doesn't write to SSBO

I have a small OpenGL engine and I want to use SSBOs so that shaders can write debugging logs which I can then read back and print. My log class looks like this:
#define SHADER_LOG_LINE_LENGTH 128
#define SHADER_LOG_MAX_LINES 512
#define SHADER_LOG_DATA_OFFSET sizeof(int32_t) * 3
#define SHADER_LOG_DATA_SIZE SHADER_LOG_LINE_LENGTH * SHADER_LOG_MAX_LINES * sizeof(int32_t)
#define SHADER_LOG_TOTAL_SIZE SHADER_LOG_DATA_OFFSET + SHADER_LOG_DATA_SIZE
class ShaderLog
{
protected:
GLuint ssbo;
GLuint binding_point;
int32_t number_of_lines;
int32_t max_lines;
int32_t line_length;
int32_t data[SHADER_LOG_DATA_SIZE];
public:
ShaderLog()
{
glGenBuffers(1,&(this->ssbo));
this->number_of_lines = 0;
this->max_lines = SHADER_LOG_MAX_LINES;
this->line_length = SHADER_LOG_LINE_LENGTH;
this->binding_point = 0;
glBindBuffer(GL_SHADER_STORAGE_BUFFER,this->ssbo);
glBufferData(GL_SHADER_STORAGE_BUFFER,SHADER_LOG_TOTAL_SIZE,&(this->number_of_lines),GL_DYNAMIC_COPY);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER,this->binding_point,this->ssbo);
glBindBuffer(GL_SHADER_STORAGE_BUFFER,0);
};
void connect_to_shader(Shader *shader, string shader_variable_name)
{
GLuint block_index = 0;
block_index = glGetProgramResourceIndex(shader->get_shader_program_number(),GL_SHADER_STORAGE_BLOCK,shader_variable_name.c_str());
if (block_index == GL_INVALID_INDEX)
ErrorWriter::write_error("Shader log could not be connected to the shader.");
glBindBufferBase(GL_SHADER_STORAGE_BUFFER,block_index,this->binding_point);
glShaderStorageBlockBinding(shader->get_shader_program_number(),block_index,this->binding_point);
}
virtual void load_from_gpu()
{
glBindBuffer(GL_SHADER_STORAGE_BUFFER,this->ssbo);
GLvoid* mapped_ssbo = glMapBuffer(GL_SHADER_STORAGE_BUFFER,GL_READ_ONLY);
if (mapped_ssbo == NULL)
ErrorWriter::write_error("Could not map shader log into client's memory space for reading.");
else
memcpy(&(this->number_of_lines),mapped_ssbo,SHADER_LOG_DATA_SIZE);
glUnmapBuffer(GL_SHADER_STORAGE_BUFFER);
}
int get_number_of_lines()
{
return this->number_of_lines;
}
...
};
In my fragment shader I do:
#version 430
layout (std430, binding=0) buffer shader_log_data
{
int number_of_lines;
int max_lines;
int line_length;
int data[];
} shader_log;
...
void main()
{
shader_log.number_of_lines = 20; // just to test
shader_log.data[0] = 10000;
...
}
And the main program looks like this:
void render()
{
... // rendering
shader_log->load_from_gpu();
cout << "lines: " << shader_log->get_number_of_lines() << endl;
glutSwapBuffers();
}
...
int main(int argc, char** argv)
{
...
shader_log = new ShaderLog();
...
shader_log->connect_to_shader(shader,"shader_log_data");
shader_log->update_gpu();
...
// rendering loop
...
}
Now the number of lines written out remains 0, even though it should be set to 20 by the shader. I tried loading the data back from the GPU right after glBufferData(...) and they are there; the problem seems to be in the connection between the buffer and the shader. I also tried reading the data in the shader and outputting them to the screen, and they're always 0, which supports my hypothesis. Basically I seem to be able to write/read to/from the SSBO from the CPU, but not from the shader. Could anyone help me find the issue?

Why do you pass binding_point (which is 0) to your memcpy function? This way you get 0 bytes of data copied.
memcpy(&(this->number_of_lines),mapped_ssbo,this->binding_point);

I am assuming that at least one of those ... lines is a memory barrier that you neglected to show?
You need some form of synchronization across parallel invocations to ensure coherency; this is not automagic when using SSBOs, Image Load/Store, etc. You are likely to read an undefined value without a barrier in place, as each execution unit has a different (isolated) view of the same memory.
Making sure that changes you made to the SSBO are visible to other shader invocations requires a barrier in your GLSL code prior to performing the read. Other types of incoherent operations may require a barrier in the GL command stream, but the premise is the same: order commands and reads/writes of memory such that following operations use the result of previous ones.
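For the CPU-side read in load_from_gpu, that means a barrier in the GL command stream between the draw call and the map. A minimal, hedged sketch of render() under that assumption (draw_scene stands in for the elided rendering calls; the exact barrier bit depends on how the data is consumed afterwards):
void render()
{
    draw_scene();                                   // hypothetical: the elided rendering calls
    glMemoryBarrier(GL_BUFFER_UPDATE_BARRIER_BIT);  // make the fragment shader's SSBO stores visible to the map (GL_ALL_BARRIER_BITS is the conservative choice)
    shader_log->load_from_gpu();
    cout << "lines: " << shader_log->get_number_of_lines() << endl;
    glutSwapBuffers();
}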

glfwSwapBuffers slow (>3s)

Paul Aner is looking for a canonical answer:
I think the reason for this question is clear: I want the main loop NOT to block while a compute shader is processing larger amounts of data. I could try to separate the data into smaller snippets, but if the computations were done on the CPU, I would simply start a thread and everything would run nice and smoothly. Although I would of course have to wait until the calculation thread delivers new data to update the screen, the GUI (ImGUI) would not lock up...
I have written a program that does some calculations on a compute shader and the returned data is then being displayed. This works perfectly, except that the program execution is blocked while the shader is running (see code below) and depending on the parameters, this can take a while:
void CalculateSomething(GLfloat* Result)
{
// load some uniform variables
glDispatchCompute(X, Y, 1);
glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT);
GLfloat* mapped = (GLfloat*)(glMapBuffer(GL_SHADER_STORAGE_BUFFER, GL_READ_ONLY));
memcpy(Result, mapped, sizeof(GLfloat) * X * Y);
glUnmapBuffer(GL_SHADER_STORAGE_BUFFER);
}
void main()
{
// Initialization stuff
// ...
while (glfwWindowShouldClose(Window) == 0)
{
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
glfwPollEvents();
glfwSwapInterval(2); // Doesn't matter what I put here
CalculateSomething(Result);
Render(Result);
glfwSwapBuffers(Window.WindowHandle);
}
}
To keep the main loop running while the compute shader is calculating, I changed CalculateSomething to something like this:
void CalculateSomething(GLfloat* Result)
{
// load some uniform variables
glDispatchCompute(X, Y, 1);
GPU_sync = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
}
bool GPU_busy()
{
GLint GPU_status;
if (GPU_sync == NULL)
return false;
else
{
glGetSynciv(GPU_sync, GL_SYNC_STATUS, 1, nullptr, &GPU_status);
return GPU_status == GL_UNSIGNALED;
}
}
These two functions are part of a class, and it would get a little messy and complicated if I had to post all of that here (if more code is needed, tell me). So every loop, when the class is told to do the computation, it first checks if the GPU is busy. If it's done, the result is copied to CPU memory (or a calculation is started); otherwise it returns to main without doing anything else. Anyway, this approach works in that it produces the right result. But my main loop is still blocked.
Doing some timing revealed that CalculateSomething, Render (and everything else) runs fast (as I would expect them to do). But now glfwSwapBuffers takes >3000ms (depending on how long the calculations of the compute shader take).
Shouldn't it be possible to switch buffers while a compute shader is running? Rendering the result seems to work fine and without delay (as long as the compute shader is not done yet, the old result should get rendered). Or am I missing something here (queued OpenGL calls get processed before glfwSwapBuffers does something?)?
Edit:
I'm not sure why this question got closed and what additional information is needed (maybe other than the OS, which would be Windows). As for "desired behavior": well, I'd like the glfwSwapBuffers call not to block my main loop. For additional information, please ask...
As pointed out by Erdal Küçük, an implicit call of glFlush might cause latency. I did put this call before glfwSwapBuffers for testing purposes and timed it - no latency here...
I'm sure I can't be the only one who ever ran into this problem. Maybe someone could try and reproduce it? Simply put a compute shader in the main loop that takes a few seconds to do its calculations. I have read somewhere that similar problems occur especially when calling glMapBuffer. This seems to be an issue with the GPU driver (mine would be an integrated Intel GPU). But nowhere have I read about latencies above 200ms...
I solved a similar issue with a GL_PIXEL_PACK_BUFFER effectively used as an offscreen compute shader. The approach with fences is correct, but you then need a separate function that checks the status of the fence using glGetSynciv to read the GL_SYNC_STATUS. The solution (admittedly in Java) can be found here.
An explanation for why this is necessary can be found in Nick Clark's comment:
Every call in OpenGL is asynchronous, except for the frame buffer swap, which stalls the calling thread until all submitted functions have been executed. Thus, the reason why glfwSwapBuffers seems to take so long.
The relevant portion from the solution is:
public void finishHMRead( int pboIndex ){
int[] length = new int[1];
int[] status = new int[1];
GLES30.glGetSynciv( hmReadFences[ pboIndex ], GLES30.GL_SYNC_STATUS, 1, length, 0, status, 0 );
int signalStatus = status[0];
int glSignaled = GLES30.GL_SIGNALED;
if( signalStatus == glSignaled ){
// Ready a temporary ByteBuffer for mapping (we'll unmap the pixel buffer and lose this) and a permanent ByteBuffer
ByteBuffer pixelBuffer;
texLayerByteBuffers[ pboIndex ] = ByteBuffer.allocate( texWH * texWH );
// map data to a bytebuffer
GLES30.glBindBuffer( GLES30.GL_PIXEL_PACK_BUFFER, pbos[ pboIndex ] );
pixelBuffer = ( ByteBuffer ) GLES30.glMapBufferRange( GLES30.GL_PIXEL_PACK_BUFFER, 0, texWH * texWH * 1, GLES30.GL_MAP_READ_BIT );
// Copy to the long term ByteBuffer
pixelBuffer.rewind(); //copy from the beginning
texLayerByteBuffers[ pboIndex ].put( pixelBuffer );
// Unmap and unbind the currently bound pixel buffer
GLES30.glUnmapBuffer( GLES30.GL_PIXEL_PACK_BUFFER );
GLES30.glBindBuffer( GLES30.GL_PIXEL_PACK_BUFFER, 0 );
Log.i( "myTag", "Finished copy for pbo data for " + pboIndex + " at: " + (System.currentTimeMillis() - initSphereStart) );
acknowledgeHMReadComplete();
} else {
// If it wasn't done, resubmit for another check in the next render update cycle
RefMethodwArgs finishHmRead = new RefMethodwArgs( this, "finishHMRead", new Object[]{ pboIndex } );
UpdateList.getRef().addRenderUpdate( finishHmRead );
}
}
Basically, fire off the compute shader, then wait for the glGetSynciv check of GL_SYNC_STATUS to equal GL_SIGNALED, then rebind the GL_SHADER_STORAGE_BUFFER and perform the glMapBuffer operation.
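A roughly equivalent, hedged C++/OpenGL sketch of that pattern (X, Y and Result mirror the question; computeFence and resultSSBO are illustrative names):
GLsync computeFence = 0;
void StartCalculation()
{
    // load some uniform variables ...
    glDispatchCompute(X, Y, 1);
    computeFence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
    glFlush();   // make sure the dispatch and the fence are actually submitted to the GPU
}
bool TryFetchResult(GLfloat* Result)
{
    if (computeFence == 0)
        return false;
    GLint status = GL_UNSIGNALED;
    glGetSynciv(computeFence, GL_SYNC_STATUS, 1, nullptr, &status);
    if (status != GL_SIGNALED)
        return false;                                    // still busy: keep rendering the old result this frame
    glDeleteSync(computeFence);
    computeFence = 0;
    glBindBuffer(GL_SHADER_STORAGE_BUFFER, resultSSBO);  // hypothetical buffer handle
    GLfloat* mapped = (GLfloat*)glMapBufferRange(GL_SHADER_STORAGE_BUFFER, 0,
                                                 sizeof(GLfloat) * X * Y, GL_MAP_READ_BIT);
    if (mapped != nullptr)
        memcpy(Result, mapped, sizeof(GLfloat) * X * Y);
    glUnmapBuffer(GL_SHADER_STORAGE_BUFFER);
    return true;                                         // a fresh result is now in Result
}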

glMapBufferRange freezing OpenGL driver

After generating a set of data using a compute shader and storing it in a Shader Storage buffer, I am attempting to read from that buffer to print out the data using the code:
#define INDEX_AT(x,y,z,i) (xyzToId(Vec3i((x), (y), (z)),\
Vec3i(NUM_RAYS_X,\
NUM_RAYS_Y,\
POINTS_ON_RAY))\
* 3 + (i))
PRINT_GL_ERRORS();
glBindBuffer(GL_SHADER_STORAGE_BUFFER, dPositionBuffer);
float* data_ptr = NULL;
for (int ray_i = 0; ray_i < POINTS_ON_RAY; ray_i++)
{
for (int y = 0; y < NUM_RAYS_Y; y++)
{
int x = 0;
data_ptr = NULL;
data_ptr = (float*)glMapBufferRange(
GL_SHADER_STORAGE_BUFFER,
INDEX_AT(x, y, ray_i, 0) * sizeof(float),
3 * (NUM_RAYS_X) * sizeof(float),
GL_MAP_READ_BIT);
if (data_ptr == NULL)
{
PRINT_GL_ERRORS();
return false;
}
else
{
for (int x = 0; x < NUM_RAYS_X; x++)
{
std::cout << "("
<< data_ptr[x * 3 + 0] << ","
<< data_ptr[x * 3 + 1] << ","
<< data_ptr[x * 3 + 2] << ") , ";
}
}
glUnmapBuffer(GL_SHADER_STORAGE_BUFFER);
PRINT_GL_ERRORS();
std::cout << std::endl;
}
std::cout << "\n" << std::endl;
}
where the function xyzToId converts three dimensional coordinates into a one-dimensional index.
When I attempt to run this, however, the program crashes at the call to glMapBufferRange, giving the error message:
The NVIDIA OpenGL driver lost connection with the display driver due to exceeding the Windows Time-Out limit and is unable to continue.
The application must close.
Error code: 7
Would you like to visit
http://nvidia.custhelp.com/cgi-bin/nvidia.cfg/php/enduser/std_adp.php?p_faqid=3007
for help?
The buffer that I am mapping is not very large at all, only 768 floats, and previous calls to glMapBuffer on a different shader storage buffer (of only two floats) completed with no problems. I can't seem to find any information relevant to this error online, and everything that I have read about the speed of glMapBufferRange indicates that a buffer of this size should only take on the order of tens of milliseconds to map, not the two second timeout that the program is crashing on.
Am I missing something about how glMapBufferRange should be used?
It was an unrelated error. Today I learned that OpenGL sometimes buffers commands, and several actions (like mapping a buffer) force it to finish all the commands in its queue. In this case, it was the action of actually dispatching the compute shader itself.
Today I also learned that indexing a shader storage buffer out of bounds will cause the OpenGL driver to freeze up just like it would if it were taking too long to complete.
All in all, this was largely a case of errors masquerading as different errors and popping up in the wrong spot.
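A hedged way to see that in practice is to force completion right after the dispatch, so the wait (or a driver reset caused by an out-of-bounds write) is attributed to the compute shader rather than to the later glMapBufferRange call; the group counts here are illustrative:
glDispatchCompute(groupsX, groupsY, 1);          // hypothetical group counts
glMemoryBarrier(GL_BUFFER_UPDATE_BARRIER_BIT);   // order the shader's writes before the readback
glFinish();                                      // diagnostic only: the queued work completes here
// By the time glMapBufferRange is called, the data is already resident.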

newComputePipelineStateWithFunction failed

I am trying to let a neural net run on Metal.
The basic idea is that of data duplication. Each GPU thread runs one version of the net for random data points.
I have written other shaders that work fine.
I also tried my code in a C++ command line app. No errors there.
There is also no compile error.
I used the Apple documentation to convert the code to Metal C++, since not everything from C++11 is supported.
It crashes after it loads the kernel function, when it tries to create the newComputePipelineStateWithFunction on the Metal device. This means there is a problem with the code that isn't caught at compile time.
MCVE:
kernel void net(const device float *inputsVector [[ buffer(0) ]], // layout of net *
uint id [[ thread_position_in_grid ]]) {
uint floatSize = sizeof(tempFloat);
uint inputsVectorSize = sizeof(inputsVector) / floatSize;
float newArray[inputsVectorSize];
float test = inputsVector[id];
newArray[id] = test;
}
Update
It has everything to do with dynamic arrays.
Since it fails to create the pipeline state and doesn't crash running the actual shader it must be a coding issue. Not an input issue.
Assigning values from a dynamic array to a buffer makes it fail.
The real problem:
It is a memory issue!
To all the people saying that it was a memory issue, you were right!
Here is some pseudo code to illustrate it. Sorry that it is in "Swift" but easier to read. Metal Shaders have a funky way of coming to life. They are first initialised without values to get the memory. It was this step that failed because it relied on a later step: setting the buffer.
It all comes down to which values are available when. My understanding of newComputePipelineStateWithFunction was wrong. It is not simply getting the shader function. It is also a tiny step in the initialising process.
class MetalShader {
// buffers
var aBuffer : [Float]
var aBufferCount : Int
// step One : newComputePipelineStateWithFunction
memory init() {
// assign shader memory
// create memory for one int
let aStaticValue : Int
// create memory for one int
var aNotSoStaticValue : Int // this wil succeed, assigns memory for one int
// create memory for 10 floats
var aStaticArray : [Float] = [Float](count: aStaticValue, repeatedValue: y) // this will succeed
// create memory for x floats
var aDynamicArray : [Float] = [Float](count: aBuffer.count, repeatedValue: y) // this will fail
var aDynamicArray : [Float] = [Float](count: aBufferCount, repeatedValue: y) // this will fail
let tempValue : Float // one float from a loop
}
// step Two : commandEncoder.setBuffer()
assign buffers (buffers) {
aBuffer = cpuMemoryBuffer
}
// step Three : commandEncoder.endEncoding()
actual init() {
// set shader values
let aStaticValue : Int = 0
var aNotSoStaticValue : Int = aBuffer.count
var aDynamicArray : [Float] = [Float](count: aBuffer.count, repeatedValue: 1) // this could work, but the app already crashed before getting to this point.
}
// step Four : commandBuffer.commit()
func shaderFunction() {
// do stuff
for i in 0..<aBuffer.count {
let tempValue = aBuffer[i]
}
}
}
Fix:
I finally realised that buffers are technically dynamic arrays and instead of creating arrays inside the shader, I could also just add more buffers. This obviously works.
I think your problem is with this line :
uint schemeVectorSize = sizeof(schemeVector) / uintSize;
Here schemeVector is dynamic, so as in classic C++ you cannot use sizeof on a dynamic array to get the number of elements. sizeof only works on arrays you have defined locally/statically in the Metal shader code.
Just imagine how it works internally: at compile time, the Metal compiler is supposed to transform the sizeof call into a constant... but it can't, since schemeVector is a parameter of your shader and can therefore have any size.
So for me the solution would be to compute schemeVectorSize in the C++/Objective-C/Swift part of your code and pass it as a parameter to the shader (as a uniform, in OpenGL ES terminology).
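Applied to the MCVE, a hedged sketch of that suggestion: pass the element count in as its own buffer entry (set from the host with setBytes or a one-element buffer at index 1; the parameter name is made up) and drop the variable-length local array:
kernel void net(const device float *inputsVector     [[ buffer(0) ]],
                constant uint      &inputsVectorCount [[ buffer(1) ]],  // element count computed on the CPU
                uint id [[ thread_position_in_grid ]]) {
    if (id < inputsVectorCount) {        // guard against extra threads in the grid
        float test = inputsVector[id];
        // ... work with test; no dynamically sized local array is needed
    }
}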

Why can't I add an int member to my GLSL shader input/output block?

I'm getting an "invalid operation" error when attempting to call glUseProgram against the fragment shader below. The error only occurs when I try to add an int member to the block definition. Note that I am keeping the block definition the same in both the vertex and fragment shaders. I don't even have to access it! Merely adding that field to the vertex and fragment shader copies of the block definition causes the program to fail.
#version 450
...
in VSOutput // and of course "out" in the vertex shader
{
vec4 color;
vec4 normal;
vec2 texCoord;
//int foo; // uncommenting this line causes "invalid operation"
} vs_output;
I also get the same issue when trying to use free-standing in/out variables of the same type, though in those cases I only get the issue if I access those variables directly; if I ignore them, I assume the compiler optimizes them away and thus the error doesn't occur. It's almost like I'm only allowed to pass around vectors and matrices...
What am I missing here? I haven't been able to find anything in the documentation that would indicate that this should be an issue.
EDIT: padding it out with float[2] to force the int member onto the next 16-byte boundary did not work either.
EDIT: solved, as per the answer below. Turns out I could have figured this out much more quickly if I'd checked the shader program's info log. Here's my code to do that:
bool checkProgramLinkStatus(GLuint programId)
{
auto log = logger("Shaders");
GLint status;
glGetProgramiv(programId, GL_LINK_STATUS, &status);
if(status == GL_TRUE)
{
log << "Program link successful." << endlog;
return true;
}
return false;
}
bool checkProgramInfoLog(GLuint programId)
{
auto log = logger("Shaders");
GLint infoLogLength;
glGetProgramiv(programId, GL_INFO_LOG_LENGTH, &infoLogLength);
GLchar* strInfoLog = new GLchar[infoLogLength + 1];
glGetProgramInfoLog(programId, infoLogLength, NULL, strInfoLog);
if(infoLogLength == 0)
{
log << "No error message was provided" << endlog;
}
else
{
log << "Program link error: " << std::string(strInfoLog) << endlog;
}
delete[] strInfoLog; // free the copied log before returning
return false;
}
(As already pointed out in the comments): The GL will never interpolate integer types. To quote the GLSL spec (Version 4.5) section 4.3.4 "input variables":
Fragment shader inputs that are signed or unsigned integers, integer vectors, or any double-precision
floating-point type must be qualified with the interpolation qualifier flat.
This of course also applies to the corresponding outputs in the previous stage.
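Applied to the question's block, a minimal sketch of the fix (per the note above, the matching out block member in the vertex shader gets the same qualifier):
in VSOutput
{
vec4 color;
vec4 normal;
vec2 texCoord;
flat int foo; // integer members must not be interpolated
} vs_output;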

DOS ASCII Animation Lagging without constant input, Turbo C compiled

Here's an oddity from the past!
I'm writing an ASCII Pong game for the command prompt (Yes yes oldschool) and I'm writing to the video memory directly (Add. 0xB8000000) so I know I'm rendering quickly (As opposed to gotoxy and then printf rendering)
My code works fine and compiles fine under Turbo C++ V1.01, BUT the animation lags... now hold on, hold on, there's a caveat! Under my super fast boosted turbo Dell Core 2 Duo this seems logical; however, when I hold a key on the keyboard the animation becomes smooth as a newly compiled baby's bottom.
I thought maybe it was because I was slowing the computer down by overloading the keyboard buffer (wtf really? come on...) but then I quickly smartened up and tried compiling with DJGPP and the Tiny C Compiler to test if the results were the same. With the Tiny C Compiler I found I couldn't compile 'far' pointer types... still confused on that one, but I was able to compile with DJGPP and the animation ran smoothly!
I want to compile this and have it work under Turbo C++, but this problem has been plaguing me for the past 3 days with no resolution. Does anyone know why the Turbo C++ constant calls to my rendering method (code below) will lag in the command prompt but DJGPP will not? I don't know if I'm compiling as debug or not; I don't even know how to check if I am. I did convert the code to ASM and I saw what looked to be debugging data at the header of the source, so I don't know...
Any and all comments and help will be greatly appreciated!
Here is a quick example of what I'm up against, simple to compile so please check it out:
#include<stdio.h>
#include<conio.h>
#include<dos.h>
#include<time.h>
#define bX 80
#define bY 24
#define halfX bX/2
#define halfY bY/2
#define resolution bX*bY
#define LEFT 1
#define RIGHT 2
void GameLoop();
void render();
void clearBoard();
void printBoard();
void ballLogic();
typedef struct {
int x, y;
}vertex;
vertex vertexWith(int x, int y) {
vertex retVal;
retVal.x = x;
retVal.y = y;
return retVal;
}
vertex vertexFrom(vertex from) {
vertex retVal;
retVal.x = from.x;
retVal.y = from.y;
return retVal;
}
int direction;
char far *Screen_base;
char *board;
vertex ballPos;
void main() {
Screen_base = (char far*)0xB8000000;
ballPos = vertexWith(halfX, halfY);
direction = LEFT;
board = (char *)malloc(resolution*sizeof(char));
GameLoop();
}
void GameLoop() {
char input;
clrscr();
clearBoard();
do {
if(kbhit())
input = getch();
render();
ballLogic();
delay(50);
}while(input != 'p');
clrscr();
}
void render() {
clearBoard();
board[ballPos.y*bX+ballPos.x] = 'X';
printBoard();
}
void clearBoard() {
int d;
for(d=0;d<resolution;d++)
board[d] = ' ';
}
void printBoard() {
int d;
char far *target = Screen_base+d;
for(d=0;d<resolution;d++) {
*target = board[d];
*(target+1) = LIGHTGRAY;
++target;
++target;
}
}
void ballLogic() {
vertex newPos = vertexFrom(ballPos);
if(direction == LEFT)
newPos.x--;
if(direction == RIGHT)
newPos.x++;
if(newPos.x == 0)
direction = RIGHT;
else if(newPos.x == bX)
direction = LEFT;
else
ballPos = vertexFrom(newPos);
}
First, in the code:
void printBoard() {
int d;
char far *target = Screen_base+d; // <-- right there
for(d=0;d<resolution;d++) {
you are using the variable d before it is initialized.
My assumption is that if you are running this in a DOS window, rather than booting into DOS and running it, kbhit has to do more work (indirectly, within the DOS box's provided environment) when there isn't already a keypress queued up.
This shouldn't affect your run time very much, but I suggest that in the event that there is no keypress you explicitly set the input to some constant. Also, input should really be an int, not a char.
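For example, a hedged sketch of the loop in GameLoop with those two changes:
int input = 0;                 /* int rather than char */
do {
    if (kbhit())
        input = getch();
    else
        input = 0;             /* explicit value when no key is pending */
    render();
    ballLogic();
    delay(50);
} while (input != 'p');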
Other suggestions:
vertexFrom doesn't really do anything.
A = vertexFrom(B);
should be able to be replaced with:
A = B;
Your macro constants that have operators in them should have parentheses around them.
#define Foo x/2
should be:
#define Foo (x/2)
so that you never ever have to worry about operator precedence no matter what code surrounds uses of Foo.
Under 16-bit x86 PCs there are actually 4 display areas that can be switched between. If you can swap between 2 of those for your animation, your animation should appear to happen instantaneously. It's called Double Buffering. You have one buffer that acts as the current display buffer and one that is the working buffer. Then when you are satisfied with the working buffer (and the time is right, if you are trying to update the screen at a certain rate), you swap them. I don't remember how to do this, but the particulars shouldn't be too difficult to find. I'd suggest that you might leave the initial buffer alone and restore back to it upon exit so that the program would leave the screen in just about the state that it started in. Also, you could use the other buffer to hold debug output, and then if you held down the space bar or something that buffer could be displayed.
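The particulars, assuming the usual 80x25 colour text mode (mode 3) with its 0x1000-byte page stride and the BIOS INT 10h page-select service, might look roughly like this Turbo C sketch (not tested on real hardware):
#include <dos.h>   /* union REGS, int86, MK_FP */
int visible_page = 0;
/* Base address of a given text page; assumes pages are 0x1000 bytes apart. */
char far *page_base(int page) {
    return (char far *)MK_FP(0xB800, page * 0x1000);
}
/* Ask the BIOS to display the given page (INT 10h, AH = 05h). */
void show_page(int page) {
    union REGS regs;
    regs.h.ah = 0x05;
    regs.h.al = (unsigned char)page;
    int86(0x10, &regs, &regs);
}
/* Draw the next frame into page_base(1 - visible_page), then: */
void flip(void) {
    visible_page = 1 - visible_page;
    show_page(visible_page);
}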
If you don't want to go that route and the 'X' is the only thing changing then you could forgo clearing the screen and just clear the last location of the 'X'.
Isn't the screen buffer an array of 2-byte units, one for the display character and the other for the attributes? I think so, so I would represent it as an array of:
struct screen_unit {
char ch;
unsigned char attr;
}; /* or reverse those if I've got them backwards */
This would make it less likely for you to make mistakes based on offsets.
I'd also probably read and write them to the buffer as the 16 bit value, rather than the byte, though this shouldn't make a big difference.
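A hedged sketch of printBoard using that struct, keeping the question's Screen_base, board, resolution and LIGHTGRAY names:
struct screen_unit {
    char ch;             /* display character */
    unsigned char attr;  /* attribute byte */
};
void printBoard() {
    struct screen_unit far *target = (struct screen_unit far *)Screen_base;
    int d;
    for (d = 0; d < resolution; d++) {
        target[d].ch = board[d];
        target[d].attr = LIGHTGRAY;
    }
}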
I figured out why it wasn't rendering right away. The timer that I created is fine; the problem is that the actual clock_t is only accurate to .054547XXX or so, so I could only render at 18 fps. The way I would fix this is by using a more accurate clock... which is a whole other story.