Problem
There should not be gaps between those white lines. Those lines are composed of squares (eventually I won't be generating just one giant square; this is for debugging). For some reason, when I send data through my Uniform Buffer Object (example below), I get gaps. It's almost as if every other y value is skipped: instead of one square at (y) and one at (y + 1), two squares end up on the same location.
Code Snippets
Generating data pointer array
blockData = new glm::vec2[24*24];
for (int x = 0; x < 24; x++) {
    for (int y = 0; y < 24; y++) {
        int i = x * 24 + y;
        blockData[i] = glm::vec2(x, y);
    }
}
In the rendering class
glBindBuffer(GL_UNIFORM_BUFFER, ubo);
glBufferSubData(GL_UNIFORM_BUFFER, 0, sizeof(glm::vec2) * blocksActive, blockData);
glBindBufferRange(GL_UNIFORM_BUFFER, uniBlockData, ubo, 0, sizeof(glm::vec2) * blocksActive);
glBindBuffer(GL_UNIFORM_BUFFER, 0);
glBindVertexArray(vao);
glDrawElementsInstanced(GL_TRIANGLES, 6, GL_UNSIGNED_INT, (void*)0, blocksActive);
glBindVertexArray(0);
Vertex Shader (GLSL)
layout (std140) uniform blockData {
    vec2 blockDataPosition[5184];
};
Testing
When I change blockData[i] = glm::vec2(x, y); to blockData[i] = glm::vec2(y, x); (switching y and x), the gaps move to the x-axis.
I have tried swapping x and y in the for loops, but that does not affect it. This issue is somehow linked to the y variable.
What does affect it is switching x and y around in int i = x * 24 + y;
Setting the vec2 to (x, x) results in a correctly placed diagonal.
Setting the vec2 to (y, y) results in an oddly placed diagonal (below)
Before switching to a UBO, I was just using a plain uniform in the shader and it worked fine. That is why I believe it has something to do with how I am sending data through the UBO.
What is happening is most likely an alignment issue. Under std140, the elements of an array are rounded up to a vec4 (16-byte) stride, so the shader expects 16 bytes per element of your vec2 array while you are uploading tightly packed 8-byte vec2s; every other value you upload lands in padding, which matches the "skipping every other y" symptom. Without seeing all of your code it's not possible for me to be completely certain how you are buffering your data, but what I can suggest is to query your block offsets using glGetActiveUniformsiv and then align your data based on that. A typical example would be this:
GLuint blockIndex = glGetUniformBlockIndex(yourprogram, "blockData");
const char* name = "blockDataPosition[0]";   // block has no instance name, so just the member name
GLuint index;
glGetUniformIndices(yourprogram, 1, &name, &index);
GLint offset = 0, arrayStride = 0;
glGetActiveUniformsiv(yourprogram, 1, &index, GL_UNIFORM_OFFSET, &offset);
glGetActiveUniformsiv(yourprogram, 1, &index, GL_UNIFORM_ARRAY_STRIDE, &arrayStride);
// For a vec2 array under std140, expect arrayStride == 16: each element
// starts on a vec4 boundary rather than being tightly packed at 8 bytes.
The offset and the array stride tell you everything you need to know: the offset is where the array starts inside the block, and the array stride is the number of bytes between consecutive elements, so you can lay out your CPU-side data to match (or just print them out to see how everything is aligned). This should not be done in a loop every frame, but once at initialization.
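If the query confirms a 16-byte array stride (which is what std140 mandates for a vec2 array), a minimal sketch of one way to match it on the CPU side, reusing the names from your question, is to upload one padded vec4 per element:
// Sketch: pad each vec2 element out to a full vec4 slot so the CPU-side
// layout matches the 16-byte std140 array stride. The zw components are unused padding.
glm::vec4* blockData = new glm::vec4[24 * 24];
for (int x = 0; x < 24; x++) {
    for (int y = 0; y < 24; y++) {
        blockData[x * 24 + y] = glm::vec4(x, y, 0.0f, 0.0f);
    }
}
glBindBuffer(GL_UNIFORM_BUFFER, ubo);
glBufferSubData(GL_UNIFORM_BUFFER, 0, sizeof(glm::vec4) * blocksActive, blockData);
The shader block can stay exactly as it is; each vec2 is simply read from the first 8 bytes of its 16-byte slot.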
Tell me if you have any questions or you need a more specific example.
Related
I have created two SSBOs to handle some lights, because the VS-FS in/out interface can't handle a lot of lights (I'm using forward shading).
For the first one I only pass values to the shader (it's basically read-only) [C++]:
struct GLightProperties
{
    unsigned int numLights;
    LightProperties properties[];
};
...
glp = (GLightProperties*)malloc(sizeof(GLightProperties) + sizeof(LightProperties) * lastSize);
...
glGenBuffers(1, &ssbo);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, ssbo);
glBufferData(GL_SHADER_STORAGE_BUFFER, sizeof(GLightProperties) + sizeof(LightProperties) * lastSize, glp, GL_DYNAMIC_COPY);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 1, ssbo);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, 0);
Shader file [GLSL]:
layout(std430, binding = 1) buffer Lights
{
    uint numLights;
    LightProperties properties[];
} lights;
So this first SSBO turns out to work fine. However, the other one, whose purpose is the VS-FS interface, has some issues:
glBindBuffer(GL_SHADER_STORAGE_BUFFER, ssbo2);
glBufferData(GL_SHADER_STORAGE_BUFFER, sizeof(float) * 4 * 3 * lastSize, nullptr, GL_DYNAMIC_COPY);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, 0);
GLSL:
struct TangentProperties
{
    vec4 TangentLightPos;
    vec4 TangentViewPos;
    vec4 TangentFragPos;
};

layout(std430, binding = 0) buffer TangentSpace
{
    TangentProperties tangentProperties[];
} tspace;
So here you notice I pass nullptr to glBufferData, because the VS will write into the buffer and the FS will read its contents.
Like so in the VS Stage:
for(int i = 0; i < lights.numLights; i++)
{
    tspace.tangentProperties[index].TangentLightPos.xyz = TBN * lights.properties[index].lightPosition.xyz;
    tspace.tangentProperties[index].TangentViewPos.xyz = TBN * camPos;
    tspace.tangentProperties[index].TangentFragPos.xyz = TBN * vec3(worldPosition);
    memoryBarrierBuffer();
}
After this the FS reads the values, which turn out to be just garbage. Am I doing something wrong with memory barriers?
The output turns out this way:
OK, let's get the obvious bug out of the way:
for(int i = 0; i < lights.numLights; i++)
{
    tspace.tangentProperties[index].TangentLightPos.xyz = TBN * lights.properties[index].lightPosition.xyz;
    tspace.tangentProperties[index].TangentViewPos.xyz = TBN * camPos;
    tspace.tangentProperties[index].TangentFragPos.xyz = TBN * vec3(worldPosition);
    memoryBarrierBuffer();
}
index never changes in this loop, so you're only ever writing a single entry, with a single light's values. All other entries will contain garbage/undefined values.
So you probably meant i rather than index.
But that's only the beginning of the problem. See, if you make that change, you get this:
for(int i = 0; i < lights.numLights; i++)
{
    tspace.tangentProperties[i].TangentLightPos.xyz = TBN * lights.properties[i].lightPosition.xyz;
    tspace.tangentProperties[i].TangentViewPos.xyz = TBN * camPos;
    tspace.tangentProperties[i].TangentFragPos.xyz = TBN * vec3(worldPosition);
}
memoryBarrierBuffer();
Note that the barrier is outside the loop.
That creates a new problem. This code will have every vertex shader invocation writing to the same memory buffer. SSBOs, after all, are not VS output variables. Output variables are stored as part of a vertex. The rasterizer then interpolates this vertex data across the primitive as it rasterizes it, which provides the input values for the FS. So one VS cannot stomp on the output variables of another VS.
That doesn't happen with SSBOs. Every VS is acting on the same SSBO memory. So if they write to the same indices of the same array, they're writing to the same memory address. Which is a race condition (since there can be no synchronization between sibling invocations) and therefore undefined behavior.
So, the only way what you're trying to do could possibly work is if your buffer has numLights entries for each vertex in the entire scene.
This is a fundamentally unreasonable amount of space. Even if you could get it down to just the number of vertices in a particular draw call (which is doable, but I'm not going to say how), you would still be behind in performance. Every FS invocation will have to perform reads of 144 bytes of data for each light (3 table entries, one for each vertex of the triangle), linearly interpolate those values, and then do lighting computations.
It would be faster for you to just pass the TBN matrix as a VS output and do the matrix multiplies in the FS. Yes, that's a lot of matrix multiplies, but GPUs are really fast at matrix multiplies, and are really slow at reading memory.
Also, reconsider whether you need the tangent-space fragment position. Generally speaking, you never do.
I am using the OpenGL library in my Visual C++ application, where I want to draw, say, 100 points at random locations, and I would like to make sure the random coordinates generated are within the screen/window boundaries. I tried using (x, y, z) vertices and the points end up running vertically along a line. If I generate only (x, y) and draw them, I do get a lot more points scattered around, but definitely not all 100 within the window dimensions.
My code looks something like this:
GLfloat dots_vert[99];
for (int i = 0; i < 99; i++){
    if (i % 2 == 0)
        dots_vert[i] = 0.0f;
    else
        dots_vert[i] = ((GLfloat)rand() / (GLfloat)RAND_MAX)*100.0f - (100.0f / 2);
}
glEnable(GL_POINT_SMOOTH);
glPointSize(3.0f);
glEnableClientState(GL_VERTEX_ARRAY);
GLuint vbo_ID;
glGenBuffers(1, &vbo_ID);
glBindBuffer(GL_ARRAY_BUFFER, vbo_ID);
glBufferData(GL_ARRAY_BUFFER, sizeof(dots_vert), dots_vert, GL_DYNAMIC_DRAW);
while (!GetAsyncKeyState(VK_DOWN)){
    glEnableVertexAttribArray(0);
    glBindBuffer(GL_ARRAY_BUFFER, vbo_ID);
    glVertexAttribPointer(
        0,
        3,
        GL_FLOAT,
        GL_FALSE,
        0,
        (void*)0
    );
    glDrawArrays(GL_POINTS, 0, 100);
    SwapBuffers(g_pOpenGLWindow->hDC);
Let me guide you through the glaring mistakes I can immediately see in that code.
First of all, the obvious first mistake: you claim to be drawing 100 points, but your dots_vert array is only 99 elements long. The same mistake is repeated in the loop, which goes from 0 to 98, for a total of 99 iterations.
So first of all:
GLfloat dots_vert[100];
for (int i = 0; i < 100; ++i)
{
[...]
}
There is another huge mistake in there but we'll keep that for later, let's move on for now.
The second mistake is about the knowledge of the OpenGL API and computer graphics. First of all, your goal is to pass points to the GPU, so you need the glVertexAttribPointer function, that much you figured out. The absolute first thing you wanna do is to look at the glVertexAttribPointer reference documentation, so you have an idea of what you need. You need an index, a size, a type, a normalized flag, a stride and an offset.
Let's look at what the reference documentation says about the size parameter:
size
Specifies the number of components per generic vertex attribute. Must be 1, 2, 3, 4. Additionally, the symbolic constant GL_BGRA is accepted by glVertexAttribPointer. The initial value is 4.
It should be immediately obvious that this is crucial in determining what kind of data you're trying to pass to the GPU. You set the parameter to 3, which means that you have an x, a y and a z. But the previous code contradicts this. For starters, your dots_vert array is 100 elements long, and you want to draw 100 points, so you have enough data for 100/100 = 1 component per point, not 3. But even worse, the inside of the for loop contradicts this even further, so let's go back and check the mistake I mentioned previously.
Mistake number three: your for loop consists of an if {} else {} statement, where you set the current element of the dots_vert array to a value of 0.0f if the index of the loop is even (if (i % 2 == 0)), and a random value between -50.0f and 50.0f otherwise. Assuming 1 component per point, this means that you're only generating the x coordinates, so you're working in a single dimension.
Clearly this is not what you intended to do, not least because half of your points would be 0.0f and would therefore all overlap. So I assume you were trying to generate random values for x and y, and set z to 0.0f, which would make much more sense. In that case you have 3 components per point and therefore need an array with 100*3 = 300 elements. So let's fix the previous code:
GLfloat dots_vert[300];
for (int i = 0; i < 300; ++i)
{
[...]
}
Much better. Now we need to generate a random x and y value for each point, and set z to 0.0f since we don't need it. You want to handle all of the components of one point in a single iteration, so your loop should step by 3, not 1. Once again, let's fix the previous code:
GLfloat dots_vert[300];
for (int i = 0; i < 300; i += 3)
{
[...]
}
Now we can generate x, y and z together in a single loop. This is where understanding how computer graphics works, specifically in the context of the OpenGL API, becomes crucial. OpenGL uses a coordinate system where the origin is in the middle of the screen, the x axis runs horizontally (positive x points to your right), the y axis runs vertically (positive y points up), and the z axis goes straight through the screen (positive z points out of the screen, towards you). Now this is the very important part: x, y and z are clipped to a specific range of values; anything outside of this range is ignored. For all coordinates, the range goes from -1.0f to 1.0f. Anything below or above that is not drawn at all.
So if you want all 100 points to be inside the screen (ignoring projection, which is outside the scope of this exercise), you want to generate x and y in the -1.0f to 1.0f range, not -50.0f to 50.0f like you're doing there. You can keep z at 0.0f; it doesn't really matter in this case. This is why most of your points fall outside of the screen: with that range, statistically speaking, around 98% of your points will fall outside of clip space and be ignored.
So ultimately this is what you want:
GLfloat dots_vert[300];
for (int i = 0; i < 300; i += 3)
{
    dots_vert[i] = ((GLfloat)rand() / (GLfloat)RAND_MAX)*2.0f - 1.0f; // this is x
    dots_vert[i+1] = ((GLfloat)rand() / (GLfloat)RAND_MAX)*2.0f - 1.0f; // this is y
    dots_vert[i+2] = 0.0f; // this is z
}
Finally, a reminder: when you do glDrawArrays(GL_POINTS, 0, 100); you're telling the GPU to draw 100 points. Each point is made of however many components you specified in the size parameter of glVertexAttribPointer. In this case you want to draw 100 points, each made of 3 components, so the GPU expects an array of 100*3 = 300 floats. Anything less could result in a segmentation fault or, even worse, undefined behavior (which means anything can happen), so pay close attention to what you're doing and make sure you know exactly what kind of data you're passing to the GPU, because you might end up with a nonsense result and be stuck trying to figure out what went wrong. In this case you have basically no code at all to check, so it's easy to fix, but when you end up with a decent amount of code (and you will eventually), an error like this could mean hours or even days wasted trying to find it.
As a bonus (feel free to ignore this one): technically a point is made of 4 components. The fourth component is called w and its use is outside the scope of this exercise, so don't worry about it; just remember that it should always be set to 1.0f unless you are doing projection.
So technically you could do this too:
GLfloat dots_vert[400];
for (int i = 0; i < 400; i += 4)
{
    dots_vert[i] = ((GLfloat)rand() / (GLfloat)RAND_MAX)*2.0f - 1.0f; // this is x
    dots_vert[i+1] = ((GLfloat)rand() / (GLfloat)RAND_MAX)*2.0f - 1.0f; // this is y
    dots_vert[i+2] = 0.0f; // this is z
    dots_vert[i+3] = 1.0f; // this is w
}
Then set the size parameter of glVertexAttribPointer to 4 instead of 3, and the result should be exactly the same.
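Putting it all together, a minimal sketch of the corrected snippet (keeping the structure of your original code, so g_pOpenGLWindow and the rest of the window setup are assumed to exist unchanged):
GLfloat dots_vert[300]; // 100 points * 3 components
for (int i = 0; i < 300; i += 3)
{
    dots_vert[i]   = ((GLfloat)rand() / (GLfloat)RAND_MAX) * 2.0f - 1.0f; // x in [-1, 1]
    dots_vert[i+1] = ((GLfloat)rand() / (GLfloat)RAND_MAX) * 2.0f - 1.0f; // y in [-1, 1]
    dots_vert[i+2] = 0.0f;                                                // z
}
glEnable(GL_POINT_SMOOTH);
glPointSize(3.0f);
GLuint vbo_ID;
glGenBuffers(1, &vbo_ID);
glBindBuffer(GL_ARRAY_BUFFER, vbo_ID);
glBufferData(GL_ARRAY_BUFFER, sizeof(dots_vert), dots_vert, GL_DYNAMIC_DRAW);
while (!GetAsyncKeyState(VK_DOWN))
{
    glClear(GL_COLOR_BUFFER_BIT); // not in the original snippet, but you normally clear each frame
    glEnableVertexAttribArray(0);
    glBindBuffer(GL_ARRAY_BUFFER, vbo_ID);
    glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, (void*)0); // 3 components per point
    glDrawArrays(GL_POINTS, 0, 100); // 100 points -> reads 300 floats
    SwapBuffers(g_pOpenGLWindow->hDC);
}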
I'm trying my hand at shader storage buffer objects (aka Buffer Blocks) and there are a couple of things I don't fully grasp. What I'm trying to do is to store the (simplified) data of an indeterminate number of lights n in them, so my shader can iterate through them and perform calculations.
Let me start by saying that I get the correct results, and no errors from OpenGL. However, it bothers me not to know why it is working.
So, in my shader, I got the following:
struct PointLight {
    vec3 pos;
    float intensity;
};

layout (std430, binding = 0) buffer PointLights {
    PointLight pointLights[];
};

void main() {
    PointLight light;
    for (int i = 0; i < pointLights.length(); i++) {
        light = pointLights[i];
        // etc
    }
}
and in my application:
struct PointLightData {
    glm::vec3 pos;
    float intensity;
};

class PointLight {
    // ...
    PointLightData data;
    // ...
};
std::vector<PointLight*> pointLights;
glGenBuffers(1, &BBO);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, BBO);
glNamedBufferStorage(BBO, n * sizeof(PointLightData), NULL, GL_DYNAMIC_STORAGE_BIT);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, BBO);
...
for (unsigned int i = 0; i < pointLights.size(); i++) {
    glNamedBufferSubData(BBO, i * sizeof(PointLightData), sizeof(PointLightData), &(pointLights[i]->data));
}
In this last loop I'm storing a PointLightData struct with an offset equal to its size times the number of them I've already stored (so offset 0 for the first one).
So, as I said, everything seems correct: the binding point is correctly set to zero, I have enough memory allocated for my objects, etc. The graphical results are OK.
Now to my questions. I am using std430 as the layout; in fact, if I change it to std140, as I originally had it, it breaks. Why is that? My hypothesis is that the layout generated by std430 for the shader's PointLights buffer block happily matches the one generated by the compiler for my application's PointLightData struct (as you can see in that loop, I'm blindly storing one after the other). Do you think that's the case?
Now, assuming that hypothesis is correct, the obvious solution would be to do the mapping of sizes and offsets myself, querying OpenGL with glGetUniformIndices and glGetActiveUniformsiv (the latter called with GL_UNIFORM_SIZE and GL_UNIFORM_OFFSET), but I have the sneaking suspicion that these two only work with Uniform Blocks, not with Buffer Blocks like the one I'm using. At least, when I do the following, OpenGL throws a tantrum, gives me back a 1281 error, and returns a very weird number as the indices (something like 3432898282 or whatever):
const char * names[2] = {
"pos", "intensity"
};
GLuint indices[2];
GLint size[2];
GLint offset[2];
glGetUniformIndices(shaderProgram->id, 2, names, indices);
glGetActiveUniformsiv(shaderProgram->id, 2, indices, GL_UNIFORM_SIZE, size);
glGetActiveUniformsiv(shaderProgram->id, 2, indices, GL_UNIFORM_OFFSET, offset);
Am I correct in saying that glGetUniformIndices and glGetActiveUniformsiv do not apply to buffer blocks?
If they do not, or if the fact that it's working is, as I imagine, just a coincidence, how could I do the mapping manually? I checked appendix H of the programming guide and the wording for arrays of structures is somewhat confusing. If I can't query OpenGL for sizes/offsets for what I'm trying to do, I guess I could compute them manually (cumbersome as it is), but I'd appreciate some help there, too.
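For reference, a hedged sketch of the GL 4.3+ program interface query path, which does cover buffer blocks (the member name strings below are what the compiler would typically report for this block; if a query returns GL_INVALID_INDEX, the actual names can be enumerated from the active GL_BUFFER_VARIABLE resources):
// Sketch (GL 4.3+): buffer blocks are introspected via the program interface
// query API rather than glGetUniformIndices/glGetActiveUniformsiv.
const char* varNames[2] = { "pointLights[0].pos", "pointLights[0].intensity" };
for (int i = 0; i < 2; ++i)
{
    GLuint varIndex = glGetProgramResourceIndex(shaderProgram->id, GL_BUFFER_VARIABLE, varNames[i]);
    const GLenum props[2] = { GL_OFFSET, GL_TOP_LEVEL_ARRAY_STRIDE };
    GLint values[2] = { 0, 0 };
    glGetProgramResourceiv(shaderProgram->id, GL_BUFFER_VARIABLE, varIndex,
                           2, props, 2, nullptr, values);
    // values[0]: byte offset of the member within one array element
    // values[1]: byte stride between consecutive PointLight elements
}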
I'm currently working on a program which supports depth-independent (also known as order-independent) alpha blending. To do that, I implemented a per-pixel linked list, using a texture for the header (which, for every pixel, points to the first entry in the linked list) and a texture buffer object for the linked list itself. While this works fine, I would like to exchange the texture buffer object for a shader storage buffer as an exercise.
I think I almost got it, but it took me about a week to get to a point where I could actually use the shader storage buffer. My questions are:
Why can't I map the shader storage buffer?
Why is it a problem to bind the shader storage buffer again?
For debugging, I just display the contents of the shader storage buffer (which doesn't contain a linked list yet). I created the shader storage buffer in the following way:
glm::vec4* bufferData = new glm::vec4[windowOptions.width * windowOptions.height];
glm::vec4* readBufferData = new glm::vec4[windowOptions.width * windowOptions.height];
for(unsigned int y = 0; y < windowOptions.height; ++y)
{
    for(unsigned int x = 0; x < windowOptions.width; ++x)
    {
        // Set the whole buffer to red
        bufferData[x + y * windowOptions.width] = glm::vec4(1,0,0,1);
    }
}
GLuint ssb;
// Get a handle
glGenBuffers(1, &ssb);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, ssb);
// Create buffer
glBufferData(GL_SHADER_STORAGE_BUFFER, windowOptions.width * windowOptions.height * sizeof(glm::vec4), bufferData, GL_DYNAMIC_COPY);
// Now bind the buffer to the shader
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, ssb);
In the shader, the shader storage buffer is defined as:
layout (std430, binding = 0) buffer BufferObject
{
    vec4 points[];
};
In the rendering loop, I do the following:
glUseProgram(defaultProgram);
for(unsigned int y = 0; y < windowOptions.height; ++y)
{
for(unsigned int x = 0; x < windowOptions.width; ++x)
{
// Create a green/red color gradient
bufferData[x + y * windowOptions.width] =
glm::vec4((float)x / (float)windowOptions.width,
(float)y / (float)windowOptions.height, 0.0f, 1.0f);
}
}
glMemoryBarrier(GL_ALL_BARRIER_BITS); // Don't know if this is necessary, just a precaution
glBufferSubData(GL_SHADER_STORAGE_BUFFER, 0, windowOptions.width * windowOptions.height * sizeof(glm::vec4), bufferData);
// Retrieving the buffer also works fine
// glMemoryBarrier(GL_ALL_BARRIER_BITS);
// glGetBufferSubData(GL_SHADER_STORAGE_BUFFER, 0, windowOptions.width * windowOptions.height * sizeof(glm::vec4), readBufferData);
glMemoryBarrier(GL_ALL_BARRIER_BITS); // Don't know if this is necessary, just a precaution
// Draw a quad which fills the screen
// ...
This code works, but when I replace glBufferSubData with the following code,
glm::vec4* p = (glm::vec4*)glMapBufferRange(GL_SHADER_STORAGE_BUFFER, 0, windowOptions.width * windowOptions.height, GL_WRITE_ONLY);
for(unsigned int x = 0; x < windowOptions.width; ++x)
{
    for(unsigned int y = 0; y < windowOptions.height; ++y)
    {
        p[x + y * windowOptions.width] = glm::vec4(0,1,0,1);
    }
}
glUnmapBuffer(GL_SHADER_STORAGE_BUFFER);
the mapping fails, returning GL_INVALID_OPERATION. It seems like the shader storage buffer is still bound to something, so it can't be mapped. I read something about glGetProgramResourceIndex (http://www.opengl.org/wiki/GlGetProgramResourceIndex) and glShaderStorageBlockBinding (http://www.opengl.org/wiki/GlShaderStorageBlockBinding), but I don't really get it.
My second question is why I can call neither
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, ssb);
, nor
glBindBuffer(GL_SHADER_STORAGE_BUFFER, ssb);
in the render loop after glBufferSubData and glMemoryBarrier. This code should not change a thing, since these calls are the same as during the creation of the shader storage buffer. If I can't bind different shader storage buffers, I can only use one. But I know that more than one shader storage buffer is supported, so I think I'm missing something else (like "releasing" the buffer).
First of all, glMapBufferRange fails simply because GL_WRITE_ONLY is not a valid argument for it. That constant was used for the old glMapBuffer; glMapBufferRange instead takes a combination of flags for more fine-grained control. In your case you need GL_MAP_WRITE_BIT. And since you seem to completely overwrite the whole buffer without caring about the previous values, an additional optimization would probably be GL_MAP_INVALIDATE_BUFFER_BIT. So replace that call with:
glm::vec4* p = (glm::vec4*)glMapBufferRange(GL_SHADER_STORAGE_BUFFER, 0,
windowOptions.width * windowOptions.height,
GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT);
The other error is not described that well in the question. But fix this one first and maybe it will already help with the following error.
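One more thing worth checking once the flags are fixed: the length parameter of glMapBufferRange is in bytes, so it needs the same sizeof(glm::vec4) factor as your glBufferSubData call, otherwise you only map a fraction of the buffer and then write past the mapped range. A rough sketch of the whole mapping path, with the buffer explicitly bound first:
// Rough sketch: sizes passed to glMapBufferRange are in bytes.
glBindBuffer(GL_SHADER_STORAGE_BUFFER, ssb);
glm::vec4* p = (glm::vec4*)glMapBufferRange(GL_SHADER_STORAGE_BUFFER, 0,
    windowOptions.width * windowOptions.height * sizeof(glm::vec4),
    GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT);
if (p != nullptr)
{
    for (unsigned int y = 0; y < windowOptions.height; ++y)
    {
        for (unsigned int x = 0; x < windowOptions.width; ++x)
        {
            p[x + y * windowOptions.width] = glm::vec4(0, 1, 0, 1); // green
        }
    }
    glUnmapBuffer(GL_SHADER_STORAGE_BUFFER);
}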
What I have now
#define QUAD_VERT_COUNT 4
#define QUAD_POS_COMP 3
typedef struct quad_pos
{
    GLfloat x, y, z;
} quad_pos;
#define SIZE_QUAD_POS (sizeof(quad_pos) * QUAD_VERT_COUNT)
static GLuint QUAD_BUFFER = 0;
void init_quad_buffer()
{
    quad_pos* pos_data = malloc(SIZE_QUAD_POS);
    pos_data[0].x = -1.0f;
    pos_data[0].y = -1.0f;
    pos_data[0].z = 0.0f;
    pos_data[1].x = 1.0f;
    pos_data[1].y = -1.0f;
    pos_data[1].z = 0.0f;
    pos_data[2].x = -1.0f;
    pos_data[2].y = 1.0f;
    pos_data[2].z = 0.0f;
    pos_data[3].x = 1.0f;
    pos_data[3].y = 1.0f;
    pos_data[3].z = 0.0f;
    QUAD_BUFFER = create_buffer(GL_ARRAY_BUFFER, GL_STATIC_DRAW, pos_data, SIZE_QUAD_POS);
    free(pos_data);
}
GLuint get_quad_buffer()
{
    return QUAD_BUFFER;
}
And drawing (part of it)
glBindBuffer(GL_ARRAY_BUFFER, get_quad_buffer());
glEnableVertexAttribArray(ss->attrib[0]);//attrib[o] is vertex pos
glVertexAttribPointer(ss->attrib[0], QUAD_POS_COMP, GL_FLOAT, GL_FALSE, 0, 0);
glDrawArrays(GL_TRIANGLE_STRIP, 0, QUAD_VERT_COUNT);
Scaling, translating and rotating are achieved with matrices and shaders, so yes, this buffer never changes and is reused for every sprite.
But why do we need to use GLfloat for just -1.0 and 1.0? GLbyte should be enough.
typedef struct quad_pos
{
    GLbyte x, y, z;
} quad_pos;

void init_quad_buffer()
{
    quad_pos* pos_data = malloc(SIZE_QUAD_POS);
    pos_data[0].x = -1;
    pos_data[0].y = -1;
    pos_data[0].z = 0;
    ....
}
Drawing
...
glVertexAttribPointer(ss->attrib[0], QUAD_POS_COMP, GL_BYTE, GL_FALSE, 0, 0);
glDrawArrays(GL_TRIANGLE_STRIP, 0, QUAD_VERT_COUNT);
Question 1: Do I need normalize set to GL_TRUE?
Question 2: GLclampf and GLfloat are both 4-byte floats, but color values go from 0.0 to 1.0, so if I put them in GLbyte too (val/256, so 255 for 1.0, 128 for 0.5, 0 for 0), do I need GL_TRUE for normalize in glVertexAttribPointer?
Question 3: Do I really need padding in vertex data/other data? Adding a fictitious pos_data.g member just so that sizeof(pos_data) == 16, is that good for the GPU?
In general, you could always aim for the half-float (16-bit float) extensions to save memory.
Your implementation looks like it causes some draw-call overhead, and normalizing (on the fly!) will cause additional overhead. For drawing multiple instances of this constant quad, I recommend the following to speed things up:
Implementation of a geometry-shader; let it generate, transform and emit the 4 vertices of the quad for you.
Instanced drawing with a transform buffer: a texture buffer object (TBO) containing the transform matrices for each quad instance (each matrix column is fetched using the built-in variable gl_InstanceID).
OR:
Supply the matrices via instanced vertex attribute arrays (probably faster; see the sketch below).
These two approaches can be implemented on top of the same buffer data layout (just an array of matrices).
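A rough sketch of the attribute-array variant; instance_buffer and quad_count are placeholders that are not in your code, the matrices are assumed to be column-major with one mat4 per quad, attribute locations 1 to 4 are assumed free, and a GL 3.3+ context is assumed:
/* Per-instance mat4 supplied through four vec4 attributes (locations 1..4). */
glBindBuffer(GL_ARRAY_BUFFER, instance_buffer);
for (int col = 0; col < 4; ++col)
{
    GLuint loc = 1 + col;                  /* one attribute slot per matrix column */
    glEnableVertexAttribArray(loc);
    glVertexAttribPointer(loc, 4, GL_FLOAT, GL_FALSE,
                          sizeof(GLfloat) * 16,
                          (void*)(sizeof(GLfloat) * 4 * col));
    glVertexAttribDivisor(loc, 1);         /* advance once per instance, not per vertex */
}
glBindBuffer(GL_ARRAY_BUFFER, get_quad_buffer());
glEnableVertexAttribArray(ss->attrib[0]);
glVertexAttribPointer(ss->attrib[0], QUAD_POS_COMP, GL_FLOAT, GL_FALSE, 0, 0);
glDrawArraysInstanced(GL_TRIANGLE_STRIP, 0, QUAD_VERT_COUNT, quad_count);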
But why do we need to use GLfloat for just -1.0 and 1.0? GLbyte should be enough.
Please note this is not true in general; in most cases you will need a float for precision. And if you only have so few values and such simple geometry, the odds are quite high that there isn't any reason at all to optimize it to GLbyte in the first place. You likely have very few vertices in total, so why would you want to save storage on them? This sounds like a very good example of premature optimization (I know, it's an overused term).
Now, for your actual questions:
No, not if you want the same functionality: if normalize is false, the -1 will convert to -1.0f; if it is true, it will be something more like -0.0078125f (roughly -1/128.0f). So if you want to keep the same scale, you don't want it normalized.
GLclampf and GLfloat are indeed both usually 4-byte floats. If you want to pass RGB colors in through vertex attributes as bytes, then yes, you should normalize them, as OpenGL expects color components to be in the range [0.0f, 1.0f]. But again: why don't you simply pass them as floats? What do you expect to gain? In a simple game you probably don't have enough colors to notice the difference, and in a non-simple game you're more likely to be using textures.
Of this I am not sure. I know it was true for old GPUs (and I mean almost 10 years back), but I don't know of any recent claims that this would actually improve anything. In any case, the best-known advice was to group all vertex attributes for one vertex together into (a multiple of) 32 bytes, and that was for ATI cards. Byte alignment might be necessary for some trickier things/extensions, but I do not think you need to worry about it just yet.