glMapBufferRange persistent buffers and interleaved data - c++

OK, after a longer break I am a bit at a loss here.
I am trying to use a persistently mapped buffer, split into 2 regions, to improve performance. Used as one buffer, everything works fine; used as 2, it fails miserably. I am not sure what I am missing, but maybe someone can enlighten me. I have read quite a few tutorials and descriptions, but so far no luck understanding it any better or finding the answer to these questions.
As for my setup, I have the following data:
Buffer[0] = Point.X;
Buffer[1] = Point.Y;
Buffer[2] = Point.Z;
Buffer[3] = U;
Buffer[4] = V;
Buffer[5] = Color.X;
Buffer[6] = Color.Y;
Buffer[7] = Color.Z;
Buffer[8] = Color.W;
in my approach without persistent buffers, I do this:
...
glBufferData(GL_ARRAY_BUFFER, sizeof(float) * TotalSize, Buffer, GL_STREAM_DRAW);
...
glVertexAttribPointer(VERTEX_COORD_ATTRIB, 3, GL_FLOAT, GL_FALSE, StrideSize, 0);
glVertexAttribPointer(TEXTURE_COORD_ATTRIB, 2, GL_FLOAT, GL_FALSE, StrideSize, (void*)VertFloatSize);
glVertexAttribPointer(COLOR_ATTRIB, 4, GL_FLOAT, GL_FALSE, StrideSize, (void*)(VertFloatSize+TexFloatSize));
...
glDrawArrays(Mode, 0, (BufferData.VertSize / FloatsPerVertex));
...
Now to the persistent part (I am leaving out the fence and sync stuff here, since that doesn't seem to be the problem):
GLbitfield PersistentBufferFlags = GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT;
GLsizeiptr BufferSize=(2 * BUFFER_SIZE);
glBindBuffer(GL_ARRAY_BUFFER, PMB);
glBufferStorage(GL_ARRAY_BUFFER, BufferSize, 0, PersistentBufferFlags);
Buffer = (float*)glMapBufferRange(GL_ARRAY_BUFFER, 0, BufferSize, PersistentBufferFlags);
Now to my first question already: is this the right approach? Map it once and use the size of the first buffer as an offset to continue from there with the data of the second, or should it rather be 2 mappings of the same storage, like this:
glBindBuffer(GL_ARRAY_BUFFER, PMB);
glBufferStorage(GL_ARRAY_BUFFER, BufferSize, 0, PersistentBufferFlags);
Buffer[0] = (float*)glMapBufferRange(GL_ARRAY_BUFFER, 0, BUFFER_SIZE, PersistentBufferFlags);
Buffer[1] = (float*)glMapBufferRange(GL_ARRAY_BUFFER, BUFFER_SIZE, BUFFER_SIZE, PersistentBufferFlags);
I found no hint about what's preferred, what's right, or even what's wrong. How is it correctly split?
Then I try to draw with:
glDrawArrays(Mode, 0, (BufferData.VertSize / FloatsPerVertex)); //Draw "first" buffer, everything works if only this one.
...next round...
glDrawArrays(Mode, offset_of_datasize_first_buffer, (BufferData.VertSize / FloatsPerVertex)); //Second part of the buffer, fails.
I am not entirely sure about the 2nd parameter here; I find its explanation confusing:
Specifies the starting index in the enabled arrays.
Am I right that this should then also be the size of the first buffer, used as an offset, in the second run?
Other than that I use the same setup as above, just without the glBufferData.
So anyway, no matter which version above I use, it works if I use it only as 1 buffer, but if I use it as 2, it very much looks like the data in the second part of the buffer is ignored.
So this is my next question: is it even allowed to use a persistent buffer like this? Does it respect the interleaved setup done with glVertexAttribPointer, or is it meant for plain vertex data only?
Let me know if you need any more information or if something is not clear. Many thanks.

The second parameter of glDrawArrays() isn't a byte offset but the index of the first vertex, much like an index used with glDrawElements(). Use offset_in_bytes / stride_in_bytes to get the correct value, or actually compute BUFFER_SIZE as a function of the stride and use that value for glDrawArrays().
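A minimal sketch of that calculation, assuming the 9-float interleaved layout from the question (3 position + 2 UV + 4 color) and reusing its names (Mode, BufferData, BUFFER_SIZE); the constants here are illustrative:
const GLsizei FloatsPerVertex = 9;                         // 3 pos + 2 UV + 4 color
const GLsizei StrideSize      = FloatsPerVertex * sizeof(float);
// First half: vertices start at the beginning of the mapping
glDrawArrays(Mode, 0, BufferData.VertSize / FloatsPerVertex);
// Second half: convert the byte offset of the second region into a first-vertex index
GLint firstVertex = BUFFER_SIZE / StrideSize;              // offset_in_bytes / stride_in_bytes
glDrawArrays(Mode, firstVertex, BufferData.VertSize / FloatsPerVertex);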

The second parameter is almost definitely what's giving you trouble. Is offset_of_datasize_first_buffer an index or a byte offset? Try offset_of_datasize_first_buffer / FloatsPerVertex.
Also, you can rebind your attribute pointers and add BUFFER_SIZE to the last parameter, so that index 0 starts at the beginning of the second buffer.
Make sure offset_of_datasize_first_buffer is a vertex index and not a byte offset before trying to switch your attribute pointers. That should be the quicker fix.
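A sketch of that rebinding alternative, reusing the names from the question (PMB, StrideSize, VertFloatSize, TexFloatSize, BUFFER_SIZE); every attribute offset is simply shifted by BUFFER_SIZE bytes so the second region starts at vertex index 0:
glBindBuffer(GL_ARRAY_BUFFER, PMB);
glVertexAttribPointer(VERTEX_COORD_ATTRIB, 3, GL_FLOAT, GL_FALSE, StrideSize, (void*)(BUFFER_SIZE));
glVertexAttribPointer(TEXTURE_COORD_ATTRIB, 2, GL_FLOAT, GL_FALSE, StrideSize, (void*)(BUFFER_SIZE + VertFloatSize));
glVertexAttribPointer(COLOR_ATTRIB, 4, GL_FLOAT, GL_FALSE, StrideSize, (void*)(BUFFER_SIZE + VertFloatSize + TexFloatSize));
glDrawArrays(Mode, 0, BufferData.VertSize / FloatsPerVertex); // first vertex is 0 again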

Related

glGetBufferSubData and glMapBufferRange for GL_SHADER_STORAGE_BUFFER very slow on NVIDIA GTX960M

I've been having some issues with transferring a GPU buffer to the CPU to perform sorting operations. The buffer is a GL_SHADER_STORAGE_BUFFER composed of 300,000 float values. The transfer with glGetBufferSubData takes around 10 ms, and with glMapBufferRange it takes more than 100 ms.
The code I'm using is the following:
std::vector<GLfloat> viewRow;
unsigned int viewRowBuffer = -1;
int length = -1;

void bindRowBuffer(unsigned int buffer){
    glBindBuffer(GL_SHADER_STORAGE_BUFFER, buffer);
    glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 3, buffer);
}

void initRowBuffer(unsigned int &buffer, std::vector<GLfloat> &row, int lengthIn){
    // Generate and initialize buffer
    length = lengthIn;
    row.resize(length);
    memset(&row[0], 0, length*sizeof(float));
    glGenBuffers(1, &buffer);
    bindRowBuffer(buffer);
    glBufferStorage(GL_SHADER_STORAGE_BUFFER, row.size() * sizeof(float), &row[0], GL_DYNAMIC_STORAGE_BIT | GL_MAP_READ_BIT | GL_MAP_WRITE_BIT);
    glBindBuffer(GL_SHADER_STORAGE_BUFFER, 0);
}

void cleanRowBuffer(unsigned int buffer) {
    float zero = 0.0f;
    glClearNamedBufferData(buffer, GL_R32F, GL_RED, GL_FLOAT, &zero);
}

void readGPUbuffer(unsigned int buffer, std::vector<GLfloat> &row) {
    glGetBufferSubData(GL_SHADER_STORAGE_BUFFER, 0, length*sizeof(float), &row[0]);
    glBindBuffer(GL_SHADER_STORAGE_BUFFER, 0);
}

void readGPUMapBuffer(unsigned int buffer, std::vector<GLfloat> &row) {
    float* data = (float*)glMapBufferRange(GL_SHADER_STORAGE_BUFFER, 0, length*sizeof(float), GL_MAP_READ_BIT);
    memcpy(&row[0], data, length*sizeof(float));
    glUnmapBuffer(GL_SHADER_STORAGE_BUFFER);
    glBindBuffer(GL_SHADER_STORAGE_BUFFER, 0); // unbind only after unmapping
}
The main is doing:
bindRowBuffer(viewRowBuffer);
cleanRowBuffer(viewRowBuffer);
countPixs.bind();
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, gPatch);
countPixs.setInt("gPatch", 0);
countPixs.run(SCR_WIDTH/8, SCR_HEIGHT/8, 1);
countPixs.unbind();
readGPUbuffer(viewRowBuffer, viewRow);
Where countPixs is a compute shader, but I'm positive the problem is not there, because if I comment out the run command, the read takes exactly the same amount of time.
The weird thing is that if I execute a glGetBufferSubData of only 1 float:
glGetBufferSubData(GL_SHADER_STORAGE_BUFFER,0, 1 *sizeof(float),&row[0]);
It takes exactly the same time... so I'm guessing there is something wrong all-the-way... maybe related to the GL_SHADER_STORAGE_BUFFER?
This is most likely a delay caused by GPU-CPU synchronization, i.e. a round trip.
Once you map your buffer, the previous GL command(s) which touched the buffer need to complete first, causing a pipeline stall.
Note that drivers are lazy: it is very probable the GL commands have not even started executing yet.
If you can: use glBufferStorage(..., GL_MAP_PERSISTENT_BIT) and map the buffer persistently. This completely avoids re-mapping and allocating any GPU memory, and you can keep the mapped pointer across draw calls, with some caveats:
You likely also need GPU fences to detect/wait until the data is actually available from the GPU (unless you like reading garbage).
The mapped buffer can't be resized (since you already use glBufferStorage(), you are OK).
It is probably a good idea to combine GL_MAP_PERSISTENT_BIT with GL_MAP_COHERENT_BIT.
After reading the GL 4.5 docs a bit more, I found out that glFenceSync is mandatory in order to guarantee the data has arrived from the GPU, even with GL_MAP_COHERENT_BIT:
If GL_MAP_COHERENT_BIT is set and the server does a write, the app must call glFenceSync with GL_SYNC_GPU_COMMANDS_COMPLETE (or glFinish). Then the CPU will see the writes after the sync is complete.
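A minimal sketch of a persistently mapped readback with a fence, reusing buffer, length, row and countPixs from the question; the fence handling and timeout are illustrative, not the poster's code:
// Create the storage once, mapped persistently for reading back
GLbitfield flags = GL_MAP_READ_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT;
glBindBuffer(GL_SHADER_STORAGE_BUFFER, buffer);
glBufferStorage(GL_SHADER_STORAGE_BUFFER, length * sizeof(float), nullptr, flags);
float* mapped = (float*)glMapBufferRange(GL_SHADER_STORAGE_BUFFER, 0, length * sizeof(float), flags);
// Each frame: run the compute shader, then fence and wait before touching the data
countPixs.run(SCR_WIDTH/8, SCR_HEIGHT/8, 1);
GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
glClientWaitSync(fence, GL_SYNC_FLUSH_COMMANDS_BIT, GLuint64(1000000000)); // wait up to 1 s
glDeleteSync(fence);
memcpy(&row[0], mapped, length * sizeof(float)); // the CPU now sees the GPU writes, no re-map needed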

glMapBuffer returns NULL on Mac, works on Windows just fine

I'm having a weird issue with trying to move my working game code from Windows to Mac. I'm using SDL2 and OpenGL.
The game is a simple 2D game, basically only doing 2D quad/sprite rendering. The rendering architecture is simple. I'm using a single static Element Array Buffer that is prefilled at startup (6 indices, 4 vertices per quad), and every frame I push a new VBO with the sprite data (4 vertices per sprite, each containing xy position, color to modulate, and uv texture coordinates).
I'm doing this by calling glMapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY) at the start of the frame, push onto the given "array" the sprite data, and then glUnmapBuffer(GL_ARRAY_BUFFER) at the end before calling SDL_GL_SwapWindow.
All seems to work just fine on Windows; however, when I tried compiling and running on a Mac, it compiled just fine, but whenever I call glMapBuffer it returns NULL.
I tried looking for GL errors, but no luck. Calling glGetError() does not help, as it returns 0.
It's possible that I have an issue somewhere and am actually doing something wrong on Windows as well (the code is literally the same). It could be that the Windows driver is just more lenient and lets the error slide, but my Mac can't.
I'm literally stumped... I don't know where to go from here. I tried littering my code with glGetError() but could not find a non-zero return from it anywhere I tried.
Here is the setup code if it helps:
glGenVertexArrays(1, &overlay_vao);
glBindVertexArray(overlay_vao);
glGenBuffers(1, &overlay_vbo);
glBindBuffer(GL_ARRAY_BUFFER, overlay_vbo);
glBufferData(GL_ARRAY_BUFFER, MAX_OVERLAY_ELEMENTS * 4 * 8 * 4, 0, GL_STREAM_DRAW); // 4 vertices per element, 8 floats per vertex, 4 bytes per float
glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, 8 * 4, 0);                 // xy position
glVertexAttribPointer(1, 4, GL_FLOAT, GL_FALSE, 8 * 4, (GLvoid*)(2 * 4));  // modulate color
glVertexAttribPointer(2, 2, GL_FLOAT, GL_FALSE, 8 * 4, (GLvoid*)(6 * 4));  // uv texture coordinates
glEnableVertexAttribArray(0);
glEnableVertexAttribArray(1);
glEnableVertexAttribArray(2);
GLuint overlay_element_buffer;
glGenBuffers(1, &overlay_element_buffer);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, overlay_element_buffer);
Uint8 *indicesBytes = AcquireTempMemory();
Uint16 *indices = (Uint16*)indicesBytes;
for (int i = 0; i < MAX_OVERLAY_ELEMENTS; i++)
{
    indices[6 * i + 0] = 4 * i + 0;
    indices[6 * i + 1] = 4 * i + 1;
    indices[6 * i + 2] = 4 * i + 2;
    indices[6 * i + 3] = 4 * i + 0;
    indices[6 * i + 4] = 4 * i + 2;
    indices[6 * i + 5] = 4 * i + 3;
}
glBufferData(GL_ELEMENT_ARRAY_BUFFER, MAX_OVERLAY_ELEMENTS * 6 * 2, indicesBytes, GL_STATIC_DRAW);
AcquireTempMemory is basically not much more than a malloc, and I validated that it allocates fine, and the array is filled as expected (on both versions).
At the start of every frame, I bind the VAO and the shader program (even though there is only one of each anyway), set some uniforms, and map the buffer:
glEnable(GL_BLEND);
glUseProgram(renderState.overlayShaderProgram);
glBindVertexArray(renderState.overlayVao);
glUniform1f(renderState.xMultUniformLocation, 1.0f / renderState.aspectRatio);
glUniform1i(renderState.textureUniformLocation, 0);
renderState.overlayRects = 0;
renderState.overlayBuffer = glMapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY);
Then once all sprites have been pushed:
glUnmapBuffer(GL_ARRAY_BUFFER);
glDrawElements(GL_TRIANGLES, renderState.overlayRects * 6, GL_UNSIGNED_SHORT, (const GLvoid*)0);
(after that there is a call to SDL_GL_SwapWindow)
I'm not sure if it's relevant, but I'm getting the GL functions from SDL_GL_GetProcAddress like this:
glMapBuffer = (glMapBuffer_TYPE)SDL_GL_GetProcAddress("glMapBuffer");
glUnmapBuffer = (glUnmapBuffer_TYPE)SDL_GL_GetProcAddress("glUnmapBuffer");
I'm really stuck... has anyone ever seen something like this, or can point me to something I'm doing wrong?
If you are using OpenGL 3.3, your OSX version already provides the GL function pointers on its own. There is no need to retrieve them with any GetProcAddress call.
Even worse, you are using the same names as the GL API ones: glMapBuffer instead of something similar such as myglMapBuffer. Apple advises against this:
From the Apple doc:
Note that each function pointer uses the prefix pf to distinguish it from the function it points to. Although using this prefix is not a requirement, it's best to avoid using the exact function names.
In order to use the same code for Windows, Linux and OSX, I recommend wrapping the non-OSX code with #ifndef like this:
#ifndef __APPLE__
glMapBuffer = (glMapBuffer_TYPE)SDL_GL_GetProcAddress("glMapBuffer");
glUnmapBuffer = (glUnmapBuffer_TYPE)SDL_GL_GetProcAddress("glUnmapBuffer");
//etc
#endif
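A sketch of the prefixed-pointer pattern the Apple doc describes, reusing the glMapBuffer_TYPE/glUnmapBuffer_TYPE typedefs from the question; the pf names and initGLFunctions are illustrative:
#ifndef __APPLE__
// Prefixed function pointers, loaded at runtime on non-Apple platforms
static glMapBuffer_TYPE   pfglMapBuffer   = NULL;
static glUnmapBuffer_TYPE pfglUnmapBuffer = NULL;
#else
// On OSX the system headers already declare the real functions
#define pfglMapBuffer   glMapBuffer
#define pfglUnmapBuffer glUnmapBuffer
#endif
void initGLFunctions()
{
#ifndef __APPLE__
    pfglMapBuffer   = (glMapBuffer_TYPE)SDL_GL_GetProcAddress("glMapBuffer");
    pfglUnmapBuffer = (glUnmapBuffer_TYPE)SDL_GL_GetProcAddress("glUnmapBuffer");
#endif
}
// ...then call pfglMapBuffer / pfglUnmapBuffer everywhere in the renderer.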
Also, you may need this Apple test (if SDL doesn't take it into account):
#ifdef __APPLE__
#define GL_DO_NOT_WARN_IF_MULTI_GL_VERSION_HEADERS_INCLUDED
#include <OpenGL/gl3.h>
#endif

Bugs with loading obj file with c++ and opengl

I wrote an obj loader and got the following:
It is a yellow eagle, but as you can see it has some additional triangles that go from its leg to its wing. The code that I used:
{....
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferData(GL_ARRAY_BUFFER, sizeof(data), data, GL_STATIC_DRAW);
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);
    glBufferData(GL_ELEMENT_ARRAY_BUFFER, numOfIndices*sizeof(GLuint), indices, GL_STATIC_DRAW);
}
void Mesh::draw( )
{
    glEnableVertexAttribArray(0);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glVertexAttribPointer(
        0,          // attribute 0. No particular reason for 0, but must match the layout in the shader.
        3,          // size
        GL_FLOAT,   // type
        GL_FALSE,   // normalized?
        0,          // stride
        (void*)0    // array buffer offset
    );
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);
    glDrawElements(GL_TRIANGLES, numOfIndices, GL_UNSIGNED_INT, (void*)0);
    glDisableVertexAttribArray(0);
}
Where data is array of vertices and indices is array of indices.
When I take the data and indices, save them in obj format, and open the resulting file in a 3D editor, the eagle looks fine and doesn't have these additional triangles (which implies that both data and indices are fine).
I spent hours trying to fix the code and make the eagle look normal, but now I have run out of ideas. So please, if you have any ideas on how to make the eagle look normal, share them with me.
For those who think the problem is in the loader, here is a screenshot of the obj model built from the loader's output (from data[] and indices[]).
Finally found the solution.
Indexing in the obj format starts at 1 (not 0), so when you load the vertices into the GL_ARRAY_BUFFER, vertex #1 becomes vertex #0 and the whole indexing breaks.
Therefore it is necessary to decrease all index values by 1; then the index that pointed to vertex #1 will point to vertex #0, and the indexing becomes correct.
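A minimal sketch of that adjustment while parsing the face lines of the obj file (the line variable and the sscanf format are illustrative, not the poster's loader; indices and numOfIndices stand in for the arrays from the question):
// "f 1 2 3" in the obj file refers to the 1st, 2nd and 3rd vertex,
// but OpenGL indexes the vertices uploaded to GL_ARRAY_BUFFER from 0.
unsigned int a, b, c;
sscanf(line, "f %u %u %u", &a, &b, &c);
indices[numOfIndices++] = a - 1;
indices[numOfIndices++] = b - 1;
indices[numOfIndices++] = c - 1;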

Can I call `glDrawArrays` multiple times while updating the same `GL_ARRAY_BUFFER`?

In a single frame, is it "allowed" to update the same GL_ARRAY_BUFFER continuously and keep calling glDrawArrays after each update?
I know this is probably not the best and not the most recommended way to do it, but my question is: Can I do this and expect to get the GL_ARRAY_BUFFER updated before every call to glDrawArrays ?
Code example would look like this:
// setup a single buffer and bind it
GLuint vbo;
glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
while (!renderStack.empty())
{
    SomeObjectClass * my_object = renderStack.back();
    renderStack.pop_back();
    // calculate the current buffer size for data to be drawn in this iteration
    SomeDataArrays * subArrays = my_object->arrayData();
    unsigned int totalBufferSize = subArrays->bufferSize();
    unsigned int vertCount = my_object->vertexCount();
    // initialise the buffer to the desired size and content
    glBufferData(GL_ARRAY_BUFFER, totalBufferSize, NULL, GL_STREAM_DRAW);
    // actually transfer some data to the GPU through glBufferSubData
    for (int j = 0; j < subArrays->size(); ++j)
    {
        unsigned int subBufferOffset = subArrays->get(j)->bufferOffset();
        unsigned int subBufferSize = subArrays->get(j)->bufferSize();
        void * subBufferData = subArrays->get(j)->bufferData();
        glBufferSubData(GL_ARRAY_BUFFER, subBufferOffset, subBufferSize, subBufferData);
        unsigned int subAttributeLocation = subArrays->get(j)->attributeLocation();
        // set some vertex attribute pointers
        glVertexAttribPointer(subAttributeLocation, ...);
        glEnableVertexAttribArray(subAttributeLocation);
    }
    glDrawArrays(GL_POINTS, 0, (GLsizei)vertCount);
}
You may ask why I would want to do that and not just preload everything onto the GPU at once. The obvious answer: because I can't, when there is too much data to fit into a single buffer.
My problem is that I can only see the result of one of the glDrawArrays calls (I believe the first one). In other words, it appears as if the GL_ARRAY_BUFFER is not updated before each glDrawArrays call, which brings me back to my question of whether this is even possible.
I am using an OpenGL 3.2 Core Profile (under OS X) and link with GLEW for OpenGL setup, as well as Qt 5 for window creation.
Yes, this is legal OpenGL code. It is in no way something that anyone should ever actually do. But it is legal. Indeed, it makes even less sense in your case, because you're calling glVertexAttribPointer for every object.
If you can't fit all your vertex data into memory, or need to generate it on the GPU, then you should stream the data with proper buffer streaming techniques.
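One example of such a technique is round-robin buffering: allocate a small pool of VBOs up front and cycle through them, so the upload for one iteration never has to wait for a draw that is still reading an earlier buffer. A rough sketch reusing the question's names (the pool size and structure are illustrative):
const int kPoolSize = 3;                    // triple-buffer to stay ahead of the GPU
GLuint vboPool[kPoolSize];
glGenBuffers(kPoolSize, vboPool);
int current = 0;
while (!renderStack.empty())
{
    SomeObjectClass * my_object = renderStack.back();
    renderStack.pop_back();
    glBindBuffer(GL_ARRAY_BUFFER, vboPool[current]);
    current = (current + 1) % kPoolSize;    // the next iteration writes into a different VBO
    // orphan and fill the buffer, then set the attribute pointers as in the question
    glBufferData(GL_ARRAY_BUFFER, my_object->arrayData()->bufferSize(), NULL, GL_STREAM_DRAW);
    // ... glBufferSubData / glVertexAttribPointer / glEnableVertexAttribArray calls ...
    glDrawArrays(GL_POINTS, 0, (GLsizei)my_object->vertexCount());
}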

Learning to use VBOs properly

So I've been trying to teach myself to use VBOs, in order to boost the performance of my OpenGL project and learn more advanced stuff than fixed-function rendering. But I haven't found much in the way of a decent tutorial; the best ones I've found so far are Songho's tutorials and the stuff at OpenGL.org, but I seem to be missing some kind of background knowledge to fully understand what's going on, though I can't tell exactly what it is I'm not getting, save the usage of a few parameters.
In any case, I've forged on ahead and come up with some cannibalized code that, at least, doesn't crash, but it leads to bizarre results. What I want to render is this (rendered using fixed-function; it's supposed to be brown and the background grey, but all my OpenGL screenshots seem to adopt magenta as their favorite color; maybe it's because I use SFML for the window?).
What I get, though, is this:
I'm at a loss. Here's the relevant code I use, first for setting up the buffer objects (I allocate lots of memory as per this guy's recommendation to allocate 4-8MB):
GLuint WorldBuffer;
GLuint IndexBuffer;
...
glGenBuffers(1, &WorldBuffer);
glBindBuffer(GL_ARRAY_BUFFER, WorldBuffer);
int SizeInBytes = 1024 * 2048;
glBufferData(GL_ARRAY_BUFFER, SizeInBytes, NULL, GL_STATIC_DRAW);
glGenBuffers(1, &IndexBuffer);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, IndexBuffer);
SizeInBytes = 1024 * 2048;
glBufferData(GL_ELEMENT_ARRAY_BUFFER, SizeInBytes, NULL, GL_STATIC_DRAW);
Then for uploading the data into the buffer. Note that CreateVertexArray() fills the vector at the passed location with vertex data, with each vertex contributing 3 floats for position and 3 floats for normal (one of the most confusing things about the various tutorials was what format I should store and transfer my actual vertex data in; this seemed like a decent approximation):
std::vector<float>* VertArray = new std::vector<float>;
pWorld->CreateVertexArray(VertArray);
unsigned short Indice = 0;
for (int i = 0; i < VertArray->size(); ++i)
{
    std::cout << (*VertArray)[i] << std::endl;
    glBufferSubData(GL_ARRAY_BUFFER, i * sizeof(float), sizeof(float), &((*VertArray)[i]));
    glBufferSubData(GL_ELEMENT_ARRAY_BUFFER, i * sizeof(unsigned short), sizeof(unsigned short), &(Indice));
    ++Indice;
}
delete VertArray;
Indice -= 1;
After that, in the game loop, I use this code:
glBindBuffer(GL_ARRAY_BUFFER, WorldBuffer);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, IndexBuffer);
glEnableClientState(GL_VERTEX_ARRAY);
glVertexPointer(3, GL_FLOAT, 0, 0);
glNormalPointer(GL_FLOAT, 0, 0);
glDrawElements(GL_TRIANGLES, Indice, GL_UNSIGNED_SHORT, 0);
glDisableClientState(GL_VERTEX_ARRAY);
glBindBuffer(GL_ARRAY_BUFFER, 0);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, 0);
I'll be totally honest - I'm not sure I understand what the third parameter of glVertexPointer() and glNormalPointer() ought to be (stride is the offset in bytes, but Songho uses an offset of 0 bytes between values - what?), or what the last parameter of either of those is. The initial value is said to be 0; but it's supposed to be a pointer. Passing a null pointer in order to get the first coordinate/normal value of the array seems bizarre. This guy uses BUFFER_OFFSET(0) and BUFFER_OFFSET(12), but when I try that, I'm told that BUFFER_OFFSET() is undefined.
Plus, the last parameter of glDrawElements() is supposed to be an address, but again, Songho uses an address of 0. If I use &IndexBuffer instead of 0, I get a blank screen without anything rendering at all, except the background.
Can someone enlighten me, or at least point me in the direction of something that will help me figure this out? Thanks!
The initial value is said to be 0; but it's supposed to be a pointer.
The context (not meaning the OpenGL one) matters. If one of the gl*Pointer functions is called with no Buffer Object bound to GL_ARRAY_BUFFER, then it is a pointer into the client process's address space. If a Buffer Object is bound to GL_ARRAY_BUFFER, it's an offset into the currently bound buffer object (you may think of the BO as forming a virtual address space, and the parameter to gl*Pointer is then a pointer into that server-side address space).
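A short illustration of the two cases (a sketch; clientSideVertices is an illustrative client-memory array, WorldBuffer is the buffer object from the question):
// Case 1: no VBO bound - the last argument is a real pointer into client memory
glBindBuffer(GL_ARRAY_BUFFER, 0);
glVertexPointer(3, GL_FLOAT, 0, clientSideVertices);
// Case 2: a VBO is bound - the same argument is now a byte offset into that VBO
glBindBuffer(GL_ARRAY_BUFFER, WorldBuffer);
glVertexPointer(3, GL_FLOAT, 0, (void*)0);   // offset 0 into WorldBuffer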
Now let's have a look at your code
std::vector<float>* VertArray = new std::vector<float>;
You shouldn't really mix STL containers and new; learn about the RAII pattern.
pWorld->CreateVertexArray(VertArray);
This is problematic, since you'll delete VertexArray later on, leaving you with a dangling pointer. Not good.
unsigned short Indice = 0;
for (int i = 0; i < VertArray->size(); ++i)
{
std::cout << (*VertArray)[i] << std::endl;
glBufferSubData(GL_ARRAY_BUFFER, i * sizeof(float), sizeof(float), &((*VertArray)[i]));
You should submit large batches of data with glBufferSubData, not individual data points.
glBufferSubData(GL_ELEMENT_ARRAY_BUFFER, i * sizeof(unsigned short), sizeof(unsigned short), &(Indice));
You're passing just incrementing indices into the GL_ELEMENT_ARRAY_BUFFER, thus enumerating the vertices. Why? You can get this without the extra work by using glDrawArrays instead of glDrawElements.
++Indice;
}
delete VertArray;
You're deleting VertArray, thus leaving a dangling pointer.
Indice -= 1;
Why didn't you just use the loop counter i?
So how to fix this? Like this:
std::vector<float> VertexArray;
pWorld->LoadVertexArray(VertexArray); // World::LoadVertexArray(std::vector<float> &);
glBufferSubData(GL_ARRAY_BUFFER, 0, sizeof(float)*VertexArray.size(), &VertexArray[0]);
And use glDrawArrays; of course, if you're not just enumerating vertices but have a list of face→vertex indices, then using glDrawElements is mandatory.
Don't call glBufferSubData for each vertex; that misses the point of VBOs. You are supposed to create one big buffer of your vertex data and then pass it to OpenGL in a single call.
Read http://www.opengl.org/sdk/docs/man/xhtml/glVertexPointer.xml
When using VBOs, those pointers are relative to the start of the VBO data. That's why it's usually 0 or a small offset value.
stride = 0 means the data is tightly packed and OpenGL can calculate the stride from other parameters.
I usually use VBOs like this:
struct Vertex
{
    vec3f position;
    vec3f normal;
};
Vertex data[size];
...
glBufferData(GL_ARRAY_BUFFER, size*sizeof(Vertex), data, GL_STATIC_DRAW);
...
glVertexPointer(3, GL_FLOAT, sizeof(Vertex), (void*)offsetof(Vertex, position));
glNormalPointer(GL_FLOAT, sizeof(Vertex), (void*)offsetof(Vertex, normal));
Just pass a single chunk of vertex data, and then use gl*Pointer to describe how the data is packed, using the offsetof macro.
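As for the BUFFER_OFFSET() macro the question mentions being undefined: it is not part of OpenGL, just a convenience macro that tutorials define themselves. One common definition, equivalent to the casts above (a sketch):
// Converts an integer byte offset into the pointer type that gl*Pointer
// and glDrawElements expect while a buffer object is bound.
#define BUFFER_OFFSET(bytes) ((const GLvoid*)(uintptr_t)(bytes))
// e.g. positions at byte 0, normals 12 bytes (3 floats) into each vertex:
glVertexPointer(3, GL_FLOAT, sizeof(Vertex), BUFFER_OFFSET(offsetof(Vertex, position)));
glNormalPointer(GL_FLOAT, sizeof(Vertex), BUFFER_OFFSET(offsetof(Vertex, normal)));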
To learn more about the offset in the last parameter, have a look at this post:
What's the "offset" parameter in GLES20.glVertexAttribPointer/glDrawElements, and where does ptr/indices come from?