I'm trying to visualize a very large point cloud (700 million points), and on the glDrawArrays call the debugger throws an access-violation (writing location) exception. I'm using the same code to render smaller clouds (100 million points) and everything works fine. I also have enough RAM (32 GB) to store the data.
To store the point cloud I'm using std::vector<Point3D<float>>, where Point3D is:
template <class T>
union Point3D
{
    T data[3];
    struct {
        T x;
        T y;
        T z;
    };
};
Vertex array and buffer initialization:
glBindVertexArray(pxCloudHeader.uiVBA);
glBindBuffer(GL_ARRAY_BUFFER, pxCloudHeader.xVBOs.uiVBO_XYZ);
glBufferData(GL_ARRAY_BUFFER, pxCloudHeader.iPointsCount * sizeof(GLfloat) * 3, &p3DfXYZ->data[0], GL_STREAM_DRAW);
glVertexAttribPointer((GLuint)0, 3, GL_FLOAT, GL_FALSE, 0, 0);
glEnableVertexAttribArray(0);
glBindVertexArray(0);
Drawing call:
glBindVertexArray(pxCloudHeader.uiVBA);
glDrawArrays(GL_POINTS, 0, pxCloudHeader.iPointsCount); // here exception is thrown
glBindVertexArray(0);
I also checked whether any OpenGL error was reported, but I didn't find any.
I suspect your problem is due to the size of GLsizeiptr.
This is the data type used to represent sizes in OpenGL buffer objects, and it is typically 32-bit.
700 million vertices * 4-bytes per-component * 3-components = 8,400,000,000 bytes
There is a serious issue with trying to allocate that many bytes in GL if it is using 32-bit pointers:
8400000000 & 0xFFFFFFFF = 4,105,032,704 (half as many bytes as you actually need)
If sizeof (GLsizeiptr) on your implementation is 4, then you will have no choice but to split your array up. A 32-bit GLsizeiptr only allows you to store 4 contiguous GiB of memory, but you can work around this by using 3 single-component arrays instead. Using a vertex shader you can reconstruct the position from these 3 separate (small enough) arrays like so:
#version 330
layout (location = 0) in float x; // Vertex Attrib Ptr. 0
layout (location = 1) in float y; // Vertex Attrib Ptr. 1
layout (location = 2) in float z; // Vertex Attrib Ptr. 2
void main (void)
{
gl_Position = vec4 (x,y,z,1.0);
}
Performance is going to be awful, but that is one way to approach the problem with minimal effort.
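A minimal host-side sketch of that layout might look like the following, assuming the cloud has already been de-interleaved into three separate float arrays (xData, yData and zData are hypothetical names) and that a VAO is bound, as in the question:
// Hypothetical de-interleaved storage; each vector holds iPointsCount floats.
std::vector<GLfloat> xData, yData, zData;

GLuint auiVBO[3];
glGenBuffers(3, auiVBO);

const std::vector<GLfloat>* apxArrays[3] = { &xData, &yData, &zData };
for (int i = 0; i < 3; ++i)
{
    glBindBuffer(GL_ARRAY_BUFFER, auiVBO[i]);
    // Each buffer is now only one third of the interleaved size.
    glBufferData(GL_ARRAY_BUFFER,
                 (GLsizeiptr)pxCloudHeader.iPointsCount * sizeof(GLfloat),
                 apxArrays[i]->data(), GL_STREAM_DRAW);
    // Locations 0, 1, 2 match x, y, z in the vertex shader above.
    glVertexAttribPointer((GLuint)i, 1, GL_FLOAT, GL_FALSE, 0, 0);
    glEnableVertexAttribArray((GLuint)i);
}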
By the way, the amount of system memory (32 GiB) is not your biggest concern here. You should be thinking in terms of the amount of VRAM on your GPU, because buffer objects are ideally stored on the GPU. Any part of a buffer object that is too large to fit in GPU memory has to be transferred over the bus (PCIe these days) whenever it is used.
You could draw the data in smaller batches. While there is no predefined upper limit for the size of a buffer, storing 8 GBytes of data in a single buffer is a lot. I'm not very surprised that something would blow up.
I would probably start with storing something like 1 million, or at most a few million, points in each buffer. Then use a pool of buffers with this fixed size, enough to accommodate all your data points.
This might even benefit your performance, because it allows you to start submitting draw calls before copying all your data into buffers. This will give you better overlap between CPU and GPU work.
With the amount of data you are shuffling around, you may also want to look into using glMapBuffer()/glUnmapBuffer() instead of glBufferData(). This generally avoids one copy operation for the data.
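A rough sketch of that batching idea, reusing the names from the question and streaming through a single fixed-size buffer (includes and error handling omitted; a real pool would rotate several buffers to avoid stalling the GPU):
const GLsizei kBatchPoints = 1000000; // ~12 MB per batch at 3 floats per point

glBindVertexArray(pxCloudHeader.uiVBA);
glBindBuffer(GL_ARRAY_BUFFER, pxCloudHeader.xVBOs.uiVBO_XYZ);

for (GLsizei iFirst = 0; iFirst < pxCloudHeader.iPointsCount; iFirst += kBatchPoints)
{
    GLsizei iCount = pxCloudHeader.iPointsCount - iFirst;
    if (iCount > kBatchPoints)
        iCount = kBatchPoints;

    // Orphan the buffer so the driver does not have to wait for the previous draw.
    glBufferData(GL_ARRAY_BUFFER,
                 (GLsizeiptr)kBatchPoints * 3 * sizeof(GLfloat),
                 NULL, GL_STREAM_DRAW);
    GLfloat* pfDst = (GLfloat*)glMapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY);
    if (pfDst != NULL)
    {
        memcpy(pfDst, &p3DfXYZ[iFirst].data[0], (size_t)iCount * 3 * sizeof(GLfloat));
        glUnmapBuffer(GL_ARRAY_BUFFER);
        glDrawArrays(GL_POINTS, 0, iCount);
    }
}
glBindVertexArray(0);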
I'm flattening out an octree and sending it to my fragment shader using an SSBO, and I believe I am running into some memory alignment issues. I'm using std430 for the layout and binding a vector of voxels to this SSBO. This is the structure in my shader (I'm using GLSL 4.3, FYI):
struct Voxel
{
bool data; // 4
vec4 pos; // 16
vec4 col; // 16
float size; // 4
int index; // 4
int pIndex; // 4
int cIdx[8]; // 4, 16 or 32 bytes?
};
layout (std430, binding=2) buffer octreeData
{
Voxel voxels[];
};
I'm not 100% sure, but I think I'm running into an issue with the int cIdx[8] array inside the struct. Looking at the spec (page 124, section 7.6):
If the member is an array of scalars or vectors, the base alignment and array stride are set to match the base alignment of a single array element, according to rules (1), (2), and (3), and rounded up to the base alignment of a vec4. The array may have padding at the end; the base offset of the member following the array is rounded up to the next multiple of the base alignment.
I'm not entirely sure what the alignment is. I know the vec4s take up 16 bytes of memory, but how much does my array take? If it were just sizeof(int)*8, that would be 32, but the spec says the alignment is set to match a single array element and then rounded up to the base alignment of a vec4, right? So does that mean my cIdx array has a base alignment of 16 bytes? There are no follow-up members, so is padding getting added to the end of my struct?
So the total structure size = 52 bytes (if we only allocate 4 bytes for cIdx). Would that mean 12 bytes of padding are being added that I need to account for, and that may be causing my issues? If 16 bytes were allocated, would that be 64 bytes total for the structure and no memory alignment issues?
My corresponding C++ structure:
struct Voxel
{
bool data;
glm::vec4 pos;
glm::vec4 col;
float size;
int index;
int pIndex;
int cIdx[8];
};
I'm then filling in my std::vector<Voxel> and passing it to my shader like so:
glGenBuffers(1, &octreeSSBO);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, octreeSSBO);
glBufferData(GL_SHADER_STORAGE_BUFFER, voxelData.size()*sizeof(Voxel), voxelData.data(), GL_DYNAMIC_DRAW);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 2, octreeSSBO);
Reading directly from the voxelData vector, I can confirm that the data is getting filled in correctly. I can even occasionally see that the data is getting passed to the shader, but it behaves incorrectly compared to what I would expect based on the values I'm looking at.
Does it look like there are memory alignment issues here?
I'm not entirely sure what the alignment is
The specification is very clear about what the base alignment of each type is. Your problem is not with item #4 (std430 doesn't do the rounding specified in #4 anyway).
Your problem is in #2:
If the member is a two- or four-component vector with components consuming N basic machine units, the base alignment is 2N or 4N, respectively.
In GLSL, vec4 has a base alignment of 16. That means that any vec4 must be allocated on a 16-byte boundary.
pos must be on a 16-byte boundary. However, data is only 4 bytes. Therefore, 12 bytes of padding must be inserted between data and pos to satisfy std430's alignment requirements.
However, glm::vec4 has a C++ alignment of 4. So the C++ compiler does not insert a bunch of padding between data and pos. Thus, the types in the two languages do not agree.
You should explicitly align all GLM vectors in C++ structs that you want to match GLSL, using C++11's alignas keyword:
struct Voxel
{
bool data;
alignas(16) glm::vec4 pos;
alignas(16) glm::vec4 col;
float size;
int index;
int pIndex;
int cIdx[8];
};
Also, I would not assume that the C++ type bool and the GLSL type bool have the same size.
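As a quick sanity check for the rest of the layout, one can compare the C++ offsets against the std430 offsets worked out by hand. A sketch, assuming the alignas fix above; the expected values follow the std430 rules discussed in this answer:
#include <cstddef> // offsetof

// Expected std430 offsets for the Voxel block, assuming the alignas(16) fix.
static_assert(offsetof(Voxel, pos)  == 16, "12 bytes of padding after data");
static_assert(offsetof(Voxel, col)  == 32, "col on the next 16-byte boundary");
static_assert(offsetof(Voxel, size) == 48, "scalars follow col tightly");
static_assert(offsetof(Voxel, cIdx) == 60, "std430 does not round the int array up to vec4");
static_assert(sizeof(Voxel) == 96, "struct padded to a multiple of its 16-byte alignment");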
I'm making a voxel engine and I can render a chunk. I'm using instanced rendering, meaning that I can render the whole chunk with a single draw call. Every block of a chunk has a single int (from 0 to 4095) that defines its block type (0 for air, 1 for dirt, etc.). I want to be able to render my blocks by applying the right texture in my fragment shader. My chunk contains a three-dimensional array:
uint8_t blocks[16][16][16];
The problem is that I can't find a way to send my array of ints to the shader. I tried using a VBO, but that makes no sense (I didn't get any result). I also tried to send my array with glUniform1iv(), but I failed.
Is it possible to send an array of int to a shader with glUniformX()?
To avoid storing big data, can I send a byte (uint8_t) instead of an int with glUniformX()?
Is there a good way to send that much data to my shader?
Is instanced drawing a good way to draw the same model with different textures/types of blocks?
For all intents and purposes, data of this type should be treated like texture data. This doesn't mean literally uploading it as texture data, but rather that texture data is the frame of thinking you should be using when considering how to transfer it.
Or, in more basic terms: don't try to pass this data as uniform data.
If you have access to OpenGL 4.3+ (which is a reasonably safe bet for most hardware no older than 6-8 years), then Shader Storage Buffers are going to be the most laconic solution:
//GLSL:
layout(std430, binding = 0) buffer terrainData
{
int data[16][16][16];
};
void main() {
int terrainType = data[voxel.x][voxel.y][voxel.z];
//Do whatever
}
//HOST:
struct terrain_data {
int data[16][16][16];
};
//....
terrain_data data = get_terrain_data();
GLuint ssbo;
GLuint binding = 0;//Should be equal to the binding specified in the shader code
glGenBuffers(1, &ssbo);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, ssbo);
glBufferData(GL_SHADER_STORAGE_BUFFER, sizeof(terrain_data), data.data, GL_DYNAMIC_DRAW); // pick the usage hint that fits your update pattern
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, binding, ssbo);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, 0);
Any point after this where you need to update the data, simply bind ssbo, call glBufferData (or your preferred method for updating buffer data), and then you're good to go.
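For instance, a later full-buffer update could look roughly like this (glBufferSubData is just one option; reallocating with glBufferData works as well):
glBindBuffer(GL_SHADER_STORAGE_BUFFER, ssbo);
glBufferSubData(GL_SHADER_STORAGE_BUFFER, 0, sizeof(terrain_data), data.data);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, 0);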
If you're limited to older hardware, you do have some options, but they quickly get clunky:
You can use Uniform Buffers, which behave very similarly to Shader Storage Buffers, but
Have limited storage space (typically 64 KiB; the guaranteed minimum is only 16 KiB)
Have other restrictions that may or may not be relevant to your use case
You can use textures directly, where you convert the terrain data to floating point values (or use as integers, if the hardware supports integer formats internally), and then convert back inside the shader
Compatible with almost any hardware
But requires extra complexity and calculations in your shader code
I second the approach laid out in #Xirema's answer, but come to a slightly different recommendation. Since your original data type is just uint8_t, using an SSBO or UBO directly will require you to either waste 3 bytes per element or manually pack 4 elements into a single uint. From #Xirema's answer:
For all intents and purposes, data of this type should be treated like texture data. This doesn't mean literally uploading it as texture data, but rather that texture data is the frame of thinking you should be using when considering how to transfer it.
I totally agree with that. Hence I recommend using a Texture Buffer Object (TBO), a.k.a. a "Buffer Texture".
Using glTexBuffer() you can basically reinterpret a buffer object as a texture. In your case, you can pack the uint8_t[16][16][16] array into a buffer object and interpret it as a GL_R8UI "texture" format, like this:
//GLSL:
uniform usamplerBuffer terrainData;
void main() {
uint terrainType = texelFetch(terrainData, voxel.z * (16*16) + voxel.y * 16 + voxel.x).r;
//Do whatever
}
//HOST:
struct terrain_data {
uint8_t data[16][16][16];
};
//....
terrain_data data = get_terrain_data();
GLuint tbo;
GLuint tex;
glGenBuffers(1, &tbo);
glBindBuffer(GL_TEXTURE_BUFFER, tbo);
glBufferData(GL_TEXTURE_BUFFER, sizeof(terrain_data), data.data, GL_STATIC_DRAW); // pick the usage hint that fits your update pattern
glGenTextures(1, &tex);
glBindTexture(GL_TEXTURE_BUFFER, tex);
glTexBuffer(GL_TEXTURE_BUFFER, GL_R8UI, tbo);
Note that this will not copy the data to some texture object. Accessing the texture means directly accessing the memory of the buffer.
TBOs also have the advantage that they are available since OpenGL 3.1.
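To sample it, bind the buffer texture to a texture unit and point the usamplerBuffer uniform at that unit, for example (texture unit 0 and the program handle name are assumptions here):
glUseProgram(program); // 'program' is the shader program containing the usamplerBuffer
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_BUFFER, tex);
glUniform1i(glGetUniformLocation(program, "terrainData"), 0); // sampler reads from unit 0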
I'm writing a ray tracer in OpenGL compute shaders; to pass data to and from the shaders I use buffers.
When the size of the vec2 output buffer (which equals the number of rays multiplied by the number of faces) reaches ~30 MB, attempts to map the buffer consistently return a NULL pointer. Range mapping fails as well.
I can't find any info about GL_SHADER_STORAGE_BUFFER limitations in the OpenGL documentation, but maybe someone can help me: is ~30 MB a limit, or can this mapping failure happen for some other reason?
And is there any way to avoid this other than calling the shader multiple times?
Data declaration in shader:
#version 440
layout(std430, binding=0) buffer rays{
vec4 r[];
};
layout(std430, binding=1) buffer faces{
vec4 f[];
};
layout(std430, binding=2) buffer outputs{
vec2 o[];
};
uniform int face_count;
uniform vec4 origin;
Calling code (using some Qt5 wrappers):
QOpenGLBuffer ray_buffer;
QOpenGLBuffer face_buffer;
QOpenGLBuffer output_buffer;
QVector<QVector2D> output;
output.resize(rays[r].size()*faces.size());
if(!ray_buffer.create()) { /*...*/ }
if(!ray_buffer.bind()) { /*...*/ }
ray_buffer.allocate(rays.data(), rays.size()*sizeof(QVector4D));
if(!face_buffer.create()) { /*...*/ }
if(!face_buffer.bind()) { /*...*/ }
face_buffer.allocate(faces.data(), faces.size()*sizeof(QVector4D));
if(!output_buffer.create()) { /*...*/ }
if(!output_buffer.bind()) { /*...*/ }
output_buffer.allocate(output.size()*sizeof(QVector2D));
ogl->glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, ray_buffer.bufferId());
ogl->glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 1, face_buffer.bufferId());
ogl->glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 2, output_buffer.bufferId());
int face_count = faces.size();
compute.setUniformValue("face_count", face_count);
compute.setUniformValue("origin", pos);
ogl->glDispatchCompute(rays.size()/256, faces.size(), 1);
ray_buffer.destroy();
face_buffer.destroy();
QVector2D* data = (QVector2D*)output_buffer.map(QOpenGLBuffer::ReadOnly);
First of all, you have to understand that the OpenGL specification defines minimum maxima for a variety of values (the ones starting with a MAX_* prefix). That means that implementations are required to provide at least the specified amount as the maximum value, but are free to increase the limit as implementors see fit. This way, developers can rely on some guaranteed minimum, but can still make provisions for possibly larger values.
Section 23 - State Tables summarizes what has been previously specified in the corresponding sections. The information you were looking for is found in table 23.64 - Implementation Dependent Aggregate Shader Limits (cont.). If you want to know which state belongs where (because there is per-object state, quasi-global state, program state and so on), go to section 23.
The minimum maximum size of a shader storage buffer is represented by the symbolic constant MAX_SHADER_STORAGE_BLOCK_SIZE as per section 7.8 of the core OpenGL 4.5 specification.
Since their adoption into core, the required size (i.e. the minimum maximum) has been significantly increased. In core OpenGL 4.3 and 4.4 the minimum maximum was pow(2, 24) bytes (16 MiB); in core OpenGL 4.5 it is pow(2, 27) bytes (128 MiB).
Summary: When in doubt about OpenGL state, refer to section 23 of the core specification.
From OpenGL Wiki:
SSBOs can be much larger. The OpenGL spec guarantees that UBOs can be up to 16KB in size (implementations can allow them to be bigger). The spec guarantees that SSBOs can be up to 128MB. Most implementations will let you allocate a size up to the limit of GPU memory.
OpenGL < 4.5 guarantees only 16 MiB (OpenGL 4.5 increased the minimum to 128 MiB); you can use glGet() to query whether the implementation lets you bind more:
GLint64 max;
glGetInteger64v(GL_MAX_SHADER_STORAGE_BLOCK_SIZE, &max);
In fact the problem seems to be in the Qt wrappers. I didn't look into it in depth, but when I changed QOpenGLBuffer's create(), bind(), allocate() and map() to glCreateBuffers(), glBindBuffer(), glNamedBufferData() and glMapNamedBuffer(), all called through QOpenGLFunctions_4_5_Core, the memory problem was gone until I reached 2 GB (which is the GPU's physical memory limit).
The second mistake I made was not using glMemoryBarrier(), though adding it didn't help while QOpenGLBuffer was still in use.
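For reference, the direct GL 4.5 path for the output buffer looks roughly like this (a sketch; ogl is assumed to point to a QOpenGLFunctions_4_5_Core instance here, and error checking is omitted):
GLuint uiOutputSSBO;
ogl->glCreateBuffers(1, &uiOutputSSBO);
ogl->glNamedBufferData(uiOutputSSBO, output.size() * sizeof(QVector2D),
                       nullptr, GL_DYNAMIC_READ);
ogl->glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 2, uiOutputSSBO);

ogl->glDispatchCompute(rays.size() / 256, faces.size(), 1);
ogl->glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT); // make the writes visible before mapping

QVector2D* results = (QVector2D*)ogl->glMapNamedBuffer(uiOutputSSBO, GL_READ_ONLY);
// ... read the results ...
ogl->glUnmapNamedBuffer(uiOutputSSBO);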
I've been writing something in GL 3.3 which takes a uniform buffer and uses the information from it to select sprite tiles in a fragment shader. It works on my desktop with an Nvidia GTX 780, but my AMD-based laptop (A6-4455M) has some issues with it. Both are on the latest (or very recent) drivers.
Back to the code: it first sets up a uniform buffer consisting of two uints and a uint array. These then get filled and are accessed in the shader. At first I got a GL error on the laptop because I was not allocating enough, but a temporary change taking padding into account sorted that out, and now data is actually being buffered.
The first two uints are no problem. I've also got the array somewhat readable in the shader; there is just one problem: the data is multiplied by four! At the moment the array is just some test data, initialized to its index, so spriteArr[1] == 1, spriteArr[34] == 34, etc. However, accessing it in the shader, spriteArr[10] gives 40. This goes all the way up to spriteArr[143] == 572; beyond that it's something else. I don't know exactly why, but it appears to be an incorrect offset.
I am using the shared uniform layout and getting the uniform offsets from GL itself, so they should be correct. I did notice that the offsets on the AMD card are much larger, as if it is adding more padding: they are always 0, 4, 8 on the desktop, but 0, 16, 32 on the laptop.
If it makes any difference, there is another UBO (binding point 0), which is used for the view and projection matrices. These work as intended. However it is not used in the fragment shader. It is also created before this UBO.
UBO initialisation code:
GLuint spriteUBO;
glGenBuffers(1, &spriteUBO);
glBindBuffer(GL_UNIFORM_BUFFER, spriteUBO);
unsigned maxsize = (2 + 576 + 24) * sizeof(GLuint);
/*Bad I know, but temporary. AMD's driver adds 24 bytes of padding. Nvidias has none.
Not the cause of this problem. At least ensures we have enough allocated. */
glBufferData(GL_UNIFORM_BUFFER, maxsize, NULL, GL_STATIC_DRAW);
glBindBuffer(GL_UNIFORM_BUFFER, 0);
//Set binding point
GLuint spriteUBOIndex = glGetUniformBlockIndex(programID, "SpriteMatchData");
glUniformBlockBinding(programID, spriteUBOIndex, 1);
static const GLchar *unames[] =
{
"width", "height",
//"size",
"spriteArr"
};
GLuint uindices[3];
GLint offsets[3];
glGetUniformIndices(programID,3,unames,uindices);
glGetActiveUniformsiv(programID, 3, uindices, GL_UNIFORM_OFFSET, offsets);
//buffer stuff
glBindBufferBase(GL_UNIFORM_BUFFER, 1, spriteUBO);
glBufferSubData(GL_UNIFORM_BUFFER,offsets[0], sizeof(GLuint), tm.getWidth());
glBufferSubData(GL_UNIFORM_BUFFER, offsets[1], sizeof(GLuint), tm.getHeight());
glBufferSubData(GL_UNIFORM_BUFFER, offsets[2], tm.getTileCount() * sizeof(GLuint), tm.getSpriteArray());
Fragment Shader:
layout (shared) uniform SpriteMatchData
{
    uint width, height;
    uint spriteArr[576];
};
Then later on I experiment with the array with something like this:
if(spriteArr[10] == uint(40))
{
debug_colour = vec4(0.0,1.0,0.0,0.0);//green
}
else
{
debug_colour = vec4(1.0,0.0,0.0,0.0); //red
}
With debug_colour turning green in this instance.
Is there any way to sort this out so that it works on both systems? Why is the AMD driver handling this so differently? Could it be a bug in the way it deals with uniform uint arrays?
Why is the AMD driver handling this so differently?
Because that's what you asked for:
layout (shared) uniform SpriteMatchData
You explicitly asked for shared layout. That layout is implementation defined. Therefore, two different implementations are allowed to give you two different layouts. As such, if you want to use SpriteMatchData in a platform-independent way, you must query its layout from the program after linking it.
While you did query the offsets for the values, you did not query the array stride: the byte offset from element to element within the array. There is nothing in the specification that requires that shared layouts tightly pack arrays.
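If you do stay with the shared layout, the stride can be queried the same way as the offsets; a sketch using the names from the question:
GLint iArrayStride = 0;
// uindices[2] is the index of "spriteArr" obtained from glGetUniformIndices above.
glGetActiveUniformsiv(programID, 1, &uindices[2], GL_UNIFORM_ARRAY_STRIDE, &iArrayStride);
// Element i must then be written at offsets[2] + i * iArrayStride,
// instead of assuming a tight 4-byte stride.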
Really though, there's pretty much no reason not to use std140 layout. You can avoid all of this querying of offsets and simply design C++ structs that can be directly consumed by GLSL.
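With std140 the only wrinkle is that arrays of scalars get a 16-byte stride, so the usual trick is to pack the uints into uvec4s and index with spriteArr[i / 4u][i % 4u] in the shader. A C++ struct matching such a block might look like this (a sketch; the names mirror the shader):
// Matches: layout(std140) uniform SpriteMatchData
//          { uint width; uint height; uvec4 spriteArr[144]; };
struct SpriteMatchData
{
    GLuint width;
    GLuint height;
    GLuint pad[2];         // the uvec4 array must start on a 16-byte boundary
    GLuint spriteArr[576]; // 144 uvec4s = 576 tightly packed uints
};
static_assert(sizeof(SpriteMatchData) == 2320, "matches the std140 block size");
The whole struct can then be uploaded with a single glBufferData call, with no offset queries at all.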
I am trying to use a buffer in a compute shader like this:
layout (binding = 1, std430) writeonly buffer bl1
{
uint data[gl_WorkGroupSize.x * gl_NumWorkGroups.x * gl_NumWorkGroups.y];
};
but I get the following error (because of using gl_NumWorkGroups for the size):
Array size must be a constant integer expression
How can I work around this?
Stop putting in a length at all:
layout (binding = 1, std430) writeonly buffer bl1
{
uint data[];
};
This is a feature unique to SSBOs. And you can only have one unsized array in an SSBO, and it must be the last member in the interface block. The size of data will be computed based on the size of the buffer object range you bind to that binding point. So if you bind 32KB of buffer space, you will get 8K of items (the size of a uint is 4 bytes).
At runtime, your shader can use gl_WorkGroupSize.x * gl_NumWorkGroups.x * gl_NumWorkGroups.y to compute the length of data. Alternatively, just use data.length() to get the length of the buffer that the user gave you. Alternatively... you don't need to explicitly know the length, depending on how you use it.
As long as your OpenGL buffer binding code uses a buffer with enough memory for your dispatch count and work group size, you're fine.
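A host-side sketch of that sizing, assuming local_size_x = 256 in the shader and the binding point declared above (the dispatch dimensions are hypothetical):
const GLuint kLocalSizeX = 256;        // must match local_size_x in the compute shader
GLuint uiGroupsX = 64, uiGroupsY = 4;  // hypothetical dispatch dimensions

GLuint uiSSBO;
glGenBuffers(1, &uiSSBO);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, uiSSBO);
glBufferData(GL_SHADER_STORAGE_BUFFER,
             (GLsizeiptr)(kLocalSizeX * uiGroupsX * uiGroupsY) * sizeof(GLuint),
             NULL, GL_DYNAMIC_COPY);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 1, uiSSBO); // binding = 1 in the shader

glDispatchCompute(uiGroupsX, uiGroupsY, 1);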