Vector push_back vs. emplace_back - C++

I have two different methods for adding elements to a vector.
GUI_Vertices.emplace_back();
GUI_Vertices.back().pos.x = ((float)x / 400) - 1.f;
GUI_Vertices.back().pos.y = ((float)y / 300) - 1.f;
GUI_Vertices.back().texCoord.x = u;
GUI_Vertices.back().texCoord.y = v;
GUI_Vertices.back().color.r = m_Color.r / 128;
GUI_Vertices.back().color.g = m_Color.g / 128;
GUI_Vertices.back().color.b = m_Color.b / 128;
GUI_Vertices.back().color.a = m_Color.a / 128;
The above code works; however, it forces me to add a new element to the GUI_Vertices vector unconditionally.
Vertex NewVertex;
NewVertex.pos.x = ((float)x / 400) - 1.f;
NewVertex.pos.y = ((float)y / 300) - 1.f;
NewVertex.texCoord.x = u;
NewVertex.texCoord.y = v;
NewVertex.color.r = m_Color.r / 128;
NewVertex.color.g = m_Color.g / 128;
NewVertex.color.b = m_Color.b / 128;
NewVertex.color.a = m_Color.a / 128;
GUI_Vertices.emplace_back(NewVertex);
The above code only works sometimes, but it lets me conditionally add NewVertex to the GUI_Vertices vector when needed.
Here is the definition of Vertex:
struct Vertex {
glm::vec3 pos;
glm::vec4 color;
glm::vec2 texCoord;
static VkVertexInputBindingDescription getBindingDescription() {
VkVertexInputBindingDescription bindingDescription = {};
bindingDescription.binding = 0;
bindingDescription.stride = sizeof(Vertex);
bindingDescription.inputRate = VK_VERTEX_INPUT_RATE_VERTEX;
return bindingDescription;
}
static std::array<VkVertexInputAttributeDescription, 3> getAttributeDescriptions() {
std::array<VkVertexInputAttributeDescription, 3> attributeDescriptions = {};
attributeDescriptions[0].binding = 0;
attributeDescriptions[0].location = 0;
attributeDescriptions[0].format = VK_FORMAT_R32G32B32_SFLOAT;
attributeDescriptions[0].offset = offsetof(Vertex, pos);
attributeDescriptions[1].binding = 0;
attributeDescriptions[1].location = 1;
attributeDescriptions[1].format = VK_FORMAT_R32G32B32A32_SFLOAT;
attributeDescriptions[1].offset = offsetof(Vertex, color);
attributeDescriptions[2].binding = 0;
attributeDescriptions[2].location = 2;
attributeDescriptions[2].format = VK_FORMAT_R32G32_SFLOAT;
attributeDescriptions[2].offset = offsetof(Vertex, texCoord);
return attributeDescriptions;
}
bool operator==(const Vertex& other) const {
return pos == other.pos && color == other.color && texCoord == other.texCoord;
}
};
namespace std {
template<> struct hash<Vertex> {
size_t operator()(Vertex const& vertex) const {
return ((hash<glm::vec3>()(vertex.pos) ^
(hash<glm::vec4>()(vertex.color) << 1)) >> 1) ^
(hash<glm::vec2>()(vertex.texCoord) << 1);
}
};
}
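(Side note: the std::hash<glm::vec3> and friends used above are not provided out of the box; presumably the project pulls in GLM's hashing extension somewhere, along the lines of:)
#define GLM_ENABLE_EXPERIMENTAL
#include <glm/gtx/hash.hpp> // provides std::hash specializations for glm::vec2/3/4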
Later on in program execution, after adding all our Vertex elements to the GUI_Vertices vector, I perform the following operation on GUI_Vertices:
memcpy(GUI_VertexAllocation->GetMappedData(), GUI_Vertices.data(), sizeof(Vertex) * GUI_Vertices.size());
I'm copying the memory from GUI_Vertices into a preallocated buffer which will be used by Vulkan to render our vertices.
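Since the destination buffer is preallocated, one cheap safety net is to guard the copy against the vector outgrowing the allocation. A sketch, where BufferSize is a hypothetical member holding the allocation's byte size:
#include <cassert>
const size_t CopySize = sizeof(Vertex) * GUI_Vertices.size();
assert(CopySize <= BufferSize); // BufferSize: hypothetical byte size of the preallocated buffer
memcpy(GUI_VertexAllocation->GetMappedData(), GUI_Vertices.data(), CopySize);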
Now I'm trying to figure out why the first method of adding Vertex objects to GUI_Vertices always works and the second method only sometimes works.
Here is a link to the entire project https://github.com/kklouzal/WorldEngine/blob/GUI_Indirect_Draw/Vulkan/VulkanGWEN.hpp
After recompiling the project, the second method will occasionally work, so I'm getting some undefined behavior here. I have checked the validity of GUI_Vertices up until the point where we do our memcpy and the data appears to be valid, so I'm not sure what's going on.
I would like to get the second method working so I can conditionally add new vertices into the buffer.

NewVertex.pos.x = ((float)x / 400) - 1.f;
NewVertex.pos.y = ((float)y / 300) - 1.f;
...
glm::vec3 pos;
emplace_back will always perform value initialization on the object it creates, which initializes all of the data members. By contrast, Vertex NewVertex; will default-initialize the object, which leaves its members uninitialized (since the GLM types have trivial default constructors).
So pos.z is uninitialized, and your code never initializes it. You're sending uninitialized garbage to the GPU.
If you create the object with Vertex NewVertex{};, then it will be value-initialized, just like emplace_back does.
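A minimal sketch of the fixed second method (ShouldAddVertex is a stand-in for whatever condition applies):
Vertex NewVertex{}; // value-initialized: pos.z and every other member start at zero
NewVertex.pos.x = ((float)x / 400) - 1.f;
NewVertex.pos.y = ((float)y / 300) - 1.f;
// ... texCoord and color assignments as before ...
if (ShouldAddVertex) // hypothetical condition
    GUI_Vertices.emplace_back(NewVertex);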

Related

I'm experiencing very slow OpenGL compute shader compilation (10+ minutes) when using larger work groups, is there anything I can do to speed it up?

So, I'm encountering a really bizarre (at least to me as a compute shader noob) phenomenon when I compile my compute shader and check the result with glGetShaderiv(m_shaderID, GL_COMPILE_STATUS, &status). Inexplicably, my compute shader takes much longer to compile when I increase the size of my work groups! When I have one-dimensional work groups, it compiles in less than a second, but when I increase the size of my work groups to 4x1x6, the compute shader takes 10+ minutes to compile! How strange.
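One way to narrow down where the time goes: drivers are allowed to defer the actual compilation, so timing glCompileShader separately from the first status query shows which call really blocks. A sketch, assuming a current GL context and the shader source sitting in src:
#include <chrono>
#include <cstdio>
GLuint sh = glCreateShader(GL_COMPUTE_SHADER);
glShaderSource(sh, 1, &src, nullptr);
auto t0 = std::chrono::steady_clock::now();
glCompileShader(sh); // may return before the compile actually finishes
auto t1 = std::chrono::steady_clock::now();
GLint status = 0;
glGetShaderiv(sh, GL_COMPILE_STATUS, &status); // forces the driver to finish
auto t2 = std::chrono::steady_clock::now();
printf("compile call: %lld ms, status query: %lld ms\n",
    (long long)std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count(),
    (long long)std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1).count());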
For background, I'm trying to implement a light clustering algorithm (essentially the one shown here: http://www.aortiz.me/2018/12/21/CG.html#tiled-shading--forward), and my compute shader is this monster:
// TODO: Figure out optimal tile size, currently using a 16x9x24 subdivision
#define FLT_MAX 3.402823466e+38
#define FLT_MIN 1.175494351e-38
#define DBL_MAX 1.7976931348623158e+308
#define DBL_MIN 2.2250738585072014e-308
layout(local_size_x = 4, local_size_y = 9, local_size_z = 4) in;
// TODO: Change to reflect my light structure
// struct PointLight{
// vec4 position;
// vec4 color;
// uint enabled;
// float intensity;
// float range;
// };
// TODO: Pack this more efficiently
struct Light {
vec4 position;
vec4 direction;
vec4 ambientColor;
vec4 diffuseColor;
vec4 specularColor;
vec4 attributes;
vec4 intensity;
ivec4 typeIndexAndFlags;
// uint flags;
};
// Array containing offset and number of lights in a cluster
struct LightGrid{
uint offset;
uint count;
};
struct VolumeTileAABB{
vec4 minPoint;
vec4 maxPoint;
};
layout(std430, binding = 0) readonly buffer LightBuffer {
Light data[];
} lightBuffer;
layout (std430, binding = 1) buffer clusterAABB{
VolumeTileAABB cluster[ ];
};
layout (std430, binding = 2) buffer screenToView{
mat4 inverseProjection;
uvec4 tileSizes;
uvec2 screenDimensions;
};
// layout (std430, binding = 3) buffer lightSSBO{
// PointLight pointLight[];
// };
// SSBO of active light indices
layout (std430, binding = 4) buffer lightIndexSSBO{
uint globalLightIndexList[];
};
layout (std430, binding = 5) buffer lightGridSSBO{
LightGrid lightGrid[];
};
layout (std430, binding = 6) buffer globalIndexCountSSBO{
uint globalIndexCount;
};
// Shared variables, shared between all invocations WITHIN A WORK GROUP
// TODO: See if I can use gl_WorkGroupSize for this, gl_WorkGroupSize.x * gl_WorkGroupSize.y * gl_WorkGroupSize.z
// A grouped-shared array which contains all the lights being evaluated
shared Light sharedLights[4*9*4]; // size is the work group's thread count
uniform mat4 viewMatrix;
bool testSphereAABB(uint light, uint tile);
float sqDistPointAABB(vec3 point, uint tile);
bool testConeAABB(uint light, uint tile);
float getLightRange(uint lightIndex);
bool isEnabled(uint lightIndex);
// Runs in batches of multiple Z slices at once
// In this implementation, 6 batches, since each thread group contains four z slices (24/4=6)
// We begin by each thread representing a cluster
// Then in the light traversal loop they change to representing lights
// Then change again near the end to represent clusters
// NOTE: Tiles actually mean clusters, it's just a legacy name from tiled shading
void main(){
// Reset every frame
globalIndexCount = 0; // How many lights are active in this scene
uint threadCount = gl_WorkGroupSize.x * gl_WorkGroupSize.y * gl_WorkGroupSize.z; // Number of threads in a group, same as local_size_x, local_size_y, local_size_z
uint lightCount = lightBuffer.data.length(); // Number of total lights in the scene
uint numBatches = uint((lightCount + threadCount -1) / threadCount); // Number of groups of lights that will be completed, i.e., number of passes
uint tileIndex = gl_LocalInvocationIndex + gl_WorkGroupSize.x * gl_WorkGroupSize.y * gl_WorkGroupSize.z * gl_WorkGroupID.z;
// uint tileIndex = gl_GlobalInvocationID; // doesn't work, it's a uvec3
// Local thread variables
uint visibleLightCount = 0;
uint visibleLightIndices[100]; // local light index list, to be transferred to global list
// Every light is being checked against every cluster in the view frustum
// TODO: Perform active cluster determination
// Each individual thread will be responsible for loading a light and writing it to shared memory so other threads can read it
for( uint batch = 0; batch < numBatches; ++batch){
uint lightIndex = batch * threadCount + gl_LocalInvocationIndex;
//Prevent overflow by clamping to last light which is always null
lightIndex = min(lightIndex, lightCount);
//Populating shared light array
// NOTE: It is VERY important that lightBuffer.data not be referenced after this point,
// since that is not thread-safe
sharedLights[gl_LocalInvocationIndex] = lightBuffer.data[lightIndex];
barrier(); // Synchronize read/writes between invocations within a work group
//Iterating within the current batch of lights
for( uint light = 0; light < threadCount; ++light){
if( isEnabled(light)){
uint lightType = uint(sharedLights[light].typeIndexAndFlags[0]);
if(lightType == 0){
// Point light
if( testSphereAABB(light, tileIndex) ){
visibleLightIndices[visibleLightCount] = batch * threadCount + light;
visibleLightCount += 1;
}
}
else if(lightType == 1){
// Directional light
visibleLightIndices[visibleLightCount] = batch * threadCount + light;
visibleLightCount += 1;
}
else if(lightType == 2){
// Spot light
if( testConeAABB(light, tileIndex) ){
visibleLightIndices[visibleLightCount] = batch * threadCount + light;
visibleLightCount += 1;
}
}
}
}
}
// We want all threads in the work group to have completed the light tests before continuing
barrier();
// Back to every thread representing a cluster
// Adding the light indices to the cluster light index list
uint offset = atomicAdd(globalIndexCount, visibleLightCount);
for(uint i = 0; i < visibleLightCount; ++i){
globalLightIndexList[offset + i] = visibleLightIndices[i];
}
// Updating the light grid for each cluster
lightGrid[tileIndex].offset = offset;
lightGrid[tileIndex].count = visibleLightCount;
}
// Return whether or not the specified light intersects with the specified tile (cluster)
bool testSphereAABB(uint light, uint tile){
float radius = getLightRange(light);
vec3 center = vec3(viewMatrix * sharedLights[light].position);
float squaredDistance = sqDistPointAABB(center, tile);
return squaredDistance <= (radius * radius);
}
// TODO: Different test for spot-lights
// Has been done by using several AABBs for spot-light cone, this could be a good approach, or even just use one to start.
bool testConeAABB(uint light, uint tile){
// Light light = lightBuffer.data[lightIndex];
// float innerAngleCos = light.attributes[0];
// float outerAngleCos = light.attributes[1];
// float innerAngle = acos(innerAngleCos);
// float outerAngle = acos(outerAngleCos);
// FIXME: Actually do something clever here
return true;
}
// Get range of light given the specified light index
float getLightRange(uint lightIndex){
int lightType = sharedLights[lightIndex].typeIndexAndFlags[0];
float range;
if(lightType == 0){
// Point light
float brightness = 0.01; // cutoff for end of range
float c = sharedLights[lightIndex].attributes.x;
float lin = sharedLights[lightIndex].attributes.y;
float quad = sharedLights[lightIndex].attributes.z;
range = (-lin + sqrt(lin*lin - 4.0 * c * quad + (4.0/brightness)* quad)) / (2.0 * quad);
}
else if(lightType == 1){
// Directional light
range = FLT_MAX;
}
else{
// Spot light
range = FLT_MAX;
}
return range;
}
// Whether the light at the specified index is enabled
bool isEnabled(uint lightIndex){
uint flags = sharedLights[lightIndex].typeIndexAndFlags[2];
return (flags & 1) != 0;
}
// Get squared distance from a point to the AABB of the specified tile (cluster)
float sqDistPointAABB(vec3 point, uint tile){
float sqDist = 0.0;
VolumeTileAABB currentCell = cluster[tile];
cluster[tile].maxPoint[3] = tile;
for(int i = 0; i < 3; ++i){
float v = point[i];
if(v < currentCell.minPoint[i]){
sqDist += (currentCell.minPoint[i] - v) * (currentCell.minPoint[i] - v);
}
if(v > currentCell.maxPoint[i]){
sqDist += (v - currentCell.maxPoint[i]) * (v - currentCell.maxPoint[i]);
}
}
return sqDist;
}
Edit: Whoops, lost the bottom part of this!
What I don't understand is why changing the size of the work groups affects compilation time at all? It sort of defeats the point of the algorithm if my work group sizes are too small for the compute shader to run efficiently, so I'm hoping there's something that I'm missing.
As a last note, I'd like to avoid using glGetProgramBinary as a solution. Not only because it merely circumvents the issue instead of solving it, but because pre-compiling shaders will not play nicely with the engine's current architecture.
So, I'm figuring that this must be a bug in the compiler, since I've replaced the loop in my sqDistPointAABB function with:
vec3 minPoint = currentCell.minPoint.xyz;
vec3 maxPoint = currentCell.maxPoint.xyz;
vec3 t1 = vec3(lessThan(point, minPoint));
vec3 t2 = vec3(greaterThan(point, maxPoint));
vec3 sqDist = t1 * (minPoint - point) * (minPoint - point) + t2 * (maxPoint - point) * (maxPoint - point);
return sqDist.x + sqDist.y + sqDist.z;
And it compiles just fine now, in less than a second! So strange.

Setting multiple descriptors in a descriptor range in Direct3D 12

First of all, my understanding of a descriptor range is that I can specify multiple buffers (constant buffers in my case) that a shader may use, is that correct? If not, then this is where my misunderstanding is, and the rest of the question will make no sense.
Let's say I want to pass a couple of constant buffers to my vertex shader:
// Vertex.hlsl
float value0 : register(b0);
float value1 : register(b1);
...
And for whatever reason I want to use a descriptor range to specify b0 and b1. I fill out a D3D12_DESCRIPTOR_RANGE:
D3D12_DESCRIPTOR_RANGE range;
range.RangeType = D3D12_DESCRIPTOR_RANGE_TYPE_CBV;
range.NumDescriptors = 2;
range.BaseShaderRegister = 0;
range.RegisterSpace = 0;
range.OffsetInDescriptorsFromTableStart = D3D12_DESCRIPTOR_RANGE_OFFSET_APPEND;
I then go on to shove this into a root parameter
D3D12_ROOT_PARAMETER param;
param.ParameterType = D3D12_ROOT_PARAMETER_TYPE_DESCRIPTOR_TABLE;
param.DescriptorTable.NumDescriptorRanges = 1;
param.DescriptorTable.pDescriptorRanges = &range;
param.ShaderVisibility = D3D12_SHADER_VISIBILITY_VERTEX;
Root parameter goes into my signature description
D3D12_ROOT_SIGNATURE_DESC1 signatureDesc;
signatureDesc.NumParameters = 1;
signatureDesc.pParameters = &param;
signatureDesc.NumStaticSamplers = 0;
signatureDesc.pStaticSamplers = nullptr;
signatureDesc.Flags = D3D12_ROOT_SIGNATURE_FLAG_ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT;
After this I create my root signature and so on. I also created a heap for 2 descriptors:
D3D12_DESCRIPTOR_HEAP_DESC heapDescCbv;
heapDescCbv.Type = D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV;
heapDescCbv.NumDescriptors = 2;
heapDescCbv.Flags = D3D12_DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE;
heapDescCbv.NodeMask = 0;
ThrowIfFailed(m_device->CreateDescriptorHeap(&heapDescCbv, IID_PPV_ARGS(&m_cbvHeap)));
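My understanding of how the table is supposed to line up, for reference: with NumDescriptors = 2 and BaseShaderRegister = 0, the descriptor at the table start should feed b0 and the next slot should feed b1. A sketch of the handle arithmetic, using the same device and heap as above:
UINT inc = m_device->GetDescriptorHandleIncrementSize(D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV);
D3D12_CPU_DESCRIPTOR_HANDLE slot0 = m_cbvHeap->GetCPUDescriptorHandleForHeapStart(); // should map to b0
D3D12_CPU_DESCRIPTOR_HANDLE slot1 = { slot0.ptr + inc };                             // should map to b1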
I then mapped the respective ID3D12Resources to get two pointers so I can memcpy my values to them:
void D3D12App::AllocateConstantBuffer(SIZE_T index, size_t dataSize, ID3D12Resource** buffer, void** mappedPtr)
{
D3D12_HEAP_PROPERTIES heapProp;
heapProp.CPUPageProperty = D3D12_CPU_PAGE_PROPERTY_UNKNOWN;
heapProp.MemoryPoolPreference = D3D12_MEMORY_POOL_UNKNOWN;
heapProp.CreationNodeMask = 1;
heapProp.VisibleNodeMask = 1;
heapProp.Type = D3D12_HEAP_TYPE_UPLOAD;
D3D12_RESOURCE_DESC resDesc;
resDesc.Dimension = D3D12_RESOURCE_DIMENSION_BUFFER;
resDesc.Alignment = 0;
resDesc.Width = (dataSize + 255) & ~255;
resDesc.Height = 1;
resDesc.DepthOrArraySize = 1;
resDesc.MipLevels = 1;
resDesc.Format = DXGI_FORMAT_UNKNOWN;
resDesc.SampleDesc.Count = 1;
resDesc.SampleDesc.Quality = 0;
resDesc.Layout = D3D12_TEXTURE_LAYOUT_ROW_MAJOR;
resDesc.Flags = D3D12_RESOURCE_FLAG_NONE;
ThrowIfFailed(m_device->CreateCommittedResource(&heapProp, D3D12_HEAP_FLAG_NONE,
&resDesc, D3D12_RESOURCE_STATE_GENERIC_READ, nullptr, IID_PPV_ARGS(buffer)));
D3D12_CONSTANT_BUFFER_VIEW_DESC cbvDesc;
cbvDesc.BufferLocation = (*buffer)->GetGPUVirtualAddress();
cbvDesc.SizeInBytes = (dataSize + 255) & ~255;
auto cbvHandle = m_cbvHeap->GetCPUDescriptorHandleForHeapStart();
cbvHandle.ptr += index * m_device->GetDescriptorHandleIncrementSize(D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV);
m_device->CreateConstantBufferView(&cbvDesc, cbvHandle);
D3D12_RANGE readRange;
readRange.Begin = 0;
readRange.End = 0;
ThrowIfFailed((*buffer)->Map(0, &readRange, mappedPtr));
}
AllocateConstantBuffer(0, sizeof(m_value0), &m_value0Resource, reinterpret_cast<void**>(&m_constPtrvalue0));
AllocateConstantBuffer(1, sizeof(m_value1), &m_value1Resource, reinterpret_cast<void**>(&m_constPtrvalue1));
The problem comes when I want to feed them to the pipeline. When rendering, I use:
auto cbvHandle = m_cbvHeap->GetGPUDescriptorHandleForHeapStart();
m_commandList->SetGraphicsRootDescriptorTable(0, cbvHandle);
The result I get is that only register(b0) got the correct value, and register(b1) remains uninitialized. What did I do wrong?
OK I got it to work. Turned out I needed to change the shader a bit:
cbuffer a : register(b0) { float value0; }
cbuffer b : register(b1) { float value1; };
This gave me another question though: according to this link:
https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-constants
the buffer names a and b should be optional, but when I tried that the shader would not compile. I guess that is a different question.

Vertices duplication elimination in models

When importing a model in .obj format, a lot of the polygons share vertices and thus waste memory; what I want to do is remove the duplicates by saving only unique vertices.
Hashing of Vertex
/// Template specialization for hashing of a Vertex
namespace std {
template<typename T>
struct hash<Vertex<T>> {
void hash_combine(size_t &seed, const size_t &hash) const {
seed ^= hash + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}
size_t operator() (const Vertex<T> &vertex) const {
auto hasher = hash<float>{};
auto hashed_x = hasher(vertex.position.x);
auto hashed_y = hasher(vertex.position.y);
auto hashed_z = hasher(vertex.position.z);
auto hashed_color_r = hasher(vertex.color.r);
auto hashed_color_g = hasher(vertex.color.g);
auto hashed_color_b = hasher(vertex.color.b);
auto hashed_color_a = hasher(vertex.color.a);
auto hashed_texcoord_x = hasher(vertex.texCoord.x);
auto hashed_texcoord_y = hasher(vertex.texCoord.y);
auto hashed_normal_x = hasher(vertex.normal.x);
auto hashed_normal_y = hasher(vertex.normal.y);
auto hashed_normal_z = hasher(vertex.normal.z);
size_t seed = 0;
hash_combine(seed, hashed_x);
hash_combine(seed, hashed_y);
hash_combine(seed, hashed_z);
hash_combine(seed, hashed_texcoord_x);
hash_combine(seed, hashed_texcoord_y);
hash_combine(seed, hashed_normal_x);
hash_combine(seed, hashed_normal_y);
hash_combine(seed, hashed_normal_z);
return seed;
}
};
}
Importing the mesh with tinyobjloader
Mesh Renderer::load_mesh_from_file(std::string filepath) {
tinyobj::attrib_t attrib;
std::vector<tinyobj::shape_t> shapes;
std::vector<tinyobj::material_t> materials;
std::string err;
auto success = tinyobj::LoadObj(&attrib, &shapes, &materials, &err, filepath.c_str());
if (!success) { SDL_Log("Failed loading mesh %s: %s", filepath.c_str(), err.c_str()); return Mesh(); }
std::unordered_map<Vertex<float>, size_t> unique_vertices{};
Mesh mesh{};
for (auto shape : shapes) { // Shapes
size_t index_offset = 0;
for (auto face : shape.mesh.num_face_vertices) { // Faces (polygon)
for (auto v = 0; v < face; v++) {
tinyobj::index_t idx = shape.mesh.indices[index_offset + v];
Vertex<float> vertex{};
float vx = attrib.vertices[3 * idx.vertex_index + 0];
float vy = attrib.vertices[3 * idx.vertex_index + 1];
float vz = attrib.vertices[3 * idx.vertex_index + 2];
vertex.position = {vx, vy, vz};
float tx = attrib.texcoords[2 * idx.texcoord_index + 0];
float ty = attrib.texcoords[2 * idx.texcoord_index + 1];
vertex.texCoord = {tx, ty};
float nx = attrib.normals[3 * idx.normal_index + 0];
float ny = attrib.normals[3 * idx.normal_index + 1];
float nz = attrib.normals[3 * idx.normal_index + 2];
vertex.normal = {nx, ny, nz};
// These two lines work just fine (includes all vertices)
// mesh.vertices.push_back(vertex);
// mesh.indices.push_back(mesh.indices.size());
// Check for unique vertices, models will contain duplicates
if (unique_vertices.count(vertex) == 0) {
unique_vertices[vertex] = mesh.indices.size();
mesh.vertices.push_back(vertex);
mesh.indices.push_back(mesh.indices.size());
} else {
mesh.indices.push_back(unique_vertices.at(vertex));
}
}
index_offset += face;
}
}
SDL_Log("Number of vertices: %lu for model %s", mesh.vertices.size(), filepath.c_str());
return mesh;
}
The first image is when all the vertices are included.
This one is when I am only using unique vertices.
Any ideas?
if (unique_vertices.count(vertex) == 0) {
unique_vertices[vertex] = mesh.vertices.size();
mesh.indices.push_back(mesh.vertices.size());
mesh.vertices.push_back(vertex);
}
Explanation: indices are "pointers" to vertex locations, so you need to store the index at which the vertex data is written, not an index derived from the index data.
From the shown images, it seems to be a triangle-vertex reference problem.
Normally, obj format collects a list of unique vertices and each triangle is just a set of three indices corresponding to its three vertices. Let us assume that, for some reason, you do have a repetition of vertex A and vertex B and you decide to eliminate vertex B. In this case, you need to modify the references of all triangles containing B and substitute them with A.
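A sketch of that substitution (names are illustrative): build a remap table while collapsing duplicates, then rewrite every triangle index through it:
std::vector<uint32_t> remap(original_vertex_count); // remap[old index] -> index of the kept vertex
// ... fill remap while deduplicating ...
for (auto &idx : mesh.indices)
    idx = remap[idx];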
It's not a good idea to eliminate repeated coordinates. Coordinates will repeat, for example, in the stitching areas between meshes that form a closed 3D mesh structure. In 3D gaming, low-polygon meshes are used to allow fast rendering, but otherwise, dealing with these point clouds is no longer a big issue, as powerful GPUs and multi-core CPU systems make lifelike animations possible nowadays.

DirectX/C++: Marching Cubes Indexing

I've implemented the Marching Cubes algorithm in a DirectX environment (to test and have fun). Upon completion, I noticed that the resulting model looks heavily distorted, as if the indices were off.
I've attempted to extract the indices, but I think the vertices are already ordered correctly by the lookup tables (examples at http://paulbourke.net/geometry/polygonise/). The current build uses a 15^3 volume.
Marching cubes iterates over the array as normal:
for (float iX = 0; iX < CellFieldSize.x; iX++){
for (float iY = 0; iY < CellFieldSize.y; iY++){
for (float iZ = 0; iZ < CellFieldSize.z; iZ++){
MarchCubes(XMFLOAT3(iX*StepSize, iY*StepSize, iZ*StepSize), StepSize);
}
}
}
The MarchCubes function is defined as:
void MC::MarchCubes(XMFLOAT3 in_Position, float Scale){
...
int Corner, Vertex, VertexTest, Edge, Triangle, FlagIndex, EdgeFlags;
float Offset;
XMFLOAT3 Color;
float CubeValue[8];
XMFLOAT3 EdgeVertex[12];
XMFLOAT3 EdgeNorm[12];
//Local copy
for (Vertex = 0; Vertex < 8; Vertex++) {
CubeValue[Vertex] = (this->*fSample)(
in_Position.x + VertexOffset[Vertex][0] * Scale,
in_Position.y + VertexOffset[Vertex][1] * Scale,
in_Position.z + VertexOffset[Vertex][2] * Scale
);
}
FlagIndex = 0;
Intersection calculations:
...
//Test vertices for intersection.
for (VertexTest = 0; VertexTest < 8; VertexTest++){
if (CubeValue[VertexTest] <= TargetValue)
FlagIndex |= 1 << VertexTest;
}
//Find which edges are intersected by the surface.
EdgeFlags = CubeEdgeFlags[FlagIndex];
if (EdgeFlags == 0){
return;
}
for (Edge = 0; Edge < 12; Edge++){
if (EdgeFlags & (1 << Edge)) {
Offset = GetOffset(CubeValue[EdgeConnection[Edge][0]], CubeValue[EdgeConnection[Edge][1]], TargetValue); // Get offset function definition. Needed!
EdgeVertex[Edge].x = in_Position.x + VertexOffset[EdgeConnection[Edge][0]][0] + Offset * EdgeDirection[Edge][0] * Scale;
EdgeVertex[Edge].y = in_Position.y + VertexOffset[EdgeConnection[Edge][0]][1] + Offset * EdgeDirection[Edge][1] * Scale;
EdgeVertex[Edge].z = in_Position.z + VertexOffset[EdgeConnection[Edge][0]][2] + Offset * EdgeDirection[Edge][2] * Scale;
GetNormal(EdgeNorm[Edge], EdgeVertex[Edge].x, EdgeVertex[Edge].y, EdgeVertex[Edge].z); //Need normal values
}
}
And the original implementation gets pushed into a holding struct for DirectX.
for (Triangle = 0; Triangle < 5; Triangle++) {
if (TriangleConnectionTable[FlagIndex][3 * Triangle] < 0) break;
for (Corner = 0; Corner < 3; Corner++) {
Vertex = TriangleConnectionTable[FlagIndex][3 * Triangle + Corner];
GetColor(Color, EdgeVertex[Vertex], EdgeNorm[Vertex]);
Data.VertexData.push_back(XMFLOAT3(EdgeVertex[Vertex].x, EdgeVertex[Vertex].y, EdgeVertex[Vertex].z));
Data.NormalData.push_back(XMFLOAT3(EdgeNorm[Vertex].x, EdgeNorm[Vertex].y, EdgeNorm[Vertex].z));
Data.ColorData.push_back(XMFLOAT4(Color.x, Color.y, Color.z, 1.0f));
}
}
(This is the same ordering as the original GL implementation)
Turns out, I missed a parenthesis enforcing operator precedence: without it, only the Offset * EdgeDirection term was multiplied by Scale, while VertexOffset was added unscaled.
EdgeVertex[Edge].x = in_Position.x + (VertexOffset[EdgeConnection[Edge][0]][0] + Offset * EdgeDirection[Edge][0]) * Scale;
EdgeVertex[Edge].y = in_Position.y + (VertexOffset[EdgeConnection[Edge][0]][1] + Offset * EdgeDirection[Edge][1]) * Scale;
EdgeVertex[Edge].z = in_Position.z + (VertexOffset[EdgeConnection[Edge][0]][2] + Offset * EdgeDirection[Edge][2]) * Scale;
Corrected, obtained Visine; resumed fun.

OpenGL Draws Nothing When Sent Too Much Data?

I seem to have run into a strange issue with OpenGL. Everything works fine with my class until I make the map too big (around 800x800 is the max), and then OpenGL doesn't draw anything. I have made calls to glGetBufferSubData, and as far as I could tell the data seemed correct in both the vertex and index buffers, yet nothing is being drawn. At first I assumed an overflow somewhere in my code, but according to std::numeric_limits my vertex and index iterators don't come anywhere close to the max size of a (signed) int. I use a lot of wrapper classes around OpenGL objects, but they are very simple, usually inline calls to their OpenGL equivalents. Same for the "M_" typedefs around primitive types. Below are the main loop I render in, the class where I believe the issue lies, and 2 screenshots of the output.
Correct output: http://i.imgur.com/cvC1T7L.png
Blank output, after expanding map: http://i.imgur.com/MmmNgj4.png
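One extra check that might be worth adding right after the draw call is draining the GL error queue, to rule out a silently raised error (a sketch):
GLenum err;
while ((err = glGetError()) != GL_NO_ERROR)
    fprintf(stderr, "GL error: 0x%x\n", err);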
Main loop:
int main(){
//open window
Memento::MainWindow& main_window = Memento::MainWindow::GetInstance();
Memento::MainWindow::Init();
main_window.SetTitle("Memento");
main_window.Open();
//matrices
glmx_mat4 ortho_matrix = {};
glmx_mat4_ortho(0.0f, 800.0f, 600.0f, 0.0f, 5.0f, 25.0f, ortho_matrix);
glmx_mat4 modelview_matrix = {};
glmx_mat4_identity(modelview_matrix);
glmx_vec3 translate_vec = {0.0f, 0.0f, -10.0f};
glmx_mat4_translate(modelview_matrix, translate_vec, modelview_matrix);
glmx_mat4_multiply(ortho_matrix, modelview_matrix, ortho_matrix);
//shaders
Memento::GLShader default_vert_shader("default.vert", GL_VERTEX_SHADER);
default_vert_shader.Compile();
Memento::GLShader default_frag_shader("default.frag", GL_FRAGMENT_SHADER);
default_frag_shader.Compile();
//program
Memento::GLProgram default_program;
default_program.Create();
default_program.AttachShader(default_vert_shader);
default_program.AttachShader(default_frag_shader);
Memento::GLVertexArray default_vert_array;
default_vert_array.Create();
default_vert_array.Bind();
//BufferGameMap class- where I believe the issue lies
Memento::TextureAtlas atlas1("atlas/cat_image.png", "atlas/cat_source.xml");
Memento::BufferGameMap map1("tryagain.tmx", atlas1);
//bind buffers
map1.GetVertexBuffer().Bind();
map1.GetIndexBuffer().Bind();
//upload vertex attributes
default_vert_array.EnableIndex(0);
default_vert_array.IndexData(0, 2, GL_FLOAT, NULL, 8 * sizeof(Memento::M_float));
default_vert_array.BindIndex(default_program, 0, "map_vert");
//link, validate, and use program
default_program.Link();
default_program.Validate();
default_program.Use();
//upload matrix as uniform
glUniformMatrix4fv(default_program.GetUniformLocation("modelviewprojection_matrix"),
1, GL_FALSE, ortho_matrix);
//main draw loop
while(not glfwGetKey(GLFW_KEY_ESC)){
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
glDrawElements(GL_TRIANGLES, map1.GetIndexBufferLength(), GL_UNSIGNED_INT, NULL);
glfwSwapBuffers();
}
//close window & exit
main_window.Close();
return (0);
}
BufferGameMap class (issue is probably here!):
Memento::BufferGameMap::BufferGameMap(std::string const& file, const Memento::TextureAtlas& atlas):
TmxMap::GameMap(), background_color_color4(), vertex_buffer(), index_buffer(),
vertex_buffer_len(0), index_buffer_len(0){
Create(file, atlas);
}
Memento::M_void Memento::BufferGameMap::Create(std::string const& file, const Memento::TextureAtlas& atlas){
if(IsCreated())Destroy();
TmxMap::GameMap::CreateFromFile(file);
std::vector<TmxMap::Layer> const& layers = GetLayers();
if(not layers.empty()){
const std::vector<TmxMap::Layer>::const_iterator layers_end = layers.end();
std::vector<TmxMap::Layer>::const_iterator layers_iter = layers.begin();
Memento::M_float* vertex_buffer_data = NULL;
Memento::M_uint* index_buffer_data = NULL;
for(; layers_iter != layers_end; ++layers_iter){
vertex_buffer_len += layers_iter -> GetMapTiles().size() * (4 * (2 +
2 + 2 + 2));
index_buffer_len += layers_iter -> GetMapTiles().size() * 6;
}
vertex_buffer_data = new Memento::M_float[vertex_buffer_len];
index_buffer_data = new Memento::M_uint[index_buffer_len];
//fill data to send to the gl
Memento::M_sizei vertex_buffer_iter = 0, index_buffer_iter = 0, index_buffer_quad_iter = 0;
//map data
const Memento::M_uint map_size_x = GetMapSize().x, map_size_y = GetMapSize().y;
const Memento::M_float map_tile_size_x = GetTileSize().x, map_tile_size_y = GetTileSize().y;
//per layer data
std::vector<TmxMap::MapTile> const* map_tiles = NULL;
std::vector<TmxMap::MapTile>::const_iterator map_tiles_iter, map_tiles_end;
//per tile data
Memento::M_float map_origin_x = 0.0f, map_origin_y = 0.0f;
for(layers_iter = layers.begin(); layers_iter != layers_end; ++layers_iter){
map_tiles = &layers_iter -> GetMapTiles();
for(map_tiles_iter = map_tiles -> begin(), map_tiles_end = map_tiles -> end();
map_tiles_iter != map_tiles_end; ++map_tiles_iter,
vertex_buffer_iter += 4 * (2 + 2 + 2 +
2), index_buffer_iter += 6,
index_buffer_quad_iter += 4){
map_origin_x = static_cast<Memento::M_float>(map_tiles_iter -> map_tile_index /
map_size_y) * map_tile_size_x;
map_origin_y = static_cast<Memento::M_float>(map_tiles_iter -> map_tile_index %
map_size_y) * map_tile_size_y;
vertex_buffer_data[vertex_buffer_iter] = map_origin_x;
vertex_buffer_data[vertex_buffer_iter + 1] = map_origin_y;
//=========================================================
vertex_buffer_data[vertex_buffer_iter + 8] = map_origin_x;
vertex_buffer_data[vertex_buffer_iter + 9] = map_origin_y + map_tile_size_y;
//=========================================================
vertex_buffer_data[vertex_buffer_iter + 16] = map_origin_x + map_tile_size_x;
vertex_buffer_data[vertex_buffer_iter + 17] = map_origin_y + map_tile_size_y;
//=========================================================
vertex_buffer_data[vertex_buffer_iter + 24] = map_origin_x + map_tile_size_x;
vertex_buffer_data[vertex_buffer_iter + 25] = map_origin_y;
//=========================================================
index_buffer_data[index_buffer_iter] = index_buffer_quad_iter;
index_buffer_data[index_buffer_iter + 1] = index_buffer_quad_iter + 1;
index_buffer_data[index_buffer_iter + 2] = index_buffer_quad_iter + 2;
index_buffer_data[index_buffer_iter + 3] = index_buffer_quad_iter;
index_buffer_data[index_buffer_iter + 4] = index_buffer_quad_iter + 2;
index_buffer_data[index_buffer_iter + 5] = index_buffer_quad_iter + 3;
}
}
vertex_buffer.Create(GL_ARRAY_BUFFER, GL_STATIC_DRAW);
vertex_buffer.Bind();
vertex_buffer.AllocateRef(vertex_buffer_len * sizeof(Memento::M_float),
static_cast<const Memento::M_void*>(vertex_buffer_data));
vertex_buffer.Unbind();
index_buffer.Create(GL_ELEMENT_ARRAY_BUFFER, GL_STATIC_DRAW);
index_buffer.Bind();
index_buffer.AllocateRef(index_buffer_len * sizeof(Memento::M_uint),
static_cast<const Memento::M_void*>(index_buffer_data));
index_buffer.Unbind();
delete[] vertex_buffer_data;
delete[] index_buffer_data;
}
}
Vertex shader:
#version 140
precision highp float;
uniform mat4 modelviewprojection_matrix;
in vec2 map_vert;
void main(){
gl_Position = modelviewprojection_matrix * vec4(map_vert, 0, 1);
}
Fragment shader:
#version 140
precision highp float;
out vec4 frag_color;
void main(){
frag_color = vec4(0.4, 0.2, 0.6, 0.5);
}
I think you are running out of stack memory.
By allocating the data on the heap you can use all the memory available to your process, while the stack is typically limited to about 1 MB (the default on Windows).
In other words: Move the object allocation outside of the main scope to the global scope.
Memento::TextureAtlas * atlas1;//("atlas/cat_image.png", "atlas/cat_source.xml");
Memento::BufferGameMap * map1;//("tryagain.tmx", atlas1);
int main(){
atlas1 = new Memento::TextureAtlas("atlas/cat_image.png", "atlas/cat_source.xml");
map1 = new Memento::BufferGameMap("tryagain.tmx", *atlas1);
//.... access with ->
}
or, if this will not cause compiler errors (note that these globals' constructors would then run before main, i.e., before the GL context exists, so the pointer version above is safer):
Memento::TextureAtlas atlas1("atlas/cat_image.png", "atlas/cat_source.xml");
Memento::BufferGameMap map1("tryagain.tmx", atlas1);
int main(){
//.... access with .
}