Custom collision function with mesh class not outputting vector

I am working on a custom 3D renderer based on javidx9's 3D engine tutorial. I'm trying to add collision detection with rays and segments, but it isn't working properly.
I've tried several approaches, but either the function itself is wrong or I'm misusing it after implementing it. The current algorithm is based on "Real-Time Collision Detection" by Christer Ericson, p. 191. However, when I run the program, no triangles are reported as intersected (the returned vector is always empty). Is there something wrong with my code?
Note: I have some predefined functions whose names are self-explanatory.
#include <cmath> // for sqrtf
#include <iostream>
#include <fstream>
#include <strstream>
#include <algorithm>
#include <string>
#include <vector>
#define SMALL_NUM 0.00000001 // anything that avoids division overflow
using namespace std;
// Created a 2D structure to hold texture coordinates
struct vec2d
{
float u = 0;
float v = 0;
float w = 1;
};
struct vec3d
{
float x = 0;
float y = 0;
float z = 0;
float w = 1; // Need a 4th term to perform sensible matrix vector multiplication
bool operator==(vec3d a) const
{
if (a.x == x && a.y == y && a.z == z && a.w == w)
return true;
else
return false;
}
vec3d operator+(const vec3d& a) const
{
return vec3d{ a.x + x, a.y + y, a.z + z, w };
}
vec3d operator-(const vec3d& a) const
{
return vec3d{ a.x - x, a.y - y, a.z - z, w };
}
vec3d operator*(const vec3d& a) const
{
return vec3d{ a.x * x, a.y * y, a.z * z, w };
}
vec3d operator/(const vec3d& a) const
{
return vec3d{ a.x / x, a.y / y, a.z / z, w };
}
vec3d operator+(const float& a) const
{
return vec3d{ a + x, a + y, a + z, w };
}
vec3d operator-(const float& a) const
{
return vec3d{ a - x, a - y, a - z, w };
}
vec3d operator*(const float& a) const
{
return vec3d{ a * x, a * y, a * z, w };
}
vec3d operator/(const float& a) const
{
return vec3d{ a / x, a / y, a / z, w };
}
};
struct triangle
{
vec3d p[3];
vec2d t[3]; // added a texture coord per vertex
int triid = 0;
wchar_t sym;
short col;
bool calculated = false;
};
struct mesh
{
vector<triangle> tris;
};
struct collision
{
bool plane = false;
triangle tris;
vec3d points;
};
float Vector_DotProduct(vec3d v1, vec3d v2)
{
return v1.x * v2.x + v1.y * v2.y + v1.z * v2.z;
}
float Vector_Length(vec3d v)
{
return sqrtf(Vector_DotProduct(v, v));
}
vec3d Vector_Normalise(vec3d v)
{
float l = Vector_Length(v);
return { v.x / l, v.y / l, v.z / l };
}
vec3d Vector_CrossProduct(vec3d v1, vec3d v2)
{
vec3d v;
v.x = v1.y * v2.z - v1.z * v2.y;
v.y = v1.z * v2.x - v1.x * v2.z;
v.z = v1.x * v2.y - v1.y * v2.x;
return v;
}
vector<collision> CollisionSegment(vec3d ray, vec3d rayend, mesh* m)
{
vector<collision> ret; // The return variable
vec3d raydir = Vector_Normalise(rayend - ray); // The ray direction
for (auto tri : m->tris)
{
vec3d ab = tri.p[1] - tri.p[0];
vec3d ac = tri.p[2] - tri.p[0];
vec3d qp = rayend - ray;
vec3d n = Vector_CrossProduct(ab, ac);
float d = Vector_DotProduct(qp, n);
if (d <= SMALL_NUM) continue;
vec3d ap = ray - tri.p[0];
float t = Vector_DotProduct(ap, n);
if (t < SMALL_NUM) continue;
if (t < d) continue;
vec3d e = Vector_CrossProduct(qp, ap);
float v = Vector_DotProduct(ac, e);
if (v < SMALL_NUM || v > d) continue;
float w = -Vector_DotProduct(ab, e);
if (w < SMALL_NUM || v + w > d) continue;
float ood = 1.0f / d;
t *= ood;
v *= ood;
w *= ood;
float u = 1.0f - v - w;
ret.push_back(collision{ false, tri, vec3d() });
}
return ret;
}
int main()
{
mesh meshCube;
meshCube.tris = {
// SOUTH
{ 0.0f, 0.0f, 0.0f, 1.0f, 0.0f, 1.0f, 0.0f, 1.0f, 1.0f, 1.0f, 0.0f, 1.0f, 0.0f, 1.0f, 1.0f, 0.0f, 0.0f, 1.0f, 1.0f, 0.0f, 1.0f,},
{ 0.0f, 0.0f, 0.0f, 1.0f, 1.0f, 1.0f, 0.0f, 1.0f, 1.0f, 0.0f, 0.0f, 1.0f, 0.0f, 1.0f, 1.0f, 1.0f, 0.0f, 1.0f, 1.0f, 1.0f, 1.0f,},
// EAST
{ 1.0f, 0.0f, 0.0f, 1.0f, 1.0f, 1.0f, 0.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 0.0f, 1.0f, 1.0f, 0.0f, 0.0f, 1.0f, 1.0f, 0.0f, 1.0f,},
{ 1.0f, 0.0f, 0.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 0.0f, 1.0f, 1.0f, 0.0f, 1.0f, 1.0f, 1.0f, 0.0f, 1.0f, 1.0f, 1.0f, 1.0f,},
// NORTH
{ 1.0f, 0.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 0.0f, 1.0f, 1.0f, 1.0f, 0.0f, 1.0f, 1.0f, 0.0f, 0.0f, 1.0f, 1.0f, 0.0f, 1.0f,},
{ 1.0f, 0.0f, 1.0f, 1.0f, 0.0f, 1.0f, 1.0f, 1.0f, 0.0f, 0.0f, 1.0f, 1.0f, 0.0f, 1.0f, 1.0f, 1.0f, 0.0f, 1.0f, 1.0f, 1.0f, 1.0f,},
// WEST
{ 0.0f, 0.0f, 1.0f, 1.0f, 0.0f, 1.0f, 1.0f, 1.0f, 0.0f, 1.0f, 0.0f, 1.0f, 0.0f, 1.0f, 1.0f, 0.0f, 0.0f, 1.0f, 1.0f, 0.0f, 1.0f,},
{ 0.0f, 0.0f, 1.0f, 1.0f, 0.0f, 1.0f, 0.0f, 1.0f, 0.0f, 0.0f, 0.0f, 1.0f, 0.0f, 1.0f, 1.0f, 1.0f, 0.0f, 1.0f, 1.0f, 1.0f, 1.0f,},
// TOP
{ 0.0f, 1.0f, 0.0f, 1.0f, 0.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 0.0f, 1.0f, 1.0f, 0.0f, 0.0f, 1.0f, 1.0f, 0.0f, 1.0f,},
{ 0.0f, 1.0f, 0.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 0.0f, 1.0f, 0.0f, 1.0f, 1.0f, 1.0f, 0.0f, 1.0f, 1.0f, 1.0f, 1.0f,},
// BOTTOM
{ 1.0f, 0.0f, 1.0f, 1.0f, 0.0f, 0.0f, 1.0f, 1.0f, 0.0f, 0.0f, 0.0f, 1.0f, 0.0f, 1.0f, 1.0f, 0.0f, 0.0f, 1.0f, 1.0f, 0.0f, 1.0f,},
{ 1.0f, 0.0f, 1.0f, 1.0f, 0.0f, 0.0f, 0.0f, 1.0f, 1.0f, 0.0f, 0.0f, 1.0f, 0.0f, 1.0f, 1.0f, 1.0f, 0.0f, 1.0f, 1.0f, 1.0f, 1.0f,},
};
vec3d p1 = { -1.5f, -1.5f, -1.5f };
vec3d p2 = { 1.5f, 1.5f, 1.5f };
vector<collision> hits = CollisionSegment(p1, p2, &meshCube);
return 0;
}
What I want is that, whenever I look at a mesh, I get a vector of the triangles that my camera segment intersects.
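For comparison, here is a minimal, self-contained sketch of the one-sided segment-triangle test as described in the cited book section. It is not the code from the question. Two details differ from the listing above and may be worth checking there: this `operator-` computes left operand minus right operand (the member `operator-` in the listing returns `a - *this`), and the segment rejection is `t > d` rather than `t < d`. The names `Vec3`, `Tri`, `SegmentHitsTriangle`, and `CollectHits` are hypothetical.

```cpp
#include <vector>

struct Vec3 { float x, y, z; };

// Component-wise difference: left operand minus right operand.
Vec3 operator-(const Vec3& a, const Vec3& b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }

float Dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

Vec3 Cross(const Vec3& a, const Vec3& b)
{
    return { a.y * b.z - a.z * b.y,
             a.z * b.x - a.x * b.z,
             a.x * b.y - a.y * b.x };
}

struct Tri { Vec3 a, b, c; };

// One-sided test: returns true if the segment p->q crosses the front face of tri.
// On a hit, t is the fraction along the segment (0 at p, 1 at q).
bool SegmentHitsTriangle(const Vec3& p, const Vec3& q, const Tri& tri, float& t)
{
    const Vec3 ab = tri.b - tri.a;
    const Vec3 ac = tri.c - tri.a;
    const Vec3 qp = p - q;               // points from the segment end back to its start
    const Vec3 n  = Cross(ab, ac);       // unnormalized triangle normal

    const float d = Dot(qp, n);
    if (d <= 0.0f) return false;         // parallel to the plane or approaching the back face

    const Vec3 ap = p - tri.a;
    t = Dot(ap, n);
    if (t < 0.0f) return false;          // plane lies behind the segment start
    if (t > d)    return false;          // plane lies beyond the segment end

    const Vec3 e = Cross(qp, ap);
    const float v = Dot(ac, e);
    if (v < 0.0f || v > d) return false; // outside the triangle (barycentric v)
    const float w = -Dot(ab, e);
    if (w < 0.0f || v + w > d) return false; // outside the triangle (barycentric w)

    t /= d;                              // division deferred until a hit is certain
    return true;
}

// Collects the indices of all triangles whose front face the segment crosses.
std::vector<int> CollectHits(const Vec3& p, const Vec3& q, const std::vector<Tri>& tris)
{
    std::vector<int> hits;
    for (int i = 0; i < static_cast<int>(tris.size()); ++i) {
        float t = 0.0f;
        if (SegmentHitsTriangle(p, q, tris[i], t)) hits.push_back(i);
    }
    return hits;
}
```

Because the test is one-sided, a triangle whose normal points away from the segment start is rejected by the `d <= 0` check. Detecting both faces would need an extra branch that handles negative `d`, which this sketch does not attempt.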

Related

How to use keyboard and mouse input to navigate a figure

I'm working on a project that involves viewing a 3D object from different viewpoints using mouse and keyboard input. When I submitted my first draft, I received the following feedback:
"Your object did not react to any of the buttons I pressed to change the camera view! The object of this project is to have the user control the camera by being able to change different views but your object didn't give me that ability!"
I currently have it coded to zoom in on the object when pressing the up key and out when pressing the down key. The camera view is supposed to move up and down when moving the mouse.
I've tried using some previous code that involved the cameraPosition variable, but it does not function properly when utilized in the pressSpecialKey function or in the rendering function.
/*Header Inclusions*/
#include <iostream>
#include <GL/glew.h>
#include <GL/freeglut.h>
//GLM Math Header Inclusions
#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>
#include <glm/gtc/type_ptr.hpp>
//SOIL image loader Inclusion
#include "SOIL2/SOIL2.h"
using namespace std; //Standard namespace
#define WINDOW_TITLE "Final Project: Spoon" //Window title Macro
/*Shader program Macro*/
#ifndef GLSL
#define GLSL(Version, Source) "#version " #Version "\n" #Source
#endif
//Global variable declarations
int view_state = 1;
/*Variable declarations for shader, window size initialization, buffer and array objects*/
GLint spoonShaderProgram, lampShaderProgram, WindowWidth = 800, WindowHeight = 600;
GLuint VBO, SpoonVAO, LightVAO, texture;
GLfloat cameraSpeed = 0.0005f; //Movement speed per frame
//TODO: Remove unnecessary code
GLchar currentKey; //Will store key pressed
GLfloat lastMouseX = 400, lastMouseY = 300; //Locks mouse cursor at the center of the screen
GLfloat mouseXOffset, mouseYOffset, yaw = 0.0f, pitch = 0.0f; //mouse offset, yaw, and pitch variables
GLfloat sensitivity = 0.5f; //Used for mouse / camera rotation sensitivity
bool mouseDetected = true; //Initially true when mouse movement is detected
//Global vector declarations
glm::vec3 cameraPosition = glm::vec3(-2.0f, 1.0f, 2.0f); //Initial camera position.
glm::vec3 CameraUpY = glm::vec3(0.0f, 1.0f, 0.0f); //Temporary y unit vector
glm::vec3 CameraForwardZ = glm::vec3(0.0f, 0.0f, -1.0f); //Temporary z unit vector
glm::vec3 front; //Temporary z unit vector for mouse
//Subject position and scale
glm::vec3 spoonPosition(0.0f, 0.0f, 0.0f);
glm::vec3 spoonScale(2.0f);
//spoon and light color
glm::vec3 objectColor(1.0f, 1.0f, 1.0f);
glm::vec3 lightColor(1.0f, 1.0f, 1.0f);
//Light position and scale
glm::vec3 lightPosition(0.5f, 0.5f, 3.0f);
glm::vec3 lightScale(0.3f);
/*Function prototypes*/
void UResizeWindow(int, int);
void URenderGraphics(void);
void UCreateShader(void);
void UCreateBuffers(void);
void pressSpecialKey(int key, int xx, int yy);
void UMouseMove(int x, int y);
void UGenerateTexture(void);
/*Spoon Vertex Shader Source Code*/
const GLchar * spoonVertexShaderSource = GLSL(330,
layout (location = 0) in vec3 position; //Vertex data from Vertex Attrib Pointer 0
layout (location = 1) in vec3 normal; //VAP for normals from Vertex Attrib Pointer 1
layout (location = 2) in vec2 textureCoordinate; //Texture vertex data from Vertex Attrib Pointer 2
out vec3 FragmentPos; //For outgoing color / pixels to fragment shader
out vec3 Normal; //For outgoing normals to fragment shader
out vec2 mobileTextureCoordinate;
//Global variables for the transform matrices
uniform mat4 model;
uniform mat4 view;
uniform mat4 projection;
void main(){
gl_Position = projection * view * model * vec4(position, 1.0f); //transforms vertices to clip coordinates
FragmentPos = vec3(model * vec4(position, 1.0f)); //Gets fragment / pixel position in world space only (exclude view and projection)
Normal = mat3(transpose(inverse(model))) * normal; //get normal vectors in world space only and exclude normal translation properties
mobileTextureCoordinate = vec2(textureCoordinate.x, 1 - textureCoordinate.y); //flips the texture vertically
}
);
/*Spoon Fragment Shader Source Code*/
const GLchar * spoonFragmentShaderSource = GLSL(330,
in vec3 FragmentPos; //For incoming fragment position
in vec3 Normal; //For incoming normals
in vec2 mobileTextureCoordinate;
out vec4 spoonColor; //For outgoing spoon color to the GPU
//Uniform / Global variables for object color, light color, light position, and camera/view position
uniform vec3 lightColor;
uniform vec3 lightPos;
uniform vec3 viewPosition;
uniform sampler2D uTexture; //Useful when working with multiple textures
void main(){
/*Phong lighting model calculations to generate ambient, diffuse, and specular components*/
//Calculate Ambient Lighting
float ambientStrength = 0.1f; //Set ambient or global lighting strength
vec3 ambient = ambientStrength * lightColor; //Generate ambient light color
//Calculate Diffuse Lighting
vec3 norm = normalize(Normal); //Normalize vectors to 1 unit
vec3 lightDirection = normalize(lightPos - FragmentPos); //Calculate the direction from the fragment to the light source
float impact = max(dot(norm, lightDirection), 0.0); //Calculate diffuse impact by generating dot product of normal and light
vec3 diffuse = impact * lightColor; //Generate diffuse light color
//Calculate Specular lighting
float specularIntensity = 1.6f; //Set specular light strength
float highlightSize = 128.0f; //Set specular highlight size
vec3 viewDir = normalize(viewPosition - FragmentPos); //Calculate view direction
vec3 reflectDir = reflect(-lightDirection, norm); //Calculate reflection vector
//Calculate specular component
float specularComponent = pow(max(dot(viewDir, reflectDir), 0.0), highlightSize);
vec3 specular = specularIntensity * specularComponent * lightColor;
//Calculate phong result
vec3 objectColor = texture(uTexture, mobileTextureCoordinate).xyz;
vec3 phong = (ambient + diffuse) * objectColor + specular;
spoonColor = vec4(phong, 1.0f); //Send lighting results to GPU
}
);
/*Lamp Shader Source Code*/
const GLchar * lampVertexShaderSource = GLSL(330,
layout (location = 0) in vec3 position; //VAP position 0 for vertex position data
//Uniform / Global variables for the transform matrices
uniform mat4 model;
uniform mat4 view;
uniform mat4 projection;
void main()
{
gl_Position = projection * view *model * vec4(position, 1.0f); //Transforms vertices into clip coordinates
}
);
/*Lamp Fragment Shader Source Code*/
const GLchar * lampFragmentShaderSource = GLSL(330,
out vec4 color; //For outgoing lamp color (smaller spoon) to the GPU
void main()
{
color = vec4(1.0f); //Set color to white (1.0f, 1.0f, 1.0f) with alpha 1.0
}
);
/*Main Program*/
int main(int argc, char* argv[])
{
glutInit(&argc, argv);
glutInitDisplayMode(GLUT_DEPTH | GLUT_DOUBLE | GLUT_RGBA);
glutInitWindowSize(WindowWidth, WindowHeight);
glutCreateWindow(WINDOW_TITLE);
glutReshapeFunc(UResizeWindow);
glewExperimental = GL_TRUE;
if (glewInit() != GLEW_OK)
{
std::cout << "Failed to initialize GLEW" << std::endl;
return -1;
}
UCreateShader();
UCreateBuffers();
UGenerateTexture();
glClearColor(0.8f, 0.8f, 0.8f, 1.0f); //Set background color
glutDisplayFunc(URenderGraphics);
glutSpecialFunc(pressSpecialKey); //Detects key press
glutPassiveMotionFunc(UMouseMove);
glutMainLoop();
//Destroys Buffer objects once used
glDeleteVertexArrays(1, &SpoonVAO);
glDeleteVertexArrays(1, &LightVAO);
glDeleteBuffers(1, &VBO);
return 0;
}
/*Resizes the window*/
void UResizeWindow(int w, int h)
{
WindowWidth = w;
WindowHeight = h;
glViewport(0, 0, WindowWidth, WindowHeight);
}
/*Renders graphics*/
void URenderGraphics(void)
{
glEnable(GL_DEPTH_TEST); //Enable z-depth
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT); //Clears the screen
GLint uTextureLoc, lightColorLoc, lightPositionLoc, viewPositionLoc;
/*********Use the Spoon Shader to activate the Spoon Vertex Array Object for rendering and transforming*********/
glUseProgram(spoonShaderProgram);
glBindVertexArray(SpoonVAO);
CameraForwardZ = front; //Replaces camera forward vector with Radians normalized as a unit vector
//Transforms the object
glm::mat4 model;
model = glm::translate(model, glm::vec3(0.0f, 0.0f, 0.0f)); //Place the object at the center of the viewport
model = glm::rotate(model, 45.0f, glm:: vec3(0.0, 1.0f, 0.0f)); //Rotate the object 45 degrees about the Y axis
model = glm::scale(model, glm::vec3(2.0f, 2.0f, 2.0f)); //Increase the object size by a scale of 2
//Transform the camera
glm::mat4 view;
view = glm::lookAt(cameraPosition - CameraForwardZ, cameraPosition, CameraUpY);
//Creates a perspective projection
glm::mat4 projection;
if(view_state == 1){
projection = glm::perspective(45.0f, (GLfloat)WindowWidth / (GLfloat)WindowHeight, 0.1f, 100.0f);
}else if(view_state == 0){
projection = glm::ortho(-5.0f, 5.0f, -5.0f, 5.0f, 0.1f, 100.0f);
}
//Reference matrix uniforms from the spoon Shader program
GLint modelLoc = glGetUniformLocation(spoonShaderProgram, "model");
GLint viewLoc = glGetUniformLocation(spoonShaderProgram, "view");
GLint projLoc = glGetUniformLocation(spoonShaderProgram, "projection");
//Pass matrix data to the spoon Shader program's matrix uniforms
glUniformMatrix4fv(modelLoc, 1, GL_FALSE, glm::value_ptr(model));
glUniformMatrix4fv(viewLoc, 1, GL_FALSE, glm::value_ptr(view));
glUniformMatrix4fv(projLoc, 1, GL_FALSE, glm::value_ptr(projection));
//Reference matrix uniforms from the spoon Shader program for the spoon color, light color, light position, and camera position
uTextureLoc = glGetUniformLocation(spoonShaderProgram, "uTexture");
lightColorLoc = glGetUniformLocation(spoonShaderProgram, "lightColor");
lightPositionLoc = glGetUniformLocation(spoonShaderProgram, "lightPos");
viewPositionLoc = glGetUniformLocation(spoonShaderProgram, "viewPosition");
//Pass color, light, and camera data to the spoon Shader programs corresponding uniforms
glUniform1i(uTextureLoc, 0); // texture unit 0
glUniform3f(lightColorLoc, lightColor.r, lightColor.g, lightColor.b);
glUniform3f(lightPositionLoc, lightPosition.x, lightPosition.y, lightPosition.z);
glUniform3f(viewPositionLoc, cameraPosition.x, cameraPosition.y, cameraPosition.z);
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, texture);
glDrawArrays(GL_TRIANGLES, 0, 126); //Draw the primitives / spoon
glBindVertexArray(0); //Deactivate the spoon Vertex Array Object
/***************Use the Lamp Shader and activate the Lamp Vertex Array Object for rendering and transforming ************/
glUseProgram(lampShaderProgram);
glBindVertexArray(LightVAO);
//Transform the smaller spoon used as a visual cue for the light source
model = glm::translate(model, lightPosition);
model = glm::scale(model, lightScale);
//Reference matrix uniforms from the Lamp Shader program
modelLoc = glGetUniformLocation(lampShaderProgram, "model");
viewLoc = glGetUniformLocation(lampShaderProgram, "view");
projLoc = glGetUniformLocation(lampShaderProgram, "projection");
//Pass matrix uniforms from the Lamp Shader Program
glUniformMatrix4fv(modelLoc, 1, GL_FALSE, glm::value_ptr(model));
glUniformMatrix4fv(viewLoc, 1, GL_FALSE, glm::value_ptr(view));
glUniformMatrix4fv(projLoc, 1, GL_FALSE, glm::value_ptr(projection));
//Draws the triangles
glDrawArrays(GL_TRIANGLES, 0, 126);
glBindVertexArray(0); //Deactivate the Vertex Array Object
glutPostRedisplay();
glutSwapBuffers(); //Flips the back buffer with the front buffer every frame. Similar to GL Flush
}
/*Creates the Shader program*/
void UCreateShader()
{
//Spoon Vertex shader
GLint spoonVertexShader = glCreateShader(GL_VERTEX_SHADER); //Create the Vertex shader
glShaderSource(spoonVertexShader, 1, &spoonVertexShaderSource, NULL); //Attaches the vertex shader to the source code
glCompileShader(spoonVertexShader); //Compiles the Vertex shader
//Spoon Fragment shader
GLint spoonFragmentShader = glCreateShader(GL_FRAGMENT_SHADER); //Create the Fragment shader
glShaderSource(spoonFragmentShader, 1, &spoonFragmentShaderSource, NULL); //Attaches the Fragment shader to the source code
glCompileShader(spoonFragmentShader); //Compiles the Fragment shader
//Spoon Shader program
spoonShaderProgram = glCreateProgram(); //Creates the Shader program and returns an id
glAttachShader(spoonShaderProgram, spoonVertexShader); //Attach Vertex shader to the Shader program
glAttachShader(spoonShaderProgram, spoonFragmentShader); //Attach Fragment shader to the Shader program
glLinkProgram(spoonShaderProgram); //Link Vertex and Fragment shaders to Shader program
//Delete the Vertex and Fragment shaders once linked
glDeleteShader(spoonVertexShader);
glDeleteShader(spoonFragmentShader);
//Lamp Vertex shader
GLint lampVertexShader = glCreateShader(GL_VERTEX_SHADER); //Creates the Vertex shader
glShaderSource(lampVertexShader, 1, &lampVertexShaderSource, NULL); //Attaches the Vertex shader to the source code
glCompileShader(lampVertexShader); //Compiles the Vertex shader
//Lamp Fragment shader
GLint lampFragmentShader = glCreateShader(GL_FRAGMENT_SHADER); //Creates the Fragment shader
glShaderSource(lampFragmentShader, 1, &lampFragmentShaderSource, NULL); //Attaches the Fragment shader to the source code
glCompileShader(lampFragmentShader); //Compiles the Fragment shader
//Lamp Shader Program
lampShaderProgram = glCreateProgram(); //Creates the Shader program and returns an id
glAttachShader(lampShaderProgram, lampVertexShader); //Attach Vertex shader to the Shader program
glAttachShader(lampShaderProgram, lampFragmentShader); //Attach Fragment shader to the Shader program
glLinkProgram(lampShaderProgram); //Link Vertex and Fragment shaders to the Shader program
//Delete the lamp shaders once linked
glDeleteShader(lampVertexShader);
glDeleteShader(lampFragmentShader);
}
void UCreateBuffers()
{
GLfloat vertices[] = {
//Position //Normals //Texture //Point Name
//Front of Scoop //Positive Z
-0.4f, 0.05f, 0.1f, 0.0f, 0.0f, 1.0f, 0.3f, 1.0f, //Q
-0.4f, -0.1f, 0.1f, 0.0f, 0.0f, 1.0f, 0.3f, 0.0f, //R
-0.6f, 0.1f, 0.0f, 0.0f, 0.0f, 1.0f, 0.0f, 1.0f, //U
-0.4f, 0.05f, 0.1f, 0.0f, 0.0f, 1.0f, 0.3f, 1.0f, //Q
-0.2f, 0.0f, 0.1f, 0.0f, 0.0f, 1.0f, 0.6f, 1.0f, //W
-0.4f, -0.1f, 0.1f, 0.0f, 0.0f, 1.0f, 0.3f, 0.0f, //R
-0.4f, -0.1f, 0.1f, 0.0f, 0.0f, 1.0f, 0.3f, 0.0f, //R
-0.2f, -0.1f, 0.1f, 0.0f, 0.0f, 1.0f, 1.0f, 1.0f, //A
-0.2f, 0.0f, 0.1f, 0.0f, 0.0f, 1.0f, 0.6f, 1.0f, //W
-0.2f, 0.0f, 0.1f, 0.0f, 0.0f, 1.0f, 0.6f, 1.0f, //W
-0.2f, -0.1f, 0.1f, 0.0f, 0.0f, 1.0f, 0.6f, 0.0f, //A_1
0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 1.0f, 1.0f, 1.0f, //A
0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 1.0f, 1.0f, 1.0f, //A
-0.2f, -0.1f, 0.1f, 0.0f, 0.0f, 1.0f, 0.6f, 1.0f, //A_1
0.0f, -0.05f, 0.0f, 0.0f, 0.0f, 1.0f, 1.0f, 0.0f, //B
//Bottom of Scoop Slant //Negative X
-0.6f, 0.1f, 0.0f, -1.0f, 0.0f, 0.0f, 0.0f, 0.6f, //U
-0.6f, 0.1f, -0.1f, -1.0f, 0.0f, 0.0f, 0.0f, 0.3f, //V
-0.4f, -0.1f, 0.1f, -1.0f, 0.0f, 0.0f, 0.3f, 1.0f, //R
-0.4f, -0.1f, 0.1f, -1.0f, 0.0f, 0.0f, 0.3f, 1.0f, //R
-0.4f, -0.1f, -0.2f, -1.0f, 0.0f, 0.0f, 0.3f, 0.0f, //T
-0.6f, 0.1f, -0.1f, -1.0f, 0.0f, 0.0f, 0.0f, 0.3f, //V
//Bottom of Scoop //Negative Y
-0.4f, -0.1f, -0.2f, 0.0f, -1.0f, 0.0f, 0.3f, 0.0f, //T
-0.4f, -0.1f, 0.1f, 0.0f, -1.0f, 0.0f, 0.3f, 1.0f, //R
-0.2f, -0.1f, -0.2f, 0.0f, -1.0f, 0.0f, 0.6f, 0.0f, //B_1
-0.2f, -0.1f, -0.2f, 0.0f, -1.0f, 0.0f, 0.6f, 0.0f, //B_1
-0.4f, -0.1f, 0.1f, 0.0f, -1.0f, 0.0f, 0.3f, 1.0f, //R
-0.2f, -0.1f, 0.1f, 0.0f, -1.0f, 0.0f, 0.6f, 1.0f, //A_1
-0.2f, -0.1f, 0.1f, 0.0f, -1.0f, 0.0f, 0.6f, 1.0f, //A_1
-0.2f, -0.1f, -0.2f, 0.0f, -1.0f, 0.0f, 0.3f, 0.0f, //B_1
0.0f, -0.05f, 0.0f, 0.0f, -1.0f, 0.0f, 1.0f, 0.6f, //B
-0.2f, -0.1f, -0.2f, 0.0f, -1.0f, 0.0f, 0.6f, 0.0f, //B_1
0.0f, -0.05f, 0.0f, 0.0f, -1.0f, 0.0f, 1.0f, 0.6f, //B
0.0f, -0.05f, -0.1f, 0.0f, -1.0f, 0.0f, 1.0f, 0.3f, //D
//Back of Scoop //Negative Z
-0.6f, 0.1f, -0.1f, 0.0f, 0.0f, -1.0f, 0.0f, 1.0f, //V
-0.4f, 0.05f, -0.2f, 0.0f, 0.0f, -1.0f, 0.3f, 1.0f, //S
-0.4f, -0.1f, -0.2f, 0.0f, 0.0f, -1.0f, 0.3f, 0.0f, //T
-0.4f, 0.05f, -0.2f, 0.0f, 0.0f, -1.0f, 0.3f, 1.0f, //S
-0.4f, -0.1f, -0.2f, 0.0f, 0.0f, -1.0f, 0.3f, 0.0f, //T
-0.2f, -0.1f, -0.2f, 0.0f, 0.0f, -1.0f, 0.6f, 0.0f, //B_1
-0.4f, 0.05f, -0.2f, 0.0f, 0.0f, -1.0f, 0.3f, 1.0f, //S
-0.2f, -0.1f, -0.2f, 0.0f, 0.0f, -1.0f, 0.6f, 0.0f, //B_1
-0.2f, 0.0f, -0.2f, 0.0f, 0.0f, -1.0f, 0.6f, 1.0f, //Z
-0.2f, 0.0f, -0.2f, 0.0f, 0.0f, -1.0f, 0.6f, 1.0f, //Z
-0.2f, -0.1f, -0.2f, 0.0f, 0.0f, -1.0f, 0.6f, 0.0f, //B_1
0.0f, 0.0f, -0.1f, 0.0f, 0.0f, -1.0f, 1.0f, 1.0f, //C
0.0f, 0.0f, -0.1f, 0.0f, 0.0f, -1.0f, 1.0f, 1.0f, //C
-0.2f, -0.1f, -0.2f, 0.0f, 0.0f, -1.0f, 0.6f, 0.0f, //B_1
0.0f, -0.05f, -0.1f, 0.0f, 0.0f, -1.0f, 1.0f, 0.0f, //D
//Top of Scoop //Positive Y
-0.6f, 0.1f, 0.0f, 0.0f, 1.0f, 0.0f, 0.0f, 0.3f, //U
-0.6f, 0.1f, -0.1f, 0.0f, 1.0f, 0.0f, 0.0f, 0.6f, //V
-0.4f, 0.05f, -0.2f, 0.0f, 1.0f, 0.0f, 0.3f, 1.0f, //S
-0.6f, 0.1f, 0.0f, 0.0f, 1.0f, 0.0f, 0.0f, 0.3f, //U
-0.4f, 0.05f, -0.2f, 0.0f, 1.0f, 0.0f, 0.3f, 1.0f, //S
-0.4f, 0.05f, 0.1f, 0.0f, 1.0f, 0.0f, 0.3f, 0.0f, //Q
-0.4f, 0.05f, -0.2f, 0.0f, 1.0f, 0.0f, 0.3f, 1.0f, //S
-0.4f, 0.05f, 0.1f, 0.0f, 1.0f, 0.0f, 0.3f, 0.0f, //Q
-0.2f, 0.0f, -0.2f, 0.0f, 1.0f, 0.0f, 0.6f, 1.0f, //Z
-0.4f, 0.05f, 0.1f, 0.0f, 1.0f, 0.0f, 0.3f, 0.0f, //Q
-0.2f, 0.0f, -0.2f, 0.0f, 1.0f, 0.0f, 0.6f, 1.0f, //Z
-0.2f, 0.0f, 0.1f, 0.0f, 1.0f, 0.0f, 0.6f, 0.0f, //W
-0.2f, 0.0f, 0.1f, 0.0f, 1.0f, 0.0f, 0.6f, 0.0f, //W
-0.2f, 0.0f, -0.2f, 0.0f, 1.0f, 0.0f, 0.6f, 1.0f, //Z
0.0f, 0.0f, 0.0f, 0.0f, 1.0f, 0.0f, 1.0f, 0.3f, //A
-0.2f, 0.0f, -0.2f, 0.0f, 1.0f, 0.0f, 0.6f, 1.0f, //Z
0.0f, 0.0f, 0.0f, 0.0f, 1.0f, 0.0f, 1.0f, 0.3f, //A
0.0f, 0.0f, -0.1f, 0.0f, 1.0f, 0.0f, 1.0f, 0.6f, //C
//Front of Handle //Positive Z
0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 1.0f, 0.0f, 0.1f, //A
0.0f, -0.05f, 0.0f, 0.0f, 0.0f, 1.0f, 0.0f, 0.0f, //B
0.6f, 0.0f, 0.0f, 0.0f, 0.0f, 1.0f, 1.0f, 1.0f, //E
0.6f, 0.0f, 0.0f, 0.0f, 0.0f, 1.0f, 1.0f, 1.0f, //E
0.0f, -0.05f, 0.0f, 0.0f, 0.0f, 1.0f, 0.0f, 0.0f, //B
0.6f, -0.1f, 0.0f, 0.0f, 0.0f, 1.0f, 1.0f, 0.0f, //F
//Bottom of Handle //Negative Y
0.0f, -0.05f, 0.0f, 0.0f, -1.0f, 0.0f, 0.0f, 1.0f, //B
0.0f, -0.05f, -0.1f, 0.0f, -1.0f, 0.0f, 0.0f, 0.0f, //D
0.6f, -0.1f, 0.0f, 0.0f, -1.0f, 0.0f, 1.0f, 1.0f, //F
0.0f, -0.05f, -0.1f, 0.0f, -1.0f, 0.0f, 0.0f, 0.0f, //D
0.6f, -0.1f, 0.0f, 0.0f, -1.0f, 0.0f, 1.0f, 1.0f, //F
0.6f, -0.1f, -0.1f, 0.0f, -1.0f, 0.0f, 1.0f, 0.0f, //H
//Back of Handle //Negative Z
0.0f, 0.0f, -0.1f, 0.0f, 0.0f, -1.0f, 0.0f, 1.0f, //C
0.0f, -0.05f, -0.1f, 0.0f, 0.0f, -1.0f, 0.0f, 0.0f, //D
0.6f, 0.0f, -0.1f, 0.0f, 0.0f, -1.0f, 1.0f, 1.0f, //G
0.0f, -0.05f, -0.1f, 0.0f, 0.0f, -1.0f, 0.0f, 0.0f, //D
0.6f, 0.0f, -0.1f, 0.0f, 0.0f, -1.0f, 1.0f, 1.0f, //G
0.6f, -0.1f, -0.1f, 0.0f, 0.0f, -1.0f, 1.0f, 0.0f, //H
//Top of Handle //Positive Y
0.0f, 0.0f, 0.0f, 0.0f, 1.0f, 0.0f, 0.0f, 1.0f, //A
0.0f, 0.0f, -0.1f, 0.0f, 1.0f, 0.0f, 0.0f, 0.0f, //C
0.6f, 0.0f, 0.0f, 0.0f, 1.0f, 0.0f, 1.0f, 1.0f, //E
0.0f, 0.0f, -0.1f, 0.0f, 1.0f, 0.0f, 0.0f, 0.0f, //C
0.6f, 0.0f, 0.0f, 0.0f, 1.0f, 0.0f, 1.0f, 1.0f, //E
0.6f, 0.0f, -0.1f, 0.0f, 1.0f, 0.0f, 1.0f, 0.0f, //G
//Grip Connection //Negative X
0.6f, 0.0f, 0.1f, -1.0f, 0.0f, 0.0f, 1.0f, 1.0f, //I
0.6f, 0.0f, -0.2f, -1.0f, 0.0f, 0.0f, 0.0f, 1.0f, //J
0.6f, -0.1f, 0.1f, -1.0f, 0.0f, 0.0f, 1.0f, 0.0f, //K
0.6f, 0.0f, -0.2f, -1.0f, 0.0f, 0.0f, 0.0f, 1.0f, //J
0.6f, -0.1f, 0.1f, -1.0f, 0.0f, 0.0f, 1.0f, 0.0f, //K
0.6f, -0.1f, -0.2f, -1.0f, 0.0f, 0.0f, 0.0f, 0.0f, //L
//Front to Grip //Positive Z
0.6f, 0.0f, 0.1f, 0.0f, 0.0f, 1.0f, 0.0f, 1.0f, //I
1.0f, 0.0f, 0.05f, 0.0f, 0.0f, 1.0f, 1.0f, 0.0f, //M
0.6f, -0.1f, 0.1f, 0.0f, 0.0f, 1.0f, 0.0f, 0.0f, //K
1.0f, 0.0f, 0.05f, 0.0f, 0.0f, 1.0f, 1.0f, 0.0f, //M
0.6f, -0.1f, 0.1f, 0.0f, 0.0f, 1.0f, 0.0f, 0.0f, //K
1.0f, -0.1f, 0.05f, 0.0f, 0.0f, 1.0f, 1.0f, 1.0f, //N
//Bottom to Grip //Negative Y
0.6f, -0.1f, 0.1f, 0.0f, -1.0f, 0.0f, 0.0f, 0.0f, //K
1.0f, -0.1f, 0.05f, 0.0f, -1.0f, 0.0f, 1.0f, 0.0f, //N
0.6f, -0.1f, -0.2f, 0.0f, -1.0f, 0.0f, 0.0f, 1.0f, //L
1.0f, -0.1f, 0.05f, 0.0f, -1.0f, 0.0f, 1.0f, 0.0f, //N
0.6f, -0.1f, -0.2f, 0.0f, -1.0f, 0.0f, 0.0f, 1.0f, //L
1.0f, -0.1f, -0.15f, 0.0f, -1.0f, 0.0f, 1.0f, 1.0f, //P
//Back to Grip //Negative Z
0.6f, 0.0f, -0.2f, 0.0f, 0.0f, -1.0f, 0.0f, 1.0f, //J
0.6f, -0.1f, -0.2f, 0.0f, 0.0f, -1.0f, 0.0f, 0.0f, //L
1.0f, 0.0f, -0.15f, 0.0f, 0.0f, -1.0f, 1.0f, 1.0f, //O
0.6f, -0.1f, -0.2f, 0.0f, 0.0f, -1.0f, 0.0f, 0.0f, //L
1.0f, 0.0f, -0.15f, 0.0f, 0.0f, -1.0f, 1.0f, 1.0f, //O
1.0f, -0.1f, -0.15f, 0.0f, 0.0f, -1.0f, 1.0f, 0.0f, //P
//Top to Grip //Positive Y
1.0f, 0.0f, -0.15f, 0.0f, 1.0f, 0.0f, 1.0f, 0.0f, //O
1.0f, 0.0f, 0.05f, 0.0f, 1.0f, 0.0f, 1.0f, 1.0f, //M
0.6f, 0.0f, -0.2f, 0.0f, 1.0f, 0.0f, 0.0f, 0.0f, //J
1.0f, 0.0f, 0.05f, 0.0f, 1.0f, 0.0f, 1.0f, 1.0, //M
0.6f, 0.0f, -0.2f, 0.0f, 1.0f, 0.0f, 0.0f, 0.0f, //J
0.6f, 0.0f, 0.1f, 0.0f, 1.0f, 0.0f, 0.0f, 1.0f, //I
//Base of Grip //Positive X
1.0f, 0.0f, 0.05f, 1.0f, 0.0f, 0.0f, 0.0f, 1.0f, //M
1.0f, -0.1f, 0.05f, 1.0f, 0.0f, 0.0f, 0.0f, 0.0f, //N
1.0f, 0.0f, -0.15f, 1.0f, 0.0f, 0.0f, 1.0f, 1.0f, //O
1.0f, -0.1f, 0.05f, 1.0f, 0.0f, 0.0f, 0.0f, 0.0f, //N
1.0f, 0.0f, -0.15f, 1.0f, 0.0f, 0.0f, 1.0f, 1.0f, //O
1.0f, -0.1f, -0.15f, 1.0f, 0.0f, 0.0f, 1.0f, 0.0f //P
};
//Generate buffer ids
glGenVertexArrays(1, &SpoonVAO);
glGenBuffers(1, &VBO);
//Activate the Vertex Array Object before binding and setting any VBOs and Vertex Attribute Pointers.
glBindVertexArray(SpoonVAO);
//Activate the VBO
glBindBuffer(GL_ARRAY_BUFFER, VBO);
glBufferData(GL_ARRAY_BUFFER, sizeof(vertices), vertices, GL_STATIC_DRAW); //Copy vertices to VBO
//Set attribute pointer 0 to hold position data
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 8 * sizeof(GLfloat), (GLvoid*)0);
glEnableVertexAttribArray(0); //Enables vertex attribute
//Set attribute pointer 1 to hold Normal data
glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, 8 * sizeof(GLfloat), (GLvoid*)(3 * sizeof(GLfloat)));
glEnableVertexAttribArray(1); //Enables vertex attribute
//Set attribute pointer 2 to hold Texture coordinate data
glVertexAttribPointer(2, 2, GL_FLOAT, GL_FALSE, 8 * sizeof(GLfloat), (GLvoid*)(6 * sizeof(GLfloat)));
glEnableVertexAttribArray(2);
glBindVertexArray(0); //Deactivate the Spoon VAO which is good practice
}
void pressSpecialKey(int key, int xx, int yy)
{
switch(key){
//Zoom object in
case GLUT_KEY_UP:
front.x += 0.1f;
front.y += 0.1f;
front.z += 0.1f;
break;
//Zoom object out
case GLUT_KEY_DOWN:
front.x -= 0.1f;
front.y -= 0.1f;
front.z -= 0.1f;
break;
//Change view to orthogonal state
case GLUT_KEY_LEFT:
view_state = 0;
break;
//Change view to perspective state
case GLUT_KEY_RIGHT:
view_state = 1;
break;
}
}
/*Implements the UMouseMove function*/
void UMouseMove(int x, int y)
{
//Immediately replaces center-locked coordinates with new mouse coordinates
if(mouseDetected)
{
lastMouseX = x;
lastMouseY = y;
mouseDetected = false;
}
//Gets the direction the mouse was moved in x and y
mouseXOffset = x - lastMouseX;
mouseYOffset = lastMouseY - y; //Inverted Y
//Updates with new mouse coordinates
lastMouseX = x;
lastMouseY = y;
//Applies sensitivity to mouse direction
mouseXOffset *= sensitivity;
mouseYOffset *= sensitivity;
//Accumulates the yaw and pitch variables
yaw += mouseXOffset;
pitch += mouseYOffset;
//Clamps pitch to +/-89 degrees to avoid gimbal lock
if(pitch > 89.0f)
pitch = 89.0f;
if(pitch < -89.0f)
pitch = -89.0f;
//Converts mouse coordinates / degrees into Radians, then to vectors
front.x = cos(glm::radians(pitch)) * cos(glm::radians(yaw));
front.y = sin(glm::radians(pitch));
front.z = cos(glm::radians(pitch)) * sin(glm::radians(yaw));
}
/*Generate and load the texture*/
void UGenerateTexture(){
glGenTextures(1, &texture);
glBindTexture(GL_TEXTURE_2D, texture);
int width, height;
unsigned char* image = SOIL_load_image("spoon.jpg", &width, &height, 0, SOIL_LOAD_RGB); //Loads texture file
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, width, height, 0, GL_RGB, GL_UNSIGNED_BYTE, image);
glGenerateMipmap(GL_TEXTURE_2D);
SOIL_free_image_data(image);
glBindTexture(GL_TEXTURE_2D, 0); //Unbind the texture
}
Expected: The spoon is in the center of the screen, mouse movement changes the camera view (horizontally and vertically), the up arrow zooms the camera in, and the down arrow zooms the camera out.
Actual: The spoon is not in the center. Mouse movement causes the object to move (horizontally and vertically). The arrow keys are not detected (?).

Zooming with a perspective projection can be achieved by shifting the camera position along the line of sight:
void pressSpecialKey(int key, int xx, int yy)
{
    switch(key){
    case GLUT_KEY_UP: cameraPosition += front * 0.1f; break; // move toward the target
    case GLUT_KEY_DOWN: cameraPosition -= front * 0.1f; break; // move away from the target
    // [...]
    }
}
or by changing the field of view angle:
float fov_angle = 45.0f;
projection = glm::perspective(glm::radians(fov_angle),
(GLfloat)WindowWidth / (GLfloat)WindowHeight, 0.1f, 100.0f);
void pressSpecialKey(int key, int xx, int yy)
{
    switch(key){
    case GLUT_KEY_UP: fov_angle -= 0.1f; break; // narrower view: zoom in
    case GLUT_KEY_DOWN: fov_angle += 0.1f; break; // wider view: zoom out
    // [...]
    }
}
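With the field-of-view approach, the angle should stay within a sensible range, since a value at or below 0 degrees or at or above 180 degrees does not describe a valid perspective frustum. A minimal sketch of a clamped update follows. The helper name `ZoomFov` and the limits of 1 and 90 degrees are assumptions chosen for illustration, not values from the snippet above.

```cpp
#include <algorithm>

// Returns the new field-of-view angle in degrees after a zoom step.
// A negative step narrows the view (zoom in); a positive step widens it (zoom out).
// minFov and maxFov are illustrative limits, not values required by GLM or GLUT.
float ZoomFov(float fovDegrees, float stepDegrees,
              float minFov = 1.0f, float maxFov = 90.0f)
{
    return std::max(minFov, std::min(maxFov, fovDegrees + stepDegrees));
}
```

In the key handler, this would replace the bare `fov_angle -= 0.1f;` with `fov_angle = ZoomFov(fov_angle, -0.1f);`, and likewise for the down arrow with a positive step.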
If you want to keep the spoon in the center of the view and orbit around it, then you have to change the camera position according to the viewing direction:
void UMouseMove(int x, int y)
{
// [...]
cameraPosition = - front * glm::length( cameraPosition );
}
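To make the orbit idea concrete, the sketch below computes the viewing direction from yaw and pitch and then places the camera at a fixed distance from the origin, opposite to that direction. It uses a plain `Vec3` struct instead of `glm::vec3` so that it compiles on its own. `DirectionFromAngles` and `OrbitPosition` are hypothetical names, and the target is assumed to sit at the origin, as the spoon does in the question.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

// Unit viewing direction from yaw and pitch given in degrees
// (the same formula that UMouseMove uses for front).
Vec3 DirectionFromAngles(float yawDegrees, float pitchDegrees)
{
    const float kPi = 3.14159265358979f;
    const float yaw = yawDegrees * kPi / 180.0f;
    const float pitch = pitchDegrees * kPi / 180.0f;
    return { std::cos(pitch) * std::cos(yaw),
             std::sin(pitch),
             std::cos(pitch) * std::sin(yaw) };
}

// Camera position on a sphere of the given radius around the origin,
// placed opposite to the viewing direction so that the camera faces the origin.
Vec3 OrbitPosition(const Vec3& front, float distance)
{
    return { -front.x * distance, -front.y * distance, -front.z * distance };
}
```

Keeping the distance as a separate value means the zoom keys only need to change that one number, while the mouse only changes yaw and pitch.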
The matrices of the OpenGL Mathematics (GLM) library have to be initialized. An identity matrix can be created by passing the single argument 1.0f, e.g.:
glm::mat4 model(1.0f);
The angles passed to GLM functions have to be given in radians rather than degrees (this was different in GLM version 0.9.4 and earlier).
glm::perspective():
GLM_FUNC_DECL tmat4x4<T, defaultp> glm::perspective(T fovy, T aspect, T near, T far)
Creates a matrix for a symetric perspective-view frustum based on the default handedness.
Parameters
fovy Specifies the field of view angle in the y direction. Expressed in radians.
glm::rotate()
GLM_FUNC_DECL mat<4, 4, T, Q> glm::rotate (mat< 4, 4, T, Q > const & m, T angle, vec<3, T, Q> const & axis)
Builds a rotation 4 * 4 matrix created from an axis vector and an angle.
Parameters
angle Rotation angle expressed in radians.
Initialize the matrices and use glm::radians() to convert from degree to radians:
//Transforms the object
glm::mat4 model(1.0f); // <--- init
model = glm::translate(model, glm::vec3(0.0f, 0.0f, 0.0f)); //Place the object at the center of the viewport
// model = glm::rotate(model, 45.0f, glm:: vec3(0.0, 1.0f, 0.0f));
model = glm::rotate(model, glm::radians(45.0f), glm:: vec3(0.0, 1.0f, 0.0f));
model = glm::scale(model, glm::vec3(2.0f, 2.0f, 2.0f)); //Increase the object size by a scale of 2
//Transform the camera
glm::mat4 view(1.0f); // <--- init
view = glm::lookAt(cameraPosition - CameraForwardZ, cameraPosition, CameraUpY);
//Creates a perspective projection
glm::mat4 projection(1.0f); // <--- init
if(view_state == 1){
// projection = glm::perspective(45.0f,
// (GLfloat)WindowWidth / (GLfloat)WindowHeight, 0.1f, 100.0f);
projection = glm::perspective(glm::radians(45.0f),
(GLfloat)WindowWidth / (GLfloat)WindowHeight, 0.1f, 100.0f);
} else if(view_state == 0){
projection = glm::ortho(-5.0f, 5.0f, -5.0f, 5.0f, 0.1f, 100.0f);
}

Perspective Matrix not drawing object correctly

My perspective matrix is doing a weird thing where it makes my 3D pyramid look like a trapezoid.
This is the effect I'm getting:
Angle 1
Angle 2
If I rotate the pyramid this effect is consistent so I don't think it has anything to do with my vertices. What is causing this effect to occur?
Animation Loop:
//enables depth testing
glEnable(GL_DEPTH_TEST);
//creates aspect ratio variable
float aspect = 500.0f / 500.0f;
//enables projection matrix
glm::mat4 pmat = glm::perspective(70.0f, aspect, 0.01f, 1000.0f);
//enables view matrix
glm::vec3 eye(0, -1, 2);
glm::vec3 center(0, 0, 0);
glm::vec3 up(0, 1, 0);
glm::mat4 vmat = glm::lookAt(eye, center, up);
while (!glfwWindowShouldClose(mWindow)) //animation loop
{
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
glm::mat4 mvp = pmat * vmat * tmat;
glUniformMatrix4fv(uTransform, 1, GL_FALSE, glm::value_ptr(mvp));
for (int i = 0; i <= numberofshapes; i++)
{
glBindVertexArray(shapes[i].vao);
glDrawArrays(shapes[i].drawtype, 0, shapes[i].numOfvertices);
}
if (w == true) {
tmat = glm::translate(glm::mat4(1.0f), glm::vec3(0.0f, 0.001f, 0.0f)) * tmat;
}if (a == true) {
tmat = glm::translate(glm::mat4(1.0f), glm::vec3(-0.001f, 0.0f, 0.0f)) * tmat;
}if (s == true) {
tmat = glm::translate(glm::mat4(1.0f), glm::vec3(0.0f, -0.001f, 0.0f)) * tmat;
}if (d == true) {
tmat = glm::translate(glm::mat4(1.0f), glm::vec3(0.001f, 0.0f, 0.0f)) * tmat;
}if (r == true) {
tmat = glm::rotate(glm::mat4(1.0f), glm::radians(0.1f), glm::vec3(0.0f, 1.0f, 0.0f)) * tmat;
}
glfwSwapBuffers(mWindow);
counter += 1;
glfwPollEvents();
}
Vertex Arrays:
vertex vertices[] = {
mkVert(0.0f, 0.0f, 0.0f, 1.0f, 1.0f, 1.0f),
mkVert(-0.5f, -0.5f, 0.5f, 1.0f, 1.0f, 1.0f),
mkVert(0.5f, -0.5f, 0.5f, 1.0f, 1.0f, 1.0f)
};
//SECOND OBJECT
vertex vertices2[] = {
mkVert(0.0f, 0.0f, 0.0f, 1.0f, 1.0f, 1.0f),
mkVert(-0.5f, -0.5f, 0.5f, 1.0f, 1.0f, 1.0f),
mkVert(-0.5f, -0.5f, -0.5f, 1.0f, 1.0f, 1.0f)
};
//THIRD OBJECT
vertex vertices3[] = {
mkVert(0.0f, 0.0f, 0.0f, 1.0f, 1.0f, 1.0f),
mkVert(0.5f, -0.5f, 0.5f, 1.0f, 1.0f, 1.0f),
mkVert(0.5f, -0.5f, -0.5f, 1.0f, 1.0f, 1.0f)
};
vertex vertices4[] = {
mkVert(0.0f, 0.0f, 0.0f, 1.0f, 1.0f, 1.0f),
mkVert(0.5f, -0.5f, -0.5f, 1.0f, 1.0f, 1.0f),
mkVert(-0.5f, -0.5f, -0.5f, 1.0f, 1.0f, 1.0f)
};
Just FYI, my window is 500x500. I don't know if that would cause this effect.

Your camera is pointing at your pyramid from below. I guess your brain expected to look at it from above? Especially with just a wireframe, this can get confusing.
(GIF: a pyramid being looked at from below)
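A quick way to check this from the numbers in the question: the eye is at (0, -1, 2), the target at (0, 0, 0), and the pyramid's base vertices are at y = -0.5. The sketch below uses hypothetical helpers (`Vec3`, `viewDirection`, `seesFromBelow`) rather than GLM types:

```cpp
struct Vec3 { float x, y, z; };

// Direction the camera looks in: from the eye towards the target.
Vec3 viewDirection(const Vec3& eye, const Vec3& center)
{
    return { center.x - eye.x, center.y - eye.y, center.z - eye.z };
}

// True when the eye is lower than the lowest vertex of the object,
// i.e. the object is seen from underneath.
bool seesFromBelow(const Vec3& eye, float lowestVertexY)
{
    return eye.y < lowestVertexY;
}
```

For the values in the question, the view direction is (0, 1, -2), so the camera looks upwards, and the eye height of -1 is below the base at -0.5. Assuming a view from above is what was intended, an eye position with a positive y, for example (0, 1, 2), would be one thing to try.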

What is wrong with my matrix stack implementation (OpenGL ES 2.0)?

I am porting my OpenGL 1.1 application to OpenGL ES 2.0 and am writing a wrapper to implement the OpenGL 1.1 functions. My code seems to work fine until I start calling glPushMatrix() and glPopMatrix(). I think my understanding of how these should be implemented is incorrect.
Do I compute the final rotate/translate/scale before pushing it back on the stack? Should I keep only one modelview matrix (instead of separating it into three)? Are the transforms applied in the correct order?
Here is the code for my transformation matrices:
static std::vector<GLfloat> vertices;
static std::vector<std::vector<GLfloat>> rotationMatrixStack;
static std::vector<std::vector<GLfloat>> scalingMatrixStack;
static std::vector<GLfloat> rotationMatrix =
{
1.0f, 0.0f, 0.0f, 0.0f,
0.0f, 1.0f, 0.0f, 0.0f,
0.0f, 0.0f, 1.0f, 0.0f,
0.0f, 0.0f, 0.0f, 1.0f
};
static std::vector<GLfloat> scalingMatrix =
{
1.0f, 0.0f, 0.0f, 0.0f,
0.0f, 1.0f, 0.0f, 0.0f,
0.0f, 0.0f, 1.0f, 0.0f,
0.0f, 0.0f, 0.0f, 1.0f
};
static std::vector<GLfloat> translationMatrix =
{
1.0f, 0.0f, 0.0f, 0.0f,
0.0f, 1.0f, 0.0f, 0.0f,
0.0f, 0.0f, 1.0f, 0.0f,
0.0f, 0.0f, 0.0f, 1.0f
};
static std::vector<GLfloat> orthographicMatrix =
{
.0025f, 0.0f, 0.0f, -1.0f,
0.0f, .0025f, 0.0f, -1.0f,
0.0f, 0.0f, 1.0f, 0.0f,
0.0f, 0.0f, 0.0f, 1.0f
};
void glTranslatef (GLfloat x, GLfloat y, GLfloat z)
{
float translation[] =
{
1.0f, 0.0f, 0.0f, x,
0.0f, 1.0f, 0.0f, y,
0.0f, 0.0f, 1.0f, z,
0.0f, 0.0f, 0.0f, 1.0f
};
multiplyMatrix(translation , &translationMatrix[0], &translationMatrix[0]);
}
void glScalef (GLfloat x, GLfloat y, GLfloat z)
{
float scaling[] =
{
x, 0.0f, 0.0f, 0.0f,
0.0f, y, 0.0f, 0.0f,
0.0f, 0.0f, z, 0.0f,
0.0f, 0.0f, 0.0f, 1.0f
};
multiplyMatrix(scaling , &scalingMatrix[0], &scalingMatrix[0]);
}
void glRotatef (GLfloat angle, GLfloat x, GLfloat y, GLfloat z)
{
glTranslatef(-x, -y, -z);
GLfloat radians = angle * M_PI/180;
float zRotation[] =
{
cos(radians), -sin(radians), 0.0f, 0.0f,
sin(radians), cos(radians), 0.0f, 0.0f,
0.0f, 0.0f, 1.0f, 0.0f,
0.0f, 0.0f, 0.0f, 1.0f
};
multiplyMatrix(zRotation , &rotationMatrix[0], &rotationMatrix[0]);
glTranslatef(x,y,z);
}
void glLoadIdentity (void)
{
rotationMatrix, scalingMatrix, translationMatrix =
{
1.0f, 0.0f, 0.0f, 0.0f,
0.0f, 1.0f, 0.0f, 0.0f,
0.0f, 0.0f, 1.0f, 0.0f,
0.0f, 0.0f, 0.0f, 1.0f
};
}
void multiplyMatrix(float* a, float* b, float* product)
{
int a_heigth = 4;
int a_width = 4;
int b_heigth = 4;
int b_width = 4;
int product_heigth = a_heigth;
int product_width = b_width;
float intermediateMatrix[product_heigth * product_width] = {0};
for (int product_row = 0; product_row < product_heigth; product_row++)
{
for (int product_column = 0; product_column < product_width; product_column++)
{
float value = 0;
//std::cout << "r[" << (product_row*product_width) + product_column << "] = ";
for (int multiplication_index = 0; multiplication_index < a_width ; multiplication_index++)
{
value += a[(product_row * a_width) + multiplication_index] * b[product_column + (b_heigth * multiplication_index)];
//std::cout << "( a[" << (product_row * a_width) + multiplication_index << "] * b[" << product_column + (b_heigth * multiplication_index) << "] ) + ";
}
//std::cout << std::endl;
intermediateMatrix[(product_row*product_width) + product_column] = value;
}
}
for (int i = 0; i < product_heigth * product_width; i++)
{
product[i] = intermediateMatrix[i];
}
}
Here is the code for the matrix stack
static std::vector<std::vector<GLfloat>> translationMatrixStack;
void glPushMatrix()
{
rotationMatrixStack.push_back(rotationMatrix);
scalingMatrixStack.push_back(scalingMatrix);
translationMatrixStack.push_back(translationMatrix);
}
void glPopMatrix()
{
rotationMatrix = rotationMatrixStack.back();
scalingMatrix = scalingMatrixStack.back();
translationMatrix = translationMatrixStack.back();
rotationMatrixStack.pop_back();
scalingMatrixStack.pop_back();
translationMatrix.pop_back();
}
And here is the vertex shader code
attribute highp vec4 myVertex;
uniform mediump mat4 orthographicMatrix;
uniform mediump mat4 translationMatrix;
uniform mediump mat4 scalingMatrix;
uniform mediump mat4 rotationMatrix;
void main(void)
{
gl_Position = orthographicMatrix * translationMatrix * scalingMatrix * rotationMatrix * ( myVertex) ;
}";

There is no separate matrix stack for rotation, translation and scaling. In OpenGL there is one matrix stack for each matrix mode (see glMatrixMode). The matrix modes are GL_MODELVIEW, GL_PROJECTION, and GL_TEXTURE.
See the documentation of glTranslate:
glTranslate produces a translation by x y z . The current matrix (see glMatrixMode) is multiplied by this translation matrix, with the product replacing the current matrix.
the documentation of glRotate:
glRotate produces a rotation of angle degrees around the vector x y z . The current matrix (see glMatrixMode) is multiplied by a rotation matrix with the product replacing the current matrix.
and the documentation of glScale:
glScale produces a nonuniform scaling along the x, y, and z axes. The three parameters indicate the desired scale factor along each of the three axes.
The current matrix (see glMatrixMode) is multiplied by this scale matrix.
This means you need one matrix stack, and all operations operate on the same matrix stack.
Note, a matrix multiplication C = A * B works like this:
Matrix4x4 A, B, C;
// C = A * B
for ( int k = 0; k < 4; ++ k )
for ( int j = 0; j < 4; ++ j )
C[k][j] = A[0][j] * B[k][0] + A[1][j] * B[k][1] + A[2][j] * B[k][2] + A[3][j] * B[k][3];
A 4*4 matrix looks like this:
c0 c1 c2 c3 c0 c1 c2 c3
[ Xx Yx Zx Tx ] [ 0 4 8 12 ]
[ Xy Yy Zy Ty ] [ 1 5 9 13 ]
[ Xz Yz Zz Tz ] [ 2 6 10 14 ]
[ 0 0 0 1 ] [ 3 7 11 15 ]
And the memory image of a 4*4 matrix looks like this:
[ Xx, Xy, Xz, 0, Yx, Yy, Yz, 0, Zx, Zy, Zz, 0, Tx, Ty, Tz, 1 ]
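As a small illustration of that layout, the element at a given row and column of a column-major 4*4 matrix sits at index `column * 4 + row`. The helper names `elementIndex` and `makeTranslation` below are made up for this sketch:

```cpp
#include <array>
#include <cstddef>

// Index of the element at (row, column) in a column-major 4x4 matrix.
constexpr std::size_t elementIndex(std::size_t row, std::size_t column)
{
    return column * 4 + row;
}

// Builds a column-major translation matrix: identity plus (x, y, z)
// in the fourth column, which ends up at indices 12, 13 and 14.
std::array<float, 16> makeTranslation(float x, float y, float z)
{
    std::array<float, 16> m{}; // value-initialized: all zeros
    m[elementIndex(0, 0)] = 1.0f;
    m[elementIndex(1, 1)] = 1.0f;
    m[elementIndex(2, 2)] = 1.0f;
    m[elementIndex(3, 3)] = 1.0f;
    m[elementIndex(0, 3)] = x; // Tx
    m[elementIndex(1, 3)] = y; // Ty
    m[elementIndex(2, 3)] = z; // Tz
    return m;
}
```

So the translation ends up in the last four floats of the array, not in the last float of each group of four.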
This means you have to adapt your matrix operations:
static std::vector<std::vector<GLfloat>> modelViewMatrixStack;
static std::vector<GLfloat> modelViewMatrix{
1.0f, 0.0f, 0.0f, 0.0f,
0.0f, 1.0f, 0.0f, 0.0f,
0.0f, 0.0f, 1.0f, 0.0f,
0.0f, 0.0f, 0.0f, 1.0f };
void multiplyMatrix( float A[], float B[], float P[] )
{
float C[16];
for ( int k = 0; k < 4; ++ k ) {
for ( int j = 0; j < 4; ++ j ) {
C[k*4+j] =
A[0*4+j] * B[k*4+0] +
A[1*4+j] * B[k*4+1] +
A[2*4+j] * B[k*4+2] +
A[3*4+j] * B[k*4+3];
}
}
std::copy(C, C+16, P);
}
void glTranslatef( GLfloat x, GLfloat y, GLfloat z )
{
float translation[]{
1.0f, 0.0f, 0.0f, 0.0f,
0.0f, 1.0f, 0.0f, 0.0f,
0.0f, 0.0f, 1.0f, 0.0f,
x, y, z, 1.0f };
multiplyMatrix(&modelViewMatrix[0], translation, &modelViewMatrix[0]);
}
void glScalef( GLfloat x, GLfloat y, GLfloat z )
{
float scaling[]{
x, 0.0f, 0.0f, 0.0f,
0.0f, y, 0.0f, 0.0f,
0.0f, 0.0f, z, 0.0f,
0.0f, 0.0f, 0.0f, 1.0f };
multiplyMatrix(&modelViewMatrix[0], scaling, &modelViewMatrix[0]);
}
void glRotatef( GLfloat angle, GLfloat x, GLfloat y, GLfloat z )
{
float radians = angle * M_PI/180;
float c = cos(radians);
float s = sin(radians);
// (x, y, z) is assumed to be a unit vector.
// Column-major storage: each line below is one column of the rotation matrix.
float rotation[16]{
x*x*(1.0f-c)+c, y*x*(1.0f-c)+z*s, z*x*(1.0f-c)-y*s, 0.0f,
x*y*(1.0f-c)-z*s, y*y*(1.0f-c)+c, z*y*(1.0f-c)+x*s, 0.0f,
x*z*(1.0f-c)+y*s, y*z*(1.0f-c)-x*s, z*z*(1.0f-c)+c, 0.0f,
0.0f, 0.0f, 0.0f, 1.0f };
multiplyMatrix(&modelViewMatrix[0], rotation, &modelViewMatrix[0]);
}
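With a single current matrix, push and pop only have to save and restore that one matrix. A minimal sketch, using hypothetical `my...` names so it does not clash with the real GL entry points, and `std::array` instead of `std::vector<GLfloat>` purely for brevity:

```cpp
#include <array>
#include <vector>

using Mat4 = std::array<float, 16>; // column-major 4x4 matrix

static const Mat4 kIdentity{
    1.0f, 0.0f, 0.0f, 0.0f,
    0.0f, 1.0f, 0.0f, 0.0f,
    0.0f, 0.0f, 1.0f, 0.0f,
    0.0f, 0.0f, 0.0f, 1.0f };

static Mat4 currentMatrix = kIdentity;   // the one model-view matrix
static std::vector<Mat4> matrixStack;    // the one model-view stack

void myLoadIdentity() { currentMatrix = kIdentity; }

// Save a copy of the current matrix on the stack.
void myPushMatrix() { matrixStack.push_back(currentMatrix); }

// Restore the most recently saved matrix. On an empty stack this sketch
// does nothing (legacy OpenGL reports GL_STACK_UNDERFLOW in that case).
void myPopMatrix()
{
    if (matrixStack.empty()) return;
    currentMatrix = matrixStack.back();
    matrixStack.pop_back();
}
```

The vertex shader then needs only one model-view uniform (plus the projection matrix) instead of separate translation, scaling and rotation uniforms.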
See further:
GLSL 4×4 Matrix Fields
GLSL Programming/Vector and Matrix Operations
Data Type (GLSL)

How can I calculate the vertex normals for my model(house)

I just recently drew a house in my Direct3D11 application and it seems to look OK. The only problem I have is calculating the vertex normals for the house. Every time the light strikes the house, it looks a little awkward, and the worst part is that the light won't even strike the inside of the house. (BTW, I am using the spotlight technique to create light.)
Down below is the vertex buffer that contains the vertices, texture coordinates, and the normals for my poorly drawn model. Can someone show me how to calculate the normals for the house?
void Create_Vertex_Buffer_for_House()
{
D3D11_BUFFER_DESC VertexBufferDesc;
D3D11_SUBRESOURCE_DATA VertexBufferData;
ZeroMemory(&VertexBufferDesc, sizeof(VertexBufferDesc));
VertexBufferDesc.BindFlags = D3D11_BIND_VERTEX_BUFFER;
VertexBufferDesc.ByteWidth = sizeof(Vertex_Buffer) * 34;
VertexBufferDesc.Usage = D3D11_USAGE_DEFAULT;
VertexBufferDesc.CPUAccessFlags = 0;
/* Vertex coordinates, texture coordinates, and vertex normals (respectively) */
Vertex_Buffer Vertices[] =
{
/* Front wall of the house*/
Vertex_Buffer(-1.0f, -1.0f, 1.0f, 0.0f, 10.0f, -1.0f, -1.0f, 1.0f),
Vertex_Buffer(-1.0f, 1.0f, 1.0f, 0.0f, 0.0f, -1.0f, 1.0f, 1.0f),
Vertex_Buffer(1.0f, 1.0f, 1.0f, 10.0f, 0.0f, 1.0f, 1.0f, 1.0f),
Vertex_Buffer(1.0f, -1.0f, 1.0f, 10.0f, 10.0f, 1.0f, -1.0f, 1.0f),
/* Front wall of the house*/
Vertex_Buffer(-4.0f, -1.0f, 1.0f, 0.0f, 10.0f, -4.0f, -1.0f, 1.0f),
Vertex_Buffer(-4.0f, 1.0f, 1.0f, 0.0f, 0.0f, -4.0f, 1.0f, 1.0f),
Vertex_Buffer(-2.0f, 1.0f, 1.0f, 10.0f, 0.0f, -2.0f, 1.0f, 1.0f),
Vertex_Buffer(-2.0f, -1.0f, 1.0f, 10.0f, 10.0f, -2.0f, -1.0f, 1.0f),
/* Rooftop of house (front)*/
Vertex_Buffer(-4.0f, 1.0f, 1.0f, 0.0f, 10.0f, -4.0f, 1.0f, 1.0f),
Vertex_Buffer(-4.0f, 2.5f, -1.0f, 0.0f, 0.0f, -4.0f, 2.5f, -1.0f),
Vertex_Buffer(1.0f, 2.5f, -1.0f, 10.0f, 0.0f, 1.0f, 2.5f, -1.0f),
Vertex_Buffer(1.0f, 1.0f, 1.0f, 10.0f, 10.0f, 1.0f, 1.0f, 1.0f),
/* Rooftop of the house(back)*/
Vertex_Buffer(-4.0f, 2.5f, -1.0f, 0.0f, 10.0f, -4.0f, 2.5f, 1.0f),
Vertex_Buffer(-4.0f, 1.0f, -3.0f, 0.0f, 0.0f, -4.0f, 2.5f, -3.0f),
Vertex_Buffer(1.0f, 1.0f, -3.0f, 10.0f, 0.0f, 1.0f, 1.0f, -3.0f),
Vertex_Buffer(1.0f, 2.5f, -1.0f, 10.0f, 10.0f, 1.0f, 2.5f, -1.0f),
/* Right wall of the house*/
Vertex_Buffer(1.0f, -1.0f, 1.0f, 0.0f, 10.0f, 1.0f, -1.0f, 1.0f),
Vertex_Buffer(1.0f, 1.0f, 1.0f, 0.0f, 0.0f, 1.0f, 1.0f, 1.0f),
Vertex_Buffer(1.0f, 1.0f, -3.0f, 10.0f, 0.0f, 1.0f, 1.0f, -3.0f),
Vertex_Buffer(1.0f, -1.0f, -3.0f, 10.0f, 10.0f, 1.0f, -1.0f, -3.0f),
/* right wall of the house(small triangle strip)*/
Vertex_Buffer(1.0f, 1.0f, 1.0f, 0.0f, 10.0f, 1.0f, 1.0f, 1.0f),
Vertex_Buffer(1.0f, 2.5f, -1.0f, 0.0f, 0.0f, 1.0f, 2.5f, -1.0f),
Vertex_Buffer(1.0f, 1.0f, -3.0f, 10.0f, 0.0f, 1.0f, 1.0f, -3.0f),
/* Left wall of the house*/
Vertex_Buffer(-4.0f, -1.0f, -3.0f, 0.0f, 10.0f, -4.0f, -1.0f, -3.0f),
Vertex_Buffer(-4.0f, 1.0f, -3.0f, 0.0f, 0.0f, -4.0f, 1.0f, -3.0f),
Vertex_Buffer(-4.0f, 1.0f, 1.0f, 10.0f, 0.0f, -4.0f, -1.0f, 1.0f),
Vertex_Buffer(-4.0f, -1.0f, 1.0f, 10.0f, 10.0f, -4.0f, -1.0f, 1.0f),
/* Left wall of the house (triangle strip)*/
Vertex_Buffer(-4.0f, 1.0f, 1.0f, 0.0f, 10.0f, -4.0f, 1.0f, 1.0f),
Vertex_Buffer(-4.0f, 2.5f, -1.0f, 0.0f, 0.0f, -4.0f, 2.5f, -1.0f),
Vertex_Buffer(-4.0f, 1.0f, -3.0f, 10.0f, 0.0f, -4.0f, 1.0f, -3.0f),
/* Back side of the house*/
Vertex_Buffer(-4.0f, -1.0f, -3.0f, 0.0f, 10.0f, -4.0f, -1.0f, -3.0f),
Vertex_Buffer(-4.0f, 1.0f, -3.0f, 0.0f, 0.0f, -4.0f, 1.0f, -3.0f),
Vertex_Buffer(1.0f, 1.0f, -3.0f, 10.0f, 0.0f, 1.0f, 1.0f, -3.0f),
Vertex_Buffer(1.0f, -1.0f, -3.0f, 10.0f, 10.0f, 1.0f, -1.0f, -3.0f),
};
ZeroMemory(&VertexBufferData, sizeof(VertexBufferData));
VertexBufferData.pSysMem = Vertices;
device->CreateBuffer(&VertexBufferDesc, &VertexBufferData, &HouseVertexBuffer);
}
I am not sure if this will help, but down below is my method of implementing the spotlight technique in my pixel shader.
struct Light
{
float3 SpotLight_Position;
float range;
float3 SpotLight_Direction;
float cone;
float3 attenuation;
float3 directional;
float4 ambient;
float4 diffuse;
};
cbuffer Constant_Buffer
{
float4x4 TRANSFORMEDMATRIX; /* The final transformed matrix */
float4x4 WORLDSPACE;
Light LIGHT;
};
struct Vertex_Shader_Output
{
float4 Positions : SV_POSITION;
float2 TextureCoord : TEXTURECOORD;
float4 WorldSpace : POSITION;
float3 normal : NORMAL;
};
struct Sky_Vertex_Shader_Output
{
float4 Positions : SV_POSITION;
float3 TextureCoord : TEXTURECOORD;
};
Texture2D Texture; /* Shader Resource for Pixel Shader*/
SamplerState Sampler; /* Shader Resource for Pixel Shader*/
/* PIXEL SHADER THAT WILL BE USED TO CREATE THE FLASH LIGHT*/
float4 Pixelshader(Vertex_Shader_Output input) : SV_TARGET
{
float4 TextureFormat;
float3 lightToPixelVector;
float3 finalAmbient;
float HowMuchLight;
float distance;
float3 FinalColor = float3 (0.0f, 0.0f, 0.0f);
/* Sampling the texture and storing the format into an object*/
TextureFormat = Texture.Sample(Sampler, input.TextureCoord);
/* Scaling the normal vector to a unit length*/
input.normal = normalize(input.normal);
/* Creating a vector between the light source and the pixel positions of every object*/
lightToPixelVector = LIGHT.SpotLight_Position - input.WorldSpace;
/* Getting the actual distance between the light source and the pixel position*/
distance = length(lightToPixelVector);
/* Adding the ambient and the colors of the texture*/
finalAmbient = TextureFormat * LIGHT.ambient;
/* If the pixel is too far from the light source*/
if (distance > LIGHT.range)
{
/* Return the objects color without the light source*/
return float4(finalAmbient, TextureFormat.a);
}
/* Normalizing the vector to make sure its a unit length*/
lightToPixelVector = normalize(lightToPixelVector);
/* Getting the angle between the light source and the vertex normal to see how much light that pixel will receive*/
HowMuchLight = dot(lightToPixelVector, input.normal);
if (HowMuchLight > 0.0f)
{
/* Adding the diffuse and colors of the texture to make the final color*/
FinalColor += TextureFormat * LIGHT.diffuse;
/* Calculating the attenuation for the Final Color*/
FinalColor /= (LIGHT.attenuation[0] + (LIGHT.attenuation[1] * distance)) + (LIGHT.attenuation[2] * (distance * distance));
/* */
FinalColor *= pow(max(dot(-lightToPixelVector, LIGHT.SpotLight_Direction), 0.0f), LIGHT.cone);
}
FinalColor = saturate(FinalColor + finalAmbient);
/* Returning the final colors of the texture and the alpha value*/
return float4(FinalColor, TextureFormat.a);
}
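For the normals asked about above: the vertex buffer stores each vertex position again as its normal, which is generally only meaningful for a sphere centered at the origin. One common approach for flat surfaces such as walls (a general technique, not something taken from this question) is to use the normalized cross product of two triangle edges and give that same normal to every vertex of the face. A minimal CPU-side sketch with hypothetical helper names:

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

Vec3 subtract(const Vec3& a, const Vec3& b)
{
    return { a.x - b.x, a.y - b.y, a.z - b.z };
}

Vec3 cross(const Vec3& a, const Vec3& b)
{
    return { a.y * b.z - a.z * b.y,
             a.z * b.x - a.x * b.z,
             a.x * b.y - a.y * b.x };
}

Vec3 normalize(const Vec3& v)
{
    const float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    if (len == 0.0f) return v; // degenerate triangle: leave unchanged
    return { v.x / len, v.y / len, v.z / len };
}

// Unit normal of the triangle (v0, v1, v2). The sign depends on the
// vertex order, so it may have to be negated to point outwards.
Vec3 faceNormal(const Vec3& v0, const Vec3& v1, const Vec3& v2)
{
    return normalize(cross(subtract(v1, v0), subtract(v2, v0)));
}
```

For the first three vertices of the front wall, (-1, -1, 1), (-1, 1, 1) and (1, 1, 1), this gives (0, 0, -1). Whether that or its negation is the right one depends on the winding order and on which side of the wall should be lit. Lighting the inside of the house would need faces whose normals point inwards, since a single normal per vertex can only face one way.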

3D Convolution with CUDA using shared memory

I'm currently trying to adapt the 2D convolution code from THIS question to 3D and having trouble trying to understand where my error is.
My 2D Code looks like this:
#include <iostream>
#define MASK_WIDTH 3
#define MASK_RADIUS MASK_WIDTH / 2
#define TILE_WIDTH 8
#define W (TILE_WIDTH + MASK_WIDTH - 1)
/**
* GPU 2D Convolution using shared memory
*/
__global__ void convolution(float *I, float* M, float *P, int width, int height)
{
/***** WRITE TO SHARED MEMORY *****/
__shared__ float N_ds[W][W];
// First batch loading
int dest = threadIdx.x + (threadIdx.y * TILE_WIDTH);
int destY = dest / W;
int destX = dest % W;
int srcY = destY + (blockIdx.y * TILE_WIDTH) - MASK_RADIUS;
int srcX = destX + (blockIdx.x * TILE_WIDTH) - MASK_RADIUS;
int src = srcX + (srcY * width);
if(srcY >= 0 && srcY < height && srcX >= 0 && srcX < width)
N_ds[destY][destX] = I[src];
else
N_ds[destY][destX] = 0;
// Second batch loading
dest = threadIdx.x + (threadIdx.y * TILE_WIDTH) + TILE_WIDTH * TILE_WIDTH;
destY = dest / W;
destX = dest % W;
srcY = destY + (blockIdx.y * TILE_WIDTH) - MASK_RADIUS;
srcX = destX + (blockIdx.x * TILE_WIDTH) - MASK_RADIUS;
src = srcX + (srcY * width);
if(destY < W)
{
if(srcY >= 0 && srcY < height && srcX >= 0 && srcX < width)
N_ds[destY][destX] = I[src];
else
N_ds[destY][destX] = 0;
}
__syncthreads();
/***** Perform Convolution *****/
float sum = 0;
int y;
int x;
for(y = 0; y < MASK_WIDTH; y++)
for(x = 0; x < MASK_WIDTH; x++)
sum = sum + N_ds[threadIdx.y + y][threadIdx.x + x] * M[x + (y * MASK_WIDTH)];
y = threadIdx.y + (blockIdx.y * TILE_WIDTH);
x = threadIdx.x + (blockIdx.x * TILE_WIDTH);
if(y < height && x < width)
P[x + (y * width)] = sum;
__syncthreads();
}
int main(int argc, char* argv[])
{
int image_width = 16;
int image_height = 16;
float *deviceInputImageData;
float *deviceOutputImageData;
float *deviceMaskData;
float data[] =
{
1.0f, 1.0f, 1.0f, 1.0f, 3.0f, 1.0f, 5.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
2.0f, 2.0f, 2.0f, 2.0f, 1.0f, 4.0f, 1.0f, 6.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
3.0f, 3.0f, 3.0f, 1.0f, 3.0f, 1.0f, 5.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
4.0f, 4.0f, 4.0f, 2.0f, 1.0f, 4.0f, 1.0f, 6.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
5.0f, 5.0f, 5.0f, 1.0f, 3.0f, 1.0f, 5.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
6.0f, 6.0f, 6.0f, 2.0f, 1.0f, 4.0f, 1.0f, 6.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
7.0f, 7.0f, 7.0f, 1.0f, 3.0f, 1.0f, 5.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
8.0f, 8.0f, 8.0f, 2.0f, 1.0f, 4.0f, 1.0f, 6.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
9.0f, 9.0f, 9.0f, 1.0f, 3.0f, 1.0f, 5.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
10.0f, 10.0f, 10.0f, 2.0f, 1.0f, 4.0f, 1.0f, 6.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
11.0f, 11.0f, 11.0f, 1.0f, 3.0f, 1.0f, 5.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
12.0f, 12.0f, 12.0f, 2.0f, 1.0f, 4.0f, 1.0f, 6.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
13.0f, 13.0f, 13.0f, 1.0f, 3.0f, 1.0f, 5.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
14.0f, 14.0f, 14.0f, 2.0f, 1.0f, 4.0f, 1.0f, 6.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
15.0f, 15.0f, 15.0f, 1.0f, 3.0f, 1.0f, 5.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
16.0f, 16.0f, 16.0f, 2.0f, 1.0f, 4.0f, 1.0f, 6.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f
};
float mask[] =
{
1.0f, 1.0f, 1.0f,
1.0f, 1.0f, 1.0f,
1.0f, 1.0f, 1.0f
};
// CHECK CHECK CHECK CHECK CHECK
int shared_memory_size = W * W;
int block_size = TILE_WIDTH * TILE_WIDTH;
int max_size = 2 * block_size;
std::cout << "Block Size: " << block_size << " - Shared Memory Size: " << shared_memory_size << " - Max Size: " << max_size << std::endl;
std::cout << "SHARED MEMORY SIZE HAS TO BE SMALLER THAN MAX SIZE IN ORDER TO WORK PROPERLY !!!!!!!";
cudaMalloc((void **)&deviceInputImageData, image_width * image_height * sizeof(float));
cudaMalloc((void **)&deviceOutputImageData, image_width * image_height * sizeof(float));
cudaMalloc((void **)&deviceMaskData, MASK_WIDTH * MASK_WIDTH * sizeof(float));
cudaMemcpy(deviceInputImageData, data, image_width * image_height * sizeof(float), cudaMemcpyHostToDevice);
cudaMemcpy(deviceMaskData, mask, MASK_WIDTH * MASK_WIDTH * sizeof(float), cudaMemcpyHostToDevice);
dim3 dimBlock(TILE_WIDTH, TILE_WIDTH, 1);
dim3 dimGrid((image_width + TILE_WIDTH - 1) / TILE_WIDTH, (image_height + TILE_WIDTH - 1) / TILE_WIDTH);
convolution<<<dimGrid, dimBlock>>>(deviceInputImageData, deviceMaskData, deviceOutputImageData, image_width, image_height);
cudaDeviceSynchronize();
cudaMemcpy(data, deviceOutputImageData, image_width * image_height * sizeof(float), cudaMemcpyDeviceToHost);
// Print data
for(int i = 0; i < image_width * image_height; ++i)
{
if(i % image_width == 0)
{
std::cout << std::endl;
}
std::cout << data[i] << " - ";
}
cudaFree(deviceInputImageData);
cudaFree(deviceOutputImageData);
cudaFree(deviceMaskData);
return 0;
}
And the 3D equivalent:
#include <iostream>
#define MASK_WIDTH 3
#define MASK_RADIUS MASK_WIDTH / 2
#define TILE_WIDTH 8
#define W (TILE_WIDTH + MASK_WIDTH - 1)
/**
* GPU 3D Convolution using shared memory
*/
__global__ void convolution(float *I, float* M, float *P, int width, int height, int depth)
{
/***** WRITE TO SHARED MEMORY *****/
__shared__ float N_ds[W][W][W];
// First batch loading
int dest = threadIdx.x + (threadIdx.y * TILE_WIDTH) + (threadIdx.z * TILE_WIDTH * TILE_WIDTH);
int destTmp = dest;
int destX = destTmp % W;
destTmp = destTmp / W;
int destY = destTmp % W;
destTmp = destTmp / W;
int destZ = destTmp;
int srcZ = destZ + (blockIdx.z * TILE_WIDTH) - MASK_RADIUS;
int srcY = destY + (blockIdx.y * TILE_WIDTH) - MASK_RADIUS;
int srcX = destX + (blockIdx.x * TILE_WIDTH) - MASK_RADIUS;
int src = srcX + (srcY * width) + (srcZ * width * height);
if(srcZ >= 0 && srcZ < depth && srcY >= 0 && srcY < height && srcX >= 0 && srcX < width)
N_ds[destZ][destY][destX] = I[src];
else
N_ds[destZ][destY][destX] = 0;
// Second batch loading
dest = threadIdx.x + (threadIdx.y * TILE_WIDTH) + (threadIdx.z * TILE_WIDTH * TILE_WIDTH) + TILE_WIDTH * TILE_WIDTH;
destTmp = dest;
destX = destTmp % W;
destTmp = destTmp / W;
destY = destTmp % W;
destTmp = destTmp / W;
destZ = destTmp;
srcZ = destZ + (blockIdx.z * TILE_WIDTH) - MASK_RADIUS;
srcY = destY + (blockIdx.y * TILE_WIDTH) - MASK_RADIUS;
srcX = destX + (blockIdx.x * TILE_WIDTH) - MASK_RADIUS;
src = srcX + (srcY * width) + (srcZ * width * height);
if(destZ < W)
{
if(srcZ >= 0 && srcZ < depth && srcY >= 0 && srcY < height && srcX >= 0 && srcX < width)
N_ds[destZ][destY][destX] = I[src];
else
N_ds[destZ][destY][destX] = 0;
}
__syncthreads();
/***** Perform Convolution *****/
float sum = 0;
int z;
int y;
int x;
for(z = 0; z < MASK_WIDTH; z++)
for(y = 0; y < MASK_WIDTH; y++)
for(x = 0; x < MASK_WIDTH; x++)
sum = sum + N_ds[threadIdx.z + z][threadIdx.y + y][threadIdx.x + x] * M[x + (y * MASK_WIDTH) + (z * MASK_WIDTH * MASK_WIDTH)];
z = threadIdx.z + (blockIdx.z * TILE_WIDTH);
y = threadIdx.y + (blockIdx.y * TILE_WIDTH);
x = threadIdx.x + (blockIdx.x * TILE_WIDTH);
if(z < depth && y < height && x < width)
P[x + (y * width) + (z * width * height)] = sum;
__syncthreads();
}
int main(int argc, char* argv[])
{
int image_width = 16;
int image_height = 16;
int image_depth = 5;
float *deviceInputImageData;
float *deviceOutputImageData;
float *deviceMaskData;
float data[] =
{
1.0f, 1.0f, 1.0f, 1.0f, 3.0f, 1.0f, 5.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
2.0f, 2.0f, 2.0f, 2.0f, 1.0f, 4.0f, 1.0f, 6.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
3.0f, 3.0f, 3.0f, 1.0f, 3.0f, 1.0f, 5.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
4.0f, 4.0f, 4.0f, 2.0f, 1.0f, 4.0f, 1.0f, 6.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
5.0f, 5.0f, 5.0f, 1.0f, 3.0f, 1.0f, 5.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
6.0f, 6.0f, 6.0f, 2.0f, 1.0f, 4.0f, 1.0f, 6.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
7.0f, 7.0f, 7.0f, 1.0f, 3.0f, 1.0f, 5.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
8.0f, 8.0f, 8.0f, 2.0f, 1.0f, 4.0f, 1.0f, 6.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
9.0f, 9.0f, 9.0f, 1.0f, 3.0f, 1.0f, 5.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
10.0f, 10.0f, 10.0f, 2.0f, 1.0f, 4.0f, 1.0f, 6.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
11.0f, 11.0f, 11.0f, 1.0f, 3.0f, 1.0f, 5.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
12.0f, 12.0f, 12.0f, 2.0f, 1.0f, 4.0f, 1.0f, 6.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
13.0f, 13.0f, 13.0f, 1.0f, 3.0f, 1.0f, 5.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
14.0f, 14.0f, 14.0f, 2.0f, 1.0f, 4.0f, 1.0f, 6.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
15.0f, 15.0f, 15.0f, 1.0f, 3.0f, 1.0f, 5.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
16.0f, 16.0f, 16.0f, 2.0f, 1.0f, 4.0f, 1.0f, 6.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
1.0f, 1.0f, 1.0f, 1.0f, 3.0f, 1.0f, 5.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
2.0f, 2.0f, 2.0f, 2.0f, 1.0f, 4.0f, 1.0f, 6.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
3.0f, 3.0f, 3.0f, 1.0f, 3.0f, 1.0f, 5.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
4.0f, 4.0f, 4.0f, 2.0f, 1.0f, 4.0f, 1.0f, 6.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
5.0f, 5.0f, 5.0f, 1.0f, 3.0f, 1.0f, 5.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
6.0f, 6.0f, 6.0f, 2.0f, 1.0f, 4.0f, 1.0f, 6.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
7.0f, 7.0f, 7.0f, 1.0f, 3.0f, 1.0f, 5.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
8.0f, 8.0f, 8.0f, 2.0f, 1.0f, 4.0f, 1.0f, 6.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
9.0f, 9.0f, 9.0f, 1.0f, 3.0f, 1.0f, 5.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
10.0f, 10.0f, 10.0f, 2.0f, 1.0f, 4.0f, 1.0f, 6.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
11.0f, 11.0f, 11.0f, 1.0f, 3.0f, 1.0f, 5.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
12.0f, 12.0f, 12.0f, 2.0f, 1.0f, 4.0f, 1.0f, 6.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
13.0f, 13.0f, 13.0f, 1.0f, 3.0f, 1.0f, 5.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
14.0f, 14.0f, 14.0f, 2.0f, 1.0f, 4.0f, 1.0f, 6.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
15.0f, 15.0f, 15.0f, 1.0f, 3.0f, 1.0f, 5.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
16.0f, 16.0f, 16.0f, 2.0f, 1.0f, 4.0f, 1.0f, 6.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
1.0f, 1.0f, 1.0f, 1.0f, 3.0f, 1.0f, 5.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
2.0f, 2.0f, 2.0f, 2.0f, 1.0f, 4.0f, 1.0f, 6.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
3.0f, 3.0f, 3.0f, 1.0f, 3.0f, 1.0f, 5.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
4.0f, 4.0f, 4.0f, 2.0f, 1.0f, 4.0f, 1.0f, 6.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
5.0f, 5.0f, 5.0f, 1.0f, 3.0f, 1.0f, 5.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
6.0f, 6.0f, 6.0f, 2.0f, 1.0f, 4.0f, 1.0f, 6.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
7.0f, 7.0f, 7.0f, 1.0f, 3.0f, 1.0f, 5.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
8.0f, 8.0f, 8.0f, 2.0f, 1.0f, 4.0f, 1.0f, 6.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
9.0f, 9.0f, 9.0f, 1.0f, 3.0f, 1.0f, 5.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
10.0f, 10.0f, 10.0f, 2.0f, 1.0f, 4.0f, 1.0f, 6.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
11.0f, 11.0f, 11.0f, 1.0f, 3.0f, 1.0f, 5.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
12.0f, 12.0f, 12.0f, 2.0f, 1.0f, 4.0f, 1.0f, 6.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
13.0f, 13.0f, 13.0f, 1.0f, 3.0f, 1.0f, 5.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
14.0f, 14.0f, 14.0f, 2.0f, 1.0f, 4.0f, 1.0f, 6.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
15.0f, 15.0f, 15.0f, 1.0f, 3.0f, 1.0f, 5.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
16.0f, 16.0f, 16.0f, 2.0f, 1.0f, 4.0f, 1.0f, 6.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
1.0f, 1.0f, 1.0f, 1.0f, 3.0f, 1.0f, 5.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
2.0f, 2.0f, 2.0f, 2.0f, 1.0f, 4.0f, 1.0f, 6.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
3.0f, 3.0f, 3.0f, 1.0f, 3.0f, 1.0f, 5.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
4.0f, 4.0f, 4.0f, 2.0f, 1.0f, 4.0f, 1.0f, 6.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
5.0f, 5.0f, 5.0f, 1.0f, 3.0f, 1.0f, 5.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
6.0f, 6.0f, 6.0f, 2.0f, 1.0f, 4.0f, 1.0f, 6.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
7.0f, 7.0f, 7.0f, 1.0f, 3.0f, 1.0f, 5.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
8.0f, 8.0f, 8.0f, 2.0f, 1.0f, 4.0f, 1.0f, 6.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
9.0f, 9.0f, 9.0f, 1.0f, 3.0f, 1.0f, 5.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
10.0f, 10.0f, 10.0f, 2.0f, 1.0f, 4.0f, 1.0f, 6.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
11.0f, 11.0f, 11.0f, 1.0f, 3.0f, 1.0f, 5.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
12.0f, 12.0f, 12.0f, 2.0f, 1.0f, 4.0f, 1.0f, 6.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
13.0f, 13.0f, 13.0f, 1.0f, 3.0f, 1.0f, 5.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
14.0f, 14.0f, 14.0f, 2.0f, 1.0f, 4.0f, 1.0f, 6.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
15.0f, 15.0f, 15.0f, 1.0f, 3.0f, 1.0f, 5.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
16.0f, 16.0f, 16.0f, 2.0f, 1.0f, 4.0f, 1.0f, 6.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
1.0f, 1.0f, 1.0f, 1.0f, 3.0f, 1.0f, 5.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
2.0f, 2.0f, 2.0f, 2.0f, 1.0f, 4.0f, 1.0f, 6.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
3.0f, 3.0f, 3.0f, 1.0f, 3.0f, 1.0f, 5.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
4.0f, 4.0f, 4.0f, 2.0f, 1.0f, 4.0f, 1.0f, 6.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
5.0f, 5.0f, 5.0f, 1.0f, 3.0f, 1.0f, 5.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
6.0f, 6.0f, 6.0f, 2.0f, 1.0f, 4.0f, 1.0f, 6.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
7.0f, 7.0f, 7.0f, 1.0f, 3.0f, 1.0f, 5.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
8.0f, 8.0f, 8.0f, 2.0f, 1.0f, 4.0f, 1.0f, 6.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
9.0f, 9.0f, 9.0f, 1.0f, 3.0f, 1.0f, 5.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
10.0f, 10.0f, 10.0f, 2.0f, 1.0f, 4.0f, 1.0f, 6.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
11.0f, 11.0f, 11.0f, 1.0f, 3.0f, 1.0f, 5.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
12.0f, 12.0f, 12.0f, 2.0f, 1.0f, 4.0f, 1.0f, 6.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
13.0f, 13.0f, 13.0f, 1.0f, 3.0f, 1.0f, 5.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
14.0f, 14.0f, 14.0f, 2.0f, 1.0f, 4.0f, 1.0f, 6.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
15.0f, 15.0f, 15.0f, 1.0f, 3.0f, 1.0f, 5.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
16.0f, 16.0f, 16.0f, 2.0f, 1.0f, 4.0f, 1.0f, 6.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f
};
float mask[] =
{
1.0f, 1.0f, 1.0f,
1.0f, 1.0f, 1.0f,
1.0f, 1.0f, 1.0f,
1.0f, 1.0f, 1.0f,
1.0f, 1.0f, 1.0f,
1.0f, 1.0f, 1.0f,
1.0f, 1.0f, 1.0f,
1.0f, 1.0f, 1.0f,
1.0f, 1.0f, 1.0f
};
// CHECK CHECK CHECK CHECK CHECK
int shared_memory_size = W * W * W;
int block_size = TILE_WIDTH * TILE_WIDTH * TILE_WIDTH;
int max_size = 3 * block_size;
std::cout << "Block Size: " << block_size << " - Shared Memory Size: " << shared_memory_size << " - Max Size: " << max_size << std::endl;
std::cout << "SHARED MEMORY SIZE HAS TO BE SMALLER THAN MAX SIZE IN ORDER TO WORK PROPERLY !!!!!!!";
cudaMalloc((void **)&deviceInputImageData, image_width * image_height * image_depth * sizeof(float));
cudaMalloc((void **)&deviceOutputImageData, image_width * image_height * image_depth * sizeof(float));
cudaMalloc((void **)&deviceMaskData, MASK_WIDTH * MASK_WIDTH * MASK_WIDTH * sizeof(float));
cudaMemcpy(deviceInputImageData, data, image_width * image_height * image_depth * sizeof(float), cudaMemcpyHostToDevice);
cudaMemcpy(deviceMaskData, mask, MASK_WIDTH * MASK_WIDTH * MASK_WIDTH * sizeof(float), cudaMemcpyHostToDevice);
dim3 dimBlock(TILE_WIDTH, TILE_WIDTH, TILE_WIDTH);
dim3 dimGrid((image_width + TILE_WIDTH - 1) / TILE_WIDTH, (image_height + TILE_WIDTH - 1) / TILE_WIDTH, (image_depth + TILE_WIDTH - 1) / TILE_WIDTH);
convolution<<<dimGrid, dimBlock>>>(deviceInputImageData, deviceMaskData, deviceOutputImageData, image_width, image_height, image_depth);
cudaDeviceSynchronize();
cudaMemcpy(data, deviceOutputImageData, image_width * image_height * image_depth * sizeof(float), cudaMemcpyDeviceToHost);
// Print data
for(int i = 0; i < image_width * image_height * image_depth; ++i)
{
if((i % image_width) == 0)
std::cout << std::endl;
if((i % (image_width * image_height)) == 0)
std::cout << std::endl;
std::cout << data[i] << " - ";
}
cudaFree(deviceInputImageData);
cudaFree(deviceOutputImageData);
cudaFree(deviceMaskData);
return 0;
}
With a TILE_WIDTH of 8, the convolution partially works: the second and third layers are identical and their values look correct. For the 3D case, I calculated the destX, destY and destZ indices according to this explanation. The other change is the if-condition for the second batch load, which now tests destZ instead of destY: if(destZ < W).
My question is why the values in layers 4 and 5 of the output are incorrect. I suspect I am missing something about how large TILE_WIDTH must be for the kernel to work properly. Based on this answer, I created the following check for the 2D case, where every thread performs two loads from global to shared memory, so the shared tile must not be larger than twice the block size:
// CHECK CHECK CHECK CHECK CHECK
int shared_memory_size = W * W;
int block_size = TILE_WIDTH * TILE_WIDTH;
int max_size = 2 * block_size;
std::cout << "Block Size: " << block_size << " - Shared Memory Size: " << shared_memory_size << " - Max Size: " << max_size << std::endl;
std::cout << "SHARED MEMORY SIZE HAS TO BE SMALLER THAN MAX SIZE IN ORDER TO WORK PROPERLY !!!!!!!";
Does it also apply in the 3D case, and if so, is it adapted correctly in my 3D check?
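The size check above can be phrased as a small host-side calculation: how many load batches does each thread need so that the whole shared tile gets filled? The sketch below assumes the usual tiling setup where W = TILE_WIDTH + MASK_WIDTH - 1 (tile plus halo); the function name `batches_needed` is hypothetical and not part of the code in the question.

```cpp
#include <cassert>

// Number of global-to-shared load batches each thread must perform so that a
// block of tile_width^dims threads fills a shared tile of w^dims cells.
// ASSUMPTION: w = tile_width + mask_width - 1 (tile plus halo).
int batches_needed(int tile_width, int mask_width, int dims)
{
    assert(tile_width > 0 && mask_width > 0 && dims > 0);
    const long w = tile_width + mask_width - 1;
    long cells = 1;   // w^dims shared-memory cells
    long threads = 1; // tile_width^dims threads per block
    for (int d = 0; d < dims; ++d)
    {
        cells *= w;
        threads *= tile_width;
    }
    // Ceiling division: every started batch counts as a full batch.
    return static_cast<int>((cells + threads - 1) / threads);
}
```

With TILE_WIDTH = 8 and a 3x3x3 mask, batches_needed(8, 3, 3) gives 2 (1000 cells, 512 threads), so two loads per thread are enough. With TILE_WIDTH = 4 it gives 4 (216 cells, 64 threads), so a kernel with only two batches would leave part of the tile unloaded. Whatever bound the check uses should match the number of batches the kernel actually performs.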

It seems I adapted the check correctly, apart from one mistake in the offset of the second batch load:
// Second batch loading
dest = threadIdx.x + (threadIdx.y * TILE_WIDTH) + (threadIdx.z * TILE_WIDTH * TILE_WIDTH) + TILE_WIDTH * TILE_WIDTH;
I forgot one * TILE_WIDTH: the offset must skip a whole block of TILE_WIDTH^3 threads, not just one TILE_WIDTH^2 slice, so it should be:
// Second batch loading
dest = threadIdx.x + (threadIdx.y * TILE_WIDTH) + (threadIdx.z * TILE_WIDTH * TILE_WIDTH) + TILE_WIDTH * TILE_WIDTH * TILE_WIDTH;
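To see the effect of the offset, the two batches can be simulated on the host. This is only a sketch under assumptions: it models each thread writing shared cell `dest` when `dest / (W * W) < W` (the destZ < W guard from the question), and the function name `covered_cells` is made up for illustration.

```cpp
#include <cassert>
#include <vector>

// Counts how many of the W*W*W shared-memory cells are written when every
// thread of a T*T*T block performs two loads, the second shifted by `offset`.
// ASSUMPTION: a load is kept only if destZ = dest / (W * W) is below W.
int covered_cells(int T, int W, int offset)
{
    assert(T > 0 && W > 0 && offset >= 0);
    std::vector<bool> written(static_cast<std::size_t>(W) * W * W, false);
    for (int tz = 0; tz < T; ++tz)
        for (int ty = 0; ty < T; ++ty)
            for (int tx = 0; tx < T; ++tx)
            {
                // Flat thread index, as in the first batch of the kernel.
                const int flat = tx + ty * T + tz * T * T;
                for (int batch = 0; batch < 2; ++batch)
                {
                    const int dest = flat + batch * offset;
                    const int destZ = dest / (W * W);
                    if (destZ < W)
                        written[dest] = true;
                }
            }
    int count = 0;
    for (bool w : written)
        count += w ? 1 : 0;
    return count;
}
```

For TILE_WIDTH = 8 and W = 10, covered_cells(8, 10, 8 * 8 * 8) returns 1000, so the corrected offset fills the whole tile. With the original offset, covered_cells(8, 10, 8 * 8) returns 576: the second batch mostly overwrites cells the first batch already loaded, and the remaining 424 cells are never written.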
