Hi, I am making a 3D pool game in C++ with OpenGL, and I am currently at the stage of implementing collisions. The collision detection is already written and works properly. The issue I have concerns only the velocity of the ball class and passing forces to other balls on collision.
I have a ball class that is used for all the balls including the cue ball. I have a drawBall() function:
void drawBall(Shader ourShader) {
extern GLfloat deltaTime;
a = F / radius;
v = v + (a * deltaTime);
ballPos = ballPos + (v*deltaTime) + (0.5f*a*deltaTime*deltaTime);
//F = glm::vec3(0.0f, 0.0f, 0.0f);
glm::mat4 model;
model = glm::translate(model, ballPos);
model = glm::rotate(model, glm::radians(40.0f), glm::vec3(1.0f, 0.0f, 0.0f));
glUniformMatrix4fv(modelLoc, 1, GL_FALSE, glm::value_ptr(model));
glDrawElements(GL_TRIANGLES, indices.size(), GL_UNSIGNED_INT, 0);
}
And a collision function:
void collision() {
extern vector <Ball> ballCollection;
for (int i = 0; i < ballCollection.size(); i++) {
if (glm::distance(ballCollection.at(i).ballPos, ballPos) <= (2 * radius)) {
if (glm::distance(ballCollection.at(i).ballPos, ballPos) != 0) { // skip the ball itself
extern GLfloat forceAmount;
if (isColliding == 0) {
glm::vec3 collPoint = glm::vec3((ballPos.x + ballCollection.at(i).ballPos.x) / 2, 0.0f, (ballPos.z + ballCollection.at(i).ballPos.z) / 2);
glm::vec3 collPointPerp = collPoint - ballPos;
glm::vec3 otherNew = collPointPerp;
glm::vec3 thisNew = glm::cross(collPointPerp, glm::vec3(0.0, 1.0, 0.0));
F = thisNew;
ballCollection.at(i).F = otherNew;
glm::vec3 newV;
glm::vec3 otherNewV;
newV.x = (v.x * (2 * ballCollection.at(i).radius * ballCollection.at(i).v.x)) / (radius + ballCollection.at(i).radius);
newV.z = (v.z * (2 * ballCollection.at(i).radius * ballCollection.at(i).v.z)) / (radius + ballCollection.at(i).radius);
newV.y = 0.0f;
otherNewV.x = (ballCollection.at(i).v.x * (2 * radius * v.x)) / (radius + ballCollection.at(i).radius);
otherNewV.z = (ballCollection.at(i).v.z * (2 * radius * v.z)) / (radius + ballCollection.at(i).radius);
otherNewV.y = 0.0f;
v.x = newV.x;
v.z = newV.z;
ballCollection.at(i).v.x = otherNewV.x;
ballCollection.at(i).v.z = otherNewV.z;
isColliding = 1;
ballCollection.at(i).isColliding = 1;
isColliding = 0;
This class is in a separate file. My main .cpp has a key_callback function:
void do_movement()
{
    glm::vec3 cameraRight = glm::normalize(glm::cross(cameraFront, cameraUp));
    GLfloat cameraSpeed = 50.0f;
    glm::vec3 aboveBall = glm::vec3(ballCollection.at(15).ballPos.x, ballCollection.at(15).ballPos.y + 20.0f, ballCollection.at(15).ballPos.z);
    glm::vec3 cameraFwd = aboveBall - cameraPos;
    //cout << cameraFwd.x << "," << cameraFwd.y << "," << cameraFwd.z << endl;
    if (keys[GLFW_KEY_W])
        ballCollection.at(15).ballPos = ballCollection.at(15).ballPos + glm::vec3(0.0f, 0.0f, -0.1f);
    if (keys[GLFW_KEY_S])
        ballCollection.at(15).ballPos = ballCollection.at(15).ballPos + glm::vec3(0.0f, 0.0f, 0.1f);
    if (keys[GLFW_KEY_A])
        ballCollection.at(15).ballPos = ballCollection.at(15).ballPos + glm::vec3(-0.1f, 0.0f, 0.0f);
    if (keys[GLFW_KEY_D])
        ballCollection.at(15).ballPos = ballCollection.at(15).ballPos + glm::vec3(0.1f, 0.0f, 0.0f);
    if (keys[GLFW_KEY_SPACE]) {
        ballCollection.at(15).F = 10.0f*glm::normalize(cameraFwd);
    }
    if (keys[GLFW_KEY_E]) {
        ballCollection.at(15).F = -10.0f * glm::normalize(cameraFwd);
    }
}
And then within the game loop:
for (int i = 0; i < 16; i++) {
cameraPos = ballCollection.at(15).ballPos + glm::vec3(0.0f, 20.0f, 20.0f);
My problem is that when I apply the force to the cue ball with the space key, F keeps being added to v every frame, so the cue ball's speed keeps increasing. If I reset F to 0 after applying the force, I can't pass force to the other balls, so their v stays 0 and they don't move after a collision.
Any help would be much appreciated. Please comment if any extra information is needed.
I realized that I had an error in the collision calculation. It should be:
newV.x = (v.x + (2 * ballCollection.at(i).radius * ballCollection.at(i).v.x)) / (radius + ballCollection.at(i).radius);
newV.z = (v.z + (2 * ballCollection.at(i).radius * ballCollection.at(i).v.z)) / (radius + ballCollection.at(i).radius);
newV.y = 0.0f;
otherNewV.x = (ballCollection.at(i).v.x + (2 * radius * v.x)) / (radius + ballCollection.at(i).radius);
otherNewV.z = (ballCollection.at(i).v.z + (2 * radius * v.z)) / (radius + ballCollection.at(i).radius);
otherNewV.y = 0.0f;
Now, if I reset F to zero in the class after the calculations, the ball moves with constant velocity, which is fine for now. My issue now is that the collision only sometimes works: sometimes the balls fly away, and sometimes hitting the first ball is fine but it goes wrong again when the second ball hits another one. I think this happens because the balls keep colliding and recalculating their speed incorrectly, but I can't figure out how to set up a boolean so that they collide only once. Does anyone have any idea?
I managed to set up the boolean correctly now.
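For readers hitting the same repeated-collision problem, a common alternative to a boolean flag is to respond only while the two balls are still moving toward each other. The following is a minimal sketch, not the code from this question: it assumes equal masses and a perfectly elastic collision, and `Vec3`, `Ball` and `resolveCollision` are hypothetical names (glm::vec3 could replace `Vec3`).

```cpp
#include <cmath>

// Hypothetical minimal vector type; glm::vec3 would work the same way.
struct Vec3 { float x, y, z; };

inline Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline Vec3 operator*(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Hypothetical ball state: position, velocity and radius only.
struct Ball {
    Vec3 pos;
    Vec3 v;
    float radius;
};

// Applies an equal-mass elastic response; returns true if velocities changed.
bool resolveCollision(Ball& a, Ball& b) {
    Vec3 d = b.pos - a.pos;
    float dist = std::sqrt(dot(d, d));
    // Coincident centers (same ball) or no contact: nothing to do.
    if (dist == 0.0f || dist > a.radius + b.radius) return false;

    Vec3 n = d * (1.0f / dist);          // unit normal pointing from a to b
    float approach = dot(a.v - b.v, n);  // > 0 means the balls are closing in
    if (approach <= 0.0f) return false;  // already separating: skip the response

    // Equal masses, elastic: the balls exchange their normal velocity components.
    a.v = a.v - n * approach;
    b.v = b.v + n * approach;
    return true;
}
```

Because the response is skipped once the balls separate, calling `resolveCollision` again on the next frame while they still overlap leaves the velocities untouched, which is the effect the boolean was meant to achieve.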
On collision, the force that should be applied to the other ball is the magnitude of the velocity vector. So if you pass in sqrt(v.x*v.x+v.y*v.y+v.z*v.z) as the initial force of the collision and set F to zero after applying the force, as you stated, you should get the results you desire.
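The "apply the force, then set F to zero" part can be sketched as a one-shot force that is consumed by a single integration step. This is only an illustration under assumptions: `Body` and `step` are hypothetical names, the sketch divides by a mass instead of the radius used in the question, and it uses a simple semi-implicit Euler update.

```cpp
#include <cassert>

// Hypothetical minimal vector type; glm::vec3 would work the same way.
struct Vec3 { float x, y, z; };

inline Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vec3 operator*(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }

// Hypothetical body: F holds the force to apply during the next step only.
struct Body {
    Vec3 pos{0.0f, 0.0f, 0.0f};
    Vec3 v{0.0f, 0.0f, 0.0f};
    Vec3 F{0.0f, 0.0f, 0.0f};
    float mass = 1.0f;
};

// Advances the body by dt seconds and clears the accumulated force.
void step(Body& b, float dt) {
    assert(b.mass > 0.0f);
    Vec3 a = b.F * (1.0f / b.mass);  // acceleration from the pending force
    b.v = b.v + a * dt;              // the force changes the velocity once...
    b.pos = b.pos + b.v * dt;        // ...and the velocity keeps moving the body
    b.F = {0.0f, 0.0f, 0.0f};        // consume the force so it is not re-applied
}
```

With this layout the velocity persists between frames while the force does not, so a ball keeps moving after the key press or the collision that set its F.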
I'm writing a simple renderer in C++. It uses conventions similar to OpenGL's, but it uses neither OpenGL nor DirectX. float3, float4 and float4x4 are my own custom structures.
The problem is that when I set the eye somewhere other than (0, 0, 0), I get strange results, with triangles appearing where I would not expect to see them.
I guess the cause is a wrong matrix multiplication formula, a wrong multiplication order, normalization, or a wrong formula in setLookat/setPerspective. But I'm stuck and cannot find the mistake.
I will upload some illustrations/screens later, as I don't have access to them now.
I use column-notation for matrices (matrix[column][row]), like OpenGL does.
Here is the matrix multiplication code:
class float4x4 { //[column][row]
float4 columns[4];
float4x4 multiplyBy(float4x4 &b){
float4x4 c = float4x4();
c.columns[0] = float4(
columns[0].x * b.columns[0].x + columns[1].x * b.columns[0].y + columns[2].x * b.columns[0].z + columns[3].x * b.columns[0].w,
columns[0].y * b.columns[0].x + columns[1].y * b.columns[0].y + columns[2].y * b.columns[0].z + columns[3].y * b.columns[0].w,
columns[0].z * b.columns[0].x + columns[1].z * b.columns[0].y + columns[2].z * b.columns[0].z + columns[3].z * b.columns[0].w,
columns[0].w * b.columns[0].x + columns[1].w * b.columns[0].y + columns[2].w * b.columns[0].z + columns[3].w * b.columns[0].w
c.columns[1] = float4(
columns[0].x * b.columns[1].x + columns[1].x * b.columns[1].y + columns[2].x * b.columns[1].z + columns[3].x * b.columns[1].w,
columns[0].y * b.columns[1].x + columns[1].y * b.columns[1].y + columns[2].y * b.columns[1].z + columns[3].y * b.columns[1].w,
columns[0].z * b.columns[1].x + columns[1].z * b.columns[1].y + columns[2].z * b.columns[1].z + columns[3].z * b.columns[1].w,
columns[0].w * b.columns[1].x + columns[1].w * b.columns[1].y + columns[2].w * b.columns[1].z + columns[3].w * b.columns[1].w
c.columns[2] = float4(
columns[0].x * b.columns[2].x + columns[1].x * b.columns[2].y + columns[2].x * b.columns[2].z + columns[3].x * b.columns[2].w,
columns[0].y * b.columns[2].x + columns[1].y * b.columns[2].y + columns[2].y * b.columns[2].z + columns[3].y * b.columns[2].w,
columns[0].z * b.columns[2].x + columns[1].z * b.columns[2].y + columns[2].z * b.columns[2].z + columns[3].z * b.columns[2].w,
columns[0].w * b.columns[2].x + columns[1].w * b.columns[2].y + columns[2].w * b.columns[2].z + columns[3].w * b.columns[2].w
c.columns[3] = float4(
columns[0].x * b.columns[3].x + columns[1].x * b.columns[3].y + columns[2].x * b.columns[3].z + columns[3].x * b.columns[3].w,
columns[0].y * b.columns[3].x + columns[1].y * b.columns[3].y + columns[2].y * b.columns[3].z + columns[3].y * b.columns[3].w,
columns[0].z * b.columns[3].x + columns[1].z * b.columns[3].y + columns[2].z * b.columns[3].z + columns[3].z * b.columns[3].w,
columns[0].w * b.columns[3].x + columns[1].w * b.columns[3].y + columns[2].w * b.columns[3].z + columns[3].w * b.columns[3].w
return c;
float4 multiplyBy(const float4 &b){
//based on http://stackoverflow.com/questions/25805126/vector-matrix-product-efficiency-issue
float4x4 a = *this; //getTransposed(); ???
float4 result(
dotProduct(a[0], b),
dotProduct(a[1], b),
dotProduct(a[2], b),
dotProduct(a[3], b)
return result;
inline float4x4 getTransposed() {
float4x4 transposed;
for (unsigned i = 0; i < 4; i++) {
for (unsigned j = 0; j < 4; j++) {
transposed.columns[i][j] = columns[j][i];
return transposed;
Where #define dotProduct(a, b) a.getDotProduct(b) and:
inline float getDotProduct(const float4 &anotherVector) const {
return x * anotherVector.x + y * anotherVector.y + z * anotherVector.z + w * anotherVector.w;
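As a point of comparison for the `getTransposed(); ???` comment above: with [column][row] storage, the product M * v is the sum of the stored columns weighted by the components of v, whereas taking the dot product of each stored column with v gives transpose(M) * v. The sketch below shows both; `Vec4`, `Mat4`, `mul` and `mulTransposed` are hypothetical stand-ins for the float4/float4x4 types, not the original implementation.

```cpp
// Hypothetical stand-ins for float4 and float4x4.
struct Vec4 { float x, y, z, w; };
struct Mat4 { Vec4 columns[4]; };  // columns[c] holds column c, as in [column][row]

inline float dot(const Vec4& a, const Vec4& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

// M * v: each result component combines one row of M, gathered across the columns.
inline Vec4 mul(const Mat4& m, const Vec4& v) {
    const Vec4* c = m.columns;
    return {
        c[0].x * v.x + c[1].x * v.y + c[2].x * v.z + c[3].x * v.w,
        c[0].y * v.x + c[1].y * v.y + c[2].y * v.z + c[3].y * v.w,
        c[0].z * v.x + c[1].z * v.y + c[2].z * v.z + c[3].z * v.w,
        c[0].w * v.x + c[1].w * v.y + c[2].w * v.z + c[3].w * v.w
    };
}

// transpose(M) * v: dotting the stored columns with v treats them as rows.
inline Vec4 mulTransposed(const Mat4& m, const Vec4& v) {
    return {
        dot(m.columns[0], v),
        dot(m.columns[1], v),
        dot(m.columns[2], v),
        dot(m.columns[3], v)
    };
}
```

For a translation matrix whose fourth column is (1, 2, 3, 1), `mul` moves the point (0, 0, 0, 1) to (1, 2, 3, 1), while `mulTransposed` leaves it at the origin, so the two are easy to tell apart in a quick test.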
My VertexProcessor:
class VertexProcessor {
float4x4 obj2world;
float4x4 world2view;
float4x4 view2proj;
float4x4 obj2proj;
inline float3 tr(const float3 & v) { //in object space
float4 r = obj2proj.multiplyBy(float4(v.x, v.y, v.z, 1.0f/*v.w*/));
return float3(r.x / r.w, r.y / r.w, r.z / r.w); //we get vector in unified cube from -1,-1,-1 to 1,1,1
inline void transform() {
obj2proj = obj2world.multiplyBy(world2view);
obj2proj = obj2proj.multiplyBy(view2proj);
inline void setIdentity() {
obj2world = float4x4(
float4(1.0f, 0.0f, 0.0f, 0.0f),
float4(0.0f, 1.0f, 0.0f, 0.0f),
float4(0.0f, 0.0f, 1.0f, 0.0f),
float4(0.0f, 0.0f, 0.0f, 1.0f)
inline void setPerspective(float fovy, float aspect, float nearP, float farP) {
fovy *= PI / 360.0f;
float fValue = cos(fovy) / sin(fovy);
view2proj[0] = float4(fValue/aspect, 0.0f, 0.f, 0.0f);
view2proj[1] = float4(0.0f, fValue, 0.0f, 0.0f);
view2proj[2] = float4(0.0f, 0.0f, (farP + nearP) / (nearP - farP), -1.0f);
view2proj[3] = float4(0.0f, 0.0f, 2.0f * farP * nearP / (nearP - farP), 0.0f);
inline void setLookat(float3 eye, float3 center, float3 up) {
float3 f = center - eye;
float3 s = f.getCrossProduct(up);
float3 u = s.getCrossProduct(f);
world2view[0] = float4(s.x, u.x, -f.x, 0.0f);
world2view[1] = float4(s.y, u.y, -f.y, 0.0f);
world2view[2] = float4(s.z, u.z, -f.z, 0.0f);
world2view[3] = float4(eye/*.getNormalized() ???*/ * -1.0f, 1.0f);
inline void multByTranslation(float3 v) {
float4x4 m(
float4(1.0f, 0.0f, 0.0f, 0.0f),
float4(0.0f, 1.0f, 0.0f, 0.0f),
float4(0.0f, 0.0f, 1.0f, 0.0f),
float4(v.x, v.y, v.z, 1.0f)
world2view = m.multiplyBy(world2view);
inline void multByScale(float3 v) {
float4x4 m(
float4(v.x, 0.0f, 0.0f, 0.0f),
float4(0.0f, v.y, 0.0f, 0.0f),
float4(0.0f, 0.0f, v.z, 0.0f),
float4(0.0f, 0.0f, 0.0f, 1.0f)
world2view = m.multiplyBy(world2view);
inline void multByRotation(float a, float3 v) {
float s = sin(a*PI / 180.0f), c = cos(a*PI / 180.0f);
float4x4 m(
float4(v.x*v.x*(1-c)+c, v.y*v.x*(1 - c) + v.z*s, v.x*v.z*(1-c)-v.y*s, 0.0f),
float4(v.x*v.y*(1-c)-v.z*s, v.y*v.y*(1-c)+c, v.y*v.z*(1-c)+v.x*s, 0.0f),
float4(v.x*v.z*(1-c)+v.y*s, v.y*v.z*(1-c)-v.x*s, v.z*v.z*(1-c)+c, 0.0f),
float4(0.0f, 0.0f, 0.0f, 1.0f)
world2view = m.multiplyBy(world2view);
And the Rasterizer:
class Rasterizer final {
Buffer * buffer = nullptr;
inline float toScreenSpaceX(float x) { return (x + 1) * buffer->getWidth() * 0.5f; }
inline float toScreenSpaceY(float y) { return (y + 1) * buffer->getHeight() * 0.5f; }
inline int orient2d(float ax, float ay, float bx, float by, const float2& c) {
return (bx - ax)*(c.y - ay) - (by - ay)*(c.x - ax);
Rasterizer(Buffer * buffer) : buffer(buffer) {}
//v - position in screen space ([0, width], [0, height], [-1, 1])
void triangle(
float3 v0, float3 v1, float3 v2,
float3 n0, float3 n1, float3 n2,
float2 uv0, float2 uv1, float2 uv2,
Light * light0, Light * light1,
float3 camera, Texture * texture
) {
v0.x = toScreenSpaceX(v0.x);
v0.y = toScreenSpaceY(v0.y);
v1.x = toScreenSpaceX(v1.x);
v1.y = toScreenSpaceY(v1.y);
v2.x = toScreenSpaceX(v2.x);
v2.y = toScreenSpaceY(v2.y);
//based on: https://fgiesen.wordpress.com/2013/02/08/triangle-rasterization-in-practice/
//compute triangle bounding box
int minX = MIN3(v0.x, v1.x, v2.x);
int minY = MIN3(v0.y, v1.y, v2.y);
int maxX = MAX3(v0.x, v1.x, v2.x);
int maxY = MAX3(v0.y, v1.y, v2.y);
//clip against screen bounds
minX = MAX(minX, 0);
minY = MAX(minY, 0);
maxX = MIN(maxX, buffer->getWidth() - 1);
maxY = MIN(maxY, buffer->getHeight() - 1);
float2 p(0.0f, 0.0f);
for (p.y = minY; p.y <= maxY; p.y++) {
for (p.x = minX; p.x <= maxX; p.x++) {
// Determine barycentric coordinates
//int w0 = orient2d(v1.x, v1.y, v2.x, v2.y, p);
//int w1 = orient2d(v2.x, v2.y, v0.x, v0.y, p);
//int w2 = orient2d(v0.x, v0.y, v1.x, v1.y, p);
float w0 = (v1.y - v2.y)*(p.x - v2.x) + (v2.x - v1.x)*(p.y - v2.y);
w0 /= (v1.y - v2.y)*(v0.x - v2.x) + (v2.x - v1.x)*(v0.y - v2.y);
float w1 = (v2.y - v0.y)*(p.x - v2.x) + (v0.x - v2.x)*(p.y - v2.y);
w1 /= (v2.y - v0.y)*(v1.x - v2.x) + (v0.x - v2.x)*(v1.y - v2.y);
float w2 = 1 - w0 - w1;
// If p is on or inside all edges, render pixel.
if (w0 >= 0 && w1 >= 0 && w2 >= 0) {
float depth = w0 * v0.z + w1 * v1.z + w2 * v2.z;
if (depth < buffer->getDepthForPixel(p.x, p.y)) {
buffer->setPixel(p.x, p.y, diffuse.r, diffuse.g, diffuse.b, ALPHA_VISIBLE, depth);
I strongly believe that the Rasterizer itself works well, because when I test it with this code (instead of the main loop):
float3 v0{ 0, 0, 0.1f };
float3 v1{ 0.5, 0, 0.1f };
float3 v2{ 1, 1, 0.1f };
//Rasterizer test (without VertexProcessor)
rasterizer->triangle(v0, v1, v2, n0, n1, n2, uv0, uv1, uv2, light0, light1, eye, defaultTexture);
I get the right image: a triangle with one corner at the middle of the screen ([0, 0] in unified space), one at the bottom-right corner ([1, 1]) and one at [0.5, 0].
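The mapping this test relies on can be written out as a small sketch. The function names are hypothetical, and the second variant is only an assumption about what an OpenGL-like convention with +y pointing up would need; the test above puts [1, 1] at the bottom-right, which suggests the Rasterizer uses the first, unflipped form.

```cpp
// Maps a unified-space coordinate in [-1, 1] to a pixel coordinate in [0, size].
inline float ndcToScreen(float ndc, int size) {
    return (ndc + 1.0f) * 0.5f * static_cast<float>(size);
}

// Same mapping with the axis flipped, so that +1 lands on pixel row 0 (the top).
inline float ndcToScreenFlipped(float ndc, int size) {
    return (1.0f - ndc) * 0.5f * static_cast<float>(size);
}
```

For an 800 pixel wide buffer, `ndcToScreen(0.0f, 800)` gives 400, the middle of the screen, which matches the corner described above.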
The float3 structure:
class float3 {
union {
struct { float x, y, z; };
struct { float r, g, b; };
float p[3];
float3() = delete;
float3(const float3 &other) : x(other.x), y(other.y), z(other.z) {}
float3(float x, float y, float z) : x(x), y(y), z(z) {}
float &operator[](unsigned index){
ERROR_HANDLE(index < 3, L"The float3 index out of bounds (0-2 range, " + C::toWString(index) + L" given).");
return p[index];
float getLength() const { return std::abs(sqrt(x*x + y*y + z*z)); }
void normalizeIt();
inline float3 getNormalized() const {
float3 result(*this);
return result;
inline float3 getCrossProduct(const float3 &anotherVector) const {
//based on: http://www.sciencehq.com/physics/vector-product-multiplying-vectors.html
return float3(
y * anotherVector.z - anotherVector.y * z,
z * anotherVector.x - anotherVector.z * x,
x * anotherVector.y - anotherVector.x * y
inline float getDotProduct(const float3 &anotherVector) const {
//based on: https://www.ltcconline.net/greenl/courses/107/Vectors/DOTCROS.HTM
return x * anotherVector.x + y * anotherVector.y + z * anotherVector.z;
The main loop:
VertexProcessor vp;
DirectionalLight * light0 = new DirectionalLight({ 0.3f, 0.3f, 0.3f }, { 0.0f, -1.0f, 0.0f });
DirectionalLight * light1 = new DirectionalLight({ 0.4f, 0.4f, 0.4f }, { 0.0f, -1.0f, 0.5f });
while(!my_window.is_closed()) {
tgaBuffer.clearDepth(10.0f); //it could be 1.0f but 10.0f won't hurt; we draw a pixel if its depth < the current depth in the buffer
tgaBuffer.clearColor(0, 0, 255, ALPHA_VISIBLE);
vp.setPerspective(75.0f, static_cast<float>(tgaBuffer.getWidth()) / tgaBuffer.getHeight(), 10.0f, 2000.0f); //the cast avoids integer division if the getters return ints
float3 eye = { 10.0f, 10.0f - frameTotal / 10.0f, 10.0f }; //animate eye
vp.setLookat(eye, float3{ 0.0f, 0.0f, 0.0f }, { 0.0f, 1.0f, 0.0f }); //center is a point, so it is not normalized
//we could call e.g. vp.multByRotation(...) here, but we won't to keep it simple
drawTriangle(0, 1, 2);
drawTriangle(2, 3, 0);
drawTriangle(3, 2, 7);
drawTriangle(7, 2, 6);
drawTriangle(5, 1, 0);
drawTriangle(0, 5, 4);
drawTriangle(4, 5, 6);
drawTriangle(6, 7, 4);
Where drawTriangle(...) stands for:
#define drawTriangle(i0, i1, i2) rasterizer->triangle(vp.tr(v[i0]), vp.tr(v[i1]), vp.tr(v[i2]), v[i0], v[i1], v[i2], n0, n1, n2, uv0, uv1, uv2, light0, light1, eye, defaultTexture);
And here is the initialization of triangles' data:
float3 offset{ 0.0f, 0.0f, 0.0f };
v.push_back(offset + float3{ -10, -10, -10 });
v.push_back(offset + float3{ +10, -10, -10 });
v.push_back(offset + float3{ +10, -10, +10 });
v.push_back(offset + float3{ -10, -10, +10 });
v.push_back(offset + float3{ -10, +10, -10 });
v.push_back(offset + float3{ +10, +10, -10 });
v.push_back(offset + float3{ +10, +10, +10 });
v.push_back(offset + float3{ -10, +10, +10 });
I created a little C library for OpenGL a long time ago, mainly for learning purposes during my computer graphics studies. I looked up my sources, and my implementations of perspective projection and orientation differ considerably from yours.
pbm_Mat4 pbm_mat4_projection_perspective(PBfloat fov, PBfloat ratio, PBfloat near, PBfloat far) {
PBfloat t = near * tanf(fov / 2.0f);
PBfloat b = -t;
PBfloat r = ratio * t, l = ratio * b;
return pbm_mat4_create(pbm_vec4_create(2.0f * near / (r - l), 0, 0, 0),
pbm_vec4_create(0, 2.0f * near / (t - b), 0, 0),
pbm_vec4_create((r + l) / (r - l), (t + b) / (t - b), - (far + near) / (far - near), -1.0f),
pbm_vec4_create(0, 0, -2.0f * far * near / (far - near), 0));
pbm_Mat4 pbm_mat4_orientation_lookAt(pbm_Vec3 pos, pbm_Vec3 target, pbm_Vec3 up) {
pbm_Vec3 forward = pbm_vec3_normalize(pbm_vec3_sub(target, pos));
pbm_Vec3 right = pbm_vec3_normalize(pbm_vec3_cross(forward, up));
up = pbm_vec3_normalize(pbm_vec3_cross(right, forward));
forward = pbm_vec3_scalar(forward, -1);
pos = pbm_vec3_scalar(pos, -1);
return pbm_mat4_create(pbm_vec4_create_vec3(right),
pbm_vec4_create_vec3_w(pbm_vec3_create(pbm_vec3_dot(right, pos),
pbm_vec3_dot(up, pos),
pbm_vec3_dot(forward, pos)), 1));
These methods are tested, and you may want to test against them. If you want, the full sources are available here. Furthermore, you could revisit frustums and projection matrices online. Unfortunately, I cannot share the material from my university with you :(
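To make the comparison concrete, here is a minimal column-major look-at sketch in the spirit of the function above. It is an illustration under assumptions, not either author's code: `Vec3`, `Vec4`, `Mat4`, `lookAt` and `mul` are hypothetical names. Compared with the question's setLookat, it normalizes the basis vectors and builds the translation column from dot products with the eye position instead of using the negated eye directly.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-ins for float3, float4 and float4x4.
struct Vec3 { float x, y, z; };
struct Vec4 { float x, y, z, w; };
struct Mat4 { Vec4 columns[4]; };  // columns[c] holds column c

inline Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
inline Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
inline Vec3 normalize(Vec3 a) {
    float len = std::sqrt(dot(a, a));
    assert(len > 0.0f);  // a zero-length vector has no direction
    return {a.x / len, a.y / len, a.z / len};
}

// Right-handed view matrix looking down -z, in the style of gluLookAt.
Mat4 lookAt(Vec3 eye, Vec3 center, Vec3 up) {
    Vec3 f = normalize(center - eye);   // forward
    Vec3 s = normalize(cross(f, up));   // right
    Vec3 u = cross(s, f);               // corrected up, already unit length
    return {{
        {s.x, u.x, -f.x, 0.0f},
        {s.y, u.y, -f.y, 0.0f},
        {s.z, u.z, -f.z, 0.0f},
        {-dot(s, eye), -dot(u, eye), dot(f, eye), 1.0f}
    }};
}

// M * v for column-major storage and column vectors.
Vec4 mul(const Mat4& m, const Vec4& v) {
    const Vec4* c = m.columns;
    return {
        c[0].x * v.x + c[1].x * v.y + c[2].x * v.z + c[3].x * v.w,
        c[0].y * v.x + c[1].y * v.y + c[2].y * v.z + c[3].y * v.w,
        c[0].z * v.x + c[1].z * v.y + c[2].z * v.z + c[3].z * v.w,
        c[0].w * v.x + c[1].w * v.y + c[2].w * v.z + c[3].w * v.w
    };
}
```

A quick sanity check is that the eye position itself must map to the view-space origin, and the look-at target must land on the negative z axis. With column vectors, the combined matrix is built as projection * view * model, so that the model matrix is applied to the vertex first.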
I am trying to create my own quaternion class and I get weird results. Either the cube I am trying to rotate is flickering like crazy, or it is getting warped.
This is my code:
void Quaternion::AddRotation(vec4 v)
{
    Quaternion temp(v.x, v.y, v.z, v.w);
    *this = temp * (*this);
}
mat4 Quaternion::GenerateMatrix(Quaternion &q)
//Row order
mat4 m( 1 - 2*q.y*q.y - 2*q.z*q.z, 2*q.x*q.y - 2*q.w*q.z, 2*q.x*q.z + 2*q.w*q.y, 0,
2*q.x*q.y + 2*q.w*q.z, 1 - 2*q.x*q.x - 2*q.z*q.z, 2*q.y*q.z + 2*q.w*q.x, 0,
2*q.x*q.z - 2*q.w*q.y, 2*q.y*q.z - 2*q.w*q.x, 1 - 2*q.x*q.x - 2*q.y*q.y, 0,
0, 0, 0, 1);
//Col order
// mat4 m( 1 - 2*q.y*q.y - 2*q.z*q.z,2*q.x*q.y + 2*q.w*q.z,2*q.x*q.z - 2*q.w*q.y,0,
// 2*q.x*q.y - 2*q.w*q.z,1 - 2*q.x*q.x - 2*q.z*q.z,2*q.y*q.z - 2*q.w*q.x,0,
// 2*q.x*q.z + 2*q.w*q.y,2*q.y*q.z + 2*q.w*q.x,1 - 2*q.x*q.x - 2*q.y*q.y,0,
// 0,0,0,1);
return m;
When I create the entity I give it a quaternion:
entity->Quat.AddRotation(vec4(1.0f, 1.0f, 0.0f, 45.f));
And each frame I try to rotate it additionally by a small amount:
for (int i = 0; i < Entities.size(); i++)
if (Entities[i] != NULL)
Entities[i]->Quat.AddRotation(vec4(0.5f, 0.2f, 1.0f, 0.000005f));
And finally this is how I draw each cube:
void Entity::DrawModel()
mat4 RotationMatrix;
RotationMatrix = this->Quat.GenerateMatrix(this->Quat);
mat4 TranslationMatrix = glm::translate(mat4(1.0f), this->Pos);
this->Trans = TranslationMatrix * RotationMatrix;
if (this->shape != NULL)
EDIT: This is the tutorial I used to learn quaternions:
Without studying your rotation matrix in full detail, there are two possible bugs I can think of. The first is that your rotation matrix R is not orthogonal, i.e. the inverse of R is not equal to its transpose. This could cause warping of the object. The second place a bug could hide is the multiplication of your quaternions.
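Both points can be checked against a small reference sketch. It assumes nothing about the Quaternion class in the question: `Quat`, `fromAxisAngle`, `mul` and `normalized` are hypothetical names. One thing worth verifying is how the constructor treats its arguments; if it stores an axis and an angle such as (1, 1, 0, 45) as raw components, the result is not a unit quaternion, and the matrix built from it would not be orthogonal.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical quaternion type with w as the scalar part.
struct Quat { float w, x, y, z; };

// Builds a unit quaternion from a rotation axis and an angle in radians.
Quat fromAxisAngle(float ax, float ay, float az, float angleRad) {
    float len = std::sqrt(ax * ax + ay * ay + az * az);
    assert(len > 0.0f);  // the axis must not be the zero vector
    float s = std::sin(angleRad * 0.5f) / len;
    return {std::cos(angleRad * 0.5f), ax * s, ay * s, az * s};
}

// Hamilton product a * b: applies rotation b first, then rotation a.
Quat mul(const Quat& a, const Quat& b) {
    return {
        a.w * b.w - a.x * b.x - a.y * b.y - a.z * b.z,
        a.w * b.x + a.x * b.w + a.y * b.z - a.z * b.y,
        a.w * b.y - a.x * b.z + a.y * b.w + a.z * b.x,
        a.w * b.z + a.x * b.y - a.y * b.x + a.z * b.w
    };
}

// Rescales to unit length to counter drift from repeated multiplication.
Quat normalized(const Quat& q) {
    float len = std::sqrt(q.w * q.w + q.x * q.x + q.y * q.y + q.z * q.z);
    assert(len > 0.0f);
    return {q.w / len, q.x / len, q.y / len, q.z / len};
}
```

Composing a quarter turn about z with itself should give a half turn about z, that is w close to 0 and z close to 1, which makes a convenient test for a hand-written product.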
There's a mistake in the rotation matrix. Try exchanging the element (2,3) with element (3,2).