I have a simple struct mat4, consisting of a float[4][4], and an operator*= to multiply 4x4 matrices. It takes a const mat4& rhs and does the following:
this->m[0][0] = this->m[0][0] * rhs[0][0] + this->m[0][1] * rhs[1][0] + this->m[0][2] * rhs[2][0] + this->m[0][3] * rhs[3][0];
this->m[0][1] = this->m[0][0] * rhs[0][1] + this->m[0][1] * rhs[1][1] + this->m[0][2] * rhs[2][1] + this->m[0][3] * rhs[3][1];
this->m[0][2] = this->m[0][0] * rhs[0][2] + this->m[0][1] * rhs[1][2] + this->m[0][2] * rhs[2][2] + this->m[0][3] * rhs[3][2];
this->m[0][3] = this->m[0][0] * rhs[0][3] + this->m[0][1] * rhs[1][3] + this->m[0][2] * rhs[2][3] + this->m[0][3] * rhs[3][3];
this->m[1][0] = this->m[1][0] * rhs[0][0] + this->m[1][1] * rhs[1][0] + this->m[1][2] * rhs[2][0] + this->m[1][3] * rhs[3][0];
this->m[1][1] = this->m[1][0] * rhs[0][1] + this->m[1][1] * rhs[1][1] + this->m[1][2] * rhs[2][1] + this->m[1][3] * rhs[3][1];
this->m[1][2] = this->m[1][0] * rhs[0][2] + this->m[1][1] * rhs[1][2] + this->m[1][2] * rhs[2][2] + this->m[1][3] * rhs[3][2];
this->m[1][3] = this->m[1][0] * rhs[0][3] + this->m[1][1] * rhs[1][3] + this->m[1][2] * rhs[2][3] + this->m[1][3] * rhs[3][3];
this->m[2][0] = this->m[2][0] * rhs[0][0] + this->m[2][1] * rhs[1][0] + this->m[2][2] * rhs[2][0] + this->m[2][3] * rhs[3][0];
this->m[2][1] = this->m[2][0] * rhs[0][1] + this->m[2][1] * rhs[1][1] + this->m[2][2] * rhs[2][1] + this->m[2][3] * rhs[3][1];
this->m[2][2] = this->m[2][0] * rhs[0][2] + this->m[2][1] * rhs[1][2] + this->m[2][2] * rhs[2][2] + this->m[2][3] * rhs[3][2];
this->m[2][3] = this->m[2][0] * rhs[0][3] + this->m[2][1] * rhs[1][3] + this->m[2][2] * rhs[2][3] + this->m[2][3] * rhs[3][3];
this->m[3][0] = this->m[3][0] * rhs[0][0] + this->m[3][1] * rhs[1][0] + this->m[3][2] * rhs[2][0] + this->m[3][3] * rhs[3][0];
this->m[3][1] = this->m[3][0] * rhs[0][1] + this->m[3][1] * rhs[1][1] + this->m[3][2] * rhs[2][1] + this->m[3][3] * rhs[3][1];
this->m[3][2] = this->m[3][0] * rhs[0][2] + this->m[3][1] * rhs[1][2] + this->m[3][2] * rhs[2][2] + this->m[3][3] * rhs[3][2];
this->m[3][3] = this->m[3][0] * rhs[0][3] + this->m[3][1] * rhs[1][3] + this->m[3][2] * rhs[2][3] + this->m[3][3] * rhs[3][3];
I just wanted confirmation of whether it is correct or not: when I multiply two matrices in C++ (projection * view) and give the resulting matrix to the shader, nothing shows up on the screen.
But if I give the shader the projection and view matrices separately and multiply them in GLSL, then it all works great and the results are as expected.
So there must be something wrong with the matrix multiplication function?
The rhs indexing is actually consistent with this = this * rhs (each entry is result[r][c] = sum over k of this[r][k] * rhs[k][c]). The real problem is aliasing: the very first line overwrites this->m[0][0], and every later line then reads already-updated values of this->m instead of the originals, so the product comes out wrong. Compute into a temporary (or copy the original matrix first) and assign at the end.
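A minimal aliasing-safe sketch (assuming row-major m[row][col] storage and "this = this * rhs" order; members are accessed here as rhs.m rather than through an operator[]):

```cpp
#include <cstring>

struct mat4 {
    float m[4][4];

    mat4& operator*=(const mat4& rhs) {
        float tmp[4][4];                  // write here, not into m, so we
        for (int r = 0; r < 4; ++r)       // never read values we have
            for (int c = 0; c < 4; ++c) { // already replaced
                tmp[r][c] = 0.0f;
                for (int k = 0; k < 4; ++k)
                    tmp[r][c] += m[r][k] * rhs.m[k][c];
            }
        std::memcpy(m, tmp, sizeof m);
        return *this;
    }
};
```

Writing into tmp also makes a *= a well-defined, since rhs.m is only read before the final memcpy.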
I am trying to make an isometric tile map with a vertex array. I managed to do it without vertices. In my head the vertex version should be just as easy, but I cannot get it to work. I have made dozens of attempts, but I am missing something important.
const uint16_t tileMapSize = 16;
quad = &this->tilesDiamond[(x + y * tileMapSize) * 4]; // tilesDiamond is the vertexarray
const uint16_t blockWidth = 64, blockHeight = 64;
pos.x = (x-y) * blockWidth/2;
pos.y = (x + y) * blockWidth/2;
quad[0].position = sf::Vector2f(pos.x, pos.y);
quad[1].position = sf::Vector2f(pos.x+ blockWidth/2, pos.y+ blockWidth/2);
quad[2].position = sf::Vector2f(pos.x, pos.y + blockWidth);
quad[3].position = sf::Vector2f(pos.x - blockWidth/2, pos.y+ blockWidth/2);
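The position math above can be sanity-checked in isolation, without SFML. A sketch (the struct and function names are illustrative; blockWidth = 64 as in the code, corners ordered top, right, bottom, left exactly as in the quad[] assignments):

```cpp
struct P2 { float x, y; };

// Returns the 4 diamond corners of tile (x, y), using the question's
// formulas: top corner at ((x - y) * w/2, (x + y) * w/2).
void tileCorners(int x, int y, P2 out[4]) {
    const float w = 64.0f;                  // blockWidth from the question
    float px = (x - y) * w / 2;
    float py = (x + y) * w / 2;
    out[0] = { px,         py         };    // top
    out[1] = { px + w / 2, py + w / 2 };    // right
    out[2] = { px,         py + w     };    // bottom
    out[3] = { px - w / 2, py + w / 2 };    // left
}
```

Adjacent tiles share edges (e.g. the right corner of tile (0,0) coincides with the top corner of tile (1,0)), so the position formulas do tile into a diamond; if the rendering looks wrong, the vertex order or texture coordinates are the more likely culprits.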
This is the result: https://imgur.com/a/tYhvmZs. It is supposed to look like a diamond shape with the tiles fitting together.
quad[0].texCoords = sf::Vector2f(tu * tileSize.x, tv * tileSize.y);
quad[1].texCoords = sf::Vector2f((tu + 1) * tileSize.x, tv * tileSize.y);
quad[2].texCoords = sf::Vector2f((tu + 1) * tileSize.x, (tv + 1) * tileSize.y);
quad[3].texCoords = sf::Vector2f(tu * tileSize.x, (tv + 1) * tileSize.y);
Since I made an error in the texture coordinates, everything was rotated the wrong way.
In the process of writing a program for processing digital images, I wrote a CUDA kernel that runs slowly. The code is given below:
__global__ void Kernel ( int* inputArray, float* outputArray, float3* const col_image, int height, int width, int kc2 ) {
float G, h;
float fx[3];
float fy[3];
float g[2][2];
float k10 = 0.0;
float k11 = 0.0;
float k12 = 0.0;
float k20 = 0.0;
float k21 = 0.0;
float k22 = 0.0;
float k30 = 0.0;
float k31 = 0.0;
float k32 = 0.0;
int xIndex = blockIdx.x * blockDim.x + threadIdx.x;
int yIndex = blockIdx.y * blockDim.y + threadIdx.y;
if ((xIndex < width - kc2/2) && (xIndex >= kc2/2) && (yIndex < height - kc2/2) && (yIndex >= kc2/2))
{
int idx0 = yIndex * width + xIndex;
if (inputArray[idx0] > 0)
{
for (int i = 0; i < kc2; i++)
{
for (int j = 0; j < kc2; j++)
{
int idx1 = (yIndex + i - kc2/2) * width + (xIndex + j - kc2/2);
float3 rgb = col_image[idx1];
k10 = k10 + constMat1[i * kc2 + j] * rgb.x;
k11 = k11 + constMat1[i * kc2 + j] * rgb.y;
k12 = k12 + constMat1[i * kc2 + j] * rgb.z;
k20 = k20 + constMat2[i * kc2 + j] * rgb.x;
k21 = k21 + constMat2[i * kc2 + j] * rgb.y;
k22 = k22 + constMat2[i * kc2 + j] * rgb.z;
k30 = k30 + constMat3[i * kc2 + j] * rgb.x;
k31 = k31 + constMat3[i * kc2 + j] * rgb.y;
k32 = k32 + constMat3[i * kc2 + j] * rgb.z;
}
}
fx[0] = kc2 * (k30 - k20);
fx[1] = kc2 * (k31 - k21);
fx[2] = kc2 * (k32 - k22);
fy[0] = kc2 * (k10 - k20);
fy[1] = kc2 * (k11 - k21);
fy[2] = kc2 * (k12 - k22);
g[0][0] = fx[0] * fx[0] + fx[1] * fx[1] + fx[2] * fx[2];
g[0][1] = fx[0] * fy[0] + fx[1] * fy[1] + fx[2] * fy[2];
g[1][0] = g[0][1];
g[1][1] = fy[0] * fy[0] + fy[1] * fy[1] + fy[2] * fy[2]
G = g[0][0] * g[1][1] - g[0][1] * g[1][0];
h = g[0][0] + g[1][1];
// Output
int idx2 = (yIndex - kc2/2) * (width - kc2) + (xIndex - kc2/2);
outputArray[idx2] = (h * h) / G;
}
}
}
Here some (non-negative) values of inputArray are processed. The array col_image contains the color components in the RGB model. If the value of inputArray satisfies the condition, then we compute the special coefficients k_ij in a kc2 x kc2 neighborhood centered at the considered point (the value of kc2 is either 3 or 5). The values of constMat1, constMat2 and constMat3 are stored in the device's constant memory:
__device__ __constant__ float constMat[];
Then we calculate the characteristics fx, fy, g_{ij}, h, G and write the resulting value in the corresponding cell of outputArray.
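For reference, the per-pixel computation can be written as a small host-side function (the names RGB and referencePixel are mine; RGB stands in for float3). This is handy for verifying any optimized kernel against known inputs:

```cpp
struct RGB { float x, y, z; };

// Host-side reference of the kernel's per-pixel work: accumulate the k
// coefficients over a kc2 x kc2 window, form fx, fy and the 2x2 matrix g,
// and return h*h / G, mirroring the formulas in the posted kernel.
float referencePixel(const RGB* img, int width, int x, int y,
                     const float* mat1, const float* mat2,
                     const float* mat3, int kc2) {
    float k1[3] = {0, 0, 0}, k2[3] = {0, 0, 0}, k3[3] = {0, 0, 0};
    for (int i = 0; i < kc2; ++i)
        for (int j = 0; j < kc2; ++j) {
            RGB c = img[(y + i - kc2 / 2) * width + (x + j - kc2 / 2)];
            float w1 = mat1[i * kc2 + j];
            float w2 = mat2[i * kc2 + j];
            float w3 = mat3[i * kc2 + j];
            k1[0] += w1 * c.x; k1[1] += w1 * c.y; k1[2] += w1 * c.z;
            k2[0] += w2 * c.x; k2[1] += w2 * c.y; k2[2] += w2 * c.z;
            k3[0] += w3 * c.x; k3[1] += w3 * c.y; k3[2] += w3 * c.z;
        }
    float fx[3], fy[3];
    for (int n = 0; n < 3; ++n) {
        fx[n] = kc2 * (k3[n] - k2[n]);   // same formulas as the kernel
        fy[n] = kc2 * (k1[n] - k2[n]);
    }
    float g00 = fx[0] * fx[0] + fx[1] * fx[1] + fx[2] * fx[2];
    float g01 = fx[0] * fy[0] + fx[1] * fy[1] + fx[2] * fy[2];
    float g11 = fy[0] * fy[0] + fy[1] * fy[1] + fy[2] * fy[2];
    float G = g00 * g11 - g01 * g01;
    float h = g00 + g11;
    return (h * h) / G;
}
```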
Importantly, all of this data is stored in global memory, and the input array can be quite large (about 40 million points). Both of these factors directly affect the speed of the kernel.
How do we speed up the execution of this kernel (any techniques are welcome: use of shared memory / textures, use of stencil templates, etc.)?
What I would call the "standard" usage of shared memory, buffering a block of col_image for use (and reuse) by the threadblock, is the obvious suggestion here.
According to my tests, it offers a substantial improvement. Since you have not provided a complete code, a data set, or any results verification, I will skip all of those as well. What follows is a not-really-tested addition of shared memory to your existing code, which "buffers" a (threadblockwidth + kc2) * (threadblockheight + kc2) "patch" of the col_image input data into a shared memory buffer. Thereafter, during the doubly-nested for-loops, the data is read out of the shared memory buffer.
A 2D shared-memory stencil operation like this is as much an exercise in indexing as it is in handling edge cases. Your code is somewhat simpler in that we need only consider the edges to the "right" and "downward" when working out the "halo" of data to be buffered into shared memory.
I have not attempted to verify that this code is perfect. However, it should give you a "roadmap" for how to implement a 2D shared memory buffer, with some motivation for the effort: I witness about a ~5x speedup by doing so, although YMMV, and it's entirely possible I've made a performance mistake.
Here's a worked example, showing the speedup on Pascal Titan X, CUDA 8.0.61, Linux:
$ cat t390.cu
#include <stdio.h>
#include <iostream>
const int adim = 6000;
const int KC2 = 5;
const int thx = 32;
const int thy = 32;
__constant__ float constMat1[KC2*KC2];
__constant__ float constMat2[KC2*KC2];
__constant__ float constMat3[KC2*KC2];
__global__ void Kernel ( int* inputArray, float* outputArray, float3* const col_image, int height, int width, int kc2 ) {
float G, h;
float fx[3];
float fy[3];
float g[2][2];
float k10 = 0.0;
float k11 = 0.0;
float k12 = 0.0;
float k20 = 0.0;
float k21 = 0.0;
float k22 = 0.0;
float k30 = 0.0;
float k31 = 0.0;
float k32 = 0.0;
int xIndex = blockIdx.x * blockDim.x + threadIdx.x;
int yIndex = blockIdx.y * blockDim.y + threadIdx.y;
int idx0 = yIndex * width + xIndex;
#ifdef USE_SHARED
__shared__ float3 s_col_image[thy+KC2][thx+KC2];
int idx = xIndex;
int idy = yIndex;
int DATAHSIZE= height;
int WSIZE = kc2;
int DATAWSIZE = width;
float3 *input = col_image;
int BLKWSIZE = thx;
int BLKHSIZE = thy;
if ((idx < DATAHSIZE+WSIZE) && (idy < DATAWSIZE+WSIZE))
s_col_image[threadIdx.y][threadIdx.x]=input[idx0];
if ((idx < DATAHSIZE+WSIZE) && (idy < DATAWSIZE) && (threadIdx.y > BLKWSIZE - WSIZE))
s_col_image[threadIdx.y + (WSIZE-1)][threadIdx.x] = input[idx0+(WSIZE-1)*width];
if ((idx < DATAHSIZE) && (idy < DATAWSIZE+WSIZE) && (threadIdx.x > BLKHSIZE - WSIZE))
s_col_image[threadIdx.y][threadIdx.x + (WSIZE-1)] = input[idx0+(WSIZE-1)];
if ((idx < DATAHSIZE) && (idy < DATAWSIZE) && (threadIdx.x > BLKHSIZE - WSIZE) && (threadIdx.y > BLKWSIZE - WSIZE))
s_col_image[threadIdx.y + (WSIZE-1)][threadIdx.x + (WSIZE-1)] = input[idx0+(WSIZE-1)*width + (WSIZE-1)];
__syncthreads();
#endif
if ((xIndex < width - kc2/2) && (xIndex >= kc2/2) && (yIndex < height - kc2/2) && (yIndex >= kc2/2))
{
if (inputArray[idx0] > 0)
{
for (int i = 0; i < kc2; i++)
{
for (int j = 0; j < kc2; j++)
{
#ifdef USE_SHARED
float3 rgb = s_col_image[threadIdx.y + i][threadIdx.x + j];
#else
int idx1 = (yIndex + i - kc2/2) * width + (xIndex + j - kc2/2);
float3 rgb = col_image[idx1];
#endif
k10 = k10 + constMat1[i * kc2 + j] * rgb.x;
k11 = k11 + constMat1[i * kc2 + j] * rgb.y;
k12 = k12 + constMat1[i * kc2 + j] * rgb.z;
k20 = k20 + constMat2[i * kc2 + j] * rgb.x;
k21 = k21 + constMat2[i * kc2 + j] * rgb.y;
k22 = k22 + constMat2[i * kc2 + j] * rgb.z;
k30 = k30 + constMat3[i * kc2 + j] * rgb.x;
k31 = k31 + constMat3[i * kc2 + j] * rgb.y;
k32 = k32 + constMat3[i * kc2 + j] * rgb.z;
}
}
fx[0] = kc2 * (k30 - k20);
fx[1] = kc2 * (k31 - k21);
fx[2] = kc2 * (k32 - k22);
fy[0] = kc2 * (k10 - k20);
fy[1] = kc2 * (k11 - k21);
fy[2] = kc2 * (k12 - k22);
g[0][0] = fx[0] * fx[0] + fx[1] * fx[1] + fx[2] * fx[2];
g[0][1] = fx[0] * fy[0] + fx[1] * fy[1] + fx[2] * fy[2];
g[1][0] = g[0][1];
g[1][1] = fy[0] * fy[0] + fy[1] * fy[1] + fy[2] * fy[2]; // had a missing semicolon
G = g[0][0] * g[1][1] - g[0][1] * g[1][0];
h = g[0][0] + g[1][1];
// Output
int idx2 = (yIndex - kc2/2) * (width - kc2) + (xIndex - kc2/2); // possible indexing bug here
outputArray[idx2] = (h * h) / G;
}
}
}
int main(){
int *d_inputArray;
int height = adim;
int width = adim;
float *d_outputArray;
float3 *d_col_image;
int kc2 = KC2;
cudaMalloc(&d_inputArray, height*width*sizeof(int));
cudaMemset(d_inputArray, 1, height*width*sizeof(int));
cudaMalloc(&d_col_image, (height+kc2)*(width+kc2)*sizeof(float3));
cudaMalloc(&d_outputArray, height*width*sizeof(float));
dim3 threads(thx,thy);
dim3 blocks((adim+threads.x-1)/threads.x, (adim+threads.y-1)/threads.y);
Kernel<<<blocks,threads>>>( d_inputArray, d_outputArray, d_col_image, height, width, kc2 );
cudaDeviceSynchronize();
}
$ nvcc -arch=sm_61 -o t390 t390.cu
$ cuda-memcheck ./t390
========= CUDA-MEMCHECK
========= ERROR SUMMARY: 0 errors
$ nvprof ./t390
==1473== NVPROF is profiling process 1473, command: ./t390
==1473== Profiling application: ./t390
==1473== Profiling result:
Time(%) Time Calls Avg Min Max Name
97.29% 34.705ms 1 34.705ms 34.705ms 34.705ms Kernel(int*, float*, float3*, int, int, int)
2.71% 965.14us 1 965.14us 965.14us 965.14us [CUDA memset]
==1473== API calls:
Time(%) Time Calls Avg Min Max Name
88.29% 310.69ms 3 103.56ms 550.23us 309.46ms cudaMalloc
9.86% 34.712ms 1 34.712ms 34.712ms 34.712ms cudaDeviceSynchronize
1.05% 3.6801ms 364 10.110us 247ns 453.59us cuDeviceGetAttribute
0.70% 2.4483ms 4 612.07us 547.62us 682.25us cuDeviceTotalMem
0.08% 284.32us 4 71.079us 63.098us 79.616us cuDeviceGetName
0.01% 29.533us 1 29.533us 29.533us 29.533us cudaMemset
0.01% 21.189us 1 21.189us 21.189us 21.189us cudaLaunch
0.00% 5.2730us 12 439ns 253ns 1.1660us cuDeviceGet
0.00% 3.4710us 6 578ns 147ns 2.4820us cudaSetupArgument
0.00% 3.1090us 3 1.0360us 340ns 2.1660us cuDeviceGetCount
0.00% 1.0370us 1 1.0370us 1.0370us 1.0370us cudaConfigureCall
$ nvcc -arch=sm_61 -o t390 t390.cu -DUSE_SHARED
$ cuda-memcheck ./t390
========= CUDA-MEMCHECK
========= ERROR SUMMARY: 0 errors
$ nvprof ./t390
==1545== NVPROF is profiling process 1545, command: ./t390
==1545== Profiling application: ./t390
==1545== Profiling result:
Time(%) Time Calls Avg Min Max Name
86.17% 5.4181ms 1 5.4181ms 5.4181ms 5.4181ms Kernel(int*, float*, float3*, int, int, int)
13.83% 869.94us 1 869.94us 869.94us 869.94us [CUDA memset]
==1545== API calls:
Time(%) Time Calls Avg Min Max Name
96.13% 297.15ms 3 99.050ms 555.80us 295.90ms cudaMalloc
1.76% 5.4281ms 1 5.4281ms 5.4281ms 5.4281ms cudaDeviceSynchronize
1.15% 3.5664ms 364 9.7970us 247ns 435.92us cuDeviceGetAttribute
0.86% 2.6475ms 4 661.88us 642.85us 682.42us cuDeviceTotalMem
0.09% 266.42us 4 66.603us 62.005us 77.380us cuDeviceGetName
0.01% 29.624us 1 29.624us 29.624us 29.624us cudaMemset
0.01% 19.147us 1 19.147us 19.147us 19.147us cudaLaunch
0.00% 4.8560us 12 404ns 248ns 988ns cuDeviceGet
0.00% 3.3390us 6 556ns 134ns 2.3510us cudaSetupArgument
0.00% 3.1190us 3 1.0390us 331ns 2.0780us cuDeviceGetCount
0.00% 1.1940us 1 1.1940us 1.1940us 1.1940us cudaConfigureCall
$
We see that the kernel execution time is ~35ms in the non-shared case, and ~5.5ms in the shared case. For this case I set kc2=5. For the kc2=3 case, performance gains will be less.
A few notes:
Your posted code was missing a semicolon on one line. I've added that and marked the line in my code.
I suspect you may have an indexing error on the "output" write to outputArray. Your indexing is like this:
int idx2 = (yIndex - kc2/2) * (width - kc2) + (xIndex - kc2/2);
whereas I would have expected this:
int idx2 = (yIndex - kc2/2) * width + (xIndex - kc2/2);
however I haven't thought carefully about it, so I may be wrong here.
In the future, if you want help with a problem like this, I'd advise you to at least provide the level of complete code scaffolding and description that I have. Provide a complete code that somebody else could immediately pick up and test, without having to write their own code. Also define what platform you are on and what your performance measurement is.
I am trying to create a rotation matrix around the X-axis using glm::gtc::matrix_transform::rotate:
glm::rotate(glm::mat4(1.0f), glm::radians(90.f), glm::vec3(1.f, 0.f, 0.f));
I expected the resulting matrix to be (translational offsets removed):
1, 0, 0
0, cos(90), -sin(90)
0, sin(90), cos(90)
0, 0, 0
(See e.g. https://en.wikipedia.org/wiki/Rotation_matrix#Basic_rotations)
However, the result is slightly off, i.e.:
1, 0, 0
0, 0.9996240, -0.0274121
0, 0.0274121, 0.9996240
0, 0, 0
I looked at https://github.com/g-truc/glm/blob/master/glm/gtc/matrix_transform.inl and sure enough, the implementation uses a weird-looking factor c + (1 - c) that seemed like it might explain the results.
My question is now, why? Why is the definition of glm's rotation matrix different? What is the theory behind it?
The glm implementation uses the rotation-matrix-from-axis-and-angle formula from Wikipedia.
The following lines of code match that formula:
Result[0][0] = c + (1 - c) * axis.x * axis.x;
Result[0][1] = (1 - c) * axis.x * axis.y + s * axis.z;
Result[0][2] = (1 - c) * axis.x * axis.z - s * axis.y;
Result[0][3] = 0;
Result[1][0] = (1 - c) * axis.y * axis.x - s * axis.z;
Result[1][1] = c + (1 - c) * axis.y * axis.y;
Result[1][2] = (1 - c) * axis.y * axis.z + s * axis.x;
Result[1][3] = 0;
Result[2][0] = (1 - c) * axis.z * axis.x + s * axis.y;
Result[2][1] = (1 - c) * axis.z * axis.y - s * axis.x;
Result[2][2] = c + (1 - c) * axis.z * axis.z;
Result[2][3] = 0;
There is nothing weird in c + (1 - c) because c + (1 - c) * axis.x * axis.x is the same as c + ((1 - c) * axis.x * axis.x). Do not forget about operator precedence.
This is not floating-point precision loss; the error is far too large for that. Notice that your output is exactly cos(1.5708 degrees) = 0.9996240 and sin(1.5708 degrees) = 0.0274121: the angle is being converted from degrees to radians twice. Older glm versions (before 0.9.6) expected degrees by default, so passing glm::radians(90.f) to such a version produces exactly this matrix. Either pass the angle in the unit your glm version expects, or define GLM_FORCE_RADIANS (or upgrade glm) so that rotate takes radians.
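The "slightly off" numbers in the question can be reproduced exactly by converting 90 degrees to radians and then treating the result (1.5708) as degrees again, i.e. converting twice (function names here are illustrative):

```cpp
#include <cmath>

inline float deg2rad(float d) { return d * 3.14159265358979f / 180.0f; }

// Apply the degrees-to-radians conversion twice, as happens when
// glm::radians() is used with a glm version whose rotate expects degrees.
float doubleConvertedCos() { return std::cos(deg2rad(deg2rad(90.0f))); }  // ~0.9996240
float doubleConvertedSin() { return std::sin(deg2rad(deg2rad(90.0f))); }  // ~0.0274121
```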
I have C++ code for a B-spline curve that uses 4 points. If I want to change it to 6 points, what should I change in the code?
You can check the code:
#include "graphics.h"
#include <math.h>
int main(void) {
int gd, gm, page = 0;
gd = VGA;
gm = VGAMED;
initgraph(&gd, &gm, "");
point2d pontok[4] = { 100, 100, 150, 200, 170, 130, 240, 270 }; //pontok means points
int ap;
for (;;) {
setactivepage(page);
cleardevice();
for (int i = 0; i < 4; i++)
circle(integer(pontok[i].x), integer(pontok[i].y), 3);
double t = 0;
moveto((1.0 / 6) * (pontok[0].x * pow(1 - t, 3) +
pontok[1].x * (3 * t * t * t - 6 * t * t + 4) +
pontok[2].x * (-3 * t * t * t + 3 * t * t + 3 * t + 1) +
pontok[3].x * t * t * t),
(1.0 / 6) * (pontok[0].y * pow(1 - t, 3) +
pontok[1].y * (3 * t * t * t - 6 * t * t + 4) +
pontok[2].y * (-3 * t * t * t + 3 * t * t + 3 * t + 1) +
pontok[3].y * t * t * t));
for (t = 0; t <= 1; t += 0.01)
lineto(
(1.0 / 6) * (pontok[0].x * pow(1 - t, 3) +
pontok[1].x * (3 * t * t * t - 6 * t * t + 4) +
pontok[2].x * (-3 * t * t * t + 3 * t * t + 3 * t + 1) +
pontok[3].x * t * t * t),
(1.0 / 6) * (pontok[0].y * pow(1 - t, 3) +
pontok[1].y * (3 * t * t * t - 6 * t * t + 4) +
pontok[2].y * (-3 * t * t * t + 3 * t * t + 3 * t + 1) +
pontok[3].y * t * t * t));
/* Egerkezeles */ //Egerkezeles means mouse event handling
if (!balgomb)
ap = getactivepoint((point2d *)pontok, 4, 5);
if (ap >= 0 && balgomb) { //balgomb means left mouse button
pontok[ap].x = egerx; //eger means mouse
pontok[ap].y = egery;
}
/* Egerkezeles vege */
setvisualpage(page);
page = 1 - page;
if (kbhit())
break;
}
getch();
closegraph();
return 0;
}
From your formula: those basis functions, together with the 1/6 factor, are the uniform cubic B-spline basis ((1-t)^3/6, (3t^3-6t^2+4)/6, (-3t^3+3t^2+3t+1)/6, t^3/6), and they are correct as written, so you do not need to change them. A uniform cubic B-spline with n control points consists of n - 3 cubic segments, where segment s is evaluated with the same four basis functions over the window of control points pontok[s] .. pontok[s+3]. So for 6 points, loop over the 3 windows (s = 0, 1, 2) and draw each segment with the formula you already have; consecutive segments join by construction.
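A sketch of that generalization (the names Pt and bspline are illustrative; the basis functions are copied from the posted code):

```cpp
struct Pt { double x, y; };

// Evaluate segment "seg" of a uniform cubic B-spline at parameter t in
// [0, 1]. Segment seg uses the window of 4 consecutive control points
// p[seg] .. p[seg+3]; with n points there are n - 3 segments.
Pt bspline(const Pt* p, int seg, double t) {
    const Pt* q = p + seg;
    double b0 = (1 - t) * (1 - t) * (1 - t) / 6.0;
    double b1 = (3 * t * t * t - 6 * t * t + 4) / 6.0;
    double b2 = (-3 * t * t * t + 3 * t * t + 3 * t + 1) / 6.0;
    double b3 = t * t * t / 6.0;
    return { q[0].x * b0 + q[1].x * b1 + q[2].x * b2 + q[3].x * b3,
             q[0].y * b0 + q[1].y * b1 + q[2].y * b2 + q[3].y * b3 };
}
```

With 6 points you would draw segments seg = 0, 1, 2, stepping t from 0 to 1 in each; the end of one segment coincides with the start of the next, so the curve stays connected.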
I am getting an unexpected VARSYM error in my ZIMPL program, and I have no idea what the problem is. Here is a portion of the code.
Here are the variables:
var FWPlus1 integer >= 0 <= 4;
var FWPlus2 integer >= 0 <= 4;
var FWPlus3 integer >= 0 <= 4;
This goes up to FWPlus28, with the upper bound at 3, 2, or 1 for some of the points.
Here is the equation that is getting the error:
subto R3: FCOMx ==
((FWPlus1 * (FWPlus1 * 0 + 0 )) +(FWPlus2 * (FWPlus2 * .105 + 5.47008 )) +
(FWPlus3 * (FWPlus3 * .2054 + 10.70110)) +(FWPlus4 * (FWPlus4 * .29683 + 15.46443)) +
(FWPlus6 * (FWPlus6 * .48028 + 25.02197)) +(FWPlus7 * (FWPlus7 * .50223 + 26.16553)) +
(FWPlus8 * (FWPlus8 * .50223 + 26.16553)) +(FWPlus9 * (FWPlus9 * .48028 + 25.02197)) +
(FWPlus10 * (FWPlus10 * .43734 + 22.78483)) +(FWPlus11 * (FWPlus11 * .37529 + 19.55188)) +
(FWPlus12 * (FWPlus12 * .29683 + 15.46443)) +(FWPlus13 * (FWPlus13 * .20540 + 10.70110)) +
(FWPlus14 * (FWPlus14 * .105 + 5.47008)) +(FWPlus15 * (FWPlus15 * 0 + 0)) +
(FWPlus16 * (FWPlus16 * -.105 + -5.47008)) +(FWPlus17 * (FWPlus17 * -.2054 + -10.70110)) +
(FWPlus18 * (FWPlus18 * -.29683 + -15.46443)) +(FWPlus19 * (FWPlus19 * -.37529 + -19.55188)) +
(FWPlus20 * (FWPlus20 * -.43734 + -22.78483)) +(FWPlus21 * (FWPlus21 * -.48028 + -25.02197)) +
(FWPlus22 * (FWPlus22 * -.50223 + -26.16553)) +(FWPlus23 * (FWPlus23 * -.50223 + -26.16553)) +
(FWPlus24 * (FWPlus24 * -.48028 + -25.02197)) +(FWPlus25 * (FWPlus25 * -.37529 + -19.55188)) +
(FWPlus26 * (FWPlus26 * -.29683 + -15.44827)) +(FWPlus27 * (FWPlus27 * -.20540 + -10.68992)) +
(FWPlus28 * (FWPlus28 * -.10499 + -5.46437)))
/(FWPlus1 +FWPlus2 +FWPlus3 +FWPlus4 +FWPlus6 +FWPlus7 +FWPlus8 +FWPlus9 +FWPlus10 +FWPlus11 +FWPlus12 +
FWPlus13 +FWPlus14 +FWPlus15 +FWPlus16 +FWPlus17 +FWPlus18 +FWPlus19 +FWPlus20 +FWPlus21 +FWPlus22 +FWPlus23 +
FWPlus24 +FWPlus25 +FWPlus26 +FWPlus27 + FWPlus28);
The error message says it is at the semicolon at the very end.
Sorry, but I think I figured it out: it didn't like that I was multiplying by zero in two of the terms.