Trouble with precision, point accuracy at intersection of 3 planes - c++

I have tried multiple methods of plane intersection, but the code is not producing an accurate result. At some angles it's pretty good; for a right-angled triangle it is roughly in position but noticeably off.
xyzPoint return_Intersect_3planes(Tri tri1, Tri tri2, Tri tri3) {
// Test values are hard-coded; tri1..tri3 are not used yet.
// Plane i: a_i*x + b_i*y + c_i*z = d_i, where (a_i, b_i, c_i) is the normal
// and (x_i, y_i, z_i) is a point on the plane.
double x1 = 0.646;
double y1 = 0.210;
double z1 = 2.147;
double a1 = 0.251;
double b1 = -0.456;
double c1 = -0.411;
// Right-hand side d_i = n_i . p_i. Using its negation (the d of a*x + b*y + c*z + d = 0)
// in Cramer's rule below would flip the sign of the whole result.
double d1 = (a1 * x1) + (b1 * y1) + (c1 * z1);
double x2 = -0.0744;
double y2 = 0.0808;
double z2 = 2.082;
double a2 = -0.1218;
double b2 = -0.2606;
double c2 = -0.748;
double d2 = (a2 * x2) + (b2 * y2) + (c2 * z2);
double x3 = 0.10627;
double y3 = 0.3924;
double z3 = 2.335;
double a3 = 0.0987;
double b3 = 0.3236;
double c3 = -0.278;
double d3 = (a3 * x3) + (b3 * y3) + (c3 * z3);
// Cramer's rule: D is the determinant of the coefficient matrix (rows a_i b_i c_i),
// expanded with the rule of Sarrus; its first term is a1 * b2 * c3.
// If D is (close to) 0, the planes do not meet in a single point.
double D = (a1 * b2 * c3) + (b1 * c2 * a3) + (c1 * a2 * b3) - (a3 * b2 * c1) - (b3 * c2 * a1) - (c3 * a2 * b1);
double Dx = (d1 * b2 * c3) + (b1 * c2 * d3) + (c1 * d2 * b3) - (d3 * b2 * c1) - (b3 * c2 * d1) - (c3 * d2 * b1);
double Dy = (a1 * d2 * c3) + (d1 * c2 * a3) + (c1 * a2 * d3) - (a3 * d2 * c1) - (d3 * c2 * a1) - (c3 * a2 * d1);
double Dz = (a1 * b2 * d3) + (b1 * d2 * a3) + (d1 * a2 * b3) - (a3 * b2 * d1) - (b3 * d2 * a1) - (d3 * a2 * b1);
xyzPoint Intersection;
Intersection.x = Dx / D;
Intersection.y = Dy / D;
Intersection.z = Dz / D;
return Intersection;
}
The input numbers
The result is:
x = 0.00276579 y = -0.32880155 z = -4.0193058
(the sign seems to have been flipped to negative, and the result might be totally wrong)
The correct position based on CAD is:
x = -0.002 y = 0.204 z = 2.498
The ODS (OpenOffice Calc) file has been uploaded:
https://filebin.net/g6onuah7q5rfg7lj
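One way to tell whether a remaining error comes from the solver or from the input data is to substitute the computed point back into each plane equation. The sketch below uses the cross-product form of Cramer's rule, X = (e1 (n2 x n3) + e2 (n3 x n1) + e3 (n1 x n2)) / (n1 . (n2 x n3)) with e_i = n_i . p_i, and reports the residual n_i . (X - p_i) for each plane. The names Vec3, Plane, intersect3 and residual are hypothetical (not the question's Tri/xyzPoint types), and the 1e-12 "parallel" threshold is an assumption that depends on the scale of the normals. If the residuals are near zero but the point still disagrees with CAD, passing the CAD point to the same residual function is a quick way to check whether the hard-coded normals and points describe the CAD planes at all; double precision itself is unlikely to be the limiting factor at these magnitudes.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <optional>

using Vec3 = std::array<double, 3>;

// Hypothetical plane type: normal n and one point p on the plane.
struct Plane {
    Vec3 n;
    Vec3 p;
};

double dot(const Vec3& a, const Vec3& b) { return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]; }

Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]};
}

// Solves n_i . X = e_i (with e_i = n_i . p_i) for the common point X.
// Returns std::nullopt when the normals are (nearly) linearly dependent.
std::optional<Vec3> intersect3(const Plane& A, const Plane& B, const Plane& C) {
    const Vec3 n23 = cross(B.n, C.n);
    const Vec3 n31 = cross(C.n, A.n);
    const Vec3 n12 = cross(A.n, B.n);
    const double det = dot(A.n, n23); // same value as D in Cramer's rule
    if (std::fabs(det) < 1e-12) return std::nullopt;
    const double e1 = dot(A.n, A.p), e2 = dot(B.n, B.p), e3 = dot(C.n, C.p);
    Vec3 X{};
    for (int k = 0; k < 3; ++k) X[k] = (e1 * n23[k] + e2 * n31[k] + e3 * n12[k]) / det;
    return X;
}

// n . (X - p): zero (up to rounding) when X lies on the plane.
double residual(const Plane& P, const Vec3& X) {
    return dot(P.n, Vec3{X[0] - P.p[0], X[1] - P.p[1], X[2] - P.p[2]});
}
```

Usage: build one Plane per input plane from the question's values, call intersect3, and print residual for each plane (and for the CAD point) to see where the discrepancy lives.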

Related

Operations on the elements of arrays in a function - c++

Can I operate on the elements of an array in a function (in the parameter)?
float f(i, u, v)
{
if (i == 1) {
return (u - v); //example of the returned value
}
if (i == 2) {
return (u + v);
}
}
int main()
{
int i;
float x[3],y1,y2,h;
x[1]=1;//value of the first element of x[m]
x[2]=1;
h=0.01;
for (i = 1; i <= 2; i++) {
y1=h * f(i, x[1], x[2]);
y2=h * f(i, x[1] + y1/2, x[2]+y1/2);
y3=h* f(i,x[1] + y2/2, x[2]+y2/2);
y4=h * f(i,x[1] + y3, x[2]+y3);
x[1]=x[1] + (y1+ 2 * y2 + 2 * y3+2 * y4)/ 6;
x[2]=x[2] + (y1+ 2 * y2 + 2 * y3+2 * y4)/ 6;
cout<<x[1]<<endl;
}
}
where:
x[1] and x[2] are the elements of the array x[m].
How can I operate on elements of different arrays in a parameter?
I would recommend that you try to compile your code; the compiler will give you some important hints about what is wrong. Here is the code compiled online.
To make it compile, I changed it to this:
#include <iostream>
using namespace std;
float f(int i, float u, float v) {
if (i == 1) {
return (u - v); // example of the returned value
}
// if (i == 2) { // This if-statement is not needed
return (u + v);
// }
}
int main() {
float x[3] = {0, 1, 1}; // x[0] is unused...?
float y1 = 0;
float y2 = 0;
float y3 = 0;
float y4 = 0;
const float h = 0.01;
for (int i = 1; i <= 2; i++) {
y1 = h * f(i, x[1], x[2]);
y2 = h * f(i, x[1] + y1 / 2, x[2] + y1 / 2);
y3 = h * f(i, x[1] + y2 / 2, x[2] + y2 / 2);
y4 = h * f(i, x[1] + y3, x[2] + y3);
x[1] = x[1] + (y1 + 2 * y2 + 2 * y3 + y4) / 6; // classical RK4 weights are 1, 2, 2, 1
x[2] = x[2] + (y1 + 2 * y2 + 2 * y3 + y4) / 6;
cout << x[1] << endl;
}
}
Note the changes:
You need to specify the types of the parameters of the function f(...).
You need to declare all variables before using them (a good rule is to declare everything right before you use it, and to add const if it is not changed).
Remember to zero-initialize variables that you are going to add to (y1, y2, etc.).
Also, I would recommend using x1 instead of x[1], since you are mixing styles between x and y, and you are not using the zeroth element. Like this:
int main() {
float x1 = 1;
float x2 = 1; // same start values as x[1] and x[2] above
float y1 = 0;
float y2 = 0;
float y3 = 0;
float y4 = 0;
const float h = 0.01;
for (int i = 1; i <= 2; i++) {
y1 = h * f(i, x1, x2);
y2 = h * f(i, x1 + y1 / 2, x2 + y1 / 2);
y3 = h * f(i, x1 + y2 / 2, x2 + y2 / 2);
y4 = h * f(i, x1 + y3, x2 + y3);
x1 = x1 + (y1 + 2 * y2 + 2 * y3 + y4) / 6; // classical RK4 weights are 1, 2, 2, 1
x2 = x2 + (y1 + 2 * y2 + 2 * y3 + y4) / 6;
cout << x1 << endl;
}
}
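To answer the literal question of operating on several array elements inside a function, one option is to keep the values together in a std::array, pass it by const reference, and return the updated array. The sketch below assumes the two values form a coupled system x' = f(x); the deriv function mirrors the u - v / u + v expressions from the question but is hypothetical, as are the names State, axpy and rk4Step. It uses the classical RK4 weights 1, 2, 2, 1.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// The two unknowns x[0], x[1] travel together as one value.
using State = std::array<double, 2>;

// Hypothetical right-hand side modeled on the question's f:
// dx0/dt = x0 - x1, dx1/dt = x0 + x1.
State deriv(const State& x) {
    return {x[0] - x[1], x[0] + x[1]};
}

// Returns a + s * b, element by element.
State axpy(const State& a, double s, const State& b) {
    State r{};
    for (std::size_t i = 0; i < r.size(); ++i) r[i] = a[i] + s * b[i];
    return r;
}

// One classical Runge-Kutta (RK4) step of size h for x' = f(x).
template <typename F>
State rk4Step(const State& x, double h, F f) {
    const State k1 = f(x);
    const State k2 = f(axpy(x, h / 2, k1));
    const State k3 = f(axpy(x, h / 2, k2));
    const State k4 = f(axpy(x, h, k3));
    State next{};
    for (std::size_t i = 0; i < next.size(); ++i)
        next[i] = x[i] + h / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]);
    return next;
}
```

Usage: State x{1.0, 1.0}; then x = rk4Step(x, 0.01, deriv); inside the time loop. Returning a new array instead of writing through several pointers keeps every element update in one place.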

Why is call by reference so much slower than inline code?

I am programming a physics simulation with few particles (typically 3, no more than 5).
In condensed form, my code is structured like this:
#include <cmath>
#include <iostream>
struct Particle{ // struct, so that x and m are accessible from main
double x; // coordinate
double m; // mass
};
void performStep(Particle &p, double &F_external){
p.x += -0.2*p.x + F_external/p.m; // boiled down, in reality complex calculation, not important here
}
int main(){
const double dt = 0.001; // time step, not important
Particle p1;
p1.x = 5; // some random number for initialization, in reality more complex but not important here
p1.m = 1;
Particle p2;
p2.x = -1; // some random number for initialization, in reality more complex but not important here
p2.m = 2;
Particle p3;
p3.x = 0; // some random number for initialization, in reality more complex but not important here
p3.m = 3;
double F_external = 0; // external forces
for(unsigned long long int i=0; i < 10000000000; ++i){ // many steps, typically 10e9
F_external = std::sin(i*dt);
performStep(p1, F_external);
performStep(p2, F_external);
performStep(p3, F_external);
}
std::cout << "p1.x: " << p1.x << std::endl;
std::cout << "p2.x: " << p2.x << std::endl;
std::cout << "p3.x: " << p3.x << std::endl;
}
I have determined with clock() that the performStep(p, F_external) call is the bottleneck in my code.
When I inlined the calculation by hand, i.e. replaced performStep(p1, F_external) with p1.x += -0.2*p1.x + F_external/p1.m;, the calculation suddenly became roughly twice as fast. Note that performStep() in reality has about 60 basic arithmetic operations over about 20 lines, so the code becomes really bloated if I inline it manually for every particle.
Why is that the case? I am compiling with MinGW64/g++ and the -O2 flag. I thought the compiler would optimize such things?
Edit:
Here is the function that is called. Note that in reality, I calculate all three coordinates x, y, z with a couple of different external forces. Variables that are not passed to the function are members of SimulationRun. The algorithm is a fourth-order leapfrog algorithm.
void SimulationRun::performLeapfrog_z(const unsigned long long int& i, const double& x, const double& y, double& z, const double& vx, const double& vy, double& vz, const double& qC2U0,
const double& U0, const double& m, const double& C4, const double& B2, const double& f_minus, const double& f_z, const double& f_plus, const bool& bool_calculate_xy,
const double& Find, const double& Fheating) {
// probing for C4 == 0 and B2 == 0 saves some computation time
if (C4 == 0) {
Fz_C4_Be = 0;
}
if (B2 == 0 || !bool_calculate_xy) {
Fz_B2_Be = 0;
}
z1 = z + c1 * vz * dt;
if (C4 != 0 && !bool_calculate_xy) {
Fz_C4_Be = (-4) * q * C4 * U0 * z1 * z1 * z1;
}
else if (C4 != 0 && bool_calculate_xy) {
Fz_C4_Be = q * C4 * U0 * (-4 * z1 * z1 * z1 + 6 * z1 * (x * x + y * y));
}
if (B2 != 0 && bool_calculate_xy) {
Fz_B2_Be = q * B2 * (-vx * z1 * y + vy * z1 * x);
}
acc_z1 = (qC2U0 * (-2) * z1 + Find + Fz_C4_Be + Fz_B2_Be + Fheating) / m;
vz1 = vz + d1 * acc_z1 * dt;
z2 = z1 + c2 * vz1 * dt;
if (C4 != 0 && !bool_calculate_xy) {
Fz_C4_Be = (-4) * q * C4 * U0 * z2 * z2 * z2;
}
else if (C4 != 0 && bool_calculate_xy) {
Fz_C4_Be = q * C4 * U0 * (-4 * z2 * z2 * z2 + 6 * z2 * (x * x + y * y));
}
if (B2 != 0 && bool_calculate_xy) {
Fz_B2_Be = q * B2 * (-vx * z2 * y + vy * z2 * x);
}
acc_z2 = (qC2U0 * (-2) * z2 + Find + Fz_C4_Be + Fz_B2_Be + Fheating) / m;
vz2 = vz1 + d2 * acc_z2 * dt;
z3 = z2 + c3 * vz2 * dt;
if (C4 != 0 && !bool_calculate_xy) {
Fz_C4_Be = (-4) * q * C4 * U0 * z3 * z3 * z3;
}
else if (C4 != 0 && bool_calculate_xy) {
Fz_C4_Be = q * C4 * U0 * (-4 * z3 * z3 * z3 + 6 * z3 * (x * x + y * y));
}
if (B2 != 0 && bool_calculate_xy) {
Fz_B2_Be = q * B2 * (-vx * z3 * y + vy * z3 * x);
}
acc_z3 = (qC2U0 * (-2) * z3 + Find + Fz_C4_Be + Fz_B2_Be + Fheating) / m;
vz3 = vz2 + d3 * acc_z3 * dt;
z = z3 + c4 * vz3 * dt;
vz = vz3;
}
Optimization is hard, even for compilers. Here are some optimization tips:
Since performStep is the hotspot, put it in a header file (if you split declaration and definition into header/source), add the inline keyword, and pass F_external by value instead of by reference, like:
// in file xxx.h
inline void performStep(Particle &p, double F_external){
p.x += -0.2*p.x + F_external/p.m; // boiled down, in reality complex calculation, not important here
}
Upgrade your compiler, perhaps to the latest version.
Use https://godbolt.org/ to check the generated assembly. In this case, unnecessary dereferences are what hurts performance.
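A small sketch of why references to small scalars can cost time, assuming the slowdown comes from aliasing (the "unnecessary dereference" mentioned above): when a double is passed by reference, the compiler must allow for it referring to the same memory the function writes, so it has to re-read it after every store. The function names are hypothetical; the tests show that the two versions really behave differently when the reference aliases p.x, which is exactly what stops the compiler from keeping the value in a register. The same reasoning plausibly applies to performLeapfrog_z, where every const double& parameter could alias the writes to z, vz and the member temporaries (z1, acc_z1, ...); copying inputs into local doubles and keeping temporaries local is one thing to try, to be confirmed on godbolt.org.

```cpp
#include <cassert>

struct Particle {
    double x; // coordinate
    double m; // mass
};

// F by const reference: F might refer to p.x itself, so after each write
// to p.x the compiler must re-read F from memory.
void stepByRef(Particle& p, const double& F) {
    p.x += F / p.m;
    p.x += F / p.m;
}

// F by value: F is a private copy, so it can stay in a register
// no matter what is written to p.x.
void stepByValue(Particle& p, double F) {
    p.x += F / p.m;
    p.x += F / p.m;
}
```

Without aliasing both functions give the same result; only the by-value version lets the compiler assume that.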

C++ Image interpolation with Bicubic method

I am trying to smooth an image with bicubic interpolation. I found some code that interpolates RGB images and changed it to work for grayscale images, but as a result I only get a fully black image. The input and output images are the same size. The code is pasted below. Please help me. Thanks in advance.
inline Uint16 saturate(float x, unsigned max_pixel)
{
return x > max_pixel ? max_pixel
: x < 0.0f ? 0
: Uint16(x);
}
inline float get_subpixel(const Uint16* in, std::size_t dest_width, std::size_t dest_height, unsigned x, unsigned y)
{
if (x < dest_width && y < dest_height)
return in[(y * dest_width) + x];
return 0;
}
void interpolate(unsigned dest_width, unsigned dest_height, unsigned bits_allocated, const Uint16* src, Uint16** dest)
{
const double tx = 1;
const double ty = 1;
float C[5] = { 0 };
unsigned max_bit = pow(2, bits_allocated);
for (unsigned i = 0; i < dest_height; ++i)
{
for (unsigned j = 0; j < dest_width; ++j)
{
const float x = float(tx * j);
const float y = float(ty * i);
const float dx = tx * j - x, dx2 = dx * dx, dx3 = dx2 * dx;
const float dy = ty * i - y, dy2 = dy * dy, dy3 = dy2 * dy;
for (int jj = 0; jj < 4; ++jj)
{
const int idx = y - 1 + jj;
float a0 = get_subpixel(src, dest_width, dest_height, x, idx);
float d0 = get_subpixel(src, dest_width, dest_height, x - 1, idx) - a0;
float d2 = get_subpixel(src, dest_width, dest_height, x + 1, idx) - a0;
float d3 = get_subpixel(src, dest_width, dest_height, x + 2, idx) - a0;
float a1 = -(1.0f / 3.0f) * d0 + d2 - (1.0f / 6.0f) * d3;
float a2 = 0.5f * d0 + 0.5f * d2;
float a3 = -(1.0f / 6.0f) * d0 - 0.5f * d2 + (1.0f / 6.0f) * d3;
C[jj] = a0 + a1 * dx + a2 * dx2 + a3 * dx3;
d0 = C[0] - C[1];
d2 = C[2] - C[1];
d3 = C[3] - C[1];
a0 = C[1];
a1 = -(1.0f / 3.0f) * d0 + d2 - (1.0f / 6.0f) * d3;
a2 = 0.5f * d0 + 0.5f * d2;
a3 = -(1.0f / 6.0f) * d0 - 0.5f * d2 + (1.0f / 6.0f) * d3;
(*dest)[i * dest_width + j] = saturate(a0 + a1 * dy + a2 * dy2 + a3 * dy3, max_bit);
}
}
}
}
How can this work? The C values haven't all been computed until the jj loop ends, so the loop's closing brace should come before the d's (the line d0 = C[0] - C[1];). I'm not checking whether the method is otherwise correct.
for (int jj = 0; jj < 4; ++jj)
{
const int idx = y - 1 + jj;
float a0 = get_subpixel(src, dest_width, dest_height, x, idx);
float d0 = get_subpixel(src, dest_width, dest_height, x - 1, idx) - a0;
float d2 = get_subpixel(src, dest_width, dest_height, x + 1, idx) - a0;
float d3 = get_subpixel(src, dest_width, dest_height, x + 2, idx) - a0;
float a1 = -(1.0f / 3.0f) * d0 + d2 - (1.0f / 6.0f) * d3;
float a2 = 0.5f * d0 + 0.5f * d2;
float a3 = -(1.0f / 6.0f) * d0 - 0.5f * d2 + (1.0f / 6.0f) * d3;
C[jj] = a0 + a1 * dx + a2 * dx2 + a3 * dx3;
} // end jj: moved up here, so all four C values exist before they are used
// declared again: the variables above went out of scope with the jj loop
const float d0 = C[0] - C[1];
const float d2 = C[2] - C[1];
const float d3 = C[3] - C[1];
const float a0 = C[1];
const float a1 = -(1.0f / 3.0f) * d0 + d2 - (1.0f / 6.0f) * d3;
const float a2 = 0.5f * d0 + 0.5f * d2;
const float a3 = -(1.0f / 6.0f) * d0 - 0.5f * d2 + (1.0f / 6.0f) * d3;
(*dest)[i * dest_width + j] = saturate(a0 + a1 * dy + a2 * dy2 + a3 * dy3, max_bit); // row-major: row i starts at i * width
}
}
I wanted to share a great link:
cubic splines
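For the grayscale case it may help to factor the 4-point cubic out into a helper, so that the horizontal and vertical passes cannot get tangled. The a0..a3 coefficients in the question are those of the cubic through the four samples at offsets -1, 0, 1, 2 (a Lagrange cubic), so the sketch below reproduces linear and cubic data exactly. Note that in the question tx = ty = 1, so dx and dy are always 0 and the interpolation returns the input pixels unchanged; it cannot smooth anything at integer positions. Assumptions in the sketch: the image is a row-major std::vector<float>, out-of-range samples are clamped to the edge (the question returns 0 instead, which darkens the borders), C++17 for std::clamp, and cubic4, sample and bicubic_at are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Cubic through samples at offsets -1, 0, 1, 2, evaluated at t in [0, 1).
// Same a1..a3 as in the question, evaluated in Horner form.
inline float cubic4(float pm1, float p0, float p1, float p2, float t) {
    const float d0 = pm1 - p0, d2 = p1 - p0, d3 = p2 - p0;
    const float a1 = -(1.0f / 3.0f) * d0 + d2 - (1.0f / 6.0f) * d3;
    const float a2 = 0.5f * d0 + 0.5f * d2;
    const float a3 = -(1.0f / 6.0f) * d0 - 0.5f * d2 + (1.0f / 6.0f) * d3;
    return p0 + t * (a1 + t * (a2 + t * a3));
}

// Row-major grayscale sample; coordinates are clamped to the image (assumption).
inline float sample(const std::vector<float>& img, int w, int h, int x, int y) {
    x = std::clamp(x, 0, w - 1);
    y = std::clamp(y, 0, h - 1);
    return img[static_cast<std::size_t>(y) * w + x];
}

// Bicubic value at (fx, fy): four horizontal cubics (one per row), then one vertical cubic.
float bicubic_at(const std::vector<float>& img, int w, int h, float fx, float fy) {
    const int x = static_cast<int>(std::floor(fx));
    const int y = static_cast<int>(std::floor(fy));
    const float dx = fx - x, dy = fy - y;
    float C[4];
    for (int jj = 0; jj < 4; ++jj) {
        const int row = y - 1 + jj;
        C[jj] = cubic4(sample(img, w, h, x - 1, row), sample(img, w, h, x, row),
                       sample(img, w, h, x + 1, row), sample(img, w, h, x + 2, row), dx);
    }
    // The vertical pass runs only after all four C values are filled in.
    return cubic4(C[0], C[1], C[2], C[3], dy);
}
```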

Calculating an exact spline with 3 given points in 2D. C++

I have a std::vector with 3 points (2D) with values x >= 0 and x <= 512.
With these 3 points I have to calculate a curve that passes through all 3 points.
Here you see the 3 points and the corresponding circle. I need a function to interpolate the points based on a variable which defines the accuracy (e.g. the number of points in between).
If it's not clear: I work in C++.
To solve your issue, you need to calculate the center of the triangle's circumscribed circle and its radius. Then find the minimum and maximum X of the triangle's vertices, compute the delta maxX - minX, and divide it by the number of steps you want. Then loop from minX to maxX and calculate each coordinate using the circle equation R^2 = (x - centerX)^2 + (y - centerY)^2.
Below is a small example:
#include <algorithm> // std::min, std::max
#include <cstdio>    // printf
#include <iostream>
#include <vector>
#include <math.h>
template <typename T>
class CPoint2D
{
public:
CPoint2D(T _x, T _y)
: x(_x)
, y(_y)
{}
~CPoint2D()
{}
const T& X() const { return x; }
const T& Y() const { return y; }
private:
T x;
T y;
};
typedef CPoint2D<float> CPoint2Df;
bool GetCenterCircumscribedCircle(float x0, float y0,
float x1, float y1,
float x2, float y2,
float& centerX, float& centerY, float& radius)
{
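float D = 2.0f * (y0 * x2 + y1 * x0 - y1 * x2 - y0 * x1 - y2 * x0 + y2 * x1);
// D == 0 means the three points are collinear, so no circle passes through them
// (this also covers slanted lines, not only vertical or horizontal ones)
if (fabs(D) < 1e-6f)
{
return false;
}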
centerX = ( y1 * x0 * x0
- y2 * x0 * x0
- y1 * y1 * y0
+ y2 * y2 * y0
+ x1 * x1 * y2
+ y0 * y0 * y1
+ x2 * x2 * y0
- y2 * y2 * y1
- x2 * x2 * y1
- x1 * x1 * y0
+ y1 * y1 * y2
- y0 * y0 * y2) / D;
centerY = ( x0 * x0 * x2
+ y0 * y0 * x2
+ x1 * x1 * x0
- x1 * x1 * x2
+ y1 * y1 * x0
- y1 * y1 * x2
- x0 * x0 * x1
- y0 * y0 * x1
- x2 * x2 * x0
+ x2 * x2 * x1
- y2 * y2 * x0
+ y2 * y2 * x1) / D;
radius = sqrt((x0 - centerX) * (x0 - centerX) + (y0 - centerY) * (y0 - centerY));
return true;
}
void CalculatePointsOnCirle(const std::vector<CPoint2Df>& triVertexes, std::vector<CPoint2Df>& outPoints, float stride)
{
if (triVertexes.size() != 3)
{
return;
}
const CPoint2Df& v1 = triVertexes[0];
const CPoint2Df& v2 = triVertexes[1];
const CPoint2Df& v3 = triVertexes[2];
float minX = std::min(v1.X(), v2.X());
minX = std::min(minX, v3.X());
float maxX = std::max(v1.X(), v2.X());
maxX = std::max(maxX, v3.X());
float deltaX = (maxX - minX) / stride;
float centerX;
float centerY;
float radius;
if (GetCenterCircumscribedCircle(v1.X(), v1.Y(),
v2.X(), v2.Y(),
v3.X(), v3.Y(),
centerX, centerY, radius))
{
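for (float x = minX; x < maxX; x += deltaX)
{
// upper half of the circle only (the arc in the example lies above the center);
// use centerY - sqrt(t) for an arc below the center
float t = radius * radius - (x - centerX) * (x - centerX);
if (t < 0.0f) t = 0.0f; // rounding can make t slightly negative near the ends
float y = centerY + sqrt(t);
outPoints.push_back(CPoint2Df(x, y));
}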
}
}
int main(int argc, const char * argv[])
{
std::vector<CPoint2Df> triVertex = {CPoint2Df(0.0f, 0.0f),
CPoint2Df(256.0f, 256.0f),
CPoint2Df(512.0f, 0.0f)};
std::vector<CPoint2Df> outPoints;
CalculatePointsOnCirle(triVertex, outPoints, 4);
for (unsigned int i = 0; i < outPoints.size(); ++i)
{
printf("p[%u]: (%f, %f)\n", i, outPoints[i].X(), outPoints[i].Y());
}
return 0;
}
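Stepping along x gives one y per x, so it can only follow one half of the circle, and the points end up unevenly spaced along the curve. An alternative (a swapped-in technique, not the answer's method) is to step the angle around the circumcenter from the first point, through the second, to the third. The sketch assumes the points are given in drawing order a, b, c, uses double, and uses the standard circumcenter formula, which is algebraically the same as the expanded one above. P2, circumcircle and arcThrough are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct P2 {
    double x, y;
};

// Circumcircle of a, b, c. Returns false for collinear points.
bool circumcircle(P2 a, P2 b, P2 c, P2& center, double& r) {
    const double d = 2.0 * (a.x * (b.y - c.y) + b.x * (c.y - a.y) + c.x * (a.y - b.y));
    if (std::fabs(d) < 1e-12) return false;
    const double a2 = a.x * a.x + a.y * a.y;
    const double b2 = b.x * b.x + b.y * b.y;
    const double c2 = c.x * c.x + c.y * c.y;
    center.x = (a2 * (b.y - c.y) + b2 * (c.y - a.y) + c2 * (a.y - b.y)) / d;
    center.y = (a2 * (c.x - b.x) + b2 * (a.x - c.x) + c2 * (b.x - a.x)) / d;
    r = std::hypot(a.x - center.x, a.y - center.y);
    return true;
}

// n + 1 points (n >= 1) on the arc that starts at a, passes through b and
// ends at c, evenly spaced by angle. Empty result for collinear input.
std::vector<P2> arcThrough(P2 a, P2 b, P2 c, int n) {
    std::vector<P2> out;
    P2 o{};
    double r = 0.0;
    if (n < 1 || !circumcircle(a, b, c, o, r)) return out;
    const double twoPi = 2.0 * std::acos(-1.0);
    auto angle = [&](P2 p) { return std::atan2(p.y - o.y, p.x - o.x); };
    // Counter-clockwise sweep from angle 'from' to angle 'to', in [0, 2*pi).
    auto ccw = [&](double from, double to) {
        const double s = std::fmod(to - from, twoPi);
        return s < 0.0 ? s + twoPi : s;
    };
    const double start = angle(a);
    double sweep = ccw(start, angle(c));
    // If b is not on the counter-clockwise arc from a to c, go clockwise instead.
    if (ccw(start, angle(b)) > sweep) sweep -= twoPi;
    for (int i = 0; i <= n; ++i) {
        const double t = start + sweep * i / n;
        out.push_back({o.x + r * std::cos(t), o.y + r * std::sin(t)});
    }
    return out;
}
```

Here n plays the role of the accuracy variable from the question: n + 1 points, both end points included.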

Bi-Cubic Interpolation Algorithm for Image Scaling

I'm trying to write a basic bicubic resize algorithm to resize a 24-bit RGB bitmap. I have a general understanding of the math involved, and I'm using this implementation from Google Code as a guide. I'm not using any external libraries here - I'm just experimenting with the algorithm itself. The bitmap is represented as a plain std::vector<unsigned char>:
inline unsigned char getpixel(const std::vector<unsigned char>& in,
std::size_t src_width, std::size_t src_height, unsigned x, unsigned y, int channel)
{
if (x < src_width && y < src_height)
return in[(x * 3 * src_width) + (3 * y) + channel];
return 0;
}
std::vector<unsigned char> bicubicresize(const std::vector<unsigned char>& in,
std::size_t src_width, std::size_t src_height, std::size_t dest_width, std::size_t dest_height)
{
std::vector<unsigned char> out(dest_width * dest_height * 3);
const float tx = float(src_width) / dest_width;
const float ty = float(src_height) / dest_height;
const int channels = 3;
const std::size_t row_stride = dest_width * channels;
unsigned char C[5] = { 0 };
for (int i = 0; i < dest_height; ++i)
{
for (int j = 0; j < dest_width; ++j)
{
const int x = int(tx * j);
const int y = int(ty * i);
const float dx = tx * j - x;
const float dy = ty * i - y;
for (int k = 0; k < 3; ++k)
{
for (int jj = 0; jj < 4; ++jj)
{
const int z = y - 1 + jj;
unsigned char a0 = getpixel(in, src_width, src_height, z, x, k);
unsigned char d0 = getpixel(in, src_width, src_height, z, x - 1, k) - a0;
unsigned char d2 = getpixel(in, src_width, src_height, z, x + 1, k) - a0;
unsigned char d3 = getpixel(in, src_width, src_height, z, x + 2, k) - a0;
unsigned char a1 = -1.0 / 3 * d0 + d2 - 1.0 / 6 * d3;
unsigned char a2 = 1.0 / 2 * d0 + 1.0 / 2 * d2;
unsigned char a3 = -1.0 / 6 * d0 - 1.0 / 2 * d2 + 1.0 / 6 * d3;
C[jj] = a0 + a1 * dx + a2 * dx * dx + a3 * dx * dx * dx;
d0 = C[0] - C[1];
d2 = C[2] - C[1];
d3 = C[3] - C[1];
a0 = C[1];
a1 = -1.0 / 3 * d0 + d2 -1.0 / 6 * d3;
a2 = 1.0 / 2 * d0 + 1.0 / 2 * d2;
a3 = -1.0 / 6 * d0 - 1.0 / 2 * d2 + 1.0 / 6 * d3;
out[i * row_stride + j * channels + k] = a0 + a1 * dy + a2 * dy * dy + a3 * dy * dy * dy;
}
}
}
}
return out;
}
Problem: When I use this algorithm to downscale an image, it works except the output image contains all black pixels on the right side for some reason, giving the appearance that it's been "cropped".
Example: input image and output image (images not included).
Question: Reviewing the algorithm, I can't see why this would happen. Does anyone see the flaw here?
Try not exchanging width and height:
for (int i = 0; i < dest_width; ++i)
{
for (int j = 0; j < dest_height; ++j)
I suggest not using this function, because it is written very badly. You need to do two convolutions: first along X, then along Y. In this function both convolutions are done at the same time, which makes it very slow. And if you look at the jj loop body, you will notice that the whole second part of the body, beginning from "d0 = C[0] - C[1];", could be moved outside the jj loop, because only the last iteration of the loop affects the out[] array (the results of all previous iterations are overwritten).
You should swap x and z when you call getpixel, and inside getpixel you should index the array using:
[(y * 3 * src_width) + (3 * x) + channel]
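In the call getpixel(in, src_width, src_height, z, x, k):
z is the row (vertical offset)
x is the column (horizontal offset)
The original getpixel already indexes row z and column x correctly, but it checks z against src_width and x against src_height. For an image that is wider than it is tall, every column x >= src_height therefore reads as 0, which produces the black strip on the right. So only the getpixel function needs to be patched; below is the patched code: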
inline unsigned char getpixel(const std::vector<unsigned char>& in,
std::size_t src_width, std::size_t src_height, unsigned y, unsigned x, int channel)
{
if (x < src_width && y < src_height)
return in[(y * 3 * src_width) + (3 * x) + channel];
return 0;
}
std::vector<unsigned char> bicubicresize(const std::vector<unsigned char>& in,
std::size_t src_width, std::size_t src_height, std::size_t dest_width, std::size_t dest_height)
{
std::vector<unsigned char> out(dest_width * dest_height * 3);
const float tx = float(src_width) / dest_width;
const float ty = float(src_height) / dest_height;
const int channels = 3;
const std::size_t row_stride = dest_width * channels;
unsigned char C[5] = { 0 };
for (int i = 0; i < dest_height; ++i)
{
for (int j = 0; j < dest_width; ++j)
{
const int x = int(tx * j);
const int y = int(ty * i);
const float dx = tx * j - x;
const float dy = ty * i - y;
for (int k = 0; k < 3; ++k)
{
for (int jj = 0; jj < 4; ++jj)
{
const int z = y - 1 + jj;
unsigned char a0 = getpixel(in, src_width, src_height, z, x, k);
unsigned char d0 = getpixel(in, src_width, src_height, z, x - 1, k) - a0;
unsigned char d2 = getpixel(in, src_width, src_height, z, x + 1, k) - a0;
unsigned char d3 = getpixel(in, src_width, src_height, z, x + 2, k) - a0;
unsigned char a1 = -1.0 / 3 * d0 + d2 - 1.0 / 6 * d3;
unsigned char a2 = 1.0 / 2 * d0 + 1.0 / 2 * d2;
unsigned char a3 = -1.0 / 6 * d0 - 1.0 / 2 * d2 + 1.0 / 6 * d3;
C[jj] = a0 + a1 * dx + a2 * dx * dx + a3 * dx * dx * dx;
d0 = C[0] - C[1];
d2 = C[2] - C[1];
d3 = C[3] - C[1];
a0 = C[1];
a1 = -1.0 / 3 * d0 + d2 -1.0 / 6 * d3;
a2 = 1.0 / 2 * d0 + 1.0 / 2 * d2;
a3 = -1.0 / 6 * d0 - 1.0 / 2 * d2 + 1.0 / 6 * d3;
out[i * row_stride + j * channels + k] = a0 + a1 * dy + a2 * dy * dy + a3 * dy * dy * dy;
}
}
}
}
return out;
}
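Beyond the indexing fix, the bicubicresize code above keeps its intermediate values (d0..d3, a0..a3 and C) in unsigned char, so negative differences wrap around, and it writes out[] on every jj iteration. Below is a sketch of the same per-pixel scheme with float intermediates, the vertical pass moved after the jj loop, and the result clamped to 0..255 before conversion. Assumptions: row-major RGB with 3 bytes per pixel and no row padding, edge clamping instead of returning 0 for out-of-range pixels, and C++17 for std::clamp; px, cubic4 and bicubicResize are hypothetical names. It still evaluates both passes per output pixel rather than doing two separate image-wide passes as suggested above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Channel c of pixel (x, y) in a row-major RGB buffer (3 bytes per pixel, no padding).
// Out-of-range coordinates are clamped to the nearest edge pixel (assumption).
inline float px(const std::vector<unsigned char>& in, int w, int h, int x, int y, int c) {
    x = std::clamp(x, 0, w - 1);
    y = std::clamp(y, 0, h - 1);
    return in[(static_cast<std::size_t>(y) * w + x) * 3 + c];
}

// 4-point cubic with the question's coefficients, evaluated in float at t in [0, 1).
inline float cubic4(float pm1, float p0, float p1, float p2, float t) {
    const float d0 = pm1 - p0, d2 = p1 - p0, d3 = p2 - p0;
    const float a1 = -(1.0f / 3.0f) * d0 + d2 - (1.0f / 6.0f) * d3;
    const float a2 = 0.5f * d0 + 0.5f * d2;
    const float a3 = -(1.0f / 6.0f) * d0 - 0.5f * d2 + (1.0f / 6.0f) * d3;
    return p0 + t * (a1 + t * (a2 + t * a3));
}

std::vector<unsigned char> bicubicResize(const std::vector<unsigned char>& in, int src_w, int src_h,
                                         int dst_w, int dst_h) {
    std::vector<unsigned char> out(static_cast<std::size_t>(dst_w) * dst_h * 3);
    const float tx = float(src_w) / dst_w;
    const float ty = float(src_h) / dst_h;
    for (int i = 0; i < dst_h; ++i) {          // i: output row
        for (int j = 0; j < dst_w; ++j) {      // j: output column
            const int x = int(tx * j), y = int(ty * i);
            const float dx = tx * j - x, dy = ty * i - y;
            for (int c = 0; c < 3; ++c) {
                float C[4];
                for (int jj = 0; jj < 4; ++jj) { // horizontal pass on rows y-1 .. y+2
                    const int z = y - 1 + jj;
                    C[jj] = cubic4(px(in, src_w, src_h, x - 1, z, c), px(in, src_w, src_h, x, z, c),
                                   px(in, src_w, src_h, x + 1, z, c), px(in, src_w, src_h, x + 2, z, c), dx);
                }
                const float v = cubic4(C[0], C[1], C[2], C[3], dy); // vertical pass
                // Clamp to 0..255, then round to the nearest integer.
                out[(static_cast<std::size_t>(i) * dst_w + j) * 3 + c] =
                    static_cast<unsigned char>(std::clamp(v, 0.0f, 255.0f) + 0.5f);
            }
        }
    }
    return out;
}
```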