Wrote some Perlin-noise-like code, but it looks blocky - C++

The previously answered question "Blocky" Perlin noise doesn't seem to answer my problem.
I tried to simplify my code as much as I could to make it readable and understandable.
I don't use the permutation table; instead I use the mt19937 generator.
I use SFML.
#include <SFML/Graphics.hpp>
#include <random>
#include <vector>
using namespace std;
using namespace sf;
typedef Vector2f Vec2;
Sprite spr;
Texture tx;
// dot product
float prod(Vec2 a, Vec2 b) { return a.x*b.x + a.y*b.y; }
// linear interpolation
float interp(float start, float end, float coef) { return coef*(end - start) + start; }
// get the noise of a certain pixel, given its relative position vector in the square, with values in [0.0, 1.0]
float getnoise(Vec2& A, Vec2& B, Vec2& C, Vec2& D, Vec2 rel) {
    float
        dot_a = prod(A, Vec2(rel.x,     rel.y)),
        dot_b = prod(B, Vec2(rel.x - 1, rel.y)),
        dot_c = prod(C, Vec2(rel.x,     rel.y - 1)),
        dot_d = prod(D, Vec2(rel.x - 1, rel.y - 1));
    return interp(interp(dot_a, dot_b, rel.x), interp(dot_c, dot_d, rel.x), rel.y);
}
// calculate the [0.0, 1.0] relative position of a pixel inside its cell
Vec2 getrel(int i, int j, float cellsize) {
    return Vec2(
        float(i                          // which pixel
              - (i / int(cellsize))      // which cell
                * cellsize)              // floor() equivalent
            / cellsize,                  // [0, 1] range
        float(j - (j / int(cellsize)) * cellsize) / cellsize
    );
}
// generates an array of random float values
vector<float> seeded_rand_float(unsigned int seed, int many) {
    vector<float> ret;
    std::mt19937 rr;
    std::uniform_real_distribution<float> dist(0, 1.0);
    rr.seed(seed);
    for (int j = 0; j < many; ++j)
        ret.push_back(dist(rr));
    return ret;
}
// use the above function to generate an array of random vectors with [0.0, 1.0] values
vector<Vec2> seeded_rand_vec2(unsigned int seed, int many) {
    auto coeffs1 = seeded_rand_float(seed, many*2);
    // auto coeffs2 = seeded_rand_float(seed+1, many); // bad choice!
    vector<Vec2> pushere;
    // alternative: pushere.push_back(Vec2(coeffs1[i], coeffs2[i]));
    for (int i = 0; i < many; ++i)
        pushere.push_back(Vec2(coeffs1[2*i], coeffs1[2*i+1]));
    return pushere;
}
// here we make the Perlin noise
void make_perlin()
{
    int seed = 43;
    int pixels = 400;    // how many pixels
    int divisions = 10;  // cell squares
    float cellsize = float(pixels) / divisions; // size of a cell
    auto randv = seeded_rand_vec2(seed, (divisions+1)*(divisions+1));
    // make the vectors lie in the [-1.0, 1.0] range
    for (auto& a : randv)
        a = a*2.0f - Vec2(1.f, 1.f);
    Image img;
    img.create(pixels, pixels, Color(0, 0, 0));
    // valid pixel indices are 0 .. pixels-1, so loop with < rather than <=
    // (with <=, i == pixels reads past the end of randv and of the image)
    for (int j = 0; j < pixels; ++j)
    {
        for (int i = 0; i < pixels; ++i)
        {
            int ii = int(i / cellsize); // cell index
            int jj = int(j / cellsize);
            // those are the nearest gradient vectors for the current pixel
            Vec2
                A = randv[divisions*jj + ii],
                B = randv[divisions*jj + ii + 1],
                C = randv[divisions*(jj+1) + ii],
                D = randv[divisions*(jj+1) + ii + 1];
            float val = getnoise(A, B, C, D, getrel(i, j, cellsize));
            // note: .5f*val + .7f can exceed 1.0, so val can exceed 255 here
            val = 255.f*(.5f*val + .7f);
            img.setPixel(i, j, Color(val, val, val));
        }
    }
    tx.loadFromImage(img);
    spr.setPosition(Vec2(10, 10));
    spr.setTexture(tx);
}
Here are the results. I included the resulting gradient vectors, multiplied by cellsize/2.
My question is: why are there white artifacts? You can somehow see the squares...
PS: It has been solved; I posted the fixed source here: http://pastebin.com/XHEpV2UP
Don't make the mistake of applying smooth interpolation to the result instead of to the coefficient. Normalizing the vectors or adding an offset to avoid zeroes doesn't seem to improve anything. Here is the colorized result:

The human eye is sensitive to discontinuities in the spatial derivative of luminance (brightness). The linear interpolation you're using here is enough to make the brightness continuous, but it does not make the derivative of the brightness continuous.
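Here is a small numerical check of this point. It is a sketch: the helper names (`mix`, `ease`, `noise1d`, `slopes`) are made up for illustration, and the cubic 3t^2 - 2t^3 serves as the example ease curve. The code interpolates the lattice values 0, 1, 0 in 1D and measures the one-sided slopes at the middle lattice point. With plain linear interpolation the slope jumps from about +1 to about -1 there, which shows up as a visible crease. With the eased weight, both one-sided slopes are close to 0.

```cpp
#include <cmath>

// Linear interpolation between a and b.
float mix(float a, float b, float t) { return a + t * (b - a); }

// Cubic ease 3t^2 - 2t^3: its slope is 0 at t = 0 and at t = 1.
float ease(float t) { return t * t * (3.0f - 2.0f * t); }

// 1D noise through lattice values v[0], v[1], v[2] at x = 0, 1, 2 (valid for 0 <= x < 2).
float noise1d(const float v[3], float x, bool eased)
{
    int i = static_cast<int>(std::floor(x));
    float t = x - i;
    return mix(v[i], v[i + 1], eased ? ease(t) : t);
}

// One-sided finite-difference slopes just left and right of the lattice point x = 1.
void slopes(const float v[3], bool eased, float& left, float& right)
{
    const float h = 1e-3f;
    left  = (noise1d(v, 1.0f, eased) - noise1d(v, 1.0f - h, eased)) / h;
    right = (noise1d(v, 1.0f + h, eased) - noise1d(v, 1.0f, eased)) / h;
}
```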
Perlin recommends using eased interpolation to get smoother results. You could use 3*t^2 - 2*t^3 (as suggested in the linked presentation) right in your interpolation function. That should solve the immediate issue.
That would look something like
// interpolation
float linear(float start,float end,float coef){return coef*(end-start)+start;}
float poly(float coef){return 3*coef*coef - 2*coef*coef*coef;}
float interp(float start,float end,float coef){return linear(start, end, poly(coef));}
But note that evaluating a polynomial for every interpolation is needlessly expensive. Usually (including here) this noise is being evaluated over a grid of pixels, with squares being some integer (or rational) number of pixels large; this means that rel.x, rel.y, rel.x-1, and rel.y-1 are quantized to particular possible values. You can make a lookup table for values of the polynomial ahead of time at those values, replacing the "poly" function in the code snippet provided. This technique lets you use even smoother (e.g. degree 5) easing functions at very little additional cost.
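Here is a sketch of such a lookup table. It assumes the cell is an integer number of pixels wide (40 in the question, 400/10). For the easing function it uses the degree-5 curve 6t^5 - 15t^4 + 10t^3 from Perlin's improved noise, as an example. `fade5`, `make_fade_table` and `interp_table` are hypothetical names, not part of the question's code. Entry k stores the eased weight for a pixel that lies k pixels into the cell.

```cpp
#include <vector>

// Quintic fade 6t^5 - 15t^4 + 10t^3 in Horner form; slope and curvature are 0 at t = 0 and t = 1.
float fade5(float t) { return t * t * t * (t * (t * 6.0f - 15.0f) + 10.0f); }

// Precompute the eased weight for every pixel offset inside a cell that is
// `cellsize` pixels wide: entry k holds fade5(k / cellsize).
std::vector<float> make_fade_table(int cellsize)
{
    std::vector<float> table(cellsize);
    for (int k = 0; k < cellsize; ++k)
        table[k] = fade5(float(k) / cellsize);
    return table;
}

// Interpolation that reads the already-eased weight from the table
// instead of evaluating the polynomial again.
float interp_table(float start, float end, const std::vector<float>& table, int offset)
{
    return table[offset] * (end - start) + start;
}
```

In the question's loop, the offsets would be `i % 40` and `j % 40`. That way the polynomial is evaluated 40 times per image rather than once per interpolation.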

Jerry is correct in his answer above. (I would simply have commented, but I'm still pretty new to StackOverflow and don't have enough reputation to comment yet.)
His solution of using:
(3*coef*coef) - (2*coef*coef*coef)
to smooth/curve the interpolation factor works.
A slightly better solution is to factor the equation as:
(3 - (2*coef)) * coef*coef
This is the same polynomial, so the resulting curve is identical apart from tiny floating-point rounding differences. It needs 2 fewer multiplications per interpolation (and still only a single subtraction), so it takes less computational effort.
This saving can really add up, especially when you use the noise function a lot, for instance if you start generating noise in more than 2 dimensions.
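Because 3c^2 - 2c^3 and (3 - 2c)c^2 are the same polynomial, any difference between the two comes only from floating-point rounding. Here is a small sketch that compares both forms over [0, 1]. The function names are illustrative.

```cpp
#include <cmath>

// Expanded form: 3c^2 - 2c^3 (five multiplications).
float ease_expanded(float c) { return 3 * c * c - 2 * c * c * c; }

// Factored form: (3 - 2c) * c^2 (three multiplications).
float ease_factored(float c) { return (3 - 2 * c) * c * c; }

// Largest absolute difference between the two forms over n + 1 evenly spaced samples in [0, 1].
float max_difference(int n)
{
    float worst = 0.0f;
    for (int k = 0; k <= n; ++k) {
        float c = float(k) / n;
        worst = std::fmax(worst, std::fabs(ease_expanded(c) - ease_factored(c)));
    }
    return worst;
}
```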

Related

Is there a library/code snippet that can convert meshes to an SDF in vector representation for a sub-voxel-exact representation?

I need to generate an SDF on a grid from a 2D mesh to represent the mesh as a closed body in Cinder.
My first approach was to use a (Euclidean) distance function to check whether a grid point is close to a mesh point and then set the value to - or +, but this resulted in bad resolution. Next I tried to add up distances to get a continuous distance field, which resulted in a blown-up object.
I am not sure how to represent the distance to a closed object described by a mesh (concave or convex). My current approach is shown in the code below.
#include <iostream>
#include <fstream>
#include <string>
#include <Eigen/Dense>
#include <vector>
#include <algorithm>
#include <random>
#include <cmath>
using namespace std;
using namespace Eigen;
typedef Eigen::Matrix<double, 2, 1> Vector2;
typedef Eigen::Matrix<double, 3, 2> Vector32;
typedef std::vector<Vector2, Eigen::aligned_allocator<Vector2> > Vector2List;
typedef std::vector<Eigen::Vector3i, Eigen::aligned_allocator<Eigen::Vector3i> > Vector3iList;
typedef std::vector<Vector32> Vector32List;
typedef Eigen::Array<double, Eigen::Dynamic, Eigen::Dynamic> grid_t;
// each entry of triangles describes which vertices belong to a triangle of the mesh.
// resolution and numberOfClamped were not declared in the snippet, so they are
// added here as a parameter and a local counter
void f(Vector2List vertices, Vector3iList triangles, int resolution)
{
    int numberOfClamped = 0;
    grid_t sdf = grid_t::Zero(resolution, resolution);
    for (int x = 0; x < resolution; ++x) {
        for (int y = 0; y < resolution; ++y) {
            Vector2d pos((x + 0.5) / resolution, (y + 0.5) / resolution);
            double dist = 1 / double(resolution*resolution);
            double check = 100;
            double val = 0;
            // Vector2List uses Eigen's aligned_allocator, so use its own iterator type
            for (Vector2List::iterator mean = vertices.begin(); mean != vertices.end(); ++mean) {
                // try sdf with euclidean distance function
                check = (pos - *mean).squaredNorm();
                if (check < dist) {
                    val = -1; break;
                }
                else {
                    val = 20;
                }
            }
            val *= resolution;
            static const double epsilon = 0.01;
            if (std::abs(val) < epsilon) {
                val = 0;
                numberOfClamped++;
            }
            sdf(x, y) = val;
        }
    }
}
It seems as if you have a slight misunderstanding of what the SDF actually is. So let me start with this.
The Signed Distance Function is a function over 2D space that gives you the distance of the respective point to the closest point on the mesh. The distance is positive for points outside of the mesh and negative for points inside (or the other way around). Naturally, points directly on the mesh will have zero distance. We can represent this function formally as:
sdf(x, y) = distance
This is a continuous function and we need a discrete representation that we can work with. A common choice is to use a uniform grid like the one that you want to use. We then sample the SDF at the grid points. Once we have distance values for all our grid points, we can interpolate the SDF between them to get the SDF everywhere. Note that each sample corresponds to a single point and not an area (e.g., a cell).
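Here is a sketch of that interpolation step: bilinear sampling of a grid of SDF values. It makes these assumptions:

- The grid has `res x res` samples, stored row-major in a `std::vector<double>`.
- Sample (i, j) sits at (i/(res-1), j/(res-1)) in the unit square, so samples lie on grid points rather than cell centers.

`sample_sdf` is a hypothetical name, and `std::clamp` needs C++17.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Bilinearly interpolate an SDF sampled on a res x res grid (res >= 2) whose
// sample (i, j) sits at (i / (res - 1), j / (res - 1)); storage is grid[j * res + i].
double sample_sdf(const std::vector<double>& grid, int res, double x, double y)
{
    // continuous grid coordinates, clamped so the 2x2 neighbourhood stays inside the grid
    double gx = std::clamp(x * (res - 1), 0.0, double(res - 1));
    double gy = std::clamp(y * (res - 1), 0.0, double(res - 1));
    int i = std::min(int(std::floor(gx)), res - 2);
    int j = std::min(int(std::floor(gy)), res - 2);
    double tx = gx - i, ty = gy - j;
    double d00 = grid[j * res + i],       d10 = grid[j * res + i + 1];
    double d01 = grid[(j + 1) * res + i], d11 = grid[(j + 1) * res + i + 1];
    double bottom = d00 + tx * (d10 - d00);
    double top    = d01 + tx * (d11 - d01);
    return bottom + ty * (top - bottom);
}
```

Bilinear interpolation reproduces linear functions exactly. So a distance that is linear in x, such as the signed distance to a vertical line, is recovered exactly between samples.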
With this in mind, let us take a look at your code:
Vector2d pos((x + 0.5) / resolution, (y + 0.5) / resolution);
This depends on how the grid point indices map to global coordinates, so it might be correct. However, it looks as if it assumes that the sample positions are located in the middle of the respective cells. Again, this might be correct, but I assume the + 0.5 should be left out.
for (std::vector<Vector2>::iterator mean = vertices.begin(); mean != vertices.end(); ++mean)
This is an approximation of the SDF. It calculates the closest vertex of the mesh and not the closest point (which may lie on an edge). For dense meshes, this should be fine. If you have coarse meshes, you should iterate the edges and calculate the closest points on these.
if (check < dist) {
val = -1; break;
} else {
val = 20;
}
I don't really know what this is. As explained above, the value of the SDF is the signed distance, not some arbitrary value. Also, the sign should not depend on whether the mesh is close to the grid position; it should depend on whether the grid position is inside the mesh. So, what you should have done instead is:
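// val must start out larger than any possible distance,
// e.g. std::numeric_limits<double>::infinity()
if (check < val * val) {
    // this point is closer than the current closest point
    val = std::sqrt(check); // set to absolute distance
    if (pos is inside the mesh)
        val *= -1; // invert the sign
}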
And finally, this piece:
val *= resolution;
static const double epsilon = 0.01;
if (abs(val) < epsilon) {
val = 0;
numberOfClamped++;
}
Again, I don't know what this is supposed to do. Just leave it out.
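Here is a minimal sketch that puts these pieces together for the 2D case. It rests on assumptions that are not in the question:

- The boundary of the 2D mesh is available as one closed polygon, with vertices in order. For a triangle mesh, the boundary edges are those that belong to only one triangle.
- Plain structs are used instead of Eigen.
- The inside test is the even-odd (crossing-number) rule.

As suggested above for coarse meshes, the distance is measured to the closest point on the edges. It is negative inside. `Pt`, `segment_distance`, `inside_polygon`, `signed_distance` and `make_sdf` are hypothetical names, and `std::clamp` needs C++17.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

struct Pt { double x, y; };

// Distance from p to the segment ab (the closest point may lie anywhere on the edge).
double segment_distance(Pt p, Pt a, Pt b)
{
    double abx = b.x - a.x, aby = b.y - a.y;
    double apx = p.x - a.x, apy = p.y - a.y;
    double len2 = abx * abx + aby * aby;
    double t = len2 > 0 ? std::clamp((apx * abx + apy * aby) / len2, 0.0, 1.0) : 0.0;
    double dx = apx - t * abx, dy = apy - t * aby;
    return std::sqrt(dx * dx + dy * dy);
}

// Even-odd rule: toggle for every edge that a horizontal ray from p to +x crosses.
bool inside_polygon(Pt p, const std::vector<Pt>& poly)
{
    bool inside = false;
    for (std::size_t i = 0, j = poly.size() - 1; i < poly.size(); j = i++) {
        const Pt& a = poly[i];
        const Pt& b = poly[j];
        if ((a.y > p.y) != (b.y > p.y) &&
            p.x < (b.x - a.x) * (p.y - a.y) / (b.y - a.y) + a.x)
            inside = !inside;
    }
    return inside;
}

// Signed distance: negative inside the polygon, positive outside.
double signed_distance(Pt p, const std::vector<Pt>& poly)
{
    double best = std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < poly.size(); ++i)
        best = std::min(best, segment_distance(p, poly[i], poly[(i + 1) % poly.size()]));
    return inside_polygon(p, poly) ? -best : best;
}

// Sample the SDF on a res x res grid (res >= 2) covering the unit square, row-major,
// with samples on grid points (no + 0.5 offset).
std::vector<double> make_sdf(const std::vector<Pt>& poly, int res)
{
    std::vector<double> sdf(res * res);
    for (int y = 0; y < res; ++y)
        for (int x = 0; x < res; ++x)
            sdf[y * res + x] = signed_distance(Pt{double(x) / (res - 1), double(y) / (res - 1)}, poly);
    return sdf;
}
```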

Intensity histogram (C++)

I'm writing my own intensity histogram for greyscale images, where the number of bins is passed into the function.
This is what I have so far:
std::vector<unsigned int> Image::histogram(const int bins)
{
    std::vector<unsigned int> histogram(bins, 0);
    for (unsigned int i(0); i < bins; i++)
    {
        for (unsigned int j(0); j < m_height * m_width; ++j)
        {
            if (i == m_p_image[j])
            {
                histogram[i]++;
            }
        }
    }
    return histogram;
}
This works perfectly for 256 bins, because every value gets counted. With 128 bins, however, it misses the second half of the image. I know I need a way of grouping values together when there are fewer than 256 bins, but I'm unsure how to do this.
Your code strikes me as unnecessarily clumsy. There's no real need for the outer loop.
To answer the question you asked, however, the usual way to do this would be to use linear interpolation--that is, find the proportional position of a value in the input range, then increment the same proportional position in the output range.
for (unsigned int j = 0; j < height * width; j++) {
    double input_pos = image[j] / 256.0;          // proportional position in [0, 1)
    int output_pos = int(input_pos * bin_count);  // same position in [0, bin_count)
    ++histogram[output_pos];
}
Given that these are colors, you could (if you chose to) apply a gamma curve instead of doing linear interpolation. The reason to do that would be if you wanted to model how you see colors instead of just basing the histogram on the input numbers themselves. The difference between the two is based on the fact that vision is something like logarithmic instead of linear, so a linear histogram (especially if you're using relatively few bins compared to the number of possible input values) doesn't represent what we see very accurately.
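For reference, here is a single-pass version of the whole function. It is a sketch that assumes 8-bit greyscale pixels stored in a `std::vector<unsigned char>` rather than the question's `m_p_image`/`m_width`/`m_height` members. It uses the integer form `v * bins / 256`, which picks the same bin as `int(v / 256.0 * bins)` in the loop above.

```cpp
#include <vector>

// One pass over the pixels: grey level v (0..255) goes to bin v * bins / 256,
// so with bins <= 256 each bin covers 256 / bins consecutive grey levels.
std::vector<unsigned int> histogram(const std::vector<unsigned char>& pixels, int bins)
{
    std::vector<unsigned int> counts(bins, 0);
    for (unsigned char v : pixels)
        ++counts[static_cast<int>(v) * bins / 256];
    return counts;
}
```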

Map float values (0.0 to 100.0) to RGB

I have around 1000 float values in the range (0.0, 100.0), and I want to map these values to colors (RGB). So far I have created a colormap with 1000 color (RGB) values. I use the float values to index the colormap and get an RGB value.
The problem is that I'm losing precision, since I cast the float values to int before using them as indices into my colormap. What is the best way to do this float-to-RGB conversion?
EDIT:
color color_list[100];
float float_values[1000];
for(i = 0 to 999)
{
    int colormap_idx = float_values[i]; // Note that the float is converted into an int
    color current_color = color_list[colormap_idx];
}
The total number of RGB values you can have is 256^3. It would be nice if you could use all of them, but it can be hard to come up with a nice intuitive mapping. A 32-bit float has 256^4 possible bit patterns, more than there are RGB values, so you will lose precision no matter what you do. You can still do much, much better than you do currently.
I don't know exactly what you are doing with the predefined color map. Instead, consider defining only a few intermediate colors that correspond to a few intermediate floating-point values, and interpolating between them for each input value. In the code below, fsample and csample are your corresponding points. For example:
fsample[0] = 0.0 -> csample[0] = (0, 0, 0)
fsample[1] = 0.25 -> csample[1] = (0, 0, 100)
fsample[2] = 0.5 -> csample[2] = (0, 170, 170)
fsample[3] = 0.75 -> csample[3] = (170, 170, 0)
fsample[4] = 1.0 -> csample[4] = (255, 255, 255)
This will allow you to cover a lot more ground in RGB space with floats, allowing a higher precision conversion, while still giving you some power to flexibly define intermediate colors. This is a fairly common method to convert grayscale to color.
There are a few optimizations and error checks you can apply to this code, but I left it unoptimized for the sake of clarity:
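#include <cstddef>
#include <vector>

struct color { float r, g, b; };
color operator*(color c, float s) { return color{c.r * s, c.g * s, c.b * s}; }
color operator+(color a, color b) { return color{a.r + b.r, a.g + b.g, a.b + b.b}; }

color interpolate(float flow, float fhigh, color clow, color chigh, float in)
{
    float t = (in - flow) / (fhigh - flow);
    return clow * (1 - t) + chigh * t;
}

color RGBFromFloat(float in, const float fsample[], const color csample[], int num_samples)
{
    // inputs outside the sampled range get the end colors
    if (in <= fsample[0]) return csample[0];
    if (in >= fsample[num_samples - 1]) return csample[num_samples - 1];
    // find the interval that the input 'in' lies in
    // this is a simple search on an ordered array...
    // consider replacing it with a better algorithm for a large number of samples
    for (int i = 0; i < num_samples - 1; i++)
    {
        if (fsample[i] <= in && in < fsample[i + 1])
            return interpolate(fsample[i], fsample[i + 1], csample[i], csample[i + 1], in);
    }
    return csample[num_samples - 1];
}

// inside some function: float_values is a std::vector<float> with values in [0, 100],
// divided by 100 to match the [0, 1] range of fsample
std::vector<color> colormap(float_values.size());
for (std::size_t i = 0; i < float_values.size(); i++)
{
    colormap[i] = RGBFromFloat(float_values[i] / 100.0f, fsample, csample, num_samples);
}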
I don't know if this is the best method (since you gave us no optimality criteria), but if by "I'm losing precision" you mean that once converted to int, you only have a maximum of 100 different color combinations, then you can do this:
// this code is C99
#define MAX_FLOAT_VAL 100.0
#define N_COLORS 2000
#define N_FLOAT_SAMPLES 1000
color color_list[N_COLORS];
float float_values[N_FLOAT_SAMPLES];
// the following loop must be placed in some function
for( int i = 0; i < N_FLOAT_SAMPLES; i++ )
{
// the following assignment will map
// linearly a float in the range [0 ... MAX_FLOAT_VAL]
// into an int in the range [0 ... (N_COLORS-1)]
int colormap_idx = (float_values[i] / MAX_FLOAT_VAL) * (N_COLORS - 1);
color current_color = color_list[colormap_idx];
// ... do something with current_color ...
}
Of course, you still have to generate the entries in color_list with a suitable algorithm (I advise against doing that by hand :-). That is a whole different problem, because it involves more "degrees of freedom": you are trying to map a 1-D space (the values of colormap_idx) to a 3-D space (the set of all possible RGB triples).
P.S.: The requirements you seem to have remind me of the computations needed to colorize a fractal, such as the graphic representation of the Mandelbrot set.
Hope this helps.
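To make the step of generating color_list concrete, here is a sketch that fills it by blending linearly between a few hand-picked color stops. This is the same idea as the fsample/csample pairs in the other answer. `RGB`, `Stop` and `build_colormap` are hypothetical names. The stops are assumed to be sorted, with at least two of them, the first at 0 and the last at 1.

```cpp
#include <cstddef>
#include <vector>

struct RGB { unsigned char r, g, b; };
struct Stop { float pos; RGB c; };   // pos in [0, 1], stops sorted by pos

// Fill n >= 2 colormap entries by blending linearly between neighbouring stops.
std::vector<RGB> build_colormap(const std::vector<Stop>& stops, int n)
{
    std::vector<RGB> map(n);
    for (int k = 0; k < n; ++k) {
        float x = float(k) / (n - 1);
        std::size_t s = 0;
        while (s + 2 < stops.size() && x > stops[s + 1].pos)
            ++s;                                   // find the segment containing x
        const Stop& a = stops[s];
        const Stop& b = stops[s + 1];
        float t = (x - a.pos) / (b.pos - a.pos);
        auto mix = [t](unsigned char u, unsigned char v) {
            return static_cast<unsigned char>(u + t * (v - u) + 0.5f);  // round to nearest
        };
        map[k] = RGB{mix(a.c.r, b.c.r), mix(a.c.g, b.c.g), mix(a.c.b, b.c.b)};
    }
    return map;
}
```

A call such as `build_colormap(stops, N_COLORS)` would then produce the table that `colormap_idx` indexes into.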

Artefacts in Interpolated Value Noise

I'm trying to create a basic value noise function. I've reached the point where it produces output, but unexpected artefacts pop up in it, such as diagonal discontinuous lines and blurs. I just can't find what's causing them. Could somebody please take a look and see whether I'm going wrong somewhere?
First off, here are three images of its output, with greater magnification on each one.
//data members
float m_amplitude, m_frequency;
int m_period; //controls the tile size of the noise
vector<vector<float>> m_points; //2D array to store the lattice
//The constructor generates the 2D square lattice and populates it.
//The initializer list stores the arguments; without it m_period would be read uninitialized below.
Noise2D(int period, float frequency, float amplitude)
    : m_amplitude(amplitude), m_frequency(frequency), m_period(period)
{
    //initialize the lattice to the appropriate NxN size
    m_points.resize(m_period);
    for (int i = 0; i < m_period; ++i)
        m_points[i].resize(m_period);
    //populates the lattice with values between 0 and 1
    int seed = 209;
    srand(seed);
    for (int i = 0; i < m_period; i++)
    {
        for (int j = 0; j < m_period; j++)
        {
            m_points[i][j] = abs(rand()/(float)RAND_MAX);
        }
    }
}
//Evaluates a position
float Evaluate(float x, float y)
{
    x *= m_frequency;
    y *= m_frequency;
    //Gets the integer values from each component
    int xFloor = (int) x;
    int yFloor = (int) y;
    //Gets the decimal data in the range of [0:1] for each of the components for interpolation
    float tx = x - xFloor;
    float ty = y - yFloor;
    //Finds the appropriate boundary lattice array indices using the modulus technique to ensure periodic noise.
    int xPeriodLower = xFloor % m_period;
    int xPeriodUpper;
    if (xPeriodLower == m_period - 1)
        xPeriodUpper = 0;
    else
        xPeriodUpper = xPeriodLower + 1;
    int yPeriodLower = yFloor % m_period;
    int yPeriodUpper;
    if (yPeriodLower == m_period - 1)
        yPeriodUpper = 0;
    else
        yPeriodUpper = yPeriodLower + 1;
    //The four random values at each boundary. The naming convention follows a single 2D coord system: 00 for bottom left, 11 for top right
    const float& random00 = m_points[xPeriodLower][yPeriodLower];
    const float& random10 = m_points[xPeriodUpper][yPeriodLower];
    const float& random01 = m_points[xPeriodLower][yPeriodUpper];
    const float& random11 = m_points[xPeriodUpper][yPeriodUpper];
    //Remap the weighting of each t dimension here if you wish to use an s-curve profile.
    float remappedTx = tx;
    float remappedTy = ty;
    return MyMath::Bilinear<float>(remappedTx, remappedTy, random00, random10, random01, random11) * m_amplitude;
}
Here are the two interpolation functions that it relies on.
template <class T1>
static T1 Bilinear(const T1 &tx, const T1 &ty, const T1 &p00, const T1 &p10, const T1 &p01, const T1 &p11)
{
return Lerp( Lerp(p00,p10,tx),
Lerp(p01,p11,tx),
ty);
}
template <class T1> //linear interpolation aka Mix
static T1 Lerp(const T1 &a, const T1 &b, const T1 &t)
{
return a * (1 - t) + b * t;
}
Some of the artifacts are the result of linear interpolation. Using a higher order interpolation method would help, but it will only solve part of the problem. Crudely put, sharp transitions in the signal can lead to artifacts.
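The question's Evaluate already has a hook for this (remappedTx / remappedTy). Here is a sketch of an eased version of Bilinear. It assumes the quintic fade 6t^5 - 15t^4 + 10t^3 as the higher-order curve; smoothstep, 3t^2 - 2t^3, would also work. `Fade` and `EasedBilinear` are illustrative names. This addresses only the interpolation part of the problem, not the regularity of the grid.

```cpp
// Quintic fade 6t^5 - 15t^4 + 10t^3: first and second derivatives are 0 at t = 0 and t = 1.
float Fade(float t)
{
    return t * t * t * (t * (t * 6.0f - 15.0f) + 10.0f);
}

// Linear interpolation, same formula as the question's Lerp.
float Lerp(float a, float b, float t) { return a * (1 - t) + b * t; }

// Same corner layout as the question's Bilinear (00 bottom left, 11 top right),
// but the weights are passed through Fade before interpolating.
float EasedBilinear(float tx, float ty, float p00, float p10, float p01, float p11)
{
    float sx = Fade(tx);
    float sy = Fade(ty);
    return Lerp(Lerp(p00, p10, sx), Lerp(p01, p11, sx), sy);
}
```

In Evaluate, this amounts to setting `remappedTx = Fade(tx)` and `remappedTy = Fade(ty)` before the existing Bilinear call.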
Additional artifacts result from distributing the starting noise values (i.e., the values you are interpolating between) at equal intervals, in this case on a grid. The highest and lowest values will only ever occur at these grid points, at least when using linear interpolation. Roughly speaking, patterns in the signal can lead to artifacts. I know of two potential ways to address this part of the problem: nonlinear interpolation, and randomly nudging the coordinates of the starting noise values to break up their regularity. You can use either or both.
Libnoise has an explanation of generating coherent noise that covers these problems and solutions in greater depth, with some nice illustrations. You could also peek at its source if you need to see how it deals with these problems. And as richard-tingle already mentioned, simplex noise was designed to correct the artifact problems inherent in Perlin noise. It's a little tougher to get your head around, but it's a solid technique.

Determining if a point is inside a polyhedron

I'm attempting to determine whether a specific point lies inside a polyhedron. In my current implementation, the method takes the point being tested and an array of the faces of the polyhedron (triangles in this case, but they could be other polygons later). I've been trying to work from the info found here: http://softsurfer.com/Archive/algorithm_0111/algorithm_0111.htm
Below, you'll see my "inside" method. I know that the nrml/normal thing is kind of weird; it's the result of old code. When I ran it, it seemed to always return true no matter what input I gave it. (This is solved; please see my answer below. This code is working now.)
bool Container::inside(Point* point, float* polyhedron[3], int faces) {
Vector* dS = Vector::fromPoints(point->X, point->Y, point->Z,
100, 100, 100);
float T_e = 0; // t is a float, so the entering/leaving parameters must be floats too
float T_l = 1;
for (int i = 0; i < faces; i++) {
float* polygon = polyhedron[i];
float* nrml = normal(&polygon[0], &polygon[1], &polygon[2]);
Vector* normal = new Vector(nrml[0], nrml[1], nrml[2]);
delete nrml;
float N = -((point->X-polygon[0][0])*normal->X +
(point->Y-polygon[0][1])*normal->Y +
(point->Z-polygon[0][2])*normal->Z);
float D = dS->dot(*normal);
if (D == 0) {
if (N < 0) {
return false;
}
continue;
}
float t = N/D;
if (D < 0) {
T_e = (t > T_e) ? t : T_e;
if (T_e > T_l) {
return false;
}
} else {
T_l = (t < T_l) ? t : T_l;
if (T_l < T_e) {
return false;
}
}
}
return true;
}
This is in C++ but as mentioned in the comments, it's really very language agnostic.
The link in your question has expired and I could not understand the algorithm from your code. Assuming you have a convex polyhedron with counterclockwise oriented faces (seen from outside), it should be sufficient to check that your point is behind all faces. To do that, you can take the vector from the point to each face and check the sign of the scalar product with the face's normal. If it is positive, the point is behind the face; if it is zero, the point is on the face; if it is negative, the point is in front of the face.
Here is some complete C++11 code that works with 3-point faces or with planar faces that have more points (only the first 3 points are considered). You can easily change `bound` to exclude the boundaries.
#include <vector>
#include <cassert>
#include <iostream>
#include <cmath>
#include <cstdlib> // std::strtod
struct Vector {
double x, y, z;
Vector operator-(Vector p) const {
return Vector{x - p.x, y - p.y, z - p.z};
}
Vector cross(Vector p) const {
return Vector{
y * p.z - p.y * z,
z * p.x - p.z * x,
x * p.y - p.x * y
};
}
double dot(Vector p) const {
return x * p.x + y * p.y + z * p.z;
}
double norm() const {
return std::sqrt(x*x + y*y + z*z);
}
};
using Point = Vector;
struct Face {
std::vector<Point> v;
Vector normal() const {
assert(v.size() > 2);
Vector dir1 = v[1] - v[0];
Vector dir2 = v[2] - v[0];
Vector n = dir1.cross(dir2);
double d = n.norm();
return Vector{n.x / d, n.y / d, n.z / d};
}
};
bool isInConvexPoly(Point const& p, std::vector<Face> const& fs) {
for (Face const& f : fs) {
Vector p2f = f.v[0] - p; // f.v[0] is an arbitrary point on f
double d = p2f.dot(f.normal());
d /= p2f.norm(); // for numeric stability
constexpr double bound = -1e-15; // use 1e-15 to exclude boundaries
if (d < bound)
return false;
}
return true;
}
int main(int argc, char* argv[]) {
assert(argc == 3+1);
char* end;
Point p;
p.x = std::strtod(argv[1], &end);
p.y = std::strtod(argv[2], &end);
p.z = std::strtod(argv[3], &end);
std::vector<Face> cube{ // faces with 4 points, last point is ignored
Face{{Point{0,0,0}, Point{1,0,0}, Point{1,0,1}, Point{0,0,1}}}, // front
Face{{Point{0,1,0}, Point{0,1,1}, Point{1,1,1}, Point{1,1,0}}}, // back
Face{{Point{0,0,0}, Point{0,0,1}, Point{0,1,1}, Point{0,1,0}}}, // left
Face{{Point{1,0,0}, Point{1,1,0}, Point{1,1,1}, Point{1,0,1}}}, // right
Face{{Point{0,0,1}, Point{1,0,1}, Point{1,1,1}, Point{0,1,1}}}, // top
Face{{Point{0,0,0}, Point{0,1,0}, Point{1,1,0}, Point{1,0,0}}}, // bottom
};
std::cout << (isInConvexPoly(p, cube) ? "inside" : "outside") << std::endl;
return 0;
}
Compile it with your favorite compiler
clang++ -Wall -std=c++11 code.cpp -o inpoly
and test it like
$ ./inpoly 0.5 0.5 0.5
inside
$ ./inpoly 1 1 1
inside
$ ./inpoly 2 2 2
outside
If your mesh is concave, and not necessarily watertight, that’s rather hard to accomplish.
As a first step, find the point on the surface of the mesh that is closest to the query point. You need to keep track of its location and of the specific feature it lies on: the middle of a face, an edge of the mesh, or one of the vertices of the mesh.
If the feature is a face, you're lucky: you can use the winding to find whether the point is inside or outside. Compute the normal of the face (it doesn't even need to be normalized; non-unit length will do), then compute dot( normal, pt - tri[0] ), where pt is your point and tri[0] is any vertex of the face. If the faces have consistent winding, the sign of that dot product tells you whether the point is inside or outside.
If the feature is an edge, compute the normals of both faces that share it (by normalizing a cross product), add them together, use the sum as the mesh normal, and compute the same dot product.
The hardest case is when a vertex is the closest feature. To compute the mesh normal at that vertex, sum the normals of the faces sharing that vertex, each weighted by that face's 2D angle at the vertex. For example, at a vertex of a cube with 3 neighboring triangles, the weights will be Pi/2. At a vertex of a cube with 6 neighboring triangles, the weights will be Pi/4. For real-life meshes the weights will be different for each face, in the range [ 0 .. +Pi ]. This means you're going to need some inverse trigonometry code for this case to compute the angle, probably acos().
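Here is a sketch of the angle-weighted normal for the vertex case. It assumes each face around the vertex is a triangle (v, a, b) wound counter-clockwise when seen from outside, so the cross product points outward. `V3`, `angle_weighted_normal` and the small vector helpers are hypothetical names. `std::clamp` (C++17) keeps the argument of acos inside [-1, 1] despite rounding. With outward normals, a positive dot(normal, pt - v) means outside.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <vector>

struct V3 { double x, y, z; };

V3 sub(V3 a, V3 b) { return V3{a.x - b.x, a.y - b.y, a.z - b.z}; }
double dot(V3 a, V3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
V3 cross(V3 a, V3 b) { return V3{a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x}; }
double length(V3 a) { return std::sqrt(dot(a, a)); }

// Sum of unit face normals, each weighted by that face's angle at vertex v.
// Each entry of `faces` holds the other two corners {a, b} of a triangle (v, a, b).
V3 angle_weighted_normal(V3 v, const std::vector<std::array<V3, 2>>& faces)
{
    V3 sum{0, 0, 0};
    for (const auto& f : faces) {
        V3 e1 = sub(f[0], v), e2 = sub(f[1], v);
        V3 n = cross(e1, e2);                       // outward for CCW winding
        double nl = length(n);
        double c = dot(e1, e2) / (length(e1) * length(e2));
        double angle = std::acos(std::clamp(c, -1.0, 1.0));  // this face's angle at v
        sum.x += angle * n.x / nl;
        sum.y += angle * n.y / nl;
        sum.z += angle * n.z / nl;
    }
    return sum;
}
```

For the cube-corner example with 3 triangles, each weight comes out as Pi/2 and the normal points along the corner's diagonal.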
If you want to know why that works, see e.g. “Generating Signed Distance Fields From Triangle Meshes” by J. Andreas Bærentzen and Henrik Aanæs.
I already answered this question a couple of years ago, but since then I've discovered a much better algorithm. It was invented in 2018; here's the link.
The idea is rather simple. Given the query point, compute the sum of the signed solid angles of all faces of the polyhedron as viewed from that point. If the point is outside, the sum must be zero. If the point is inside, the sum must be ±4·π steradians; whether it's + or - depends on the winding order of the faces.
That particular algorithm packs the polyhedron into a tree, which dramatically improves performance when you need multiple inside/outside queries for the same polyhedron. It only computes solid angles for individual faces when a face is very close to the query point. For large sets of faces far away from the query point, it instead uses an approximation of those sets, based on numbers stored in the nodes of the BVH tree built from the source mesh.
Two things keep the sum from ever being exactly 0 or ±4·π: the limited precision of FP math and, when the BVH tree is used, the error of its approximation. Still, a 2·π threshold works rather well in practice, at least in my experience. If the absolute value of the sum of solid angles is less than 2·π, consider the point to be outside.
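Here is a brute-force sketch of the solid-angle test, without the BVH acceleration that makes the 2018 algorithm fast, so each query costs O(number of triangles). It assumes a closed triangle mesh with consistent winding. Each triangle's signed solid angle comes from the Van Oosterom-Strackee formula, tan(Ω/2) = a·(b×c) / (|a||b||c| + (a·b)|c| + (a·c)|b| + (b·c)|a|), where a, b, c are the vectors from the query point to the corners. `P3`, `solid_angle` and `inside_by_solid_angle` are hypothetical names. The 2·π threshold is the one suggested above.

```cpp
#include <array>
#include <cmath>
#include <vector>

struct P3 { double x, y, z; };

// Signed solid angle of triangle (A, B, C) seen from p (Van Oosterom-Strackee formula).
double solid_angle(P3 p, P3 A, P3 B, P3 C)
{
    double ax = A.x - p.x, ay = A.y - p.y, az = A.z - p.z;
    double bx = B.x - p.x, by = B.y - p.y, bz = B.z - p.z;
    double cx = C.x - p.x, cy = C.y - p.y, cz = C.z - p.z;
    double la = std::sqrt(ax * ax + ay * ay + az * az);
    double lb = std::sqrt(bx * bx + by * by + bz * bz);
    double lc = std::sqrt(cx * cx + cy * cy + cz * cz);
    // triple product a . (b x c)
    double num = ax * (by * cz - bz * cy) + ay * (bz * cx - bx * cz) + az * (bx * cy - by * cx);
    double den = la * lb * lc + (ax * bx + ay * by + az * bz) * lc
               + (ax * cx + ay * cy + az * cz) * lb + (bx * cx + by * cy + bz * cz) * la;
    return 2.0 * std::atan2(num, den);
}

// Sum of signed solid angles over all triangles: about +-4*pi inside, about 0 outside.
bool inside_by_solid_angle(P3 p, const std::vector<std::array<P3, 3>>& tris)
{
    double total = 0.0;
    for (const auto& t : tris)
        total += solid_angle(p, t[0], t[1], t[2]);
    const double two_pi = 2.0 * std::acos(-1.0);
    return std::abs(total) > two_pi;
}
```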
It turns out that the problem was my reading of the algorithm referenced in the link above. I was reading:
N = - dot product of (P0-Vi) and ni;
as
N = - dot product of S and ni;
Having changed this, the code above now seems to work correctly. (I'm also updating the code in the question to reflect the correct solution).