Spline options for interpolating between control points on a population curve? - c++

I'm looking to model population curves using an interpolative spline between 7 control points. My problem is that I can't find any grokkable/digestible coding/math resource that compares the pros and cons of various splines in layman's terms.
First, here's a simplified illustration of the population curve in my code:
struct CurvePoint {
public:
    short height;  // the point's height along a population curve, measured as age in years
                   // (can be negative, representing a building trend whose peak/valley will happen in the future)
    double width;  // the (nonnegative) width of the population curve at this point,
                   // measured as number of people **born within a single year**
    // Each CurvePoint represents one "bar" of a population pyramid that's one year tall.
};

class PopulationCurve {
public:
    std::array<CurvePoint, 7> control_points;
    // assumes that control_points[i].height < control_points[i + 1].height and that control_points[i].width >= 0
    // control_points[0] is the young end of the curve (typically a nonpositive number; see above)
    // control_points[6] is the old end of the curve (typically representing the longevity of the population;
    // if so, control_points[6].width will be 0 since no one is left alive at that point)

    std::vector<CurvePoint> constructCurve() {
        std::vector<CurvePoint> curve;
        short youngest_age = control_points[0].height;
        short oldest_age = control_points[6].height;
        for (auto a = youngest_age; a <= oldest_age; ++a) {
            CurvePoint p;
            p.height = a;
            // p.width = ??? (the interpolated width at age a)
            curve.push_back(p);
        }
        return curve;
    }

    void deconstructCurve(std::vector<CurvePoint> curve) {
        std::array<CurvePoint, 7> sampled_control_points;
        // ??? (turn point samples from the input curve into control points as appropriate)
        control_points = sampled_control_points;
    }
};
The hardcoding of 7 control points is intentional. I'm implementing a choice between two compression schemes: virtually lossless compression of 7 control points in 44 bytes, and lossy compression of 7 control points in 20 bytes (my application is currently more memory/disk-limited than CPU-limited). I don't believe those compression schemes are relevant to the question, but let me know if I need to show their code, especially if there's a good reason I should be considering <7 or >7 control points.
Here are the criteria I'm looking for in a spline, in descending order of importance:
Interpolation between all control points. This is by far the most important criterion; otherwise, I would've used a Bézier curve or b-spline.
Interpolation between the first and last control point only. If all points aren't interpolated between, then only the first and last can be (i.e. what a Bézier curve or b-spline would've got me).
Fast constructability + deconstructability. There's a near-1:1 correlation between constructCurve() and deconstructCurve(); almost every call to construct will eventually be followed up by a call to deconstruct, so I'm only interested in the combined performance and not the performance of either one individually. That being said, while I'm very interested in memory/disk optimization right now, I'm not going to prematurely optimize speed, so this is a consideration only.
Reasonably accurate deconstructability.
Lossless deconstructability if no changes to the curve are made. i.e. If deconstructCurve(constructCurve()); is called, control_points will remain the same.
Prettiness =) (since linear interpolation between control points is the best match for the rest of the criteria...)
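For concreteness, here is roughly what I mean by the first criterion, using a uniform Catmull-Rom segment evaluated across the 7 control points as a stand-in; the per-age parameterization, the clamped end tangents and the helper name are placeholder choices, not something I've settled on:
#include <algorithm>
#include <array>

// ...CurvePoint as defined above...
double catmullRomWidth(const std::array<CurvePoint, 7>& cp, short age)
{
    // find the segment [i, i + 1] that contains this age
    int i = 0;
    while (i < 5 && age > cp[i + 1].height) ++i;
    double t = double(age - cp[i].height) / double(cp[i + 1].height - cp[i].height);

    // neighbouring widths, clamped at the two ends of the curve
    double p0 = cp[std::max(i - 1, 0)].width;
    double p1 = cp[i].width;
    double p2 = cp[i + 1].width;
    double p3 = cp[std::min(i + 2, 6)].width;

    // standard uniform Catmull-Rom basis; note it can overshoot, so the result
    // may need clamping to >= 0 to keep widths nonnegative
    return 0.5 * ((2.0 * p1)
        + (-p0 + p2) * t
        + (2.0 * p0 - 5.0 * p1 + 4.0 * p2 - p3) * (t * t)
        + (-p0 + 3.0 * p1 - 3.0 * p2 + p3) * (t * t * t));
}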
I didn't post this question on math since it's not entirely language-agnostic and contains C++ code. I didn't post it on gamedev since it's a question of implementation and not design.

Related

How approximation search works

[Prologue]
This Q&A is meant to explain more clearly the inner workings of my approximation search class, which I first published here:
Increasing accuracy of solution of transcendental equation
I have been asked for more detailed info about this a few times already (for various reasons), so I decided to write a Q&A-style topic about it that I can easily reference in the future instead of explaining it over and over again.
[Question]
How do you approximate values/parameters in the real domain (double) to fit polynomials or parametric functions, or to solve (difficult) equations (like transcendental ones)?
Restrictions
real domain (double precision)
C++ language
configurable precision of approximation
known interval for search
the fitted value/parameter need not be strictly monotonic, and need not be a function at all
Approximation search
This is an analogue of binary search, but without the restriction that the searched function/value/parameter must be strictly monotonic, while still sharing the O(log(n)) complexity.
For example, let's assume the following problem:
We have a known function y=f(x) and want to find x0 such that y0=f(x0). This could in principle be done with the inverse function of f, but there are many functions whose inverse we do not know how to compute. So how do we compute this in such a case?
knowns
y=f(x) - input function
y0 - wanted point y value
a0,a1 - solution x interval range
Unknowns
x0 - wanted point x value, which must be in the range <a0,a1>
Algorithm
1. probe some points x(i) in <a0,a1>, evenly dispersed along the range with some step da
So for example x(i)=a0+i*da where i={ 0,1,2,3... }
2. for each x(i) compute the distance/error ee of y=f(x(i))
This can be computed for example like this: ee=fabs(f(x(i))-y0), but any other metric can be used too.
3. remember the point aa=x(i) with minimal distance/error ee
4. stop when x(i)>a1
5. recursively increase accuracy
First restrict the range to search only around the found solution, for example:
a0'=aa-da;
a1'=aa+da;
then increase the precision of the search by lowering the search step:
da'=0.1*da;
If da' is not too small and the max recursion count has not been reached, go back to #1.
6. the found solution is in aa
This is what I have in mind:
On the left side the initial search is illustrated (bullets #1,#2,#3,#4). On the right side is the next recursive search (bullet #5). This loops recursively until the desired accuracy is reached (number of recursions). Each recursion increases the accuracy 10 times (0.1*da). The gray vertical lines represent the probed x(i) points.
Here the C++ source code for this:
//---------------------------------------------------------------------------
//--- approx ver: 1.01 ------------------------------------------------------
//---------------------------------------------------------------------------
#ifndef _approx_h
#define _approx_h
#include <math.h>
//---------------------------------------------------------------------------
class approx
{
public:
    double a,aa,a0,a1,da,*e,e0;
    int i,n;
    bool done,stop;

    approx() { a=0.0; aa=0.0; a0=0.0; a1=1.0; da=0.1; e=NULL; e0=-1.0; i=0; n=5; done=true; }
    approx(approx& a) { *this=a; }
    ~approx() {}
    approx* operator = (const approx *a) { *this=*a; return this; }
    //approx* operator = (const approx &a) { ...copy... return this; }

    void init(double _a0,double _a1,double _da,int _n,double *_e)
    {
        if (_a0<=_a1) { a0=_a0; a1=_a1; }
        else          { a0=_a1; a1=_a0; }
        da=fabs(_da);
        n =_n;
        e =_e;
        e0=-1.0;
        i=0; a=a0; aa=a0;
        done=false; stop=false;
    }

    void step()
    {
        if ((e0<0.0)||(e0>*e)) { e0=*e; aa=a; }         // remember the best solution so far
        if (stop)                                       // range finished -> increase accuracy
        {
            i++; if (i>=n) { done=true; a=aa; return; } // final solution reached
            a0=aa-fabs(da);                             // restrict the range around the best point
            a1=aa+fabs(da);
            a=a0; da*=0.1;                              // lower the search step 10x
            a0+=da; a1-=da;
            stop=false;
        }
        else
        {
            a+=da; if (a>a1) { a=a1; stop=true; }       // move to the next probed point
        }
    }
};
//---------------------------------------------------------------------------
#endif
//---------------------------------------------------------------------------
This is how to use it:
approx aa;
double ee,x,y,x0,y0=here_your_known_value;
//            a0,  a1, da,n, ee
for (aa.init(0.0,10.0,0.1,6,&ee); !aa.done; aa.step())
{
    x = aa.a;        // this is x(i)
    y = f(x);        // here compute the y value for whatever you want to fit
    ee = fabs(y-y0); // compute the error of the solution for the approximation search
}
In the comment above the for (aa.init(...)) line the operands are named: a0,a1 is the interval on which x(i) is probed, da is the initial step between the x(i), and n is the number of recursions. So if n=6 and da=0.1, the final max error of the x fit will be ~0.1/10^6=0.0000001. The &ee is a pointer to the variable where the current error is computed. I chose a pointer so there are no collisions when nesting this, and also for speed, since passing a parameter to a heavily used function adds overhead.
[notes]
This approximation search can be nested to any dimensionality (but of course you need to be careful about the speed); see some examples, and a small nesting sketch after the list of links:
Approximation of n points to the curve with the best fit
Curve fitting with y points on repeated x positions (Galaxy Spiral arms)
Increasing accuracy of solution of transcendental equation
Find Minimum area ellipse enclosing a set of points in c++
2D TDoA Time Difference of Arrival
3D TDoA Time Difference of Arrival
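Here is a minimal sketch of such nesting for two parameters; the function f(x,y), the ranges, steps and the target value are made up just for the example:
// fit two parameters x,y so that f(x,y) ~= target (all the numbers here are made up)
approx ax,ay;
double ex,ey,x,y,target=1.234;
for (ax.init(-10.0,10.0,0.5,6,&ex); !ax.done; ax.step())     // outer search over x
{
    x = ax.a;
    for (ay.init(-10.0,10.0,0.5,6,&ey); !ay.done; ay.step()) // inner search over y
    {
        y = ay.a;
        ey = fabs(f(x,y)-target);                            // inner error
    }
    ex = fabs(f(x,ay.aa)-target);                            // outer error = error of the best y for this x
}
x = ax.aa;                                                   // best x found
for (ay.init(-10.0,10.0,0.5,6,&ey); !ay.done; ay.step())     // one more inner pass to recover the matching y
{
    y = ay.a;
    ey = fabs(f(x,y)-target);
}
y = ay.aa;                                                   // best y for the best x
Note that the inner search is fully re-run for every probed x, which is why the nesting gets slow quickly with dimensionality.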
In case of a non-function fit, and the need to get "all" the solutions, you can use recursive subdivision of the search interval after a solution is found to check for another solution. See this example:
Given an X co-ordinate, how do I calculate the Y co-ordinate for a point so that it rests on a Bezier Curve
What should you be aware of?
You have to carefully choose the search interval <a0,a1> so that it contains the solution but is not too wide (or it will be slow). The initial step da is also very important: if it is too big you can miss local min/max solutions, and if it is too small the search gets too slow (especially for nested multidimensional fits).
A combination of the secant (with bracketing, but see the correction at the bottom) and bisection methods is much better (credit for the original graphics of course due to user Spektre in their answer above):
We find root approximations by secants, and keep the root bracketed as in bisection.
always keep the two edges of the interval so that the delta at one edge is negative, and at the other it is positive, so the root is guaranteed to be inside; and instead of halving, use the secant method.
Pseudocode:
Given a function f,
Given two points a, b, such that a < b and sign(f(a)) /= sign(f(b)),
Given tolerance tol,
TO FIND root z of f such that abs(f(z)) < tol -- stop_condition
DO:
x = root of f by linear interpolation of f between a and b
m = midpoint between a and b
if stop_condition holds at x or m, set z and STOP
[a,b] := [a,x,m,b].sort.choose_shortest_interval_with_
_opposite_signs_at_its_ends
This obviously halves the interval [a,b], or does even better, at each iteration; so unless the function is extremely badly behaved (like, say, sin(1/x) near x=0), this will converge very quickly, taking at most two evaluations of f per iteration step.
And we can detect the badly behaved cases by checking that b-a does not become too small (especially since we're working with finite precision, as with doubles).
Update: apparently this is actually the double false position method, which is secant with bracketing, as described by the pseudocode above. Augmenting it with the middle point as in bisection ensures convergence even in the most pathological cases.
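For completeness, here is one way the pseudocode above could look in C++; the function name, the use of std::function and the tiny-interval guard are my own choices, so treat this as a sketch rather than a reference implementation:
#include <algorithm>
#include <cmath>
#include <functional>
#include <limits>

// Bracketed secant ("double false position") + bisection hybrid, as described above.
// Preconditions: a < b and sign(f(a)) != sign(f(b)); tol is the tolerance on |f(z)|.
double hybrid_root(const std::function<double(double)>& f, double a, double b, double tol)
{
    double fa = f(a), fb = f(b);
    while (true) {
        double x = a - fa * (b - a) / (fb - fa);   // false-position (secant) estimate
        double m = 0.5 * (a + b);                  // plain bisection midpoint
        double fx = f(x), fm = f(m);
        if (std::fabs(fx) < tol) return x;         // stop condition at x
        if (std::fabs(fm) < tol) return m;         // stop condition at m
        // [a,b] := [a,x,m,b].sort.choose_shortest_interval_with_opposite_signs_at_its_ends
        double p[4]  = {a, x, m, b};
        double fp[4] = {fa, fx, fm, fb};
        int order[4] = {0, 1, 2, 3};
        std::sort(order, order + 4, [&](int i, int j) { return p[i] < p[j]; });
        double best = std::numeric_limits<double>::max();
        for (int k = 0; k < 3; ++k) {
            int i = order[k], j = order[k + 1];
            if (fp[i] * fp[j] <= 0.0 && p[j] - p[i] < best) {
                best = p[j] - p[i];
                a = p[i]; b = p[j]; fa = fp[i]; fb = fp[j];
            }
        }
        if (b - a < 1e-15) return m;               // guard against pathological cases
    }
}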

Pick a matrix cell according to its probability

I have a 2D matrix of positive real values, stored as follow:
vector<vector<double>> matrix;
Each cell can have a value greater than or equal to 0, and this value represents the likelihood of the cell being chosen. In particular, for example, a cell with a value equal to 3 has three times the probability of being chosen compared to a cell with value 1.
I need to select N cells of the matrix (0 <= N <= total number of cells) randomly, but according to their probability to be selected.
How can I do that?
The algorithm should be as fast as possible.
I describe two methods, A and B.
A works in time approximately N * number of cells, and uses space O(log number of cells). It is good when N is small.
B works in time approximately (number of cells + N) * O(log number of cells), and uses space O(number of cells). So it is good when N is large (or even 'medium'), but it uses a lot more memory; in practice it might be slower in some regimes for that reason.
Method A:
The first thing you need to do is normalize the entries. (It's not clear to me if you assume they are normalized or not.) That means: sum all the entries and divide each entry by that sum. (This part is potentially slow, so it's better if you assume or require that it has already happened.)
Then you sample like this:
Choose a random [i,j] entry of the matrix (by choosing i,j each uniformly randomly from the range of integers 0 to n-1).
Choose a uniformly random real number p in the range [0, 1].
Check if matrix[i][j] > p. If so, return the pair [i][j]. If not, go back to step 1.
Why does this work? The probability that we end at step 3 with any particular output is equal to the probability that [i,j] was selected (which is the same for each entry), times the probability that the number p was small enough. This is proportional to the value matrix[i][j], so the sampling chooses each entry with the correct proportions. It's also possible that at step 3 we go back to the start -- does that bias things? Basically, no. The reason is: suppose we arbitrarily choose a number k and consider the distribution of the algorithm conditioned on stopping exactly after k rounds. No matter what value of k we choose, that conditional distribution has to be exactly right by the above argument, since if we eliminate the case that p is too small, the remaining possibilities all keep their correct proportions. Since the distribution is perfect for each value of k that we might condition on, and the overall distribution (not conditioned on k) is an average of the distributions for each value of k, the overall distribution is perfect also.
If you want to analyze the number of rounds typically needed in a rigorous way, you can do it by analyzing the probability that we actually stop at step 3 in any particular round. Since the rounds are independent, this probability is the same for every round, which means the number of rounds is geometrically distributed. It is therefore concentrated around its mean, and we can determine the mean from that stopping probability.
The probability that we stop at step 3 can be determined by considering the conditional probability that we stop at step 3, given that we chose any particular entry [i][j]. By the formulas for conditional expectation, you get that
Pr[ stop at step 3 ] = sum_{i,j} ( 1/(n^2) * Matrix[i,j] )
Since we assumed the matrix is normalized, this sum reduces to just 1/n^2. So, the expected number of rounds is about n^2 (that is, n^2 up to a constant factor) no matter what the entries in the matrix are. You can't hope to do a lot better than that I think -- that's about the same amount of time it takes to just read all the entries of the matrix, and it's hard to sample from a distribution that you cannot even read all of.
Note: What I described is a way to correctly sample a single element -- to get N elements from one matrix, you can just repeat it N times.
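A minimal sketch of method A in C++, assuming the matrix is square (n x n) and already normalized so that all its entries sum to 1 (the function and parameter names are just for illustration):
#include <random>
#include <utility>
#include <vector>

// Rejection sampling: returns one (i, j) with probability proportional to matrix[i][j].
std::pair<int, int> sampleCell(const std::vector<std::vector<double>>& matrix,
                               std::mt19937& rng)
{
    const int n = static_cast<int>(matrix.size());
    std::uniform_int_distribution<int> cell(0, n - 1);
    std::uniform_real_distribution<double> real(0.0, 1.0);
    while (true) {
        int i = cell(rng), j = cell(rng);         // step 1: uniformly random cell
        double p = real(rng);                     // step 2: uniform p in [0, 1]
        if (matrix[i][j] > p) return {i, j};      // step 3: accept, otherwise retry
    }
}
Calling this N times gives the N samples; as noted above, each call takes on the order of n^2 attempts in expectation.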
Method B:
Basically you just want to compute a histogram and sample inversely from it, so that you know you get exactly the right distribution. Computing the histogram is expensive, but once you have it, getting samples is cheap and easy.
In C++ it might look like this:
// Make histogram (cumulative distribution keyed by running total)
typedef unsigned int uint;
typedef std::pair<uint, uint> upair;
typedef std::map<double, upair> histogram_type;
histogram_type histogram;

double cumulative = 0.0;
for (uint i = 0; i < Matrix.size(); ++i) {
    for (uint j = 0; j < Matrix[i].size(); ++j) {
        cumulative += Matrix[i][j];
        histogram[cumulative] = std::make_pair(i, j);
    }
}

std::vector<upair> result;
for (uint k = 0; k < N; ++k) {
    // Do a sample (this should essentially never repeat; if lower_bound finds nothing
    // you could also assert false, since it means something is wrong with the RNG)
    while (true) {
        // For best results use std::mt19937 or boost::mt19937 and sample a real
        // in the range [0,1] here instead of rand().
        double p = cumulative * (rand() / (double)RAND_MAX);
        histogram_type::iterator it = histogram.lower_bound(p);
        if (it != histogram.end()) {
            result.push_back(it->second);
            break;
        }
    }
}
return result;
Here the time to make the histogram is something like number of cells * O(log number of cells) since inserting into the map takes time O(log n). You need an ordered data structure in order to get cheap lookup N * O(log number of cells) later when you do repeated sampling. Possibly you could choose a more specialized data structure to go faster, but I think there's only limited room for improvement.
Edit: As #Bob__ points out in the comments, in method (B) as written there is potentially going to be some error due to floating-point round-off if the matrices are quite large, even using type double, at this line:
cumulative += Matrix[i][j];
The problem is that if cumulative becomes much larger than Matrix[i][j], beyond what the floating-point precision can handle, then each time this statement is executed you may introduce small errors, which accumulate into significant inaccuracy.
As he suggests, if that happens, the most straightforward way to fix it is to sort the values Matrix[i][j] first. You could even do this in the general implementation to be safe -- sorting them isn't going to take more time asymptotically than you are already spending anyway.
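A sketch of that fix, reusing the typedefs from the code above (accumulate the weights in ascending order so that small values are not swallowed by a large running total):
// Collect (weight, (i, j)) entries, sort by weight, then accumulate smallest first.
std::vector<std::pair<double, upair>> cells;
for (uint i = 0; i < Matrix.size(); ++i)
    for (uint j = 0; j < Matrix[i].size(); ++j)
        cells.push_back(std::make_pair(Matrix[i][j], std::make_pair(i, j)));
std::sort(cells.begin(), cells.end());

double cumulative = 0.0;
histogram_type histogram;
for (size_t k = 0; k < cells.size(); ++k) {
    cumulative += cells[k].first;
    histogram[cumulative] = cells[k].second;
}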

Efficient data structure for sparse data lookup

Situation:
Given some points with coordinate (x, y)
Range 0 < x < 100,000,000 and 0 < y < 100,000,000
I have to find the smallest square which contains at least N points on its edges or inside it.
I used a vector to store coordinates and searched all squares with side lengths from minLength up to maxLength (applying brute force over the relevant space):
struct Point
{
    int x;
    int y;
};

vector<Point> P;
int minLength = sqrt(N) - 1;
int maxLength = 0;

// bigx   = largest  x coordinate of any point
// bigy   = largest  y coordinate of any point
// smallx = smallest x coordinate of any point
// smally = smallest y coordinate of any point
maxLength = (bigx - smallx) < (bigy - smally) ? (bigx - smallx) : (bigy - smally);
For each candidate square I traversed the complete vector to see whether at least N points were on its edges or inside it.
This was quite time inefficient.
Q1. What data structure should I use to improve time efficiency without changing Algorithm I used?
Q2. Efficient Algorithm for this problem?
The first thing to realize is that a minimal square has points on 2 opposite edges - if not, you could shrink the square by 1 and still contain the same number of points. That means the possible coordinates of the edges are limited to those of the input points. The input points are probably not on the corners, though. (For a minimum rectangle, there would be points on all 4 edges, as you can shrink one dimension without altering the other.)
The next thing to realize is that each point divides the plane into 4 quadrants, and each quadrant contains a number of points. (These can add up to more than the total number of points, as the quadrants have one pixel of overlap.) Let's say that NW(p) is the number of points to the northwest of point p, i.e. those that have x>=px and y>=py. Then the number of points in a square is NW(bottomleft) + NW(topright) - NW(bottomright) - NW(topleft).
It's fairly easy to calculate NW(p) for all input points. Sort them by x, and for equal x by y. The most northwestern point has NW(p)==0. The next point can have NW(p)==1 if it's to the southeast of the first point, else it has NW(p)==0. It's also useful to keep track of SW(p) at this stage, as you're working through the points from west to east and they're therefore not sorted north to south. Having calculated NW(p), you can determine the number of points in a square S in O(1).
Recall that the square size is restricted by the need to have points on opposite edges. Assume the points are on the left (western) and right edges - you still have the points sorted by x order. Start by assuming the left edge is at your leftmost x coordinate, and see what the right edge must be to contain N points. Now shift the left edge to the next x coordinate and find a new right edge (and thus a new square). Do this until the right edge of the square is at the rightmost point.
It's also possible that the square is constrained in the y direction. Just sort the points in y order and repeat, then choose the smallest square of the two outcomes.
Since you're running linearly through the points in x and y direction, that part is just O(N) and the dominant factor is the O(N log N) sort.
Look at http://en.wikipedia.org/wiki/Space_partitioning for algorithms that use the Divide-and-Conquer technique to solve this. This is definitely solvable in Polynomial time.
Another variant algorithm could be along the following lines.
1. Generate a Voronoi diagram of the points to get neighbour information. [ O(n log(n)) ]
2. Now use dynamic programming; the DP will be similar to the problem of finding the maximum subarray in a 2D array. Here, instead of the sum of numbers, you keep a count of the points before each position.
2.a Essentially a recurrence similar to this will hold. [ O(n) ]
Number of elements in the square from (0,0) to (x,y) = (number of elems in the square (0,0) to (x-1,y)) + (number of elems in the square (0,0) to (x,y-1)) - (number of elems in the square (0,0) to (x-1,y-1))
Your recurrence will have to change for all the points on its neighbourhood and to the left and above, instead of just the points above and left as above.
Once the DP is ready, you can query the points in a square in O(1).
3. Another O(n^2) loop over all possible combinations to find the smallest square.
You can even greedily start from the smallest squares first; that way you can end your search as soon as you find a suitable square.
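To make the counting part concrete, here is a small sketch of such a prefix-count table in C++; it assumes the coordinates have already been compressed (or are small enough) to index a grid directly, which for the 0..100,000,000 range would require coordinate compression first:
#include <utility>
#include <vector>

// count[x][y] = number of points with coordinates < (x, y) (1-based internally),
// so any axis-aligned rectangle query is O(1) after O(W*H) preprocessing.
struct PrefixCount {
    std::vector<std::vector<int>> count;

    PrefixCount(int W, int H, const std::vector<std::pair<int, int>>& pts)
        : count(W + 1, std::vector<int>(H + 1, 0))
    {
        for (const auto& p : pts) ++count[p.first + 1][p.second + 1];
        for (int x = 1; x <= W; ++x)
            for (int y = 1; y <= H; ++y)
                count[x][y] += count[x - 1][y] + count[x][y - 1] - count[x - 1][y - 1];
    }

    // number of points with x0 <= x <= x1 and y0 <= y <= y1
    int inRect(int x0, int y0, int x1, int y1) const {
        return count[x1 + 1][y1 + 1] - count[x0][y1 + 1]
             - count[x1 + 1][y0] + count[x0][y0];
    }
};
With this in place, counting the points in any candidate square is a single constant-time call to inRect.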
An R-tree allows spatial searching, but doesn't have an STL implementation, although SQLite would allow binding to one. It can answer queries like "get all points within range" and "k nearest neighbours".
Finding the region with the densest data is a problem similar to clustering.
Iterate over the points and find the N nearest entries to each point. Then generate the smallest circle - the centre would be at (max(x) - min(x), max(y) - min(y)). A square can be formed which contains all the neighbours, and would be somewhere between 2r and 2*sqrt(r) in side length compared to the circle.
Time taken: O(X) to build the structure,
O(X N log(X)) to search for the smallest cluster.
Note: there are a bunch of answers for your second question (which will probably reap bigger benefits), but I'm only referring to your first one, i.e. what data structure to use without changing the algorithm.
There, I think that your choice of a vector is already pretty good, because in general vectors offer the best payload/overhead ratio and the fastest iteration. To find specific bottlenecks, use a profiler; otherwise you are only guessing. With large vectors, there are a few things to avoid though:
Overallocation, this wastes space.
Underallocation, this causes copying when the vector is grown to the necessary size.
Copying.

Viola Jones AdaBoost running out of memory before even starts

I'm implementing the Viola Jones algorithm for face detection. I'm having issues with the first part of the AdaBoost learning part of the algorithm.
The original paper states
The weak classifier selection algorithm proceeds as follows. For each feature, the examples are sorted based on feature value.
I'm currently working with a relatively small training set of 2000 positive images and 1000 negative images. The paper describes having data sets as large as 10,000.
The main purpose of AdaBoost is to decrease the number of features in a 24x24 window, which totals 160,000+. The algorithm works on these features and selects the best ones.
The paper describes that for each feature, it calculates its value on each image, and then sorts them based on value. What this means is I need to make a container for each feature and store the values of all the samples.
My problem is that my program runs out of memory after evaluating only 10,000 of the features (only 6% of them). The overall size of all the containers will end up being 160,000 * 3000 entries, which is nearly half a billion. How am I supposed to implement this algorithm without running out of memory? I've increased the heap size, and that got me from 3% to 6%; I don't think increasing it much more will work.
The paper implies that these sorted values are needed throughout the algorithm, so I can't discard them after each feature.
Here's my code so far
public static List<WeakClassifier> train(List<Image> positiveSamples, List<Image> negativeSamples, List<Feature> allFeatures, int T) {
    List<WeakClassifier> solution = new LinkedList<WeakClassifier>();
    // Initialize weights for each sample, whether positive or negative
    float[] positiveWeights = new float[positiveSamples.size()];
    float[] negativeWeights = new float[negativeSamples.size()];
    float initialPositiveWeight = 0.5f / positiveWeights.length;
    float initialNegativeWeight = 0.5f / negativeWeights.length;
    for (int i = 0; i < positiveWeights.length; ++i) {
        positiveWeights[i] = initialPositiveWeight;
    }
    for (int i = 0; i < negativeWeights.length; ++i) {
        negativeWeights[i] = initialNegativeWeight;
    }
    // Each feature's value for each image
    List<List<FeatureValue>> featureValues = new LinkedList<List<FeatureValue>>();
    // For each feature get the values for each image, and sort them based off the value
    for (Feature feature : allFeatures) {
        List<FeatureValue> thisFeaturesValues = new LinkedList<FeatureValue>();
        int index = 0;
        for (Image positive : positiveSamples) {
            int value = positive.applyFeature(feature);
            thisFeaturesValues.add(new FeatureValue(index, value, true));
            ++index;
        }
        index = 0;
        for (Image negative : negativeSamples) {
            int value = negative.applyFeature(feature);
            thisFeaturesValues.add(new FeatureValue(index, value, false));
            ++index;
        }
        Collections.sort(thisFeaturesValues);
        // Add this feature's sorted values to the list
        featureValues.add(thisFeaturesValues);
        ++currentFeature;
    }
    ... rest of code
... rest of code
This should be the pseudocode for the selection of one of the weak classifiers:
normalize the per-example weights // one float per example
for feature j from 1 to 45,396:
// Training a weak classifier based on feature j.
- Extract the feature's response from each training image (1 float per example)
// This threshold selection and error computation is where sorting the examples
// by feature response comes in.
- Choose a threshold to best separate the positive from negative examples
- Record the threshold and weighted error for this weak classifier
choose the best feature j and threshold (lowest error)
update the per-example weights
Nowhere do you need to store billions of features. Just extract the feature responses on the fly on each iteration. You're using integral images, so extraction is fast. The integral images are the main memory cost, and that's not much: just one integer for every pixel in every image... basically the same amount of storage as your images already required.
Even if you did just compute all the feature responses for all images and save them all so you don't have to do that every iteration, that's still only:
45396 * 3000 * 4 bytes =~ 520 MB, or if you're convinced there are 160000 possible features,
160000 * 3000 * 4 bytes =~ 1.78 GB, or if you use 10000 training images,
160000 * 10000 * 4 bytes =~ 5.96 GB
Basically, you shouldn't be running out of memory even if you do store all the feature values.

Algorithms for 3D Mazes [closed]

Are there algorithms to produce 3 dimensional mazes? Essentially the same as a 2D maze but the Z depth axis can be traversed? The idea is still the same though, to get from Start to End. Could backtracking still be used?
Which algorithm should I use to generate a 3D maze?
See here. I mean that you can go into the cube too, not just iterate the faces of it.
I made 2D mazes a few years ago using Kruskal's algorithm (see here). There should be no reason this couldn't work for the 3D case you described. Basically you'd consider each cell a cube, and have a large array that stores, for every cell, 6 walls in the +/- x, y, and z directions. The algorithm starts with all walls present and randomly makes walls disappear until every cell in the maze is connected.
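To make that concrete, here is a rough C++ sketch of the 3D version with a union-find structure; the grid size and the wall representation are arbitrary choices for the example:
#include <algorithm>
#include <array>
#include <numeric>
#include <random>
#include <vector>

// Disjoint-set (union-find) used by Kruskal's algorithm.
struct DSU {
    std::vector<int> parent;
    explicit DSU(int n) : parent(n) { std::iota(parent.begin(), parent.end(), 0); }
    int find(int a) { return parent[a] == a ? a : parent[a] = find(parent[a]); }
    bool unite(int a, int b) {
        a = find(a); b = find(b);
        if (a == b) return false;
        parent[a] = b;
        return true;
    }
};

struct Wall { int cell; int dir; };  // dir: 0 = +x, 1 = +y, 2 = +z

int main() {
    const int X = 5, Y = 5, Z = 5;   // example maze dimensions
    auto id = [&](int x, int y, int z) { return (z * Y + y) * X + x; };

    // Every wall between two adjacent cubes is a candidate edge.
    std::vector<Wall> walls;
    for (int z = 0; z < Z; ++z)
        for (int y = 0; y < Y; ++y)
            for (int x = 0; x < X; ++x) {
                if (x + 1 < X) walls.push_back({id(x, y, z), 0});
                if (y + 1 < Y) walls.push_back({id(x, y, z), 1});
                if (z + 1 < Z) walls.push_back({id(x, y, z), 2});
            }

    std::mt19937 rng(std::random_device{}());
    std::shuffle(walls.begin(), walls.end(), rng);

    // open[c][d] == true means the wall from cell c in direction d was removed
    // (value-initialized to all false, i.e. all walls present).
    std::vector<std::array<bool, 3>> open(X * Y * Z);
    const int offset[3] = {1, X, X * Y};      // neighbour index offset per direction

    DSU dsu(X * Y * Z);
    for (const Wall& w : walls) {
        int neighbour = w.cell + offset[w.dir];
        if (dsu.unite(w.cell, neighbour))     // remove the wall only if it joins
            open[w.cell][w.dir] = true;       // two previously separate regions
    }
    // Result: a "perfect" 3D maze -- exactly one path between any two cells.
}
Knocking out one wall on the outer boundary at each of two chosen cells then gives you the entrance and the exit.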
I have the code for generating a 2D maze in, of all things, RPGLE (something I did as a self-exercise while learning the language). Because of the way I wrote it, about the only change necessary for the general algorithm would be to add the Z dimension as an additional dimension...
The entire thing is 20 pages long (although this includes input/output), so here's some code. You should be able to translate this into whatever language you need: I translated it from spaghetti-code BASIC (gotos were way overused here, yeah. But it was a fun exercise).
//set out maximum maze size
maximumMazeSquareCounter = mazeHorizontalSize * mazeVerticalSize + 1;
// generate a starting horizontal position
getRandomNumber(seed : randomNumber);
currentHorizontalPosition = %inth(randomNumber * (mazeHorizontalSize - 1)) + 1;
currentVerticalPosition = 1;
mazeSquareCounter = 1;
// generate the top row of the maze (with entrance)
mazeTopRow = generateEntrance(currentHorizontalPosition);
//write to the printer file
writeMazeDataLine(mazeTopRow);
mazeSquareCounter += 1;
//set the first position in the maze(the entry square
setPathPoint(currentHorizontalPosition : currentVerticalPosition);
//do until we've reached every square in the maze
dou mazeSquareCounter >= maximumMazeSquareCounter;
//get the next available random direction
mazeDirection = getNextRandomDirection(getNextAvailableDirection(currentHorizontalPosition : currentVerticalPosition));
//select what to do by the returned results
select;
//when FALSE is returned - when the maze is trapped
when mazeDirection = FALSE;
//if not at the horizontal end of the maze
if currentHorizontalPosition <> mazeHorizontalSize;
//add one to the position
currentHorizontalPosition += 1;
//else if not at the vertical end of the maze
elseif currentVerticalPosition <> mazeVerticalSize;
//reset the horizontal position
currentHorizontalPosition = 1;
//increment the vertical position
currentVerticalPosition += 1;
//otherwise
else;
//reset both positions
currentHorizontalPosition = 1;
currentVerticalPosition = 1;
endif;
//when 'N' is returned - going up (other directions removed)
when mazeDirection = GOING_NORTH;
//set the point above current as visited
setPathPoint(currentHorizontalPosition : currentVerticalPosition - 1);
//set the wall point to allow passage
setWallDirection(currentHorizontalPosition : currentVerticalPosition : GOING_NORTH);
//change the position variable to reflect change
currentVerticalPosition -= 1;
//increment square counter
mazeSquareCounter += 1;
endsl;
enddo;
//generate a random exit
// get a random number
getRandomNumber(seed : randomNumber);
// set to the horizontal position
currentHorizontalPosition = %inth(randomNumber * (mazeHorizontalSize - 1)) + 1;
//set the vertical position
currentVerticalPosition = mazeVerticalSize;
//set wall to allow for exit
setWallDirection(currentHorizontalPosition : currentVerticalPosition : GOING_SOUTH);
The entire thing is backed by two two-dimensional arrays (well, the RPG equivalent): one for the walls that occupy the 'square', and the other for whether or not that square has been visited. The maze is created after every square has been visited. Guaranteed one-path-only, worm-turns maze.
To make this three-dimensional, make it use three-dimensional arrays, and add the necessary dimension index.
I designed an algorithm some time ago for 2D mazes on a square grid, there is no reason why this shouldn't also work for a 3D maze on a cubic grid.
Start with a 3D grid initially fully populated with wall cells.
...
Start an agent at an edge of the grid; the agent travels in a straight line in the +X, +Y, +Z, -X, -Y or -Z direction, clearing walls as she travels.
Action 'N' has a small chance of occurring each step.
Action 'M' occurs when the cell directly in front of the agent is wall and the cell in front of that is empty.
'N' is a random choice of:
1. removing that agent
2. turning left or right 90 degrees
3. creating an agent on the same square turned 90 degrees left, right or both (two agents)
'M' is a random choice of:
4. removing that agent
5. removing the wall in front of that agent and then removing that agent
6. doing nothing, carrying on
7. turning left or right 90 degrees
8. creating an agent on the same square turned 90 degrees left, right or both (two agents)
The mazes are distinctive, and their character is highly flexible: adjust the trigger for 'M' (to do with valid junctions) and the chances of actions 1 to 8 occurring. You may want to remove an action or two, or introduce your own actions, for example one to make a small clearing or to sidestep one step.
The trigger for 'N' can also be another sort of randomness; for example, the snippet below can be used to create fairly branchy mazes that still have some long straight parts:
float n = 1;
while (random_0_to_1 > 0.15)   // random_0_to_1: a fresh uniform random number in [0,1] on every iteration
{
    n *= 1.2;
}
return (int)n;
Some small adjustments to my simple description will be needed; for example, the trigger for action 'M' will need to check the cells adjacent to the ones it already checks, depending on what sort of junctions are desirable.
Either action 5 or 6 is needed for the maze to contain cycles, and at least one 'M' action other than 5 and 6 is required for the maze to contain dead ends.
Some choices of chances/actions and 'M' triggers will tend to make mazes that don't work, for example ones that are unsolvable or full of empty or wall cells, but many will produce consistently nice results.