clever way to avoid repetition in Perlin noise - gradient

Hey, is there an (easy) way to stop Perlin noise from repeating after a while? I want to use it for endless terrain and do not want the same terrain over and over again, so I need a solution for this. I would also combine it with fractal Brownian motion and write it on the GPU in HLSL to save some calculation time.

Sort of, yes.
The repetition you wish to avoid is due to using only the low 16 bits (frequently lowest 8 bits) to determine which pseudo-random gradient vector to associate with each vertex. To get the arbitrarily large repetition scale you desire I would recommend using a wrapped hash for the vertices.
I.e.: let Xn = x & 255, Yn = y & 255, etc.; then, instead of using perm[Xn + perm[Yn]], replace each index with a folded one: Xi = (x & 255) ^ ((x >> 8) & 255) ^ ((x >> 16) & 255) ^ ((x >> 24) & 255) ^ ..., and likewise Yi.
{Bitwise XOR is used instead of addition so that the index cannot overflow the lookup table that randomizes the vector assignment.}
Then use perm[Xi] ^ perm[Yi] (or perm[Xi + perm[Yi]], whatever) for the gradient selection process.
This way the higher-order (large-scale) part of each point's index continues to affect the assignment of noise vectors.
Since you are using fBm as well, your terrain will only truly repeat once all the octaves sync, so if you use a dissonant chord of frequency ratios (i.e. not the simple 2**n normally used) it will take longer. {This option definitely increases the computation time, however, as you would be using integer division rather than bit-shifts.} Using the sequence of primes should give the most extreme repetition length possible. Depending on your choice of persistence parameter, however, such a treatment may not be worth the extra flops, or even be noticeable.
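A minimal C++ sketch of the folded index described above, assuming 32-bit integer lattice coordinates and a 256-entry permutation table. The names foldIndex, makePerm and gradientIndex are hypothetical, and the LCG-driven shuffle is only a stand-in for however your table is actually built; the same bit operations should carry over to HLSL.

```cpp
#include <array>
#include <cstdint>
#include <utility>

// Folds all four bytes of a 32-bit lattice coordinate into one 8-bit table index,
// so the high bytes keep influencing which gradient is picked.
inline std::uint8_t foldIndex(std::int32_t v) {
    const std::uint32_t u = static_cast<std::uint32_t>(v);
    return static_cast<std::uint8_t>((u ^ (u >> 8) ^ (u >> 16) ^ (u >> 24)) & 255u);
}

// Hypothetical table builder: Fisher-Yates shuffle of 0..255 driven by a small LCG.
inline std::array<std::uint8_t, 256> makePerm(std::uint32_t seed) {
    std::array<std::uint8_t, 256> perm{};
    for (int i = 0; i < 256; ++i) perm[i] = static_cast<std::uint8_t>(i);
    std::uint32_t state = seed;
    for (int i = 255; i > 0; --i) {
        state = state * 1664525u + 1013904223u;
        const int j = static_cast<int>((state >> 16) % static_cast<std::uint32_t>(i + 1));
        std::swap(perm[i], perm[j]);
    }
    return perm;
}

// XOR keeps the combined index inside 0..255, so a 256-entry table is enough.
inline std::uint8_t gradientIndex(const std::array<std::uint8_t, 256>& perm,
                                  std::int32_t x, std::int32_t y) {
    return perm[foldIndex(x) ^ perm[foldIndex(y)]];
}
```

With this fold, x = 5 and x = 5 + 256 map to different table entries, which is what breaks the 256-cell period of the plain x & 255 index.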

Maximizing a Ratio/Percent

I'm using cvxpy to model a problem.
Inside a very large and complex LP, I create two continuous, affine (unconstrained) expressions: x and y.
Due to how they are created, I know that 0 < x < y <= U. Obviously: x/y < 1.
In my LP objective, how do I maximize the ratio x/y?
Things I tried:
Maximizing x*cp.inv_pos(y) fails with an error saying my problem is not DCP (the same happens if I try to minimize the inverse).
I found various LP formulations for maximizing ratios (e.g. here or here) but these require rewriting the constraints on all the terms in the expressions for x - I have no idea how to do that with cvxpy.
If this is the way to go then an example would be very helpful!

Histogram Binning of Gradient Vectors

I am working on a project that has a small component requiring the comparison of distributions over image gradients. Assume I have computed the image gradients in the x and y directions using a Sobel filter and have a 2-vector (gx, gy) for each pixel. Obviously getting the magnitude and direction is reasonably trivial:
mag = sqrt(gx*gx + gy*gy), dir = atan2(gy, gx)
However, what is not clear to me is how to bin these two components into a two-dimensional histogram for an arbitrary number of bins.
I had considered something along these lines (written in browser):
//Assuming normalised magnitudes.
//Histogram dimensions are bins * bins.
int getHistIdx(float mag, float dir, int bins) {
    const int magInt = reinterpret_cast<int>(mag);
    const int dirInt = reinterpret_cast<int>(dir);
    const int magMod = reinterpret_cast<int>(static_cast<float>(1.0));
    const int dirMod = reinterpret_cast<int>(static_cast<float>(TWO_PI));
    const int idxMag = (magInt % magMod) & bins;
    const int idxDir = (dirInt % dirMod) & bins;
    return idxMag * bins + idxDir;
}
However, I suspect that the mod operation will introduce a lot of incorrect overlap, i.e. completely different gradients getting placed into the same bin.
Any insight into this problem would be very much appreciated.
I would like to avoid using any off the shelf libraries as I want to keep this project as dependency light as possible. Also I intend to implement this in CUDA.
This is more of a "what is a histogram?" question than one about any of your tags. Two things:
In a 2D plane, two directions that are equal modulo 2*pi are in fact the same, so it makes sense to take the modulus.
I see no practical or logical reason for taking the modulus of the norms.
Next, you say you want a "two dimensional histogram", but return a single number. A 2D histogram, which is what would make sense in your context, is a 3D plot: the plane is theta/R, indexed by two values, while the third axis is the "count".
So, first suggestion: return both indices,
return Pair<int,int>(idxMag,idxDir);
Then you can make a 2D histogram, or 2 2D histograms.
Regarding the "number of bins":
This is use-case dependent. You need to define the number of bins you want (maybe different for theta and R). Maybe just some constant 10 bins? Maybe it should depend on the number of vectors? In any case, you need a function that receives either the number of vectors, or the total set of vectors, and returns the number of bins for each axis. This could be a constant (10 bins) initially, and you can play with it. Once you decide on the number of bins:
Determine the bins
For a bounded case such as 0 <= theta < 2*pi, this is easy. Divide the interval equally into the number of bins, assuming a flat distribution. Your modulus would handle this well if you had actually taken it modulo 2*pi, which you didn't. You would still need to determine the bin bounds though.
For R this gets trickier, as this is unbounded. Two options here, but both rely on the same tactic - choose a maximal bin. Either arbitrarily (Say R=10), so any vector longer than that is placed in the "longer than max" bin. The rest is divided equally (for example, though you could choose other distributions). Another option is for the longest vector to determine the edge of the maximal bin.
Getting the index
Once you have the bins, you need to search for the magnitude/direction of the current vector in your bins. If bins are pairs representing the min/max of each bin (and maybe an index), say in a linked list, then it would be something like (for mag, for example):
bin = histogram.first;
while ( mag >= bin.max ) bin = bin.next;
magIdx = bin.index;
If the bin does not hold the index you can just use a counter and increase it in the while loop. Also, for the magnitude the final bin should hold "infinity" or some large number as its upper limit, so the loop always terminates. Note this has nothing to do with taking a modulus, though that would work for your direction, as you have coded. I don't see how it makes sense for the norm.
Bottom line though, you have to think a bit about what you want. In any case all the "objects" here are trivial enough to write yourself, or even use small arrays.
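For equally sized bins the search above collapses into plain arithmetic. The following C++ sketch assumes uniform bins, a caller-chosen maxMag > 0 and bin counts of at least 1; the name histogramIndex and its parameters are hypothetical. The direction is wrapped into [0, 2*pi) and magnitudes at or above maxMag are clamped into the last magnitude bin. The body uses only operations that also exist in CUDA device code.

```cpp
#include <algorithm>
#include <cmath>
#include <utility>

// Returns (magnitude bin, direction bin) for one gradient vector.
// Assumes maxMag > 0, magBins >= 1 and dirBins >= 1.
inline std::pair<int, int> histogramIndex(float mag, float dir, float maxMag,
                                          int magBins, int dirBins) {
    const float twoPi = 6.2831853f;

    // Wrap the direction into [0, 2*pi): directions equal modulo 2*pi are the same.
    float wrapped = std::fmod(dir, twoPi);
    if (wrapped < 0.0f) wrapped += twoPi;
    // The min() guards against rounding that lands exactly on dirBins.
    const int dirIdx = std::min(
        static_cast<int>(wrapped / twoPi * static_cast<float>(dirBins)), dirBins - 1);

    // No modulus for the magnitude: clamp to [0, 1] so long vectors fall in the last bin.
    const float t = std::min(std::max(mag, 0.0f) / maxMag, 1.0f);
    const int magIdx = std::min(
        static_cast<int>(t * static_cast<float>(magBins)), magBins - 1);

    return {magIdx, dirIdx};
}
```

If a flat array is preferred over a pair, magIdx * dirBins + dirIdx gives the offset into a magBins * dirBins histogram.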
I think you should arrange your bins in a square array, and then bin by vx and vy independently.
If your gradients are reasonably even you just need to scan the data first to accumulate the min and max in x and y, and then split the gradients evenly.
If the gradients are very unevenly distributed, you might want to sort the (eg) vx first and arrange that the boundaries between each bin exactly evenly divides the values.
An intermediate solution might be to obtain the min and max ignoring the (eg) 10% most extreme values.

Desert fractal OpenGL

We're trying to generate a 3D world using 2D Perlin noise (with a recursive/fractal technique). We have generated mountains and valleys quite well, but now we are having problems with deserts and dunes: because we only worked on persistence and octaves, we aren't able to make the classic shape of a dune. Has anybody already experienced that? Is there any solution, possibly still using Perlin noise, or other algorithms that allow you to do this?
You could give the Musgrave ridged multifractal a try. It gives nice ridged structures and you can use your existing noise algorithms for it.
The C reference implementation for it is here
Dunes are lopsided: .='\ cross section... you may want to use an initial shape of that kind.
They are regular, like waves in the sea, not completely noise.
They are elongated towards the wind.
I didn't use the first condition, but I have made great dunes by multiplying two 1D Perlin noises together, or even two sin/parabola functions, where both are aligned to one axis (i.e. Z) and a small low-frequency sin or noise term wobbles them along the X axis so they aren't perfectly aligned.
try this:
dunes = sin ( X + 1dperlin(Z) *.2 ) * sin ( X + 1dperlin(Z+432) *.2 );
otherwise to test it:
dunes = sin ( X + sin(Z) *.2 ) (plus or times or divided by) sin ( X + sin(Z+432) *.2 );
0.2 makes dunes 10 times longer than wide, and it's like when two straight water waves meet at almost the same angle, plus an uncertainty variable using noise for the angle.
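The test formula above, written out as a small C++ function. This is only a sketch of the sin-based variant with the two waves multiplied; the name dunes and the wobble parameter (the 0.2 factor) are illustrative, and swapping std::sin(z) for a 1D Perlin noise call gives the first formula.

```cpp
#include <cmath>

// Product of two sine waves running along X, each bent slightly by a
// low-frequency wobble along Z. Returns a height in [-1, 1].
inline float dunes(float x, float z, float wobble = 0.2f) {
    const float waveA = std::sin(x + std::sin(z) * wobble);
    const float waveB = std::sin(x + std::sin(z + 432.0f) * wobble);  // offset decorrelates the two wobbles
    return waveA * waveB;
}
```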
Maybe turbulence is already enough for what you need... Try playing with turbulence, using the absolute value of each octave's return value instead of the raw value. You can also evaluate your noise and your turbulence separately and combine them to mix both effects in some areas.
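A rough C++ sketch of that turbulence sum, assuming you already have some 2D noise function returning values in roughly [-1, 1]. The function name, the std::function parameter and the frequency doubling per octave are assumptions made for illustration, not part of any particular noise library.

```cpp
#include <cmath>
#include <functional>

// Sums the absolute value of each octave instead of the signed value,
// then normalises by the total amplitude so the result stays in [0, 1].
inline float turbulence(const std::function<float(float, float)>& noise,
                        float x, float z, int octaves, float persistence) {
    float sum = 0.0f;
    float norm = 0.0f;
    float amplitude = 1.0f;
    float frequency = 1.0f;
    for (int i = 0; i < octaves; ++i) {
        sum += amplitude * std::fabs(noise(x * frequency, z * frequency));
        norm += amplitude;
        amplitude *= persistence;  // each octave contributes less
        frequency *= 2.0f;         // and varies twice as fast
    }
    return norm > 0.0f ? sum / norm : 0.0f;
}
```

The absolute value creates creases wherever the underlying noise crosses zero, which is where the ridge-like look comes from.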

Filtering 1bpp images

I'm looking to filter a 1 bit per pixel image using a 3x3 filter: for each input pixel, the corresponding output pixel is set to 1 if the weighted sum of the pixels surrounding it (with weights determined by the filter) exceeds some threshold.
I was hoping that this would be more efficient than converting to 8 bpp and then filtering that, but I can't think of a good way to do it. A naive method is to keep track of nine pointers to bytes (three consecutive rows and also pointers to either side of the current byte in each row, for calculating the output for the first and last bits in these bytes) and for each input pixel compute
sum = filter[0] * ((*lastRowPtr & aMask) != 0) + filter[1] * ((*lastRowPtr & bMask) != 0) + ... + filter[8] * ((*nextRowPtr & hMask) != 0),
with extra faff for bits at the edge of a byte. However, this is slow and seems really ugly. You're not gaining any parallelism from the fact that you've got eight pixels in each byte and instead are having to do tonnes of extra work masking things.
Are there any good sources for how to best do this sort of thing? A solution to this particular problem would be amazing, but I'd be happy being pointed to any examples of efficient image processing on 1bpp images in C/C++. I'd like to replace some more 8 bpp stuff with 1 bpp algorithms in future to avoid image conversions and copying, so any general resources on this would be appreciated.
I found a number of years ago that unpacking the bits to bytes, doing the filter, then packing the bytes back to bits was faster than working with the bits directly. It seems counter-intuitive because it's 3 loops instead of 1, but the simplicity of each loop more than made up for it.
I can't guarantee that it's still the fastest; compilers and especially processors are prone to change. However simplifying each loop not only makes it easier to optimize, it makes it easier to read. That's got to be worth something.
A further advantage to unpacking to a separate buffer is that it gives you flexibility for what you do at the edges. By making the buffer 2 bytes larger than the input, you unpack starting at byte 1 then set byte 0 and n to whatever you like and the filtering loop doesn't have to worry about boundary conditions at all.
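The unpack and pack loops might look something like this in C++. It is a sketch that assumes MSB-first bit order within each byte (check your image format) and adds one padding pixel on each side, as described above; unpackRow and packRow are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Unpacks `width` pixels (MSB-first, an assumption) into one byte per pixel,
// with one extra byte at each end preset to `border`.
inline std::vector<std::uint8_t> unpackRow(const std::uint8_t* packed, std::size_t width,
                                           std::uint8_t border) {
    std::vector<std::uint8_t> out(width + 2, border);
    for (std::size_t x = 0; x < width; ++x) {
        out[x + 1] = static_cast<std::uint8_t>((packed[x >> 3] >> (7 - (x & 7))) & 1u);
    }
    return out;
}

// Packs one byte per pixel (no padding) back into bits, MSB-first.
// `packed` must have room for (pixels.size() + 7) / 8 bytes.
inline void packRow(const std::vector<std::uint8_t>& pixels, std::uint8_t* packed) {
    const std::size_t bytes = (pixels.size() + 7) / 8;
    for (std::size_t i = 0; i < bytes; ++i) packed[i] = 0;
    for (std::size_t x = 0; x < pixels.size(); ++x) {
        if (pixels[x]) packed[x >> 3] |= static_cast<std::uint8_t>(0x80u >> (x & 7));
    }
}
```

The filtering loop then runs over indices 1 to width of three unpacked rows and never needs an edge case.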
Look into separable filters. Among other things, they allow massive parallelism in the cases where they work.
For example, in your 3x3 sample-weight-and-filter case:
Sample 1x3 (horizontal) pixels into a buffer. This can be done in isolation for each pixel, so a 1024x1024 image can run 1024^2 simultaneous tasks, all of which perform 3 samples.
Sample 3x1 (vertical) pixels from the buffer. Again, this can be done on every pixel simultaneously.
Use the contents of the buffer to cull pixels from the original texture.
The advantage to this approach, mathematically, is that it cuts the number of sample operations from n^2 to 2n, although it requires a buffer of equal size to the source (if you're already performing a copy, that can be used as the buffer; you just can't modify the original source for step 2). In order to keep memory use at 2n, you can perform steps 2 and 3 together (this is a bit tricky and not entirely pleasant); if memory isn't an issue, you can spend 3n on two buffers (source, hblur, vblur).
Because each operation is working in complete isolation from an immutable source, you can perform the filter on every pixel simultaneously if you have enough cores. Or, in a more realistic scenario, you can take advantage of paging and caching to load and process a single column or row. This is convenient when working with odd strides, padding at the end of a row, etc. The second round of samples (vertical) may screw with your cache, but at the very worst, one round will be cache-friendly and you've cut the per-pixel sample count from quadratic to linear in the filter width.
Now, I've yet to touch on the case of storing data in bits specifically. That does make things slightly more complicated, but not terribly much so. Assuming you can use a rolling window, something like:
d = s[x-1] + s[x] + s[x+1]
works. Interestingly, if you were to rotate the image 90 degrees during the output of step 1 (trivial, sample from (y,x) when reading), you can get away with loading at most two horizontally adjacent bytes for any sample, and only a single byte something like 75% of the time. This plays a little less friendly with cache during the read, but greatly simplifies the algorithm (enough that it may regain the loss).
Pseudo-code:
buffer source, dest, vbuf, hbuf;
for_each (y, x) // Loop over each row, then each column. Generally works better wrt paging
{
    hbuf(x, y) = (source(y, x-1) + source(y, x) + source(y, x+1)) / 3 // swap x and y to spin 90 degrees
}
for_each (y, x)
{
    vbuf(x, 1-y) = (hbuf(y, x-1) + hbuf(y, x) + hbuf(y, x+1)) / 3 // 1-y to reverse the 90 degree spin
}
for_each (y, x)
{
    dest(x, y) = threshold(vbuf(x, y)) // threshold the result of both passes
}
Accessing bits within the bytes (source(x, y) indicates access/sample) is relatively simple to do, but kind of a pain to write out here, so is left to the reader. The principle, particularly implemented in this fashion (with the 90 degree rotation), only requires 2 passes of n samples each, and always samples from immediately adjacent bits/bytes (never requiring you to calculate the position of the bit in the next row). All in all, it's massively faster and simpler than any alternative.
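As a concrete but simplified C++ illustration of the two-pass idea, the sketch below works on an unpacked image (one byte per pixel, values 0 or 1), uses uniform weights, treats pixels outside the image as 0 and leaves out the 90 degree rotation trick. All of those are assumptions, as is the name boxThreshold3x3. Keep in mind that the two-pass split only reproduces a 3x3 filter whose weights are the outer product of a column vector and a row vector.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Row-major image of w*h pixels. Output pixel is 1 when the 3x3 neighbourhood
// sum exceeds `threshold`.
inline std::vector<std::uint8_t> boxThreshold3x3(const std::vector<std::uint8_t>& src,
                                                 std::size_t w, std::size_t h,
                                                 int threshold) {
    // Pass 1: horizontal 1x3 sums, 3 samples per pixel.
    std::vector<int> hsum(w * h, 0);
    for (std::size_t y = 0; y < h; ++y) {
        for (std::size_t x = 0; x < w; ++x) {
            int s = src[y * w + x];
            if (x > 0) s += src[y * w + x - 1];
            if (x + 1 < w) s += src[y * w + x + 1];
            hsum[y * w + x] = s;
        }
    }
    // Pass 2: vertical 3x1 sums over the pass-1 buffer, then threshold.
    std::vector<std::uint8_t> dst(w * h, 0);
    for (std::size_t y = 0; y < h; ++y) {
        for (std::size_t x = 0; x < w; ++x) {
            int s = hsum[y * w + x];
            if (y > 0) s += hsum[(y - 1) * w + x];
            if (y + 1 < h) s += hsum[(y + 1) * w + x];
            dst[y * w + x] = s > threshold ? 1 : 0;
        }
    }
    return dst;
}
```

Each pixel costs 3 + 3 samples instead of 9, and neither pass writes to the buffer it reads from.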
Rather than expanding the entire image to 1 bit/byte (or 8bpp, essentially, as you noted), you can simply expand the current window - read the first byte of the first row, shift and mask, then read out the three bits you need; do the same for the other two rows. Then, for the next window, you simply discard the left column and fetch one more bit from each row. The logic and code to do this right isn't as easy as simply expanding the entire image, but it'll take a lot less memory.
As a middle ground, you could just expand the three rows you're currently working on. Probably easier to code that way.
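A possible C++ shape for the sliding-window variant: each row keeps a 3-bit window that drops the left column and shifts in one new bit per step. The sketch assumes MSB-first packing, treats out-of-image pixels as 0 and simply counts set neighbours (all weights equal to 1); bitAt and neighbourCounts are hypothetical names. For arbitrary weights, an 8-entry lookup table per row indexed by the window value would replace the bit count.

```cpp
#include <cstddef>
#include <cstdint>

// Reads pixel x from a packed row (MSB-first, an assumption).
// A null row or an out-of-range x reads as 0.
inline unsigned bitAt(const std::uint8_t* row, std::ptrdiff_t x, std::ptrdiff_t width) {
    if (row == nullptr || x < 0 || x >= width) return 0u;
    return (static_cast<unsigned>(row[x >> 3]) >> (7 - (x & 7))) & 1u;
}

// Number of set bits in a 3-bit window.
inline unsigned count3(unsigned w) { return (w & 1u) + ((w >> 1) & 1u) + ((w >> 2) & 1u); }

// For every pixel of row `cur`, writes the number of set pixels in its 3x3
// neighbourhood to counts[x]. `prev` / `next` may be nullptr at the image edges.
inline void neighbourCounts(const std::uint8_t* prev, const std::uint8_t* cur,
                            const std::uint8_t* next, std::ptrdiff_t width,
                            std::uint8_t* counts) {
    // Before the loop each window holds columns -1 (always 0) and 0.
    unsigned wp = bitAt(prev, 0, width);
    unsigned wc = bitAt(cur, 0, width);
    unsigned wn = bitAt(next, 0, width);
    for (std::ptrdiff_t x = 0; x < width; ++x) {
        // Discard the left column, fetch one more bit: window = columns x-1, x, x+1.
        wp = ((wp << 1) | bitAt(prev, x + 1, width)) & 7u;
        wc = ((wc << 1) | bitAt(cur, x + 1, width)) & 7u;
        wn = ((wn << 1) | bitAt(next, x + 1, width)) & 7u;
        counts[x] = static_cast<std::uint8_t>(count3(wp) + count3(wc) + count3(wn));
    }
}
```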

Converting an FFT to a spectrogram

I have an audio file and I am iterating through the file and taking 512 samples at each step and then passing them through an FFT.
I have the data out as a block 514 floats long (Using IPP's ippsFFTFwd_RToCCS_32f_I) with real and imaginary components interleaved.
My problem is what do I do with these complex numbers once I have them? At the moment, for each value, I'm doing:
const float realValue = buffer[(y * 2) + 0];
const float imagValue = buffer[(y * 2) + 1];
const float value = sqrt( (realValue * realValue) + (imagValue * imagValue) );
This gives something slightly usable but I'd rather have some way of getting the values out in the range 0 to 1. The problem with the above is that the peaks end up coming back as around 9 or more. This means things get viciously saturated, and then there are other parts of the spectrogram that barely show up despite the fact that they appear to be quite strong when I run the audio through Audition's spectrogram. I fully admit I'm not 100% sure what the data returned by the FFT is (other than that it represents the frequency values of the 512-sample-long block I'm passing in). In particular, my understanding is lacking on what exactly the complex number represents.
Any advice and help would be much appreciated!
Edit: Just to clarify. My big problem is that the FFT values returned are meaningless without some idea of what the scale is. Can someone point me towards working out that scale?
Edit2: I get really nice looking results by doing the following:
size_t count2 = 0;
size_t max2 = kFFTSize + 2;
while( count2 < max2 )
{
    const float realValue = buffer[(count2) + 0];
    const float imagValue = buffer[(count2) + 1];
    const float value = (log10f( sqrtf( (realValue * realValue) + (imagValue * imagValue) ) * rcpVerticalZoom ) + 1.0f) * 0.5f;
    buffer[count2 >> 1] = value;
    count2 += 2;
}
To my eye this even looks better than most other spectrogram implementations I have looked at.
Is there anything MAJORLY wrong with what I'm doing?
The usual thing to do to get all of an FFT visible is to take the logarithm of the magnitude.
So, the position in the output buffer tells you what frequency was detected. The magnitude (L2 norm) of the complex number tells you how strong the detected frequency was, and the phase (arctangent) gives you information that is a lot more important in image space than audio space. Because the FFT is discrete, the frequencies run from 0 to the Nyquist frequency. In images, the first term (DC) is usually the largest, and so a good candidate for use in normalization if that is your aim. I don't know if that is also true for audio (I doubt it).
For each window of 512 samples, you compute the magnitude of the FFT as you did. Each value represents the magnitude of the corresponding frequency present in the signal.
mag
/\
 |
 |      !         !
 |      !    !    !
 +--!---!----!----!---!--> freq
 0          Fs/2       Fs
Now we need to figure out the frequencies.
Since the input signal is of real values, the FFT is symmetric around the middle (Nyquist component) with the first term being the DC component. Knowing the signal sampling frequency Fs, the Nyquist frequency is Fs/2. And therefore for the index k, the corresponding frequency is k*Fs/512
So for each window of length 512, we get the magnitudes at specified frequency. The group of those over consecutive windows form the spectrogram.
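The index-to-frequency mapping above as a one-line C++ helper. The name binFrequency is illustrative; fftSize is the window length (512 in the question) and sampleRate is Fs.

```cpp
#include <cstddef>

// Centre frequency in Hz of FFT bin k for a transform of length fftSize.
// Bin 0 is DC and bin fftSize / 2 is the Nyquist frequency.
inline double binFrequency(std::size_t k, double sampleRate, std::size_t fftSize) {
    return static_cast<double>(k) * sampleRate / static_cast<double>(fftSize);
}
```

For a 44.1 kHz file and a 512-sample window, the bins are about 86 Hz apart and bin 256 sits at 22050 Hz.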
Just so people know I've done a LOT of work on this whole problem. The main thing I've discovered is that the FFT requires normalisation after doing it.
To do this you average all the values of your window vector together to get a value somewhat less than 1 (or exactly 1 if you are using a rectangular window). You then multiply that number by the number of frequency bins you have after the FFT.
Finally you divide the magnitude returned by the FFT by this normalisation number. Your amplitude values should now be in the 0 to 1 range (-Inf to 0 once you take the log). Log, etc., as you please: you will still be working with a known range.
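A hedged C++ sketch of that normalisation, assuming interleaved real/imaginary output holding N/2 + 1 complex values (the layout described in the question), input samples in [-1, 1], and taking N/2 as the "number of frequency bins". The function name is hypothetical. With this scaling a full-scale sinusoid centred on a bin comes out at 1.0; the DC and Nyquist bins are scaled differently and can reach 2.0.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// ccs:    interleaved (re, im) pairs, N/2 + 1 complex values.
// window: the N window coefficients that were applied before the FFT.
inline std::vector<float> normalisedMagnitudes(const std::vector<float>& ccs,
                                               const std::vector<float>& window) {
    std::vector<float> out(ccs.size() / 2, 0.0f);
    if (window.empty()) return out;

    double windowMean = 0.0;
    for (float w : window) windowMean += w;
    windowMean /= static_cast<double>(window.size());

    // Normalisation number: window mean times the number of bins (assumed N/2).
    const double norm = windowMean * (static_cast<double>(window.size()) / 2.0);
    if (norm == 0.0) return out;

    for (std::size_t k = 0; k < out.size(); ++k) {
        const double re = ccs[2 * k];
        const double im = ccs[2 * k + 1];
        out[k] = static_cast<float>(std::sqrt(re * re + im * im) / norm);
    }
    return out;
}
```

Taking log10 of these values then gives numbers at or below 0 for the non-DC bins, which can be mapped to a display range with a fixed floor.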
There are a few things that I think you will find helpful.
The forward FT will tend to give larger numbers in the output than in the input. You can think of it as all of the intensity at a certain frequency being displayed at one place rather than being distributed through the dataset. Does this matter? Probably not because you can always scale the data to fit your needs. I once wrote an integer based FFT/IFFT pair and each pass required rescaling to prevent integer overflow.
The real data that are your input are converted into something that is almost complex. As it turns out buffer[0] and buffer[n/2] are real and independent. There is a good discussion of it here.
The input data are sound intensity values taken over time, equally spaced. They are said to be, appropriately enough, in the time domain. The output of the FT is said to be in the frequency domain because the horizontal axis is frequency. The vertical scale remains intensity. Although it isn't obvious from the input data, there is phase information in the input as well. Although all of the sound is sinusoidal, there is nothing that fixes the phases of the sine waves. This phase information appears in the frequency domain as the phases of the individual complex numbers, but often we don't care about it (and often we do too!). It just depends upon what you are doing. The calculation
const float value = sqrt((realValue * realValue) + (imagValue * imagValue));
retrieves the intensity information but discards the phase information. Taking the logarithm essentially just dampens the big peaks.
Hope this is helpful.
If you are getting strange results then one thing to check is the documentation for the FFT library to see how the output is packed. Some routines use a packed format where real/imaginary values are interleaved, or they may begin at the N/2 element and wrap around.
For a sanity check I would suggest creating sample data with known characteristics, e.g. Fs/2, Fs/4 (Fs = sample frequency), and comparing the output of the FFT routine with what you'd expect. Try creating both a sine and a cosine at the same frequency, as these should have the same magnitude in the spectrum but different phases (i.e. the realValue/imagValue will differ, but the sum of squares should be the same).
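One way to set up such a check in C++ is sketched below. It generates a sine and a cosine with a whole number of cycles per window and evaluates a single bin with a naive DFT, which can serve as a reference to compare your FFT library's output against. makeTone and dftBin are hypothetical helper names, and the DFT here is the slow textbook sum, not an FFT. Fs/4 is the safer test frequency: a sine at exactly Fs/2 that starts at phase 0 is sampled at its zero crossings.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// n samples of a unit-amplitude tone with `cycles` whole periods in the window.
inline std::vector<double> makeTone(std::size_t n, std::size_t cycles, bool cosine) {
    const double pi = 3.14159265358979323846;
    std::vector<double> x(n);
    for (std::size_t t = 0; t < n; ++t) {
        const double phase = 2.0 * pi * static_cast<double>(cycles) *
                             static_cast<double>(t) / static_cast<double>(n);
        x[t] = cosine ? std::cos(phase) : std::sin(phase);
    }
    return x;
}

// Naive single-bin DFT: X[k] = sum over t of x[t] * exp(-2*pi*i*k*t/n).
inline std::complex<double> dftBin(const std::vector<double>& x, std::size_t k) {
    const double pi = 3.14159265358979323846;
    const double n = static_cast<double>(x.size());
    std::complex<double> acc(0.0, 0.0);
    for (std::size_t t = 0; t < x.size(); ++t) {
        const double angle = -2.0 * pi * static_cast<double>(k) * static_cast<double>(t) / n;
        acc += x[t] * std::complex<double>(std::cos(angle), std::sin(angle));
    }
    return acc;
}
```

For n = 64 and 16 cycles (Fs/4), both tones should give a magnitude of n/2 = 32 at bin 16: the cosine lands almost entirely in the real part and the sine in the imaginary part.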
If you're intending on using the FFT though then you really need to know how it works mathematically, otherwise you're likely to encounter other strange problems such as aliasing.