"lossless" float to BYTE conversion - c++

I'm using libnoise to generate Perlin noise on a 1024x1024 terrain grid. I want to convert its float output to a BYTE between 0 and 255. The question is ultimately a math one: how can I convert a value in the real interval (-1,1) to the integer one (0,255) minimizing loss?

This formula will give you a number in [0, 255] if your input range excludes the endpoints:
(int)((x + 1.0) * (256.0 / 2.0))
If you are including the endpoints (in which case it is usually written [-1,1] rather than (-1,1)) you will need a special case for x == 1.0 to round that down to 255.
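A minimal sketch of that conversion in code, with the endpoint case handled by a clamp (the function name is mine):

#include <algorithm>
#include <cstdint>

// Map x in [-1.0, 1.0] to a byte in [0, 255]; the clamp handles x == 1.0
// (and any slight overshoot from the noise generator).
inline std::uint8_t FloatToByte(float x)
{
    int v = static_cast<int>((x + 1.0f) * (256.0f / 2.0f));
    return static_cast<std::uint8_t>(std::clamp(v, 0, 255));
}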

The best way might depend on the distribution of floats in (-1,1); if some areas are denser than others, you might want to increase the "precision" in the former at the expense of the latter. Basically, if you have a cumulative distribution function for the output defined on this interval, you can split (0,1) - the interval of probability values - into 256 equal sub-intervals, and for each given float you calculate into which sub-interval its distribution function value falls. For noise the distribution function is (or at least should be) close to linear, so perhaps the answer of Mark Byers is the way to go.
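A sketch of that equal-probability binning; cdf here is assumed to be whatever monotone map from (-1,1) onto (0,1) models your noise distribution:

#include <algorithm>
#include <cstdint>

// Equal-probability quantization: each of the 256 output values covers an
// equally likely slice of the input distribution.
template <class Cdf>
std::uint8_t QuantizeByCdf(float x, Cdf cdf)
{
    int bin = static_cast<int>(cdf(x) * 256.0f);
    return static_cast<std::uint8_t>(std::clamp(bin, 0, 255));
}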

Related

How can I effectively calculate the phase angle of a complex number that is (essentially) equal to zero?

I'm writing a C++ program that takes the FFT of a real input signal containing double values and returns a vector X containing std::complex<double> values. Once I have the vector of results I then attempt to calculate the magnitude and phase of the result.
I am running into an issue with calculating the phase angle when one of the outputs is "zero". Zero is in quotes because when a calculation that results in 0 returns a double, the returned value will be very near zero, but not quite exactly zero.
For example, at index 3 my output array has the calculated "zero" value:
X[3] = 3.0531133177191805e-16 - i*5.5511151231257827e-17
I am trying to use the standard library std::arg function, which is supposed to return the phase angle of a complex number: std::arg(X[3])
While X[3] is essentially 0, it is not EXACTLY 0, and the way phase is calculated this causes a problem, because the calculation uses the ratio of the imaginary part to the real part, which is far from 0!
Doing the actual calculation results in a far from desirable result.
How can I make C++ realize that the result is really 0 so I can get the correct phase angle?
I'm looking for a more elegant solution than using an arbitrary hard-coded "epsilon" value to compare the double to, but so far searching online I haven't had any luck coming up with something better.
If you are computing the floating-point FFT of an input signal, then that signal will include noise, thus have a signal-to-noise ratio, including sensor noise, thermal noise, quantization noise, timing jitter noise, etc.
Thus the threshold for discarding FFT results as below your noise floor most likely isn't a matter of computational mathematics, but part of your physical or electronic data acquisition analysis. You will have to plug that number in, and set the phase to 0.0 or NaN or whatever your default flagging value is for a non-useful (at or below the noise floor) FFT result.
It was brought to my attention that my original answer will not work when the input to the FFT has been scaled. I believe I have an actual valid solution now... The original answer is kept below so that the comments still make sense.
From the comments on this answer and others, I've gathered that calculating the exact rounding error in the language may technically be possible but it is definitely not practical. The best practical solution seems to be to allow the user to provide their own noise threshold (in dB) and ignore any data points whose power level falls below that threshold. It would be impossible to come up with a generic threshold for all situations, but the user can provide a reasonable threshold based on the signal-to-noise ratio of the signal being analyzed and pass that in.
A generic phase calculation function is shown below that calculates the phase angles for a vector of complex data points.
std::vector<double> Phase(std::vector<std::complex<double>> X, double threshold, double amplitude)
{
    size_t N = X.size();
    std::vector<double> X_phase(N);
    std::transform(X.begin(), X.end(), X_phase.begin(), [threshold, amplitude](const std::complex<double>& value) {
        // Power of this bin in dB relative to the full-scale amplitude.
        double level = 10.0 * std::log10(std::norm(value) / std::pow(amplitude, 2.0));
        // Below the noise threshold the phase is meaningless; report it as 0.0.
        return level > threshold ? std::arg(value) : 0.0;
    });
    return X_phase;
}
This function takes 3 arguments:
The vector of complex signal data you want to calculate the phase of.
A sensible threshold -- Can be calculated from the signal-to-noise ratio of whatever measurement device was used to capture the signal. If your signal contains no noise other than the rounding errors of the language itself you can set this to some arbitrary really low value, like -120dB.
The maximum possible amplitude of your input signal. If your signal is calculated, this should simply be set to the amplitude of your signal. If your signal is measured, this should be set to the maximum amplitude the measuring device is capable of measuring (If your signal comes from reading an audio file, often its data will be normalized between -1.0 and 1.0. In this case you would just set the amplitude value to 1.0).
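For illustration, a call for a computed, normalized signal might look like this (the -120 dB threshold and 1.0 amplitude are just the values suggested above; the data is made up):

std::vector<std::complex<double>> X = {
    {1.0, 0.0},
    {0.0, 0.5},
    {3.0531133177191805e-16, -5.5511151231257827e-17}  // the "essentially zero" bin
};
std::vector<double> phase = Phase(X, -120.0, 1.0);
// phase[2] is 0.0 instead of the meaningless angle of the rounding noise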
This new implementation still provides me with the correct results, but is much more robust. By leaving the threshold calculation to the user they can set the most sensible value themselves based on the characteristics of the measurement device used to measure their input signal.
Please let me know if you notice any errors or any ways I can further improve the design!
Original Answer
I found a solution that seems generic enough.
In the #include <limits> header, there is a constant value for std::numeric_limits<double>::digits10.
According to the documentation:
The value of std::numeric_limits<T>::digits10 is the number of base-10 digits that can be represented by the type T without change, that is, any number with this many significant decimal digits can be converted to a value of type T and back to decimal form, without change due to rounding or overflow.
Using this I can filter out any output values that have a magnitude lower than this limit:
Calculate the phase of X[3]:
// Treat magnitudes below the double's decimal precision (scaled by N) as exactly zero.
int N = X.size();
auto tmp = std::abs(X[3]) / N > std::pow(10, -std::numeric_limits<double>::digits10)
    ? X[3]
    : std::complex<double>(0.0);
double phase = std::arg(tmp);
This effectively filters out any values that are not precisely zero due to rounding errors within the C++ language itself. It will NOT however filter out garbage data caused by noise in the input signal.
After adding this to my phase calculation I get the expected results.
The map from complex numbers to magnitude and phase is discontinuous at 0.
This is a discontinuity caused by the choice of coordinates you are using.
The solution will depend on why you chose those coordinates in a situation where values near the discontinuity are possible.
It isn't "really" zero. If you factored in error bars properly, your answer would really be a small magnitude (hopefully) and a unconstrained angle.

WASAPI shared mode: What amplitude does the audio engine expect?

I previously messed up this question. I made it sound as though I was asking about my particular implementation, but my question is actually about the general topic. I am pretty confident that my implementation is OK. So I am rewriting this question:
WASAPI gives me information about the audio format that the audio engine accepts in shared mode. I know the expected bit depth of the samples I provide to the buffer. What I don't know, is the expected representation of the signal amplitude in the samples. For example, if the audio engine expects 32 bit samples, does it mean, that I should represent a sine wave amplitude as:
long in range [min, max]
unsigned long in range [0, max]
float in range [min, max]
or even something like float in range [-1, 1]?
(max = std::numeric_limits<type>::max() and min = ...::min() in C++)
So far I've been experimenting with different values by trial and error. It seems that a sound is produced only when my samples contain the numbers max/2 or min/2 (as a long), alternating (along with other numbers). Even numbers close to these (plus or minus a few integers) produce the same results. When these two numbers (or numbers close to them) are not present in the samples, the result is silence no matter what I do.
It may be irrelevant, but I noticed that these numbers' (max/2 and min/2) bit representation (as longs) is identical to the IEEE float bit representation of 2.0 and -2.0. It still makes no sense to me why it works like that.
The typical representation is float -1 to 1 scaled to a fixed-point representation. For 32-bit signed you'd ideally like 1 to map to 0x7fffffff and -1 to map to 0x80000000. However, you need to keep in mind that there is asymmetry around 0, such that there is one more negative value than there are positive values. In other words, you shouldn't use 0x80000000, otherwise you'll risk overflow on the positive side.
int xfixed = (int)(xfloat * 0x7fffffff);
More explicitly:
int xfixed = (int)(xfloat * ((1<<(32-1)) - 1));
After a deeper look into the WAVEFORMATEXTENSIBLE structure I found out that the information I needed might be stored in the SubFormat property. In my case, it was KSDATAFORMAT_SUBTYPE_IEEE_FLOAT. So the audio engine was expecting 32 bit floats in a range of [-1, +1]. For some reason my previous test with float values was unsuccessful, so I kept on trying with integers. Now a simple sine function in the [-1, +1] range provides a correct result. There are some glitches in the sound, but this may be related to some timing issues when waiting for the buffer.
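For anyone checking the same thing, a minimal sketch of inspecting the mix format (the WAVEFORMATEX pointer would come from IAudioClient::GetMixFormat; the exact headers/libraries needed for the GUID definitions can vary with the SDK):

#include <windows.h>
#include <mmreg.h>
#include <ks.h>
#include <ksmedia.h>

// Returns true if the shared-mode mix format expects 32-bit IEEE float samples.
bool MixFormatIsFloat(const WAVEFORMATEX* wfx)
{
    if (wfx->wFormatTag == WAVE_FORMAT_IEEE_FLOAT)
        return true;
    if (wfx->wFormatTag == WAVE_FORMAT_EXTENSIBLE)
    {
        const auto* ext = reinterpret_cast<const WAVEFORMATEXTENSIBLE*>(wfx);
        return IsEqualGUID(ext->SubFormat, KSDATAFORMAT_SUBTYPE_IEEE_FLOAT);
    }
    return false;
}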

How can you transform a set of numbers into mostly whole ones?

Small amount of background: I am working on a converter that bridges between a map maker (Tiled) that outputs in XML, and an engine (Angel2D) that inputs Lua tables. Most of this is straightforward.
However, Tiled outputs in pixel offsets (integers of absolute values), while Angel2D inputs OpenGL units (floats of relative values); a conversion factor between these two is needed (for example, 32px = 1gu). Since OpenGL units are abstract, and the camera can zoom in or out if the objects are too small or big, the actual conversion factor isn't important; I could use a random number, and the user would merely have to zoom in or out.
But it would be best if the conversion factor was selected such that most numbers outputted were small and whole (or fractions of small whole numbers), because that makes it easier to work with (and the whole point of the OpenGL units is that they are easy to work with).
How would I find such a conversion factor reliably?
My first attempt was to use the smallest number given; this resulted in no fractions below 1, but often led to lots of decimal places where the factors didn't line up.
Then I tried the mode of the sequence, which led to the largest number of 1's possible, but often led to very long floats for background images.
My current approach gets the GCD of the whole sequence, which, when it works, works great, but can easily be thrown off course by a single bad apple.
Note that while I could easily just pass the numbers I am given along, or pick some fixed factor, or use one of the conversions I specified above, I am looking for a method to reliably scale this list of integers to small, whole numbers or simple fractions, because this would most likely be unsurprising to the end user; this is not a one off conversion.
The end users tend to use 1.0 as their "base" for manipulations (because it's simple and obvious), so it would make more sense for the sizes of entities to cluster around this.
How about the 'largest number which is a factor of some % of the values'.
So the GCD is the 'largest number which is a factor of 100%' of the values.
You could pick the largest number which is a factor of, say, 60% of the values. I don't know if it's a technical term, but it's sort of a 'rough GCD if not a precise GCD'.
You might have to do trial and error to find it (possibly a binary search). But you could also consider sampling. I.e. if you have a million data points, just pick 100 or 1000 at random to find a number which divides evenly into your goal percentage of the sample set, and that might be good enough.
some crummy pseudo C.
#include <algorithm>
#include <vector>

/** return fraction of values in sampleset for which x is a factor */
double percentIsFactorOf(int x, const std::vector<int>& sampleset) {
    int factorCount = 0;
    for (int sample : sampleset)
        if (sample % x == 0) factorCount++;
    return (double)factorCount / sampleset.size();
}

/** find largest value which is a factor of goalPercentage of sampleset */
int findGoodEnoughCommonFactor(const std::vector<int>& sampleset, double goalPercentage) {
    // slow n^2 algorithm here - add binary search, sampling, or something smarter to improve if you like
    int start = *std::max_element(sampleset.begin(), sampleset.end());
    while (percentIsFactorOf(start, sampleset) < goalPercentage)
        start--;
    return start;
}
Your input is in N^2 (two-dimensional space over the natural numbers, i.e. non-negative integers), and you need to output to R^2 (two-dimensional space over the real numbers, which in this case will be represented/approximated with a float).
Forget about scaling for a minute and let the output be of the same scale as the input. The first step is to realize that the input coordinate <0, 0> does not represent <0, 0> in the output; it represents <0.5f, 0.5f>, the center of the pixel. Similarly the input <2, 3> becomes <2.5, 3.5>. In general the conversion can be performed like this:
float x_prime = (float)x + 0.5f;
float y_prime = (float)y + 0.5f;
Next, you probably want to pick a scaling factor, as you have mentioned. I've always found it useful to pick some real-world unit, usually meters. This way you can reason about other physical aspects of what you're trying to model, because they have units; i.e. speeds, accelerations, can now be in meters per second, or meters per second squared. How many meters tall or wide is the thing you are making? How many meters is a pixel? Pick something that makes sense, and then your formula becomes this:
float x_prime = ((float)x + 0.5f) * (float)units_per_pixel;
float y_prime = ((float)y + 0.5f) * (float)units_per_pixel;
You may not want all of your output coordinates to be in the positive quadrant; that is, you may want the origin to be in the center of the object. If you do, you probably want your starting coordinate system to include negative integers, or to provide some offset to the true center. Let's say you provide a pixel offset to the true center. Your conversion then becomes this:
float x_prime = ((float)x + 0.5f - (float)x_offset) * (float)units_per_pixel;
float y_prime = ((float)y + 0.5f - (float)y_offset) * (float)units_per_pixel;
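Putting the pieces together, a small helper might look like this (Vec2 and the parameter names are mine, not from Tiled or Angel2D):

struct Vec2 { float x; float y; };

// Pixel-grid coordinate -> world units, with the origin shifted to
// (x_offset, y_offset) and one pixel equal to units_per_pixel units.
Vec2 PixelToWorld(int x, int y, int x_offset, int y_offset, float units_per_pixel)
{
    return { ((float)x + 0.5f - (float)x_offset) * units_per_pixel,
             ((float)y + 0.5f - (float)y_offset) * units_per_pixel };
}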
Discarding your background information, I understand that the underlying problem you are trying to solve is the following:
Given a finite number of (positive) integers {x_1, ... x_N} find some (rational) number f such that all x_i / f are "nice".
If you insist on "nice" meaning integer and as small as possible, then f = GCD is the (mathematically) exact answer to this question. There just is nothing "better", if the GCD is 1, tough luck.
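For the f = GCD case, a minimal sketch using std::gcd from C++17 (the helper name is mine):

#include <numeric>
#include <vector>

// GCD of the whole list of pixel offsets; if it comes out as 1, tough luck.
int ListGcd(const std::vector<int>& values)
{
    int g = 0;                 // gcd(0, v) == v, so 0 is a safe seed
    for (int v : values)
        g = std::gcd(g, v);
    return g;
}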
If "nice" is supposed to mean rational with small numerator and denominator, the question gets more interesting and depending on what "small" means, find your trade off between small absolute value (f = max) and small denominator (f = GCD). Notice, however, that small numerator/denominator does not mean small floating point representation, e.g. 1/3 = 0.333333... in base 10.
If you want short floating points, make sure that f is a power of your base, i.e. 10 or 2, depending on whether the numbers should look short to the user or actually have a reasonable machine representation. This is what is used for scientific representation of floating points, which might be the best answer to the question of how to make decimal numbers look nice in the first place.
I have no idea what you are talking about with "GL units".
At the most abstract level, GL has no unit. Vertex coordinates are in object-space initially, and go through half a dozen user-defined transformations before they eventually produce coordinates (window-space) with familiar units (pixels).
You are absolutely correct that even in window-space, coordinates are still not whole numbers. You would not want this in fact, or triangles would jump all over the place and generally would not resemble triangles if their vertex positions were snapped to integer pixel coordinates.
Instead, GL throws sub-pixel precision into the mix. Coordinates still ultimately wind up quantized to integer values, but each integer may cover 1/256th of a pixel given 8-bit sub-pixel precision. Pixel coverage testing is done at the sub-pixel level as you can see here:
[Image: sub-pixel coverage sampling illustration (source: microsoft.com)]
GL never attempts to find any conversion factor like you are discussing, it just splits the number space for pixel coordinates up into a fixed division between integral and fractional... fixed-point in other words. You might consider doing the same thing.
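A minimal illustration of that fixed-point split, using 8 fractional bits to mirror the 1/256-pixel precision mentioned above (the helper names are mine):

#include <cmath>

// 24.8 fixed point: the low 8 bits hold the fractional (sub-pixel) part.
int   ToFixed_24_8(float coord) { return (int)std::lround(coord * 256.0f); }
float FromFixed_24_8(int f)     { return f / 256.0f; }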
You can recycle the code you probably already use for vector normalisation: normalise the values to fit within a maximum value of 1. For example, the formula for 3D normalisation of a vector works fine here.
Get the length first:
|a| = sqrt((ax * ax) + (ay * ay) + (az * az))
Then you will need to divide the values of each component by the length:
x = ax/|a|
y = ay/|a|
z = az/|a|
Now all the x, y, z values will fall within the range -1 to 1, the same as the OpenGL base coordinate system.
I know this does not generate the whole-number system you would like, however it does give a smaller, more unified feel to the range.
Say you want to limit the range to whole numbers only; simply use a function like the following, which will take the normalised value and convert it to an int-only range value:
#include <algorithm> // this allows the use of std::min

int maxVal = 256;

unsigned char convertToSpread(float floatValueToConvert){
    return (unsigned char) (std::min((maxVal - 1), (int) (floatValueToConvert * maxVal)));
}
The above will spread your values between 0 and 255, simply increase the value of maxVal to what you need and change the unsigned char to a datatype which suits your needs.
So if you want 1024 values, simply change maxVal to 1024 and unsigned char to unsigned int.
Hope this helps; let me know if you need more information and I can elaborate. :)

Fourier transform floating point issues

I am implementing a conventional (that means not fast), separated Fourier transform for images. I know that in floating point a sum over one period of sin or cos in equally spaced samples is not perfectly zero, and that this is more a problem with the conventional transform than with the fast.
The algorithm works with 2D double arrays and is correct. The inverse is done inside (via a double sign flag and a conditional check when using the asymmetric formula), not outside with conjugations. The results are nearly 100% as expected, so it's a question about details:
When I perform a forward transform, save logarithmed magnitude and angle to images, reload them, and do an inverse transform, I experience different types of rounding errors with different types of implemented formulas:
F(u,v) = Sum(x=0->M-1) Sum(y=0->N-1) f(x,y) * e^(-i*2*pi*u*x/M) * e^(-i*2*pi*v*y/N)
f(x,y) = 1/(M*N) * (sum as above, with +i in the exponents)
F(u,v) = 1/sqrt(M*N) * (sum as above)
f(x,y) = 1/sqrt(M*N) * (sum as above, with +i in the exponents)
So the first one is the asymmetric transform pair, the second one the symmetric. With the asymmetric pair, the rounding errors are more in the bright spots of the image (some pixels are rounded slightly outside the value range, e.g. to 256). With the symmetric pair, the errors are more in the constant mid-range area of the image (no exceeding of the value range!). In total, it seems that the symmetric pair produces a bit more rounding errors.
It also depends on the input: when the image is stored in [0,255] the rounding errors are different from when it is stored in [0,1].
So my question: how should an optimal, most accurate algorithm be implemented (theoretically, no code): asymmetric or symmetric pair? Value range of the input in [0,255] or [0,1]? How should the result be linearly upscaled before saving the logarithmed one to file?
Edit:
My algorithm simply computes the separated asymmetric or symmetric DFT formula. The complex exponential factors are decomposed into real and imaginary parts using Euler's identity, then expanded and summed up separately as real and imaginary parts:
sum_re += f_re * cos(-mode*pi*((2.0*v*y)/N)) - // mode = 1 for forward, -1
f_im * sin(-mode*pi*((2.0*v*y)/N)); // for inverse transform
// sum_im permutated in the known way and + instead of -
This value grouping inside cos and sin should, in my eyes, give the lowest rounding error (compared to e.g. cos(-mode*2*pi*v*y/N)), because the inexactly rounded transcendental pi is not multiplied/divided several times, but only once. Isn't it?
The scale factor 1/(M*N) or 1/sqrt(M*N) is applied separately after each separation, outside of the innermost sum. Better inside? Or combined completely at the end of both separations?
For some deeper analysis, I have dropped the input->transform->save-to-file->read-from-file->transform^-1->output workflow and chosen to compare directly in double precision: input->transform->transform^-1->output.
Here are the results for a real-life 704x528 8-bit image (delta = max absolute difference between the real part of input and output):
with input inside [0,1] and asymmetric formula: delta = 2.6609e-13 (corresponds to 6.785295e-11 for [0,255] range).
with input inside [0,1] and symmetric formula: delta = 2.65232e-13 (corresponds to 6.763416e-11 for [0,255] range).
with input inside [0,255] and asymmetric formula: delta = 6.74731e-11.
with input inside [0,255] and symmetric formula: delta = 6.7871e-11.
These are not really significant differences; however, the full-range input with the asymmetric transform performs best. I think the values may get worse with 16-bit input.
But in general I see that the issues I experienced are caused more by scaling-before-saving-to-file (or the inverse) rounding errors than by real transformation rounding errors.
However, I am curious: what is the most used implementation of the Fourier transform, the symmetric or the asymmetric? Which value range is generally used for the input, [0,1] or [0,255]? And for the usual spectra shown in log scale: is, e.g., [0,M*N] after an asymmetric transform of [0,1] input directly log-scaled to [0,255], or first linearly scaled to [0,255*M*N]?
The errors you report are tiny, normal, and generally can be ignored. Simply scale your results and clamp any results outside the target interval to the endpoints.
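For example, a minimal clamp for an 8-bit target (std::clamp needs C++17; the function name is mine):

#include <algorithm>
#include <cmath>

// Round an inverse-transform sample and clamp the few values that rounding
// pushed slightly outside the 8-bit range.
unsigned char ToPixel(double v)
{
    return (unsigned char)std::clamp((int)std::lround(v), 0, 255);
}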
In library implementations of FFTs (that is, FFT routines written to be used generally by diverse applications, not custom designed for a single application), little regard is given to scaling; the routine often simply returns data that has been naturally scaled by the arithmetic, with no additional multiplication operations used to adjust the scale. This is because the scale is often either irrelevant for the application (e.g., finding the frequencies with the largest energies works no matter what the scale is) or that the scale may be distributed through multiply operations and performed just once (e.g., instead of scaling in a forward transform and in an inverse transform, the application can get the same effect by explicitly scaling just once). So, since scaling is often not needed, there is no point in including it in a library routine.
The target interval that data are scaled to depends on the application.
Regarding the question on what transform to use (logarithmic or linear) for showing spectra, I cannot advise; I do not work with visualizing spectra.
Scaling causes roundoff errors. Hence, the asymmetric pair (which scales once) is better than the symmetric pair (which scales twice). Similarly, scaling once after summation is better than scaling everything before summation.
Do you run y from 0 to 2*N or from -N to +N ? Mathematically it's the same, but you have an extra bit of precision in the latter case.
BTW, what's mode doing in cos(-mode * stuff) ?

Proper way to generate a random float given a binary random number generator?

Let's say we have a binary random number generator, int r(); that will return a zero or a one, both with probability 0.5.
I looked at Boost.Random, and they generate, say, 32 bits and do something like this (pseudocode):
x = double(rand_int32());
return min + x / (2^32) * (max - min);
I have some serious doubts about this. A double has 53 bits of mantissa, and 32 bits can never properly generate a fully random mantissa, among other things such as rounding errors, etc.
What would be a fast way to create a uniformly distributed float or double in the half-open range [min, max), assuming IEEE754? The emphasis here lies on correctness of distribution, not speed.
To properly define correct: the correct distribution would be equal to the one we would get if we took an infinitely precise uniformly distributed random number generator and, for each number, rounded to the nearest IEEE754 representation, if that representation were still within [min, max); otherwise the number would not count for the distribution.
P.S.: I would be interested in correct solutions for open ranges as well.
AFAIK, the correct (and probably also fastest) way is to first create a 64 bit unsigned integer where the 52 fraction bits are random bits, and the exponent is 1023, which if type punned into a (IEEE 754) double will be a uniformly distributed random value in the range [1.0, 2.0). So the last step is to subtract 1.0 from that, resulting in a uniformly distributed random double value in the range [0.0, 1.0).
In pseudo code:
rndDouble = bitCastUInt64ToDouble(1023 << 52 | rndUInt64 & 0xfffffffffffff) - 1.0
This method is mentioned here:
http://xoroshiro.di.unimi.it
(See "Generating uniform doubles in the unit interval")
EDIT: The recommended method has since changed to:
(x >> 11) * (1. / (UINT64_C(1) << 53))
See above link for details.
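In C++ the two recipes might look like this (a sketch assuming C++20's std::bit_cast for the type punning; the function names are mine):

#include <bit>
#include <cstdint>

// Build a double in [1.0, 2.0) from exponent 1023 and 52 random mantissa bits,
// then shift to [0.0, 1.0).
double UnitDoubleBitTrick(std::uint64_t r)
{
    const std::uint64_t bits = (UINT64_C(1023) << 52) | (r & UINT64_C(0xfffffffffffff));
    return std::bit_cast<double>(bits) - 1.0;
}

// The method now recommended on the xoroshiro page: keep the top 53 bits.
double UnitDoubleMultiply(std::uint64_t r)
{
    return (r >> 11) * (1.0 / (UINT64_C(1) << 53));
}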
Here is a correct approach with no attempt at efficiency.
We start with a bignum class, and then a rational wrapper of said bignums.
We produce a range "sufficiently bigger than" our [min, max) range, so that rounding of our smaller_min and bigger_max produces floating point values outside that range, in our rational built on the bignum.
Now we subdivide the range into two parts perfectly down the middle (which we can do, as we have a rational bignum system). We pick one of the two parts at random.
If, after rounding, the top and bottom of the picked range would be (A) outside of [min, max) (on the same side, mind you!) you reject and restart from the beginning.
If (B) the top and bottom of your range rounds to the same double (or float if you are returning a float), you are done, and you return this value.
Otherwise (C) you recurse on this new, smaller range (subdivide, pick randomly, test).
There are no guarantees that this procedure halts, because you can either constantly drill down to the "edge" between two rounding doubles, or you could constantly pick values outside of the [min, max) range. The probability of this happening (never halting), however, is zero (assuming a good random number generator and a [min, max) of non-zero size).
This also works for (min, max), or even picking a number in the rounded sufficiently fat Cantor set. So long as the measure of the valid range of reals that round to the correct floating point values is non zero, and the range has a compact support, this procedure can be run and has a probability of 100% of terminating, but no hard upper bound on the time it takes can be made.
The problem here is that in IEEE754 the doubles which may be represented are not equi-distributed. That is, if we have a generator generating real numbers, say in (0,1) and then map to IEEE754 representable numbers, the result will not be equi-distributed.
Thus, we have to define "equi-distribution". That said, assuming that each IEEE754 number is just a representative for the probability of lying in the interval defined by the IEEE754 rounding, the procedure of first generating equi-distributed "numbers" and then rounding to IEEE754 will generate (by definition) an "equi-distribution" of IEEE754 numbers.
Hence, I believe that the above formula will become arbitrarily close to such a distribution if we just choose the accuracy high enough. If we restrict the problem to finding a number in [0,1), this means restricting to the set of denormalized IEEE 754 numbers, which are one-to-one with a 53 bit integer. Thus it should be fast and correct to generate just the mantissa by a 53 bit binary random number generator.
IEEE 754 arithmetic is always "arithmetic at infinite precision followed by rounding", i.e. the IEEE754 number representing a*b is the one closest to a*b (put differently, you can think of a*b calculated at infinite precision, then rounded to the closest IEEE754 number). Hence I believe that min + (max-min) * x, where x is a denormalized number, is a feasible approach.
(Note: As is clear from my comment, I was at first not aware that you were pointing to the case with min and max different from 0,1. The denormalized numbers have the property that they are evenly spaced. Hence you get the equi-distribution by mapping the 53 bits to the mantissa. Next you can use the floating point arithmetic, due to the fact that it is correct up to machine precision. If you use the reverse mapping you will recover the equi-distribution.)
See this question for another aspect of this problem: Scaling Int uniform random range into Double one
std::uniform_real_distribution.
There's a really good talk by S.T.L. from this year’s Going Native conference that explains why you should use the standard distributions whenever possible. In short, hand-rolled code tends to be of laughably poor quality (think std::rand() % 100), or have more subtle uniformity flaws, such as in (std::rand() * 1.0 / RAND_MAX) * 99, which is the example given in the talk and is a special case of the code posted in the question.
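For reference, a minimal use of the standard facility (the engine choice and seeding here are arbitrary):

#include <iostream>
#include <random>

int main()
{
    std::mt19937_64 engine{std::random_device{}()};
    std::uniform_real_distribution<double> dist(0.0, 1.0);  // values in [0.0, 1.0)
    for (int i = 0; i < 5; ++i)
        std::cout << dist(engine) << '\n';
}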
EDIT: I took a look at libstdc++’s implementation of std::uniform_real_distribution, and this is what I found:
The implementation produces a number in the range [dist_min, dist_max) by using a simple linear transformation from some number produced in the range [0, 1). It generates this source number using std::generate_canonical, the implementation of which may be found here (at the end of the file). std::generate_canonical determines the number of times (denoted as k) the range of the distribution, expressed as an integer and denoted here as r*, will fit in the mantissa of the target type. What it then does is essentially to generate one number in [0, r) for each r-sized segment of the mantissa and, using arithmetic, populate each segment accordingly. The formula for the resulting value may be expressed as
Σ(i=0, k-1, X/(r^i))
where X is a stochastic variable in [0, r). Each division by the range is equivalent to a shift by the number of bits used to represent it (i.e., log2(r)), and so fills the corresponding mantissa segment. This way, the whole of the precision of the target type is used, and since the range of the result is [0, 1), the exponent remains 0** (modulo bias) and you don’t get the uniformity issues you have when you start messing with the exponent.
I would not trust implicitly that this method is cryptographically secure (and I have suspicions about possible off-by-one errors in the calculation of the size of r), but I imagine it is significantly more reliable in terms of uniformity than the Boost implementation you posted, and definitely better than fiddling about with std::rand.
It may be worth noting that the Boost code is in fact a degenerate case of this algorithm where k = 1, meaning that it is equivalent if the input range requires at least 23 bits to represent its size (IEEE 754 single-precision) or at least 52 bits (double-precision). This means a minimum range of ~8.4 million or ~4.5e15, respectively. In light of this, if you're using a binary generator, I don't think the Boost implementation is quite going to cut it.
After a brief look at libc++'s implementation, it looks like they are using the same algorithm, implemented slightly differently.
(*) r is actually the range of the input plus one. This allows using the max value of the urng as valid input.
(**) Strictly speaking, the encoded exponent is not 0, as IEEE 754 encodes an implicit leading 1 before the radix of the significand. Conceptually, however, this is irrelevant to this algorithm.