I previously messed up this question: I made it sound as though I was asking about my particular implementation, but my question is actually about the general topic. I am pretty confident that my implementation is OK, so I am rewriting the question:
WASAPI gives me information about the audio format that the audio engine accepts in shared mode. I know the expected bit depth of the samples I provide to the buffer. What I don't know is the expected representation of the signal amplitude in the samples. For example, if the audio engine expects 32-bit samples, does that mean I should represent a sine wave's amplitude as:
long in range [min, max]
unsigned long in range [0, max]
float in range [min, max]
or even something like float in range [-1, 1]?
(max = std::numeric_limits<type>::max() and min = ...::min() in C++)
So far I've been experimenting with different values by trial and error. It seems that a sound is only produced when my samples contain the numbers max/2 or min/2 (as a long), alternating (along with other numbers). Even numbers close to these (plus or minus a few integers) produce the same result. When these two numbers (or numbers close to them) are not present in the samples, the result is silence no matter what I do.
It may be irrelevant, but I noticed that these numbers' (max/2 and min/2) bit representations (as longs) are identical to the IEEE float bit representations of 2.0 and -2.0. It still makes no sense to me why it works like that.
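For reference, a small standalone check of that bit-pattern observation (nothing to do with the actual playback code):

#include <cstdint>
#include <cstring>
#include <cstdio>
#include <limits>

// Reinterpret INT32_MAX/2 and INT32_MIN/2 as IEEE 754 single-precision floats.
int main() {
    int32_t half_max = std::numeric_limits<int32_t>::max() / 2;  // 0x3FFFFFFF
    int32_t half_min = std::numeric_limits<int32_t>::min() / 2;  // 0xC0000000
    float a, b;
    std::memcpy(&a, &half_max, sizeof a);
    std::memcpy(&b, &half_min, sizeof b);
    std::printf("%f %f\n", a, b);   // prints roughly 2.0 and exactly -2.0
}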
The typical representation is float -1 to 1 scaled to a fixed-point representation. For 32-bit signed you'd ideally like 1 to map to 0x7fffffff and -1 to map to 0x80000000. However, you need to keep in mind that there is asymmetry around 0, such that there is one more negative value than there are positive values. In other words, you shouldn't use 0x80000000, otherwise you'll risk overflow on the positive side.
int xfixed = (int)(xfloat * 0x7fffffff);
More explicitly:
int xfixed = (int)(xfloat * ((1u << (32 - 1)) - 1));
After a deeper look at the WAVEFORMATEXTENSIBLE structure I found out that the information I needed might be stored in the SubFormat property. In my case it was KSDATAFORMAT_SUBTYPE_IEEE_FLOAT, so the audio engine was expecting 32-bit floats in the range [-1, +1]. For some reason my previous test with float values was unsuccessful, so I kept trying with integers. Now a simple sine function in the [-1, +1] range produces a correct result. There are some glitches in the sound, but this may be related to timing issues when waiting for the buffer.
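In case it helps anyone else, here is a rough sketch of the format check and of filling the buffer with a [-1, +1] sine. The device/IAudioClient setup is omitted, the helper names and parameters are mine, and the exact include set may vary with your SDK setup:

#include <windows.h>
#include <initguid.h>   // so the KSDATAFORMAT_* GUIDs are defined in this translation unit
#include <mmreg.h>
#include <ks.h>
#include <ksmedia.h>
#include <cmath>

// Given the mix format returned by IAudioClient::GetMixFormat, check whether
// the engine expects IEEE float samples.
bool expectsFloatSamples(const WAVEFORMATEX* wf) {
    if (wf->wFormatTag == WAVE_FORMAT_IEEE_FLOAT) return true;
    if (wf->wFormatTag == WAVE_FORMAT_EXTENSIBLE) {
        const WAVEFORMATEXTENSIBLE* ext =
            reinterpret_cast<const WAVEFORMATEXTENSIBLE*>(wf);
        return IsEqualGUID(ext->SubFormat, KSDATAFORMAT_SUBTYPE_IEEE_FLOAT) != FALSE;
    }
    return false;
}

// Fill an interleaved float buffer with a sine wave in [-1, +1].
void fillSine(float* buffer, UINT32 frames, UINT32 channels,
              UINT32 sampleRate, double frequency, double& phase) {
    const double step = 2.0 * 3.14159265358979323846 * frequency / sampleRate;
    for (UINT32 i = 0; i < frames; ++i) {
        float s = static_cast<float>(std::sin(phase));   // already in [-1, +1]
        for (UINT32 ch = 0; ch < channels; ++ch)
            buffer[i * channels + ch] = s;               // same value on every channel
        phase += step;
    }
}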
Related
I may get all kinds of flags and penalties thrown at me for this, so please be patient. Two questions:
If the minimal number of bits needed to represent an arbitrary string of decimal digits is log2(n) * x, where n is the range of the symbols and x is the length, then shouldn't you be able to calculate the maximum compression of a file by converting it from binary to decimal?
Is this result a law, meaning one cannot compress below the theoretical minimum compression limit, or is it an approximate limit?
Jon Hutton
It's actually a bit (ha) trickier. That formula assumes that the number is drawn from a uniform distribution, which is often not the case, but notably is the case for what is commonly called "random data" (though that is an inaccurate name, since data may be random but drawn from a non-uniform distribution).
The entropy H of X in bits is given by the formula:
H(X) = - sum[i](P(x[i]) log2(P(x[i])))
Where P gives the probability of every value x[i] that X may take. The bounds of i are implied and irrelevant; impossible options have a probability of zero anyway. In the uniform case, P(x[i]) is (by definition) 1/N for any possible x[i], so we have H(X) = -N * (1/N * log2(1/N)) = -log2(1/N) = log2(N).
The formula should in general not simply be multiplied by the length of the data; that only works if all symbols are independent and identically distributed (so, for example, on your file with IID uniform-random digits, it does work). Often, for meaningful data, the probability distribution of a symbol depends on its context, and indeed a lot of compression techniques are aimed at exploiting this.
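As a concrete illustration, here is a small C++ sketch that computes the empirical entropy of a byte sequence under the IID assumption (the function name is mine):

#include <cmath>
#include <cstddef>
#include <vector>

// Empirical entropy in bits per symbol, treating each byte as an IID symbol.
double entropyBitsPerByte(const std::vector<unsigned char>& data) {
    if (data.empty()) return 0.0;
    std::size_t counts[256] = {};
    for (unsigned char b : data) counts[b]++;

    double h = 0.0;
    for (std::size_t c : counts) {
        if (c == 0) continue;                       // impossible symbols contribute 0
        double p = static_cast<double>(c) / data.size();
        h -= p * std::log2(p);                      // H = -sum p * log2(p)
    }
    return h;   // multiply by data.size() for a rough lower bound on total bits
}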
There is no law that says you cannot get lucky and thereby compress an individual file to fewer bits than are suggested by its entropy. You can even arrange for it to be possible on purpose (but it won't necessarily happen). For example, let's say we expect that any letter is equally probable, but we decide to go against the flow and encode an A as the single bit 0, and any other letter as a 1 followed by 5 bits that indicate which letter it is. This is obviously a bad encoding given the expectation: there are only 26 letters and they're equally probable, but we're using more than log2(26) ≈ 4.7 bits on average; the average would be (1 + 25 * 6)/26 ≈ 5.8. However, if by some accident we happen to actually get an A (there is a 1/26 chance of that, so the odds are not too bad), we compress it to a single bit, which is much better than expected. Of course one cannot rely on luck; it can only come as a surprise.
For further reference you could read about entropy (information theory) on Wikipedia.
Encoding
As part of a graphical application I'm currently working on, I need to store three signed floats for each pixel of a 32-bit texture. At the moment, to reach this goal, I'm using the following C++ function:
#include <cmath>

void encode_full(float* rgba, unsigned char* c) {
    int range = 8;                        // values are assumed to lie in (-4, 4)
    for (int i = 0; i < 3; i++) {
        rgba[i] += range / 2;             // shift to (0, 8)
        rgba[i] /= range;                 // normalize to (0, 1)
        rgba[i] *= 255.0f;                // scale to (0, 255)
        c[i] = static_cast<unsigned char>(std::floor(rgba[i]));
    }
    c[3] = 255;                           // alpha channel unused
}
Although this encoding function brings along a considerable loss in precision, things are made better by the fact that the range of considered values is limited to the interval (-4,4).
Nonetheless, even though the function yields decent results, I think I could do a considerably better job by exploiting the (currently unused) alpha channel to get additional precision. In particular, I was thinking of using 11 bits for the first float, 11 bits for the second, and 10 bits for the last, or 10 - 10 - 10 - 2 (unused). OpenGL has a similar format, called R11F_G11F_B10F.
However, I'm having some difficulties coming up with an encoding function for this particular format. Does anyone know how to write such a function in C++?
Decoding
On the decoding side, this is the function I'm using within my shader.
float3 decode(float4 color) {
int range = 8;
return color.xyz * range - range / 2;
}
Please notice that the shader is written in Cg and used within the Unity engine. Furthermore, notice that Unity's implementation of Cg shaders handles only a subset of the Cg language (for instance, pack/unpack functions are not supported).
If possible, along with the encoding function, a bit of help for the decoding function would be highly appreciated. Thanks!
Edit
I've mentioned the R11F_G11F_B10F only as a frame of reference for the way the bits are to be split among the color channels. I don't want a float representation, since this would actually imply a loss of precision for the given range, as pointed out in some of the answers.
"10 bits" translates to an integer between 0 and 1023, so the mapping from [-4.0,+4.0] trivially is floor((x+4.0) * (1023.0/8.0)). For 11 bits, substitute 2047.
Decoding is the other way around, (y*8.0/1023.0) - 4.0.
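Building on that mapping, here is a rough C++ sketch of the 11 + 11 + 10 packing the question asks about (the bit layout and the clamp are my choices, not the only option):

#include <algorithm>
#include <cstdint>

// Pack three floats from [-4, 4) into 11 + 11 + 10 bits, then spread the
// resulting 32-bit word across the four 8-bit channels of the texture.
void encode_11_11_10(const float* xyz, unsigned char* c) {
    const float range = 8.0f;                      // width of [-4, 4)
    auto quantize = [&](float v, uint32_t levels) -> uint32_t {
        float t = (v + 4.0f) / range;              // map to [0, 1)
        t = std::min(std::max(t, 0.0f), 1.0f);     // clamp for safety
        return std::min<uint32_t>(static_cast<uint32_t>(t * levels), levels - 1);
    };
    uint32_t a = quantize(xyz[0], 2048);           // 11 bits
    uint32_t b = quantize(xyz[1], 2048);           // 11 bits
    uint32_t d = quantize(xyz[2], 1024);           // 10 bits
    uint32_t packed = (a << 21) | (b << 10) | d;   // 11 + 11 + 10 = 32 bits
    c[0] = (packed >> 24) & 0xFF;                  // R
    c[1] = (packed >> 16) & 0xFF;                  // G
    c[2] = (packed >> 8) & 0xFF;                   // B
    c[3] = packed & 0xFF;                          // A
}

On the shader side you would reassemble the packed integer from the four channels (each multiplied by 255 and rounded) and undo the shifts with floor/fmod arithmetic, since Unity's Cg subset may not give you bitwise operators.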
I think using GL_R11F_G11F_B10F is not going to help in your case. As the format name suggests, the components here are 11-bit and 10-bit float numbers, meaning that they are stored as a mantissa and exponent. More specifically, from the spec:
An unsigned 11-bit floating-point number has no sign bit, a 5-bit exponent (E), and a 6-bit mantissa (M).
An unsigned 10-bit floating-point number has no sign bit, a 5-bit exponent (E), and a 5-bit mantissa (M).
In both cases, as common for floating point formats, there is an implicit leading 1 bit for the mantissa. So effectively, the mantissa has 7 bits of precision for the 11-bit case, 6 bits for the 10-bit case.
This is less than the 8-bit precision you're currently using. Now, it's important to understand that the precision in the float case is non-uniform and relative to the size of the number. So very small numbers would actually have better precision than an 8-bit fixed-point number, while numbers towards the top of the range would have worse precision. If you use the natural mapping of your [-4.0, 4.0] range to positive floats, for example by simply adding 4.0 before converting to the 11/10-bit unsigned float, you would get better precision for values close to -4.0, but worse precision for values close to 4.0.
The main advantage of float formats is really that they can store a much wider range of values, while still maintaining good relative precision.
As long as you want to keep memory use at 4 bytes/pixel, a much better choice for you would be a format like GL_RGB10, giving you an actual precision of 10 bits for each component. This is very similar to GL_RGB10_A2 (and its unsigned sibling GL_RGB10_A2UI), except that it does not expose the alpha component you are not using.
If you're willing to increase memory usage beyond 4 bytes/pixel, you have numerous options. For example, GL_RGBA16 will give you 16 bits of fixed point precision per component. GL_RGB16F gives you 16-bit floats (with 11 bits relative precision). Or you can go all out with GL_RGB32F, which gives you 32-bit float for each component.
Small amount of background: I am working on a converter that bridges between a map maker (Tiled) that outputs in XML, and an engine (Angel2D) that inputs Lua tables. Most of this is straightforward.
However, Tiled outputs in pixel offsets (integers of absolute values), while Angel2D inputs OpenGL units (floats of relative values); a conversion factor between these two is needed (for example, 32px = 1gu). Since OpenGL units are abstract, and the camera can zoom in or out if the objects are too small or big, the actual conversion factor isn't important; I could use a random number, and the user would merely have to zoom in or out.
But it would be best if the conversion factor was selected such that most numbers outputted were small and whole (or fractions of small whole numbers), because that makes it easier to work with (and the whole point of the OpenGL units is that they are easy to work with).
How would I find such a conversion factor reliably?
My first attempt was to use the smallest number given; this resulted in no fractions below 1, but often led to lots of decimal places where the factors didn't line up.
Then I tried the mode of the sequence, which led to the largest number of 1's possible, but often led to very long floats for background images.
My current approach gets the GCD of the whole sequence, which, when it works, works great, but can easily be thrown off course by a single bad apple.
Note that while I could easily just pass the numbers I am given along, or pick some fixed factor, or use one of the conversions I specified above, I am looking for a method to reliably scale this list of integers to small, whole numbers or simple fractions, because this would most likely be unsurprising to the end user; this is not a one off conversion.
The end users tend to use 1.0 as their "base" for manipulations (because it's simple and obvious), so it would make more sense for the sizes of entities to cluster around this.
How about 'the largest number which is a factor of some % of the values'?
So the GCD is the 'largest number which is a factor of 100%' of the values.
You could pick the largest number which is a factor of, say, 60% of the values. I don't know if there's a technical term for it, but it's sort of a 'rough GCD, if not a precise GCD'.
You might have to do trial and error to find it (possibly a binary search). But you could also consider sampling, i.e. if you have a million data points, just pick 100 or 1000 at random to find a number which divides evenly into your goal percentage of the sample set, and that might be good enough.
Some crummy C++:
#include <algorithm>
#include <vector>

/** Return the fraction of values in sampleset for which x is a factor. */
double percentIsFactorOf(int x, const std::vector<int>& sampleset) {
    int factorCount = 0;
    for (int sample : sampleset)
        if (sample % x == 0) factorCount++;
    return (double)factorCount / sampleset.size();
}

/** Find the largest value which is a factor of goalPercentage of sampleset. */
int findGoodEnoughCommonFactor(const std::vector<int>& sampleset, double goalPercentage) {
    // slow n^2 algorithm here - add binary search, sampling, or something smarter to improve if you like
    int start = *std::max_element(sampleset.begin(), sampleset.end());
    while (percentIsFactorOf(start, sampleset) < goalPercentage)
        start--;
    return start;
}
Your input is in N^2 (two-dimensional space over the natural numbers, i.e. non-negative integers), and you need to output to R^2 (two-dimensional space over the real numbers, which in this case will be represented/approximated with a float).
Forget about scaling for a minute and let the output be of the same scale as the input. The first step is to realize that the input coordinate <0, 0> does not represent <0, 0> in the output; it represents <0.5f, 0.5f>, the center of the pixel. Similarly, the input <2, 3> becomes <2.5, 3.5>. In general the conversion can be performed like this:
float x_prime = (float)x + 0.5f;
float y_prime = (float)y + 0.5f;
Next, you probably want to pick a scaling factor, as you have mentioned. I've always found it useful to pick some real-world unit, usually meters. This way you can reason about other physical aspects of what you're trying to model, because they have units; i.e. speeds, accelerations, can now be in meters per second, or meters per second squared. How many meters tall or wide is the thing you are making? How many meters is a pixel? Pick something that makes sense, and then your formula becomes this:
float x_prime = ((float)x + 0.5f) * (float)units_per_pixel;
float y_prime = ((float)y + 0.5f) * (float)units_per_pixel;
You may not want all of your output coordinates to be in the positive quadrant; that is, you may want the origin to be in the center of the object. If so, you probably want your starting coordinate system to include negative integers, or you can provide some offset to the true center. Let's say you provide a pixel offset to the true center. Your conversion then becomes this:
float x_prime = ((float)x + 0.5f - (float)x_offset) * (float)units_per_pixel;
float y_prime = ((float)y + 0.5f - (float)y_offset) * (float)units_per_pixel;
Discarding your background information, I understand that the underlying problem you are trying to solve is the following:
Given a finite number of (positive) integers {x_1, ... x_N} find some (rational) number f such that all x_i / f are "nice".
If you insist on "nice" meaning integer and as small as possible, then f = GCD is the (mathematically) exact answer to this question. There just is nothing "better"; if the GCD is 1, tough luck.
If "nice" is supposed to mean rational with small numerator and denominator, the question gets more interesting and depending on what "small" means, find your trade off between small absolute value (f = max) and small denominator (f = GCD). Notice, however, that small numerator/denominator does not mean small floating point representation, e.g. 1/3 = 0.333333... in base 10.
If you want short floating points, make sure that f is a power of your base, i.e. 10 or 2, depending on whether the numbers should look short to the user or actually have a reasonable machine representation. This is what is used for scientific representation of floating points, which might be the best answer to the question of how to make decimal numbers look nice in the first place.
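For the integer case, a minimal C++17 sketch of f = GCD over the whole list (the function name is mine):

#include <numeric>   // std::gcd (C++17)
#include <vector>

// f = GCD of all pixel offsets, assuming they are non-negative integers.
int commonFactor(const std::vector<int>& offsets) {
    int g = 0;                         // gcd(0, x) == x, so 0 is a neutral start
    for (int v : offsets) g = std::gcd(g, v);
    return g;                          // a result of 1 means "tough luck"
}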
I have no idea what you are talking about with "GL units".
At the most abstract level, GL has no unit. Vertex coordinates are in object-space initially, and go through half a dozen user-defined transformations before they eventually produce coordinates (window-space) with familiar units (pixels).
You are absolutely correct that even in window-space, coordinates are still not whole numbers. You would not want this in fact, or triangles would jump all over the place and generally would not resemble triangles if their vertex positions were snapped to integer pixel coordinates.
Instead, GL throws sub-pixel precision into the mix. Coordinates still ultimately wind up quantized to integer values, but each integer may cover 1/256th of a pixel given 8-bit sub-pixel precision. Pixel coverage testing is also done at the sub-pixel level.
GL never attempts to find any conversion factor like you are discussing, it just splits the number space for pixel coordinates up into a fixed division between integral and fractional... fixed-point in other words. You might consider doing the same thing.
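A minimal sketch of that fixed-point idea in C++, using 8 bits of sub-pixel precision (the 24.8 split is just an example):

#include <cstdint>
#include <cmath>

// 24.8 fixed point: the low 8 bits are the fractional (sub-pixel) part.
int32_t toFixed_24_8(float x)     { return (int32_t)std::lround(x * 256.0f); }
float   fromFixed_24_8(int32_t v) { return v / 256.0f; }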
You can recycle the code you probably already use for vector normalisation: normalise the values to fit within a maximum value of 1. For example,
the formula for 3D normalisation of a vector works fine here.
Get the length first:
|a| = sqrt((ax * ax) + (ay * ay) + (az * az))
Then you will need to divide the values of each component by the length:
x = ax/|a|
y = ay/|a|
z = az/|a|
Now all the x, y, z values will fall into the range -1 to 1, the same as the OpenGL base coordinate system.
I know this does not generate the whole-number system you would like; however, it does give a smaller, more unified feel to the range.
If you want to limit the range to whole numbers only, simply use a function like the following, which takes the normalised value and converts it to an integer-only range value:
#include <algorithm> // this allows the use of std::min and std::max

int maxVal = 256;

unsigned char convertToSpread(float floatValueToConvert) {
    // clamp to [0, maxVal - 1] so negative inputs don't wrap around
    return (unsigned char) std::min(maxVal - 1, std::max(0, (int) (floatValueToConvert * maxVal)));
}
The above will spread your values between 0 and 255; simply increase the value of maxVal to what you need and change the unsigned char to a datatype which suits your needs.
So if you want 1024 values, simply change maxVal to 1024 and unsigned char to unsigned int.
Hope this helps; let me know if you need more information as well, and I can elaborate. :)
Let's say we have a binary random number generator, int r(), that returns a zero or a one, each with probability 0.5.
I looked at Boost.Random, and they generate, say, 32 bits and do something like this (pseudocode):
x = double(rand_int32());
return min + x / (2^32) * (max - min);
I have some serious doubts about this. A double has 53 bits of mantissa, and 32 bits can never properly generate a fully random mantissa, among other things such as rounding errors, etc.
What would be a fast way to create a uniformly distributed float or double in the half-open range [min, max), assuming IEEE754? The emphasis here lies on correctness of distribution, not speed.
To properly define correct, the correct distribution would be equal to the one that we would get if we would take an infinitely precise uniformly distributed random number generator and for each number we would round to the nearest IEEE754 representation, if that representation would still be within [min, max), otherwise the number would not count for the distribution.
P.S.: I would be interested in correct solutions for open ranges as well.
AFAIK, the correct (and probably also fastest) way is to first create a 64-bit unsigned integer where the 52 fraction bits are random bits and the exponent is 1023, which, if type-punned into an (IEEE 754) double, will be a uniformly distributed random value in the range [1.0, 2.0). So the last step is to subtract 1.0 from that, resulting in a uniformly distributed random double value in the range [0.0, 1.0).
In pseudo code:
rndDouble = bitCastUInt64ToDouble(1023 << 52 | rndUInt64 & 0xfffffffffffff) - 1.0
This method is mentioned here:
http://xoroshiro.di.unimi.it
(See "Generating uniform doubles in the unit interval")
EDIT: The recommended method has since changed to:
(x >> 11) * (1. / (UINT64_C(1) << 53))
See above link for details.
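For reference, here is a rough C++ sketch of both variants, taking the raw 64 random bits as a parameter (i.e. whatever your uniform 64-bit generator produces):

#include <cstdint>
#include <cstring>

// Variant 1: set the exponent to 1023 (biased) and fill the 52 fraction bits
// with random bits; the result lies in [1.0, 2.0), so subtract 1.0.
double randomDoubleBitTrick(uint64_t bits) {
    uint64_t u = (UINT64_C(1023) << 52) | (bits & UINT64_C(0xfffffffffffff));
    double d;
    std::memcpy(&d, &u, sizeof d);   // type-pun without violating aliasing rules
    return d - 1.0;
}

// Variant 2 (the currently recommended one): use the top 53 bits and scale by
// 2^-53, giving a multiple of 2^-53 in [0, 1).
double randomDoubleMultiply(uint64_t bits) {
    return (bits >> 11) * (1.0 / (UINT64_C(1) << 53));
}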
Here is a correct approach with no attempt at efficiency.
We start with a bignum class, and then a rational wrapper of said bignums.
We produce a range "sufficiently bigger than" our [min, max) range, so that rounding of our smaller_min and bigger_max produces floating point values outside that range, in our rational built on the bignum.
Now we subdivide the range into two parts perfectly down the middle (which we can do, as we have a rational bignum system). We pick one of the two parts at random.
If, after rounding, the top and bottom of the picked range would be (A) outside of [min, max) (on the same side, mind you!) you reject and restart from the beginning.
If (B) the top and bottom of your range rounds to the same double (or float if you are returning a float), you are done, and you return this value.
Otherwise (C) you recurse on this new, smaller range (subdivide, pick randomly, test).
There are no guarantees that this procedure halts, because you can either constantly drill down to the "edge" between two rounding doubles, or you could constantly pick values outside of the [min, max) range. The probability of this happening (never halting), however, is zero (assuming a good random number generator and a [min, max) of non-zero size).
This also works for (min, max), or even picking a number in the rounded sufficiently fat Cantor set. So long as the measure of the valid range of reals that round to the correct floating point values is non zero, and the range has a compact support, this procedure can be run and has a probability of 100% of terminating, but no hard upper bound on the time it takes can be made.
The problem here is that in IEEE754 the doubles which may be represented are not equi-distributed. That is, if we have a generator generating real numbers, say in (0,1) and then map to IEEE754 representable numbers, the result will not be equi-distributed.
Thus, we have to define "equi-distribution". That said, assuming that each IEEE 754 number is just a representative for the probability of lying in the interval defined by the IEEE 754 rounding, the procedure of first generating equi-distributed "numbers" and then rounding to IEEE 754 will generate (by definition) an "equi-distribution" of IEEE 754 numbers.
Hence, I believe that the above formula will become arbitrarily close to such a distribution if we just choose the accuracy high enough. If we restrict the problem to finding a number in [0,1), this means restricting to the set of denormalized IEEE 754 numbers, which are in one-to-one correspondence with 53-bit integers. Thus it should be fast and correct to generate just the mantissa with a 53-bit binary random number generator.
IEEE 754 arithmetic is always "arithmetic at infinite precision followed by rounding", i.e. the IEEE 754 number representing a*b is the one closest to a*b (put differently, you can think of a*b calculated at infinite precision, then rounded to the closest IEEE 754 number). Hence I believe that min + (max-min) * x, where x is a denormalized number, is a feasible approach.
(Note: As is clear from my comment, I was at first not aware that you were pointing to the case with min and max different from 0 and 1. The denormalized numbers have the property that they are evenly spaced. Hence you get the equi-distribution by mapping the 53 bits to the mantissa. Next you can use the floating-point arithmetic, due to the fact that it is correct up to machine precision. If you use the reverse mapping you will recover the equi-distribution.)
See this question for another aspect of this problem: Scaling Int uniform random range into Double one
std::uniform_real_distribution.
There's a really good talk by S.T.L. from this year’s Going Native conference that explains why you should use the standard distributions whenever possible. In short, hand-rolled code tends to be of laughably poor quality (think std::rand() % 100), or have more subtle uniformity flaws, such as in (std::rand() * 1.0 / RAND_MAX) * 99, which is the example given in the talk and is a special case of the code posted in the question.
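For completeness, a minimal usage sketch (the bounds here are arbitrary):

#include <random>
#include <iostream>

int main() {
    const double min = 0.0, max = 10.0;               // hypothetical bounds
    std::mt19937_64 engine{std::random_device{}()};
    std::uniform_real_distribution<double> dist(min, max);  // half-open [min, max)
    for (int i = 0; i < 5; ++i)
        std::cout << dist(engine) << '\n';
}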
EDIT: I took a look at libstdc++’s implementation of std::uniform_real_distribution, and this is what I found:
The implementation produces a number in the range [dist_min, dist_max) by using a simple linear transformation from some number produced in the range [0, 1). It generates this source number using std::generate_canonical, the implementation of which may be found here (at the end of the file). std::generate_canonical determines the number of times (denoted as k) the range of the distribution, expressed as an integer and denoted here as r*, will fit in the mantissa of the target type. What it then does is essentially generate one number in [0, r) for each r-sized segment of the mantissa and, using arithmetic, populate each segment accordingly. The formula for the resulting value may be expressed as
Σ(i=0, k-1, X/(r^i))
where X is a stochastic variable in [0, r). Each division by the range is equivalent to a shift by the number of bits used to represent it (i.e., log2(r)), and so fills the corresponding mantissa segment. This way, the whole of the precision of the target type is used, and since the range of the result is [0, 1), the exponent remains 0** (modulo bias) and you don’t get the uniformity issues you have when you start messing with the exponent.
I would not implicitly trust that this method is cryptographically secure (and I have suspicions about possible off-by-one errors in the calculation of the size of r), but I imagine it is significantly more reliable in terms of uniformity than the Boost implementation you posted, and definitely better than fiddling about with std::rand.
It may be worth noting that the Boost code is in fact a degenerate case of this algorithm where k = 1, meaning that it is equivalent if the input range requires at least 23 bits to represent its size (IEEE 754 single precision) or at least 52 bits (double precision). This means a minimum range of ~8.4 million or ~4.5e15, respectively. In light of this information, if you're using a binary generator, I don't think the Boost implementation is going to cut it.
After a brief look at libc++'s implementation, it looks like they are using essentially the same algorithm, implemented slightly differently.
(*) r is actually the range of the input plus one. This allows using the max value of the urng as valid input.
(**) Strictly speaking, the encoded exponent is not 0, as IEEE 754 encodes an implicit leading 1 before the radix of the significand. Conceptually, however, this is irrelevant to this algorithm.
I want to store billions (10^9) of double precision floating point numbers in memory and save space. These values are grouped in thousands of ordered sets (they are time series), and within a set, I know that the difference between values is usually not large (compared to their absolute value). Also, the closer to each other, the higher the probability of the difference being relatively small.
A perfect fit would be a delta encoding that stores only the difference of each value to its predecessor. However, I want random access to subsets of the data, so I can't depend on going through a complete set in sequence. I'm therefore using deltas to a set-wide baseline that yields deltas which I expect to be within 10 to 50 percent of the absolute value (most of the time).
I have considered the following approaches:
divide the smaller value by the larger one, yielding a value between 0 and 1 that could be stored as an integer of some fixed precision plus one bit for remembering which number was divided by which. This is fairly straightforward and yields satisfactory compression, but is not a lossless method and thus only a secondary choice.
XOR the IEEE 754 binary64 encoded representations of both values and store the length of the long stretches of zeroes at the beginning of the exponent and mantissa plus the remaining bits which were different. Here I'm quite unsure how to judge the compression, although I think it should be good in most cases.
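A rough sketch of what I mean by the second approach (C++20 for std::countl_zero; only the counting is shown, not the bit packing):

#include <bit>       // std::countl_zero
#include <cstdint>
#include <cstring>

// XOR the bit patterns of two doubles; the leading zero bits of the result
// are what could be elided, and only the remaining low bits would be stored.
uint64_t xorDelta(double a, double b) {
    uint64_t ua, ub;
    std::memcpy(&ua, &a, sizeof ua);
    std::memcpy(&ub, &b, sizeof ub);
    return ua ^ ub;
}

int leadingZeroBits(uint64_t x) {
    return std::countl_zero(x);   // 64 when the two values were identical
}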
Are there standard ways to do this? What might be problems about my approaches above? What other solutions have you seen or used yourself?
Rarely are all the bits of a double-precision number meaningful.
If you have billions of values that are the result of some measurement, find the calibration and error of your measurement device. Quantize the values so that you only work with meaningful bits.
Often, you'll find that you only need 16 bits of actual dynamic range. You can probably compress all of this into arrays of "short" that retain all of the original input.
Use a simple "Z-score technique" where every value is really a signed fraction of the standard deviation.
So a sequence of samples with a mean of m and a standard deviation of s gets transformed into a bunch of Z-scores. Normal Z-score transformations use a double, but you should use a fixed-point version of that double: s/1000 or s/16384 or something that retains only the actual precision of your data, not the noise bits on the end.
# encode: quantize each sample to a fixed-point Z-score
for u in samples:
    z = int(16384 * (u - m) / s)

# decode: recover an approximation of the original sample
for z in scaled_samples:
    u = s * (z / 16384.0) + m
Your Z-scores retain a pleasant easy-to-work with statistical relationship with the original samples.
Let's say you use a signed 16-bit Z-score. You have +/- 32,768. Scale this by 16,384 and your Z-scores have an effective resolution of 0.000061 decimal.
If you use a signed 24-bit Z-score, you have +/- 8 million. Scale this by 4,194,304 and you have a resolution of 0.00000024.
I seriously doubt you have measuring devices this accurate. Further, any arithmetic done as part of filter, calibration or noise reduction may reduce the effective range because of noise bits introduced during the arithmetic. A badly thought-out division operator could make a great many of your decimal places nothing more than noise.
Whatever compression scheme you pick, you can decouple it from the need to perform arbitrary seeks by compressing into fixed-size blocks and prepending to each block a header containing all the data required to decompress it. For a delta encoding scheme, the block would contain deltas encoded in some fashion that takes advantage of their small magnitude to make them take less space (e.g. fewer bits for exponent/mantissa, conversion to a fixed-point value, Huffman encoding, etc.), and the header a single uncompressed sample. Seeking then becomes a matter of cheaply selecting the appropriate block, then decompressing it.
If the compression ratio is so variable that much space is wasted padding the compressed data to produce fixed-size blocks, a directory of offsets into the compressed data could be built instead, and the state required to decompress recorded in that.
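A minimal sketch of the seek logic under that scheme (the block size is arbitrary):

#include <cstddef>

// With a fixed number of samples per block, only the block containing the
// requested index has to be decompressed.
constexpr std::size_t kSamplesPerBlock = 1024;   // made-up block size

struct BlockAddress {
    std::size_t block;    // which compressed block to decompress
    std::size_t offset;   // index of the sample within that block
};

BlockAddress locate(std::size_t sampleIndex) {
    return { sampleIndex / kSamplesPerBlock, sampleIndex % kSamplesPerBlock };
}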
If you know a group of doubles has the same exponent, you could store the exponent once, and only store the mantissa for each value.