I've been reviewing the algorithm for arithmetic coding, and I can't seem to work out why it was given the name "arithmetic coding". Can anybody shed any light?
Because it is using arithmetic, coding the message into a giant fraction. This is as opposed to Huffman which codes each symbol as a series of bits.
From the abstract of the original paper:
Algorithms for encoding and decoding finite strings over a finite
alphabet are described. The coding operations are arithmetic involving
rational numbers ...
Related
Is there some clever and reliable way to print series of bits as an IEEE-754 without actually using a float type?
I have found a way to print fractions, which allows me to represent the float as a a fraction. However, I then came to realize that the exponent may range from -127 to 128 (after adjusting with bias), which may result in the multiplication mantissa * 2^128. The fraction method relies on representing the numerator as an integer, and I would require a really large integer to do this multiplication. I mean, I could use "custom" type to represent this large value (i.e. https://gmplib.org/), but I would prefer if to avoid this. If we multiplied by 10^x, I could simply adjust the decimal point and add some zeros, but sadly that's not the case either.
I have not been able to find anything that mentions any solution for this. Probably due to the fact that googling stuff like "print from
Why am I actually trying to do this?
I'm only doing this to get a better understanding of how floats (IEEE-754 in particular) work, and I find that it always help to do some practical example. So I thought "Hey, why not try to code it?". This has no practical application (that I know of)!
So, almost immediately after posting this, I finally succeded in finding the resources I've been looking for.
https://www.ryanjuckett.com/printing-floating-point-numbers/ talks about it, and references other relevant sources.
I'm trying to perform a modulo operation on two numbers. One is a cpp_dec_float_50 type (decimal) number, and the other is a int1024_t type number - both supplied by the Boost Multiprecision C++ library.
I've tried using the % operator, and the Standard Library's fmod() function, but alas they aren't designed for such data types. I've scoured the Boost documentation however there is no trace of a modulous alternative that would work with its own floating point data types.
Please note that the int1024_t value is very large.
There must be SOME way to perform a modulo operation on such values, hopefully someone can shed some light on it for me.
Cheers.
How do functions such as printf extract digits from a floating point number? I understand how this could be done in principle. Given a number x, of which you want the first n digits, scale x by a power of 10 so that x is between pow(10, n) and pow(10, n-1). Then convert x into an integer, and take the digits of the integer.
I tried this, and it worked. Sort of. My answer was identical to the answer given by printf for the first 16 decimal digits, but tended to differ on the ones after that. How does printf do it?
The classic implementation is David Gay's dtoa. The exact details are somewhat arcane (see Why does "dtoa.c" contain so much code?), but in general it works by doing the base conversion using more precision beyond what you can get from a 32-bit, 64-bit, or even 80-bit floating point number. To do this, it uses so-called "bigints" or arbitrary-precision numbers, which can hold as many digits as you can fit in memory. Gay's code has been copied, with modifications, into countless other libraries including common implementations for the C standard library (so it might power your printf), Java, Python, PHP, JavaScript, etc.
(As a side note... not all of these copies of Gay's dtoa code were kept up to date, so because PHP used an old version of strtod it hung when parsing 2.2250738585072011e-308.)
In general, if you do things the "obvious" and simple way like multiplying by a power of 10 and then converting the integer, you will lose a small amount of precision and some of the results will be inaccurate... but maybe you will get the first 14 or 15 digits correct. Gay's implementation of dtoa() claims to get all the digits correct... but as a result, the code is quite difficult to follow. Skip to the bottom to see strtod itself, you can see that it starts with a "fast path" which just uses ordinary floating-point arithmetic, but then it detects if that result is incorrect and uses a more reliable algorithm using bigints which works in all cases (but is slower).
The implementation has the following citation, which you may find interesting:
* Inspired by "How to Print Floating-Point Numbers Accurately" by
* Guy L. Steele, Jr. and Jon L. White [Proc. ACM SIGPLAN '90, pp. 112-126].
The algorithm works by calculating a range of decimal numbers which produce the given binary number, and by using more digits, the range gets smaller and smaller until you either have an exact result or you can correctly round to the requested number of digits.
In particular, from sec 2.2 Algorithm,
The algorithm uses exact rational arithmetic to perform its computations so that there is no loss of accuracy. In order to generate digits, the algorithm scales the number so that it is of the form 0.d1d2..., where d1, d2, ..., are base-B digits. The first digit is computed by multiplying the scaled number by the output base, B, and taking the integer part. The remainder is used to compute the rest of the digits using the same approach.
The algorithm can then continue until it has the exact result (which is always possible, since floating-point numbers are base 2, and 2 is a factor of 10) or until it has as many digits as requested. The paper goes on to prove the algorithm's correctness.
Also note that not all implementations of printf are based on Gay's dtoa, this is just a particularly common implementation that's been copied a lot.
There are various ways to convert floating-point numbers to decimal numerals without error (either exactly or with rounding to a desired precision).
One method is to use arithmetic as taught in elementary school. C provides functions to work with floating-point numbers, such as frexp, which separates the fraction (also called the significand, often mistakenly called a mantissa) and the exponent. Given a floating-point number, you could create a large array to store decimal digits in and then compute the digits. Each bit in the fraction part of a floating-point number represents some power of two, as determined by the exponent in the floating-point number. So you can simply put a “1” in an array of digits and then use elementary school arithmetic to multiply or divide it the required number of times. You can do that for each bit and then add all the results, and the sum is the decimal numeral that equals the floating-point number.
Commercial printf implementations will use more sophisticated algorithms. Discussing them is beyond the scope of a Stack Overflow question-and-answer. The seminal paper on this is Correctly Rounded Binary-Decimal and Decimal-Binary Conversions by David M. Gay. (A copy appears to be available here, but that seems to be hosted by a third party; I am not sure how official or durable it is. A web search may turn up other sources.) A more recent paper with an algorithm for converting a binary floating-point number to decimal with the shortest number of digits needed to uniquely distinguish the value is Printing Floating-Point Numbers: An Always Correct Method by Marc Andrysco, Ranjit Jhala, and Sorin Lerner.
One key to how it is done is that printf will not just use the floating-point format and its operations to do the work. It will use some form of extended-precision arithmetic, either by working with parts of the floating-point number in an integer format with more bits, by separating the floating-point number into pieces and using multiple floating-point numbers to work with it, or by using a floating-point format with more precision.
Note that the first step in your question, multiple x by a power of ten, already has two rounding errors. First, not all powers of ten are exactly representable in binary floating-point, so just producing such a power of ten necessarily has some representation error. Then, multiplying x by another number often produces a mathematical result that is not exactly representable, so it must be rounded to the floating-point format.
Neither the C or C++ standard does not dictate a certain algorithm for such things. Therefore is impossible to answer how printf does this.
If you want to know an example of a printf implementation, you can have a look here: http://sourceware.org/git/?p=glibc.git;a=blob;f=stdio-common/vfprintf.c and here: http://sourceware.org/git/?p=glibc.git;a=blob;f=stdio-common/printf_fp.c
Since there are two ways of implementing an AP fractional number, one is to emulate the storage and behavior of the double data type, only with more bytes, and the other is to use an existing integer APA implementation for representing a fractional number as a rational i.e. as a pair of integers, numerator and denominator, which of the two ways are more likely to deliver efficient arithmetic in terms of performance? (Memory usage is really of minor concern.)
I'm aware of the existing C/C++ libraries, some of which offer fractional APA with "floats" and other with rationals (none of them features fixed-point APA, however) and of course I could benchmark a library that relies on "float" implementation against one that makes use of rational implementation, but the results would largely depend on implementation details of those particular libraries I would have to choose randomly from the nearly ten available ones. So it's more theoretical pros and cons of the two approaches that I'm interested in (or three if take into consideration fixed-point APA).
The question is what you mean by arbitrary precision that you mention in the title. Does it mean "arbitrary, but pre-determined at compile-time and fixed at run-time"? Or does it mean "infinite, i.e. extendable at run-time to represent any rational number"?
In the former case (precision customizable at compile-time, but fixed afterwards) I'd say that one of the most efficient solutions would actually be fixed-point arithmetic (i.e. none of the two you mentioned).
Firstly, fixed-point arithmetic does not require any dedicated library for basic arithmetic operations. It is just a concept overlaid over integer arithmetic. This means that if you really need a lot of digits after the dot, you can take any big-integer library, multiply all your data, say, by 2^64 and you basically immediately get fixed-point arithmetic with 64 binary digits after the dot (at least as long as arithmetic operations are concerned, with some extra adjustments for multiplication and division). This is typically significantly more efficient than floating-point or rational representations.
Note also that in many practical applications multiplication operations are often accompanied by division operations (as in x = y * a / b) that "compensate" for each other, meaning that often it is unnecessary to perform any adjustments for such multiplications and divisions. This also contributes to efficiency of fixed-point arithmetic.
Secondly, fixed-point arithmetic provides uniform precision across the entire range. This is not true for either floating-point or rational representations, which in some applications could be a significant drawback for the latter two approaches (or a benefit, depending on what you need).
So, again, why are you considering floating-point and rational representations only. Is there something that prevents you from considering fixed-point representation?
Since no one else seemed to mention this, rationals and floats represent different sets of numbers. The value 1/3 can be represented precisely with a rational, but not a float. Even an arbitrary precision float would take infinitely many mantissa bits to represent a repeating decimal like 1/3. This is because a float is effectively like a rational but where the denominator is constrained to be a power of 2. An arbitrary precision rational can represent everything that an arbitrary precision float can and more, because the denominator can be any integer instead of just powers of 2. (That is, unless I've horribly misunderstood how arbitrary precision floats are implemented.)
This is in response to your prompt for theoretical pros and cons.
I know you didn't ask about memory usage, but here's a theoretical comparison in case anyone else is interested. Rationals, as mentioned above, specialize in numbers that can be represented simply in fractional notation, like 1/3 or 492113/203233, and floats specialize in numbers that are simple to represent in scientific notation with powers of 2, like 5*2^45 or 91537*2^203233. The amount of ascii typing needed to represent the numbers in their respective human-readable form is proportional to their memory usage.
Please correct me in the comments if I've gotten any of this wrong.
Either way, you'll need multiplication of arbitrary size integers. This will be the dominant factor in your performance since its complexity is worse than O(n*log(n)). Things like aligning operands, and adding or subtracting large integers is O(n), so we'll neglect those.
For simple addition and subtraction, you need no multiplications for floats* and 3 multiplications for rationals. Floats win hands down.
For multiplication, you need one multiplication for floats and 2 multiplications for rational numbers. Floats have the edge.
Division is a little bit more complex, and rationals might win out here, but it's by no means a certainty. I'd say it's a draw.
So overall, IMHO, the fact that addition is at least O(n*log(n)) for rationals and O(n) for floats clearly gives the win to a floating-point representation.
*It is possible that you might need one multiplication to perform addition if your exponent base and your digit base are different. Otherwise, if you use a power of 2 as your base, then aligning the operands takes a bit shift. If you don't use a power of two, then you may also have to do a multiplication by a single digit, which is also an O(n) operation.
You are effectively asking the question: "I need to participate in a race with my chosen animal. Should I choose a turtle or a snail ?".
The first proposal "emulating double" sounds like staggered precision: using an array of doubles of which the sum is the defined number. There is a paper from Douglas M. Priest "Algorithms for Arbitrary Precision Floating Point Arithmetic" which describes how to implement this arithmetic. I implemented this and my experience is very bad: The necessary overhead to make this run drops the performance 100-1000 times !
The other method of using fractionals has severe disadvantages, too: You need to implement gcd and kgv and unfortunately every prime in your numerator or denominator has a good chance to blow up your numbers and kill your performance.
So from my experience they are the worst choices one can made for performance.
I recommend the use of the MPFR library which is one of the fastest AP packages in C and C++.
Rational numbers don't give arbitrary precision, but rather the exact answer. They are, however, more expensive in terms of storage and certain operations with them become costly and some operations are not allowed at all, e.g. taking square roots, since they do not necessarily yield a rational answer.
Personally, I think in your case AP floats would be more appropriate.
My question has no practical application. I'm just interested. Suppose, I have a double value and I want to obtain its string representation similarly to the printf function. How would I do that without the C runtime library? Let's suppose I'm on the x86 architecture.
Given that you state your question has no practical application, I figure you're trying to learn about floating point number representations.
Thus, if you're looking for a solution without using any library support, start with the format specification. From that you can discern the various "special" values (Infinity, NAN, etc) as well as decoding/calculating the actual numeric value. Once you have the significand and exponent, you know where to put the decimal point. You'll have to write your own itoa type routine. For radices which are a power of two, this can be as simple as a lookup table. For decimal, you'll have to do a little extra math.
you can get all values on left side by (double % 10) and then divide by 10 every time.
they will be in right to left.
to get values on right of dot you have to multiply by 10 and then (double % 10). they will be in left-to-right.
If you want to do it simply with a "close enough" result, see my article http://www.exploringbinary.com/quick-and-dirty-floating-point-to-decimal-conversion/ . It describes a simple program that uses floating-point to convert from floating-point to decimal, and explains why that approach can never be accurate for all conversions. (The program doesn't do decimal rounding like printf, but that should be easy enough to add.)