c++ integer library for 270-digit integers?

I would like to perform basic operations on numbers 270 digits long, and I was recommended Matt McCutchen's BigInteger library, but I am told it limits you based on how much memory your computer has, and mine has 2.87 GB of usable RAM. I want to perform things like division, multiplication, etc. Any help on what I can use? I don't know yet whether my computer's memory will be enough.

270 digits is tiny, relatively speaking - it's under 900 bits. Your computer routinely deals with 2048-bit numbers during SSL handshakes.
BigInteger should work fine. You may also want to check out libgmp (the GNU Multi-Precision library).

You'll be fine - 270 digits is not that much in the grand scheme of things.

Try it. 270 digits is nothing. 4096-bit cryptography tasks operate on 1233-digit numbers (granted, using modular arithmetic, so nothing ever gets larger than 1233 digits) without breaking a sweat.
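To give a sense of how little code this takes, here is a minimal sketch using GMP's C API (libgmp); the particular numbers are arbitrary placeholders, and you'd compile with something like g++ demo.cpp -lgmp.

    #include <gmp.h>
    #include <cstdio>

    int main() {
        mpz_t a, b, product, quotient;
        mpz_inits(a, b, product, quotient, NULL);

        // Build an arbitrary 270-digit value: 10^269 + 7.
        mpz_ui_pow_ui(a, 10, 269);
        mpz_add_ui(a, a, 7);
        mpz_set_ui(b, 123456789);

        mpz_mul(product, a, b);      // multiplication
        mpz_tdiv_q(quotient, a, b);  // truncated division

        gmp_printf("a * b = %Zd\n", product);
        gmp_printf("a / b = %Zd\n", quotient);

        mpz_clears(a, b, product, quotient, NULL);
        return 0;
    }

At roughly 900 bits, each such number occupies on the order of a hundred bytes, so 2.87 GB of RAM is not remotely a constraint.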

Related

c++ decimal implementation based on integers

In my trading application I have to use decimals to represent prices. I need the lowest possible latency, so the only acceptable solution would be to use int64 to represent the decimal. I can configure globally that I do not need, for example, more than 5 digits after the dot; then everywhere:
0.0000001 is not supported
0.000001 is not supported
1 should be used instead of 0.00001
10 should be used instead of 0.0001
100 should be used instead of 0.001
1000 should be used instead of 0.01
10000 should be used instead of 0.1
100000 should be used instead of 1
and so on
Are there any libraries that help with this kind of work? I don't completely understand whether I need any library at all; probably I should just work with int64 and that's it? Any hints and suggestions are welcome.
Update: I now realize that divide and multiply are not obvious at all, so I'm looking for some header-only library that adds macros or functions to divide/multiply fixed point stored in an int64.
What you're suggesting is basically fixed point arithmetic. It's a way of achieving decimal fraction calculations using only integer operations. It can have some speed advantages (on some systems), and if it's done correctly can avoid some of the errors introduced through floating point.
There will be libraries which can help, although the maths involved is quite simple. You might find it's easy enough to read up on the subject and implement it yourself.
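For illustration, here is a minimal fixed-point sketch with 5 implied decimal places, matching the mapping in the question. The helper names fp_mul and fp_div are made up for this example, and __int128 is a GCC/Clang extension used only to keep the intermediate products from overflowing.

    #include <cstdint>
    #include <cstdio>

    // Prices stored as int64 with 5 implied decimal places.
    constexpr int64_t SCALE = 100000;

    // a*b carries scale^2, so divide once by SCALE afterwards.
    int64_t fp_mul(int64_t a, int64_t b) {
        return static_cast<int64_t>((static_cast<__int128>(a) * b) / SCALE);
    }

    // a/b cancels the scale, so multiply the numerator by SCALE first.
    int64_t fp_div(int64_t a, int64_t b) {
        return static_cast<int64_t>((static_cast<__int128>(a) * SCALE) / b);
    }

    int main() {
        int64_t price = 150000;  // represents 1.5
        int64_t qty   = 200000;  // represents 2.0
        std::printf("1.5 * 2.0 -> %lld\n", static_cast<long long>(fp_mul(price, qty)));  // 300000, i.e. 3.0
        std::printf("1.5 / 2.0 -> %lld\n", static_cast<long long>(fp_div(price, qty)));  // 75000, i.e. 0.75
        return 0;
    }

Note that both helpers truncate toward zero; a real implementation would have to settle on an explicit rounding rule.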

Good tolerance for double comparison

I am trying to come up with a good tolerance when comparing doubles in unit tests.
If I allow a fixed tolerance, as I've seen mentioned on this site (e.g. return abs(actual-expected) < 0.00001;), this will frequently fail when numbers are very big, due to the nature of floating point representation.
If I use a relative tolerance in terms of the % error allowed (e.g. return abs(actual-expected) < abs(actual * 0.001);), this fails too often for small numbers (and for very small numbers, the computation itself can introduce rounding error). Additionally, it allows too much tolerance in certain ranges (e.g. comparing 2000 and 2001 would pass).
I'm wondering if there's any standard algorithm for allowing tolerance that will work for both small and large numbers. Should I try for some kind of base 2 logarithmic tolerance to mirror floating point storage? Should I do a hybrid approach based on the size of the inputs?
Since this is in unit test code, performance is not a big factor.
The specification of tolerance is a business function. There aren't standards that say "all tolerance must be within +/- .001%". So you have to go to your requirements and figure out what's going to work for you.
Look at your application. Let's say it's for some kind of cutting machine. Are they metal machining tolerances? .005 inches is common. Are they wood cabinet sawing tolerances? 1/32" is sloppy, 1/64" is better. Are they house framing tolerances? Don't expect a carpenter to come closer than 1/4". Hand cutting with a welding torch? Hope for about an inch. The point is simply that every application depends on something different, even when they're doing equivalent things.
If you're just talking "doubles" in general, they're usually good to no better than 15 digits of precision. Floats are good to 7 digits. I round those down by one when I'm thinking about the problem (I don't rely on a double being accurate to more than 14 digits and I stop with floats at six digits); however, if I'm worried about more than the 12th digit of precision I'm generally working with large dollar amounts that have to balance precisely, and I'd be a fool to use non-integer math for them. Business people want their stuff to balance to the penny, and wouldn't approve of rounding off addition operations!
If you're looking at math library operations such as the trig functions, read the library's documentation on each function.
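As one concrete shape for the hybrid approach the question hints at, here is a sketch that combines an absolute floor (for values near zero) with a relative bound (for large magnitudes). The two tolerance values are placeholders; per the discussion above, the actual numbers have to come from your requirements.

    #include <algorithm>
    #include <cmath>

    // Hybrid comparison: absolute tolerance near zero, relative tolerance elsewhere.
    // Both knobs are application-specific; the defaults here are only examples.
    bool nearly_equal(double actual, double expected,
                      double abs_tol = 1e-12, double rel_tol = 1e-9) {
        const double diff = std::fabs(actual - expected);
        if (diff <= abs_tol)  // protects the small-number case
            return true;
        const double scale = std::max(std::fabs(actual), std::fabs(expected));
        return diff <= rel_tol * scale;  // scales with the magnitude of the inputs
    }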

How to handle big data element in c++?

I want to divide the return value of pow(2.0,(n-8)) by 86399.
The problem is 10 <= n <= 100000000.
How can I handle such a large return value?
I'm on 64-bit Ubuntu 11.10, using g++ 4.0.0-8.
You can't, unless you use a big-number library; 64 bits can't hold a number that big. And even then, it will probably take a while: 2^(100000000-8) has about 30 million digits in it.
If you want to get just a modulus, there are some nice algorithms for that. See http://en.wikipedia.org/wiki/Modular_exponentiation.
If you want to try bignums still, check out http://gmplib.org/.
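If all that is actually needed is the remainder after dividing by 86399, a plain modular-exponentiation sketch along the lines of the Wikipedia article avoids big numbers entirely; with a modulus this small, the intermediate products fit comfortably in 64 bits.

    #include <cstdint>
    #include <cstdio>

    // Right-to-left binary exponentiation: (base^exp) mod m without
    // ever materializing the full power.
    uint64_t pow_mod(uint64_t base, uint64_t exp, uint64_t m) {
        uint64_t result = 1 % m;
        base %= m;
        while (exp > 0) {
            if (exp & 1)
                result = (result * base) % m;
            base = (base * base) % m;
            exp >>= 1;
        }
        return result;
    }

    int main() {
        const uint64_t n = 100000000;  // the largest n in the question
        std::printf("2^(n-8) mod 86399 = %llu\n",
                    static_cast<unsigned long long>(pow_mod(2, n - 8, 86399)));
        return 0;
    }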
One very easy way would be to use GMP -- http://gmplib.org/
This discussion should answer your question: Modular Exponentiation for high numbers in C++
For numbers that large, you'll have to do something clever. There's no way you can represent that full number naively in any reasonable way without bigint libraries, and even then it's really too big for brute force. The number itself would take up tens of megabytes.

Measure compression of Huffman Algorithm

I'm revamping my programming skills and have implemented the Huffman algorithm. For now, I'm just considering [a-z] with no special characters. The probability values for a-z were taken from Wikipedia.
When I run it, I get roughly 2x compression for random paragraphs.
But for this calculation I assume original letters require 8 bits each (ASCII).
But if I think about it, to represent 26 items I just need 5 bits. If I calculate based on this fact, then the compression factor drops to almost 1.1.
So my question is, how is the compression factor determined in real world applications?
Second question: if I write an encoder/decoder which uses 5 bits to represent a-z (say a=0, b=1, etc.), is this also considered a valid "compression" algorithm?
You have essentially the right answer, which is that you can't expect a lot of compression if all that you're working with is the letter frequencies of the English language.
The correct way to calculate the gain resulting from knowledge of the letter frequencies is to compare the entropy of a 26-symbol alphabet of equal probabilities with the entropy of the letters in English.
(I wish stackoverflow allowed TeX equations like math.stackexchange.com does. Then I could write decent equations here. Oh well.)
The key formula is -p log(p), where p is the probability of that symbol and the log is in base 2 to get the answer in bits. You calculate this for each symbol and then sum over all symbols.
Then in an ideal arithmetic coding scheme, an equiprobable set of 26 symbols would be coded at 4.70 bits per symbol. For the distribution in English (using the probabilities from the Wikipedia article), we get 4.18 bits per symbol. That's a reduction of only about 11%.
So that's all the frequency bias by itself can buy you. (It buys you a lot more in Scrabble scores, but I digress.)
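For reference, here is a small sketch of that calculation; with the uniform distribution it prints the 4.70 figure, and substituting the Wikipedia letter frequencies in place of the uniform vector gives roughly the 4.18 figure quoted above.

    #include <cmath>
    #include <cstdio>
    #include <vector>

    // Shannon entropy in bits per symbol: H = -sum(p * log2(p)).
    double entropy_bits(const std::vector<double>& probs) {
        double h = 0.0;
        for (double p : probs)
            if (p > 0.0)
                h -= p * std::log2(p);
        return h;
    }

    int main() {
        const std::vector<double> uniform(26, 1.0 / 26.0);
        std::printf("26 equiprobable symbols: %.2f bits/symbol\n", entropy_bits(uniform));
        return 0;
    }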
We can also look at the same thing in the approximate space of Huffman coding, where each code is an integral number of bits. In this case you would not assume five bits per letter (with six codes wasted). Applying Huffman coding to 26 symbols of equal probability gives six codes that are four bits in length and 20 codes that are five bits in length. This results in 4.77 bits per letter on average. Huffman coding using the letter frequencies occurring in English gives an average of 4.21 bits per letter. A reduction of 12%, which is about the same as the entropy calculation.
There are many ways that real compressors do much better than this.
First, they code what is actually in the file, using the frequencies of what's there instead of what they are across the English language. This makes it language-independent, optimizes for the actual contents, and doesn't even code symbols that are not present.
Second, you can break up the input into pieces and make a new code for each. If the pieces are big enough, then the overhead of transmitting a new code is small, and the gain from optimizing on a smaller chunk is usually larger.
Third, you can look for higher-order effects. Instead of the frequency of single letters, you can take into account the previous letter and look at the probability of the next letter given its predecessor. Now you have 26^2 probabilities (for just letters) to track. These can also be generated dynamically for the actual data at hand, but now you need more data to get a gain, more memory, and more time. You can go to third order, fourth order, etc. for even greater compression performance at the cost of memory and time.
There are other schemes to pre-process the data by, for example, doing run-length encoding, looking for matching strings, applying block transforms, tokenizing XML, delta-coding audio or images, etc., etc. to further expose redundancies for an entropy coder to then take advantage of. I alluded to arithmetic coding, which can be used instead of Huffman to code very probable symbols in less than a bit and all symbols to fractional bit accuracy for better performance in the entropy step.
Back to your question of what constitutes compression: you can begin with any starting point you like, e.g. one eight-bit byte per letter, make assertions about your input, e.g. all lower case letters (accepting that if the assertion is false, the scheme fails), and then assess the compression effectiveness, so long as you use all of the same assumptions when comparing two different compression schemes. Be careful that anything that is data-dependent must also be considered part of the compressed data; e.g. a custom Huffman code derived from a block of data must be sent with that block of data.
If you ran an unrestricted Huffman-coding compressor on the same text you'd get the same result, so I think it's reasonable to say that you're getting 2x compression over an ASCII encoding of the same text. I would be more inclined to say that your program is getting the expected compression but currently has a limitation in that it can't handle arbitrary input, and that other, simpler compression schemes also get compression over ASCII when that limitation is in place.
Why not extend your algorithm to handle arbitrary byte values? That way it's easier to make a true heads-up comparison.
It's not 5 bits for 26 characters, it's log(26) / log(2) = 4.7 bits. This is the maximum entropy, but you need to know the specific entropy. For the German language it's 4.0629. Once you know that, you can use the redundancy formula R = Hmax - H. Look here: http://de.wikipedia.org/wiki/Entropie_(Informationstheorie)
http://en.wikipedia.org/wiki/Information_theory#Entropy

How to store 1000000-digit integers in C++

In my problem I have to store really big integers, up to 1,000,000 digits, and do some operations on them. How can I do that? I know that a long int in C++ can store only up to about 10 digits.
You can use GMP, the GNU arbitrary precision library. Just be aware that it's not a very good library if you run out of memory.
By that, I mean it will just exit out from underneath you if it cannot allocate memory. I find this an ... interesting ... architectural decision for a general purpose library but it's popular for this sort of stuff so, provided you're willing to wear that restriction, it's probably a good choice.
Another good one is MPIR, a fork of GMP which, despite the name "Multiple Precision Integers and Rationals", handles floating point quite well. I've found these guys far more helpful than the GMP developers when requesting help or suggesting improvements (but, just be aware, that's my experience, your mileage may vary).
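As a rough sketch of what working at that scale looks like with GMP's C++ interface (gmpxx.h, linked with -lgmpxx -lgmp): the exponent below is chosen only because 2 raised to it has just over a million decimal digits.

    #include <gmpxx.h>
    #include <iostream>
    #include <string>

    int main() {
        // 2^3321929 has just over 1,000,000 decimal digits.
        mpz_class big;
        mpz_ui_pow_ui(big.get_mpz_t(), 2, 3321929);

        mpz_class sum = big + 12345;       // ordinary operators work on mpz_class
        mpz_class quotient = big / 98765;  // truncated integer division

        // mpz_sizeinbase may overestimate the digit count by one.
        std::cout << "decimal digits in 2^3321929: "
                  << mpz_sizeinbase(big.get_mpz_t(), 10) << "\n";
        std::cout << "decimal digits in big/98765: "
                  << mpz_sizeinbase(quotient.get_mpz_t(), 10) << "\n";

        const std::string s = sum.get_str();
        std::cout << "last 20 digits of the sum: " << s.substr(s.size() - 20) << "\n";
        return 0;
    }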