modf() with BIG NUMBERS - c++

I hope this finds you well.
I am trying to convert an index (number) for a word, using the ASCII code for that.
for ex:
index 0 -> " "
index 94 -> "~"
index 625798 -> "e#A"
index 899380 -> "!$^."
...
As we all can see, the 4th index correspond to a 4 char string. Unfortunately, at some point, these combinations get really big (i.e., for a word of 8 chars, i need to perform operations with 16 digit numbers (ex: 6634204312890625), and it gets really worse if I raise the number of chars of the word).
To support such big numbers, I had to upgrade some variables of my program from unsigned int to unsigned long long, but then I realized that modf() from C++ uses doubles and uint32_t (http://www.raspberryginger.com/jbailey/minix/html/modf_8c-source.html).
The question is: is this possible to adapt modf() to use 64 bit numbers like unsigned long long? I'm afraid that in case this is not possible, i'll be limited to digits of double length.
Can anyone enlight me please? =)

16-digit numbers fit within the range of a 64-bit number, so you should use uint64_t (from <stdint.h>). The % operator should then do what you need.
If you need bigger numbers, then you'll need to use a big-integer library. However, if all you're interested in is modulus, then there's a trick you can pull, based on the following properties of modulus:
mod(a * b) == mod(mod(a) * mod(b))
mod(a + b) == mod(mod(a) + mod(b))
As an example, let's express a 16-digit decimal number, x as:
x = x_hi * 1e8 + x_lo; // this is pseudocode, not real C
where x_hi is the 8 most-significant decimal digits, and x_lo the least-significant. The modulus of x can then be expressed as:
mod(x) = mod((mod(x_hi) * mod(1e8) + mod(x_lo));
where mod(1e8) is a constant which you can precalculate.
All of this can be done in integer arithmetic.

I could actually use a comment that was deleted right after (wonder why), that said:
modulus = a - a/b * b;
I've made a cast in the division to unsigned long long.
Now... I was a bit disappointed, because in my problem I thought I could keep raising the number of characters of the word with no problem. Nevertheless, I've started to get size issues at the n.º of chars = 7. Why? 95^7 starts to give huge numbers.
I was hoping to get the possibility to write a word like "my cat is so fat I 1234r5s" and calculate the index of this, but this word has almost 30 characters:
95^26 = 2635200944657423647039506726457895338535308837890625 combinations.
Anyway, thanks for the answer.

Related

How do I convert a string of binary numbers to their signed decimal representation using C++?

I have a string:
1010
The unsigned representation of the string comes out be 10 after doing:
string immediateValue = "1010";
char immediateChars[5];
strcpy(immediateChars, immediateValue.c_str());
char * ptr;
long parsedInteger = strtol(immediateChars, &ptr, 2);
As I understand strtol can be used to only get unsigned representation. Is there a way to get the 2's complement value which would be -6?
Check your first character, if it is 0 then use n = strtol normally, if it is 1 then flip the bits, e.g. "1010" to "0101", then take strtol of the flipped string, the negative of that value minus one is your answer.
The numbers are binary strings and you have to convert them to denary.
Nuisance. The strings are likely too long to simply be packed in a long long, which makes life a lot easier (shove the bits in the long long, then call sprintf()). You'll havew to write your own binary division routine, divide by binary ten ("1010") and take the remainder. That's your last digit. Then repeat until the number goes to zero. As a last step, reverse the denary digits to match the most significant digit to the left convention.
It's quite a hunk of code but reasonable as learning exercise.

Splitting a double into multiple parts

As the title stated, I want to split a multi-digit double into multiple parts each containing 4 digits.
I've already round the double to the closet integers, so there are no fraction parts, what left is a really really long double, that far exceeds the largest long long integer.
I need to split the integers parts into several 4 digit parts, which with integers is quite simple, a while statement like this would do:
unsigned long long int IntegerWithSeveralParts;
unsigned short int i = 0;
unsigned int Parts[10];
while ( IntegerWithSeveralParts )
{
Parts[i] = IntegerWithSeveralParts % 10000;
IntegerWithSeveralParts /= 10000;
++ i;
}
Yes, I know the parts are in reverse order, but a vector could fix that. The problem is, I can't perform modular on floats and doubles, which is quite a big deal to me. I can convert those into strings, and do the splitting from that, but that is quite time-consuming as that will include the use of streams.
Is there anyway else to do it?
The fmod() family of functions from <cmath> provides the floating point remainder of x / y where x and y are the two arguments to the function.
You could always "promote" the fractional part to an integer by multiplying by 10000 then modulating by 10000. This will take the first four decimal places and make an int out of them. Do this repeatedly to get more 4-digit chunks.
You can use fmod for floating point modular arithmatic. But a easier solution is to use stringstream to convert the float into string and just split up the string into piece with up to 4 length.
When you convert a binary representation of a floating-point value into a decimal representation (typically by converting to text in order to display it), you'll typically end up with an unbounded series of decimal digits. When you set the precision of an output stream you tell the runtime library to limit the number of digits that it displays and to round the tail. What you're asking for here is essentially the same thing, but with a different arrangement of the digits than an output stream presents. The simplest way to do this is to use an output stream to generate however many digits you want, and then rearrange them as you would like:
std::string result;
std::ostring_stream out;
out << std::fixed << std::precision(12) << result;
// now fiddle with result however you'd like

How to convert unlimited length binary to decimal

The most common way is to get the power of 2 for each non-zero position of the binary number, and then sum them up. This is not workable when the binary number is huge, say,
10000...0001 //1000000 positions
It is impossible to let the computer compute the pow(2,1000000). So the traditional way is not workable.
Other way to do this?
Could someone give an arithmetic method about how to compute, not library?
As happydave said, there are existing libraries (such as GMP) for this type of thing. If you need to roll your own for some reason, here's an outline of a reasonably efficient approach.
You'll need bigint subtraction, comparison and multiplication.
Cache values of 10^(2^n) in your binary format until the next value is bigger than your binary number. This will allow you to quickly generate a power of ten by doing the following:
Select the largest value in your cache smaller than your remaining number, store this
in a working variable.
do{
Multiply it by the next largest value in your cache and store the result in a
temporary value.
If the new value is still smaller, set your working value to this number (swapping
references here rather than allocating new memory is a good idea),
Keep a counter to see which digit you're at. If this changes by more than one
between instances of the outer loop, you need to pad with zeros
} Until you run out of cache
This is your next base ten value in binary, subtract it from your binary number while
the binary number is larger than your digit, the number of times you do this is the
decimal digit -- you can cheat a little here by comparing the most significant bits
and finding a lower bound before trying subtraction.
Repeat until your binary number is 0
This is roughly O(n^4) with regards to number of binary digits, and O(nlog(n)) with regards to memory. You can get that n^4 closer to n^3 by using a more sophisticated multiplication algorithm.
You could write your own class for handling arbitrarily large integers (which you can represent as an array of integers, or whatever makes the most sense), and implement the operations (*, pow, etc.) yourself. Or you could google "C++ big integer library", and find someone else who has already implemented it.
It is impossible to let the computer compute the pow(2,1000000). So the traditional way is not workable.
It is not impossible. For example, Python can do the arithmetic calculation instantly, and the conversion to a decimal number in about two seconds (on my machine). Python has built in facilities for dealing with large integers that exceed the size of a machine word.
In C++ (and C), a good choice of big integer library is GMP. It is robust, well tested, and actively maintained. It includes a C++ wrapper that uses operator overloading to provide a nice interface (except, there is no C++ operator for the pow() operation).
Here is a C++ example that uses GMP:
#include <iostream>
#include <gmpxx.h>
int main(int, char *[])
{
mpz_class a, b;
a = 2;
mpz_pow_ui(b.get_mpz_t(), a.get_mpz_t(), 1000000);
std::string s = b.get_str();
std::cout << "length is " << s.length() << std::endl;
return 0;
}
The output of the above is
length is 301030
which executes on my machine in 0.18 seconds.
"This is roughly O(n^4) with regards to number of binary digits, and O(nlog(n)) with regards to memory". You can do O(n^(2 + epsilon)) operations (where n is the number of binary digits), and O(n) memory as follows: Let N be an enormous number of binary length n. Compute the residues mod 2 (easy; grab the low bit) and mod 5 (not easy but not terrible; break the binary string into successive strings of four bits; compute the residue mod 5 of each such 4-tuple, and add them up as with casting out 9's for decimal numbers.). By computing the residues mod 2 and 5 you can read off the low decimal digit. Subtract this; divide by 10 (the internet documents ways to do this), and repeat to get the next-lowest digit.
I calculated 2 ** 1000000 and converted it to decimal in 9.3 seconds in Smalltalk so it's not impossible. Smalltalk has large integer libraries built in.
2 raisedToInteger: 1000000
As mentioned in another answer, you need a library that handles arbitrary precision integer numbers. Once you have that, you do MOD 10 and DIV 10 operations on it to compute the decimal digits in reverse order (least significant to most significant).
The rough idea is something like this:
LargeInteger *a;
char *string;
while (a != 0) {
int remainder;
LargeInteger *quotient;
remainder = a % 10.
*string++ = remainder + 48.
quotient = a / 10.
}
Many details are missing (or wrong) here concerning type conversions, memory management and allocation of objects but it's meant to demonstrate the general technique.
It's quite simple with the Gnu Multiprecision Library. Unfortunately, I couldn't test this program because it seems I need to rebuild my library after a compiler upgrade. But there's not much room for error!
#include "gmpxx.h"
#include <iostream>
int main() {
mpz_class megabit( "1", 10 );
megabit <<= 1000000;
megabit += 1;
std::cout << megabit << '\n';
}

BigInt implementation - converting a string to binary representatio stored as unsigned int

I'm doing a BigInt implementation in C++ and I'm having a hard time figuring out how to create a converter from (and to) string (C string would suffice for now).
I implement the number as an array of unsigned int (so basically putting blocks of bits next to each other). I just can't figure out how to convert a string to this representation.
For example if usigned int would be 32b and i'd get a string of "4294967296", or "5000000000" or basically anything larger than what a 32b int can hold, how would I properly convert it to appropriate binary representation?
I know I'm missing something obvious, and I'm only asking for a push to the right direction. Thanks for help and sorry for asking such a silly question!
Well one way (not necessarily the most efficient) is to implement the usual arithmetic operators and then just do the following:
// (pseudo-code)
// String to BigInt
String s = ...;
BigInt x = 0;
while (!s.empty())
{
x *= 10;
x += s[0] - '0';
s.pop_front();
}
Output(x);
// (pseudo-code)
// BigInt to String
BigInt x = ...;
String s;
while (x > 0)
{
s += '0' + x % 10;
x /= 10;
}
Reverse(s);
Output(s);
If you wanted to do something trickier than you could try the following:
If input I is < 100 use above method.
Estimate D number of digits of I by bit length * 3 / 10.
Mod and Divide by factor F = 10 ^ (D/2), to get I = X*F + Y;
Execute recursively with I=X and I=Y
Implement and test the string-to-number algorithm using a builtin type such as int.
Implement a bignum class with operator+, operator*, and whatever else the above algorithm uses.
Now the algorithm should work unchanged with the bignum class.
Use the string conversion algo to debug the class, not the other way around.
Also, I'd encourage you to try and write at a high level, and not fall back on C constructs. C may be simpler, but usually does not make things easier.
Take a look at, for instance, mp_toradix and mp_read_radix in Michael Bromberger's MPI.
Note that repeated division by 10 (used in the above) performs very poorly, which shows up when you have very big integers. It's not the "be all and end all", but it's more than good enough for homework.
A divide and conquer approach is possible. Here is the gist. For instance, given the number 123456789, we can break it into pieces: 1234 56789, by dividing it by a power of 10. (You can think of these pieces of two large digits in base 100,000. Now performing the repeated division by 10 is now cheaper on the two pieces! Dividing 1234 by 10 three times and 56879 by 10 four times is cheaper than dividing 123456789 by 10 eight times.
Of course, a really large number can be recursively broken into more than two pieces.
Bruno Haibl's CLN (used in CLISP) does something like that and it is blazingly fast compared to MPI, in converting numbers with thousands of digits to numeric text.

Analysis of the usage of prime numbers in hash functions

I was studying hash-based sort and I found that using prime numbers in a hash function is considered a good idea, because multiplying each character of the key by a prime number and adding the results up would produce a unique value (because primes are unique) and a prime number like 31 would produce better distribution of keys.
key(s)=s[0]*31(len–1)+s[1]*31(len–2)+ ... +s[len–1]
Sample code:
public int hashCode( )
{
int h = hash;
if (h == 0)
{
for (int i = 0; i < chars.length; i++)
{
h = MULT*h + chars[i];
}
hash = h;
}
return h;
}
I would like to understand why the use of even numbers for multiplying each character is a bad idea in the context of this explanation below (found on another forum; it sounds like a good explanation, but I'm failing to grasp it). If the reasoning below is not valid, I would appreciate a simpler explanation.
Suppose MULT were 26, and consider
hashing a hundred-character string.
How much influence does the string's
first character have on the final
value of 'h'? The first character's value
will have been multiplied by MULT 99
times, so if the arithmetic were done
in infinite precision the value would
consist of some jumble of bits
followed by 99 low-order zero bits --
each time you multiply by MULT you
introduce another low-order zero,
right? The computer's finite
arithmetic just chops away all the
excess high-order bits, so the first
character's actual contribution to 'h'
is ... precisely zero! The 'h' value
depends only on the rightmost 32
string characters (assuming a 32-bit
int), and even then things are not
wonderful: the first of those final 32
bytes influences only the leftmost bit
of `h' and has no effect on the
remaining 31. Clearly, an even-valued
MULT is a poor idea.
I think it's easier to see if you use 2 instead of 26. They both have the same effect on the lowest-order bit of h. Consider a 33 character string of some character c followed by 32 zero bytes (for illustrative purposes). Since the string isn't wholly null you'd hope the hash would be nonzero.
For the first character, your computed hash h is equal to c[0]. For the second character, you take h * 2 + c[1]. So now h is 2*c[0]. For the third character h is now h*2 + c[2] which works out to 4*c[0]. Repeat this 30 more times, and you can see that the multiplier uses more bits than are available in your destination, meaning effectively c[0] had no impact on the final hash at all.
The end math works out exactly the same with a different multiplier like 26, except that the intermediate hashes will modulo 2^32 every so often during the process. Since 26 is even it still adds one 0 bit to the low end each iteration.
This hash can be described like this (here ^ is exponentiation, not xor).
hash(string) = sum_over_i(s[i] * MULT^(strlen(s) - i - 1)) % (2^32).
Look at the contribution of the first character. It's
(s[0] * MULT^(strlen(s) - 1)) % (2^32).
If the string is long enough (strlen(s) > 32) then this is zero.
Other people have posted the answer -- if you use an even multiple, then only the last characters in the string matter for computing the hash, as the early character's influence will have shifted out of the register.
Now lets consider what happens when you use a multiplier like 31. Well, 31 is 32-1 or 2^5 - 1. So when you use that, your final hash value will be:
\sum{c_i 2^{5(len-i)} - \sum{c_i}
unfortunately stackoverflow doesn't understad TeX math notation, so the above is hard to understand, but its two summations over the characters in the string, where the first one shifts each character by 5 bits for each subsequent character in the string. So using a 32-bit machine, that will shift off the top for all except the last seven characters of the string.
The upshot of this is that using a multiplier of 31 means that while characters other than the last seven have an effect on the string, its completely independent of their order. If you take two strings that have the same last 7 characters, for which the other characters also the same but in a different order, you'll get the same hash for both. You'll also get the same hash for things like "az" and "by" other than in the last 7 chars.
So using a prime multiplier, while much better than an even multiplier, is still not very good. Better is to use a rotate instruction, which shifts the bits back into the bottom when they shift out the top. Something like:
public unisgned hashCode(string chars)
{
unsigned h = 0;
for (int i = 0; i < chars.length; i++) {
h = (h<<5) + (h>>27); // ROL by 5, assuming 32 bits here
h += chars[i];
}
return h;
}
Of course, this depends on your compiler being smart enough to recognize the idiom for a rotate instruction and turn it into a single instruction for maximum efficiency.
This also still has the problem that swapping 32-character blocks in the string will give the same hash value, so its far from strong, but probably adequate for most non-cryptographic purposes
would produce a unique value
Stop right there. Hashes are not unique. A good hash algorithm will minimize collisions, but the pigeonhole principle assures us that perfectly avoiding collisions is not possible (for any datatype with non-trivial information content).