The most common way is to take the power of 2 for each nonzero position of the binary number and then sum them up. This is not workable when the binary number is huge, say,
10000...0001 //1000000 positions
It is impossible to have the computer compute pow(2, 1000000). So the traditional way is not workable.
Is there another way to do this?
Could someone suggest an arithmetic method for computing this, rather than a library?
As happydave said, there are existing libraries (such as GMP) for this type of thing. If you need to roll your own for some reason, here's an outline of a reasonably efficient approach.
You'll need bigint subtraction, comparison and multiplication.
Cache values of 10^(2^n) in your binary format until the next value is bigger than your binary number. This will allow you to quickly generate a power of ten by doing the following:
Select the largest value in your cache smaller than your remaining number, and store it in a working variable.
do {
    Multiply it by the next largest value in your cache and store the result in a temporary value.
    If the new value is still smaller, set your working value to this number (swapping references here rather than allocating new memory is a good idea).
    Keep a counter to see which digit you're at. If this changes by more than one between iterations of the outer loop, you need to pad with zeros.
} until you run out of cache
This is your next base-ten value in binary. Subtract it from your binary number while the binary number is larger than your digit; the number of times you do this is the decimal digit -- you can cheat a little here by comparing the most significant bits and finding a lower bound before trying subtraction.
Repeat until your binary number is 0.
This is roughly O(n^4) with regard to the number of binary digits, and O(n log n) with regard to memory. You can get that n^4 closer to n^3 by using a more sophisticated multiplication algorithm.
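To make the caching step concrete, here is a minimal sketch of building the 10^(2^n) cache by repeated squaring; mpz_class stands in for the hand-rolled bigint the outline assumes:

#include <gmpxx.h>
#include <iostream>
#include <vector>

// Build the cache of 10^(2^k) values described above by repeated squaring,
// stopping once the next entry would exceed n.
std::vector<mpz_class> buildCache(const mpz_class& n) {
    std::vector<mpz_class> cache;
    cache.push_back(10);                              // 10^(2^0)
    for (;;) {
        mpz_class next = cache.back() * cache.back(); // 10^(2^k) squared is 10^(2^(k+1))
        if (next > n) break;
        cache.push_back(next);
    }
    return cache;
}

int main() {
    mpz_class n("123456789012345678901234567890123456789");
    std::cout << "cache entries: " << buildCache(n).size() << '\n';
}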
You could write your own class for handling arbitrarily large integers (which you can represent as an array of integers, or whatever makes the most sense), and implement the operations (*, pow, etc.) yourself. Or you could google "C++ big integer library", and find someone else who has already implemented it.
It is impossible to have the computer compute pow(2, 1000000). So the traditional way is not workable.
It is not impossible. For example, Python can do the arithmetic calculation instantly, and the conversion to a decimal number in about two seconds (on my machine). Python has built-in facilities for dealing with large integers that exceed the size of a machine word.
In C++ (and C), a good choice of big integer library is GMP. It is robust, well tested, and actively maintained. It includes a C++ wrapper that uses operator overloading to provide a nice interface (except that there is no C++ operator for the pow() operation).
Here is a C++ example that uses GMP:
#include <iostream>
#include <gmpxx.h>

int main(int, char *[])
{
    mpz_class a, b;

    a = 2;
    // compute b = a^1000000 via the C interface underneath the C++ wrapper
    mpz_pow_ui(b.get_mpz_t(), a.get_mpz_t(), 1000000);

    std::string s = b.get_str();
    std::cout << "length is " << s.length() << std::endl;

    return 0;
}
The output of the above is
length is 301030
which executes on my machine in 0.18 seconds.
"This is roughly O(n^4) with regards to number of binary digits, and O(nlog(n)) with regards to memory". You can do O(n^(2 + epsilon)) operations (where n is the number of binary digits), and O(n) memory as follows: Let N be an enormous number of binary length n. Compute the residues mod 2 (easy; grab the low bit) and mod 5 (not easy but not terrible; break the binary string into successive strings of four bits; compute the residue mod 5 of each such 4-tuple, and add them up as with casting out 9's for decimal numbers.). By computing the residues mod 2 and 5 you can read off the low decimal digit. Subtract this; divide by 10 (the internet documents ways to do this), and repeat to get the next-lowest digit.
I calculated 2 ** 1000000 and converted it to decimal in 9.3 seconds in Smalltalk, so it's not impossible. Smalltalk has large-integer libraries built in.
2 raisedToInteger: 1000000
As mentioned in another answer, you need a library that handles arbitrary precision integer numbers. Once you have that, you do MOD 10 and DIV 10 operations on it to compute the decimal digits in reverse order (least significant to most significant).
The rough idea is something like this:
LargeInteger *a;
char *string;

while (a != 0) {
    int remainder;
    LargeInteger *quotient;

    remainder = a % 10;
    *string++ = remainder + '0';  // ASCII digit; produced least significant first
    quotient = a / 10;
    a = quotient;                 // continue with the quotient
}
Many details are missing here concerning type conversions, memory management, and allocation of objects, but it's meant to demonstrate the general technique.
It's quite simple with the GNU Multiple Precision (GMP) library. Unfortunately, I couldn't test this program because it seems I need to rebuild my library after a compiler upgrade. But there's not much room for error!
#include "gmpxx.h"
#include <iostream>
int main() {
mpz_class megabit( "1", 10 );
megabit <<= 1000000;
megabit += 1;
std::cout << megabit << '\n';
}
Related
Doing one of my first homework assignments at uni, I have run into this problem:
Task: Find the sum of all numbers with n numerals, where n is the count of digits (n = 1 means 1, 2, 3, ..., 8, 9, for example, and the answer is 45).
Problem: The code I wrote gets all the test answers correct up to 10 to the power of 9, but when it reaches 10 to the power of 10 territory, the answers start being wrong. They are really close to what I should be getting, but not quite there (for example, my output = 49499999995499995136, expected result = 49499999995500000000).
I would really appreciate some help/insights. I am guessing it's something to do with the variable types, but I'm not sure of a possible solution.
#include <iostream>
#include <cmath>
#include <iomanip>

using namespace std;

int main()
{
    int n;
    double ats = 0, maxi, mini;

    cin >> n;
    maxi = pow(10, n) - 1;
    mini = pow(10, n-1) - 1;
    ats = (maxi * (maxi + 1)) / 2 - (mini * (mini + 1)) / 2;
    cout << setprecision(0) << fixed << ats;
}
The main cause of the problem is the pow() function. It works with double, not int; loss of accuracy is the price of representing huge numbers.
There are three ways to solve the problem:
For small n you can write your own long long int pow(int x, int p) function. But there is the problem that even long long int can overflow.
Use long-arithmetic functions, as @rustyx said. You can write your own with a vector, or find and include a library.
There is a math solution specific to this task; it sidesteps the big-number problem.
You can write your formula as
(((10^n) - 1) * (10^n) - ((10^m) - 1) * (10^m)) / 2, where m = n-1.
Multiply the numbers in the numerator, regroup them, and extract the common factor 10^(n-1). You can then see that the answer has the structure
X9...9Y0...0 for big enough n, where X and Y are constant digits.
So, you can just print the answer "string" without calculating anything.
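As a minimal sketch of that approach: the closed form works out to (99*10^(2n-2) - 9*10^(n-1)) / 2, which for n >= 3 reads as "494", then n-3 nines, then "55", then n-2 zeros (it matches 494550 for n = 3 and 49499999995500000000 for n = 10):

#include <iostream>
#include <string>

int main() {
    int n;
    std::cin >> n;
    // Sum of all n-digit numbers: (99*10^(2n-2) - 9*10^(n-1)) / 2.
    // For n >= 3 its decimal form is "494" + (n-3) nines + "55" + (n-2) zeros.
    if (n == 1)
        std::cout << 45 << '\n';
    else if (n == 2)
        std::cout << 4905 << '\n';
    else
        std::cout << "494" + std::string(n - 3, '9')
                   + "55" + std::string(n - 2, '0') << '\n';
}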
I think you're stretching floating point beyond its precision. Let me explain:
The C pow() function takes doubles as arguments. You're passing ints, so the compiler adds code to convert them to doubles before they reach pow(). (And in any case you're storing the return value as a double, since you declared it that way.)
Floating-point numbers are called that precisely because the point "floats". Inside a double there's a sign bit, a few bits for the mantissa, and a few bits for the exponent. In binary, multiplying by a power of two is equivalent to moving the point to the right (or to the left for a negative power). So basically the exponent says where the point is, in binary. The great advantage of this in-memory representation for doubles is that you get a lot of precision for numbers close to 0, and gradually lose precision as numbers become bigger.
That last part is exactly what's happening to you. Your number is too large to be stored exactly, so it's being rounded to the closest representable sum of powers of two (a power of two is a number whose binary representation is a 1 followed by zeroes).
Quick experiment: press F12 in your browser, open the JavaScript console, and type 49499999995499995136. In my case, in Chrome, I reproduce the same problem.
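The same experiment in C++, as a minimal sketch:

#include <cstdio>

int main() {
    // A double has a 53-bit significand, but 49499999995500000000 needs
    // 58 significant bits, so the literal is silently rounded to the
    // nearest representable value.
    double d = 49499999995500000000.0;
    std::printf("%.0f\n", d);  // prints a nearby value, not the exact integer
}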
If you really, really want precision with such big numbers, then you can try one of the big-integer libraries, but that's too advanced for a student program; you don't need it. Just add an if block and print an error message when the number the user typed is too big (professors love that, and it's actually quite correct).
What is the most efficient way to convert a decimal number into its binary form, i.e., with the best time complexity?
Normally, to convert a decimal number into binary, we keep dividing the number by 2 and storing the remainders. But this takes a really long time if the number in decimal form is very large. The time complexity in this case turns out to be O(log n).
So I want to know if there is any approach other than this that can do the job with better time complexity.
The problem is essentially that of evaluating a polynomial using binary integer arithmetic, so the result is in binary. Suppose
p(x) = a₀xⁿ + a₁xⁿ⁻¹ + ⋯ + aₙ₋₁x + aₙ
Now if a₀,a₁,a₂,⋯,aₙ are the decimal digits of the number (each implicitly represented by binary numbers in the range 0 through 9) and we evaluate p at x=10 (implicitly in binary) then the result is the binary number that the decimal digit sequence represents.
The best way to evaluate a polynomial at a single point given also the coefficients as input is Horner's Rule. This amounts to rewriting p(x) in a way easy to evaluate as follows.
p(x) = ((⋯((a₀x + a₁)x + a₂)x + ⋯ )x + aₙ₋₁)x + aₙ
This gives the following algorithm. Here the array a[] contains the digits of the decimal number, left to right, each represented as a small integer in the range 0 through 9. Pseudocode for an array indexed from 0:
toNumber(a[])
    const x = 10
    total = a[0]
    for i = 1 to a.length - 1 do
        total *= x     // multiply the total by x = 10
        total += a[i]  // add on the next digit
    return total
Running this code on a machine where numbers are represented in binary gives a binary result. Since that's what we have on this planet, this gives you what you want.
If you want to get the actual bits, now you can use efficient binary operations to get them from the binary number you have constructed, for example, mask and shift.
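For instance, a minimal mask-and-shift sketch, assuming the converted value fits in a machine word:

#include <iostream>

int main() {
    unsigned total = 123;              // e.g., the result of toNumber() above
    // Mask-and-shift: emit the bits, least significant first.
    for (unsigned v = total; v != 0; v >>= 1)
        std::cout << (v & 1);
    std::cout << '\n';                 // 123 = 0b1111011, printed as 1101111
}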
The complexity of the conversion algorithm is linear in the number of digits, because arithmetic operations on machine integers are constant time, and it does two operations per digit (apart from the first). This is a tiny amount of work, so it is supremely fast.
If you need very large numbers, bigger than 64 bits, just use some kind of large integer. Implemented properly, this will keep the cost of the arithmetic down.
To avoid as much large-integer arithmetic as possible (if your large-integer implementation needs it), break the array of digits into slices of 19 digits, with the leftmost slice potentially having fewer. 19 is the maximum number of decimal digits that can always be converted into an (unsigned) 64-bit integer.
Convert each slice as above into binary without using large integers, and make a new array of those 64-bit values in left-to-right order. These are now the coefficients of a polynomial to be evaluated at x = 10^19. The same algorithm as above can then be used with large-integer arithmetic operations, with 10 replaced by 10^19, which should be evaluated with large-integer arithmetic in advance of its use.
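A minimal sketch of that slicing scheme, using GMP's mpz_class as the large-integer type; it assumes a 64-bit unsigned long (LP64), so on other platforms use smaller slices:

#include <gmpxx.h>
#include <iostream>
#include <string>

int main() {
    std::string digits = "123456789012345678901234567890";

    // Convert one slice of at most 19 digits using only 64-bit arithmetic.
    auto sliceToU64 = [&](std::size_t pos, std::size_t len) -> unsigned long {
        unsigned long v = 0;
        for (std::size_t i = pos; i < pos + len; ++i)
            v = v * 10 + (digits[i] - '0');
        return v;
    };

    mpz_class base("10000000000000000000");   // 10^19, the slice radix
    mpz_class total = 0;

    std::size_t first = digits.size() % 19;   // leftmost slice may be short
    if (first != 0)
        total = sliceToU64(0, first);
    for (std::size_t pos = first; pos < digits.size(); pos += 19) {
        total *= base;                        // Horner step at x = 10^19
        total += sliceToU64(pos, 19);         // add the next base-10^19 "digit"
    }

    std::cout << total << '\n';               // prints the original number
}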
Consider the problem:
It can be shown that some powers of two in decimal format, like
2^9 = 512
2^89 = 618,970,019,642,690,137,449,562,112
end in a string consisting only of 1s and 2s. In fact, it can be proven that for every integer R there exists a power of 2, say 2^K with K > 0, whose last R digits consist only of 1s and 2s.
This can be seen clearly in the table below:
R    Smallest K    2^K
1    1             2
2    9             512
3    89            ...112
4    89            ...2112
Using this technique, what then is the sum of all the smallest K values for 1 <= R <= 10?
Proposed solution:
Now this problem isn't that difficult to solve. You could simply do
int temp = power(2, k)
and then, if you can get the length of temp, multiply it with
(100^len)-i or (10^len)-i // where i determines how many of the last digits you want
But this temp = power(2, k) grows so large with increasing k that you can't store it in the int type, or even in a long int...
So what can be done? And is there any other solution, based on bit strings? I guess that might make this problem easy.
Thanks in advance.
No, I doubt there are any solutions based on "strings of bits"; that would be quite inefficient. But there are bignum libraries like GMP, which provide variable types that are either fixed-size but much bigger than the int types, or of arbitrary size limited only by memory capacity, plus matching sets of math operations, working similarly to software FPU emulation.
Quoting from the reference, with a minor paraphrase:
#include <gmpxx.h>
#include <iostream>

using namespace std;

int main (void)
{
    mpz_class a, b, c;

    a = 1234;
    b = "-5676739826856836954375492356569366529629568926519085610160816539856926459237598";
    c = a + b;

    cout << "sum is " << c << "\n";
    cout << "absolute value is " << abs(c) << "\n";

    return 0;
}
Thanks to C++ operator overloading, it is much easier to use than the ANSI C version.
Since you are only interested in the n least significant digits of your result, you could try to devise an algorithm that calculates only those. Based on the standard algorithm for written multiplication, you can see that the n least significant digits of the product are entirely determined by the n least significant digits of the multiplicands. Based on this, it should be possible to create an algorithm that calculates as many digits of 2^K as fit into a long int.
The only problem you might run into is that there may be numbers that end in a matching sequence longer than a long int can hold. In that case you can still resort to calculating additional digits using your own algorithm or a library.
Note that this is basically the same thing that big-number libraries do, only your approach might be more efficient, because you avoid calculating digits that you are unlikely to need.
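A minimal sketch of that idea for this problem, keeping only the last R <= 10 decimal digits of 2^K in a plain 64-bit integer:

#include <cstdint>
#include <iostream>

int main() {
    const int R = 10;                  // how many trailing digits to keep
    std::uint64_t mod = 1;
    for (int i = 0; i < R; ++i)
        mod *= 10;                     // mod = 10^R

    // Last R digits of 2^K, maintained by reducing after each doubling.
    // tail < 10^10, so tail * 2 cannot overflow 64 bits.
    std::uint64_t tail = 1;
    for (int K = 1; K <= 100; ++K) {
        tail = (tail * 2) % mod;
        // ... here, test whether all R digits of tail are 1s and 2s
    }
    std::cout << tail << '\n';         // last 10 digits of 2^100
}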
Try GMP, http://gmplib.org/
It can store numbers of any size, as long as they fit in memory.
Although you might be better off with a less brute-force approach: you can store binary strings in std::bitset or in std::vector<bool>.
www.cplusplus.com/reference/bitset/bitset/
I think bitset is your choice; using big-number arithmetic for operations on powers of 2 is not, though.
I'm doing a BigInt implementation in C++ and I'm having a hard time figuring out how to create a converter from (and to) string (a C string would suffice for now).
I implement the number as an array of unsigned int (so basically putting blocks of bits next to each other). I just can't figure out how to convert a string to this representation.
For example, if unsigned int were 32 bits and I got a string of "4294967296", or "5000000000", or basically anything larger than what a 32-bit int can hold, how would I properly convert it to the appropriate binary representation?
I know I'm missing something obvious, and I'm only asking for a push in the right direction. Thanks for the help, and sorry for asking such a silly question!
Well one way (not necessarily the most efficient) is to implement the usual arithmetic operators and then just do the following:
// (pseudo-code)
// String to BigInt
String s = ...;
BigInt x = 0;
while (!s.empty())
{
    x *= 10;
    x += s[0] - '0';
    s.pop_front();
}
Output(x);

// (pseudo-code)
// BigInt to String
BigInt x = ...;
String s;
while (x > 0)
{
    s += '0' + x % 10;
    x /= 10;
}
Reverse(s);
Output(s);
If you wanted to do something trickier, then you could try the following (a sketch follows the list):
If the input I is < 100, use the above method.
Estimate the number of digits D of I as bit length * 3 / 10.
Mod and divide by the factor F = 10^(D/2), to get I = X*F + Y.
Execute recursively with I = X and I = Y.
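A sketch of that recursion, with mpz_class standing in for the BigInt (the cutoff and the digit-count estimate are illustrative):

#include <gmpxx.h>
#include <iostream>
#include <string>

// Divide-and-conquer BigInt-to-string: split I = X*F + Y with F = 10^(D/2)
// and recurse, falling back to repeated division for small values.
std::string toString(const mpz_class& I) {
    if (I < 1000000000) {                     // small: plain repeated division
        unsigned long v = I.get_ui();
        std::string s = (v == 0) ? "0" : "";
        while (v != 0) {
            s.insert(s.begin(), char('0' + v % 10));
            v /= 10;
        }
        return s;
    }
    std::size_t D = mpz_sizeinbase(I.get_mpz_t(), 2) * 3 / 10; // digit estimate
    mpz_class F;
    mpz_ui_pow_ui(F.get_mpz_t(), 10, D / 2);                   // F = 10^(D/2)
    mpz_class X = I / F, Y = I % F;
    std::string lo = toString(Y);
    lo.insert(0, D / 2 - lo.size(), '0');     // pad Y to exactly D/2 digits
    return toString(X) + lo;
}

int main() {
    mpz_class n("123456789012345678901234567890");
    std::cout << toString(n) << '\n';
}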
Implement and test the string-to-number algorithm using a built-in type such as int.
Implement a bignum class with operator+, operator*, and whatever else the above algorithm uses.
Now the algorithm should work unchanged with the bignum class.
Use the string-conversion algorithm to debug the class, not the other way around.
Also, I'd encourage you to try to write at a high level rather than falling back on C constructs. C may be simpler, but it usually does not make things easier.
Take a look at, for instance, mp_toradix and mp_read_radix in Michael Bromberger's MPI.
Note that repeated division by 10 (used in the above) performs very poorly, which shows up when you have very big integers. It's not the be-all and end-all, but it's more than good enough for homework.
A divide-and-conquer approach is possible. Here is the gist. Given the number 123456789, we can break it into the pieces 1234 and 56789 by dividing by a power of 10. (You can think of these pieces as two large digits in base 100,000.) Performing the repeated division by 10 is now cheaper on the two pieces: dividing 1234 by 10 three times and 56789 by 10 four times is cheaper than dividing 123456789 by 10 eight times.
Of course, a really large number can be recursively broken into more than two pieces.
Bruno Haible's CLN (used in CLISP) does something like this, and it is blazingly fast compared to MPI at converting numbers with thousands of digits to numeric text.
I'm looking for an extremely fast atof() implementation on IA-32, optimized for the US-English locale, ASCII, and non-scientific notation. The Windows multithreaded CRT falls down miserably here, as it checks for locale changes on every call to isdigit(). Our current best is derived from the best of Perl's and Tcl's atof implementations, and outperforms msvcrt.dll's atof by an order of magnitude. I want to do better, but am out of ideas. The BCD-related x86 instructions seemed promising, but I couldn't get them to outperform the Perl/Tcl C code. Can any SO'ers dig up a link to the best out there? Non-x86 assembly-based solutions are also welcome.
Clarifications based upon initial answers:
Inaccuracies of ~2 ulp are fine for this application.
The numbers to be converted will arrive in ascii messages over the network in small batches and our application needs to convert them in the lowest latency possible.
What is your accuracy requirement? If you truly need it "correct" (always getting the nearest floating-point value to the decimal specified), it will probably be hard to beat the standard library versions (other than removing locale support, which you've already done), since this requires doing arbitrary-precision arithmetic.
If you're willing to tolerate an ulp or two of error (and more than that for subnormals), the sort of approach cruzer proposed can work and may be faster, but it definitely will not produce <0.5ulp output. You will do better accuracy-wise to compute the integer and fractional parts separately and compute the fraction at the end (e.g. for 12345.6789, compute it as 12345 + 6789 / 10000.0, rather than 6*.1 + 7*.01 + 8*.001 + 9*.0001), since 0.1 is a repeating binary fraction and error accumulates rapidly as you compute 0.1^n. This also lets you do most of the math with integers instead of floats.
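Here is a minimal sketch of that integer/fraction split, with no sign, exponent, or overflow handling; accuracy is within an ulp or two, not correctly rounded:

#include <cstdio>

// Parse the integer and fractional parts with integer arithmetic and
// combine them with a single float divide at the end,
// e.g. "12345.6789" -> 12345 + 6789 / 10000.0.
static double simple_atof(const char* s) {
    unsigned long long ip = 0, fp = 0;   // integer and fraction accumulators
    double scale = 1.0;
    while (*s >= '0' && *s <= '9')
        ip = ip * 10 + (*s++ - '0');
    if (*s == '.') {
        ++s;
        while (*s >= '0' && *s <= '9') {
            fp = fp * 10 + (*s++ - '0');
            scale *= 10.0;
        }
    }
    return (double)ip + (double)fp / scale;
}

int main() {
    std::printf("%.4f\n", simple_atof("12345.6789"));   // 12345.6789
}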
The BCD instructions haven't been implemented in hardware since (IIRC) the 286, and are simply microcoded nowadays. They are unlikely to be particularly high-performance.
This implementation I just finished coding runs twice as fast as the built-in atof on my desktop. It converts 1024*1024*39 number inputs in 2 seconds, compared to 4 seconds with my system's standard GNU atof (including the setup time, getting memory, and all that).
UPDATE:
Sorry, I have to revoke my twice-as-fast claim. It's faster if the thing you're converting is already in a string, but if you're passing it hard-coded string literals, it's about the same as atof. However, I'm going to leave it here, as with some tweaking of the Ragel file and state machine, you may be able to generate faster code for specific purposes.
https://github.com/matiu2/yajp
The interesting files for you are:
https://github.com/matiu2/yajp/blob/master/tests/test_number.cpp
https://github.com/matiu2/yajp/blob/master/number.hpp
Also, you may be interested in the state machine that does the conversion.
It seems to me you want to build (by hand) what amounts to a state machine where each state handles the Nth input digit or exponent digits; this state machine would be shaped like a tree (no loops!). The goal is to do integer arithmetic wherever possible, and (obviously) to remember state variables ("leading minus", "decimal point at position 3") in the states implicitly, to avoid assignments, stores and later fetch/tests of such values. Implement the state machine with plain old "if" statements on the input characters only (so your tree gets to be a set of nested ifs). Inline accesses to buffer characters; you don't want a function call to getchar to slow you down.
Leading zeros can simply be suppressed; you might need a loop here to handle ridiculously long leading-zero sequences. The first nonzero digit can be collected without zeroing an accumulator or multiplying by ten.
The first 4-9 nonzero digits (for 16-bit or 32-bit integers) can be collected with integer multiplies by the constant ten (turned by most compilers into a few shifts and adds). [Over the top: zero digits don't require any work until a nonzero digit is found, and then a multiply by 10^N for N sequential zeros is required; you can wire all this into the state machine.] Digits following the first 4-9 may be collected using 32- or 64-bit multiplies, depending on the word size of your machine. Since you don't care about accuracy, you can simply ignore digits after you've collected 32 or 64 bits' worth; I'd guess that you can actually stop when you have some fixed number of nonzero digits, based on what your application actually does with these numbers.
A decimal point found in the digit string simply causes a branch in the state-machine tree. That branch knows the implicit location of the point and therefore how to scale by a power of ten later. With effort, you may be able to combine some state-machine subtrees if you don't like the size of this code.
[Over the top: keep the integer and fractional parts as separate (small) integers. This will require an additional floating point operation at the end to combine the integer and fraction parts, probably not worth it].
[Over the top: collect 2 characters for digit pairs into a 16-bit value, and look the 16-bit value up in a table. This avoids a multiply in the registers in trade for a memory access; probably not a win on modern machines.]
On encountering "E", collect the exponent as an integer as above; look up accurately precomputed/scaled powers of ten up in a table of precomputed multiplier (reciprocals if "-" sign present in exponent) and multiply the collected mantissa. (don't ever do a float divide). Since each exponent collection routine is in a different branch (leaf) of the tree, it has to adjust for the apparent or actual location of the decimal point by offsetting the power of ten index.
[Over the top: you can avoid the cost of ptr++ if you know the characters for the number are stored linearly in a buffer and do not cross the buffer boundary. In the kth state along a tree branch, you can access the kth character as *(start+k). A good compiler can usually hide the "...+k" in an indexed offset in the addressing mode.]
Done right, this scheme does roughly one cheap multiply-add per nonzero digit, one cast-to-float of the mantissa, and one floating multiply to scale the result by exponent and location of decimal point.
I have not implemented the above. I have implemented versions of it with loops, they're pretty fast.
I've implemented something you may find useful.
In comparison with atof it's about 5x faster, and if used with __forceinline about 10x faster.
Another nice thing is that it seems to have exactly the same arithmetic as the CRT implementation.
Of course it has some cons too:
it supports only single-precision float,
and doesn't scan any special values like #INF, etc.
__forceinline bool float_scan(const wchar_t* wcs, float* val)
{
int hdr=0;
while (wcs[hdr]==L' ')
hdr++;
int cur=hdr;
bool negative=false;
bool has_sign=false;
if (wcs[cur]==L'+' || wcs[cur]==L'-')
{
if (wcs[cur]==L'-')
negative=true;
has_sign=true;
cur++;
}
else
has_sign=false;
int quot_digs=0;
int frac_digs=0;
bool full=false;
wchar_t period=0;
int binexp=0;
int decexp=0;
unsigned long value=0;
while (wcs[cur]>=L'0' && wcs[cur]<=L'9')
{
if (!full)
{
if ((value>=0x19999999 && wcs[cur]-L'0'>5) || value>0x19999999)
{
full=true;
decexp++;
}
else
value=value*10+wcs[cur]-L'0';
}
else
decexp++;
quot_digs++;
cur++;
}
if (wcs[cur]==L'.' || wcs[cur]==L',')
{
period=wcs[cur];
cur++;
while (wcs[cur]>=L'0' && wcs[cur]<=L'9')
{
if (!full)
{
if ((value>=0x19999999 && wcs[cur]-L'0'>5) || value>0x19999999)
full=true;
else
{
decexp--;
value=value*10+wcs[cur]-L'0';
}
}
frac_digs++;
cur++;
}
}
if (!quot_digs && !frac_digs)
return false;
wchar_t exp_char=0;
int decexp2=0; // explicit exponent
bool exp_negative=false;
bool has_expsign=false;
int exp_digs=0;
// even if value is 0, we still need to eat exponent chars
if (wcs[cur]==L'e' || wcs[cur]==L'E')
{
exp_char=wcs[cur];
cur++;
if (wcs[cur]==L'+' || wcs[cur]==L'-')
{
has_expsign=true;
if (wcs[cur]=='-')
exp_negative=true;
cur++;
}
while (wcs[cur]>=L'0' && wcs[cur]<=L'9')
{
if (decexp2>=0x19999999)
return false;
decexp2=10*decexp2+wcs[cur]-L'0';
exp_digs++;
cur++;
}
if (exp_negative)
decexp-=decexp2;
else
decexp+=decexp2;
}
// end of wcs scan, cur contains value's tail
if (value)
{
while (value<=0x19999999)
{
decexp--;
value=value*10;
}
if (decexp)
{
// ensure 1bit space for mul by something lower than 2.0
if (value&0x80000000)
{
value>>=1;
binexp++;
}
if (decexp>308 || decexp<-307)
return false;
// convert exp from 10 to 2 (using FPU)
int E;
double v=pow(10.0,decexp);
double m=frexp(v,&E);
m=2.0*m;
E--;
value=(unsigned long)floor(value*m);
binexp+=E;
}
binexp+=23; // rebase exponent to 23 bits of mantissa
// so the value is: +/- VALUE * pow(2,BINEXP);
// (normalize mantissa to 24 bits, update exponent)
while (value&0xFE000000)
{
value>>=1;
binexp++;
}
if (value&0x01000000)
{
if (value&1)
value++;
value>>=1;
binexp++;
if (value&0x01000000)
{
value>>=1;
binexp++;
}
}
while (!(value&0x00800000))
{
value<<=1;
binexp--;
}
if (binexp<-127)
{
// underflow
value=0;
binexp=-127;
}
else
if (binexp>128)
return false;
//exclude "implicit 1"
value&=0x007FFFFF;
// encode exponent
unsigned long exponent=(binexp+127)<<23;
value |= exponent;
}
// encode sign
unsigned long sign=(unsigned long)negative<<31;
value |= sign;
if (val)
{
*(unsigned long*)val=value;
}
return true;
}
I remember we had a WinForms application that performed very slowly while parsing some data-interchange files. We all thought it was the DB server thrashing, but our smart boss actually found out that the bottleneck was in the call that converted the parsed strings into decimals!
The simplest approach is to loop over each digit (character) in the string, keeping a running total: multiply the total by 10, then add the value of the next digit. Keep doing this until you reach the end of the string or you encounter a dot. If you encounter a dot, separate the whole-number part from the fractional part, then use a multiplier that divides itself by 10 for each digit, adding the products up as you go (a sketch follows the example below).
Example: 123.456
running total = 0, add 1 (now it's 1)
running total = 1 * 10 = 10, add 2 (now it's 12)
running total = 12 * 10 = 120, add 3 (now it's 123)
encountered a dot, prepare for fractional part
multiplier = 0.1, multiply by 4, get 0.4, add to running total, makes 123.4
multiplier = 0.1 / 10 = 0.01, multiply by 5, get 0.05, add to running total, makes 123.45
multiplier = 0.01 / 10 = 0.001, multiply by 6, get 0.006, add to running total, makes 123.456
Of course, testing for a number's correctness, as well as handling negative numbers, will make it more complicated. But if you can "assume" that the input is correct, you can make the code much simpler and faster.
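A minimal sketch following the walkthrough above literally, assuming well-formed, non-negative input:

#include <cctype>
#include <cstdio>

// Running total for the whole part; for the fractional part, a multiplier
// that divides itself by 10 for each digit, as in the example above.
static double parse_number(const char* s) {
    double total = 0.0;
    while (std::isdigit((unsigned char)*s))
        total = total * 10 + (*s++ - '0');     // total = total*10 + digit
    if (*s == '.') {
        ++s;
        double multiplier = 0.1;
        while (std::isdigit((unsigned char)*s)) {
            total += multiplier * (*s++ - '0');
            multiplier /= 10;
        }
    }
    return total;
}

int main() {
    std::printf("%.3f\n", parse_number("123.456"));   // 123.456
}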
Have you considered looking into having the GPU do this work? If you can load the strings into GPU memory and have it process them all, you may find a good algorithm that runs significantly faster than your processor.
Alternatively, do it in an FPGA. There are FPGA PCI-E boards that you can use to make arbitrary coprocessors; use DMA to point the FPGA at the part of memory containing the array of strings you want to convert, and let it whizz through them, leaving the converted values behind.
Have you looked at a quad core processor? The real bottleneck in most of these cases is memory access anyway...
-Adam