I am trying to convert a large number of bytes into megabytes. I don't want decimals.
numeric function formatMB(required numeric num) output="false" {
return arguments.num \ 1024 \ 1024;
}
It then throws an error
How do I get around this?
You can't change the size of a Long, which is what CF uses for integers. So you'll need to use BigInteger instead:
numeric function formatMB(required numeric num) {
var numberAsBigInteger = createObject("java", "java.math.BigInteger").init(javacast("string", num));
var mbAsBytes = 1024 ^ 2;
var mbAsBytesAsBigInteger = createObject("java", "java.math.BigInteger").init(javacast("string", mbAsBytes));
var numberInMb = numberAsBigInteger.divide(mbAsBytesAsBigInteger);
return numberInMb.longValue();
}
CLI.writeLn(formatMB(2147483648));
But as Leigh points out... for what you're doing, you're probably better off just doing this:
return floor(arguments.num / (1024 * 1024));
"the size of a Long, which is what CF uses for integers"
Small correction for those who may not read the comments. CF primarily uses 32-bit signed Integers, not Long (which has a much greater capacity). So, as the error message indicates, the size limit here is the capacity of an Integer:
Integer.MAX_VALUE = 2147483647
Long.MAX_VALUE = 9223372036854775807
It is worth noting that although CF is relatively typeless, some Math and Date functions also have the same limitation. For example, although DateAdd technically supports milliseconds, if you try and use a very large number:
// getTime() - returns number of milliseconds since January 1, 1970
currentDate = dateAdd("l", now().getTime(), createDate(1970,1,1));
... it will fail with the exact same error because the "number" parameter must be an integer. So take note if the documentation mentions an "Integer" is expected. It does not just mean a "number" or "numeric" ...
I am computing pow(2,i), where i can range over 0<=i<=100000.
In addition, I have MOD=1000000007.
powers[100001]; // indices 0..100000 are used below, so 100001 elements are needed
powers[0]=1;
for (i = 1; i <=100000; ++i)
{
powers[i]=(powers[i-1]*2)%MOD;
}
For i=100000, won't the power value become greater than MOD?
How do I store the power correctly?
The operation doesn't look feasible to me.
I am getting correct value up to i=70 max I guess.
I have to find sum+= ar[i]*power(2,i) and finally print sum%1000000007 where ar[i] is an additional array with some big numbers up to 10^5
As long as your modulus value is less than half the capacity of your data type, it will never be exceeded. That's because you take the previous value in the range 0..1000000006, double it, then re-modulo it bringing it back to that same range.
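To make that argument concrete, here is a minimal C++ sketch of the doubling loop, plus the weighted sum the question asks about. The function names are hypothetical (they do not come from the question), and the code assumes the modulus is at most 2^31 so that every intermediate value fits comfortably in the types used.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns a table where table[i] == 2^i mod `mod`
// for i in 0..max_exp. Assumes 0 < mod <= 2^31.
std::vector<std::uint32_t> powers_of_two_mod(std::size_t max_exp, std::uint32_t mod)
{
    std::vector<std::uint32_t> table(max_exp + 1);
    table[0] = 1 % mod;
    for (std::size_t i = 1; i <= max_exp; ++i) {
        // table[i - 1] < mod, so the doubled value is < 2 * mod and cannot overflow
        // the 64-bit intermediate used here.
        table[i] = static_cast<std::uint32_t>((2ULL * table[i - 1]) % mod);
    }
    return table;
}

// Hypothetical helper: sum of ar[i] * 2^i, reduced modulo `mod` at every step.
std::uint32_t weighted_sum_mod(const std::vector<std::uint32_t>& ar, std::uint32_t mod)
{
    if (ar.empty()) {
        return 0;
    }
    const std::vector<std::uint32_t> pw = powers_of_two_mod(ar.size() - 1, mod);
    std::uint64_t sum = 0;
    for (std::size_t i = 0; i < ar.size(); ++i) {
        // Both factors are < mod <= 2^31, so the product fits in 64 bits.
        sum = (sum + static_cast<std::uint64_t>(ar[i] % mod) * pw[i]) % mod;
    }
    return static_cast<std::uint32_t>(sum);
}
```

Calling powers_of_two_mod(100000, 1000000007) fills the whole table; the point of the 64-bit intermediate in weighted_sum_mod is that ar[i] times a reduced power can exceed 32 bits even though each factor does not.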
However, I can't guarantee that higher values won't cause you troubles, it's more mathematical analysis than I'm prepared to invest given the simple alternative. You could spend a lot of time analysing, checking and debugging, but it's probably better just to not allow the problem to occur in the first place.
The alternative? I'd tend to use the pre-generation method (having a program do the gruntwork up front, inserting the pre-generated values into an array easily and speedily accessible from your real program).
With this method, you can use tools that are well tested and known to work with massive values. Since this data is not going to change, it's useless calculating it every time your program starts.
If you want an easy (and efficient) way to do this, the following bash script in conjunction with bc and awk can do this:
#!/usr/bin/bash
bc >nums.txt <<EOF
i = 1;
for (x = 0;x <= 10000; x++) {
i % 1000000007;
i = i * 2;
}
EOF
awk 'BEGIN { printf "static int array[] = {" }
{ if (NR % 5 == 1) printf "\n ";
printf "%s, ",$0;
next
}
END { print "\n};" }' nums.txt
The bc part is the "meat" of the matter, it creates the large powers of two and outputs them modulo the number you provided. The awk part is simply to format them in C-style array elements, five per line.
Just take the output of that and put it into your code and, voila, there you have it, a compile-time-expensed array that you can use for fast lookup.
It takes only a second and a half on my box to generate the array and then you never need to do it again. You also won't have to concern yourself with the vagaries of modulo math :-)
static int array[] = {
1,2,4,8,16,
32,64,128,256,512,
1024,2048,4096,8192,16384,
32768,65536,131072,262144,524288,
1048576,2097152,4194304,8388608,16777216,
33554432,67108864,134217728,268435456,536870912,
73741817,147483634,294967268,589934536,179869065,
359738130,719476260,438952513,877905026,755810045,
511620083,23240159,46480318,92960636,185921272,
371842544,743685088,487370169,974740338,949480669,
898961331,797922655,595845303,191690599,383381198,
766762396,533524785,67049563,134099126,268198252,
536396504,72793001,145586002,291172004,582344008,
164688009,329376018,658752036,317504065,635008130,
270016253,540032506,80065005,160130010,320260020,
640520040,281040073,562080146,124160285,248320570,
:
861508356,723016705,446033403,892066806,784133605,
568267203,136534399,273068798,546137596,92275185,
184550370,369100740,738201480,476402953,952805906,
905611805,
};
Notice that your modulus can be stored in an int. MOD=1000000007 (decimal) is 0b00111011100110101100101000000111 in binary, so it fits in 32 bits.
i    pow(2,i)     bit representation
0    1            0b00000000000000000000000000000001
1    2            0b00000000000000000000000000000010
2    4            0b00000000000000000000000000000100
3    8            0b00000000000000000000000000001000
...
29   536870912    0b00100000000000000000000000000000
The tricky part starts when pow(2,i) is greater than your MOD=1000000007. But once you know that pow(2,i) exceeds MOD, you can look at what the bits are after the modulo has been applied:
i    pow(2,i)      pow(2,i)%MOD   bit representation of pow(2,i)%MOD
30   1073741824    73741817       0b00000100011001010011010111111001
31   2147483648    147483634      0b00001000110010100110101111110010
32   4294967296    294967268      0b00010001100101001101011111100100
33   8589934592    589934536      0b00100011001010011010111111001000
So if you have pow(2,i-1)%MOD, you can get pow(2,i)%MOD by doubling that reduced value and applying the modulo again whenever the result reaches MOD.
For example, for i=34 you would use (589934536*2) MOD 1000000007 instead of (8589934592*2) MOD 1000000007, because 8589934592 can't be stored in an int.
Additionally, you can try bit operations instead of multiplication for pow(2,i).
The bit operation equivalent to multiplying by 2 is a left shift by one bit.
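As a rough illustration of the shift idea, the step from pow(2,i-1)%MOD to pow(2,i)%MOD might look like the following C++ sketch. The function name is made up for this example, and it assumes the modulus is at most 2^31 so that the shifted value still fits in 32 bits.

```cpp
#include <cstdint>

// Hypothetical helper: given prev == 2^(i-1) mod `mod` (so prev < mod),
// returns 2^i mod `mod`. Assumes 0 < mod <= 2^31.
std::uint32_t double_mod(std::uint32_t prev, std::uint32_t mod)
{
    const std::uint32_t doubled = prev << 1;  // same as prev * 2
    // doubled < 2 * mod, so a single conditional subtraction replaces the % operator.
    return doubled >= mod ? doubled - mod : doubled;
}
```

Starting from 1 and applying double_mod repeatedly walks through the same values as the table above, for example 589934536 becomes 179869065 at i=34.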
I'm programming in C++ and I have to store big numbers in one of my exercises.
The biggest number I have to store is: 9 780 321 563 842.
Each time I try to print the number (contained in a variable) it gives me a wrong result (not that number).
A 32-bit type isn't enough since 2^32 is a 10-digit number and I have to store a 13-digit number. But with 64 bits you can represent a number that has 20 digits. So I tried using the type "uint64_t" but that didn't work for me and I really don't understand why.
So I searched on the internet to find which type would be sufficient for my variable to fit in. I saw people on this forum with the same problem, but they solved it using long long int or long double as the type. But none worked for me (neither did long float).
I really don't know which other type could store that number, as I tried a lot but nothing worked for me.
Thanks for your help! :)
--
EDIT : The code is a bit long and complex and would not matter for the question, so this is actually what I do with the variable containing that number :
string barcode_s = "9780321563842";
uint64_t barcode = atoi(barcode_s.c_str());
cout << "Barcode is : " << barcode << endl;
Of course I don't put that number in a variable (of type string) "barcode_s" just to convert it directly to a number, but that's what happens in my program. I read text from an input file and put it in "barcode_s" (the text I read and put in that variable is always a number) and then I convert that string to a number (using atoi).
So I presume the problem comes from the "atoi" function?
Thanks for your help!
The problem is indeed atoi: it returns an int, which on most platforms is a 32-bit integer. Converting from int to uint64_t will not magically restore the information that has been lost.
There are several solutions, though. In C++03, you could use stringstream to handle the conversion:
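#include <sstream>  // std::istringstream
#include <cstdlib>  // std::abort
std::istringstream stream(barcode_s);
unsigned long long barcode = 0; // unsigned long is only 32 bits on some platforms; long long is a common C++03 extension
if (not (stream >> barcode)) { std::abort(); } // the text was not a valid number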
In C++11, you can simply use stoul or stoull:
unsigned long long const barcode = std::stoull(barcode_s);
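For completeness, here is a small C++11 sketch that wraps std::stoull with a check that the text really is all digits. The function name parse_barcode is hypothetical, and the up-front digit check is my own assumption about what the input should look like (std::stoull on its own would also accept leading whitespace or a sign).

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: parses a string of decimal digits into a 64-bit unsigned value.
// Throws std::invalid_argument if the text is empty or contains a non-digit, and
// lets std::stoull throw std::out_of_range if the value does not fit.
std::uint64_t parse_barcode(const std::string& text)
{
    if (text.empty() || text.find_first_not_of("0123456789") != std::string::npos) {
        throw std::invalid_argument("barcode must contain only digits");
    }
    return std::stoull(text);
}
```

With this, parse_barcode("9780321563842") gives back the full 13-digit value, which can then be printed with cout as in the question.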
Your number 9 780 321 563 842 is hex 8E52897B4C2, which fits into 44 bits (4 bits per hex digit), so any 64-bit integer, no matter if signed or unsigned, will have space to spare. 'uint64_t' will work, and it will even fit into a 'double' with no loss of precision.
It follows that the remaining issue is a mistake in your code. Usually that is either an accidental conversion of the 64-bit number to another type somewhere, or a call to the wrong function to print a 64-bit integer.
Edit: just saw your code. 'atoi' returns int. As in 'int32_t'. Converting that to 'uint64_t' will not reconstruct the 64-bit number. Have a look at this: http://msdn.microsoft.com/en-us/library/czcad93k.aspx
The atoll() function converts a char* to a long long.
If you don't have the longer function available, write your own in the meantime.
uint64_t result = 0 ;
for (unsigned int ii = 0 ; str.c_str()[ii] != 0 ; ++ ii)
{
result *= 10 ;
result += str.c_str () [ii] - '0' ;
}
I hope this finds you well.
I am trying to convert an index (number) for a word, using the ASCII code for that.
for ex:
index 0 -> " "
index 94 -> "~"
index 625798 -> "e#A"
index 899380 -> "!$^."
...
As we can all see, the 4th index corresponds to a 4-char string. Unfortunately, at some point these combinations get really big (i.e., for a word of 8 chars, I need to perform operations with 16-digit numbers, e.g. 6634204312890625), and it gets much worse if I raise the number of chars in the word.
To support such big numbers, I had to upgrade some variables of my program from unsigned int to unsigned long long, but then I realized that modf() from C++ uses doubles and uint32_t (http://www.raspberryginger.com/jbailey/minix/html/modf_8c-source.html).
The question is: is it possible to adapt modf() to use 64-bit numbers like unsigned long long? I'm afraid that if this is not possible, I'll be limited to numbers that fit in a double.
Can anyone enlighten me please? =)
16-digit numbers fit within the range of a 64-bit number, so you should use uint64_t (from <stdint.h>). The % operator should then do what you need.
If you need bigger numbers, then you'll need to use a big-integer library. However, if all you're interested in is modulus, then there's a trick you can pull, based on the following properties of modulus:
mod(a * b) == mod(mod(a) * mod(b))
mod(a + b) == mod(mod(a) + mod(b))
As an example, let's express a 16-digit decimal number, x as:
x = x_hi * 1e8 + x_lo; // this is pseudocode, not real C
where x_hi is the 8 most-significant decimal digits, and x_lo the least-significant. The modulus of x can then be expressed as:
mod(x) = mod(mod(x_hi) * mod(1e8) + mod(x_lo));
where mod(1e8) is a constant which you can precalculate.
All of this can be done in integer arithmetic.
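Here is a minimal C++ sketch of that trick, offered as an illustration only. The function names are hypothetical, and both functions assume the modulus fits in 32 bits so that products of reduced values fit in a uint64_t. The second function applies the same two properties one decimal digit at a time, so it works on a number of any length given as a string of digits.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: computes (x_hi * 10^8 + x_lo) mod m without forming the full number.
// Assumes m > 0; since m fits in 32 bits, every product below fits in 64 bits.
std::uint64_t mod_split(std::uint64_t x_hi, std::uint64_t x_lo, std::uint32_t m)
{
    const std::uint64_t base_mod = 100000000ULL % m;  // mod(1e8), a constant for a given m
    return ((x_hi % m) * base_mod + (x_lo % m)) % m;
}

// Hypothetical helper: modulus of an arbitrarily long decimal number.
// Assumes `digits` contains only the characters '0'..'9' and m > 0.
std::uint64_t mod_decimal(const std::string& digits, std::uint32_t m)
{
    std::uint64_t r = 0;
    for (std::size_t i = 0; i < digits.size(); ++i) {
        // r < m, so r * 10 + 9 stays far below the 64-bit limit.
        r = (r * 10 + static_cast<std::uint64_t>(digits[i] - '0')) % m;
    }
    return r;
}
```

For the 16-digit example from the question, mod_split(66342043, 12890625, m) gives the same result as 6634204312890625 % m computed directly in 64 bits.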
I could actually use a comment that was deleted right after (wonder why), that said:
modulus = a - a/b * b;
I've made a cast in the division to unsigned long long.
Now... I was a bit disappointed, because in my problem I thought I could keep raising the number of characters of the word with no problem. Nevertheless, I've started to get size issues at number of chars = 7. Why? 95^7 starts to give huge numbers.
I was hoping to get the possibility to write a word like "my cat is so fat I 1234r5s" and calculate the index of this, but this word has almost 30 characters:
95^26 = 2635200944657423647039506726457895338535308837890625 combinations.
Anyway, thanks for the answer.
I'm a beginner (self-learning) programmer learning C++, and recently I decided to implement a binary-coded decimal (BCD) class as an exercise, and so I could handle very large numbers on Project Euler. I'd like to do it as basically as possible, starting properly from scratch.
I started off using an array of ints, where every digit of the input number was saved as a separate int. I know that each BCD digit can be encoded with only 4 bits, so I thought using a whole int for this was a bit overkill. I'm now using an array of bitset<4>'s.
Is using a library class like this overkill as well?
Would you consider it cheating?
Is there a better way to do this?
EDIT: The primary reason for this is as an exercise - I wouldn't want to use a library like GMP because the whole point is making the class myself. Is there a way of making sure that I only use 4 bits for each decimal digit?
Just one note, using an array of bitset<4>'s is going to require the same amount of space as an array of long's. bitset is usually implemented by having an array of word sized integers be the backing store for the bits, so that bitwise operations can use bitwise word operations, not byte ones, so more gets done at a time.
Also, I question your motivation. BCD is usually used as a packed representation of a string of digits when sending them between systems. There isn't really anything to do with arithmetic usually. What you really want is an arbitrary sized integer arithmetic library like GMP.
Is using a library class like this overkill as well?
I would benchmark it against an array of ints to see which one performs better. If an array of bitset<4> is faster, then no it's not overkill. Every little bit helps on some of the PE problems
Would you consider it cheating?
No, not at all.
Is there a better way to do this?
Like Greg Rogers suggested, an arbitrary precision library is probably a better choice, unless you just want to learn from rolling your own. There's something to learn from both methods (using a library vs. writing a library). I'm lazy, so I usually use Python.
Like Greg Rogers said, using a bitset probably won't save any space over ints, and doesn't really provide any other benefits. I would probably use a vector instead. It's twice as big as it needs to be, but you get simpler and faster indexing for each digit.
If you want to use packed BCD, you could write a custom indexing function and store two digits in each byte.
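A custom indexing scheme along those lines could be sketched in C++ as below. The class name and its interface are invented for this example; it assumes digits are stored least significant first, with the even-indexed digit in the low nibble of each byte.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical class: stores decimal digits two per byte (packed BCD),
// least significant digit first.
class PackedBcd {
public:
    explicit PackedBcd(std::size_t digit_count)
        : count_(digit_count), bytes_((digit_count + 1) / 2, 0) {}

    std::size_t size() const { return count_; }
    std::size_t byte_count() const { return bytes_.size(); }

    // Returns digit i (0 = least significant).
    unsigned get(std::size_t i) const
    {
        assert(i < count_);
        const std::uint8_t byte = bytes_[i / 2];
        return (i % 2 == 0) ? (byte & 0x0Fu) : (byte >> 4);
    }

    // Stores `digit` (0..9) at position i, leaving the other digit in that byte untouched.
    void set(std::size_t i, unsigned digit)
    {
        assert(i < count_ && digit <= 9);
        std::uint8_t& byte = bytes_[i / 2];
        if (i % 2 == 0) {
            byte = static_cast<std::uint8_t>((byte & 0xF0u) | digit);
        } else {
            byte = static_cast<std::uint8_t>((byte & 0x0Fu) | (digit << 4));
        }
    }

private:
    std::size_t count_;
    std::vector<std::uint8_t> bytes_;
};
```

The trade-off is the one described above: a five-digit number needs only three bytes, but every access pays for a shift and a mask.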
Is using a library class like this overkill as well?
Would you consider it cheating?
Is there a better way to do this?
1 & 2: not really.
3: each byte has 8 bits, so you could store 2 BCD digits in each unsigned char.
In general, bit operations are applied in the context of an integer, so from the performance aspect there is no real reason to go to bits.
If you want to take the bit-level approach to gain experience, then this may be of help:
#include <stdio.h>
int main(void)
{
    typedef struct
    {
        unsigned int value:4;
    } Nibble;
    Nibble nibble;
    for (nibble.value = 0; nibble.value < 20; nibble.value++)
    {
        printf("nibble.value is %d\n", nibble.value);
    }
    return 0;
}
The gist of the matter is that inside that struct you are creating a small integer, one that is 4 bits wide. Under the hood it is still really an integer, but for your intended use it looks and acts like a 4-bit integer.
This is shown clearly by the for loop, which is actually an infinite loop. When the nibble value would reach 16, it wraps around to zero, as there are only 4 bits to work with.
As a result, nibble.value < 20 never becomes false.
If you look in the K&R White book, one of the notes there is the fact that bit operations like this are not portable, so if you want to port your program to another platform, it may or may not work.
Have fun.
You are trying to get a base-10 representation (i.e. one decimal digit in each cell of the array). This way either space is wasted (one int per digit) or time is wasted (4 bits per digit, but with the overhead of packing/unpacking).
Why not try with base-256, for example, and use an array of bytes? Or even base-2^32 with array of ints? The operations are implemented the same way as in base-10. The only thing that will be different is converting the number to a human-readable string.
It may work like this:
Assuming base-256, each "digit" has 256 possible values, so the numbers 0-255 are all single-digit values. Then 256 is written as 1:0 (I'll use a colon to separate the "digits", since we cannot use letters as in base-16); the base-10 analogue is how 10 comes after 9.
Likewise 1030 (base-10) = 4 * 256 + 6 = 4:6 (base-256).
Also 1020 (base-10) = 3 * 256 + 252 = 3:252 (base-256) is two-digit number in base-256.
Now let's assume we put the digits in array of bytes with the least significant digit first:
unsigned short digits1[] = { 212, 121 }; // 121 * 256 + 212 = 31188
int len1 = 2;
unsigned short digits2[] = { 202, 20 }; // 20 * 256 + 202 = 5322
int len2 = 2;
Then adding will go like this (warning: notepad code ahead, may be broken):
unsigned short resultdigits[3] = { 0 }; // max(len1, len2) + 1 digits leaves room for a final carry
int len = len1 > len2 ? len1 : len2; // max of the lengths
int carry = 0;
int i;
for (i = 0; i < len; i++) {
    int leftdigit = i < len1 ? digits1[i] : 0;   // treat missing digits as 0
    int rightdigit = i < len2 ? digits2[i] : 0;
    int sum = leftdigit + rightdigit + carry;
    if (sum > 255) {
        carry = 1;
        sum -= 256;
    } else {
        carry = 0;
    }
    resultdigits[i] = sum;
}
if (carry > 0) {
    resultdigits[i] = carry; // the result is one digit longer than the longer operand
}
On the first iteration it should go like this:
sum = 212 + 202 + 0 = 414
414 > 255, so carry = 1 and sum = 414 - 256 = 158
resultdigits[0] = 158
On the second iteration:
sum = 121 + 20 + 1 = 142
142 <= 255, so carry = 0
resultdigits[1] = 142
So at the end resultdigits[] = { 158, 142 }, that is 142:158 (base-256) = 142 * 256 + 158 = 36510 (base-10), which is exactly 31188 + 5322
Note that converting this number to/from a human-readable form is by no means a trivial task - it requires multiplication and division by 10 or 256 and I cannot present code as a sample without proper research. The advantage is that the operations 'add', 'subtract' and 'multiply' can be made really efficient and the heavy conversion to/from base-10 is done only once in the beginning and once after the end of the calculation.
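For the direction from base-256 to a decimal string, one possible approach is to divide the whole number by 10 repeatedly and collect the remainders. The C++ sketch below is only an illustration under that assumption: the function name is made up, digits are stored least significant first as in the arrays above, and it makes no claim to being the most efficient method.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: converts a little-endian base-256 number to a decimal string.
// The vector is taken by value because it is consumed by the repeated division.
std::string to_decimal(std::vector<std::uint8_t> digits)
{
    std::string reversed;
    // Strip leading zero digits (they sit at the back of a little-endian array).
    while (!digits.empty() && digits.back() == 0) digits.pop_back();
    while (!digits.empty()) {
        unsigned remainder = 0;
        // Schoolbook long division by 10, from the most significant digit down.
        for (std::size_t i = digits.size(); i-- > 0;) {
            const unsigned current = remainder * 256 + digits[i];
            digits[i] = static_cast<std::uint8_t>(current / 10);
            remainder = current % 10;
        }
        reversed.push_back(static_cast<char>('0' + remainder));
        while (!digits.empty() && digits.back() == 0) digits.pop_back();
    }
    if (reversed.empty()) return "0";
    // Remainders come out least significant first, so reverse them.
    return std::string(reversed.rbegin(), reversed.rend());
}
```

Applied to the result of the worked example, the digits { 158, 142 } come out as the string "36510". Each output digit costs one pass over the whole number, which is why it pays to convert only once at the end of a calculation.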
Having said all that, personally, I'd use base 10 in array of bytes and not care about the memory loss. This will require adjusting the constants 255 and 256 above to 9 and 10 respectively.