Related
I am trying to convert a large number going in to Megabytes. I don't want decimals
numeric function formatMB(required numeric num) output="false" {
return arguments.num \ 1024 \ 1024;
}
It then throws an error
How do I get around this?
You can't change the size of a Long, which is what CF uses for integers. So you'll need to BigInteger instead:
numeric function formatMB(required numeric num) {
var numberAsBigInteger = createObject("java", "java.math.BigInteger").init(javacast("string", num));
var mbAsBytes = 1024 ^ 2;
var mbAsBytesAsBigInteger = createObject("java", "java.math.BigInteger").init(javacast("string", mbAsBytes));
var numberInMb = numberAsBigInteger.divide(mbAsBytesAsBigInteger);
return numberInMb.longValue();
}
CLI.writeLn(formatMB(2147483648));
But as Leigh points out... for what you're doing, you're probably better off just doing this:
return floor(arguments.num / (1024 * 1024));
the size of a Long, which is what CF uses for integers
Small correction for those that may not read the comments. CF primarily uses 32 bit signed Integers, not Long (which has a much greater capacity). So as the error message indicates, the size limit here is the capacity of an Integer:
Integer.MAX_VALUE = 2147483647
Long.MAX_VALUE = 9223372036854775807
It is worth noting that although CF is relatively typeless, some Math and Date functions also have the same limitation. For example, although DateAdd technically supports milliseconds, if you try and use a very large number:
// getTime() - returns number of milliseconds since January 1, 1970
currentDate = dateAdd("l", now().getTime(), createDate(1970,1,1));
... it will fail with the exact same error because the "number" parameter must be an integer. So take note if the documentation mentions an "Integer" is expected. It does not just mean a "number" or "numeric" ...
I am finding pow(2,i) where i can range: 0<=i<=100000.
Apart i have MOD=1000000007
powers[100000];
powers[0]=1;
for (i = 1; i <=100000; ++i)
{
powers[i]=(powers[i-1]*2)%MOD;
}
for i=100000 won't power value become greater than MOD ?
How do I store the power correctly?
The operation doesn't look feasible to me.
I am getting correct value up to i=70 max I guess.
I have to find sum+= ar[i]*power(2,i) and finally print sum%1000000007 where ar[i] is an additional array with some big numbers up to 10^5
As long as your modulus value is less than half the capacity of your data type, it will never be exceeded. That's because you take the previous value in the range 0..1000000006, double it, then re-modulo it bringing it back to that same range.
However, I can't guarantee that higher values won't cause you troubles, it's more mathematical analysis than I'm prepared to invest given the simple alternative. You could spend a lot of time analysing, checking and debugging, but it's probably better just to not allow the problem to occur in the first place.
The alternative? I'd tend to use the pre-generation method (having a program do the gruntwork up front, inserting the pre-generated values into an array easily and speedily accessible from your real program).
With this method, you can use tools that are well tested and known to work with massive values. Since this data is not going to change, it's useless calculating it every time your program starts.
If you want an easy (and efficient) way to do this, the following bash script in conjunction with bc and awk can do this:
#!/usr/bin/bash
bc >nums.txt <<EOF
i = 1;
for (x = 0;x <= 10000; x++) {
i % 1000000007;
i = i * 2;
}
EOF
awk 'BEGIN { printf "static int array[] = {" }
{ if (NR % 5 == 1) printf "\n ";
printf "%s, ",$0;
next
}
END { print "\n};" }' nums.txt
The bc part is the "meat" of the matter, it creates the large powers of two and outputs them modulo the number you provided. The awk part is simply to format them in C-style array elements, five per line.
Just take the output of that and put it into your code and, voila, there you have it, a compile-time-expensed array that you can use for fast lookup.
It takes only a second and a half on my box to generate the array and then you never need to do it again. You also won't have to concern yourself with the vagaries of modulo math :-)
static int array[] = {
1,2,4,8,16,
32,64,128,256,512,
1024,2048,4096,8192,16384,
32768,65536,131072,262144,524288,
1048576,2097152,4194304,8388608,16777216,
33554432,67108864,134217728,268435456,536870912,
73741817,147483634,294967268,589934536,179869065,
359738130,719476260,438952513,877905026,755810045,
511620083,23240159,46480318,92960636,185921272,
371842544,743685088,487370169,974740338,949480669,
898961331,797922655,595845303,191690599,383381198,
766762396,533524785,67049563,134099126,268198252,
536396504,72793001,145586002,291172004,582344008,
164688009,329376018,658752036,317504065,635008130,
270016253,540032506,80065005,160130010,320260020,
640520040,281040073,562080146,124160285,248320570,
:
861508356,723016705,446033403,892066806,784133605,
568267203,136534399,273068798,546137596,92275185,
184550370,369100740,738201480,476402953,952805906,
905611805,
};
If you notice that your modulo can be stored in int. MOD=1000000007(decimal) is equivalent of 0b00111011100110101100101000000111 and can be stored in 32 bits.
- i pow(2,i) bit representation
- 0 1 0b00000000000000000000000000000001
- 1 2 0b00000000000000000000000000000010
- 2 4 0b00000000000000000000000000000100
- 3 8 0b00000000000000000000000000001000
- ...
- 29 536870912 0b00100000000000000000000000000000
Tricky part starts when pow(2,i) is grater than your MOD=1000000007, but if you know that current pow(2,i) will be greater than your MOD, you can actually see how bits look like after MOD
- i pow(2,i) pow(2,i)%MOD bit representation
- 30 1073741824 73741817 0b000100011001010011000000000000
- 31 2147483648 147483634 0b001000110010100110000000000000
- 32 4294967296 294967268 0b010001100101001100000000000000
- 33 8589934592 589934536 0b100011001010011000000000000000
So if you have pow(2,i-1)%MOD you can do *2 actually on pow(2,i-1)%MOD till you're next pow(2,i) will be greater than MOD.
In example for i=34 you will use (589934536*2) MOD 1000000007 instead of (8589934592*2) MOD 1000000007, because 8589934592 can't be stored in int.
Additional you can try bit operations instead of multiplication for pow(2,i).
Bit operation same as multiplication for 2 is bit shift left.
I'm programming in C++ and I have to store big numbers in one of my exercices.
The biggest number i have to store is : 9 780 321 563 842.
Each time i try to print the number (contained in a variable) it gives me a wrong result (not that number).
A 32bit type isn't enough since 2^32 is a 10 digit number and I have to store a 13 digit number. But with 64 bits you can respresent a number that has 20digits. So I tried using the type "uint64_t" but that didn't work for me and I really don't understand why.
So I searched on the internet to find which type would be sufficient for my variable to fit in. I saw on this forum persons with the same problem but they solved it using long long int or long double as type. But none worked for me (neither did long float).
I really don't know which other type could store that number, as I tried a lot but nothing worked for me.
Thanks for your help! :)
--
EDIT : The code is a bit long and complex and would not matter for the question, so this is actually what I do with the variable containing that number :
string barcode_s = "9780321563842";
uint64_t barcode = atoi(barcode_s.c_str());
cout << "Barcode is : " << barcode << endl;
Off course I don't put that number in a variable (of type string) "barcode_s" to convert it directly to a number, but that's what happen in my program. I read text from an input file and put it in "barcode_s" (the text I read and put in that variable is always a number) and then I convert that string to a number (using atoi).
So i presume the problem comes from the "atoi" function?
Thanks for your help!
The problem is indeed atoi: it returns an int, which is on most platforms a 32-bits integer. Converting to uint64_t from int will not magically restore the information that has been lost.
There are several solutions, though. In C++03, you could use stringstream to handle the conversion:
std::istringstream stream(barcode_s);
unsigned long barcode = 0;
if (not (stream >> barcode)) { std::abort(); }
In C++11, you can simply use stoul or stoull:
unsigned long long const barcode = std::stoull(barcode_s);
Your number 9 780 321 563 842 is hex 8E52897B4C2, which fits into 44 bits (4 bits per hex digit), so any 64 bit integer, no matter if signed or unsigned, will have space to spare. 'uint64_t' will work, and it will even fit into a 'double' with no loss of precision.
It follows that the remaining issue is a mistake in your code, usually that is either an accidental conversion of the 64 bit number to another type somewhere, or you are calling the wrong fouction to print a 64 bit integer.
Edit: just saw your code. 'atoi' returns int. As in 'int32_t'. Converting that to 'unit64_t' will not reconstruct the 64 bit number. Have a look at this: http://msdn.microsoft.com/en-us/library/czcad93k.aspx
The atoll () function converts char* to a long long.
If you don't have the longer function available, write your own in the mean time.
uint64_t result = 0 ;
for (unsigned int ii = 0 ; str.c_str()[ii] != 0 ; ++ ii)
{
result *= 10 ;
result += str.c_str () [ii] - '0' ;
}
I have instructions on creating a checksum of a message described like this:
The checksum consists of a single byte equal to the two’s complement sum of all bytes starting from the “message type” word up to the end of the message block (excluding the transmitted checksum). Carry from the most significant bit is ignored.
Another description I found was:
The checksum value contains the twos complement of the modulo 256 sum of the other words in the data message (i.e., message type, message length, and data words). The receiving equipment may calculate the modulo 256 sum of the received words and add this sum to the received checksum word. A result of zero generally indicates that the message was correctly received.
I understand this to mean that I sum the value of all bytes in message (excl checksum), get modulo 256 of this number. get twos complement of this number and that is my checksum.
But I am having trouble with an example message example (from design doc so I must assume it has been encoded correctly).
unsigned char arr[] = {0x80,0x15,0x1,0x8,0x30,0x33,0x31,0x35,0x31,0x30,0x33,0x30,0x2,0x8,0x30,0x33,0x35,0x31,0x2d,0x33,0x32,0x31,0x30,0xe};
So the last byte, 0xE, is the checksum. My code to calculate the checksum is as follows:
bool isMsgValid(unsigned char arr[], int len) {
int sum = 0;
for(int i = 0; i < (len-1); ++i) {
sum += arr[i];
}
//modulo 256 sum
sum %= 256;
char ch = sum;
//twos complement
unsigned char twoscompl = ~ch + 1;
return arr[len-1] == twoscompl;
}
int main(int argc, char* argv[])
{
unsigned char arr[] = {0x80,0x15,0x1,0x8,0x30,0x33,0x31,0x35,0x31,0x30,0x33,0x30,0x2,0x8,0x30,0x33,0x35,0x31,0x2d,0x33,0x32,0x31,0x30,0xe};
int arrsize = sizeof(arr) / sizeof(arr[0]);
bool ret = isMsgValid(arr, arrsize);
return 0;
}
The spec is here:= http://www.sinet.bt.com/227v3p5.pdf
I assume I have misunderstood the algorithm required. Any idea how to create this checksum?
Flippin spec writer made a mistake in their data example. Just spotted this then came back on here and found others spotted too. Sorry if I wasted your time. I will study responses because it looks like some useful comments for improving my code.
You miscopied the example message from the pdf you linked. The second parameter length is 9 bytes, but you used 0x08 in your code.
The document incorrectly states "8 bytes" in the third column when there are really 9 bytes in the parameter. The second column correctly states "00001001".
In other words, your test message should be:
{0x80,0x15,0x1,0x8,0x30,0x33,0x31,0x35,0x31,0x30,0x33,0x30, // param1
0x2,0x9,0x30,0x33,0x35,0x31,0x2d,0x33,0x32,0x31,0x30,0xe} // param2
^^^
With the correct message array, ret == true when I try your program.
Agree with the comment: looks like the checksum is wrong. Where in the .PDF is this data?
Some general tips:
Use an unsigned type as the accumulator; that gives you well-defined behavior on overflow, and you'll need that for longer messages. Similarly, if you store the result in a char variable, make it unsigned char.
But you don't need to store it; just do the math with an unsigned type, complement the result, add 1, and mask off the high bits so that you get an 8-bit result.
Also, there's a trick here, if you're on hardware that uses twos-complement arithmetic: just add all of the values, including the checksum, then mask off the high bits; the result will be 0 if the input was correct.
The receiving equipment may calculate the modulo 256 sum of the received words and add this sum to the received checksum word.
It's far easier to use this condition to understand the checksum:
{byte 0} + {byte 1} + ... + {last byte} + {checksum} = 0 mod 256
{checksum} = -( {byte 0} + {byte 1} + ... + {last byte} ) mod 256
As the others have said, you really should use unsigned types when working with individual bits. This is also true when doing modular arithmetic. If you use signed types, you leave yourself open to a rather large number of sign-related mistakes. OTOH, pretty much the only mistake you open yourself up to using unsigned numbers is things like forgetting 2u-3u is a positive number.
(do be careful about mixing signed and unsigned numbers together: there are a lot of subtleties involved in that too)
I'm a beginner (self-learning) programmer learning C++, and recently I decided to implement a binary-coded decimal (BCD) class as an exercise, and so I could handle very large numbers on Project Euler. I'd like to do it as basically as possible, starting properly from scratch.
I started off using an array of ints, where every digit of the input number was saved as a separate int. I know that each BCD digit can be encoded with only 4 bits, so I thought using a whole int for this was a bit overkill. I'm now using an array of bitset<4>'s.
Is using a library class like this overkill as well?
Would you consider it cheating?
Is there a better way to do this?
EDIT: The primary reason for this is as an exercise - I wouldn't want to use a library like GMP because the whole point is making the class myself. Is there a way of making sure that I only use 4 bits for each decimal digit?
Just one note, using an array of bitset<4>'s is going to require the same amount of space as an array of long's. bitset is usually implemented by having an array of word sized integers be the backing store for the bits, so that bitwise operations can use bitwise word operations, not byte ones, so more gets done at a time.
Also, I question your motivation. BCD is usually used as a packed representation of a string of digits when sending them between systems. There isn't really anything to do with arithmetic usually. What you really want is an arbitrary sized integer arithmetic library like GMP.
Is using a library class like this overkill as well?
I would benchmark it against an array of ints to see which one performs better. If an array of bitset<4> is faster, then no it's not overkill. Every little bit helps on some of the PE problems
Would you consider it cheating?
No, not at all.
Is there a better way to do this?
Like Greg Rogers suggested, an arbitrary precision library is probably a better choice, unless you just want to learn from rolling your own. There's something to learn from both methods (using a library vs. writing a library). I'm lazy, so I usually use Python.
Like Greg Rogers said, using a bitset probably won't save any space over ints, and doesn't really provide any other benefits. I would probably use a vector instead. It's twice as big as it needs to be, but you get simpler and faster indexing for each digit.
If you want to use packed BCD, you could write a custom indexing function and store two digits in each byte.
Is using a library class like this overkill as well?
Would you consider it cheating?
Is there a better way to do this?
1&2: not really
3: each byte's got 8-bits, you could store 2 BCD in each unsigned char.
In general, bit operations are applied in the context of an integer, so from the performance aspect there is no real reason to go to bits.
If you want to go to bit approach to gain experience, then this may be of help
#include <stdio.h>
int main
(
void
)
{
typedef struct
{
unsigned int value:4;
} Nibble;
Nibble nibble;
for (nibble.value = 0; nibble.value < 20; nibble.value++)
{
printf("nibble.value is %d\n", nibble.value);
}
return 0;
}
The gist of the matter is that inside that struct, you are creating a short integer, one that is 4 bits wide. Under the hood, it is still really an integer, but for your intended use, it looks and acts like a 4 bit integer.
This is shown clearly by the for loop, which is actually an infinite loop. When the nibble value hits, 16, the value is really zero, as there are only 4 bits to work with.
As a result nibble.value < 20 never becomes true.
If you look in the K&R White book, one of the notes there is the fact that bit operations like this are not portable, so if you want to port your program to another platform, it may or may not work.
Have fun.
You are trying to get base-10 representation (i.e. decimal digit in each cell of the array). This way either space (one int per digit), or time (4-bits per dgit, but there is overhead of packing/unpacking) is wasted.
Why not try with base-256, for example, and use an array of bytes? Or even base-2^32 with array of ints? The operations are implemented the same way as in base-10. The only thing that will be different is converting the number to a human-readable string.
It may work like this:
Assuming base-256, each "digit" has 256 possible values, so the numbers 0-255 are all single digit values. Than 256 is written as 1:0 (I'll use colon to separate the "digits", we cannot use letters like in base-16), analoge in base-10 is how after 9, there is 10.
Likewise 1030 (base-10) = 4 * 256 + 6 = 4:6 (base-256).
Also 1020 (base-10) = 3 * 256 + 252 = 3:252 (base-256) is two-digit number in base-256.
Now let's assume we put the digits in array of bytes with the least significant digit first:
unsigned short digits1[] = { 212, 121 }; // 121 * 256 + 212 = 31188
int len1 = 2;
unsigned short digits2[] = { 202, 20 }; // 20 * 256 + 202 = 5322
int len2 = 2;
Then adding will go like this (warning: notepad code ahead, may be broken):
unsigned short resultdigits[enough length] = { 0 };
int len = len1 > len2 ? len1 : len2; // max of the lengths
int carry = 0;
int i;
for (i = 0; i < len; i++) {
int leftdigit = i < len1 ? digits1[i] : 0;
int rightdigit = i < len2 ? digits2[i] : 0;
int sum = leftdigit + rightdigit + carry;
if (sum > 255) {
carry = 1;
sum -= 256;
} else {
carry = 0;
}
resultdigits[i] = sum;
}
if (carry > 0) {
resultdigits[i] = carry;
}
On the first iteration it should go like this:
sum = 212 + 202 + 0 = 414
414 > 256, so carry = 1 and sum = 414 - 256 = 158
resultdigits[0] = 158
On the second iteration:
sum = 121 + 20 + 1 = 142
142 < 256, so carry = 0
resultdigits[1] = 142
So at the end resultdigits[] = { 158, 142 }, that is 142:158 (base-256) = 142 * 256 + 158 = 36510 (base-10), which is exactly 31188 + 5322
Note that converting this number to/from a human-readable form is by no means a trivial task - it requires multiplication and division by 10 or 256 and I cannot present code as a sample without proper research. The advantage is that the operations 'add', 'subtract' and 'multiply' can be made really efficient and the heavy conversion to/from base-10 is done only once in the beginning and once after the end of the calculation.
Having said all that, personally, I'd use base 10 in array of bytes and not care about the memory loss. This will require adjusting the constants 255 and 256 above to 9 and 10 respectively.