Reading a double stored in a binary format in a character array - c++

I am trying to read a floating point number stored as a specific binary format in a char array. The format is as follows, where each letter represents a binary digit:
SEEEEEEE MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM
The format is explained more clearly on this website. Basically, the exponent is in Excess-64 notation and the mantissa is normalized to values <1 and >1/16. To get the true value of the number, the mantissa is multiplied by 16 to the power of the true value of the exponent.
Basically, what I've done so far is extract the sign and the exponent values, but I'm having trouble extracting the mantissa. The implementation I'm trying is quite brute force and probably far from ideal in terms of code, but it seemed the simplest to me. It basically is:
unsigned long a = 0;
for(int i = 0; i < 7; i++)
    a += static_cast<unsigned long>(m_bufRecord[index+1+i]) << ((6-i)*8);
It takes each 8-bit byte stored in the char array and shifts it left according to its index in the array. So if the array I have is as follows:
{0x3f, 0x28, 0xf5, 0xc2, 0x8f, 0x5c, 0x28, 0xf6}
I'm expecting a to take the value:
0x28f5c28f5c28f6
However, with the above implementation a takes the value:
0x27f4c18f5c27f6
Later, I convert the long integer to a floating number using the following code:
double m = a;
m = m*(pow(16, e-14));
m = (s==1)?-m:m;
What is going wrong here? Also, I'd love to know how a conversion like this would be implemented ideally?

I haven't tried running your code, but I suspect the reason you get this:
0x27f4c18f5c27f6
instead of
0x28f5c28f5c28f6
is that you have a "negative number" in the cell before it. Is your 8-bit byte array signed or unsigned? I expect it will work better if you make it unsigned. [Or cast each element to unsigned char first, so the value is not sign-extended before it is widened and shifted.]
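As for how a conversion like this might be implemented ideally: below is a minimal sketch, not a drop-in replacement, assuming m_bufRecord and index are as in your code and that the field really uses the Excess-64 layout described above. The function name readExcess64Double is mine. Casting to unsigned char up front means no sign extension can occur.
#include <cmath>
#include <cstddef>
#include <cstdint>

double readExcess64Double(const char* m_bufRecord, std::size_t index)
{
    const unsigned char* p =
        reinterpret_cast<const unsigned char*>(m_bufRecord) + index;

    const int s = p[0] >> 7;            // sign bit
    const int e = (p[0] & 0x7F) - 64;   // Excess-64 exponent

    // Assemble the 56 mantissa bits from unsigned bytes, so no sign
    // extension can leak into the upper bits of the accumulator.
    std::uint64_t a = 0;
    for (int i = 0; i < 7; ++i)
        a = (a << 8) | p[1 + i];

    // a equals mantissa * 16^14, so that factor is divided back out here.
    const double m = static_cast<double>(a) * std::pow(16.0, e - 14);
    return s ? -m : m;
}
This keeps your 16^(e-14) scaling; the only functional change from your loop is that the bytes are widened as unsigned values.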

Related

Converting Big Endian Formatted Bits to Intended Decimal Value While Ignoring First Bit

I am reading a binary file and trying to convert an IBM 4-byte floating point value to a double in C++. How exactly would one use the first byte of the IBM data to find the ccccccc in the given picture?
IBM to value conversion chart
The code below gives an exponent way larger than what the data should have. I am confused about how the line
exponent = ((IBM4ByteValue[0] & 127) - 64);
executes; I do not understand the use of the & operator in this statement. But essentially what the previous author of this code implied is that IBM4ByteValue[0] is the ccccccc, so does this mean that the ampersand sets a maximum value that the left side of the operator can equal? Even if this is correct, I'm not sure how this line accounts for the fact that there is Big Endian bitwise notation in the first byte (I believe it is Big Endian after viewing the picture). Not to mention that 10000001 and 00000001 should have the same exponent (-63), yet they will not with my current interpretation of the previously mentioned line.
So in short could someone show me how to find the ccccccc (shown in the picture link above) using the first byte --> IBM4ByteValue[0]. Maybe accessing each individual bit? However I do not know the code to do this using my array.
**this code is using the std namespace
**I believe ret should be mantissa * pow(16, 24+exponent); however, if I'm wrong about the exponent I'm probably wrong about this (I got the IBM conversion from a previously asked Stack Overflow question). **I would have just commented on the old post, but this question was a bit too large, pun intended, for a comment. It is also different in that I am asking how exactly one accesses the bits in an array storing whole bytes.
Code I put together using an IBM conversion from a previous question's answer:
for (long pos = 0; pos < fileLength; pos += BUF_LEN) {
    file.seekg(bytePosition);
    file.read((char *)(&IBM4ByteValue[0]), BUF_LEN);
    bytePosition += 4;
    printf("\n%8ld: ", pos);
    //IBM Conversion
    double ret = 0;
    uint32_t mantissa = 0;
    uint16_t exponent = 0;
    mantissa = (IBM4ByteValue[3] << 16) | (IBM4ByteValue[2] << 8) | IBM4ByteValue[1];
    exponent = ((IBM4ByteValue[0] & 127) - 64);
    ret = mantissa * exp2(-24 + 4 * exponent);
    if (IBM4ByteValue[0] & 128) ret *= -1.;
    printf(":%24f", ret);
    printf("\n");
    system("PAUSE");
}
The & operator takes the bits of that array element and masks them with the binary value of 127. If a bit in the array value is 1 and the corresponding bit of 127 is 1, the resulting bit is 1; 1 & 0, 0 & 0 and 0 & 1 all give 0. Since 127 is 01111111 in binary, the mask keeps the low seven bits and clears the sign bit. You then take the resulting value, now read as decimal, and subtract 64 from it to get your exponent.
In floating point we always have a bias (in this case, 64) for the exponent. This means that if your exponent is 5, 69 will be stored. So what this code is trying to do is find the original value of the exponent.
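For reference, here is a small sketch of the whole decode built around that masked exponent. It assumes the four bytes sit in the array in big-endian order (sign/exponent byte first, mantissa MSB next), which is the opposite byte order from the mantissa assembly in the code above; which one is right depends on how the bytes land in your buffer. Note the exponent is a signed int, so values below the bias do not wrap around as they would in a uint16_t.
#include <cmath>
#include <cstdint>

double ibm32ToDouble(const std::uint8_t IBM4ByteValue[4])
{
    // Mask keeps the low 7 bits (ccccccc) and removes the Excess-64 bias.
    const int exponent = (IBM4ByteValue[0] & 0x7F) - 64;

    // 24-bit mantissa, most significant byte first.
    const std::uint32_t mantissa = (std::uint32_t(IBM4ByteValue[1]) << 16) |
                                   (std::uint32_t(IBM4ByteValue[2]) << 8)  |
                                    std::uint32_t(IBM4ByteValue[3]);

    double ret = mantissa * std::pow(2.0, -24 + 4 * exponent);
    if (IBM4ByteValue[0] & 0x80)        // sign bit set means negative
        ret = -ret;
    return ret;
}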

Algorithm for converting large hex numbers into decimal form (base 10 form)

I have an array of bytes and length of that array. The goal is to output the string containing that number represented as base-10 number.
My array is little endian. It means that the first (arr[0]) byte is the least significant byte. This is an example:
#include <iostream>
#include <string>
using namespace std;

typedef unsigned char Byte;

// The function I want to implement (assumed here to return the decimal
// representation as a string):
string hexToDec(const Byte* arr, int len);

int main(){
    int len = 5;
    Byte *arr = new Byte[5];
    int i = 0;
    arr[i++] = 0x12;
    arr[i++] = 0x34;
    arr[i++] = 0x56;
    arr[i++] = 0x78;
    arr[i++] = 0x9A;
    cout << hexToDec(arr, len) << endl;
}
The array consists of [0x12, 0x34, 0x56, 0x78, 0x9A]. The function hexToDec which I want to implement should return 663443878930 which is that number in decimal.
But the problem is that my machine is 32-bit, so it instead outputs 2018915346 (notice that this number comes from integer overflow). So the problem is that I am using the naive way: iterating over the array, raising 256 to the power of the position in the array, multiplying it by the byte at that position, and finally adding it to the sum. This of course yields integer overflow.
I also tried long long int, but at some point, of course, integer overflow occurs.
The arrays I want to represent as decimal numbers can be very long (more than 1000 bytes), which definitely requires a much cleverer algorithm than my naive one.
Question
What would be a good algorithm to achieve that? Also, another question I must ask is: what is the optimal complexity of that algorithm? Can it be done in linear complexity O(n), where n is the length of the array? I really cannot think of a good idea. Implementation is not the problem; my lack of ideas is.
Advice or idea how to do that will be enough. But, if it is easier to explain using some code, feel free to write in C++.
You both can and cannot achieve this in O(n). It all depends on the internal representation of your number.
For truly binary form (power of 2 base like 256)
this is not solvable in O(n). The hex print of such a number is O(n), however, and you can convert a HEX string to decadic and back like this:
How to convert a gi-normous integer (in string format) to hex format?
Creating the hex string does not require bignum math: you just print the array word by word from MSW to LSW in HEX. This is O(n), but the conversion to DEC is not.
To print a bigint in decadic you need to repeatedly mod/div it by 10, obtaining digits from LSD to MSD until the remaining value is zero, and then print them in reverse order... The division and modulus can be done at once, as they are the same operation. So if your number has N decadic digits, you need N bigint divisions. Each bigint division can be done, for example, by binary division, so we need log2(n) bit shifts and subtractions, which are all O(n), so the complexity of the naive bigint print is:
O(N·n·log2(n))
We can compute N from n by logarithms, so for BYTEs:
N = log10(base^n)
  = log10(2^(8·n))
  = log2(2^(8·n))/log2(10)
  = 8·n/log2(10)
  = 8·n·0.30102999
  = 2.40824·n
So the complexity will be:
O(2.40824·n·n·log2(n)) = O(n^2·log2(n))
which is insane for really big numbers.
power of 10 base binary form
To do this in O(n) you need to slightly change the base of your number. It will still be represented in binary form, but the base will be a power of 10.
For example, if your number is represented by 16-bit WORDs, you can use the highest power-of-10 base that still fits in one, which is 10000 (the maximum WORD value is 65535). Now printing in decadic is easy: just print each word in your array in sequence from MSW to LSW, zero-padding every word except the most significant one to the full digit count.
Example:
let's have the big number 1234567890 stored as BYTEs with base 100, where the MSW goes first. The number will be stored as follows:
BYTE x[] = { 12, 34, 56, 78, 90 }
But as you can see, while using BYTEs and base 100 we are wasting space, as only 100/256 ≈ 39% of the full BYTE range is used. The operations on such numbers are slightly different than in raw binary form, as we need to handle overflow/underflow and the carry flag differently.
BCD (binary coded decimal)
There is also another option, which is to use BCD (binary coded decimal). It is almost the same as the previous option, but base 10 is used for a single digit of the number... each nibble (4 bits) contains exactly one digit. Processors usually have an instruction set for this number representation. The usage is like for binary encoded numbers, but after each arithmetic operation a BCD recovery instruction such as DAA is called, which uses the Carry and Auxiliary Carry flag states to recover the BCD encoding of the result. To print a BCD value in decadic you just print the value as HEX. Our number from the previous example would be encoded in BCD like this:
BYTE x[] = { 0x12, 0x34, 0x56, 0x78, 0x90 }
Of course, both the power-of-10-base and BCD options make an O(n) HEX print of your number impossible.
The number you posted 0x9a78563412, as you have represented it in little endian format, can be converted to a proper uint64_t with the following code:
#include <iostream>
#include <stdint.h>
int main()
{
    uint64_t my_number = 0;
    const int base = 0x100; /* base 256 */
    uint8_t array[] = { 0x12, 0x34, 0x56, 0x78, 0x9a };
    /* go from right to left, as it is little endian */
    for (int i = sizeof array; i > 0;) {
        my_number *= base;
        my_number += array[--i];
    }
    std::cout << my_number << std::endl; /* conversion uses 10 base by default */
}
sample run gives:
$ num
663443878930
As we are in a base that is an exact power of 2, we can optimize the code by using
my_number <<= 8; /* left shift by 8 */
my_number |= array[--i]; /* bit or */
As these operations are simpler than integer multiplication and addition, some (but not much) efficiency improvement can be expected from doing it that way. It may be more expressive to leave it as in the first example, though, as that better represents an arbitrary base conversion.
You'll need to brush up your elementary school skills and implement long division.
I think you'd be better off implementing the long division in base 16 (divide the number by 0x0A each iteration). Take the remainder of each division - these are your decimal digits (the first one is the least significant digit).
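To make that concrete, here is a sketch of such a long division for the hexToDec function from the question, working directly on the little-endian byte array (base 256 rather than nibble by nibble, but the same idea). This is the O(N·n) approach discussed in the first answer, not an O(n) one.
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

typedef unsigned char Byte;

std::string hexToDec(const Byte* arr, int len)
{
    std::vector<Byte> work(arr, arr + len);  // working copy we can divide in place
    std::string digits;

    // Repeat until the working copy is all zero.
    while (std::any_of(work.begin(), work.end(), [](Byte b) { return b != 0; })) {
        unsigned int remainder = 0;
        // Long division by 10, processing bytes from most to least significant.
        for (int i = len - 1; i >= 0; --i) {
            unsigned int cur = remainder * 256 + work[i];
            work[i] = static_cast<Byte>(cur / 10);
            remainder = cur % 10;
        }
        digits.push_back(static_cast<char>('0' + remainder));
    }

    if (digits.empty())
        digits = "0";
    std::reverse(digits.begin(), digits.end());  // digits were produced LSD first
    return digits;
}

int main()
{
    const Byte arr[] = { 0x12, 0x34, 0x56, 0x78, 0x9A };
    std::cout << hexToDec(arr, 5) << std::endl;  // expect 663443878930
}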

converting a hexadecimal number correctly in a decimal number (also negatives)

As the headline suggests, I am trying to convert hex numbers like 0x320000dd to decimal numbers.
My code only works for positive numbers but fails when it comes to hex numbers that represent negative decimal numbers. Here is an excerpt of my code:
cin>>hex>> x;
unsigned int number = x;
int r = number & 0xffffff;
My input is already in hex, and the computer converts it automatically into an integer. What I am trying to do is get the operand of the hex number, i.e. the last 24 bits.
Can you help me get my code working for negative values like 0x32fffdc9 or 0x32ffffff? Thanks a lot!
EDIT:
I would like my output to be :
0x32fffdc9 --> -567
or
0x32ffffff --> -1
so just the plain decimal values, but instead it gives me 16776649 and 16777215 for the examples above.
Negative integers are typically stored in 2's complement, meaning that if the most significant bit (MSB) is not set, the number is not negative. This means that just as you need to clear the 8 most significant bits of your number to clamp a 32-bit number to a 24-bit positive number, you'll need to set the 8 most significant bits of your number to clamp it to a negative number:
const int32_t r = 0x800000 & number ? 0xFF000000 | number : number & 0xFFFFFF;
vector<bool> or bitset may be worth your consideration, as they would make it clearer which range of bits in the hexadecimal numbers is being set.
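For reference, a quick check of that expression against the two sample inputs (assuming the goal is to sign-extend the low 24 bits into a signed 32-bit value):
#include <cstdint>
#include <iostream>

int main()
{
    const std::uint32_t samples[] = { 0x32fffdc9u, 0x32ffffffu };
    for (std::uint32_t number : samples) {
        // Same expression as above: set the top 8 bits when bit 23 is set,
        // otherwise keep only the low 24 bits.
        const std::int32_t r =
            (0x800000 & number) ? (0xFF000000 | number) : (number & 0xFFFFFF);
        std::cout << std::hex << number << " -> " << std::dec << r << '\n';
    }
    // Expected output:
    // 32fffdc9 -> -567
    // 32ffffff -> -1
}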

How to use negative number with openssl's BIGNUM?

I want a C++ version of the following Java code.
BigInteger x = new BigInteger("00afd72b5835ad22ea5d68279ffac0b6527c1ab0fb31f1e646f728d75cbd3ae65d", 16);
BigInteger y = x.multiply(BigInteger.valueOf(-1));
//prints y = ff5028d4a7ca52dd15a297d860053f49ad83e54f04ce0e19b908d728a342c519a3
System.out.println("y = " + new String(Hex.encode(y.toByteArray())));
And here is my attempt at a solution.
BIGNUM* x = BN_new();
BN_CTX* ctx = BN_CTX_new();
std::vector<unsigned char> xBytes = hexStringToBytes("00afd72b5835ad22ea5d68279ffac0b6527c1ab0fb31f1e646f728d75cbd3ae65d");
BN_bin2bn(&xBytes[0], xBytes.size(), x);
BIGNUM* negative1 = BN_new();
std::vector<unsigned char> negative1Bytes = hexStringToBytes("ff");
BN_bin2bn(&negative1Bytes[0], negative1Bytes.size(), negative1);
BIGNUM* y = BN_new();
BN_mul(y, x, negative1, ctx);
char* yHex = BN_bn2hex(y);
std::string yStr(yHex);
//prints y = AF27542CDD7775C7730ABF785AC5F59C299E964A36BFF460B031AE85607DAB76A3
std::cout <<"y = " << yStr << std::endl;
(Ignore the difference in letter case.) What am I doing wrong? How do I get my C++ code to output the correct value "ff5028d4a7ca52dd15a297d860053f49ad83e54f04ce0e19b908d728a342c519a3"? I also tried setting negative1 with BN_set_word(negative1, -1), but that gives me the wrong answer too.
The BN_set_negative function sets a negative number.
The negative of afd72b5835ad22ea5d68279ffac0b6527c1ab0fb31f1e646f728d75cbd3ae65d is actually -afd72b5835ad22ea5d68279ffac0b6527c1ab0fb31f1e646f728d75cbd3ae65d, in the same way as -2 is the negative of 2.
ff5028d4a7ca52dd15a297d860053f49ad83e54f04ce0e19b908d728a342c519a3 is a large positive number.
The reason you are seeing this number in Java is the toByteArray call. According to its documentation, it selects the minimum field width which is a whole number of bytes and is also capable of holding a two's complement representation of the negative number.
In other words, by using the toByteArray function on a number that currently has 1 sign bit and 256 value bits, you end up with a field width of 264 bits. However, if your negative number's first nibble were 7, for example, rather than a, then (according to this documentation - I haven't actually tried it) you would get a 256-bit field width out (i.e. 8028d4..., not ff8028d4...).
The leading 00 you have used in your code is insignificant in OpenSSL BN. I'm not sure if it is significant in BigInteger although the documentation for that constructor says "The String representation consists of an optional minus or plus sign followed by a sequence of one or more digits in the specified radix. "; so the fact that it accepts a minus sign suggests that if the minus sign is not present then the input is treated as a large positive number, even if its MSB is set. (Hopefully a Java programmer can clear this paragraph up for me).
Make sure you keep clear in your mind the distinction between a large negative value, and a large positive number obtained by modular arithmetic on that negative value, such as is the output of toByteArray.
So your question is really: does Openssl BN have a function that emulates the behaviour of BigInteger.toByteArray() ?
I don't know if such a function exists (the BN library has fairly bad documentation IMHO, and I've never heard of it being used outside of OpenSSL, especially not in a C++ program). I would expect it doesn't, since toByteArray's behaviour is kind of weird; and in any case, all of the BN output functions appear to output using a sign-magnitude format, rather than a two's complement format.
But to replicate that output, you could add either 2^256 or 2^264 to the large negative number, and then do BN_bn2hex. In this particular case, add 2^264. In general you would have to measure the current bit length of the number being stored and round the exponent up to the nearest multiple of 8.
Or you could even output in sign-magnitude format (using BN_bn2hex or BN_bn2mpi) and then iterate through it, inverting each nibble and fixing up the start!
NB. Is there any particular reason you want to use OpenSSL BN? There are many alternatives.
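If you want to try the "add 2^264, then print" route, a rough sketch (no error checking, helper name mine) could look like this, assuming x already holds the negative value, e.g. via BN_bin2bn followed by BN_set_negative(x, 1):
#include <openssl/bn.h>
#include <iostream>

void printTwosComplementHex(const BIGNUM* x)
{
    BIGNUM* modulus = BN_new();
    BIGNUM* y = BN_new();

    BN_one(modulus);
    BN_lshift(modulus, modulus, 264);   // modulus = 2^264
    BN_add(y, modulus, x);              // y = 2^264 + x (x is negative)

    char* hex = BN_bn2hex(y);           // prints FF5028D4...C519A3 for the example
    std::cout << hex << std::endl;

    OPENSSL_free(hex);
    BN_free(y);
    BN_free(modulus);
}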
Although this is a question from 2014 (more than five years ago), I would like to solve your problem / clarify the situation, which might help others.
a) Sign-magnitude and two's complement
In machine arithmetic there are, among others, the "sign-magnitude" and "two's complement" representations of numbers. Sign-magnitude stores the absolute (positive) value only and does not know a sign. If you want a sign for a number stored as sign-magnitude, you have to store it separately, e.g. in one bit (0 = positive, 1 = negative). This is exactly the situation of floating point numbers (IEEE 754): the mantissa is stored as a magnitude together with the exponent and one additional sign bit. Numbers in sign-magnitude have two zeros, -0 and +0, because the sign is treated independently of the absolute value itself.
In two's complement, the most significant bit is used as the sign bit. There is no '-0' because negating a value in two's complement means performing the logical NOT (in C: tilde) operation followed by adding one.
As an example, one byte (in two's complement) can be one of the three values 0xFF, 0x00, 0x01 meaning -1, 0 and 1. There is no room for the -0. If you have, e.g. 0xFF (-1) and want to negate it, then the logical NOT operation computes 0xFF => 0x00. Adding one yields 0x01, which is 1.
b) OpenSSL BIGNUM and Java BigInteger
OpenSSL's BIGNUM implementation represents numbers in sign-magnitude form (a magnitude plus a separate sign flag). The Java BigInteger treats numbers as two's complement. That was your disaster. Your big integer (in hex) is 00afd72b5835ad22ea5d68279ffac0b6527c1ab0fb31f1e646f728d75cbd3ae65d. This is a positive 256-bit integer. It consists of 33 bytes because there is a leading zero byte 0x00, which is absolutely correct for an integer stored as two's complement: the most significant bit of the remaining 32 bytes is set (in 0xAF), which would otherwise make this number negative.
c) Solution you were looking for
OpenSSL's function bin2bn works with absolute values only. For OpenSSL, you can leave the initial zero byte in or cut it off - it does not make any difference, because OpenSSL canonicalizes the input data anyway, which means cutting off all leading zero bytes. The next problem with your code is the way you try to make this integer negative: you want to multiply it by -1. Using 0xFF as the only input byte to bin2bn makes it 255, not -1. In fact, you multiply your big integer by 255, yielding the overall result AF27542CDD7775C7730ABF785AC5F59C299E964A36BFF460B031AE85607DAB76A3, which is still positive.
Multiplication by -1 works like this (snippet, no error checking):
BIGNUM* x = BN_bin2bn(&xBytes[0], (int)xBytes.size(), NULL);
BIGNUM* negative1 = BN_new();
BN_one(negative1); /* negative1 is +1 */
BN_set_negative(negative1, 1); /* negative1 is now -1 */
BN_CTX* ctx = BN_CTX_new();
BIGNUM* y = BN_new();
BN_mul(y, x, negative1, ctx);
Easier is:
BIGNUM* x = BN_bin2bn(&xBytes[0], (int)xBytes.size(), NULL);
BN_set_negative(x,1);
This does not solve your problem because, as M.M said, this just makes -afd72b5835ad22ea5d68279ffac0b6527c1ab0fb31f1e646f728d75cbd3ae65d from afd72b5835ad22ea5d68279ffac0b6527c1ab0fb31f1e646f728d75cbd3ae65d.
You are looking for the two's complement of your big integer, which is computed like this:
int i;
for (i = 0; i < (int)sizeof(value); i++)
    value[i] = ~value[i];
for (i = ((int)sizeof(value)) - 1; i >= 0; i--)
{
    value[i]++;
    if (0x00 != value[i])
        break;
}
This is an unoptimized version of the two's complement, where 'value' is your 33-byte input array containing your big integer prefixed by the byte 0x00 (note that sizeof(value) only works like this if value is declared as a real array, e.g. unsigned char value[33], not as a pointer). The result of this operation is the 33 bytes ff5028d4a7ca52dd15a297d860053f49ad83e54f04ce0e19b908d728a342c519a3.
d) Working with two's complement and OpenSSL BIGNUM
The whole sequence is like this:
1. Prologue: if the input is negative (check the most significant bit), compute the two's complement of the input.
2. Convert to BIGNUM using BN_bin2bn.
3. If the input was negative, call BN_set_negative(x, 1).
4. Main function: carry out all arithmetic operations using the OpenSSL BIGNUM package.
5. Call BN_is_negative to check for a negative result.
6. Convert to raw binary bytes using BN_bn2bin.
7. If the result was negative, compute the two's complement of the result.
8. Epilogue: if the result was positive and the most significant bit of the raw result bytes (output of step 7) is set, prepend a byte 0x00. If the result was negative and the most significant bit of the raw result bytes is clear, prepend a byte 0xFF.
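A sketch of the prologue and epilogue parts of that sequence (steps 1-3 and 5-8; the helper names are mine, not OpenSSL's, and error handling is omitted):
#include <openssl/bn.h>
#include <vector>

// Invert all bits, then add one with carry: two's complement in place.
static void twosComplementInPlace(std::vector<unsigned char>& v)
{
    for (auto& b : v) b = static_cast<unsigned char>(~b);
    for (int i = static_cast<int>(v.size()) - 1; i >= 0; --i)
        if (++v[i] != 0x00) break;
}

// Steps 1-3: prologue + BN_bin2bn + BN_set_negative.
BIGNUM* fromTwosComplement(std::vector<unsigned char> in)
{
    const bool negative = !in.empty() && (in[0] & 0x80);
    if (negative) twosComplementInPlace(in);                              // step 1
    BIGNUM* x = BN_bin2bn(in.data(), static_cast<int>(in.size()), NULL);  // step 2
    if (negative) BN_set_negative(x, 1);                                  // step 3
    return x;
}

// Steps 5-8: BN_is_negative + BN_bn2bin + two's complement + epilogue.
std::vector<unsigned char> toTwosComplement(const BIGNUM* x)
{
    const bool negative = BN_is_negative(x);                              // step 5
    std::vector<unsigned char> out(BN_num_bytes(x));
    if (!out.empty()) BN_bn2bin(x, out.data());                           // step 6 (magnitude)
    if (negative) twosComplementInPlace(out);                             // step 7
    if (!negative && !out.empty() && (out[0] & 0x80))                     // step 8 (epilogue)
        out.insert(out.begin(), 0x00);
    else if (negative && !out.empty() && !(out[0] & 0x80))
        out.insert(out.begin(), 0xFF);
    return out;
}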

how are integers stored in memory?

I'm confused when I was reading an article about Big/Little Endian.
Code goes below:
#include <iostream>
using namespace std;
int i = 12345678;
int main()
{
    char *p = (char*)&i; //line-1
    if(*p == 78) //line-2
        cout << "little endian" << endl;
    if(*p == 12)
        cout << "big endian" << endl;
}
Question:
In line-1, can I do the conversion using static_cast<char*>(&i)?
In line-2, according to the code, if it's little-endian, then 78 is stored in the lowest byte, else 12 is stored in the lowest byte. But what I think is that i = 12345678; will be stored in memory in binary.
If it's little-endian, then the last byte of i's binary will be stored in the lowest byte, but what I don't understand is how it can guarantee that the last byte of i is 78?
Likewise, if i = 123;, then i's binary is 01111011; can it guarantee that in little-endian, 23 is stored in the lowest byte?
I'd prefer a reinterpret_cast.
Little-endian and big-endian refer to the way bytes, i.e. 8-bit quantities, are stored in memory, not two-decimal-digit quantities. If i had the value 0x12345678, then you could check for 0x78 and 0x12 to determine endianness, since two hex digits correspond to a single byte (on all the hardware that I've programmed for).
There are two different concepts involved here:
Numbers are stored in binary format. 8 bits form a byte; integers can use 1, 2, 4 or even 8 or 1024 bytes depending on the platform they run on.
Endianness is the order the bytes have in memory (least significant first - LE, or most significant first - BE).
Now, 12345678 is a decimal number whose binary (base-2) representation is 101111000110000101001110. Not that easy to check, mainly because the base-2 representation doesn't group exactly into decimal digits (there is no integer x such that 2^x gives 10).
Hexadecimal numbers are easier to fit: 2^4 = 16 and 2^8 = 16^2 = 256.
So the hexadecimal number 0x12345678 forms the bytes 0x12-0x34-0x56-0x78.
Now it's easy to check whether the first byte is 0x12 or 0x78.
(note: the hexadecimal representation of 12345678 is 0x00BC614E, where 0xBC is 188, 0x61 is 97 and 0x4E is 78)
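Putting that advice together, a corrected check could look roughly like this (using 0x12345678 so each byte is recognisable):
#include <iostream>

int main()
{
    unsigned int i = 0x12345678;
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&i);

    if (*p == 0x78)                       // lowest-addressed byte holds 0x78
        std::cout << "little endian" << std::endl;
    else if (*p == 0x12)                  // lowest-addressed byte holds 0x12
        std::cout << "big endian" << std::endl;
}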
static_cast is one new-style alternative to old-fashioned C-style casts, but it's not appropriate here; reinterpret_cast is for when you're completely changing the data type.
This code simply won't work -- bytes don't hold an even number of decimal digits! The digits of a decimal number don't match up one-to-one with the bytes stored in memory. Decimal 500, for example, could be stored in two bytes as 0x01F4. The "01" stands for 256, and the "F4" is another 244, for a total of 500. You can't say that the "5" from "500" is in either of those two bytes -- there's no direct correspondence.
It should be
unsigned char* p = (unsigned char*)&i;
You cannot use static_cast. Only reinterpret_cast.
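So the combined fix, written with the C++-style cast, would be something along these lines:
// Same cast as above, written with reinterpret_cast instead of a C-style cast:
unsigned char* p = reinterpret_cast<unsigned char*>(&i);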