Encoding and decoding ASN.1 REAL with BER - c++

Excuse me for my bad English. I have a decimal number: 0.15625.
(The example is from http://www.strozhevsky.com/free_docs/asn1_in_simple_words.pdf, page 5.)
According to the BER rules of ASN.1 it is encoded as the octets 09 03 90 FE 0A (this is the correct answer given in the book).
http://www.itu.int/ITU-T/studygroups/com17/languages/X.690-0207.pdf -
the ASN.1 standard (8.5 - REAL)
Byte 1:
(8-7) Class - Universal - 00
(6) P/C - Primitive - 0
(5-1) Tag Number - 01001(REAL)
TOTAL: 00001001(2) = 09(16) (Correct)
Byte 2:
(8) binary encoding - 1
_____________________
(7) When binary encoding is used (bit 8 = 1), then if the mantissa M is
non-zero, it shall be represented by a sign S, a positive integer value N
and a binary scaling factor F, such that:
M = S × N × 2^F
Bit 7 of the first contents octet shall be 1 if S is -1 and 0 otherwise.
What should my bit 7 be?
_____________________
(6-5) base 8 - 01
_______________________
(4-3) Bits 4 to 3 of the first contents octet shall encode the value of
the binary scaling factor F as an unsigned binary
integer. I don't have a scaling factor, so - 00
_____________________
(2-1) 8.5.6.4 Bits 2 to 1 of the first contents octet shall encode
the format of the exponent as follows: I do not know how to determine
what my value should be here (I understand the English poorly). I think 11?
Total: 1?010011 - this is NOT EQUAL to 03. Why? (Not correct)
What does the 90 mean? Which octet is it? How do I find it? The book does not say, or I simply do not understand.
FE encodes the number -2 (the exponent); how do I interpret FE so that I get -2 and not 254? Perhaps the byte 90 contains information about this?
Thank you for listening.

In the section "Chapter 1. Common rules for ASN.1 encoding" it states that an encoding has three sections:
an information block
a length block
a value block
The length block specifies the length of the value block.
The encoding of 0.15625 as the octets 09 03 80 FB 05 (a base-2 encoding; the book's 09 03 90 FE 0A is an equally valid base-8 encoding of the same value, since 0.15625 = 10 × 8^-2) breaks down like this:
09 - information block (1 octet)
03 - length block (1 octet)
80 FB 05 - value block (3 octets)
The value block itself consists of three sections: an information octet, a block for the exponent and a block for the mantissa. In this case the mantissa is M = 5 (101 in base 2) and the exponent is E = -5. Therefore the value block is:
80 - information octet
FB - the exponent block (FB = -5)
05 - the mantissa block (5)
The information octet specifies various pieces of information including:
that we are encoding a real number
we are using base 2, and
the number is non-negative (>= 0)
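To make the breakdown concrete, here is a minimal sketch (my own illustration, not a general BER decoder; it assumes this short-form, base-2, single-exponent-octet case) that decodes those five octets back into 0.15625:
#include <cmath>
#include <cstdint>
#include <iostream>

int main()
{
    // The complete encoding: tag (09), length (03), then the value block.
    const uint8_t octets[] = { 0x09, 0x03, 0x80, 0xFB, 0x05 };

    const uint8_t info = octets[2];   // information octet 0x80:
    // bit 8 = 1 -> binary encoding, bit 7 = 0 -> sign is +,
    // bits 6-5 = 00 -> base 2, bits 4-3 = 00 -> scaling factor F = 0,
    // bits 2-1 = 00 -> the exponent occupies exactly one following octet
    int sign     = (info & 0x40) ? -1 : 1;
    int exponent = static_cast<int8_t>(octets[3]);   // 0xFB -> -5 (two's complement)
    int mantissa = octets[4];                        // 0x05 -> 5

    double value = sign * mantissa * std::pow(2.0, exponent);
    std::cout << value << "\n";   // prints 0.15625
}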
To answer your question about FE being interpreted as -2, this is how negative numbers are represented in 2s-complement arithmetic (more info). For single octet numbers we have:
FF -> -1
FE -> -2
FD -> -3
...
80 -> -128
7F -> +127
7E -> +126
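In C++ you can get that interpretation directly by viewing the octet through a signed 8-bit type; a tiny sketch:
#include <cstdint>
#include <iostream>

int main()
{
    uint8_t octet = 0xFE;
    // Reading the same bit pattern as a signed 8-bit value gives the
    // two's-complement interpretation: 0xFE -> -2.
    std::cout << static_cast<int>(static_cast<int8_t>(octet)) << "\n";   // prints -2
}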

Related

Is there a formula to find the number of bits for either the exponent or significand in a floating point number?

Recently, I have been interested in using bit shifts on floating point numbers to do some fast calculations.
To make them more generic, I would like to make my functions work with different floating point types, probably through templates, not limited to float and double but also covering "half-width" or "quadruple-width" floating point numbers and so on.
Then I noticed:
- Half --- 5 exponent bits --- 10 significand bits
- Float --- 8 exponent bits --- 23 significand bits
- Double --- 11 exponent bits --- 52 significand bits
So far I thought exponent bits = log2(total bytes) * 3 + 2,
which means a 128-bit float should have 14 exponent bits, and a 256-bit float should have 17 exponent bits.
However, then I learned:
- Quad --- 15 exponent bits --- 112 significand bits
- Octuple --- 19 exponent bits --- 236 significand bits
So, is there a formula to find it at all? Or is there a way to obtain it through some built-in functions?
C or C++ are preferred, but open to other languages.
Thanks.
Characteristics Provided Via Built-In Functions
C++ provides this information via the std::numeric_limits template:
#include <iostream>
#include <limits>
#include <cmath>
template<typename T> void ShowCharacteristics()
{
    int radix = std::numeric_limits<T>::radix;
    std::cout << "The floating-point radix is " << radix << ".\n";
    std::cout << "There are " << std::numeric_limits<T>::digits
              << " base-" << radix << " digits in the significand.\n";
    int min = std::numeric_limits<T>::min_exponent;
    int max = std::numeric_limits<T>::max_exponent;
    std::cout << "Exponents range from " << min << " to " << max << ".\n";
    std::cout << "So there must be " << std::ceil(std::log2(max-min+1))
              << " bits in the exponent field.\n";
}
int main()
{
    ShowCharacteristics<double>();
}
Sample output:
The floating-point radix is 2.
There are 53 base-2 digits in the significand.
Exponents range from -1021 to 1024.
So there must be 11 bits in the exponent field.
C also provides the information, via macro definitions like DBL_MANT_DIG defined in <float.h>, but the standard defines the names only for types float (prefix FLT), double (DBL), and long double (LDBL), so the names in a C implementation that supported additional floating-point types would not be predictable.
Note that the exponent as specified in the C and C++ standards is one off from the usual exponent described in IEEE-754: It is adjusted for a significand scaled to [½, 1) instead of [1, 2), so it is one greater than the usual IEEE-754 exponent. (The example above shows the exponent ranges from −1021 to 1024, but the IEEE-754 exponent range is −1022 to 1023.)
Formulas
IEEE-754 does provide formulas for recommended field widths, but it does not require IEEE-754 implementations to conform to these, and of course the C and C++ standards do not require C and C++ implementations to conform to IEEE-754. The interchange format parameters are specified in IEEE 754-2008 3.6, and the binary parameters are:
For a floating-point format of 16, 32, 64, or 128 bits, the significand width (including leading bit) should be 11, 24, 53, or 113 bits, and the exponent field width should be 5, 8, 11, or 15 bits.
Otherwise, for a floating-point format of k bits, k should be a multiple of 32, the significand width should be k − round(4·log2(k)) + 13, and the exponent field width should be round(4·log2(k)) − 13.
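As a quick sketch (my own illustration, not code from the standard), those recommendations can be tabulated directly from the formula:
#include <cmath>
#include <cstdio>

// Recommended exponent field width for an IEEE 754-2008 binary interchange
// format of k bits: special-cased for 16/32/64, formula-based for k >= 128.
int exponent_bits(int k)
{
    if (k == 16) return 5;
    if (k == 32) return 8;
    if (k == 64) return 11;
    return static_cast<int>(std::lround(4 * std::log2(double(k)))) - 13;
}

int main()
{
    const int widths[] = { 16, 32, 64, 128, 256, 512, 1024 };
    for (int k : widths) {
        int e = exponent_bits(k);
        // 1 sign bit, e exponent bits, and the rest is the stored significand.
        std::printf("binary%-5d 1 sign, %2d exponent, %d significand bits\n",
                    k, e, k - 1 - e);
    }
}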
The answer is no.
How many bits to use (or even which representation to use) is decided by compiler implementers and committees. And there's no way to guess what a committee decided (and no, it's not the "best" solution for any reasonable definition of "best"... it's just what happened that day in that room: an historical accident).
If you really want to get down to that level you need to actually test your code on the platforms you want to deploy to and add in some #ifdef macrology (or ask the user) to find which kind of system your code is running on.
Also beware that in my experience one area in which compilers are extremely aggressive (to the point of being obnoxious) about type aliasing is with floating point numbers.
I want to see if there's a formula, so that if a 512-bit float were standardized, my code would automatically work with it, without the need to alter anything.
I don't know of a published standard that guarantees the bit allocation for future formats (*). Past history shows that several considerations factor into the final choice; see for example the answer and links at Why do higher-precision floating point formats have so many exponent bits? (*) EDIT: see the note added at the end.
For a guessing game, the existing 5 binary formats defined by IEEE-754 hint that the number of exponent bits grows slightly faster than linear. One (random) formula that fits these 5 data points could be for example (in WA notation) exponent_bits = round( (log2(total_bits) - 1)^(3/2) ).
This would foresee that a hypothetical binary512 format would assign 23 bits to the exponent, though of course IEEE is not bound in any way by such second-guesses.
The above is just an interpolation formula that happens to match the 5 known exponents, and it is certainly not the only such formula. For example, searching for the sequence 5,8,11,15,19 on oeis finds 18 listed integer sequences that contain this as a subsequence.
[ EDIT ]   As pointed out in @EricPostpischil's answer, IEEE 754-2008 does in fact list the formula exponent_bits = round( 4 * log2(total_bits) ) - 13 for total_bits >= 128 (the formula actually holds for total_bits = 64, too, though it does not for = 32 or = 16).
The empirical formula above matches the reference IEEE one for 128 <= total_bits <= 1472, in particular IEEE also gives 23 exponent bits for binary512 and 27 exponent bits for binary1024.
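A small throwaway check (my own sketch, not from either answer) that the interpolation and the IEEE recommendation agree on the power-of-two widths discussed here:
#include <cmath>
#include <cstdio>

int main()
{
    // Compare the interpolation round((log2(k) - 1)^1.5) against the
    // IEEE 754-2008 recommendation round(4*log2(k)) - 13.
    const int widths[] = { 128, 256, 512, 1024 };
    for (int k : widths) {
        long guess = std::lround(std::pow(std::log2(double(k)) - 1.0, 1.5));
        long ieee  = std::lround(4 * std::log2(double(k))) - 13;
        std::printf("binary%-5d interpolation %2ld, IEEE recommendation %2ld\n",
                    k, guess, ieee);   // both give 15, 19, 23, 27
    }
}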
UPDATE: I've now incorporated that into a single unified function that lines up with the official formula, while also producing the proper exponents for the 16- and 32-bit formats and showing how the bits are split between the sign bit, the exponent bits, and the mantissa bits.
Inputs can be a number of bits, e.g. 64, a ratio like "2x", or a case-insensitive single letter:
- "S" for 1x single, "D" for 2x double,
- "Q" for 4x quadruple, "O" for 8x "octuple",
- "X" for 16x he"X", "T" for 32x "T"hirty-two,
- all other inputs, missing or invalid, default to 0.5x half-precision
gcat <( jot 20 | mawk '$!(_=NF)=(_+_)^($_)' ) \
<( jot - -1 8 | mawk '$!NF =(++_+_)^$(_--)"x"' ) |
{m,g}awk '
function _754(__,_,___) {
return \
(__=(__==___)*(_+=_+=_^=_<_) ? _--^_++ : ">"<__ ? \
(_+_)*(_*_/(_+_))^index("SDQOXT", toupper(__)) : \
__==(+__ "") ? +__ : _*int(__+__)*_)<(_+_) \
\
? "_ERR_754_INVALID_INPUT_" \
: "IEEE-754-fp:" (___=__) "-bit:" (_^(_<_)) "_s:"(__=int(\
log((_^--_)^(_+(__=(log(__)/log(--_))-_*_)/_-_)*_^(\
-((++_-__)^(--_<__) ) ) )/log(_))) "_e:" (___-++__) "_m"
}
function round(__,_) {
return \
int((++_+_)^-_+__)
}
function _4xlog2(_) {
return (log(_)/log(_+=_^=_<_))*_*_
}
BEGIN { CONVFMT = OFMT = "%.250g"
}
( $++NF = _754(_=$!__) ) \
( $++NF = "official-expn:" \
+(_=round(_4xlog2(_=_*32^(_~"[0-9.]+[Xx]")))-13) < 11 ? "n/a" :_) |
column -s':' -t | column -t | lgp3 5
.
2 _ERR_754_INVALID_INPUT_ n/a
4 _ERR_754_INVALID_INPUT_ n/a
8 IEEE-754-fp 8-bit 1_s 2_e 5_m n/a
16 IEEE-754-fp 16-bit 1_s 5_e 10_m n/a
32 IEEE-754-fp 32-bit 1_s 8_e 23_m n/a
64 IEEE-754-fp 64-bit 1_s 11_e 52_m 11
128 IEEE-754-fp 128-bit 1_s 15_e 112_m 15
256 IEEE-754-fp 256-bit 1_s 19_e 236_m 19
512 IEEE-754-fp 512-bit 1_s 23_e 488_m 23
1024 IEEE-754-fp 1024-bit 1_s 27_e 996_m 27
2048 IEEE-754-fp 2048-bit 1_s 31_e 2016_m 31
4096 IEEE-754-fp 4096-bit 1_s 35_e 4060_m 35
8192 IEEE-754-fp 8192-bit 1_s 39_e 8152_m 39
16384 IEEE-754-fp 16384-bit 1_s 43_e 16340_m 43
32768 IEEE-754-fp 32768-bit 1_s 47_e 32720_m 47
65536 IEEE-754-fp 65536-bit 1_s 51_e 65484_m 51
131072 IEEE-754-fp 131072-bit 1_s 55_e 131016_m 55
262144 IEEE-754-fp 262144-bit 1_s 59_e 262084_m 59
524288 IEEE-754-fp 524288-bit 1_s 63_e 524224_m 63
1048576 IEEE-754-fp 1048576-bit 1_s 67_e 1048508_m 67
0.5x IEEE-754-fp 16-bit 1_s 5_e 10_m n/a
1x IEEE-754-fp 32-bit 1_s 8_e 23_m n/a
2x IEEE-754-fp 64-bit 1_s 11_e 52_m 11
4x IEEE-754-fp 128-bit 1_s 15_e 112_m 15
8x IEEE-754-fp 256-bit 1_s 19_e 236_m 19
16x IEEE-754-fp 512-bit 1_s 23_e 488_m 23
32x IEEE-754-fp 1024-bit 1_s 27_e 996_m 27
64x IEEE-754-fp 2048-bit 1_s 31_e 2016_m 31
128x IEEE-754-fp 4096-bit 1_s 35_e 4060_m 35
256x IEEE-754-fp 8192-bit 1_s 39_e 8152_m 39
===============================================
Similar to the concept mentioned above, here's an alternative formula (just re-arranging some terms) that will calculate the unsigned integer range of the exponent ([32,256,2048,32768,524288], corresponding to [5,8,11,15,19]-powers-of-2) without needing to call the round function :
uint_range = ( 64 ** ( 1 + (k = log2(bits) - 4) / 2 ) ) * ( 2 ** -( (3 - k) ** (2 < k) ) )
(a) x ** y means x-to-y-power
(b) 2 < k is a boolean condition that should just return 0 or 1.
The formula is accurate from 16-bit to 256-bit, at least. Beyond that, it yields exponent sizes of
– 512-bit : 23
– 1024-bit : 27
– 2048-bit : 31
– 4096-bit : 35
(Beyond 256 bits these may be inaccurate. Even a 27-bit-wide exponent allows exponents of +/- 67 million, and over 40 million decimal digits once you calculate 2 to that power.)
From there, getting the IEEE 754 exponent width is just a matter of taking log2(uint_range).
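For reference, here is a small C++ sketch (my own translation of the formula above, not the awk code) that evaluates uint_range and takes log2 of it:
#include <cmath>
#include <cstdio>

int main()
{
    // uint_range = 64^(1 + k/2) * 2^-((3-k)^(2<k)), with k = log2(bits) - 4;
    // the exponent field width is then log2(uint_range).
    const int widths[] = { 16, 32, 64, 128, 256, 512, 1024, 2048, 4096 };
    for (int bits : widths) {
        double k = std::log2(double(bits)) - 4.0;
        double uint_range = std::pow(64.0, 1.0 + k / 2.0)
                          * std::pow(2.0, -std::pow(3.0 - k, (2.0 < k) ? 1.0 : 0.0));
        std::printf("%5d bits -> uint_range %.0f -> %2.0f exponent bits\n",
                    bits, uint_range, std::log2(uint_range));
    }
}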

C++ Encoding Numbers

I am currently working on sending data to a receiving party based on mod96 encoding scheme. Following is the request structure to be sent from my side:
   Field             Size   Type
1. Message Type      2      "TT"
2. Firm              2      Mod-96
3. Identifier Id     1      Alpha String
4. Start Sequence    3      Mod-96
5. End Sequence      3      Mod-96
My concern is that the sequence number can be larger than what fits naively in 3 bytes. Suppose I have to send the numbers 123 and 123456 as the start and end sequence numbers; how do I encode them in Mod-96 format? I have sent the query to the receiving party, but they are yet to answer it. Can somebody please shed some light on how to go about encoding numbers in Mod-96 format?
Granted, there's a lot of missing detail about what you really need, but here's how Mod-96 encoding works:
You just use printable characters as if they were digits of a number:
when you encode in base 10 you know that 123 is 10^2*1 + 10^1*2 + 10^0*3
(oh, and note that you arbitrarily choose that the digit '1' has the value one: value('1') = 1)
when you encode the string "123" in base 96, it is
96^2*value('1') + 96^1*value('2') + 96^0*value('3')
and since '1' is the ASCII character #49, value('1') = 49-32 = 17
Encoding 3 printable characters into a number
unsigned int encode(char a, char b, char c){
    return (a-32)*96*96 + (b-32)*96 + (c-32);
}
Encoding 2 printable characters into a number
unsigned int encode(char a, char b){
    return (a-32)*96 + (b-32);
}
Decoding a number into 2 printable characters
void decode(char* a, char* b, unsigned int k){
    *b = k % 96 + 32;
    *a = k / 96 + 32;
}
Decoding a number into 3 printable characters
void decode(char* a, char* b, char* c, unsigned int k){
    *c = k % 96 + 32;
    k /= 96;
    *b = k % 96 + 32;
    *a = k / 96 + 32;
}
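As a usage sketch for the original question (reusing the 3-character decode helper above, which goes from number to characters; the exact wire layout of the request is an assumption), the sequence numbers 123 and 123456 each fit in their 3-character Mod-96 fields:
#include <cstdio>

// Number -> 3 printable characters; same logic as the decode helper above.
void decode(char* a, char* b, char* c, unsigned int k){
    *c = k % 96 + 32;
    k /= 96;
    *b = k % 96 + 32;
    *a = k / 96 + 32;
}

int main()
{
    char a, b, c;
    decode(&a, &b, &c, 123u);      // start sequence -> " !;"
    std::printf("123    -> \"%c%c%c\"\n", a, b, c);
    decode(&a, &b, &c, 123456u);   // end sequence   -> "-F "
    std::printf("123456 -> \"%c%c%c\"\n", a, b, c);
    // Each sequence number now occupies exactly the 3 bytes
    // the request structure reserves for it.
}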
You also of course need to check that characters are printable (between 32 and 127 included) and that numbers you are going to decode are less than 9216 (for 2 characters encoded) and 884736(for 3 characters encoded).
You know the final packed size would be 7 bytes:
Size 2 => max of 9215 => needs 14 bits of storage (values 9216 to 16383 unused)
Size 3 => max of 884735 => needs 20 bits of storage (values 884736 to 1048575 unused)
Your packet needs 14+20+20 bits (which is 54 => rounded up to 7 bytes) of storage just for the Mod-96 fields.
Observation:
Instead of 3 fields of sizes (2+3+3) we could have used one field of size (8) => we would finally use 53 bits (but that is still rounded up to 7 bytes).
If you store each encoded number in an integer number of bytes you use the same amount of memory (14 bits fit into 2 bytes, 20 bits fit into 3 bytes) as storing the characters directly.

Why is (char)433 equal to -79 '±' in c++?

When I cast 433 to char I get this.
How does 433 equal -79, when the ASCII codes for '4' and '3' are 52 and 51 respectively, according to this table?
The decimal number 433 is 0x1B1; it is an int, which is usually 32 bits long. What happens when you cast it to a char (which usually has 8 bits) is that all but the lowest 8 bits are thrown away, leaving you with 0xB1, which is -79 as a signed two's-complement 8-bit integer.
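A small sketch of that truncation, step by step (assuming the common case of a 32-bit int and a signed 8-bit char):
#include <iostream>

int main()
{
    int n = 433;                     // 0x000001B1
    int low8 = n & 0xFF;             // keep only the lowest 8 bits: 0xB1 = 177
    char c = static_cast<char>(n);   // same bit pattern, read as a signed 8-bit value

    std::cout << std::hex << low8 << std::dec << "\n";   // prints b1
    std::cout << static_cast<int>(c) << "\n";            // prints -79 (177 - 256), if char is signed
}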

Represent a negative number with the 2's complement technique?

I am using 2's complement to represent a negative number in binary form.
Case 1: number -5
According to the 2's complement technique:
Convert 5 to the binary form:
00000101, then flip the bits
11111010, then add 1
00000001
=> result: 11111011
To make sure this is correct, I re-calculate to decimal:
-128 + 64 + 32 + 16 + 8 + 2 + 1 = -5
Case 2: number -240
The same steps are taken:
11110000
00001111
00000001
00010000 => recalculating this I got 16, not -240
Am I misunderstanding something?
The problem is that you are trying to represent 240 with only 8 bits. The range of an 8 bit signed number is -128 to 127.
If you instead represent it with 9 bits, you'll see you get the correct answer:
011110000 (240)
100001111 (flip the bits)
+
000000001 (1)
=
100010000
=
-256 + 16 = -240
Did you forget that -240 cannot be represented with 8 bits when it is signed?
The lowest negative number you can express with 8 bits is -128, which is 10000000.
Using 2's complement:
128 = 10000000
(flip) = 01111111
(add 1) = 10000000
The lowest negative number you can express with N bits (with signed integers of course) is always - 2 ^ (N - 1).
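A short sketch of the same point using fixed-width integer types (the 0xFF mask mirrors the 8-bit wrap-around in the question):
#include <cstdint>
#include <iostream>

int main()
{
    // -240 fits comfortably in a signed 16-bit type (-32768 .. 32767)...
    int16_t wide = -240;

    // ...but forcing it into 8 bits keeps only the low byte:
    // -240 is ...1111111100010000 in two's complement, so the low byte is 0x10 = 16.
    int8_t narrow = static_cast<int8_t>(wide & 0xFF);

    std::cout << wide << " " << static_cast<int>(narrow) << "\n";   // prints -240 16
}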

IEEE 754 Float debug - from memory little endian to the actual floating number

I am testing IEEE 754 floating format with VS2008 using the example below:
int main(int argc, char *argv[])
{
    float i = 0.15625;
}
I put &i into the VS2008 watch window; I see the address is 0x0012FF60 and the content at that address is 00 00 20 3e in the Memory debug window, see below:
0x0012FF60 00 00 20 3e cc cc cc cc
BTW I have basic knowledge of the IEEE 754 floating-point format and I know it consists of three fields: sign bit, exponent, and fraction. The fraction is the significand without its most significant bit.
But how exactly do I calculate 0.15625 from the little-endian bytes 00 00 20 3e?
Many thanks
Memory layout of a 32-bit float (see http://en.wikipedia.org/wiki/Single_precision) on a big-endian machine.
A little-endian machine (e.g. x86) simply stores the four bytes in reverse order; the 'cc' bytes are unused memory beyond the float, shown because the debugger displays 8 bytes even though the 32-bit float only occupies 4.
edit:
Remember the exponent field is biased (excess-127, i.e. subtract 127), and since 0.15625 is less than 1 the unbiased exponent is negative.
value = sign * 2^exp * mantissa.
0x3e = 0011 1110
0x20 = 0010 0000
Because the sign bit comes first we have to shift these along by one, so
exponent = 0111 1100 = 124, and 124 - 127 = -3
mantissa = 0100 0000 = 1 + 0.25 (the leading 1 before the binary point is implicit)
i.e. 0.15625 = +1 * 2^(-3) * 1.25
The debugger is showing you more than the float itself. We only need 32 bits, which are:
00 00 20 3E
Your variable in binary:
00000000 00000000 00100000 00111110
Logical value accounting for little endian:
00111110 00100000 00000000 00000000
According to IEEE:
0 01111100 01000000000000000000000
S   E (subtract 127 to get the exponent)   M (the fraction; prepend the implicit leading 1)
So now it's clear:
the sign is +1 (S = 0)
the exponent is 124 - 127 = -3
the mantissa is 1.01b, which is 5/4
So the value is 5/4 / 8 = 5/32 = 0.15625.
Your value is hex 0x3E200000 or
0011 1110 0010 0000 0000 0000 0000 0000
or rearranged:
s ----e--- ----------m------------
0 01111100 01000000000000000000000
sign_bit = 0 (i.e. positive)
exponent = 0x7C = 124 ---> subtract 127 to get -3
significand = 1 + 0.0100... = 1.0100... = 1*2^0 + 0*2^-1 + 1*2^-2 = 1.25
significand * 2^exponent = 1.25 * 2^-3 = 1.25 * 0.125 = 0.15625
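A short, self-contained sketch of the same reassembly (assuming a little-endian host like the x86 machine in the question):
#include <cstdint>
#include <cstring>
#include <iostream>

int main()
{
    // The four bytes as they appear in the memory window, lowest address first.
    unsigned char bytes[4] = { 0x00, 0x00, 0x20, 0x3E };

    // On a little-endian host these bytes already are the float 0x3E200000.
    float f;
    std::memcpy(&f, bytes, sizeof f);
    std::cout << f << "\n";                              // prints 0.15625

    // Pull the fields apart the same way as above.
    uint32_t u;
    std::memcpy(&u, bytes, sizeof u);                    // u == 0x3E200000
    unsigned sign     = u >> 31;                         // 0
    int      exponent = int((u >> 23) & 0xFF) - 127;     // 124 - 127 = -3
    uint32_t fraction = u & 0x007FFFFF;                  // 0x200000
    double significand = 1.0 + fraction / double(1 << 23);   // 1.25
    std::cout << sign << " " << exponent << " " << significand << "\n";
}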
The basic format of an IEEE single-precision floating point is based on a four byte
value, and it is simpler to analyse if you display it as such. In that
case, the top bit is the sign, the next 8 bits are the exponent (in excess
127), and the rest the mantissa. The simplest way to explain it is
probably to show the C++ code which would access the separate fields:
float f;
// ...
uint32_t const* p = reinterpret_cast<uint32_t const*>( &f );
bool isNegative = (*p & 0x80000000) != 0;
int exp = ((*p & 0x7F800000) >> 23) - 127;
int mantissa = (*p & 0x007FFFFF) | 0x00800000;
The mantissa has an implicit binary point just below its top (24th)
bit, so the significand's value is mantissa / 2^23 (but I don't know how to represent that as an integer :-)).
If all you have is a sequence of bytes, you have to assemble them,
according to the byte order, and then apply the above.
Edited: the constant values have been corrected, following up on Rudy Velthuis' pointing out my error.