I'm looking into parsing terminfo database files, which are a type of binary file. You can read about the storage format on your own and confirm the problem I'm facing.
The manual says -
The header section begins the file. This section contains
six short integers in the format described below. These
integers are
(1) the magic number (octal 0432);
...
...
Short integers are stored in two 8-bit bytes. The first
byte contains the least significant 8 bits of the value,
and the second byte contains the most significant 8 bits.
(Thus, the value represented is 256*second+first.) The
value -1 is represented by the two bytes 0377, 0377; other
negative values are illegal. This value generally means
that the corresponding capability is missing from this
terminal. Machines where this does not correspond to the
hardware must read the integers as two bytes and compute
the little-endian value.
The first problem while parsing this type of input is that it fixes the size to 8 bits, so plain old char cannot be used, since it doesn't guarantee the size to be exactly 8 bits. So I was looking at the fixed-width integer types, but was again faced with the dilemma of choosing between int8_t and uint8_t, which clearly state: "provided only if the implementation directly supports the type". So what should I choose so that the type is portable enough?
The second problem is that there is no buffer.readInt16LE() method in the C++ standard library that would read 16 bits of data in little-endian format. So how should I proceed to implement this function, again in a portable and safe way?
I've already tried reading it with the char data type, but it definitely produces garbage on my machine. The proper contents can be viewed with the infocmp command, e.g. $ infocmp xterm.
#include <fstream>
#include <iostream>
#include <vector>

int main()
{
    // Open at the end (std::ios::ate) so tellg() reports the file size.
    std::ifstream db(
        "/usr/share/terminfo/g/gnome", std::ios::binary | std::ios::ate);
    std::vector<unsigned char> buffer;
    if (db) {
        auto size = db.tellg();
        buffer.resize(size);
        db.seekg(0, std::ios::beg);
        db.read(reinterpret_cast<char*>(buffer.data()), size);
    }
    std::cout << "\n";
}
Printing the buffer in gdb gives -
$1 = std::vector of length 3069, capacity 3069 = {26 '\032', 1 '\001', 21 '\025',
0 '\000', 38 '&', 0 '\000', 16 '\020', 0 '\000', 157 '\235', 1 '\001',
193 '\301', 4 '\004', 103 'g', 110 'n', 111 'o', 109 'm', 101 'e', 124 '|',
71 'G', 78 'N', 79 'O', 77 'M', 69 'E', 32 ' ', 84 'T', 101 'e', 114 'r',
109 'm', 105 'i', 110 'n', 97 'a', 108 'l', 0 '\000', 0 '\000', 1 '\001',
0 '\000', 0 '\000', 1 '\001', 0 '\000', 0 '\000', 0 '\000', 0 '\000',
0 '\000', 0 '\000', 0 '\000', 0 '\000', 1 '\001', 1 '\001', 0 '\000',
....
....
The first problem while parsing this type of input is that it fixes the size to 8 bits, so plain old char cannot be used, since it doesn't guarantee the size to be exactly 8 bits.
Any integer type that is at least 8 bits wide is OK. While char isn't guaranteed to be exactly 8 bits, it is required to be at least 8 bits, so as far as size is concerned there is no problem, other than that you may in some cases need to mask the high bits if they exist. However, char might not be unsigned, and you don't want the octets to be interpreted as signed values, so use unsigned char instead.
The second problem is that there is no buffer.readInt16LE() method in the C++ standard library that would read 16 bits of data in little-endian format. So how should I proceed to implement this function, again in a portable and safe way?
Read one octet at a time into an unsigned char. Assign the first octet to a variable that is large enough to represent at least 16 bits. Shift the bits of the second octet left by 8 and combine them into the variable with the compound bitwise-or operator.
Or better yet, don't re-implement it; use an existing third-party library.
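For illustration, here is a minimal sketch of such a function (the name readInt16LE is borrowed from the question, not from any real library). Because it assembles the value arithmetically from two octets instead of reinterpreting raw memory, it works regardless of the host's byte order:

#include <cstdint>
#include <istream>

// Read two octets and combine them into a 16-bit little-endian value.
std::int16_t readInt16LE(std::istream& in)
{
    unsigned char lo = 0, hi = 0;
    in.read(reinterpret_cast<char*>(&lo), 1);  // least significant byte first
    in.read(reinterpret_cast<char*>(&hi), 1);  // most significant byte second
    std::uint16_t value = static_cast<std::uint16_t>(lo)
                        | static_cast<std::uint16_t>(hi) << 8;
    // 0377, 0377 yields 0xFFFF, which converts to -1 on two's-complement
    // targets (guaranteed since C++20; implementation-defined before that).
    return static_cast<std::int16_t>(value);
}

You would still want to check the stream state after each read; that is omitted here for brevity.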
I've already tried reading it with the char data type, but it definitely produces garbage on my machine.
Then your attempt was buggy. There is no problem inherent with char that would cause garbage output. I recommend using a debugger to solve this problem.
Related
Many compilers seem to be keeping only 0 or 1 in bool values, but I'm not sure this will always work:
int a = 2;
bool b = a;
int c = 3 + b; // 4 or 5?
Yes:
In C++ (§4.5/4):
An rvalue of type bool can be converted to an rvalue of type int, with false becoming zero and true becoming one.
In C, when a value is converted to _Bool, it becomes 0 or 1 (§6.3.1.2/1):
When any scalar value is converted to _Bool, the result is 0 if the value compares equal to 0; otherwise, the result is 1.
When converting to int, it's pretty straightforward. int can hold 0 and 1, so there's no change in value (§6.3.1.3).
Well, not always...
#include <iostream>

int main()
{
    const int n = 100;
    bool b[n];  // deliberately uninitialized: reading these values is undefined
    for (int i = 0; i < n; ++i)
    {
        int x = b[i];
        if (x & ~1)  // print any value that is neither 0 nor 1
        {
            std::cout << x << ' ';
        }
    }
}
Output on my system:
28 255 34 148 92 192 119 46 165 192 119 232 26 195 119 44 255 34 96 157 192 119
8 47 78 192 119 41 78 192 119 8 250 64 2 194 205 146 124 192 73 64 4 255 34 56 2
55 34 224 255 34 148 92 192 119 80 40 190 119 255 255 255 255 41 78 192 119 66 7
8 192 119 192 73 64 240 255 34 25 74 64 192 73 64
The reason for this apparently weird output is laid out in the standard, 3.9.1 §6:
Values of type bool are either true or false. Using a bool value in ways described by this International Standard as "undefined", such as by examining the value of an uninitialized automatic object, might cause it to behave as if it is neither true nor false.
Is C/C++ .......
There's no language named C/C++.
bool type always guaranteed to be 0 or 1 when typecast'ed to int?
In C++, yes, because section §4.5/4 says
An rvalue of type bool can be converted to an rvalue of type int, with false becoming zero and true becoming one.
int c = 3 + b; // 4 or 5?
The value of c will be 4.
One more example of what can happen when you step outside the safe boat:
bool b = false;
*(reinterpret_cast<char*>(&b)) = 0xFF;  // force an invalid bit pattern into b
int from_bool = b;
cout << from_bool << " is " << (b ? "true" : "false");
Output (g++ (GCC) 4.4.7):
255 is true
To add to FredOverflow's example:
There is no bool type in C prior to C99 (such as C90); however, the bool type in C99/C++ is always guaranteed to be 0 or 1.
In C, all boolean operations are guaranteed to return either 0 or 1, whether the bool type is defined or not.
So a && b, !a, and a || b will always return 0 or 1 in C or C++, regardless of the types of a and b.
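A quick sketch of that guarantee (the values chosen are arbitrary):

#include <iostream>

int main()
{
    int a = 2, b = 3;
    std::cout << (a && b) << '\n';  // 1: logical AND yields exactly 0 or 1
    std::cout << !a << '\n';        // 0: logical NOT of a nonzero value
    std::cout << (a || b) << '\n';  // 1: logical OR, again exactly 0 or 1
}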
Types with padding bits may behave strangely if the padding bits don't hold the values expected for the type. Most C89 implementations didn't use padding bits with any of their integer types, but C99 requires that implementations define such a type: _Bool. Reading a _Bool when all of its bits are zero will yield zero. Writing any non-zero value to a _Bool will set its bits to some pattern which will yield 1 when read. Writing zero will set the bits to a pattern (which may or may not be all-bits-zero) which will yield 0 when read.
Unless specified otherwise in an implementation's documentation, any bit pattern other than all-bits-zero which could not have been produced by storing a zero or non-zero value to a _Bool is a trap representation; the Standard says nothing about what will happen if an attempt is made to read such a value. Given, e.g.
union boolChar { _Bool b; unsigned char c; } bc;
storing zero to bc.c and reading bc.b will yield zero. Storing zero or one to bc.b will set bc.c to values which, if written back, will cause bc.b to hold zero or one. Storing any other value to bc.c and then reading bc.b is undefined behavior.
I am trying to receive some data over the network using UDP and parse it.
Here is the code:
char recvline[1024];
int n = recvfrom(sockfd, recvline, 1024, 0, NULL, NULL);
for (int i = 0; i < n; i++)
    cout << hex << static_cast<short int>(recvline[i]) << " ";
It printed the output:
19 ffb0 0 0 ff88 d 38 19 48 38 0 0 2 1 3 1 ff8f ff82 5 40 20 16 6 6 22 36 6 2c 0 0 0 0 0 0 0 0
But I am expecting output like:
19 b0 0 0 88 d 38 19 48 38 0 0 2 1 3 1 8f 82 5 40 20 16 6 6 22 36 6 2c 0 0 0 0 0 0 0 0
The ff shouldn't be there in the printed output.
Actually I have to parse this data character by character, like:
parseCommand(recvline);
and the parse code looks,
void parseCommand(char *msg) {
    int commId = *(msg + 1);
    switch (commId) {
    case 0xb0: // do some operation
        break;
    case 0x20: // do another operation
        break;
    }
}
And while debugging I am getting commId = -80 in the watch window.
Note:
On Linux I am getting correct output with the following code; note that I have used unsigned char instead of char for the read buffer.
unsigned char recvline[1024];
int n = recvfrom(sockfd, recvline, 1024, 0, NULL, NULL);
Whereas on Windows recvfrom() does not allow the second argument to be unsigned; it gives a build error, so I chose char.
Looks like you might be getting the correct values, but your cast to short int during printing sign-extends your char value, causing ff to be propagated to the top byte if the top bit of your char is 1 (i.e. it is negative). You should first cast it to an unsigned type and then extend to a wider integer, so you need two casts:
cout << hex << static_cast<short int>(static_cast<uint8_t>(recvline[i])) << " ";
I have tested this and it behaves as expected.
In response to your extension: the data read is fine; it is a matter of how you interpret it. To parse it correctly you should do:
uint8_t commId = static_cast<uint8_t>(*(msg + 1));
switch (commId) {
case 0xb0: // do some operation
    break;
case 0x20: // do another operation
    break;
}
As you store your data in a signed data type, conversion/promotion to bigger data types will first sign-extend the value (filling the high-order bits with the value of the MSB), even if it then gets converted to an unsigned data type.
One solution is to define recvline as uint8_t[] in the first place and cast it to char* when passing it to the recvfrom function. That way you only have to cast once, and you use the same code in your Windows and Linux versions. Also, uint8_t[] is (at least to me) a clear indication that you are using the array as raw memory instead of a string of some kind.
Another possibility is to simply perform a bitwise AND: (recvline[i] & 0xff). Thanks to automatic integral promotion, this doesn't even require a cast.
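A small self-contained demonstration of the sign extension and of the mask fix (0xB0 merely stands in for one of the received bytes):

#include <iostream>

int main()
{
    char c = static_cast<char>(0xB0);  // negative where plain char is signed

    // Widening directly sign-extends: prints ffffffb0 on typical platforms.
    std::cout << std::hex << static_cast<int>(c) << '\n';

    // Masking after integral promotion keeps only the low byte: prints b0.
    std::cout << std::hex << (c & 0xff) << '\n';
}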
Personal Note:
It is really annoying that the C and C++ standards don't provide a separate type for raw memory (yet), but with any luck we'll get a byte type in a future standard revision.
I am currently working on sending data to a receiving party using a mod-96 encoding scheme. Following is the request structure to be sent from my side:
   Field            Size  Type
1. Message Type     2     "TT"
2. Firm             2     Mod-96
3. Identifier Id    1     Alpha String
4. Start Sequence   3     Mod-96
5. End Sequence     3     Mod-96
My doubt is that the sequence number can be greater than 3 bytes. Suppose I have to send the numbers 123 and 123456 as the start and end sequence numbers; how do I encode them in mod-96 format? I have sent the query to the receiving party, but they are yet to answer it. Can somebody please throw some light on how to go about encoding numbers in mod-96 format?
Granted, there is a lot of missing detail about what you really need, but here's how Mod-96 encoding works:
You just use printable characters as if they were digits of a number:
when you encode in base 10, you know that 123 is 10^2*1 + 10^1*2 + 10^0*3
(oh, and note that it is an arbitrary choice that the value of '1' is really one: value('1') = 1)
when you encode in base 96, you know that "123" is
96^2*value('1') + 96^1*value('2') + 96^0*value('3')
since '1' is ASCII character #49, value('1') = 49-32 = 17
Encoding 3 printable characters into a number
unsigned int encode(char a, char b, char c) {
    return (a-32)*96*96 + (b-32)*96 + (c-32);
}
Encoding 2 printable characters into a number
unsigned int encode(char a, char b) {
    return (a-32)*96 + (b-32);
}
Decoding a number into 2 printable characters
void decode(char* a, char* b, unsigned int k) {
    *b = k % 96 + 32;
    *a = k / 96 + 32;
}
Decoding a number into 3 printable characters
void decode(char* a, char* b, char* c, unsigned int k) {
    *c = k % 96 + 32;
    k /= 96;
    *b = k % 96 + 32;
    *a = k / 96 + 32;
}
You also, of course, need to check that the characters are printable (between 32 and 127 inclusive) and that the numbers you are going to decode are less than 9216 (for 2 encoded characters) or 884736 (for 3 encoded characters).
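A quick round-trip check of the helpers above (the three-character versions, with an arbitrary input):

#include <cassert>
#include <iostream>

unsigned int encode(char a, char b, char c) {
    return (a-32)*96*96 + (b-32)*96 + (c-32);
}

void decode(char* a, char* b, char* c, unsigned int k) {
    *c = k % 96 + 32;
    k /= 96;
    *b = k % 96 + 32;
    *a = k / 96 + 32;
}

int main() {
    unsigned int k = encode('1', '2', '3');   // 17*96*96 + 18*96 + 19
    std::cout << k << '\n';                   // prints 158419
    char a, b, c;
    decode(&a, &b, &c, k);
    assert(a == '1' && b == '2' && c == '3'); // round trip recovers the characters
}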
You can work out the packed size:
Size 2 => max of 9215 => needs 14 bits of storage (values 9216 to 16383 unused)
Size 3 => max of 884735 => needs 20 bits of storage (values 884736 to 1048575 unused)
Your packet needs 14+20+20 bits of memory (which is 54, rounded up to 7 bytes) just for the Mod-96 stuff.
Observation:
Instead of 3 fields of sizes (2+3+3) we could have used one field of size (8) => we would use 53 bits (still rounded up to 7 bytes).
If you instead store each encoded number in a whole number of bytes (14 bits fit into 2 bytes, 20 bits fit into 3 bytes), you use the same amount of memory as storing the characters directly.
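A quick way to verify those bit counts (a sketch; bitsNeeded is a hypothetical helper, not part of any library):

#include <iostream>

// Number of bits needed to represent values 0..max.
int bitsNeeded(unsigned long long max) {
    int bits = 0;
    while (max > 0) { ++bits; max >>= 1; }
    return bits;
}

int main() {
    std::cout << bitsNeeded(9215) << '\n';    // 14 (two Mod-96 characters)
    std::cout << bitsNeeded(884735) << '\n';  // 20 (three Mod-96 characters)
}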
When I typecast 433 to char, I get -79.
How does 433 equal -79, when the ASCII codes for '4' and '3' are 52 and 51 respectively, according to this table?
The decimal number 433 is 0x1B1; it is an int, which is usually 32 bits long. When you cast it to a char (which usually has 8 bits), all but the lowest 8 bits are thrown away, leaving you with 0xB1, which is -79 as a signed two's-complement 8-bit integer.
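A short demonstration of the truncation (the printed value assumes the common signed 8-bit char):

#include <iostream>

int main()
{
    int n = 433;                       // 0x1B1
    char c = static_cast<char>(n);     // keeps only the low byte, 0xB1
    std::cout << static_cast<int>(c);  // prints -79: 0xB1 read as signed 8-bit
}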
Is there a way to stream a number into an unsigned char?
istringstream bytes( "13 14 15 16 17 18 19 20" );
unsigned char myChars[8];
for( int i = 0; i < 8 && !bytes.eof(); i++ )
{
bytes >> myChars[i];
cout << unsigned( myChars[i] ) << endl;
}
This code currently outputs the ASCII values of the first 8 non-space characters:
49 51 49 52 49 53 49 54
But what I want is the numerical values of each token:
13
14
15
16
17
18
19
20
You are reading one char at a time, which means you get '1', '3', skip the space, '1', '4', skip the space, etc.
To read the values as numbers, you need to use an integer type as a temporary:
unsigned short s;
bytes >> s;
myChars[i] = s;
Now the stream will read an integer value, e.g. 13 or 14, and store it in s. Then myChars[i] = s; converts it to an unsigned char.
There is a lot of error checking that you'll bypass if you do this without a temporary. For example: does each number fit in a byte, and are there more numbers in bytes than elements in myChars? But presuming you've already dealt with those, you can just use an istream_iterator<unsigned short>:
copy(istream_iterator<unsigned short>{ bytes }, istream_iterator<unsigned short>{}, begin(myChars))
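For reference, a self-contained sketch of that approach (stream contents and array size are taken from the question; the error checks discussed above are still omitted):

#include <algorithm>
#include <iostream>
#include <iterator>
#include <sstream>

int main()
{
    std::istringstream bytes("13 14 15 16 17 18 19 20");
    unsigned char myChars[8];

    // Parse each whitespace-separated token as a number,
    // then narrow it into the byte array.
    std::copy(std::istream_iterator<unsigned short>{bytes},
              std::istream_iterator<unsigned short>{},
              std::begin(myChars));

    for (unsigned char ch : myChars)
        std::cout << unsigned(ch) << '\n';  // prints 13 through 20
}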
An additional note here: a char[] typically contains a null-terminated string. Presuming that's not what you intend, it would be gracious of you to indicate to the reader that's not how you're using it. In C++11 you were given int8_t/uint8_t to do just that. Using something like uint8_t myChars[8] would make your code more readable.