Splitting a char array into a sequence of ints and floats - C++

I'm writing a program in C++ that listens to a stream of TCP messages from another program that provides tracking data from a webcam. I have the socket connected and I'm receiving all the information, but I'm having difficulty splitting it up into the data I want.
Here's the format of the data coming in:
8-byte header:
    4-character string,
    integer
32-byte message:
    integer,
    float,
    float,
    float,
    float,
    float
This is all being stored in a char array called buffer. I need to be able to parse the different bytes out into the primitives I need. I have tried making smaller sub-arrays, such as headerString, filled by looping through and copying the first 4 elements of the buffer array, and I do get the correct header ('CCV ') printed out. But when I try the same thing with the next four elements (to get the integer) and try to print it out, I get weird ASCII characters printed instead. I've tried converting the headerInt array to an integer with the atoi method from stdlib.h, but it always prints out zero.
I've already done this in Python using the excellent unpack method; is there any alternative in C++?
Any help greatly appreciated,
Jordan
Links
CCV packet structure
Python unpack method

The buffer only contains the raw image of what you read over the
network. You'll have to convert the bytes in the buffer to whatever
format you want. The string is easy:
std::string s(buffer + sOffset, 4);
(Assuming, of course, that the internal character encoding is the same
as in the file—probably an extension of ASCII.)
The others are more complicated, and depend on the format of the
external data. From the description of the header, I gather that the
integers are four bytes, but that still doesn't tell me anything about
their representation. Depending on the case, either:
// Big-endian (network byte order): most significant byte first
int getInt(unsigned char* buffer, int offset)
{
    return (buffer[offset]     << 24)
         | (buffer[offset + 1] << 16)
         | (buffer[offset + 2] <<  8)
         | (buffer[offset + 3]      );
}
or
// Little-endian: least significant byte first
int getInt(unsigned char* buffer, int offset)
{
    return (buffer[offset + 3] << 24)
         | (buffer[offset + 2] << 16)
         | (buffer[offset + 1] <<  8)
         | (buffer[offset]          );
}
will probably do the trick. (Other four byte representations of
integers are possible, but they are exceedingly rare. Similarly, the
conversion of the unsigned results of the shifts and ORs into an int
is implementation defined, but in practice, the above will work almost
everywhere.)
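For illustration, here is a minimal, self-contained sketch of how the big-endian variant might be used on the 8-byte header described in the question; the sample bytes are made up:
#include <iostream>
#include <string>

// The big-endian variant from above, repeated so the example stands alone.
int getInt(unsigned char* buffer, int offset)
{
    return (buffer[offset]     << 24)
         | (buffer[offset + 1] << 16)
         | (buffer[offset + 2] <<  8)
         | (buffer[offset + 3]      );
}

int main()
{
    // Hypothetical 8-byte header: "CCV " followed by the integer 5, big-endian.
    unsigned char buffer[8] = { 'C', 'C', 'V', ' ', 0x00, 0x00, 0x00, 0x05 };

    std::string tag(reinterpret_cast<char*>(buffer), 4);  // "CCV "
    int headerInt = getInt(buffer, 4);                     // 5

    std::cout << tag << headerInt << "\n";
}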
The only hint you give concerning the representation of the floats is in
the message format: 32 bytes, minus a 4-byte integer, leaves 28 bytes for
5 floats; but 28 isn't evenly divisible by 5, so I cannot even guess at the
length of the floats (except that there must be some padding in there
somewhere). But converting floating point can be more or less
complicated if the external format isn't exactly like the internal
format.
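If the floats turn out to be 4-byte IEEE 754 values in the same byte order as the integers (an assumption, since the question doesn't say), a common sketch is to assemble the four bytes into a uint32_t and copy the bit pattern into a float:
#include <cstdint>
#include <cstring>

// Sketch only: assumes a 4-byte IEEE 754 float stored big-endian on the wire,
// and that the local float is also IEEE 754.
float getFloat(const unsigned char* buffer, int offset)
{
    std::uint32_t bits = (std::uint32_t(buffer[offset])     << 24)
                       | (std::uint32_t(buffer[offset + 1]) << 16)
                       | (std::uint32_t(buffer[offset + 2]) <<  8)
                       |  std::uint32_t(buffer[offset + 3]);
    float result;
    std::memcpy(&result, &bits, sizeof result);  // reinterpret the bit pattern
    return result;
}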

Something like this may work:
struct Header {
    char  string[4];
    int   integers[2];
    float floats[5];
};
Header* header = (Header*)buffer;
You should check that sizeof(Header) == 32.
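Casting the buffer pointer like this can also run into alignment and strict-aliasing problems; a slightly safer sketch of the same idea, still assuming the struct layout matches the wire format byte for byte, is to memcpy the bytes into the struct:
#include <cstring>

// Sketch: copy the raw bytes into the Header struct defined above rather than
// casting the pointer. This still assumes no padding and matching byte order.
Header readHeader(const char* buffer)
{
    Header header;
    std::memcpy(&header, buffer, sizeof header);
    return header;
}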

Related

Bits to type from buffer

I have a file which contains the buffer value. The first 16 bits contain the type. The next 32 bits give the length of the data. The remaining bits are the data.
How can I find the type from the 16 bits (find whether it is int or char...)?
I'm super stuck in my thought process here. I am not able to find a way to convert bits to types.
Say you have the homework assignment:
You are given a file where the first bit encodes the type, the
next 7 bits encode the length, and the rest is the data.
The types are encoded in the following way:
0 is for int
1 is for char
Print the ints or chars separated by newlines.
You just use the given information! Since 1 bit is used to encode the type there are two possible types. So you just read the first bit, then do:
if (bit == 0) {
    int *i = ...
}
else if (bit == 1) {
    char *c = ...
}
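A slightly fuller sketch of that idea, using the 1-bit type and 7-bit length from the homework statement (the layout of the int payload is assumed, e.g. 4 bytes per value, big-endian):
#include <cstdio>

// Sketch only: buffer[0] packs the fields described above
// (1 type bit, 7 length bits); the payload follows immediately after.
void decode(const unsigned char* buffer)
{
    int type   = (buffer[0] >> 7) & 0x01;  // top bit: 0 = int, 1 = char
    int length = buffer[0] & 0x7F;         // remaining 7 bits: payload length in bytes
    const unsigned char* data = buffer + 1;

    if (type == 0) {
        // Interpret the payload as 4-byte big-endian ints (assumed layout).
        for (int i = 0; i + 4 <= length; i += 4) {
            int value = (data[i] << 24) | (data[i + 1] << 16)
                      | (data[i + 2] << 8) | data[i + 3];
            std::printf("%d\n", value);
        }
    } else {
        // Interpret the payload as chars.
        for (int i = 0; i < length; ++i) {
            std::printf("%c\n", data[i]);
        }
    }
}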

How to understand the MNIST binary converter in C++?

I recently needed to convert the MNIST data set to images and labels; it is binary, and the structure is described in the previous link. So I did a little research, and since I'm a fan of C++, I read up on binary I/O in C++ and then found this link on Stack Overflow. That link works well, but there are no code comments and no explanation of the algorithm, so I got confused, and that raised some questions in my mind which I need a professional C++ programmer to answer.
1- What is the algorithm to convert the data set in C++ with the help of ifstream?
I've worked out how to read a file as binary with file.read and move to the next record, but in C we would define a struct and move it over the file, and I can't see any struct in the C++ program, for example to read this:
[offset] [type] [value] [description]
0000 32 bit integer 0x00000803(2051) magic number
0004 32 bit integer 60000 number of images
0008 32 bit integer 28 number of rows
0012 32 bit integer 28 number of columns
0016 unsigned byte ?? pixel
How can we go to a specific offset, for example 0004, read a 32-bit integer, and put it into an integer variable?
2- What is the function reverseInt doing? (It is obviously not simply reversing an integer.)
int ReverseInt(int i)
{
    unsigned char ch1, ch2, ch3, ch4;
    ch1 = i & 255;          // lowest byte
    ch2 = (i >> 8) & 255;
    ch3 = (i >> 16) & 255;
    ch4 = (i >> 24) & 255;  // highest byte
    // Reassemble the bytes in the opposite order.
    return ((int)ch1 << 24) + ((int)ch2 << 16) + ((int)ch3 << 8) + ch4;
}
I did a little debugging with cout, and when it received, for example, 270991360 it returned 10000, and I cannot see any relation between the two. I understand that it ANDs the shifted number with 255, but why?
PS:
1- I already have the converted MNIST images, but I want to understand the algorithm.
2- I've already unzipped the gz files, so the file is pure binary.
1- What is the algorithm to convert the data set in C++ with the help of ifstream?
This function reads a file (t10k-images-idx3-ubyte.gz) as follows:
Read the magic number and adjust endianness
Read the number of images and adjust endianness
Read the number of rows and adjust endianness
Read the number of columns and adjust endianness
Read all of the images x rows x columns pixel bytes (but discard them).
The function uses a plain int and always switches endianness; that means it targets a very specific architecture and is not portable.
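Put together, a minimal sketch of that header-reading logic might look like the following; it assumes the unzipped t10k-images-idx3-ubyte file and the 4-byte big-endian integers of the IDX format:
#include <cstdint>
#include <fstream>
#include <iostream>

// Read a 32-bit big-endian integer from the stream (MNIST stores integers MSB first).
std::int32_t readBigEndianInt32(std::ifstream& file)
{
    unsigned char b[4];
    file.read(reinterpret_cast<char*>(b), 4);
    std::uint32_t v = (std::uint32_t(b[0]) << 24) | (std::uint32_t(b[1]) << 16)
                    | (std::uint32_t(b[2]) <<  8) |  std::uint32_t(b[3]);
    return static_cast<std::int32_t>(v);
}

int main()
{
    std::ifstream file("t10k-images-idx3-ubyte", std::ios::binary);  // assumed path
    if (!file) return 1;

    std::int32_t magic  = readBigEndianInt32(file);  // should be 2051 (0x00000803)
    std::int32_t images = readBigEndianInt32(file);
    std::int32_t rows   = readBigEndianInt32(file);
    std::int32_t cols   = readBigEndianInt32(file);

    std::cout << magic << " " << images << " " << rows << " " << cols << "\n";
    // Each image then follows as rows * cols unsigned bytes, one per pixel.
}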
How can we go to a specific offset, for example 0004, read a 32-bit integer, and put it into an integer variable?
ifstream provides a function to seek to a given position:
file.seekg( posInBytes, std::ios_base::beg);
At the given position, you could read the 32-bit integer:
int32_t val;
file.read ((char*)&val,sizeof(int32_t));
2- What is the function reverseInt doing?
This function reverses the order of the bytes of an int value:
Considering a 32-bit integer like aaaaaaaabbbbbbbbccccccccdddddddd, it returns the integer ddddddddccccccccbbbbbbbbaaaaaaaa.
This is useful for normalizing endianness; however, it is probably not very portable, as int might not be 32-bit (it could be, e.g., 16-bit or 64-bit).
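A sketch of a more portable variant would fix the width explicitly with a fixed-size type instead of int:
#include <cstdint>

// Byte-swap a 32-bit value regardless of how wide int happens to be locally.
std::uint32_t reverseBytes32(std::uint32_t v)
{
    return ((v & 0x000000FFu) << 24)
         | ((v & 0x0000FF00u) <<  8)
         | ((v & 0x00FF0000u) >>  8)
         | ((v & 0xFF000000u) >> 24);
}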

Reading consecutive bytes as one integer

I am new here, and would like to ask this question.
I am working with a binary file in which each byte, multiple bytes, or even parts of a byte can have a different meaning.
What I have been trying so far is to read a number of bytes (4 in my example) as one block.
I have them in hexadecimal representation, like: 00 1D FB C8.
Using the following code, I read them separately:
for (int j = 36; j < 40; j++)
{
    cout << dec << (bitset<8>(fileBuf[j])).to_ulong();
}
where j is the position of the byte in the file. The previous code gives me 029251200, which is wrong. What I want is to read the 4 bytes at once and get the answer 1965000.
I appreciate any help.
Thank you.
DWORD final = (fileBuf[j] << 24) + (fileBuf[j+1] << 16) + (fileBuf[j+2] << 8) + (fileBuf[j+3]);
It also depends on which byte order you want (ABCD / DCBA / CDAB).
EDIT (can't reply due to low rep, just joined today)
I tried to extend the bitset, however it gave the value of the first byte only
It will not work because fileBuf is essentially a byte array; extending from 8-bit to 32-bit (int) won't make any difference, because each element is still an 8-bit byte. You have to mathematically combine the 4 array elements back into the original integer representation; see the code above this edit.
The answer isn't "wrong"; this is a logic error. You're not storing the values and accumulating the computation.
C8 is 200 in decimal form, so you're not appending the value to the original subset.
The answer it spat out was in fact what you programmed it to do.
You need to either extend the bitset to a larger size so it can hold the other hex values, or provide some other means of combining them for output.
Keeping the format of the function from the question, you could do:
//little-endian
{
int i = (fileBuf[j]<<0) | (fileBuf[j+1]<<8) | (fileBuf[j+2]<<16) | (fileBuf[j+3]<<24);
cout << dec << i;
}
// big-endian
{
int i = (fileBuf[j+3]<<0) | (fileBuf[j+2]<<8) | (fileBuf[j+1]<<16) | (fileBuf[j]<<24);
cout << dec << i;
}

How can I store 2 numbers in a 1-byte char?

My question is the one in the title; if that's not possible, how could I get away with using only 4 bits to represent an integer?
EDIT: really my question is how. I am aware that there are 1-byte data types in a language like C, but how could I use something like a char to store two integers?
In C or C++ you can use a struct to allocate the required number of bits to a variable, as shown below:
#include <stdio.h>

struct packed {
    unsigned char a:4, b:4;   // two 4-bit bit-fields packed into one byte
};

int main() {
    struct packed p;
    p.a = 10;
    p.b = 20;                 // 20 does not fit in 4 bits, so it is truncated
    printf("p.a %d p.b %d size %zu\n", p.a, p.b, sizeof(struct packed));
    return 0;
}
The output is p.a 10 p.b 4 size 1, showing that p takes only 1 byte to store, and that numbers with more than 4 bits (larger than 15) get truncated, so 20 (0x14) becomes 4. This is simpler to use than the manual bitshifting and masking used in the other answer, but it is probably not any faster.
You can store two 4-bit numbers in one byte (call it b, an unsigned char).
Using hex it is easy to see: in b = 0xAE the two numbers are A and E.
Use a mask to isolate them:
a = (b & 0xF0) >> 4
and
e = b & 0x0F
You can easily define functions to set/get both numbers in the proper portion of the byte.
Note: if the 4-bit numbers need to have a sign, things can become a tad more complicated since the sign must be extended correctly when packing/unpacking.
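For example, a small set of helpers along those lines might look like this (the names are just illustrative; the signed variant shows one way the sign extension mentioned above could be handled):
#include <cstdio>

// Pack two 4-bit values into one byte: hi in the upper nibble, lo in the lower.
unsigned char pack(unsigned hi, unsigned lo)
{
    return (unsigned char)(((hi & 0x0F) << 4) | (lo & 0x0F));
}

// Unpack the nibbles as unsigned values 0..15.
unsigned unpackHigh(unsigned char b) { return (b & 0xF0) >> 4; }
unsigned unpackLow(unsigned char b)  { return b & 0x0F; }

// If the nibbles are signed (-8..7), extend the sign bit (bit 3) when unpacking.
int toSigned4(unsigned nibble)
{
    return (nibble & 0x08) ? (int)nibble - 16 : (int)nibble;
}

int main()
{
    unsigned char b = pack(0xA, 0xE);                      // 0xAE
    std::printf("%X %X\n", unpackHigh(b), unpackLow(b));   // prints: A E
    std::printf("%d\n", toSigned4(unpackLow(b)));          // prints: -2
    return 0;
}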

How to store two bytes in a BYTE array as an int (or something similar)?

I am writing a BitTorrent client in C++ that receives a message from a tracker (server) containing several 6-byte strings. The first 4 bytes represent the IP address of a peer and the next 2 bytes represent the port number that the peer is listening on.
I have worked out how to convert the IP bytes into a human-readable IP address, but I am struggling to convert the two bytes representing the port number into an int (or something similar).
Here are my efforts so far:
BYTE portbinary[2];
unsigned short peerport;
//trackers[i]->peersBinary[j * 6 + 4] is the first byte
portbinary[0] = trackers[i]->peersBinary[j * 6 + 4];
//trackers[i]->peersBinary[j * 6 + 5] is the second byte
portbinary[1] = trackers[i]->peersBinary[j * 6 + 5];
peerport = *portbinary;
Upon examination, peerport only seems to contain the integer representation of the first byte. How might I be able to fix this?
Thanks in advance :)
I prefer using bitwise operations instead of type punning because it brings no issues with endianness at all (the port number comes as a big endian number, and many systems today are little endian).
int peerport = (portbinary[0] << 8) | portbinary[1];
Since it seems like the data is aligned, you can just use
peerport = ntohs(*(uint16_t *)(trackers[i]->peersBinary + j * 6 + 4));
Since portbinary is an array of BYTE, *portbinary is equivalent to portbinary[0].
A portable way to achieve your result could be:
peerport = portbinary[0];
peerport = 256*peerport + portbinary[1];
This assumes portbinary was delivered in network byte order.
Solution with unions:
union port_extractor
{
    BYTE raw_port[2];
    unsigned short port_assembled;
};
This will work if your computer's endianness is the same as that of the representation you fetch from the network. Sorry, I don't know which byte order the BitTorrent protocol uses.
If the endianness is the opposite, then the solution you are bound to use is not as nice:
unsigned short port_assembled = (unsigned short)raw_port[first_byte]
                              | ((unsigned short)raw_port[second_byte] << 8);
// first_byte = 0, second_byte = 1, or vice versa,
// depending on the endianness of the source data
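As a short usage sketch of the union approach (BYTE is assumed to be an unsigned 8-bit type, and reading a different union member than the one last written is technically implementation-specific, though it is exactly what the answer above proposes):
typedef unsigned char BYTE;  // assumed definition of BYTE

union port_extractor
{
    BYTE raw_port[2];
    unsigned short port_assembled;
};

// Sketch: copy the two wire bytes into the union and read back the assembled port.
// Only correct when the host byte order matches the byte order on the wire.
unsigned short extractPort(const BYTE* peersBinary, int offset)
{
    port_extractor pe;
    pe.raw_port[0] = peersBinary[offset];      // first port byte
    pe.raw_port[1] = peersBinary[offset + 1];  // second port byte
    return pe.port_assembled;
}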