Reading consecutive bytes as one integer - C++

I am new here and would like to ask this question.
I am working with a binary file in which each byte, group of bytes, or even part of a byte has a different meaning.
What I have been trying to do so far is to read a number of bytes (4 in my example) as one block.
In hexadecimal representation they look like: 00 1D FB C8.
Using the following code, I read them separately:
for (int j = 36; j < 40; j++)
{
    cout << dec << (bitset<8>(fileBuf[j])).to_ulong();
}
where j is the position of the byte in the file. The previous code gives me 029251200 (each byte printed separately), which is wrong. What I want is to read the 4 bytes at once and get the value 1965000.
I appreciate any help.
Thank you.

DWORD final = (fileBuf[j] << 24) + (fileBuf[j+1] << 16) + (fileBuf[j+2] << 8) + (fileBuf[j+3]);
It also depends on what byte order (endianness) you want (ABCD / DCBA / CDAB).
EDIT (can't reply due to low rep, just joined today)
"I tried to extend the bitset, however it gave the value of the first byte only"
That will not work, because fileBuf is almost certainly a byte array: extending from 8 bit to 32 bit (int) won't make any difference, because each element is still only an 8-bit byte. You have to mathematically combine the 4 array elements into the original integer representation; see the code above the edit.

The answer isn't "wrong"; this is a logic error. You're not storing the values and accumulating the computation.
C8 is 200 in decimal form, so you're not appending that value to the original subset.
The answer it spat out was in fact exactly what you programmed it to do.
You need to either extend the bitset to a larger size so the other hex bytes can be appended, or provide some other means of combining them before outputting.

Keeping the format of the function from the question, you could do:
// little-endian
{
    int i = (fileBuf[j] << 0) | (fileBuf[j+1] << 8) | (fileBuf[j+2] << 16) | (fileBuf[j+3] << 24);
    cout << dec << i;
}
// big-endian
{
    int i = (fileBuf[j+3] << 0) | (fileBuf[j+2] << 8) | (fileBuf[j+1] << 16) | (fileBuf[j] << 24);
    cout << dec << i;
}
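One caveat worth noting: if fileBuf is a plain char buffer and char is signed on your platform, bytes such as FB or C8 will sign-extend when shifted and corrupt the high bits of the result. Here is a minimal sketch of a variant that guards against that; the function name read_u32_be is hypothetical, and only the buffer/offset names mirror the question:
#include <cstddef>
#include <cstdint>
#include <iostream>

// Combine four consecutive bytes (big-endian order) into one 32-bit value.
// Casting to unsigned char first avoids sign extension when char is signed.
uint32_t read_u32_be(const char* buf, std::size_t j)
{
    return (static_cast<uint32_t>(static_cast<unsigned char>(buf[j]))     << 24) |
           (static_cast<uint32_t>(static_cast<unsigned char>(buf[j + 1])) << 16) |
           (static_cast<uint32_t>(static_cast<unsigned char>(buf[j + 2])) << 8)  |
            static_cast<uint32_t>(static_cast<unsigned char>(buf[j + 3]));
}

int main()
{
    const char fileBuf[] = { 0x00, 0x1D, static_cast<char>(0xFB), static_cast<char>(0xC8) };
    std::cout << read_u32_be(fileBuf, 0) << '\n'; // prints 1965000
}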

Related

How to understand MNIST Binary converter in C++?

I recently needed to convert the MNIST data-set to images and labels. It is binary, and its structure is described in the previous link, so I did a little research; since I'm a fan of C++, I read up on binary I/O in C++, and after that I found this link on Stack Overflow. That code works well, but it has no comments and no explanation of the algorithm, so I got confused, which raised some questions that I need a professional C++ programmer to answer.
1. What is the algorithm to convert the data-set in C++ with the help of ifstream?
I understand how to read a file as binary with file.read and move on to the next record, but in C we would define a struct and read records from the file into it; I can't see any struct in the C++ program, for example to read this:
[offset] [type] [value] [description]
0000 32 bit integer 0x00000803(2051) magic number
0004 32 bit integer 60000 number of images
0008 32 bit integer 28 number of rows
0012 32 bit integer 28 number of columns
0016 unsigned byte ?? pixel
How can we go to a specific offset, for example 0004, read a 32-bit integer there, and put it into an integer variable?
2. What is the function ReverseInt doing? (It is obviously not simply reversing the integer.)
int ReverseInt(int i)
{
    unsigned char ch1, ch2, ch3, ch4;
    ch1 = i & 255;
    ch2 = (i >> 8) & 255;
    ch3 = (i >> 16) & 255;
    ch4 = (i >> 24) & 255;
    return ((int)ch1 << 24) + ((int)ch2 << 16) + ((int)ch3 << 8) + ch4;
}
I did a little debugging with cout, and when it reversed, for example, 270991360 it returned 10000, and I cannot see any relation between the two. I understand that it ANDs the shifted number with 255, but why?
PS:
1. I already have the converted MNIST images, but I want to understand the algorithm.
2. I've already unzipped the gz files, so the file is pure binary.
1. What is the algorithm to convert the data-set in C++ with the help of ifstream?
This function reads a file (t10k-images-idx3-ubyte.gz) as follows:
Read the magic number and adjust endianness
Read the number of images and adjust endianness
Read the number of rows and adjust endianness
Read the number of columns and adjust endianness
Read all of the images x rows x columns pixel bytes (but discard them).
The function uses a plain int and always switches endianness, which means it targets a very specific architecture and is not portable.
How can we go to a specific offset, for example 0004, read a 32-bit integer there, and put it into an integer variable?
ifstream provides a function to seek to a given position:
file.seekg( posInBytes, std::ios_base::beg);
At the given position, you could read the 32-bit integer:
int32_t val;
file.read ((char*)&val,sizeof(int32_t));
2. What is the function ReverseInt doing?
This function reverses the order of the bytes of an int value:
Considering a 32-bit integer whose bits are aaaaaaaabbbbbbbbccccccccdddddddd, it returns the integer ddddddddccccccccbbbbbbbbaaaaaaaa.
This is useful for normalizing endianness; however, it is probably not very portable, as int might not be 32-bit (it could be, e.g., 16-bit or 64-bit).
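Putting the pieces together, here is a minimal sketch of reading the IDX image header with ifstream. It assumes the file has already been unzipped and follows exactly the layout tabled above; the helper name swap32 is hypothetical, and unlike the original code it swaps only when the magic number shows the host byte order differs from the file's:
#include <cstdint>
#include <fstream>
#include <iostream>

// Reverse the byte order of a 32-bit value (same idea as ReverseInt above).
uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

int main()
{
    std::ifstream file("t10k-images-idx3-ubyte", std::ios::binary);
    if (!file) { std::cerr << "cannot open file\n"; return 1; }

    uint32_t magic = 0, count = 0, rows = 0, cols = 0;
    file.read(reinterpret_cast<char*>(&magic), sizeof magic);
    file.read(reinterpret_cast<char*>(&count), sizeof count);
    file.read(reinterpret_cast<char*>(&rows), sizeof rows);
    file.read(reinterpret_cast<char*>(&cols), sizeof cols);

    // The IDX header is stored big-endian; swap only if it didn't come out right.
    if (magic != 0x00000803u) {
        magic = swap32(magic); count = swap32(count);
        rows = swap32(rows);   cols = swap32(cols);
    }

    std::cout << "magic=" << magic << " images=" << count
              << " rows=" << rows << " cols=" << cols << '\n';
    // count * rows * cols unsigned bytes of pixel data follow.
}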

Combining 2 Hex Values Into 1 Hex Value

I have a coordinate pair of values that each range over [0,15]. For now I can use an unsigned; however, since 16 x 16 = 256 total possible coordinate locations, this also covers all the binary and hex values of 1 byte. So to keep memory compact I'm starting to prefer the idea of using a BYTE or an unsigned char. What I want to do with this coordinate pair is the following:
Let's say we have a coordinate pair with the hex values [0x05, 0x0C]; I would like the final value to be 0x5C. I would also like to be able to do the reverse as well, but I think I've already found an answer with a solution for the reverse. I was thinking along the lines of using & or |; however, I'm missing something, for I'm not getting the correct values.
However, as I was typing this and looking at the reverse operation, this is what I came up with, and it appears to be working:
byte a = 0x04;
byte b = 0x0C;
byte c = (a << 4) | b;
std::cout << +c;
And the value that is printed is 76, which converted to hex is 0x4C.
Since I have figured out the calculation for this, is there a more efficient way?
EDIT
After doing some testing, the operation to combine the initial two values is giving me the correct result; however, when I do the reverse operation like this:
byte example = c;
byte nibble1 = 0x0F & example;
byte nibble2 = (0xF0 & example) >> 4;
std::cout << +nibble1 << " " << +nibble2 << std::endl;
It prints out 12 4. Is this correct, or should this be a concern? If worst comes to worst, I can rename the variables to indicate which coordinate value they hold.
EDIT
After thinking about this for a little bit, and based on some of the suggestions, I modified the reverse operation to this:
byte example = c;
byte nibble1 = (0xF0 & example) >> 4;
byte nibble2 = (0x0F & example);
std::cout << +nibble1 << " " << +nibble2 << std::endl;
And this prints out 4 12 which is the correct order of what I am looking for!
First of all, be careful: there are in fact 17 values in the range 0..16. Your values are probably 0..15, because if both actually ranged from 0 to 16, you would not be able to store every possible coordinate pair uniquely in a single byte.
The code extract you submitted is pretty efficient: you are using bit operators, which are among the quickest operations you can ask a processor to do.
For the "reverse" (splitting your byte into two 4-bit values), you are right to think of using &. Just apply a 4-bit shift at the right time.

WebSockets handshake reply from server side, draft 00

I am writing a small WebSocket server application that should support both draft 17 and older variations such as draft 00. I didn't have any problems with the newest draft, but I cannot make the draft 00 client happy.
For testing purposes I used the example provided in the official (old) draft 00 document, page 7:
Sec-WebSocket-Key1: 18x 6]8vM;54 *(5: { U1]8 z [ 8
Sec-WebSocket-Key2: 1_ tx7X d < nw 334J702) 7]o}` 0
Tm[K T2u
When calculating the keys by concatenating the digits and dividing by the number of spaces, I get the following two integers: 155712099 and 173347027 (the document lists these two numbers as well).
Next, it says to:
Convert them individually to big-endian.
Concatenate the results into a string and append the last eight bytes (Tm[K T2u).
Create a 128-bit MD5 sum from the string produced in steps 1 and 2.
Armed with this knowledge I've produced the following code:
#define BYTE 8
#define WORD 16
// Little Endian to Big Endian short
#define LE_TO_BE_SHORT(SHORT)\
(((SHORT >> BYTE) & 0x00FF) | ((SHORT << BYTE) & 0xFF00))
// Little Endian to Big Endian long
#define LE_TO_BE_LONG(LONG)\
(((LE_TO_BE_SHORT(LONG >> WORD)) | \
((LE_TO_BE_SHORT((LONG & 0xFFFF)) << WORD))))
uint num1 = LE_TO_BE_LONG(155712099);
uint num2 = LE_TO_BE_LONG(173347027);
QString cookie = QString::fromUtf8("Tm[K T2u");
QString c = QString::number(num1) + QString::number(num2) + cookie;
QByteArray data = c.toUtf8();
qDebug() << QCryptographicHash::hash(data, QCryptographicHash::Md5);
Here's what I get:
←→»α√r¼??┐☺║Pa♠µ
And here's what's expected (again, based on the draft example)
fQJ,fN/4F4!~K~MH
On the other hand, I've noticed that the Wikipedia article does not mention anything about endianness conversion. I tried the above code without the conversion (with both the Wikipedia example and the example from the draft) and still cannot reproduce the expected result.
Can anyone point out what the problem is here?
EDIT:
I found that this document has a better explanation of the protocol. It is a different draft (76) but is similar to 00 in terms of the handshake.
Here is the calculation in the C implementation of websockify. I know that works so you might be able to use that as reference.
Finally, with the help of fresh eyes from my colleagues, I figured out what I was doing wrong. Basically, I was concatenating the decimal string representations of the two integers. Instead, I needed to concatenate the raw bytes:
uint num1 = LE_TO_BE_LONG(155712099); // macros definition can
uint num2 = LE_TO_BE_LONG(173347027); // be found in the question
QString cookie = QString::fromUtf8("Tm[K T2u");
QByteArray array;
array = QByteArray((const char*)&num1, sizeof(int));
array += QByteArray((const char*)&num2, sizeof(int));
array += QByteArray(cookie.toStdString().data(), cookie.length());
qDebug() << QCryptographicHash::hash(array, QCryptographicHash::Md5);
Make sure that you don't use the overloaded constructor that does not take a size, because Qt will create a slightly larger array padded with garbage. At least that was the case for me.

logical operations between chunks of memory?

I want to OR two big chunks of memory together... but it doesn't work.
Consider that I have three char* buffers: bm, bm_old, and bm_res.
#define to_uint64(buffer,n) {(uint64_t)buffer[n] << 56 | (uint64_t)buffer[n+1] << 48 | (uint64_t)buffer[n+2] << 40 | (uint64_t)buffer[n+3] << 32 | (uint64_t) buffer[n+4] << 24 | (uint64_t)buffer[n+5] << 16 | (uint64_t)buffer[n+6] << 8 | (uint64_t)buffer[n+7];}
...
for (unsigned int i = 0; i < bitmapsize(size)/8; i++){
    uint64_t or_res = (to_uint64(bm_old, i*8)) | (to_uint64(bm, i*8));
    memcpy(bm_res + i*sizeof(uint64_t), &or_res, sizeof(uint64_t));
}
bm_res is not correct!
Any clue?
Thanks,
Amir.
Enclose the definition of to_uint64 in parentheses () instead of braces {} and get rid of the semicolon at the end. Using #define creates a macro whose text is inserted verbatim wherever it's used, not an actual function, so you were attempting to |-together two blocks rather than those blocks' "return values."
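Here is a minimal sketch of what that fix looks like, keeping the macro name from the question; the extra (unsigned char) casts are an addition of this sketch, guarding against sign extension in case the buffers are plain (signed) char:
#include <cstdint>

// Parentheses make the macro expand to a single expression with a value,
// instead of a braced statement block that has none.
#define to_uint64(buffer, n) \
    ((uint64_t)(unsigned char)(buffer)[(n)] << 56 | \
     (uint64_t)(unsigned char)(buffer)[(n) + 1] << 48 | \
     (uint64_t)(unsigned char)(buffer)[(n) + 2] << 40 | \
     (uint64_t)(unsigned char)(buffer)[(n) + 3] << 32 | \
     (uint64_t)(unsigned char)(buffer)[(n) + 4] << 24 | \
     (uint64_t)(unsigned char)(buffer)[(n) + 5] << 16 | \
     (uint64_t)(unsigned char)(buffer)[(n) + 6] << 8 | \
     (uint64_t)(unsigned char)(buffer)[(n) + 7])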
I think you need to advance your output pointer by the correct size:
memcpy(bm_res + i * sizeof(uint64_t), &or_res, sizeof(uint64_t));
^^^^^^^^^^^^^^^^^^^^
Since bm_res is a char-pointer, + 1 advances by just one byte.
You're incrementing bm_res by one for every eight-byte block you move. Further, you never increment bm or bm_old at all. So you're basically tiling the first byte of or_res over bm_res, which is probably not what you want.
More importantly, your code is byte-order sensitive - whether or_res is represented in memory as least-order-byte first or highest-order-byte first matters.
I would recommend you just do a byte-by-byte OR first, and only try to optimize it if that is too slow. When you do optimize it, don't use your crazy to_uint64 macro there - it'll be slower than just going byte by byte. Instead, cast to uint64_t * directly. While this is, strictly speaking, undefined behavior, it works on every platform I've ever seen, and should be byte-order agnostic.
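A minimal sketch of the byte-by-byte version; bitmapsize(size) is assumed, as in the question, to give the total length of the buffers in bytes:
#include <cstddef>

// OR two equally sized buffers byte by byte into a result buffer.
// No endianness issues arise because each byte is handled on its own.
void or_buffers(const char* bm, const char* bm_old, char* bm_res, std::size_t len)
{
    for (std::size_t i = 0; i < len; ++i)
        bm_res[i] = static_cast<char>(bm[i] | bm_old[i]);
}

// Usage in the question's terms:
//     or_buffers(bm, bm_old, bm_res, bitmapsize(size));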

Splitting a char array into a sequence of ints and floats

I'm writing a program in C++ to listen to a stream of TCP messages from another program that provides tracking data from a webcam. I have the socket connected and I'm getting all the information in, but I'm having difficulty splitting it up into the data I want.
Here's the format of the data coming in:
8 byte header:
4 character string,
integer
32 byte message:
integer,
float,
float,
float,
float,
float
This is all being put into a char array called buffer. I need to be able to parse the different bytes out into the primitives I need. I have tried making smaller sub-arrays, such as headerString, filled by looping through and copying the first 4 elements of the buffer array, and I do get the correct header ('CCV ') printed out. But when I try the same thing with the next four elements (to get the integer) and print it out, I get weird ASCII characters. I've also tried converting the headerInt array to an integer with the atoi function from stdlib.h, but it always prints out zero.
I've already done this in Python using the excellent unpack method; is there any alternative in C++?
Any help greatly appreciated,
Jordan
Links
CCV packet structure
Python unpack method
The buffer only contains the raw image of what you read over the
network. You'll have to convert the bytes in the buffer to whatever
format you want. The string is easy:
std::string s(buffer + sOffset, 4);
(Assuming, of course, that the internal character encoding is the same
as in the file—probably an extension of ASCII.)
The others are more complicated, and depend on the format of the
external data. From the description of the header, I gather that the
integers are four bytes, but that still doesn't tell me anything about
their representation. Depending on the case, either:
int getInt(unsigned char* buffer, int offset)
{
return (buffer[offset ] << 24)
| (buffer[offset + 1] << 16)
| (buffer[offset + 2] << 8)
| (buffer[offset + 3] );
}
or
int getInt(unsigned char* buffer, int offset)
{
return (buffer[offset + 3] << 24)
| (buffer[offset + 2] << 16)
| (buffer[offset + 1] << 8)
| (buffer[offset ] );
}
will probably do the trick. (Other four-byte representations of
integers are possible, but they are exceedingly rare. Similarly, the
conversion of the unsigned results of the shifts and ORs into an int
is implementation defined, but in practice, the above will work almost
everywhere.)
The only hint you give concerning the representation of the floats is in
the message format: 32 bytes, minus a 4-byte integer, leaves 28 bytes for
5 floats; but 28 isn't evenly divisible by five, so I cannot even guess at
the length of the floats (except that there must be some padding in there
somewhere). Converting floating point can be more or less
complicated if the external format isn't exactly like the internal
format.
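As a hedged sketch only: if the sender turns out to use ordinary 32-bit IEEE-754 floats in the same byte order as the receiver (an assumption the above cannot confirm), extracting one float is just a matter of copying its bytes:
#include <cstring>

// Copy four raw bytes into a float. Only valid if the wire format is
// 32-bit IEEE-754 in the host's own byte order; otherwise the bytes
// must be reordered or converted first.
float getFloat(const unsigned char* buffer, int offset)
{
    float f;
    std::memcpy(&f, buffer + offset, sizeof f);
    return f;
}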
Something like this may work:
struct Header {
    char string[4];
    int integers[2];
    float floats[5];
};
Header* header = (Header*)buffer;
You should check that sizeof(Header) == 32.