Weird hexadecimal format - C++

The following hexdump shows some data produced by a device I have on hand. It stores the year, month, day, hour, minute, seconds, and length in a way that looks weird to me (4 bytes per value, in reverse byte order).
de 07 00 00 01 00 00 00 16 00 00 00 10 00 00 00
24 00 00 00 1d 00 00 00 15 00 00 00 X X X X
For example:
The year is stored as "de 07 00 00", i.e. 0x07de (= 2014). The problem I'm having is how to handle this properly in C/C++ (the first 4 bytes).
How do I read those 4 bytes in "reverse" order to get a proper value I can work with afterwards as an int or long?

If you read the value as an int on the same architecture it was generated on, then you don't need to do anything, as this is the natural format for your system.
You only need to do something about it if you want to read it on a different architecture, with a different binary format.
So you can read it simply with
int32_t n;
fread(&n, sizeof(int32_t), 1, fp); /* fp is a FILE* opened in binary mode */
Of course the file has to be opened in binary mode and you need a 32-bit int.
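A minimal sketch of the whole read, assuming a little-endian host (the file name device.bin is a hypothetical stand-in for wherever the dump lives):
#include <cstdint>
#include <cstdio>

int main()
{
    // "rb" opens the file in binary mode; "device.bin" is a made-up name
    FILE* fp = std::fopen("device.bin", "rb");
    if (!fp) return 1;

    int32_t year;
    if (std::fread(&year, sizeof(int32_t), 1, fp) == 1)
        std::printf("year = %d\n", (int)year); // 2014 for the dump above
    std::fclose(fp);
    return 0;
}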

If you end up reading it with the reverse byte order, you can then change the endianness with something like:
uint32_t before = 0xde070000;
uint32_t after  = ((before << 24) & 0xff000000) |
                  ((before <<  8) & 0x00ff0000) |
                  ((before >>  8) & 0x0000ff00) |
                  ((before >> 24) & 0x000000ff);
Edit: as pointed out in the comments, this shift-based swap is only well-defined for unsigned 32-bit values.
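An alternative that doesn't depend on the host's byte order at all is to assemble the value from the individual bytes. A sketch, assuming the four bytes have already been read into an unsigned char buffer:
#include <cstdint>

// buf holds 4 bytes exactly as they appear in the file (least significant first)
uint32_t read_le32(const unsigned char* buf)
{
    return  static_cast<uint32_t>(buf[0])
         | (static_cast<uint32_t>(buf[1]) << 8)
         | (static_cast<uint32_t>(buf[2]) << 16)
         | (static_cast<uint32_t>(buf[3]) << 24);
}
// For the bytes de 07 00 00 this yields 0x07de == 2014 on any host.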

Related

Reordering bit-fields mysteriously changes size of struct

For some reason I have a struct that needs to keep track of 56 bits of information ordered as 4 packs of 12 bits and 2 packs of 4 bits. This comes out to 7 bytes of information total.
I tried a bit field like so
struct foo {
    uint16_t R : 12;
    uint16_t G : 12;
    uint16_t B : 12;
    uint16_t A : 12;
    uint8_t  X : 4;
    uint8_t  Y : 4;
};
and was surprised to see sizeof(foo) evaluate to 10 on my machine (a Linux x86_64 box) with g++ version 12.1. I tried reordering the fields like so
struct foo2 {
    uint8_t  X : 4;
    uint16_t R : 12;
    uint16_t G : 12;
    uint16_t B : 12;
    uint16_t A : 12;
    uint8_t  Y : 4;
};
and was surprised to see that the size is now 8 bytes, which is what I originally expected. It's the same size as the structure I expected the first solution to effectively produce:
struct baseline {
    uint16_t first;
    uint16_t second;
    uint16_t third;
    uint8_t  single;
};
I am aware of sizeof, alignment, and structure packing, but I am really stumped as to why the first ordering adds 2 extra bytes. There is no reason to add more than one byte of padding, since the 56 bits I requested fit exactly in 7 bytes.
Minimal working example: Try it on Wandbox.
What am I missing?
PS: none of this changes if we change uint8_t to uint16_t
If we create an instance of struct foo, zero it out, set all bits in one field, and print the bytes, doing this for each field in turn, we see the following:
R: ff 0f 00 00 00 00 00 00 00 00
G: 00 00 ff 0f 00 00 00 00 00 00
B: 00 00 00 00 ff 0f 00 00 00 00
A: 00 00 00 00 00 00 ff 0f 00 00
X: 00 00 00 00 00 00 00 f0 00 00
Y: 00 00 00 00 00 00 00 00 0f 00
So what appears to be happening is that each 12-bit field starts in a new 16-bit storage unit. Then the first 4-bit field fills out the remaining bits of the prior 16-bit unit, and the last field takes up 4 bits in the final unit. This occupies 9 bytes. And since the largest field, in this case a bit-field storage unit, is 2 bytes wide, one byte of padding is added at the end.
So it appears that a 12-bit field, which has a 16-bit base type, is kept within a single 16-bit storage unit instead of being split across multiple storage units.
If we do the same for the modified struct:
X: 0f 00 00 00 00 00 00 00
R: f0 ff 00 00 00 00 00 00
G: 00 00 ff 0f 00 00 00 00
B: 00 00 00 00 ff 0f 00 00
A: 00 00 00 00 00 00 ff 0f
Y: 00 00 00 00 00 00 00 f0
We see that X takes up 4 bits of the first 16 bit storage unit, then R takes up the remaining 12 bits. The rest of the fields fill out as before. This results in 8 bytes being used, and so requires no additional padding.
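For reference, the dumps above can be reproduced with a probe along these lines (a sketch; the output shown is what g++ 12 on x86_64 produces, and padding bytes are assumed to read back as zero):
#include <cstdint>
#include <cstdio>
#include <cstring>

struct foo {                // the original ordering from the question
    uint16_t R : 12;
    uint16_t G : 12;
    uint16_t B : 12;
    uint16_t A : 12;
    uint8_t  X : 4;
    uint8_t  Y : 4;
};

int main()
{
    foo f{};                // zero-initialized instance
    f.R = 0xfff;            // set all 12 bits of one field
    unsigned char bytes[sizeof f];
    std::memcpy(bytes, &f, sizeof f); // inspect the object representation
    for (unsigned char b : bytes)
        std::printf("%02x ", b);      // prints: ff 0f 00 00 00 00 00 00 00 00
    std::printf("\n");
}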
While the exact details of the ordering of bit-fields are implementation-defined, the C standard does set a few rules.
From section 6.7.2.1p11:
An implementation may allocate any addressable storage unit large enough to hold a bit-field. If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit. If insufficient space remains, whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined. The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined. The alignment of the addressable storage unit is unspecified.
And 6.7.2.1p15:
Within a structure object, the non-bit-field members and the units in
which bit-fields reside have addresses that increase in the order in
which they are declared.

Disable alignment on a 64-bit structure

I'm trying to align my structure and make it as small as possible using bit fields. I have to send this data back to a client, which will examine the fields to set a few data members.
The size of the structure is indeed what I expect, but when I set the members it does not work at all.
Here's some example code:
#include <cstdint>
#include <iostream>
#include <windows.h> // for Int64ShllMod32

#pragma pack(push, 1)
struct PW_INFO
{
    char hash[16];          //Does not matter
    uint32_t number;        //Does not matter
    uint32_t salt_id : 30;  //Position: 0 bits
    uint32_t enc_level : 7; //Position: 30 bits
    uint32_t delta : 27;    //Position: 37 bits
};                          //Total size: 28 bytes
#pragma pack(pop)

void int64shrl(uint64_t& base, uint32_t to_shift, uint32_t position)
{
    uint64_t res = static_cast<uint64_t>(to_shift);
    res = Int64ShllMod32(res, position);
    base |= res;
}

int main()
{
    std::cout << "Size of PW_INFO: " << sizeof(PW_INFO) << "\n"; //Returns 28 as expected (16 + sizeof(uint32_t) + 8)
    PW_INFO pw = { "abc123", 0, 0, 0, 0 };
    pw.enc_level = 105;
    uint64_t base{ 0 };
    &base; //debug purposes
    int64shrl(base, 103, 30);
    return 0;
}
Here's where it gets weird: setting the "salt_id" field (which is 30 bits into the bitfield) will yield the following result in memory:
0x003FFB8C 61 62 63 31 32 33 00 00 abc123..
0x003FFB94 00 00 00 00 00 00 00 00 ........
0x003FFB9C 00 00 00 00 00 00 00 00 ........
0x003FFBA4 69 00 00 00 i...
(Only the last 8 bytes are of concern since they represent the bit field.)
But Int64ShllMod32 returns a correct result (the remote client understands it perfectly):
0x003FFB7C 00 00 00 c0 19 00 00 00 ...À....
I'm guessing it has to do with alignment; if so, how would I completely get rid of it? It seems that even if the size is correct, the compiler still aligns the fields (to a 1-byte boundary, as the #pragma directive suggests).
More information:
I use Visual Studio 2015 and its compiler.
I am not trying to write these in a different format; the reason I'm asking is that I do NOT want to use my own format. The client reads from 64-bit bit fields everywhere. I don't have access to its source code, but I see a lot of calls to Int64ShrlMod32 (from what I've read, this is what the compiler produces when dealing with 8-byte bit fields).
The actual bit field starts at "salt_id": 30 + 7 + 27 = 64 bits. I hope that is clearer now.
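For what it's worth, one way to get all three fields into a single 8-byte unit is to give them a 64-bit base type: MSVC starts a new storage unit whenever the declared type of a bit-field changes, but packs adjacent bit-fields of the same type into one unit when they fit. A sketch (the renamed struct is hypothetical, and this is untested against the actual client):
#include <cstdint>

#pragma pack(push, 1)
struct PW_INFO64
{
    char hash[16];
    uint32_t number;
    uint64_t salt_id   : 30; // bits 0-29 of one 64-bit unit
    uint64_t enc_level : 7;  // bits 30-36
    uint64_t delta     : 27; // bits 37-63
};
#pragma pack(pop)

static_assert(sizeof(PW_INFO64) == 28, "16 + 4 + 8 bytes");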

Bit reading puzzle (reading a binary file in C++)

I am trying to read the file 'train-images-idx3-ubyte', which can be found here along with the corresponding file format description (at the bottom of the webpage). When I look at the bytes with od -t x1 train-images-idx3-ubyte | less (hexadecimal, bytewise), I get the following output:
address bytes
0000000 00 00 08 03 00 00 ea 60 00 00 00 1c 00 00 00 1c
0000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
...
This is what I expected according to the format description. But when I try to read the data with C++ I've got a problem. What I do is this:
std::fstream trainingData("minst/train-images-idx3-ubyte",
                          std::ios::in | std::ios::binary);
int8_t zero = 0, encoding = 0, dimension = 0;
int32_t samples = -1;
trainingData >> zero >> zero >> encoding >> dimension;
trainingData >> samples;
debugLogger << "training set image file, encoding = "
            << (int) encoding << ", dimension = "
            << (int) dimension << ", items = " << (int) samples << "\n";
But the output of these few lines of code is:
training set image file, encoding = 8, dimension = 3, items = 0
Everything but the number of instances (items, samples) is correct. I tried reading the next 4 bytes as int8_t, and that at least gave me the same result as od. I cannot imagine how samples can be 0. What I actually wanted to read here was 10,000. Maybe you've got a clue?
As mentioned in other answers, you need to use unformatted input, i.e. istream::read(...) instead of operator>>. Translating your code above to use read yields:
trainingData.read(reinterpret_cast<char*>(&zero), sizeof(zero));
trainingData.read(reinterpret_cast<char*>(&zero), sizeof(zero));
trainingData.read(reinterpret_cast<char*>(&encoding), sizeof(encoding));
trainingData.read(reinterpret_cast<char*>(&dimension), sizeof(dimension));
trainingData.read(reinterpret_cast<char*>(&samples), sizeof(samples));
Which gets you most of the way there. But 00 00 ea 60 looks like it's in big-endian format, so you'll have to pass it through ntohl to make sense of it if you're running on an Intel-based (little-endian) machine:
samples = ntohl(samples);
which gives encoding = 8, dimension = 3, items = 60000.
operator>> performs formatted input: it skips whitespace and parses the bytes as text, which gives wrong results on a binary file. Unformatted input (istream::read) returns the raw bytes and gives the correct results.
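Putting the pieces together, a self-contained sketch (the path is taken from the question; the hand-rolled big-endian decode is a stand-in for ntohl so no networking header is needed):
#include <cstdint>
#include <fstream>
#include <iostream>

// Assemble a 32-bit value from 4 big-endian bytes, independent of host order.
static uint32_t read_be32(std::istream& in)
{
    unsigned char b[4];
    in.read(reinterpret_cast<char*>(b), 4);
    return (uint32_t(b[0]) << 24) | (uint32_t(b[1]) << 16)
         | (uint32_t(b[2]) <<  8) |  uint32_t(b[3]);
}

int main()
{
    std::ifstream trainingData("minst/train-images-idx3-ubyte",
                               std::ios::in | std::ios::binary);
    uint32_t magic   = read_be32(trainingData); // 0x00000803
    uint32_t samples = read_be32(trainingData); // 0x0000ea60 == 60000
    std::cout << "encoding = "    << ((magic >> 8) & 0xff) // 8
              << ", dimension = " << (magic & 0xff)        // 3
              << ", items = "     << samples << "\n";
}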

Accessing specific binary information based on binary format documentation

I have a binary file and documentation of the format the information is stored in. I'm trying to write a simple program using c++ that pulls a specific piece of information from the file but I'm missing something since the output isn't what I expect.
The documentation is as follows:
Half-word Field Name Type Units Range Precision
10 Block Divider INT*2 N/A -1 N/A
11-12 Latitude INT*4 Degrees -90 to +90 0.001
There are other items in the file obviously but for this case I'm just trying to get the Latitude value.
My code is:
#include <cstdlib>
#include <iostream>
#include <fstream>

using namespace std;

int main(int argc, char* argv[])
{
    const char* dataFileLocation = "testfile.bin";
    ifstream dataFile(dataFileLocation, ios::in | ios::binary);
    if(dataFile.is_open())
    {
        char* buffer = new char[32768];
        dataFile.seekg(10, ios::beg);
        dataFile.read(buffer, 4);
        dataFile.close();
        cout << "value is " << (int)(buffer[0] & 255);
        delete[] buffer;
    }
}
The result of which is "value is 226" which is not in the allowed range.
I'm quite new to this, and here's what my intentions were when writing the above code:
Open file in binary mode
Seek to the 11th byte from the start of the file
Read in 4 bytes from that point
Close the file
Output those 4 bytes as an integer.
If someone could point out where I'm going wrong I'd sure appreciate it. I don't really understand the (buffer[0] & 255) part (took that from some example code) so layman's terms for that would be greatly appreciated.
Hex Dump of the first 100 bytes:
testfile.bin 98,402 bytes 11/16/2011 9:01:52
-0 -1 -2 -3 -4 -5 -6 -7 -8 -9 -A -B -C -D -E -F
00000000- 00 5F 3B BF 00 00 C4 17 00 00 00 E2 2E E0 00 00 [._;.............]
00000001- 00 03 FF FF 00 00 94 70 FF FE 81 30 00 00 00 5F [.......p...0..._]
00000002- 00 02 00 00 00 00 00 00 3B BF 00 00 C4 17 3B BF [........;.....;.]
00000003- 00 00 C4 17 00 00 00 00 00 00 00 00 80 02 00 00 [................]
00000004- 00 05 00 0A 00 0F 00 14 00 19 00 1E 00 23 00 28 [.............#.(]
00000005- 00 2D 00 32 00 37 00 3C 00 41 00 46 00 00 00 00 [.-.2.7.<.A.F....]
00000006- 00 00 00 00 [.... ]
Since the documentation lists the field as an integer but shows the precision to be 0.001, I would assume that the actual value is the stored value multiplied by 0.001. The integer range would be -90000 to 90000.
The 4 bytes must be combined into a single integer. There are two ways to do this, big endian and little endian, and which you need depends on the machine that wrote the file. x86 PCs for example are little endian.
int little_endian = buffer[0] | buffer[1]<<8 | buffer[2]<<16 | buffer[3]<<24;
int big_endian = buffer[0]<<24 | buffer[1]<<16 | buffer[2]<<8 | buffer[3];
The &255 is used to remove the sign extension that occurs when you convert a signed char to a signed integer. Use unsigned char instead and you probably won't need it.
Edit: I think "half-word" refers to 2 bytes, so you'll need to skip 20 bytes instead of 10.
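A sketch that folds both fixes in (big-endian is an assumption here; if the latitude comes out nonsensical, combine the bytes little-endian instead):
#include <cstdint>
#include <fstream>
#include <iostream>

int main()
{
    std::ifstream dataFile("testfile.bin", std::ios::in | std::ios::binary);
    if (!dataFile.is_open()) return 1;

    unsigned char buffer[4];
    dataFile.seekg(20, std::ios::beg); // half-words 11-12 == bytes 20-23
    dataFile.read(reinterpret_cast<char*>(buffer), 4);

    // unsigned char sidesteps sign extension; the final cast restores the
    // sign for latitudes south of the equator
    int32_t raw = static_cast<int32_t>(
        (uint32_t(buffer[0]) << 24) | (uint32_t(buffer[1]) << 16) |
        (uint32_t(buffer[2]) <<  8) |  uint32_t(buffer[3]));
    std::cout << "latitude = " << raw * 0.001 << " degrees\n";
}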

8-bit char to Hex representation

I'm trying to convert 8-bit chars into a hex view that looks like this:
00 03 80 45 E5 93 00 18 02 72 3B 90 88 64 11 00
45 FF 00 36 00 FF 45 00 00 34 7B FE 40 00 40 02
But some characters hold negative values, which produces hex output longer than 2 digits. How would I get each one represented as above?
I don't know what you are using for formatting, but make sure that your byte-holding variable is an unsigned char (assuming that char is 8 bits on your platform, which it is on all sane platforms) before formatting. If your platform has a sane BYTE typedef, use that. You can also use the boost::uint8_t type to store the byte and avoid these sorts of issues. For example:
char c=-25; // Oh no, this is one of those pesky "negative" characters
unsigned char byteVal=static_cast<unsigned char>(c); // FTFY
// Do the formatting with byteVal
"negative byte values" is an oxymoron, a byte is a number of bits without any sign typically an unsigned char which, when being 8 bits. can contain values 0-255 or in hex 00 to FF.