Big Endian and Little Endian for Files in C++

Big Endian and Little Endian for Files in C++ - c++

I am trying to write some processor independent code to write some files in big endian. I have a sample of code below and I can't understand why it doesn't work. All it is supposed to do is let byte store each byte of data one by one in big endian order. In my actual program I would then write the individual byte out to a file, so I get the same byte order in the file regardless of processor architecture.
#include <iostream>
int main (int argc, char * const argv[]) {
long data = 0x12345678;
long bitmask = (0xFF << (sizeof(long) - 1) * 8);
char byte = 0;
for(long i = 0; i < sizeof(long); i++) {
byte = data & bitmask;
data <<= 8;
}
return 0;
}
For some reason byte always has the value of 0. This confuses me, I am looking at the debugger and see this:
data = 00010010001101000101011001111000
bitmask = 11111111000000000000000000000000
I would think that data & mask would give 00010010, but it just makes byte 00000000 every time! How can his be? I have written some code for the little endian order and this works great, see below:
#include <iostream>
int main (int argc, char * const argv[]) {
long data = 0x12345678;
long bitmask = 0xFF;
char byte = 0;
for(long i = 0; i < sizeof(long); i++) {
byte = data & bitmask;
data >>= 8;
}
return 0;
}
Why does the little endian one work and the big endian not? Thanks for any help :-)

You should use the standard functions ntohl() and kin for this. They operate on explicit sized variables (i.e. uint16_t and uin32_t) rather than compiler-specific long, which necessary for portability.
Some platforms provide 64-bit versions in <endian.h>

In your example, data is 0x12345678.
Your first assignment to byte is therefore:
byte = 0x12000000;
which won't fit in a byte, so it gets truncated to zero.
try:
byte = (data & bitmask) >> (sizeof(long) - 1) * 8);

You're getting the shifting all wrong.
#include <iostream>
int main (int argc, char * const argv[]) {
long data = 0x12345678;
int shift = (sizeof(long) - 1) * 8
const unsigned long mask = 0xff;
char byte = 0;
for (long i = 0; i < sizeof(long); i++, shift -= 8) {
byte = (data & (mask << shift)) >> shift;
}
return 0;
}
Now, I wouldn't recommend you do things this way. I would recommend instead writing some nice conversion functions. Many compilers have these as builtins. So you can write your functions to do it the hard way, then switch them to just forward to the compiler builtin when you figure out what it is.
#include <tr1/cstdint> // To get uint16_t, uint32_t and so on.
inline uint16_t to_bigendian(uint16_t val, char bytes[2])
{
bytes[0] = (val >> 8) & 0xffu;
bytes[1] = val & 0xffu;
}
inline uint32_t to_bigendian(uint32_t val, char bytes[4])
{
bytes[0] = (val >> 24) & 0xffu;
bytes[1] = (val >> 16) & 0xffu;
bytes[2] = (val >> 8) & 0xffu;
bytes[3] = val & 0xffu;
}
This code is simpler and easier to understand than your loop. It's also faster. And lastly, it is recognized by some compilers and automatically turned into the single byte swap operation that would be required on most CPUs.

because you are masking off the top byte from an integer and then not shifting it back down 24 bits ...
Change your loop to:
for(long i = 0; i < sizeof(long); i++) {
byte = (data & bitmask) >> 24;
data <<= 8;
}

Related

Why are there two variants of the implementation of CRCs in software

I'm digging into the subtleties of CRCs. If I understand correctly, every CRC polynomial is provided in at least two representations, the normal one and the reversed one.
The normal one targets implementations where the content is processed from most signifiant bit to least significant bit and switched to the left (like for example in this wikipedia page).
The reversed one aims to handle LSb to MSb interfaces. If you process LSb to MSb with the reversed polynomial and switching to the right you get the same CRC value (also encoded LSb to MSb). This is described for example here. This is convenient for LSb to MSb communication interfaces.
What I don't understand is when you switch to software implementations. Why are there two variants of a software ie. byte implementation? (One for MSb to LSb, and one for the opposite bit order.)

You do not get the "same CRC value" (reflected or not) with the reflected calculation. It is an entirely different value, because the bits of the message are processed in the opposite order.
"when you switch": You simply use the CRC definition, reflected or not, that matches what the application is expecting. Whether the CRC is reflected is one of several parameters that define the CRC, along with the number of the bits in the CRC, the polynomial, the initial value, and the final exclusive or value. You can find the definition of over a hundred different CRCs here.
"why are there two": The forward implementation exists because that corresponds most closely to the mathematics, with the least significant term of the polynomial in the least significant bit of the binary representation of the polynomial. The reflected implementation exists because it was realized that it could be implemented in software a little more simply, with fewer instructions, but still have the same error-detection performance.
Here is an example for two common 32-bit CRCs with the same polynomial. Forward, CRC-32/BZIP bit-wise implementation:
uint32_t crc32bzip2_bit(uint32_t crc, void const *mem, size_t len) {
unsigned char const *data = mem;
if (data == NULL)
return 0;
crc = ~crc;
for (size_t i = 0; i < len; i++) {
crc ^= (uint32_t)data[i] << 24;
for (unsigned k = 0; k < 8; k++) {
crc = crc & 0x80000000 ? (crc << 1) ^ 0x4c11db7 : crc << 1;
}
}
crc = ~crc;
return crc;
}
Reflected CRC-32/ZIP bit-wise:
uint32_t crc32iso_hdlc_bit(uint32_t crc, void const *mem, size_t len) {
unsigned char const *data = mem;
if (data == NULL)
return 0;
crc = ~crc;
for (size_t i = 0; i < len; i++) {
crc ^= data[i];
for (unsigned k = 0; k < 8; k++) {
crc = crc & 1 ? (crc >> 1) ^ 0xedb88320 : crc >> 1;
}
}
crc = ~crc;
return crc;
}
The main savings is one instruction, the shift up of the data byte, that you can get rid of with the reflected implementation. Also the constant that you & with (1 vs. 0x80000000) is smaller, which may also save an instruction or a register, or perhaps just result in a shorter instruction, depending on the size of immediate values supported in the instruction set.
The shift is avoided for byte-wise calculations as well:
uint32_t crc32bzip2_byte(uint32_t crc, void const *mem, size_t len) {
unsigned char const *data = mem;
if (data == NULL)
return 0;
for (size_t i = 0; i < len; i++) {
crc = (crc << 8) ^
table_byte[((crc >> 24) ^ data[i]) & 0xff];
}
return crc;
}
vs.
uint32_t crc32iso_hdlc_byte(uint32_t crc, void const *mem, size_t len) {
unsigned char const *data = mem;
if (data == NULL)
return 0;
for (size_t i = 0; i < len; i++) {
crc = (crc >> 8) ^
table_byte[(crc ^ data[i]) & 0xff];
}
return crc;
}

c++: how to put relevant bits from uint32 into uint8?

I have a uint32 that I've flagged some bits on:
uint32 i = 0;
i |= (1 << 0);
i |= (1 << 5);
i |= (1 << 13);
i |= (1 << 19);
...
I want to convert it to a uint8 (by getting the state of its first 8 bits and disregarding the rest). Obviously I could do this:
uint8 j = 0;
for (int q = 0; q < 8; q++)
{
if (i & (1 << q))
{
j |= (1 << q);
}
}
But is there a fancy bitwise operation I can use to transfer the bits over in one fell swoop, without a loop?

You can achieve the same result by simply assigning the uint32 value to uint8.
int main()
{
unsigned int i = 0x00000888;
unsigned char j = i;
cout<<hex<<i<<endl;
cout<<hex<<+j<<endl;
return 0;
}
output:
888
88

Why not just mask those last 8 bits instead of running a loop over to see if individual bits are set?
const unsigned char bitMask = 0xFF;
j = (i & bitMask);
Note that C++ 14 though allows you to define binary literals right away
const unsigned char bitMask = 0b1111'1111;
The above is all you need. Just in case, if you need to get the subsequent byte positions, use the same mask 0xFF and make sure to right shift back the result to get the desired byte value.

Bitwise operator to calculate checksum

Am trying to come up with a C/C++ function to calculate the checksum of a given array of hex values.
char *hex = "3133455D332015550F23315D";
For e.g., the above buffer has 12 bytes and then last byte is the checksum.
Now what needs to done is, convert the 1st 11 individual bytes to decimal and then take there sum.
i.e., 31 = 49,
33 = 51,.....
So 49 + 51 + .....................
And then convert this decimal value to Hex. And then take the LSB of that hex value and convert that to binary.
Now take the 2's complement of this binary value and convert that to hex. At this step, the hex value should be equal to 12th byte.
But the above buffer is just an example and so it may not be correct.
So there're multiple steps involved in this.
Am looking for an easy way to do this using bitwise operators.
I did something like this, but it seems to take the 1st 2 bytes and doesn't give me the right answer.
int checksum (char * buffer, int size){
int value = 0;
unsigned short tempChecksum = 0;
int checkSum = 0;
for (int index = 0; index < size - 1; index++) {
value = (buffer[index] << 8) | (buffer[index]);
tempChecksum += (unsigned short) (value & 0xFFFF);
}
checkSum = (~(tempChecksum & 0xFFFF) + 1) & 0xFFFF;
}
I couldn't get this logic to work. I don't have enough embedded programming behind me to understand the bitwise operators. Any help is welcome.
ANSWER
I got this working with below changes.
for (int index = 0; index < size - 1; index++) {
value = buffer[index];
tempChecksum += (unsigned short) (value & 0xFFFF);
}
checkSum = (~(tempChecksum & 0xFF) + 1) & 0xFF;

Using addition to obtain a checksum is at least weird. Common checksums use bitwise xor or full crc. But assuming it is really what you need, it can be done easily with unsigned char operations:
#include <stdio.h>
char checksum(const char *hex, int n) {
unsigned char ck = 0;
for (int i=0; i<n; i+=1) {
unsigned val;
int cr = sscanf(hex + 2 * i, "%2x", &val); // convert 2 hexa chars to a byte value
if (cr == 1) ck += val;
}
return ck;
}
int main() {
char hex[] = "3133455D332015550F23315D";
char ck = checksum(hex, 11);
printf("%2x", (unsigned) (unsigned char) ck);
return 0;
}
As the operation are made on an unsigned char everything exceeding a byte value is properly discarded and you obtain your value (26 in your example).

CRC24Q implementation

I am trying to implement the algorithm of a CRC check, which basically created a value, based on an input message.
So, consider I have a hex message 3F214365876616AB15387D5D59, and I want to obtain the CRC24Q value of the message.
The algorithm that I found to do this is the following:
typedef unsigned long crc24;
crc24 crc_check(unsigned char *input) {
unsigned char *octets;
crc24 crc = 0xb704ce; // CRC24_INIT;
int i;
int len = strlen(input);
octets = input;
while (len--) {
crc ^= ((*octets++) << 16);
for (i = 0; i < 8; i++) {
crc <<= 1;
if (crc & 0x1000000)
crc ^= CRC24_POLY;
}
}
return crc & 0xFFFFFF;
}
where *input=3F214365876616AB15387D5D59.
The problem is that ((*octets++) << 16) will shift by 16 bits the ascii value of the hex character and not the character itself.
So, I made a function to convert the hex numbers to characters.
I know the implementation looks weird, and I wouldn't be surprised if it were wrong.
This is the convert function:
char* convert(unsigned char* message) {
unsigned char* input;
input = message;
int p;
char *xxxx[20];
xxxx[0]="";
for (p = 0; p < length(message) - 1; p = p + 2) {
char* pp[20];
pp[0] = input[0];
char *c[20];
*input++;
c[0]= input[0];
*input++;
strcat(pp,c);
char cc;
char tt[2];
cc = (char ) strtol(pp, &pp, 16);
tt[0]=cc;
strcat(xxxx,tt);
}
return xxxx;
}
SO:
unsigned char *msg_hex="3F214365876616AB15387D5D59";
crc_sum = crc_check(convert((msg_hex)));
printf("CRC-sum: %x\n", crc_sum);
Thank you very much for any suggestions.

Shouldn't the if (crc & 0x8000000) be if (crc & 0x1000000) otherwise you're testing the 28th bit not the 25th for 24-bit overflow

Integer Byte Swapping in C++

I'm working on a homework assignment for my C++ class. The question I am working on reads as follows:
Write a function that takes an unsigned short int (2 bytes) and swaps the bytes. For example, if the x = 258 ( 00000001 00000010 ) after the swap, x will be 513 ( 00000010 00000001 ).
Here is my code so far:
#include <iostream>
using namespace std;
unsigned short int ByteSwap(unsigned short int *x);
int main()
{
unsigned short int x = 258;
ByteSwap(&x);
cout << endl << x << endl;
system("pause");
return 0;
}
and
unsigned short int ByteSwap(unsigned short int *x)
{
long s;
long byte1[8], byte2[8];
for (int i = 0; i < 16; i++)
{
s = (*x >> i)%2;
if(i < 8)
{
byte1[i] = s;
cout << byte1[i];
}
if(i == 8)
cout << " ";
if(i >= 8)
{
byte2[i-8] = s;
cout << byte2[i];
}
}
//Here I need to swap the two bytes
return *x;
}
My code has two problems I am hoping you can help me solve.
For some reason both of my bytes are 01000000
I really am not sure how I would swap the bytes. My teachers notes on bit manipulation are very broken and hard to follow and do not make much sense me.
Thank you very much in advance. I truly appreciate you helping me.

New in C++23:
The standard library now has a function that provides exactly this facility:
#include <iostream>
#include <bit>
int main() {
unsigned short x = 258;
x = std::byteswap(x);
std::cout << x << endl;
}
Original Answer:
I think you're overcomplicating it, if we assume a short consists of 2 bytes (16 bits), all you need
to do is
extract the high byte hibyte = (x & 0xff00) >> 8;
extract the low byte lobyte = (x & 0xff);
combine them in the reverse order x = lobyte << 8 | hibyte;

It looks like you are trying to swap them a single bit at a time. That's a bit... crazy. What you need to do is isolate the 2 bytes and then just do some shifting. Let's break it down:
uint16_t x = 258;
uint16_t hi = (x & 0xff00); // isolate the upper byte with the AND operator
uint16_t lo = (x & 0xff); // isolate the lower byte with the AND operator
Now you just need to recombine them in the opposite order:
uint16_t y = (lo << 8); // shift the lower byte to the high position and assign it to y
y |= (hi >> 8); // OR in the upper half, into the low position
Of course this can be done in less steps. For example:
uint16_t y = (lo << 8) | (hi >> 8);
Or to swap without using any temporary variables:
uint16_t y = ((x & 0xff) << 8) | ((x & 0xff00) >> 8);

You're making hard work of that.
You only neeed exchange the bytes. So work out how to extract the two byte values, then how to re-assemble them the other way around
(homework so no full answer given)
EDIT: Not sure why I bothered :) Usefulness of an answer to a homework question is measured by how much the OP (and maybe other readers) learn, which isn't maximized by giving the answer to the homewortk question directly...

Here is an unrolled example to demonstrate byte by byte:
unsigned int swap_bytes(unsigned int original_value)
{
unsigned int new_value = 0; // Start with a known value.
unsigned int byte; // Temporary variable.
// Copy the lowest order byte from the original to
// the new value:
byte = original_value & 0xFF; // Keep only the lowest byte from original value.
new_value = new_value * 0x100; // Shift one byte left to make room for a new byte.
new_value |= byte; // Put the byte, from original, into new value.
// For the next byte, shift the original value by one byte
// and repeat the process:
original_value = original_value >> 8; // 8 bits per byte.
byte = original_value & 0xFF; // Keep only the lowest byte from original value.
new_value = new_value * 0x100; // Shift one byte left to make room for a new byte.
new_value |= byte; // Put the byte, from original, into new value.
//...
return new_value;
}

Ugly implementation of Jerry's suggestion to treat the short as an array of two bytes:
#include <iostream>
typedef union mini
{
unsigned char b[2];
short s;
} micro;
int main()
{
micro x;
x.s = 258;
unsigned char tmp = x.b[0];
x.b[0] = x.b[1];
x.b[1] = tmp;
std::cout << x.s << std::endl;
}

Using library functions, the following code may be useful (in a non-homework context):
unsigned long swap_bytes_with_value_size(unsigned long value, unsigned int value_size) {
switch (value_size) {
case sizeof(char):
return value;
case sizeof(short):
return _byteswap_ushort(static_cast<unsigned short>(value));
case sizeof(int):
return _byteswap_ulong(value);
case sizeof(long long):
return static_cast<unsigned long>(_byteswap_uint64(value));
default:
printf("Invalid value size");
return 0;
}
}
The byte swapping functions are defined in stdlib.h at least when using the MinGW toolchain.

#include <stdio.h>
int main()
{
unsigned short a = 258;
a = (a>>8)|((a&0xff)<<8);
printf("%d",a);
}

While you can do this with bit manipulation, you can also do without, if you prefer. Either way, you shouldn't need any loops though. To do it without bit manipulation, you'd view the short as an array of two chars, and swap the two chars, in roughly the same way as you would swap two items while (for example) sorting an array.
To do it with bit manipulation, the swapped version is basically the lower byte shifted left 8 bits ord with the upper half shifted left 8 bits. You'll probably want to treat it as an unsigned type though, to ensure the upper half doesn't get filled with one bits when you do the right shift.

This should also work for you.
#include <iostream>
int main() {
unsigned int i = 0xCCFF;
std::cout << std::hex << i << std::endl;
i = ( ((i<<8) & 0xFFFF) | ((i >>8) & 0xFFFF)); // swaps the bytes
std::cout << std::hex << i << std::endl;
}

A bit old fashioned, but still a good bit of fun.
XOR swap: ( see How does XOR variable swapping work? )
#include <iostream>
#include <stdint.h>
int main()
{
uint16_t x = 0x1234;
uint8_t *a = reinterpret_cast<uint8_t*>(&x);
std::cout << std::hex << x << std::endl;
*(a+0) ^= *(a+1) ^= *(a+0) ^= *(a+1);
std::cout << std::hex << x << std::endl;
}

This is a problem:
byte2[i-8] = s;
cout << byte2[i];//<--should be i-8 as well
This is causing a buffer overrun.
However, that's not a great way to do it. Look into the bit shift operators << and >>.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Big Endian and Little Endian for Files in C++ - c++

You should use the standard functions ntohl() and kin for this. They operate on explicit sized variables (i.e. uint16_t and uin32_t) rather than compiler-specific long, which necessary for portability. Some platforms provide 64-bit versions in <endian.h>

In your example, data is 0x12345678. Your first assignment to byte is therefore: byte = 0x12000000; which won't fit in a byte, so it gets truncated to zero. try: byte = (data & bitmask) >> (sizeof(long) - 1) * 8);

because you are masking off the top byte from an integer and then not shifting it back down 24 bits ... Change your loop to: for(long i = 0; i < sizeof(long); i++) { byte = (data & bitmask) >> 24; data <<= 8; }

Related

Why are there two variants of the implementation of CRCs in software

c++: how to put relevant bits from uint32 into uint8?

Bitwise operator to calculate checksum

CRC24Q implementation

Integer Byte Swapping in C++

Categories

Resources