I know C++11 has some standard facilities that would allow me to get integral values from unaligned memory. How could something like this be written in a more standard way?
template <class R>
inline R get_unaligned_le(const unsigned char p[], const std::size_t s) {
    R r = 0;
    for (std::size_t i = 0; i < s; i++)
        r |= (*p++ & 0xff) << (i * 8); // take the low 8 bits of each char
    return r;
}
To read values stored in little-endian order, you can then write:
uint_least16_t value1 = get_unaligned_le<uint_least16_t>(&buffer[0], 2);
uint_least32_t value2 = get_unaligned_le<uint_least32_t>(&buffer[2], 4);
How did the integral values get into the unaligned memory to begin with?
If they were memcpyed in, then you can use memcpy to get them out.
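For instance (a minimal sketch, not part of the original answer; read_u32 is just an illustrative name):
#include <cstdint>
#include <cstring>

// If a uint32_t was memcpyed into the buffer, memcpy recovers it with the
// same (native) byte order and without any alignment problems.
std::uint32_t read_u32(const unsigned char* buffer)
{
    std::uint32_t value;
    std::memcpy(&value, buffer, sizeof value);
    return value;
}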
If they were read from a file or the network, you have to know their format: how they were written to begin with. If they are four byte big-endian 2's complement (the usual network format), then something like:
// Supposes native int is at least 32 bits...
unsigned
getNetworkInt( unsigned char const* buffer )
{
    return buffer[0] << 24
         | buffer[1] << 16
         | buffer[2] << 8
         | buffer[3];
}
This will work for any unsigned type, provided the type you're aiming for is at least as large as the type you input. For signed, it depends on just how portable you want to be. If all of your potential target machines are 2's complement, and will have an integral type with the same size as your input type, then you can use exactly the same code as above. If your native machine is a 1's complement 36 bit machine (e.g. a Unisys mainframe), and you're reading signed network format integers (32 bit 2's complement), you'll need some additional logic.
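As a hedged illustration of that extra logic (not part of the original answer), one option is to read the value as unsigned with getNetworkInt above and fold it into the signed range by hand:
// Sketch only: decode a 32-bit two's complement network value without
// assuming the host uses two's complement. getNetworkInt is the unsigned
// reader shown above; the most negative value (-2^31) may still be
// unrepresentable on exotic hosts and would need its own handling.
long getNetworkSigned(unsigned char const* buffer)
{
    unsigned long u = getNetworkInt(buffer) & 0xFFFFFFFFUL;
    if (u <= 0x7FFFFFFFUL)
        return static_cast<long>(u);                    // non-negative values map directly
    return -static_cast<long>(0xFFFFFFFFUL - u) - 1;    // u - 2^32, computed without overflow
}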
As always, create the desired variable and populate it byte-wise:
#include <algorithm>
#include <cassert>
#include <type_traits>

template <typename R>
R get(unsigned char * p, std::size_t len = sizeof(R))
{
    static_assert(std::is_trivially_copyable<R>::value, "R must be trivially copyable");
    assert(len >= sizeof(R));
    R result;
    std::copy(p, p + sizeof(R), reinterpret_cast<unsigned char *>(&result));
    return result;
}
This only works universally for trivially copyable types, though you can probably use it for non-trivial types if you have additional guarantees from elsewhere.
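Usage would look something like this (an illustrative sketch; buffer is a placeholder name, and the bytes come out in native byte order):
#include <cstdint>

// Assumes the get<R>() template above is in scope.
void example(unsigned char* buffer)
{
    auto value16 = get<std::uint16_t>(buffer);       // copies 2 bytes
    auto value32 = get<std::uint32_t>(buffer + 2);   // copies the next 4 bytes
    (void)value16;
    (void)value32;
}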
In this project I am supposed to receive a packet, and cast a part of it to an unsigned integer and get both Big-Endian and Little-Endian results. Originally, I wanted to just cast a pointer inside the byte array (packet) to an unsigned integer type that would automatically put the value received in Big-Endian form, like (uint32_be_t*)packet; similar to the way that it's automatically put into Little-Endian form when doing (uint32_t*)packet.
Since I couldn't find a type that automatically did this, I decided to create my own structure called "u32" which has the methods "get," which gets the value in Big-Endian form, and "get_le," which gets the value in Little-Endian form. However, I noticed that when I do this I get a negative result from the Little-Endian result.
struct u32 {
    u8 data[4] = {};

    uint32_t get() {
        return ((uint32_t)data[3] << 0)
             | ((uint32_t)data[2] << 8)
             | ((uint32_t)data[1] << 16)
             | ((uint32_t)data[0] << 24);
    }

    uint32_t get_le() {
        return ((uint32_t)data[3] << 24)
             | ((uint32_t)data[2] << 16)
             | ((uint32_t)data[1] << 8)
             | ((uint32_t)data[0] << 0);
    }
};
In order to simulate a packet, I just created a character array and then cast a u32* to it like so:
int main() {
char ary[] = { 0x00, 0x00, 0x00, (char)0xF4 };
u32* v = (u32*)ary;
printf("%d %d\n", v->get(), v->get_le());
return 0;
}
But then I get the results: 244 -201326592
Why is this happening? The return type of "get_le" is uint32_t, and the first function, "get," which is supposed to return the Big-Endian unsigned integer, performs correctly.
As a side note, this was just a test that popped into my head, so I went to the library to test it in-between classes, but unfortunately that means I have to use an online compiler (onlinegdb), but I figure it would work the same in Visual Studio. Also, if you have any suggestions as to how I could improve my code, it would be greatly appreciated. I am using Visual Studio 2019 and am allowed to use cstdlib.
Well, I daresay you want to use %u not %d in that printf() format-string!
%d assumes that the value is signed, so if the most-significant bit is 1 you get a minus sign.
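A tiny illustration (not from the original answer; it assumes a typical 32-bit int, and strictly speaking passing an unsigned argument to %d is not well-defined, which is exactly the point):
#include <cstdio>
#include <cstdint>

int main() {
    std::uint32_t x = 0xF4000000u;   // most-significant bit set
    std::printf("%d\n", x);          // prints -201326592: bits reinterpreted as signed
    std::printf("%u\n", x);          // prints 4093640704: the intended unsigned value
    return 0;
}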
There is a more elegant way to accomplish the same task. Just use uint32_t instead. You can use std::memcpy to convert between char arrays and uint32_t without invoking undefined behavior. This is what std::bit_cast does too. Reinterpreting a char* as an int* is undefined behavior. It is not the cause of your problem, because MSVC allows for it, but that's not really portable.
std::memcpy conversions or pointer casts will take place with native byte order, which is either little or big endian.
You can convert between byte orders using a builtin function. For MSVC, this would be:
_byteswap_ulong(x); // unsigned long is uint32_t on Windows
See the documentation of _byteswap_ulong. This compiles to just a single x86 bswap instruction, which your series of shifts is not guaranteed to produce, and that can improve performance by roughly a factor of 10. GCC and Clang have __builtin_bswap32 if you want portable code.
You can detect native endianness using std::endian or if you don't have C++20, __BYTE_ORDER__ macros. Converting to little-endian or big-endian would then just be doing nothing or performing a byte swap depending on your platform endianness.
#include <bit>      // std::endian (C++20)
#include <cstdint>
#include <cstdio>
#include <cstdlib>  // _byteswap_ulong on MSVC
#include <cstring>

uint32_t bswap(uint32_t x) {
    return _byteswap_ulong(x);
}

uint32_t to_be(uint32_t x) {
    return std::endian::native == std::endian::big ? x : bswap(x);
}

uint32_t to_le(uint32_t x) {
    return std::endian::native == std::endian::little ? x : bswap(x);
}

int main() {
    char ary[4] = { 0, 0, 0, (char) 0xF4 };
    uint32_t v;
    std::memcpy(&v, &ary, 4);
    printf("%u %u\n", to_be(v), to_le(v));
    return 0;
}
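Without C++20, a common fallback (a sketch relying on compiler-specific macros rather than anything standard) is:
#include <cstdint>

// Pre-C++20 sketch: GCC and Clang predefine __BYTE_ORDER__; the fallback
// below assumes a little-endian target when the macro is unavailable
// (for example MSVC, whose supported targets are all little-endian).
constexpr bool is_little_endian()
{
#if defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__)
    return __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__;
#else
    return true; // assumption: little-endian target
#endif
}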
I have a project in which I am getting a vector of 32-bit ARM instructions, and a part of the instructions (offset values) needs to be read as signed (two's complement) numbers instead of unsigned numbers.
I used a uint32_t vector because all the opcodes and registers are read as unsigned and the whole instruction was 32-bits.
For example:
I have this 32-bit ARM instruction encoding:
uint32_t addr = 0b00110001010111111111111111110110;
The last 19 bits are the offset of the branch that I need to read as signed integer branch displacement.
This part: 1111111111111110110
I have this function in which the parameter is the whole 32-bit instruction:
I am shifting left 13 places and then right 13 places again to have only the offset value and move the other part of the instruction.
I have tried this function casting to different signed variables, using different ways of casting and using other c++ functions, but it prints the number as it was unsigned.
int getCat1BrOff(uint32_t inst)
{
    uint32_t temp = inst << 13;
    uint32_t brOff = temp >> 13;
    return (int)brOff;
}
I get decimal number 524278 instead of -10.
A last option, which I don't think is the best one but might work, is to put all the binary digits in a string, invert the bits and add 1 to convert them, and then convert the new binary number back to decimal, as I would do it on paper. But that is not a good solution.
It boils down to doing a sign extension where the sign bit is the 19th one.
There are two ways.
Use arithmetic shifts.
Detect sign bit and or with ones at high bits.
There is no portable way to do 1. in C++, but whether the compiler behaves that way can be checked at compile time. Please correct me if the code below is UB, but I believe it is only implementation-defined, which is what we check for at compile time.
The only questionable things are the conversion to signed of an unsigned value that doesn't fit, and the right shift of a negative value, but both should be implementation-defined.
int getCat1BrOff(uint32_t inst)
{
    if constexpr (int32_t(0xFFFFFFFFu) >> 1 == int32_t(0xFFFFFFFFu))
    {
        return int32_t(inst << uint32_t{13}) >> int32_t{13};
    }
    else
    {
        int32_t offset = inst & 0x0007FFFF;
        if (offset & 0x00040000)
        {
            offset |= 0xFFF80000;
        }
        return offset;
    }
}
or a more generic solution
template <uint32_t N>
int32_t signExtend(uint32_t value)
{
    static_assert(N > 0 && N <= 32);
    constexpr uint32_t unusedBits = (uint32_t(32) - N);
    if constexpr (int32_t(0xFFFFFFFFu) >> 1 == int32_t(0xFFFFFFFFu))
    {
        return int32_t(value << unusedBits) >> int32_t(unusedBits);
    }
    else
    {
        constexpr uint32_t mask = uint32_t(0xFFFFFFFFu) >> unusedBits;
        value &= mask;
        if (value & (uint32_t(1) << (N - 1)))
        {
            value |= ~mask;
        }
        return int32_t(value);
    }
}
https://godbolt.org/z/rb-rRB
In practice, you just need to declare temp as signed:
int getCat1BrOff(uint32_t inst)
{
    int32_t temp = inst << 13;
    return temp >> 13;
}
Unfortunately this is not portable:
For negative a, the value of a >> b is implementation-defined (in most implementations, this performs arithmetic right shift, so that the result remains negative).
But I have yet to meet a compiler that doesn't do the obvious thing here.
I'm trying to encrypt a std::vector with XTEA. Because std::vector brings various benefits when dealing with big amounts of data, I want to use it.
The XTEA algorithm uses two unsigned longs (v0 and v1), which together hold the 64 bits of data to encrypt.
xtea_enc(unsigned char buf[], int length, unsigned char key[], unsigned char** outbuf)
/* Source http://pastebin.com/uEvZqmUj */
unsigned long v0 = *((unsigned long*)(buf+n));
unsigned long v1 = *((unsigned long*)(buf+n+4));
My problem is that I'm looking for the best way to convert my char vector into an unsigned long pointer.
Or is there another way to split the vector into 64-bit parts for the encryption function?
The insight comes in realizing that each char is a byte; thus a 64 bit number consists of 8 bytes or two 32 bit numbers.
Thus one 32-bit number can store 4 bytes, so for each 8-byte block in your char vector you would store a pair of 4-byte groups in a pair of 32-bit numbers. You would then pass this pair in to your xtea function, something like:
uint32_t datablock[2];
datablock[0] = (buf[0] << 24) | (buf[1] << 16) | (buf[2] << 8) | (buf[3]);
datablock[1] = (buf[4] << 24) | (buf[5] << 16) | (buf[6] << 8) | (buf[7]);
where in this example buf is of type char[8] (or, more appropriately, uint8_t[8]).
The bit-shift '<<' operator shifts a given byte's bits into the position where they should be stored in the uint32_t (thus, for example, the first byte in the above example ends up in the most significant 8 bits of datablock[0]). The '|' operator combines all the bits so that you end up with the full 32-bit number. Hope that makes sense.
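As a rough sketch (not from the original answer) of how that packing could look over a whole std::vector, with load_block as a hypothetical helper name:
#include <cstdint>
#include <vector>

// Hypothetical helper: pack one 8-byte block of the vector (starting at
// offset n) into the two 32-bit words XTEA expects. Assumes a big-endian
// interpretation of the bytes and that n + 8 <= buf.size().
void load_block(const std::vector<unsigned char>& buf, std::size_t n,
                std::uint32_t& v0, std::uint32_t& v1)
{
    v0 = (std::uint32_t(buf[n])     << 24) | (std::uint32_t(buf[n + 1]) << 16)
       | (std::uint32_t(buf[n + 2]) <<  8) |  std::uint32_t(buf[n + 3]);
    v1 = (std::uint32_t(buf[n + 4]) << 24) | (std::uint32_t(buf[n + 5]) << 16)
       | (std::uint32_t(buf[n + 6]) <<  8) |  std::uint32_t(buf[n + 7]);
}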
My problem is that I'm looking for the best way to convert my char vector into an unsigned long pointer.
((unsigned long*)vec.data()) since C++11, or ((unsigned long*)&vec[0]) pre-C++11?
PS: I guess someone will come along and argue that it should be a reinterpret_cast<unsigned long*>() or something sooner or later, and they'll probably be right.
Also, I used a std::string, but here's how I did the encipher loop:
string message = readMessage();
for (size_t i = 0; i < message.length(); i += 8)
{
    encipher(32, (uint32_t *)&message[i], keys);
}
// now message is encrypted
and
for (size_t i = 0; i < message.length(); i += 8)
{
    decipher(32, (uint32_t *)&message[i], keys);
}
// now message is decrypted (still may have padding bytes tho)
And I just used the sample C encipher/decipher functions from XTEA's Wikipedia page.
Despite the fact that big-endian computers are not very widely used, I want to store the double datatype in a platform-independent format.
For int, this is really simple, since bit shifts make that very convenient.
int number;
const int size = sizeof(number);
char bytes[size];
for (int i = 0; i < size; ++i)
    bytes[size-1-i] = (number >> 8*i) & 0xFF;
This code snippet stores the number in big-endian format, regardless of the machine it is run on. What is the most elegant way to do this for double?
The best way, for portability and for taking the format into account, is to serialize/deserialize the mantissa and the exponent separately. For that you can use the frexp()/ldexp() functions.
For example, to serialize:
int exp;
unsigned long long mant;
mant = (unsigned long long)(ULLONG_MAX * frexp(number, &exp));
// then serialize exp and mant.
And then to deserialize:
// deserialize to exp and mant.
double result = ldexp ((double)mant / ULLONG_MAX, exp);
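A possible fleshed-out round trip (a sketch only, not the answer's exact code: it carries the sign separately, is not bit-exact, and split_double/join_double are illustrative names):
#include <cmath>
#include <climits>

// Sketch: split a double into exponent, mantissa and sign for serialization,
// and put it back together. How exp/mant/negative are actually written to
// the wire (byte order, field widths) is up to the chosen format.
void split_double(double number, int& exp, unsigned long long& mant, bool& negative)
{
    negative = std::signbit(number);
    double frac = std::frexp(std::fabs(number), &exp);   // frac is in [0.5, 1) or 0
    mant = (unsigned long long)(ULLONG_MAX * frac);
}

double join_double(int exp, unsigned long long mant, bool negative)
{
    double result = std::ldexp((double)mant / ULLONG_MAX, exp);
    return negative ? -result : result;
}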
The elegant thing to do is to limit the endianness problem to as small a scope as possible. That narrow scope is the I/O boundary between your program and the outside world. For example, the functions that send binary data to / receive binary data from some other application need to be aware of the endian problem, as do the functions that write binary data to / read binary data from some data file. Make those interfaces cognizant of the representation problem.
Make everything else blissfully ignorant of the problem. Use the local representation everywhere else. Represent a double precision floating point number as a double rather than an array of 8 bytes, represent a 32 bit integer as an int or int32_t rather than an array of 4 bytes, et cetera. Dealing with the endianness problem throughout your code is going to make your code bloated, error prone, and ugly.
The same. Any numeric object, including a double, is ultimately several bytes which are interpreted in a specific order according to endianness. So if you reverse the order of the bytes you get exactly the same value in the reversed endianness.
char *src_data;
char *dst_data;
for (size_t i = 0; i < N * sizeof(double); i++)
    *dst_data++ = src_data[i ^ mask];
// where mask = 7, if native == little endian
//       mask = 0, if native == big endian
The elegance lies in mask, which also handles short and integer types: it's sizeof(elem)-1 if the target and source endianness differ, and 0 if they match.
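Fleshed out into a complete function, it might look like this (a sketch, not the answer's exact code; the endianness probe and the function name are my additions):
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copies N doubles' worth of bytes, reversing each sizeof(double)-byte group
// when the native order is little-endian, so the output is big-endian either way.
void copy_doubles_big_endian(const char* src_data, char* dst_data, std::size_t N)
{
    std::uint16_t probe = 1;
    unsigned char low_byte_first;
    std::memcpy(&low_byte_first, &probe, 1);                  // 1 on little-endian hosts
    const std::size_t mask = low_byte_first ? sizeof(double) - 1 : 0;
    for (std::size_t i = 0; i < N * sizeof(double); ++i)
        *dst_data++ = src_data[i ^ mask];
}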
Not very portable and standards violating, but something like this:
std::array<unsigned char, 8> serialize_double( double const* d )
{
    std::array<unsigned char, 8> retval;
    char const* begin = reinterpret_cast<char const*>(d);

    union
    {
        uint8_t  i8s[8];
        uint16_t i16s[4];
        uint32_t i32s[2];
        uint64_t i64s;
    } u;
    u.i64s = 0x0001020304050607ull; // one byte order
    // u.i64s = 0x0706050403020100ull; // the other byte order

    for (size_t index = 0; index < 8; ++index)
    {
        retval[ u.i8s[index] ] = begin[index];
    }
    return retval;
}
might handle a platform with 8-bit chars, 8-byte doubles, and any crazy-ass byte ordering (i.e., big-endian in words but little-endian between words for 64-bit values, for example).
Now, this doesn't cover the endianness of doubles being different from that of 64-bit ints.
An easier approach might be to cast your double into a 64 bit unsigned value, then output that as you would any other int.
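A hedged sketch of that approach (double_to_be_bytes is an illustrative name; it assumes 8-byte IEEE-754 doubles that share the byte order of 64-bit integers):
#include <cstdint>
#include <cstring>

// Type-pun the double into a 64-bit unsigned integer via memcpy, then emit
// it big-endian like any other integer.
void double_to_be_bytes(double d, unsigned char out[8])
{
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    for (int i = 0; i < 8; ++i)
        out[7 - i] = static_cast<unsigned char>((bits >> (8 * i)) & 0xFF);
}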
void reverse_endian(double number, char (&bytes)[sizeof(double)])
{
    const int size = sizeof(number);
    memcpy(bytes, &number, size);
    for (int i = 0; i < size/2; ++i)
        std::swap(bytes[i], bytes[size-i-1]);
}
I want to read sizeof(int) bytes from a char* array.
a) In what scenario's do we need to worry if endianness needs to be checked?
b) How would you read the first 4 bytes either taking endianness into consideration or not.
EDIT: The sizeof(int) bytes that I have read need to be compared with an integer value.
What is the best approach to go about this problem?
Do you mean something like this?
char* a;
int i;
memcpy(&i, a, sizeof(i));
You only have to worry about endianness if the source of the data is from a different platform, like a device.
a) You only need to worry about "endianness" (i.e., byte-swapping) if the data was created on a big-endian machine and is being processed on a little-endian machine, or vice versa. There are many ways this can occur, but here are a couple of examples.
You receive data on a Windows machine via a socket. Windows employs a little-endian architecture while network data is "supposed" to be in big-endian format.
You process a data file that was created on a system with a different "endianness."
In either of these cases, you'll need to byte-swap all numbers that are bigger than 1 byte, e.g., shorts, ints, longs, doubles, etc. However, if you are always dealing with data from the same platform, endian issues are of no concern.
b) Based on your question, it sounds like you have a char pointer and want to extract the first 4 bytes as an int and then deal with any endian issues. To do the extraction, use this:
int n = *(reinterpret_cast<int *>(myArray)); // where myArray is your data
Obviously, this assumes myArray is not a null pointer; otherwise, this will crash since it dereferences the pointer, so employ a good defensive programming scheme.
To swap the bytes on Windows, you can use the ntohs()/ntohl() and/or htons()/htonl() functions defined in winsock2.h. Or you can write some simple routines to do this in C++, for example:
inline unsigned short swap_16bit(unsigned short us)
{
    return (unsigned short)(((us & 0xFF00) >> 8) |
                            ((us & 0x00FF) << 8));
}

inline unsigned long swap_32bit(unsigned long ul)
{
    return (unsigned long)(((ul & 0xFF000000) >> 24) |
                           ((ul & 0x00FF0000) >> 8)  |
                           ((ul & 0x0000FF00) << 8)  |
                           ((ul & 0x000000FF) << 24));
}
Depends on how you want to read them, I get the feeling you want to cast 4 bytes into an integer, doing so over network streamed data will usually end up in something like this:
int foo = *(int*)(stream+offset_in_stream);
The easy way to solve this is to make sure whatever generates the bytes does so in a consistent endianness. Typically the "network byte order" used by various TCP/IP stuff is best: the library routines htonl and ntohl work very well with this, and they are usually fairly well optimized.
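For instance (a minimal sketch; read_network_u32 is an illustrative name, and on Windows the declarations live in <winsock2.h> instead):
#include <arpa/inet.h>   // ntohl
#include <cstdint>
#include <cstring>

// Read 4 network-order (big-endian) bytes into a host-order value.
// memcpy avoids any alignment problems with the char buffer.
std::uint32_t read_network_u32(const char* bytes)
{
    std::uint32_t net;
    std::memcpy(&net, bytes, sizeof net);
    return ntohl(net);
}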
However, if network byte order is not being used, you may need to do things in other ways. You need to know two things: the size of an integer, and the byte order. Once you know that, you know how many bytes to extract and in which order to put them together into an int.
Some example code that assumes sizeof(int) is the right number of bytes:
#include <limits.h>

int bytes_to_int_big_endian(const char *bytes)
{
    int i;
    int result;

    result = 0;
    for (i = 0; i < sizeof(int); ++i)
        result = (result << CHAR_BIT) + (unsigned char)bytes[i];
    return result;
}

int bytes_to_int_little_endian(const char *bytes)
{
    int i;
    int result;

    result = 0;
    for (i = 0; i < sizeof(int); ++i)
        result += (unsigned char)bytes[i] << (i * CHAR_BIT);
    return result;
}
#ifdef TEST
#include <stdio.h>

int main(void)
{
    const int correct = 0x01020304;
    const char little[] = "\x04\x03\x02\x01";
    const char big[] = "\x01\x02\x03\x04";

    printf("correct: %0x\n", correct);
    printf("from big-endian: %0x\n", bytes_to_int_big_endian(big));
    printf("from little-endian: %0x\n", bytes_to_int_little_endian(little));
    return 0;
}
#endif
How about
int int_from_bytes(const char * bytes, _Bool reverse)
{
    if(!reverse)
        return *(int *)(void *)bytes;

    char tmp[sizeof(int)];
    for(size_t i = sizeof(tmp); i--; ++bytes)
        tmp[i] = *bytes;

    return *(int *)(void *)tmp;
}
You'd use it like this:
int i = int_from_bytes(bytes, SYSTEM_ENDIANNESS != ARRAY_ENDIANNESS);
If you're on a system where casting void * to int * may result in alignment conflicts, you can use
int int_from_bytes(const char * bytes, _Bool reverse)
{
    int tmp;
    if(reverse)
    {
        for(size_t i = sizeof(tmp); i--; ++bytes)
            ((char *)&tmp)[i] = *bytes;
    }
    else memcpy(&tmp, bytes, sizeof(tmp));
    return tmp;
}
You shouldn't need to worry about endianness unless you are reading the bytes from a source created on a different machine, e.g. a network stream.
Given that, can't you just use a for loop?
void ReadBytes(char * stream) {
    for (int i = 0; i < sizeof(int); i++) {
        char foo = stream[i];
    }
}
Are you asking for something more complicated than that?
You need to worry about endianness only if the data you're reading is composed of numbers which are larger than one byte.
If you're reading sizeof(int) bytes and expect to interpret them as an int, then endianness makes a difference. Essentially, endianness is the way in which a machine interprets a series of more than one byte into a numerical value.
Just use a for loop that moves over the array in sizeof(int) chunks.
Use the function ntohl (found in the header <arpa/inet.h>, at least on Linux) to convert from bytes in the network order (network order is defined as big-endian) to local byte-order. That library function is implemented to perform the correct network-to-host conversion for whatever processor you're running on.
Why read when you can just compare?
bool AreEqual(int i, char *data)
{
    return memcmp(&i, data, sizeof(int)) == 0;
}
If you are worried about endianness, then you need to convert all of the integers to some invariant form; htonl and ntohl are good examples.