I have a fairly large
std::vector<byte> source
and I need to read four bytes at an arbitrary offset in the vector (for example, bytes 10-13) and convert them to an integer.
int ByteVector2Int(std::vector &source, int offset)
{
return (source[offset] | source[offset + 1] << 8 | source[offset + 2] << 16 | source[offset + 3] << 24);
}
This method is called very often. How can I do this with maximum performance?
Use memcpy. You might be tempted to use reinterpret_cast, but then you can easily end up with undefined behavior (for instance due to alignment issues). Also, pass a vector by a const reference:
int f(const std::vector<std::byte>& v, size_t n)
{
int temp;
memcpy(&temp, v.data() + n, sizeof(int));
return temp;
}
Note that compilers are very good at optimizing this. In my case, GCC with -O2 resulted in:
mov rax, qword ptr [rdi]
mov eax, dword ptr [rax + rsi]
ret
So, there is no memcpy invoked and the assembly is minimal. Live demo: https://godbolt.org/z/oWGqej
UPDATE (based on question update)
After edit, you may also notice that the generated assembly is the very same (in my case) as for your approach:
int f2(const std::vector<std::byte>& v, size_t n)
{
return (int)(
(unsigned int)v[n]
+ ((unsigned int)v[n + 1] << 8)
+ ((unsigned int)v[n + 2] << 16)
+ ((unsigned int)v[n + 3] << 24) );
}
Live demo: https://godbolt.org/z/c9dE9W
Note that your code is not correct. First, the shifts are performed on std::byte operands, and the result of shifting a std::byte is again a std::byte, so the shifted-out bits are lost. Second, there is no implicit conversion from std::byte to int.
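If the bytes in the vector are known to be in little-endian order and the result should not depend on the host byte order, the shift-based form can be written with explicit widening and a bounds check. This is only a sketch: the name load_le32 is hypothetical, and it assumes the vector holds std::uint8_t rather than std::byte.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper: reads 4 bytes starting at `offset` as a little-endian
// 32-bit value, independent of the byte order of the host.
std::uint32_t load_le32(const std::vector<std::uint8_t>& v, std::size_t offset)
{
    // Written this way to avoid overflow in `offset + 4`.
    if (v.size() < 4 || offset > v.size() - 4)
        throw std::out_of_range("load_le32: offset out of range");

    // Widen each byte to 32 bits before shifting so no bits are lost.
    return  static_cast<std::uint32_t>(v[offset])
         | (static_cast<std::uint32_t>(v[offset + 1]) << 8)
         | (static_cast<std::uint32_t>(v[offset + 2]) << 16)
         | (static_cast<std::uint32_t>(v[offset + 3]) << 24);
}
```

The bounds check costs a compare and a branch per call, so in a hot loop it may be preferable to validate the range once outside the loop.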
Related
If you want to convert a uint64_t to a uint8_t[8] (little endian), then on a little-endian architecture you can just do an ugly reinterpret_cast<> or memcpy(), e.g.:
void from_memcpy(const std::uint64_t &x, uint8_t* bytes) {
std::memcpy(bytes, &x, sizeof(x));
}
This generates efficient assembly:
mov rax, qword ptr [rdi]
mov qword ptr [rsi], rax
ret
However, it is not portable. It will behave differently on a big-endian machine.
For converting uint8_t[8] to uint64_t there is a great solution - just do this:
void to(const std::uint8_t* bytes, std::uint64_t &x) {
x = (std::uint64_t(bytes[0]) << 8*0) |
(std::uint64_t(bytes[1]) << 8*1) |
(std::uint64_t(bytes[2]) << 8*2) |
(std::uint64_t(bytes[3]) << 8*3) |
(std::uint64_t(bytes[4]) << 8*4) |
(std::uint64_t(bytes[5]) << 8*5) |
(std::uint64_t(bytes[6]) << 8*6) |
(std::uint64_t(bytes[7]) << 8*7);
}
This looks inefficient but actually with Clang -O2 it generates exactly the same assembly as before, and if you compile on a big endian machine it will be smart enough to use a native byte swap instruction. E.g. this code:
void to(const std::uint8_t* bytes, std::uint64_t &x) {
x = (std::uint64_t(bytes[7]) << 8*0) |
(std::uint64_t(bytes[6]) << 8*1) |
(std::uint64_t(bytes[5]) << 8*2) |
(std::uint64_t(bytes[4]) << 8*3) |
(std::uint64_t(bytes[3]) << 8*4) |
(std::uint64_t(bytes[2]) << 8*5) |
(std::uint64_t(bytes[1]) << 8*6) |
(std::uint64_t(bytes[0]) << 8*7);
}
Compiles to:
mov rax, qword ptr [rdi]
bswap rax
mov qword ptr [rsi], rax
ret
My question is: is there an equivalent reliably-optimised construct for converting in the opposite direction? I've tried this, but it gets compiled naively:
void from(const std::uint64_t &x, uint8_t* bytes) {
bytes[0] = x >> 8*0;
bytes[1] = x >> 8*1;
bytes[2] = x >> 8*2;
bytes[3] = x >> 8*3;
bytes[4] = x >> 8*4;
bytes[5] = x >> 8*5;
bytes[6] = x >> 8*6;
bytes[7] = x >> 8*7;
}
Edit: After some experimentation, this code does get compiled optimally with GCC 8.1 and later as long as you use uint8_t* __restrict__ bytes. However I still haven't managed to find a form that Clang will optimise.
Here's what I could test based on the discussion in OP's comments:
void from_optimized(const std::uint64_t &x, std::uint8_t* bytes) {
std::uint64_t big;
std::uint8_t* temp = (std::uint8_t*)&big;
temp[0] = x >> 8*0;
temp[1] = x >> 8*1;
temp[2] = x >> 8*2;
temp[3] = x >> 8*3;
temp[4] = x >> 8*4;
temp[5] = x >> 8*5;
temp[6] = x >> 8*6;
temp[7] = x >> 8*7;
std::uint64_t* dest = (std::uint64_t*)bytes;
*dest = big;
}
Looks like this will make things clearer for the compiler and let it assume the necessary parameters to optimize it (both on GCC and Clang with -O2).
Compiling to x86-64 (little endian) on Clang 8.0.0 (test on Godbolt):
mov rax, qword ptr [rdi]
mov qword ptr [rsi], rax
ret
Compiling to aarch64_be (big endian) on Clang 8.0.0 (test on Godbolt):
ldr x8, [x0]
rev x8, x8
str x8, [x1]
ret
What about returning a value?
Easy to reason about and small assembly:
#include <cstdint>
#include <array>
auto to_bytes(std::uint64_t x)
{
std::array<std::uint8_t, 8> b;
b[0] = x >> 8*0;
b[1] = x >> 8*1;
b[2] = x >> 8*2;
b[3] = x >> 8*3;
b[4] = x >> 8*4;
b[5] = x >> 8*5;
b[6] = x >> 8*6;
b[7] = x >> 8*7;
return b;
}
https://godbolt.org/z/FCroX5
and big endian:
#include <stdint.h>
struct mybytearray
{
uint8_t bytes[8];
};
auto to_bytes(uint64_t x)
{
mybytearray b;
b.bytes[0] = x >> 8*0;
b.bytes[1] = x >> 8*1;
b.bytes[2] = x >> 8*2;
b.bytes[3] = x >> 8*3;
b.bytes[4] = x >> 8*4;
b.bytes[5] = x >> 8*5;
b.bytes[6] = x >> 8*6;
b.bytes[7] = x >> 8*7;
return b;
}
https://godbolt.org/z/WARCqN
(std::array not available for -target aarch64_be? )
First of all, the reason why your original from implementation cannot be optimized is because you are passing the arguments by reference and pointer. So, the compiler has to consider the possibility that both of them point to the very same address (or at least that they overlap). As you have 8 consecutive read and write operations to the (potentially) same address, the as-if rule cannot be applied here.
Note that just by removing the & from the function signature, apparently GCC already considers this as proof that bytes does not point into x and thus this can safely be optimized. However, for Clang this is not good enough.
Technically, of course bytes can point to from's stack memory (aka. to x), but I think that would be undefined behavior and thus Clang just misses this optimization.
Your implementation of to doesn't suffer from this issue because you have implemented it in such a way that first you read all the values of bytes and then you make one big assignment to x. So even if x and bytes point to the same address, as you do all the reading first and all the writing afterwards (instead of mixing reads and writes as you do in from), this can be optimized.
Flávio Toribio's answer works because it does precisely this: It reads all the values first and only then writes to the destination.
However, there are less complicated ways to achieve this:
void from(uint64_t x, uint8_t* dest) {
uint8_t bytes[8];
bytes[7] = uint8_t(x >> 8*7);
bytes[6] = uint8_t(x >> 8*6);
bytes[5] = uint8_t(x >> 8*5);
bytes[4] = uint8_t(x >> 8*4);
bytes[3] = uint8_t(x >> 8*3);
bytes[2] = uint8_t(x >> 8*2);
bytes[1] = uint8_t(x >> 8*1);
bytes[0] = uint8_t(x >> 8*0);
*(uint64_t*)dest = *(uint64_t*)bytes;
}
gets compiled to
mov qword ptr [rsi], rdi
ret
on little endian and to
rev x8, x0
str x8, [x1]
ret
on big endian.
Note, that even if you passed x by reference, Clang would be able to optimize this. However, that would result in one more instruction each:
mov rax, qword ptr [rdi]
mov qword ptr [rsi], rax
ret
and
ldr x8, [x0]
rev x8, x8
str x8, [x1]
ret
respectively.
Also note, that you can improve your implementation of to with a similar trick: Instead of passing the result by non-const reference, take the "more natural" approach and just return it from the function:
uint64_t to(const uint8_t* bytes) {
return
(uint64_t(bytes[7]) << 8*7) |
(uint64_t(bytes[6]) << 8*6) |
(uint64_t(bytes[5]) << 8*5) |
(uint64_t(bytes[4]) << 8*4) |
(uint64_t(bytes[3]) << 8*3) |
(uint64_t(bytes[2]) << 8*2) |
(uint64_t(bytes[1]) << 8*1) |
(uint64_t(bytes[0]) << 8*0);
}
Summary:
Don't pass arguments by reference.
Do all the reading first, then all the writing.
Here are the best solutions I could get to for both, little endian and big endian. Note, how to and from are truly inverse operations that can be optimized to a no-op if executed one after another.
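For reference, the "read everything first, then write once" idea from the summary can also be expressed without the pointer casts, using a local buffer and std::memcpy for the single write. This is a sketch, not the answerer's own code: the names store_le64 and load_le64 are hypothetical, and whether a given compiler folds each function into a single load or store should be checked on the target.

```cpp
#include <cstdint>
#include <cstring>

// Sketch: serialize x as 8 little-endian bytes.
// All shifts go into a local buffer first; dest is written exactly once.
void store_le64(std::uint64_t x, std::uint8_t* dest)
{
    std::uint8_t tmp[8];
    for (int i = 0; i < 8; ++i)
        tmp[i] = static_cast<std::uint8_t>(x >> (8 * i));
    std::memcpy(dest, tmp, sizeof tmp);  // no alignment or aliasing assumptions
}

// Sketch: the inverse operation, returning the value instead of writing
// through a reference.
std::uint64_t load_le64(const std::uint8_t* src)
{
    std::uint64_t x = 0;
    for (int i = 0; i < 8; ++i)
        x |= static_cast<std::uint64_t>(src[i]) << (8 * i);
    return x;
}
```

Because neither function casts the byte pointer to uint64_t*, the destination does not need to be 8-byte aligned.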
The code you've given is way overcomplicated. You can replace it with:
void from(uint64_t x, uint8_t* dest) {
x = htole64(x);
std::memcpy(dest, &x, sizeof(x));
}
Yes, this uses the Linux-ism htole64(), but if you're on another platform you can easily reimplement that.
Clang and GCC optimize this perfectly, on both little- and big-endian platforms.
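On a platform without htole64(), a replacement might look like the following. This is a minimal sketch: the names host_is_little_endian, byteswap64 and my_htole64 are hypothetical, and the endianness probe is done at run time with memcpy (compilers typically fold it to a constant, but that is an assumption to verify).

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical probe: true if the least significant byte is stored first.
inline bool host_is_little_endian()
{
    const std::uint16_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Reverses the order of the 8 bytes of x.
inline std::uint64_t byteswap64(std::uint64_t x)
{
    x = ((x & 0x00FF00FF00FF00FFull) << 8)  | ((x >> 8)  & 0x00FF00FF00FF00FFull);
    x = ((x & 0x0000FFFF0000FFFFull) << 16) | ((x >> 16) & 0x0000FFFF0000FFFFull);
    return (x << 32) | (x >> 32);
}

// Hypothetical stand-in for htole64(): host order to little-endian order.
inline std::uint64_t my_htole64(std::uint64_t x)
{
    return host_is_little_endian() ? x : byteswap64(x);
}
```

With this in place, the two-line from() above works unchanged if htole64 is replaced by my_htole64.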
Actual refined question:
Why does this not print 0?
#include "stdafx.h"
#include <iostream>
#include <string>
int _tmain(int argc, _TCHAR* argv[])
{
unsigned char barray[] = {1,2,3,4,5,6,7,8,9};
unsigned long weirdValue = barray[3] << 32;
std::cout << weirdValue; // prints 4
std::string bla;
std::getline(std::cin, bla);
return 0;
}
The disassembly of the shift operation:
10: unsigned long weirdValue = barray[3] << 32;
00411424 movzx eax,byte ptr [ebp-1Dh]
00411428 shl eax,20h
0041142B mov dword ptr [ebp-2Ch],eax
Original question:
I found the following snippet in some old code we maintain. It converts a byte array to multiple float values and adds the floats to a list. Why does it work for byte arrays greater than 4?
unsigned long ulValue = 0;
for (USHORT usIndex = 0; usIndex < m_oData.usNumberOfBytes; usIndex++)
{
if (usIndex > 0 && (usIndex % 4) == 0)
{
float* pfValue = (float*)&ulValue;
oValues.push_back(*pfValue);
ulValue = 0;
}
ulValue += (m_oData.pabyDataBytes[usIndex] << (8*usIndex)); // Why does this work for usIndex > 3??
}
I would understand that this works if << was a rotate operator, not a shift operator. Or if it was
ulValue += (m_oData.pabyDataBytes[usIndex] << (8*(usIndex%4)))
But the code as I found it just confuses me.
The code is compiled using VS 2005.
If I try the original snippet in the immediate window, it doesn't work though.
I know how to do this properly; I just want to know why the code, and especially the shift operation, works as it is.
Edit: The disassembly for the shift operation is:
13D61D0A shl ecx,3 // multiply uIndex by 8
13D61D0D shl eax,cl // shift to left, does nothing for multiples of 32
13D61D0F add eax,dword ptr [ulValue]
13D61D15 mov dword ptr [ulValue],eax
So the disassembly is fine.
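On x86, the shift count is masked to 5 bits, which limits the range to 0-31.
A shift of 32 is therefore the same as a shift of zero. In C++ itself, shifting by the width of the (promoted) operand or more is undefined behaviour, so this result cannot be relied on.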
http://x86.renejeschke.de/html/file_module_x86_id_285.html
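To avoid depending on the hardware's masking of the shift count, the shift can be guarded explicitly, or the index can be reduced modulo 4 as the question already suggests. The following is a small sketch; the helper names shl32 and place_byte are hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper: left shift that is well defined for any count.
// Counts of 32 or more yield 0, which is what a true shift would produce.
constexpr std::uint32_t shl32(std::uint32_t value, unsigned count)
{
    return count < 32 ? value << count : 0u;
}

// Hypothetical helper: places a byte at its position within the current
// 4-byte group, i.e. the (8 * (index % 4)) form from the question.
inline std::uint32_t place_byte(std::uint8_t byte, unsigned index)
{
    return static_cast<std::uint32_t>(byte) << (8 * (index % 4));
}
```

Note that shl32(4, 32) is 0, whereas the x86 shl instruction in the disassembly leaves the value 4 unchanged. The original loop only works because of the latter, so place_byte is the drop-in replacement for it.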
I've been trying to implement shift by vector in SSE2 intrinsics, but from experimentation and the intel intrinsic guide, it appears to only use the least-significant part of the vector.
To reword my question, given a vector {v1, v2, ..., vn} and a set of shifts {s1, s2, ..., sn}, how do I calculate a result {r1, r2, ..., rn} such that:
r1 = v1 << s1
r2 = v2 << s2
...
rn = vn << sn
since it appears that _mm_sll_epi* performs this:
r1 = v1 << s1
r2 = v2 << s1
...
rn = vn << s1
Thanks in advance.
EDIT:
Here's the code I have:
#include <iostream>
#include <cstdint>
#include <mmintrin.h>
#include <emmintrin.h>
namespace SIMD {
using namespace std;
class SSE2 {
public:
// flipped operands due to function arguments
SSE2(uint64_t a, uint64_t b, uint64_t c, uint64_t d) { low = _mm_set_epi64x(b, a); high = _mm_set_epi64x(d, c); }
uint64_t& operator[](int idx)
{
switch (idx) {
case 0:
_mm_storel_epi64((__m128i*)result, low);
return result[0];
case 1:
_mm_store_si128((__m128i*)result, low);
return result[1];
case 2:
_mm_storel_epi64((__m128i*)result, high);
return result[0];
case 3:
_mm_store_si128((__m128i*)result, high);
return result[1];
}
/* Undefined behaviour */
return 0;
}
SSE2& operator<<=(const SSE2& rhs)
{
low = _mm_sll_epi64(low, rhs.getlow());
high = _mm_sll_epi64(high, rhs.gethigh());
return *this;
}
void print()
{
uint64_t a[2];
_mm_store_si128((__m128i*)a, low);
cout << hex;
cout << a[0] << ' ' << a[1] << ' ';
_mm_store_si128((__m128i*)a, high);
cout << a[0] << ' ' << a[1] << ' ';
cout << dec;
}
__m128i getlow() const
{
return low;
}
__m128i gethigh() const
{
return high;
}
private:
__m128i low, high;
uint64_t result[2];
};
}
int main()
{
cout << "operator<<= test: vector << vector: ";
{
auto x = SIMD::SSE2(7, 8, 15, 10);
auto y = SIMD::SSE2(4, 5, 6, 7);
x.print();
y.print();
x <<= y;
if (x[0] != 112 || x[1] != 256 || x[2] != 960 || x[3] != 1280) {
cout << "FAILED: ";
x.print();
cout << endl;
} else {
cout << "PASSED" << endl;
}
}
return 0;
}
The expected results are {7 << 4 = 112, 8 << 5 = 256, 15 << 6 = 960, 10 << 7 = 1280}. The actual results seem to be {7 << 4 = 112, 8 << 4 = 128, 15 << 6 = 960, 10 << 6 = 640}, which isn't what I want.
Hope this helps, Jens.
If AVX2 is available, and your elements are 32 or 64 bits, your operation takes one variable-shift instruction: vpsrlvq, (__m128i _mm_srlv_epi64 (__m128i a, __m128i count) )
For 32bit elements with SSE4.1, see Shifting 4 integers right by different values SIMD. Depending on latency vs. throughput requirements, you can do separate shifts and then blend, or use a multiply (by a specially-constructed vector of powers of 2) to get variable-count left shifts and then do a same-count-for-all-elements right shift.
For your case, 64bit elements with runtime-variable shift counts:
There are only two elements per SSE vector, so we just need two shifts and then combine the results. We can do that with a pblendw, with a floating-point movsd (which may cause extra bypass-delay latency on some CPUs), with two shuffles, or with two ANDs and an OR.
__m128i SSE2_emulated_srlv_epi64(__m128i a, __m128i count)
{
__m128i shift_low = _mm_srl_epi64(a, count); // high 64 is garbage
__m128i count_high = _mm_unpackhi_epi64(count,count); // broadcast the high element
__m128i shift_high = _mm_srl_epi64(a, count_high); // low 64 is garbage
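// SSE4.1 (mask bits select 16-bit words from the second operand, so 0xF0 takes the high 64 bits from shift_high):
// return _mm_blend_epi16(shift_low, shift_high, 0xF0);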
#if 1 // use movsd to blend
__m128d blended = _mm_move_sd( _mm_castsi128_pd(shift_high), _mm_castsi128_pd(shift_low) ); // use movsd as a blend. Faster than multiple instructions on most CPUs, but probably bad on Nehalem.
return _mm_castpd_si128(blended);
#else // SSE2 without using FP instructions:
// if we're going to do it this way, we could have shuffled the input before shifting. Probably not helpful though.
shift_high = _mm_unpackhi_epi64(shift_high, shift_high); // broadcast the high64
return _mm_unpacklo_epi64(shift_low, shift_high); // combine: low64 from shift_low, high64 from shift_high
#endif
}
Other shuffles like pshufd or psrldq would work, but punpckhqdq gets the job done without needing an immediate byte, so it's one byte shorter. SSSE3 palignr could get the high element from one register and the low element from another register into one vector, but they'd be reversed (so we'd need a pshufd to swap high and low halves). shufpd would work to blend, but has no advantage over movsd.
See Agner Fog's microarch guide for the details of the potential bypass-delay latency from using an FP instruction between two integer instructions. It's probably fine on Intel SnB-family CPUs, because other FP shuffles are. (And yes, movsd xmm1, xmm0 runs on the shuffle unit in port5. Use movaps or movapd for reg-reg moves even of scalars if you don't need the merging behaviour).
This compiles (on Godbolt with gcc5.3 -O3) to
movdqa xmm2, xmm0 # tmp97, a
psrlq xmm2, xmm1 # tmp97, count
punpckhqdq xmm1, xmm1 # tmp99, count
psrlq xmm0, xmm1 # tmp100, tmp99
movsd xmm0, xmm2 # tmp102, tmp97
ret
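The question asked for left shifts, whereas the function above shifts right. A left-shift variant with the same structure might look like this. It is a sketch under the assumption that only SSE2 is available; the name sse2_emulated_sllv_epi64 is made up here to mirror the function above.

```cpp
#include <cstdint>
#include <emmintrin.h>  // SSE2

// Sketch: per-element 64-bit left shift, emulating AVX2's vpsllvq with SSE2.
// _mm_sll_epi64 shifts both elements by the low 64 bits of the count vector,
// so we shift twice and keep one valid element from each result.
__m128i sse2_emulated_sllv_epi64(__m128i a, __m128i count)
{
    __m128i shift_low  = _mm_sll_epi64(a, count);            // element 0 correct
    __m128i count_high = _mm_unpackhi_epi64(count, count);   // broadcast count[1]
    __m128i shift_high = _mm_sll_epi64(a, count_high);       // element 1 correct

    // movsd as a blend: low 64 bits from shift_low, high 64 bits from shift_high.
    __m128d blended = _mm_move_sd(_mm_castsi128_pd(shift_high),
                                  _mm_castsi128_pd(shift_low));
    return _mm_castpd_si128(blended);
}
```

Applied to the low half of the question's test case, {7, 8} shifted by {4, 5} gives {112, 256}.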
I'm really confused by how uint32_t pointers work in C++
I was just fiddling around trying to learn TEA, and I didn't understand why they passed a uint32_t* parameter to the encrypt function, and then in the function declared a uint32_t variable and assigned the parameter to it as if the parameter were an array.
Like this:
void encrypt (uint32_t* v, uint32_t* k) {
uint32_t v0=v[0], v1=v[1], sum=0, i;
So I decided to play around with uint32_t pointers, and wrote this short code:
int main ()
{
uint32_t *plain_text;
uint32_t key;
unsigned int temp = 123232;
plain_text = &temp;
key = 7744;
cout << plain_text[1] << endl;
return 0;
}
And it blew my mind when the output was the value of "key". I have no idea how it works... and then when I tried with plain_text[0], it came back with the value of "temp".
So I'm stuck as hell trying to understand what's happening.
Looking back at the TEA code, is the uint32_t* v pointing to an array rather than a single unsigned int? And was what I did just a fluke?
uint32_t is a type. It means unsigned 32-bit integer. On your system it is probably a typedef name for unsigned int.
There's nothing special about a pointer to this particular type; you can have pointers to any type.
The [] in C and C++ are actually pointer indexing notation. p[0] means to retrieve the value at the location the pointer points to. p[1] gets the value at the next memory location after that. Then p[2] is the next location after that, and so on.
You can use this notation with arrays too because the name of an array is converted to a pointer to its first element when used like this.
So, your code plain_text[1] tries to read the next element after temp. Since temp is not actually an array, this causes undefined behaviour. In your particular case, the manifestation of this undefined behaviour is that it managed to read the memory address after temp without crashing, and that address was the same address where key is stored.
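For comparison, here is a well-defined version of the same experiment, in which the pointer really does point at two consecutive uint32_t values, which is what a function like TEA's encrypt expects from its v parameter. This is only an illustration; sum_block and demo are hypothetical names, not part of TEA.

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-in for a function that, like encrypt(), expects v to
// point at (at least) two consecutive uint32_t values.
std::uint32_t sum_block(const std::uint32_t* v)
{
    return v[0] + v[1];  // both reads are inside the caller's array
}

std::uint32_t demo()
{
    // A real two-element array, so indexing with [1] is valid.
    std::array<std::uint32_t, 2> block{123232u, 7744u};
    return sum_block(block.data());
}
```

Here v[1] is guaranteed to be the second array element, rather than whatever variable the compiler happened to place after the first one.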
Formally your program has undefined behavior.
The expression plain_text[1] is equivalent to *(plain_text + 1) ([expr.sub] / 1). Although you can point to one past the end of an array (objects that aren't arrays are still considered single-element arrays for the purposes of pointer arithmetic ([expr.unary.op] / 3)), you cannot dereference this address ([expr.unary.op] / 1).
At this point the compiler can do whatever it wants to, in this case it has simply decided to treat the expression as if it were pointing to an array and that plain_text + 1, i.e. &temp + 1 points to the next uint32_t object in the stack, which in this case by coincidence is key.
You can see what's going on if you look at the assembly
mov DWORD PTR -16[rbp], 123232 ; unsigned int temp=123232;
lea rax, -16[rbp]
mov QWORD PTR -8[rbp], rax ; plain_text=&temp;
mov DWORD PTR -12[rbp], 7744 ; key=7744;
mov rax, QWORD PTR -8[rbp]
add rax, 4 ; plain_text[1], i.e. -16[rbp] + 4 == -12[rbp] == key
mov eax, DWORD PTR [rax]
mov edx, eax
mov rcx, QWORD PTR .refptr._ZSt4cout[rip]
call _ZNSolsEj ; std::ostream::operator<<(unsigned int)
mov rdx, QWORD PTR .refptr._ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_[rip]
mov rcx, rax
call _ZNSolsEPFRSoS_E ; std::ostream::operator<<(std::ostream& (*)(std::ostream&))
mov eax, 0
add rsp, 48
pop rbp
ret
In C and C++ arrays decay to pointers, resulting in array/pointer equivalence.
a[1]
when a is a pointer to, or an array of, a simple type is equivalent to
*(a + 1)
If a is an array of simple types, a will decay at the earliest opportunity to the address of element 0.
int arr[5] = { 0, 1, 2, 3, 4 };
int i = 10;
int* ptr;
ptr = arr;
std::cout << *ptr << "\n"; // outputs 0
ptr = &arr[0]; // same address
std::cout << *ptr << "\n"; // outputs 0
std::cout << ptr[4] << "\n"; // outputs 4
std::cout << *(ptr + 4) << "\n"; // outputs 4
ptr = &i;
std::cout << *ptr << "\n"; // outputs 10
std::cout << ptr[0] << "\n";
std::cout << ptr[1] << "\n"; // UNDEFINED BEHAVIOR.
std::cout << *(ptr + 1) << "\n"; // UNDEFINED BEHAVIOR.
To understand ptr[0] and ptr[1] you simply have to understand pointer arithmetic.
unsigned int temp = 123232; // In memory, four bytes are reserved for temp; plain_text points here
uint32_t key; // With this particular stack layout, the four bytes right after temp happen to hold key
Thus &plain_text[0] is the address of temp, and &plain_text[1] refers to the next four bytes, which in this case are at &key.
This layout is not guaranteed, but it would explain the behaviour.
Well my problem is as follows:
I'm trying to translate an x86 assembly source code to c++ source code.
Explanation as to what registers are.
skip this if you know what they are and how they work.
As you may or may not know, assembly language makes use of "general purpose registers".
In 32-bit x86 assembly these registers can be thought of as 4-byte variables (like an int in C++); their names are eax, ebx, ecx and edx.
Each of these registers contains a 16-bit sub-register (ax, bx, cx and dx respectively) that holds the least significant 2 bytes of the full register.
ax, bx, cx and dx are in turn split into ah, bh, ch and dh (most significant byte) and al, bl, cl and dl (least significant byte).
So, for example:
If I set eax:
EAX = 0xAB12CDEF
that would automatically change ax, al and ah
AX would become 0xCDEF
AH would become 0xCD
AL would become 0xEF
My question is: How do I make that possible in C++ ?
int eax, ax, ah, al;
eax = 0xAB12CDEF;
How can I make, ax, ah and al, change at the same time?
Or is it possible to make them pointers to different portions eax, if so, how?
Thanks!
P.S. Also, how could I make another variable of type char?
How could I make a new variable "char chAL" refer to al, which in turn refers to part of eax,
so that when I change chAL, the change is automatically reflected in eax, ax and al?
If your goal is to emulate X86 assembly code, then indeed you need to support the behaviour of X86 registers.
Here's a simple implementation using a union:
#include <iostream>
#include <cstdint>
using namespace std;
union reg_t {
uint64_t rx;
uint32_t ex;
uint16_t x;
struct {
uint8_t l;
uint8_t h;
};
};
int main(){
reg_t a;
a.rx = 0xdeadbeefcafebabe;
cout << "rax = " << hex << a.rx << endl;
cout << "eax = " << hex << a.ex << endl;
cout << "ax = " << hex << a.x << endl;
cout << "al = " << hex << (uint16_t)a.l << endl;
cout << "ah = " << hex << (uint16_t)a.h << endl;
cout << "ax & 0xFF = " << hex << (a.x & 0xFF) << endl;
cout << "(ah << 8) + al = " << hex << (a.h << 8) + a.l << endl;
}
output:
rax = deadbeefcafebabe
eax = cafebabe
ax = babe
al = be
ah = ba
ax & 0xFF = be
(ah << 8) + al = babe
You'll get the correct result on the right platform (little-endian). You'll have to swap
bytes, and/or add padding for other platforms.
That's the basic, down-to-earth solution, which will certainly work on many x86 platforms (at least x86/Linux/g++ works fine), but it relies on reading a union member other than the one last written, which is undefined behaviour in C++.
Here's another approach using a byte array to store register content:
class x86register {
uint8_t bytes[8];
public:
x86register &operator =(const uint64_t &v){
for (int i = 0; i < 8; i++)
bytes[i] = (v >> (i * 8)) & 0xff;
return *this;
}
x86register &operator =(const uint32_t &v){
for (int i = 0; i < 4; i++)
bytes[i] = (v >> (i * 8)) & 0xff;
return *this;
}
x86register &operator =(const uint16_t &v){
for (int i = 0; i < 2; i++)
bytes[i] = (v >> (i * 8)) & 0xff;
return *this;
}
x86register &operator =(const uint8_t &v){
bytes[0] = v;
return *this;
}
operator uint64_t(){
uint64_t res = 0;
for (int i = 7; i >= 0; i--)
res = (res << 8) + bytes[i];
return res;
}
operator uint32_t(){
uint32_t res = 0;
for (int i = 3; i >= 0; i--) // a 32-bit value occupies bytes[0..3]
res = (res << 8) + bytes[i];
return res;
}
operator uint16_t(){
uint16_t res = 0;
for (int i = 1; i >= 0; i--) // a 16-bit value occupies bytes[0..1]
res = (res << 8) + bytes[i];
return res;
}
operator uint8_t(){
return bytes[0];
}
};
This simple class should work regardless of the endianness of the running platform. Also, you probably want to add a few other accessors/mutators to handle the high byte (AH, BH, etc.) of word registers.
You can extract parts of eax using bitwise operations, like this:
int main()
{
int eax, ax, ah, al;
eax = 0xAB12CDEF;
ax = eax & 0x0000FFFF;
ah = (eax & 0x0000FF00) >> 8;
al = eax & 0x000000FF;
printf("ax = eax & 0x0000FFFF = 0x%X\n", ax);
printf("ah = (eax & 0x0000FF00) >> 8 = 0x%X\n", ah);
printf("al = eax & 0x000000FF = 0x%X\n", al);
}
Output
ax = eax & 0x0000FFFF = 0xCDEF
ah = (eax & 0x0000FF00) >> 8 = 0xCD
al = eax & 0x000000FF = 0xEF
You could also define macros like this:
#define AX(dw) ((dw) & 0x0000FFFF)
#define AH(dw) (((dw) & 0x0000FF00) >> 8)
#define AL(dw) ((dw) & 0x000000FF)
int main()
{
int eax = 0xAB12CDEF;
cout << "ax = " << hex << AX(eax) << endl; // prints ax = cdef
}
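The same extraction can be written as constexpr functions instead of macros, which gives type checking and avoids parenthesization mistakes. This is a sketch; the names ax_of, ah_of and al_of are hypothetical.

```cpp
#include <cstdint>

// Hypothetical function equivalents of the AX/AH/AL macros.
constexpr std::uint16_t ax_of(std::uint32_t dw)
{
    return static_cast<std::uint16_t>(dw & 0xFFFFu);        // low 16 bits
}

constexpr std::uint8_t ah_of(std::uint32_t dw)
{
    return static_cast<std::uint8_t>((dw >> 8) & 0xFFu);    // bits 8..15
}

constexpr std::uint8_t al_of(std::uint32_t dw)
{
    return static_cast<std::uint8_t>(dw & 0xFFu);           // bits 0..7
}
```

When printing the uint8_t results with cout, cast them to an integer type first, otherwise they are printed as characters.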
If you want it to work as simply as you've put the example ints, you can get away with it through reinterpret casts, though this violates pointer aliasing rules, so the behavior is undefined.
std::uint32_t eax = 0xAB12CDEF;
std::uint16_t& ax = reinterpret_cast<std::uint16_t*>(&eax)[1];
std::uint8_t& ah = reinterpret_cast<std::uint8_t&>(ax);
std::uint8_t& al = (&ah)[1];
The second line casts the address of eax to a std::uint16_t*; applying [1] to that gives the second 16-bit half of the 32 bits in memory. On a big-endian machine that half is ax; on little-endian x86 the low half comes first, so the index would have to be [0].
The third line is just a cast to uint8_t, which (again on a big-endian layout) works because ah is the first byte of ax.
Indexing into the address of ah by 1 gives the following byte, which is al.
What you're trying to do seems pretty unsafe and strange, though. To get the most similar behavior in the sanest way, you could just use a custom type. The results of the class below are consistent from machine to machine, whereas the casts above depend on the machine's endianness.
class Reg {
private:
std::uint32_t data_;
public:
Reg(std::uint32_t in) : data_{in} { }
std::uint32_t ex() const {
return data_;
}
std::uint16_t x() const {
return static_cast<std::uint16_t>(data_ & 0xFFFF);
}
std::uint8_t h() const {
return static_cast<std::uint8_t>((data_ & 0xFF00) >> 8);
}
std::uint8_t l() const {
return static_cast<std::uint8_t>(data_ & 0xFF);
}
};
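The class above is read-only, while the question also wants a change to al or ah to show up in eax. One way to get that is to keep a single 32-bit value and give it setters that replace only the relevant bits. This is a sketch; the class name MutableReg and the set_* names are hypothetical.

```cpp
#include <cstdint>

// Sketch: one 32-bit value with sub-register getters and setters.
// Every view is computed from data_, so a write through any setter is
// immediately visible through all getters.
class MutableReg {
    std::uint32_t data_ = 0;
public:
    std::uint32_t ex() const { return data_; }
    std::uint16_t x()  const { return static_cast<std::uint16_t>(data_ & 0xFFFFu); }
    std::uint8_t  h()  const { return static_cast<std::uint8_t>((data_ >> 8) & 0xFFu); }
    std::uint8_t  l()  const { return static_cast<std::uint8_t>(data_ & 0xFFu); }

    void set_ex(std::uint32_t v) { data_ = v; }
    // Replace bits 0..15, keep the upper 16 bits (like writing ax).
    void set_x(std::uint16_t v)  { data_ = (data_ & 0xFFFF0000u) | v; }
    // Replace bits 8..15 only (like writing ah).
    void set_h(std::uint8_t v)   { data_ = (data_ & 0xFFFF00FFu) | (static_cast<std::uint32_t>(v) << 8); }
    // Replace bits 0..7 only (like writing al).
    void set_l(std::uint8_t v)   { data_ = (data_ & 0xFFFFFF00u) | v; }
};
```

Since everything is done with masks and shifts on one integer, the behaviour does not depend on the endianness of the host.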