I'm doing reverse-engineering stuff and patching a game's memory via a DLL. Usually I stick to the same old way of patching everything in one or a few functions, but it feels like it could be done better with a struct array that defines the memory writes to perform, looping through them all in one go. Much easier to manage, IMO.
I want to make it constant, though, so the data is all there in one go (in .rdata) instead of dynamically allocating memory for each patch. That's simple enough with byte-sized data, for example:
struct struc_patch
{
BYTE val[8]; // max size of each patch (usually I only use 5 bytes anyway for call and jmp writes)
// I can of course increase this if really needed
void *dest;
char size;
} patches[] =
{
// simply write "01 02 03 04" to 0x400000
{{0x1, 0x2, 0x3, 0x4}, (void*)0x400000, 4},
};
//[...]
for (const struc_patch &p : patches)
{
memcpy(p.dest, p.val, p.size);
}
But when I want to get fancier with the types, I find no way to specify an integer like "0x90909090" as the byte array "90 90 90 90". So this won't work:
struct struc_patch
{
BYTE val[8]; // max size of each patch (usually I only use 5 bytes anyway for call and jmp writes)
// I can of course increase this if really needed
void *dest;
char size;
} patches[] =
{
// how to write "jmp MyHook"? Here, the jmp offset will be truncated instead of overlapping in the array. Annoying.
{{0xE9, (DWORD)&MyHook - 0x400005}, (void*)0x400000, 5},
};
Of course the major problem is that &MyHook has to be resolved by the compiler. Any other way to get the desired result and keep it const?
I've got little experience with the STL, to be honest, so if there is a solution using that, I might need it explained in detail in order to understand the code properly. I'm a big C/C++/WinAPI junkie lol, and the game is written in a similar style, so it fits.
I don't think anything from the STL will help you with this, at least not at compile time.
There might be a fancy way of doing with templates what you did with macros (comma-separating the bytes).
But I recommend doing something simple like this:
#pragma pack(push, 1) // required so the opcode byte is immediately followed by the 4-byte offset
struct jump_insn
{
unsigned char opcode;
unsigned long addr;
} jump_insns[] = {
{0xe9, (unsigned long)&MyHook - 0x400005}
};
#pragma pack(pop)
struct mem
{
unsigned char val[8];
} mems[] = {
{1,2,3,4}
};
struct struc_patch
{
unsigned char *val; // points at the bytes for this patch (can now be any length)
void *dest;
char size;
} patches[] =
{
// simply write "01 02 03 04" to 0x400000
{(unsigned char*)(&mems[0]), (void*)0x400000, 4},
// write "jmp MyHook" (5 bytes: E9 opcode + relative offset)
{(unsigned char*)(&jump_insns[0]), (void*)0x400000, 5},
};
You can't do everything inline and you will need new types for different kinds of patches, but they can be arbitrarily long (not just 8 bytes) and, if you declare the arrays const, everything can live in read-only data (.rdata with MSVC).
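For instance (my own addition; the 0x401000 target address is made up for illustration), a call patch is just one more small packed type alongside the jump one:
#pragma pack(push, 1) // keep the opcode byte and the offset adjacent, as with jump_insn
struct call_insn
{
    unsigned char opcode;
    unsigned long offset;
} call_insns[] = {
    {0xe8, (unsigned long)&MyHook - (0x401000 + 5)} // call MyHook, written at 0x401000
};
#pragma pack(pop)
Then &call_insns[0] can be referenced from patches[] exactly like the other entries.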
A better way to handle that is to calculate the address difference on the fly. For instance (source):
#define INST_CALL 0xE8
void InterceptLocalCode(BYTE bInst, DWORD pAddr, DWORD pFunc, DWORD dwLen)
{
BYTE *bCode = new BYTE[dwLen];
::memset(bCode, 0x90, dwLen);
DWORD dwFunc = pFunc - (pAddr + 5);
bCode[0] = bInst;
*(DWORD *)&bCode[1] = dwFunc;
WriteBytes((void*)pAddr, bCode, dwLen);
delete[] bCode;
}
void PatchCall(DWORD dwAddr, DWORD dwFunc, DWORD dwLen)
{
InterceptLocalCode(INST_CALL, dwAddr, dwFunc, dwLen);
}
dwAddr is the address to put the call instruction in, dwFunc is the function to call and dwLen is the length of the instruction to replace (basically used to calculate how many NOPs to put in).
To summarize, my solution (thanks to Nicolas' suggestion):
#pragma pack(push)
#pragma pack(1)
#define POFF(d,a) ((DWORD)(d) - ((a) + 5))
struct jump_insn
{
const BYTE opcode = 0xE9;
DWORD offset;
jump_insn(DWORD off) : offset(off) {} // opcode keeps its default 0xE9
};
struct jump_short_insn
{
const BYTE opcode = 0xEB;
BYTE offset;
jump_short_insn(BYTE off) : offset(off) {}
};
struct struc_patch
{
void *data;
void *dest;
char size;
};
#pragma pack(pop)
And in use:
// Patches
jump_insn JMP_HOOK_LoadButtonTextures = {POFF(&HOOK_LoadButtonTextures, 0x400000)};
struc_patch patches[] =
{
{&JMP_HOOK_LoadButtonTextures, IntToPtr(0x400000), sizeof(jump_insn)},
};
Using in-class member constants I can define everything much more easily and cleanly, and it can all simply be memcpy'd. The pack pragma is of course required to ensure that memcpy doesn't copy the 3 padding bytes between the BYTE opcode and the DWORD value.
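For completeness, here is a minimal sketch of actually applying the table (my addition, not part of the original post; the ApplyPatches name is made up). Game code pages are usually write-protected, so it flips the protection with VirtualProtect first:
// assumes <windows.h> and <cstring> are already included
void ApplyPatches()
{
    for (const struc_patch &p : patches)
    {
        DWORD oldProtect;
        if (VirtualProtect(p.dest, p.size, PAGE_EXECUTE_READWRITE, &oldProtect))
        {
            memcpy(p.dest, p.data, p.size); // copy the pre-built instruction bytes over the target
            VirtualProtect(p.dest, p.size, oldProtect, &oldProtect); // restore the old protection
        }
    }
}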
Thanks all, helped me make my patching methods a lot more robust.
Related
I am trying to port embedded code to the Windows platform.
I have come across the problem below; I am posting sample code here.
Even after I use an INT24 type, the size remains 12 bytes on Windows. Why?
#include <windows.h> // INT32
#include <tchar.h>
#include <iostream>
using namespace std;
struct INT24
{
INT32 data : 24;
};
struct myStruct
{
INT32 a;
INT32 b;
INT24 c;
};
int _tmain(int argc, _TCHAR* argv[])
{
unsigned char myArr[11] = { 0x00,0x00,0x00,0x00, 0x00,0x00,0x00,0x00, 0xFF,0xFF,0xFF };
myStruct *p = (myStruct*)myArr;
cout << sizeof(*p);
return 0;
}
There are two reasons, each of which would be enough on its own.
Presumably the size of INT32 is 4 bytes. The size of INT24 is also 4 bytes, because it contains an INT32 bit field. Since myStruct contains 3 members of size 4, its size must therefore be at least 12.
Presumably the alignment requirement of INT32 is 4. So even if the size of INT24 were 3, the size of myStruct would still have to be 12, because myStruct must have at least the alignment requirement of INT32, and therefore its size must be padded to the nearest multiple of 4.
Is there any way or workaround?
This is implementation-specific, but the following hack may work for some compiler/CPU combinations. See your compiler's manual for the syntax of a similar feature, and your target CPU's manual for whether it supports unaligned memory access. Also realize that unaligned memory access does have a performance penalty.
#pragma pack(push, 1)
struct INT24
{
INT32 data : 24;
};
#pragma pack(pop)
#pragma pack(push, 1)
struct myStruct
{
INT32 a;
INT32 b;
INT24 c;
};
#pragma pack(pop)
Packing a bit field might not work the same in all compilers. Be sure to check how yours behaves.
I think that a standard-compliant way would be to store char arrays of sizes 3 and 4, and whenever you need to read or write one of the integers, you'd have to std::memcpy the value. That would be a bit burdensome to implement and possibly also slower than the #pragma pack hack.
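A minimal sketch of that memcpy approach (my own illustration, assuming a little-endian target and unsigned values):
#include <cstdint>
#include <cstring>
struct UInt24
{
    unsigned char bytes[3]; // sizeof(UInt24) == 3, no padding possible in a char array
    std::uint32_t get() const
    {
        std::uint32_t v = 0;
        std::memcpy(&v, bytes, 3); // fills the low 3 bytes on a little-endian machine
        return v;
    }
    void set(std::uint32_t v)
    {
        std::memcpy(bytes, &v, 3); // stores the low 3 bytes, dropping the high byte
    }
};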
Sadly for you, the compiler in optimising the code for a particular architecture reserves the right to pad out the structure by inserting spaces between members and even at the end of the structure.
Using a bit field does not reduce the size of the struct; you still get the whole of the "fielded" type in the struct.
The standard guarantees that the address of the first member of a struct is the same as the address of the struct, unless it's a polymorphic type.
But all is not lost: you can rely on the fact that an array of char will always be contiguous and contain no packing.
If CHAR_BIT is defined as 8 on your system (it probably is), you can model an array of 24-bit types on an array of char. If it's not 8, then even this approach will not work; I'd then suggest resorting to inline assembly.
There are many situations (especially in low-level programming), where the binary layout of the data is important. For example: hardware/driver manipulation, network protocols, etc.
In C++ I can read/write arbitrary binary structures using char* and bitwise operations (masks and shifts), but that's tedious and error-prone. Obviously, I try to limit the scope of these operations and encapsulate them in higher-level APIs, but it's still a pain.
C++ bitfields seem to offer a developer-friendly solution to this problem, but unfortunately their storage is implementation specific.
NathanOliver mentioned std::bitset, which basically allows you to access individual bits of an integer with a nice operator[], but lacks accessors for multi-bit fields.
Using meta-programming and/or macros, it's possible to abstract the bitwise operations in a library. Since I don't want to reinvent the wheel, I'm looking for a (preferably STL or boost) library that does that.
For the record, I'm looking into this for a DNS resolver, but the problem and its solution should be generic.
Edit: short answer: it turns out bit-field storage is reliable in practice (even if it's not mandated by the standard), since system/network libraries use them and yield well-behaved programs when compiled with mainstream compilers.
From the C++14 standard (N3797 draft), section 9.6 [class.bit], paragraph 1:
Allocation of bit-fields within a class object is implementation-defined.
Alignment of bit-fields is implementation-defined. Bit-fields are packed into some addressable allocation unit.
[ Note: Bit-fields straddle allocation units on some machines and not on others. Bit-fields are assigned right-to-left on some machines, left-to-right on others. — end note ]
Although notes are non-normative, every implementation I'm aware of uses one of two layouts: either big-endian or little endian bit order.
Note that:
You must specify padding manually. This implies that you must know the size of your types (e.g. by using <cstdint>).
You must use unsigned types.
The preprocessor macros for detecting the bit order are implementation-dependent.
Usually the bit order endianness is the same as the byte order endianness. I believe there is a compiler flag to override it, though, but I can't find it.
For examples, look in netinet/tcp.h and other nearby headers.
Edit by OP: for example tcp.h defines
struct
{
u_int16_t th_sport; /* source port */
u_int16_t th_dport; /* destination port */
tcp_seq th_seq; /* sequence number */
tcp_seq th_ack; /* acknowledgement number */
# if __BYTE_ORDER == __LITTLE_ENDIAN
u_int8_t th_x2:4; /* (unused) */
u_int8_t th_off:4; /* data offset */
# endif
# if __BYTE_ORDER == __BIG_ENDIAN
u_int8_t th_off:4; /* data offset */
u_int8_t th_x2:4; /* (unused) */
# endif
// ...
}
And since it works with mainstream compilers, it means bit-field memory layout is reliable in practice.
Edit:
This is portable within one endianness:
struct Foo {
uint16_t x: 10;
uint16_t y: 6;
};
But this may not be portable, because the fields straddle a 16-bit allocation unit:
struct Foo {
uint16_t x: 10;
uint16_t y: 12;
uint16_t z: 10;
};
And this may not be portable, because it has implicit padding:
struct Foo {
uint16_t x: 10;
};
We have this in production code where we had to port MIPS code to x86-64
https://codereview.stackexchange.com/questions/54342/template-for-endianness-free-code-data-always-packed-as-big-endian
Works well for us.
It's basically a template without any storage, the template arguments specify the position of the relevant bits.
If you need multiple fields, you put multiple specializations of the template together in a union, together with an array of bytes to provide storage.
The template has overloads for assignment of value and a conversion operator to unsigned for reading the value.
In addition, if the fields are larger than a byte, they are stored in big-endian byte order, which is sometimes useful when implementing cross-platform protocols.
here's a usage example:
union header
{
unsigned char arr[2]; // space allocation, 2 bytes (16 bits)
BitFieldMember<0, 4> m1; // first 4 bits
BitFieldMember<4, 5> m2; // The following 5 bits
BitFieldMember<9, 6> m3; // The following 6 bits; 15 of the 16 allocated bits are used
};
int main()
{
header a;
memset(a.arr, 0, sizeof(a.arr));
a.m1 = rand();
a.m3 = a.m1;
a.m2 = ~a.m1;
return 0;
}
It's simple to implement bit fields with known positions with C++:
template<typename T, int POS, int SIZE>
struct BitField {
T *data;
BitField(T *data) : data(data) {}
operator int() const {
return ((*data) >> POS) & ((1ULL << SIZE)-1);
}
BitField& operator=(int x) {
T mask( ((1ULL << SIZE)-1) << POS );
*data = (*data & ~mask) | ((x << POS) & mask);
return *this;
}
};
The above toy implementation allows, for example, defining a 12-bit field in an unsigned long long variable with
unsigned long long var;
BitField<unsigned long long, 7, 12> muxno(&var);
and the generated code to access the field value is just
0000000000000020 <_Z6getMuxv>:
20: 48 8b 05 00 00 00 00 mov 0x0(%rip),%rax ; Get &var
27: 48 8b 00 mov (%rax),%rax ; Get content
2a: 48 c1 e8 07 shr $0x7,%rax ; >> 7
2e: 25 ff 0f 00 00 and $0xfff,%eax ; keep 12 bits
33: c3 retq
Basically what you'd have to write by hand
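Hypothetical usage of the toy BitField above (my addition), writing a value into bits 7..18 of var and reading it back:
unsigned long long var = 0;
BitField<unsigned long long, 7, 12> muxno(&var);
muxno = 0x5A3;      // var now holds 0x5A3 << 7; anything outside the 12-bit mask is discarded
int value = muxno;  // reads back 0x5A3 via ((var >> 7) & 0xFFF)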
I have written an implementation of bit fields in C++ as a library header file. An example I give in the documentation is that, instead of writing this:
struct A
{
union
{
struct
{
unsigned x : 5;
unsigned a0 : 2;
unsigned a1 : 2;
unsigned a2 : 2;
}
u;
struct
{
unsigned x : 5;
unsigned all_a : 6;
}
v;
};
};
// …
A x;
x.v.all_a = 0x3f;
x.u.a1 = 0;
you can write:
typedef Bitfield<Bitfield_traits_default<> > Bf;
struct A : private Bitfield_fmt
{
F<5> x;
F<2> a[3];
};
typedef Bitfield_w_fmt<Bf, A> Bwf;
// …
Bwf::Format::Define::T x;
BITF(Bwf, x, a) = 0x3f;
BITF(Bwf, x, a[1]) = 0;
There's an alternative interface, under which the last two lines of the above would change to:
#define BITF_U_X_BWF Bwf
#define BITF_U_X_BASE x
BITF(X, a) = 0x3f;
BITF(X, a[1]) = 0;
Using this implementation of bit fields, the traits template parameter gives the programmer a lot of flexibility. Memory is just processor memory by default, or it can be an abstraction, with the programmer providing functions to perform "memory" reads and writes. The abstracted memory is a sequence of elements of any unsigned integral type (chosen by the programmer). Fields can be laid out either from least-to-most or most-to-least significance. The layout of fields in memory can be the reverse of what they are in the format structure.
The implementation is located at: https://github.com/wkaras/C-plus-plus-library-bit-fields
(As you can see, I unfortunately was not able to fully avoid use of macros.)
I have created a library for that:
Portable Bitfields
It works similarly to the solution provided by @CpusPuzzle.
Basic example:
enum class Id
{
f1, f2, f3
};
using namespace jungles;
using Register = Bitfields<
uint16_t,
Field{.id = Id::f1, .size = 3},
Field{.id = Id::f2, .size = 9},
Field{.id = Id::f3, .size = 4}>;
Register r{0}; // start from an all-zero register
r.at<Id::f1>() = 0b101;
r.at<Id::f2>() = 0b001111100;
r.at<Id::f3>() = 0b0110;
ASSERT(r.extract<Id::f1>() == 0b1010000000000000);
ASSERT(r.extract<Id::f2>() == 0b0000011111000000);
ASSERT(r.extract<Id::f3>() == 0b0000000000000110);
ASSERT(r.serialize() == 0b1010011111000110);
Deserialization:
Register r{0b0101110001110110};
// XXXYYYYYYYYYZZZZ
ASSERT(r.at<Id::f1>() == 0b010);
ASSERT(r.at<Id::f2>() == 0b111000111);
ASSERT(r.at<Id::f3>() == 0b0110);
C is designed for low-level bit manipulation. It's easy enough to declare a buffer of unsigned chars, and set it to any bit pattern you want. Especially if your bit strings are very short so fit into one of the integral types.
One potential problem is byte endianness. C can't "see" this at all, but just as integers have an endianness, so too do bytes when serialised. Another is the very small number of machines that don't use octets for bytes. C guarantees a byte shall be at least an octet, but CHAR_BIT values of 9 and 32 exist in real-world implementations. In those circumstances, you have to decide whether to simply ignore the upper bits (in which case naive code should work) or treat them as part of the bitstream (in which case you've got to be careful to fold CHAR_BIT into your calculations). It's also hard to test the code, as you're unlikely to find it easy to get your hands on a CHAR_BIT == 32 machine.
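As an illustration of that unsigned-char-buffer approach (my own sketch, assuming CHAR_BIT == 8 and MSB-first bit numbering within the buffer):
#include <cstddef>
#include <cstdint>
std::uint32_t get_bits(const unsigned char *buf, std::size_t pos, std::size_t width)
{
    std::uint32_t v = 0;
    for (std::size_t i = 0; i < width; ++i)
    {
        std::size_t b = pos + i;
        v = (v << 1) | ((buf[b / 8] >> (7 - b % 8)) & 1u); // take the MSB of each byte first
    }
    return v;
}
void set_bits(unsigned char *buf, std::size_t pos, std::size_t width, std::uint32_t v)
{
    for (std::size_t i = 0; i < width; ++i)
    {
        std::size_t b = pos + i;
        unsigned char mask = (unsigned char)(1u << (7 - b % 8));
        if ((v >> (width - 1 - i)) & 1u)
            buf[b / 8] |= mask;
        else
            buf[b / 8] &= (unsigned char)~mask;
    }
}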
I've been trying to use 'thunking' so I can pass member functions to legacy APIs that expect a C function. I'm trying to use a solution similar to this. This is my thunk structure so far:
struct Thunk
{
byte mov; // ↓
uint value; // mov esp, 'value' <-- replace the return address with 'this' (since this thunk was called with 'call', we can replace the 'pushed' return address with 'this')
byte call; // ↓
int offset; // call 'offset' <-- we want to return here for ESP alignment, so we use call instead of 'jmp'
byte sub; // ↓
byte esp; // ↓
byte num; // sub esp, 4 <-- pop the 'this' pointer from the stack
//perhaps I should use 'ret' here as well/instead?
} __attribute__((packed));
The following code is a test of mine which uses this thunk structure (but it does not yet work):
#include <iostream>
#include <sys/mman.h>
#include <cstdio>
typedef unsigned char byte;
typedef unsigned short ushort;
typedef unsigned int uint;
typedef unsigned long ulong;
#include "thunk.h"
template<typename Target, typename Source>
inline Target brute_cast(const Source s)
{
static_assert(sizeof(Source) == sizeof(Target));
union { Target t; Source s; } u;
u.s = s;
return u.t;
}
void Callback(void (*cb)(int, int))
{
std::cout << "Calling...\n";
cb(34, 71);
std::cout << "Called!\n";
}
struct Test
{
int m_x = 15;
void Hi(int x, int y)
{
printf("X: %d | Y: %d | M: %d\n", x, y, m_x);
}
};
int main(int argc, char * argv[])
{
std::cout << "Begin Execution...\n";
Test test;
Thunk * thunk = static_cast<Thunk*>(mmap(nullptr, sizeof(Thunk),
PROT_EXEC | PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, 0, 0));
thunk->mov = 0xBC; // mov esp
thunk->value = reinterpret_cast<uint>(&test);
thunk->call = 0xE8; // call
thunk->offset = brute_cast<uint>(&Test::Hi) - reinterpret_cast<uint>(thunk);
thunk->offset -= 10; // Adjust the relative call
thunk->sub = 0x83; // sub
thunk->esp = 0xEC; // esp
thunk->num = 0x04; // 'num'
// Call the function
Callback(reinterpret_cast<void (*)(int, int)>(thunk));
std::cout << "End execution\n";
}
If I use that code, I receive a segmentation fault within the Test::Hi function. The reason is obvious (once you analyze the stack in GDB), but I do not know how to fix it: the stack is not laid out properly.
The x argument contains garbage but the y argument contains the this pointer (see the Thunk code). That means the stack is misaligned by 8 bytes, but I still don't know why this is the case. Can anyone tell me why this is happening? x and y should contain 34 and 71 respectively.
NOTE: I'm aware of the fact that this does not work in all scenarios (such as MI and the VC++ thiscall convention), but I want to see if I can get this to work, since I would benefit from it a lot!
EDIT: Obviously I also know that I can use static functions, but I see this more as a challenge...
Suppose you have a standalone (non-member, or maybe static) cdecl function:
void Hi_cdecl(int x, int y)
{
printf("X: %d | Y: %d\n", x, y); // no m_x here: it's a standalone function
}
Another function calls it this way:
push 71
push 34
push (return-address)
call (address-of-hi)
add esp, 8 (stack cleanup)
You want to replace this by the following:
push 71
push 34
push this
push (return-address)
call (address-of-hi)
add esp, 4 (cleanup of this from stack)
add esp, 8 (stack cleanup)
For this, you have to read the return-address from the stack, push this, and then, push the return-address. And for the cleanup, add 4 (not subtract) to esp.
Regarding the return address - since the thunk must do some cleanup after the callee returns, it must store the original return-address somewhere, and push the return-address of the cleanup part of the thunk. So, where to store the original return-address?
In a global variable - might be an acceptable hack (since you probably don't need your solution to be reentrant)
On the stack - requires moving the whole block of parameters (using a machine-language equivalent of memmove), whose length is pretty much unknown
Please also note that the resulting stack is not 16-byte-aligned; this can lead to crashes if the function uses certain types (those that require 8-byte and 16-byte alignment - the SSE ones, for example; also maybe double).
Can anybody please explain what's going on?
My MSVC 2008 project's structure member alignment is set to 16 bytes (/Zp16); however, one of the following structures gets padded to a 16-byte multiple and the other only to an 8-byte multiple... WHY?!!!
struct HashData
{
void *pData;
const char* pName;
int crc;
bool bModified;
}; // sizeof (HashData) == 4 + 4 + 4 + 1 + padding = 16 bytes, ok
class StringHash
{
HashData data[1024];
int mask;
int size;
}; // sizeof(StringHash) == 1024 * 16 + 4 + 4 + 0 = 16392 bytes, why not 16400 bytes?
This may not look like a big deal, but it's a big problem for me, since I am forced to emulate the MSVC structure alignment in GCC, and specifying the aligned(16) attribute makes sizeof(StringHash) == 16400!
Please tell me, when and why MSVC overrides the /Zp16 setting, I absolutely can't fathom it...
I think you misunderstood the /Zp16 option.
MSDN says,
When you specify this option, each structure member after the first is
stored on either the size of the member type or n-byte boundaries
(where n is 1, 2, 4, 8, or 16), whichever is smaller.
Please note the "whichever is smaller". It doesn't say that the struct will be padded to a multiple of 16. Rather, it defines the boundary of each member relative to the others, starting from the first member.
What you basically want is align (C++) attribute, which says
Use __declspec(align(#)) to precisely control the alignment of user-defined data
So try this:
__declspec(align(16)) struct StringHash
{
HashData data[1024];
int mask;
int size;
};
std::cout << sizeof(StringHash) << std::endl;
It should print what you expect.
Or you can use #pragma pack(16).
Consider using the pack pragma directive:
// Set packing to 16 byte alignment
#pragma pack(16)
struct HashData
{
void *pData;
const char* pName;
int crc;
bool bModified;
};
class StringHash
{
HashData data[1024];
int mask;
int size;
};
// Restore default packing
#pragma pack()
See: pack and Working with Packing Structures
I am trying to perform a less-than-32-bit read over the PCI bus to a VME bridge chip (Tundra Universe II), which will then go onto the VME bus and be picked up by the target.
The target VME application only accepts D32 (a data width read of 32bits) and will ignore anything else.
If I use a bit-field structure mapped over a VME window (mmap'd into main memory) I CAN read bit fields >24 bits, but anything less fails, i.e.:
struct works {
unsigned int a:24;
};
struct fails {
unsigned int a:1;
unsigned int b:1;
unsigned int c:1;
};
struct main {
works work;
fails fail;
};
volatile struct main *reg = function_that_creates_and_maps_the_vme_windows_returns_address();
This shows that the works struct is read with a 32-bit access, but a read via the fails struct, e.g. reg->fail.a, gets factored down to an X-bit read (where X might be 16 or 8?).
So the questions are :
a) Where is this scaled down? Compiler? OS? or the Tundra chip?
b) What is the actual size of the read operation performed?
I basically want to rule out everything but the chip. Documentation on that is on the web, but if it can be proved that the data width requested over the PCI bus is 32 bits, then the problem can be blamed on the Tundra chip!
Edit:
Concrete example; the code was:
struct SVersion
{
unsigned title : 8;
unsigned pecversion : 8;
unsigned majorversion : 8;
unsigned minorversion : 8;
} Version;
So now I have changed it to this :-
union UPECVersion
{
struct SVersion
{
unsigned title : 8;
unsigned pecversion : 8;
unsigned majorversion : 8;
unsigned minorversion : 8;
} Version;
unsigned int dummy;
};
And the base main struct :-
typedef struct SEPUMap
{
...
...
UPECVersion PECVersion;
};
So I still have to change all my baseline code
// perform dummy 32bit read
pEpuMap->PECVersion.dummy;
// get the bits out
x = pEpuMap->PECVersion.Version.minorversion;
And how do I know that the second read won't actually do a real read again, as my original code did (instead of using the already-read bits via the union)?
Your compiler is adjusting the size of your struct to a multiple of its memory alignment setting. Almost all modern compilers do this. On some processors, variables and instructions have to begin on memory addresses that are multiples of some memory alignment value (often 32-bits or 64-bits, but the alignment depends on the processor architecture). Most modern processors don't require memory alignment anymore - but almost all of them see substantial performance benefit from it. So the compilers align your data for you for the performance boost.
However, in many cases (such as yours) this isn't the behavior you want. The size of your structure, for various reasons, can turn out to be extremely important. In those cases, there are various ways around the problem.
One option is to force the compiler to use different alignment settings. The options for doing this vary from compiler to compiler, so you'll have to check your documentation. It's usually a #pragma of some sort. On some compilers (the Microsoft compilers, for instance) it's possible to change the memory alignment for only a very small section of code. For example (in VC++):
#pragma pack(push) // save the current alignment
#pragma pack(1) // set the alignment to one byte
// Define variables that are alignment sensitive
#pragma pack(pop) // restore the alignment
Another option is to define your variables in other ways. Intrinsic types are not resized based on alignment, so instead of your 24-bit bitfield, another approach is to define your variable as an array of bytes.
Finally, you can just let the compilers make the structs whatever size they want and manually record the size that you need to read/write. As long as you're not concatenating structures together, this should work fine. Remember, however, that the compiler is giving you padded structs under the hood, so if you make a larger struct that includes, say, a works and a fails struct, there will be padded bits in between them that could cause you problems.
On most compilers, it's going to be darn near impossible to create a data type smaller than 8 bits. Most architectures just don't think that way. This shouldn't be a huge problem because most hardware devices that use datatypes of smaller than 8-bits end up arranging their packets in such a way that they still come in 8-bit multiples, so you can do the bit manipulations to extract or encode the values on the data stream as it leaves or comes in.
For all of the reasons listed above, a lot of code that works with hardware devices like this work with raw byte arrays and just encode the data within the arrays. Despite losing a lot of the conveniences of modern language constructs, it ends up just being easier.
I am wondering about the value of sizeof(struct fails). Is it 1? In this case, if you perform the read by dereferencing a pointer to a struct fails, it looks correct to issue a D8 read on the VME bus.
You can try to add a field unsigned int unused:29; to your struct fails.
The size of a struct is not equal to the sum of the size of its fields, including bit fields. Compilers are allowed, by the C and C++ language specifications, to insert padding between fields in a struct. Padding is often inserted for alignment purposes.
The common method in embedded systems programming is to read the data as an unsigned integer then use bit masking to retrieve the interesting bits. This is due to the above rule that I stated and the fact that there is no standard compiler parameter for "packing" fields in a structure.
I suggest creating an object (class or struct) for interfacing with the hardware. Let the object read the data, then extract the bits as bool members. This puts the implementation as close to the hardware as possible; the remaining software should not care how the bits are implemented. (A sketch follows at the end of this answer.)
When defining bit field positions / named constants, I suggest this format:
#define VALUE (1 << BIT_POSITION)
// OR
const unsigned int VALUE = 1 << BIT_POSITION;
This format is more readable and has the compiler perform the arithmetic. The calculation takes place at compile time and has no run-time impact.
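A minimal sketch of such an interface object (my own illustration; the register layout and bit positions are hypothetical):
#include <cstdint>
class StatusRegister
{
public:
    // Hypothetical bit positions, for illustration only
    static const std::uint32_t READY_BIT = 1u << 0;
    static const std::uint32_t FAULT_BIT = 1u << 3;
    explicit StatusRegister(volatile std::uint32_t *reg)
        : value_(*reg) // one full 32-bit read of the hardware register
    {
    }
    bool ready() const { return (value_ & READY_BIT) != 0; }
    bool fault() const { return (value_ & FAULT_BIT) != 0; }
private:
    std::uint32_t value_; // snapshot; later queries never touch the bus again
};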
As an example, the Linux kernel has inline functions that explicitly handle memory-mapped IO reads and writes. In newer kernels it's a big macro wrapper that boils down to an inline assembly movl instruction, but in older kernels it was defined like this:
#define readl(addr) (*(volatile unsigned int *) (addr))
#define writel(b,addr) ((*(volatile unsigned int *) (addr)) = (b))
Ian - if you want to be sure of the size of the things you're reading/writing, I'd suggest not using structs like this to do it - it's possible the sizeof of the fails struct is just 1 byte - the compiler is free to decide what it should be based on optimizations etc. I'd suggest reading/writing explicitly using ints, or generally whatever you need to assure the sizes of, and then converting to a union/struct where you don't have those limitations.
It is the compiler that decides what size read to issue. To force a 32 bit read, you could use a union:
union dev_word {
struct dev_reg {
unsigned int a:1;
unsigned int b:1;
unsigned int c:1;
} fail;
uint32_t dummy;
};
volatile union dev_word *vme_map_window();
If reading the union through a volatile-qualified pointer isn't enough to force a read of the whole union (I would think it would be - but that could be compiler-dependent), then you could use a function to provide the required indirection:
volatile union dev_word *real_reg; /* Initialised with vme_map_window() */
union dev_word * const *reg_func(void)
{
static union dev_word local_copy;
static union dev_word * const static_ptr = &local_copy;
local_copy = *real_reg;
return &static_ptr;
}
#define reg (*reg_func())
...then (for compatibility with the existing code) your accesses are done as:
reg->fail.a
The method described earlier of using the gcc flag -fstrict-volatile-bitfields and defining bitfield variables as volatile u32 works, but the total number of bits defined must be greater than 16.
For example:
typedef union{
vu32 Word;
struct{
vu32 LATENCY :3;
vu32 HLFCYA :1;
vu32 PRFTBE :1;
vu32 PRFTBS :1;
};
}tFlashACR;
.
tFLASH* const pFLASH = (tFLASH*)FLASH_BASE;
#define FLASH_LATENCY pFLASH->ACR.LATENCY
.
FLASH_LATENCY = Latency;
causes gcc to generate code
.
ldrb r1, [r3, #0]
.
which is a byte read. However, changing the typedef to
typedef union{
vu32 Word;
struct{
vu32 LATENCY :3;
vu32 HLFCYA :1;
vu32 PRFTBE :1;
vu32 PRFTBS :1;
vu32 :2;
vu32 DUMMY1 :8;
vu32 DUMMY2 :8;
};
}tFlashACR;
changes the resultant code to
.
ldr r3, [r2, #0]
.
I believe the only solution is to:
1) edit/create my main struct as all 32-bit ints (unsigned longs)
2) keep my original bit-field structs
3) for each access I require:
3.1) read the struct member as a 32-bit word and cast it into the bit-field struct,
3.2) read the bit-field element I require (and for writes, set the bit-field and write the word back!)
(1) is a shame, because then I lose the intrinsic types of each member of the "main/SEPUMap" struct.
End solution :-
Instead of :-
printf("FirmwareVersionMinor: 0x%x\n", pEpuMap->PECVersion);
This :-
SPECVersion ver = *(SPECVersion*)&pEpuMap->PECVersion;
printf("FirmwareVersionMinor: 0x%x\n", ver.minorversion);
Only problem I have is writing! (Writes are now read/modify/writes!)
// Read - Get current
_HVPSUControl temp = *(_HVPSUControl*)&pEpuMap->HVPSUControl;
// Modify - set to new value
temp.OperationalRequestPort = true;
// Write
volatile unsigned int *addr = reinterpret_cast<volatile unsigned int*>(&pEpuMap->HVPSUControl);
*addr = *reinterpret_cast<volatile unsigned int*>(&temp);
Just have to tidy that code up into a method!
#define writel(addr, data) ( *(volatile unsigned long*)(&addr) = (*(volatile unsigned long*)(&data)) )
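One possible way to tidy that read/modify/write pattern into a helper (my sketch, assuming every register struct is exactly 32 bits wide):
#include <cstring>
template <typename Reg, typename Fn>
void ModifyRegister(volatile Reg *reg, Fn change)
{
    static_assert(sizeof(Reg) == sizeof(unsigned int), "register must be 32 bits wide");
    unsigned int word = *reinterpret_cast<volatile unsigned int *>(reg); // single 32-bit read
    Reg copy;
    std::memcpy(&copy, &word, sizeof copy);
    change(copy); // caller modifies the bit-fields on the local copy
    std::memcpy(&word, &copy, sizeof word);
    *reinterpret_cast<volatile unsigned int *>(reg) = word; // single 32-bit write back
}
// e.g. ModifyRegister(&pEpuMap->HVPSUControl, [](_HVPSUControl &r) { r.OperationalRequestPort = true; });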
I had the same problem on ARM using the GCC compiler, where writes to memory were done as bytes rather than as a 32-bit word.
The solution is to define the bit-fields using volatile uint32_t (or whatever size you need to write):
union {
volatile uint32_t XY;
struct {
volatile uint32_t XY_A : 4;
volatile uint32_t XY_B : 12;
};
};
but while compiling you need to add this parameter to gcc or g++:
-fstrict-volatile-bitfields
More on this in the GCC documentation.