Is it possible to get access to an individual member of a struct or class without knowing the names of its member variables?
I would like to do an "offsetof(struct, tyname)" without having the struct name or member variable name hard-coded, amongst other things.
thanks.
Sure. If you have a struct and you know the offset and the type of the member variable, you can access it using pointers.
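struct my_struct {
int member1;
char member2;
short member3;
char member4;
};
...
struct my_struct obj;
/* 6 = offset of member3 on typical x86 ABIs; offsetof(struct my_struct, member3) computes it portably */
short member3 = *((short*)((char*)&obj + 6));
That'll get the value of member3, which on a typical x86 compiler sits 6 bytes from the start of obj: member1 occupies bytes 0 to 3, member2 byte 4, and the compiler inserts one padding byte so that member3 is 2-byte aligned. However, you want to be careful. First of all, if the struct changes, your data will be garbage. We're casting all over the place, so you get no type safety, and the compiler won't warn you if something's awry. You'll also need to account for padding: the compiler aligns members to their natural boundaries, and packing the struct (for example with #pragma pack(1)) removes that padding and moves member3 to offset 5, so a hard-coded offset is only valid for one compiler and one set of flags.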
This isn't a pleasant thing to do, and I'd avoid it if I were you, but yes, it can be done.
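A variation on the same idea, sketched in C++ under the assumption that you can still name the member once at compile time: let offsetof compute the offset and copy the bytes out with std::memcpy, which avoids both a guessed constant and a misaligned or type-punned load. The helper names read_at and get_member3 are hypothetical.

```cpp
#include <cassert>
#include <cstddef>  // offsetof, std::size_t
#include <cstring>  // std::memcpy

struct my_struct {
    int member1;
    char member2;
    short member3;
    char member4;
};

// Hypothetical helper: copy a T out of the object at byte offset `off`.
template <typename T>
T read_at(const void* base, std::size_t off) {
    T value;
    std::memcpy(&value, static_cast<const char*>(base) + off, sizeof value);
    return value;
}

// Computed by the compiler for the current build (6 on typical x86 ABIs).
constexpr std::size_t member3_offset = offsetof(my_struct, member3);

short get_member3(const my_struct& obj) {
    return read_at<short>(&obj, member3_offset);
}
```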
C and C++ are compiled languages without built-in "reflection" features. This means that regardless of what you do and how you do it, the path will always start from an explicit hard-coded value, be it a member name or a compile-time offset value. So if you want to select a struct member based on some run-time key, you have no choice but to manually create a mapping of some kind from the key value to something that identifies a concrete struct member.
In C++, to identify a struct member at run time you can use pointers-to-members. In C your only choice is to use an offset value.
Another issue is, of course, specifying the type of the members, if your members can have different types. But you provided no details about that, so I can't say whether you need to deal with it or not.
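A minimal C++ sketch of such a manual mapping, assuming all selectable members share one type (int); the struct point and the functions int_members and set_member are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical struct; the selectable members all have type int.
struct point {
    int x;
    int y;
};

// The member names appear exactly once, in this hand-written table.
const std::map<std::string, int point::*>& int_members() {
    static const std::map<std::string, int point::*> table = {
        {"x", &point::x},
        {"y", &point::y},
    };
    return table;
}

// Sets the member selected by a run-time key; false if the key is unknown.
bool set_member(point& p, const std::string& key, int value) {
    const auto& table = int_members();
    auto it = table.find(key);
    if (it == table.end()) return false;
    p.*(it->second) = value;
    return true;
}
```

Members of different types would need either one map per type or a tagged entry (type code plus offset), which is the C-style variant.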
We had a similar problem some years ago: A huge struct of configuration information that we wanted to reflect on. So we wrote a Perl script to find the struct, parse its members, and output a C++ file that looked like:
struct ConfField
{ const char* name;
int type;
size_t offset;
};
ConfField confFields[] = {
{ "version", eUInt32, 0 },
{ "seqID", eUInt32, 4 },
{ "timestamp", eUInt64, 8 },
// ... lots more ...
{ 0, 0, 0 }
};
And we'd feed the script with the output from gcc -E.
Nowadays, I understand that gccxml can output an XML file representing any C++ source that gcc can compile, since it actually uses the g++ front end to do the parsing. So I'd recommend pairing it with an XML-parsing script (I'd use Python with the lxml library) to find out everything you ever wanted to know about your C++ source.
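Whether such a table is generated by a script or written by hand, filling the offset column with offsetof instead of literal numbers keeps it correct when padding changes. A sketch, assuming a hypothetical Config struct and eUInt32/eUInt64 type tags modelled on the table above; get_field is an illustrative name, not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical struct standing in for the real configuration struct.
struct Config {
    std::uint32_t version;
    std::uint32_t seqID;
    std::uint64_t timestamp;
};

enum FieldType { eUInt32, eUInt64 };

struct ConfField {
    const char* name;
    int type;
    std::size_t offset;
};

// Offsets come from the compiler, so padding is accounted for automatically.
const ConfField confFields[] = {
    {"version",   eUInt32, offsetof(Config, version)},
    {"seqID",     eUInt32, offsetof(Config, seqID)},
    {"timestamp", eUInt64, offsetof(Config, timestamp)},
    {nullptr, 0, 0}  // sentinel
};

// Looks a field up by name and returns its value widened to 64 bits.
// Returns false if the name is unknown.
bool get_field(const Config& cfg, const char* name, std::uint64_t& out) {
    const char* base = reinterpret_cast<const char*>(&cfg);
    for (const ConfField* f = confFields; f->name; ++f) {
        if (std::strcmp(f->name, name) != 0) continue;
        if (f->type == eUInt32) {
            std::uint32_t v;
            std::memcpy(&v, base + f->offset, sizeof v);
            out = v;
        } else {
            std::memcpy(&out, base + f->offset, sizeof out);
        }
        return true;
    }
    return false;
}
```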
Somewhere in your code you need to reference the data member in the struct by name. However, you can create a variable that is a pointer to a struct data member, and from then on you no longer need to reference it by name.
#include <cstdlib> // rand, RAND_MAX
struct foo
{
int member1;
int member2;
};
typedef int (foo::*intMemberOfFoo);
intMemberOfFoo getMember()
{
if (rand() > RAND_MAX / 2) return &foo::member1;
else return &foo::member2;
}
foo f;
void do_something()
{
intMemberOfFoo m = getMember();
f.*m = 0;
}
The technical answer is 'yes' because C++ is Turing-complete and you can do almost anything if you try hard enough. The more practical answer is probably 'no' since there is no safe and easy way of doing exactly what you want.
I agree with GMan. What exactly are you trying to do that makes you think you need this technique?
Well, you will have to set some things up first, but it can be done. Expanding on Samir's response:
#include <cstddef> // std::ptrdiff_t
struct my_struct {
int member1;
char member2;
short member3;
char member4;
};
you can create a table of offsets:
my_struct tmp;
// std::ptrdiff_t, because narrowing the pointer differences to int inside braces is ill-formed in C++11
std::ptrdiff_t my_struct_offsets[4]={
0,
(char*)&(tmp.member2)-(char*)&(tmp.member1),
(char*)&(tmp.member3)-(char*)&(tmp.member1),
(char*)&(tmp.member4)-(char*)&(tmp.member1)
};
This takes into account the different alignment rules on different systems.
I am not sure this is possible at all in standard C++, so "is this even possible?" could be a secondary way to put my question.
I have this binary data which I want to read and re-create using structs. This data is originally created as a stream with the content appended to a buffer, one field at a time; nothing special about that. I could simply read it as a stream, the same way it was written. Instead, I wanted to see whether I could let the compiler do the math for me by describing the binary data as a data structure.
The fields of the binary data have a predictable order, which allows them to be represented as a data type; the issue I am having is with the depth and variable length of repeating fields. I hope the example code below makes it clearer.
Simple Example
struct Common {
int length;
};
struct Boo {
long member0;
char member1;
};
struct FooSimple : Common {
int count;
Boo boo_list[];
};
char buffer[1024];
int index = 15;
((FooSimple *)buffer)->boo_list[index].member0;
Advanced Example
struct Common {
int length;
};
struct Boo {
long member0;
char member1;
};
struct Goo {
int count;
Boo boo_list[];
};
struct FooAdvanced : Common {
int count;
Goo goo_list[];
};
char buffer[1024];
int index0 = 5, index1 = 15;
((FooAdvanced *)buffer)->goo_list[index0].boo_list[index1].member0;
The examples are not supposed to relate. I re-used some code due to lack of creativity for unique names.
For the simple example, there is nothing unusual about it. The Boo struct is of fixed size, therefore the compiler can do the calculations just fine, to reach the member0 field.
For the advanced example, as far as I can tell, it isn't as trivial a case. The problem I see is that if I use the array subscript operator to select a Goo object from the inline array of Goo elements (goo_list), the compiler cannot compute the offset properly unless it makes some assumption, for instance that every preceding Goo element has zero Boo elements in its inline array (boo_list), or some other constant number. Naturally, that won't be the case.
Question(s):
What ways are there to get the compiler to do the offset computations despite the inline arrays having variable lengths? Unless I am missing something, I believe templates can't help at all, due to their compile-time nature.
Is this even possible to achieve in C++?
How do you handle instantiating a FooAdvanced object by feeding variable numbers of Goo and Boo elements to the goo_list and boo_list members, respectively?
If it is impossible, would I have to write some sort of wrapper code to handle the calculations instead?
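If the compiler can't do the arithmetic, a small wrapper that walks the buffer at run time can. A sketch under stated assumptions: the buffer was written field by field with no padding in native byte order, fixed-width std::int32_t/std::int64_t stand in for the question's int/long, and member0_at is a hypothetical name. Every Goo before the requested one has to be skipped by reading its own count, which is exactly the work the compiler cannot do statically.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Assumed wire layout (hypothetical): fields appended back to back,
// no padding, native byte order:
//   int32 length, int32 goo_count,
//   goo_count x { int32 boo_count, boo_count x { int64 member0, char member1 } }
constexpr std::size_t kBooWireSize = sizeof(std::int64_t) + sizeof(char);

template <typename T>
T read(const char* p) {
    T v;
    std::memcpy(&v, p, sizeof v);  // memcpy because p may be misaligned
    return v;
}

// Returns member0 of boo_list[index1] inside goo_list[index0].
// No bounds checking in this sketch.
std::int64_t member0_at(const char* buf, int index0, int index1) {
    const char* p = buf + 2 * sizeof(std::int32_t);  // skip length, goo_count
    for (int g = 0; g < index0; ++g) {
        std::int32_t boo_count = read<std::int32_t>(p);
        p += sizeof(std::int32_t) + boo_count * kBooWireSize;
    }
    p += sizeof(std::int32_t);  // skip this Goo's boo_count
    return read<std::int64_t>(p + index1 * kBooWireSize);
}
```

Repeated lookups would rescan the preceding records each time; building an index of Goo start offsets once is the usual refinement.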
Is it possible to write a preprocessor macro that automatically iterates over all members of a structure?
I have such a structure (automatically generated from a Simulink model):
typedef struct {
real_T driveStatusword;
real_T posSensor[2];
real_T softAbortDemand;
} ExtU_motionCtrlRTOS_T;
And a similar one:
struct CoreInputOffsets
{
uint32_t driveStatusword;
uint32_t posSensor;
uint32_t softAbortDemand;
};
And I would like to do such an operation:
void getCoreInputOffsets(CoreInputOffsets* pCoreInputOffsets)
{
pCoreInputOffsets->driveStatusword = offsetof(ExtU_motionCtrlRTOS_T, driveStatusword);
pCoreInputOffsets->posSensor = offsetof(ExtU_motionCtrlRTOS_T, posSensor);
pCoreInputOffsets->softAbortDemand = offsetof(ExtU_motionCtrlRTOS_T, softAbortDemand);
}
But without having to edit this function each time the structure changes, by iterating over all members of CoreInputOffsets.
From C++14, yes, we do have compile-time reflection of (almost all) aggregate types; see Antony Polukhin's magic_get library (and this CppCon presentation to see how it works). I think you can also make it work in C++11 with some ABI support.
For example, to assign to an ExtU_motionCtrlRTOS_T x; you'd simply write
boost::pfr::flat_structure_tie(x) = boost::pfr::flat_structure_tie(some_unrelated_pod);
where I assumed that members are assigned in order. Note that I used the flat tie version, to assign nested arrays element-wise.
Now, in light of the above, it would be wiser to avoid relying on offsetof() as you're doing now and just exploit all compile time info for related operations (this will also probably give you faster code).
Anyway, if you still want to get offsets, a verbatim transcription of your code may look like:
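#include <boost/pfr/flat/core.hpp>
#include <cstddef>  // std::size_t
#include <cstdint>  // uint32_t
#include <utility>  // std::index_sequence, std::make_index_sequence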
struct CoreInputOffsets
{
uint32_t driveStatusword;
uint32_t posSensor[2];
uint32_t softAbortDemand;
};
template <typename T,std::size_t... Is>
void assignOffsets( CoreInputOffsets& offsets, std::index_sequence<Is...> )
{
T t;
(( boost::pfr::flat_get<Is>(offsets) = reinterpret_cast<char*>(&boost::pfr::flat_get<Is>(t)) - reinterpret_cast<char*>(&boost::pfr::flat_get<0>(t)) ), ...);
}
template <typename T>
void assignOffsets( CoreInputOffsets& offsets )
{
assignOffsets<T>( offsets, std::make_index_sequence< boost::pfr::flat_tuple_size<T>::value >{} );
}
void getCoreInputOffsets(CoreInputOffsets* pCoreInputOffsets)
{
assignOffsets<ExtU_motionCtrlRTOS_T>( *pCoreInputOffsets );
}
with the caveats:
this is C++17 (you can make it C++14 compliant, though)
the code taking the actual offsets needs a dummy ExtU_motionCtrlRTOS_T; this is not a big deal given that you'll assign it just once, I suppose
the code taking the actual offsets via pointer subtraction gives undefined behavior standard-wise; you'll need to verify it's legal for your platform
CoreInputOffsets::posSensor shall be an array and will get two offsets now
There are no "automatic" means, no.
And unfortunately your structures are automatically generated. If they were under your full control, I'd recommend the REFLECTABLE macro like it is described here.
Please read that answer, maybe you can restructure your code and/or work flow to make it work?
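Short of real reflection, an X-macro keeps the member list in one place: the list still has to be maintained by hand (nothing here reads the generated struct), but both CoreInputOffsets and getCoreInputOffsets are generated from it. A sketch, assuming real_T is double; CORE_INPUT_MEMBERS, DECLARE_OFFSET and ASSIGN_OFFSET are hypothetical names, and this is the classic X-macro technique rather than the REFLECTABLE macro mentioned above. Note that posSensor gets a single offset (that of the array's first element), as in the question.

```cpp
#include <cassert>
#include <cstddef>  // offsetof
#include <cstdint>

typedef double real_T;  // assumption: Simulink's real_T is double here

typedef struct {
    real_T driveStatusword;
    real_T posSensor[2];
    real_T softAbortDemand;
} ExtU_motionCtrlRTOS_T;

// The single member list: adding a member means editing only this macro.
#define CORE_INPUT_MEMBERS(X) \
    X(driveStatusword)        \
    X(posSensor)              \
    X(softAbortDemand)

struct CoreInputOffsets {
#define DECLARE_OFFSET(name) std::uint32_t name;
    CORE_INPUT_MEMBERS(DECLARE_OFFSET)
#undef DECLARE_OFFSET
};

void getCoreInputOffsets(CoreInputOffsets* pCoreInputOffsets) {
#define ASSIGN_OFFSET(name) \
    pCoreInputOffsets->name = static_cast<std::uint32_t>(offsetof(ExtU_motionCtrlRTOS_T, name));
    CORE_INPUT_MEMBERS(ASSIGN_OFFSET)
#undef ASSIGN_OFFSET
}
```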
Basic problem
I'm in a tricky situation that requires taking a pointer to a struct mainset and turning this into a pointer to a struct subset, whose fields are a contiguous subset of the fields of mainset, starting from the first. Is such a thing possible, with well-defined behavior? I realize that this is a pretty terrible thing to do, but I have good and frustrating reasons to do it [explained at the bottom for patient readers].
My attempt at an implementation seems to work on OS X with the clang compiler:
#include <cstddef>   // size_t
#include <cstdint>   // uint32_t
#include <iostream>
struct mainset {
size_t size;
uint32_t reflex_size;
};
struct subset {
size_t size;
};
using namespace std;
int main(int argc, char *argv[]) {
mainset test = {1, 1};
subset* stest = reinterpret_cast<subset*>(&test);
std::cout << stest->size << std::endl;
}
The output is indeed 1, as I expect. However, I wonder: am I just getting lucky with a particular compiler and a simple case (in reality my structs are more complicated), or will this work in general?
Also, a follow-up question: for other annoying reasons, I worry that I might need to make my larger struct
struct mainset {
uint32_t reflex_size;
size_t size;
};
instead, with the extra field coming at the front. Could my implementation be extended to work in this case? I tried replacing &test with &test+sizeof(test.reflex_size) but this didn't work; the output of the cout statement was 0.
Explanation of why I have to do this
My project uses the GSL library for linear algebra. This library makes use of structs of the form
struct gsl_block {
size_t size;
double* data;
};
and similar structs like gsl_vector and gsl_matrix. So, I've used these structs as members of my C++ classes; no problem. A recently demanded feature for my project, however, is to enable reflection to my classes with the Reflex tool, part of the ROOT ecosystem. To enable reflection for a struct like this in Reflex, I must add an annotation like
struct gsl_block {
size_t size;
double* data; //[size]
};
This annotation tells Reflex that the length of the array is provided by the field size of the same struct. Normally that would be that, but Reflex and ROOT have a very unfortunate limitation: the length field must be 32-bit. Having been informed that this limitation won't be fixed anytime soon, and not having the time/resources to fix it myself, I'm looking for workarounds. My idea is to somehow embed a struct bit-compatible with gsl_block within a larger struct:
struct extended_gsl_block {
size_t size;
double* data; //[reflex_size]
uint32_t reflex_size;
};
and similar things for gsl_vector and gsl_matrix; I can ensure that reflex_size and size are always equal (neither is ever bigger than ~50) and Reflex will be able to parse this header correctly (I hope; if reflex_size is required to precede data as a field something more difficult would be required). Since GSL routines work with pointers to these structs, my idea is this: given a pointer extended_gsl_block*, somehow get a pointer to just the fields size and data and reinterpret_cast this into a gsl_block*.
You are in luck.
The classes you show as an example conform to the requirements of standard layout types.
You can read more here:
http://en.cppreference.com/w/cpp/language/data_members#Standard_layout
You can test this premise in the compiler with:
static_assert(std::is_standard_layout<gsl_block>::value, "not a standard layout");
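The same idea can be extended so the compiler also checks that the shared leading members sit at identical offsets in both structs. A sketch, using a local copy of the two-field gsl_block shown in the question (the real declaration comes from GSL's headers); as_gsl_block is a hypothetical name. The assertions only verify the layout: the standard's common-initial-sequence guarantee is stated for union members, so the cast itself still relies on the compiler behaving as the static_asserts describe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Local copy of the two-field gsl_block from the question.
struct gsl_block {
    std::size_t size;
    double* data;
};

// Extended struct from the question: same leading members, extra field last.
struct extended_gsl_block {
    std::size_t size;
    double* data;  //[reflex_size]
    std::uint32_t reflex_size;
};

// Compile-time checks that the shared prefix has the same layout.
static_assert(std::is_standard_layout<gsl_block>::value, "gsl_block layout");
static_assert(std::is_standard_layout<extended_gsl_block>::value, "extended layout");
static_assert(offsetof(gsl_block, size) == offsetof(extended_gsl_block, size), "size offset");
static_assert(offsetof(gsl_block, data) == offsetof(extended_gsl_block, data), "data offset");

gsl_block* as_gsl_block(extended_gsl_block* ext) {
    return reinterpret_cast<gsl_block*>(ext);
}
```

If reflex_size had to come first, the data offsets would no longer match and the data-offset assertion would fail to compile, which is the point of writing the checks down.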
I have the following declaration in a file that gets generated by a perl script ( during compilation ):
struct _gamedata
{
short res_count;
struct
{
void * resptr;
short id;
short type;
} res_table[3];
}
_gamecoderes =
{
3,
{
{ &char_resource_ID_RES_welcome_object_ID,1002, 1001 },
{ &blah_resource_ID_RES_another_object_ID,1004, 1003 },
{ &char_resource_ID_RES_someting_object_ID,8019, 1001 },
}
};
My problem is that struct _gamedata is generated during compile time and the number of items in res_table will vary. So I can't provide a type declaring the size of res_table in advance.
I need to parse an instance of this structure. Originally I was doing this via a pointer to char (without defining struct _gamedata as a type, though I am defining res_table).
e.g.
char * pb = (char *)&_gamecoderes;
// i.e. pb points to the instance of `struct _gamedata`.
short res_count = *(short *)pb;
pb+=2;
res_table * entry = (res_table *)pb;
for( int i = 0; i < res_count; i++ )
{
do_something_with_entry(entry[i]);
}
I'm getting weird results with this. I'm not sure how to declare struct _gamedata as a type, as I need to be able to handle a variable length for res_table at compile time.
Since the struct is anonymous, there's no way to refer to the type of this struct. (res_table is just the member name, not the type's name). You should provide a name for the struct:
struct GameResult {
void* resptr; /* same field order as in the generated struct */
short id;
short type;
};
struct _gamedata {
short res_count;
GameResult res_table[3];
};
Also, you shouldn't cast the data to a char*. The res_count and the entries can be accessed with the -> operator, so that the compiler computes the member offsets correctly.
_gamedata* data = ...;
short res_count = data->res_count;
GameResult* entry = data->res_table;
or simply:
_gamedata* data = &_gamecoderes;
for (int i = 0; i < data->res_count; ++ i)
do_something_with_entry(data->res_table[i]);
Your problem is alignment. There will be at least two bytes of padding in between res_count and res_table, so you cannot simply add two to pb. The correct way to get a pointer to res_table is:
res_table *table = data->res_table; /* the array decays to a pointer to its first element */
If you insist on casting to char* and back, you must use offsetof:
#include <stddef.h>
...
res_table *table = (res_table *) (pb + offsetof(_gamedata, res_table));
Note: in C++ you may not use offsetof with "non-POD" data types (approximately "types you could not have declared in plain C"). The correct idiom -- without casting to char* and back -- works either way.
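A small C++ sketch that makes the padding visible, assuming a named element type (GameResult, as suggested in the earlier answer) and the struct renamed to gamedata, since identifiers beginning with an underscore are reserved at global scope. first_entry is a hypothetical helper showing the offsetof spelling; data->res_table gives the same pointer directly.

```cpp
#include <cassert>
#include <cstddef>

// Named element type, field order matching the generated struct.
struct GameResult {
    void* resptr;
    short id;
    short type;
};

struct gamedata {
    short res_count;
    GameResult res_table[3];
};

// res_table is aligned for GameResult (i.e. for void*), so on typical
// 32- and 64-bit targets this is 4 or 8, not 2.
constexpr std::size_t table_offset = offsetof(gamedata, res_table);

// Equivalent to data->res_table, spelled with offsetof for illustration.
const GameResult* first_entry(const gamedata* data) {
    const char* pb = reinterpret_cast<const char*>(data);
    return reinterpret_cast<const GameResult*>(pb + table_offset);
}
```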
Ideally use memcpy(3), at least use type _gamedata, or define a protocol
We can consider two use cases. In what I might call the programmer-API type, serialization is an internal convenience and the record format is determined by the compiler and library. In the more formally defined and bulletproof implementation, a protocol is defined and a special-purpose library is written to portably read and write a stream.
The best practice will differ depending on whether it makes sense to create a versioned protocol and develop stream I/O operations.
API
The best and most completely portable implementation when reading from compiler-object serialized streams would be to declare or dynamically allocate an exact or max-sized _gamedata and then use memcpy(3) to pull the data out of the serial stream or device memory or whatever it is. This lets the compiler allocate the object that is accessed by compiler code and it lets the developer allocate the object that is accessed by developer (i.e., char *) logic.
But at a minimum, set a pointer to _gamedata and the compiler will do everything for you. Note also that res_table[n] will always be at the "right" address regardless of the size of the res_table[] array. It's not like making it bigger changes the location of the first element.
General serialization best practice
If the _gamedata object itself is in a buffer and potentially misaligned, i.e., if it is anything other than an object allocated for a _gamedata type by the compiler or dynamically by a real allocator, then you still have potential alignment issues and the only correct solution is to memcpy(3) each discrete type out of the buffer.
A typical error is to use the misaligned pointer anyway, because it works (slowly) on x86. But it may not work on mobile devices, or future architectures, or on some architectures when in kernel mode, or with advanced optimizations enabled. It's best to stick with real C99.
It's a protocol
Finally, when serializing binary data in any fashion you are really defining a protocol. So, for maximum robustness, don't let the compiler define your protocol. Since you are in C, you can generally handle each fundamental object discretely with no loss in speed. If both the writer and reader do it, then only the developers have to agree on the protocol, not the developers and the compilers and the build team, and the C99 authors, and Dennis M. Ritchie, and probably some others.
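A sketch of what "handle each fundamental object discretely" can look like, written in C++ for consistency with the other examples (the two helper functions translate directly to C). The wire format here is a hypothetical protocol: a little-endian 16-bit count followed by (id, type) pairs of little-endian 16-bit values, with no padding. The resptr field is left out because a pointer value has no meaning outside the process that wrote it. All names are illustrative.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Every integer is written byte by byte in little-endian order,
// so the format does not depend on the compiler's layout or endianness.
void put_u16(std::vector<unsigned char>& out, std::uint16_t v) {
    out.push_back(static_cast<unsigned char>(v & 0xFF));
    out.push_back(static_cast<unsigned char>(v >> 8));
}

std::uint16_t get_u16(const unsigned char* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

struct Entry {
    std::uint16_t id;
    std::uint16_t type;
};

// Record: u16 count, then count x (u16 id, u16 type).
std::vector<unsigned char> encode(const std::vector<Entry>& entries) {
    std::vector<unsigned char> out;
    put_u16(out, static_cast<std::uint16_t>(entries.size()));
    for (const Entry& e : entries) {
        put_u16(out, e.id);
        put_u16(out, e.type);
    }
    return out;
}

// No length validation in this sketch: `in` must hold a complete record.
std::vector<Entry> decode(const std::vector<unsigned char>& in) {
    std::vector<Entry> entries;
    const unsigned char* p = in.data();
    std::uint16_t count = get_u16(p);
    p += 2;
    for (std::uint16_t i = 0; i < count; ++i, p += 4)
        entries.push_back({get_u16(p), get_u16(p + 2)});
    return entries;
}
```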
As @Zack points out, there is padding between elements of your structure.
I'm assuming you have a char* because you've serialized the structure (in a cache, on disk, or over the network). Just because you are starting with a char * doesn't mean you have to access the entire struct the hard way. Cast it to a typed pointer, and let the compiler do the work for you:
_gamedata * data = (_gamedata *) my_char_pointer;
for( int i = 0; i < data->res_count; i++ )
{
do_something_with_entry(data->res_table[i]);
}
Let's say that I want to get the size in bytes or in chars for the name field from:
struct record
{
int id;
TCHAR name [50];
};
sizeof(record.name) does not work.
The solution for this is not so pretty as you may think:
size_in_byte = sizeof(((struct record *) 0)->name)
size_in_chars = _countof(((struct record *) 0)->name)
If you want to use the second one on other platforms than Windows try:
#define _countof(array) (sizeof(array)/sizeof((array)[0]))
If you create an instance first, it will work.
record r;
sizeof(r.name);
In C++:
#include <iostream>
using namespace std;
struct record
{
int id;
char name [50];
};
int main() {
cout << sizeof( record::name) << endl;
}
Edit: A couple of people have pointed out that this is C++0x (C++11) code, so I guess I must retract my unkind comment regarding VC++. This is not a programming construct I have ever used in my own C++ code, but I have to wonder why sizeof would not work this way in C++03? You hand it a name and it gives you the size. I'd have thought it would take some effort for it not to work. But such is the wonder of the C++ Standard :-)
record is the name of a type, but record.name is not. You somehow have to access name through an instance of the struct. Sorin's answer is the usual C solution:
sizeof ((struct record*)0)->name;
This creates a pseudo-pointer to an instance (or a pointer to a pseudo-instance) of struct record, then accesses the name member and passes that expression to sizeof. It works because sizeof doesn't evaluate its operand (variable-length arrays in C99 being the exception); it only uses the expression's type to compute the size.
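A side-by-side C++ sketch of the forms discussed in this thread, using char instead of TCHAR so it builds on any platform (an assumption, not the question's type). std::extent gives the element count without a _countof macro; the constant names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

struct record {
    int id;
    char name[50];  // char instead of TCHAR so the sketch builds anywhere
};

// C and C++: sizeof does not evaluate its operand, so the null pointer
// is never dereferenced.
constexpr std::size_t name_bytes_c = sizeof(((record*)0)->name);

// C++11 and later: a non-static member may be named in an unevaluated operand.
constexpr std::size_t name_bytes = sizeof(record::name);

// Element count (not bytes), C++11 and later.
constexpr std::size_t name_chars = std::extent<decltype(record::name)>::value;

static_assert(name_bytes == name_bytes_c, "both forms agree");
```

With TCHAR in UTF-16 builds, name_bytes would be twice name_chars, which is why the byte count and the character count are kept separate.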
You might wanna read this, as it discusses the very same issue and provides all the options mentioned in this thread, and a little more.
struct record
{
static const int kMaxNameChars=50;
int id;
TCHAR name [kMaxNameChars];
};
sizeof(TCHAR)*record::kMaxNameChars //"sizeof(record.name)"
//record::kMaxNameChars sufficient for many purposes.
Portable, perfectly safe and IMO being explicit about raw array length is good practice.
(edit: you might have to macro it in C, if the compiler gets upset about variable array lengths. if you do, consider defining a static const int to the value of the macro anyway!)