I have a file handler that reads and writes custom binary files. To save myself some work, I've derived several data types from one base struct (base_p):
struct base_p{
protected:
uint32_t v0= 0;
uint32_t v1= 0;
uint32_t v2= 0;
friend struct f_block;
// other friends
};
Now, while building the internal blocks of that file, I've inherited that struct into multiple non-POD types:
struct f_block:public base_p{
f_block(){
this->v0 = 0x3f580058; // a magic number of the block
this->v1 = 0x1000004D; // version and usage type
this->v2 = 0;
}
f_block(uint32_t data){
this->v0 = 0x3f580058; // a magic number of the block
this->v1 = 0x1000004D; // version and usage type
this->v2 = data;
}
operator uint8_t(){ return this->v1 & 0xff; }
operator bool(){return this->v2 != 0 ;}
friend std::ostream& operator<<( std::ostream& os, f_block& fb );
};
std::ostream& operator<<( std::ostream& os, f_block& fb ){
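// reinterpret the 32-bit value as 4 bytes; reading the inactive union
// member is accepted by major compilers but is formally undefined in C++
union {
uint32_t vrt;
unsigned char parts[4];
}temp;
temp.vrt = fb.v2;
// parts[3]..parts[0]: most significant byte first on a little-endian machine
return os << temp.parts[3] << temp.parts[2] << temp.parts[1] << temp.parts[0];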
}
Inside my file handler, I use both structures. For example, if I need to pass data somewhere, I need to extract it as base_p. But my block definition has an extra feature: its usage type serves as a compression key, and within the data field (v2) I have also stored other information, such as the bit offset from the end, the length of featured blocks, and much more. So the extracting functions would look something along these lines:
struct file_handler{
std::vector<base_p> data;
file_handler():data(0){}
virtual bool read(const char *filename) = 0; // to be overridden
virtual bool write(const char *filename)= 0; // to be overridden
virtual bool set( base_p &data){
// if data is base_p, push_back data as normal POD
// if data is f_block, push_back data as deflated version
// don't store anything in vector if none
return false; // placeholder until implemented
}
virtual bool get( base_p &data){
// if data is base_p, return the last 3 elements' v2 (data) fields
// if data is f_block, return the last element as f_block, inflated
// set data as 0 if none
return false; // placeholder until implemented
}
};
I've tried catching the error from calling a function that doesn't exist in base_p, but that doesn't suit me, since I am building multiple file handlers on top of the file_handler struct that should accept other data types. I am confident I can build the other file types once I successfully implement file_handler. What I need is something along the lines of a switch statement on the data type.
Since I come from a Python background, there I could use isinstance or something similar. But this is C++, so there is no such feature, as far as I am aware.
I have searched and found some material that seems to have the potential to solve my problem, but most of it is for outdated versions or too abstract for me to wrap my head around and turn into a logical solution.
SFINAE and void: mentions some sort of concept that SFINAE follows, and compound concepts, which are too abstract for me to turn into a valid implementation.
HasMember SFINAE: seems feasible for detecting constructors, but the best answer is written in C++03, with no mention of whether it translates to C++11, which is the version I am currently using.
Is there a way to distinguish between PODs and non-PODs?
I come from python background
C++ is not Python.
If all you have is a void* and no idea where it came from or what object it points to, then there is nothing you can do. C++ is a statically-typed language. This means that objects do not automatically store within themselves anything which identifies them as being of a particular type. All of that typing information is done at compile-time based on the types the compiler sees in the code.
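When you do still hold a typed reference (rather than a void*), C++11 offers two limited substitutes for isinstance. The sketch below is illustrative only: `base`, `block`, `plain`, `is_pod_like` and `is_block` are hypothetical names loosely modelled on the question's base_p/f_block. The run-time check compiles only because `base` has a virtual destructor; the question's base_p has no virtual functions, and adding one would also add a hidden vtable pointer to every object, which matters if the objects are written to binary files.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical types. The virtual destructor makes `base` polymorphic,
// which is what lets dynamic_cast inspect the dynamic type at run time.
struct base {
    virtual ~base() = default;
    unsigned v0 = 0, v1 = 0, v2 = 0;
};

struct block : base {
    bool compressed = true;
};

// No virtual functions, no initialisers: trivial and standard-layout.
struct plain {
    unsigned v0, v1, v2;
};

// Compile-time check. C++11 defines a POD as a type that is both trivial
// and standard-layout (std::is_pod tests that too, but is deprecated in C++20).
template <typename T>
constexpr bool is_pod_like() {
    return std::is_trivial<T>::value && std::is_standard_layout<T>::value;
}
static_assert(is_pod_like<plain>(), "plain is a POD");
static_assert(!is_pod_like<base>(), "a class with virtual functions is not a POD");

// Run-time check, roughly Python's isinstance(obj, block).
// Only works through a pointer/reference to a polymorphic base.
bool is_block(const base& b) {
    return dynamic_cast<const block*>(&b) != nullptr;
}
```

Neither check helps once the object has been reduced to a void* or to raw bytes, which is the situation described in the rest of this answer.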
The type present here is void, which is the "not a type" type. So any typing information has been lost. And it cannot be recovered by looking at some bytes behind a void*.
As far as the contents of the first byte behind that void* is concerned, there is no difference between a pointer to subs and a pointer to a my_data.
It sounds like you want to be able to take an arbitrary block of bytes, and determine whether those bytes constitute the underlying representation of an instance of the type my_data.
You can't. No such thing is meaningful. Those bytes do not have a type.
You will have to manually deserialise the bytes, using rules that you devise, picking out whether the values that lie therein (when so interpreted) match the preconditions for values in your my_data objects.
Usually we'd manage the data ourselves with some "header", indicating "this is an instance of a my_type" so that you at least know with some probability that it is so (don't forget this header may still be found in arbitrary data by pure chance). But you need to build that logic into the serialisation stage (on the way out) as well.
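A minimal sketch of such a header, assuming a fixed 8-byte record made of a 32-bit tag followed by a 32-bit payload, both in host byte order. `TAG_PLAIN`, `record`, `write_record` and `read_record` are hypothetical; `TAG_BLOCK` simply reuses the magic number from the question for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical tag values written in front of every record. A tag only makes
// the type probable, not certain: arbitrary data can contain them by chance.
enum : std::uint32_t { TAG_PLAIN = 0x504C4E31u, TAG_BLOCK = 0x3f580058u };

struct record {
    std::uint32_t tag;
    std::uint32_t payload;
};

// Serialisation side: the tag goes out first, then the payload.
void write_record(std::vector<unsigned char>& out, std::uint32_t tag, std::uint32_t payload) {
    unsigned char buf[8];
    std::memcpy(buf, &tag, 4);
    std::memcpy(buf + 4, &payload, 4);
    out.insert(out.end(), buf, buf + 8);
}

// Deserialisation side: read the tag back and decide what the bytes are.
// Returns false if there are too few bytes or the tag is not recognised.
bool read_record(const unsigned char* p, std::size_t n, record& r) {
    if (n < 8) return false;
    std::memcpy(&r.tag, p, 4);
    std::memcpy(&r.payload, p + 4, 4);
    return r.tag == TAG_PLAIN || r.tag == TAG_BLOCK;
}
```

The reader can then switch on `r.tag` to pick the right type, which is the closest thing to a "data type switch" available for raw bytes.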
I've been trying to learn a bit about reverse engineering and how to essentially wrap an existing class (that we do not have the source for, we'll call it PrivateClass) with our own class (we'll call it WrapperClass).
Right now I'm basically calling the constructor of PrivateClass while feeding a pointer to WrapperClass as the this argument...
Doing this populates m_string_address, m_somevalue1, m_somevalue2, and missingBytes with the PrivateClass object data. The dilemma now is that I am noticing issues with the original program (first a crash that was resolved by adding m_u1 and m_u2) and then text not rendering that was fixed by adding mData[2900].
I'm able to deduce that m_u1 and m_u2 hold the size of the string in m_string_address, but I wasn't expecting there to be any other member variables after them (which is why I was surprised with mData[2900] resolving the text rendering problem). 2900 is also just a random large value I threw in.
So my question is how can we determine the real size of a class that we do not have the source for? Is there a tool that will tell you what variables exist in a class and their order (or at least the correct datatypes or datatype sizes of each variable)? I'm assuming this might be possible by processing assembly in an address range into a semi-decompiled state.
class WrapperClass
{
public:
WrapperClass(const wchar_t* original);
private:
uintptr_t m_string_address;
int m_somevalue1;
int m_somevalue2;
char missingBytes[2900];
};
WrapperClass::WrapperClass(const wchar_t* original)
{
typedef void(__thiscall* PrivateClassCtor)(void* pThis, const wchar_t* original);
PrivateClassCtor PrivateClassCtorFunc = PrivateClassCtor(DLLBase + 0x1c00);
PrivateClassCtorFunc(this, original);
}
So my question is how can we determine the real size of a class that we do not have the source for?
You have to guess or logically deduce it for yourself. Or just guess. If guessing doesn't work out for you, you'll have to guess again.
Is there a tool that will tell you what variables exist in a class and their order (or at least the correct datatypes or datatype sizes of each variable)? I'm assuming by decompiling and processing assembly in an address range.
No, there is not. The meta-information that describes a class, its members, etc. simply isn't written out, as the program does not need it, nor are there currently any facilities defined in the C++ Standard that would require a compiler to generate that information.
There are exactly zero guarantees that you can reliably 'guess' the size of a class. You can however probably make a reasonable estimate in most cases.
The one thing you can be sure of, though: the only problem is having too little memory for a class instance. Having too much memory isn't really a problem at all (which is why adding 2900 extra bytes works).
On the assumption that the code was originally well written (e.g. the developer decided to initialise all the variables nicely), then you may be able to guess the size using something like this:
#define MAGIC 0xCD
// allocate a big buffer; unsigned char so that comparing a byte with
// 0xCD works even where plain char is signed
unsigned char temp_buffer[8092];
memset(temp_buffer, MAGIC, sizeof(temp_buffer));
// call the ctor on the buffer (not on `this`)
PrivateClassCtor PrivateClassCtorFunc = PrivateClassCtor(DLLBase + 0x1c00);
PrivateClassCtorFunc(temp_buffer, original);
// step backwards until we find a byte that isn't 0xCD.
// Might want to change the magic value and run again
// just to be sure (e.g. the original ctor may set the last
// few bytes of the class to 0xCD by coincidence).
//
// Obviously fails if the developer never initialises member vars though!
for(int i = (int)sizeof(temp_buffer) - 1; i >= 0; --i) {
if(temp_buffer[i] != MAGIC) {
printf("class size might be: %d\n", i + 1);
break;
}
}
That's probably a decent guess, however the only way to be 100% sure would be to stick a breakpoint where you call the ctor, switch to assembly view in your debugger of choice, and then step through the assembly line by line to see what the max address being written to is.
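To see the fill-and-scan idea work without the DLL, here is a self-contained sketch. `fake_private_ctor` is a hypothetical stand-in for the closed-source constructor (it writes 12 bytes), and `estimate_size` is a name made up for this example. Running it with a second magic value shows why the answer suggests repeating the test.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for the unknown constructor: it initialises
// 12 bytes of the object it is given.
void fake_private_ctor(void* self) {
    unsigned char* p = static_cast<unsigned char*>(self);
    std::memset(p, 0, 8);        // e.g. a pointer-sized field, zeroed
    std::memset(p + 8, 0x7F, 4); // e.g. an int field
}

// Fill a buffer with `magic`, run the constructor on it, and return the
// offset one past the last byte that changed. This is only a lower bound:
// members the constructor never writes (or writes with `magic`) are missed.
std::size_t estimate_size(void (*ctor)(void*), unsigned char magic) {
    static unsigned char buffer[8192];
    std::memset(buffer, magic, sizeof(buffer));
    ctor(buffer);
    for (std::size_t i = sizeof(buffer); i > 0; --i)
        if (buffer[i - 1] != magic)
            return i;
    return 0; // nothing was written
}
```

With magic 0xCD the estimate is 12; with magic 0x7F the last field is indistinguishable from the fill value and the estimate drops to 8, so taking the maximum over several magic values is the safer reading.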
I have an assignment which asks one to write a function for any data type. The function is supposed to print the bytes of the structure and identify the total number of bytes the data structure uses, differentiating between bytes used for members and bytes used for padding.
My immediate reaction, along with most of the class's reaction, was to use templates. This lets you write the function once and have the compiler deduce the type of each object passed into it. Using memset and typeid, one can easily accomplish what has been asked. However, our prof. just saw our discussion about templates and damned templates to hell.
After seeing this I was thrown for a loop, and I'm looking for a little guidance as to the best way to get around this. Some things I've looked into:
void pointers with explicit casting (this seems like it'd get messy)
a base class with only virtual functions, from which all data structures inherit (seems a bit odd to do).
a base class with 'friendships' to each of our data structures.
rewriting a function for each data structure in our problem set (what I imagine is the worst possible solution).
I was hoping I had overlooked a common C++ tool. Does anyone have any ideas?
Make the function as dumb as possible; in fact, treat it as if it doesn't know anything and all information must be passed to it.
Parameters to the function:
Structure address, as a uint8_t *. (Needed to print the bytes)
Structure size, in bytes. (Needed to print the bytes and to print the total size)
A vector of member information: member length OR the sum of the bytes used by the members.
The vector is needed to fulfill the requirement of printing the bytes used by the members and the bytes used by padding. Optionally you could pass the sum of the members.
Example:
void Analyze_Structure(uint8_t const * p_structure,
size_t size_of_structure,
size_t size_occupied_by_members);
The trick of this assignment is to figure out how to have the calling function determine these items.
Hope this helps.
Edit 1:
struct Apple
{
char a;
int weight;
double protein_per_gram;
};
int main(void)
{
Apple granny_smith;
Analyze_Structure((uint8_t *) &granny_smith,
sizeof(Apple),
sizeof(granny_smith.a)
+ sizeof(granny_smith.weight)
+ sizeof(granny_smith.protein_per_gram));
return 0;
}
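One possible body for Analyze_Structure, as a sketch. It only uses what the caller passes in, and everything not occupied by members is counted as padding by elimination. Unlike the void signature above, this version returns the padding count so the result can be checked; that return value is a change made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Prints every byte of the object, then the total / member / padding split.
// The function knows nothing about the type; the caller supplies all facts.
std::size_t Analyze_Structure(std::uint8_t const* p_structure,
                              std::size_t size_of_structure,
                              std::size_t size_occupied_by_members)
{
    assert(size_occupied_by_members <= size_of_structure);

    for (std::size_t i = 0; i < size_of_structure; ++i)
        std::printf("%02X ", static_cast<unsigned>(p_structure[i]));
    std::printf("\n");

    std::size_t padding = size_of_structure - size_occupied_by_members;
    std::printf("total: %zu bytes, members: %zu bytes, padding: %zu bytes\n",
                size_of_structure, size_occupied_by_members, padding);
    return padding;
}
```

The function cannot tell where the padding bytes are, only how many there are; locating them needs per-member offsets (offsetof) from the caller.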
I have an assignment which asks one to write a function for any data type.
This means either templates (which your prof. dismissed), void*, or a variable number of arguments (similar to printf).
The function is supposed to print the bytes of the structure
#include <cstddef>
#include <cstdint>
#include <iostream>
void your_function(const void* data, std::size_t size)
{
const std::uint8_t* bytes = static_cast<const std::uint8_t*>(data);
for(auto x = bytes; x != bytes + size; ++x)
std::clog << "0x" << std::hex << static_cast<std::uint32_t>(*x) << " ";
}
[...] and identify the total number of bytes the data structure uses along with differentiating between bytes used for members and bytes used for padding.
On this one, I'm lost: sizeof already includes any padding, and nothing in the bytes themselves says which of them are padding. Consider:
struct x { char c; char d; char e; }; // sizeof(x) == 3;
x instance{ 0, 0, 0 };
your_function(&instance, sizeof(x)); // passes 3: three chars need no padding
Theoretically, you could also pass alignof(x) to the function, but that won't tell you the alignment or offsets of the individual fields in memory; only the caller can supply those (e.g. via offsetof).
There are a few possibilities here:
Your prof. learned "hacky" C++ that was considered good code 10 or 20 years ago and didn't update his knowledge (C-style code, pointers, direct memory access and "smart hacks" are all in here).
He didn't know how to express exactly what he wanted or the terminology to use ("write a function for any data type" is too vague: as a developer, if I got this assignment, the first thing to do would be to ask for details - like "how will it be used?" and "what is the expected function signature").
For example, this could be achieved - to a degree - with macros, but if he wants you to use macros in place of functions and templates, you should probably contemplate changing professors.
He meant that you should write some arbitrary data type (like my struct x above) and define your API around that (unlikely).
I am not sure that such a function can be built without a minimum of introspection: you need to know what the struct members are, otherwise you only have access to the size of the struct.
Anyway, here is my proposal for a solution that should work without introspection, provided the user of the code "cooperates".
Your functions will take as arguments void* and size_t for the address and sizeof of the struct.
0) let the user create a struct of the desired type.
1) let the user call a function of yours that sets all bytes to 0.
2) let the user assign a value to every field of the struct.
3) let the user call a function of yours that keeps a record of every byte that is still 0.
4) let the user call a function of yours that sets all bytes to 1.
5) let the user assign a value to every field of the struct again. (Same values as the first time!)
6) let the user call a function of yours and count the bytes that are still 1 AND were marked before. These are padding bytes.
The reason to try with 0 and then with 1 is that the values assigned by the user could include zero bytes; but a byte can't be both 0 and 1, so one of the two tests will exclude it.
struct _S { int I; char C; } S;
Fill0(&S, sizeof(S));
// User cooperation
S.I= 0;
S.C= '\0';
Mark0(&S, sizeof(S)); // Has some form of static storage
Fill1(&S, sizeof(S));
// User cooperation
S.I= 0;
S.C= '\0';
DetectPadding(&S, sizeof(S));
You can pack all of this in a single function that takes a callback function argument that does the member assignments.
void Assign(void* pS) // User-written callback
{
struct _S& S= *(struct _S*)pS;
S.I= 0;
S.C= '\0';
}
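Packed into one function, steps 0 to 6 might look like the sketch below. `DetectPaddingBytes` is a hypothetical name, and it returns one flag per byte instead of relying on static storage. It is a heuristic: the language allows a member store to change padding bytes, so a compiler could in principle defeat it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// `assign` is the user-written callback that stores the same values into
// every member each time it is called. A byte is reported as padding if it
// kept the fill value in both passes.
std::vector<bool> DetectPaddingBytes(void* object, std::size_t size,
                                     void (*assign)(void*))
{
    unsigned char* bytes = static_cast<unsigned char*>(object);
    std::vector<bool> still0(size);

    std::memset(bytes, 0, size);            // step 1: all bytes 0
    assign(object);                         // step 2: user assigns members
    for (std::size_t i = 0; i < size; ++i)  // step 3: record bytes still 0
        still0[i] = (bytes[i] == 0);

    std::memset(bytes, 1, size);            // step 4: all bytes 1
    assign(object);                         // step 5: same assignments again

    std::vector<bool> padding(size);
    for (std::size_t i = 0; i < size; ++i)  // step 6: still 1 AND marked
        padding[i] = still0[i] && bytes[i] == 1;
    return padding;
}
```

A captureless lambda converts to the `void (*)(void*)` callback, so the user can write the assignments inline.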
I'm trying to implement deserialization where the mapping to fields/members is only known at runtime (it's complicated). Anyway, what I'm trying to do is something like the following:
class A
{
public:
int a; // index 0
float b; // index 1
char c; // index 2
};
Then I have two arrays, one with the index of the field and the other with something that indicates the type. I then want to iterate over the arrays and write to the fields from a byte stream.
Sorry for the crappy description but I just don't know how to implement it in code. Any ideas would be appreciated thanks!
Yes, you can, but there are two things you need to look out for when doing it.
First of all, make sure you start writing from (const char*)&a.a (where a is your A object) rather than from the address of the object itself: compilers may place hidden data at the start of an object (Visual C++ puts the vtable pointer there for classes with virtual functions, for instance), so you won't be writing what you think you are.
Second, you might want to use #pragma pack(1) before declaring any class that needs to be written to disk, because compilers usually align class members (inserting padding) to make memory access more efficient, and you might end up having problems with this as well.
On the dynamic part of it: if making one class definition for each field combination you want to have is acceptable, then it's OK to do it like this; otherwise, you'd be better off including a hash table in your class and serializing/deserializing its contents by writing key-value pairs to the file.
I can't think of a language construct that will give you a field address from an index at runtime. If the "type" array actually included field sizes, you would be able to do something like:
istream &in = <get it somehow>;
size_t *field_size = <get it somehow>;
size_t num_of_fields = <get it somehow>;
A a;
char *ptr = reinterpret_cast<char *>(&a);
for (int i = 0; i < num_of_fields; i++)
{
in.read(ptr, field_size[i]);
ptr += field_size[i];
}
Note that this only works if your class is simple and doesn't have any virtual member functions (or inherit from a class that does). If it does, you would do better to include a dummy member to mark the byte offset where the fields start within the class:
class A
{
int __dummy; /* must be the first data member in the class */
...
<rest of your class definition here>
};
and now change the initialization of ptr as follows:
ptr = reinterpret_cast<char *>(&a) + offsetof(A, __dummy);
Another implicit assumption for this code is that machine byte-order is the same for both the machine running this code and the machine from which the serialized data is received. If not, then you will need to convert the byte ordering of the data read from the stream. This conversion is of course type dependent but you could have another array of conversion functions per field.
There are a lot of issues and decisions involved. At the simplest, you could keep an offset into A for each field, switch on the type, and set the value through a pointer to the field. For example, assuming there's an int16_t encoding the field number in the input stream, making no effort to use static_cast<> etc. where it would be a little nicer to do so, and assuming a field number of 0 terminates the input...
A a;
char* p_a = (char*)&a;
char* p_v = (char*)&input_buffer;
...
while ((field_num = *(int16_t*)p_v) && (p_v += sizeof(int16_t)))
switch (type[field_num])
{
case Int32:
*(int32_t*)(p_a + offset[field_num]) = *(int32_t*)(p_v);
p_v += sizeof(int32_t);
break;
...
}
You may want to consider using e.g. ntohl() etc. to handle endianness conversions.
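The same offset-table idea as a self-contained sketch, using offsetof for the offsets and memcpy instead of pointer casts so the input buffer need not be aligned. The `type`/`offset` tables, the `FieldType` enum and `read_fields` are hypothetical; as in the answer, field number 0 terminates the input, so the real fields are numbered from 1. No byte-order conversion is done here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// The question's class, written as a plain struct so offsetof is well defined.
struct A {
    std::int32_t a; // field 1
    float b;        // field 2
    char c;         // field 3
};

enum FieldType { Int32, Float32, Char8 };

// Hypothetical run-time mapping indexed by field number; slot 0 is unused
// because field number 0 is the end-of-input terminator.
const int num_fields = 4;
const FieldType type[num_fields] = { Int32, Int32, Float32, Char8 };
const std::size_t offset[num_fields] = { 0, offsetof(A, a), offsetof(A, b), offsetof(A, c) };

// Reads (int16 field number, value) pairs until field number 0. Returns a
// pointer just past the terminator, or nullptr on an unknown field number.
// Values are assumed to be in host byte order.
const unsigned char* read_fields(A& obj, const unsigned char* in) {
    unsigned char* base = reinterpret_cast<unsigned char*>(&obj);
    for (;;) {
        std::int16_t field_num;
        std::memcpy(&field_num, in, sizeof field_num);
        in += sizeof field_num;
        if (field_num == 0) return in;
        if (field_num < 0 || field_num >= num_fields) return nullptr;

        std::size_t n = 0;
        switch (type[field_num]) {
            case Int32:   n = sizeof(std::int32_t); break;
            case Float32: n = sizeof(float);        break;
            case Char8:   n = sizeof(char);         break;
        }
        std::memcpy(base + offset[field_num], in, n);
        in += n;
    }
}
```

Fields may arrive in any order; each one lands at the offset the compiler computed for it.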
Let the compiler do it:
Write an operator>> function.
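One reading of this suggestion, sketched for a binary byte stream: overload operator>> so that each member is read with its own size and type. The stream layout (a, b, c in host byte order, no padding) is an assumption of this example. Note that operator>> conventionally means formatted (text) input, so a named function such as read_A may be the clearer choice for binary data.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

class A {
public:
    int a;   // index 0
    float b; // index 1
    char c;  // index 2
};

// Reads the members one at a time, each with its own size, so padding
// inside A never appears in the stream. Host byte order is assumed.
std::istream& operator>>(std::istream& in, A& x) {
    in.read(reinterpret_cast<char*>(&x.a), sizeof x.a);
    in.read(reinterpret_cast<char*>(&x.b), sizeof x.b);
    in.read(reinterpret_cast<char*>(&x.c), sizeof x.c);
    return in;
}
```

This fixes the field order at compile time; the run-time index mapping from the question would still need a table like the ones in the other answers.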
I have the following declaration in a file that gets generated by a Perl script (during compilation):
struct _gamedata
{
short res_count;
struct
{
void * resptr;
short id;
short type;
} res_table[3];
}
_gamecoderes =
{
3,
{
{ &char_resource_ID_RES_welcome_object_ID,1002, 1001 },
{ &blah_resource_ID_RES_another_object_ID,1004, 1003 },
{ &char_resource_ID_RES_someting_object_ID,8019, 1001 },
}
};
My problem is that struct _gamedata is generated during compile time and the number of items in res_table will vary. So I can't provide a type declaring the size of res_table in advance.
I need to parse an instance of this structure. Originally I was doing this via a pointer to char (and not defining struct _gamedata as a type), but I am defining res_table.
e.g.
char * pb = (char *)&_gamecoderes;
// i.e. pb points to the instance of `struct _gamedata`.
short res_count = *(short *)pb;
pb+=2;
res_table * entry = (res_table *)pb;
for( int i = 0; i < res_count; i++ )
{
do_something_with_entry(*entry);
++entry;
}
I'm getting weird results with this. I'm not sure how to declare the type struct _gamedata, as I need to be able to handle a variable length for res_table at compile time.
Since the struct is anonymous, there's no way to refer to the type of this struct. (res_table is just the member name, not the type's name). You should provide a name for the struct:
struct GameResult {
void* resptr;
short id;
short type;
};
struct _gamedata {
short res_count;
GameResult res_table[3];
};
Also, you shouldn't cast the data to a char*. The res_count and the entries can be extracted using the -> operator. This way the compiler computes the member offsets correctly.
_gamedata* data = ...;
short res_count = data->res_count;
GameResult* entry = data->res_table;
or simply:
_gamedata* data;
for (int i = 0; i < data->res_count; ++ i)
do_something_with_entry(data->res_table[i]);
Your problem is alignment. There will be at least two bytes of padding in between res_count and res_table, so you cannot simply add two to pb. The correct way to get a pointer to res_table is:
res_table *table = &data->res_table;
If you insist on casting to char* and back, you must use offsetof:
#include <stddef.h>
...
res_table *table = (res_table *) (pb + offsetof(_gamedata, res_table));
Note: in C++ you may not use offsetof with "non-POD" data types (approximately "types you could not have declared in plain C"). The correct idiom -- without casting to char* and back -- works either way.
Ideally use memcpy(3), at least use type _gamedata, or define a protocol
We can consider two use cases. In what I might call the programmer-API type, serialization is an internal convenience and the record format is determined by the compiler and library. In the more formally defined and bulletproof implementation, a protocol is defined and a special-purpose library is written to portably read and write a stream.
The best practice will differ depending on whether it makes sense to create a versioned protocol and develop stream I/O operations.
API
The best and most completely portable implementation when reading from compiler-object serialized streams would be to declare or dynamically allocate an exact or max-sized _gamedata and then use memcpy(3) to pull the data out of the serial stream or device memory or whatever it is. This lets the compiler allocate the object that is accessed by compiler code and it lets the developer allocate the object that is accessed by developer (i.e., char *) logic.
But at a minimum, set a pointer to _gamedata and the compiler will do everything for you. Note also that res_table[n] will always be at the "right" address regardless of the size of the res_table[] array. It's not like making it bigger changes the location of the first element.
General serialization best practice
If the _gamedata object itself is in a buffer and potentially misaligned, i.e., if it is anything other than an object allocated for a _gamedata type by the compiler or dynamically by a real allocator, then you still have potential alignment issues and the only correct solution is to memcpy(3) each discrete type out of the buffer.
A typical error is to use the misaligned pointer anyway, because it works (slowly) on x86. But it may not work on mobile devices, or future architectures, or on some architectures when in kernel mode, or with advanced optimizations enabled. It's best to stick with real C99.
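A sketch of the memcpy approach for a possibly misaligned buffer. `res_entry`, `load_short` and `load_entry` are hypothetical names; `res_entry` mirrors the member order of the generated struct in the question. Copying a whole struct this way still assumes the bytes were produced by the same compiler and ABI (the "API" case above).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical named type for one res_table element.
struct res_entry {
    void* resptr;
    short id;
    short type;
};

// Copies a short out of a byte buffer at any offset. Unlike
// *(short*)(buf + off), memcpy has no alignment requirement.
short load_short(const unsigned char* buf, std::size_t off) {
    short v;
    std::memcpy(&v, buf + off, sizeof v);
    return v;
}

// Same idea for a whole element: copy it into a properly aligned local object.
res_entry load_entry(const unsigned char* buf, std::size_t off) {
    res_entry e;
    std::memcpy(&e, buf + off, sizeof e);
    return e;
}
```

Compilers typically turn these small fixed-size memcpy calls into plain loads where the target allows unaligned access, so the safe version need not be slower.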
It's a protocol
Finally, when serializing binary data in any fashion you are really defining a protocol. So, for maximum robustness, don't let the compiler define your protocol. Since you are in C, you can generally handle each fundamental object discretely with no loss in speed. If both the writer and reader do it, then only the developers have to agree on the protocol, not the developers and the compilers and the build team, and the C99 authors, and Dennis M. Ritchie, and probably some others.
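As a minimal illustration of a hand-defined protocol, assuming every 16-bit value is sent as exactly two bytes, least significant first. `put_u16le` and `get_u16le` are hypothetical helpers; the point is that host byte order, struct padding and sizeof(short) no longer affect the stream format.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Writer: emit the value as two bytes, least significant first.
void put_u16le(std::vector<unsigned char>& out, std::uint16_t v) {
    out.push_back(static_cast<unsigned char>(v & 0xFFu));
    out.push_back(static_cast<unsigned char>(v >> 8));
}

// Reader: apply the same rule; works at any alignment and on any host.
std::uint16_t get_u16le(const unsigned char* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}
```

A record such as res_count followed by (id, type) pairs is then just a sequence of these calls; pointer members like resptr cannot be meaningfully serialized this way and would need an index or ID instead.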
As @Zack points out, there is padding between the elements of your structure.
I'm assuming you have a char* because you've serialized the structure (in a cache, on disk, or over the network). Just because you are starting with a char * doesn't mean you have to access the entire struct the hard way. Cast it to a typed pointer, and let the compiler do the work for you:
_gamedata * data = (_gamedata *) my_char_pointer;
for( int i = 0; i < data->res_count; i++ )
{
do_something_with_entry(data->res_table[i]);
}
Is it possible to get access to an individual member of a struct or class without knowing the names of its member variables?
I would like to do an "offsetof(struct, tyname)" without having the struct name or member variable name hard-coded, amongst other things.
thanks.
Sure. If you have a struct and you know the offset and the type of the member variable, you can access it using pointers.
struct my_struct {
int member1;
char member2;
short member3;
char member4;
};
...
struct my_struct obj;
short member3 = *((short*)((char*)&obj + 6));
That'll get the value of member3, which is 6 bytes from the start of obj with a typical x86 compiler (4 bytes for member1, 1 for member2, and 1 byte of padding so that the short is 2-byte aligned). However, you want to be careful. First of all, if the struct changes, your data will be garbage. We're casting all over the place, so you get no type safety, and the compiler won't warn you if something's awry. You'll also need to account for the padding the compiler inserts to align members: packing the struct (e.g. with #pragma pack(1)) or a different ABI will change the offset, which is why offsetof(struct my_struct, member3) is the safer way to obtain it.
This isn't a pleasant thing to do, and I'd avoid it if I were you, but yes, it can be done.
C and C++ are compiled languages without built-in "reflection" features. This means that regardless of what you do and how you do it, one way or another the path will always start from an explicit hard-coded value, be that a member name or a compile-time offset value. That means that if you want to select a struct member based on some run-time key, you have no other choice but to manually create a mapping of some kind that would map the key value to something that identifies a concrete struct member.
In C++ in order to identify a struct member at run-time you can use such feature as pointers-to-members. In C your only choice is to use an offset value.
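A sketch of such a hand-written mapping in C++, from a run-time string key to a pointer-to-member. `Config`, `int_members` and `find_member` are hypothetical names; the sketch assumes all mapped members share the type int, which sidesteps the mixed-type issue mentioned next.

```cpp
#include <cassert>
#include <map>
#include <string>

struct Config {
    int width;
    int height;
    int depth;
};

// Run-time key -> member mapping. It has to be written out once by hand
// (or generated); C++ cannot derive it from the type itself.
const std::map<std::string, int Config::*>& int_members() {
    static const std::map<std::string, int Config::*> m = {
        {"width", &Config::width},
        {"height", &Config::height},
        {"depth", &Config::depth},
    };
    return m;
}

// Returns a pointer to the named member of `c`, or nullptr if unknown.
int* find_member(Config& c, const std::string& name) {
    auto it = int_members().find(name);
    return it == int_members().end() ? nullptr : &(c.*(it->second));
}
```

The map is built once and can be reused for every Config object, which is the main advantage of pointers-to-members over raw offsets.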
Another issue is, of course, specifying the type of the members, if your members can have different types. But you provided no details about that, so I can't say whether you need to deal with it or not.
We had a similar problem some years ago: A huge struct of configuration information that we wanted to reflect on. So we wrote a Perl script to find the struct, parse its members, and output a C++ file that looked like:
struct ConfField
{ const char* name;
int type;
size_t offset;
};
ConfField confFields[] = {
{ "version", eUInt32, 0 },
{ "seqID", eUInt32, 4 },
{ "timestamp", eUInt64, 8 },
// ... lots more ...
{ 0, 0, 0 }
};
And we'd feed the script with the output from gcc -E.
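A generated table like that could also let the compiler fill in the offsets with offsetof instead of hard-coding 0, 4, 8, which keeps it correct if padding or alignment changes. The sketch below assumes a small hypothetical `Config` struct, numeric values for eUInt32/eUInt64, and a hypothetical `get_field` lookup by name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the real configuration struct.
struct Config {
    std::uint32_t version;
    std::uint32_t seqID;
    std::uint64_t timestamp;
};

enum { eUInt32 = 1, eUInt64 };

struct ConfField {
    const char* name;
    int type;
    std::size_t offset;
};

// offsetof lets the compiler supply the offsets the script used to hard-code.
const ConfField confFields[] = {
    { "version",   eUInt32, offsetof(Config, version) },
    { "seqID",     eUInt32, offsetof(Config, seqID) },
    { "timestamp", eUInt64, offsetof(Config, timestamp) },
    { 0, 0, 0 }
};

// Looks a field up by name and returns its value widened to 64 bits.
// Returns false if no field has that name.
bool get_field(const Config& cfg, const char* name, std::uint64_t& value) {
    const unsigned char* base = reinterpret_cast<const unsigned char*>(&cfg);
    for (const ConfField* f = confFields; f->name; ++f) {
        if (std::strcmp(f->name, name) != 0) continue;
        if (f->type == eUInt32) {
            std::uint32_t v;
            std::memcpy(&v, base + f->offset, sizeof v);
            value = v;
        } else {
            std::memcpy(&value, base + f->offset, sizeof value);
        }
        return true;
    }
    return false;
}
```

The script then only needs to emit member names and types; the layout details stay with the compiler.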
Nowadays, I understand that gccxml can output an XML file representing any C++ source that gcc can compile, since it actually uses the g++ front end to do the parsing. So I'd recommend pairing it with an XML-parsing script (I'd use Python with the lxml library) to find out everything you ever wanted to know about your C++ source.
Somewhere in your code you need to reference the data member in the struct. However you can create a variable that is a pointer to a struct data member and from then on you no longer need to reference it by name.
struct foo
{
int member1;
int member2;
};
typedef int (foo::*intMemberOfFoo);
intMemberOfFoo getMember()
{
if (rand() > RAND_MAX / 2) return &foo::member1;
else return &foo::member2;
}
foo f;
void do_something()
{
intMemberOfFoo m = getMember();
f.*m = 0;
}
The technical answer is 'yes' because C++ is Turing-complete and you can do almost anything if you try hard enough. The more practical answer is probably 'no' since there is no safe and easy way of doing exactly what you want.
I agree with GMan. What exactly are you trying to do that makes you think you need this technique?
Well, you will have to set up some stuff first, but it can be done. Expanding on Samir's response:
struct my_struct {
int member1;
char member2;
short member3;
char member4;
};
you can create a table of offsets:
my_struct tmp;
int my_struct_offsets[4]={
0,
(char*)&(tmp.member2)-(char*)&(tmp.member1),
(char*)&(tmp.member3)-(char*)&(tmp.member1),
(char*)&(tmp.member4)-(char*)&(tmp.member1)
};
This will take into account different alignments on different systems.