Dereferencing a pointer to a variable with an unknown type

I didn't know exactly how to explain the problem that I am having right now, so sorry if I am being vague in the title of the question.
What I am having right now is a list of virtual addresses that are being stored in variables. For example, I'm having
0x8c334dd
stored in a char variable. This address is the address of another variable that has data on it. What I want to do is to go to that address and get the data that is stored on it.
My assumption was that dereferencing the pointer would be the best way to go. Unfortunately, I don't know the type of the variable that the address points to, so how does dereferencing work in this case? I cannot do *(char *) 0x8c334dd, because I don't know the type of the variable at that address.
If I cast it to (int *), I get the data for some of the variables (remember that I have several addresses), but for others I just get an address, and I need the data (these variables are structs, chars, etc.).
I am working with the ELF Symbol Table

In general, C++ or C have no way of knowing what type of pointer you have.
The usual way to solve this problem is to make the pointer point to a struct, and have a known position in the struct indicate the type of the data. Usually the known position is the first position in the struct.
Example:
// signature value; use any value unlikely to happen by chance
#define VAR_SIG 0x11223344
typedef enum
{
vartypeInvalid = 0,
vartypeInt,
vartypeFloat,
vartypeDouble,
vartypeString,
vartypeMax // not a valid vartype
} VARTYPE;
#include <stdint.h> // for uint32_t
typedef struct
{
VARTYPE type;
#ifdef DEBUG
uint32_t sig;
#endif // DEBUG
union // the member name goes after the closing brace
{
int i;
float f;
double d;
char *s;
} data;
} VAR;
You can then do a sanity check: you can see if the type field has a value greater than vartypeInvalid and less than vartypeMax (and you will never need to edit those names in the sanity check code; if you add more types, you add them before vartypeMax in the list). Also, for a DEBUG build, you can check that the signature field sig contains some specific signature value. (This means that your init code to init a VAR instance needs to always set the sig field, of course.)
If you do something like this, then how do you initialize it? Runtime code will always work:
VAR v;
#ifdef DEBUG
v.sig = VAR_SIG;
#endif // DEBUG
v.type = vartypeFloat;
v.data.f = 3.14f;
What if you want to initialize it at compile time? It's easy if you want to initialize it with an integer value, because the int type is the first type in the union:
VAR v =
{
vartypeInt,
#ifdef DEBUG
VAR_SIG,
#endif // DEBUG
1234
};
If you are using a C99 compliant version of C, you can actually initialize the struct with a field name and have it assign any type. But Microsoft C isn't C99 compliant, so the above is a nightmare if you want to init your struct with a float or double value. (If you cast the float value to an integer, C won't just change the type, it will round the value; and there is no trick I know of to portably get a 32-bit integer value that correctly represents a 32-bit float at compile time in a C program.)
(See the related question "Compile time float packing/punning".)
If you are working with pointers, though, that's easy. Just make the first field name in the union be a pointer type, cast the pointer to void * and init the struct as above (the pointer would go where 1234 went above).
If you are reading tables written by someone else's code, and you don't have a way to add a type identifier field, I don't have a general answer for you. I guess you could try reading the pointer out as different types, and see which one(s) work?
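To make the type-tag idea concrete, here is a minimal C++ sketch of it. It is only an illustration under assumptions: the names Var, kVarSig, var_is_valid, make_float and var_as_double are made up for this example, and the signature field is kept in every build rather than only under DEBUG so the check can be shown.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names throughout; the layout mirrors the VAR struct above.
constexpr std::uint32_t kVarSig = 0x11223344;

enum VarType {
    vartypeInvalid = 0,
    vartypeInt,
    vartypeFloat,
    vartypeDouble,
    vartypeString,
    vartypeMax  // not a valid type
};

struct Var {
    VarType type;
    std::uint32_t sig;  // assumption: always present here, DEBUG-only in the original
    union {
        int i;
        float f;
        double d;
        const char* s;
    } data;
};

// Sanity check: signature matches and the tag is strictly inside the valid range.
bool var_is_valid(const Var* p) {
    return p != nullptr
        && p->sig == kVarSig
        && p->type > vartypeInvalid
        && p->type < vartypeMax;
}

Var make_float(float value) {
    Var v{};
    v.type = vartypeFloat;
    v.sig = kVarSig;
    v.data.f = value;
    return v;
}

// Starts from an untyped address, then dispatches on the tag to read the value.
// Strings have no numeric value in this sketch, so they yield 0.0.
double var_as_double(const void* address) {
    const Var* p = static_cast<const Var*>(address);
    assert(var_is_valid(p));
    switch (p->type) {
        case vartypeInt:    return p->data.i;
        case vartypeFloat:  return p->data.f;
        case vartypeDouble: return p->data.d;
        default:            return 0.0;
    }
}
```

The point is that the caller only ever needs a void *: the first field says which union member is safe to read. This only helps when you control the code that writes the data, which is not the case for an existing ELF symbol table.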

Just wanted to add something, for people out there working with the ELF symbol table, I've found the DIEs in the DWARF file easier to work with. You can get the addresses, types and names of variables using DWARF instead of ELF, and libdwarf has good documentation.

Related

How do you determine the size of a class when reverse engineering?

I've been trying to learn a bit about reverse engineering and how to essentially wrap an existing class (that we do not have the source for, we'll call it PrivateClass) with our own class (we'll call it WrapperClass).
Right now I'm basically calling the constructor of PrivateClass while feeding a pointer to WrapperClass as the this argument...
Doing this populates m_string_address, m_somevalue1, m_somevalue2, and missingBytes with the PrivateClass object data. The dilemma now is that I am noticing issues with the original program (first a crash that was resolved by adding m_u1 and m_u2) and then text not rendering that was fixed by adding mData[2900].
I'm able to deduce that m_u1 and m_u2 hold the size of the string in m_string_address, but I wasn't expecting there to be any other member variables after them (which is why I was surprised with mData[2900] resolving the text rendering problem). 2900 is also just a random large value I threw in.
So my question is: how can we determine the real size of a class that we do not have the source for? Is there a tool that will tell you what variables exist in a class and their order (or at least the correct datatypes or datatype sizes of each variable)? I'm assuming this might be possible by processing assembly in an address range into a semi-decompiled state.
class WrapperClass
{
public:
WrapperClass(const wchar_t* original);
private:
uintptr_t m_string_address;
int m_somevalue1;
int m_somevalue2;
char missingBytes[2900];
};
WrapperClass::WrapperClass(const wchar_t* original)
{
typedef void(__thiscall* PrivateClassCtor)(void* pThis, const wchar_t* original);
PrivateClassCtor PrivateClassCtorFunc = PrivateClassCtor(DLLBase + 0x1c00);
PrivateClassCtorFunc(this, original);
}

So my question is how can we determine the real size of a class that
we do not have the source for?
You have to guess or logically deduce it for yourself. Or just guess. If guessing doesn't work out for you, you'll have to guess again.
Is there a tool that will tell you what variables exist in a class and
their order (or atleast the correct datatypes or datatype sizes of
each variable) I'm assuming by decompiling and processing assembly in
an address range.
No, there is not. The kind of meta information that describes a class, its members, etc. simply isn't written out, because the program does not need it, nor are there currently any facilities defined in the C++ Standard that would require a compiler to generate that information.

There are exactly zero guarantees that you can reliably 'guess' the size of a class. You can however probably make a reasonable estimate in most cases.
The one thing you can be sure of, though: the only problem is having too little memory for a class instance. Having too much memory isn't really a problem at all (which is why adding 2900 extra bytes works).
On the assumption that the code was originally well written (e.g. the developer decided to initialise all the variables nicely), then you may be able to guess the size using something like this:
#include <cstdio>
#include <cstring>
#define MAGIC 0xCD
// allocate a big buffer (unsigned char, so the comparison against 0xCD works)
unsigned char temp_buffer[8092];
memset(temp_buffer, MAGIC, sizeof temp_buffer);
// call ctor on the buffer, not on a real object
PrivateClassCtor PrivateClassCtorFunc = PrivateClassCtor(DLLBase + 0x1c00);
PrivateClassCtorFunc(temp_buffer, original);
// step backwards until we find a byte that isn't 0xCD.
// Might want to change the magic value and run again
// just to be sure (e.g. the original ctor sets the last
// few bytes of the class to 0xCD by coincidence).
//
// Obviously fails if the developer never initialises member vars though!
for(int i = 8091; i >= 0; --i) {
if(temp_buffer[i] != MAGIC) {
printf("class size might be: %d\n", i + 1);
break;
}
}
That's probably a decent guess, however the only way to be 100% sure would be to stick a breakpoint where you call the ctor, switch to assembly view in your debugger of choice, and then step through the assembly line by line to see what the max address being written to is.
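For anyone who wants to try the fill-and-scan idea without the real DLL, here is a self-contained sketch. fake_ctor is a hypothetical stand-in for the real constructor (it just writes 24 bytes), and estimate_size / estimate_size_twice are names invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for the real constructor: it writes 24 bytes.
void fake_ctor(void* self) {
    std::memset(self, 0, 24);
}

// Fills a scratch buffer with `magic`, runs `ctor` on it, and returns the
// index one past the last byte that changed (0 if nothing changed).
std::size_t estimate_size(void (*ctor)(void*), unsigned char magic) {
    assert(ctor != nullptr);
    alignas(std::max_align_t) static unsigned char buffer[8192];
    std::memset(buffer, magic, sizeof buffer);
    ctor(buffer);
    for (std::size_t i = sizeof buffer; i > 0; --i) {
        if (buffer[i - 1] != magic) return i;
    }
    return 0;
}

// Runs the scan with two different fill values and keeps the larger result,
// in case the constructor happens to write one of the fill values itself.
std::size_t estimate_size_twice(void (*ctor)(void*)) {
    std::size_t a = estimate_size(ctor, 0xCD);
    std::size_t b = estimate_size(ctor, 0xAB);
    return a > b ? a : b;
}
```

Calling estimate_size_twice(fake_ctor) reports 24 here. With a real constructor the result is only a lower bound, since trailing members that are never initialised leave the fill value untouched.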

Explaining C++ (C Binding Library) Function

I'm trying to understand a Function/Method in a Library in order to port it to Java however some parameters don't make any sense to me and reading the source code the library is based on is not helping.
Function (Note the API has few comments (We can also ignore the calc handle since it's got a supplier method))
Ssr calc_ssr(CalcHandle *calc, NoteInfo *rows, size_t num_rows, float music_rate, float score_goal) {
std::vector<NoteInfo> note_info(rows, rows + num_rows);
auto skillsets = MinaSDCalc(
note_info,
music_rate,
score_goal,
reinterpret_cast<Calc*>(calc)
);
return skillset_vector_to_ssr(skillsets);
}
NoteInfo Struct
struct NoteInfo
{
unsigned int notes;
float rowTime;
};
MinaSDCalc
// Function to generate SSR rating
auto
MinaSDCalc(const std::vector<NoteInfo>& NoteInfo,
const float musicrate,
const float goal,
Calc* calc) -> std::vector<float>
{
if (NoteInfo.size() <= 1) {
return dimples_the_all_zero_output;
}
calc->ssr = true;
calc->debugmode = false;
return calc->CalcMain(NoteInfo, musicrate, min(goal, ssr_goal_cap));
}
Calc expected input file data (Only care about the #Notes: ...)
Pastebin
Question
What is NoteInfo in calc_ssr? I don't know any C or C++, so to me *rows just looks like a pointer to a single NoteInfo instance. However, MinaSDCalc requires an array/vector, and passing a pointer to a single instance doesn't make sense to me. (On top of that, NoteInfo has another field, rowTime, which I think is the time at which the note occurs in the file, so that value cannot be constant or the result would be inaccurate.)
Github Project: https://github.com/kangalioo/minacalc-standalone (The code alone may not explain enough but it's worth a try; best to look at API.h and discern what's used from there. Though I do warn you a lot of the Code is esoteric)
Sorry if this doesn't make much sense but I've been looking into this since June/July and this API is the closest abstraction from the bare C++ code I could find.

NoteInfo *rows here is pass by pointer: rows points to the first element of an array of NoteInfo, and num_rows says how many elements the array has. This is one of the ways to pass an array to a function in C++. Since arrays are contiguous in memory, incrementing the pointer by one gives the next element of the array.
For example, these three declarations are equivalent ways of declaring a parameter that receives an array:
1. void myFunction(int *param) {}
2. void myFunction(int param[10]) {}
3. void myFunction(int param[]) {}
Look into this link for more understanding : https://www.tutorialspoint.com/cplusplus/cpp_passing_arrays_to_functions.htm
Also search for pass by pointer and pass by reference to look into different ways of passing arguments in c++.
As for your point that MinaSDCalc requires an array/vector, which a pointer to a single instance would not satisfy: MinaSDCalc does receive an array and not a single instance, because calc_ssr builds a std::vector from the range rows to rows + num_rows before calling it.
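Here is a small sketch of that pointer-plus-count pattern, in case it helps with the Java port. NoteInfo is copied from the question; copy_rows and last_row_time are hypothetical helpers written for this example, not part of the library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct NoteInfo {
    unsigned int notes;
    float rowTime;
};

// Same shape as calc_ssr: `rows` points at the first of `num_rows` elements,
// and the vector constructor copies the range [rows, rows + num_rows).
std::vector<NoteInfo> copy_rows(const NoteInfo* rows, std::size_t num_rows) {
    assert(rows != nullptr || num_rows == 0);
    return std::vector<NoteInfo>(rows, rows + num_rows);
}

// Hypothetical helper: rowTime of the last row, or 0 when there are no rows.
float last_row_time(const NoteInfo* rows, std::size_t num_rows) {
    std::vector<NoteInfo> v = copy_rows(rows, num_rows);
    return v.empty() ? 0.0f : v.back().rowTime;
}
```

On the Java side the closest equivalent is probably a NoteInfo[] (or a List<NoteInfo>) passed on its own, since a Java array carries its length and no separate num_rows is needed.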

An array of structures within a structure - what's the pointer type?

I have the following declaration in a file that gets generated by a perl script ( during compilation ):
struct _gamedata
{
short res_count;
struct
{
void * resptr;
short id;
short type;
} res_table[3];
}
_gamecoderes =
{
3,
{
{ &char_resource_ID_RES_welcome_object_ID,1002, 1001 },
{ &blah_resource_ID_RES_another_object_ID,1004, 1003 },
{ &char_resource_ID_RES_someting_object_ID,8019, 1001 },
}
};
My problem is that struct _gamedata is generated during compile time and the number of items in res_table will vary. So I can't provide a type declaring the size of res_table in advance.
I need to parse an instance of this structure. Originally I was doing this via a pointer to char (and not defining struct _gamedata as a type, although I am defining res_table).
e.g.
char * pb = (char *)_gamecoderes;
// i.e. pb points to the instance of `struct _gamedata`.
short res_count = (short *)pb;
pb+=2;
res_table * entry = (res_table *)pb;
for( int i = 0; i < res_count; i++ )
{
do_something_with_entry(*entry);
}
I'm getting weird results with this. I'm not sure how to declare the type struct _gamedata, as I need to be able to handle a variable length for res_table at compile time.

Since the struct is anonymous, there's no way to refer to the type of this struct. (res_table is just the member name, not the type's name). You should provide a name for the struct:
struct GameResult {
void* resptr; // member order must match the generated initializers
short id;
short type;
};
struct _gamedata {
short res_count;
GameResult res_table[3];
};
Also, you shouldn't cast the data to a char*. The res_count and the entries can be extracted using the -> operator. This way the compiler computes the member offsets correctly.
_gamedata* data = ...;
short res_count = data->res_count;
GameResult* entry = data->res_table;
or simply:
_gamedata* data = ...;
for (int i = 0; i < data->res_count; ++ i)
do_something_with_entry(data->res_table[i]);

Your problem is alignment. There will be at least two bytes of padding between res_count and res_table, so you cannot simply add two to pb. Note that res_table is a member name, not a type, so the element struct needs a name first (GameResult below). The correct way to get a pointer to res_table is then:
struct GameResult *table = data->res_table;
If you insist on casting to char* and back, you must use offsetof:
#include <stddef.h>
...
struct GameResult *table = (struct GameResult *) (pb + offsetof(struct _gamedata, res_table));
Note: in C++ you may not use offsetof with "non-POD" data types (approximately "types you could not have declared in plain C"). The correct idiom -- without casting to char* and back -- works either way.
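A quick way to see the padding for yourself is sketched below. ResEntry and GameData are hypothetical names standing in for the anonymous element struct and struct _gamedata; the two helper functions are made up for this example.

```cpp
#include <cassert>
#include <cstddef>

struct ResEntry {   // hypothetical name for the anonymous element struct
    void* resptr;
    short id;
    short type;
};

struct GameData {   // stands in for struct _gamedata
    short res_count;
    ResEntry res_table[3];
};

// Walks from a raw byte pointer to the table using offsetof instead of a
// hard-coded "+2", so any padding after res_count is accounted for.
const ResEntry* table_from_bytes(const char* pb) {
    assert(pb != nullptr);
    return reinterpret_cast<const ResEntry*>(pb + offsetof(GameData, res_table));
}

// Number of padding bytes the compiler inserted after res_count.
std::size_t padding_after_count() {
    return offsetof(GameData, res_table) - sizeof(short);
}
```

The value returned by padding_after_count() depends on the target: it is the pointer alignment minus sizeof(short) on common ABIs, which is why the hard-coded pb += 2 in the question reads from the wrong place.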

Ideally use memcpy(3), at least use type _gamedata, or define a protocol
We can consider two use cases. In what I might call the programmer-API type, serialization is an internal convenience and the record format is determined by the compiler and library. In the more formally defined and bulletproof implementation, a protocol is defined and a special-purpose library is written to portably read and write a stream.
The best practice will differ depending on whether it makes sense to create a versioned protocol and develop stream I/O operations.
API
The best and most completely portable implementation when reading from compiler-object serialized streams would be to declare or dynamically allocate an exact or max-sized _gamedata and then use memcpy(3) to pull the data out of the serial stream or device memory or whatever it is. This lets the compiler allocate the object that is accessed by compiler code and it lets the developer allocate the object that is accessed by developer (i.e., char *) logic.
But at a minimum, set a pointer to _gamedata and the compiler will do everything for you. Note also that res_table[n] will always be at the "right" address regardless of the size of the res_table[] array. It's not like making it bigger changes the location of the first element.
General serialization best practice
If the _gamedata object itself is in a buffer and potentially misaligned, i.e., if it is anything other than an object allocated for a _gamedata type by the compiler or dynamically by a real allocator, then you still have potential alignment issues and the only correct solution is to memcpy(3) each discrete type out of the buffer.
A typical error is to use the misaligned pointer anyway, because it works (slowly) on x86. But it may not work on mobile devices, or future architectures, or on some architectures when in kernel mode, or with advanced optimizations enabled. It's best to stick with real C99.
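As a rough illustration of "memcpy each discrete type out of the buffer", consider the sketch below. The record layout (a 16-bit count followed directly by a 32-bit id, in host byte order) and the names read_at, Header and parse_header are assumptions invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Reads a T from buf + offset via memcpy, which is well defined even when
// that address is not suitably aligned for T.
template <typename T>
T read_at(const unsigned char* buf, std::size_t offset) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy is only valid for trivially copyable types");
    T value;
    std::memcpy(&value, buf + offset, sizeof value);
    return value;
}

// Hypothetical record: a 16-bit count followed immediately (no padding)
// by a 32-bit id, both in host byte order.
struct Header {
    std::uint16_t count;
    std::uint32_t id;
};

Header parse_header(const unsigned char* buf) {
    assert(buf != nullptr);
    Header h;
    h.count = read_at<std::uint16_t>(buf, 0);
    h.id = read_at<std::uint32_t>(buf, 2);  // offset 2 is misaligned for a uint32_t
    return h;
}
```

Compilers typically turn these small fixed-size memcpy calls into plain loads where the target allows it, so this is usually no slower than the pointer cast. A real protocol would also pin down the byte order, which this sketch does not.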
It's a protocol
Finally, when serializing binary data in any fashion you are really defining a protocol. So, for maximum robustness, don't let the compiler define your protocol. Since you are in C, you can generally handle each fundamental object discretely with no loss in speed. If both the writer and reader do it, then only the developers have to agree on the protocol, not the developers and the compilers and the build team, and the C99 authors, and Dennis M. Ritchie, and probably some others.

As @Zack points out, there is padding between elements of your structure.
I'm assuming you have a char* because you've serialized the structure (in a cache, on disk, or over the network). Just because you are starting with a char * doesn't mean you have to access the entire struct the hard way. Cast it to a typed pointer, and let the compiler do the work for you:
_gamedata * data = (_gamedata *) my_char_pointer;
for( int i = 0; i < data->res_count; i++ )
{
do_something_with_entry(data->res_table[i]);
}

Accessing any struct's members at run-time

Is it possible to get access to an individual member of a struct or class without knowing the names of its member variables?
I would like to do an "offsetof(struct, tyname)" without having the struct name or member variable name hard coded, amongst other things.
thanks.

Sure. If you have a struct and you know the offset and the type of the member variable, you can access it using pointers.
struct my_struct {
int member1;
char member2;
short member3;
char member4;
};
...
struct my_struct obj;
short member3 = *((short*)((char*)&obj + 6));
That'll get the value of member3, which sits 6 bytes from the start of obj with the default alignment on x86 (4 bytes of int, 1 byte of char, then 1 byte of padding so that the short is 2-byte aligned). However, you want to be careful. First of all, if the struct changes, your data will be garbage. We're casting all over the place, so you get no type safety, and the compiler won't warn you if something's awry. You'll also need to know whether the compiler pads the struct to align its members or packs it tightly (when packed, member3 would be at offset 5), or the offset will be wrong.
This isn't a pleasant thing to do, and I'd avoid it if I were you, but yes, it can be done.

C and C++ are compiled languages without built-in "reflection" features. This means that regardless of what you do and how you do it, one way or another the path will always start from an explicit hard-coded value, be that a member name or a compile-time offset value. That means that if you want to select a struct member based on some run-time key, you have no other choice but to manually create a mapping of some kind that would map the key value to something that identifies a concrete struct member.
In C++ in order to identify a struct member at run-time you can use such feature as pointers-to-members. In C your only choice is to use an offset value.
Another issue is, of course, specifying the type of the members, if your members can have different types. But you provided no details about that, so I can't say whether you need to deal with it or not.
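A minimal sketch of such a hand-written mapping, using pointers-to-members keyed by a run-time string, could look like this. The Config struct and the functions member_table, set_member and get_member are hypothetical, and all members are assumed to be int to sidestep the type question.

```cpp
#include <cassert>
#include <map>
#include <string>

struct Config {   // hypothetical struct; all members share one type here
    int width;
    int height;
    int depth;
};

// The one place where member names are hard-coded.
const std::map<std::string, int Config::*>& member_table() {
    static const std::map<std::string, int Config::*> table = {
        {"width", &Config::width},
        {"height", &Config::height},
        {"depth", &Config::depth},
    };
    return table;
}

// Sets the member selected by `key`; returns false for an unknown key.
bool set_member(Config& c, const std::string& key, int value) {
    auto it = member_table().find(key);
    if (it == member_table().end()) return false;
    c.*(it->second) = value;
    return true;
}

// Reads the member selected by `key`; the key must be a known one.
int get_member(const Config& c, const std::string& key) {
    auto it = member_table().find(key);
    assert(it != member_table().end());
    return c.*(it->second);
}
```

Everything outside member_table() works purely from the run-time key. Members of differing types would need a type tag or a variant next to each entry.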

We had a similar problem some years ago: A huge struct of configuration information that we wanted to reflect on. So we wrote a Perl script to find the struct, parse its members, and output a C++ file that looked like:
struct ConfField
{ const char* name;
int type;
size_t offset;
};
ConfField confFields[] = {
{ "version", eUInt32, 0 },
{ "seqID", eUInt32, 4 },
{ "timestamp", eUInt64, 8 },
// ... lots more ...
{ 0, 0, 0 }
};
And we'd feed the script with the output from gcc -E.
Nowadays, I understand that gccxml can output an XML file representing any C++ source that gcc can compile, since it actually uses the g++ front end to do the parsing. So I'd recommend pairing it with an XML-parsing script (I'd use Python with the lxml library) to find out everything you ever wanted to know about your C++ source.
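For a sense of how such a generated table gets used, here is a hedged sketch. It swaps the script-generated numeric offsets for offsetof, and the Conf struct, the FieldType enum and read_field are hypothetical stand-ins rather than the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

enum FieldType { eUInt32, eUInt64 };

struct Conf {   // hypothetical stand-in for the real configuration struct
    std::uint32_t version;
    std::uint32_t seqID;
    std::uint64_t timestamp;
};

struct ConfField {
    const char* name;
    FieldType type;
    std::size_t offset;
};

// Offsets come from offsetof here instead of a generated literal.
const ConfField confFields[] = {
    { "version",   eUInt32, offsetof(Conf, version) },
    { "seqID",     eUInt32, offsetof(Conf, seqID) },
    { "timestamp", eUInt64, offsetof(Conf, timestamp) },
};

// Looks a field up by name and returns its value widened to 64 bits.
// Returns false if no field has that name.
bool read_field(const Conf& c, const char* name, std::uint64_t& out) {
    assert(name != nullptr);
    const unsigned char* base = reinterpret_cast<const unsigned char*>(&c);
    for (const ConfField& f : confFields) {
        if (std::strcmp(f.name, name) != 0) continue;
        if (f.type == eUInt32) {
            std::uint32_t v;
            std::memcpy(&v, base + f.offset, sizeof v);
            out = v;
        } else {
            std::memcpy(&out, base + f.offset, sizeof out);
        }
        return true;
    }
    return false;
}
```

With a table like this, code such as a config dumper or parser can loop over confFields without naming any member itself.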

Somewhere in your code you need to reference the data member in the struct. However you can create a variable that is a pointer to a struct data member and from then on you no longer need to reference it by name.
struct foo
{
int member1;
int member2;
};
typedef int (foo::*intMemberOfFoo);
intMemberOfFoo getMember()
{
if (rand() > RAND_MAX / 2) return &foo::member1;
else return &foo::member2;
}
foo f;
void do_somthing()
{
intMemberOfFoo m = getMember();
f.*m = 0;
}

The technical answer is 'yes' because C++ is Turing-complete and you can do almost anything if you try hard enough. The more practical answer is probably 'no' since there is no safe and easy way of doing exactly what you want.
I agree with GMan. What exactly are you trying to do that makes you think you need this technique?

Well, you will have to set up some stuff first, but it can be done. Expanding on Samir's response:
struct my_struct {
int member1;
char member2;
short member3;
char member4;
};
you can create a table of offsets:
my_struct tmp;
int my_struct_offsets[4]={
0,
(char*)&(tmp.member2)-(char*)&(tmp.member1),
(char*)&(tmp.member3)-(char*)&(tmp.member1),
(char*)&(tmp.member4)-(char*)&(tmp.member1)
};
This takes into account different alignments on different systems.

Getting the size in bytes or in chars of a member of a struct or union in C/C++?

Let's say that I want to get the size in bytes or in chars for the name field from:
struct record
{
int id;
TCHAR name [50];
};
sizeof(record.name) does not work.

The solution for this is not so pretty as you may think:
size_in_byte = sizeof(((struct record *) 0)->name)
size_in_chars = _countof(((struct record *) 0)->name)
If you want to use the second one on other platforms than Windows try:
#define _countof(array) (sizeof(array)/sizeof(array[0]))

If you create an instance first, it will work.
record r;
sizeof(r.name);

In C++:
#include <iostream>
using namespace std;
struct record
{
int id;
char name [50];
};
int main() {
cout << sizeof( record::name) << endl;
}
Edit: A couple of people have pointed out that this is C++0x code, so I guess I must retract my unkind comment regarding VC++. This is not a programming construct I have ever used in my own C++ code, but I have to wonder why sizeof would not work this way in C++03? You hand it a name and it gives you the size. I'd have thought it would take some effort for it not to work. But such is the wonder of the C++ Standard :-)

record is the name of a type, but record.name is not. You somehow have to access name through an instance of the struct. Sorin's answer is the usual C solution:
sizeof ((struct record*)0)->name;
This creates a pseudo-pointer to an instance (or pointer to a pseudo-instance) of struct record, then access the name member, and pass that expression to sizeof. It works because sizeof doesn't attempt to evaluate the pointer expression, it just uses it to compute the size.
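To compare the options side by side, here is a small sketch. It uses char instead of TCHAR so it compiles anywhere, and the constant names are made up for this example.

```cpp
#include <cstddef>

struct record {
    int id;
    char name[50];   // char instead of TCHAR to keep the sketch portable
};

// C-style: the null pointer is never dereferenced, because sizeof does not
// evaluate its operand.
constexpr std::size_t size_via_null = sizeof(((record*)0)->name);

// C++11 and later: name the member directly.
constexpr std::size_t size_via_member = sizeof(record::name);

// Element count rather than byte count.
constexpr std::size_t count_of_name =
    sizeof(record::name) / sizeof(record::name[0]);

static_assert(size_via_null == size_via_member, "both spellings agree");
```

With TCHAR in a Unicode build the byte size and the element count differ, which is where the division matters.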

You might wanna read this, as it discusses the very same issue and provides all the options mentioned in this thread, and a little more.

struct record
{
static const int kMaxNameChars=50;
int id;
TCHAR name [kMaxNameChars];
};
sizeof(TCHAR)*record::kMaxNameChars //"sizeof(record.name)"
//record::kMaxNameChars sufficient for many purposes.
Portable, perfectly safe and IMO being explicit about raw array length is good practice.
(edit: you might have to macro it in C, if the compiler gets upset about variable array lengths. if you do, consider defining a static const int to the value of the macro anyway!)
