I have a C++ class; this class is as follows:
First, the header:
class PageTableEntry {
public:
PageTableEntry(bool modified = true);
virtual ~PageTableEntry();
bool modified();
void setModified(bool modified);
private:
PageTableEntry(PageTableEntry &existing);
PageTableEntry &operator=(PageTableEntry &rhs);
bool _modified;
};
And the .cpp file
#include "PageTableEntry.h"
PageTableEntry::PageTableEntry(bool modified) {
_modified = modified;
}
PageTableEntry::~PageTableEntry() {}
bool PageTableEntry::modified() {
return _modified;
}
void PageTableEntry::setModified(bool modified) {
_modified = modified;
}
I set a breakpoint on all 3 lines in the .cpp file involving _modified so I can see exactly where it is being set/changed/read. The sequence goes as follows:
Breakpoint in constructor is triggered. _modified variable is confirmed to be set to true
Breakpoint in accessor is triggered. _modified variable is FALSE!
This occurs with every instance of PageTableEntry. The class itself is not changing the variable - something else is. Unfortunately, I don't know what. The objects are created dynamically using new and are passed around (as pointers) to various STL structures, including a vector and a map. The mutator is never called by my own code (I haven't gotten to that point yet), the STL structures shouldn't be able to call it, and since the breakpoint on the mutator is never hit I can only assume that they don't.
Clearly there's some "gotcha" where private variables can, under certain circumstances, be changed without going through the class's mutator, triggered by who-knows-what situation, but I can't imagine what it could be. Any thoughts?
UPDATE:
The value of this at each stage:
Constructor 1: 0x100100210
Constructor 2: 0x100100400
Accessor 1: 0x1001003f0
Accessor 2: 0x100100440
UPDATE2:
(code showing where PageTableEntry is accessed)
// In constructor:
_tableEntries = std::map<unsigned int, PageTableEntry *>();
// To get an entry in the table (body of the testAddr() function; address is an unsigned int):
std::map<unsigned int, PageTableEntry *>::iterator it;
it = _tableEntries.find(address);
if (it == _tableEntries.end()) {
return NULL;
}
return (PageTableEntry *)&(*it);
// To create a new entry:
PageTableEntry *entry = testAddr(address);
if (!entry) {
entry = new PageTableEntry(_currentProcessID, 0, true, kStorageTypeDoesNotExist);
_tableEntries.insert(std::pair<unsigned int, PageTableEntry *>(address, entry));
}
Those are the only points at which PageTableEntry objects are stored and retrieved from STL structures in order to cause the issue. All other functions utilize the testAddr() function to retrieve entries.
UNRELATED: Since C++ now has 65663 questions, and 164 have been asked so far today, that means that only today did the number of C++-tagged questions exceed a 16-bit unsigned integer. Useful? No. Interesting? Yes. :)
Either your debugger isn't reporting the value correctly (not unheard of, and even expected in optimized builds) or you have memory corruption elsewhere in your program. The code you've shown is more-or-less fine and should behave as you expect.
EDIT corresponding to your 'UPDATE2':
The problem is in this line:
return (PageTableEntry *)&(*it);
The type of *it is std::pair<const unsigned int, PageTableEntry*>&, so you're effectively reinterpret-casting a std::pair<const unsigned int, PageTableEntry*>* to a PageTableEntry*. Change that line to:
return it->second;
Keep an eye out for other casts in your codebase. Needing a cast in the first place is a code smell, and the result of doing a cast incorrectly can be undefined behavior, including manifesting as memory corruption as you're seeing here. Using C++-style casts instead of C-style casts makes it trivial to find where casts occur in your codebase so they can be reviewed easily (hint, hint).
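For illustration, here is a hedged sketch of the same lookup with both cast styles spelled out (the lookup helper name is invented for the example; the map type matches the question):
#include <cstddef>
#include <map>
class PageTableEntry;  // forward declaration is enough here; we only deal in pointers
PageTableEntry *lookup(std::map<unsigned int, PageTableEntry *> &table, unsigned int address) {
    std::map<unsigned int, PageTableEntry *>::iterator it = table.find(address);
    if (it == table.end())
        return NULL;
    // return (PageTableEntry *)&(*it);               // compiles, but silently reinterprets the pair
    // return static_cast<PageTableEntry *>(&(*it));  // rejected by the compiler: unrelated pointer types
    return it->second;                                // no cast needed at all
}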
std::map<>::find() returns an iterator which when dereferenced returns a std::map<>::value_type. The value_type in this case is a std::pair<>. You're returning the address of that pair rather than the PageTableEntry. I believe you want the following:
// To get an entry in the table (body of the testAddr() function; address is an unsigned int):
std::map<unsigned int, PageTableEntry *>::iterator it;
it = _tableEntries.find(address);
if (it == _tableEntries.end()) {
return NULL;
}
return (*it).second;
P.S.: C-style casts are evil. The compiler would have issued a diagnostic with a C++ cast in place. :)
Try looking at the value of this at each breakpoint.
That copy constructor and assignment operator are both going to be used a lot if you're using STL containers. Maybe if you show us the code for those, we would see something wrong.
Could you add another unique value to the class to track the PageTableEntry instances?
I know that I've had problems like this where the real problem was that there were multiple entries that looked the same, and the breakpoints might have been switching between PageTableEntry instances without me realizing it.
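A minimal sketch of that idea, assuming a simple static counter is acceptable (the _id member and id() accessor are invented for the example):
class PageTableEntry {
public:
    PageTableEntry(bool modified = true)
        : _modified(modified), _id(++_nextId) {}
    unsigned int id() const { return _id; }  // unique per instance; easy to watch in the debugger
    // ... rest of the class as before ...
private:
    bool _modified;
    unsigned int _id;
    static unsigned int _nextId;  // define in the .cpp: unsigned int PageTableEntry::_nextId = 0;
};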
I've been trying to learn a bit about reverse engineering and how to essentially wrap an existing class (that we do not have the source for, we'll call it PrivateClass) with our own class (we'll call it WrapperClass).
Right now I'm basically calling the constructor of PrivateClass while feeding a pointer to WrapperClass as the this argument...
Doing this populates m_string_address, m_somevalue1, m_somevalue2, and missingBytes with the PrivateClass object data. The dilemma now is that I am noticing issues with the original program (first a crash that was resolved by adding m_u1 and m_u2) and then text not rendering that was fixed by adding mData[2900].
I'm able to deduce that m_u1 and m_u2 hold the size of the string in m_string_address, but I wasn't expecting there to be any other member variables after them (which is why I was surprised with mData[2900] resolving the text rendering problem). 2900 is also just a random large value I threw in.
So my question is: how can we determine the real size of a class that we do not have the source for? Is there a tool that will tell you what variables exist in a class and their order (or at least the correct data types or data type sizes of each variable)? I'm assuming this might be possible by processing assembly in an address range into a semi-decompiled state.
class WrapperClass
{
public:
WrapperClass(const wchar_t* original);
private:
uintptr_t m_string_address;
int m_somevalue1;
int m_somevalue2;
char missingBytes[2900];
};
WrapperClass::WrapperClass(const wchar_t* original)
{
typedef void(__thiscall* PrivateClassCtor)(void* pThis, const wchar_t* original);
PrivateClassCtor PrivateClassCtorFunc = PrivateClassCtor(DLLBase + 0x1c00);
PrivateClassCtorFunc(this, original);
}
So my question is how can we determine the real size of a class that we do not have the source for?
You have to logically deduce it for yourself, or just guess. If guessing doesn't work out for you, you'll have to guess again.
Is there a tool that will tell you what variables exist in a class and their order (or at least the correct data types or data type sizes of each variable)? I'm assuming by decompiling and processing assembly in an address range.
No, there is not. The kind of meta-information that describes a class, its members, etc. simply isn't written out, as the program does not need it, nor are there currently any facilities defined in the C++ Standard that would require a compiler to generate that information.
There are exactly zero guarantees that you can reliably 'guess' the size of a class. You can however probably make a reasonable estimate in most cases.
The one thing you can be sure of, though: the only real problem is having too little memory for a class instance. Having too much memory isn't really a problem at all (which is why adding 2900 extra bytes "works").
On the assumption that the code was originally well written (e.g. the developer initialised all the member variables nicely), you may be able to guess the size using something like this:
#define MAGIC 0xCD
// allocate a big buffer and fill it with a known byte pattern
unsigned char temp_buffer[8092];
memset(temp_buffer, MAGIC, 8092);
// call the ctor on the buffer (PrivateClassCtor typedef as in the code above)
PrivateClassCtor PrivateClassCtorFunc = PrivateClassCtor(DLLBase + 0x1c00);
PrivateClassCtorFunc(temp_buffer, original);
// step backwards until we find a byte that isn't 0xCD.
// Might want to change the magic value and run again
// just to be sure (e.g. the original ctor might set the last
// few bytes of the class to 0xCD by coincidence).
//
// Obviously fails if the developer never initialises member vars though!
for (int i = 8091; i >= 0; --i) {
    if (temp_buffer[i] != MAGIC) {
        printf("class size might be: %d\n", i + 1);
        break;
    }
}
That's probably a decent guess, however the only way to be 100% sure would be to stick a breakpoint where you call the ctor, switch to assembly view in your debugger of choice, and then step through the assembly line by line to see what the max address being written to is.
I did a bit of an experiment to try to understand references in C++:
#include <iostream>
#include <vector>
#include <set>
struct Description {
int a = 765;
};
class Resource {
public:
Resource(const Description &description) : mDescription(description) {}
const Description &mDescription;
};
void print_set(const std::set<Resource *> &resources) {
for (auto *resource: resources) {
std::cout << resource->mDescription.a << "\n";
}
}
int main() {
std::vector<Description> descriptions;
std::set<Resource *> resources;
descriptions.push_back({ 10 });
resources.insert(new Resource(descriptions.at(0)));
// Same as description (prints 10)
print_set(resources);
// Same as description (prints 20)
descriptions.at(0).a = 20;
print_set(resources);
// Why? (prints 20)
descriptions.clear();
print_set(resources);
// Object is written to the same address (prints 50)
descriptions.push_back({ 50 });
print_set(resources);
// Create new array
descriptions.reserve(100);
// Invalid address
print_set(resources);
for (auto *res : resources) {
delete res;
}
return 0;
}
https://godbolt.org/z/TYqaY6Tz8
I don't understand what is going on here. I have found this excerpt from C++ FAQ:
Important note: Even though a reference is often implemented using an address in the underlying assembly language, please do not think of a reference as a funny looking pointer to an object. A reference is the object, just with another name. It is neither a pointer to the object, nor a copy of the object. It is the object. There is no C++ syntax that lets you operate on the reference itself separate from the object to which it refers.
This creates some questions for me. So, if a reference is the object itself and I create a new object at the same memory address, does this mean that the reference "becomes" the new object? In the example above, vectors are linear arrays; so, as long as the array occupies the same memory range, the object will appear valid. However, this becomes a lot trickier when other data structures are being used (e.g. sets, maps, linked lists) because each "node" typically points to a different part of memory.
Should I treat references as undefined if the original object is destroyed? If yes, is there a way to identify that the reference is destroyed other than a custom mechanism that tracks the references?
Note: Tested this with GCC, LLVM, and MSVC
The note is misleading: treating references as syntactic sugar for pointers is fine as a mental model. In all the ways a pointer might dangle, a reference will also dangle. Accessing dangling pointers/references is undefined behaviour (UB).
int* p = new int{42};
int& i = *p;
delete p;
void f(int);
f(*p); // UB
f(i); // UB, with the exact same reason
This also extends to the standard containers and their rules about pointer/reference invalidation. The reason any surprising behaviour happens in your example is simply UB.
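For illustration, a minimal sketch of the same dangling-reference problem without any classes (exactly when the reallocation happens is up to the implementation, but reserve() past the current capacity invalidates references by the container rules):
#include <iostream>
#include <vector>
int main() {
    std::vector<int> v;
    v.push_back(10);
    int &ref = v[0];           // refers to an element inside the vector's current buffer
    std::cout << ref << "\n";  // fine: prints 10
    v.reserve(100);            // may reallocate the buffer; every reference/pointer into it is invalidated
    std::cout << ref << "\n";  // UB: ref now dangles, anything may happen
}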
The way I explain this to myself is:
A pointer is like a finger on your hand. It can point to memory blocks; think of them as keys on a keyboard. So a pointer literally points to a key that holds something or does something.
A reference is a nickname for something. Your name may be, for example, Michael Johnson, but people may call you Mike, MJ, Mikeson, etc. Anytime you hear your nickname, the person who called it REFERRED to the same thing - you. If you do something to yourself, the reference will show the change too. If you point at something else, it won't affect what you previously pointed at (unless you're doing something weird), but rather point at something new. That being said, as in the accepted answer above, if you do something weird with your fingers and your nicknames, you'll see weird things happening.
References are likely one of the most important features of C++, and understanding them is critical for beginners. Many schools today start with MATLAB, which is insanely slow when you wish to do things seriously. One of the reasons is the lack of control over references in MATLAB (yes, it has them: make a class and derive from handle - google it) that you have in C++.
Look at these two functions:
double fun1(std::valarray<double> &array)
{
return array.max();
}
double fun2(std::valarray<double> array)
{
return array.max();
}
These two simple functions are very different. When you have some STL array and use fun1, the function expects a nickname for that array and processes it directly, without making a copy. fun2, on the other hand, takes the input array, creates a copy of it, and processes the copy.
Naturally, it is much more efficient to use references when writing functions that process inputs in C++. That being said, you must be certain not to change your input in any way, because that will affect the original input array in the other piece of code where you generated it - you are processing the same thing, just called differently.
This makes references useful for a somewhat controversial style of coding based on side effects.
In C++ you can't return multiple outputs from a function directly without defining a custom data type. One workaround is a side effect, as in this example:
#include <stdio.h>
#include <valarray>
#include <iostream>
double fun3(std::valarray<double> &array, double &min)
{
min = array.min();
return array.max();
}
int main()
{
std::valarray<double> a={1, 2, 3, 4, 5};
double sideEffectMin;
double max = fun3(a,sideEffectMin);
std::cout << "max of array is " << max << " min of array is " <<
sideEffectMin<<std::endl;
return 0;
}
So fun3 expects a reference to a double. In other words, it wants its second input to be a nickname for another double variable. The function then alters the reference, and this also alters the input: both name and nickname get altered by the function, because it's the same "thing".
In the main function, the variable sideEffectMin is declared without a value, but it receives one when fun3 is called. Therefore, you get two outputs from fun3.
The example shows you the trick with side effects, but also that you should beware of altering your inputs, especially when they are references to something else, unless you know what you are doing.
First of all: I know that most optimization bugs are due to programming errors or relying on facts which may change depending on optimization settings (floating point values, multithreading issues, ...).
However I experienced a very hard to find bug and am somewhat unsure if there is any way to prevent these kind of errors from happening without turning the optimization off. Am I missing something? Could this really be an optimizer bug? Here's a simplified example:
struct Data {
int a;
int b;
double c;
};
struct Test {
void optimizeMe();
Data m_data;
};
void Test::optimizeMe() {
Data * pData; // Note that this pointer is not initialized!
bool first = true;
for (int i = 0; i < 3; ++i) {
if (first) {
first = false;
pData = &m_data;
pData->a = i * 10;
pData->b = i * pData->a;
pData->c = pData->b / 2;
} else {
pData->a = ++i;
} // end if
} // end for
};
int main(int argc, char *argv[]) {
Test test;
test.optimizeMe();
return 0;
}
The real program of course has a lot more to do than this. But it all boils down to the fact that instead of accessing m_data directly, a (previously uninitialized) pointer is being used. As soon as I add enough statements to the if (first)-part, the optimizer seems to change the code to something along these lines:
if (first) {
first = false;
// pData-assignment has been removed!
m_data.a = i * 10;
m_data.b = i * m_data.a;
m_data.c = m_data.b / m_data.a;
} else {
pData->a = ++i; // This will crash - pData is not set yet.
} // end if
As you can see, it replaces the unnecessary pointer dereference with a direct write to the member struct. However, it does not do this in the else-branch. It also removes the pData-assignment. Since the pointer is now still uninitialized, the program will crash in the else-branch.
Of course there are various things which could be improved here, so you might blame it on the programmer:
Forget about the pointer and do what the optimizer does - use m_data directly.
Initialize pData to nullptr - that way the optimizer knows that the else-branch will fail if the pointer is never assigned. At least it seems to solve the problem in my test-environment.
Move the pointer assignment in front of the loop (effectively initializing pData with &m_data), in which case pData could also be a reference instead of a pointer for good measure. This makes sense because pData is needed in all cases, so there is no reason to do the assignment inside the loop; see the sketch after this list.
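For illustration, a minimal sketch of that third workaround, assuming the Data and Test definitions from above stay the same:
void Test::optimizeMe() {
    Data &data = m_data;  // bound once, before the loop; cannot be left uninitialized
    bool first = true;
    for (int i = 0; i < 3; ++i) {
        if (first) {
            first = false;
            data.a = i * 10;
            data.b = i * data.a;
            data.c = data.b / 2;
        } else {
            data.a = ++i;
        }
    }
}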
The code is obviously smelly, to say the least, and I'm not trying to "blame" the optimizer for doing this. But I'm asking: What am I doing wrong? The program might be ugly, but it's valid code...
I should add that I'm using VS2012 with C++/CLI and v110_xp-Toolset. Optimization is set to /O2. Please also note that if you really want to reproduce the problem (that's not really the point of this question though) you need to play around with the complexity of the program. This is a very simplified example and the optimizer sometimes doesn't remove the pointer assignment. Hiding &m_data behind a function seems to "help".
EDIT:
Q: How do I know that the compiler is optimizing it to something like the example provided?
A: I'm not very good at reading assembler, I have looked at it however and have made 3 observations which make me believe that it's behaving this way:
As soon as optimization kicks in (adding more assignments usually does the trick), the pointer assignment has no associated assembler statement. It also hasn't been moved up to the declaration, so the pointer really seems to be left uninitialized (at least to me).
In cases where the program crashes, the debugger skips the assignment statement. In cases where the program runs without problems, the debugger stops there.
If I watch the content of pData and the content of m_data while debugging, it clearly shows that all assignments in the if-branch have an effect on m_data, and m_data receives the correct values. The pointer itself is still pointing to the same uninitialized value it had from the beginning. Therefore I have to assume that it is in fact not using the pointer to make the assignments at all.
Q: Does it have to do anything with i (Loop unrolling)?
A: No, the actual program uses do { ... } while() to loop over a SQL SELECT result set, so the iteration count is completely runtime-specific and cannot be predetermined by the compiler.
It sure looks like a bug to me. It's fine for the optimizer to eliminate the unnecessary indirection, but it should not eliminate the assignment to pData.
Of course, you can work around the problem by assigning to pData before the loop (at least in this simple example). I gather that the problem in your actual code isn't as easily resolved.
I also vote for an optimizer bug if it is really reproducible in this example. To overrule the optimizer you could try to declare pData as volatile.
Here is my code. I cast the buffer to different types of objects; is this what causes the failure? I really want to know why FromBase::find2(int key) works, but not FromBase::find(int key).
class Base
{
public:
virtual int find(int key)=0;
int keys[4];
};
class FromBase:public Base
{
public:
FromBase();
int find(int key);
int find2(int key);
};
FromBase::FromBase()
{
for(int i=0;i<4;i++)
keys[i]=-1;
}
int FromBase::find(int key)
{
for(int i=0;i<4;i++){
if(keys[i]==key)
return i;
}
return i;
};
int FromBase::find2(int key)
{
for(int i=0;i<4;i++){
if(keys[i]==key)
return i;
}
return i;
};
int main()
{
FromBase frombase;
FILE* fptr=fopen("object.dat","w");
fwrite((void*)&frombase,48,1,fptr);
fclose(fptr);
char object[48];
fptr=fopen("object.dat","r");
fread((void*)object,48,1,fptr);
// looks like this works
(FromBase*)object->find2(7);
//These two do not work, I got segmentation fault!
(FromBase*)object->find(7);
(Base*)object->find(7);
}
The reason I want to do this is that I need to read the object from a file, so I need to cast the buffer to a particular type before I can call the method.
There is a high chance that you are overwriting the virtual function table with your code, leading to a bad address when you call the method. You cannot just save objects to a file and expect to restore them by simply restoring the memory contents from the time they were saved.
There are some nice libraries like boost::serialization to save and restore objects. I would urge you to read about this or to turn your objects into plain old data types (structs) containing no references or addresses.
There are several reasons why this code is not guaranteed to work. I think the biggest concern is this code here:
char object[48];
The number 48 here is a magic number and there's absolutely no guarantee that the size of the object you're writing out is 48 bytes. If you want to have a buffer large enough to hold an object, use sizeof to see how many bytes you need:
char object[sizeof(FromBase)];
Moreover, this is not guaranteed to work due to alignment issues. Every object in C++ has an alignment, some number that its address must be a multiple of. When you declare a variable, C++ ensures that it has the proper alignment for its type, though there's no guarantee that it ends up having the alignment of any other type. This means that when you declare a char array, there's no guarantee that it's aligned the same way as a real FromBase object would be, and using the buffer as an object of that type results in undefined behavior.
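For illustration, a minimal sketch of getting both the size and the alignment right, and of why a constructor still has to run before the bytes are an object; it assumes the FromBase class from the question is visible:
#include <new>  // placement new
void example() {
    // A buffer that is both large enough and correctly aligned for FromBase (C++11):
    alignas(FromBase) char buffer[sizeof(FromBase)];
    // Even with size and alignment right, the bytes only become a real object
    // once a constructor has actually run in them, e.g. via placement new:
    FromBase *p = new (buffer) FromBase();
    p->find2(7);     // now safe to call
    p->~FromBase();  // objects created with placement new must be destroyed explicitly
}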
As others have pointed out, though, you also have a problem because this line:
fopen("object.dat","r");
Doesn't update the local variable you're using to keep track of the file pointer, so what you're reading back is almost certainly going to be garbage (if you read back anything at all). The segfault is probably coming from the bytes for the virtual dispatch table not being read back in correctly.
// will these two methods work? I got segmentation fault!
(FromBase*)object->find(7);
(Base*)object->find(7);
No, they will not work. The segmentation fault might be a hint ;)
object is an array on the stack, which is fine, but you need to call the class constructor. If this were valid C++, ANY memory could be cast to any class.
I'd start off by creating the class on the stack and call some Load()-method on it, e.g.
FromBase object;
object.Load("object.dat");
And let the Load()-method read the data from file and set values on the internal data.
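A minimal sketch of what such a Load() method could look like, assuming a Load() member is declared in FromBase and that only the keys array needs restoring (the method name and error handling are invented for the example):
#include <cstdio>
bool FromBase::Load(const char *filename)
{
    FILE *f = fopen(filename, "rb");
    if (!f)
        return false;
    // Read back only the plain data members, never the whole object:
    // the vtable pointer is set up by the constructor, not by the file.
    size_t items = fread(keys, sizeof(keys), 1, f);
    fclose(f);
    return items == 1;
}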
Apart from all the other problems that people have pointed out, I am absolutely shocked that nobody has mentioned that:
(FromBase*)object->find2(7);
Is just NOT guaranteed to work.
You are depending on a raft of implementation details. object is an array of char, not a FromBase, so the compiler has not had the chance to initialize any of its implementation-dependent details.
Even if we assume that the implementation uses a vtable (and thus a vtable pointer in the class): does the implementation use a relative pointer or an absolute pointer? Assuming you want to save during one run and then reload on the next, are you assuming the vtable is actually located at the same address between different runs (and what happens when this part of the application is loaded from a dynamic library)?
This is just horrible. You SHOULD NOT DO THIS EVER.
If you want to serialize and de-serialize the object from storage, then the class has to know how to do the serialization itself, so that all the correct constructors/destructors get called at the correct time.
The first problem I can see is where you use fopen the second time:
fopen("object.dat","r"); //problem - your code
which should be this:
fptr = fopen("object.dat","r"); //fix (atleast one fix)
That means that in your code you're trying to read data through fptr, which still refers to the file you have already closed!
One problem is that the array of characters does not have a method called find.
The cast does not convert the array to FromBase or Base. It only tells the compiler to ignore the error.
I've written a program that allocates a new object of the class T like this:
T* obj = new T(tid);
where tid is an int
Somewhere else in my code, I'm trying to release the object I've allocated, which is inside a vector, using:
delete(myVec[i]);
and then:
myVec[i] = NULL;
Sometimes it passes without any errors, and in some cases it causes a crash—a segmentation fault.
I've checked before calling delete, and that object is there—I haven't deleted it elsewhere before.
What can cause this crash?
This is my code for inserting objects of the type T to the vector:
_myVec is global
int add() {
int tid = _myVec.size();
T* newT = new T (tid);
if (newT == NULL){
return ERR_CODE;
}
_myVec.push_back(newT);
// _myVec.push_back(new T (tid));
return tid;
}
As it is, the program sometimes crashes.
When I replace the push_back line with the commented line and leave the rest as it is, it works.
But when I replace this code with:
int add() {
int tid = _myVec.size();
if (newT == NULL){
return ERR_CODE;
}
_myVec.push_back(new T (tid));
return tid;
}
it crashes at a different stage...
The newT in the second option is unused, and still it changes the whole process... what is going on here?
Segfaulting means trying to manipulate a memory location that shouldn't be accessible to the application.
That means that your problem can come from three cases:
1) Trying to do something with a pointer that points to NULL;
2) Trying to do something with an uninitialized pointer;
3) Trying to do something with a pointer that pointed to a now-deleted object.
1) is easy to check, so I assume you already do it as you nullify the pointers in the vector. If you don't do checks, then do it before the delete call. That will point out the cases where you are trying to delete an object twice.
3) can't happen if you set the pointer in the vector to NULL.
2) might happen too. In your case, you're using a std::vector, right? Make sure that implicit manipulations of the vector (like reallocation of the internal buffer when it isn't big enough anymore) don't corrupt your list.
So, first check whether you are trying to delete NULL pointers (note that delete(NULL) will not throw! That's standard, valid behaviour!) - in your case you shouldn't ever reach the point of calling delete(NULL).
Then, if that never happens, check that your vector isn't filled with pointers pointing to garbage. For example, you should make sure you're familiar with the erase-remove idiom.
Now that you added some code, I think I can see the problem:
int tid = _myVec.size();
You're using indices as ids.
Now, it all depends on the way you delete your objects (please show that code for a more complete answer):
1) You just set the pointer to NULL.
2) You remove the pointer from the vector.
If you only do 1), then it should be safe (provided you don't mind having a vector that grows and never gets released, and that ids aren't re-used).
If you do 2), then this is all wrong: each time you remove an object from the vector, all the objects still contained after the removed position are shifted down by one, making any stored id/index invalid.
Make sure you're consistent on this point; it is certainly a source of errors.
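For illustration, a minimal sketch of the id/index mismatch described above (the names are invented for the example):
#include <iostream>
#include <vector>
int main() {
    std::vector<const char *> names;
    names.push_back("zero");  // id 0
    names.push_back("one");   // id 1
    names.push_back("two");   // id 2
    names.erase(names.begin() + 1);  // remove "one": everything after it shifts down
    // Any code that stored 1 as an id now silently gets a different element:
    std::cout << names[1] << "\n";   // prints "two", not the object that id 1 originally identified
}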
That segmentation fault is most probably a memory access violation. Some reasons:
1) The object was already deallocated; be sure you set that array position to NULL after delete.
2) You are out of the array bounds.
3) If you access that array from multiple threads, make sure you are synchronizing correctly.
If you're completely certain that the pointer points to a valid object, and that the act of deleting it causes the crash, then you have heap corruption.
You should try using a ptr_vector; unlike your code, it's guaranteed to be exception-safe.
Hint: if you write delete, you're doing it wrong
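In the same spirit, a minimal sketch of add() rewritten with std::unique_ptr (C++11) instead of raw new/delete; the stand-in T is invented so the snippet is self-contained:
#include <memory>
#include <vector>
struct T { explicit T(int id) : id(id) {} int id; };  // stand-in for the T from the question
std::vector<std::unique_ptr<T> > _myVec;  // the vector now owns its elements
int add() {
    int tid = static_cast<int>(_myVec.size());
    _myVec.push_back(std::unique_ptr<T>(new T(tid)));  // or std::make_unique<T>(tid) in C++14
    return tid;
}
// No matching delete anywhere: clearing or destroying _myVec releases every T automatically.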
You can't be sure that the object is still valid: the memory that was occupied by the object is not necessarily cleaned up, and therefore you can be seeing something that appears to be your object but isn't anymore.
You can use a mark in order to check whether the object is still alive, and clear that mark in the destructor.
class A {
public:
static const unsigned int Inactive;
static const unsigned int Active;
A();
~A();
/* more things ...*/
private:
unsigned int mark;
};
const unsigned int A::Inactive = 0xDEADBEEF;
const unsigned int A::Active = 0x11BEBEEF;
A::A() : mark( Active )
{}
A::~A()
{
mark = Inactive;
}
This way, by checking the first 4 bytes of your object you can easily verify whether your object has reached the end of its life or not.
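A hedged usage sketch of that check; the isAlive() accessor is an invented addition, and reading memory after delete is itself undefined behaviour, so treat this purely as a debugging heuristic, never as program logic:
// Trimmed-down version of the class above, with a hypothetical isAlive() accessor added:
class A {
public:
    A() : mark(Active) {}
    ~A() { mark = Inactive; }
    bool isAlive() const { return mark == Active; }
private:
    static const unsigned int Inactive = 0xDEADBEEF;
    static const unsigned int Active   = 0x11BEBEEF;
    unsigned int mark;
};
void debugUse(const A *p) {
    if (p && p->isAlive()) {
        // the mark still reads as Active: the object has probably not been destroyed yet
    }
}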