Segmentation fault when calling virtual method - c++

Here is my code. I cast the buffer to different types of objects; is this what causes the failure? I really want to know why FromBase::find2(int key) works, but FromBase::find(int key) does not.
class Base
{
public:
virtual int find(int key)=0;
int keys[4];
};
class FromBase:public Base
{
public:
FromBase();
int find(int key);
int find2(int key);
};
FromBase::FromBase()
{
for(int i=0;i<4;i++)
keys[i]=-1;
}
int FromBase::find(int key)
{
for(int i=0;i<4;i++){
if(keys[i]==key)
return i;
}
return i;
};
int FromBase::find2(int key)
{
for(int i=0;i<4;i++){
if(keys[i]==key)
return i;
}
return i;
};
int main()
{
FromBase frombase;
FILE* fptr=fopen("object.dat","w");
fwrite((void*)&frombase,48,1,fptr);
fclose(fptr);
char object[48];
fptr=fopen("object.dat","r");
fread((void*)object,48,1,fptr);
// looks like this works
(FromBase*)object->find2(7);
//These two do not work, I got segmentation fault!
(FromBase*)object->find(7);
(Base*)object->find(7);
}
The reason I want to do this is that I need to read the object from a file, so I need to cast the buffer to a particular type before I can call the method.

There is a high chance that you are overwriting the object's virtual function table pointer with your code, which leads to a bad address when you call the virtual method. You cannot just save objects into a file and expect to restore them by restoring the memory content they had at the time they were saved.
There are some nice libraries like boost::serialization to save and restore objects. I would urge you to read about this or to turn your objects into plain old data types (structs) containing no references or addresses.

There are several reasons why this code is not guaranteed to work. I think the biggest concern is this code here:
char object[48];
The number 48 here is a magic number and there's absolutely no guarantee that the size of the object you're writing out is 48 bytes. If you want to have a buffer large enough to hold an object, use sizeof to see how many bytes you need:
char object[sizeof(FromBase)];
Moreover, this is not guaranteed to work due to alignment issues. Every object in C++ has an alignment, some number that its address must be a multiple of. When you declare a variable, C++ ensures that it has the proper alignment for its type, though there's no guarantee that it ends up having the alignment of any other type. This means that when you declare a char array, there's no guarantee that it's aligned the same way as a real FromBase object would be, and using the buffer as an object of that type results in undefined behavior.
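To make the size and alignment points concrete, here is a minimal sketch. The classes are simplified stand-ins for the ones in the question, and FromBaseStorage, storageFits and plainCharAlignment are names made up for illustration. It only shows how to reserve suitably sized and aligned memory; it does not make casting a byte buffer to FromBase valid.

```cpp
#include <cassert>
#include <cstddef>

// Simplified stand-ins for the classes in the question (assumption).
class Base {
public:
    virtual ~Base() = default;
    virtual int find(int key) = 0;
    int keys[4];
};

class FromBase : public Base {
public:
    int find(int key) override {
        for (int i = 0; i < 4; ++i) {
            if (keys[i] == key) return i;
        }
        return -1;
    }
};

// Raw storage sized and aligned for a FromBase, with no magic number.
// It only reserves memory; no FromBase object lives in it until one is
// actually constructed there.
struct FromBaseStorage {
    alignas(FromBase) unsigned char bytes[sizeof(FromBase)];
};

// True when the storage is at least as large and as strictly aligned
// as a real FromBase requires.
bool storageFits() {
    return sizeof(FromBaseStorage) >= sizeof(FromBase) &&
           alignof(FromBaseStorage) >= alignof(FromBase);
}

// A plain char array of the same size only promises the alignment of char.
std::size_t plainCharAlignment() {
    return alignof(char[sizeof(FromBase)]);
}
```

On a typical implementation alignof(FromBase) is the alignment of a pointer, because of the hidden vtable pointer, while the plain char array is only guaranteed alignment 1.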
As others have pointed out, though, you also have a problem because this line:
fopen("object.dat","r");
Doesn't update the local variable you're using to keep track of the file pointer, so what you're reading back is almost certainly going to be garbage (if you read back anything at all). The segfault is probably coming from the bytes for the virtual dispatch table not being read back in correctly.

// will these two methods work? I got segmentation fault!
(FromBase*)object->find(7);
(Base*)object->find(7);
No they will not work. The segmentation fault might be a hint ;)
object is a buffer on the stack, which is fine, but you need to call the class constructor to get a real object. If this were valid C++, ANY memory could be cast to any class.
I'd start off by creating the class on the stack and call some Load()-method on it, e.g.
FromBase object;
object.Load("object.dat");
And let the Load()-method read the data from file and set values on the internal data.

Apart from all the other problems that people have pointed out, I am absolutely shocked that nobody has mentioned that:
(FromBase*)object->find2(7);
Is just NOT guaranteed to work.
You are depending on a raft of implementation details. object is an array of char, not a FromBase, so the compiler has not had the chance to initialize any of its implementation-dependent details.
Even if we assume that the implementation uses a vtable (and thus a vtable pointer in the object): does the implementation use a relative pointer or an absolute pointer? Are you planning to save in one run and reload in the next? If so, are you assuming the vtable is located at the same address between different runs (what happens when you load this part of the application from a dynamic library)?
This is just horrible. You SHOULD NOT DO THIS EVER.
If you want to serialize and de-serialize the object to and from storage, then the class has to know how to do the serialization itself, so that all the correct constructors/destructors get called at the correct time.
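A minimal sketch of what "the class knows how to serialize itself" could look like for the classes in the question. The save() and load() members and the roundTrip() helper are hypothetical names, and the file format (just the four raw ints) is an assumption for illustration. Only the data members go to disk; the object itself is always created by its constructor, so the vtable pointer is set up by the compiler.

```cpp
#include <cassert>
#include <cstdio>

class Base {
public:
    virtual ~Base() = default;
    virtual int find(int key) const = 0;
    int keys[4];
};

class FromBase : public Base {
public:
    FromBase() {
        for (int& k : keys) k = -1;
    }

    int find(int key) const override {
        for (int i = 0; i < 4; ++i) {
            if (keys[i] == key) return i;
        }
        return -1;  // not found
    }

    // Write only the data members, never the raw bytes of the object.
    bool save(const char* path) const {
        std::FILE* f = std::fopen(path, "wb");
        if (!f) return false;
        const bool ok = std::fwrite(keys, sizeof keys, 1, f) == 1;
        return std::fclose(f) == 0 && ok;
    }

    // Read the data members back into an already constructed object.
    bool load(const char* path) {
        std::FILE* f = std::fopen(path, "rb");
        if (!f) return false;
        int tmp[4];
        const bool ok = std::fread(tmp, sizeof tmp, 1, f) == 1;
        std::fclose(f);
        if (ok) {
            for (int i = 0; i < 4; ++i) keys[i] = tmp[i];
        }
        return ok;
    }
};

// Saves one object, loads it into a fresh one, and makes a virtual call.
// Returns the index found, or -2 if file I/O failed.
int roundTrip(const char* path) {
    FromBase out;
    out.keys[2] = 7;
    assert(out.find(7) == 2);
    if (!out.save(path)) return -2;

    FromBase in;                 // constructor runs, so virtual calls are safe
    if (!in.load(path)) return -2;
    const Base& asBase = in;
    return asBase.find(7);
}
```

Note that the raw ints are still only portable between builds with the same int size and byte order; a text format or a library such as boost::serialization avoids that restriction.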

First problem I can see when you use fopen second time:
fopen("object.dat","r"); //problem - your code
which should be this:
fptr = fopen("object.dat","r"); //fix (at least one fix)
That means, in your code you're trying to read data using fptr which is already closed!

One problem is that an array of characters does not have a method called find.
The cast does not convert the array to a FromBase or a Base. It only tells the compiler to ignore the error.

How do you determine the size of a class when reverse engineering?

I've been trying to learn a bit about reverse engineering and how to essentially wrap an existing class (that we do not have the source for, we'll call it PrivateClass) with our own class (we'll call it WrapperClass).
Right now I'm basically calling the constructor of PrivateClass while feeding a pointer to WrapperClass as the this argument...
Doing this populates m_string_address, m_somevalue1, m_somevalue2, and missingBytes with the PrivateClass object data. The dilemma now is that I am noticing issues with the original program (first a crash that was resolved by adding m_u1 and m_u2) and then text not rendering that was fixed by adding mData[2900].
I'm able to deduce that m_u1 and m_u2 hold the size of the string in m_string_address, but I wasn't expecting there to be any other member variables after them (which is why I was surprised with mData[2900] resolving the text rendering problem). 2900 is also just a random large value I threw in.
So my question is how can we determine the real size of a class that we do not have the source for? Is there a tool that will tell you what variables exist in a class and their order (or at least the correct datatypes or datatype sizes of each variable)? I'm assuming this might be possible by processing assembly in an address range into a semi-decompiled state.
class WrapperClass
{
public:
WrapperClass(const wchar_t* original);
private:
uintptr_t m_string_address;
int m_somevalue1;
int m_somevalue2;
char missingBytes[2900];
};
WrapperClass::WrapperClass(const wchar_t* original)
{
typedef void(__thiscall* PrivateClassCtor)(void* pThis, const wchar_t* original);
PrivateClassCtor PrivateClassCtorFunc = PrivateClassCtor(DLLBase + 0x1c00);
PrivateClassCtorFunc(this, original);
}
So my question is how can we determine the real size of a class that
we do not have the source for?
You have to guess or logically deduce it for yourself. Or just guess. If guessing doesn't work out for you, you'll have to guess again.
Is there a tool that will tell you what variables exist in a class and
their order (or at least the correct datatypes or datatype sizes of
each variable) I'm assuming by decompiling and processing assembly in
an address range.
No, there is not. The kind of meta information that describes a class, its members, etc. simply isn't written out, as the program does not need it, nor are there currently any facilities defined in the C++ Standard that would require a compiler to generate that information.
There are exactly zero guarantees that you can reliably 'guess' the size of a class. You can however probably make a reasonable estimate in most cases.
The one thing you can be sure of, though: the only problem is when you have too little memory for a class instance. Having too much memory isn't really a problem at all (which is why adding 2900 extra bytes works).
On the assumption that the code was originally well written (e.g. the developer decided to initialise all the variables nicely), then you may be able to guess the size using something like this:
#define MAGIC 0xCD
// allocate a big buffer and fill it with the magic value
// (unsigned char, so that comparing a byte with 0xCD is not broken by a signed char)
unsigned char temp_buffer[8092];
memset(temp_buffer, MAGIC, sizeof(temp_buffer));
// call ctor on the buffer (not on 'this')
PrivateClassCtor PrivateClassCtorFunc = PrivateClassCtor(DLLBase + 0x1c00);
PrivateClassCtorFunc(temp_buffer, original);
// step backwards until we find a byte that isn't 0xCD.
// Might want to change the magic value and run again
// just to be sure (e.g. the original ctor sets the last
// few bytes of the class to 0xCD by coincidence).
//
// Obviously fails if the developer never initialises member vars though!
for(int i = (int)sizeof(temp_buffer) - 1; i >= 0; --i) {
if(temp_buffer[i] != MAGIC) {
printf("class size might be: %d\n", i + 1);
break;
}
}
That's probably a decent guess, however the only way to be 100% sure would be to stick a breakpoint where you call the ctor, switch to assembly view in your debugger of choice, and then step through the assembly line by line to see what the max address being written to is.

Allocator with dense and sparse pointers - what's going on?

i'm trying to write a handle allocator in C++. this allocator would "handle" (hue hue hue) the allocation of handles for referencing assets (such as textures, uniforms, etc) in a game engine. for instance, inside a function for creating a texture, the handle allocator would be called to create a TextureHandle. when the texture was destroyed, the handle allocator would free the TextureHandle.
i'm reading through the source of BX, a library that includes a handle allocator just for this purpose - it's the base library of the popular library BGFX, a cross-platform abstraction over different rendering APIs.
before i start explaining what's baffling me, let me first outline what this class essentially looks like:
class HandleAllocator {
public:
constructor, destructor
getters: getNumHandles, getMaxHandles
u16 alloc();
void free(u16 handle);
bool isValid(u16 handle) const;
void reset();
private:
u16* getDensePointer() const;
u16* getSparsePointer() const;
u16 _numHandles;
u16 _maxHandles;
}
here's what getDensePointer() looks like:
u8* ptr = (u8*)reinterpret_cast<const u8*>(this);
return (u16*)&ptr[sizeof(HandleAlloc)];
as far as i understand it, this function is returning a pointer to the end of the class in memory, although i don't understand why the this pointer is first cast to a uint8_t* before being dereferenced and used with the array-index operator on the next line.
here's what's weird to me. the constructor calls the reset() function, which looks like this.
_numHandles = 0;
u16* dense = getDensePointer();
for(u16 ii=0, num = _maxHandles; ii < num; ++ii) {
dense[ii] = ii;
}
if getDensePointer returns a pointer to the end of the class in memory, how is it safe to be writing to memory beyond the end of the class in this for loop? how do i know this isn't stomping on something stored adjacent to it?
i'm a total noob, i realize the answer to this is probably obvious and betrays a total lack of knowledge on my part, but go easy on me..
To answer the first question, ask yourself why pointers have a type. In the end, they are just variables that are meant to store memory addresses. Any variable with a range large enough to store all possible memory addresses could do. Then what is the difference between, let's say, int* and u8*?
The difference is the way operations are performed on them. Besides dereferencing, which is another story, pointer arithmetic is also involved. Take the following declarations: int *p; u8 *u;. For p+2 to make sense, it has to yield the address 8 bytes past p (the address of the int at index 2, assuming a 4-byte int), while u+2 yields the address 2 bytes past u (since u8 has a size of 1).
Now, sizeof gives you the size of the type in bytes. You want to move sizeof(x) bytes, so you need to index the array (or do pointer arithmetic, they are equivalent here) on a byte-sized data type. And that's why you cast it to u8.
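A small sketch of the stride difference described above. The helper names intStride and byteStride are made up for illustration; std::uint8_t plays the role of u8.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Distance in bytes between p and p + n when p is an int*.
std::size_t intStride(std::size_t n) {
    int values[8] = {};
    assert(n <= 8);  // stay within (or one past) the array
    int* p = values;
    auto* first = reinterpret_cast<unsigned char*>(p);
    auto* later = reinterpret_cast<unsigned char*>(p + n);
    return static_cast<std::size_t>(later - first);
}

// Distance in bytes between u and u + n when u is a byte pointer.
std::size_t byteStride(std::size_t n) {
    std::uint8_t bytes[8] = {};
    assert(n <= 8);
    std::uint8_t* u = bytes;
    return static_cast<std::size_t>((u + n) - u);
}
```

intStride(2) gives 2 * sizeof(int), while byteStride(2) gives 2, which is why the this pointer is cast to a byte pointer before adding sizeof(...) to it.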
Now, for the second question,
how do i know this isn't stomping on something stored adjacent to it?
simply by making sure nothing is there. This is done during the creation of the handler. For example, if you have:
HandleAllocator *h = new HandleAllocator[3];
you can freely call reset on h[0] and have 2 handlers' worth of memory to play with. Without more details, it's hard to tell exactly how this excess memory is allocated and what its purpose is.
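One common way to "make sure nothing is there" is to allocate the object and its trailing table in a single block and construct the object at the front with placement new. The sketch below illustrates that idea with a hypothetical TinyHandleAlloc class; it is not BX's actual code, and whether BX does exactly this is not shown in the question. Like the original, it relies on pointer arithmetic past the end of the object into memory the factory function reserved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

class TinyHandleAlloc {
public:
    // Allocate the object plus room for maxHandles entries right behind it.
    static TinyHandleAlloc* create(std::uint16_t maxHandles) {
        const std::size_t bytes =
            sizeof(TinyHandleAlloc) + maxHandles * sizeof(std::uint16_t);
        void* mem = ::operator new(bytes);
        return new (mem) TinyHandleAlloc(maxHandles);
    }

    static void destroy(TinyHandleAlloc* alloc) {
        alloc->~TinyHandleAlloc();
        ::operator delete(alloc);
    }

    std::uint16_t numHandles() const { return numHandles_; }
    std::uint16_t maxHandles() const { return maxHandles_; }

    std::uint16_t denseAt(std::uint16_t index) const {
        assert(index < maxHandles_);
        return dense()[index];
    }

private:
    // Private: the object is only valid when created through create(),
    // because only create() reserves the trailing table.
    explicit TinyHandleAlloc(std::uint16_t maxHandles)
        : numHandles_(0), maxHandles_(maxHandles) {
        std::uint16_t* table = dense();
        for (std::uint16_t i = 0; i < maxHandles_; ++i) {
            table[i] = i;
        }
    }

    // Address of the first byte after the object, viewed as u16 entries.
    std::uint16_t* dense() const {
        auto* bytes = reinterpret_cast<unsigned char*>(
            const_cast<TinyHandleAlloc*>(this));
        return reinterpret_cast<std::uint16_t*>(bytes + sizeof(TinyHandleAlloc));
    }

    std::uint16_t numHandles_;
    std::uint16_t maxHandles_;
};
```

Because the constructor is private, an instance cannot be placed on the stack or inside an array by accident, which is what would make the writes past the end unsafe.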

Potential dynamic memory problems

I am creating an Arduino device using C++. I need a stack object with variable size and variable data types. Essentially this stack needs to be able to be resized and used with bytes, chars, ints, doubles, floats, shorts, and longs.
I have a basic class setup, but with the amount of dynamic memory allocation that is required, I wanted to make sure that my use of data frees enough space for the program to continue without memory problems. This does not use std methods, but instead built in versions of those for the Arduino.
For clarification, my question is: Are there any potential memory problems in my code?
NOTE: This is not on the Arduino Stack Exchange because it requires an in-depth knowledge of C/C++ memory allocation that could be useful to all C and C++ programmers.
Here's the code:
Stack.h
#pragma once
class Stack {
public:
void init();
void deinit();
void push(byte* data, size_t data_size);
byte* pop(size_t data_size);
size_t length();
private:
byte* data_array;
};
Stack.cpp
#include "Arduino.h"
#include "Stack.h"
void Stack::init() {
// Initialize the Stack as having no size or items
data_array = (byte*)malloc(0);
}
void Stack::deinit() {
// free the data so it can be re-used
free(data_array);
}
// Push an item of variable size onto the Stack (byte, short, double, int, float, long, or char)
void Stack::push(byte* data, size_t data_size) {
data_array = (byte*)realloc(data_array, sizeof(data_array) + data_size);
for(size_t i = 0; i < sizeof(data); i++)
data_array[sizeof(data_array) - sizeof(data) + i] = data[i];
}
// Pop an item of variable size off the Stack (byte, short, double, int, float, long, or char)
byte* Stack::pop(size_t data_size) {
byte* data;
if(sizeof(data_array) - data_size >= 0) {
data = (byte*)(&data_array + sizeof(data_array) - data_size);
data_array = (byte*)realloc(data_array, sizeof(data_array) - data_size);
} else {
data = NULL;
}
// Make sure to free(data) when done with the data from pop()!
return data;
}
// Return the sizeof the Stack
size_t Stack::length() {
return sizeof(data_array);
}
There are some minor code bugs, apparently, which -- although important -- are easily resolved. The following answer only applies to the overall design of this class:
There is nothing wrong with just the code that is shown.
But only the code that's shown. No opinion is rendered on any code that's not shown.
And, it's fairly likely that there are going to be massive problems, and memory leaks, in the rest of the code which will attempt to use this class.
It's going to be very, very easy to use this class in a way that leaks or corrupts memory. It's going to be much harder to use this class correctly, and much easier to screw up. The fact that these functions themselves appear to do their job correctly is not going to help if all you have to do is sneeze in the wrong direction, and end up with these functions not being used in the proper order, or sequence.
Just to name the first two readily apparent problems:
1) Failure to call deinit(), when any instance of this class goes out of scope and gets destroyed, will leak memory. Every time you use this class, you have to be cognizant of when the instance of this class goes out of scope and gets destroyed. It's easy to keep track of every time you create an instance of this class, and it's easy to remember to call init() every time. But keeping track of every possible way an instance of this class could go out of scope and get destroyed, so that you must call deinit() and free up the internal memory, is much harder. It's very easy to not even realize when that happens.
2) If an instance of this class gets copy-constructed, or the default assignment operator gets invoked, this is guaranteed to result in memory corruption, with an extra side-helping of a memory leak.
Note that you don't have to go out of your way to write code that copy-constructs, or assigns one instance of the object to another one. The compiler will be more than happy to do it for you, if you do not pay attention.
Generally, the best way to avoid these kinds of problems is to make it impossible to happen, by using the language correctly. Namely:
1) Following the RAII design pattern. Get rid of init() and deinit(). Instead, do this work in the object's constructor and destructor.
2) Either deleting the copy constructor and the assignment operator, or implementing them correctly. So, if instances of this class should never be copy-constructed or assigned-to, it's much better to have the compiler yell at you, if you accidentally write some code that does that, instead of spending a week tracking down where that happens. Or, if the class can be copy-constructed or assigned, doing it properly.
Of course, if there would only be a small number of instances of this class, it should be possible to safely use it, with tight controls, and lots of care, without doing this kind of a redesign. But, even if it were the case, it's always better to do the job right, instead of shrugging this off now, but then later deciding to expand the use of this class in more places, and then forgetting about the fact that this class is so error-prone.
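As a rough sketch of the redesign suggested in points 1) and 2): the constructor and destructor replace init() and deinit(), copying is deleted, and the size is tracked in a member. ByteStack and demoRoundTrip are hypothetical names, unsigned char stands in for Arduino's byte, and pop() copies into a caller-provided buffer instead of returning a pointer into memory that realloc may move. This uses the standard headers; the Arduino environment may need its own equivalents.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

class ByteStack {
public:
    ByteStack() = default;
    ~ByteStack() { std::free(data_); }  // runs on every way out of scope

    // Copying would leave two objects freeing the same block, so forbid it.
    ByteStack(const ByteStack&) = delete;
    ByteStack& operator=(const ByteStack&) = delete;

    // Append n bytes from src. Returns false if memory could not be grown.
    bool push(const void* src, std::size_t n) {
        if (n == 0) return true;
        void* grown = std::realloc(data_, size_ + n);
        if (!grown) return false;  // the old block is still valid and owned
        data_ = static_cast<unsigned char*>(grown);
        std::memcpy(data_ + size_, src, n);
        size_ += n;
        return true;
    }

    // Copy the top n bytes into dst and remove them from the stack.
    // The allocation is kept as-is; it is released by the destructor.
    bool pop(void* dst, std::size_t n) {
        if (n > size_) return false;
        if (n == 0) return true;
        std::memcpy(dst, data_ + (size_ - n), n);
        size_ -= n;
        return true;
    }

    std::size_t length() const { return size_; }

private:
    unsigned char* data_ = nullptr;
    std::size_t size_ = 0;  // tracked explicitly; sizeof cannot tell us this
};

// Pushes an int and a double, pops them in reverse order, checks the values.
bool demoRoundTrip() {
    ByteStack stack;
    const int i = 42;
    const double d = 1.5;
    if (!stack.push(&i, sizeof i) || !stack.push(&d, sizeof d)) return false;
    assert(stack.length() == sizeof i + sizeof d);

    int i2 = 0;
    double d2 = 0.0;
    const bool ok = stack.pop(&d2, sizeof d2) && stack.pop(&i2, sizeof i2);
    return ok && i2 == 42 && d2 == 1.5 && stack.length() == 0;
}
```

With this shape the compiler rejects an accidental copy at build time, and there is no deinit() to forget.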
P.S.: a few of the minor bugs that I mentioned in the beginning:
data_array = (byte*)realloc(data_array, sizeof(data_array) + data_size);
This can't be right. data_array is a byte *, so sizeof(data_array) will always be a compile-time constant, which would be sizeof(byte *). That's obviously not what you want here. You need to explicitly keep track of the allocated array's size.
The same general bug appears in several other places here, but it's easily fixed. The overall class design is the bigger problem.
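To see the sizeof issue in isolation, here is a tiny sketch. The helper name pointerSizeAfterAlloc is made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Allocates `bytes` bytes and reports what sizeof says about the pointer.
std::size_t pointerSizeAfterAlloc(std::size_t bytes) {
    assert(bytes > 0);
    unsigned char* p = static_cast<unsigned char*>(std::malloc(bytes));
    const std::size_t reported = sizeof(p);  // size of the pointer itself
    std::free(p);
    return reported;
}
```

The result is the same whether 1 byte or 1000 bytes were requested, which is why the stack has to store its own length in a separate member.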

Stack unwind clobbering memory with inplace new operator

I have a pretty nasty bug that has been bothering me for a while. Here's the situation: I'm creating an in-memory filesystem. I have pre-allocated data blocks for each file in which to do reads and writes. To implement directories I have a simple std::vector object containing all the files in the directory. This vector object is at the top of the file in each directory. Therefore, in order to read and write from the directory I read the first 16 bytes into a character buffer and type cast it as a vector (16 bytes because sizeof(vector<T>) is 16 on my system). Specifically, the first 16 bytes are not elements of the vector, but the vector itself. However, the vector is somehow being clobbered after I exit a key function.
The following code does not throw an exception and can correctly save a vector to a character buffer to be later retrieved.
#include <vector>
char dblock[16];
typedef std::vector<int> Entries;
void foo() {
char buf[sizeof(Entries)];
Entries* test = new (buf)Entries();
test->push_back(0);
for (int i = 0; i < sizeof(std::vector<int>); ++i) {
dblock[i] = buf[i];
}
}
void bar() {
char buf[sizeof(Entries)];
for (int i = 0; i < sizeof(std::vector<int>); ++i) {
buf[i] = dblock[i];
}
Entries* test = (Entries*)buf;
test->back();
}
int main()
{
foo();
bar();
return 0;
}
However, once the function foo is changed to include an argument like so, any time I try to use iterators an exception is thrown.
void foo(int this_arg_breaks_everything) {
char buf[sizeof(Entries)];
Entries* test = new (buf)Entries();
test->push_back(0);
for (int i = 0; i < sizeof(std::vector<int>); ++i) {
dblock[i] = buf[i];
}
}
Looking at the disassembly, I found the problem assembly where the function is tearing down its stack frame:
add esp,128h <----- After stack is reduced, vector goes to an unusable state.
cmp ebp,esp
call __RTC_CheckEsp (0D912BCh)
mov esp,ebp
pop ebp
ret
I found this code by using the debugger to test if ((Entries*)dblock)->back() returns an exception or not. Specifically the exception is "Access violation reading location 0XCCCCCCCC". The location in question is std::vector's std::_Container_base12::_Myproxy->_Mycont->_Myproxy == 0XCCCCCCCC; you can find it at line 165 of xutility.
And yes, the bug only happens when the inplace new operator is used. Using a normal new operator then writing test to dblock doesn't throw an exception.
Thus, I came to the conclusion that the ultimate reason for this is somehow the compiler is doing a bad stack unwind clobbering some part of memory that it shouldn't be.
Edit: Changed wording for clarity sake.
Edit2: Explained magic numbers.
In Visual Studio 2013 this generates errors, and after taking a look at the internal data of the vector it was pretty easy to figure out why. Vector allocates an internal object that does a lot of its heavy lifting; this internal object in turn stores a pointer back to the vector and thus knows the location where the vector is supposed to be. When the vector's memory is moved from one location to another, the memory it occupies changes, and that internal object now points back to memory that later gets wiped by debug code.
Looking at your code, this looks like the exact same thing. std::_Container_base12 is a base class used by the vector which has a member called myProxy. myProxy is an internal object that does heavy lifting, and has a pointer back to the vector that contains it. When you move the vector, you can't update this pointer and so when you go to use the vector data that was moved it tries to use myProxy which is still trying to refer back to the original vector location which was wiped. Because that data area was wiped, it looks for a pointer in it and instead finds 'CCCCCCCC' which is what the debug code does to memory data that was wiped. It tries to access that memory location and everything explodes.
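The same effect can be shown with a much simpler type than the debug-mode vector. The sketch below uses a hypothetical SelfAware struct that stores a pointer to its own location, as a stand-in for the back-pointer described above; it is not the actual MSVC implementation.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical type that remembers where it was constructed.
struct SelfAware {
    SelfAware* self;
    SelfAware() : self(this) {}
    bool consistent() const { return self == this; }
};

// Copies the raw bytes of one object over another, the way the question
// copies the vector, and reports whether the back-pointer is still right.
bool survivesByteCopy() {
    SelfAware original;
    SelfAware moved;
    assert(original.consistent() && moved.consistent());

    std::memcpy(static_cast<void*>(&moved), &original, sizeof original);

    // moved.self still holds the address of `original`, not of `moved`.
    return moved.consistent();
}
```

survivesByteCopy() returns false: the copied bytes still point at the old location. In the question that old location is a dead stack frame, which the debug runtime has filled with 0xCC.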
What you are doing (serializing an opaque C++ object with the equivalent of memcpy to a local buffer) is not going to work sustainably, because vector objects are deep objects with lots of pointers to heap memory. Nevertheless, to answer your question about why you are getting a crash:
The problem is alignment. When you try doing
char buf[sizeof(Entries)];
Entries* test = new (buf)Entries();
this assumes that buf has the proper alignment for the object Entries. I am not going to claim to know the internal structure of vector, but I bet it looks something like
class vector
{
T* start;
T* end;
other stuff
}
i.e. it is a bunch of pointers to heap. Pointers require register alignment, which is 8 bytes on a 64-bit machine. Alignment by N bytes means you can divide the address evenly by N. However, you are allocating buf on the stack, which is not guaranteed to have any alignment, but probably accidentally has 8 byte alignment because it is the only thing on your local stack frame. However, if you declare an argument to foo, and the argument is say an int which is 4 bytes, then buf is no longer 8-byte aligned since you've just added 4 bytes. Then when you try to access the pointers that are unaligned, you get your crash.
As an experiment, try changing foo to
void foo(int unused1, int unused2) {
This might accidentally realign buf, and it might not crash. However, stop what you are doing and don't do it this way.
See http://en.wikipedia.org/wiki/Data_structure_alignment for more info
and this for guidance on serializing: http://www.parashift.com/c++-faq/serialization.html . You might consider a Boost vector class that is serializable.
Why not just write the following?
std::vector<int> dblock;
void foo() {
dblock.push_back(0);
}
void bar() {
dblock.back();
}
Simplest answer for this: UNDEFINED BEHAVIOR.
std::vector isn't trivially copyable, so you can't memcpy it from one place to another.
Another problem with your code is that dblock could have a different alignment than std::vector requires. This could cause a crash on some processors.
A third problem is that the compiler could sometimes produce garbage when you copy buf to dblock.
That's because you break the strict aliasing rule.
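Whether a type may be copied byte by byte can be checked at compile time with the standard type trait. A minimal sketch, where PlainEntry and canByteCopy are names made up for illustration:

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Hypothetical fixed-size directory entry with no pointers in it.
struct PlainEntry {
    int id;
    char name[12];
};

// True when T may legally be copied with memcpy or a char-by-char loop.
template <typename T>
constexpr bool canByteCopy() {
    return std::is_trivially_copyable<T>::value;
}

// The vector itself fails the check; a plain record passes it.
static_assert(!canByteCopy<std::vector<int>>(),
              "std::vector must not be copied as raw bytes");
static_assert(canByteCopy<PlainEntry>(),
              "a plain record can be copied as raw bytes");
```

Putting a static_assert like this next to any raw copy turns the mistake into a compile error instead of a crash in the debugger.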

C++ save vector of vector of vector of class object into file

I have a class in c++ like the following:
class myCls
{
public:
myCls();
void setAngle(float angle);
void setArr(unsigned char arr[64]);
unsigned char arr[64];
double angle;
int index;
static float calcMean(const unsigned char arr[64]);
static float sqrt7(float x);
};
Now in my main program I have a 3D vector of the class:
vector<vector<vector< myCls > > > obj;
The size of the vector also changes dynamically. My question is: how can I store the content of my vector in a file and retrieve it afterward?
I have tried many ways with no success. This is my attempt:
std::ofstream outFile;
outFile.open(fileName, ios::out);
for(int i=0;i<obj.size();i++)
{
outFile.write((const char *)(obj.data()),sizeof(vector<vector<myCls> >)*obj.size());
}
outFile.close();
And for reading it:
vector<vector<vector<myCls>>> myObj;
if(inFile.is_open())
{
inFile.read((char*)(myObj.data()),sizeof(vector<vector<myCls> >)*obj.size());
}
What I get is only runTime error.
Can anyone help me in this issue please?
If you don't care too much about performance, try boost::serialization. Since they've already implemented serialization functions for STL containers, you would only have to write the serialize function for myCls, and everything else comes for free. Since your member variables are all public, you can do that intrusively or non-intrusively.
Internally, a vector most usually consists of two numbers, representing the current length and the allocated length (capacity), as well as a pointer to the actual data. So the size of the “raw” object is fixed and approximately thrice the size of a pointer. This is what your code currently writes. The values the pointer points at won't be stored. When you read things back, you're setting the pointer to something which in most cases won't even be allocated memory, thus the runtime error.
In general, it's a really bad idea to directly manipulate the memory of any class which provides constructors, destructors or assignment operators. Your code writing to the private members of the vector would thoroughly confuse memory management, even if you took care to restore the pointed-at data as well. For this reason, you should only write simple (POD) data the way you did. Everything else should be handled by custom serialization code.
In the case of a vector, you'd probably store the length first, and then write the elements one at a time. For reading, you'd read the length, probably reserve memory accordingly, and then read elements one at a time. The boost::serialization templates suggested by Voltron will probably save you the trouble of implementing all that.
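A minimal hand-written sketch of the "length first, then elements" scheme for the nested vector. Record is a simplified stand-in for myCls with only its data members, and the helper names (writeSize, readSize, save, load, roundTripsThroughMemory) and the on-disk layout are assumptions for illustration. The format is only valid on the machine and build that wrote it, since it stores raw ints and doubles.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <utility>
#include <vector>

// Stand-in for myCls: data members only (assumption).
struct Record {
    unsigned char arr[64];
    double angle;
    int index;
};

using Grid = std::vector<std::vector<std::vector<Record>>>;

void writeSize(std::ostream& os, std::uint64_t n) {
    os.write(reinterpret_cast<const char*>(&n), sizeof n);
}

std::uint64_t readSize(std::istream& is) {
    std::uint64_t n = 0;  // stays 0 if the read fails
    is.read(reinterpret_cast<char*>(&n), sizeof n);
    return n;
}

void writeRecord(std::ostream& os, const Record& r) {
    os.write(reinterpret_cast<const char*>(r.arr), sizeof r.arr);
    os.write(reinterpret_cast<const char*>(&r.angle), sizeof r.angle);
    os.write(reinterpret_cast<const char*>(&r.index), sizeof r.index);
}

void readRecord(std::istream& is, Record& r) {
    is.read(reinterpret_cast<char*>(r.arr), sizeof r.arr);
    is.read(reinterpret_cast<char*>(&r.angle), sizeof r.angle);
    is.read(reinterpret_cast<char*>(&r.index), sizeof r.index);
}

// Each level stores its length, followed by its elements.
void save(std::ostream& os, const Grid& grid) {
    writeSize(os, grid.size());
    for (const auto& plane : grid) {
        writeSize(os, plane.size());
        for (const auto& row : plane) {
            writeSize(os, row.size());
            for (const auto& rec : row) writeRecord(os, rec);
        }
    }
}

// Rebuilds the vectors through their normal interface.
// Note: sizes are trusted as read; real code should sanity-check them.
bool load(std::istream& is, Grid& grid) {
    Grid tmp(readSize(is));
    for (auto& plane : tmp) {
        plane.resize(readSize(is));
        for (auto& row : plane) {
            row.resize(readSize(is));
            for (auto& rec : row) readRecord(is, rec);
        }
    }
    if (!is) return false;
    grid = std::move(tmp);
    return true;
}

// Saves a small grid to an in-memory stream and loads it back.
bool roundTripsThroughMemory() {
    Grid original(2, std::vector<std::vector<Record>>(1, std::vector<Record>(3)));
    original[1][0][2].angle = 0.5;
    original[1][0][2].index = 9;
    original[1][0][2].arr[63] = 200;

    std::stringstream buffer(std::ios::in | std::ios::out | std::ios::binary);
    save(buffer, original);

    Grid restored;
    if (!load(buffer, restored)) return false;
    assert(restored.size() == 2 && restored[0][0].size() == 3);
    const Record& r = restored[1][0][2];
    return r.angle == 0.5 && r.index == 9 && r.arr[63] == 200;
}
```

With files, the same save and load functions work on std::ofstream and std::ifstream opened with std::ios::binary.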