Memory usage and overwrites in C++

I am working on a large piece of code. As part of my main class constructor I declare a large number of vectors which at one point or another get filled (all with doubles). Until a while ago the code ran fine, but after I added one further vector of doubles, a completely unrelated variable (one which decides whether a particular 'run' has been successful or not) is being changed for some reason.
I have not added any lines which change this success variable, and when I print out its value (a successful run leaves the variable at zero) it changes to a massive integer every time, but each run gives a different value.
I have a feeling I am doing something wrong with memory allocation but I don't know what exactly!
Any advice welcomed,
Cheers
Jack
UPDATE
class MyClass {
std::vector <std::vector<HLV> > qChains;
std::vector <std::vector<HLV> > VertexChains;
std::vector <std::vector<double> > Virtuals;
std::vector <double> VProducts;
std::vector <double> QProducts;
std::vector <double> StrongCouplings;
int EventStatus;
};
and then in another method of 'MyClass' I have a quick if statement checking that the event is going OK:
if (GetEventStatus() != 0) cout << "ERROR!! " << GetEventStatus() << endl;
and ever since I added the line about StrongCouplings the status has been returning random huge integers.
I have however noticed that if I place a series of print statements throughout checking the value of EventStatus at various places the problem goes away!

Try adding char buf[128]; before the variable that changes its value. If that helps, it means some earlier variable is overwriting yours. This may be caused by an ODR violation or by incorrect use of a C array (writing past the end of the array).
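For what it's worth, here is a hypothetical, minimal illustration (none of these names come from the asker's code) of how writing one element past the end of a raw array inside a class can silently overwrite the member declared after it on a typical layout:
#include <iostream>

struct Event {
    double buffer[4];   // imagine a loop mistakenly writes 5 values here
    int    status = 0;  // plays the role of EventStatus
};

int main() {
    Event e;
    for (int i = 0; i <= 4; ++i)    // off-by-one: i == 4 is out of bounds
        e.buffer[i] = 3.14;         // the last write lands on top of 'status'
    std::cout << e.status << '\n';  // typically prints a huge garbage value, not 0
}
Because this is undefined behaviour, which member (or padding) actually gets hit is not guaranteed, which is also why adding print statements or an unrelated char buf[128]; can make the symptom move around or vanish.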

Related

How do you determine the size of a class when reverse engineering?

I've been trying to learn a bit about reverse engineering and how to essentially wrap an existing class (that we do not have the source for, we'll call it PrivateClass) with our own class (we'll call it WrapperClass).
Right now I'm basically calling the constructor of PrivateClass while feeding a pointer to WrapperClass as the this argument...
Doing this populates m_string_address, m_somevalue1, m_somevalue2, and missingBytes with the PrivateClass object data. The dilemma now is that I am noticing issues with the original program (first a crash that was resolved by adding m_u1 and m_u2) and then text not rendering that was fixed by adding mData[2900].
I'm able to deduce that m_u1 and m_u2 hold the size of the string in m_string_address, but I wasn't expecting there to be any other member variables after them (which is why I was surprised with mData[2900] resolving the text rendering problem). 2900 is also just a random large value I threw in.
So my question is: how can we determine the real size of a class that we do not have the source for? Is there a tool that will tell you what variables exist in a class and their order (or at least the correct datatypes or datatype sizes of each variable)? I'm assuming this might be possible by processing assembly in an address range into a semi-decompiled state.
class WrapperClass
{
public:
WrapperClass(const wchar_t* original);
private:
uintptr_t m_string_address;
int m_somevalue1;
int m_somevalue2;
char missingBytes[2900];
};
WrapperClass::WrapperClass(const wchar_t* original)
{
typedef void(__thiscall* PrivateClassCtor)(void* pThis, const wchar_t* original);
PrivateClassCtor PrivateClassCtorFunc = PrivateClassCtor(DLLBase + 0x1c00);
PrivateClassCtorFunc(this, original);
}
So my question is how can we determine the real size of a class that
we do not have the source for?
You have to guess or logically deduce it for yourself. Or just guess. If guessing doesn't work out for you, you'll have to guess again.
Is there a tool that will tell you what variables exist in a class and
their order (or atleast the correct datatypes or datatype sizes of
each variable) I'm assuming by decompiling and processing assembly in
an address range.
No, there is not. The type of meta information that describes a class, its members, etc. simply isn't written out, as the program does not need it, nor are there currently any facilities defined in the C++ Standard that would require a compiler to generate that information.
There are exactly zero guarantees that you can reliably 'guess' the size of a class. You can however probably make a reasonable estimate in most cases.
The one thing you can be sure of, though: the only real problem is having too little memory for a class instance. Having too much memory isn't really a problem at all (which is why adding 2900 extra bytes works).
On the assumption that the code was originally well written (e.g. the developer decided to initialise all the variables nicely), then you may be able to guess the size using something like this:
#define MAGIC 0xCD
// allocate a big buffer and fill it with a known pattern
char temp_buffer[8092];
memset(temp_buffer, MAGIC, 8092);
// call the ctor on the buffer rather than on a real object
PrivateClassCtor PrivateClassCtorFunc = PrivateClassCtor(DLLBase + 0x1c00);
PrivateClassCtorFunc(temp_buffer, original);
// step backwards until we find a byte that isn't 0xCD.
// Might want to change the magic value and run again
// just to be sure (e.g. the original ctor sets the last
// few bytes of the class to 0xCD by coincidence).
//
// Obviously fails if the developer never initialises member vars though!
for (int i = 8091; i >= 0; --i) {
    if ((unsigned char)temp_buffer[i] != MAGIC) {
        printf("class size might be: %d\n", i + 1);
        break;
    }
}
That's probably a decent guess; however, the only way to be 100% sure would be to stick a breakpoint where you call the ctor, switch to assembly view in your debugger of choice, and then step through the assembly line by line to see what the maximum address being written to is.

SIGABRT when returning from a function?

I'm new to C++, coming from a python/kotlin background and so am having some trouble understanding what's going on behind the scenes here...
The Issue
I call the calculateWeights (public) method with its required parameters, it then calls a series of methods, including conjugateGradientMethod (private) and should return a vector of doubles. conjugateGradientMethod returns the vector of doubles to calculateWeights just fine, but calculateWeights doesn't return that to its caller:
The code
Callsite of calculateWeights:
Matrix cov = estimator.estimateCovariances(&firstWindow, &meanReturns);
cout << "before" << endl; // this prints
vector<double> portfolioWeights = optimiser.calculateWeights(&cov, &meanReturns);
cout << "after" << endl; // this does not print
Here's calculateWeights:
vector<double> PortfolioOptimiser::calculateWeights
(Matrix *covariances, vector<double> *meanReturns) {
vector<double> X0 = this->calculateX0();
Matrix Q = this->generateQ(covariances, meanReturns);
vector<double> B = this->generateB0();
vector<double> weights = this->conjugateGradientMethod(&Q, &X0, &B);
cout << "inside calculateWeights" << endl;
print(&weights); // this prints just fine
cout << "returning from calculateWeights..." << endl; // also prints
return weights; // this is where the SIGABRT shows up
}
The output
The output looks like this (I've checked and the weights it outputs are indeed numerically correct):
before
inside calculateWeights
1.78998
0.429836
-0.62228
-0.597534
-0.0365409
0.000401613
returning from calculateWeights...
And then nothing.
I appreciate this is printf debugging, which isn't ideal, so I used CLion's debugger to find the following:
When I used CLion's debugger
I put a breakpoint on the returns of the conjugateGradientMethod and calculateWeights methods. The debugger stepped through the first one just fine. After I stepped over the return from calculateWeights, it showed me a SIGABRT with the following error:
Thread 1 "markowitzportfoliooptimiser" received signal SIGABRT, Aborted.
__gnu_cxx::new_allocator<std::vector<double, std::allocator<double> > >::deallocate (this=0x6, __p=0x303e900000762) at /usr/lib/gcc/x86_64-pc-cygwin/9.3.0/include/c++/ext/new_allocator.h:129
This is probably wrong, but my first stab at understanding it is that I've somehow overflowed vector<double> weights? It's only 6 doubles long and I never append anything to it after the loop below. This is how it's created inside conjugateGradientMethod:
How weights is created inside conjugateGradientMethod
vector<double> weights = vector<double>(aSize);
for (int i = 0; i < aSize; i++) {
weights[i] = aCoeff * a->at(i) + bCoeff* b->at(i);
}
Things I've tried
Initialising a vector of doubles for the weights in calculateWeights and passing a pointer to it into conjugateGradientMethod. Same result.
Giving the class that calculateWeights and conjugateGradientMethod both live on a public attribute, and having them assign the weights to that (so both functions return void). Same result.
More generally, I've had this kind of issue before when passing a return value up from two functions deep (if that makes sense?), i.e. from a private method up to a public method and then up to the public method's call site.
I'd be grateful for any advice on SIGABRTs in this context. I've read that it occurs when abort() sends the calling process the SIGABRT signal, but I'm unsure how to make use of that in this example.
Also, I'm all ears for any other style/best practices that would help avoid this in future.
Edit: Solution found
After much work, I installed Ubuntu 20.04 LTS and got it up and running, since I couldn't get either AddressSanitizer or Valgrind to work via WSL on Windows 10 (first time on Linux; I kinda like it).
With AddressSanitizer now working, I was able to see that I was writing too many elements to a vector of doubles in two separate places, nothing to do with my weights vector as @Lukas Matena rightly spotted. Confusingly, this was long before execution ever got to the snippets above.
If anyone is finding this in the future, these helped me massively:
Heap Buffer Overflow
Heap vs Stack 1
Heap vs Stack 2
The error message says that it failed to deallocate an std::vector<double> when calculateWeights was about to return. That likely means that at least one of the local variables (which are being destroyed at that point) in the function is corrupted.
You seem to be focusing on weights, but since the attempts that you mention have failed, I would rather suspect X0 or B (weights is maybe not even deallocated at that point due to return value optimization).
Things you can try:
start using an address sanitizer like others have suggested (a minimal sketch of the kind of bug it reports is shown after this list)
comment out parts of the code to see if it leads you closer (in other words, make a minimal example)
make one of the vectors a member variable so it is not destroyed at that point (not a fix, but it might give a clue about who is the offender)
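For reference, assuming you build with GCC or Clang, a hypothetical minimal reproduction (not the asker's actual code) of the kind of out-of-bounds write an address sanitizer reports looks like this; compile with g++ -g -fsanitize=address and run it:
#include <vector>

int main() {
    std::vector<double> weights(6);
    weights[6] = 1.0;   // one past the end: reported as a heap-buffer-overflow at the write
}
Without the sanitizer, a write like this often goes unnoticed until a destructor or deallocation runs much later, which matches the SIGABRT only appearing at the return.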
You're likely doing something bad to the respective vector somewhere, possibly in calculateX0 or generateB0 (which you haven't shared). It may be delete-ing part of the vector, returning a reference to a temporary instead of a copy, etc. The SIGABRT at that return is where you were caught, but memory corruption issues often surface later than they're caused.
(I would have made this shorter and posted as a comment but I cannot as a newbie. Hopefully it will count as an "advice on SIGABRT's in this context", which is what was in fact asked for).

Getting segmentation fault (core dumped)

Everything seems to run okay up until the return part of shuffle_array(), but I'm not sure what goes wrong.
int * shuffle_array(int initialArray[], int userSize)
{
// Variables
int shuffledArray[userSize]; // Create new array for shuffled
srand(time(0));
for (int i = 0; i < userSize; i++) // Copy initial array into new array
{
shuffledArray[i] = initialArray[i];
}
for(int i = userSize - 1; i > 0; i--)
{
int randomPosition = rand() % userSize;
int temp = shuffledArray[i];
shuffledArray[i] = shuffledArray[randomPosition];
shuffledArray[randomPosition] = temp;
}
cout << "The numbers in the initial array are: ";
for (int i = 0; i < userSize; i++)
{
cout << initialArray[i] << " ";
}
cout << endl;
cout << "The numbers in the shuffled array are: ";
for (int i = 0; i < userSize; i++)
{
cout << shuffledArray[i] << " ";
}
cout << endl;
return shuffledArray;
}
Sorry if the spacing is off here; I wasn't sure how to copy and paste code in here, so I had to do it by hand.
EDIT: Should also mention that this is just a fraction of code, not the whole project I'm working on.
There are several issues of varying severity, and here's my best attempt at flagging them:
int shuffledArray[userSize];
This array has a variable length. I don't think that it's as bad as other users point out, but you should know that this isn't allowed by the C++ standard, so you can't expect it to work on every compiler that you try (GCC and Clang will let you do it, but MSVC won't, for instance).
srand(time(0));
This is most likely outside the scope of your assignment (you've probably been told "use rand/srand" as a simplification), but rand is actually a terrible random number generator compared to what else the C++ language offers. It is rather slow, it repeats quickly (calling rand() in sequence will eventually start returning the same sequence that it did before), it is easy to predict based on just a few samples, and it is not uniform (some values have a much higher probability of being returned than others). If you pursue C++, you should look into the <random> header (and, realistically, how to use it, because it's unfortunately not a shining example of simplicity).
Additionally, seeding with time(0) will give you sequences that change only once per second. This means that if you call shuffle_array twice quickly in succession, you're likely to get the same "random" order. (This is one reason that often people will call srand once, in main, instead.)
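If you are allowed to go beyond rand/srand, a minimal sketch of the <random> facilities mentioned above could look like this (the helper names rng and random_index are made up for illustration):
#include <random>

// Seed one engine for the whole program instead of reseeding from time(0) on every call.
std::mt19937& rng() {
    static std::mt19937 engine{std::random_device{}()};
    return engine;
}

// Uniform over [0, size - 1], without the modulo bias of rand() % size.
int random_index(int size) {
    std::uniform_int_distribution<int> dist(0, size - 1);
    return dist(rng());
}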
for(int i = userSize - 1; i > 0; i--)
By iterating to i > 0, you will never enter the loop with i == 0. This means that there's a chance that you'll never swap the zeroth element. (It could still be swapped by another iteration, depending on your luck, but this is clearly a bug.)
int randomPosition = (rand() % userSize);
You should know that this is biased: because the maximum value of rand() is likely not divisible by userSize, you are marginally more likely to get small values than large values. You can probably just read up on the explanation and move on for the purposes of your assignment.
return shuffledArray;
This is a hard error: it is never legal to return a pointer to storage that was allocated automatically for a function call. In this case, the memory for shuffledArray is allocated automatically at the beginning of the function, and importantly, it is deallocated automatically at the end: this means that your program will reuse it for other purposes. Reading from it is likely to return values that have been overwritten by some code, and writing to it is likely to overwrite memory that is currently used by other code, which can have catastrophic consequences.
Of course, I'm writing all of this assuming that you use the result of shuffle_array. If you don't use it, you should just not return it (although in this case, it's unlikely to be the reason that your program crashes).
Inside a function, it's fine to pass a pointer to automatic storage to another function, but it's never okay to return that. If you can't use std::vector (which is the best option here, IMO), you have three other options:
have shuffle_array accept a shuffledArray[] that is the same size as initialArray already, and return nothing;
have shuffle_array modify initialArray instead (the shuffling algorithm that you are using is in-place, meaning that you'll get correct results even if you don't copy the original input)
dynamically allocate the memory for shuffledArray using new, which will prevent it from being automatically reclaimed at the end of the function.
Option 3 requires you to use manual memory management, which is generally frowned upon these days. I think that option 1 or 2 are best. Option 1 would look like this:
void shuffle_array(int initialArray[], int shuffledArray[], int userSize) { ... }
where userSize is the size of both initialArray and shuffledArray. In this scenario, the caller needs to own the storage for shuffledArray.
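Under those assumptions, option 1 could be sketched like this (hypothetical code; rand() is kept as in the original so that only the ownership problem changes):
#include <cstdlib>

// The caller owns both arrays, so nothing local needs to be returned.
void shuffle_array(int initialArray[], int shuffledArray[], int userSize) {
    for (int i = 0; i < userSize; i++)        // copy the input into the caller's buffer
        shuffledArray[i] = initialArray[i];
    for (int i = userSize - 1; i > 0; i--) {  // same in-place swap loop as before
        int randomPosition = rand() % userSize;
        int temp = shuffledArray[i];
        shuffledArray[i] = shuffledArray[randomPosition];
        shuffledArray[randomPosition] = temp;
    }
}
The caller then provides the storage, e.g. int shuffled[5]; shuffle_array(original, shuffled, 5);, and nothing dangles after the function returns.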
You should NOT return a pointer to a local variable. After the function returns, shuffledArray gets deallocated and you're left with a dangling pointer.
You cannot return a local array. The local array's memory is released when you return (did the compiler warn you about that?). If you do not want to use std::vector, then create your result array using new:
int *shuffledArray = new int[userSize];
Your caller will have to delete[] it (not true with std::vector).
When you define any non-static variables inside a function, those variables reside on the function's stack. Once you return from the function, the function's stack is gone. In your program, you are trying to return a local array which will be gone once control is outside of shuffle_array().
To solve this, you either need to define the array globally (which I would not prefer, because global variables are dangerous) or use dynamic memory allocation for the array, which will create space for it on the heap rather than on the function's stack. You can also use std::vector, if you are familiar with vectors.
To allocate memory dynamically, you have to use new as mentioned below.
int *shuffledArray = new int[userSize];
and once you have finished using shuffledArray, you need to free the memory as below:
delete [] shuffledArray;
otherwise your program will leak memory.

c++ error int i = 0 not working

For some reason in my C++ code, int i = 0 is not working. In debug mode, it shows 1024 stored in i after the earlier i goes out of scope. When I try to initialise it to 0, it doesn't work. i is not a global variable.
void CCurrent::findPeaks()//this checks for peaks in the FFT
{
int localPeakIndx[256]={};
int j=0;
for(int i=0;i<DATASIZE/2;i++)
{
if(magnitude[i]>m_currentValues.AverageMagnitude*3) {
localPeakIndx[j]=i;
j++;
}
}
//i should be out of scope
int startIndx;
int endIndx;
int i = 0; // why does it equal 1024 in the debugger??????
It is not clear what you mean by "int i= 0 is not working" and "it stores 1024 into i after it goes out of scope". That description is unrealistic and most likely does not take place. After int i = 0 the value of i will be seen as 0, not as 1024.
What you have to keep in mind in this case is that your program has two i variables: one local to the for cycle and one declared further down in the surrounding block. These are two different, completely unrelated variables.
Now, MS Visual Studio debugger will show you the values of all variables declared in the current block, even if the execution has not yet reached their point of declaration. Expectedly, variables whose declaration (and initialization) point has not been reached yet are shown with indeterminate (uninitialized) values.
Apparently, this is exactly what happens in your case. You are looking at the value of the outer i before its point of declaration and initialization. Even though the name i is not yet visible immediately after the for cycle from the language point of view, the debugger will nevertheless give you "early access" to the outer i. The garbage value of that i at that point just happened to be 1024 in your experiment.
Note that there's a certain logic in this behavior. In the general case it is not correct to assume that a local object does not exist above the point of its declaration. Being located above the point of declaration in spatial terms does not necessarily mean being located before the point of declaration in temporal terms, which becomes clear once you start using such a beautiful feature of the language as goto.
For example, this piece of code
{
int *p = NULL;
above:
if (p != NULL) {
std::cout << *p << std::endl;
goto out;
}
int i = 10;
p = &i;
goto above;
out:;
}
is legal and has perfectly defined behavior. It is supposed to output the valid initialized value of i through a valid pointer p, even though the output takes place above i's point of declaration. I.e. it is possible to end up in a situation where the current execution point is located above the object's declaration, yet the object already exists and holds a determinate value. To help you with debugging in such situations, the debugger allows you to see the values of all objects that may potentially exist at that point, which includes all objects declared in the current block. In the general case the debugger does not know which objects formally exist already and which do not, so it just lets you see everything.
Just a small refinement of what was already said:
The "outer i" is given space on the stack along with all other local variables when the function is entered.
Since the "outer i" hasn't been given a value before the end of the for loop, the debugger is looking on the stack where the value for i ought to be, and showing you what happens to be there.
This comes from the fact that local variables are not initialized.
I bet if you tried to access i in the code before the declaration, the compiler would flag that as an error. But the debugger doesn't have such scruples.

Global Arrays in C++

Why is the array not overflowed (e.g. no error alert) when it is declared globally? In other words, why am I able to fill it with an unlimited number of elements (through the for loop) even though its size is limited in the declaration, while it does alert when I declare the array locally inside main?
#include <iostream>
using namespace std;

char name[9];
int main(){
int i;
for( int i=0; i<18; ++i){
cin>>name[i];
}
cout<<"Inside the array: ";
for(i=0; i<20; i++)
cout<<name[i];
return 0;
}
C++ does not check bounds errors for arrays of any kind. Reading or writing outside of the array bounds causes what is known as "undefined behaviour", which means anything could happen. In your case, it seems that what happens is that it appears to work, but the program will still be in an invalid state.
If you want bounds checking, use a std::vector and its at() member function:
vector<int> a(3);    // vector of 3 ints
int n = a.at(0);     // ok
n = a.at(42);        // throws an exception
C++ does not have array bounds checking, so the language never checks whether you have exceeded the end of your array, but as others have mentioned, bad things can be expected to happen.
Global variables exist in the static data segment, which is totally separate from your stack. It also does not contain important information like return addresses. When you exceed an array's boundaries you are effectively corrupting memory. It just so happens that corrupting the stack is more likely to cause visibly bad things than corrupting the data segment. All of this depends on the way your operating system organizes a process's memory.
It's undefined behavior. Anything can happen.
You cannot assume too much about the memory layout of variables. It may run on your computer with these parameters, but totally fail when you increase your access bounds, or run that code on another machine. So if you seriously want to write code, don't let this become a habit.
I'd go one step further and state that C/C++ do not have arrays. What they have is array-like syntactic sugar that is immediately translated to pointer arithmetic, which cannot be checked, as pointers can be used to access potentially all of memory. Any checking that the compiler may manage to perform based on static sizes and constant bounds on an index is a happy accident, but you cannot rely on it.
Here's an oddity that stunned me when I first saw it:
int a[10], i;
i = 5;
a[i] = 42; // Looks normal.
5[a] = 37; // But what's this???
std::cout << "Array element = " << a[i] << std::endl;
But the odd-looking line is perfectly legal C++. This example emphasizes that arrays in C/C++ are a fiction.
Neil Butterworth already commented on the benefits of using std::vector and the at() access method for it, and I cannot second his recommendation strongly enough. (Unfortunately, the designers of STL blew a golden opportunity to make checked access the [] operators, with the at() methods the unchecked operators. This has probably cost the C++ programming community millions of hours and millions of dollars, and will continue to do so.)