C++ - Initialisation of empty array of objects sending bad access error

I am trying to create an array of objects of class Cell in the main function:
int nc = 26730899;
Cell c[nc];
and Cell is an object which I am constructing with no arguments:
// Constructor
Cell::Cell() {}
When nc is relatively low it works fine. The problem is with big numbers, like the one in the example: it compiles, but raises a bad access error at runtime.
Does it mean my computer has no more memory? What kind of memory is this, and how can I work around this issue?
I am trying to develop a program to run computational fluid dynamics problems using the finite volume method, where each cell is an object, so I will need tons of cells!
In the (3D) example I am just trying 299 cells in x by 299 in y by 299 in z = 26 730 899 cells, which is still quite small.
Maybe my approach is just wrong!?
I am completely new to c++, so keep it as simple as possible, pleaseee. :)
Thank you all.
Note:
I don't know if it is relevant, but I am running the code in Xcode on a MacBook Pro from 2010.

Does it mean my computer has no more memory?
Your compiler, unless told otherwise, produces executable programs in which the total size of objects with what the C++ language calls automatic storage duration is limited in a way that makes sense for the operating system on which the program should run.
You haven't shown your full code, but your array apparently has automatic storage duration.
Now, C++ automatic storage duration is usually implemented with something called the stack. How that stack works, how it is limited and how you can change those limits for your program is all implementation-specific.
What matters here is that you just shouldn't create huge objects with automatic storage duration. It's not made for that purpose.
and how can I work around this issue?
Use dynamic storage duration and put the objects on the free store with std::vector:
int nc = 26730899;
std::vector<Cell> c(nc);
Note: Your code is also using a non-portable GCC extension, because in standard C++ an array's size must be fixed at compile time, but that doesn't matter much for the problem at hand; you'd probably get the same error if nc were const.
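For completeness, a minimal self-contained version of the fix might look like this (the empty Cell here is a stand-in for the asker's real class):
#include <vector>

class Cell {
public:
    Cell() {}   // default constructor, as in the question
    // ... the real class would carry the per-cell data ...
};

int main()
{
    const int nc = 26730899;     // 299 * 299 * 299
    std::vector<Cell> c(nc);     // elements live on the free store, not the stack
    // use c[i] exactly as the raw array was used before
}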

Related

constexpr pointers and memory management in C++

Quoting from C++ Primer:
The address of an object defined outside of any function is a constant expression, and so may be used to initialize a constexpr pointer.
In fact, each time I compile and run the following piece of code:
#include <iostream>
using namespace std;

int a = 1;

int main()
{
    constexpr int *p = &a;
    cout << "p = " << p << endl;
}
I always get the output:
p = 0x601060
Now, how is that possible? How can the address of an object (global or not) be known at compile time and be assigned to a constexpr? What if that part of the memory is being used for something else when the program is executed?
I always assumed that the memory is managed so that a free portion is allocated when a program is executed, no matter which particular part of memory that is. However, since here we have a constexpr pointer, the program will always require a specific portion, which has to be free to allow the program to execute. This doesn't make sense to me, could someone explain this behaviour please? Thanks.
EDIT: After reading your answers and a few articles online, I realized that I missed the whole concept of virtual memory... now it makes sense. It's quite surprising that neither C++ Primer nor Accelerated C++ mentions this concept (maybe they do in later chapters, I'm still reading...).
However, quoting again C++ Primer:
A constant expression is an expression whose value cannot change and that can be evaluated at compile time.
Given that the linker has a major role in computing the fixed address of global objects, the book would have been more precise if it had said that constant expressions "can be evaluated at link time", not "at compile time".
It's not actually true that the address of an object is known at compile time. What is known at compile time is the offset. When the program is compiled, the address is not emitted into the object file; instead, a marker indicating the section and the offset within it is.
To be simplistic about it, the linker then comes along, measures the size of each section, stitches them together and calculates the address of each marker in each object file now that it has a concrete 'base address' for each section.
Of course it's not quite that simple. A linker can also emit a map of the locations of all these adjusted values in its output, so that a loader or load-time linker can re-adjust them just prior to run time.
The point is, logically, for all intents and purposes, the address is a constant from the program's point of view. It's just that the constant isn't given a value until link/load time. When that value is available, every reference to that constant is overwritten by the linker/loader.
If your question is "why is it always the same address?", it's because your OS uses a standard virtual memory layout layered over the virtual memory manager. Addresses in a process are not real memory addresses - they are logical memory addresses. The piece of silicon at that 'address' is mapped in by the virtual memory management circuitry. Thus each process can use the "same" address, while actually using a different area of the memory chips.
I could go on about paging memory in and out, which is related, but it's a long topic. Further reading is encouraged.
It works because global variables are in static storage.
This is because the space for the global/static variable is allocated at compile time within the binary your compiler generates, in a region next to the program's machine code called the "data" segment. When the binary is copied and loaded into memory, the data segment becomes read-write.
This Wikipedia article includes a nice diagram of where the "data" segment fits into the virtual address space:
https://en.wikipedia.org/wiki/Data_segment
Automatic variables are not stored in the data segment because they may be instantiated as many times as their parent function is called. Moreover, they may be allocated at any depth of the stack. Thus it is not possible to know the address of an automatic variable at compile time in the general case.
This is not the case for global variables, which are clearly unique throughout the lifetime of the program. This allows the compiler to assign a fixed address for the variable which is separate from the stack.
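A small sketch of that distinction (the names are hypothetical; the second pointer is commented out because it would not compile):
int global = 1;                      // static storage: fixed link-time address

int main()
{
    constexpr int *p = &global;      // OK: address of a global is a constant expression
    int local = 2;
    // constexpr int *q = &local;    // error: a stack address is not known until run time
    return *p + local;
}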

Performance: should I use a global variable in a function which gets called often?

First off, let me get off my chest the fact that I'm a greenhorn trying to do things the right way, which means I get into a contradiction about what the right way is every now and then.
I am modifying a driver for a peripheral which contains a function - lets call it Send(). In the function I have a timestamp variable so the function loops for a specified amount of time.
So, should I declare the variable global (that way it is always in memory and no time is lost for declaring it each time the function runs) or do I leave the variable local to the function context (and avoid a bad design pattern with global variables)?
Please bear in mind that the function can be called multiple times per millisecond.
Speed of execution shouldn't be significantly different for a local vs. a global variable. The only real difference is where the variable lives. Local variables are allocated on the stack, global variables are in a different memory segment. It is true that local variables are allocated every time you enter a routine, but allocating memory is a single instruction to move the stack pointer.
There are much more important considerations when deciding if a variable should be global or local.
When implementing a driver, try to avoid global variables as much as possible (see the sketch after this list), because:
They are thread-unsafe, and you have no idea about the scheduling scheme of the user application (in fact, even without threads, using multiple instances of the same driver is a potential problem).
It automatically forces the creation of a data section in the executable image of any application that links against your driver (which is something the application programmer might want to avoid).
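To make this concrete, here is a hypothetical sketch of the local-variable version of such a function (Send and timeout_ms are stand-ins for the asker's actual code):
#include <chrono>

void Send(unsigned timeout_ms)
{
    // The deadline is local, so concurrent or re-entrant calls to Send()
    // each get their own copy on the stack; nothing is shared.
    const auto deadline = std::chrono::steady_clock::now()
                        + std::chrono::milliseconds(timeout_ms);
    while (std::chrono::steady_clock::now() < deadline) {
        // ... poll/transmit on the peripheral ...
    }
}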
Did you profile a fully-optimized, release build of your code and identify the bottleneck to be small allocations in this function?
The change you are proposing is a micro-optimization; a change to a small part of your code with the intent to make it more efficient. If the answer to the above question is "no", as I'd expect, you shouldn't even be thinking of such things.
Select the correct algorithm for your code. Write your code using idiomatic techniques. Do not write in micro-optimizations. You might be surprised how good your compiler is at optimizing your code for you. It will often be able to optimize away these small allocations, but even if it can't you still don't know if the performance penalty imposed by them is even noticeable or significant.
For drivers, which are usually position-independent, global variables are accessed indirectly through the GOT, unless IP-relative addressing is available (e.g. on x86_64, ARM, etc.).
In the GOT case, you can think of it as an extra indirection through a pointer.
However, even with an extra pointer, it won't make any observable difference if the function is "only" called at millisecond frequency.

How to keep a static array out of memory until first used

I'm very new to C++ so I'm somewhat confused about how static arrays work. I know in C# the array isn't placed into memory until it's first accessed which can be problematic if you want it to be instantly accessible. However, I'm working on converting a Perlin class to C++ and I'd like to have multiple static arrays of which only one may be used during runtime or any number of them. In reality, it's not really that big of a memory issue as none of them will be more than 50kb, however, I'd rather know if it's possible to ensure the array isn't loaded into memory unless I ask for it. Is there a way to ensure a static array defined in source code isn't loaded into memory unless asked for? It's a pretty nitpicky thing (esp w/ x64), but I'd prefer to be as optimized about it as possible. I hate the idea of taking up memory with something that isn't going to be used.
Or maybe static arrays aren't even the way to go - just dynamically allocated arrays wrapped in class objects?
I guess the real question is: what is the most efficient solution for implementing table-lookups in c++ that might not all be used?
Static arrays will be in your memory space, with no way to omit or free them, but that is not the same thing as "in memory". Leave it up to the Windows virtual memory manager: when you first access the array, Windows will bring it from disk into RAM.
No, you cannot do that: statically initialized structures and arrays in C++ are loaded into memory along with the rest of your code, so you cannot influence the time at which they get loaded.
If you must load your static array at runtime, consider changing your strategy to placing the data into a separate file, and adding an initialization function to read the file into a static vector object. This strategy results in placing the data into the dynamic memory area, while the vector object itself could remain static.
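One way to implement that strategy is a function-local static, which C++ initialises on first use; a minimal sketch, assuming the table lives in a binary file of ints (the file name and element type are made up):
#include <fstream>
#include <vector>

const std::vector<int>& lookupTable()
{
    // Initialised the first time this function is called, not at start-up.
    static const std::vector<int> table = [] {
        std::vector<int> t;
        std::ifstream in("perlin_table.dat", std::ios::binary);
        int value;
        while (in.read(reinterpret_cast<char*>(&value), sizeof value))
            t.push_back(value);
        return t;
    }();
    return table;
}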
Both Windows and Linux use "demand loading", which means that code and data are loaded only when the code that reaches them actually needs them. So assuming the data is constant and global (e.g. static const int x[number] = { ... }), the data will not be loaded until it is first touched. [The typical granularity for this is 4KB or some multiple thereof; if you have, say, several hundred 50KB blocks of data that aren't being used, you should not see them in memory, and thus no delay in loading the program itself.]
As always when it comes to performance and optimisations, it's best to NOT overcomplicate things by trying to predict problems in an area (aka "premature optimisation"), and make sure that what you think MAY be a problem actually is a problem before you optimise it.

Pointers in C++ : How large should an object be to need use of a pointer?

Oftentimes I read in the literature that one of the use cases of C++ pointers is when one has big objects to deal with, but how large should an object be for a pointer to be needed when manipulating it? Is there any guiding principle in this regard?
I don't think size is the main factor to consider.
Pointers (or references) are a way to designate a single bunch of data (be it an object, a function or a collection of untyped bytes) from different locations.
If you do copies instead of using pointers, you run the risk of having two separate versions of the same data becoming inconsistent with each other. If the two copies are meant to represent a single piece of information, then you will have to do twice the work to make sure they stay consistent.
So in some cases using a pointer to reference even a single byte could be the right thing to do, even though storing copies of said byte would be more efficient in terms of memory usage.
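A tiny illustration of that risk (the names are invented for the example):
#include <string>

struct Settings { std::string user; };

int main()
{
    Settings original{"alice"};
    Settings copy = original;      // a second, independent version of the data
    Settings* shared = &original;  // another name for the one true version

    copy.user = "bob";             // original and copy now silently disagree
    shared->user = "carol";        // only one object, so nothing to reconcile
}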
EDIT: to answer jogojapan's remarks, here is my opinion on memory efficiency.
I often ran programs through profilers and discovered that an amazing percentage of the CPU power went into various forms of memory-to-memory copies.
I also noticed that the cost of optimizing memory efficiency was often offset by code complexity, for surprisingly little gains.
On the other hand, I spent many hours tracing bugs down to data inconsistencies, some of them requiring sizeable code refactoring to get rid of.
As I see it, memory efficiency should become more of a concern near the end of a project, when profiling reveals where the CPU/memory drain really occurs, while code robustness (especially data flows and data consistency) should be the main factor to consider in the early stages of conception and coding.
Only the bulkiest data types should be dimensioned at the start, and only if the application is expected to handle considerable amounts of data. In a modern PC, we are talking about hundreds of megabytes, which most applications will never need.
When I designed embedded software 10 or 20 years ago, memory usage was a constant concern. But in environments like a desktop PC, where memory requirements are most of the time negligible compared to the amount of available RAM, focusing on a reliable design seems more of a priority to me.
You should use a pointer when you want to refer to the same object from different places. In fact you can even use references for the same purpose, but pointers give you the added advantage of being able to refer to different objects, while a reference keeps referring to the same object.
On second thought, maybe you are referring to objects created on the free store using new etc. and then referring to them through pointers. There is no definitive rule for that, but in general you can do so when:
the object being created is too large to be accommodated on the stack, or
you want to extend the lifetime of the object beyond the enclosing scope.
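A small sketch of the second point, using std::unique_ptr rather than raw new/delete (Widget is a made-up type; this is one modern way to manage free-store objects):
#include <memory>

struct Widget { int value = 0; };

std::unique_ptr<Widget> makeWidget()
{
    // A local Widget would die when this function returns; the free-store
    // object survives until the owning unique_ptr is destroyed.
    return std::make_unique<Widget>();
}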
There is no such limitation or guideline. You will have to decide it.
Assume the class definition below. Its size is 100 ints = 400 bytes:
class test
{
private:
    int m_nVar[100];
};
When you use the following function definition (pass by value), the copy constructor will get called (even if you don't provide one), so a copy of 100 ints will be made, which will obviously take some time to finish:
void passing_to_function(test a);
When you change the function's definition to take a reference or a pointer, no such copying happens; only a test* (pointer-sized) value is transferred:
void passing_to_function(test& a);
So you clearly have an advantage in passing by reference or by pointer rather than by value!
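As a side note, when the function does not need to modify the argument, the idiomatic signature is a reference to const, which also avoids the copy while documenting the intent:
void passing_to_function(const test& a);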

C++ memory allocation errors without use of new

I am having issues with my program throwing a large number of memory allocation exceptions and I am having a very hard time diagnosing the problem...I would post code, but my program is very large and I have proprietary information concerns, so I am hoping to get some help without posting the code. If you plan on responding with some form of SSCCE comment, just stop reading now and save both of us some time. This is a case where I cannot post succinct code - I will try to be as clear and concise as possible with my problem description and some specific questions.
Program Background - my program is basically a data cruncher. It takes a bunch of data tables as inputs, performs calculations on them, and spits out new data tables based on the calculation results. All of my data structures are user-defined classes (consisting of int, double and string types, with vector containers for arrays). In all cases, I create instances of my classes without the use of new and delete.
Problem Description - my program compiles without warnings, and runs fine on smaller datasets. However, once I increase the dataset (from a 20x80 array to 400x80), I start throwing bad_alloc exceptions (once I've processed the first 35 entries or so). The large dataset runs fine in 17 of my 18 modules - I have isolated one function where the errors are occurring. The calculations needed for this function would result in about 30,000 rows of data being created, whereas other functions in my code generate 800,000+ rows without incident.
The only really unique attribute of this module is that I am using resize a lot (about 100 times per function call), and that the function uses recursive loops during the resize operation (the function is allocating square feet out of a building one tenant at a time, and then updating the remaining feet to be allocated after each tenant's lease size and duration is simulated, until all square feet are allocated). Also, the error is happening at nearly the same place each time (but not the exact same location, because I have a random number generator that is throwing in some variation to the outcomes). What really confounds me is that the first ~34 calls to this function work fine, and the ~35th call does not require more memory than the previous 34, yet I am having these bad_alloc exceptions on the 35th call nonetheless...
I know it's difficult to help without code. Please just try to give me some direction. My specific questions are as follows:
If I am not using "new" and "delete", and all of my variables are being initialized INSIDE of local functions, is it possible to have memory leaks / allocation problems through repeated function calls? Is there anything I can or should do to manage memory when initializing variables inside of local functions, using "vector<T> Instance;" to declare my variables?
Is there any chance I am running low on stack memory, if I am doing the whole program through the stack? Is it possible I need to load some of my big lookup tables (maps, etc.) on the heap and then just use the stack for my iterations where speed is important?
Is there a problem with using resize a lot related to memory? Could this be an instance where I should use "new" and "delete" (I've been warned in many instances not to use those unless there is a very strong, specific reason to do so)?
[Related to 3] Within the problem function, I am creating a class variable, then writing over that variable about 20 times (once for each "iteration" of my model). I don't need the data from the previous iteration when I do this...so I could ostensibly create a new instance of the variable for each iteration, but I don't understand how this would help necessarily (since clearly I am able to do all 20 iterations on one instance on the first ~34 data slices)
Any thoughts would be appreciated. I can try to post some code, but I already tried that once and everyone seemed to get distracted by the fact that it wasn't compilable. I can post the function in question but it doesn't compile by itself.
Here is the class that is causing the problem:
// Class definition
class SpaceBlockRentRoll
{
public:
    double RBA;
    string Tenant;
    int TenantNumber;
    double DefaultTenantPD;
    int StartMonth;
    int EndMonth;
    int RentPSF;
    vector<double> OccupancyVector;
    vector<double> RentVector;
};
// Class variable declaration (occurring inside function)
vector<SpaceBlockRentRoll> RentRoll;
Also, here is a snippet from the function where the recursion occurs
for (int s = 1; s <= NumofPaths; ++s) {
    TenantCounter = 0;
    RemainingTenantSF = t1SF;
    if (RemainingTenantSF > 0) {
        while (RemainingTenantSF > 0) {
            TenantCounter = TenantCounter + 1;
            // Resize relevant RentRoll vectors
            ResizeRentRoll(TenantCounter, NumofPaths, NumofPeriods, RentRoll);
            // Assign values for current tenant
            RentRoll[TenantCounter] = AssignRentRollValues(MP, RR);
            // Update the square feet yet to be allocated
            RemainingTenantSF = RemainingTenantSF - RentRoll[TenantCounter].RBA;
        }
    }
}
bad_alloc comes from heap problems of some kind, and can be thrown by any code that indirectly allocates or frees heap memory, which includes all the standard library collections (std::vector, std::map, etc) as well as std::string.
If your programs do not use a lot of heap memory (so they're not running out of heap), bad_allocs are likely caused by heap corruption, which is generally caused by using dangling pointers into the heap.
You mention that your code does a lot of resize operations -- resize on most collections will invalidate all iterators on the collection, so if you reuse any iterator after a resize, that may cause heap corruption that manifests bad_alloc exceptions. If you use unchecked vector element accesses (std::vector::operator[]), and your indexes are out of range, that can cause heap corruption as well.
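To illustrate the resize hazard, a deliberately broken sketch (a raw pointer instead of an iterator, but the failure mode is the same):
#include <vector>

int main()
{
    std::vector<int> v(10, 0);
    int* p = &v[0];       // points into the vector's current buffer
    v.resize(100000);     // may reallocate: p is now dangling
    *p = 42;              // undefined behaviour; can corrupt the heap and
                          // surface later as a seemingly unrelated bad_alloc
}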
The best way to track down heap corruption and memory errors in general is to use a heap debugger such as valgrind.
Classes like std::vector and std::string are allowed to throw bad_alloc or other exceptions. After all, they have to use some memory that comes from somewhere, and any computer only has so much memory to go around.
Standard 17.6.5.12/4:
Destructor operations defined in the C++ standard library shall not throw exceptions. Every destructor in the C++ standard library shall behave as if it had a non-throwing exception specification. Any other functions defined in the C++ standard library that do not have an exception-specification may throw implementation-defined exceptions unless otherwise specified. [Footnote 1] An implementation may strengthen this implicit exception-specification by adding an explicit one.
Footnote 1: In particular, they can report a failure to allocate storage by throwing an exception of type bad_alloc, or a class derived from bad_alloc (18.6.2.1). Library implementations should report errors by throwing exceptions of or derived from the standard exception classes (18.6.2.1, 18.8, 19.2).
If I am not using "new" and "delete", and all of my variables are being initialized INSIDE of local functions, is it possible to have memory leaks / allocation problems through repeated function calls?
Unclear. If all the variables you refer to are local, no. If you're using malloc(), calloc(), and free(), yes.
Is there any chance I am running low on stack memory, if I am doing the whole program through the stack?
Not if you get bad_alloc. If you got a 'stack overflow' error, yes.
Is it possible I need to load some of my big lookup tables (maps, etc.) on the heap and then just use the stack for my iterations where speed is important?
Well, it's hard to believe that you need a local copy of a lookup table in every stack frame of a recursive method.
Is there a problem with using resize a lot related to memory?
Of course. You can run out.
Could this be an instance where I should use "new" and "delete"
Impossible to say without knowing more about your data structures.
(I've been warned in many instances not to use those unless there is a very strong, specific reason to do so)?
By whom? Why?
Within the problem function, I am creating a class variable,
You are creating an instance of the class on the stack. I think. Please clarify.
then writing over that variable about 20 times (once for each "iteration" of my model).
With an assignment? Does the class have an assignment operator? Is it correct? Does the class itself use heap memory? Is it correctly allocated and deleted on construction, destruction, and assignment?
Since, as you said, you are using std::vector with the default allocator, the problem occurs when you use std::vector::resize(...) a lot, and it occurs only after some iterations, my guess is that you are running into a heap fragmentation problem.
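If that is the case, a common mitigation is to reserve the needed capacity once up front instead of growing the vector repeatedly; a minimal sketch (30000 matches the row count mentioned in the question):
#include <vector>

int main()
{
    std::vector<int> data;
    data.reserve(30000);            // one up-front allocation
    for (int i = 0; i < 30000; ++i)
        data.push_back(i);          // growth never reallocates or fragments
}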