I am developing a large, complex model (mostly simple math, primarily algebra, but lots of calculations). I did my initial bid assuming I'd have to do everything once, but now the scope has expanded such that I need to run the entire model multiple times (on the same set of underlying assumptions but with a different dataset).
Based on my initial needs, I created a bunch of classes, then created dynamic instances of those classes in my main function, passing them by reference to each function as I went. That way, at the end of the main function, I can do all the necessary reporting / output once all of the functions have run. My question is about how to now modify my main function to allow for multiple iterations. A few sample bits of code below, followed by my question(s):
// Sample container declaration (in main)
std::vector<PropLevelFinancials> PLF;

// Sample function declarations (functions in other header files)
void ProcessPropFinancials(std::vector<PropLevelFinancials>& PLF);
void OutputSummaryReport(std::vector<PropLevelFinancials>& PLF /*, other objects */);

// Sample calls in main: processing, then reporting at the end
ProcessPropFinancials(PLF);
OutputSummaryReport(PLF /*, other objects */);

// What I need to do next:
// clear/delete PLF and the other objects, then iterate through the model again
I am quite happy with the speed and results of my current program, so I don't need a whole lot of input on that front (although suggestions are always welcome).
How should I implement the ability to cycle through multiple datasets (I obviously know how to do the loop, my question is about memory management)? Speed is critical. I want to essentially delete the existing instance of the class object I have created (PLF), and then run everything again on a new instance of the object(s). Is this the type of situation where I should use "new" and "delete" to manage the iterations? Would that change my function calls as outlined above? If I wanted to avoid using new and delete (stay on the stack), what are my options?
No. Do not, ever, use new and delete without a highly exceptional cause.
std::vector<T> offers a clear() member you can use. I suggest you implement a similar one for your own classes if that is what you need. Or you can simply do PLF = std::vector<T>();, which would work a bit better for your own UDTs without modification, assuming that you wrote them according to the most basic C++ guidelines.
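A minimal sketch of what the revised loop in main could look like under these suggestions. PropLevelFinancials, ProcessPropFinancials and OutputSummaryReport come from the question; Dataset, LoadAllDatasets and the way a dataset reaches the processing functions are purely hypothetical:

#include <vector>

int main()
{
    std::vector<PropLevelFinancials> PLF;
    std::vector<Dataset> datasets = LoadAllDatasets();   // hypothetical input loading

    for (const Dataset& dataset : datasets) {
        PLF.clear();   // drop the old elements but keep the allocated capacity
        ProcessPropFinancials(PLF /*, dataset */);
        OutputSummaryReport(PLF /*, other objects */);
    }
    // If you want the memory released each time instead, assign a fresh vector
    // inside the loop: PLF = std::vector<PropLevelFinancials>();
}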
Related
I have to empty a stack before using it again. I understand that it can be done like this:
while (!mystack.empty()) { mystack.pop(); }
Is there a specific reason for not having this function? Or is it just that when the interface was first designed, no one felt the need for it and it has simply been left out?
Also, the stack interface in Java does have a clear() function.
While it would possibly be more readable to have an explicit .clear(), even without it you can empty a stack like this:
mystack = {};
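For completeness, a minimal compilable sketch; note that the = {} form needs C++11 or later, and mystack = std::stack<int>(); is the pre-C++11 equivalent:

#include <stack>

int main()
{
    std::stack<int> mystack;
    mystack.push(1);
    mystack.push(2);

    mystack = {};   // replace the contents with a freshly constructed, empty stack

    return mystack.empty() ? 0 : 1;
}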
As molbdnilo mentioned in the comments, you have to distinguish between standard containers and container adapters: std::stack is a container adapter, not a container. There are several reasons why these adapters keep their assumptions about the underlying container type to a minimum. One is time complexity, which can differ considerably between the possible underlying containers. Another can be the need to stay consistent with various access patterns in parallel environments (concurrent reads and writes), although that is probably not what matters for a clear() operation specifically.
And in general, it follows a simple software design rule: do not inflate top-level interfaces with assumptions about possible inner implementations and usage scenarios that are not directly related to the type's core characteristics. Clearing an "abstract" stack outright can invite confusion and error-prone misuse, because a stack often represents more than a simple partial ordering; it commonly represents a history. Semantically, a direct clear operation says "forget everything I have done with and learned about this stack so far, let's try something totally different". Re-assignment is the more proportionate approach here: you explicitly introduce a completely new object, while the previous one can live on elsewhere (inside a shared_ptr, for instance), unaffected by the reset if that is required.
Run a while loop until the stack is empty to pop all the elements:
while (!stack.empty()) {
    stack.pop();
}
I have an application in C++ for which I want to implement basic memory profiling capabilities.
For the most important and memory-consuming classes, I included in the constructors, copy constructors, and destructors some code that calculates and saves the amount of memory used by each instance of the class (the code runs only when the macro MEMPROFILE is defined). Something like:
class MyClass
{
public:
    MyClass()
    {
#ifdef MEMPROFILE
        calcAndSaveMemUsage();
#endif
    }
    // ...
};
Analogously, the destructor updates the counters accordingly. That way, whenever the macro for profiling is defined and a new instance is created like MyClass obj, memory consumption info is automatically processed.
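A minimal sketch of that pattern, assuming simple static counters; the question's calcAndSaveMemUsage() would presumably update something along these lines:

#include <cstddef>

class MyClass
{
public:
    MyClass()
    {
#ifdef MEMPROFILE
        ++liveInstances;                  // one more live object of this type
        bytesInUse += sizeof(MyClass);
#endif
    }

    MyClass(const MyClass&)
    {
#ifdef MEMPROFILE
        ++liveInstances;                  // copies count as new instances too
        bytesInUse += sizeof(MyClass);
#endif
    }

    ~MyClass()
    {
#ifdef MEMPROFILE
        --liveInstances;                  // object destroyed, release its share
        bytesInUse -= sizeof(MyClass);
#endif
    }

private:
    // hypothetical profiling counters shared by all instances of the class
    static std::size_t liveInstances;
    static std::size_t bytesInUse;
};

std::size_t MyClass::liveInstances = 0;
std::size_t MyClass::bytesInUse = 0;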
Are there ways to do a similar thing automatically for pointers, either in general or for pointers to that particular class? For instance, to automatically increase a counter every time a pointer is created?
I would be interested in learning more about any ways to achieve that, be it with overloading, wrapping or instrumentation.
PS: I know of external tools that would help me profile memory. I am interested in learning the particular technique asked about above.
No, that's not possible. Pointers are not user-defined types. Also, they can be copied by memcpy which means the compiler might not even know it's copying them. Behind the scenes, std::copy may also use memcpy where possible, so it's not just explicit calls to memcpy which would trip you up.
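To illustrate the memcpy point with a minimal example: raw pointers are trivially copyable, so a byte-wise copy duplicates them without running any hook you could have written.

#include <cstring>

void demonstrate()
{
    int value = 42;
    int* original = &value;
    int* copy;

    // Copies the pointer byte by byte; no constructor, assignment operator,
    // or other user-provided code runs, so nothing can count this copy.
    std::memcpy(&copy, &original, sizeof(original));
    (void)copy;
}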
At one time I had a theory that instantiating objects on every request rather than having them reside in the Application scope was a huge memory hog. As my knowledge of ColdFusion has grown over the years, I don't think I really understood how CF deals with classes in the "black box" of the CF framework, so I'm going to ask this for community correction or confirmation.
I'm just going to throw out what I think is happening:
1. A CFC is compiled into a class, and each method within that CFC is compiled into its own class.
2. Those classes will reside in (PermGen) memory and can be written to disk, depending on ColdFusion administrator settings.
3. When a new object is created or a template is requested, the source code is hashed and compared to the hash stored with the compiled class.
   - If there is a match, it will use the compiled class in memory.
   - If the compiled class doesn't exist, it will compile from source.
   - If the compiled class exists but the hash doesn't match, it will recompile.
4. As an aside, whenever you enable trusted cache, ColdFusion will no longer hash the source to check for differences and will continue to use the compiled class in memory.
5. Whenever you create a new object, you get a new pointer to the compiled class and its methods' classes, and any runtime events occur in the pseudo-constructor. Edit: At this point, I'm referring to using createObject and having any "loose" code outside of functions run. When I say pointer, I mean the reference to memory allocated for the object's scopes (this, variables, function variables).
6. If you request an init, then the constructor runs. The memory consumed at this point is just your new reference and any variables set in the pseudo-constructor and constructor. You are not actually taking up memory for a copy of the entire class. Edit: For this step I'm referring to using the new operator or chaining your createObject().init() old school.
If this is right, it dispels a fallacy that I, personally, have heard over the years: that instantiating large objects on every request is a massive memory hog because you supposedly get a copy of the whole class rather than just a reference. Please note that I am not in favor of doing this; the singleton pattern is amazing. I'm just trying to confirm what is going on under the hood to avoid chasing down red herrings in legacy code.
Edit: Thanks for the input everyone, this was a really helpful Q/A for me.
I've been developing CF for 14 years and I've never heard anyone claim that creating CFC instances on each request consumes memory due to class compilation. At the Java level, your CFML code is compiled directly to bytecode and stored as Java classes in memory and on disk. Java classes are not stored in the heap, but rather in the permanent generation, which is not (usually) a collected memory space. You can create as many instances of that CFC as you like and no more perm gen space will be used; however, heap space will be allocated to store the instance data for that CFC for the duration of its existence. Note that open-source Railo does not use separate classes for methods.
Now, if you create a very large number of CFC instances (or any variable, for that matter), that will create a lot of cruft in your heap's young generations. As long as hard references are not held after the request finishes, those objects will be cleared from the heap when the next minor garbage collection runs. This isn't necessarily a bad thing, but heap sizes and GC pauses should always be taken into account when performance tuning an application.
Now, there are reasons to persist CFC instances, either as singletons or for the duration of a session, request, etc. One reason is the overhead of actual object creation, which often involves disk I/O to check last-modified times. Object creation has become significantly faster than it was in the old days, but it is still pretty far behind native Java if you're going to be creating thousands of instances. The other main reason is for your objects to maintain state over the life of the application/session/request, such as a shopping cart stored in session while the user shops.
And for completeness, I'll attempt to address your points categorically:
1. For Adobe CF, yes; for Railo, methods are inner classes.
2. Yes.
3. Actually, I don't believe there is any hashing involved. It's all based on the last-modified datetime of the source file.
4. Yes, but again, no hashing; it just skips the disk I/O of checking the last-modified datetime.
5. I don't think "pointer" is the right term, as that implies the Java classes actually live in the heap. CF uses a custom URL classloader to load the class for the template, and then an INSTANCE of that class is created and stored in the heap. I can understand how this may be confusing, as CFML has no concept of "class": everything is simply an instance or doesn't exist at all. I'm not sure what you mean by "runtime events occur[ing] in the pseudo-constructor".
6. To be clear, the JAVA constructor already ran the instant you created the CFC. The CF constructor may be optional, but it has zero bearing on the memory consumed by the CFC instance. Again, I think you're getting unnecessarily hung up on the pseudo-constructor as well. That's just loose code inside the component that runs when it is created and has no bearing on memory allocated in the heap. The Java class is never copied; it is just the template for the instance.
My application problem is the following -
I have a large structure, foo. Because these objects are large, and for memory-management reasons, we do not wish to delete them when processing of the data is complete.
We are storing them in std::vector<boost::shared_ptr<foo>>.
My question is about knowing when all processing is complete. Our first decision was that we do not want any of the other application code to set a "complete" flag in the structure, because there are multiple execution paths through the program and we cannot predict which one will be the last.
So in our implementation, once processing is complete, we delete all copies of the boost::shared_ptr<foo> except for the one in the vector. This drops the reference count in the shared_ptr to 1. Is it practical to use shared_ptr::use_count() to check whether it is equal to 1 in order to know when all other parts of my app are done with the data?
One additional reason I'm asking is that the Boost documentation for shared_ptr recommends against using use_count() in production code.
Edit -
What I did not say is that when we need a new foo, we scan the vector of foo pointers looking for one that is not currently in use and reuse it for the next round of processing. This is why I was thinking that a reference count of 1 would be a safe way to ensure that a particular foo object is no longer in use.
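A minimal sketch of that scan, under the assumption that the vector above acts as the pool and that a use count of 1 means "free"; the acquireFree name is hypothetical:

#include <cstddef>
#include <vector>

#include <boost/shared_ptr.hpp>

struct foo;   // the large structure, defined elsewhere

// Hand out a foo that only the pool itself still references. Copying the
// shared_ptr out of the vector bumps the count above 1, which marks the slot
// as "in use" until the caller drops all of its copies again.
boost::shared_ptr<foo> acquireFree(std::vector<boost::shared_ptr<foo> >& pool)
{
    for (std::size_t i = 0; i < pool.size(); ++i) {
        if (pool[i].unique()) {           // use_count() == 1: only the vector holds it
            return pool[i];
        }
    }
    return boost::shared_ptr<foo>();      // nothing free at the moment
}

Note that this check is not reliable if other threads can take copies concurrently, which is part of why the answers below steer toward other mechanisms.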
My immediate reaction (and I'll admit, it's no more than that) is that it sounds like you're trying to get the effect of a pool allocator of some sort. You might be better off overloading operator new and operator delete to get the effect you want a bit more directly. With something like that, you can probably just use a shared_ptr like normal, and the other work you want delayed will be handled in operator delete for that class.
That leaves a more basic question: what are you really trying to accomplish with this? From a memory management viewpoint, one common wish is to allocate memory for a large number of objects at once, and after the entire block is empty, release the whole block at once. If you're trying to do something on that order, it's almost certainly easier to accomplish by overloading new and delete than by playing games with shared_ptr's use_count.
Edit: based on your comment, overloading new and delete for the class sounds like the right thing to do. If anything, integration into your existing code will probably be easier; in fact, you can often do it completely transparently.
The general idea for the allocator is pretty much the same as you've outlined in your edited question: have a structure (bitmaps and linked lists are both common) to keep track of your free objects. When new needs to allocate an object, it can scan the bit vector or look at the head of the linked list of free objects, and return its address.
This is one case that linked lists can work out quite well -- you (usually) don't have to worry about memory usage, because you store your links right in the free object, and you (virtually) never have to walk the list, because when you need to allocate an object, you just grab the first item on the list.
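A minimal sketch of that free-list idea as a class-specific operator new / operator delete. The FreeNode layout and the payload placeholder are illustrative only, and a real version would also need to consider alignment and thread safety:

#include <cstddef>
#include <new>

class foo
{
public:
    static void* operator new(std::size_t size)
    {
        if (size == sizeof(foo) && freeList != nullptr) {
            FreeNode* node = freeList;    // reuse the first object on the free list
            freeList = node->next;
            return node;
        }
        return ::operator new(size);      // fall back to the global allocator
    }

    static void operator delete(void* p, std::size_t size)
    {
        if (p == nullptr) {
            return;
        }
        if (size == sizeof(foo)) {
            // The link is stored inside the freed object itself, so the free
            // list costs no extra memory.
            FreeNode* node = static_cast<FreeNode*>(p);
            node->next = freeList;
            freeList = node;
        } else {
            ::operator delete(p);
        }
    }

private:
    struct FreeNode { FreeNode* next; };

    char payload[256];                    // stand-in for the large structure's data
    static FreeNode* freeList;
};

foo::FreeNode* foo::freeList = nullptr;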
This sort of thing is particularly common with small objects, so you might want to look at the Modern C++ Design chapter on its small object allocator (and an article or two since then by Andrei Alexandrescu about his newer ideas of how to do that sort of thing). There's also the Boost::pool allocator, which is generally at least somewhat similar.
If you want to know whether or not the use count is 1, use the unique() member function.
I would say your application should have some method that eliminates all references to the Foo from other parts of the app, and that method should be used instead of checking use_count(). Besides, if use_count() is greater than 1, what would your program do? You shouldn't be relying on shared_ptr's features to eliminate all references, your application architecture should be able to eliminate references. As a final check before removing it from the vector, you could assert(unique()) to verify it really is being released.
I think you can use shared_ptr's custom deleter functionality to call a particular function when the last copy has been released. That way, you're not using use_count at all.
You would need to hold something other than a copy of the shared_ptr in your vector so that the shared_ptr is only tracking the outstanding processing.
Boost has several examples of custom deleters in the shared_ptr docs.
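A minimal sketch of that idea, using hypothetical Slot and checkOut names and shown with std::shared_ptr (boost::shared_ptr accepts a custom deleter the same way): the container owns the foo objects outright, and the shared_ptr handed to the processing code only marks the slot busy until its last copy is destroyed.

#include <memory>

struct foo { /* the large structure */ };

struct Slot
{
    foo object;
    bool inUse = false;
};

// Hand out a non-owning shared_ptr whose deleter flags the slot as free again
// once the last outstanding copy has been destroyed. Nothing is ever deleted.
std::shared_ptr<foo> checkOut(Slot& slot)
{
    slot.inUse = true;
    return std::shared_ptr<foo>(&slot.object,
                                [&slot](foo*) { slot.inUse = false; });
}

The pool could then be a collection of Slot objects with stable addresses (for example, a deque or a vector of pointers to Slot) that is scanned for inUse == false, keeping in mind that none of this is thread-safe as written.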
I would suggest that instead of trying to use the shared_ptr's use_count() to keep track, it might be better to implement your own usage counter. That way you will have full control over it, rather than relying on the shared_ptr's count, which, as you rightly point out, is not recommended. You can also pre-set your own counter to account for the number of threads you know will need to act on the data, rather than relying on them all being initialised at the beginning to get their copies of the structure.
I have a large inherited C/C++ project. Are there any good tools or techniques to produce a report on the sizeof of all the data types, and a breakdown of the stack footprint of each function, in such a project?
I'm curious to know why you want to do this, but that's merely a curiosity.
Determining the sizeof for every class used should be simple, unless they've been templated, in which case you'd have to check every instantiation, also.
Likewise, determining the per-call stack cost of a function is simple: it's the sizeof of each passed parameter plus some function overhead.
To determine the full memory usage of the whole program, if it's not all statically defined, couldn't be done without a runtime profiler.
Writing a shell script that collects all the class names into a file would be pretty simple. That file could be constructed as a .cpp file consisting of a series of calls to sizeof on each class. If the file also #included each header file, it could be compiled and run to get an output of the memory footprint of just the classes.
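A sketch of what such a generated file might look like; the header and class names are placeholders for whatever the script discovers:

// sizes.cpp -- hypothetically emitted by the script, one line per class found
#include <iostream>

#include "Foo.h"
#include "Bar.h"

int main()
{
    std::cout << "Foo: " << sizeof(Foo) << " bytes\n";
    std::cout << "Bar: " << sizeof(Bar) << " bytes\n";
}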
Likewise, culling all of the function definitions to see where they're not using reference or pointer arguments (i.e., copying an entire class instance onto the stack) should be pretty straightforward.
All this goes to say that I know of no existing tool, but writing one shouldn't be difficult.
I'm not aware of any tools, but if you're working under MSVC you can use the DIA SDK to extract size information from .PDB files. Sadly, this won't work for stack footprints, IIRC.
I'm not sure the concept of a stack footprint really exists with modern compilers. That is to say, I think the amount of stack space actually used depends on the branches taken, which in turn depend on the input parameters, and determining it in general requires solving the halting problem.
I am looking for the same information about per-function stack footprints, and I don't believe what warren said is true. Yes, part of what a function puts on the stack is its parameters, but I've also found that every local variable in a function, regardless of the scope of that variable, is counted when determining how much stack space to reserve for the function.
In the particular poor code example I am working with, there are more than 200 local class instances, each guarded by if (blah-blah) clauses, yet the reserved stack space still accounts for all of these guarded locals.
What I know I need is a way to read the function prologue of each method to determine how much space is being reserved for the function. Now, how would I do that?