I know I can track memory allocations by overloading operator new globally, as described here: http://www.almostinfinite.com/memtrack.html
However, I was wondering whether there is a good way to do this per function, so I can get a report of how much is allocated by each function. Right now I can get the file, line, and typeid as in the link above, but I would like to find out which function is allocating the most.
What about doing something like: http://ideone.com/Wqjkrw
#include <iostream>
#include <cstdlib>
#include <cstring>

class MemTracker
{
private:
    static char func_name[100];
    static size_t current_size;
public:
    MemTracker(const char* FuncName) { strcpy(func_name, FuncName); }
    static void inc(size_t amount) { current_size += amount; }
    static void print() { std::cout << func_name << " allocated: " << current_size << " bytes.\n"; }
    static void reset() { current_size = 0; memset(func_name, 0, sizeof(func_name)); }
};

char MemTracker::func_name[100] = {0};
size_t MemTracker::current_size = 0;

void* operator new(size_t size)
{
    MemTracker::inc(size);
    return malloc(size);
}

void operator delete(void* ptr)
{
    free(ptr);
}
void FuncOne()
{
    MemTracker(__func__);
    int* i = new int[100];
    delete[] i;
    i = new int[200];
    delete[] i;
    MemTracker::print();
    MemTracker::reset();
}

void FuncTwo()
{
    MemTracker(__func__);
    char* c = new char[1024];
    delete[] c;
    c = new char[2048];
    delete[] c;
    MemTracker::print();
    MemTracker::reset();
}

int main()
{
    FuncOne();
    FuncTwo();
    FuncTwo();
    FuncTwo();
    return 0;
}
Prints:
FuncOne allocated: 1200 bytes.
FuncTwo allocated: 3072 bytes.
FuncTwo allocated: 3072 bytes.
FuncTwo allocated: 3072 bytes.
What platform are you using? There might be platform-specific solutions that don't require changing the functions in your code base.
If you are using Microsoft Visual Studio, you can use compiler switches /Gh and /GH to let the compiler call functions _penter and _pexit that you can define. In those functions, you can query how much memory the program is using. There should be enough information in there to figure out how much memory is allocated in each function.
Example code for checking memory usage is provided in this MSDN article.
Related
I currently have a std::ofstream created on the stack. It allocates through a global operator new that hands out memory from a pre-allocated buffer, and the program crashes after main completes with "std::locale0 : line 59. Read Access Violation. nodeptr was 0x..." as the crash point. nodeptr's memory address is a real address. I have no idea why this is happening, and I can only assume I misunderstand what the allocations are actually doing.
This happens on a release build with MSVC version 19.10.25019 (x86); a debug build completes without a crash, and Dr. Memory reports no leaks in debug mode.
Minimal Code:
#include <fstream>
#include <cstdint>
#include <cstdlib>

std::uint8_t *byte = static_cast<std::uint8_t*>(malloc(134217728)); // 134217728 = 128 MiB

void *operator new(size_t bytes) throw(std::bad_alloc)
{
    return static_cast<void*>(byte); // Tested empirically: this is only called once, so it shouldn't be the cause of the problem
}

void operator delete(void *memory) throw()
{}

int main()
{
    std::ofstream out("abc");
    free(byte);
}
There are two obvious bugs:
1. What if operator new is called more than once? What if the constructor of out constructs a sub-object?
2. You free byte while out is still in scope. When out's destructor runs, you've already returned its memory to the system.
Try this:
#include <fstream>
#include <cstdint>
#include <cstdlib>

std::uint8_t *byte = static_cast<std::uint8_t*>(malloc(134217728)); // 134217728 = 128 MiB

void *operator new(size_t bytes) throw(std::bad_alloc)
{
    static int offset = 0;
    void * ret = static_cast<void*>(byte + offset);
    offset += 16 * ((bytes + 15) / 16); // bump pointer, rounded up to 16-byte alignment
    return ret;
}

void operator delete(void *memory) throw()
{}

int main()
{
    {
        std::ofstream out("abc");
    }
    free(byte);
}
It appears to be a Visual Studio bug involving custom allocations and std::locale.
I want to override new/delete and malloc/free. I have the tcmalloc library linked into my application. My aim is to add stats.
From operator new I am calling malloc. Below is an example; it's global.
void* my_malloc(size_t size, const char *file, int line, const char *func)
{
    void *p = malloc(size);
    ....
    return p;
}

#define malloc(X) my_malloc(X, __FILE__, __LINE__, __FUNCTION__)

void *operator new(size_t size)
{
    auto new_addr = malloc(size);
    ....
    return new_addr;
}
The new/delete override is working fine.
My question is: what happens in other files where I use malloc directly? For example:
first.cpp
    malloc(sizeof(..))
second.cpp
    malloc(sizeof(..))
How do those malloc calls get intercepted, given that my macro is not in a header file?
tcmalloc provides new/delete hooks that can be used to implement any kind of tracking/accounting of memory usage. See e.g. AddNewHook in https://github.com/gperftools/gperftools/blob/master/src/gperftools/malloc_hook.h
My code is segfaulting for some reason.
Here is the relevant code.
typedef boost::singleton_pool<httpHandler, sizeof(httpHandler)> httpHandlerpool;
typedef boost::singleton_pool<Conn, sizeof(Conn)> connpool;

void *httpHandler::operator new(size_t size)
{
    return httpHandlerpool::malloc();
}
// there is a corresponding delete as well

void *Conn::operator new(size_t size)
{
    return connpool::malloc();
}
// there is a corresponding delete as well

Conn* httpHandler::getFreeConn()
{
    Conn *c = 0;
    c = new Conn();
    memset(c, 0, sizeof *c); // This is where the issue seems to be.
    return c;
}
In this code, when I do `new Conn`, how much memory will be allocated? Is it `sizeof(Conn)`?
What am I doing wrong with the memset? Is it necessary?
Thanks.
I just checked the size of a class containing dozens of virtual methods with g++ (4.7), because I had heard that pointers are used for virtual methods, and I thought that would be a terrible implementation: on my system it would take up 80 bytes per instance for a class with a mere 10 virtual methods.
To my relief, sizeof(<insert typename here>) returned only 8 bytes, the size of a pointer on my system. I assume this means each instance stores a single pointer to the vtable rather than one pointer per method, and that I simply misunderstood what people were saying (or perhaps that most compilers are stupid).
However, before I finally tested this, I had been struggling with using virtual methods as pointers the way I expected them to work. I noticed that the address was in fact a relatively low number, often under 100 and differing by 8 bytes from the others, so I assumed it was some sort of index into an array. Then I went pondering how I would implement vtables myself, and it would not be with a pointer, as the results of my test clearly show. I was surprised to see a whole 8 bytes used (I verified it was not just padding by inserting a char field, after which sizeof returned 16 bytes).
Instead, I would implement this by storing an array index (for example 4 bytes, or even 2 if 65536 or fewer classes with virtual methods are used), which would be looked up in a table containing pointers to the vtables. So why is a pointer stored? For performance reasons, or did they simply reuse the code from 32-bit systems (where it would make no difference in memory size)?
Thank you in advance.
edit:
Someone requested me to calculate the actual memory saved, and I decided to make a code example. Unfortunately, it became quite big (they requested me to use 10 virtual methods in both), but I tested it and it actually works. Here it comes:
#include <cstdio>
#include <cstdlib>
/* For the singleton lovers in this community */
class VirtualTableManager
{
unsigned capacity, count;
void*** vtables;
public:
~VirtualTableManager() {
free(vtables); /* the buffer comes from realloc, so free it rather than delete it */
}
static VirtualTableManager& getInstance() {
static VirtualTableManager instance;
return instance;
}
unsigned addElement(void** vtable) {
if (count == capacity)
{
vtables = (void***) realloc(vtables, (capacity += 0x2000) * sizeof(void**)); /* Reserves an extra 64KiB of pointers */
}
vtables[count] = vtable;
return count++;
}
void** getElement(unsigned index) {
return index < count ? vtables[index] : 0; /* Just in case: "Hey guys, let's misuse the API!" */
}
private:
VirtualTableManager() : capacity(0), count(0), vtables(0) { }
VirtualTableManager(const VirtualTableManager&);
void operator =(const VirtualTableManager&);
};
class Real
{
public:
short someField; /* This is required to show the difference, because of padding */
Real() : someField(0) { }
virtual ~Real() {
printf("Real::~Real()\n");
}
virtual void method0() {
printf("Real::method0()\n");
}
virtual void method1(short argument) {
someField = argument;
}
virtual short method2() {
return someField;
}
virtual void method3() { }
virtual void method4() { }
virtual void method5() { }
virtual void method6() { }
virtual void method7() { }
virtual void method8() { }
};
class Fake
{
static void** vtable;
static unsigned classVIndex; /* Don't know what to call it, please forgive me for the lame identifier */
public:
unsigned instanceVIndex;
short someField;
Fake() : instanceVIndex(classVIndex), someField(0) { }
~Fake() {
reinterpret_cast<void (*)(Fake*)>(VirtualTableManager::getInstance().getElement(instanceVIndex)[9])(this);
}
void method0() {
reinterpret_cast<void (*)(Fake*)>(VirtualTableManager::getInstance().getElement(instanceVIndex)[0])(this);
}
void method1(short argument) {
reinterpret_cast<void (*)(Fake*, short argument)>(VirtualTableManager::getInstance().getElement(instanceVIndex)[1])(this, argument);
}
short method2() {
return reinterpret_cast<short (*)(Fake*)>(VirtualTableManager::getInstance().getElement(instanceVIndex)[2])(this);
}
void method3() {
reinterpret_cast<void (*)(Fake*)>(VirtualTableManager::getInstance().getElement(instanceVIndex)[3])(this);
}
void method4() {
reinterpret_cast<void (*)(Fake*)>(VirtualTableManager::getInstance().getElement(instanceVIndex)[4])(this);
}
void method5() {
reinterpret_cast<void (*)(Fake*)>(VirtualTableManager::getInstance().getElement(instanceVIndex)[5])(this);
}
void method6() {
reinterpret_cast<void (*)(Fake*)>(VirtualTableManager::getInstance().getElement(instanceVIndex)[6])(this);
}
void method7() {
reinterpret_cast<void (*)(Fake*)>(VirtualTableManager::getInstance().getElement(instanceVIndex)[7])(this);
}
void method8() {
reinterpret_cast<void (*)(Fake*)>(VirtualTableManager::getInstance().getElement(instanceVIndex)[8])(this);
}
protected:
Fake(unsigned instanceVIndex, short someField)
: instanceVIndex(instanceVIndex), someField(someField) { }
/* The 'this' keyword is an automatically passed pointer, so I'll just manually pass it and identify it as 'self' (thank you, lua, I would have used something like 'vthis', which would be boring and probably incorrect) */
static void vmethod0(Fake* self) {
printf("Fake::vmethod0(%p)\n", self);
}
static void vmethod1(Fake* self, short argument) {
self->someField = argument;
}
static short vmethod2(Fake* self) {
return self->someField;
}
static void vmethod3(Fake* self) { }
static void vmethod4(Fake* self) { }
static void vmethod5(Fake* self) { }
static void vmethod6(Fake* self) { }
static void vmethod7(Fake* self) { }
static void vmethod8(Fake* self) { }
static void vdestructor(Fake* self) {
printf("Fake::vdestructor(%p)\n", self);
}
};
class DerivedFake : public Fake
{
static void** vtable;
static unsigned classVIndex;
public:
DerivedFake() : Fake(classVIndex, 0) { }
~DerivedFake() {
reinterpret_cast<void (*)(DerivedFake*)>(VirtualTableManager::getInstance().getElement(instanceVIndex)[9])(this);
}
void method0() {
reinterpret_cast<void (*)(DerivedFake*)>(VirtualTableManager::getInstance().getElement(instanceVIndex)[0])(this);
}
protected:
DerivedFake(unsigned instanceVIndex, short someField)
: Fake(instanceVIndex, someField) { }
static void vmethod0(DerivedFake* self) {
printf("DerivedFake::vmethod0(%p)\n", self);
}
static void vdestructor(DerivedFake* self) {
printf("DerivedFake::vdestructor(%p)\n", self);
Fake::vdestructor(self); /* call parent destructor */
}
};
/* Make the vtable */
void** Fake::vtable = (void*[]) {
(void*) &Fake::vmethod0, (void*) &Fake::vmethod1,
(void*) &Fake::vmethod2, (void*) &Fake::vmethod3,
(void*) &Fake::vmethod4, (void*) &Fake::vmethod5,
(void*) &Fake::vmethod6, (void*) &Fake::vmethod7,
(void*) &Fake::vmethod8, (void*) &Fake::vdestructor
};
/* Store the vtable and get the look-up index */
unsigned Fake::classVIndex = VirtualTableManager::getInstance().addElement(Fake::vtable);
/* Do the same for derived class */
void** DerivedFake::vtable = (void*[]) {
(void*) &DerivedFake::vmethod0, (void*) &Fake::vmethod1,
(void*) &Fake::vmethod2, (void*) &Fake::vmethod3,
(void*) &Fake::vmethod4, (void*) &Fake::vmethod5,
(void*) &Fake::vmethod6, (void*) &Fake::vmethod7,
(void*) &Fake::vmethod8, (void*) &DerivedFake::vdestructor
};
unsigned DerivedFake::classVIndex = VirtualTableManager::getInstance().addElement(DerivedFake::vtable);
int main_virtual(int argc, char** argv)
{
printf("size of 100 instances of Real including padding is %lu bytes\n"
"size of 100 instances of Fake including padding is %lu bytes\n",
sizeof(Real[100]), sizeof(Fake[100]));
Real *real = new Real;
Fake *fake = new Fake;
Fake *derived = new DerivedFake;
real->method1(123);
fake->method1(456);
derived->method1(789);
printf("real::method2() = %hi\n"
"fake::method2() = %hi\n"
"derived::method2() = %hi\n", real->method2(), fake->method2(), derived->method2());
real->method0();
fake->method0();
derived->method0();
delete real;
delete fake;
delete derived;
return 0;
}
Fear not, I do not normally put the definition in classes like that. I just did it here to hopefully improve readability. Anyway, the output:
size of 100 instances of Real including padding is 1600 bytes
size of 100 instances of Fake including padding is 800 bytes
real::method2() = 123
fake::method2() = 456
derived::method2() = 789
Real::method0()
Fake::vmethod0(0x1bd8040)
DerivedFake::vmethod0(0x1bd8060)
Real::~Real()
Fake::vdestructor(0x1bd8040)
DerivedFake::vdestructor(0x1bd8060)
Fake::vdestructor(0x1bd8060)
It might not be thread-safe, might contain a fearsome legion of bugs, and might also be relatively inefficient, but I hope it demonstrates my concept. It was tested on 64-bit Ubuntu with g++ 4.7. I doubt there is any size benefit on 32-bit systems, and since I save less than a word (4 bytes, so much for that!) I had to add a field to make the effect show. Feel free to benchmark the speed (please optimize it first if you do; I rushed this) or test it on other architectures/platforms and with other compilers (I'd like to see the results, so please share them if you do). Something similar may become useful if someone builds a 128/256-bit platform, creates a processor with very limited memory but incredible speed, or uses a compiler that spends something like 21 bytes per instance on the vtable.
edit:
Whoops, the code example was a derp. Fixed it.
One challenge with an array-based vtable is how you would link together several compiled source files. If each compiled file stored its own table, the linker would have to combine those tables together when producing the final binary. This increases the complexity of the linker, which now has to be made aware of this new C++-specific detail.
Additionally, the byte-saving techniques you described would be tricky to get right with multiple compilation units. What if you have two source files, each of which has few enough classes to use two bytes per vtable index, but which combined now need three bytes? In that case, the linker would have to rewrite the object files based on the new object size.
Additionally, this new system would not interact well with dynamic linking. If you had a separate object file that was linked in at runtime, you would have two or more global tables of vtables. The generated object code would then have to take this into account, which would increase code generator complexity.
Finally, there are alignment issues. Using two or four bytes for the index when the word size is eight bytes might degrade program performance if it throws off the alignment of the object's other fields. In fact, it's entirely possible that g++ only uses four bytes, but then pads to eight.
In short, there is no reason why you couldn't do this optimization, but it comes at significant implementation complexity and (possibly) a runtime cost. That said, it's a very clever idea!
Hope this helps!
It's always a trade-off. To be an improvement, any scheme to save space would have to at least often save space and never lose speed.
If you put a 2 or 4 byte index in the class, and I then add a pointer as the first member, there will have to be some padding to get the right alignment for my pointer.
So now the class is 16 bytes anyway. If the indexing is then just even slightly slower than using a vtable pointer, it is a net loss.
I can accept that it is not always a reduction in size, but I don't want to lose some speed for no gain in size.
In addition, it's simpler for the CPU to prefetch a plain address than an index into an array (plus, of course, the extra dereferences). You would add more than the cost of a single dereference.
I'm designing a game in C++ similar to Minecraft that holds an enormous amount of terrain data in memory. In general, I want to store an array in memory that is [5][4][5][50][50][50]. This isn't bad, since it amounts to about 100 MB of virtual memory given that my structure will only be about 8 bytes.
However, I'm having trouble figuring out the best way to handle this. I do want this in virtual memory, but obviously not on the stack, and I somehow keep making the mistake of creating this array on the stack and causing a stack overflow. What I would like to do is below. This is just code I threw together to give you an example of what I'm doing; I have code with correct syntax on my machine, I just didn't want to clutter the post.
typedef struct modelBlock
{
    // Information about the blocks
} BLOCK;

typedef struct modelGrid
{
    bool empty;
    BLOCK blocksArray[50][50][50];
} GRID;

class Parent
{
    Child* child;
    Parent(void);
};

Parent::Parent()
{
    Child c;
    child = &c;
}

class Child
{
    GRID grids[5][4][5];
};
However, every time I do this, I cause a stack overflow (appropriate web site choice, right?). I played with using pointer-based arrays, but I had a lot of trouble with data being lost outside of its scope.
If anyone could give me some insight on how to get my data stored on the heap instead of the stack, or whether I should use some other way of creating my array, I'd really appreciate the help. I'd like to avoid vectors because of overhead, though I'm not sure how substantial it is.
Use boost::multi_array
If you want to allocate something on the heap, use new.
#include <memory>

class Parent
{
    std::auto_ptr<Child> child; // use auto_ptr for dynamically-allocated members
    Parent(const Parent&);      // You probably don't want to copy this giant thing
public:
    Parent();
};

Parent::Parent()
    : child(new Child) // initialize members with an initializer list
{
}
Also, avoid mixing C and C++ styles. There's no reason to do
typedef struct blah{ ... } BLAH;
in C++. A struct is just a class with all of the members public by default; just like a class, you can refer to the struct type's name without using the struct tag. There's also no need to specify void for a function that takes no parameters.
boost::multi_array (linked in PigBen's answer) is a good choice over raw arrays.
If you want the class created on the heap, create it with new:
Child * c = new Child;
and then of course delete it, or better still use a smart pointer.
In order to do exactly what you're trying to do you have to declare everything as pointers (and pointers to pointers to pointers to pointers) and then allocate each one individually.
Teh sux!
A better option is to simply allocate the ginormous block in one chunk and use multiplication along with pointer arithmetic to arrive at the correct location.
Edit: Wasn't paying attention and didn't notice your constructor. That's not only not the way to get your Child allocated on the free-store, it's a great way to create situations eliciting undefined behavior. Your Child will be gone when the constructor is through and the pointer to it will then be invalid. I wonder if you shouldn't run through some basic tutorials before trying to write a game.
Here's something that works and can be built upon without the boost dependency. One downside is it removes use of [][][] style of referencing elements, but it's a small cost and can be added.
#include <cassert>
#include <cstddef>

template<class T>
class Matrix {
    unsigned char* _data;
    const size_t _depth;
    const size_t _cols;
    const size_t _rows;
public:
    Matrix(const size_t& depth, const size_t& rows, const size_t& cols) :
        _depth(depth),
        _cols(cols),
        _rows(rows) {
        _data = new unsigned char[depth * rows * cols * sizeof(T)];
    }
    ~Matrix() {
        delete[] _data;
    }
    T& at(const size_t& depthIndex, const size_t& rowIndex, const size_t& colIndex) const {
        return *reinterpret_cast<T*>(_data + ((((depthIndex * _cols + colIndex) * _rows) + rowIndex) * sizeof(T)));
    }
    const size_t& getDepth() const {
        return _depth;
    }
    const size_t& getRows() const {
        return _rows;
    }
    const size_t& getCols() const {
        return _cols;
    }
};

int main()
{
    Matrix<int> block(50, 50, 50);
    size_t d, r, c;
    for (d = 0; d < block.getDepth(); d++) {
        for (r = 0; r < block.getRows(); r++) {
            for (c = 0; c < block.getCols(); c++) {
                block.at(d, r, c) = d * 10000000 + r * 10000 + c;
            }
        }
    }
    for (d = 0; d < block.getDepth(); d++) {
        for (r = 0; r < block.getRows(); r++) {
            for (c = 0; c < block.getCols(); c++) {
                assert(block.at(d, r, c) == d * 10000000 + r * 10000 + c);
            }
        }
    }
    return 0;
}
A smaller example (with changed names for all the structs, to make the general principle clearer). The 'Bloe' struct is the one you want to allocate on the heap, and this is accomplished using 'new'.
struct Bla {
    int arr[4][4];
};

struct Bloe {
    Bla bla[2][2];
};

int main()
{
    Bloe* bloe = new Bloe();
    bloe->bla[1][1].arr[1][1] = 1;
    return 0;
}
I did this by putting all the data in a binary file. I calculated the offset of the data and used seek() and read() to get the data when needed. The open() call is very slow, so you should leave the file open for the lifetime of the program.
Below is how I understood what you were trying to do in your example. I tried to keep it straightforward. Each [50][50][50] array is allocated in one memory chunk on the heap, and only when used. There is also an example of access code. No fancy boost or anything special, just basic C++.
#include <iostream>

class Block
{
public:
    // Information about the blocks
    int data;
};

class Grid
{
public:
    bool empty;
    Block (*blocks)[50][50];
    Grid() : empty(true) {
    }
    void makeRoom() {
        this->blocks = new Block[50][50][50];
        this->empty = false;
    }
    ~Grid() {
        if (!this->empty) {
            delete [] this->blocks;
        }
    }
};

class Parent
{
public:
    Grid (*child)[4][5];
    Parent()
    {
        this->child = new Grid[5][4][5];
    }
    ~Parent()
    {
        delete [] this->child;
    }
};

int main() {
    Parent p;
    p.child[0][0][0].makeRoom();
    if (!p.child[0][0][0].empty) {
        Block (*grid)[50][50] = p.child[0][0][0].blocks;
        grid[49][49][49].data = 17;
    }
    std::cout << "item = "
              << p.child[0][0][0].blocks[49][49][49].data
              << std::endl;
}
This could still be simpler and more straightforward: just use one big array of [50][50][50][5][4][5] blocks in one memory chunk on the heap, but I'll let you figure out how, if that is what you want.
Also, the dynamic allocation in class Parent serves only to use the heap instead of the stack, but for such a small array (5*4*5 pointers), allocating it on the stack should not be a problem, hence it could be written:
class Parent
{
public:
    Grid child[5][4][5];
};
without changing anything in the way it is used.