I want to read from disk a solid block of data which will then be split into many allocations that can independently be freed or reallocated. I could just allocate new memory at the proper size and copy the data over but I was wondering if there is a way to split the allocation since that seems like it would be a cheaper operation.
I would expect this functionality to be provided by the standard but I did not find anything like this. Is there a good reason why this is? And if not could it be implemented in some way?
I want to read from disk a solid block of data which will then be split into many allocations that can independently be freed or reallocated.
This requirement is flawed to begin with, because if we were to allocate a big chunk of contiguous memory and then manually free parts of it, the programmer would be taking liberties in acting as a manual heap memory manager. What to patch into the holes in the contiguous chunk and who will have the responsibility for doing that? It might just end up as useless holes in the memory map. You may be able to do something similar with lower level, system-specific functions (mmap or similar), but standard C or C++ both strive to be generic and do not generally specify how/where things should be allocated in memory.
The proper way to do this is otherwise to use realloc. Then the underlying heap manager may free parts of the memory while keeping some of it in the same location, or it may allocate a new chunk and copy the data there, as it pleases. The caller of realloc need not worry their pretty head about it. In case of t* tmp = realloc(original, n), the programmer should just not be assuming that original is still pointing at valid memory after the call. But rather do if(tmp != NULL) { original = tmp; }. And let realloc worry about if the actual data is stored at the same address or a new one.
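A minimal sketch of that realloc idiom, written as C++ using std::realloc. The helper name resize_ints is hypothetical and only there for illustration; it assumes the buffer came from malloc/realloc and that the new element count is non-zero.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: resizes a malloc'ed int buffer to new_count elements.
// Returns true on success; on failure the original buffer is left untouched.
bool resize_ints(int*& buffer, std::size_t new_count) {
    assert(new_count > 0);  // realloc with size 0 is not portable
    // realloc may shrink in place, grow in place, or move the data elsewhere.
    void* tmp = std::realloc(buffer, new_count * sizeof(int));
    if (tmp == nullptr) {
        return false;       // buffer is still valid and still owned by the caller
    }
    buffer = static_cast<int*>(tmp);  // only now forget the old address
    return true;
}
```

The caller never compares the old and new addresses; it simply uses buffer after a successful call and eventually passes it to std::free.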
Another option would be to not use heap allocation at all but to implement your own static memory pool of a fixed size. The main reason for doing something like that is not to save memory but rather to get deterministic allocations (embedded systems).
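One possible shape for such a pool, as a rough sketch only: a fixed number of equally sized blocks threaded onto a free list. The class name StaticPool and its interface are assumptions made for this example, not something from a library.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical fixed-block pool: BlockCount blocks of BlockSize bytes each,
// carved out of one array that lives inside the pool object (no heap use).
template <std::size_t BlockSize, std::size_t BlockCount>
class StaticPool {
public:
    StaticPool() {
        // Thread every block onto the free list.
        for (std::size_t i = 0; i < BlockCount; ++i) {
            blocks_[i].next = free_list_;
            free_list_ = &blocks_[i];
        }
    }

    // Returns a block, or nullptr when the pool is exhausted. Constant time.
    void* allocate() {
        if (free_list_ == nullptr) return nullptr;
        Block* b = free_list_;
        free_list_ = b->next;
        return b;
    }

    // Gives a block obtained from allocate() back to the pool. Constant time.
    void deallocate(void* p) {
        assert(p != nullptr);
        Block* b = static_cast<Block*>(p);
        b->next = free_list_;
        free_list_ = b;
    }

private:
    union Block {
        Block* next;  // used while the block is free
        alignas(std::max_align_t) unsigned char bytes[BlockSize];  // used while handed out
    };
    Block blocks_[BlockCount];
    Block* free_list_ = nullptr;
};
```

Declaring the pool as a global or static object, e.g. static StaticPool<32, 16> pool;, keeps all of its storage out of the heap, and both operations take the same number of steps no matter how full the pool is.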
It's not generally possible, so they didn't put it in the library. Some memory allocation algorithms could theoretically do this, but other ones can't. Some memory allocation algorithms only support certain sizes (and round up), or they put different-sized objects into different parts of memory.
In C++ you can use std::shared_ptr with an empty deleter when you first "allocate" your structure, and then use the default deleter when reallocating some objects, i.e. something like this:
#include <memory>
#include <new> // placement new
// This class should be a standard-layout class to be able to use placement new
class A {
int a;
char b[10];
};
class B
{
public:
std::shared_ptr<A> a;
std::shared_ptr<A> b;
};
template<typename T>
void null_deleter(T *)
{
// Do nothing, memory is managed elsewhere
}
extern char *read_memory();
int main()
{
char *buf = read_memory();
B v;
v.a = std::shared_ptr<A>(new (buf) A, &null_deleter<A>);
v.b = std::shared_ptr<A>(new (buf + 48) A, &null_deleter<A>); // the offset must be a multiple of alignof(A)
// some code
v.a = std::make_shared<A>(); // Delete old pointer and create new
}
But I hope you understand that the memory allocated when you first read it from disk will not be affected by these allocations/deallocations, and you can't use it as a contiguous representation of the current state, i.e. you can't write it back to a file and expect that all modifications made to reallocated objects will be reflected there.
One downside of this approach is that portions of memory which you no longer use will not be reused until you free the entire big block, so they will be wasted; but in some cases, e.g. when this happens rarely or the blocks are not very big, this can be acceptable.
If I have a pointer, is it possible to learn how many bytes were allocated by new?
When I googled I found a solution for Windows: _msize() and for Mac: malloc_size(). But nothing for Linux.
And if not, does anybody know why is it hidden from a programmer? delete should definitely know such info.
Update:
As far as I know, if I have this code:
class A {
public:
    ~A() {}
    int m_a;
};
class B : public A {
public:
    ~B() {}
    int m_b;
};
int main() { A * b = new B(); delete b; return 0; }
Destructor of A will be called, but still all the memory allocated by new will be freed.
This means that it can be somehow computed knowing only the pointer. So what is the reason for hiding it from a programmer?
Unfortunately, there is no portable way of obtaining the number of bytes allocated by new and malloc. There are a number of reasons why this is the case:
On some platforms, delete and free do nothing at all. As such, they don't need to store size information. This is surprisingly common in embedded platforms; it lets you use C or C++ code written for other platforms unchanged, as long as you don't do too much allocation.
Even on more common platforms, the system may allocate a different number of bytes than you ask for. Typically your allocation will be aligned to some larger size - possibly much larger than your original request. The storage metadata might also be stored in a very slow data structure - you wouldn't want to be taking locks and accessing a hash table in time-critical code.
As portable languages, C and C++ can't offer a feature that won't be available (or well-defined, or reasonably fast) on every platform. That's why this is not available in C++. That said, you don't need this - C++ offers std::vector, which does track the size of your allocation, or std::string which takes care of all of those details for you.
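As a small illustration of that bookkeeping (a sketch; the Tracked struct and the function name are made up for this example): std::vector reports both the number of elements in use and the number it has room for, and the latter may be larger than what was asked for.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical pair of numbers reported by the demo below.
struct Tracked {
    std::size_t size;      // elements actually stored
    std::size_t capacity;  // elements the current allocation has room for
};

Tracked grow_and_report(std::size_t count) {
    std::vector<int> v;
    for (std::size_t i = 0; i < count; ++i) {
        v.push_back(static_cast<int>(i));  // the vector reallocates as needed
    }
    assert(v.capacity() >= v.size());      // it may hold more room than it uses
    return Tracked{v.size(), v.capacity()};
}
```

How much larger the capacity is than the size depends on the standard library's growth policy, so the sketch only relies on capacity() >= size().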
new, malloc, calloc and all the other heap related allocations in the language (yes, there are many more than those) will allocate at least the amount of memory you requested. They may allocate more (and in general they will allocate more).
There is no portable way to know how much they allocated. In fact there is no way at all unless you know exactly what heap manager you are using.
You also need to distinguish allocated memory in the sense of memory that you may access safely from the returned pointer (that's what malloc_size returns on macs and probably what _msize returns on windows) from actual memory that is 'taken away from the heap' because of the allocation (which includes bookkeeping information which may or may not be adjacent to the memory block you allocated and may or may not be the same for same-sized allocations).
Q: So can I query the malloc package to find out how big an allocated block is?
A: Unfortunately, there is no standard or portable way. (Some compilers provide nonstandard extensions.) If you need to know, you'll have to keep track of it yourself.
C-FAQ
What the new operator does is it invokes a constructor so the size of the allocation depends on the type whose constructor you're invoking.
E.g.
class A
{
private:
int* x;
public:
A() { x = new int [100]; }
};
Will allocate sizeof(int) * 100 but you cannot know that if the implementation of the A is hidden from you.
If you perform yourself:
int * x = new int [100];
Then you know how much you have allocated because you have access to the sizeof of the primitive type.
Furthermore the delete operator invokes a destructor so for complex objects it again does not need to know the size of the allocated memory as the responsibility for freeing the memory fully and correctly is entirely delegated to the programmer.
So there isn't a straight forward answer here.
In addition to the answers above: in some situations, the sizes that have to be allocated and deallocated are known at compile time, and it would be a complete waste of memory to record the size somewhere.
In cases where the static type is equal to the dynamic type, the memory to be deallocated can be determined by the type.
In cases where the static type is not equal to the dynamic type, the deleted object's class has to have a virtual destructor. This destructor can be used to deallocate the right amount of memory.
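A short sketch of the virtual-destructor case. The type names and the counter are hypothetical and exist only to make the destructor call observable. Without the virtual keyword, deleting a Derived through a Base* is undefined behavior according to the C++ standard, even if it appears to work on a given platform.

```cpp
#include <cassert>

// Counts how many Derived objects have been destroyed (illustration only).
int destroyed_derived = 0;

struct Base {
    virtual ~Base() {}  // virtual: delete through a Base* is well-defined
    int m_a = 0;
};

struct Derived : Base {
    ~Derived() override { ++destroyed_derived; }
    int m_b = 0;
};

int destroy_through_base() {
    const int before = destroyed_derived;
    Base* p = new Derived();
    delete p;  // runs ~Derived, then ~Base, then releases the whole Derived object
    assert(destroyed_derived == before + 1);
    return destroyed_derived - before;
}
```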
When allocating an array, the size of the array is usually attached to that array in an implementation-dependent manner, and the size to be deallocated can be determined from the type of the elements and the size of the array.
X* x = new X();
Here it depends on the size of the class, i.e. on the data members the class contains.
int* x = new int[100];
Here it depends on how many elements you are going to allocate. Suppose int takes 2 bytes; then this takes 200 bytes.
In short, it depends on the data type for which you are using the new operator.
I don't quite get the point of dynamically allocated memory and I am hoping you guys can make things clearer for me.
First of all, every time we allocate memory we simply get a pointer to that memory.
int * dynInt = new int;
So what is the difference between doing what I did above and:
int someInt;
int* dynInt = &someInt;
As I understand, in both cases memory is allocated for an int, and we get a pointer to that memory.
So what's the difference between the two. When is one method preferred to the other.
Further more why do I need to free up memory with
delete dynInt;
in the first case, but not in the second case.
My guesses are:
When dynamically allocating memory for an object, the object doesn't get initialized, while if you do something like in the second case, the object gets initialized. If this is the only difference, is there any motivation behind this apart from the fact that dynamically allocating memory is faster?
The reason we don't need to use delete for the second case is because the fact that the object was initialized creates some kind of an automatic destruction routine.
Those are just guesses would love it if someone corrected me and clarified things for me.
The difference is in storage duration.
Objects with automatic storage duration are your "normal" objects that automatically go out of scope at the end of the block in which they're defined.
Create them like int someInt;
You may have heard of them as "stack objects", though I object to this terminology.
Objects with dynamic storage duration have something of a "manual" lifetime; you have to destroy them yourself with delete, and create them with the keyword new.
You may have heard of them as "heap objects", though I object to this, too.
The use of pointers is actually not strictly relevant to either of them. You can have a pointer to an object of automatic storage duration (your second example), and you can have a pointer to an object of dynamic storage duration (your first example).
But it's rare that you'll want a pointer to an automatic object, because:
you don't have one "by default";
the object isn't going to last very long, so there's not a lot you can do with such a pointer.
By contrast, dynamic objects are often accessed through pointers, simply because the syntax comes close to enforcing it. new returns a pointer for you to use, you have to pass a pointer to delete, and (aside from using references) there's actually no other way to access the object. It lives "out there" in a cloud of dynamicness that's not sitting in the local scope.
Because of this, the usage of pointers is sometimes confused with the usage of dynamic storage, but in fact the former is not causally related to the latter.
An object created like this:
int foo;
has automatic storage duration - the object lives until the variable foo goes out of scope. This means that in your second example, dynInt will be an invalid pointer once someInt goes out of scope (for example, at the end of a function).
An object created like this:
int* foo = new int;
has dynamic storage duration - the object lives until you explicitly call delete on it.
Initialization of the objects is an orthogonal concept; it is not directly related to which type of storage-duration you use. See here for more information on initialization.
Your program gets an initial chunk of memory at startup. This memory is called the stack. The amount is usually around 2MB these days.
Your program can ask the OS for additional memory. This is called dynamic memory allocation. This allocates memory on the free store (C++ terminology) or the heap (C terminology). You can ask for as much memory as the system is willing to give (multiple gigabytes).
The syntax for allocating a variable on the stack looks like this:
{
int a; // allocate on the stack
} // automatic cleanup on scope exit
The syntax for allocating a variable using memory from the free store looks like this:
int * a = new int; // ask OS memory for storing an int
delete a; // user is responsible for deleting the object
To answer your questions:
When is one method preferred to the other.
Generally stack allocation is preferred.
Dynamic allocation is required when you need to store a polymorphic object using its base type.
Always use a smart pointer to automate deletion:
C++03: boost::scoped_ptr, boost::shared_ptr or std::auto_ptr.
C++11: std::unique_ptr or std::shared_ptr.
For example:
// stack allocation (safe)
Circle c;
// heap allocation (unsafe)
Shape * shape = new Circle;
delete shape;
// heap allocation with smart pointers (safe)
std::unique_ptr<Shape> smart_shape(new Circle); // renamed: shape is already declared above
Further more why do I need to free up memory in the first case, but not in the second case.
As I mentioned above stack allocated variables are automatically deallocated on scope exit.
Note that you are not allowed to delete stack memory. Doing so is undefined behavior and would most likely crash your application.
For a single integer it only makes sense if you need to keep the value after, for example, returning from a function. Had you declared someInt as you said, it would have been invalidated as soon as it went out of scope.
However, in general there is a greater use for dynamic allocation. There are many things that your program doesn't know before allocation and depends on input. For example, your program needs to read an image file. How big is that image file? We could say we store it in an array like this:
unsigned char data[1000000];
But that would only work if the image size was less than or equal to 1000000 bytes, and would also be wasteful for smaller images. Instead, we can dynamically allocate the memory:
unsigned char* data = new unsigned char[file_size];
Here, file_size is determined at runtime. You couldn't possibly tell this value at the time of compilation.
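In current C++ the same run-time sizing is usually written with std::vector, which also releases the memory for you. A minimal sketch follows; the function name copy_to_buffer is hypothetical, and the source bytes stand in for whatever was read from the file.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: copies file_size bytes into a buffer sized at run time.
std::vector<unsigned char> copy_to_buffer(const unsigned char* src,
                                          std::size_t file_size) {
    assert(src != nullptr || file_size == 0);
    // The vector asks the free store for room for file_size elements.
    return std::vector<unsigned char>(src, src + file_size);
}
```

The returned vector knows its own size() and frees its storage when it goes out of scope, so no matching delete[] is needed.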
Read more about dynamic memory allocation and also garbage collection
You really need to read a good C or C++ programming book.
Explaining in detail would take a lot of time.
The heap is the memory inside which dynamic allocation (with new in C++ or malloc in C) happens. There are system calls involved with growing and shrinking the heap. On Linux, they are mmap & munmap (used to implement malloc and new etc...).
You can call the allocation primitive many times. So you could put int *p = new int; inside a loop, and get a fresh location every time you loop!
Don't forget to release memory (with delete in C++ or free in C). Otherwise, you'll get a memory leak -a naughty kind of bug-. On Linux, valgrind helps to catch them.
Whenever you use new in C++, memory is typically allocated through malloc, which in turn may call the sbrk system call (or similar). The compiler therefore has no knowledge of the requested size; only the allocator and the OS do. So you have to use delete (which typically calls free, which may in turn go back to sbrk) to give the memory back. Otherwise you get a memory leak.
Now, when it comes to your second case, the compiler has knowledge about the size of the allocated memory. That is, in your case, the size of one int. Setting a pointer to the address of this int does not change anything in the knowledge of the needed memory. Or in other words: the compiler is able to take care of freeing the memory. In the first case with new this is not possible.
In addition to that: new and malloc do not need to allocate exactly the requested size, which makes things a bit more complicated.
Edit
Two more common phrases: the second case is also known as static memory allocation (done by the compiler), while the first case is dynamic memory allocation (done by the runtime system).
What happens if your program is supposed to let the user store any number of integers? Then you'll need to decide during run-time, based on the user's input, how many ints to allocate, so this must be done dynamically.
In a nutshell, a dynamically allocated object's lifetime is controlled by you and not by the language. This allows you to let it live as long as it is required (as opposed to until the end of the scope), possibly determined by a condition that can only be calculated at run-time.
Also, dynamic memory is typically much more "scalable" - i.e. you can allocate more and/or larger objects compared to stack-based allocation.
The allocation essentially "marks" a piece of memory so no other object can be allocated in the same space. De-allocation "unmarks" that piece of memory so it can be reused for later allocations. If you fail to deallocate memory after it is no longer needed, you get a condition known as a "memory leak" - your program is occupying memory it no longer needs, leading to possible failure to allocate new memory (due to the lack of free memory), and just generally putting an unnecessary strain on the system.
When instantiating a class with new, instead of deleting the memory what kinds of benefits would we gain based on the reuse of the objects?
What is the process of new? Does a context switch occur? New memory is allocated, who is doing the allocation? OS ?
You've asked a few questions here...
Instead of deleting the memory what kinds of benefits would we gain based on the reuse of the objects?
That depends entirely on your application. Even supposing I knew what the application is, you've left another detail unspecified -- what is the strategy behind your re-use? But even knowing that, it's very hard to predict or answer generically. Try some things and measure them.
As a rule of thumb I like to minimize the most gratuitous of allocations. This is mostly premature optimization, though. It'd only make a difference over thousands of calls.
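One cheap form of reuse, sketched under the assumption that a function is called repeatedly and needs temporary storage of similar size each time: keep a scratch std::vector alive between calls instead of creating a new one per call. The function below is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical worker called many times. The caller owns scratch, so the
// storage it already holds can be reused from one call to the next.
long sum_of_squares(const std::vector<int>& input, std::vector<long>& scratch) {
    scratch.clear();                // drops the elements, keeps the storage
    scratch.reserve(input.size());  // allocates only if the storage is too small
    for (int x : input) {
        scratch.push_back(static_cast<long>(x) * x);
    }
    assert(scratch.size() == input.size());
    long total = 0;
    for (long s : scratch) total += s;
    return total;
}
```

Whether this is measurably faster than a fresh vector per call depends on the allocator and the call pattern, so it is worth measuring before adopting it.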
What is the process of new?
Entirely implementation dependent. But the general strategy that allocators use is to have a free list, that is, a list of blocks which have been freed in the process. When the free list is empty or contains insufficient contiguous free space, it must ask the kernel for the memory, which the kernel can only hand out in multiples of the page size (typically 4096 bytes on x86). An allocator also has to decide when to chop up, pad, or coalesce blocks. Multi-threading can also put pressure on allocators because they must synchronize their free lists.
Generally it's a pretty expensive operation. Maybe not so much relative to what else you're doing. But it ain't cheap.
Does a context switch occur? Entirely possible. It's also possible that it won't. Your OS is free to do a context switch any time it gets an interrupt or a syscall, so uh... That can happen at a lot of times; I don't see any special relationship between this and your allocator.
New memory is allocated, who is doing the allocation? OS? It might come from a free list, in which case there is no system call involved, hence no help from the OS. But it might come from the OS if the free list can't satisfy the request. Also, even if it comes from the free list, your kernel might have paged out that data, so you could get a page fault on access and the kernel's allocator would kick in. So I guess it'd be a mixed bag. Of course, you can have a conforming implementation that does all kinds of crazy things.
new allocates memory for the class on the heap, and calls the constructor.
context switches do not have to occur.
The c++-runtime allocates the memory on its freestore using whatever mechanism it deems fit.
Usually the C++ runtime allocates large blocks of memory using OS memory management functions, and then subdivides them using its own heap implementation. The Microsoft C++ runtime mostly uses the Win32 heap functions, which are implemented in user mode and divide up OS memory allocated using the virtual memory APIs. There are thus no context switches until its current allocation of virtual memory is exhausted and it needs to go to the OS to allocate more.
There is a theoretical problem when allocating memory that there is no upper bound on how long a heap traversal might take to find a free block. Practically though, heap allocations are usually fast.
With the exception of threaded applications. Because most C++ runtimes share a single heap between multiple threads, access to the heap needs to be serialized. This can severely degrade the performance of certain classes of applications that rely on multiple threads being able to new and delete many objects.
If you new or delete an address, it is marked as occupied or unassigned. The implementation does not talk to the kernel all the time. Bigger chunks of memory are reserved and divided into smaller chunks in user space within your application.
Because new and delete are re-entrant (or thread-safe depending on the implementation) a context switch may occur but your implementation is thread-safe anyway while using the default new and delete.
In C++ you are able to overload the new and delete operators, e.g. to plug in your own memory management:
#include <cstdlib> //declarations of malloc and free
#include <new>
#include <iostream>
using namespace std;
class C {
public:
    C() {}
    void* operator new (size_t size); //implicitly declared as a static member function
    void operator delete (void *p); //implicitly declared as a static member function
};
// No dynamic exception specification here: throw(...) lists were removed in C++17
void* C::operator new (size_t size) {
    void * p = malloc(size);
    if (p == 0) throw "allocation failure"; //instead of std::bad_alloc
    return p;
}
void C::operator delete (void *p){
    free(p); // the destructor has already run; only the raw storage is released
}
int main() {
    C *p = new C; // calls C::operator new
    delete p; // calls C::operator delete
}
If I use std::vector<> or std::string, do I need to allocate them on the heap as well? For example:
int main() {
std::vector<int>* p = new std::vector<int>();
delete p;
}
In Java and C#, objects are always allocated on the heap using this syntax. I wonder, is it efficient to do the same thing in C++? Because whenever I create a class in C++, I usually mix stack and heap variables. Let's say:
class simple {
int a;
double b;
std::string c;
std::vector<int> d;
....
};
I wonder what best practice I should follow when using objects in C++?
All data should be allocated on heap?
All data could be mixed?
or...
Thanks,
Chan
I try to allocate objects on the stack whenever possible, as I don't have to worry about releasing the memory in that case. Only when I explicitly want to control the lifetime of an object will I allocate it on the heap. Even if the object internally allocates memory on the heap, you can still create the object itself on the stack. There is no restriction on that.
You should avoid creating objects with large size on stack, because casual stack overflow on stress (large input data) is rarely revealed by testing and so will make your end users unhappy that your software crashes.
About string and vector and other STL containers you should not worry, because they use dynamic allocation internally. So the answer is NO: it is safe to construct them on the stack, and it is usually overkill to allocate them dynamically.
What might be dangerous are static-sized arrays, things that wrap such arrays like boost::array, or classes that have such arrays as data members. Experts often use the pimpl idiom to make their classes internally dynamic.
Stack is extremely quick, but use its quickness only where it really benefits the performance. It is safer to be careful with it. Avoid taking dangerous idioms like "I allocate everything on stack".
No; in general, use the stack unless the variable's lifetime exceeds that of the function.
The container classes will allocate their own memory from the heap; the only data on the stack is whatever bookkeeping the container class needs such as a pointer to the head, size, etc.
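A quick way to see this split, as a sketch (the Footprint struct and the measure function are invented for the example): the size of the vector object itself does not change with the number of elements it holds.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical report of where the bytes are.
struct Footprint {
    std::size_t in_object;    // sizeof the vector object (its bookkeeping)
    std::size_t in_elements;  // bytes occupied by the elements it manages
};

Footprint measure(std::size_t count) {
    std::vector<int> v(count);  // v itself is an automatic variable
    assert(v.size() == count);
    return Footprint{sizeof(v), v.size() * sizeof(int)};
}
```

Calling measure(1) and measure(100000) gives the same in_object value; only in_elements grows, and that storage is obtained by the vector's allocator rather than taken from the stack frame.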
Additionally I would recommend avoiding manual new/delete and utilizing shared_ptr etc techniques.
I have an array, called x, whose size is 6*sizeof(float). I'm aware that declaring:
float x[6];
would allocate 6*sizeof(float) for x in the stack memory. However, if I do the following:
float *x; // in class definition
x = new float[6]; // in class constructor
delete [] x; // in class destructor
I would be allocating dynamic memory of 6*sizeof(float) to x. If the size of x does not change for the lifetime of the class, in terms of best practices for cleanliness and speed (I do vaguely recall, if not correctly, that stack memory operations are faster than dynamic memory operations), should I make sure that x is statically rather than dynamically allocated memory? Thanks in advance.
Declaring the array of fixed size will surely be faster. Each separate dynamic allocation requires finding an unoccupied block and that's not very fast.
So if you really care about speed (have profiled) the rule is if you don't need dynamic allocation - don't use it. If you need it - think twice on how much to allocate since reallocating is not very fast either.
Using an array member will be cleaner (more succinct, less error prone) and faster as there is no need to call allocation and deallocation functions. You will also tend to improve 'locality of reference' for the structure being allocated.
The two main reasons for using dynamically allocated memory for such a member are where the required size is only known at run time, or where the required size is large and it is known that this will have a significant impact on the available stack space on the target platform.
TBH data on the stack generally sits in the cache and hence it is faster. However if you dynamically allocate something once and then use it regularly it will also be cached and hence pretty much as fast.
The important thing is to avoid allocating and deallocating regularly (i.e. each time a function is called). If you just avoid doing regular allocations and deallocations (i.e. allocate and deallocate once only), then a stack and a heap allocated array will perform pretty much as quickly as each other.
Yes, declaring the array statically will perform faster.
This is very easy to test, just write a simple wrapping loop to instantiate X number of these objects. You can also step through the machine code and see the larger number of OPCODEs required to dynamically allocate the memory.
Static allocation is faster (no need to request memory) and there's no way you will forget to delete it or delete it with the incorrect delete operator (delete instead of delete[]).
Construction and usage of dynamic/heap data consists of the following steps:
ask for memory to allocate the objects (by calling operator new). If there is no memory, operator new throws a bad_alloc exception.
create the objects with the default constructor (also done by new)
release the memory by the user (with the delete/delete[] operator); delete calls the object's destructor. Here a user can make a lot of mistakes:
forget to call delete - this leads to a memory leak
call the wrong delete operator (e.g. delete instead of delete[]) - bad things will happen
call delete twice - bad things can happen
When using static objects/array of objects, there's no need to allocate memory and release it by user. This makes code simpler and less error-prone.
So in conclusion, if you know the size of the array at compile time and you don't mind the memory (at run time you may not use all entries in the array), a static array is obviously the preferred one.
For dynamically allocated data it is worth looking into smart pointers.
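For instance, std::unique_ptr<float[]> removes all three mistakes listed above for an array member. This is a sketch; the class name Samples is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical class that owns a run-time-sized float array without ever
// calling delete[] by hand.
class Samples {
public:
    explicit Samples(std::size_t count)
        : data_(new float[count]()), count_(count) {}  // () zero-initializes

    float& operator[](std::size_t i) {
        assert(i < count_);
        return data_[i];
    }
    std::size_t size() const { return count_; }

private:
    std::unique_ptr<float[]> data_;  // its destructor calls delete[] exactly once
    std::size_t count_;
};
```

No user-written destructor is needed, and because unique_ptr cannot be copied, an accidental copy of Samples fails to compile instead of leading to a double delete.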
Don't confuse the following cases:
int global_x[6]; // an array with static storage duration
struct Foo {
int *pointer_x; // a pointer member in instance data
int member_x[6]; // an array in instance data
Foo() {
pointer_x = new int[6]; // a heap-allocated array
}
~Foo() { delete[] pointer_x; }
};
int main() {
int auto_x[6]; // an array on the stack (automatic variable)
Foo auto_f; // a Foo on the stack
Foo *dyn_f = new Foo(); // a heap-allocated Foo.
}
Now:
auto_f.member_x is on the stack, because auto_f is on the stack.
(*dyn_f).member_x is on the heap, because *dyn_f is on the heap.
For both Foo objects, pointer_x points to a heap-allocated array.
global_x is in some data section which the OS or runtime creates each time the program is run. This may or may not be from the same heap as dynamic allocations, it doesn't usually matter.
So regardless of whether it's on the heap or not, member_x is a better bet than pointer_x in the case where the length is always 6, because:
It's less code and less error-prone.
Your object only needs a single allocation if the object is heap-allocated, instead of 2.
Your object requires no heap allocations if the object is on the stack.
It uses less memory in total, because of fewer allocations, and also because there's no need for storage for the pointer value.
Reasons to prefer pointer_x:
If you need to reallocate during the lifetime of the object.
If different objects will need a different size array (perhaps based on constructor parameters).
If Foo objects will be placed on the stack, but the array is so large that it won't fit on the stack. For instance if you've got 1MB of stack, then you can't use automatic variables which contain an int[262144].
Composition is more efficient, being faster, lower memory overhead and less memory fragmentation.
You could do something like this:
template <int SZ = 6>
class Whatever {
...
float floats[SZ];
};
Use the stack allocated memory whenever possible. It will save you from the headaches of deallocating the memory, fragmentation of your virtual address space etc. Also, it is faster compared to the dynamic memory allocation.
There are more variables at play here:
The size of the array vs. the size of the stack: stack sizes are quite small compared to the free store (e.g. 1MB up to 30MB). Large chunks on the stack will cause a stack overflow.
The number of arrays you need: large number of small arrays
The lifetime of the array: if it's only needed locally inside a function, the stack is very convenient. If you need it after the function has exited, you must allocate it on the heap.
Garbage collection: if you allocate it on the heap, you need to clean it up manually, or have some flavour of smart pointers do the work for you.
As mentioned in another reply, large objects can not be allocated on the stack because you are not sure what is the stack size. In interests of portability, large objects or objects with variable sizes should always be allocated on the heap.
There has been a lot of development in the malloc/new routines now provided by the operating system (for example, Solaris's libumem). Dynamic memory allocation is often not a bottleneck.
If you allocate the array statically, there will only be one instance of it. The point of using a class is that you want multiple instances. There is no need to allocate the array dynamically at all:
class A {
...
private:
float x[8];
};
is what you want.