How does sbrk() work in C++? - c++

Where can I read about sbrk() in some detail?
How does it exactly work?
In what situations would I want to use sbrk() instead of the cumbersome malloc() and new()?
btw, what is the expansion for sbrk()?

Have a look at the specification for brk/sbrk.
The call basically asks the OS to allocate some more memory for the application by incrementing the previous "break value" by a certain amount. This amount (the first parameter) is the amount of extra memory your application then gets.
Most rudimentary malloc implementations build upon the sbrk system call to get blocks of memory that they split up and track. The mmap function is generally accepted as a better choice (which is why mallocs like dlmalloc support both with an #ifdef).
As for "how it works", an sbrk at its most simplest level could look something like this:
uintptr_t current_break; // Some global variable for your application.
                         // In reality this would be tracked by the OS for the process.

void *sbrk(intptr_t incr)
{
    uintptr_t old_break = current_break;
    current_break += incr;
    return (void *) old_break;
}
Modern operating systems would do far more, such as map pages into the address space and add tracking information for each block of memory allocated.

sbrk is pretty much obsolete; these days you'd use mmap to map some pages out of /dev/zero (or, more commonly now, an anonymous mapping). It certainly isn't something you use instead of malloc and friends, it's more a way of implementing those. Also, of course, it exists only on POSIX-based operating systems that care about backwards compatibility with ancient code.
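For illustration, a minimal sketch of asking the kernel for a block of pages with mmap (Linux assumed, using an anonymous mapping rather than /dev/zero; error handling kept to the bare minimum):
#include <sys/mman.h>
#include <stdio.h>

int main(void)
{
    size_t len = 1 << 20;  /* ask for 1 MiB of zero-filled pages */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    /* ... a malloc implementation would carve allocations out of p here ... */
    munmap(p, len);        /* hand the pages back when done */
    return 0;
}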
If you find malloc and new too cumbersome, you should look into garbage collection instead... but beware, there is a potential performance cost, so you need to understand what you are doing.

You never want to use sbrk instead of malloc or free. It is non-portable and is typically used only by implementers of the standard C library, or on systems where a standard C library isn't available. It's described pretty well in its man page:
Description
brk() sets the end of the data segment to the value specified by end_data_segment, when that value is reasonable, the system does have enough memory and the process does not exceed its max data size (see setrlimit(2)).
sbrk() increments the program's data space by increment bytes. sbrk() isn't a system call, it is just a C library wrapper. Calling sbrk() with an increment of 0 can be used to find the current location of the program break.
Return Value
On success, brk() returns zero, and sbrk() returns a pointer to the start of the new area. On error, -1 is returned, and errno is set to ENOMEM.
Finally, malloc and free are not cumbersome - they are the standard way to allocate and release memory in C. Even if you want to implement your own memory allocator, it's best to use malloc and free as the basis. A common approach is to allocate a large chunk at a time with malloc and serve allocations out of it; this is what suballocators, or pools, usually do.
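As a rough illustration of that suballocator idea, here is a deliberately simplified bump allocator (a sketch only: no per-object free, no growth, and only max_align_t alignment):
#include <cstdlib>
#include <cstddef>

struct Pool {
    char  *base;   // large chunk obtained from malloc
    size_t size;   // total capacity of the chunk
    size_t used;   // bytes handed out so far
};

bool pool_init(Pool &p, size_t size) {
    p.base = static_cast<char *>(std::malloc(size));
    p.size = size;
    p.used = 0;
    return p.base != nullptr;
}

void *pool_alloc(Pool &p, size_t n) {
    // round n up so every pointer we hand out stays suitably aligned
    size_t a = alignof(std::max_align_t);
    n = (n + a - 1) & ~(a - 1);
    if (p.used + n > p.size) return nullptr;   // pool exhausted
    void *result = p.base + p.used;
    p.used += n;
    return result;
}

void pool_release(Pool &p) {
    std::free(p.base);   // everything allocated from the pool is released at once
    p.base = nullptr;
}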
Re the origin of the name sbrk (or its cousin brk), it may have something to do with the fact that the end of the heap is marked by a pointer known as the "break". The heap starts right after the BSS segment and typically grows up towards the stack.

You've tagged this C++ so why would you use 'cumbersome' malloc() rather than new? I am not sure what is cumbersome about malloc in any case; internally maybe so, but why would you care? And if you did care (for reasons of determinism for example), you could allocate a large pool and implement your own allocator for that pool. In C++ of course you can overload the new operator to do that.
sbrk is used to glue the C library's heap to the underlying system's memory management, so in application code you'd make OS calls (mmap, for example) rather than use sbrk() directly. As to how it works, that is system dependent. If for example you are using the Newlib C library (commonly used on 'bare-metal' embedded systems with the GNU compiler), you have to implement sbrk yourself, so how it works in those circumstances is up to you, so long as it achieves the required behaviour of extending the heap or failing.
As you can see from the link, it does not do much and would be extremely cumbersome to use directly - you'd probably end up wrapping it in all the functionality that malloc and new provide in any case.
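For the Newlib case mentioned above, the stub conventionally looks something like the sketch below. The exact prototype and the symbol names (end here, and the hypothetical __heap_limit) depend on your C library and linker script, so treat this as an assumption rather than a drop-in implementation:
#include <errno.h>
#include <stddef.h>

extern char end;               // first address past .bss, usually provided by the linker script
extern char __heap_limit;      // hypothetical symbol marking where the heap must stop
static char *heap_end = &end;  // current top of the heap

extern "C" void *_sbrk(ptrdiff_t incr)
{
    char *prev = heap_end;
    if (heap_end + incr > &__heap_limit) {
        errno = ENOMEM;        // refuse to grow the heap past the limit
        return (void *) -1;
    }
    heap_end += incr;
    return prev;               // old break: start of the newly extended region
}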

This depends on what you mean by malloc being "cumbersome". sbrk is typically not used directly anymore, unless you're implementing your own memory allocator, i.e. overriding operator new. Even then I'd possibly use malloc to give me my initial memory.
If you'd like to see how to implement malloc() on top of sbrk(), check out http://web.ics.purdue.edu/~cs354/labs/lab6/ which is an exercise going through exactly that.
On a modern system you shouldn't touch this interface, though. Since you're calling malloc and new cumbersome, I suspect you don't have all the requisite experience to safely and properly use sbrk for your code.


How to check how much memory has already been allocated by new()? [duplicate]

How to detect programmatically count of bytes allocated by process on Heap?
This test should work from the process itself.
I think mallinfo() is what you want:
#include <malloc.h>
#include <stdio.h>

struct mallinfo info = mallinfo();   /* returned by value, not through a pointer */
printf ("total allocated space: %d bytes\n", info.uordblks);   /* fields are plain int */
printf ("total free space: %d bytes\n", info.fordblks);
The mallinfo struct is implementation-specific (its fields depend on the malloc() implementation), but the information you want is in there. Here is how I report the values:
mallinfo.arena = "Total Size (bytes)"
mallinfo.uordblks = "Busy size (bytes)"
mallinfo.fordblks = "Free size (bytes)"
mallinfo.ordblks = "Free blocks (count)"
mallinfo.keepcost = "Top block size (bytes)"
mallinfo.hblks = "Blocks mapped via mmap() (count)"
mallinfo.hblkhd = "Bytes mapped via mmap() (bytes)"
These two are allegedly not used, but they seem to change on my system, and thus might be valid:
mallinfo.smblks = "Fast bin blocks (count)"
mallinfo.fsmblks = "Fast bin bytes (bytes)"
And the other interesting value is returned by sbrk(0).
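A quick sketch of what that looks like in practice (measuring how far the program break moved across a batch of small allocations; the result is only meaningful on platforms where malloc actually grows the heap via brk/sbrk):
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char *before = (char *) sbrk(0);   /* current program break */
    for (int i = 0; i < 1000; ++i)
        malloc(1024);                  /* small blocks usually come from the brk heap */
    char *after = (char *) sbrk(0);    /* break after the allocations (leaked on purpose) */
    printf("program break moved by %ld bytes\n", (long) (after - before));
    return 0;
}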
There are a number of possibilities.
How accurate do you need it to be? You can get some useful data via cat /proc/${PID}/status | grep VmData.
You can #define your own malloc(), realloc(), calloc(), and free() functions, wrapping the real functions behind your own counter. You can do really cool things here with __FILE__, __LINE__, & __func__ to facilitate identifying memory leaks in simple tests. But it will only instrument your own code!
(Similarly, you can also redefine the default operator new and operator delete methods, both array and non-array variants, and both throwing std::bad_alloc and std::nothrow_t variants. Again, this will only instrument your own code!)
(Be aware: On most C++ systems, new ultimately invokes malloc(). It doesn't have to. Especially with in-place new! But typically new does make use of malloc(). (Or it operates on a region of memory that has previously been malloc()'ed.) Otherwise you'd get into really funky stuff with multiple heap managers...)
You can use sbrk(0) to see where the data segment is currently set. That's not so great. It's a very coarse measurement, and it doesn't account for holes (unused memory regions) in the heap. (You're much better off with the VmData line from /proc/${PID}/status.) But if you're just looking for a general idea...
You can trap malloc()/free()/etc by writing your own shared library and forcing your process to use it instead of the real versions via LD_PRELOAD. You can use dlopen()/dlsym() to load & invoke the *real* malloc()/free()/etc. This works quite beautifully. The original code is unmodified, not even recompiled. But be aware of re-entrant situations when coding this library, and that your process will initially invoke malloc()/calloc()/realloc() before dlopen()/dlsym() can complete loading the real functions.
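A bare-bones sketch of such an interposer (Linux/glibc assumed; compile with -shared -fPIC -ldl and load it via LD_PRELOAD; only malloc/free are shown, the bootstrap buffer exists purely because dlsym may itself allocate before the real functions are resolved, and alignment/thread-safety are ignored):
#ifndef _GNU_SOURCE
#define _GNU_SOURCE                 /* for RTLD_NEXT */
#endif
#include <dlfcn.h>
#include <stddef.h>

static void *(*real_malloc)(size_t) = 0;
static void  (*real_free)(void *)   = 0;
static char   bootstrap[4096];      /* serves allocations made while resolving the real malloc */
static size_t bootstrap_used = 0;
static size_t total_allocated = 0;  /* crude counter; a real version would also track frees */

extern "C" void *malloc(size_t n)
{
    static int resolving = 0;
    if (!real_malloc) {
        if (resolving) {            /* dlsym re-entered malloc: serve from the static buffer */
            if (bootstrap_used + n > sizeof bootstrap) return 0;
            void *p = bootstrap + bootstrap_used;
            bootstrap_used += n;
            return p;
        }
        resolving = 1;
        real_malloc = (void *(*)(size_t)) dlsym(RTLD_NEXT, "malloc");
        real_free   = (void  (*)(void *)) dlsym(RTLD_NEXT, "free");
        resolving = 0;
    }
    total_allocated += n;           /* the instrumentation: count every request */
    return real_malloc(n);
}

extern "C" void free(void *p)
{
    if ((char *) p >= bootstrap && (char *) p < bootstrap + sizeof bootstrap)
        return;                     /* bootstrap memory: just leak it */
    if (p && real_free)
        real_free(p);
}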
You might check out tools like Valgrind, though that's really aimed more at memory leaks.
Then again, perhaps mtrace() is what you want? Or __malloc_hook? Very GNU-specific & nonstandard... But you are tagged "Linux"...
There's no easy, automatic way to do it, if that's what you're asking. You basically have to manually keep track of heap allocations yourself using a counter variable. The problem is that it's difficult to control which parts of your program are allocating memory on the heap, especially if you're using a lot of libraries out of your control. To complicate things further, there are two ways a program might allocate heap memory: new or malloc. (Not to mention direct OS calls like sbrk.)
You can override global operator new, and have each call to new increase a global tally. However, this won't necessarily include times when your program calls malloc, or when your program uses some class-specific new override. You can also override malloc using a macro, but this is not necessarily portable. And you'd also have to override all the variations of malloc, like realloc, calloc, etc. All of this is further complicated by the fact that on some implementations, new itself may call malloc.
So, essentially, it's very difficult to do this properly from within your program. I'd recommend using a memory profiler tool instead.
A speculative solution: redefine new and delete operators.
On each new operator call, the number of bytes to allocate is passed in. Allocate a bit more memory and store that byte count inside the extra space, then add the amount to a global variable that holds the heap size.
On each delete operator call, read back the value you stored before you dispose of the memory, and subtract it from that global variable.
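That idea could look roughly like this (a sketch only: the nothrow and over-aligned overloads are not handled, and the header is assumed to be large enough to hold a size_t):
#include <cstdlib>
#include <cstddef>
#include <new>
#include <atomic>

static std::atomic<std::size_t> heap_in_use{0};

void *operator new(std::size_t n)
{
    // allocate room for a small header that remembers the request size
    void *raw = std::malloc(n + alignof(std::max_align_t));
    if (!raw) throw std::bad_alloc();
    *static_cast<std::size_t *>(raw) = n;
    heap_in_use += n;
    return static_cast<char *>(raw) + alignof(std::max_align_t);
}

void operator delete(void *p) noexcept
{
    if (!p) return;
    void *raw = static_cast<char *>(p) - alignof(std::max_align_t);
    heap_in_use -= *static_cast<std::size_t *>(raw);
    std::free(raw);
}

std::size_t current_heap_usage() { return heap_in_use.load(); }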
If you're on Windows, you can use GetProcessHeap() and HeapQueryInformation() to retrieve information about the process's heap. MSDN has an example of walking the heap.
Since you've tagged your question 'linux' it might help to look at some of the information provided in the /proc directory. I haven't researched this a lot so I can only give you a starting point.
/proc/<your program's pid> contains files with some information about your process from the viewpoint of the kernel. There is a symlink /proc/self that always refers to the process doing the looking.
The files you might be most interested in are stat, statm and status. The latter is more human-readable, whereas the former two give the same info in a more machine-readable format.
A starting point about how to interpret the content of those files is available in the proc(5) manpage.
Other than keeping track of all your memory allocations yourself, I don't believe there is a way to calculate the heap size or usage.
Use malloc_info(). Have fun with the XML aspect of it. mallinfo() (as shown in another answer) does a similar thing but is restricted to 32-bit values... a foolish thing to rely on in 2010+.
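Minimal usage, assuming glibc (malloc_info() writes an XML snapshot of the allocator's state to the stream you pass; the options argument must currently be 0):
#include <malloc.h>
#include <stdio.h>

int main(void)
{
    void *p = malloc(1024);       /* make sure the heap has something in it */
    malloc_info(0, stdout);       /* dump the allocator's state as XML */
    free(p);
    return 0;
}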

Do malloc and new know about each other?

Assume I have a 10 kB heap and mix C and C++ code like this:
char* block1 = (char*) malloc(5*1024); // allocate 5 kB
char* block2 = new char[4*1024];       // allocate 4 kB
Is there a C heap and a C++ heap, or just a single heap common to both, so that new knows that the first 5 kB of the heap are already allocated?
There may or may not be separate C and C++ heaps. You can't write a conforming C++ program that can tell the difference, so it's entirely up to the implementation.
The standard describes the first step in the default behavior of operator new like this:
Executes a loop: Within the loop, the function first attempts to allocate the requested storage. Whether the attempt involves a call to the C standard library functions malloc or aligned_alloc is unspecified. ([new.delete.single]/4.1)
And for malloc itself, the standard says: "[aligned_alloc, calloc, malloc, and realloc] do not attempt to allocate storage by calling ::operator new()" [c.malloc]/3.
So the intention is that it's okay to call malloc from operator new, but it's not required.
In practice, operator new calls malloc.
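Put together, the default global operator new behaves roughly like the sketch below; the loop with the new-handler is what the quoted [new.delete.single] wording describes (named operator_new_sketch here to make clear it is an illustration, not the actual library code):
#include <cstdlib>
#include <new>

void *operator_new_sketch(std::size_t size)
{
    for (;;) {
        if (void *p = std::malloc(size))      // in practice: delegate to malloc
            return p;
        std::new_handler h = std::get_new_handler();
        if (!h)
            throw std::bad_alloc();           // no handler installed: give up
        h();                                  // let the handler free memory, or throw
    }
}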
The way that memory allocation works is that the userspace program first requests one or more memory pages from the operating system by means of a syscall (sbrk or mmap on *nix).
This is usually done by the malloc implementation (there are several) that is included in your C library. That malloc implementation then manages all the pages it (successfully) requested. This is done in userspace.
Returning to your question: Most implementations of ::operator new will just relay to malloc. But you can always use a different allocator-implementation and even mix several in your program (see: memory pools).*
This is the reason why the standard requires you to not mix malloc/free and new/delete.
*) Many malloc-implementations have problems with lots of small objects (which is pretty common in C++), this is a good reason for changing the allocator.
An efficient implementation of operator new could use the system calls brk()/mmap() directly rather than going through malloc().
Looking at the GNU implementation, though, that isn't the case: it uses malloc() internally.

What parts of standard C++ will call malloc/free rather than new/delete?

What parts of standard C++ will call malloc/free rather than new/delete?
This MSDN article lists several cases where malloc/free will be called rather than new/delete:
http://msdn.microsoft.com/en-us/library/6ewkz86d.aspx
I'd like to know if this list is (in increasing order of goodness and decreasing order of likelihood):
True for other common implementations
Exhaustive
Guaranteed by some part of the C++ standard
The context is that I'd like to replace global new/delete and am wondering what allocations I'd miss if I did.
I'd like to know if this list is (in increasing order of goodness and decreasing order of likelihood):
1. True for other common implementations
2. Exhaustive
3. Guaranteed by some part of the C++ standard
I'd say you cannot really tell from that list (I assume you mean the one given in the Remarks section) what C++ implementations other than Microsoft's will use.
The C++ implementation is free to use any of the OS provided system calls arbitrarily. So the answer for all 3 of your questions is: No.
As for the use of malloc() vs. new in the C++-specific parts of a compiler's runtime/ABI:
I think you can assume that C++-specific implementations will use operator new or placement new for any allocator implementations.
Whether those listed functions use new (most unlikely) or malloc() internally to allocate memory doesn't matter to a user of the C++ standard library implementation.
NOTE:
If you're asking from the background of planning to override new(), or to use placement new to provide your own memory allocation mechanism for all memory allocation in a program's context: that's not the way to go!
You'll have to provide your own versions of malloc(), free() et al. instead. E.g. when using GCC in conjunction with newlib, there are appropriate stubs you can use for this.
new is basically a wrapped malloc. The compiler is allowed to use standard library functions at will; for example, if you try to implement your own memcpy you can get some weird recursion, because if the compiler sees you copying more than a certain amount (say a dumb bit-for-bit copy constructor) it will use memcpy.
So yes, new is sort of a lie: new means "allocate some memory, construct something there, and let me treat it as one thing". If you allocate an array of floats, say, they are uninitialised, and malloc will probably be used directly.
Notice I say probably; I'm not sure whether they're set to zero these days :P
Anyway, all compiler optimisations (except copy elision and other return-value-optimisation stuff - BUT THIS IS THE ONLY EXCEPTION) are invisible to you, that is the point. The program cannot tell it was optimised; you'd have to be timing it and so on. For example:
(x*10)/2
This will not be optimised if the compiler has no idea about the range of x, because x*10 could overflow, but x*5 might not. So if it optimised it'd change the result.
if(x>0 && x<10) {
(x*10)/2
}
will become x*5 because the compiler, being really smart (much more than this) sees "there's no way x*10 can overflow, so x*5 is safe."
If you have a global new/delete that you defined yourself, the compiler cannot optimise around it, because it cannot know that doing so would have no visible effect. If you define your own versions of everything, the "it's simplified down to malloc/free" behaviour will go away.
NOTE:
I've deliberately ignored the malloc and type-safety stuff. It's not relevant.
The compiler assumes that malloc, free, memcpy and so forth are all super-optimised and will use them ONLY WHERE SAFE - as described above. There's a GCC thread on the mailing list somewhere where I learned of the memcpy thing.
calloc and malloc are much lower level than new and delete. Firstly, malloc and calloc are not type-safe: you can cast the result to whatever type you want, and access to the data in that memory is uncontrolled (you can end up writing over someone else's memory). If you are doing some real low-level programming you will have to use malloc and calloc. If you are a regular programmer, just use new and delete; they are much easier. Why do you need the precise implementation? (I have to say it depends on the implementation, because there are many different ones.)

Boost::Mutex & Malloc

I'm trying to use a faster memory allocator in C++. I can't use Hoard due to licensing / cost. I was using NEDMalloc in a single threaded setting and got excellent performance, but I'm wondering if I should switch to something else -- as I understand things, NEDMalloc is just a replacement for C-based malloc() & free(), not the C++-based new & delete operators (which I use extensively).
The problem is that I now need to be thread-safe, so I'm trying to malloc an object which is reference counted (to prevent excess copying), but which also contains a mutex pointer. That way, if you're about to delete the last copy, you first need to lock the mutex, then free the object, and lastly unlock & free the mutex.
However, using malloc to create a boost::mutex appears impossible, because I can't initialize the private object: calling the constructor directly is forbidden.
So I'm left with this odd situation, where I'm using new to allocate the lock and nedmalloc to allocate everything else. But when I allocate a large amount of memory, I run into allocation errors (which disappear when I switch to malloc instead of nedmalloc ~ but then the performance is terrible). My guess is that this is due to fragmentation in the memory and an inability of nedmalloc and new to play nice side by side.
There has to be a better solution. What would you suggest?
Google's malloc replacement (tcmalloc) is quite fast, thread safe by default, and easy to use. Simply link it into your application and it will replace the behavior of malloc/free and new/delete. This makes it particularly easy to re-profile your app to verify the new allocator is actually speeding things up.
You can overload global operators new and delete to call the new versions of malloc and free that you're using. This should make things play nicer together, though I'd be surprised if this wasn't happening already.
As for creating the mutex, use placement new -- this is how a constructor is called manually. A static array of char will do by way of buffer. For example, globals:
static char buf[sizeof(Mutex)];
static Mutex *m=0;
Then to initialize the m pointer:
m=new(buf) Mutex;
(You can also align the pointer, and so on, if you need to, and rename the variables, and so on.)
One thing that might be worth noting is that if the Mutex constructor does more memory allocation itself then this can be a problem. This is unlikely, but possible. (For this likely-to-be-rare case, there's usually no problem with an ad-hoc implementation of a cross-platform mutex wrapper, that doesn't do any allocation -- or, though it will end up a mess eventually, just use #ifdef and use the platform types directly. In either case, it's not much code, and anybody experienced with the system(s) in question can create the relevant code, bug-free, in very little time.)
Correct cleanup of objects created this way can be difficult, so I recommend not to bother (no, seriously). It's perfectly OK to let this stuff leak when you're using it to implement the memory manager; no point going mad over it. (If you're working on a system that has a notion of process exit, the OS is pretty much guaranteed to clean up the underlying mutex for you.)
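The ad-hoc, allocation-free wrapper alluded to above might look roughly like this (pthreads assumed; a Windows branch would wrap CRITICAL_SECTION behind the same interface):
#include <pthread.h>

// Minimal mutex wrapper that performs no dynamic allocation at all.
class RawMutex {
public:
    RawMutex()  { pthread_mutex_init(&m_, 0); }
    ~RawMutex() { pthread_mutex_destroy(&m_); }
    void lock()   { pthread_mutex_lock(&m_); }
    void unlock() { pthread_mutex_unlock(&m_); }
private:
    pthread_mutex_t m_;          // lives inline in the object, no heap involved
    RawMutex(const RawMutex &);  // non-copyable (pre-C++11 style)
    RawMutex &operator=(const RawMutex &);
};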
Have you profiled and verified that actual memory allocation is a significant enough problem that replacing the allocator provides useful gain?
Is NEDMalloc thread safe?
Often, the default C++ new/delete operators will use malloc and free under the hood to do the actual memory allocation before/after calling the constructor/destructor. If they don't in your particular situation, you can override the global new and delete operators to call whatever allocation implementation you wish. This requires some care to make sure that memory is always allocated/deallocated with the same allocator (especially when dealing with libraries).
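Routing everything through one allocator that way could look like the following sketch (my_alloc/my_free are placeholders for whatever thread-safe allocator you settle on, not nedmalloc's real entry points):
#include <cstdlib>
#include <new>

// Placeholders: point these at the allocator you actually use.
void *my_alloc(std::size_t n) { return std::malloc(n); }
void  my_free(void *p)        { std::free(p); }

void *operator new(std::size_t n)
{
    if (void *p = my_alloc(n)) return p;
    throw std::bad_alloc();
}
void *operator new[](std::size_t n) { return operator new(n); }

void operator delete(void *p) noexcept   { my_free(p); }
void operator delete[](void *p) noexcept { my_free(p); }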
Well, usually the C++ new and delete operators internally call the plain C library functions malloc and free (plus some additional magic like calling ctors and dtors), so providing a custom implementation of those functions may be enough (this is not infrequent in embedded C++ development, but requires some linker-level work). What system and what compiler are you targeting?

Is it secure to use malloc?

Somebody told me that allocating with malloc is not secure anymore. I'm not a C/C++ guru but I've made some stuff with malloc and C/C++. Does anyone know what risks I'm exposed to?
Quoting him:
[..] But indeed the weak point of C/C++ it is the security, and the Achilles' heel is indeed malloc and the abuse of pointers. C/C++ it is a well known insecure language. [..] There would be few apps in what I would not recommend to continue programming with C++."
It's probably true that C++'s new is safer than malloc(), but that doesn't automatically make malloc() more unsafe than it was before. Did your friend say why he considers it insecure?
However, here's a few things you should pay attention to:
1) With C++, you do need to be careful when you use malloc()/free() and new/delete side-by-side in the same program. This is possible and permissible, but everything that was allocated with malloc() must be freed with free(), and not with delete. Similarly, everything that was allocated with new must be freed with delete, and never with free(). (This logic goes even further: If you allocate an array with new[], you must free it with delete[], and not just with delete.) Always use corresponding counterparts for allocation and deallocation, per object.
int* ni = new int;
free(ni); // ERROR: don't do this!
delete ni; // OK
int* mi = (int*)malloc(sizeof(int));
delete mi; // ERROR!
free(mi); // OK
2) malloc() and new (speaking again of C++) don't do exactly the same thing. malloc() just gives you a chunk of memory to use; new will additionally call a constructor (if available). Similarly, delete will call a destructor (if available), while free() won't. This could lead to problems, such as incorrectly initialized objects (because the constructor wasn't called) or un-freed resources (because the destructor wasn't called).
3) C++'s new also takes care of allocating the right amount of memory for the type specified, while you need to calculate this yourself with malloc():
int *ni = new int;
int *mi = (int*)malloc(sizeof(int)); // required amount of memory must be
// explicitly specified!
// (in some situations, you can make this
// a little safer against code changes by
// writing sizeof(*mi) instead.)
Conclusion:
In C++, new/delete should be preferred over malloc()/free() where possible. (In C, new/delete is not available, so the choice would be obvious there.)
[...] C/C++ it is a well known insecure language. [...]
Actually, that's wrong. For a start, "C/C++" doesn't even exist. There's C, and there's C++. They share some (or, if you want, a lot of) syntax, but they are indeed very different languages.
One thing they differ in vastly is their way to manage dynamic memory. The C way is indeed using malloc()/free() and if you need dynamic memory there's very little else you can do but use them (or a few siblings of malloc()).
The C++ way is not to (manually) deal with dynamic resources (of which memory is but one) at all. Resource management is handed to a few well-implemented and well-tested classes, preferably from the standard library, and then done automatically. For example, instead of manually dealing with zero-terminated character buffers, there's std::string; instead of manually dealing with dynamically allocated arrays, there's std::vector; instead of manually dealing with open files, there's the std::fstream family of streams, etc.
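A small illustration of that difference in style (the C++ version never touches malloc/free or new/delete directly, and cleans up after itself even on exceptions):
#include <string>
#include <vector>
#include <fstream>

std::vector<std::string> read_lines(const char *path)
{
    std::vector<std::string> lines;   // grows as needed, frees itself
    std::ifstream in(path);           // closes itself when it goes out of scope
    std::string line;
    while (std::getline(in, line))
        lines.push_back(line);
    return lines;                     // no delete, no free, no leak
}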
Your friend could be talking about:
The safety of using pointers in general. For example in C++ if you're allocating an array of char with malloc, question why you aren't using a string or vector. Pointers aren't insecure, but code that's buggy due to incorrect use of pointers is.
Something about malloc in particular. Most OSes clear memory before first handing it to a process, for security reasons. Otherwise, sensitive data from one app, could be leaked to another app. On OSes that don't do that, you could argue that there's an insecurity related to malloc. It's really more related to free.
It's also possible your friend doesn't know what he's talking about. When someone says "X is insecure", my response is, "in what way?".
Maybe your friend is older, and isn't familiar with how things work now - I used to think C and C++ were effectively the same until I discovered many new things about the language that have come out in the last 10 years (most of my teachers were old-school Bell Laboratories guys who wrote primarily in C and had only a cursory knowledge of C++ - and Bell Laboratories engineers invented C++!). Don't laugh at him/her - you might be there someday too!
I think your friend is uncomfortable with the idea that you have to do your own memory management - i.e., it's easy to make mistakes. In that regard, it is insecure and he/she is correct... However, that insecure aspect can be overcome with good programming practices, like RAII and using smart pointers.
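For instance, wrapping an allocation in a smart pointer removes the "forgot to delete" class of mistakes entirely (std::unique_ptr shown; on a pre-C++11 toolchain the same idea is available as boost::scoped_ptr):
#include <memory>

struct Widget { int value; };

void use_widget()
{
    std::unique_ptr<Widget> w(new Widget());  // or std::make_unique<Widget>() in C++14
    w->value = 42;
    // no delete needed: the Widget is destroyed when w goes out of scope,
    // even if an exception is thrown in between
}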
For many applications, though, having automated garbage collection is probably fine, and some programmers are confused about how pointers work, so getting new, inexperienced developers to program effectively in C/C++ without some training might be difficult. Which is maybe why your friend thinks C/C++ should be avoided.
It's the only way to allocate and deallocate memory in C natively. If you misuse it, it can be as insecure as anything else. Microsoft provides some "secure" versions of other functions that take an extra size_t parameter - maybe your friend was referring to something similar? If that's the case, perhaps he simply prefers calloc() over malloc()?
If you are using C, you have to use malloc to allocate memory, unless you have a third-party library that will allocate / manage your memory for you.
Certainly your friend has a point that it is difficult to write secure code in C, especially when you are allocating memory and dealing with buffers. But we all know that, right? :)
What he maybe wanted to warn you about is pointer usage. Yes, that will cause problems if you don't understand how it works. Otherwise, ask what your friend meant, or ask him for a reference that backs up his claim.
Saying that malloc is not safe is like saying "don't use system X because it's insecure".
Until then, use malloc in C, and new in C++.
If you use malloc in C++, people will look at you funny, but it's fine on very specific occasions.
There is nothing wrong with malloc as such. Your friend apparently means that manual memory management is insecure and easily leads to bugs, compared to other languages where memory is managed automatically by a garbage collector. (Not that leaks are impossible there - nowadays nobody cares whether the program cleans up when it terminates; what matters is that nothing hogs memory while the program is running.)
Of course in C++ you wouldn't really touch malloc at all (because it simply isn't functionally equivalent to new and just doesn't do what you need, assuming most of the time you don't want just to get raw memory). And in addition, it is completely possible to program using techniques which almost entirely eliminate the possibility of memory leaks and corruption (RAII), but that takes expertise.
Technically speaking, malloc was never secure to begin with, but that aside, the only thing I can think of is the infamous "OOM killer" (OOM = out-of-memory) that the Linux kernel uses. You can read up on it if you want. Other than that, I don't see how malloc itself is inherently insecure.
In C++, there is no such problem if you stick to good conventions. In C, well, practice. malloc itself is not an inherently insecure function at all - people simply handle its results inadequately.
It is not secure to use malloc because it's not possible to write a large scale application and ensure every malloc is freed in an efficient manner. Thus, you will have tons of memory leaks which may or may not be a problem... but, when you double free, or use the wrong delete etc, undefined behaviour can result. Indeed, using the wrong delete in C++ will typically allow arbitrary code execution.
The ONLY way for code written in a language like C or C++ to be secure is to mathematically prove the entire program with its dependencies factored in.
Modern memory-safe languages are safe from these types of bugs as long as the underlying language implementation isn't vulnerable (which is indeed rare because these are all written in C/C++, but as we move towards hardware JVMs, this problem will go away).
Perhaps the person was referring to the possibility of accessing data via malloc()?
Malloc doesn't affect the contents of the region that it provides, so it MAY be possible to collect data from other processes by mallocing a large area and then scanning the contents.
free() doesn't clear memory either, so data placed into dynamically allocated buffers is, in principle, accessible.
I know someone who, many years ago admittedly, exploited malloc to create an inter-process communication scheme when he found that mallocs of equal size would return the address of the most recently free'd block.