This question already has answers here:
What are the barriers to understanding pointers and what can be done to overcome them? [closed]
(28 answers)
Closed 9 years ago.
Pointers are at the core of programming languages like C and C++. At the same time, they are the source of many errors and memory leaks.
What are some precautions that must be taken while using pointers in C and C++?
Always initialize them
Check the bounds (size of pointer offset / index)
free the memory when done
Set to NULL after freeing
Check they are not NULL before accessing
When you malloc, use thing = malloc(N * sizeof *thing)
Don't overwrite a pointer that was malloced before you free it.
...
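A minimal sketch of several items from the list above, written as C++ that calls the C allocation functions. The helper names make_ints and release_ints are made up for illustration, and the sketch ignores overflow in the size computation:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: allocate n zero-filled ints, or return nullptr on failure.
int* make_ints(std::size_t n) {
    int* p = nullptr;                                   // always initialize
    p = static_cast<int*>(std::malloc(n * sizeof *p));  // size taken from the pointee
    if (p == nullptr) {
        return nullptr;                                 // check before accessing
    }
    for (std::size_t i = 0; i < n; ++i) {               // stay inside [0, n)
        p[i] = 0;
    }
    return p;
}

// Hypothetical helper: free the block and null the caller's pointer.
void release_ints(int*& p) {
    std::free(p);    // free(nullptr) is a harmless no-op
    p = nullptr;     // set to null after freeing
}
```

Because release_ints takes the pointer by reference, calling it twice on the same variable is safe: the second call just frees a null pointer.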
Some good advice there in the comments and Floris' answer, but IMO "Don't use pointers" isn't one of them
shared_ptrs are great protection against leaks, but you can't always use them. For example, you are not supposed to use them with boost::intrusive containers.
Additionally, shared_ptrs won't help you if you keep them in a container that you only ever add to and never remove from. You still "leak" the resources, even though you haven't lost the ability to release them.
Other miscellaneous hints:
As with all resources, I find it best to minimize the number of code paths by which a given type can be allocated and freed, so that I can match them up in review and/or instrumentation.
When allocating C strings, don't forget to reserve room for the terminator.
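The terminator hint, as a small sketch. copy_c_string is a hypothetical helper, not a standard function; the point is only the "+ 1":

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: heap copy of a C string. The caller releases it with std::free.
char* copy_c_string(const char* src) {
    std::size_t len = std::strlen(src);                    // length WITHOUT the '\0'
    char* dst = static_cast<char*>(std::malloc(len + 1));  // + 1 reserves room for '\0'
    if (dst == nullptr) {
        return nullptr;
    }
    std::memcpy(dst, src, len + 1);                        // copies the terminator as well
    return dst;
}
```

Allocating only len bytes here would write one byte past the end of the block.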
Pointers are the core to programming languages like C and C++.
Not 'pointers' necessarily, let's talk about references ...
Note(!) that the role of pointers differs radically between the paradigms used in C and in C++ (especially since the C++11 standard), so it is difficult to treat the two languages alike.
As for C++:
The use of raw C pointers is strongly discouraged in C++, at least when they own memory allocated dynamically with new or new[] (which is the main source of memory leaks in applications).
In C++, references (see the & and && declarators), which aren't available in C, are preferred whenever possible: a reference cannot be null and cannot be reseated. Note that a reference can still dangle if it outlives the object it refers to, so lifetimes matter here too.
The principle introduced in C++ is named RAII: the lifetime of a resource is tied to the lifetime of an object, typically one with automatic storage, so the resource is released when that object goes out of scope on any execution path (including exceptions, and regardless of the thread it runs on). I'm not saying this can't be implemented in plain C, but it's more difficult and error-prone.
In a C++ application, heap-allocated class instances should be managed with the smart pointer facilities of C++11 (std::unique_ptr, std::shared_ptr). Pre-C++11 code had to make do with std::auto_ptr, which was deprecated in C++11 and removed in C++17.
What are some precautions that must be taken while using pointers in C and C++?
There are of course some use cases for raw pointers in C++ (especially when interfacing between C and C++ APIs), but you should always test them for validity and know exactly what you're doing. The other cases are nicely covered by the C++ standard library, and you just need to pick the right standard smart pointer class.
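To make the RAII point concrete, here is a sketch under a few assumptions: Widget is a made-up type that counts its live instances, and std::make_unique requires C++14. The function throws after allocating, and the count shows that nothing leaked:

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// Hypothetical type that counts how many instances are currently alive.
struct Widget {
    static int live;
    Widget() { ++live; }
    ~Widget() { --live; }
};
int Widget::live = 0;

void use_widget_then_throw() {
    auto w = std::make_unique<Widget>();  // owned by w, no delete anywhere
    throw std::runtime_error("failure after allocation");
    // Stack unwinding destroys w, which deletes the Widget.
}

int live_widgets_after_failure() {
    try {
        use_widget_then_throw();
    } catch (const std::runtime_error&) {
        // The Widget has already been released at this point.
    }
    return Widget::live;
}
```

With a raw `new Widget` in place of make_unique, the same throw would skip any delete written after it.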
std::get_temporary_buffer returns a std::pair holding a pointer to the beginning of the allocated storage and the number of objects allocated, and the only purpose of its counterpart, std::return_temporary_buffer, is to deallocate memory previously allocated with std::get_temporary_buffer.
Both functions live in the <memory> header, whose main purpose (as its name implies) is to provide tools that make memory management easier and safer.
On the safety side, the <memory> header also provides the smart pointers, which manage memory in a RAII fashion and thereby make memory management exception-safe.
C++14 also added the std::make_unique helper function, so we can avoid using raw pointers in many cases nowadays.
With all these efforts to reduce the use of raw pointers, it is pretty confusing that std::get_temporary_buffer returns a raw pointer instead of a smart pointer. That's why I want to ask:
Is there any reason for std::get_temporary_buffer to return a raw pointer instead of returning a smart one?
If there's a reason for this "old-fashioned" way of allocating and deallocating memory manually, what goal does it serve that cannot be achieved with smart pointers?
The simple answer is that std::get_temporary_buffer was created before smart pointers were standardized, and changing the return value of std::get_temporary_buffer in C++11 would have broken code that depended on it, which is absolutely unacceptable for the C++ standard library.
Now, why haven't they standardized a new smart pointer equivalent?
Well, maybe no one was interested in having one. Personally, I find it weird to have one smart pointer own many objects. If you need a smart array, use std::vector.
If you look at the docs for the old SGI STL implementations of get_temporary_buffer et al, they say...
Note: get_temporary_buffer and return_temporary_buffer are only provided for backward compatibility. If you are writing new code, you should instead use the temporary_buffer class.
That effectively acknowledges the desirability of better automated management. GCC added temporary_buffer as an extension (see here), but it never made it into the Standard. Long and short of it is that it's just not that useful, so having a better interface won't have been a priority. The whole notion of the OS guessing at whether it should give you all the requested memory or some smaller amount flies in the face of the optimistic memory allocation strategies used by most modern Operating Systems, and once you get multiple calls requesting more than the easily available memory, being too generous with the first leaves the others a bit starved: just not a very practical notion.
If you care, you could submit a proposal for a later C++ Standard....
Difference between start-pointers and interior-pointers and in what situation we should prefer one over other?
As a complete guess, a "start-pointer" is a pointer returned by malloc or new[], whereas an "interior-pointer" is a pointer to the middle of the allocation.
If so, then the important difference is that you need to free the start-pointer, not an interior-pointer.
This isn't terminology from the standard, though. "Interior pointer" usually means a pointer into some larger block of memory and I guess/deduce the rest. So, you probably need to provide the context. What book/course/interview is the question from?
I believe Steve Jessop's answer is correct: a start-pointer is a pointer returned by malloc() and the like, and interior-pointers are pointers to places within that allocation. I cannot improve on his answer, but I will expand on it:
As an example, you might need up to a few thousand instances of some struct as a linked list. Instead of calling malloc() for each struct (or class) as needed, you call malloc() just once to allocate enough for a few thousand instances. Then you create a free-list (a linked list of the free instances). You can use and release instances by moving them (that is, by adjusting the pointer links) between the free-list and the use list(s). Then, when the program no longer needs any of the instances of the struct, you call free() just on the start-pointer, the one originally returned by malloc().
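The scheme just described could look roughly like this. Node, NodePool and the pool_* functions are names invented for the sketch; a real pool would also deal with alignment of arbitrary types and with growth:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical list node. next links it into the free-list or a use list.
struct Node {
    int value;
    Node* next;
};

struct NodePool {
    Node* start;      // start-pointer: the only pointer ever passed to free()
    Node* free_list;  // interior pointers into the same allocation
};

// One malloc() for all nodes, then thread every node onto the free-list.
bool pool_init(NodePool& pool, std::size_t count) {
    pool.start = static_cast<Node*>(std::malloc(count * sizeof *pool.start));
    pool.free_list = nullptr;
    if (pool.start == nullptr) {
        return false;
    }
    for (std::size_t i = 0; i < count; ++i) {
        pool.start[i].next = pool.free_list;
        pool.free_list = &pool.start[i];
    }
    return true;
}

// "Allocate" by unlinking the head of the free-list (nullptr when exhausted).
Node* pool_take(NodePool& pool) {
    Node* n = pool.free_list;
    if (n != nullptr) {
        pool.free_list = n->next;
    }
    return n;
}

// "Free" by relinking. Note that free() is NOT called on the interior pointer.
void pool_give_back(NodePool& pool, Node* n) {
    n->next = pool.free_list;
    pool.free_list = n;
}

// Release everything with a single free() on the start-pointer.
void pool_destroy(NodePool& pool) {
    std::free(pool.start);
    pool.start = nullptr;
    pool.free_list = nullptr;
}
```

Every Node* handed out by pool_take is an interior pointer (or, for the first element, happens to equal the start-pointer), which is why only pool_destroy may call free().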
I came across another definition of interior-pointer in the context of Windows and C++ programming for .NET Windows here: http://www.codeproject.com/Articles/8901/An-overview-of-interior-pointers-in-C-CLI.
In C++ / .NET, an interior-pointer can also mean a pointer to memory in the CLI heap, i.e. .NET's managed memory. However, it seems to me that it is fundamentally the same idea. When using C++ and C with .NET's managed memory, I suppose we are not concerned with start-pointers because we will never call free() to deallocate. .NET does the garbage collection for us.
Possible Duplicate:
In C++, why should new be used as little as possible?
Is it really a bad idea to use 'new' in instantiating a class in C++? Found here.
I get that using raw pointers is ill-advised, but why have a 'new' keyword at all when it's such bad practice? Or is it?
The point is that new, much like a pregnancy, creates a resource that is managed manually (i.e. by you), and as such it comes with responsibility.
C++ is a language for library writing, and any time you see a responsibility, the "C++" approach is to write a library element that handles this, and only this, responsibility. For dynamic memory allocation, those library components already exist and are summarily referred to as "smart pointers"; you'll want to look at std::unique_ptr and std::shared_ptr (or their TR1 or Boost equivalents).
While writing those single-responsibility building blocks, you will indeed need to say new and delete. But you do that once, and you think about it carefully and make sure you provide the correct copy, assignment and destruction semantics. (From the point of exception safety, single responsibility is crucial, since dealing with more than one single resource at a time is horribly unscalable.)
Once you have everything factored into suitable building blocks, you compose those blocks into bigger and bigger systems of code, but at that point you don't need to exercise any manual responsibility any more, since the building blocks already do this for you.
Since the standard library offers resource managing classes for the vast majority of use cases (dynamic arrays, smart pointers, file handles, strings), the point is that a well-factored and crafted C++ project should have very little need for any sort of manual resource management, which includes the use of new. All your handler objects are either automatic (scoped), or members of other classes whose instances are in turn scoped or managed by someone.
With this in mind, the only time you should be saying new is when you create a new resource-managing object; although even then that's not always necessary:
std::unique_ptr<Foo> p1(new Foo(1, 'a', -2.5)); // unique pointer
std::shared_ptr<Foo> p2(new Foo(1, 'a', -2.5)); // shared pointer
auto p3 = std::make_shared<Foo>(1, 'a', -2.5); // equivalent to p2, but better
Update: I think I may have addressed only half the OP's concerns. Many people coming from other languages seem to be under the impression that any object must be instantiated with a new-type expression. This is in itself a very unhelpful mindset when approaching C++:
The crucial distinction in C++ is that of object lifetime, or "storage class". This can be one of: automatic (scoped), static (permanent), or dynamic (manual). Global variables have static lifetime. The vast majority of all variables (which are declared as Foo x; inside a local scope) have automatic lifetime. It is only for dynamic storage that we use a new expression. The most important thing to realize when coming to C++ from another OO language is that most objects only ever need to have automatic lifetime, and thus there is never anything to worry about.
So the first realization should be that "C++ rarely needs dynamic storage". I feel that this may have been part of the OP's question. The question may have been better phrased as "is it a really bad idea to allocate objects dynamically?". It is only after you decide that you really need dynamic storage that we get to the discussion proper of whether you should be saying new and delete a lot, or if there are preferable alternatives, which is the point of my original answer.
Avoiding new as much as possible has many benefits, such as:
First and foremost, you also avoid delete statements. Smart pointers can help you here, though, so this is not that strong a point.
You avoid Rule of Three (in C++03), or Rule of Five (in C++11). If you use new when designing a class, that is, when your class manages raw memory internally, you probably have to consider this rule.
It's easy to write exception-safe code when you don't use new. Otherwise, you have to face a whole lot of problems to make your code exception-safe.
Using new unnecessarily means you're inviting problems. When an inexperienced programmer uses new, there are often better alternatives available, such as the standard containers and algorithms. Using standard containers avoids most of the problems that come with explicit use of new.
It's not bad, but everything you allocate with new, has to be deallocated with a delete. This is not always trivial to do, especially when you take into account exceptions.
I think that is what that post meant.
It's not a "bad idea to use new" -- the poster misstated his case rather badly. Rather, using new and not using it give you two different things.
new gives you a new, separately allocated instance of the class, and returns a pointer to that instance.
Using the class name without new creates an automatic instance of the class that will go "poof" when it passes out of scope. This "returns" the instance itself, not a pointer (and hence the syntax error in that other thread).
If you use new in the referenced case and added * to pass the compiler, it would result in a leaked object. If, on the other hand, you were passing a parameter to a method that was going to store it somewhere, and you passed a non-newed instance, making it work with &, you'd end up with a dangling pointer stored.
delete what you new, and free what you malloc (don't mix them, or you'll get into trouble). Sometimes you have to use new, since data allocated with new does not go away at the end of a scope... unless the pointer to it is lost, which is the entire issue with new.
But that is an error on the part of the programmer, not of the keyword.
It depends on what the code needs. In the reply you refer to, the vector contains client instances, not pointers to client instances.
In C++, you can create objects directly on the stack, without using new, like V1 and V2 in the code below:
void someFct()
{
    std::vector<client> V1;
    //....
    std::vector<client*> V2;
}
When using V2, you have to create each client instance with new, but the client objects will not be released (deleted) when V2 goes out of scope. There is no garbage collector. You have to delete the objects before leaving the function.
To have the created instances deleted automatically, you can use std::shared_ptr. That makes the code a bit longer to write, but it is simpler to maintain in the long term:
void someFct()
{
    typedef std::shared_ptr<client> client_ptr;
    typedef std::vector<client_ptr> client_array;
    client_array V2;
    V2.push_back(client_ptr(new client()));
    // The client instances are now released automatically when the function ends,
    // even if an exception is thrown.
}
Is it really a bad idea to use 'new' in instantiating a class in C++?
It’s often bad because it’s not necessary, and code gets much easier when you’re not using it spuriously. In cases where you can get away without using it, do so. I’ve written whole libraries without once using new.
I get that using raw pointers is ill-advised, but why have a 'new' keyword at all when it's such bad practice? Or is it?
It’s not universally bad, just unnecessary most of the time. But there are also times when it’s appropriate and that’s why there is such a keyword. That said, C++ could have gotten away without the keyword, since new conflates two concepts: 1. it allocates memory, and 2. it initialises the memory to an object.
You can decouple these processes by using other means of memory allocation, followed by a constructor invocation (“placement new”). This is actually done all over the place, such as the standard library, via allocators.
On the other hand, it’s rarely (read: never) meaningful for client code to manage uninitialised memory so it makes sense not to decouple these two processes. Hence the existence of new.
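A sketch of the two steps that new normally bundles, done by hand with malloc and placement new. Point is a hypothetical type; real code would leave this to an allocator-aware container:

```cpp
#include <cassert>
#include <cstdlib>
#include <new>  // placement new

// Hypothetical type with a non-trivial constructor.
struct Point {
    int x;
    int y;
    Point(int x_, int y_) : x(x_), y(y_) {}
};

int sum_via_placement_new() {
    void* raw = std::malloc(sizeof(Point));  // step 1: allocate memory, no object yet
    if (raw == nullptr) {
        return -1;
    }
    Point* p = new (raw) Point(3, 4);        // step 2: construct an object in that memory
    int sum = p->x + p->y;
    p->~Point();                             // undo step 2: explicit destructor call
    std::free(raw);                          // undo step 1: release the memory
    return sum;
}
```

Between the malloc and the placement new, raw is exactly the kind of uninitialised memory that client code rarely has a reason to hold.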
I saw some posts about implementing GC in C, and some people said it's impossible to do because C is weakly typed. I want to know how to implement GC in C++.
I want some general idea about how to do it. Thank you very much!
This is a Bloomberg interview question that my friend told me about. He did badly on it at the time. We would like to hear your ideas about it.
Garbage collection is a difficult topic in both C and C++, for a few reasons:
Pointers can be typecast to integers and vice-versa. This means that I could have a block of memory that is reachable only by taking an integer, typecasting it to a pointer, then dereferencing it. A garbage collector has to be careful not to think a block is unreachable when indeed it still can be reached.
Pointers are not opaque. Many garbage collectors, like stop-and-copy collectors, like to move blocks of memory around or compact them to save space. Since you can explicitly look at pointer values in C and C++, this can be difficult to implement correctly. You would have to be sure that if someone was doing something tricky with typecasting to integers that you correctly updated the integer if you moved a block of memory around.
Memory management can be done explicitly. Any garbage collector will need to take into account that the user is able to explicitly free blocks of memory at any time.
In C++, there is a separation between allocation/deallocation and object construction/destruction. A block of memory can be allocated with sufficient space to hold an object without any object actually being constructed there. A good garbage collector would need to know, when it reclaims memory, whether or not to call the destructor for any objects that might be allocated there. This is especially true for the standard library containers, which often make use of std::allocator to use this trick for efficiency reasons.
Memory can be allocated from different areas. C and C++ can get memory either from the built-in freestore (malloc/free or new/delete), or from the OS via mmap or other system calls, and, in the case of C++, from get_temporary_buffer (released again with return_temporary_buffer). The programs might also get memory from some third-party library. A good garbage collector needs to be able to track references to memory in these other pools and (possibly) would have to be responsible for cleaning them up.
Pointers can point into the middle of objects or arrays. In many garbage-collected languages like Java, object references always point to the start of the object. In C and C++ pointers can point into the middle of arrays, and in C++ into the middle of objects (if multiple inheritance is used). This can greatly complicate the logic for detecting what's still reachable.
So, in short, it's extremely hard to build a garbage collector for C or C++. Most libraries that do garbage collection in C and C++ are extremely conservative in their approach and are technically unsound - they assume that you won't, for example, take a pointer, cast it to an integer, write it to disk, and then load it back in at some later time. They also assume that any value in memory that's the size of a pointer could possibly be a pointer, and so sometimes refuse to free unreachable memory because there's a nonzero chance that there's a pointer to it.
As others have pointed out, the Boehm GC does do garbage collection for C and C++, but subject to the aforementioned restrictions.
Interestingly, C++11 includes some new library functions that allow the programmer to mark regions of memory as reachable and unreachable in anticipation of future garbage collection efforts. It may be possible in the future to build a really good C++11 garbage collector with this sort of information. In the meantime though, you'll need to be extremely careful not to break any of the above rules.
Look into the Boehm Garbage Collector.
C isn't C++, but both have the same "weakly typed" issues. It's not the implicit typecasts that cause an issue, though, but the tendency towards "punning" (subverting the type system), especially in data structure libraries.
There are garbage collectors out there for C and/or C++. The Boehm conservative collector is probably the best known. It's conservative in that, if it sees a bit pattern that looks like a pointer to some object, it doesn't collect that object. That value might be some other type of value completely, so the object could be collected, but "conservative" means playing safe.
Even a conservative collector can be fooled, though, if you use calculated pointers. There's a data structure, for example, where every list node has a field giving the difference between the next-node and previous-node addresses. The idea is to give double-linked list behaviour with a single link per node, at the expense of more complex iterators. Since there's no explicit pointer anywhere to most of the nodes, they may be wrongly collected.
Of course this is a very exceptional special case.
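For the curious, a sketch of the kind of list described above. DiffNode and the functions are invented for illustration, and the nodes live on the stack here, so no collector is involved; the point is only that, apart from the head's link, no field holds the plain address of another node:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical node: link stores (address of next) - (address of previous),
// with 0 standing in for "no node". Unsigned arithmetic wraps, which is fine here.
struct DiffNode {
    int value;
    std::uintptr_t link;
};

std::uintptr_t addr(const DiffNode* n) {
    return reinterpret_cast<std::uintptr_t>(n);
}

// Walk forward from the head: next = link + previous.
std::vector<int> walk_forward(DiffNode* head) {
    std::vector<int> out;
    std::uintptr_t prev = 0;
    DiffNode* cur = head;
    while (cur != nullptr) {
        out.push_back(cur->value);
        std::uintptr_t next = cur->link + prev;
        prev = addr(cur);
        cur = reinterpret_cast<DiffNode*>(next);
    }
    return out;
}

std::vector<int> diff_list_demo() {
    DiffNode a{1, 0};
    DiffNode b{2, 0};
    DiffNode c{3, 0};
    a.link = addr(&b) - 0;         // head: no previous node
    b.link = addr(&c) - addr(&a);  // matches the address of neither a nor c
    c.link = 0 - addr(&b);         // tail: no next node
    return walk_forward(&a);
}
```

If such nodes were heap-allocated, a collector scanning memory for values that look like addresses would find nothing pointing at c.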
More important - you can either have reliable destructors or garbage collection, not both. When a garbage cycle is collected, the collector cannot decide which destructor to call first.
Since the RAII pattern is pervasive in C++, and that relies on destructors, there is IMO a conflict. There may be valid exceptions, but my view is that if you want garbage collection, you should use a language that's designed from the ground up for garbage collection (Java, C#, ...).
You could either use smart pointers or create your own container object which will track references and handle memory allocation etc. Smart pointers would probably be preferable. Often times you can avoid dynamic heap allocation altogether.
For example:
char* pCharArray = new char[128];
// do some stuff with characters
delete [] pCharArray;
The danger with the above is that if anything throws between the new and the delete, your delete will not be executed. Something like the above could easily be replaced with safer "garbage collected" code:
std::vector<char> charArray(128);
// do some stuff with characters
Bloomberg has notoriously irrelevant interview questions from a practical coding standpoint. Like most interviewers, though, they are more concerned with how you think and with your communication skills than with the actual solution.
You can read about the shared_ptr class template.
It implements reference counting, which is a simple form of garbage collection.
If you want a real garbage collector, you can overload the new operator.
Create a struct similar to shared_ptr, call it Object.
This will wrap the new object created. Now, by overloading its operators, you can control the GC.
All you need to do now is implement one of the many GC algorithms.
The claim you saw is false; the Boehm collector supports C and C++. I suggest reading the Boehm collector's documentation (particularly this page) for a good overview of how one might write a garbage collector in C or C++.
Somebody told me that allocating with malloc is not secure anymore. I'm not a C/C++ guru, but I've made some stuff with malloc and C/C++. Does anyone know what risks I'm exposed to?
Quoting him:
[..] But indeed the weak point of C/C++ it is the security, and the Achilles' heel is indeed malloc and the abuse of pointers. C/C++ it is a well known insecure language. [..] There would be few apps in what I would not recommend to continue programming with C++."
It's probably true that C++'s new is safer than malloc(), but that doesn't automatically make malloc() more unsafe than it was before. Did your friend say why he considers it insecure?
However, here's a few things you should pay attention to:
1) With C++, you do need to be careful when you use malloc()/free() and new/delete side-by-side in the same program. This is possible and permissible, but everything that was allocated with malloc() must be freed with free(), and not with delete. Similarly, everything that was allocated with new must be freed with delete, and never with free(). (This logic goes even further: If you allocate an array with new[], you must free it with delete[], and not just with delete.) Always use corresponding counterparts for allocation and deallocation, per object.
int* ni = new int;
free(ni); // ERROR: don't do this!
delete ni; // OK
int* mi = (int*)malloc(sizeof(int));
delete mi; // ERROR!
free(mi); // OK
2) malloc() and new (speaking again of C++) don't do exactly the same thing. malloc() just gives you a chunk of memory to use; new will additionally call a constructor (if available). Similarly, delete will call a destructor (if available), while free() won't. This could lead to problems, such as incorrectly initialized objects (because the constructor wasn't called) or un-freed resources (because the destructor wasn't called).
3) C++'s new also takes care of allocating the right amount of memory for the type specified, while you need to calculate this yourself with malloc():
int *ni = new int;
int *mi = (int*)malloc(sizeof(int)); // required amount of memory must be
// explicitly specified!
// (in some situations, you can make this
// a little safer against code changes by
// writing sizeof(*mi) instead.)
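Point 2 above can be observed directly. In this sketch, Tracked is a made-up type whose constructor and destructor bump counters; new/delete touch the counters, malloc()/free() do not:

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical type that records constructor and destructor calls.
struct Tracked {
    static int constructed;
    static int destroyed;
    Tracked() { ++constructed; }
    ~Tracked() { ++destroyed; }
};
int Tracked::constructed = 0;
int Tracked::destroyed = 0;

void compare_new_and_malloc() {
    Tracked* a = new Tracked;  // allocates AND runs the constructor
    delete a;                  // runs the destructor AND releases the memory

    void* raw = std::malloc(sizeof(Tracked));  // raw bytes only, no constructor
    std::free(raw);                            // no destructor either
}
```

After one call, each counter has gone up by exactly one: the malloc()/free() pair never created a Tracked object at all.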
Conclusion:
In C++, new/delete should be preferred over malloc()/free() where possible. (In C, new/delete is not available, so the choice would be obvious there.)
[...] C/C++ it is a well known insecure language. [...]
Actually, that's wrong. Actually, "C/C++" doesn't even exist. There's C, and there's C++. They share some (or, if you want, a lot of) syntax, but they are indeed very different languages.
One thing they differ in vastly is their way to manage dynamic memory. The C way is indeed using malloc()/free() and if you need dynamic memory there's very little else you can do but use them (or a few siblings of malloc()).
The C++ way is not to deal (manually) with dynamic resources (of which memory is but one) at all. Resource management is handed to a few well-implemented and well-tested classes, preferably from the standard library, and then done automatically. For example, instead of manually dealing with zero-terminated character buffers, there's std::string; instead of manually dealing with dynamically allocated arrays, there's std::vector; instead of manually dealing with open files, there's the std::fstream family of streams; and so on.
Your friend could be talking about:
The safety of using pointers in general. For example in C++ if you're allocating an array of char with malloc, question why you aren't using a string or vector. Pointers aren't insecure, but code that's buggy due to incorrect use of pointers is.
Something about malloc in particular. Most OSes clear memory before first handing it to a process, for security reasons. Otherwise, sensitive data from one app could be leaked to another app. On OSes that don't do that, you could argue that there's an insecurity related to malloc. It's really more related to free.
It's also possible your friend doesn't know what he's talking about. When someone says "X is insecure", my response is, "in what way?".
Maybe your friend is older, and isn't familiar with how things work now - I used to think C and C++ were effectively the same until I discovered many new things about the language that have come out in the last 10 years (most of my teachers were old-school Bell Laboratories guys who wrote primarily in C and had only a cursory knowledge of C++ - and Bell Laboratories engineers invented C++!). Don't laugh at him/her - you might be there someday too!
I think your friend is uncomfortable with the idea that you have to do your own memory management, i.e. that it's easy to make mistakes. In that regard, it is insecure and he/she is correct... However, that insecure aspect can be overcome with good programming practices, like RAII and using smart pointers.
For many applications, though, automated garbage collection is probably fine, and some programmers are confused about how pointers work, so getting new, inexperienced developers to program effectively in C or C++ without some training might be difficult. That may be why your friend thinks C/C++ should be avoided.
It's the only way to allocate and deallocate memory in C natively. If you misuse it, it can be as insecure as anything else. Microsoft provides some "secure" versions of other functions, which take an extra size_t parameter; maybe your friend was referring to something similar? If that's the case, perhaps he simply prefers calloc() over malloc()?
If you are using C, you have to use malloc to allocate memory, unless you have a third-party library that will allocate / manage your memory for you.
Certainly your friend has a point that it is difficult to write secure code in C, especially when you are allocating memory and dealing with buffers. But we all know that, right? :)
Maybe what he wanted to warn you about is pointer usage. Yes, that will cause problems if you don't understand how pointers work. Otherwise, ask your friend what he meant, or ask him for a reference that backs up his claim.
Saying that malloc is not safe is like saying "don't use system X because it's insecure".
Until then, use malloc in C, and new in C++.
If you use malloc in C++, people will give you funny looks, but it's fine on very specific occasions.
There is nothing wrong with malloc as such. Your friend apparently means that manual memory management is insecure and easily leads to bugs, compared to other languages where the memory is managed automatically by a garbage collector (not that leaks are impossible there; nowadays nobody cares whether the program cleans up when it terminates, what matters is that nothing hogs memory while the program is running).
Of course in C++ you wouldn't really touch malloc at all (because it simply isn't functionally equivalent to new and just doesn't do what you need, assuming most of the time you don't want just to get raw memory). And in addition, it is completely possible to program using techniques which almost entirely eliminate the possibility of memory leaks and corruption (RAII), but that takes expertise.
Technically speaking, malloc was never secure to begin with, but that aside, the only thing I can think of is the infamous "OOM killer" (OOM = out-of-memory) that the Linux kernel uses. You can read up on it if you want. Other than that, I don't see how malloc itself is inherently insecure.
In C++, there is no such problem if you stick to good conventions. In C, well, practice. Malloc itself is not an inherently insecure function at all; people can simply handle its results inadequately.
It is not secure to use malloc because, in a large-scale application, it is very hard to ensure that every malloc is freed in an efficient manner. Thus, you are likely to have memory leaks, which may or may not be a problem... but when you double free, or use the wrong delete etc., undefined behaviour results. Indeed, heap corruption caused by a double free or a mismatched delete in C++ can, in the worst case, be exploited for arbitrary code execution.
The ONLY way for code written in a language like C or C++ to be secure is to mathematically prove the entire program with its dependencies factored in.
Modern memory-safe languages are safe from these types of bugs as long as the underlying language implementation isn't itself vulnerable (a real caveat, since these implementations are typically written in C or C++, but as we move towards hardware JVMs this problem may go away).
Perhaps the person was referring to the possibility of accessing data via malloc()?
Malloc doesn't affect the contents of the region that it provides, so it MAY be possible to collect data from other processes by mallocing a large area and then scanning the contents.
free() doesn't clear memory either, so data placed into dynamically allocated buffers is, in principle, accessible.
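If that matters for your data, one common mitigation is to overwrite a buffer before freeing it. A minimal sketch, assuming a hand-rolled helper named scrub (my name, not a standard function); it writes through a volatile pointer so that the compiler is less likely to drop the stores as dead. Where available, platform facilities such as memset_s, explicit_bzero or SecureZeroMemory are the better choice:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: zero a buffer through a volatile pointer.
void scrub(void* data, std::size_t size) {
    volatile unsigned char* bytes = static_cast<volatile unsigned char*>(data);
    for (std::size_t i = 0; i < size; ++i) {
        bytes[i] = 0;
    }
}

// Store a "secret", scrub it, verify the buffer is all zeros, then free it.
bool scrub_then_free_demo() {
    const std::size_t size = 16;
    char* secret = static_cast<char*>(std::malloc(size));
    if (secret == nullptr) {
        return false;
    }
    std::strcpy(secret, "hunter2");  // 7 characters plus '\0' fit in 16 bytes
    scrub(secret, size);
    bool all_zero = true;
    for (std::size_t i = 0; i < size; ++i) {
        if (secret[i] != 0) {
            all_zero = false;
        }
    }
    std::free(secret);               // the block no longer holds the secret
    return all_zero;
}
```

This does nothing about copies the program made elsewhere (registers, swapped pages, other buffers), so treat it as a partial measure.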
I know someone who, many years ago admittedly, exploited malloc to create an inter-process communication scheme when he found that mallocs of equal size would return the address of the most recently free'd block.