Related
The system programming language Rust uses the ownership paradigm to ensure at compile time with zero cost for the runtime when a resource has to be freed.
In C++ we commonly use smart pointers to achieve the same goal of hiding the complexity of managing resource allocation. There are a couple of differences though:
In Rust there is always only one owner, whereas C++ shared_ptr can easily leak ownership.
In Rust we can borrow references we do not own, whereas C++ unique_ptr cannot be shared in a safe way via weak_ptr and lock().
Reference counting of shared_ptr is costly.
My question is: How can we emulate the ownership paradigm in C++ within the following constraints:
Only one owner at any time
Possibility to borrow a pointer and use it temporarily without fear of the resource going out of scope (observer_ptr is useless for this)
As much compile-time checks as possible.
Edit: Given the comments so far, we can conclude:
No compile-time support for this (I was hoping for some decltype/template magic unknown to me) in the compilers. Might be possible using static analysis elsewhere (taint?)
No way to get this without reference counting.
No standard implementation to distinguish shared_ptrs with owning or borrowing semantic
Could roll your own by creating wrapper types around shared_ptr and weak_ptr:
owned_ptr: non-copyable, move-semantics, encapsulates shared_ptr, access to borrowed_ptr
borrowed_ptr: copyable, encapsulates weak_ptr, lock method
locked_ptr: non-copyable, move-semantics, encapsulates shared_ptr from locking weak_ptr
You can't do this with compile-time checks at all. The C++ type system is lacking any way to reason about when an object goes out of scope, is moved, or is destroyed — much less turn this into a type constraint.
What you could do is have a variant of unique_ptr that keeps a counter of how many "borrows" are active at run time. Instead of get() returning a raw pointer, it would return a smart pointer that increments this counter on construction and decrements it on destruction. If the unique_ptr is destroyed while the count is non-zero, at least you know someone somewhere did something wrong.
However, this is not a fool-proof solution. Regardless of how hard you try to prevent it, there will always be ways to get a raw pointer to the underlying object, and then it's game over, since that raw pointer can easily outlive the smart pointer and the unique_ptr. It will even sometimes be necessary to get a raw pointer, to interact with an API that requires raw pointers.
Moreover, ownership is not about pointers. Box/unique_ptr allows you to heap allocate an object, but it changes nothing about ownership, life time, etc. compared to putting the same object on the stack (or inside another object, or anywhere else really). To get the same mileage out of such a system in C++, you'd have to make such "borrow counting" wrappers for all objects everywhere, not just for unique_ptrs. And that is pretty impractical.
So let's revisit the compile time option. The C++ compiler can't help us, but maybe lints can? Theoretically, if you implement the whole life time part of the type system and add annotations to all APIs you use (in addition to your own code), that may work.
But it requires annotations for all functions used in the whole program. Including private helper function of third party libraries. And those for which no source code is available. And for those whose implementation that are too complicated for the linter to understand (from Rust experience, sometimes the reason something is safe are too subtle to express in the static model of lifetimes and it has to be written slightly differently to help the compiler). For the last two, the linter can't verify that the annotation is indeed correct, so you're back to trusting the programmer. Additionally, some APIs (or rather, the conditions for when they are safe) can't really be expressed very well in the lifetime system as Rust uses it.
In other words, a complete and practically useful linter for this this would be substantial original research with the associated risk of failure.
Maybe there is a middle ground that gets 80% of the benefits with 20% of the cost, but since you want a hard guarantee (and honestly, I'd like that too), tough luck. Existing "good practices" in C++ already go a long way to minimizing the risks, by essentially thinking (and documenting) the way a Rust programmer does, just without compiler aid. I'm not sure if there is much improvement over that to be had considering the state of C++ and its ecosystem.
tl;dr Just use Rust ;-)
What follows are some examples of ways people have tried to emulate parts of Rust's ownership paradigm in C++, with limited success:
Lifetime safety: Preventing common dangling. The most thorough and rigorous approach, involving several additions to the language to support the necessary annotations. If the effort is still alive (last commit was in 2019), getting this analysis added to a mainstream compiler is probably the most likely route to "borrow checked" C++. Discussed on IRLO.
Borrowing Trouble: The Difficulties Of A C++ Borrow-Checker
Is it possible to achieve Rust's ownership model with a generic C++ wrapper?
C++Now 2017: Jonathan Müller “Emulating Rust's borrow checker in C++" (video) and associated code, about which the author says, "You're not actually supposed to use that, if you need such a feature, you should use Rust."
Emulating the Rust borrow checker with C++ move-only types and part II (which is actually more like emulating RefCell than the borrow checker, per se)
I believe you can get some of the benefits of Rust by enforcing some strict coding conventions (which is after all what you'd have to do anyway, since there's no way with "template magic" to tell the compiler not to compile code that doesn't use said "magic"). Off the top of my head, the following could get you...well...kind of close, but only for single-threaded applications:
Never use new directly; instead, use make_unique. This goes partway toward ensuring that heap-allocated objects are "owned" in a Rust-like manner.
"Borrowing" should always be represented via reference parameters to function calls. Functions that take a reference should never create any sort of pointer to the refered-to object. (It may in some cases be necessary to use a raw pointer as a paramter instead of a reference, but the same rule should apply.)
Note that this works for objects on the stack or on the heap; the function shouldn't care.
Transfer of ownership is, of course, represented via R-value references (&&) and/or R-value references to unique_ptrs.
Unfortunately, I can't think of any way to enforce Rust's rule that mutable references can only exist anywhere in the system when there are no other extant references.
Also, for any kind of parallelism, you would need to start dealing with lifetimes, and the only way I can think of to permit cross-thread lifetime management (or cross-process lifetime management using shared memory) would be to implement your own "ptr-with-lifetime" wrapper. This could be implemented using shared_ptr, because here, reference-counting would actually be important; it's still a bit of unnecessary overhead, though, because reference-count blocks actually have two reference counters (one for all the shared_ptrs pointing to the object, another for all the weak_ptrs). It's also a little... odd, because in a shared_ptr scenario, everybody with a shared_ptr has "equal" ownership, whereas in a "borrowing with lifetime" scenario, only one thread/process should actually "own" the memory.
I think one could add a degree of compile-time introspection and custom sanitisation by introducing custom wrapper classes that track ownership and borrowing.
The code below is a hypothetical sketch, and not a production solution which would need a lot more tooling, e.g. #def out the checks when not sanitising. It uses a very naive lifetime checker to 'count' borrow errors in ints, in this instance during compilation. static_asserts are not possible as the ints are not constexpr, but the values are there and can be interrogated before runtime. I believe this answers your 3 constraints, regardless of whether these are heap allocations, so I'm using a simple int type to demo the idea, rather than a smart pointer.
Try uncommenting the use cases in main() below (run in compiler explorer with -O3 to see boilerplate optimise away), and you'll see the warning counters change.
https://godbolt.org/z/Pj4WMr
// Hypothetical Rust-like owner / borrow wrappers in C++
// This wraps types with data which is compiled away in release
// It is not possible to static_assert, so this uses static ints to count errors.
#include <utility>
// Statics to track errors. Ideally these would be static_asserts
// but they depen on Owner::has_been_moved which changes during compilation.
static int owner_already_moved = 0;
static int owner_use_after_move = 0;
static int owner_already_borrowed = 0;
// This method exists to ensure static errors are reported in compiler explorer
int get_fault_count() {
return owner_already_moved + owner_use_after_move + owner_already_borrowed;
}
// Storage for ownership of a type T.
// Equivalent to mut usage in Rust
// Disallows move by value, instead ownership must be explicitly moved.
template <typename T>
struct Owner {
Owner(T v) : value(v) {}
Owner(Owner<T>& ov) = delete;
Owner(Owner<T>&& ov) {
if (ov.has_been_moved) {
owner_already_moved++;
}
value = std::move(ov.value);
ov.has_been_moved = true;
}
T& operator*() {
if (has_been_moved) {
owner_use_after_move++;
}
return value;
}
T value;
bool has_been_moved{false};
};
// Safely borrow a value of type T
// Implicit constuction from Owner of same type to check borrow is safe
template <typename T>
struct Borrower {
Borrower(Owner<T>& v) : value(v.value) {
if (v.has_been_moved) {
owner_already_borrowed++;
}
}
const T& operator*() const {
return value;
}
T value;
};
// Example of function borrowing a value, can only read const ref
static void use(Borrower<int> v) {
(void)*v;
}
// Example of function taking ownership of value, can mutate via owner ref
static void use_mut(Owner<int> v) {
*v = 5;
}
int main() {
// Rather than just 'int', Owner<int> tracks the lifetime of the value
Owner<int> x{3};
// Borrowing value before mutating causes no problems
use(x);
// Mutating value passes ownership, has_been_moved set on original x
use_mut(std::move(x));
// Uncomment for owner_already_borrowed = 1
//use(x);
// Uncomment for owner_already_moved = 1
//use_mut(std::move(x));
// Uncomment for another owner_already_borrowed++
//Borrower<int> y = x;
// Uncomment for owner_use_after_move = 1;
//return *x;
}
The use of static counters is obviously not desirable, but it is not possible to use static_assert as owner_already_moved is non-const. The idea is these statics give hints to errors appearing, and in final production code they could be #defed out.
You can use an enhanced version of a unique_ptr (to enforce a unique owner) together with an enhanced version of observer_ptr (to get a nice runtime exception for dangling pointers, i.e. if the original object maintained through unique_ptr went out of scope). The Trilinos package implements this enhanced observer_ptr, they call it Ptr. I have implemented the enhanced version of unique_ptr here (I call it UniquePtr): https://github.com/certik/trilinos/pull/1
Finally, if you want the object to be stack allocated, but still be able to pass safe references around, you need to use the Viewable class, see my initial implementation here: https://github.com/certik/trilinos/pull/2
This should allow you to use C++ just like Rust for pointers, except that in Rust you get a compile time error, while in C++ you get a runtime exception. Also, it should be noted, that you only get a runtime exception in Debug mode. In Release mode, the classes do not do these checks, so they are as fast as in Rust (essentially as fast as raw pointers), but then they can segfault. So one has to make sure the whole test suite runs in Debug mode.
I know reference counter technique but never heard of mark-sweep technique until today, when reading the book named "Concepts of programming language".
According to the book:
The original mark-sweep process of garbage collection operates as follow: The runtime system allocates storage cells as requested and disconnects pointers from cells as necessary, without regard of storage reclamation ( allowing garbage to accumulate), until it has allocated all available cells. At this point, a mark-sweep process is begun to gather all the garbage left floating-around in the heap. To facilitate the process, every heap cells has an extra indicator bit or field that is used by the collection algorithm.
From my limited understanding, smart-pointers in C++ libraries use reference counting technique. I wonder is there any library in C++ using this kind of implementation for smart-pointers? And since the book is purely theoretical, I could not visualize how the implementation is done. An example to demonstrate this idea would be greatly valuable. Please correct me if I'm wrong.
Thanks,
There is one difficulty to using garbage collection in C++, it's to identify what is pointer and what is not.
If you can tweak a compiler to provide this information for each and every object type, then you're done, but if you cannot, then you need to use conservative approach: that is scanning the memory searching for any pattern that may look like a pointer. There is also the difficulty of "bit stuffing" here, where people stuff bits into pointers (the higher bits are mostly unused in 64 bits) or XOR two different pointers to "save space".
Now, in C++0x the Standard Committee introduced a standard ABI to help implementing Garbage Collection. In n3225 you can find it at 20.9.11 Pointer safety [util.dynamic.safety]. This supposes that people will implement those functions for their types, of course:
void declare_reachable(void* p); // throw std::bad_alloc
template <typename T> T* undeclare_reachable(T* p) noexcept;
void declare_no_pointers(char* p, size_t n) noexcept;
void undeclare_no_pointers(char* p, size_t n) noexcept;
pointer_safety get_pointer_safety() noexcept;
When implemented, it will authorize you to plug any garbage collection scheme (defining those functions) into your application. It will of course require some work of course to actually provide those operations wherever they are needed. One solution could be to simply override new and delete but it does not account for pointer arithmetic...
Finally, there are many strategies for Garbage Collection: Reference Counting (with Cycle Detection algorithms) and Mark And Sweep are the main different systems, but they come in various flavors (Generational or not, Copying/Compacting or not, ...).
Although they may have upgraded it by now, Mozilla Firefox used to use a hybrid approach in which reference-counted smart pointers were used when possible, with a mark-and-sweep garbage collector running in parallel to clean up reference cycles. It's possible other projects have adopted this approach, though I'm not fully sure.
The main reason that I could see C++ programmers avoiding this type of garbage collection is that it means that object destructors would run asynchronously. This means that if any objects were created that held on to important resources, such as network connections or physical hardware, the cleanup wouldn't be guaranteed to occur in a timely fashion. Moreover, the destructors would have to be very careful to use appropriate synchronization if they were to access shared resources, while in a single-threaded, straight reference-counting solution this wouldn't be necessary.
The other complexity of this approach is that C++ allows for raw arithmetic operations on pointers, which greatly complicates the implementation of any garbage collector. It's possible to conservatively solve this problem (look at the Boehm GC, for example), though it's a significant barrier to building a system of this sort.
This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
Common Uses For Pointers?
I am still learning the basics of C++ but I already know enough to do useful little programs.
I understand the concept of pointers and the examples I see in tutorials make sense to me. However, on the practical level, and being a (former) PHP developer, I am not yet confident to actually use them in my programs.
In fact, so far I have not felt the need to use any pointer. I have my classes and functions and I seem to be doing perfectly fine without using any pointer (let alone pointers to pointers). And I can't help feeling a bit proud of my little programs.
Still, I am aware that I am missing on one of C++'s most important feature, a double edged one: pointers and memory management can create havoc, seemingly random crashes, hard to find bugs and security holes... but at the same time, properly used, they must allow for clever and efficient programming.
So: do tell me what I am missing by not using pointers.
What are good scenarios where using pointers is a must?
What do they allow you to do that you couldn't do otherwise?
In which way to they make your programs more efficient?
And what about pointers to pointers???
[Edit: All the various answers are useful. One problem at SO is that we cannot "accept" more than one answer. I often wish I could. Actually, it's all the answers combined that help to understand better the whole picture. Thanks.]
I use pointers when I want to give a class access to an object, without giving it ownership of that object. Even then, I can use a reference, unless I need to be able to change which object I am accessing and/or I need the option of no object, in which case the pointer would be NULL.
This question has been asked on SO before. My answer from there:
I use pointers about once every six lines in the C++ code that I write. Off the top of my head, these are the most common uses:
When I need to dynamically create an object whose lifetime exceeds the scope in which it was created.
When I need to allocate an object whose size is unknown at compile time.
When I need to transfer ownership of an object from one thing to another without actually copying it (like in a linked list/heap/whatever of really big, expensive structs)
When I need to refer to the same object from two different places.
When I need to slice an array without copying it.
When I need to use compiler intrinsics to generate CPU-specific instructions, or work around situations where the compiler emits suboptimal or naive code.
When I need to write directly to a specific region of memory (because it has memory-mapped IO).
Pointers are commonly used in C++. Becoming comfortable with them, will help you understand a broader range of code. That said if you can avoid them that is great, however, in time as your programs become more complex, you will likely need them even if only to interface with other libraries.
Primarily pointers are used to refer to dynamically allocated memory (returned by new).
They allow functions to take arguments that cannot be copied onto the stack either because they are too big or cannot be copied, such as an object returned by a system call. (I think also stack alignment, can be an issue, but too hazy to be confident.)
In embedded programing they are used to refer to things like hardware registers, which require that the code write to a very specific address in memory.
Pointers are also used to access objects through their base class interfaces. That is if I have a class B that is derived from class A class B : public A {}. That is an instance of the object B could be accessed as if it where class A by providing its address to a pointer to class A, ie: A *a = &b_obj;
It is a C idiom to use pointers as iterators on arrays. This may still be common in older C++ code, but is probably considered a poor cousin to the STL iterator objects.
If you need to interface with C code, you will invariable need to handle pointers which are used to refer to dynamically allocated objects, as there are no references. C strings are just pointers to an array of characters terminated by the nul '\0' character.
Once you feel comfortable with pointers, pointers to pointers won't seem so awful. The most obvious example is the argument list to main(). This is typically declared as char *argv[], but I have seen it declared (legally I believe) as char **argv.
The declaration is C style, but it says that I have array of pointers to pointers to char. Which is interpreted as a arbitrary sized array (the size is carried by argc) of C style strings (character arrays terminated by the nul '\0' character).
If you haven't felt a need for pointers, I wouldn't spend a lot of time worrying about them until a need arises.
That said, one of the primary ways pointers can contribute to more efficient programming is by avoiding copies of actual data. For example, let's assume you were writing a network stack. You receive an Ethernet packet to be processed. You successively pass that data up the stack from the "raw" Ethernet driver to the IP driver to the TCP driver to, say, the HTTP driver to something that processes the HTML it contains.
If you're making a new copy of the contents for each of those, you end up making at least four copies of the data before you actually get around to rendering it at all.
Using pointers can avoid a lot of that -- instead of copying the data itself, you just pass around a pointer to the data. Each successive layer of the network stack looks at its own header, and passes a pointer to what it considers the "payload" up to the next higher layer in the stack. That next layer looks at its own header, modifies the pointer to show what it considers the payload, and passes it on up the stack. Instead of four copies of the data, all four layers work with one copy of the real data.
A big use for pointers is dynamic sizing of arrays. When you don't know the size of the array at compile time, you will need to allocate it at run-time.
int *array = new int[dynamicSize];
If your solution to this problem is to use std::vector from the STL, they use dynamic memory allocation behind the scenes.
There are several scenarios where pointers are required:
If you are using Abstract Base Classes with virtual methods. You can hold a std::vector and loop through all these objects and call a virtual method. This REQUIRES pointers.
You can pass a pointer to a buffer to a method reading from a file etc.
You need a lot of memory allocated on the heap.
It's a good thing to care about memory problems right from the start. So if you start using pointers, you might as well take a look at smart pointers, like boost's shared_ptr for example.
What are good scenarios where using pointers is a must?
Interviews. Implement strcpy.
What do they allow you to do that you couldn't do otherwise?
Use of inheritance hierarchy. Data structures like Binary trees.
In which way to they make your programs more efficient?
They give more control to the programmer, for creating and deleting resources at run time.
And what about pointers to pointers???
A frequently asked interview question. How will you create two dimensional array on heap.
A pointer has a special value, NULL, that reference's won't. I use pointers wherever NULL is a valid and useful value.
I just want to say that i rarely use pointers. I use references and stl objects (deque, list, map, etc).
A good idea is when you need to return an object where the calling function should free or when you dont want to return by value.
List<char*>* fileToList(char*filename) { //dont want to pass list by value
ClassName* DataToMyClass(DbConnectionOrSomeType& data) {
//alternatively you can do the below which doesnt require pointers
void DataToMyClass(DbConnectionOrSomeType& data, ClassName& myClass) {
Thats pretty much the only situation i use but i am not thinking that hard. Also if i want a function to modify a variable and cant use the return value (say i need more then one)
bool SetToFiveIfPositive(int**v) {
You can use them for linked lists, trees, etc.
They're very important data structures.
In general, pointers are useful as they can hold the address of a chunk of memory. They are especially useful in some low level drivers where they are efficiently used to operate on a piece of memory byte by byte. They are most powerful invention that C++ inherits from C.
As to pointer to pointer, here is a "hello-world" example showing you how to use it.
#include <iostream>
void main()
{
int i = 1;
int j = 2;
int *pInt = &i; // "pInt" points to "i"
std::cout<<*pInt<<std::endl; // prints: 1
*pInt = 6; // modify i, i = 6
std::cout<<i<<std::endl; // prints: 6
int **ppInt = &pInt; // "ppInt" points to "pInt"
std::cout<<**ppInt<<std::endl; // prints: 6
**ppInt = 8; // modify i, i = 8
std::cout<<i<<std::endl; // prints: 8
*ppInt = &j; // now pInt points to j
*pInt = 10; // modify j, j = 10
std::cout<<j<<std::endl; // prints: 10
}
As we see, "pInt" is a pointer to integer which points to "i" at the beginning. With it, you can modify "i". "ppInt" is a pointer to pointer which points to "pInt". With it, you can modify "pInt" which happens to be an address. As a result, "*ppInt = &j" makes "pInt" points to "j" now. So we have all the results above.
I remember hearing that the following code is not C++ compliant and was hoping someone with much more C++ legalese than me would be able to confirm or deny it.
std::vector<int*> intList;
intList.push_back(new int(2));
intList.push_back(new int(10));
intList.push_back(new int(17));
for(std::vector<int*>::iterator i = intList.begin(); i != intList.end(); ++i) {
delete *i;
}
intList.clear()
The rationale was that it is illegal for a vector to contain pointers to invalid memory. Now obviously my example will compile and it will even work on all compilers I know of, but is it standard compliant C++ or am I supposed to do the following, which I was told is in fact the standard compliant approach:
while(!intList.empty()) {
int* element = intList.back();
intList.pop_back();
delete element;
}
You code is valid, but the better solution will be to use smart pointers.
The thing is that all requirements to std::vector are located in 23.2.4 section of C++ Standard. There're no limitations about invalid pointers. std::vector works with int* as with any other type (we doesn't consider the case of vector<bool>), it doesn't care where they are point to.
Your code is fine. If you're worried for some reason about the elements being invalid momentarily, then change the body of the loop to
int* tmp = 0;
swap (tmp, *i);
delete tmp;
The C++ philosophy is to allow the programmer as much latitude as possible, and to only ban things that are actually going to cause harm. Invalid pointers do no harm in themselves, and therefore you can have them around freely. What will cause harm is using the pointer in any way, and that therefore invokes undefined behavior.
Ultimately, this is a question of personal taste more than anything. It's not "standards non-compliant" to have a vector that contains invalid pointers, but it is dangerous, just like it's dangerous to have any pointer that points to invalid memory. Your latter example will ensure that your vector never contains a bad pointer, yes, so it's the safest choice.
But if you knew that the vector would never be used during your former example's loop (if the vector is locally scoped, for example), it's perfectly fine.
Where did you hear that? Consider this:
std::vector<int *> intList(5);
I just created a vector filled with 5 invalid pointers.
In storing raw pointers in a container (I wouldn't recommend this) then having to do a 2 phase delete, I would choose your first option over the second.
I believe container::clear() will delete the contents of the map more efficiently than popping a single item at a time.
You could probably turn the for loop into a nice (psuedo) forall(begin(),end(),delete) and make it more generic so it didn't even matter if you changed from vector to some other container.
I don't believe this is an issue of standards compliance. The C++ standards define the syntax of the language and implementation requirements. You are using the STL which is a powerful library, but like all libraries it is not part of C++ itself...although I guess it could be argued that when used aggressively, libraries like STL and Qt extend the language into a different superset language.
Invalid pointers are perfectly compliant with the C++ standards, the computer just won't like it when you dereference them.
What you are asking is more of a best practices question. If your code is multi-threaded and intList is potentially shared, then your first approach may be more dangerous, but as Greg suggested if you know that intList can't be accessed then the first approach may be more efficient. That said, I believe safety should usually win in a trade-off until you know there is a performance problem.
As suggested by the Design by Contract concept, all code defines a contract whether implicit or explicit. The real issue with code like this is what are you promising the user: preconditions, postconditions, invariants, etc. The libraries make a certain contract and each function you write defines its own contract. You just need to pick the appropriate balance for you code, and as long as you make it clear to the user (or yourself six months from now) what is safe and what isn't, it will be okay.
If there are best practices documented with with an API, then use them whenever possible. They probably are best practices for a reason. But remember, a best practice may be in the eye of the beholder...that is they may not be a best practice in all situations.
it is illegal for a vector to contain
pointers to invalid memory
This is what the Standard has to say about the contents of a container:
(23.3) : The type of objects stored in these components must meet the requirements of CopyConstructible types (20.1.3), and the additional requirements of Assignable types.
(20.1.3.1, CopyConstructible) : In the following Table 30, T is a type to be supplied by a C + + program instantiating a template, t is a value of type T, and u is a value of type const T.
expression return type requirement
xxxxxxxxxx xxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
T(t) t is equivelant to T(t)
T(u) u is equivelant to T(u)
t.~T()
&t T* denotes the address of t
&u const T* denotes the address of u
(23.1.4, Assignable) : 64, T is the type used to instantiate the container, t is a value of T, and u is a value of (possibly
const) T.
expression return type requirement
xxxxxxxxxx xxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
t = u T& t is equivilant to u
That's all that is says about the contents of an STL collection. It says nothing about pointers and it is particularly silent about the pointers pointing to valid memory.
Therefore, deleteing pointers in a vector, while most likely a very bad architectural decision and an invitation to pain and suffering with the debugger at 3:00 AM on a Saturday night, is perfectly legal.
EDIT:
Regarding Kranar's comment that "assigning a pointer to an invalid pointer value results in undefined behavior." No, this is incorrect. This code is perfectly valid:
Foo* foo = new Foo();
delete foo;
Foo* foo_2 = foo; // This is legal
What is illegal is trying to do something with that pointer (or foo, for that matter):
delete foo_2; // UB
foo_2->do_something(); // UB
Foo& foo_ref = *foo_2; // UB
Simply creating a wild pointer is legal according to the Standard. Probably not a good idea, but legal nonetheless.
EDIT2:
More from the Standard regarding pointer types.
So sayeth the Standard (3.9.2.3) :
... A valid value of an object pointer
type represents either the address of
a byte in memory (1.7) or a null
pointer (4.10)...
...and regarding "a byte in memory," (1.7.1) :
The fundamental storage unit in the C
+ + memory model is the byte. A byte is at least large enough to contain
any member of the basic execution
character set and is composed of a
contiguous sequence of bits, the
number of which is
implementation-defined. The least
significant bit is called the
low-order bit; the most significant
bit is called the high-order bit. The
memory available to a C + + program
consists of one or more sequences of
contiguous bytes. Every byte has a
unique address.
There is nothing here about that byte being part of a living Foo, about you having access to it, or anything of the sort. Its just a byte in memory.
As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
What are the often misunderstood concepts in c++?
C++ is not C with classes!
And there is no language called C/C++. Everything goes downhill from there.
That C++ does have automatic resource management.
(Most people who claim that C++ does not have memory management try to use new and delete way too much, not realising that if they allowed C++ to manage the resource themselves, the task gets much easier).
Example: (Made with a made up API because I do not have time to check the docs now)
// C++
void DoSomething()
{
File file("/tmp/dosomething", "rb");
... do stuff with file...
// file is automatically free'ed and closed.
}
// C#
public void DoSomething()
{
File file = new File("/tmp/dosomething", "rb");
... do stuff with file...
// file is NOT automatically closed.
// What if the caller calls DoSomething() in a tight loop?
// C# requires you to be aware of the implementation of the File class
// and forces you to accommodate, thus voiding implementation-hiding
// principles.
// Approaches may include:
// 1) Utilizing the IDisposable pattern.
// 2) Utilizing try-finally guards, which quickly gets messy.
// 3) The nagging doubt that you've forgotten something /somewhere/ in your
// 1 million loc project.
// 4) The realization that point #3 can not be fixed by fixing the File
// class.
}
Free functions are not bad just because they are not within a class C++ is not an OOP language alone, but builds upon a whole stack of techniques.
I've heard it many times when people say free functions (those in namespaces and global namespace) are a "relict of C times" and should be avoided. Quite the opposite is true. Free functions allow to decouple functions from specific classes and allow reuse of functionality. It's also recommended to use free functions instead of member functions if the function don't need access to implementation details - because this will eliminate cascading changes when one changes the implementation of a class among other advantages.
This is also reflected in the language: The range-based for loop in C++0x (next C++ version released very soon) will be based on free function calls. It will get begin / end iterators by calling the free functions begin and end.
The difference between assignment and initialisation:
string s = "foo"; // initialisation
s = "bar"; // assignment
Initialisation always uses constructors, assignment always uses operator=
In decreasing order:
make sure to release pointers for allocated memory
when destructors should be virtual
how virtual functions work
Interestingly not many people know the full details of virtual functions, but still seem to be ok with getting work done.
The most pernicious concept I've seen is that it should be treated as C with some addons. In fact, with modern C++ systems, it should be treated as a different language, and most of the C++-bashing I see is based on the "C with add-ons" model.
To mention some issues:
While you probably need to know the difference between delete and delete[], you should normally be writing neither. Use smart pointers and std::vector<>.
In fact, you should be using a * only rarely. Use std::string for strings. (Yes, it's badly designed. Use it anyway.)
RAII means you don't generally have to write clean-up code. Clean-up code is bad style, and destroys conceptual locality. As a bonus, using RAII (including smart pointers) gives you a lot of basic exception safety for free. Overall, it's much better than garbage collection in some ways.
In general, class data members shouldn't be directly visible, either by being public or by having getters and setters. There are exceptions (such as x and y in a point class), but they are exceptions, and should be considered as such.
And the big one: there is no such language as C/C++. It is possible to write programs that can compile properly under either language, but such programs are not good C++ and are not normally good C. The languages have been diverging since Stroustrup started working on "C with Classes", and are less similar now than ever. Using "C/C++" as a language name is prima facie evidence that the user doesn't know what he or she is talking about. C++, properly used, is no more like C than Java or C# are.
The overuse of inheritance unrelated to polymorphism. Most of the time, unless you really do use runtime polymorphism, composition or static polymorphism (i.e., templates) is better.
The static keyword which can mean one of three distinct things depending on where it is used.
It can be a static member function or member variable.
It can be a static variable or function declared at namespace scope.
It can be a static variable declared inside a function.
Arrays are not pointers
They are different. So &array is not a pointer to a pointer, but a pointer to an array. This is the most misunderstood concept in both C and C++ in my opinion. You gotta have a visit to all those SO answers that tell to pass 2-d arrays as type** !
Here is an important concept in C++ that is often forgotten:
C++ should not be simply used like an object
oriented language such as Java or C#.
Inspire yourself from the STL and write generic code.
Here are some:
Using templates to implement polymorphism without vtables, à la ATL.
Logical const-ness vs actual const-ness in memory. When to use the mutable keyword.
ACKNOWLEDGEMENT: Thanks for correcting my mistake, spoulson.
EDIT:
Here are more:
Virtual inheritance (not virtual methods): In fact, I don't understand it at all! (by that, I mean I don't know how it's implemented)
Unions whose members are objects whose respective classes have non-trivial constructors.
Given this:
int x = sizeof(char);
what value is X?
The answer you often hear is dependant on the level of understanding of the specification.
Beginner - x is one because chars are always eight bit values.
Intermediate - it depends on the compiler implementation, chars could be UTF16 format.
Expert - x is one and always will be one since a char is the smallest addressable unit of memory and sizeof determines the number of units of memory required to store an instance of the type. So in a system where a char is eight bits, a 32 bit value will have a sizeof of 4; but in a system where a char is 16 bits, a 32 bit value will have a sizeof of 2.
It's unfortunate that the standard uses 'byte' to refer to a unit of memory since many programmers think of 'byte' as being eight bits.
C++ is a multi-paradigm language. Many people associate C++ strictly with OOP.
a classic among beginners to c++ from c:
confuse delete and delete[]
EDIT:
another classic failure among all levels of experience when using C API:
std::string helloString = "hello world";
printf("%s\n", helloString);
instead of:
printf("%s\n", helloString.c_str());
it happens to me every week. You could use streams, but sometimes you have to deal with printf-like APIs.
Pointers.
Dereferencing the pointers. Through either . or ->
Address of using & for when a pointer is required.
Functions that take params by reference by specifing a & in the signature.
Pointer to pointers to pointers *** or pointers by reference void someFunc(int *& arg)
There are a few things that people seem to be constantly confused by or have no idea about:
Pointers, especially function pointers and multiple pointers (e.g. int(*)(void*), void***)
The const keyword and const correctness (e.g. what is the difference between const char*, char* const and const char* const, and what does void class::member() const; mean?)
Memory allocation (e.g. every pointer new'ed should be deleted, malloc/free should not be mixed with new/delete, when to use delete [] instead of delete, why the C functions are still useful (e.g. expand(), realloc()))
Scope (i.e. that you can use { } on its own to create a new scope for variable names, rather than just as part of if, for etc...)
Switch statements. (e.g. not understanding that they can optimise as well (or better in some cases) than chains of ifs, not understanding fall through and its practical applications (loop unrolling as an example) or that there is a default case)
Calling conventions (e.g. what is the difference between cdecl and stdcall, how would you implement a pascal function, why does it even matter?)
Inheritance and multiple inheritance and, more generally, the entire OO paradigm.
Inline assembler, as it is usually implemented, is not part of C++.
Pointers to members and pointers to member functions.
Non-type template parameters.
Multiple inheritance, particularly virtual base classes and shared base objects.
Order of construction and destruction, the state of virtual functions in the middle of constructing an intermediate base class.
Cast safety and variable sizes. No, you can't assume that sizeof(void *) == sizeof(int) (or any other type for that matter, unless a portable header specifically guarantees it) in portable code.
Pointer arithmetic.
Headers and implementation files
This is also a concept misunderstood by many. Questions like what goes into header files and why it causes link errors if function definitions appear multiple times in a program on the one side but not when class definitions appear multiple times on the other side.
Very similar to those questions is why it is important to have header guards.
If a function accepts a pointer to a pointer, void* will still do it
I've seen that the concept of a void pointer is frequently confused. It's believed that if you have a pointer, you use a void*, and if you have a pointer to a pointer, you use a void**. But you can and should in both cases use void*. A void** does not have the special properties that a void* has.
It's the special property that a void* can also be assigned a pointer to a pointer and when cast back the original value is received.
I think the most misunderstood concept about C++ is why it exists and what its purpose is. Its often under fire from above (Java, C# etc.) and from below (C). C++ has the ability to operate close to the machine to deal with computational complexity and abstraction mechanisms to manage domain complexity.
NULL is always zero.
Many confuse NULL with an address, and think therefor it's not necessarily zero if the platform has a different null pointer address.
But NULL is always zero and it is not an address. It's an zero constant integer expression that can be converted to pointer types.
Memory Alignment.
std::vector does not create elements when reserve is used
I've seen it that programmers argue that they can access members at positions greater than what size() returns if they reserve()'ed up to that positions. That's a wrong assumption but is very common among programmers - especially because it's quite hard for the compiler to diagnose a mistake, which will silently make things "work".
Hehe, this is a silly reply: the most misunderstood thing in C++ programming is the error messages from g++ when template classes fail to compile!
C++ is not C with string and vector!
C structs VS C++ structs is often misunderstood.
C++ is not a typical object oriented language.
Don't believe me? look at the STL, way more templates than objects.
It's almost impossible to use Java/C# ways of writing object oriented code; it simply doesn't work.
In Java/C# programming, there's alot of newing, lots of utility objects that implement some single cohesive functionality.
In C++, any object newed must be deleted, but there's always the problem of who owns the object
As a result, objects tend to be created on the stack
But when you do that, you have to copy them around all the time if you're going to pass them around to other functions/objects, thus wasting a lot of performance that is said to be achieved with the unmanaged environment of C++
Upon realizing that, you have to think about other ways of organizing your code
You might end up doing things the procedural way, or using metaprogramming idioms like smart pointers
At this point, you've realized that OO in C++ cannot be used the same way as it is used in Java/C#
Q.E.D.
If you insist on doing oop with pointers, you'll usually have large (gigantic!) classes, with clearly defined ownership relationships between objects to avoid memory leaks. And then even if you do that, you're already too far from the Java/C# idiom of oop.
Actually I made up the term "object-oriented", and I can tell you I did not have C++ in mind.
-- Alan Kay (click the link, it's a video, the quote is at 10:33)
Although from a purist point of view (e.g. Alan Kay), even Java and C# fall short of true oop
A pointer is an iterator, but an iterator is not always a pointer
This is also an often misunderstood concept. A pointer to an object is a random access iterator: It can be incremented/decremented by an arbitrary amount of elements and can be read and written. However, an iterator class that has operator overloads doing that fulfill those requirements too. So it is also an iterator but is of course not a pointer.
I remember one of my past C++ teachers was teaching (wrongly) that you get a pointer to an element of a vector if you do vec.begin(). He was actually assuming - without knowing - that the vector implements its iterators using pointers.
That anonymous namespaces are almost always what is truly wanted when people are making static variables in C++
When making library header files, the pimpl idiom (http://www.gotw.ca/gotw/024.htm) should be used for almost all private functions and members to aid in dependency management
I still don't get why vector doesn't have a pop_front and the fact that I can't sort(list.begin(), list.end())..