Why is string not a (subclass of) vector? - c++

C strings are char arrays.
Vectors are the new arrays in C++.
Why aren't strings vectors (of chars) then?
Most of the methods of vectors and strings seem duplicated too. Is there a reason for making strings a different thing in C++?

It's pretty much just historical. Strings and vectors were developed in parallel with little thought going to how they could be considered one and the same, for T==char.
That's also why standard containers are nice and generic, whereas std::basic_string is a total monolith of member function after member function.
Edge case optimisation opportunities since made it difficult or impossible to transform std::basic_string<T, Alloc> into std::vector<T, Alloc> in any sort of standard way. Take the small string optimisation, for example. Although, now that GCC's copy-on-write mechanism is officially dead, we're a little closer.
The ability to legally dereference std::string::end() (and obtain '\0' for your trouble) is still problematic, though. A bunch of fairly stringent iterator invalidation rules for .c_str() basically prevent us from using std::vector<char> for this right from the start.
tl;dr: this is what happens when you create a camel

The various answers in Vector vs string show several differences in interfaces between vector and string, and since the typical pattern in the standard is to use static polymorphism rather than dynamic, they were created as two different classes.
Since strings do have different characteristics from vectors it doesn't seem that you would want to use public inheritance but I don't think there's anything in the standard that would prohibit protected or private inheritance, or composition to provide the underlying space management.
Additionally I suspect that string may have been developed earlier and orthogonally to vector which likely explains why there are member methods that more likely would have been made free algorithms if developed in parallel to vector.

Yes because vector is a container which can contain T and std::string is a wrapper around char* and can't contain int or other datatypes. Wouldn't make any sense to have it any other way.

Related

What is string_view?

string_view was a proposed feature within the C++ Library Fundamentals TS(N3921) added to C++17
As far as i understand it is a type that represent some kind of string "concept" that is a view of any type of container that could store something viewable as a string.
Is this right ?
Should the canonical
const std::string& parameter type become string_view ?
Is there another important point about string_view to take into consideration ?
The purpose of any and all kinds of "string reference" and "array reference" proposals is to avoid copying data which is already owned somewhere else and of which only a non-mutating view is required. The string_view in question is one such proposal; there were earlier ones called string_ref and array_ref, too.
The idea is always to store a pair of pointer-to-first-element and size of some existing data array or string.
Such a view-handle class could be passed around cheaply by value and would offer cheap substringing operations (which can be implemented as simple pointer increments and size adjustments).
Many uses of strings don't require actual owning of the strings, and the string in question will often already be owned by someone else. So there is a genuine potential for increasing the efficiency by avoiding unneeded copies (think of all the allocations and exceptions you can save).
The original C strings were suffering from the problem that the null terminator was part of the string APIs, and so you couldn't easily create substrings without mutating the underlying string (a la strtok). In C++, this is easily solved by storing the length separately and wrapping the pointer and the size into one class.
The one major obstacle and divergence from the C++ standard library philosophy that I can think of is that such "referential view" classes have completely different ownership semantics from the rest of the standard library. Basically, everything else in the standard library is unconditionally safe and correct (if it compiles, it's correct). With reference classes like this, that's no longer true. The correctness of your program depends on the ambient code that uses these classes. So that's harder to check and to teach.
(Educating myself in 2021)
From Microsoft's <string_view>:
The string_view family of template specializations provides an efficient way to pass a read-only, exception-safe, non-owning handle to the character data of any string-like objects with the first element of the sequence at position zero. (...)
From Microsoft's C++ Team Blog std::string_view: The Duct Tape of String Types from August 21st, 2018 (retrieved 2021 Apr 01):
string_view solves the “every platform and library has its own string type” problem for parameters. It can bind to any sequence of characters, so you can just write your function as accepting a string view:
void f(wstring_view); // string_view that uses wchar_t's
and call it without caring what stringlike type the calling code is using (and > for (char*, length) argument pairs just add {} around them)
(...)
(...)
Today, the most common “lowest common denominator” used to pass string data around is the null-terminated string (or as the standard calls it, the Null-Terminated Character Type Sequence). This has been with us since long before C++, and provides clean “flat C” interoperability. However, char* and its support library are associated with exploitable code, because length information is an in-band property of the data and susceptible to tampering. Moreover, the null used to delimit the length prohibits embedded nulls and causes one of the most common string operations, asking for the length, to be linear in the length of the string.
(...)
Each programming domain makes up their own new string type, lifetime semantics, and interface, but a lot of text processing code out there doesn’t care about that. Allocating entire copies of the data to process just to make differing string types happy is suboptimal for performance and reliability.

Is it bad design for a class to give access to its data (via ptr/it) when this data can be deleted before the class object is out of scope?

Classic example is iterator invalidation :
std::string test("A");
auto it = test.insert(test.begin()+1,'B');
test.erase();
...
std::cout << *it;
Do you think having this kind of API is bad design, and will be difficult to learn/use for beginners ?
A costly, performance/memory wise, solution would be, in that type of case, to assign the pointer/iterator to an empty string (or a nullptr, but that's not very helpful) when a clear method is used.
Some precisions
I'm thinking of this design for returning const chars* that can be modified internally (maybe they're stored in a std::vector that can be cleared). I don't want to return a std::string (binary compatibility) and I don't want a get(char*,std::size_t) method because of the size argument that needs to be fetched (too slow). Also I don't want to create a wrapper around std::string or my own string class.
I would recommend reading up on Stepanov's design philosophy (pages 9-11):
[This example] is written in a clear object-oriented style with getters and setters. The proponents of this style say that the advantage of having such functions is that it allows programmers later on to change the implementation. What they forget to mention is that sometimes it is awfully good to expose the implementation. Let us see what I mean. It is hard for me to imagine an evolution of a system that would let you keep the interface of get and set, but be able to change the implementation. I could imagine that the implementation outgrows int and you need to switch to long. But that is a different interface. I can imagine that you decide to switch from an array to a list but that also will force you to change the interface, since it is really not a very good idea to index into a linked list.
Now let us see why it is really good to expose the implementation. Let us assume that tomorrow you decide to sort your integers. How can you do it? Could you use the C library qsort? No, since it knows nothing about your getters and setters. Could you use the STL sort? The answer is the same. While you design your class to survive some hypothetical change in the implementation, you did not design it for the very common task of sorting. Of course, the proponents of getters and setters will suggest that you extend your interface with a member function sort. After you do that, you will discover that you need binary search and median, etc. Very soon your class will have 30 member functions but, of course, it will be hiding the implementation. And that could be done only if you are the owner of the class. Otherwise, you need to implement a decent sorting algorithm on top of the setter-getter interface from scratch and that is a far more difficult and dangerous activity than one can imagine. ...
Setters and getters make our daily programming hard but promise huge rewards in the future when we discover better ways to store arrays of integers in memory. But I do not know a single realistic scenario when hiding memory locations inside our data structure helps and exposure hurts; it is, therefore, my obligation to expose a much more convenient interface that also happens to be consistent with the familiar interface to the C arrays. When we program in C++ we should not be ashamed of its C heritage, but make full use of it. The only problems with C++, and even the only problems with C, arise when they themselves are not consistent with their own logic. ...
My remark about exposing the address locations of consecutive integers is not facetious.
It took a major effort to convince the standard committee that such a requirement is an
essential property of vectors; they would not, however, agree that vector iterators should
be pointers and, therefore, on several major platforms – including the Microsoft one – it
is faster to sort your vector by saying the unbelievably ugly
if (!v.empty()) {
sort(&*v.begin(), &*v.begin() + v.size());
}
than the intended
sort(v.begin(), v.end());
Attempts to impose pseudo-abstractness at the cost of efficiency can be defeated, but at a terrible cost.
Stepanov has a lot of other interesting documents available, especially in the "Class Notes" section.
Yes, there are several rules of thumb regarding OOP. No, I'm not convinced that they are really the best way to do things. When you're working with the STL it makes a lot of sense to do things the STL compatible way. And when your abstraction is low level (like std::vector, which is meant specifically to make working with dynamically allocated arrays easier; i.e., it should be usable almost like an array with some added features), then some of those OOP rules of thumb make no sense at all.
To answer the original question: even beginners will eventually need to learn about iterators, object lifetimes, and what I'll call an object's useful life (i.e., "the object hasn't fallen out of scope, but is no longer valid to use, like an invalidated iterator"). I don't see any reason to try to hide those facts of life from the user, so I personally wouldn't rule out an iterator-based API on those grounds. The real question is what your API is meant to abstract and what's it's meant to expose (similar to the fact that a vector is a nicer array and is meant to expose its array nature). If you answer that, you should have a better idea about whether an iterator-based API makes sense.
As Scott Meyers states in Effective C++: yes it is indeed not a good design to grant access to private/protected members via pointers, iterators or references because you never know what the client code will do with it.
As far as I can remember this should be avoided, and it is sometimes better to create a copy of data members which are then returned to the caller.
It is a bad or faulty implementation rather than design.
As for providing access to private or protected members through pointers, basically it destroys one of the basic OOP principle of Abstraction.
I am unsure though as to what the question is, Yes ofcourse it is bad to have implementation which invalidates iterator. What is the real Q here?

immutable strings vs std::string

I've recent been reading about immutable strings Why can't strings be mutable in Java and .NET? and Why .NET String is immutable? as well some stuff about why D chose immutable strings. There seem to be many advantages.
trivially thread safe
more secure
more memory efficient in most use cases.
cheap substrings (tokenizing and slicing)
Not to mention most new languages have immutable strings, D2.0, Java, C#, Python, etc.
Would C++ benefit from immutable strings?
Is it possible to implement an immutable string class in c++ (or c++0x) that would have all of these advantages?
update:
There are two attempts at immutable strings const_string and fix_str. Neither have been updated in half a decade. Are they even used? Why didn't const_string ever make it into boost?
I found most people in this thread do not really understand what immutable_string is. It is not only about the constness. The really power of immutable_string is the performance (even in single thread program) and the memory usage.
Imagine that, if all strings are immutable, and all string are implemented like
class string {
char* _head ;
size_t _len ;
} ;
How can we implement a sub-str operation? We don't need to copy any char. All we have to do is assign the _head and the _len. Then the sub-string shares the same memory segment with the source string.
Of course we can not really implement a immutable_string only with the two data members. The real implementation might need a reference-counted(or fly-weighted) memory block. Like this
class immutable_string {
boost::fly_weight<std::string> _s ;
char* _head ;
size_t _len ;
} ;
Both the memory and the performance would be better than the traditional string in most cases, especially when you know what you are doing.
Of course C++ can benefit from immutable string, and it is nice to have one. I have checked the boost::const_string and the fix_str mentioned by Cubbi. Those should be what I am talking about.
As an opinion:
Yes, I'd quite like an immutable string library for C++.
No, I would not like std::string to be immutable.
Is it really worth doing (as a standard library feature)? I would say not. The use of const gives you locally immutable strings, and the basic nature of systems programming languages means that you really do need mutable strings.
My conclusion is that C++ does not require the immutable pattern because it has const semantics.
In Java, if you have a Person class and you return the String name of the person with the getName() method, your only protection is the immutable pattern. If it would not be there you would have to clone() your strings all night and day (as you have to do with data members that are not typical value-objects, but still needs to be protected).
In C++ you have const std::string& getName() const. So you can write SomeFunction(person.getName()) where it is like void SomeFunction(const std::string& subject).
No copy happened
If anyone wants to copy he is free to do so
Technique applies to all data types, not just strings
You're certainly not the only person who though that. In fact, there is const_string library by Maxim Yegorushkin, which seems to have been written with inclusion into boost in mind. And here's a little newer library, fix_str by Roland Pibinger. I'm not sure how tricky would full string interning at run-time be, but most of the advantages are achievable when necessary.
I don't think there's a definitive answer here. It's subjective—if not because personal taste then at least because of the type of code one most often deals with. (Still, a valuable question.)
Immutable strings are great when memory is cheap—this wasn't true when C++ was developed, and it isn't the case on all platforms targeted by C++. (OTOH on more limited platforms C seems much more common than C++, so that argument is weak.)
You can create an immutable string class in C++, and you can make it largely compatible with std::string—but you will still lose when comparing to a built-in string class with dedicated optimizations and language features.
std::string is the best standard string we get, so I wouldn't like to see any messing with it. I use it very rarely, though; std::string has too many drawbacks from my point of view.
const std::string
There you go. A string literal is also immutable, unless you want to get into undefined behavior.
Edit: Of course that's only half the story. A const string variable isn't useful because you can't make it reference a new string. A reference to a const string would do it, except that C++ won't allow you to reassign a reference as in other languages like Python. The closest thing would be a smart pointer to a dynamically allocated string.
Immutable strings are great if, whenever it's necessary to create a new a string, the memory manager will always be able to determine determine the whereabouts of every string reference. On most platforms, language support for such ability could be provided at relatively modest cost, but on platforms without such language support built in it's much harder.
If, for example, one wanted to design a Pascal implementation on x86 that supported immutable strings, it would be necessary for the string allocator to be able to walk the stack to find all string references; the only execution-time cost of that would be requiring a consistent function-call approach [e.g. not using tail calls, and having every non-leaf function maintain a frame pointer]. Each memory area allocated with new would need to have a bit to indicate whether it contained any strings and those that do contain strings would need to have an index to a memory-layout descriptor, but those costs would be pretty slight.
If a GC wasn't table to walk the stack, then it would be necessary to have code use handles rather than pointers, and have code create string handles when local variables come into scope, and destroy the handles when they go out of scope. Much greater overhead.
Qt also uses immutable strings with copy-on-write.
There is some debate about how much performance it really buys you with decent compilers.
constant strings make little sense with value semantics, and sharing isn't one of C++'s greatest strengths...
Strings are mutable in Ruby.
$ irb
>> foo="hello"
=> "hello"
>> bar=foo
=> "hello"
>> foo << "world"
=> "helloworld"
>> print bar
helloworld=> nil
trivially thread safe
I would tend to forget safety arguments. If you want to be thread-safe, lock it, or don't touch it. C++ is not a convenient language, have your own conventions.
more secure
No. As soon as you have pointer arithmetics and unprotected access to the address space, forget about being secure. Safer against innocently bad coding, yes.
more memory efficient in most use cases.
Unless you implement CPU-intensive mechanisms, I don't see how.
cheap substrings (tokenizing and slicing)
That would be one very good point. Could be done by referring to a string with backreferences, where modifications to a string would cause a copy. Tokenizing and slicing become free, mutations become expensive.
C++ strings are thread safe, all immutable objects are guaranteed to be thread safe but Java's StringBuffer is mutable like C++ string is and the both of them are thread safe. Why worry about speed, define your method or function parameters with the const keyword to tell the compiler the string will be immutable in that scope. Also if string object is immutable on demand, waiting when you absolutely need to use the string, in other words, when you append other strings to the main string, you have a list of strings until you actually need the whole string then they are joined together at that point.
immutable and mutable object operate at the same speed to my knowledge , except their methods which is a matter of pro and cons. constant primitives and variable primitives move at different speeds because at the machine level, variables are assigned to a register or a memory space which require a few binary operations, while constants are labels that don't require any of those and are thus faster (or less work is done). works only for primitives and not for object.

How is std::string implemented?

I am curious to know how std::string is implemented and how does it differ from c string?If the standard does not specify any implementation then any implementation with explanation would be great with how it satisfies the string requirement given by standard?
Virtually every compiler I've used provides source code for the runtime - so whether you're using GCC or MSVC or whatever, you have the capability to look at the implementation. However, a large part or all of std::string will be implemented as template code, which can make for very difficult reading.
Scott Meyer's book, Effective STL, has a chapter on std::string implementations that's a decent overview of the common variations: "Item 15: Be aware of variations in string implementations".
He talks about 4 variations:
several variations on a ref-counted implementation (commonly known as copy on write) - when a string object is copied unchanged, the refcount is incremented but the actual string data is not. Both object point to the same refcounted data until one of the objects modifies it, causing a 'copy on write' of the data. The variations are in where things like the refcount, locks etc are stored.
a "short string optimization" (SSO) implementation. In this variant, the object contains the usual pointer to data, length, size of the dynamically allocated buffer, etc. But if the string is short enough, it will use that area to hold the string instead of dynamically allocating a buffer
Also, Herb Sutter's "More Exceptional C++" has an appendix (Appendix A: "Optimizations that aren't (in a Multithreaded World)") that discusses why copy on write refcounted implementations often have performance problems in multithreaded applications due to synchronization issues. That article is also available online (but I'm not sure if it's exactly the same as what's in the book):
http://www.gotw.ca/publications/optimizations.htm
Both those chapters would be worthwhile reading.
std::string is a class that wraps around some kind of internal buffer and provides methods for manipulating that buffer.
A string in C is just an array of characters
Explaining all the nuances of how std::string works here would take too long. Maybe have a look at the gcc source code http://gcc.gnu.org to see exactly how they do it.
There's an example implementation in an answer on this page.
In addition, you can look at gcc's implementation, assuming you have gcc installed. If not, you can access their source code via SVN. Most of std::string is implemented by basic_string, so start there.
Another possible source of info is Watcom's compiler
The c++ solution for strings are quite different from the c-version. The first and most important difference is while the c using the ASCIIZ solution, the std::string and std::wstring are using two iterators (pointers) to store the actual string. The basic usage of the string classes provides a dynamic allocated solution, so in the cost of CPU overhead with the dynamic memory handling it makes the string handling more comfortable.
As you probably already know, the C doesn't contain any built-in generic string type, only provides couple of string operations through the standard library. One of the major difference between C and C++ that the C++ provides a wrapped functionality, so it can be considered as a faked generic type.
In C you need to walk through the string if you would like to know the length of it, the std::string::size() member function is only one instruction (end - begin) basically. You can safely append strings one to an other as long as you have memory, so there is no need to worry about the buffer overflow bugs (and therefore the exploits), because the appending creates a bigger buffer if it is needed.
As somebody told here before, the string is derivated from the vector functionality, in a templated way, so it makes easier to deal with the multibyte-character systems. You can define your own string type using the typedef std::basic_string specific_str_t; expression with any arbitary data type in the template parameter.
I think there are enough pros and contras both side:
C++ string Pros:
- Faster iteration in certain cases (using the size definitely, and it doesn't need the data from the memory to check if you are at the end of the string, comparing two pointers. that could make a difference with the caching)
- The buffer operation are packed with the string functionality, so less worries about the buffer problems.
C++ string Cons:
- due to the dynamic memory allocation stuff, the basic usage could cause impact on the performance. (fortunately you can tell to the string object what should be the original buffer size, so unless you are exceed it, it won't allocate dynamic blocks from the memory)
- often weird and inconsistent names compared to other languages. this is the bad thing about any stl stuff, but you can use to it, and it makes a bit specific C++ish feeling.
- the heavy usage of the templating forces the standard library to use header based solutions so it is a big impact on the compiling time.
That depends on the standard library you use.
STLPort for example is a C++ Standard Library implementation which implements strings among other things.

What are the often misunderstood concepts in C++? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
What are the often misunderstood concepts in c++?
C++ is not C with classes!
And there is no language called C/C++. Everything goes downhill from there.
That C++ does have automatic resource management.
(Most people who claim that C++ does not have memory management try to use new and delete way too much, not realising that if they allowed C++ to manage the resource themselves, the task gets much easier).
Example: (Made with a made up API because I do not have time to check the docs now)
// C++
void DoSomething()
{
File file("/tmp/dosomething", "rb");
... do stuff with file...
// file is automatically free'ed and closed.
}
// C#
public void DoSomething()
{
File file = new File("/tmp/dosomething", "rb");
... do stuff with file...
// file is NOT automatically closed.
// What if the caller calls DoSomething() in a tight loop?
// C# requires you to be aware of the implementation of the File class
// and forces you to accommodate, thus voiding implementation-hiding
// principles.
// Approaches may include:
// 1) Utilizing the IDisposable pattern.
// 2) Utilizing try-finally guards, which quickly gets messy.
// 3) The nagging doubt that you've forgotten something /somewhere/ in your
// 1 million loc project.
// 4) The realization that point #3 can not be fixed by fixing the File
// class.
}
Free functions are not bad just because they are not within a class C++ is not an OOP language alone, but builds upon a whole stack of techniques.
I've heard it many times when people say free functions (those in namespaces and global namespace) are a "relict of C times" and should be avoided. Quite the opposite is true. Free functions allow to decouple functions from specific classes and allow reuse of functionality. It's also recommended to use free functions instead of member functions if the function don't need access to implementation details - because this will eliminate cascading changes when one changes the implementation of a class among other advantages.
This is also reflected in the language: The range-based for loop in C++0x (next C++ version released very soon) will be based on free function calls. It will get begin / end iterators by calling the free functions begin and end.
The difference between assignment and initialisation:
string s = "foo"; // initialisation
s = "bar"; // assignment
Initialisation always uses constructors, assignment always uses operator=
In decreasing order:
make sure to release pointers for allocated memory
when destructors should be virtual
how virtual functions work
Interestingly not many people know the full details of virtual functions, but still seem to be ok with getting work done.
The most pernicious concept I've seen is that it should be treated as C with some addons. In fact, with modern C++ systems, it should be treated as a different language, and most of the C++-bashing I see is based on the "C with add-ons" model.
To mention some issues:
While you probably need to know the difference between delete and delete[], you should normally be writing neither. Use smart pointers and std::vector<>.
In fact, you should be using a * only rarely. Use std::string for strings. (Yes, it's badly designed. Use it anyway.)
RAII means you don't generally have to write clean-up code. Clean-up code is bad style, and destroys conceptual locality. As a bonus, using RAII (including smart pointers) gives you a lot of basic exception safety for free. Overall, it's much better than garbage collection in some ways.
In general, class data members shouldn't be directly visible, either by being public or by having getters and setters. There are exceptions (such as x and y in a point class), but they are exceptions, and should be considered as such.
And the big one: there is no such language as C/C++. It is possible to write programs that can compile properly under either language, but such programs are not good C++ and are not normally good C. The languages have been diverging since Stroustrup started working on "C with Classes", and are less similar now than ever. Using "C/C++" as a language name is prima facie evidence that the user doesn't know what he or she is talking about. C++, properly used, is no more like C than Java or C# are.
The overuse of inheritance unrelated to polymorphism. Most of the time, unless you really do use runtime polymorphism, composition or static polymorphism (i.e., templates) is better.
The static keyword which can mean one of three distinct things depending on where it is used.
It can be a static member function or member variable.
It can be a static variable or function declared at namespace scope.
It can be a static variable declared inside a function.
Arrays are not pointers
They are different. So &array is not a pointer to a pointer, but a pointer to an array. This is the most misunderstood concept in both C and C++ in my opinion. You gotta have a visit to all those SO answers that tell to pass 2-d arrays as type** !
Here is an important concept in C++ that is often forgotten:
C++ should not be simply used like an object
oriented language such as Java or C#.
Inspire yourself from the STL and write generic code.
Here are some:
Using templates to implement polymorphism without vtables, à la ATL.
Logical const-ness vs actual const-ness in memory. When to use the mutable keyword.
ACKNOWLEDGEMENT: Thanks for correcting my mistake, spoulson.
EDIT:
Here are more:
Virtual inheritance (not virtual methods): In fact, I don't understand it at all! (by that, I mean I don't know how it's implemented)
Unions whose members are objects whose respective classes have non-trivial constructors.
Given this:
int x = sizeof(char);
what value is X?
The answer you often hear is dependant on the level of understanding of the specification.
Beginner - x is one because chars are always eight bit values.
Intermediate - it depends on the compiler implementation, chars could be UTF16 format.
Expert - x is one and always will be one since a char is the smallest addressable unit of memory and sizeof determines the number of units of memory required to store an instance of the type. So in a system where a char is eight bits, a 32 bit value will have a sizeof of 4; but in a system where a char is 16 bits, a 32 bit value will have a sizeof of 2.
It's unfortunate that the standard uses 'byte' to refer to a unit of memory since many programmers think of 'byte' as being eight bits.
C++ is a multi-paradigm language. Many people associate C++ strictly with OOP.
a classic among beginners to c++ from c:
confuse delete and delete[]
EDIT:
another classic failure among all levels of experience when using C API:
std::string helloString = "hello world";
printf("%s\n", helloString);
instead of:
printf("%s\n", helloString.c_str());
it happens to me every week. You could use streams, but sometimes you have to deal with printf-like APIs.
Pointers.
Dereferencing the pointers. Through either . or ->
Address of using & for when a pointer is required.
Functions that take params by reference by specifing a & in the signature.
Pointer to pointers to pointers *** or pointers by reference void someFunc(int *& arg)
There are a few things that people seem to be constantly confused by or have no idea about:
Pointers, especially function pointers and multiple pointers (e.g. int(*)(void*), void***)
The const keyword and const correctness (e.g. what is the difference between const char*, char* const and const char* const, and what does void class::member() const; mean?)
Memory allocation (e.g. every pointer new'ed should be deleted, malloc/free should not be mixed with new/delete, when to use delete [] instead of delete, why the C functions are still useful (e.g. expand(), realloc()))
Scope (i.e. that you can use { } on its own to create a new scope for variable names, rather than just as part of if, for etc...)
Switch statements. (e.g. not understanding that they can optimise as well (or better in some cases) than chains of ifs, not understanding fall through and its practical applications (loop unrolling as an example) or that there is a default case)
Calling conventions (e.g. what is the difference between cdecl and stdcall, how would you implement a pascal function, why does it even matter?)
Inheritance and multiple inheritance and, more generally, the entire OO paradigm.
Inline assembler, as it is usually implemented, is not part of C++.
Pointers to members and pointers to member functions.
Non-type template parameters.
Multiple inheritance, particularly virtual base classes and shared base objects.
Order of construction and destruction, the state of virtual functions in the middle of constructing an intermediate base class.
Cast safety and variable sizes. No, you can't assume that sizeof(void *) == sizeof(int) (or any other type for that matter, unless a portable header specifically guarantees it) in portable code.
Pointer arithmetic.
Headers and implementation files
This is also a concept misunderstood by many. Questions like what goes into header files and why it causes link errors if function definitions appear multiple times in a program on the one side but not when class definitions appear multiple times on the other side.
Very similar to those questions is why it is important to have header guards.
If a function accepts a pointer to a pointer, void* will still do it
I've seen that the concept of a void pointer is frequently confused. It's believed that if you have a pointer, you use a void*, and if you have a pointer to a pointer, you use a void**. But you can and should in both cases use void*. A void** does not have the special properties that a void* has.
It's the special property that a void* can also be assigned a pointer to a pointer and when cast back the original value is received.
I think the most misunderstood concept about C++ is why it exists and what its purpose is. Its often under fire from above (Java, C# etc.) and from below (C). C++ has the ability to operate close to the machine to deal with computational complexity and abstraction mechanisms to manage domain complexity.
NULL is always zero.
Many confuse NULL with an address, and think therefor it's not necessarily zero if the platform has a different null pointer address.
But NULL is always zero and it is not an address. It's an zero constant integer expression that can be converted to pointer types.
Memory Alignment.
std::vector does not create elements when reserve is used
I've seen it that programmers argue that they can access members at positions greater than what size() returns if they reserve()'ed up to that positions. That's a wrong assumption but is very common among programmers - especially because it's quite hard for the compiler to diagnose a mistake, which will silently make things "work".
Hehe, this is a silly reply: the most misunderstood thing in C++ programming is the error messages from g++ when template classes fail to compile!
C++ is not C with string and vector!
C structs VS C++ structs is often misunderstood.
C++ is not a typical object oriented language.
Don't believe me? look at the STL, way more templates than objects.
It's almost impossible to use Java/C# ways of writing object oriented code; it simply doesn't work.
In Java/C# programming, there's alot of newing, lots of utility objects that implement some single cohesive functionality.
In C++, any object newed must be deleted, but there's always the problem of who owns the object
As a result, objects tend to be created on the stack
But when you do that, you have to copy them around all the time if you're going to pass them around to other functions/objects, thus wasting a lot of performance that is said to be achieved with the unmanaged environment of C++
Upon realizing that, you have to think about other ways of organizing your code
You might end up doing things the procedural way, or using metaprogramming idioms like smart pointers
At this point, you've realized that OO in C++ cannot be used the same way as it is used in Java/C#
Q.E.D.
If you insist on doing oop with pointers, you'll usually have large (gigantic!) classes, with clearly defined ownership relationships between objects to avoid memory leaks. And then even if you do that, you're already too far from the Java/C# idiom of oop.
Actually I made up the term "object-oriented", and I can tell you I did not have C++ in mind.
-- Alan Kay (click the link, it's a video, the quote is at 10:33)
Although from a purist point of view (e.g. Alan Kay), even Java and C# fall short of true oop
A pointer is an iterator, but an iterator is not always a pointer
This is also an often misunderstood concept. A pointer to an object is a random access iterator: It can be incremented/decremented by an arbitrary amount of elements and can be read and written. However, an iterator class that has operator overloads doing that fulfill those requirements too. So it is also an iterator but is of course not a pointer.
I remember one of my past C++ teachers was teaching (wrongly) that you get a pointer to an element of a vector if you do vec.begin(). He was actually assuming - without knowing - that the vector implements its iterators using pointers.
That anonymous namespaces are almost always what is truly wanted when people are making static variables in C++
When making library header files, the pimpl idiom (http://www.gotw.ca/gotw/024.htm) should be used for almost all private functions and members to aid in dependency management
I still don't get why vector doesn't have a pop_front and the fact that I can't sort(list.begin(), list.end())..