c++ stl library what are these methods with underscores?

c++ stl library what are these methods with underscores? - c++

I have some methods which are not referenced in any of the cpp references over the internet
For example in "memory"
Shared_ptr's have a method called "_Expired"
It returns a boolean if the ptr is expired or not
I thought that only weakptr's have this...

They are internal functions that are part of the implementation. By having a name that starts with _ and an uppercase letter, they are defined as "implementation specific". Normal code should not use _ + uppercase names, so it's "safe" to for the implementation to use such names.
Note that there is nothing "meaningful" that you can get from these types of methods, member variables, etc, because it's part of the implementation, which will be different in a different system or using a different compiler, and even between different versions of the same STL implementation.
Exactly why a particular implementation is the way it is will be up to the designer of that implementation. Maybe they are sharing shared pointers implementation with weak pointer?
(The STL for g++ 4.6.3 doesn't have this particular construct!)

There are many functions in STL methods/classes named starting with an underscore followed by an uppercase letter (i.g. _Expired()). They are implemented mainly for internal use and are hidden to high level programmers.

Related

Mentality behind GNU _M_ prefixing

If we take a look at GNU's implementation of libstdc++, I've noticed that in the implementations of the standard classes that private member functions of various classes are prefixed with _M_. For example, std::basic_string<> has among others a member called bool _M_is_shared() const;.
I understand the motivation to have some sort of naming convention for private member variables. This helps is distinguishing between class members and function local variables visually. But I don't get why the _M_ prefix is preferred for private member functions.
If I see some code that called for example: is_shared(); there is essentially only a few options:
it's a member function of this class
it's a member function of a parent class
it's a global function.
The first two, would both have the prefix so it's no help. The last one wont happen in any sane implementation because of namespace pollution concerns. The only globals the library should introduce are ones prescribed by the standard. So here's the crux of the question...
Since private member functions aren't publicly accessible. Can't effect derived classes in any way. I don't think that name collisions are really a concern here... and basically these are nothing more than a private implementation detail. Why bother with the (IMO) ugly _M_ prefixing? Is there something in the standard that disallows introducing extra private members? If so that would strike me as silly unless there is something I am missing.

Identifiers beginning with an underscore and then a capital letter, or beginning with two underscores, are "reserved for the implementation in all contexts".
This means it would be illegal according to the Standard for someone's program to #define _M_is_shared false and break the library header file. If they used more ordinary identifiers, there would be greater risk of such a name collision in otherwise valid programs.

The standard specifies that names starting with double underscores, or an underscore followed by a capital letter, are reserved for internal compiler or library symbols.
You pointed out that these private symbols won't be accessible, but don't forget about macros and defines. Basically, the idea is as simple as "Let's replace m_member with _M_member".
The relevant part of the standard is 17.4.3.1.2 Global names
Each name the contains a double underscore __ or begins with an
underscore followed by an uppercase letter is reserved to the
implementation for any use.

Safely use containers in C++ library interface

When designing a C++ library, I read it is bad practice to include standard library containers like std::vector in the public interface (see e.g. Implications of using std::vector in a dll exported function).
What if I want to expose a function that takes or returns a list of objects? I could use a simple array, but then I would have to add a count parameter, which makes the interface more cumbersome and less safe. Also it wouldn't help much if I wanted to use a map, for example. I guess libraries like Qt define their own containers which are safe to export, but I'd rather not add Qt as a dependency, and I don't want to roll my own containers.
What's the best practice to deal with containers in the library interface? Is there maybe a tiny container implementation (preferably just one or two files I can drop in, with a permissive license) that I can use as "glue"? Or is there even a way to make std::vector etc. safe across .DLL/.so boundaries and with different compilers?

You can implement a template function. This has two advantages:
It lets your users decide what sorts of containers they want to use with your interface.
It frees you from having to worry about ABI compatibility, because there is no code in your library, it will be instantiated when the user invokes the function.
For example, put this in your header file:
template <typename Iterator>
void foo(Iterator begin, Iterator end)
{
for (Iterator it = begin; it != end; ++it)
bar(*it); // a function in your library, whose ABI doesn't depend on any container
}
Then your users can invoke foo with any container type, even ones they invented that you don't know about.
One downside is that you'll need to expose the implementation code, at least for foo.
Edit: you also said you might want to return a container. Consider alternatives like a callback function, as in the gold old days in C:
typedef bool(*Callback)(int value, void* userData);
void getElements(Callback cb, void* userData) // implementation in .cpp file, not header
{
for (int value : internalContainer)
if (!cb(value, userData))
break;
}
That's a pretty old school "C" way, but it gives you a stable interface and is pretty usable by basically any caller (even actual C code with minor changes). The two quirks are the void* userData to let the user jam some context in there (say if they want to invoke a member function) and the bool return type to let the callback tell you to stop. You can make the callback a lot fancier with std::function or whatever, but that might defeat some of your other goals.

Actually this is not only true for STL containers but applies to pretty much any C++ type (in particular also all other standard library types).
Since the ABI is not standardized you can run into all kinds of trouble. Usually you have to provide separate binaries for each supported compiler version to make it work. The only way to get a truly portable DLL is to stick with a plain C interface. This usually leads to something like COM, since you have to ensure that all allocations and matching deallocations happen in the same module and that no details of the actual object layout are exposed to the user.

TL;DR There is no issue if you distribute either the source code or compiled binaries for the various supported sets of (ABI + Standard Library implementation).
In general, the latter is seen as cumbersome (with reasons), thus the guideline.
I trust hand-waving guidelines about as far as I can throw them... and I encourage you to do the same.
This guidelines originates from an issue with ABI compatibility: the ABI is a complex set of specifications that defines the exact interface of a compiled library. It is includes notably:
the memory layout of structures
the name mangling of functions
the calling conventions of functions
the handling of exception, runtime type information, ...
...
For more details, check for example the Itanium ABI. Contrary to C which has a very simple ABI, C++ has a much more complicated surface area... and therefore many different ABIs were created for it.
On top of ABI compatibility, there is also an issue with Standard Library Implementation. Most compilers come with their own implementation of the Standard Library, and these implementations are incompatible with each others (they do not, for example, represent a std::vector the same way, even though all implement the same interface and guarantees).
As a result, a compiled binary (executable or library) may only be mixed and matched with another compiled binary if both were compiled against the same ABI and with compatible versions of a Standard Library implementation.
Cheers: no issue if you distribute source code and let the client compile.

If you are using C++11, you can use cppcomponents. https://github.com/jbandela/cppcomponents
This will allow you to use among other things std::vector as a parameter or return value across Dll/or .so files created using different compilers or standard libraries. Take a look at my answer to a similar question for an example Passing reference to STL vector over dll boundary
Note for the example, you need to add a CPPCOMPONENTS_REGISTER(ImplementFiles) after the CPPCOMPONENTS_DEFINE_FACTORY() statement

unique type identifiers across different C++ programs

Is there a way to automatically (i.e. not by hand) assign unique identifiers to types in different programs that share common source code? I'd need one program to tell another "use type X" and the other would know what that "X" meant. Of course, they would (partially) share the source code, as you cannot construct types in runtime, I just want an automatic way of constructing a map from some sort of identifiers (integers or strings) to e.g. factory functions returning objects of given type.
An obvious choice I'd go for is result of name() in std::type_info, but as I understand, that is not even guaranteed to be different across types, and using address of std::type_info instances is certainly not going to work across programs.
I cannot use C++11, but can use Boost for this.

I just want an automatic way of constructing a map from some sort of
identifiers (integers or strings) to e.g. factory functions returning
objects of given type.
Not going to happen, not within Standard C++, anyway.

You could take a look at boost serialisation. It automatically generates unique ids for polimorphic classes and allows the explicit registration of non polimorphic ones.

Why and how should I use namespaces in C++?

I have never used namespaces for my code before. (Other than for using STL functions)
Other than for avoiding name conflicts, is there any other reason to use namespaces?
Do I have to enclose both declarations and definitions in namespace scope?

One reason that's often overlooked is that simply by changing a single line of code to select one namespaces over another you can select an alternative set of functions/variables/types/constants - such as another version of a protocol, or single-threaded versus multi-threaded support, OS support for platform X or Y - compile and run. The same kind of effect might be achieved by including a header with different declarations, or with #defines and #ifdefs, but that crudely affects the entire translation unit and if linking different versions you can get undefined behaviour. With namespaces, you can make selections via using namespace that only apply within the active namespace, or do so via a namespace alias so they only apply where that alias is used, but they're actually resolved to distinct linker symbols so can be combined without undefined behaviour. This can be used in a way similar to template policies, but the effect is more implicit, automatic and pervasive - a very powerful language feature.
UPDATE: addressing marcv81's comment...
Why not use an interface with two implementations?
"interface + implementations" is conceptually what choosing a namespace to alias above is doing, but if you mean specifically runtime polymorphism and virtual dispatch:
the resultant library or executable doesn't need to contain all implementations and constantly direct calls to the selected one at runtime
as one implementation's incorporated the compiler can use myriad optimisations including inlining, dead code elimination, and constants differing between the "implementations" can be used for e.g. sizes of arrays - allowing automatic memory allocation instead of slower dynamic allocation
different namespaces have to support the same semantics of usage, but aren't bound to support the exact same set of function signatures as is the case for virtual dispatch
with namespaces you can supply custom non-member functions and templates: that's impossible with virtual dispatch (and non-member functions help with symmetric operator overloading - e.g. supporting 22 + my_type as well as my_type + 22)
different namespaces can specify different types to be used for certain purposes (e.g. a hash function might return a 32 bit value in one namespace, but a 64 bit value in another), but a virtual interface needs to have unifying static types, which means clumsy and high-overhead indirection like boost::any or boost::variant or a worst case selection where high-order bits are sometimes meaningless
virtual dispatch often involves compromises between fat interfaces and clumsy error handling: with namespaces there's the option to simply not provide functionality in namespaces where it makes no sense, giving a compile-time enforcement of necessary client porting effort

Here is a good reason (apart from the obvious stated by you).
Since namespace can be discontiguous and spread across translation units, they can also be used to separate interface from implementation details.
Definitions of names in a namespace can be provided either in the same namespace or in any of the enclosing namespaces (with fully qualified names).

It can help you for a better comprehension.
eg:
std::func <- all function/class from C++ standard library
lib1::func <- all function/class from specific library
module1::func <-- all function/class for a module of your system
You can also think of it as module in your system.
It can also be usefull for an writing documentation (eg: you can easily document namespace entity in doxygen)

Aren't name collisions enough of a reason? ADL subtleties, especially with operator overloads, are another.
That's the easiest way. You can also prefix names with the namespace, e.g. my_namespace::name, when defining.

You can think of namespaces as logical separated units for your application, and logical here means that suppose we have two different classes, putting these two classes each in a file, but when you notice that these classes share something enough to be categorized under one category, that's one strong reason to use namespaces.

Answer: If you ever want to overload the new, placement new, or delete functions you're going to want to do them in a namespace. No one wants to be forced to use your version of new if they don't require the things you require.
Yes

Trailing underscores for member variables in C++

I've seen people use a trailing underscore for member variables in classes, for instance in the renowned C++ FAQ Lite.
I think that it's purpose is not to mark variables as members, that's what "m_" is for. It's actual purpose is to make it possible to have an accessor method named like the field, like this:
class Foo {
public:
bar the_bar() { return the_bar_; }
private:
bar the_bar_;
}
Having accessors omit the "get_" part is common in the STL and boost, and I'm trying to develop a coding style as close to these as possible, but I can't really see them using the underscore trick. I wasn't able to find an accessor in STL or boost that would just return a private variable.
I have a few questions I'm hoping you will be able to answer:
Where does this convention come from? Smalltalk? Objective-C? Microsoft? I'm wondering.
Would I use the trailing underscore for all private members or just as a workaround in case I want to name a function like a variable?
Can you point me to STL or boost code that demonstrates trailing underscores for member variables?
Does anybody know what Stroustrup's views on the issue are?
Can you point me to further discussion of the issue?

In C++,
identifiers starting with an underscore, followed by a capital character
identifiers having two consecutive underscores anywhere
identifiers in the global namespace starting with an underscore
are reserved to the implementation. (More about this can be found here.) Rather than trying to remember these rules, many simply do not use identifiers starting with an underscore. That's why the trailing underscore was invented.
However, C++ itself is old, and builds on 40 years of C (both of which never had a single company behind them), and has a standard library that has "grown" over several decades, rather than brought into being in a single act of creation. This makes for the existence of a lot of differing naming conventions. Trailing underscore for privates (or only for private data) is but one, many use other ones (not few among them arguing that, if you need underscores to tell private members from local variables, your code isn't clear enough).
As for getters/setters - they are an abomination, and a sure sign of "quasi classes", which I hate.

I've read The C++ Programming Language and Stroustrup doesn't use any kind of convention for naming members. He never needs to; there is not a single simple accessor/mutator, he has a way of creating very fine object-oriented designs so there's no need to have a method of the same name. He uses structs with public members whenever he needs simple data structures. His methods always seem to be operations. I've also read somewhere that he disencourages the use of names that differ only by one character.

I am personally a big fan of this guideline: http://geosoft.no/development/cppstyle.html
It includes omitting the m_ prefix, using an underscore suffix to indicate private member variables and dropping the horrid, annoying-to-type habit of using underscores instead of space, and other, more detailed and specific suggestions, such as naming bools appropriately(isDone instead of just done) and using getVariable() instead of just variable() to name a few.

Only speaking for myself...
I always use trailing underscore for private data members, regardless if they have accessor functions or not. I don't use m_ mainly because it gets in the way when I mentally spell the variable's name.

As a maintenance developer that likes searchability I'm leaning towards m_ as its more searchable. When you, as me, are maintaining big projects with large classes (don't ask) you sometimes wonder: "Hmmm, who mutates state?". A quick search for m_ can give a hint.
I've also been known to use l_ to indicate local variables but the current project doesn't use that so I'm "clean" these days.
I'm no fan of hungarian notation. C++ has a strong type system, I use that instead.

I'm guessing that utopia would have been to use a leading underscore - this is quite common in Java and C# for members.
However, for C, leading underscores aren't a good idea, so hence I guess the recommendation by the C++ FAQ Lite to go trailing underscore:
All identifiers that begin with an
underscore and either an uppercase
letter or another underscore are
always reserved for any use.
All identifiers that begin with an
underscore are always reserved for use
as identifiers with file scope in both
the ordinary and tag name spaces.
(ISO C99 specification, section 7.1.3)

As far as I remember, it's not Microsoft that pushed the trailing underscore code style for members.
I have read that Stroustrup is pro the trailing underscore.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js