How to view standard library functions in C++?

For example, I want to see the code of the function toupper() to understand how it works. Is there any way? I have searched and opened string.h library, but didn't find anything.

From a strict language point of view, you cannot "see the code" of a standard function, because the C++ language standard only defines functions' prototypes and behaviours, not how they are implemented.
In fact, from a strict language point of view, a standard function like toupper does not even have to have source code, because a standard header, like <string.h> does not even have to be a file!
Of course, in practice, you will probably never encounter a C++ implementation in which standard headers are not files, because files are just a natural and simple implementation of headers. This means that in practice, for the header <string.h>, there is actually a C++ source file called "string.h" somewhere on your computer. Just find it and open it.
"I have searched and opened string.h library, but didn't find anything."
Then you have not looked closely enough. Hint: This file most likely includes one or more other header files.
Note that if you actually looked for toupper, that function is not in <string.h> anyway. Look in <ctype.h> instead. cppreference.com is a good online reference to tell you which headers contain which functions.
http://en.cppreference.com/w/c/string/byte/toupper
Again, this does not mean that the corresponding header file of your compiler contains that function directly, but it may directly or indirectly include some other file which contains it.
In any case, beware of what you will see inside of your compiler's header files. It will usually be a lot more complicated than you may think, and, more importantly, it will often use constructs you are not allowed to use in your own code; after all, the code in those files is internal to the compiler implementation, and the compiler has a lot of privileges you don't have, for example using otherwise forbidden identifiers like _STD_BEGIN. Also expect a lot of completely non-standard #pragmas and other non-portable stuff.
Another important thing to keep in mind is that you are not supposed to dig through a function's implementation to find out what it does. In badly written software, i.e. software with confusing interfaces and no documentation (which exists everywhere in the real world), you unfortunately have to do this, provided you have access to the source code.
But C++ standard functions are perfectly documented and have, with some arguable exceptions, well-designed interfaces. It may be interesting, educational, and sometimes even necessary for debugging to look into their implementation on your system, but don't let this possibility keep you from learning two important software-engineering skills:
Reading documentation.
Programming to interfaces, not to implementations.
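As a small illustration of programming to the documented interface rather than to any particular implementation, here is a minimal sketch. The helper name to_upper_copy is made up for this example. It relies only on what the documentation for toupper states: the argument must be representable as unsigned char (or be EOF), hence the conversion in the lambda.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: returns an upper-cased copy of s.
// The lambda takes unsigned char so that std::toupper never
// receives a negative value, as its documentation requires.
std::string to_upper_copy(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char ch) {
                       return static_cast<char>(std::toupper(ch));
                   });
    return s;
}
```

Nothing here depends on how your compiler's library implements toupper internally, so it should behave the same on any conforming implementation (for the current C locale).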

Yes, of course you can (though perhaps not for every implementation). For example, the glibc implementation defines the toupper function as:
#define __ctype_toupper \
  ((int32_t *) _NL_CURRENT (LC_CTYPE, _NL_CTYPE_TOUPPER) + 128)

int
toupper (int c)
{
  return c >= -128 && c < 256 ? __ctype_toupper[c] : c;
}
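To make the table-lookup idea in that snippet easier to follow, here is a simplified, ASCII-only sketch of the same technique. It is not glibc's code: the names make_table, upper_table and sketch_toupper are invented for illustration, and the table ignores locales entirely. The table has 384 entries so that indices -128..255 can be shifted by 128, which plays the role of the "+ 128" in the macro above.

```cpp
#include <array>
#include <cstdint>

// Hypothetical ASCII-only table covering c in [-128, 255].
// Entry (c + 128) holds the upper-case mapping of c.
static std::array<std::int32_t, 384> make_table() {
    std::array<std::int32_t, 384> t{};
    for (int c = -128; c < 256; ++c) {
        t[c + 128] = (c >= 'a' && c <= 'z') ? c - 'a' + 'A' : c;
    }
    return t;
}

static const std::array<std::int32_t, 384> upper_table = make_table();

// Same shape as the glibc function: a range check, then a table lookup.
int sketch_toupper(int c) {
    return (c >= -128 && c < 256) ? upper_table[c + 128] : c;
}
```

The real implementation fetches its table from the current locale's data, which is why it goes through _NL_CURRENT instead of a fixed array.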

Related

Using STL classes without including their appropriate headers

Consider this code:
#include <vector>
struct S { };
int main()
{
    std::vector<int> v;
    // Do whatever work I need to with v
    // Oh, by the way, I also need std::allocator for something else...
    std::allocator<S> a;
    S s;
    a.construct(&s, S());
    a.destroy(&s);
}
std::allocator is declared in <memory>, but I have not included that header.
Questions:
Can I still rely on std::allocator being fully usable through the inclusion of <vector>? Why/why not?
If so, what other classes can I rely on being included indirectly, and under what conditions?
(Is there a list somewhere, or would I have to figure them out manually?)
Is it good practice to avoid including the specific header (e.g. <memory>) if you've already included another header that implies the inclusion of the class you need? Why/why not?
The C++ standard allows any standard header to include an arbitrary number of other standard headers. It's almost never, however, actually required.
Just for example, it's fairly common to put implementation details in a detail namespace, and then pull names from there to become publicly accessible if and only if the user has included a header that needs to make them visible.
In other words, if you're using something, include the header. This is actually a pretty common source of problems. With older compilers, including one header often ended up defining a lot that that header wasn't required to define. Newer compilers tend to be more granular, so a lot of older code needs minor patching to include the proper headers before it'll work correctly. While not exactly the biggest portability problem that arises, this is sufficiently annoying that it's clearly better to avoid it when/if you can.
Even in the few places there's a documented requirement for one header to include another (or at least do the equivalent), I think it's a fairly poor idea to depend on it. First, the #include lines act as a sort of documentation, and depending on indirect inclusion means anybody reading them as documentation has to take all that indirection into account. Second, it's easy to slip into thinking that because one header has to define a few specific items normally defined in another header, it will automatically define everything in that header, which isn't necessarily true.
Either the standard provides a guarantee or it does not. If it does not, then you have no guarantee. Your "implies the inclusion" argument fails because the compiler is not required to include any more of the class than it needs, and clearly anything you do with the class specifically is more than is needed.
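For what it's worth, here is a sketch of the "include what you use" version of the code from the question. The function name allocate_and_read and the member value are invented for the example. It goes through std::allocator_traits rather than calling a.construct and a.destroy directly, because those member functions were deprecated in C++17 and removed in C++20; that is a substitution on my part, not something from the question.

```cpp
#include <memory>   // std::allocator, std::allocator_traits: included explicitly
#include <vector>   // std::vector

struct S { int value = 7; };

// Hypothetical demo: uses both std::vector and std::allocator,
// with each facility's own header included above.
int allocate_and_read() {
    std::vector<int> v{1, 2, 3};

    std::allocator<S> a;
    using Traits = std::allocator_traits<std::allocator<S>>;

    S* p = Traits::allocate(a, 1);   // raw storage for one S
    Traits::construct(a, p);         // value-initialize an S in that storage
    int result = p->value + static_cast<int>(v.size());
    Traits::destroy(a, p);           // run the destructor
    Traits::deallocate(a, p, 1);     // release the storage
    return result;
}
```

With <memory> included explicitly, this no longer depends on what <vector> happens to pull in on a given implementation.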

Where are Library definitions located?

Could someone tell me where I can find the source files that contain the definitions of the standard libraries?
For example, where is the file string.c that contains the definitions of the functions prototyped in string.h? And above all, does it exist?
It's all shipped in compiled form, and some of it may be optimized with assembly. You need to find the sources for your compiler to see the definitions.
For GCC, which is open source, you can download the sources for the libstdc++ library from their mirror sites here. Included in the download is the source for the standard library. Bear in mind that different vendors will have different implementations, so the link provided is merely how the developers of GCC decided to implement the standard library.
You're probably not going to like this.
The C++ Standard does not say specifically where anything in the Standard Library is actually implemented. It says where things are declared, but only to the degree that it names the file(s) you must #include in order to bring the names in. For example, the Standard says that:
std::string
is a typedef for basic_string<...>, and in order to bring that typedef in to your program, you must #include <string>. It doesn't actually say that basic_string or string are defined in <string> however, and it doesn't say where, on your hard drive <string> is even located. In fact, it's often not in <string> in the real world. In my implementation, (MSVC10) string is defined in a different file, <xstring>, and it looks like this:
typedef basic_string<char, char_traits<char>, allocator<char> >
    string;
Useful, huh?
There's another aspect. A lot of the stuff in the Standard Library is template stuff, like string, so because of the way templates work in C++ these facilities must be so-called "include libraries." But not everything in the Standard Library is made up of templates.
Consider sprintf. The Standard says that its declaration is provided by #include <cstdio>, but, as with string, that isn't necessarily where it's actually declared. And sprintf isn't a template thing. The implementation is in what's often called the CRT, the C Runtime Library. This is a collection of DLLs and LIBs (in MSVC10, anyway) that your program links to in order to run code like sprintf.
Now the bad news is that those components that are in the CRT are generally shipped without source code. You don't know where sprintf is implemented and you can't look at the source code. You're left with little alternative in these cases except to get a job with Microsoft so you can take a look at the source code. :)

Ways not to write function headers twice?

I've got a C/C++ question, can I reuse functions across different object files or projects without writing the function headers twice? (one for defining the function and one for declaring it)
I don't know much about C/C++, Delphi and D. I assume that in Delphi or D, you would just write once what arguments a function takes and then you can use the function across different projects.
And in C you need the function declaration in header files again, right? Is there a good tool that will create header files from C sources? I've got one, but it's not preprocessor-aware and not very strict. And I've had some macro technique that worked rather badly.
I'm looking for ways to program in C/C++ like described here http://www.digitalmars.com/d/1.0/pretod.html
Imho, generating the headers from the source is a bad idea and is impractical.
Headers can contain more information than just function names and parameters.
Here are some examples:
A C++ header can define an abstract class for which a source file may be unneeded
A template generally has to be defined in a header file, so that every file that instantiates it can see the definition
Default parameters are only specified in the class definition (thus in the header file)
You usually write your header, then write the implementation in a corresponding source file.
I think doing the other way around is counter-intuitive and doesn't fit with the spirit of C or C++.
The only exception I can see to that is static functions. A static function only appears in its source file (.c or .cpp) and (obviously) can't be used elsewhere.
While I agree it is often annoying to copy the header definition of a method/function to the source file, you can probably configure your code editor to ease this. I use Vim and a quick script helped me with this a lot. I guess a similar solution exists for most other editors.
Anyway, while this can seem annoying, keep in mind it also gives a greater flexibility. You can distribute your header files (.h, .hpp or whatever) and then transparently change the implementation in source files afterward.
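To make the header-first workflow concrete, here is a minimal sketch. The file names mathutil.h and mathutil.cpp and the function add are hypothetical, and both "files" are shown in one block, separated by comments, so the example stays self-contained.

```cpp
// ---- mathutil.h (hypothetical header, the part you distribute) ----
#ifndef MATHUTIL_H
#define MATHUTIL_H

// Declaration only. Note that the default argument lives here,
// which a tool generating headers from sources could not recover.
int add(int a, int b = 0);

#endif // MATHUTIL_H

// ---- mathutil.cpp (hypothetical source, free to change later) ----
// #include "mathutil.h"   // would be a real include in a two-file layout

// Definition: no default argument is repeated here.
int add(int a, int b) {
    return a + b;
}
```

Callers only ever see the declaration, so the body of add can be rewritten without touching any code that includes the header.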
Also, just to mention it, there is no such thing as C/C++: there is C and there is C++; those are different languages (which indeed share much, but still).
It seems to me that you don't really need/want to auto-generate headers from source; you want to be able to write a single file and have a tool that can intelligently split that into a header file and a source file.
Unfortunately, I'm not aware of any such tool. It's certainly possible to write one, but you'd need a C++ front end. You could try writing something using clang, but it would be a significant amount of work.
Assuming you have declared some functions and written their implementation, you will have a .c/.cpp file and a .h header file.
What you must do in order to use those functions:
Create a library (DLL/.so or static library .a/.lib; for now I recommend a static library for ease of use) from the files where the implementation resides.
Use the header file (#include it; you don't need to rewrite it) in your programs to obtain the function declarations, and link with your library from step 1.
Though this is an example for Visual Studio, it makes perfect sense for other development environments also.
This seems like a rudimentary question, so assuming I have not mis-read,
Here is a basic example of re-use, to answer your first question:
#include <stdio.h>
int main(int argc, char **argv) {
    puts("Hello world");
    return 0;
}
Explanation:
1. stdio.h is a C header file containing (among others) the declaration of a function called puts().
2. In main, puts() is called through the included declaration; its definition lives in the C library that the program links against.
Some compilers (including gcc, I think) have an option to generate headers.
There is always very much confusion about headers and source-files in C++. The links I provided should help to clear that up a little.
If you are in the situation that you want to extract headers from a source file, then you probably went about it the wrong way. Usually you first declare your function in a header file, and then provide an implementation (definition) for it in a source file. If your function is actually a method of a class, you can also provide the definition in the header file.
Technically, a header file is just a bunch of text that is actually inserted into the source file by the preprocessor:
#include <vector>
tells the preprocessor to insert the contents of the file vector at the exact place where the #include appears. This is really just text replacement. So, header files are not some kind of special language construct. They contain normal code. But by putting that code into a separate file, you can easily include it in other files using the preprocessor.
I think it's a good question which is what led me to ask this: Visual studio: automatically update C++ cpp/header file when the other is changed?
There are some refactoring tools mentioned but unfortunately I don't think there's a perfect solution; you simply have to write your function signatures twice. The exception is when you are writing your implementations inline, but there are reasons why you can't or shouldn't always do this.
You might be interested in Lazy C++. However, you should do a few projects the old-fashioned way (with separate header and source files) before attempting to use this tool. I considered using it myself, but then figured I would always be accidentally editing the generated files instead of the lzz file.
You could just put all the definitions in the header file...
This goes against common practice, but is not unheard of.

Proper layout of a C++ header file

What is the proper layout of a C++ .h file?
What I mean is header guard, includes, typedefs, enums, structs, function declarations, class definitions, classes, templates, etc, etc
I am porting an old code base that is over 10 years old, and moving to a modern compiler from Codewarrior 8 is proving interesting, as things seem to be all over the place. I get a lot of "does not name a type" errors, errors forbidding declarations without a type, etc.
There is no silver bullet regarding how to organize your headers.
However one important rule is to keep it consistent across the project so that all persons involved in the project know what to expect.
In my headers, typedefs and defines usually come at the top of the file, followed by class/template definitions, but that cannot be regarded as a rule.
A rule that I follow for C++ is one header per class, which usually keeps the headers small enough to allow grasping the content and finding things without scrolling too much.
It depends on what you mean by proper. If you mean language-enforced, there really isn't one. In fact, you don't even have to name it ".h". I've seen ".c" files #include'd in working commercial code (name withheld to protect the guilty). #include is just a preprocessor hack to get some kind of rough modularity in the language by allowing files to textually include other files. Anything else you tend to see as standard practice is just useful idioms people have developed over time.
That doesn't help your current issue though.
I'd guess that what you are actually seeing is a lot of missing symbols due to platform differences. Nothing due to weirdly-formed .h files at all.
It is possible that the old code was written to work with an old K&R-style C compiler. They had oddities like implicit function declarations (any reference to an undeclared routine assumed it returned int and all its parameters were int). You could try seeing if your compiler has a K&R flag, but a lot of the flagged stuff may actually be latent errors in the old code.
It sounds like you're running into assumptions made based on the previous implementation (Codewarrior). For example:
#include <iostream>
int main() {
    std::cout << "string literal\n";
    return 0;
}
Strictly by the C++03 standard, this relies on iostream including something it's not required to declare: the operator<<(ostream&, char const*) overload (it's a free function, not a method of ostream like the others). To be completely unambiguous, #include <ostream> is also required above under C++03; since C++11, <iostream> is itself required to include <ostream>. In C++, library headers are allowed to include any other library header, in general, so this problem crops up whenever someone inadvertently depends on this.
(That the extra header is required in this particular circumstance is considered a flaw by many, including me, and almost all implementations do provide the declaration of this function in iostream. It is still the shortest, common example I know of to illustrate this.)
It's often more subtle and complicated than this simple example, but the core issue is the same. The solution is to check every header to make sure it includes any libraries it requires, starting with the ones giving you the errors. E.g. #include <vector> and make sure you use std::vector (to avoid relying on it being in the global namespace, which is done in some, mostly old and obsolete now, implementations) when you get "vector does not name a type".
You might also be running into dependent types, in which case you'd add typename.
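In case the dependent-type issue is unfamiliar, here is a small sketch of where typename is needed. The function first_or_default is invented for this example. Inside a template, a name such as Container::value_type depends on the template parameter, so the compiler has to be told it names a type; older, more permissive compilers sometimes accepted the code without it.

```cpp
#include <vector>

// Hypothetical helper: returns the first element of c, or a
// value-initialized element if c is empty.
template <typename Container>
typename Container::value_type first_or_default(const Container& c) {
    // "typename" tells the compiler that const_iterator is a type.
    typename Container::const_iterator it = c.begin();
    return it != c.end() ? *it : typename Container::value_type();
}
```

Leaving out typename on the local variable line is the kind of thing that typically produces errors about a missing type when old code meets a stricter compiler.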
I think the best thing you can do is to look at the layout of existing library header files.

What are the advantages and disadvantages of separating declaration and definition as in C++?

In C++, declaration and definition of functions, variables and constants can be separated like so:
void someFunc();   // declaration

void someFunc()    // definition
{
    // Implementation.
}
In fact, in the definition of classes, this is often the case. A class is usually declared with its members in a .h file, and these are then defined in a corresponding .C file.
What are the advantages & disadvantages of this approach?
Historically this was to help the compiler. You had to give it the list of names before it used them, whether this was the actual usage or a forward declaration (C's default function prototype aside).
Modern compilers for modern languages show that this is no longer a necessity, so C & C++'s (as well as Objective-C's, and probably others') syntax here is historical baggage. In fact, this is one of the big problems with C++ that even the addition of a proper module system will not solve.
Disadvantages are: lots of heavily nested include files (I've traced include trees before, they are surprisingly huge) and redundancy between declaration and definition - all leading to longer coding times and longer compile times (ever compared the compile times between comparable C++ and C# projects? This is one of the reasons for the difference). Header files must be provided for users of any components you provide. Chances of ODR violations. Reliance on the pre-processor (many modern languages do not need a pre-processor step), which makes your code more fragile and harder for tools to parse.
Advantages: not much. You could argue that you get a list of function names grouped together in one place for documentation purposes, but most IDEs have some sort of code folding ability these days, and projects of any size should be using doc generators (such as doxygen) anyway. With a cleaner, pre-processor-less, module based syntax it is easier for tools to follow your code and provide this and more, so I think this "advantage" is just about moot.
It's an artefact of how C/C++ compilers work.
As a source file gets compiled, the preprocessor substitutes each #include-statement with the contents of the included file. Only afterwards does the compiler try to interpret the result of this concatenation.
The compiler then goes over that result from beginning to end, trying to validate each statement. If a line of code invokes a function that hasn't been declared previously, it'll give up.
There's a problem with that, though, when it comes to mutually recursive function calls:
void foo()
{
    bar();
}
void bar()
{
    foo();
}
Here, foo won't compile as bar is unknown. If you switch the two functions around, bar won't compile as foo is unknown.
If you separate declaration and definition, though, you can order the functions as you wish:
void foo();
void bar();
void foo()
{
    bar();
}
void bar()
{
    foo();
}
Here, when the compiler processes foo it already knows the signature of a function called bar, and is happy.
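The foo/bar pair above recurses forever, so here is a terminating variant of the same pattern as a minimal sketch. The functions is_even and is_odd are just illustrative names.

```cpp
// Forward declaration: is_even below calls is_odd before
// the compiler has seen is_odd's definition.
bool is_odd(unsigned n);

bool is_even(unsigned n) {
    return n == 0 ? true : is_odd(n - 1);
}

bool is_odd(unsigned n) {
    return n == 0 ? false : is_even(n - 1);
}
```

Removing the declaration on the first line makes the call inside is_even fail to compile, which is exactly the ordering problem described above.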
Of course compilers could work in a different way, but that's how they work in C, C++ and to some degree Objective-C.
Disadvantages:
None directly. If you're using C/C++ anyway, it's the best way to do things. If you've got a choice of language/compiler, then maybe you can pick one where this is not an issue. The only thing to consider with splitting declarations into header files is to avoid mutually recursive #include-statements - but that's what include guards are for.
Advantages:
Compilation speed: As all included files are concatenated and then parsed, reducing the amount and complexity of code in included files will improve compilation time.
Avoid code duplication/inlining: If you fully define a function in a header file, each object file that includes this header and references this function will contain its own version of that function. As a side-note, if you want inlining, you need to put the full definition into the header file (on most compilers).
Encapsulation/clarity: A well defined class/set of functions plus some documentation should be enough for other developers to use your code. There is (ideally) no need for them to understand how the code works, so why require them to sift through it? (The counter-argument that it may be useful for them to access the implementation when required still stands, of course.)
And of course, if you're not interested in exposing a function at all, you can usually still choose to define it fully in the implementation file rather than the header.
The standard requires that when using a function, a declaration must be in scope. This means, that the compiler should be able to verify against a prototype (the declaration in a header file) what you are passing to it. Except of course, for functions that are variadic - such functions do not validate arguments.
Think of C, when this was not required. At that time, compilers treated a missing return type specification as defaulting to int. Now, assume you had a function foo() which returned a pointer to void. However, since you did not have a declaration, the compiler will think that it has to return an integer. On some Motorola systems, for example, integers and pointers would be returned in different registers. Now, the compiler will no longer use the correct register and instead return your pointer cast to an integer in the other register. The moment you try to work with this pointer, all hell breaks loose.
Declaring functions within the header is fine. But remember: if you both declare and define them in the header, make sure they are inline. One way to achieve this is to put the definition inside the class definition. Otherwise, prepend the inline keyword. If you don't, you will run into ODR violations when the header is included in multiple implementation files.
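Here is a small sketch of the three cases mentioned. The header name shapes.h, the function square and the class Counter are hypothetical.

```cpp
// ---- shapes.h (hypothetical header) ----
#ifndef SHAPES_H
#define SHAPES_H

// Free function defined in a header: needs the inline keyword.
inline int square(int x) { return x * x; }

class Counter {
public:
    // Defined inside the class definition: implicitly inline.
    void increment() { ++count_; }
    int count() const;
private:
    int count_ = 0;
};

// Member defined outside the class but still in the header:
// needs the inline keyword.
inline int Counter::count() const { return count_; }

#endif // SHAPES_H
```

Without the two inline keywords, including this header from two different .cpp files would typically end in a "multiple definition" error at link time.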
There are two main advantages to separating declaration and definition into C++ header and source files. The first is that you avoid problems with the One Definition Rule when your class/functions/whatever are #included in more than one place. Secondly, by doing things this way, you separate interface and implementation. Users of your class or library need only to see your header file in order to write code that uses it. You can also take this one step farther with the Pimpl Idiom and make it so that user code doesn't have to recompile every time the library implementation changes.
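For readers who haven't met the Pimpl Idiom, here is a minimal sketch, assuming C++11 or later. The class Widget and the file names are hypothetical, and the two "files" are shown in one block for brevity.

```cpp
#include <memory>
#include <string>
#include <utility>

// ---- widget.h (hypothetical header seen by users) ----
class Widget {
public:
    explicit Widget(std::string name);
    ~Widget();                          // defined where Impl is complete
    const std::string& name() const;
private:
    struct Impl;                        // declared here, defined in the source file
    std::unique_ptr<Impl> impl_;
};

// ---- widget.cpp (hypothetical source, hidden from users) ----
struct Widget::Impl {
    std::string name;                   // private data can change freely
};

Widget::Widget(std::string name) : impl_(new Impl{std::move(name)}) {}
Widget::~Widget() = default;
const std::string& Widget::name() const { return impl_->name; }
```

Because the header only mentions an incomplete Impl type, adding or removing members of Impl changes widget.cpp alone, and code that includes widget.h does not need to be recompiled.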
You've already mentioned the disadvantage of code repetition between the .h and .cpp files. Maybe I've written C++ code for too long, but I don't think it's that bad. You have to change all user code every time you change a function signature anyway, so what's one more file? It's only annoying when you're first writing a class and you have to copy-and-paste from the header to the new source file.
The other disadvantage in practice is that in order to write (and debug!) good code that uses a third-party library, you usually have to see inside it. That means access to the source code even if you can't change it. If all you have is a header file and a compiled object file, it can be very difficult to decide if the bug is your fault or theirs. Also, looking at the source gives you insight into how to properly use and extend a library that the documentation might not cover. Not everyone ships an MSDN with their library. And great software engineers have a nasty habit of doing things with your code that you never dreamed possible. ;-)
Advantage
Classes can be referenced from other files by just including the declaration. Definitions can then be linked later on in the compilation process.
You basically have 2 views on the class/function/whatever:
The declaration, where you declare the name, the parameters and the members (in the case of a struct/class), and the definition, where you define what the function does.
Amongst the disadvantages is repetition. One big advantage, though, is that you can declare your function as int foo(float f) and leave the details in the implementation (= definition). Anyone who wants to use your function foo just includes your header file and links to your library/object file. Library users as well as compilers only have to care about the defined interface, which helps with understanding the interfaces and speeds up compile times.
One advantage that I haven't seen yet: API
Any library or 3rd party code that is NOT open source (i.e. proprietary) will not have their implementation along with the distribution. Most companies are just plain not comfortable with giving away source code. The easy solution, just distribute the class declarations and function signatures that allow use of the DLL.
Disclaimer: I'm not saying whether it's right, wrong, or justified, I'm just saying I've seen it a lot.
One big advantage of forward declarations is that when used carefully you can cut down the compile time dependencies between modules.
If ClassA.h needs to refer to a data element in ClassB.h, you can often use just a forward declaration in ClassA.h and include ClassB.h in ClassA.cc rather than in ClassA.h, thus cutting down a compile time dependency.
For big systems this can be a huge time saver on a build.
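A minimal sketch of that layout follows. The member names are invented; the three "files" are shown in one block, with comments marking where each would start.

```cpp
// ---- ClassA.h (hypothetical) ----
class ClassB;                       // forward declaration, no #include "ClassB.h"

class ClassA {
public:
    explicit ClassA(ClassB* b) : b_(b) {}
    int value_from_b() const;       // defined in ClassA.cc
private:
    ClassB* b_;                     // a pointer only needs the declaration
};

// ---- ClassB.h (hypothetical) ----
class ClassB {
public:
    int value() const { return 42; }
};

// ---- ClassA.cc (hypothetical): the only place that needs ClassB.h ----
int ClassA::value_from_b() const {
    return b_->value();             // ClassB must be complete here
}
```

Files that include only ClassA.h are no longer rebuilt when ClassB.h changes. This works because ClassA holds a pointer; a ClassB member held by value would need the full definition in the header.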
Disadvantage
This leads to a lot of repetition. Most of the function signature needs to be put in two or more (as Paulious noted) places.
Separation gives clean, uncluttered view of program elements.
Possibility to create and link to binary modules/libraries without disclosing sources.
Link binaries without recompiling sources.
When done correctly, this separation reduces compile times when only the implementation has changed.