Inferring the implementation of a function in C++

I was going through the documentation of the rand() function in C++, and it says:
The function accesses and modifies internal state objects, which may cause data races with concurrent calls to rand or srand. Some libraries provide an alternative function that explicitly avoids this kind of data race: rand_r (non-portable). C++ library implementations are allowed to guarantee no data races for calling this function.
As a more general question, how can I be sure that I am calling a C++ implementation of a function (rand in this case)? For example, is it enough to be calling rand() inside a file having a .cc or .cpp extension? Or is there a particular header that can ensure this?
I am asking this question because my understanding is that when I use the cstdlib header, it in turn calls the C implementation of that function (stdlib.h). If that's not the case, then does C++ provide its own implementation of all C functions?
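For concreteness, here is the kind of call I mean (a minimal sketch, not tied to any particular library):

#include <cstdlib>   // the C++ header: guarantees std::rand and std::srand
#include <iostream>

int main() {
    std::srand(42);                    // seeds the shared internal state
    std::cout << std::rand() << '\n';  // is this "the C++ implementation"?
}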

I think you are asking the wrong question.
You've read that C++ library implementations are allowed to give you a version that has no data races. They are allowed, but they are not required to do so. If you had some all-knowing oracle capable of telling you whether you are using a C++ implementation, and if it told you that you are, would that solve your problem? No, not really, because you still wouldn't know whether that implementation would guarantee the absence of data races. Maybe it would, but you'd have no certainty.
So you have to ask the right question: how do I know whether the function I'm using guarantees that? And the answer is: check the specific documentation of the library you are using! I suppose you are reading the cplusplus.com page on rand. That is a generic site, unrelated to a specific library, so it won't help you answer this question. Instead, what compiler and standard library are you using? Check their documentation. If the authors state that their rand function is guaranteed to be race-free, then go ahead and use it. Otherwise, be conservative and assume there are some races, and don't use it.
And by the way, a lot of people would tell you that that site should be avoided, because it isn't very reliable. In general, cppreference is preferred. And it says that
It is implementation-defined whether rand() is thread-safe.
Where "implementation defined" means exactly what I said. And if you continue reading, it will also list some other problems (the numbers it generates aren't that random after all), and
It is recommended to use C++11's random number generation facilities to replace rand().
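To make that recommendation concrete, here is a minimal sketch of the <random> facilities (the engine, seeding, and distribution choices are just common defaults, not the only options):

#include <iostream>
#include <random>

int main() {
    // Each thread can own its own engine, sidestepping rand()'s shared state.
    std::mt19937 engine{std::random_device{}()};    // seeded Mersenne Twister
    std::uniform_int_distribution<int> dist{1, 6};  // a fair six-sided die

    for (int i = 0; i < 3; ++i)
        std::cout << dist(engine) << '\n';
}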

Related

Compiler optimizations on map/set in c++

Does the compiler make optimizations on data structures when the input size is small?
unordered_set<int> TmpSet;
TmpSet.insert(1);
TmpSet.insert(2);
TmpSet.insert(3);
...
...
Since the size is small, hashing would not be required; we could simply store this in 3 variables. Do optimizations like this happen? If yes, who is responsible for them?
edit: Replaced set with unordered_set, as the former doesn't do hashing.
It's possible in theory to totally replace whole data structures with a different implementation (as long as escape analysis can show that a reference to them couldn't be passed to separately-compiled code).
But in practice what you've written is a call to a constructor, and then three calls to template functions which aren't simple. That's all the compiler sees, not the high-level semantics of a set.
Unless it's going to optimize stuff away, real compilers aren't going to use a different implementation which would have different object representations.
If you want to micro-optimize like this, in C++ you should do it yourself, e.g. with a std::bitset<32> and set bits to indicate set membership.
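A minimal sketch of that by-hand approach, assuming the keys are small non-negative integers (here 0..31):

#include <bitset>
#include <iostream>

int main() {
    // Hand-rolled membership for small keys, replacing the unordered_set
    // from the question: one bit per possible member.
    std::bitset<32> members;
    members.set(1);
    members.set(2);
    members.set(3);

    std::cout << std::boolalpha
              << members.test(2) << '\n'   // true
              << members.test(7) << '\n';  // false
}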
It's far too complex a problem for a compiler to reliably do a good job. And besides, there could be serious quality-of-implementation issues if the compiler starts inventing different data-structures that have different speed/space tradeoffs, and it guesses wrong and uses one that doesn't suit the use-case well.
Programmers have a reasonable expectation that their compiled code will work somewhat like what they wrote, at least on a large scale. That includes calling those standard template functions with the implementations supplied in the headers.
Maybe there's some scope for a C++ implementation adapting to the use-case, but I think that would need much debate and probably a special mechanism to allow it in practice. Perhaps name-recognition of std:: names could be enough, making those into builtins instead of header implementations, but currently the implementation strategy is to write those functions in C++ in .h headers, e.g. in libstdc++ or libc++.

Why isn't the C++ standard library already pre-included in any C++ source?

In C++ the standard library is wrapped in the std namespace, and the programmer is not supposed to define anything inside that namespace. Of course the standard include files don't step on each other's names inside the standard library (so it's never a problem to include a standard header).
Then why isn't the whole standard library included by default instead of forcing programmers to write for example #include <vector> each time? This would also speed up compilation as the compilers could start with a pre-built symbol table for all the standard headers.
Pre-including everything would also solve some portability problems: for example when you include <map> it's defined what symbols are taken into std namespace, but it's not guaranteed that other standard symbols are not loaded into it and for example you could end up (in theory) with std::vector also becoming available.
It happens sometimes that a programmer forgets to include a standard header but the program compiles anyway because of an include dependence of the specific implementation. When moving the program to another environment (or just another version of the same compiler) the same source code could however stop compiling.
From a technical point of view I can imagine a compiler just preloading (with mmap) an optimal perfect-hash symbol table for the standard library.
This should be faster to do than loading and doing a C++ parse of even a single standard include file and should be able to provide faster lookup for std:: names. This data would also be read-only (thus probably allowing a more compact representation and also shareable between multiple instances of the compiler).
These are, however, just "shoulds", as I never implemented this.
The only downside I see is that we C++ programmers would lose compilation coffee breaks and Stack Overflow visits :-)
EDIT
Just to clarify: the main advantage I see is for the programmers who today, despite the C++ standard library being a single monolithic namespace, are required to know which sub-part (include file) contains which function/class. To add insult to injury, when they make a mistake and forget an include file, the code may still compile or not depending on the implementation (thus leading to non-portable programs).
The short answer is that this is not the way the C++ language is supposed to be used.
There are good reasons for that:
namespace pollution - even if this could be mitigated, because the std namespace is supposed to be self-coherent and programmers are not forced to use using namespace std;. But including the whole library with using namespace std; would certainly lead to a big mess...
forcing the programmer to declare the modules they want to use avoids inadvertently calling the wrong standard function, because the standard library is now huge and not all programmers know all of its modules
history: C++ still has a strong inheritance from C, where namespaces do not exist and where the standard library is supposed to be used like any other library.
In support of your point, the Windows API is an example where you have only one big include (windows.h) that loads many other smaller include files. And in fact, precompiled headers allow that to be fast enough.
So IMHO a new language deriving from C++ could decide to automatically declare the whole standard library. A new major release could also do it, but it could break code that makes intensive use of the using namespace directive and has custom implementations using the same names as some standard modules.
But all common languages that I know of (C#, Python, Java, Ruby) require the programmer to declare the parts of the standard library that they want to use, so I suppose that systematically making available every piece of the standard library is still more awkward than really useful for the programmer, at least until someone finds a way to declare the parts that should not be loaded; that's why I spoke of a new derivative of C++.
Most of the C++ standard library is template-based, which means that the code it generates depends ultimately on how you use it. In other words, there is very little that could be compiled before instantiating a template like std::vector<MyType> m_collection;.
Also, C++ is probably the slowest language to compile, and there is a lot of parsing work that compilers have to do when you #include a header file that also includes other headers.
Well, first things first: C++ tries to adhere to "you only pay for what you use".
The standard-library is sometimes not part of what you use at all, or even of what you could use if you wanted.
Also, you can replace it if there's a reason to do so: See libstdc++ and libc++.
That means just including it all without question isn't actually such a bright idea.
Anyway, the committee is slowly plugging away at creating a module system (it takes lots of time; hopefully it will work for C++1z: C++ Modules - why were they removed from C++0x? Will they be back later on?), and when that's done most downsides to including more of the standard library than strictly necessary should disappear, and the individual modules should more cleanly exclude symbols they need not contain.
Also, as those modules are pre-parsed, they should give the compilation-speed improvement you want.
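As a forward-looking sketch: standard headers did eventually become importable as pre-parsed "header units" in C++20. Usage looks something like this, assuming a compiler and build setup with header-unit support (which is still uneven):

import <vector>;  // consumes a pre-parsed binary representation, not text

int main() {
    std::vector<int> v{1, 2, 3};
    return static_cast<int>(v.size());
}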
You offer two advantages of your scheme:
Compile-time performance. But nothing in the standard prevents an implementation from doing what you suggest[*] with a very slight modification: that the pre-compiled table is only mapped in when the translation unit includes at least one standard header. From the POV of the standard, it's unnecessary to impose potential implementation burden over a QoI issue.
Convenience to programmers: under your scheme we wouldn't have to specify which headers we need. We do this in order to support C++ implementations that have chosen not to implement your idea of making the standard headers monolithic (which currently is all of them), and so from the POV of the C++ standard it's a matter of "supporting existing practice and implementation freedom at a cost to programmers considered acceptable". Which is kind of the slogan of C++, isn't it?
Since no C++ implementation (that I know of) actually does this, my suspicion is that in point of fact it does not grant the performance improvement you think it does. Microsoft provides precompiled headers (via stdafx.h) for exactly this performance reason, and yet it still doesn't give you an option for "all the standard libraries"; instead it requires you to say what you want included. It would be dead easy for this or any other implementation to provide an implementation-specific header defined to have the same effect as including all standard headers. This suggests to me that, at least in Microsoft's opinion, there would be no great overall benefit to providing that.
If implementations were to start providing monolithic standard libraries with a demonstrable compile-time performance improvement, then we'd discuss whether or not it's a good idea for the C++ standard to continue permitting implementations that don't. As things stand, it has to.
[*] Except perhaps for the fact that <cassert> is defined to have different behaviour according to the definition of NDEBUG at the point it's included. But I think implementations could just preprocess the user's code as normal, and then map in one of two different tables according to whether it's defined.
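To illustrate that footnote, a minimal example of the NDEBUG dependency:

// Whether this program aborts depends on NDEBUG at the point <cassert>
// is included -- which is why a single pre-built table wouldn't suffice.
// #define NDEBUG   // uncomment: assert(...) expands to ((void)0)
#include <cassert>

int main() {
    assert(2 + 2 == 5);  // aborts without NDEBUG; a no-op with it
}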
I think the answer comes down to C++'s philosophy of not making you pay for what you don't use. It also gives you more flexibility: you aren't forced to use parts of the standard library if you don't need them. And then there's the fact that some platforms might not support things like throwing exceptions or dynamically allocating memory (like the processors used in the Arduino, for example). And there's one other thing you said that is incorrect: you are allowed to add specializations of standard templates such as std::swap to the std namespace for your own types (full specializations only, so this doesn't work for your own class templates).
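A minimal sketch of what that permission looks like; Widget is a made-up example type, and the point is that what you add is a full specialization of an existing standard template, not a brand-new name:

#include <utility>

struct Widget { int* data; };  // hypothetical user-defined type

namespace std {
    // Full specialization of std::swap for a user-defined type: one of the
    // few additions to namespace std that the standard permits.
    template <>
    void swap<Widget>(Widget& a, Widget& b) noexcept {
        int* tmp = a.data;
        a.data = b.data;
        b.data = tmp;
    }
}

int main() {
    int x = 1, y = 2;
    Widget a{&x}, b{&y};
    std::swap(a, b);  // picks the specialization above
    return *a.data;   // now 2
}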
First of all, I am afraid that having a prelude is a bit late to the game. Or rather, seeing as preludes are not easily extensible, we have to content ourselves with a very thin one (built-in types...).
As an example, let's say that I have a C++03 program:
#include <boost/unordered_map.hpp>
#include <string>  // for std::string, used below

using namespace std;
using boost::unordered_map;

static unordered_map<int, string> const Symbols = ...;
It all works fine, but suddenly when I migrate to C++11:
error: ambiguous symbol "unordered_map", do you mean:
- std::unordered_map
- boost::unordered_map
Congratulations, you have invented the least backward compatible scheme for growing the standard library (just kidding, whoever uses using namespace std; is to blame...).
Alright, let's not pre-include them, but still bundle the perfect hash table anyway. The performance gain would be worth it, right?
Well, I seriously doubt it. First of all because the Standard Library is tiny compared to most other header files that you include (hint: compare it to Boost). Therefore the performance gain would be... smallish.
Oh, not all programs are big; but the small ones compile fast already (by virtue of being small) and the big ones include much more code than the Standard Library headers so you won't get much mileage out of it.
Note: and yes, I did benchmark the file look-up in a project with "only" a hundred -I directives; the conclusion was that pre-computing the "include path" to "file location" map and feeding it to gcc resulted in a 30% speed-up (after using ccache already). Generating it and keeping it up-to-date was complicated, so we never used it...
But could we at least include a provision that the compiler could do it in the Standard?
As far as I know, it is already included. I cannot remember if there is a specific blurb about it, but the Standard Library is really part of the "implementation" so resolving #include <vector> to an internal hash-map would fall under the as-if rule anyway.
But they could do it, still!
And lose any flexibility. For example, Clang can use either libstdc++ or libc++ on Linux, and I believe it to be compatible with the Dinkumware derivative that ships with VC++ (or if not completely, at least greatly).
This is another point of customization: if the Standard Library does not fit your needs or your platform, then by virtue of being mostly treated like any other library, you can replace part or most of it with relative ease.
But! But!
#include "stdafx.h"
If you work on Windows, you will recognize it. This is called a pre-compiled header. It must be included first (or all benefits are lost), and in exchange, instead of parsing files you are pulling in an efficient binary representation of those parsed files (i.e., a serialized AST version, possibly with some type resolution already performed), which saves maybe 30% to 50% of the work. Yep, this is close to your proposal; this is Computer Science for you, there's always someone else who thought about it first...
Clang and gcc have a similar mechanism, though from what I've heard it can be so painful to use that people prefer the more transparent ccache in practice.
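For reference, such a header is conventionally just an aggregate of heavy, rarely-changing includes; the file name and its contents are project conventions (MSVC's default naming), not anything mandated:

// stdafx.h -- a conventional precompiled-header aggregate
#pragma once

#include <algorithm>
#include <map>
#include <string>
#include <vector>
// ...plus whichever other heavy, rarely-changing headers the project uses.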
And all of these will come to naught with modules.
This is the true solution to this pre-processing/parsing/type-resolving madness. Because modules are truly isolated (ie, unlike headers, not subject to inclusion order), an efficient binary representation (like pre-compiled headers) can be pre-computed for each and every module you depend on.
This not only means the Standard Library, but all libraries.
Your solution, more flexible, and dressed to the nines!
One could use an alternative implementation of the C++ Standard Library to the one shipped with the compiler. Or wrap headers with one's definitions, to add, enable or disable features (see GNU wrapper headers). Plain text headers and the C inclusion model are a more powerful and flexible mechanism than a binary black box.
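A minimal sketch of that wrapper-header technique, using the non-standard but long-standing GCC/Clang #include_next extension; the directory layout and the rand_in helper are made up for illustration:

// my_wrappers/cstdlib -- put this directory before the system include
// paths, e.g. compile with: g++ -Imy_wrappers main.cpp
#pragma once

#include_next <cstdlib>  // GCC/Clang extension: pull in the real <cstdlib>

// Layer project-specific additions on top of the standard header.
inline int rand_in(int lo, int hi) {
    return lo + std::rand() % (hi - lo + 1);  // crude, illustrative only
}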

Do dynamic libraries break C++ standard?

The C++ standard, section 3.6.3, states:
Destructors for initialized objects of static duration are called as a result of returning from main and as a result of calling exit
On Windows you have FreeLibrary, and on Linux you have dlclose, to unload a dynamically linked library. And you can call these functions before returning from main.
A side effect of unloading a shared library is that all destructors for static objects defined in the library are run.
Does this mean it violates the C++ standard, as these destructors have been run prematurely?
It's a meaningless question. The C++ standard doesn't say what dlclose does or should do.
If the standard were to include a specification for dlclose, it would certainly point out that dlclose is an exception to 3.6.3. So then 3.6.3 wouldn't be violated because it would be a documented exception. But we can't know that, since it doesn't cover it.
What effect dlclose has on the guarantees in the C++ standard is simply outside the scope of that standard. Nothing dlclose can do can violate the C++ standard because the standard says nothing about it.
(If this were to happen without the program doing anything specific to invoke it, then you would have a reasonable argument that the standard is being violated.)
Parapura, it may be helpful to keep in mind that the C++ standard is a language definition that imposes constraints on how the compiler converts source code into object code.
The standard does not impose constraints on the operating system, hardware, or anything else.
If a user powers off his machine, is that a violation of the C++ standard? Of course not. Does the standard need to say "unless the user powers off the device" as an "exception" to every rule? That would be silly.
Similarly, if an operating system kills a process or forces the freeing of some system resources, or even allows a third party program to clobber your data structures -- this is not a violation of the C++ standard. It may well be a bug in the OS, but the C++ language definition remains intact.
The standard is only binding on compilers, and forces the resulting executable code to have certain properties. Nevertheless, it does not bind runtime behavior, which is why we spend so much time on exception handling.
I'm taking this to be a bit of an open-ended question.
I'd say it's like this: The standard only defines what a program is. And a program (a "hosted" one, I should add) is a collection of compiled and linked translation units that has a unique main entry point.
A shared library has no such thing, so it doesn't even constitute a "program" in the sense of the standard. It's just a bunch of linked executable code without any sort of "flow". If you use load-time linking, the library becomes part of the program, and all is as expected. But if you use runtime linking, the situation is different.
Therefore, you may like to view it like this: global variables in the runtime-linked shared object are essentially dynamic objects which are constructed by the dynamic loader, and which are destroyed when the library is unloaded. The fact that those objects are declared like global objects doesn't change that, since the objects aren't part of a "program" at that point.
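A sketch of that behavior on Linux/POSIX; the library name, the Tracer type, and the build commands in the comments are made up for illustration:

// plugin.cpp -- a hypothetical library; build it as a shared object, e.g.:
//   g++ -fPIC -shared plugin.cpp -o libplugin.so
#include <cstdio>

namespace {
struct Tracer {
    Tracer()  { std::puts("plugin: static object constructed (at dlopen)"); }
    ~Tracer() { std::puts("plugin: static object destroyed (at dlclose)"); }
};
Tracer tracer;  // a "global" whose lifetime is tied to the library's
}

// main.cpp -- the host program; build and run, e.g.:
//   g++ main.cpp -ldl -o host && ./host
#include <dlfcn.h>
#include <cstdio>

int main() {
    void* handle = dlopen("./libplugin.so", RTLD_NOW);  // Tracer constructed here
    if (!handle) { std::fprintf(stderr, "dlopen: %s\n", dlerror()); return 1; }
    std::puts("host: library loaded, doing work...");
    dlclose(handle);  // Tracer destroyed here, well before main returns
    std::puts("host: still running after the 'static' destructor ran");
}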
They are only run prematurely if you go to great effort to make that happen; the default behavior is standard-conforming.
If it does violate the standard, who is the violator? The C++ compiler cannot be considered the violator (since things are being loaded dynamically via a library call); thus it must be the vendor of the dynamic loading functionality, aka the OS vendor. Are OS vendors bound by the C++ standard when designing their systems? That definitely seems to be outside of the scope of the standard.
Or for another perspective, consider the library itself to be a separate program providing some sort of service. When this program is terminated (by whatever means the library is unloaded) then all associated service objects should disappear as well, static or not.
This is just one of the tons and tons of platform-specific "extensions" (for a target compiler, architecture, OS, etc) that are available. All of which "violate" the standard in all sorts of ways. But there is only one expected consequence for deviating from standard C++: you aren't portable anymore. (Unless you do a lot of #ifdef or something, but still, that particular code is locked in to that platform).
Since there is currently no standard/cross-platform notion of libraries, if you want the feature, you have to either not use it or re-implement it per-platform. Since similar things are appearing on most platforms, maybe the standard will one day find a clean way to abstract them so that the standard covers them. The advantage would be a cross-platform solution, and it would simplify cross-platform code.

Is gets() officially deprecated? [duplicate]

This question already has answers here:
Why is the gets function so dangerous that it should not be used?
Based on the most recent draft of C++11, C++ refers to ISO/IEC 9899:1999/Cor.3:2007(E) for the definitions of the C library functions (per §1.2[intro.refs]/1).
Based on the most recent draft of C99 TC3: "The gets function is obsolescent, and is deprecated" (per §7.26.9/2).
Can I safely say that gets() is deprecated in both C and C++?
Deprecated means you shouldn't use it and it might be removed in the future. Since both standards say it is deprecated, that means it is deprecated, officially.
Does it matter? The only way you can ever use gets is if stdin is known to be attached to a file whose contents you have full control over. This condition is almost impossible to satisfy, especially on multiprocess systems where other processes may modify files asynchronously with respect to your program. Therefore, for all practical purposes, any program using gets has undefined behavior (i.e. there are possible inputs/environmental conditions for which it will have undefined behavior), and in particular UB which is likely to lead to privilege compromise if your program has higher privileges than the provider of the data.
Edit: OK, here's one safe use of gets, about the only one I can think of right off...
if (feof(stdin)) gets(buf);  // the EOF indicator is already set, so gets can read nothing
Of course some buggy implementations (possibly including glibc..?) permit reads even when the EOF indicator is already set for a stream, so....
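For completeness, the usual practical replacement is fgets, which is told the buffer size; a minimal sketch:

#include <cstdio>
#include <cstring>

int main() {
    char buf[256];
    // Unlike gets, fgets knows the buffer size and cannot overflow it.
    if (std::fgets(buf, sizeof buf, stdin)) {
        buf[std::strcspn(buf, "\n")] = '\0';  // strip the trailing newline
        std::printf("read: %s\n", buf);
    }
}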
Even code which would be broken by the removal of gets() from the library would, after such removal, be less broken than it was before such removal. I suppose it might be necessary for compiler vendors to include it in a "fully-standard compliant" mode, but the number of circumstances where it could safely be used is so vanishingly small that it would probably be reasonable to exclude it from a "normal" build.
It's going to be a while until C++11 is implemented everywhere.
Also, most compilers don't even fully support C99 yet.
Microsoft's, for instance, does not.
So no, it's not deprecated in both C and C++.
Well, it was removed altogether from the C11 standard, so I'd take that as a yes.

"Real world" use of parameter passing in Fortran

I always assumed that Fortran passed entities "by reference" to a dummy argument. Then I got this answer (the actual argument of the answer was related, but not on this):
The standard never specifies that and, indeed, goes quite a lot out of its way to avoid such specification. Although yours is a common misconception, it was not strictly accurate even in most older compilers, particularly with optimization turned on. A strict pass-by-reference would kill many common optimizations.

With recent standards, pass-by-reference is all but disallowed in some cases. The standard doesn't use those words in its normative text, but there are things that would be impractical to implement with pass-by-reference.

When you start getting into things like pointers, the error of assuming that everything is pass-by-reference will start making itself more evident than before. You'll have to drop that misconception or many things will confuse you.

I think other people have answered the rest of the post adequately. Some also addressed the above point, but I wanted to emphasize it.
See here for attribution.
According to this answer, then, there's nothing in the standard specifying how data are shipped from the caller to the callee. In practical terms, how should this be interpreted when actually working with the language (regardless of the practical effects resulting from how compilers implement the standard), in particular when it comes to the intent() specification?
Edit: I'd like to clarify my question. What I am trying to understand is how the standard expects you to work when you are performing calls. Given that the actual compiler strategy used to pass entities is undefined by the standard, you cannot in principle (according to the standard) expect that passing an argument to a function will actually behave as "pass-by-reference", with all its related side effects, because this behavior is compiler- and optimization-dependent. I assume, therefore, that the standard imposes a programming style you have to follow in order to work regardless of the actual implementation strategy.
Yes, that's true. The Fortran standard doesn't specify the exact evaluation strategy for different INTENT attributes. For example, for a dummy argument with INTENT(OUT), a Fortran compiler can use call-by-reference or call-by-copy-restore evaluation.
But I do not understand why you really need to know which evaluation strategy is used. What is the reason? I think Fortran does it right. The human's concern is the (high-level) argument intent, and the computer's concern is the (low-level) evaluation strategy. "Render unto man the things which are man’s and unto the computer the things which are the computer’s." N. Wiener
I think the standard contains enough information on the usage of different INTENT attributes. Section "5.1.2.7 INTENT attribute" of Final Committee Draft of Fortran 2003 standard (PDF, 5 MB) is a good starting point.
As the others have said, you don't have to know how the compiler works, as long as it works correctly. Fortran >=90 allows you to specify the purpose of an argument: input, output or both, and then the compiler takes care of passing the argument. As long as the compiler is correctly implemented so that both sides (caller and callee) use the same mechanism, why should the programmer care? So the Fortran standard tries not to limit how compiler writers implement their compilers, allowing them to design and optimize as they wish.
In terms of programming style, decide the purpose of your procedure arguments, and declare them with the appropriate intent. Then, for example, if you accidentally try to modify an intent(in) argument in a procedure, the compiler will issue an error message, preventing you from making this mistake. It might even do this before emitting object code, so the actual argument passing mechanism has nothing to do with enforcing this requirement of the language.
C is a lower-level language and the programmer needs to choose between passing by value or by reference for various reasons. But for problems such as scientific programming this is an unnecessary aspect to have to control. It can be essential in embedded or device driver programming.
If there is a reason that you must control of the argument passing mechanism, such as interfacing to another language, Fortran 2003 provides the ISO C Binding (which is already widely implemented). With this you can match the C method of argument passing.
I'm not sure I understand your question, but one way of seeing things is that as long as one writes standard-conforming code, one does not need to be concerned with how argument passing is implemented. The compiler makes sure (well, barring compiler bugs, of course) that arguments are passed in a way that conforms to the standard.
Now, if you play tricks e.g. with C interop or such, say by comparing argument addresses you might be able to determine whether the compiler uses pass by reference, copy-in/copy-out or something else in some particular case. But generally speaking, at that point you're outside the scope of the standard and all bets are off.