Should this use of nullptr produce a compiler error?

Is there a good reason why this code compiles without warning (and crashes when run) with Visual C++ 2010:
int a = *((int*)nullptr);
Static analysis should conclude that it will crash, right?

Should this use of nullptr produce a compiler error?
No.
Dereferencing a null pointer results in undefined behavior, but no diagnostic is required.
Static analysis should conclude that it will crash, right?
It might. It doesn't have to. It would certainly be nice if a warning were issued. A dedicated static analysis tool (Klocwork, for example) would probably issue one.
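For what it's worth, runtime instrumentation can catch this case even when static analysis stays silent. A minimal sketch, assuming a Clang or GCC toolchain (-fsanitize=null is their flag; MSVC has no direct equivalent):
// build with: clang++ -g -fsanitize=null deref.cpp
int main() {
    int *p = nullptr;
    int a = *p;   // UBSan reports a null-pointer load here at run time
    return a;
}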

Yes, static analysis would show that this always crashes. However, this would require the compiler to actually perform that static analysis. Most compilers do not do this (at least none that I know of).
So the question is: why don't C/C++ compilers do more static checking?
The reason the compiler does not do this is mostly tradition, and a philosophy of making the compiler as simple as possible.
C (and to a lesser degree C++) was created in an environment where computing power was fairly expensive, and where ease of writing a compiler was important (because there were many different HW architectures).
Since static checking makes a compiler both harder to write and slower to compile with, it was not felt to be a priority at the time. Thus most compilers don't have it.
Other languages (e.g. Java) make different tradeoffs, and thus in Java many things are illegal that are allowed in C (e.g. unreachable code is a compile-time error in Java; in C most compilers don't even warn). This really boils down to philosophy.
BTW, note that you can get static checking in C if you want it - there are several tools available, e.g. lint (ancient), or see What open source C++ static analysis tools are available?

Related

In standard Fortran grammar, can we specify that some variable is deliberately unused?

To find all the potential problems in a program, we prefer to turn on all of the compiler's diagnostic tools. The compiler will then tell us things like "remark #7712: This variable has not been used.".
In many cases, in order to conform to some interface rules, I have to keep some input and output arguments without using them. At the same time, I want to keep the diagnostics turned on.
Can we do something in the standard grammar to tell the compiler that we really mean to do it, so that it does not report any warning about it?
The Fortran standard sets out the rules for correct programs and requires that compilers identify any breach of those rules. Such breaches, which cause compilation to fail, are generally known as errors.
However, programmers make many mistakes which are not errors and which a (Fortran) compiler is not required to spot. Some compilers provide additional diagnostic capabilities, such as identifying unused variables, which go beyond what the standard requires. The compilers raise what are generally known as warnings in these cases. This type of mistake does not cause compilation to fail. Compilers also generally provide some means to determine which warnings are raised during compilation, so that you can switch off and on this diagnostic capability. For details of these capabilities refer to your compiler's documentation.
The standard is entirely silent on this type of mistake so, if I understand the question correctly, there is nothing in the standard grammar to tell the compiler that we really mean to do it and that it should not report any warning about it.
The simplest thing (besides, of course, not declaring things you don't use) may be to simply use the variables:
real x
x = huge(x) ! x is declared but otherwise never used
This at least makes gfortran happy that you have "used" the variable.

Dealing with a project which may contain undefined behaviour

I would like advice how to proceed in such situation.
Imagine I have a large C++ project which works well.
I suspect there might be some UB in this code (because in a different project written by the same author I found UB).
Now, say I need to add new features to this project.
I am afraid because:
if I recompile with a new compiler, this can increase the risk of the UB manifesting, if there is UB in the code already (e.g. the new compiler might not be OK with UB that the old compiler was fine with).
Is it realistic to eliminate all UB in this large project by eye inspection (before I move on to adding the new features)?
If not, then I should at least compile with the same version of the compiler, right (to decrease the chance of problems if there is UB)?
The project is built with Visual Studio, so I don't know if the old object files are still around; if they are, I could keep them unchanged and only recompile the files where I need to add something - thus again minimizing the risk from UB.
What is the course of action in such a situation? I think this could be a pretty common scenario.
I like the suggestion that I test the project using the new compiler before adding new code, but even then - we know testing might not reveal the UB, right?
In order, I would:
Compile with -Wall (/W4 for you Windows folk) and fix the warnings (see the sketch after this list).
Write tests if there aren't any already.
Use tools like valgrind to detect issues and fix them.
Study synchronization primitives if in use, and use modern paradigms where possible.
Document the code and adhere to a style guide.
I would not attempt to avoid problems by keeping object files around. That's a nightmarish maintenance problem.
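To illustrate the first point, here is a hedged sketch of code that builds silently at default settings but gets flagged at -Wall or /W4 (exact messages differ between compilers):
#include <vector>
int sum(const std::vector<int>& v) {
    int total = 0;
    int debug_count;                      // flagged: unused local variable
    for (int i = 0; i < v.size(); i++)    // flagged: signed/unsigned comparison
        total += v[i];
    return total;
}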
Undefined Behavior = Bugs
It's impossible to prove that a project is bug-free. Even the best programmers do create bugs. Even the best code-review cannot eliminate all bugs in a project. No, it's not realistic to eliminate all UB in a project of some size by code inspection or by any other means. Your best option is to review the code and eliminate as many as possible.
Change your perception of UB (bugs): If you encounter a bug during your re-engineering efforts, it's a good thing! You are in the best position to remove one UB.
Don't keep the old compiler just because you are afraid of UB. Recompile the project with the latest and best compiler available. Compilers can also have bugs. Newer compilers will produce better, more robust code. Newer compilers will produce better warnings. Enable all the warnings you can (-Wall).
Eliminate all the warnings that the compiler produces. Every single warning is there for a reason; it highlights a problem. The likelihood of a "false positive" is quite low nowadays. This is even true for MSVC (I'm not talking about really old compilers from before VC 2005).
Use a static code checker (Cppcheck). It can point you to common problems with the code.
Use a custom rule set for your code checker. It will help you to get the code up to some standard.
If possible, compile the project with another compiler (GCC, Clang) just for the sake of getting the warnings of these compilers.
Don't link against old object files. This will create more problems than it avoids.
As others said: First and foremost, try to find the errors, not hide them.
The first and simplest measure is to set the warning level to /W4 (you can try /Wall, but due to the large amount of noise it produces (e.g. from standard header files), it is usually only of help if you know you have an error in a certain part of your code).
Use static analyzers - you can start with the built-in Code Analysis tool and then go for external tools (which are usually much more difficult to set up correctly for a non-trivial project).
Write lots of tests and make sure you are exercising edge cases - that's where UB usually lurks.
If possible, try to compile the project (or parts of it) under clang and activate the different sanitizers (in particular there is UndefinedBehaviorSanitizer), which will further instrument your code to check for UB (only helpful if you have tests that exercise that UB, though - see the sketch after this list).
Test your code at different optimization levels and combinations of flags (in VS, _ITERATOR_DEBUG_LEVEL in particular can be helpful for finding out-of-bounds errors).
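As a sketch of the sanitizer point above (the average function is made up for illustration; the flag syntax is Clang/GCC): signed overflow is UB that only shows up when a test exercises the edge case:
#include <climits>
// build with: clang++ -fsanitize=undefined avg.cpp
int average(int a, int b) {
    return (a + b) / 2;          // signed overflow (UB) when a + b exceeds INT_MAX
}
int main() {
    average(2, 4);               // fine, nothing reported
    average(INT_MAX, INT_MAX);   // UBSan reports a signed integer overflow
}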
I'd say any non-trivial code base potentially contains undefined behavior. What is special about that particular programmer? If he/she is prone to a particular kind of UB, then you can focus your efforts on that.

Is it possible to list loads due to potential aliasing violations?

I have been working on a sizeable performance-critical code base, where compiling with the latest versions of gcc gives numerous warnings about type punning, leading me to compile with -fno-strict-aliasing. I don't believe there is any avoidable performance loss there anyway. I do, however, believe that there may be rather more significant issues with aliasing pointers of the same type.
Is there any way to get gcc, or any other tool to list all places in a code base where additional loads / stores are taking place due to potential aliasing violations that gcc can't detect, whether pointers are of the same type or not? That way, I could go compare with a code profiler and see if the situation can be improved in places where it actually matters by the use of restrict, local variables, refactoring etc. Trying to guess what the compiler is thinking through looking at generated assembler is both time consuming and error prone, particularly for this. I'm interested in answers for both C and C++ if they are different.
GCC Debug options
Try out -fdump-tree-alias (search for "alias" in the above link).

Are the Optimization Keywords in C and C++ Reasonable?

So we've all heard the don't-use-register line, the reasoning being that trying to out-optimize a compiler is a fool's errand.
register, from what I know, doesn't actually state anything about CPU registers, just that a given variable can't be referenced indirectly. I'll hazard a guess that it's often referred to as obsolete because compilers can detect a lack of addressing automatically thus making such optimizations transparent.
But if we're firm on that argument, can't it be levelled at every optimization-driven keyword in C? Why do we use inline and C99's restrict for example?
I suppose that some things like aliasing make deducing some optimizations hard or even impossible, so where is the line drawn before we start venturing into Sufficiently Smart Compiler territory?
Where should the line be drawn in C and C++ between spoon-feeding a compiler optimization information and assuming it knows what it's doing?
EDIT: Jens Gustedt pointed out that my conflating of C and C++ isn't right since two of the keywords have semantic differences and one doesn't exist in standard C++. I had a good link about register in C++ which I'll add if I find it...
I would agree that register and inline are somewhat similar in this respect. If the compiler can see the body of the callee while compiling a call site, it should be able to make a good decision on inlining. The use of the inline keyword in both C and C++ has more to do with the mechanics of making the body of the function visible than with anything else.
restrict, however, is different. When compiling a function, the compiler has no idea of what the call sites are going to be. Being able to assume no aliasing can enable optimizations that would otherwise be impossible.
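A hedged sketch of the kind of optimization this unlocks. Standard C++ has no restrict keyword, but GCC, Clang and MSVC all accept __restrict as an extension (in C99 you would write restrict):
// Without the no-alias promise, the compiler must assume out and in may
// overlap, so it either skips vectorization or emits a runtime overlap
// check. With __restrict it can vectorize the loop unconditionally.
void scale(float* __restrict out, const float* __restrict in,
           float factor, int n) {
    for (int i = 0; i < n; i++)
        out[i] = in[i] * factor;
}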
inline is used in the scenario where you implement a non-templated function within a header and then include it from multiple compilation units.
This ensures that the compiler creates just one instance of the function, as though it were inlined, so you do not get a link error for a multiply defined symbol. It does not, however, require the compiler to actually inline it.
There are GNU attributes that force inlining (__attribute__((always_inline)) or similar), but that is a language extension.
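A minimal sketch of that scenario (the header and function names are made up):
// util.h - included from several .cpp files
#ifndef UTIL_H
#define UTIL_H
// Without 'inline', each translation unit would emit its own definition
// and the linker would complain about a multiply defined symbol. 'inline'
// permits the duplicate definitions; actually inlining remains optional.
inline double clamp01(double x) {
    return x < 0.0 ? 0.0 : (x > 1.0 ? 1.0 : x);
}
#endif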
register doesn't even say that you can't reference the variable indirectly (at least in C++). It said that in the original C, but that has been dropped.
Whether trying to out-optimize the compiler is a fool's errand depends on the optimization. Not many compilers, for example, will convert sin(x) * sin(x) + cos(x) * cos(x) into 1.
Today, most compilers ignore register, and no one uses it, because compilers have become good enough at register allocation to do a better job than you can with register. In fact, respecting register would typically make the generated code slower. This is not the case for inline or restrict: in both cases, there exist techniques, at least theoretically, which could result in the compiler doing a better job than you can. Such techniques are not widespread, however, and (as far as I know, at least) have a very high compile-time overhead, in some cases with compile times which grow exponentially with the size of the program (which makes them more or less unusable on most real programs - compile times which are measured in years really aren't acceptable).
As to where to draw the line... it changes in time. When I first started programming in C, register made a significant difference, and was widely used. Today, no. I imagine that in time, the same may happen with inline or restrict - some experimental compilers are very close with inline already.
This is a flame-bait question but I will dive in anyway.
Compilers are a lot better at optimising than your average programmer. There was a time when I programmed on a 25MHz 68030 and got some advantage from the use of register because the compiler's optimizer was so poor. But that was back in 1990.
I see inline as just as bad as register.
In general, measure first before you modify. If you find that your code performs so poorly you want to use register or inline, take a deep breath, stand back and look for a better algorithm first.
In recent times (i.e. the last 5 years) I have gone through code bases and removed inline functions galore with no perceptible change in performance. Code size, however, always benefits from the removal of inline methods. That isn't a big issue for your standard x86-style monster multicore marvel of the modern age, but it does matter if you work in the embedded space.
It is a moving target, because compiler technology is improving. (Well, sometimes it is more changing than improving, but that has some of the same effect of rendering your optimization attempts moot, or worse.)
Generally, you should not guess at whether an optimization keyword or other optimization technique is good or not. One has to learn quite a bit about how computers work, including the particular platform you are targeting, and how compilers work.
So a rule about using various optimization techniques is to ask: do I know the compiler will not do the best job here? Am I willing to commit to that for a while - will the compiler remain stable while this code is in use, and am I willing to rewrite the code when the compiler changes the situation? Typically, you have to be an experienced and knowledgeable software engineer to know when you can do better than the compiler. It also helps if you can talk to the compiler developers.
This means people cannot give you an answer here that has a definite guideline. It depends on what compiler you are using, what your project is, what your resources are, and what your goals are, and so on.
Although some people say not to try to out-optimize the compiler, there are various areas of software engineering where people do better than a compiler and in which it is worth the expense of paying people for this.
The difference is as follows:
register is a very local optimization (i.e. inside one function). Register allocation is a relatively solved problem, both thanks to smarter compilers and thanks to larger numbers of registers (mostly the former, but x86-64, say, has more registers than x86, and both have more than, say, an 8-bit processor).
inline is harder, as it is an inter-procedural optimization. However, as it involves a relatively small depth of recursion and a small number of procedures (if the inlined procedure is too big, there is no sense in inlining it), it may be safely left to the compiler.
restrict is much harder. To fully know that two pointers don't alias, you would need to analyse the whole program (including libraries, the system, plug-ins etc.) - and even then you would run into problems. However, the information is often clear to the programmer, AND it becomes part of the specification.
Consider very simple code:
void my_memcpy(void *dst, const void *src, size_t size) {
    for (size_t i = 0; i < size; i++) {
        ((char *)dst)[i] = ((const char *)src)[i];
    }
}
Is there a benefit to making this code efficient? Yes - memcpy tends to be very useful (say, for copying during GC). Can this code be vectorized (here - moved by words - say 128 bits at a time instead of 8)? The compiler would have to deduce that dst and src do not alias in any way and that the regions they point to are independent. size may depend on user input, runtime behaviour or other factors, which makes the analysis practically impossible - we run into problems similar to the Halting Problem: in general, we cannot analyse everything without running it. Or the function might be part of a C library (I assume shared libraries) and called from programs built later, so not all call sites are even known at compile time. Without such analysis, vectorizing could make the program behave differently with optimization turned on. On the other hand, the programmer can ensure that the regions are different objects simply by knowing the (even higher-level) design, with no need for a bottom-up analysis.
restrict can also serve as documentation, since the programmer may have written the procedure in a way that cannot handle two aliasing pointers. For example, if we want to copy memory between overlapping regions, the above code is incorrect.
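A hedged sketch of how the signature could carry that no-aliasing contract (written with the __restrict extension to stay in C++; C99 spells it restrict, and since C99 the standard library declares memcpy itself this way):
#include <cstddef>
// Declares (and documents) that the two regions may not overlap; calling
// this with aliasing pointers becomes the caller's bug, not the callee's.
void my_memcpy(void* __restrict dst, const void* __restrict src, std::size_t size);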
So to sum up - a Sufficiently Smart Compiler would not be able to deduce restrict (unless we move to compilers understanding the meaning of code) without knowing the whole program, and even then it would be close to undecidable. For local optimization, however, the compilers are already sufficiently smart. My guess is that a Sufficiently Smart Compiler with whole-program analysis would nevertheless be able to deduce it in many interesting cases.
PS. By local I mean single function. So local optimization cannot assume anything about arguments, global variables etc.
One thing that hasn't been mentioned is that many non-x86 compilers aren't nearly as good at optimizing as gcc and other "modern" C-compilers are.
For instance, the compilers for PIC are absolutely terrible at optimizing. Also, the optimizer for cicc (the CUDA compiler), though much better, still seems to miss a lot of fairly simple optimizations.
For these cases, I've found optimization hints like register, inline, and #pragma unroll to be extremely useful.
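A hedged sketch of what those hints look like together. #pragma unroll is nonstandard (Clang and CUDA's compiler accept it; others may warn about an unknown pragma), and register is deprecated in C++11 and removed in C++17:
inline int sq(int x) { return x * x; }   // inlining hint
void scale(int *buf, int n, int k) {
    register int factor = k;             // register hint (pre-C++17 only)
    #pragma unroll                       // loop-unrolling hint
    for (int i = 0; i < n; i++)
        buf[i] = sq(buf[i]) * factor;
}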
From what I saw back in the days when I was more involved with C/C++, these are merely direct orders given to the compiler. The compiler may try to inline a function even when it is not given a direct order to do so. That really depends on the compiler and may even raise some cross-compiler issues. As an example, Visual Studio provides different levels of optimization, which correspond to different levels of intelligence in the compiler. I have read that member functions defined inside the class body are implicitly inline, giving the compiler a hint to minimize function-call overhead. In any case, these directives are extremely helpful when you are using a less intelligent compiler, while a smart compiler may already perform the optimization on its own.
Also, make sure these keywords are safe to use in your situation. Some compiler optimizations may not work with some libraries such as OpenGL (as I have seen myself). So in cases where you feel that a compiler optimization may be harmful, you can use these keywords to make sure it is done the way you want it.
Compilers such as g++ optimize code very well these days. You may do better to look for optimization elsewhere, e.g. in the methods and algorithms you use, or by using TBB or CUDA to make your code parallel.

How to spot undefined behavior

Is there any way to know if your program has undefined behavior in C++ (or even C), short of memorizing the entire spec?
The reason I ask is that I've noticed a lot of cases of programs working in debug but not release being due to undefined behavior. It would be nice if there were a tool to at least help spot UB, so we know there's the potential for problems.
Good coding standards protect you from yourself. Here are some ideas:
The code must compile at the highest warning level... without warnings. (In other words, your code must not set off any warnings at all when set to the highest level.) Turn on the warnings-as-errors flag for all projects.
This does mean some extra work when you use other people's libraries, since they may not have done this. You will also find there are some warnings which are pointless... turn those off individually as your team decides.
Always use RAII.
Never use C-style casts! Never! - there are perhaps a couple of rare cases where you have to break this rule, but you will probably never encounter them.
If you must reinterpret_cast or cast to void*, then use a wrapper to make sure you're always casting to/from the same type. In other words, wrap your pointer/object in a boost::any, cast a pointer to it into whatever you need, and on the other side do the same. Why? Because then you will always know what type to reinterpret_cast from, and the boost::any will enforce that you've cast to the correct type after that. It's the safest you can get. (A sketch follows after this list.)
Always initialize your variables at the point of declaration (or in constructor initializers when in a class).
There are more but those are some very important ones to start with.
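For the cast-wrapper item, a hedged sketch of the idea (the Widget type and the pack/unpack names are made up; boost::any_cast throws boost::bad_any_cast on a type mismatch, which is the enforcement described above):
#include <boost/any.hpp>
struct Widget { int id; };
void* pack(Widget* w) {
    // The boost::any is what actually crosses the type-erasing boundary
    // (callback context, message queue, ...); the caller must delete it.
    return new boost::any(w);
}
Widget* unpack(void* opaque) {
    boost::any* a = static_cast<boost::any*>(opaque);
    // Throws boost::bad_any_cast unless *a really holds a Widget*.
    return boost::any_cast<Widget*>(*a);
}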
Nobody can memorize the standard. What we intermediate-to-advanced C++ programmers do is use constructs we know are safe and protect ourselves from our human nature... and we don't use constructs that are not safe unless we have to, and then we take extra care to make sure the danger is all wrapped up in a nice safe interface that is tested to hell and back.
One important thing to remember which is universal across all languages is to:
make your constructs easy to use correctly and difficult to use incorrectly
It's not possible to detect undefined behavior in all cases. For example, consider x = x++ + 1;. If you're familiar with the language, you know it's UB. Now, *p = (*p)++ + 1; is obviously also UB, but what about *q = (*p)++ + 1;? That's UB if q == p, but other than that it's defined (if awkward-looking). In a given program, it might well be possible to prove that p and q will never be equal when reaching that line, but that can't be done in general.
To help spot UB, use all of the tools you've got. Good compilers will warn for at least the more obvious cases, although you may have to use some compiler options for best coverage. If you have further static analysis tools, use them.
Code reviews are also very good for spotting such problems. Use them, if you've got more than one developer available.
Static code analysis tools such as PC-Lint can help a lot here.
Well, this article covers most aspects.
I think you can use a tool from Coverity to spot bugs that are going to lead to undefined behavior.
I guess you could use theorem provers (I only know Coq) to be sure your program does what you want.
clang tries hard to produce warnings when undefined behavior is encountered.
I'm not aware of any software tool to detect all forms of UB. Obviously using your compiler's warnings and possibly lint or another static code checker can help a lot.
The other thing that helps a lot is simply experience: The more you program the language, the more you'll see constructs that appear suspect and be able to catch them earlier in the process.
Unfortunately, there is no way to detect all UB. You'd have to solve the Halting Problem to do that.
The best you can do is to know as many of the rules as possible, look them up when you're in doubt, and check with other programmers (through pair programming, code reviews or just SO questions).
Compiling with as many warnings as possible, and under multiple compilers, can help. And running the code under dynamic analysis tools such as Valgrind can detect many issues.
But ultimately, no tool can detect it all.
An additional problem is that many programs actually have to rely on UB. Some APIs require it, and just assume that "it works on all sane compilers". OpenGL does that in one or two cases. The Win32 API won't even compile under a standards-compliant compiler.
So even if you had a magic UB-detecting tool, it would still be tripped up by the cases that aren't under your control.
Simple: don't do things that you don't know you can do.
When you are unsure or have a fishy feeling, check the reference.
A good compiler, such as the Intel C++ compiler, should be able to spot 99% of cases of undefined behaviour. You'll need to investigate the flags and switches to use. As ever, read the manual.