Tool for finding C-style Casts - c++

Does anyone know of a tool that I can use to find explicit C-style casts in code? I am refactoring some C++ code and want to replace C-style casts wherever possible.
An example C-style cast would be:
Foo foo = (Foo) bar;
In contrast, examples of C++ style casts would be:
Foo foo = static_cast<Foo>(bar);
Foo foo = reinterpret_cast<Foo>(bar);
Foo foo = const_cast<Foo>(bar);

If you're using gcc/g++, just enable a warning for C-style casts:
g++ -Wold-style-cast ...
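As a minimal sketch of what the warning flags, the two functions below perform the same conversion. The names truncate_old and truncate_new are hypothetical and used only for illustration; only the first should be reported when compiling with -Wold-style-cast.

```cpp
// Compile with: g++ -Wold-style-cast -c casts.cpp

int truncate_old(double value) {
    return (int) value;              // C-style cast: reported by -Wold-style-cast
}

int truncate_new(double value) {
    return static_cast<int>(value);  // same conversion, not reported, easy to search for
}
```

Because the warning comes from the compiler rather than a text search, it is not confused by parenthesised expressions or function calls.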

Searching for the regular expression \)\w gives surprisingly good results.

The fact that such casts are so hard to search for is one of the reasons new-style casts were introduced in the first place. And if your code is working, this seems like a rather pointless bit of refactoring - I'd simply change them to new-style casts whenever I modified the surrounding code.
Having said that, the fact that you have C-style casts at all in C++ code would indicate problems with the code which should be fixed - I wouldn't just do a global substitution, even if that were possible.

The Offload C++ compiler supports options to report as a compile time error all such casts, and to restrict the semantics of such casts to a safer equivalence with static_cast.
The relevant options are:
-cp_nocstylecasts
The compiler will issue an error on all C-style casts. C-style casts in C++ code can potentially be unsafe and lead to undesired or undefined behaviour (for example casting pointers to unrelated struct/class types). This option is useful for refactoring to find all those casts and replace them with safer C++ casts such as static_cast.
-cp_c2staticcasts
The compiler applies the more restricted semantics of C++ static_cast to C-style casts. Compiling code with this option switched on ensures that C-style casts are at least as safe as C++ static_casts.
This option is useful if existing code has a large number of C-style casts and refactoring each cast into C++ casts would be too much effort.
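The difference between the two kinds of cast can be sketched in standard C++ (this says nothing about how Offload implements its options). Apple, Orange, can_static_cast and c_style are hypothetical names: the trait detects at compile time whether static_cast accepts a conversion, and the C-style cast between the same unrelated pointer types still compiles.

```cpp
#include <type_traits>
#include <utility>

struct Apple  { int weight; };
struct Orange { double radius; };

// Detects whether static_cast<To>(From) is well-formed (expression SFINAE).
template <typename From, typename To>
class can_static_cast {
    template <typename F, typename T>
    static auto test(int)
        -> decltype(static_cast<T>(std::declval<F>()), std::true_type{});
    template <typename, typename>
    static std::false_type test(...);
public:
    static constexpr bool value = decltype(test<From, To>(0))::value;
};

// static_cast accepts a numeric conversion but rejects unrelated pointer types.
static_assert(can_static_cast<double, int>::value, "numeric conversion is allowed");
static_assert(!can_static_cast<Apple*, Orange*>::value, "unrelated pointers are rejected");

// The C-style cast compiles anyway, because it falls back to reinterpret_cast.
inline Orange* c_style(Apple* apple) { return (Orange*)apple; }
```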

A tool that can analyze C++ source code accurately and carry out automated custom changes (e.g., your cast replacement) is the DMS Software Reengineering Toolkit.
DMS has a full C++ parser, builds ASTs and symbol tables, and can thus navigate your code to reliably find C style casts. By using pattern-directed matches and rewrites, you can provide a set of rules that would convert all such C-style casts into your desired C++ equivalents.
DMS has been used to carry out massive automated C++ reengineering tasks for Boeing and General Dynamics, each involving thousands of files.

One issue with C-style casts is that, since they rely on parentheses which are way overloaded, they're not trivial to spot. Still, a regex such as (e.g. in Python syntax):
r'\(\s*\w+\s*\)'
is a start -- it matches a single identifier in parentheses with optional whitespace inside the parentheses. But of course that won't catch, e.g., (void*) casts -- to get trailing asterisks as well,
r'\(\s*\w+[\s*]*\)'
You could also start with an optional const to broaden the net still further, etc, etc.
Once you have a good RE, many tools (from grep to vim, from awk to sed, plus perl, python, ruby, etc) let you apply it to identify all of its matches in your source.
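The second pattern above can also be tried from C++ itself. This is a rough sketch using std::regex; the function name find_cast_candidates is made up for the example. It returns candidates, not confirmed casts.

```cpp
#include <regex>
#include <string>
#include <vector>

// Returns every substring of `line` that looks like "(identifier)" or
// "(identifier *)". These are only candidates for C-style casts.
std::vector<std::string> find_cast_candidates(const std::string& line) {
    static const std::regex pattern(R"re(\(\s*\w+[\s*]*\))re");
    std::vector<std::string> hits;
    for (std::sregex_iterator it(line.begin(), line.end(), pattern), end;
         it != end; ++it) {
        hits.push_back(it->str());
    }
    return hits;
}
```

A call such as f(x) also matches, since (x) is a single identifier in parentheses, so every hit still needs a human look.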

If you use some kind of Hungarian-style notation (e.g. iInteger, pPointer etc.) then you can search for e.g. )p and ) p and so on.
It should be possible to find all those places in reasonable time even for a large code base.

I already answered once with a description of a tool that will find and change all the casts if you want it to.
If all you want to do is find such casts, there's another tool that will do this easily, and in fact is the extreme generalization of all the "regular expression" suggestions made here. That is the SD Source Code Search Engine. This tool enables one to search large code bases in terms of the language elements that make up each language. It provides a GUI allowing you to enter queries, see individual hits, and show the file text at the hit point with one mouse click. One more click and you can be in your editor [for many editors] on a file. The tool will also record a list of hits in context so you can revisit them later.
In your case, the following search engine query is likely to get most of the casts:
'(' I ')' | '(' I ... '*' ')'
which means, find a sequence of tokens, first being (, second being any identifier, third being ')', or a similar sequence involving something that ends in '*'.
You don't specify any whitespace management, as the tool understands the language whitespace rules; it will even ignore a comment in the middle of a cast and still match the above.
[I'm the CTO at the company that supplies this.]

I used this regular expression in Visual Studio (2010) Find in files search box: :i\):i
Thanks to sth for the inspiration (his post)

Related

Simpler c++ template compile error output

When working with templates in C++ any errors cause the compiler to emit a lot of output. In most cases when I am working on something most of that information is noise and I have to scroll around looking for the info I am interested in, for example:
Every template candidate is listed. I rarely have use for this long list and it just clutters the output.
Aliases for template specializations are expanded, e.g. std::string is written as std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, or expanded typedefs / alias declarations. I would prefer to have them unexpanded for easier reading.
Is it possible in either g++ or clang to reduce any of this for shorter/simpler output?
Obviously the information can be important, but then I would prefer to compile again with more verbosity and keep it short and simple by default.
Unfortunately there's no way to deal with this currently. C++20 addresses this problem by introducing concepts, which let a template state the requirements its arguments must satisfy. Violating those requirements produces a much simpler error.
Currently, I just dig into these lines; I've gotten used to it. I'm dealing with a program that has 5 template parameters in places. It's all about getting used to it and training your eyes to parse the content.
However, if you're really stuck, one solution I may suggest is that you copy all the relevant error output to some editor, and do a find-and-replace to simplify individual expressions, making them smaller and smaller with every replace until it becomes readable for you. Good skills in regex may help as well. In Notepad++ (or Notepadqq on linux), you can find regular expressions and use capture groups in the replacement with \1 for first capture group, \2 for second, etc.
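The same find-and-replace idea can be scripted. The sketch below uses a hypothetical helper, shorten_error_text, to collapse the expanded std::string spelling quoted in the question back to std::string. It assumes the exact spacing that g++ prints; other expansions would need their own patterns.

```cpp
#include <regex>
#include <string>

// Replaces the expanded form of std::string in compiler output with "std::string".
std::string shorten_error_text(const std::string& text) {
    static const std::regex expanded_string(
        R"re(std::(__cxx11::)?basic_string<char, std::char_traits<char>, std::allocator<char> ?>)re");
    return std::regex_replace(text, expanded_string, "std::string");
}
```

One way to use it would be to wrap it in a small filter program and pipe the compiler's error output through it.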
So, bottom line: Until C++20, there's no clean solution for this except what you invent yourself.

Should I really massively introduce the explicit keyword?

When I used the (recently released) Cppcheck 1.69 on my code1, it showed a whole lot of messages where I expected none. Disabling noExplicitConstructor proved that all of them were of exactly this kind.
But I found that I'm not the only one with a lot of new Cppcheck messages, look at the results of the analysis of LibreOffice (which I'm allowed to show in public):
What would an experienced programmer do:
Suppress the check?
Massively introduce the explicit keyword?
1 This is of course not my code but code I have to work on at work; it's legacy code: a mix of C and C++ in several (pre-)standard flavors (let's say C++98), and it's a pretty large code base.
I've been bitten in the past by performance hits introduced by implicit conversions as well as outright bugs. So I tend to always use explicit for all constructors that I do not want to participate in implicit conversions so that the compiler can help me catch my errors - and I then try to always also add a "// implicit intended" comment to the ctors where I explicitly intend for them to be used as converting ctors implicitly. I find that this helps me write more correct code with fewer surprises.
… So I'd say "yes, go add explicit" - in the long run you'll be glad you did - that's what I did when I first learned about it, and I'm glad I did.
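A small sketch of the convention described above, with hypothetical types Buffer and Meters: one constructor is marked explicit, the other is deliberately left as a converting constructor and commented as such.

```cpp
#include <type_traits>

struct Buffer {
    explicit Buffer(int n) : size(n) {}   // an int never silently becomes a Buffer
    int size;
};

struct Meters {
    Meters(double v) : value(v) {}        // implicit intended
    double value;
};

int capacity(const Buffer& buffer) { return buffer.size; }
double length(const Meters& meters) { return meters.value; }

// capacity(16) would not compile; capacity(Buffer(16)) does.
static_assert(!std::is_convertible<int, Buffer>::value, "explicit blocks the conversion");
// length(2.5) compiles because the converting constructor is allowed to run.
static_assert(std::is_convertible<double, Meters>::value, "implicit conversion is intended");
```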

What does "statically typed" and "free-form" mean for C++?

In the C++ tag wiki, it is mentioned that
C++ is a statically typed, free-form, multi-paradigm, compiled, general-purpose programming language.
Can someone please explain the terms "statically typed" and "free-form"?
Thanks.
A statically-typed language is a language where every variable has a type assigned to it at compile-time. In C++, this means that you must tell the compiler the type of each variable - that is, whether it's an int, or a double, or a string, etc. This contrasts with dynamically-typed languages like JavaScript or PHP, where each variable can hold any type, and that type can change at runtime.
A free-form language is one where there are no requirements about where various symbols have to go with regard to one another. You can add as much whitespace as you'd like (or leave out any whitespace that you don't like). You don't need to start statements on a new line, and can put the braces around code blocks anywhere you'd like. This has led to a few holy wars about The Right Way To Write C++, but I actually like the freedom it gives you.
Hope this helps!
"Statically typed" means that the types are checked at compile-time, not run-time. For example, if you write a class that does not have a foo() method, then you'll get a compile-time error if you try to call foo() on an object of that class. In dynamically-typed languages (e.g. Ruby), you would still get an error, but only at run-time.
"Free-form" means that you can use whitespace however you want (i.e. write the whole program on one line, use uneven indenting, put lots of blank lines, etc.). This is in contrast to languages like Python where whitespace is semantically significant.
Statically typed: the compiler knows what the types of all variables are. In contrast to languages like Python and Common Lisp, where the types of variables can change at runtime.
Free-form: no specific whitespace requirements. This is in contrast to old-style FORTRAN and COBOL, so I'm not sure how useful this designation is anymore.
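Both terms can be seen in one short sketch (the function names are invented for the example): the two functions differ only in whitespace, and the static_assert checks a type before the program ever runs.

```cpp
#include <type_traits>

// Conventional layout.
int add_spread(int a, int b)
{
    int sum = a + b;
    return sum;
}

// Same tokens on one line: free-form means the layout does not change the meaning.
int add_compact(int a,int b){int sum=a+b;return sum;}

// Static typing: the type of the call expression is known at compile time.
static_assert(std::is_same<decltype(add_spread(1, 2)), int>::value,
              "the result type is fixed before the program runs");
```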

Expressions with no side effects in C++

See, what I don't get is, why should programs like the following be legal?
int main()
{
static const int i = 0;
i < i > i;
}
I mean, surely, nobody actually has any current programs that have expressions with no side effects in them, since that would be very pointless, and it would make parsing & compiling the language much easier. So why not just disallow them? What benefit does the language actually gain from allowing this kind of syntax?
Another example being like this:
int main() {
static const int i = 0;
int x = (i);
}
What is the actual benefit of such statements?
And things like the most vexing parse. Does anybody, ever, declare functions in the middle of other functions? I mean, we got rid of things like implicit function declaration, and things like that. Why not just get rid of them for C++0x?
Probably because banning them would make the specification more complex, which would make compilers more complex.
it would make parsing & compiling the language much easier
I don't see how. Why is it easier to parse and compile i < i > i if you're required to issue a diagnostic, than it is to parse it if you're allowed to do anything you damn well please provided that the emitted code has no side-effects?
The Java compiler forbids unreachable code (as opposed to code with no effect), which is a mixed blessing for the programmer, and requires a little more work from the compiler than a C++ compiler is actually required to do (basic block dependency analysis). Should C++ forbid unreachable code? Probably not. Even though C++ compilers certainly do enough optimization to identify unreachable basic blocks, in some cases they may do too much. Should if (foo) { ...} be an illegal unreachable block if foo is a false compile-time constant? What if it's not a compile-time constant, but the optimizer has figured out how to calculate the value? Should it be legal, so that the compiler has to realise that the reason it's removing the block is implementation-specific and must not give an error? More special cases.
nobody actually has any current programs that have expressions with no side effects in them
Loads. For example, if NDEBUG is true, then assert expands to a void expression with no effect. So that's yet more special cases needed in the compiler to permit some useless expressions, but not others.
The rationale, I believe, is that if it expanded to nothing then (a) compilers would end up throwing warnings for things like if (foo) assert(bar);, and (b) code like this would be legal in release but not in debug, which is just confusing:
assert(foo) // oops, forgot the semi-colon
foo.bar();
things like the most vexing parse
That's why it's called "vexing". It's a backward-compatibility issue really. If C++ now changed the meaning of those vexing parses, the meaning of existing code would change. Not much existing code, as you point out, but the C++ committee takes a fairly strong line on backward compatibility. If you want a language that changes every five minutes, use Perl ;-)
Anyway, it's too late now. Even if we had some great insight that the C++0x committee had missed, why some feature should be removed or incompatibly changed, they aren't going to break anything in the FCD unless the FCD is definitively in error.
Note that for all of your suggestions, any compiler could issue a warning for them (actually, I don't understand what your problem is with the second example, but certainly for useless expressions and for vexing parses in function bodies). If you're right that nobody does it deliberately, the warnings would cause no harm. If you're wrong that nobody does it deliberately, your stated case for removing them is incorrect. Warnings in popular compilers could pave the way for removing a feature, especially since the standard is authored largely by compiler-writers. The fact that we don't always get warnings for these things suggests to me that there's more to it than you think.
It's convenient sometimes to put useless statements into a program and compile it just to make sure they're legal - e.g. that the types involved can be resolved/matched etc.
Especially in generated code (macros as well as more elaborate external mechanisms, templates where Policies or types may introduce meaningless expansions in some no-op cases), having fewer special uncompilable cases to avoid keeps things simpler.
There may be some temporarily commented code that removes the meaningful usage of a variable, but it could be a pain to have to similarly identify and comment all the variables that aren't used elsewhere.
While in your examples you show the variables being "int" immediately above the pointless usage, in practice the types may be much more complicated (e.g. operator<()) and whether the operations have side effects may even be unknown to the compiler (e.g. out-of-line functions), so any benefit's limited to simpler cases.
C++ needs a good reason to break backwards (and retained C) compatibility.
Why should doing nothing be treated as a special case? Furthermore, whilst the above cases are easy to spot, one could imagine far more complicated programs where it's not so easy to identify that there are no side effects.
As an iteration of the C++ standard, C++0x has to be backward compatible. Nobody can assert that the statements you wrote do not exist in some piece of critical software written/owned by, say, NASA or DoD.
Anyway regarding your very first example, the parser cannot assert that i is a static constant expression, and that i < i > i is a useless expression -- e.g. if i is a templated type, i < i > i is an "invalid variable declaration", not a "useless computation", and still not a parse error.
Maybe the operator was overloaded to have side effects, like cout<<i; this is the reason why they cannot be removed now. On the other hand, C# forbids using expressions other than assignments and method calls as statements, and I believe this is a good thing, as it makes the code clearer and semantically correct. However, C# had the opportunity to forbid this from the very beginning, which C++ did not.
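To make the overloading point concrete, here is a minimal sketch with a hypothetical type Reading and a counter standing in for any side effect. The statement a < b; looks useless but calls a user-defined function.

```cpp
struct Reading { int value; };

int comparisons_made = 0;   // hypothetical counter standing in for any side effect

bool operator<(const Reading& lhs, const Reading& rhs) {
    ++comparisons_made;     // the side effect the compiler must preserve
    return lhs.value < rhs.value;
}

void compare_and_discard() {
    Reading a{1};
    Reading b{2};
    a < b;   // result is discarded, but the overloaded operator still runs
}
```

If the operator were defined in another translation unit, the compiler could not even tell whether the call has side effects.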
Expressions with no side effects can turn up more often than you think in templated and macro code. If you've ever declared std::vector<int>, you've instantiated template code with no side effects. std::vector must destruct all its elements when releasing itself, in case you stored a class for type T. This requires, at some point, a statement similar to ptr->~T(); to invoke the destructor. int has no destructor though, so the call has no side effects and will be removed entirely by the optimizer. It's also likely it will be inside a loop, then the entire loop has no side effects, so the entire loop is removed by the optimizer.
So if you disallowed expressions with no side effects, std::vector<int> wouldn't work, for one.
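The destructor loop can be sketched as follows. This is an illustration of the idea, not how any particular std::vector is implemented; destroy_range, build_and_destroy and Tracked are hypothetical names. For T = int the loop body is a pseudo-destructor call with no effect, while for Tracked it does real work.

```cpp
#include <cstddef>
#include <memory>
#include <new>

// Destroys `count` objects starting at `first`, as a container must do on cleanup.
template <typename T>
void destroy_range(T* first, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i)
        first[i].~T();   // no side effects at all when T is int
}

struct Tracked {
    static int destroyed;
    ~Tracked() { ++destroyed; }   // a destructor with an observable effect
};
int Tracked::destroyed = 0;

// Allocates raw storage, constructs `count` objects, then destroys them again.
template <typename T>
void build_and_destroy(std::size_t count) {
    std::allocator<T> alloc;
    T* items = alloc.allocate(count);
    for (std::size_t i = 0; i < count; ++i)
        ::new (static_cast<void*>(items + i)) T();
    destroy_range(items, count);
    alloc.deallocate(items, count);
}
```

The same template text compiles for both types, which is only possible because the effect-free statement is legal.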
Another common case is assert(a == b). In release builds you want these asserts to disappear - but you can't simply redefine assert as an empty macro: if (x) assert(a == b); would collapse to if (x) ; (an empty body that compilers tend to warn about), and a use that forgets the trailing semicolon would still compile and pull the next statement into the if statement - a disaster! Instead, assert(x) can be redefined as ((void)0), which is an expression that has no side effects. Now the if statement works correctly in release builds too - it just does nothing.
These are just two common cases. There are many more you probably don't know about. So, while expressions with no side effects seem redundant, they're actually functionally important. An optimizer will remove them entirely so there's no performance impact, too.
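The ((void)0) technique can be sketched with a hypothetical macro, CHECK_OFF, standing in for assert in a build where NDEBUG is defined. It is not the actual definition from any particular standard library header.

```cpp
// CHECK_OFF is a hypothetical stand-in for a disabled assert.
#define CHECK_OFF(condition) ((void)0)

int count_steps(bool flag) {
    int steps = 0;
    if (flag)
        CHECK_OFF(steps == 0);   // expands to ((void)0); which is a complete statement
    ++steps;                     // not part of the if: runs regardless of flag
    return steps;
}
```

Because the macro still expands to an expression, the use site needs its semicolon in both debug and release builds.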

C++ if statement alternatives

Is it me, or does it seem like C++ asks for more use of the 'if' statement than C#?
I have this codebase and it contains lots of things such as this:
if (strcmp((char*)type,"double")==0)
I wondered, isn't it a bit of a 'code smell' when there are just too many if statements?
I'm not saying they're bad, but things like string comparisons, with lots of strings involved, can't they be done differently?
Is there an alternative to just writing sequences of if statements?
THIS IS JUST AN EXAMPLE, IT CAN BE ANY KIND OF IF STATEMENTS
instead of:
if (string a == "blah") then bla
if (string b == "blah") then blo
The reason you do if (strcmp((char*)type,"double")==0) is because you can't make "double" a case-expression and use a switch statement. That said, if you're doing a lot of these kinds of string matches, you may want to look at using a std::map<std::string, int> or something similar and then use the map to convert the string to an index which you then feed to switch.
Personally, in these cases, I'm a fan of things like std::map<std::string, int (Handler::*)(void)>, which lets me create a handler map of class methods, but YMMV.
EDIT: I forgot to mention: the other sweet thing about having a map of strings to methods is that you can alter (usually add to) it at run time. For example, a parser could change its list of keywords and their handlers at runtime after it knows what kind of file it's parsing.
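A rough sketch of such a handler map follows. The Handler class, its two methods and their return codes are invented for illustration, and -1 is an arbitrary choice for "unknown type".

```cpp
#include <map>
#include <string>

class Handler {
public:
    int handle_double() { return 1; }   // hypothetical handlers with arbitrary codes
    int handle_int()    { return 2; }

    // Looks the type name up in the table and calls the matching method.
    // Returns -1 when the name is not registered.
    int dispatch(const std::string& type) {
        auto it = table().find(type);
        return it == table().end() ? -1 : (this->*(it->second))();
    }

    // The table is mutable, so entries can be added at run time.
    static std::map<std::string, int (Handler::*)()>& table() {
        static std::map<std::string, int (Handler::*)()> handlers = {
            {"double", &Handler::handle_double},
            {"int",    &Handler::handle_int},
        };
        return handlers;
    }
};
```

Registering a new keyword at run time is then a single assignment, for example Handler::table()["float"] = &Handler::handle_int;.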
This is code smell.
To minimize it, you should (in this case) use std::strings. Your code then becomes:
#include <string>
// [...]
std::string type = "whatever";
// [...]
if (type == "double")
This is almost identical to the C# equivalent: to compile this sample code in C# code just remove the include and the std::.
If you find code that uses char* directly in C++, it's usually doing it wrong (except maybe for some rare exceptions).
Edit: Mike DeSimone addressed the problem of further refactoring this in his answer (so I won't mention it here :) ).
I don't think C++ requires any more "ifs" than C#. The number of if statements in a program is really just a matter of coding style. You can always eliminate ifs through techniques like polymorphism, table driven methods, and so on. These same techniques are available in both C++ and C#. If there is a difference between programs written in these two languages, I suspect it has to do with the mentality of C# vs C++ programmers.
Note that I don't necessarily recommend "if" elimination. In my experience, if statements tend to be clearer than the alternatives. To directly address your second point: the way to eliminate chained string comparisons like that is to use a DFSA. Most of the time, however, string comparisons are perfectly suitable.
It's not something I've noticed; I've done 10 years C++ and 4 years of C# too!
Surely the number of if's relates to the design of your code rather than a difference between C# and C++?
To get rid of conditional expressions in either language you can consider the Inversion of Control pattern. It has the side effect of lessening those.
Based on the nature of 'bla' and 'blo' you can always try to use a std::map, with the strings as keys.
Too many if statements are a code smell if you can replace them with a switch...case. Otherwise, I don't see the problem with using if.
Maybe you have used more event-driven programming in C#, while your C++ code is more sequential?
There are better ways to implement a string parser than an endless set of if (strcmp...) statements.
One approach could be a map between strings and function pointers or functor objects.
Another design could involve a chain of responsibility pattern where the string is passed to a chain of objects that decide if they have a match or to pass it along.
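As a rough sketch of the chain of responsibility idea, each link below either recognises the string or passes it on. All class names and the "unhandled" fallback are hypothetical.

```cpp
#include <memory>
#include <string>
#include <utility>

class Link {
public:
    explicit Link(std::unique_ptr<Link> next = nullptr) : next_(std::move(next)) {}
    virtual ~Link() = default;

    // Handles the token here if it matches, otherwise passes it along the chain.
    std::string handle(const std::string& token) const {
        if (matches(token)) return name();
        return next_ ? next_->handle(token) : "unhandled";
    }

private:
    virtual bool matches(const std::string& token) const = 0;
    virtual std::string name() const = 0;
    std::unique_ptr<Link> next_;
};

class DoubleLink : public Link {
public:
    explicit DoubleLink(std::unique_ptr<Link> next = nullptr) : Link(std::move(next)) {}
private:
    bool matches(const std::string& token) const override { return token == "double"; }
    std::string name() const override { return "double handler"; }
};

class IntLink : public Link {
public:
    explicit IntLink(std::unique_ptr<Link> next = nullptr) : Link(std::move(next)) {}
private:
    bool matches(const std::string& token) const override { return token == "int"; }
    std::string name() const override { return "int handler"; }
};

// Builds the chain: DoubleLink first, then IntLink.
std::unique_ptr<Link> make_chain() {
    return std::unique_ptr<Link>(
        new DoubleLink(std::unique_ptr<Link>(new IntLink())));
}
```

The calling code contains no if per keyword. Supporting a new string means adding one class and one link in make_chain.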
I'm not aware of anything about C++ that makes it more prone to "if abuse" than any other language.