Remove exception specifications from C++ code with sed - c++

I want to automatically remove deprecated exception specifications from my C++ code and am trying to use sed for this task.
The exception specification format is throw followed by a list of exceptions (words) between parentheses, so I wrote this sed command:
sed -r 's,throw\s*[(].*[)],,g' foo.cpp
It works for one-line specifications but does not work for multiline ones.
It seems that the dot does not match newlines, although according to the documentation it should: https://www.gnu.org/software/sed/manual/html_node/Regular-Expressions.html
I tried this workaround, but it does not work either (in fact, it does not even work for one-line specifications):
sed -r 's,throw\s*[(][\s\S]*[)],,g'
How can I make it work properly?
EDITED:
example of exception spec:
void foo() throw (std::runtime_error); //oneline
void bar() throw (std::runtime_error,
std::logic_error); //multiline

Many text editors (for example jEdit) support multi-file regex search & replace.
However, there is no syntactic distinction between a throw specification and a throw expression throwing a parenthesized variable. The two are distinguished primarily by not appearing in the same syntactic context. You could also distinguish them by resolving the name. But that won't work to distinguish the throw expression throw(foo()), which throws a default-constructed object of type foo, from the throw specification throw(foo()), which makes the absurd but technically valid claim that the annotated function may throw an exception of type "function that takes no arguments and returns a foo".
If you want a reliable way of stripping exception specifications, the best way would be to write a clang-tidy check.
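For comparison, here is a text-level sketch in Python (not sed). By default sed applies the expression to one line at a time, so the pattern never sees the newline inside a multiline specification; this version runs the substitution over the whole file contents instead. The names THROW_SPEC and strip_throw_specs are made up for illustration, and the caveat above applies in full: the pattern cannot tell a specification from a throw expression such as throw (x);, and it skips lists that contain nested parentheses.

```python
import re

# Hypothetical helper, not a complete solution: matches optional leading
# whitespace, the word "throw", and one parenthesized list without nested
# parentheses. [^()] also matches newlines, so multiline lists are covered.
THROW_SPEC = re.compile(r"\s*\bthrow\s*\([^()]*\)")


def strip_throw_specs(source: str) -> str:
    """Remove every text span that looks like a throw specification."""
    return THROW_SPEC.sub("", source)


sample = (
    "void foo() throw (std::runtime_error); //oneline\n"
    "void bar() throw (std::runtime_error,\n"
    "std::logic_error); //multiline\n"
)

print(strip_throw_specs(sample))
```

Review the resulting diff by hand before committing, since any parenthesized throw expression in a function body would be stripped as well.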

There is a clang-tidy check for this already.
I got it to replace the throw specifications in our source with noexcept(false) using this command line:
clang-tidy --fix --checks=-*,modernize-use-noexcept foo.cpp -- -I /my/include/path
Compiler options such as include paths and defines need to go after the --. For further documentation, see
https://clang.llvm.org/extra/clang-tidy/checks/modernize-use-noexcept.html

Related

Is there a GCC warning to catch if statement and operation on same line

Is there a GCC warning I can turn on that can catch it if I have an if-statement followed by an operation on the same line, like in this example
if ( ReadOnly == accessMode ) readFile();
I want to use this to enforce a coding standard.
I don't think there is a gcc warning, since that line is perfectly legal in either C or C++. In Linux, you can use the grep command to find these lines in your .cpp files.
grep -n -e "^\s\+if(.*;$" -e "^\s\+if\s\+(.*;$" *.cpp
Or simply
grep -n "^\s\+if\s\{0,3\}(.*;$" *.cpp
The $ in the above line means end-of-line; it can be removed to match more results.
^ matches start-of-line.
\s\+ matches one or more whitespace characters.
\s\{0,3\} matches 0 to 3 whitespace characters.
.* matches any sequence of characters.
The above grep commands don't find statements that are broken across lines, such as
if( readOnly == access )
readFile();
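The same line-based check can be sketched in Python, which makes it easier to wire into a pre-commit script. This is only an approximation of the grep commands above: the pattern and the name find_single_line_ifs are illustrative, leading whitespace is optional here, and multi-line statements are still not detected.

```python
import re

# Illustrative pattern: optional indentation, "if", optional spaces, "(",
# anything, and a ";" at the end of the same line.
SINGLE_LINE_IF = re.compile(r"^\s*if\s*\(.*;\s*$")


def find_single_line_ifs(text):
    """Return (line_number, line) pairs for lines that look like a one-line if."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if SINGLE_LINE_IF.match(line)
    ]


sample = (
    "    if ( ReadOnly == accessMode ) readFile();\n"
    "    if( readOnly == access )\n"
    "        readFile();\n"
)

for number, line in find_single_line_ifs(sample):
    print(f"{number}: {line.strip()}")
```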
If the objective is to enforce a coding "standard", like some kind of coding style, I would suggest using a tool for that purpose.
Although the compiler can emit diagnostics for some "code smells", these are usually related to code behavior, possible UB, or other misuses of the language, not to coding style. For instance, it can emit a diagnostic for the following "style" misuse:
if (x)
    doSomething();
    doSomethingElse(); //Diagnostic: this line is not protected by the if.
But these diagnostics are limited to a few cases of very obviously wrong code.
Regexes, while probably solving your current issue, will fall short for a general style enforcement.
So, I think the ideal way would be to use a tool designed for enforcing coding style. There are plenty of such tools. I would suggest either:
clang-format: widely used and supported.
uncrustify: tons of options to define your own style.
In most editors you can enable them to be run "on saving", restyling your code to fit your style.
As LoPiTaL suggests, use a tool designed to enforce coding standards. The compiler is ill-suited for this purpose.
For your specific ask, clang-format has the option AllowShortIfStatementsOnASingleLine that you can use.
https://clang.llvm.org/docs/ClangFormatStyleOptions.html
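As a rough illustration, a .clang-format file using that option might look like the fragment below. The BasedOnStyle line is just an assumed starting point, and the accepted values for the option depend on the clang-format version, so check the page linked above for yours. Note that clang-format rewrites the code rather than warning about it.

```yaml
# Minimal sketch of a .clang-format file (base style chosen arbitrarily).
BasedOnStyle: LLVM
# Always put the body of an if statement on its own line.
AllowShortIfStatementsOnASingleLine: Never
```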
The answer is, "No, there is no such command line option."

Make sphinx accept invalid C++ signatures

I'm trying to solve this bug report, where the documentation for a C++ library has some signatures with ellipses (... or ??) in places where the developers don't want to dive into the specifics (C++ metaprogramming is way too verbose); for example, Tk_expr Ltuple<T1, ..., Tn>_expr::get<k>() const or Fmpz_expr::ternary operation(??, ??) const should just work.
If the documentation doesn't declare the language domain is C++, sphinx complains. If it does, sphinx complains it's invalid C++... I'm not sure if the information about Gentoo is important there.
Trivial fix: put in complete signatures. Unreadable!
How can one use ellipses?
Sphinx uses Pygments for syntax highlighting, and that's the source of the messages. You must have valid syntax for the language for highlighting to occur. You can either change the language to text which will remove desirable highlighting, or you can remove or comment out the elided output.

Problems with matches containing space for gtksourceview?

I'm working on improving syntax highlighting for Ada in gtksourceview (currently, it is very outdated and very incomplete). An issue I'm having is that Ada is very positional, so matching many constructs requires matching those positions. I was able to do this in nano fairly easily.
So, let's consider a type declaration such as:
type Trit is range 0..2;
Keywords like "type", "is" and "range" are recognized (and were originally). However, type names were treated as keywords (a bad design decision, as Ada regularly defines new types, even for simple types like integers). What the user gets is the types in Standard being colored, and all other types looking like normal text, defeating the purpose of highlighting. In some languages this might be a notable problem. However, the majority of type names occur after two regex patterns:
type\s+(\w|\.|_)+
:\s+(\w|\.|_)+
It might just be a matter of implementation (nano and gtksourceview seem to use different regex implementations). I thought the problem was recognizing spaces. As it turns out, putting the type context above the keyword context results in types now being highlighted, but the "type" keyword, or ":" operator are then not highlighted properly (they are highlighted as "type"). I was able to override this in nano, resulting in correct highlighting, but cannot seem to find out how gtksourceview does this.
Here you can see the old gtksourceview definition in action, which doesn't work for a file with many custom types. My nano definition is shown in action side by side for comparison; matching by position is definitely possible and works.
Here is what happens when I put the type context below the keyword context.
Here is what happens when I put the type context above the keyword context.
In both cases the context is the same, just a simple pattern to get started.
<context id="type" style-ref="type">
<match>(type)\s+\w+</match>
</context>
You may want to consider generating the parser from the formal description of the syntax of Ada in annex P of the Language Reference Manual.
Unfortunately this doesn't answer your question of how to formulate the syntax for a GtkSourceView.

Is it possible to create wrong Regular expression in ActionScript/Flex which will cause runtime error?

Is it possible to create an invalid regular expression in ActionScript/Flex that will cause a runtime error? I've tried so many weird regexps in Flex, and Flex never complained! How do I know if my regexp is valid?
In theory, according to the ActionScript 3.0 SyntaxError documentation, when a regular expression cannot be parsed a SyntaxError is generated at runtime that you can detect in a try/catch block.
In practice, I've never actually seen the RegExp class exhibit this behavior.
I don't have ActionScript/Flex, so I can't test this. Since you haven't given any examples, I don't know what you think is a "weird" regex. What happens if you try one of these:
/(?<=x*)foo/
(ECMAScript regexes don't support lookbehind)
/foo([/
(missing closing parentheses/brackets)
/foo)]/
(missing opening parentheses/brackets)
/foo(?)/
(Syntax error)
/foo\1/
(invalid backreference)
If your end goal is to determine whether a particular regular expression is valid or not then I'm not sure trying to intentionally generate runtime errors is the best way to accomplish that.
Instead, I would recommend testing your patterns against known inputs and making sure they behave as intended. You can use a tool like this to test:
RegExr

Tool for finding C-style Casts

Does anyone know of a tool that I can use to find explicit C-style casts in code? I am refactoring some C++ code and want to replace C-style casts wherever possible.
An example C-style cast would be:
Foo foo = (Foo) bar;
In contrast examples of C++ style casts would be:
Foo foo = static_cast<Foo>(bar);
Foo foo = reinterpret_cast<Foo>(bar);
Foo foo = const_cast<Foo>(bar);
If you're using gcc/g++, just enable a warning for C-style casts:
g++ -Wold-style-cast ...
Searching for the regular expression \)\w gives surprisingly good results.
The fact that such casts are so hard to search for is one of the reasons new-style casts were introduced in the first place. And if your code is working, this seems like a rather pointless bit of refactoring - I'd simply change them to new-style casts whenever I modified the surrounding code.
Having said that, the fact that you have C-style casts at all in C++ code would indicate problems with the code which should be fixed - I wouldn't just do a global substitution, even if that were possible.
The Offload C++ compiler supports options to report as a compile time error all such casts, and to restrict the semantics of such casts to a safer equivalence with static_cast.
The relevant options are:
-cp_nocstylecasts
The compiler will issue an error on all C-style casts. C-style casts in C++ code can potentially be unsafe and lead to undesired or undefined behaviour (for example casting pointers to unrelated struct/class types). This option is useful for refactoring to find all those casts and replace them with safer C++ casts such as static_cast.
-cp_c2staticcasts
The compiler applies the more restricted semantics of C++ static_cast to C-style casts. Compiling code with this option switched on ensures that C-style casts are at least as safe as C++ static_casts.
This option is useful if existing code has a large number of C-style casts and refactoring each cast into C++ casts would be too much effort.
A tool that can analyze C++ source code accurately and carry out automated custom changes (e.g., your cast replacement) is the DMS Software Reengineering Toolkit.
DMS has a full C++ parser, builds ASTs and symbol tables, and can thus navigate your code to reliably find C style casts. By using pattern-directed matches and rewrites, you can provide a set of rules that would convert all such C-style casts into your desired C++ equivalents.
DMS has been used to carry out massive automated C++ reengineering tasks for Boeing and General Dynamics, each involving thousands of files.
One issue with C-style casts is that, since they rely on parentheses which are way overloaded, they're not trivial to spot. Still, a regex such as (e.g. in Python syntax):
r'\(\s*\w+\s*\)'
is a start -- it matches a single identifier in parentheses with optional whitespace inside the parentheses. But of course that won't catch, e.g., (void*) casts -- to get trailing asterisks as well,
r'\(\s*\w+[\s*]*\)'
You could also start with an optional const to broaden the net still further, etc, etc.
Once you have a good RE, many tools (from grep to vim, from awk to sed, plus perl, python, ruby, etc) let you apply it to identify all of its matches in your source.
If you use some kind of Hungarian-style notation (e.g. iInteger, pPointer etc.) then you can search for e.g. )p and ) p and so on.
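As a small sketch of applying the second RE above with Python's re module (the function name find_cast_candidates and the sample input are invented for illustration):

```python
import re

# The second pattern from above: one identifier in parentheses, optionally
# followed by asterisks and whitespace.
CAST_RE = re.compile(r'\(\s*\w+[\s*]*\)')


def find_cast_candidates(source):
    """Return (line_number, matched_text) for every possible C-style cast."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in CAST_RE.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


sample = (
    "Foo foo = (Foo) bar;\n"
    "void* p = (void*) q;\n"
    "Foo g = static_cast<Foo>(bar);\n"
)

for lineno, text in find_cast_candidates(sample):
    print(lineno, text)
```

On this sample the pattern also reports (bar) on the third line, which is a call argument rather than a cast, so the output is a list of candidates to review, not a definitive list of casts.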
It should be possible to find all those places in reasonable time even for a large code base.
I already answered once with a description of a tool that will find and change all the casts if you want it to.
If all you want to do is find such casts, there's another tool that will do this easily, and in fact is the extreme generalization of all the "regular expression" suggestions made here. That is the SD Source Code Search Engine. This tool enables one to search large code bases in terms of the language elements that make up each language. It provides a GUI allowing you to enter queries, see individual hits, and show the file text at the hit point with one mouse click. One more click and you can be in your editor [for many editors] on a file. The tool will also record a list of hits in context so you can revisit them later.
In your case, the following search engine query is likely to get most of the casts:
'(' I ')' | '(' I ... '*' ')'
which means, find a sequence of tokens, first being (, second being any identifier, third being ')', or a similar sequence involving something that ends in '*'.
You don't specify any whitespace management, as the tool understands the language whitespace rules; it will even ignore a comment in the middle of a cast and still match the above.
[I'm the CTO at the company that supplies this.]
I used this regular expression in Visual Studio (2010) Find in files search box: :i\):i
Thanks to sth for the inspiration (his post)