My target is to make a program (using C++) which would take C source code as input and check for "SYNTAX ERRORS ONLY".
Now for this, do i need to know about Regular Expressions, Grammar generation and Parsers??
I would like to use tools like Yacc/Flex/Bison - but the problems i am facing are -
How to use these tools? I mean i am only scratching at the surface when i read about these tools - i feel clueless.
How can i use these tools in tandem with my C++ source code?
How "The Hell" do i Get Started with this?
Use somebody else's C parser. For example, the parser used by the clang project. http://clang.llvm.org/
Then you can focus on the other hard part of your problem: detecting errors.
To get started with Yacc and Lex (or the Gnu versions, Bison and Flex) I can recommend Tom Niemann's A Compact Guide to Lex & Yacc.
I also suggest that you have a look of other projects doing the same thing. The are often named with lint in their name, as http://www.splint.org/
It all depends on what kind of errors you want to check.
In any cases you certainly need to learn more about compiler architectures. This book is a reference http://www.cs.princeton.edu/~appel/modern/c/
If you want to work at the syntactic level,
you certainly want to work with lex and Yacc.
This link may help you to get started with a working grammar (though outdated): http://www.lysator.liu.se/c/ANSI-C-grammar-y.html
Less powerfull syntax checking can be done using regular expression. You can do less with regular expression than with an actual parser (see http://en.wikipedia.org/wiki/Chomsky_hierarchy). But it is certainly far more practical.
if you want to perform high level checking. Like "Does this group of function alway take const parameters ?" etc ... You can probably use GCC ability to dump abstract syntax trees (see http://digitocero.com/en/blog/exporting-and-visualizing-gccs-abstract-syntax-tree-ast). Checks other compilers or front-end as well. An abstract tree contains many information you can "check".
If you want to handle compilation errors: related to type checking etc... I can't help you, you probably want to look at other people projects before starting to write your own compiler.
see also:
http://decomp.ulb.ac.be/roelwuyts/playground/canalysistools/
http://wiki.altium.com/display/ADOH/Static+Code+Analysis+-+CERT+C+Secure+Code+Checking
Some people in my previous labs worked on C and C++ analysis and transformation
http://www.lrde.epita.fr/cgi-bin/twiki/view/Transformers/
The project is now in standby, and has proved to be a complex subject even for people used to compiler writting (especially in the case of C++ transformation).
Finally your needs are maybe far simpler than this. Did you think about
FILE *output = popen("gcc -Wall my_c_file.c", "r");
(and then just checking the output of gcc)
How do you define "SYNTAX ERRORS ONLY"? If you just want to know what are the errors, why don't you call external gcc to perform a compilation and report the errors?
Related
I'm on Windows, using MSVC to compile my project, but I need clang for its neat AST parser, which allow me to write a little code generator.
Problem is, clang cannot parse MSVC headers (a very-well known and understandable problem).
I tried two options :
I include MSVC header folder, parsing the built-in headers included in my code will end-up leading to a fatal error at some point, preventing me from parsing the parts I want correctly.
What I did before is simply not provide any built-in headers and forward declare the types I needed. It worked fine and somehow it doesn't anymore with latest Clang. I don't really know if the parser policy on missing header changed, but it is causing complete failure every time something like <string> is included and not much get parsed.
I am using the python bindings (libclang), but I would consider switching to C/C++ API if there would be a solution there.
Is there anyway I can alter this behavior and make clang continue parsing even when some headers are not found ?
Use SetSuppressIncludeNotFoundError. Took me an hour to find! You can imagine how glad I was to find it!
https://clang.llvm.org/doxygen/classclang_1_1Preprocessor.html#ac7bafe67fc32e41460855b39d20ff6af
One way to ignore the errors due to missing headers is to set SetSuppressIncludeNotFoundError to true in your definition of ASTFrontendAction. An example for the same is given below.
{
public:
virtual std::unique_ptr<clang::ASTConsumer> CreateASTConsumer(
clang::CompilerInstance &Compiler, llvm::StringRef InFile)
{
Compiler.getPreprocessor().SetSuppressIncludeNotFoundError(true);
return std::unique_ptr<clang::ASTConsumer>(
new CustomASTConsumer(&Compiler.getASTContext()));
}
};
For a complete example using ASTFrontendAction, please visit at https://clang.llvm.org/docs/RAVFrontendAction.html
So you want to process C++ code that uses MS headers, and you want access to ASTs so that you can generate code. And Clang won't handle MS headers.
So Clang can't be the answer unless it gets a radical upgrade.
You asked for "any solution that can make this work".
Our DMS Software Reengineering Tookit with its C++14 Front End can do this.
DMS provides general parsing, AST construction/inspection/transformation/generation, and inverse parsing (conversion of ASTs back into compilable code), parameterized by language definitions.
The C++ front end provides a full C++14 parser, preprocessor handling, AST construction, and full name and type resolution. It has been tested with GCC and MS VS 2013 header files; we're testing with 2015 header files now.
(It also handles MS VS 2013 syntax, too).
It handles the tough parsing cases completely, including the C++ famous "most vexing parse". You can see parse trees at get human readable AST from c++ code.
DMS does not provide Python bindings, nor a direct C++ interface. Rather, it is a standalone tool designed to support the construction of custom tools (e.g., your "little code generator"). It has its own very extensive set of internal APIs, coded in metaprogramming language PARLANSE, which is LISP-like. Other aspects of DMS are managed by using DSLs for lexers, grammars, and transformations. See below.
A word of caution: any tool that can process C++ is gauranteed to be complex. DMS is correspondingly complex, and it takes a while to learn to use it, so you're not going to get instant answers. The good news here
is that some things are easier to do. Your code generation problem
is likely "read a skeleton file, and then replace key entries in it with problem specific code". If that's the case, a DMS tool with the following code (simplified for presentation here) will likely do the trick:
...
(= myAST (Registry:ParseFile (. filename) (. `CppVisualStudio2013') ...)
(Registry:ApplyTransforms myAST (. `MyTransforms.rsl'))
(Registry:PrettyPrint myAST (concat filename `.modified'))
...
with a transforms file MyTransforms.rsl containing source-to-source surface-syntax (e.g, C++ syntax) transformation rules of the conceptual form
rule rulename if_you_see THIS then replace_by ("-->") THAT
An actual C++ rule might look like (making this up because I don't
know your actual code generation goals)
rule replace_abstraction(s: STRING_LITERAL):
" abstraction_place_holder(\s) "
-> " my_DSL_library(\s,17); "
The ApplyTransforms call above will apply all the rules in this file until none apply any further.
Writing surface syntax transforms, where you can do it, is way easier than making calls on a procedure library (which, like Clang, DMS offers) that hack at the tree.
You can write more complex metaprograms using PARLANSE to apply some rules in one place, other rules someplace else, and you can mix source-to-source transforms with procedural transforms that hack directly at the tree if you want.
If you want more details on what transforms look like, ask and I'll provide a link.
Are there good tools to automatically check C++ projects for coding conventions like e.g.:
all thrown objects have to be classes derived from std::exception (i.e. throw 42; or throw "runtime error"; would be flagged as errors, just like throw std::string("another runtime error"); or throwing any other type not derived from std::exception)
In the end I'm looking for something like Cppcheck but with a simpler way to add new checks than hacking the source code of the check tool... May be even something with a nice little GUI which allows you to set up the rules, write them to disk and use the rule set in an IDE like Eclipse or an continuous integration server like Jenkins.
I ran a number of static analysis tools on my current project and here are some of the key takeaways:
I used Visual Lint as a single entry point for running all these tools. VL is a plug-in for VS to run third-party static analysis tools and allows a single click route from the report to the source code. Apart from supporting a GUI for selecting between the different levels of errors reported it also provides automated background analysis (that tells you how many errors have been fixed as you go), manual analysis for a single file, color coded error displays and charting facility. The VL installer is pretty spiffy and extremely helpful when you're trying to add new static analysis tools (it even helps you download Python from ActiveState should you want to use Google cpplint and don't have Python pre-installed!). You can learn more about VL here: http://www.riverblade.co.uk/products/visual_lint/features.html
Of the numerous tools that can be run with VL, I chose three that work with native C++ code: cppcheck, Google cpplint and Inspirel Vera++. These tools have different capabilities.
Cppcheck: This is probably the most common one and we have all used it. So, I'll gloss over the details. Suffice to say that it catches errors such as using postfix increment for non-primitive types, warns about using size() when empty() should be used, scope reduction of variables, incorrect name qualification of members in class definition, incorrect initialization order of class members, missing initializations, unused variables, etc. For our codebase cppcheck reported about 6K errors. There were a few false positives (such as unused function) but these were suppresed. You can learn more about cppcheck here: http://cppcheck.sourceforge.net/manual.pdf
Google cpplint: This is a python based tool that checks your source for style violations. The style guide against which this validation is done can be found here: http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml (which is basically Google's C++ style guide). Cpplint produced ~ 104K errors with our codebase of which most errors are related to whitespaces (missing or extra), tabs, brace position etc. A few that are probably worth fixing are: C-style casts, missing headers.
Inspirel Vera++: This is a programmable tool for verification, analysis and transformation of C++ source code. This is similar to cpplint in functionality. A list of the available rules can be found here: http://www.inspirel.com/vera/ce/doc/rules/index.html and a similar list of available transformations can be found here: http://www.inspirel.com/vera/ce/doc/transformations/index.html. Details on how to add your own rule can be found here: http://www.inspirel.com/vera/ce/doc/tclapi.html. For our project, Vera++ found about 90K issues (for the 20 odd rules).
In the upcoming state: Manuel Klimek, from Google, is integrating in the Clang mainline a tool that has been developed at Google for querying and transforming C++ code.
The tooling infrastructure has been layed out, it may fill up but it is already functional. The main idea is that it allows you to define actions and will run those actions on the selected files.
Google has created a simple set of C++ classes and methods to allow querying the AST in a friendly way: the AST Matcher framework, it is being developped and will allow very precise matching in the end.
It requires creating an executable at the moment, but the code is provided as libraries so it's not necessary to edit it, and one-off transformation tools can be dealt with in a single source file.
Example of the Matcher (found in this thread): the goal is to find calls to the constructor overload of std::string formed from the result of std::string::c_str() (with the default allocator), because it can be replaced by a simple copy instead.
ConstructorCall(
HasDeclaration(Method(HasName(StringConstructor))),
ArgumentCountIs(2),
// The first argument must have the form x.c_str() or p->c_str()
// where the method is string::c_str(). We can use the copy
// constructor of string instead (or the compiler might share
// the string object).
HasArgument(
0,
Id("call", Call(
Callee(Id("member", MemberExpression())),
Callee(Method(HasName(StringCStrMethod))),
On(Id("arg", Expression()))
))
),
// The second argument is the alloc object which must not be
// present explicitly.
HasArgument(1, DefaultArgument())
)
It is very promising compared to ad-hoc tool because it uses the Clang compiler AST library, so not only it is guaranteed that no matter how complicated the macros and template stuff that are used, as long as your code compiles it can be analyzed; but it also means that intricates queries that depend on the result of overload resolution can be expressed.
This code returns actual AST nodes from within the Clang library, so the programmer can locate the bits and nits precisely in the source file and edit to tweak it according to her needs.
There has been talk about using a textual matching specification, however it was deemed better to start with the C++ API as it would have added much complexity (and bike-shedding). I hope a Python API will emerge.
The key problem with "style checkers" is that style is like art: everybody has a different opinion about what is good style and what is not. The implication is that style checkers will always need to be customized to the local art tastes.
To do this right, one needs a full C++ parser with access to symbol definitions, scoping rules and ideally various kinds of flow analyses. AFAIK, CppCheck does not provide accurate parsing or symbol table definitions, so its error checking can't be both deep and correct. I think Coverity and Fortify offer something along these lines using the EDG front end; I don't know if their tools offer access to symbol tables or data flow analyses. Clang is coming along.
You also need a way to write the style checks. I think all the tools offer access to an AST and perhaps symbol tables, and you can hand code your own checks, at the cost of knowing the AST intimately, which is hard for a big language like C++. I think Coverity and Fortify have some DSL-like scheme for specifying some of the checks.
If you want to fix code that is style incorrect, you need something that can modify the code representation. Coverity and Fortify do not offer this AFAIK. I believe Clang does offer the ability to modify the AST and regenerate code; you still have to have pretty intimate knowledge of the AST structure to code the tree hacking logic and get it right.
Our DMS Software Reengineering Toolkit and its C++ front end provide most of these capabilities. Using its C++ front end, DMS can parse ANSI C++11, GCC4 (with C++11 extensions) and MSVS 2010 (with its C++11 extensions) [update May 2021: now full C++17 and most of C++20] build ASTs and symbol tables with full type information. One can also ask for the type of an arbitrary expression AST node. At present, DMS computes control flow but not data flow for C++.
An AST API lets you procedurally code arbitrary checks; or make changes to the AST to fix problems, and then DMS's prettyprinter can regenerate complete, compilable source text with comments and preserved literal format information (eg., radix of numbers, etc.). You have to know the AST structure to do this, just like other tools, but it is a lot easier to know, because it is isomorphic to the DMS C++ grammar rules. The C++ front end comes with the our C++ grammar. [DMS uses GLR parsers to make this possible].
In addition, one can write patterns and transformations using DMS's Rule Specification Language, using the surface syntax of C++ itself. One might code OPs "dont throw nonSTL exceptions" as
pattern nonSTLexception(i: IDENTIFIER):statement
= " throw \i; " if ~derived_from_STD_exception(i);
The stuff inside the (meta)quotes is C++ source code with some pattern-matching escapes, e.g, "\i" refers to the placeholder varible "i" which must be a C++ IDENTIFIER according the rule; the entire "throw \i;" clause must be a C++ "statement" (a nonterminal in the C++ grammar). The rule itself mainly expresses syntax to be matched, but can invoke semantic checks (such as "~is_derived_from_STD_exception") applied to matched subtrees (in this case, whatever "\i" matched).
In writing such patterns, you don't have to know the shape of the AST; the pattern knows it, and it is automatically matched. If you've ever coded AST walkers, you will appreciate how convenient this is.
A match knows the AST node and therefore the precision position (file/line/column) which makes it easy to generate reports with precise location information.
You need to add a custom routine to DMS, "inherits_from_STD_exception", to verify the identifier tree node passed to that routine is (as OP desired) a class derived from
std::exception. This requires finding "std::exception" in the symbol table,
and verifying that the symbol table entry for the identifier tree node is a class
declaration and transitively inherits from other class declarations (by following symbol table links) until the std::exception symbol table entry is found.
A DMS transformation rule is a pair of patterns stating in essence, "if you see this, then replace it by that".
We've built several custom style checkers with DMS for both COBOL and C++. Its still a fair amount of work, mostly because C++ is a pretty complex language and you have to think carefully about the precise meaning of your check.
Trickier checks and those tests that start to fall into deep static analysis require access to control and data flow information. DMS computes control flow for C++ now, and we're working on data flow analysis (we've already done this for Java, IBM Enterprise COBOL and a variety of C dialects). Analysis results are tied back to the AST nodes so that one can use patterns to look for elements of the style check, and then follow the data flows to tie the elements together if needed.
When all is said and done with DMS, (or indeed with any of the other tools that deal with C++ in any halfway accurate way), is that coding additional or complex style checks is unlikely to be "convenient". You should hope for "possible with good technical background."
I know this question has been asked here, however it was from 2010 and I was wondering if anybody knows of some recent ones made.
I'm looking into using a style checker to help enforce coding conventions at my current workplace. I see the following few options:
A nice flexible way to enforce difference style conventions exist. Vera++ looked interesting and extendable.
Use/hack Google's cpplint style checker (seems daunting)
get access to the parse tree (preferably the AST) of the current file and perform checks on that.
#3 seems the most flexible and wondering whether anyone knows of a program or way to hook into the AST?
Try xCode with clang and all warning on.
You can also make clang dump the AST.
CppCheck build an AST. And it also allows you to write addons, where you can get access to AST. But as stated above it may wipe necessary information needed to check style. My choice is to customize cpplint.
I am currently writing a programming language in C/C++ as an exercise (but mostly for fun). At the moment it compiles into a list of commands that are then executed (kind of like a low-level API). Its working fantastically, however, I think it would be more exciting if instead of having a interpreter executable, having the language actually compile into a .exe file. I don't know if it is possible or how challenging this might be. I could not find any resources to help me with this. - Thanks in advance.
You could consider writing a frontend for LLVM (tutorial) or GCC (article from linux journal) - if thats still fun for you is a different question.
It would certainly be possible, although it could be a fair bit of work to produce all of the necessary parts to make a runnable binary. If that is what you are trying to learn about, then it could be a great exercise.
However, if you are simply looking to make it run faster, there are other options. For example, you could possibly emit C/C++ code based on the input program and then compile/link that.
First, you have to be clear about the syntax and lexical of your language code in a formal way.
Then, you could take a look on lex.
That builds a lexical analyzer, that you can use to generate the C code (or whatever) you need.
If your language doesn't use dynamic types, then you could get it easy.
Normally I program in C# but have been forced to do some work in C++. It seems that the integration with Visual Studio (2008) is really poor compared to C# but I was wondering if there are any good tools, plugins or configurations that can improve the situation.
Another post pointed out the program Visual Assist X, which at least helps with some things such as refactoring (though it is a bit expensive for me). My major problem is, though, that the compile errors give little clue about what is wrong and I spend most of my time figuring out what I did wrong. It just feels like it is possibly to statically check for a lot more errors than VS does out of the box. And why doesn't it provide the blue underlines as with C#, that shouldn't be too hard?!
I realize that half the problem is just the fact that I am new to C++ but I really feel that it can be unreasonably hard to get a program to compile. Are there any tools of this sort out there or are my demands too high?
I think there are two possibilities: 1) either you're trying out C++ stuff that is waaay over your knowledge (and consequently, you don't know what you did wrong and how to interpret error messages), 2) you have too high expectations.
A hint: many subsequent errors are caused by the first error. When I get a huge list of errors, I usually correct just the first error and recompile. You'd be amazed how much garbage (in terms of error messages) a missing delimiter or type declaration could produce :)
It is difficult to syntactically analyze a C++ program before compilation mainly for two reasons: 1) the C++ grammar is context-dependent, 2) templates are Turing-complete (think of them as of a functional programming language with a weird syntax).
My suggestions:
If you want more features like you get in C#, get VisualAssist X, and learn how to use it. It isn't free but it can save you a lot of time.
Set your warning level high (this will initially generate more compile-errors but as you fix them, you'll get a feel for common mistakes).
Set warning as error so you don't get in the habit of ignoring warnings.
To understand compile errors, use Google (don't waste your time with the help system) to search on warning error numbers (they look like this: C4127).
Avoid templates until you get your code compiling without errors using the above methods. If you don't know templates well, study! Get some books, do some tutorials and start small. Template compile errors are notoriously hard to figure out. Visual C++ 2008 has much better error messages than previous versions but it's still hard.
If you start doing templates in earnest, get a wide-screen monitor (maybe even two) to make reading the verbose errors easier.
+1 for Visual Assist, maybe not now - but when you turn the hobby into a profession you will need it.
In my experience, the diagnsotics are already much better than in VC6, but you will need to "learn" their true meaning as part of learning the IDE.
Static checking of C++ is much more complicated than C#, due to the build mode, and the incredibly more complex language. PC-Lint (best together with Visual Lint to integrate it into the IDE) is the canonical static analysis. Not cheap either, though...
The C++ standard sometimes reads like scripture, but without a trained preacher to interpret it. One excellent interpreter is Marshal Cline with his C++ FAQ. Note that the online FAQ, while extensive, covers much less than the book.
What helped me a lot understanding complex error messages is trying to reproduce the problem in a smaller environment - but then, there was no internet back then...