Is it possible to embed a language into C++ without macros? - c++

Background
Currently I'm using the C++ MySQL connector to communicate with a database, and sometimes I need to send hardcoded commands through the connection. As has happened many times, I make errors, typos and whatnot. This is for a very basic Database Systems class, so no industry involved.
Question
Is it possible to implement a constexpr compiler for another language that would just run lexical, syntax, and semantic analysis and report errors if it found them? Or maybe some additional compilation step?
Example
Let's suppose I wanted to send this command:
SELECT * FROM Persons
but instead, I forgot to type the M:
SELECT * FRO Persons
I'd only discover the problem at runtime, although the compiler could catch it if it knew the language. My only idea to solve this is preprocessor madness.
From C++, I would call it like this:
auto statement = sql_parse("...");
and hopefully it would cause a compilation error if something is wrong.

Is it possible to implement a constexpr compiler for another language that would just run lexical, syntax, and semantic analysis and report errors if it found them? Or maybe some additional compilation step?
No, that's not possible at compile time using only the C preprocessor/C++ compiler.
You'll need to implement a different tool to parse and validate the SQL code and generate the necessary C++ code for the SQL bindings.
auto statement = sql_parse("...");
will need a parser that inspects the SQL statement at runtime.
There are C++ template tools like boost::spirit that allow you to integrate DSLs (like SQL syntax) at compile time, though. That's way beyond what the C preprocessor does.
For an easy and practical solution, I'd recommend you test your SQL statements separately with an appropriate tool before adopting them into the C++ code.

Related

What's the easiest way to parse C++ for code generation?

I would like to generate some wrapper code based on C++ types. I basically would like to parse some C++ headers, get the types, classes and their fields defined in the headers, and generate some code based on them.
What would be the easiest way to parse C++ and get type information? I thought about using the Clang C++ parser, but I couldn't make a working hello world in a couple of hours, so I gave up for the time being.
Could you advise any other way to parse C++, or if Clang is the easiest solution, could you point me to a simple getting started guide to be able to parse C++ types with it?
(basically any technology would be ok, C++, Java, C#, etc., this would be part of a command line tool)
Clang is definitely the easiest option. Consider using the cindex Python bindings; they're pretty straightforward. Alternatively, you could get an older version of Clang which still features an XML backend.
Another link suggested in the comments: http://www.altdevblogaday.com/2014/03/05/implementing-a-code-generator-with-libclang/
Unless your objective is to verify correctness, or the code involves advanced template stuff, consider using the XML output of Doxygen or GCC-XML. Alternatively, consider clang, even if that's what you found too complex. Note that for clang it might be best to work in *nix-land.
If your generation tool is in Java, consider using the parser from the Eclipse CDT.
My set of dependencies is:
com.ibm.icu_4.4.2.v20110823.jar
org.eclipse.cdt.core_5.3.2.201202111925.jar
org.eclipse.equinox.common_3.6.0.v20110523.jar
(These are from an old Eclipse version, because I have a dependency on old Java class versions, but taking them from the latest CDT will do.)
Parsing involves:
// Read the header to parse from disk.
FileContent reader = FileContent.createForExternalFileLocation(fullPath);
// Preprocessor symbols and include paths used while scanning.
IScannerInfo info = new ScannerInfo(definedSymbols, includePaths);
// Parse as GNU C++; the result is an IASTTranslationUnit.
return GPPLanguage.getDefault().getASTTranslationUnit(reader, info, FilesProvider.getInstance(), null, 0, log);
This returns an IASTTranslationUnit that can be accessed through a Visitor pattern (ASTVisitor).
I cannot comment on the accuracy of the parsing in corner scenarios, because so far I've been generating code based on simple C++ structure definitions.

Are there convenient tools to automatically check C++ coding conventions beyond style checks?

Are there good tools to automatically check C++ projects for coding conventions like e.g.:
all thrown objects have to be classes derived from std::exception (i.e. throw 42; or throw "runtime error"; would be flagged as errors, just like throw std::string("another runtime error"); or throwing any other type not derived from std::exception)
In the end I'm looking for something like Cppcheck but with a simpler way to add new checks than hacking the source code of the check tool... Maybe even something with a nice little GUI which allows you to set up the rules, write them to disk and use the rule set in an IDE like Eclipse or a continuous integration server like Jenkins.
I ran a number of static analysis tools on my current project and here are some of the key takeaways:
I used Visual Lint as a single entry point for running all these tools. VL is a plug-in for VS to run third-party static analysis tools and allows a single click route from the report to the source code. Apart from supporting a GUI for selecting between the different levels of errors reported it also provides automated background analysis (that tells you how many errors have been fixed as you go), manual analysis for a single file, color coded error displays and charting facility. The VL installer is pretty spiffy and extremely helpful when you're trying to add new static analysis tools (it even helps you download Python from ActiveState should you want to use Google cpplint and don't have Python pre-installed!). You can learn more about VL here: http://www.riverblade.co.uk/products/visual_lint/features.html
Of the numerous tools that can be run with VL, I chose three that work with native C++ code: cppcheck, Google cpplint and Inspirel Vera++. These tools have different capabilities.
Cppcheck: This is probably the most common one and we have all used it. So, I'll gloss over the details. Suffice to say that it catches errors such as using postfix increment for non-primitive types, warns about using size() when empty() should be used, scope reduction of variables, incorrect name qualification of members in class definition, incorrect initialization order of class members, missing initializations, unused variables, etc. For our codebase cppcheck reported about 6K errors. There were a few false positives (such as unused function) but these were suppressed. You can learn more about cppcheck here: http://cppcheck.sourceforge.net/manual.pdf
Google cpplint: This is a Python-based tool that checks your source for style violations. The style guide against which this validation is done can be found here: http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml (which is basically Google's C++ style guide). Cpplint produced ~104K errors with our codebase, most of which are related to whitespace (missing or extra), tabs, brace position, etc. A few that are probably worth fixing are C-style casts and missing headers.
Inspirel Vera++: This is a programmable tool for verification, analysis and transformation of C++ source code. This is similar to cpplint in functionality. A list of the available rules can be found here: http://www.inspirel.com/vera/ce/doc/rules/index.html and a similar list of available transformations can be found here: http://www.inspirel.com/vera/ce/doc/transformations/index.html. Details on how to add your own rule can be found here: http://www.inspirel.com/vera/ce/doc/tclapi.html. For our project, Vera++ found about 90K issues (for the 20 odd rules).
On the horizon: Manuel Klimek, from Google, is integrating into the Clang mainline a tool that has been developed at Google for querying and transforming C++ code.
The tooling infrastructure has been laid out; it will be fleshed out further, but it is already functional. The main idea is that it allows you to define actions and will run those actions on the selected files.
Google has created a simple set of C++ classes and methods to allow querying the AST in a friendly way: the AST Matcher framework. It is still being developed and should eventually allow very precise matching.
It requires creating an executable at the moment, but the code is provided as libraries so it's not necessary to edit it, and one-off transformation tools can be dealt with in a single source file.
Example of the Matcher (found in this thread): the goal is to find calls to the constructor overload of std::string formed from the result of std::string::c_str() (with the default allocator), because it can be replaced by a simple copy instead.
ConstructorCall(
    HasDeclaration(Method(HasName(StringConstructor))),
    ArgumentCountIs(2),
    // The first argument must have the form x.c_str() or p->c_str()
    // where the method is string::c_str(). We can use the copy
    // constructor of string instead (or the compiler might share
    // the string object).
    HasArgument(
        0,
        Id("call", Call(
            Callee(Id("member", MemberExpression())),
            Callee(Method(HasName(StringCStrMethod))),
            On(Id("arg", Expression()))
        ))
    ),
    // The second argument is the alloc object which must not be
    // present explicitly.
    HasArgument(1, DefaultArgument())
)
It is very promising compared to ad-hoc tools because it uses the Clang compiler's AST library. Not only is it guaranteed that, no matter how complicated the macros and templates involved, your code can be analyzed as long as it compiles; it also means that intricate queries that depend on the result of overload resolution can be expressed.
This code returns actual AST nodes from within the Clang library, so the programmer can locate the bits and nits precisely in the source file and edit to tweak it according to her needs.
There has been talk about using a textual matching specification; however, it was deemed better to start with the C++ API, as a textual format would have added much complexity (and bike-shedding). I hope a Python API will emerge.
The key problem with "style checkers" is that style is like art: everybody has a different opinion about what is good style and what is not. The implication is that style checkers will always need to be customized to the local art tastes.
To do this right, one needs a full C++ parser with access to symbol definitions, scoping rules and ideally various kinds of flow analyses. AFAIK, CppCheck does not provide accurate parsing or symbol table definitions, so its error checking can't be both deep and correct. I think Coverity and Fortify offer something along these lines using the EDG front end; I don't know if their tools offer access to symbol tables or data flow analyses. Clang is coming along.
You also need a way to write the style checks. I think all the tools offer access to an AST and perhaps symbol tables, and you can hand code your own checks, at the cost of knowing the AST intimately, which is hard for a big language like C++. I think Coverity and Fortify have some DSL-like scheme for specifying some of the checks.
If you want to fix code that is style incorrect, you need something that can modify the code representation. Coverity and Fortify do not offer this AFAIK. I believe Clang does offer the ability to modify the AST and regenerate code; you still have to have pretty intimate knowledge of the AST structure to code the tree hacking logic and get it right.
Our DMS Software Reengineering Toolkit and its C++ front end provide most of these capabilities. Using its C++ front end, DMS can parse ANSI C++11, GCC4 (with C++11 extensions) and MSVS 2010 (with its C++11 extensions) [update May 2021: now full C++17 and most of C++20], and build ASTs and symbol tables with full type information. One can also ask for the type of an arbitrary expression AST node. At present, DMS computes control flow but not data flow for C++.
An AST API lets you procedurally code arbitrary checks, or make changes to the AST to fix problems, and then DMS's prettyprinter can regenerate complete, compilable source text with comments and preserved literal format information (e.g., radix of numbers, etc.). You have to know the AST structure to do this, just like other tools, but it is a lot easier to know, because it is isomorphic to the DMS C++ grammar rules. The C++ front end comes with our C++ grammar. [DMS uses GLR parsers to make this possible].
In addition, one can write patterns and transformations using DMS's Rule Specification Language, using the surface syntax of C++ itself. One might code OP's "don't throw non-STL exceptions" rule as
pattern nonSTLexception(i: IDENTIFIER):statement
= " throw \i; " if ~derived_from_STD_exception(i);
The stuff inside the (meta)quotes is C++ source code with some pattern-matching escapes, e.g., "\i" refers to the placeholder variable "i", which must be a C++ IDENTIFIER according to the rule; the entire "throw \i;" clause must be a C++ "statement" (a nonterminal in the C++ grammar). The rule itself mainly expresses syntax to be matched, but can invoke semantic checks (such as "~derived_from_STD_exception") applied to matched subtrees (in this case, whatever "\i" matched).
In writing such patterns, you don't have to know the shape of the AST; the pattern knows it, and it is automatically matched. If you've ever coded AST walkers, you will appreciate how convenient this is.
A match knows the AST node and therefore the precise position (file/line/column), which makes it easy to generate reports with precise location information.
You need to add a custom routine to DMS, "derived_from_STD_exception", to verify that the identifier tree node passed to that routine is (as OP desired) a class derived from std::exception. This requires finding "std::exception" in the symbol table, and verifying that the symbol table entry for the identifier tree node is a class declaration and transitively inherits from other class declarations (by following symbol table links) until the std::exception symbol table entry is found.
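The transitive inheritance check described above can be sketched independently of any particular tool. This is only an illustration: ClassTable and derives_from are hypothetical names, and a plain map from class name to direct bases stands in for a real symbol table; it is not DMS's API. The sketch treats a class as "derived" from itself, so throwing std::exception directly would pass.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for a symbol table (an assumption for this sketch):
// class name -> names of its direct base classes.
using ClassTable = std::map<std::string, std::vector<std::string>>;

// True if `name` is `ancestor` or inherits from it, directly or
// transitively, according to `table`. Names missing from the table
// (for example built-in types) never match.
bool derives_from(const ClassTable& table, const std::string& name,
                  const std::string& ancestor) {
    assert(!ancestor.empty());  // the ancestor must name a class
    std::set<std::string> visited;
    std::vector<std::string> worklist{name};
    while (!worklist.empty()) {
        std::string current = worklist.back();
        worklist.pop_back();
        if (current == ancestor) return true;
        if (!visited.insert(current).second) continue;  // already explored
        auto entry = table.find(current);
        if (entry == table.end()) continue;  // not a known class
        for (const auto& base : entry->second) worklist.push_back(base);
    }
    return false;
}
```

With a table in which MyError derives from std::runtime_error, which in turn derives from std::exception, derives_from(table, "MyError", "std::exception") holds, while a lookup for "int" or "std::string" does not.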
A DMS transformation rule is a pair of patterns stating in essence, "if you see this, then replace it by that".
We've built several custom style checkers with DMS for both COBOL and C++. It's still a fair amount of work, mostly because C++ is a pretty complex language and you have to think carefully about the precise meaning of your check.
Trickier checks and those tests that start to fall into deep static analysis require access to control and data flow information. DMS computes control flow for C++ now, and we're working on data flow analysis (we've already done this for Java, IBM Enterprise COBOL and a variety of C dialects). Analysis results are tied back to the AST nodes so that one can use patterns to look for elements of the style check, and then follow the data flows to tie the elements together if needed.
The bottom line with DMS (or indeed with any of the other tools that deal with C++ in any halfway accurate way) is that coding additional or complex style checks is unlikely to be "convenient". You should hope for "possible with good technical background."

Define C++ function at runtime

I'm trying to adjust some mathematical code I've written to allow for arbitrary functions, but I only seem to be able to do so by pre-defining them at compile time, which seems clunky. I'm currently using function pointers, but as far as I can see the same problem would arise with functors. To provide a simplistic example, for forward-difference differentiation the code used is:
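#include <cmath>

double xsquared(double x) {
    return x * x;
}
double expx(double x) {
    return std::exp(x);
}
// Forward-difference approximation of the derivative of af at x with step h.
double forward(double x, double h, double (*af)(double)) {
    double answer = (af(x + h) - af(x)) / h;
    return answer;
}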
Where either of the first two functions can be passed as the third argument. What I would like to do, however, is pass user input (in valid C++) rather than having to set up the functions beforehand. Any help would be greatly appreciated!
Historically the kind of functionality you're asking for has not been available in C++. The usual workaround is to embed an interpreter for a language other than C++ (Lua and Python for example are specifically designed for being integrated into C/C++ apps to allow scripting of them), or to create a new language specific to your application with your own parser, compiler, etc. However, that's changing.
Clang is a new open source compiler, with its development driven by Apple, that leverages LLVM. Clang is designed from the ground up to be usable not only as a compiler but also as a C++ library that you can embed into your applications. I haven't tried it myself, but you should be able to do what you want with Clang: you'd link it as a library and ask it to compile code your users input into the application.
You might try checking out how the ClamAV team already did this, so that new virus definitions can be written in C.
As for other compilers, I know that GCC recently added support for plugins. It may be possible to leverage that to bridge GCC and your app, but because GCC wasn't designed for being used as a library from the beginning it might be more difficult. I'm not aware of any other compilers that have a similar ability.
As C++ is a fully compiled language, you cannot really transform user input into code unless you write your own compiler or interpreter. But in this example, it can be possible to build a simple interpreter for a Domain Specific Language which would be mathematical formulae. All depends on what you want to do.
You could always take the user's input and run it through your compiler, then execute the resulting binary. This of course would have security risks, as they could execute arbitrary code.
Probably easier is to devise a minimalist language that lets users define simple functions, parsing them in C++ to execute the proper code.
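As a rough illustration of such a minimalist language, here is a small recursive-descent evaluator for formulas in a single variable x. The Formula class and its grammar are assumptions made up for this sketch, not part of any library; it supports only + - * /, unary minus, parentheses, numbers and x, and it re-parses the text on every call, which is simple but not fast.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <utility>

// Grammar (hypothetical, chosen for this sketch):
//   expr   := term   (('+' | '-') term)*
//   term   := factor (('*' | '/') factor)*
//   factor := NUMBER | 'x' | '(' expr ')' | '-' factor
class Formula {
public:
    explicit Formula(std::string text) : text_(std::move(text)) {}

    // Evaluate at x; throws std::runtime_error on malformed input.
    double operator()(double x) const {
        std::size_t pos = 0;
        double value = expr(pos, x);
        skip_spaces(pos);
        if (pos != text_.size()) throw std::runtime_error("unexpected trailing input");
        return value;
    }

private:
    void skip_spaces(std::size_t& pos) const {
        while (pos < text_.size() &&
               std::isspace(static_cast<unsigned char>(text_[pos]))) ++pos;
    }

    // Consume character c if it is the next non-space character.
    bool accept(std::size_t& pos, char c) const {
        skip_spaces(pos);
        if (pos < text_.size() && text_[pos] == c) { ++pos; return true; }
        return false;
    }

    double expr(std::size_t& pos, double x) const {
        double value = term(pos, x);
        for (;;) {
            if (accept(pos, '+')) value += term(pos, x);
            else if (accept(pos, '-')) value -= term(pos, x);
            else return value;
        }
    }

    double term(std::size_t& pos, double x) const {
        double value = factor(pos, x);
        for (;;) {
            if (accept(pos, '*')) value *= factor(pos, x);
            else if (accept(pos, '/')) value /= factor(pos, x);
            else return value;
        }
    }

    double factor(std::size_t& pos, double x) const {
        if (accept(pos, '-')) return -factor(pos, x);
        if (accept(pos, 'x')) return x;
        if (accept(pos, '(')) {
            double value = expr(pos, x);
            if (!accept(pos, ')')) throw std::runtime_error("expected ')'");
            return value;
        }
        // Anything else must be a numeric literal; let strtod parse it.
        const char* start = text_.c_str() + pos;
        char* end = nullptr;
        double value = std::strtod(start, &end);
        if (end == start) throw std::runtime_error("expected a number");
        pos += static_cast<std::size_t>(end - start);
        assert(pos <= text_.size());  // strtod never reads past the terminator
        return value;
    }

    std::string text_;
};
```

A Formula built from a user-supplied string such as "x*x + 2*(x - 1)" can then be called like any other function object, for example wrapped in a lambda and handed to a numerical routine.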
The best solution is to use an embedded language like Lua or Python for this type of task. See e.g. Selecting An Embedded Language for suggestions.
You may use the Tiny C Compiler as a library (libtcc).
It allows you to compile arbitrary code at run time and load it, but it only works for C, not C++.
Generally, the only way is the following:
Pass the code to a compiler and create a shared object or DLL
Load this shared object or DLL
Use the function from this shared object.
C++, unlike some other languages like Perl, isn't capable of doing runtime interpretation of itself.
Your only option here would be to allow the user to compile small shared libraries that could be dynamically-loaded by your application at runtime.
Well, there are two things you can do:
Take full advantage of Boost/C++0x lambdas to define functions at runtime.
If only mathematical formulas are needed, libraries like muParser are designed to turn a string into bytecode, which can be seen as defining a function at runtime.
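A minimal sketch of the first option, assuming C++11 or later: taking a std::function lets forward accept lambdas and other callables as well as plain function pointers. make_polynomial is a hypothetical helper, not from the question, that builds a function from coefficients known only at runtime. Note that the lambda's body is still compiled ahead of time; only the data it captures comes from the user.

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <vector>

// Forward-difference derivative that accepts any callable.
double forward(double x, double h, const std::function<double(double)>& f) {
    assert(h != 0.0);  // a zero step would divide by zero
    return (f(x + h) - f(x)) / h;
}

// Build p(x) = c[0] + c[1]*x + c[2]*x^2 + ... from runtime coefficients.
std::function<double(double)> make_polynomial(std::vector<double> coeffs) {
    return [coeffs](double x) {
        double result = 0.0;
        // Horner's scheme, starting from the highest power.
        for (auto it = coeffs.rbegin(); it != coeffs.rend(); ++it) {
            result = result * x + *it;
        }
        return result;
    };
}
```

For example, forward(3.0, 1e-6, make_polynomial({0.0, 0.0, 1.0})) approximates the derivative of x*x at 3, and an inline lambda such as [](double x) { return std::exp(x); } can be passed in the same way.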
While it seems like a blow-off, there are a lot of people out there who have written equation parsers and interpreters for C++ and C, many commercial, many flawed, and all as different as faces in a crowd. One place to start is the college guys writing infix to postfix translators. Some of these systems use parenthetical grouping followed by putting the items on a stack like you would find in the old HP STL library. I spent 30 seconds and found this one:
http://www.speqmath.com/tutorials/expression_parser_cpp/index.html
possible search string:"gcc 'equation parser' infix to postfix"
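For a flavour of what such an infix to postfix translator does, here is a small sketch of the classic stack-based (shunting-yard) conversion. It is deliberately limited, and the limits are assumptions of this sketch: operands are single letters or digits, only the left-associative operators + - * / and parentheses are handled, and the input is assumed to be well formed (other characters are ignored). The function names are made up for the illustration.

```cpp
#include <cassert>
#include <cctype>
#include <stack>
#include <string>

// Binding strength of a binary operator; higher binds tighter.
int precedence(char op) {
    return (op == '*' || op == '/') ? 2 : 1;
}

bool is_operator(char c) {
    return c == '+' || c == '-' || c == '*' || c == '/';
}

// Convert an infix expression with single-character operands to postfix.
std::string to_postfix(const std::string& infix) {
    std::string output;
    std::stack<char> pending;  // operators and '(' waiting to be emitted
    for (char c : infix) {
        if (std::isalnum(static_cast<unsigned char>(c))) {
            output += c;  // operands go straight to the output
        } else if (c == '(') {
            pending.push(c);
        } else if (c == ')') {
            // Flush operators back to the matching '('.
            while (!pending.empty() && pending.top() != '(') {
                output += pending.top();
                pending.pop();
            }
            assert(!pending.empty());  // well-formed input has a matching '('
            pending.pop();
        } else if (is_operator(c)) {
            // Emit operators that bind at least as tightly (left-associative).
            while (!pending.empty() && pending.top() != '(' &&
                   precedence(pending.top()) >= precedence(c)) {
                output += pending.top();
                pending.pop();
            }
            pending.push(c);
        }
    }
    while (!pending.empty()) {
        output += pending.top();
        pending.pop();
    }
    return output;
}
```

Once an expression is in postfix form, evaluating it needs only a value stack: push operands, and for each operator pop two values and push the result.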

Confusion in continuing the Project Related to C Syntax Analyser

My goal is to write a program (in C++) that takes C source code as input and checks for "SYNTAX ERRORS ONLY".
For this, do I need to know about regular expressions, grammar generation and parsers?
I would like to use tools like Yacc/Flex/Bison, but the problems I am facing are:
How do I use these tools? I am only scratching the surface when I read about them, and I feel clueless.
How can I use these tools in tandem with my C++ source code?
How "the hell" do I get started with this?
Use somebody else's C parser. For example, the parser used by the clang project. http://clang.llvm.org/
Then you can focus on the other hard part of your problem: detecting errors.
To get started with Yacc and Lex (or the Gnu versions, Bison and Flex) I can recommend Tom Niemann's A Compact Guide to Lex & Yacc.
I also suggest that you have a look at other projects doing the same thing. They often have lint in their name, such as http://www.splint.org/
It all depends on what kind of errors you want to check.
In any case, you certainly need to learn more about compiler architectures. This book is a reference: http://www.cs.princeton.edu/~appel/modern/c/
If you want to work at the syntactic level, you certainly want to work with Lex and Yacc.
This link may help you to get started with a working grammar (though outdated): http://www.lysator.liu.se/c/ANSI-C-grammar-y.html
Less powerful syntax checking can be done using regular expressions. You can do less with regular expressions than with an actual parser (see http://en.wikipedia.org/wiki/Chomsky_hierarchy), but they are certainly far more practical.
If you want to perform high-level checking, such as "Does this group of functions always take const parameters?", you can probably use GCC's ability to dump abstract syntax trees (see http://digitocero.com/en/blog/exporting-and-visualizing-gccs-abstract-syntax-tree-ast). Check other compilers or front ends as well. An abstract syntax tree contains a lot of information you can check.
If you want to handle compilation errors related to type checking and so on, I can't help you; you probably want to look at other people's projects before starting to write your own compiler.
see also:
http://decomp.ulb.ac.be/roelwuyts/playground/canalysistools/
http://wiki.altium.com/display/ADOH/Static+Code+Analysis+-+CERT+C+Secure+Code+Checking
Some people in my previous labs worked on C and C++ analysis and transformation
http://www.lrde.epita.fr/cgi-bin/twiki/view/Transformers/
The project is now on standby, and has proved to be a complex subject even for people used to compiler writing (especially in the case of C++ transformation).
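Finally, your needs are maybe far simpler than this. Did you think about
FILE *output = popen("gcc -fsyntax-only -Wall my_c_file.c 2>&1", "r");
(and then just checking the output of gcc; the 2>&1 is needed because gcc writes its diagnostics to stderr, and -fsyntax-only stops it from producing an executable)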
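A slightly fuller sketch of the popen idea, assuming a POSIX system. CommandResult and run_command are hypothetical helpers written for this illustration: they run a shell command, collect everything it prints to stdout, and return the raw status from pclose. Since gcc reports diagnostics on stderr, the command string passed in should end with 2>&1 to capture them.

```cpp
#include <cassert>
#include <stdexcept>
#include <stdio.h>  // popen/pclose are POSIX, declared in <stdio.h>
#include <string>

// Captured output and raw wait status of a shell command.
struct CommandResult {
    std::string output;
    int status;
};

// Run `command` through the shell and collect what it writes to stdout.
CommandResult run_command(const std::string& command) {
    assert(!command.empty());
    FILE* pipe = popen(command.c_str(), "r");
    if (pipe == nullptr) throw std::runtime_error("popen failed");
    CommandResult result;
    char buffer[256];
    while (fgets(buffer, sizeof buffer, pipe) != nullptr) {
        result.output += buffer;
    }
    result.status = pclose(pipe);  // 0 when the command exited successfully
    return result;
}
```

Calling run_command("gcc -fsyntax-only my_c_file.c 2>&1") and checking whether status is non-zero would then tell you whether gcc rejected the file, with the diagnostics available in output.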
How do you define "SYNTAX ERRORS ONLY"? If you just want to know what are the errors, why don't you call external gcc to perform a compilation and report the errors?

Using clang to analyze C++ code

We want to do some fairly simple analysis of user's C++ code and then use that information to instrument their code (basically regen their code with a bit of instrumentation code) so that the user can run a dynamic analysis of their code and get stats on things like ranges of values of certain numeric types.
clang should be able to handle enough C++ now to handle the kind of code our users would be throwing at it - and since clang's C++ coverage is continuously improving by the time we're done it'll be even better.
So how does one go about using clang like this as a standalone parser? We're thinking we could just generate an AST and then walk it looking for objects of the classes we're interested in tracking. Would be interested in hearing from others who are using clang without LLVM.
clang is designed to be modular. Quoting from its page:
A major design concept for clang is its use of a library-based architecture. In this design, various parts of the front-end can be cleanly divided into separate libraries which can then be mixed up for different needs and uses.
Look at clang libraries like libast for your needs. Read more here.
What you didn't indicate is what kind of "analyses" you wanted to do. Most C++ analyses require that you have accurate symbol table data so that when you encounter a symbol foo you have some idea what it is. (You technically don't even know what + is without such a symbol table!) You also need generic type information; if you have an expression "a*b", what is the type of the result? Having "name and type" information is key to almost anything you want to do for analysis.
If you insist on clang, then there are other answers here. I don't know if it provides for name and type resolution.
If you need name and type resolution, then another solution would be the DMS Software Reengineering Toolkit. DMS provides generic compiler-like infrastructure for parsing, analyzing, transforming, and un-parsing (regenerating source code from the compiler data structures). DMS's industrial-strength C++ front end (it has many other language front ends, too) provides full name and type resolution according to the ANSI standard as well as GCC and MS VC++ dialects.
Code transformations can be implemented via an abstract-syntax tree interface provided by DMS, or by pattern-directed program transformation rules written in the surface syntax of your target language (in this case, C++). Here's a simple transformation using the rule language:
domain Cpp~GCC3; -- says we want patterns for C++ in the GCC3 dialect
rule optimize_to_increment(lhs:left_hand_side):expression -> expression
" \lhs = \lhs + 1 " -> " \lhs++" if no_side_effects(lhs).
This implicitly operates on the ASTs built by DMS, to modify them. The conditional allows you to inquire about arbitrary properties of pattern variables (in this case, lhs), including name and type constraints if you wish.
DMS has been used many times for very sophisticated program analysis and transformation of C++ code. We build C++ test coverage tools by instrumenting C++ code in a rather obvious way using DMS. At the website, there's a bibliography with papers describing how DMS was used to restructure the architecture of a large product line of military aircraft mission software. This kind of activity literally pours C++ in one architectural shape into another by applying large numbers of pattern-directed transforms such as the above.
It is likely to be very easy to implement your instrumentation. And you don't have to wait for it to mature.