Is the AST, abstract syntax tree, defined by the language or by the frontend? - c++

In the last few weeks I have been experimenting with ASTs and Clang, in particular clang-tidy.
Clang offers some classes and ways to interact with ASTs, but what I don't understand is whether the clang::VarDecl I use so often was named and created by the creators of Clang or by the creators of the language.
Who decided that was to be called VarDecl?
I mean, is the AST (and all its elements) something that came from the mind of the inventor of the language, with the various frontends just creating classes named after a document written by him/her? Or does every frontend potentially create its own AST of a given source code, so that Clang's and GCC's are different?

Is the AST, abstract syntax tree, defined by the language
Not fully. Each definition in the C++ language standard comes with a short syntax notation, and there is an informative annex with a grammar summary. But the annex notes (https://eel.is/c++draft/gram):
This summary of C++ grammar is intended to be an aid to comprehension.
It is not an exact statement of the language. In particular, the
grammar described here accepts a superset of valid C++ constructs. [...]
There is no VarDecl in the standard's grammar; a variable declaration is just one interpretation of simple-declaration.
or by the frontend?
The internals of the compiler, whether or not it has a frontend and whether it has 3 or 1000 stages, are part of the compiler implementation. From the point of view of the language, the compiler can be implemented in any way it wants, as long as it translates valid programs correctly. Generally speaking, the language specifies what should happen and when, not how.
So to answer the question, the AST (if used at all in any form) is defined by the compiler.
Who decided that was to be called VarDecl?
Most probably Chris Lattner, in https://github.com/llvm/llvm-project/commit/a11999d83a8ed1a2661feb858f0af786f2b829ad .
that came from the mind of the inventor of the language, with the various frontends just creating classes named after a document written by him/her? Or does every frontend potentially create its own AST of a given source code, so that Clang's and GCC's are different?
Surely they are influenced by what is in the standard, but every compiler has its own internals. In short, Clang's VarDecl and GCC's VAR_DECL are different, and they also differ conceptually: roughly, GCC uses switch(...) case VAR_DECL: while Clang uses classes such as clang::VarDecl.
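To make the conceptual difference concrete, here is a minimal sketch of the two styles. It does not reproduce either compiler's real data structures: every name below (NodeCode, TaggedNode, DeclNode, and so on) is hypothetical and chosen only for illustration.

```cpp
#include <string>
#include <utility>

// Style 1: one generic node type carrying a tag, dispatched with switch.
// (Loosely in the spirit of a tag like VAR_DECL; illustrative names only.)
enum class NodeCode { VarDeclCode, FunctionDeclCode };

struct TaggedNode {
    NodeCode code;
    std::string name;
};

inline std::string describe(const TaggedNode& n) {
    switch (n.code) {
        case NodeCode::VarDeclCode:      return "variable " + n.name;
        case NodeCode::FunctionDeclCode: return "function " + n.name;
    }
    return "unknown";
}

// Style 2: one class per node kind, dispatched through the class hierarchy.
// (Loosely in the spirit of a class like clang::VarDecl; illustrative only.)
struct DeclNode {
    explicit DeclNode(std::string n) : name(std::move(n)) {}
    virtual ~DeclNode() = default;
    virtual std::string describe() const = 0;
    std::string name;
};

struct VarDeclNode : DeclNode {
    using DeclNode::DeclNode;  // inherit the constructor
    std::string describe() const override { return "variable " + name; }
};

struct FunctionDeclNode : DeclNode {
    using DeclNode::DeclNode;
    std::string describe() const override { return "function " + name; }
};
```

Both styles can represent the same program; the language standard has nothing to say about which one a compiler picks.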

Related

How is C++ syntactic evolution managed?

Separate from frontend implementers' experiences, are there formal standards that syntactic extensions to the C++ grammar are required to meet? That is, are proposed extensions subjected to any form of mechanical analysis before being accepted?
I ask because I have read that the two most widely used C++ compilers, g++ and clang, both use hand written, recursive descent parsers. Does that mean that as the grammar evolves, it needs to remain LL(1) (or maybe LL(n)) with the proviso that certain implementation tricks are allowed / assumed / expected?
The C++ standard defines a language; it does not restrict what that language might become in the future. (The C standard does contain a section called "future directions", but that is more a warning to users of features which have been deprecated, and which identifiers might be reserved in the future, rather than being a limitation on future standards.)
That said, the standards process is basically conservative, since the committee includes representatives of the major compilers as well as major user groups, none of whom are likely to accept changes which make the language even harder to parse.
As far as I know, there is no mechanical validation of proposed changes. But there is a lot of manual analysis by people with a lot of experience and expertise. Moreover, proposed changes are generally accompanied by proof-of-concept implementations to demonstrate their viability and utility.

What is the history of struct/class in C++?

I'm taking a class on Object Oriented Programming in C++. In a recent assignment I defined a member function within a struct. My instructor explained that, although code that uses member functions within structs compiles, he would prefer we didn't do it, for backward compatibility with C and (especially in this beginner class) to practice good data encapsulation: we should use a struct for types that contain mostly data, and a class for applications that benefit from more procedural encapsulation. He indicated that this practice comes from the history of structs/classes in C++, which is what I'd like to know more about.
I know that structs are functionally the same as classes except for default member/inheritance access. My question is:
Why are structs AND classes included in C++? From my background in C#, where structs and classes have important differences, it seems like struct in C++ is just syntactic sugar for defining classes with default public-ness. Is it?
I'm not looking for opinions on when/why one should be used instead of the other; I was born well after these constructs were, and I'm looking for their history. Were they conceived together? If so, why? If not, which came first, and why was the second added? I realize that there are many venerable elders within this community who may have living memory of these features' origins, and links to standards publications, or examples of code, where both, one, or the other first appeared, will add to answers' helpfulness.
Please note, this question is not:
What are the differences between struct and class in C++?
C++ Structs with Member Functions vs. Classes with Public Variables
Can C++ struct have member functions?
nor Function for C++ struct
To ease development of polyglot headers.
C and C++ are separate languages, but the C++ language is designed to provide a large useful common subset with the C language. Several commonly used constructions in the C++ language have the same meaning (or nearly so) as they have in the C language. In the case of struct, the C++ language defines it as syntactic sugar for class that approximates the behavior of a struct in the C language, namely that members are public by default. Thus a program or part thereof can be written in this common subset. This allows, for example, a library to provide a single header file that declares the same API to both programs written in the C language and programs written in the C++ language.
Some differences between the C language and the C++ language are listed in a question that has since been closed as too broad. From the perspective of somebody programming in this common subset language, these differences can be treated as "implementation-defined behavior", where compilers for the C language produce one behavior and compilers for the C++ language produce the other.
In fact, the C++ standard provides mechanisms to aid development of polyglot programs that are valid in both C and C++ languages. The extern "C" linkage specifier allows a program to be written partly in C and partly in C++. And the __cplusplus symbol is used in #ifdef __cplusplus conditions to conditionally enable macros, linkage specifiers, and other specifics that only one of the two languages is supposed to see.
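As a rough sketch of what such a polyglot header can look like, the following is written in the common subset and uses the two mechanisms mentioned above. The names demo_point and demo_point_sum are hypothetical, invented for this example.

```cpp
/* Comments use the C89-compatible form so the file reads the same in both languages. */

#ifdef __cplusplus
extern "C" {          /* seen by C++ compilers only: request C linkage */
#endif

/* Only data members and no member functions, so the type is valid C as well. */
struct demo_point {
    int x;
    int y;
};

int demo_point_sum(const struct demo_point *p);

#ifdef __cplusplus
}                     /* closes the extern "C" block */
#endif

/* Definition, also in the common subset. In a real library this would
   normally live in a separate .c file rather than in the header. */
int demo_point_sum(const struct demo_point *p) {
    return p ? p->x + p->y : 0;
}
```

A C compiler never sees the extern "C" lines, while a C++ compiler gives demo_point_sum C linkage so that it can link against an implementation compiled as C.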
In the old days (late 1980s or early 1990s), a C++ compiler (then called cfront) translated C++ code to C code. That C++ was very different from current C++11 or C++14 (or even old C++03).
I don't remember the details, but it could have happened that struct at that time was parsed completely, but passed unchanged into the generated C code, while class was preprocessed to something different (and was translated to a struct).
I might be completely wrong; this is from my human memory, and I only used cfront a couple of times (on SunOS3, Sun3/160 workstations). It left me unimpressed, but interested. At that time, the translation to C code added a significant time to the build process. But things have changed a lot, and translating to C makes a lot of sense today...
(In those days, a hello world program in Ada took 5 minutes to compile, and with cfront it might be 3 minutes, but in C it was less than a minute)
Later on, the definition of C++ changed, and struct foo { was indeed equivalent to class foo{ public:, but I am not sure that was the case in the primordial C++ compiler.
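For current C++, the equivalence described in the previous sentence can be checked with a small sketch. The type names FooS and FooC are made up for the example, and nothing here says anything about how cfront behaved.

```cpp
#include <type_traits>

struct FooS {          // members are public by default
    int value;
};

class FooC {           // members are private by default...
public:                // ...so this label makes FooC match FooS
    int value;
};

// Both keywords declare class types; the standard type traits treat them alike.
static_assert(std::is_class<FooS>::value, "struct declares a class type");
static_assert(std::is_class<FooC>::value, "class declares a class type");
static_assert(sizeof(FooS) == sizeof(FooC), "same members, same size");

inline int sum_values() {
    FooS s{1};  // aggregate initialization works for both types,
    FooC c{2};  // because neither has private members or constructors
    return s.value + c.value;
}
```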
Both were included in the original C++ language, but struct long predates C++ in the C language.
C++ includes structs because it inherited them from C. There was no reason to remove them from C++ and break existing C code bases that may be compiled with a C++ compiler.
C++ introduces class as a new keyword. Presumably the keyword was introduced to provide the option of differentiating newly created C++ classes from existing structs in legacy C code bases.
C++ is mostly a superset of C. So (mostly) if you can do something in C you can do it in C++. Structs are a C feature and were included in C++. Refer to this for the reasons C++ isn't a true superset: Where is C not a subset of C++?

Is Vala a sane language to parse, compared to C++?

The problems parsing C++ are well known. It can't be parsed purely based on syntax, it can't be done as LALR (whatever the term is, I'm not a language theorist), the language spec is a zillion pages, etc. For that and other reasons I'm deciding on an alternative language for my personal projects.
Vala looks like a good language. Although it provides many improvements over C++, is it just as troublesome to parse? Or does it have a neat, reasonable-length formal grammar, or some logical description, suitable for building parsers for compilers, source analyzers and other tools?
Whatever the answer, does that go for the Genie alternative syntax?
(I also wonder albeit less intensely about D and other post-C++ non-VM languages.)
C++ is one of the most complex (if not the most complex) programming languages in common use to parse. Of particular difficulty are its name lookup rules and template instantiation rules. C++ is not parsable using an LALR(1) parser (such as the parsers generated by Bison and Yacc), but it is by all means parsable (after all, people use parsers which have no problem parsing C++ every day). (An earlier version of this answer claimed that early versions of G++ were built on top of Bison's Generalized LR parser framework; that is not the case, see comments. G++ was more recently moved to a hand-written recursive descent parser.)
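A small sketch of why name lookup gets in the way of a purely syntactic parse: the token sequence `a * b;` is a declaration in one context and an expression in another. The namespaces, the Num type, and the call counter are all invented for this illustration.

```cpp
namespace type_case {
    struct a {};                   // here `a` names a type

    inline bool declares_pointer() {
        a * b;                     // declaration: b is a pointer to a
        b = nullptr;               // compiles only because b was declared above
        return b == nullptr;
    }
}

namespace value_case {
    struct Num { int v; };

    static int calls = 0;          // counts how often operator* runs

    inline Num operator*(Num l, Num r) {
        ++calls;
        return Num{l.v * r.v};
    }

    inline int multiplies() {
        Num a{6}, b{7};            // here `a` and `b` name objects
        int before = calls;
        a * b;                     // expression statement: calls operator*
        return calls - before;     // number of operator* calls made just now
    }
}
```

A parser cannot decide between the two readings without first knowing what `a` refers to, which is why C++ front ends interleave parsing with name lookup.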
On the other hand, I'm not sure I see what "improvements" Vala offers over C++. The languages appear to aim at the same goals. Also, you're probably not going to find much outside of GTK+ written with Vala interfaces. You're going to be using C interfaces to everything else, which really defeats the point of using such a language.
If you don't like C++ due to its complexity, it might be a good idea to consider Objective-C instead, because it is a simple extension of C (like Vala), but has a much larger community of programmers for you to draw upon, given that it is the foundation for everything in Mac land.
Finally, I don't see what the difficulty of parsing the language itself has to do with what a programmer should care about in order to use the language. Just my 2 cents.
It's pretty simple. You can use libvala to do parsing, semantic analysis, and code generation instead of writing your own.

Partially parse C++ for a domain-specific language

I would like to create a domain-specific language as an augmented C++ language. I will mostly need two types of constructs:
Top-level constructs for specialized types or declarations
In-code constructs, i.e. primitives that make function calls or idioms easier
The language will be used for scientific computing purposes, and will ultimately be translated into plain C++. C++ has been chosen as it seems to offer a good compromise between: ease of use, efficiency and availability of a wide range of libraries.
A previous attempt using flex and bison failed due to the complexity of the C++ syntax. The existing parser can still fail on some constructs. So we want to start over, but on a better footing.
Do you know about similar projects? And if you attempted to do so, what tools would you use? What would be the main pitfalls? Would you have recommendations in terms of syntax?
There are many (clever) attempts to have domain specific languages within the C++ language.
It's usually called DSEL for Domain Specific Embedded Language. For example, you could look up the Boost.Spirit syntax, or Boost.rdb (in the boost vault).
Those are fully compliant C++ libraries which make use of C++ syntax.
If you want to hide some complexity, you might add in a few macros.
I would be happy to provide some examples if you gave us something to work with :)
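As a rough idea of what an embedded DSL can look like for scientific code, here is a minimal sketch that uses nothing but operator overloading. It is not taken from Boost.Spirit or Boost.rdb; Vec3 and axpy_demo are hypothetical names for this example.

```cpp
#include <array>
#include <cstddef>

// A tiny 3-component vector type; the overloaded operators are the "DSL".
struct Vec3 {
    std::array<double, 3> v;
};

inline Vec3 operator+(const Vec3& a, const Vec3& b) {
    Vec3 r{};
    for (std::size_t i = 0; i < 3; ++i) r.v[i] = a.v[i] + b.v[i];
    return r;
}

inline Vec3 operator*(double s, const Vec3& a) {
    Vec3 r{};
    for (std::size_t i = 0; i < 3; ++i) r.v[i] = s * a.v[i];
    return r;
}

// User code then reads close to the mathematical notation.
inline Vec3 axpy_demo() {
    Vec3 x{{1.0, 2.0, 3.0}};
    Vec3 y{{4.0, 5.0, 6.0}};
    return 2.0 * x + y;
}
```

Because this is ordinary C++, any standard compiler parses it and no separate translation step is needed. Libraries in this space typically go further, for example with expression templates that avoid temporaries.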
You can try extending the open source Elsa C++ parser (it is now part of Mozilla's Pork project):
https://wiki.mozilla.org/Pork
The way to extend C++ is not to try to extend the language, which will be extremely difficult and probably break as new base compiler releases implement new features, but to write class libraries to support your problem domain. This has been what C++ programming has been all about since the language's inception.
If you really want to extend C++, you'll need a full C++ parser plus name and type resolution. As you've found out, this is pretty hard. Your best solution is to get an existing one and modify it.
Our DMS Software Reengineering Toolkit is an infrastructure for implementing language processors. It is designed to support the construction of tools that parse languages, carry out transformations, and spit out the same language (with enhanced code) or a different language/dialect.
DMS has a full C++ Front End that parses C++ and builds abstract syntax trees and symbol tables (e.g., all that name and type resolution stuff).
The DMS/C++ front end is provided with DMS in source form, so that it can be customized to achieve the kind of effect you want. You'd define your DSL as an extension of the C++ front end, and then write transformations that convert your special constructs into "vanilla" C++ constructs, and then spit out a compilable result.
DMS/C++ has been used for a wide variety of transformation tasks, including ones that involved extending C++ as you've described, and including tasks that carry out massive reorganizations of large C++ applications. (See the Publications at that website).
To solve your first bullet, maybe you can use the new C++0x features "initializer lists" and "user-defined literals", avoiding the need for a new parser. They may help with the second bullet, too.
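A minimal sketch of how those two features can stand in for custom syntax, assuming a made-up units example (the Meters and Path types and the _m and _km suffixes are hypothetical):

```cpp
#include <initializer_list>

// A specialized type for lengths, stored in meters.
struct Meters {
    long double value;
};

// User-defined literals: 250.0_m and 1.5_km build Meters values directly.
constexpr Meters operator""_m(long double v)  { return Meters{v}; }
constexpr Meters operator""_km(long double v) { return Meters{v * 1000.0L}; }

// An initializer-list constructor gives a declarative, list-like syntax.
class Path {
public:
    Path(std::initializer_list<Meters> legs) {
        for (const Meters& leg : legs) total_ += leg.value;
    }
    long double total() const { return total_; }

private:
    long double total_ = 0.0L;
};

inline long double demo_total() {
    Path p{1.5_km, 250.0_m};   // reads like a small domain-specific declaration
    return p.total();          // total length in meters
}
```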

C++ languages extensions

I have already read the FAQ, and I think this may be a subjective question, but I need to ask.
Does anyone know what exactly (I mean formally) a C++ language extension is?
I have already seen examples, like NVIDIA's CUDA C extensions and the Avalon transaction-based C++ extensions.
So what I am after is something like a formal definition.
Thanks anyway.
A language extension is simply anything that goes beyond what the language specification calls for. Your compiler might add new features, like special "min" and "max" operators. Your compiler might define the behavior of division by zero, which is otherwise undefined, according to the standard. It might provide additional parameters for your main function. It might be the incorporation of another language's features, such as allowing C-style variable-sized arrays in C++. It might be a facility for specifying a function's calling convention.
Using a language extension usually makes your code non-portable because when you take your code to another OS, compiler, or even compiler version, the extension may not be available anymore, or its behavior may be different from what you had originally used.
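One common way to contain that portability risk is to hide the extension behind a macro with a standard fallback. The sketch below uses __builtin_expect, a GCC and Clang builtin that is not part of standard C++; the DEMO_LIKELY macro and the clamp_non_negative function are hypothetical names for this example.

```cpp
// Use the compiler extension where it is known to exist,
// and fall back to plain standard C++ everywhere else.
#if defined(__GNUC__) || defined(__clang__)
#  define DEMO_LIKELY(x) __builtin_expect(!!(x), 1)
#else
#  define DEMO_LIKELY(x) (!!(x))
#endif

inline int clamp_non_negative(int v) {
    if (DEMO_LIKELY(v >= 0)) {
        return v;   // hinted as the common path when the extension is available
    }
    return 0;
}
```

The function behaves identically with or without the extension; only the optimization hint is lost on compilers that do not provide the builtin.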
Please see Extensible programming:
Extensible programming is a term used in computer science to describe a style of computer programming that focuses on mechanisms to extend the programming language, compiler and runtime environment.
and more to the point, the Extensible syntax section:
This simply means that the source language(s) to be compiled must not be closed, fixed, or static. It must be possible to add new keywords, concepts, and structures to the source language(s).