Collecting class definitions on the fly while parsing - C++

Currently, I'm working on a simple compiler project.
Suppose we have the following grammar:
file_input    : file_item*
              ;
file_item     : class_def
              | variable_decl
              ;
class_def     : 'class' NAME scope
              ;
variable_decl : 'dim' NAME 'as' NAME
              ;
Now, while building our symbol table, if a variable is declared before its class is defined, we get a semantic error, because the required class is not yet in the symbol table.
Simply put, we need the compiler to wait until the class name is defined, so that declaring a variable of type foo and defining the class foo later won't trip up the compiler.
Any suggestions on how to achieve that?
Thanks for your time.

You'll need a multi-pass approach:
First, walk over the AST once to build the table mapping class names to class definitions, without doing anything that would require lookups in that table. Then walk it a second time; with the table already built, you'll be able to look up any class you want when you encounter a variable declaration.
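A minimal sketch of the two passes, assuming a hypothetical AST in which each file_item is either a ClassDef or a VariableDecl (ClassDef, VariableDecl, collect_classes and check_variables are illustrative names, not from the question):

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <variant>
#include <vector>

// Hypothetical AST for the grammar above: a file is a list of items,
// each item is either a class definition or a variable declaration.
struct ClassDef { std::string name; };
struct VariableDecl { std::string name; std::string type_name; };
using FileItem = std::variant<ClassDef, VariableDecl>;

// Pass 1: record every class name. No lookups happen in this pass.
std::map<std::string, const ClassDef*> collect_classes(const std::vector<FileItem>& file) {
    std::map<std::string, const ClassDef*> classes;
    for (const auto& item : file) {
        if (const auto* cls = std::get_if<ClassDef>(&item)) {
            if (!classes.emplace(cls->name, cls).second)
                throw std::runtime_error("duplicate class: " + cls->name);
        }
    }
    return classes;
}

// Pass 2: the table is complete, so every type name can be looked up,
// whether the class appears before or after the variable in the file.
std::vector<std::string> check_variables(const std::vector<FileItem>& file,
                                         const std::map<std::string, const ClassDef*>& classes) {
    std::vector<std::string> errors;
    for (const auto& item : file) {
        if (const auto* var = std::get_if<VariableDecl>(&item)) {
            if (classes.find(var->type_name) == classes.end())
                errors.push_back("unknown type '" + var->type_name +
                                 "' for variable '" + var->name + "'");
        }
    }
    return errors;
}
```

Because pass 2 only runs after pass 1 has seen the whole file, `dim x as foo` is accepted even when `class foo` comes later; only genuinely unknown types are reported.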

One approach: when class foo is used in a variable declaration and doesn't exist yet, create the class foo immediately, but add a flag (something like "undefined") to its entry. When the class is actually defined later on, update the class definition in the symbol table and remove the "undefined" flag.
At the end of compilation, look through the symbol table for any classes that are still flagged as "undefined" and report the errors then. It can be useful to record the line number of the first use of each class for error reporting.
This will work for now, but later, when you want to check member accesses within a class, it will be tricky to do without the full class definition. You could do something similar and defer processing member accesses until you have the definition, but overall I think that would be harder than simply going multi-pass as sepp2k suggested.
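A minimal sketch of the placeholder approach, assuming a hypothetical SymbolTable whose use_class and define_class hooks are called by the parser (these names are illustrative). The "undefined" flag is stored inverted, as `defined`:

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical symbol-table entry: a class may be created as a placeholder
// when it is first used, before its definition has been seen.
struct ClassSymbol {
    std::string name;
    bool defined = false;    // the answer's "undefined" flag, inverted
    int first_use_line = 0;  // where it was first referenced, for error messages
};

class SymbolTable {
public:
    // Called for 'dim x as T': creates a placeholder for T if it is new.
    ClassSymbol& use_class(const std::string& name, int line) {
        auto [it, inserted] = classes_.try_emplace(name);
        if (inserted) {
            it->second.name = name;
            it->second.first_use_line = line;
        }
        return it->second;
    }

    // Called for 'class T ...': fills in (or creates) the entry and clears the flag.
    ClassSymbol& define_class(const std::string& name) {
        ClassSymbol& sym = classes_[name];
        sym.name = name;
        sym.defined = true;
        return sym;
    }

    // At the end of compilation: every class still undefined is an error.
    std::vector<std::string> undefined_classes() const {
        std::vector<std::string> errors;
        for (const auto& [name, sym] : classes_)
            if (!sym.defined)
                errors.push_back("line " + std::to_string(sym.first_use_line) +
                                 ": class '" + name + "' is never defined");
        return errors;
    }

private:
    std::map<std::string, ClassSymbol> classes_;
};
```

This sketch does not detect duplicate class definitions; a real table would check `defined` in define_class before overwriting.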

Related

Clarification on classes and scopes in this scenario

I'm currently working with the JUCE framework to create an audio VST plugin, to get to grips with it and learn, but I want to clarify some basic things relating to classes.
In my header file, I have a class EQPLUGAudioProcessor, and inside that class I declare static juce::AudioProcessorValueTreeState::ParameterLayout createParameterLayout();
When I define the function createParameterLayout() in my .cpp, I have to write juce::AudioProcessorValueTreeState::ParameterLayout EQPLUGAudioProcessor::createParameterLayout(){}
My question is: why do I have to include juce::AudioProcessorValueTreeState::ParameterLayout before the actual scope the function is in (EQPLUGAudioProcessor)? Surely I should just be telling the compiler to look in EQPLUGAudioProcessor, and that's it?
I get that EQPLUGAudioProcessor is the class it's all inside, but I still can't understand when, where and why I'd need to spell out again, in the .cpp, the classes that the function comes from.
Let me know if this needs clarification.
You can omit an enclosing namespace or class name only when you are inside that same namespace or class:
class store {
    class give_me {
        // ...
    };
    static give_me something_cool();
};
Here, the declaration of something_cool() only needs to reference give_me, rather than store::give_me. This is because this declaration appears inside the declaration of its class.
Now that this class is declared, when it's time to define its class method, everything must be spelled out:
store::give_me store::something_cool()
{
    // ...
}
If the class method returned void instead, you'd still have to write something like:
void store::something_cool()
{
    // ...
}
You already understand that you can't just write void something_cool() and define this class method. This would only define some unrelated function with this name.
You have to write store::something_cool because this definition no longer appears within the store scope.
Well, the same thing applies not just to class methods but also to inner classes (and to other kinds of symbols declared in some enclosing scope). give_me is not declared in global scope; it is an inner class, so from global scope you must refer to it as store::give_me.
That's just how C++ works. There are also various complicated rules, tangentially related to this, that define where a scope begins and ends with respect to C++'s syntax. In some cases it is possible to take advantage of these scoping rules and avoid explicit scope references by using an auto declaration with a trailing return type; but how to do that will have to be a question for some other time.
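For reference, a minimal sketch of the trailing-return-type form mentioned above, built on the store/give_me example. The `public:` section, the `value` member and the second method something_cooler() are hypothetical additions so the snippet compiles and can be called:

```cpp
class store {
public:  // added so the example can be called; in the original everything is private
    class give_me {
    public:
        int value = 42;
    };

    static give_me something_cool();
    static give_me something_cooler();  // hypothetical second method
};

// Leading return type: it appears before "store::", so it is looked up
// from the enclosing (global) scope and must be qualified.
store::give_me store::something_cool()
{
    return give_me{};  // inside the body we are back in store's scope
}

// Trailing return type: "-> give_me" comes after "store::something_cooler",
// so it is already looked up in store's scope and needs no qualification.
auto store::something_cooler() -> give_me
{
    give_me g;
    g.value = 7;
    return g;
}
```

Applied to the question, this would let the definition be written as `auto EQPLUGAudioProcessor::createParameterLayout() -> ...`, although ParameterLayout still needs its `juce::AudioProcessorValueTreeState::` qualification there because it is not a member of EQPLUGAudioProcessor.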

Why is dirty injection necessary even for code within a template's scope?

Please consider the following:
import options

template tpl[T](a: untyped) : Option[T] =
  var b {.inject.}: T = 4
  a
  none(int)

discard tpl[int]:
  echo b
This builds and runs and results in output:
4
But, if you remove the {.inject.} pragma, you get:
...template/generic instantiation from here
Error: undeclared identifier: 'b'
I don't think the block of code echo b can be considered foreign to the "insides" of the template, since it's only used expanded within the template, and it's passed as an argument, not used outside.
Am I forced to use something dirty, and have I now dirtied my global scope to make this work?
The way this works makes sense. You want to be explicit about what is available inside the code specified by the user.
You can keep your global scope hygienic by using block:
import options

template tpl[T](a: untyped) : Option[T] =
  block:
    var b {.inject.}: T = 4
    a
    none(int)

discard tpl[int]:
  echo b
The reasoning is that the injected names are part of the template's public interface. You don't want the implementation details of the template to leak inside the user code and possibly create naming clashes there, so all variables created inside the template are hidden from the user code by default.
Using inject like this is not considered dirty, it's the right thing to do here.

Use of singletons in compiler

I am writing a compiler for a C++-like language. I have to deal with symbol table which is represented in my code with class that has only static data and methods.
It's like:
class GlobalTable
{
    /* static members */
    static map<int, Symbol*> symbol_by_id;
    static map<Symbol*, int> id_by_symbol;
    /* some static methods */
};
I also have a class that represents the configuration:
class GlobalConfig
{
    static const int int_size;
    /* and so on ... */
};
I need to access these from a lot of places. Passing them around would bloat the code.
Is it reasonable to use my classes like that, or is there a better way to organize everything?
In a compiler, usually you store a symbol type in a symbol table, for some definition of symbol. Generally, I have a symbol table per scope, so the global scope in each module or compilable unit has its own symbol table.
You'll also have AST nodes that have symbols and symbol subtypes as their members.
Most internal compiler APIs deal with symbol and AST subtypes, and those are stored in an indexed, fast-access data structure. All of this has to be available to most compiler functions, so you either use a global compiler or context variable, or pass one around as a parameter.
Strictly, there is no need to use singletons. I can only speak from examples: the way I do it is with a Compiler class that has instance members like currentScope and globalScope, as well as the root AST or currentCompilableUnit. The parser's main instantiates a Compiler, and everything is reentrant. It is a bit more convenient if your compiler instance is a global variable, so your functions can omit the compiler parameter, but beyond that there is really no need for any other globals.
In short, it is typical for most APIs to include a "scope" or "compiler" struct or class in many of its signatures. Though if you use OOP to model the compiler, any methods of the class have implicit "this" access.
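A minimal sketch of that non-singleton design, assuming hypothetical Symbol, Scope, Config and Compiler types (loosely modeled on the currentScope/globalScope description above, not on any particular compiler):

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical symbol record.
struct Symbol {
    std::string name;
    int id;
};

// One symbol table per scope; lookups walk outward through the parent chain.
class Scope {
public:
    explicit Scope(Scope* parent = nullptr) : parent_(parent) {}

    Scope* parent() const { return parent_; }

    void define(const Symbol& sym) { symbols_[sym.name] = sym; }

    const Symbol* lookup(const std::string& name) const {
        for (const Scope* s = this; s != nullptr; s = s->parent_) {
            auto it = s->symbols_.find(name);
            if (it != s->symbols_.end()) return &it->second;
        }
        return nullptr;
    }

private:
    Scope* parent_;
    std::map<std::string, Symbol> symbols_;
};

// Replaces GlobalConfig's static members with ordinary data.
struct Config {
    int int_size = 4;
};

// All shared compiler state lives in one instance instead of in static members.
class Compiler {
public:
    explicit Compiler(Config config) : config_(config) {}
    Compiler(const Compiler&) = delete;  // current_scope_ points into this object
    Compiler& operator=(const Compiler&) = delete;

    const Config& config() const { return config_; }
    Scope& current_scope() { return *current_scope_; }
    int next_id() { return next_id_++; }

    void enter_scope() {
        scopes_.push_back(std::make_unique<Scope>(current_scope_));
        current_scope_ = scopes_.back().get();
    }
    void leave_scope() {
        if (current_scope_->parent() != nullptr) current_scope_ = current_scope_->parent();
    }

private:
    Config config_;
    Scope global_scope_;
    Scope* current_scope_ = &global_scope_;
    std::vector<std::unique_ptr<Scope>> scopes_;  // kept alive so AST nodes may refer to them
    int next_id_ = 0;
};

// A compiler function takes the context explicitly instead of using globals.
void declare_variable(Compiler& compiler, const std::string& name) {
    compiler.current_scope().define(Symbol{name, compiler.next_id()});
}
```

Compiler passes written as member functions would reach the same state through `this`; free functions such as declare_variable take it as an explicit parameter, and two Compiler instances never share state.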

Class inheritance obscure

I was going through some old code written by someone else and encountered a class defined as
class SomenameofClass::Someanothername
{
    //some code goes here
};
What does it mean?
Does it signify private inheritance?
This is the definition of a nested class which was declared elsewhere like this:
class SomenameofClass
{
    class Someanothername;
};
Usually this is done when the nested class is only used in the implementation of the outer class, so its definition doesn't need to be exposed in a header file.
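A minimal sketch of that pattern, using the placeholder names from the question; the describe() method and the std::unique_ptr member are illustrative assumptions (this is essentially the pimpl idiom). In a real project the two halves would live in the header and the .cpp file respectively:

```cpp
#include <memory>
#include <string>

// ---- header part: the nested class is only declared ----
class SomenameofClass {
public:
    SomenameofClass();
    ~SomenameofClass();  // defined below, where Someanothername is complete
    std::string describe() const;

private:
    class Someanothername;  // declaration only; its layout stays hidden
    std::unique_ptr<Someanothername> impl_;
};

// ---- .cpp part: the line from the question defines the nested class ----
class SomenameofClass::Someanothername {
public:
    std::string text = "defined out of line";
};

SomenameofClass::SomenameofClass() : impl_(std::make_unique<Someanothername>()) {}
SomenameofClass::~SomenameofClass() = default;
std::string SomenameofClass::describe() const { return impl_->text; }
```

Nothing here involves inheritance: `SomenameofClass::` only says which scope the class being defined belongs to.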
Nested classes are considered to be within the scope of the enclosing class and are available for use within that scope. To refer to a nested class from a scope other than its immediate enclosing scope, you must use a fully qualified name.
It's a nested class; these are used to avoid name conflicts among different scopes. If a class is used by only one class, nest it: this avoids naming conflicts, and IntelliSense will notify you if there is any conflict.

Pros and cons of using nested C++ classes and enumerations?

What are the pros and cons of using nested public C++ classes and enumerations? For example, suppose you have a class called printer, and this class also stores information on output trays, you could have:
class printer
{
public:
    std::string name_;
    enum TYPE
    {
        TYPE_LOCAL,
        TYPE_NETWORK,
    };
    class output_tray
    {
        ...
    };
    ...
};
printer prn;
printer::TYPE type;
printer::output_tray tray;
Alternatively:
class printer
{
public:
    std::string name_;
    ...
};
enum PRINTER_TYPE
{
    PRINTER_TYPE_LOCAL,
    PRINTER_TYPE_NETWORK,
};
class output_tray
{
    ...
};
printer prn;
PRINTER_TYPE type;
output_tray tray;
I can see the benefits of nesting private enums/classes, but when it comes to public ones, the office is split - it seems to be more of a style choice.
So, which do you prefer and why?
Nested classes
There are several side effects to classes nested inside classes that I usually consider flaws (if not pure antipatterns).
Let's imagine the following code:
class A
{
public:
    class B { /* etc. */ };
    // etc.
};
Or even:
class A
{
public:
    class B;
    // etc.
};
class A::B
{
public:
    // etc.
};
So:
Privileged access: A::B has privileged access to all members of A (methods, variables, symbols, etc.), which weakens encapsulation
A's scope is a candidate for symbol lookup: code from inside B will see all symbols from A as possible candidates for a symbol lookup, which can confuse the code
Forward declaration: there is no way to forward-declare A::B without giving the full definition of A
Extensibility: it is impossible to add another class A::C unless you are the owner of A
Code verbosity: putting classes inside classes only makes headers larger. You can still separate this into multiple declarations, but there's no way to use namespace-like aliases, imports or using-directives.
In conclusion, barring exceptions (e.g. the nested class is an intimate part of the nesting class... and even then...), I see no point in nested classes in normal code, as the flaws outweigh the perceived advantages by orders of magnitude.
Furthermore, it smells like a clumsy attempt to simulate namespacing without using C++ namespaces.
On the plus side, you isolate this code, and if it's private, make it unusable except from the "outside" class...
Nested enums
Pros: Everything.
Con: Nothing.
The fact is, enum items pollute the enclosing (here, global) scope:
// collision
enum Value { empty = 7, undefined, defined };
enum Glass { empty = 42, half, full };
// empty is from Value or Glass?
Only by putting each enum in a different namespace/class can you avoid this collision:
namespace Value { enum type { empty = 7, undefined, defined }; }
namespace Glass { enum type { empty = 42, half, full }; }
// Value::type e = Value::empty;
// Glass::type f = Glass::empty;
Note that C++0x (standardized as C++11) defined the enum class:
enum class Value { empty, undefined, defined };
enum class Glass { empty, half, full };
// Value e = Value::empty;
// Glass f = Glass::empty;
exactly for this kind of problem.
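A small self-contained sketch contrasting the two fixes above. The outer namespaces cxx03 and cxx11 exist only so both variants can reuse the names Value and Glass in one file, and the helper functions old_style_sum and is_empty are hypothetical:

```cpp
// C++03-style workaround: each unscoped enum gets its own namespace.
namespace cxx03 {
    namespace Value { enum type { empty = 7, undefined, defined }; }
    namespace Glass { enum type { empty = 42, half, full }; }
}

// C++11 scoped enumerations: enumerators live in the enum's own scope.
namespace cxx11 {
    enum class Value { empty, undefined, defined };
    enum class Glass { empty, half, full };
}

// Both enumerators called "empty" coexist because each one is qualified.
int old_style_sum() {
    cxx03::Value::type v = cxx03::Value::empty;  // 7
    cxx03::Glass::type g = cxx03::Glass::empty;  // 42
    int a = v;  // unscoped enumerators still convert implicitly to int
    int b = g;
    return a + b;
}

bool is_empty(cxx11::Glass g) {
    // A scoped enum does not convert implicitly to int, so compare
    // against another enumerator of the same type.
    return g == cxx11::Glass::empty;
}
```

Besides scoping its enumerators, enum class also drops the implicit conversion to int, so mixing up Value and Glass becomes a compile error; static_cast is needed when the integer value is really wanted.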
One con that can become a big deal for large projects is that it is impossible to make a forward declaration for nested classes or enums.
If you're never going to be using the dependent class for anything but working with the independent class's implementations, nested classes are fine, in my opinion.
It's when you want to be using the "internal" class as an object in its own right that things can start getting a little manky and you have to start writing extractor/inserter routines. Not a pretty situation.
It seems like you should be using namespaces instead of classes to group like things that are related to each other in this way. One con that I could see in doing nested classes is you end up with a really large source file that could be hard to grok when you are searching for a section.
There are no pros and cons per se of using nested public C++ classes. There are only facts. Those facts are mandated by the C++ standard. Whether a fact about nested public C++ classes is a pro or a con depends on the particular problem that you are trying to solve. The example you have given does not allow a judgement about whether nested classes are appropriate or not.
One fact about nested classes is that they have privileged access to all members of the class they belong to. This is a con if the nested class does not need such access. But if the nested class does not need such access, then it should not have been declared as a nested class. There are situations where a class A wants to grant privileged access to certain other classes B. There are three solutions to this problem:
Make B a friend of A
Make B a nested class of A
Make the methods and attributes, that B needs, public members of A.
In this situation, it's #3 that violates encapsulation, because A has control over its friends and over its nested classes, but not over classes that call its public methods or access its public attributes.
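A minimal sketch of options 1 and 2, with hypothetical names A, Nested and Friend. A nested class's access to the private members of its enclosing class is guaranteed by the standard since C++11:

```cpp
class A {
public:
    explicit A(int secret) : secret_(secret) {}

    // Option 2: a nested class may read A's private members.
    class Nested {
    public:
        static int peek(const A& a) { return a.secret_; }
    };

private:
    friend class Friend;  // Option 1: A explicitly names the class it trusts
    int secret_;
};

class Friend {
public:
    static int peek(const A& a) { return a.secret_; }
};

// Any other class would need a public accessor on A (option 3),
// which exposes secret_ to every caller, not just to chosen ones.
```

In both cases A itself decides who gets access; no outside code can grant itself the same privilege.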
Another fact about nested classes is that it is impossible to add another class A::C as a nested class of A unless you own A. However, this is perfectly reasonable, because nested classes have privileged access. If it were possible to add A::C as a nested class of A, then A::C could trick A into granting access to privileged information, and that would violate encapsulation. It's basically the same as with the friend declaration: the friend declaration does not grant you any special privileges that your friend is hiding from others; it allows your friends to access information that you are hiding from your non-friends. In C++, calling someone a friend is an altruistic act, not an egoistic one. The same holds for allowing a class to be a nested class.
Some other facts about nested public classes:
A's scope is candidate for symbol lookup of B: If you don't want this, make B a friend of A instead of a nested class. However, there are cases where you want exactly this kind of symbol lookup.
A::B cannot be forward-declared: A and A::B are tightly coupled. Being able to use A::B without knowing A would only hide this fact.
To summarize this: if the tool does not fit your needs, don't blame the tool; blame yourself for using the tool; others might have different problems, for which the tool is perfect.
paercebal said everything I would say about nested enums.
WRT nested classes, my common and almost sole use case for them is when I have a class which is manipulating a specific type of resource, and I need a data class which represents something specific to that resource. In your case, output_tray might be a good example, but I don't generally use nested classes if the class is going to have any methods which are going to be called from outside the containing class, or is more than primarily a data class. I generally also don't nest data classes unless the contained class is not ever directly referenced outside the containing class.
So, for example, if I had a printer_manipulator class, it might have a contained class for printer manipulation errors, but printer itself would be a non-contained class.
Hope this helps. :)
Remember that you can always promote a nested class to a top-level one later, but you may not be able to do the opposite without breaking existing code. Therefore, my advice would be to make it a nested class first, and if it starts to become a problem, make it a top-level class in the next version.
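One way to do that promotion without breaking existing callers, offered as an assumption rather than something the answer spells out: move the class out and leave a member type alias behind, so the old spelling printer::output_tray still compiles. The output_tray members here are hypothetical:

```cpp
#include <string>
#include <type_traits>

// Version 2: output_tray has been promoted to a top-level class...
class output_tray {
public:
    explicit output_tray(int capacity) : capacity_(capacity) {}
    int capacity() const { return capacity_; }

private:
    int capacity_;
};

class printer {
public:
    std::string name_;
    // ...and a member alias keeps code written as printer::output_tray compiling.
    using output_tray = ::output_tray;
};

static_assert(std::is_same<printer::output_tray, output_tray>::value,
              "the old and new spellings name the same type");
```

Going the other way (nesting a class that used to be top-level) has no equally clean escape hatch, because code that forward-declared `class output_tray;` at namespace scope would no longer match.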
For me, a big con of having it outside is that it becomes part of the global namespace. If the enum or related class only really applies to the class it's in, then keeping it inside makes sense. So in the printer case, everything that includes the printer will have full access to the enum PRINTER_TYPE, where it doesn't really need to know about it. I can't say I've ever used an internal class, but for an enum it seems more logical to keep it inside. As another poster has pointed out, it's also a good idea to use namespaces to group similar items, since clogging the global namespace can really be a bad thing. I have previously worked on projects so massive that just bringing up an auto-complete list on the global namespace took 20 minutes. In my opinion, nested enums and namespaced classes/structs are probably the cleanest approach.
I agree with the posts advocating for embedding your enum in a class but there are cases where it makes more sense to not do that (but please, at least put it in a namespace). If multiple classes are utilizing an enum defined within a different class, then those classes are directly dependent on that other concrete class (that owns the enum). That surely represents a design flaw since that class will be responsible for that enum as well as other responsibilities.
So, yeah, embed the enum in a class if other code only uses that enum to interface directly with that concrete class. Otherwise, find a better place to keep the enum such as a namespace.
If you put the enum into a class or a namespace, IntelliSense will be able to guide you when you're trying to remember the enum names. A small thing for sure, but sometimes the small things matter.
Visual Studio 2008 does not seem to be able to provide IntelliSense for nested classes, so I have switched to the PIMPL idiom in most cases where I used to have a nested class. I always put enums either in the class if they are used only by that class, or outside the class in the same namespace as the class when more than one class uses the enum.
One con I can see for nested classes is that generic programming may serve better.
If the little class is defined outside the big one, you can make the big class a class template and use any "little" class you may need in the future with the big class.
Generic programming is a powerful tool, and, IMHO, we should keep it in mind when developing extensible programs. Strange that no one has mentioned this point.
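A minimal sketch of that idea, assuming hypothetical tray types: the "big" class basic_printer takes its "little" class as a template parameter instead of hard-wiring a nested one:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical "little" classes, defined outside the big one.
struct paper_tray {
    int sheets = 100;
    std::string describe() const { return "paper tray (" + std::to_string(sheets) + ")"; }
};

struct envelope_tray {
    std::string describe() const { return "envelope tray"; }
};

// The "big" class works with any tray type that provides describe().
template <typename Tray>
class basic_printer {
public:
    void add_tray(Tray tray) { trays_.push_back(tray); }
    std::size_t tray_count() const { return trays_.size(); }
    std::string describe_first() const {
        return trays_.empty() ? "no trays" : trays_.front().describe();
    }

private:
    std::vector<Tray> trays_;
};
```

Adding a new tray type later means writing a new small class, not editing basic_printer, which is the extensibility the answer is pointing at.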
The only problem with nested classes that I've bumped into so far is that C++ does not let us refer to the object of the enclosing class in the nested class's member functions. We cannot say "Enclosing::this".
(But maybe there's a way?)
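There is no `Enclosing::this`, but a common workaround is to give the nested object a reference (or pointer) to its owner when it is constructed. A minimal sketch with hypothetical names; unlike Java's non-static inner classes, C++ does not store this link implicitly:

```cpp
#include <string>
#include <utility>

class Enclosing {
public:
    explicit Enclosing(std::string name) : name_(std::move(name)), helper_(*this) {}

    // Copying is disabled: a copied helper_ would still refer to the original object.
    Enclosing(const Enclosing&) = delete;
    Enclosing& operator=(const Enclosing&) = delete;

    class Helper {
    public:
        explicit Helper(Enclosing& owner) : owner_(owner) {}

        // owner_ plays the role of "Enclosing::this"; as a nested class,
        // Helper may read Enclosing's private members through it.
        const std::string& owner_name() const { return owner_.name_; }

    private:
        Enclosing& owner_;
    };

    const Helper& helper() const { return helper_; }

private:
    std::string name_;
    Helper helper_;  // declared after name_, so it is constructed after it
};
```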