Dealing with circular dependencies in OCaml - ocaml

I'm writing an interpreter for an experimental language. Three of the main constructs of the language are definitions, statements, and expressions. Definitions can contain statements and expressions, statements can contain definitions and expressions, and one kind of expression can contain statements. I represent all of these using union types so I can easily use pattern matching on them. Ideally, I would like to put the code for these in different files, but OMake complains about circular dependency issues. As far as I know, circular type definitions across modules are not allowed.
The only way I know of to solve this is to define all three types at once:
type defn = ...
and stmt = ...
and expr = ...
It seems like this requires all the code for types to be in the same file. Is there any way around this? How do you deal with circular definitions in your code?

Recursive definitions need to appear in the same file. If you want to separate definitions, statements, and expressions into separate modules, you can do so using recursive modules, but they will still need to appear in the same file. DAG-ifying inter-file dependencies is one of the annoyances of OCaml.

This is easily solved by parameterizing your types over the types they refer to:
type ('stmt, 'expr) defn = ...
type ('defn, 'expr) stmt = ...
type ('defn, 'stmt) expr = ...
This technique is called "untying the recursive knot" (in reference to the Gordian knot) and was described in an OCaml Journal article.
Cheers,
Jon Harrop.

Another solution often used is to abstract the types in the interfaces. Since the types are abstract in the interfaces, these interfaces are not recursively dependent. In the implementations, you can specify the types, and since the implementations depend only on the interfaces, they are not recursive either.
The only problem is that, with this solution, you can no longer pattern-match on these types outside of their implementation.
Personally, but it is probably a matter of taste, I like to have all the types of my program defined in one module (I think it helps in the readability of the program). So, this restriction of OCaml is not really a problem for me.

Related

power(C++ - {templates}) = power(C++)?

I want to know whether it is possible to do generic programming in C++ without using templates. Is it possible to write all the C++ libraries that currently use templates without using them? Is there any alternative to templates available in C++?
I also want to know whether it is possible to write an abstraction over template-based C++ libraries that provides the same functionality.
Theoretically, C++ without templates is still Turing-complete, so you can write a program for every function in that language that can also be written in C++ with templates. To my knowledge, the macro preprocessor in C++ is not Turing-complete, but templates are. So there must exist functions which can be implemented purely as templates, but not with macros.
Practically, I don't think it is possible to re-implement everything with the same semantics. Without templates, you probably will have to sacrifice type-safety and stick to using macros, void* or inheritance-based approaches like the early Java classes did even for simple container libraries.
For more advanced meta-programming libraries, e.g. expression templates, dimensional analysis frameworks, Boost.Spirit, Boost.Proto, I doubt that they can be implemented without another form of meta-programming. Macros may work, but this will be more like a code generator that defers type-checking to the compiler, and error messages will be even worse than what we have right now with templates. In addition, the semantics are different w.r.t. parameter passing.
Well, templates are just that — templates. They are blueprints for actual types and functions. So, theoretically, you can make all of those template instantiations by hand. But that wouldn't be generic programming any more.
Answer for question:
Is there any alternative available in C++ for templates?
Macros are an alternative to templates. (Not a good alternative, but an alternative.)
Related links [1],[2]
Compare:
#define min(i, j) (((i) < (j)) ? (i) : (j))
template<class T> T min (T i, T j) { return (i < j) ? i : j; }
Problems with macros:
no type checking,
compiler errors that are harder to understand because of macro expansion,
side effects from arguments being evaluated more than once
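The last problem in the list can be shown with a small sketch. The names MIN_MACRO, min_fn, macro_increments and template_increments are made up for this illustration; the macro and the template mirror the two definitions compared above.

```cpp
#include <cassert>

// Hypothetical names, used only for this illustration.
#define MIN_MACRO(i, j) (((i) < (j)) ? (i) : (j))

template <class T>
T min_fn(T i, T j) { return (i < j) ? i : j; }

// Counts how many times the argument `a++` ends up being evaluated.
inline int macro_increments() {
    int a = 1, b = 5;
    int m = MIN_MACRO(a++, b);  // expands to (((a++) < (b)) ? (a++) : (b))
    (void)m;
    return a - 1;               // a++ ran twice: once in the test, once in the result
}

inline int template_increments() {
    int a = 1, b = 5;
    int m = min_fn(a++, b);     // the argument is evaluated once, before the call
    (void)m;
    return a - 1;
}
```

With the macro, the increment is applied twice because the argument text is pasted into two places; with the template it is applied once.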
About question:
Is it possible to write all libraries available in C++ which are written using templates without using templates.
It is possible to use macros in some cases. It is also possible to write or generate an implementation for each type the library itself knows about. But a library cannot ship an implementation for user-defined types, except in simple cases where macros may be useful.
Older pure C (not C++) programs sometimes shipped special tools with their sources; these were run at build time to generate source files from "pre-source" templates.

A compile time ordering on types

I've been looking for a way to get an ordering on types at compile time. This would be useful, for example, for implementing (efficient) compile-time type-sets.
One obvious way to do it would be if there were a way to map every type to a unique integer. An answer to a previous question on that topic succinctly captures why that's difficult, and it seems like it would apply equally to any other way of trying to get an ordering:
the compiler has no way of knowing all compilation units and the linker has no concept of a type
Indeed, the challenge to the compiler would be considerable: it has to make sure that, in any invocation, for any source file, it returns the same integer for a given type / it returns the same ordering between any two given types, but at the same time, the universe of types is open and it has no knowledge of any types outside of the current file. A hard problem.
The idea I had is that types have names. And by the laws of C++, as far as I know the fully qualified name of a type must be unique across the entire program, otherwise you will get errors or undefined behaviour of some sort or another.
If two types have the same name, then they are the same type.
If two types are the same type, then either they have the same name, or they are typedefs for one another. The compiler has full knowledge of typedefs.
Names are strings, and strings have an ordering. So if I have it right, you could define a globally consistent ordering on types based on their names. More specifically, the ordering between any two types would be the ordering between the names of the types with the typedefs fully resolved. (Having a type behave differently from its typedefs would be problematic.)
Of course, standard C++ doesn't have any facilities for retrieving the names of types.
My questions are:
Do I have anything wrong? Are there any reasons this wouldn't, in theory, work?
Are there any compilers which give you access to the names of types (and ideally their typedef-resolved forms) at compile time as a language extension?
Is there any other way it could be done? Are there any compilers which do?
(I recognize that it's not polite to ask more than one question in the same question, but it seemed strange to post three separate questions with the same basic throat-clearing preceding them.)
the fully qualified name of a type must be unique across the entire program
But of course, that's only true if you consider separate anonymous namespaces in different translation units to have different names in some sense, and have some way to figure out what they are.
The only sense in which I'm aware they really do have different names is in mangled linker symbols; you may (depending on the compiler) be able to get that from type_info::name(), but it isn't guaranteed, is limited to types with RTTI, and anyway doesn't seem to be declared as a constexpr so you can't use the value at compile time.
The ordering produced by type_info::before() naturally has the same limitations.
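For comparison, here is a minimal sketch of the run-time ordering that standard C++ does provide. std::type_index wraps std::type_info and orders objects via before(), so it can serve as a key in ordered containers. This does not solve the compile-time problem from the question; the structs A and B and the function make_type_set are hypothetical names for the illustration.

```cpp
#include <cassert>
#include <set>
#include <typeindex>
#include <typeinfo>

struct A {};
struct B {};

// Run-time only: the order is implementation-defined and may differ
// between compilers or even between runs of the same program.
inline std::set<std::type_index> make_type_set() {
    std::set<std::type_index> s;
    s.insert(std::type_index(typeid(A)));
    s.insert(std::type_index(typeid(B)));
    s.insert(std::type_index(typeid(A)));        // duplicate, ignored
    s.insert(std::type_index(typeid(const A)));  // typeid drops top-level cv-qualifiers
    return s;
}
```

None of this is usable in a constant expression, which is exactly the limitation described above.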
Out of interest, what are you trying to achieve with your compile-time type ordering?

Memory management for types in complex languages

I've come across a slight problem for writing memory management with regard to the internal representation of types in a compiler for statically typed, complex languages. Consider a simple snippet in C++ which easily demonstrates a type that refers to itself.
class X {
void f(const X&) {}
};
Types can have nearly infinitely complex relationships to each other. So, as a compiler process, how do you make sure that they are properly collected?
So far, I've decided that garbage collection might be the right way to go, which I wouldn't be too happy with because I want to write the compiler in C++, or alternatively, just leave them and never collect them for the life of the compile phase for which they are needed (which has a very fixed lifetime) and then collect them all afterwards. The problem with that is that if you had a lot of complex types, you could lose a lot of memory that way.
Memory management is easy: just keep a table mapping type-name -> type-descriptor for each declaration scope. Types are uniquely identified by name, no matter how complex the nesting is. Even a recursive type is still only a single type. As tp1 says correctly, you typically perform multiple passes to fill in all the blanks. For instance, you might check that a type name is known in the first pass, then compute all the links, and only later compute the type.
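A minimal sketch of such a table, assuming one table per scope that owns every descriptor and releases them all together when the scope goes away. TypeDesc, Scope, declare and link are hypothetical names, not taken from any real compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

struct TypeDesc {
    std::string name;
    std::vector<const TypeDesc*> refs;  // non-owning links, may form cycles
};

class Scope {
public:
    // Pass 1: register a name; the same name always yields the same descriptor.
    TypeDesc* declare(const std::string& name) {
        auto& slot = table_[name];
        if (!slot) {
            slot = std::make_unique<TypeDesc>();
            slot->name = name;
        }
        return slot.get();
    }

    // Pass 2: link two already declared types; fails if either is unknown.
    bool link(const std::string& from, const std::string& to) {
        auto f = table_.find(from);
        auto t = table_.find(to);
        if (f == table_.end() || t == table_.end()) return false;
        f->second->refs.push_back(t->second.get());
        return true;
    }

    std::size_t size() const { return table_.size(); }

private:
    // The table is the only owner, so cyclic links never leak.
    std::unordered_map<std::string, std::unique_ptr<TypeDesc>> table_;
};
```

Because the links are plain non-owning pointers, a type that refers to itself (like X above) is harmless: everything is freed when the Scope is destroyed.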
Keep in mind that languages like C don't have a really complex type system -- even though they have pointers (which allow for recursive types), there is not much type computation going on.
I think you can remove the cycles from the dependency graph by using separate objects to represent declarations and definitions. Assuming a type system similar to C++, you will then have a hierarchical dependency:
Function definitions depend on type definitions and function declarations
Type definitions depend on function and type declarations (and definitions of contained types)
Function declarations depend on type declarations
In your example, the dependency graph is f_def -> X_def -> f_decl -> X_decl.
With no cycles in the graph, you can manage objects using simple reference counting.
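The hierarchy above could be sketched with std::shared_ptr as the reference-counting mechanism. The struct names and build_example are hypothetical; the point is only that every edge goes from a definition towards a declaration, so no cycle of owning pointers can form.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

struct TypeDecl { std::string name; };
struct FuncDecl { std::string name; std::shared_ptr<TypeDecl> param; };
struct TypeDef {
    std::shared_ptr<TypeDecl> decl;
    std::vector<std::shared_ptr<FuncDecl>> members;
};
struct FuncDef {
    std::shared_ptr<FuncDecl> decl;
    std::shared_ptr<TypeDef> owner;
};

// Builds f_def -> X_def -> f_decl -> X_decl for `class X { void f(const X&) {} };`.
inline std::shared_ptr<FuncDef> build_example() {
    auto x_decl = std::make_shared<TypeDecl>(TypeDecl{"X"});
    auto f_decl = std::make_shared<FuncDecl>(FuncDecl{"f", x_decl});
    auto x_def  = std::make_shared<TypeDef>(TypeDef{x_decl, {f_decl}});
    return std::make_shared<FuncDef>(FuncDef{f_decl, x_def});
}
```

Dropping the last reference to the function definition releases the whole chain, which would not happen if X_decl pointed back at X_def with an owning pointer.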

Does a compiler collapse classes which are identical in their structure?

I hope this isn't a duplicate of a question itself, but the search terms are so ambiguous, I can't think of anything better.
Say we have two classes:
class FloatRect
{
float x,y,width,height;
};
and somewhere else
class FloatBox
{
float top,left,bottom,right;
};
From a practical standpoint, they're the same, so does the compiler treat them both as some sort of typedef?
Or will it produce two separate units of code?
I'm curious because I'd like to go beyond typedefs and make a few variants of a type to improve readability.
I don't want needless duplication, though...
This is completely implementation specific.
For example, I can use Clang / LLVM to illustrate both points of view at once:
Clang is the C++ front-end; it uses two distinct types to resolve function calls etc. and treats them as completely different types
LLVM is the optimizer backend; it doesn't care (yet) about names, only about structural representation, and will therefore collapse them into a single type... or even remove the type definition entirely if it is useless.
If the question is whether introducing a similarly laid-out class creates overhead, then the answer is no, so write the classes that you need.
Note: the same happens for functions, i.e. the optimizer can merge blocks of functions that are identical to get tighter code; this is not a reason to copy/paste, though.
They are totally unrelated classes with regards to the compiler.
If they are just POD C-structs, it won't actually generate any real code for them as such. (Yes there is a silent assignment operator and some other functions but I doubt there will be code actually compiled to do it, it will just inline them if they are used).
Since the classes you use as samples are only relevant during compilation, there's nothing to duplicate or collapse. At runtime, the member variables are simply accessed as "the value at offset N".
This is, of course, hugely implementation-specific.
Any internal collapse here would be completely internal to the mechanism of the compiler, and would not have an effect on the produced translated code.
I would imagine it's very unlikely that this is the case, as I can think of no benefit and several ways in which this would really complicate matters. I can't present any evidence, though.
No. As they are literally two different types.
The compiler must treat them that way.
There is no magic merging going on.
No they are not treated as typedefs, because they are different types and can for example be used for overloading functions.
On the other hand, the types have no code in them so there will be nothing to duplicate.
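A short sketch of the overloading point, using the two classes from the question. The function describe is a hypothetical name, and the size check assumes a typical implementation where both classes are laid out as four consecutive floats.

```cpp
#include <cassert>
#include <type_traits>

class FloatRect { float x, y, width, height; };
class FloatBox  { float top, left, bottom, right; };

// The language sees two unrelated types, even though the layout matches.
static_assert(!std::is_same<FloatRect, FloatBox>::value, "distinct types");
static_assert(sizeof(FloatRect) == sizeof(FloatBox), "same size on typical implementations");

// Overload resolution can therefore tell them apart.
inline int describe(const FloatRect&) { return 1; }
inline int describe(const FloatBox&)  { return 2; }
```

If the two classes were typedefs of one another, the second describe would be a redefinition and the code would not compile.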

Why and how should I use namespaces in C++?

I have never used namespaces for my code before. (Other than for using STL functions)
Other than for avoiding name conflicts, is there any other reason to use namespaces?
Do I have to enclose both declarations and definitions in namespace scope?
One reason that's often overlooked is that simply by changing a single line of code to select one namespace over another you can select an alternative set of functions/variables/types/constants - such as another version of a protocol, or single-threaded versus multi-threaded support, OS support for platform X or Y - compile and run. The same kind of effect might be achieved by including a header with different declarations, or with #defines and #ifdefs, but that crudely affects the entire translation unit and if linking different versions you can get undefined behaviour. With namespaces, you can make selections via using namespace that only apply within the active namespace, or do so via a namespace alias so they only apply where that alias is used, but they're actually resolved to distinct linker symbols so can be combined without undefined behaviour. This can be used in a way similar to template policies, but the effect is more implicit, automatic and pervasive - a very powerful language feature.
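A minimal sketch of selecting an implementation through a namespace alias. The namespaces protocol_v1 and protocol_v2, their contents, and buffer_size are all hypothetical names invented for the illustration.

```cpp
#include <cassert>
#include <cstddef>

namespace protocol_v1 {
    constexpr int version = 1;
    constexpr std::size_t header_size = 8;
}

namespace protocol_v2 {
    constexpr int version = 2;
    constexpr std::size_t header_size = 16;
}

// Changing this single line switches every use of `protocol::` below.
namespace protocol = protocol_v2;

inline std::size_t buffer_size() {
    // The constant is known at compile time, so a plain array can be used
    // instead of a dynamic allocation.
    unsigned char header[protocol::header_size] = {};
    return sizeof(header);
}
```

Both versions can live in the same program, since protocol_v1::header_size and protocol_v2::header_size are distinct entities.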
UPDATE: addressing marcv81's comment...
Why not use an interface with two implementations?
"interface + implementations" is conceptually what choosing a namespace to alias above is doing, but if you mean specifically runtime polymorphism and virtual dispatch:
the resultant library or executable doesn't need to contain all implementations and constantly direct calls to the selected one at runtime
as only one implementation is incorporated, the compiler can use myriad optimisations including inlining and dead code elimination, and constants differing between the "implementations" can be used for e.g. sizes of arrays - allowing automatic memory allocation instead of slower dynamic allocation
different namespaces have to support the same semantics of usage, but aren't bound to support the exact same set of function signatures as is the case for virtual dispatch
with namespaces you can supply custom non-member functions and templates: that's impossible with virtual dispatch (and non-member functions help with symmetric operator overloading - e.g. supporting 22 + my_type as well as my_type + 22)
different namespaces can specify different types to be used for certain purposes (e.g. a hash function might return a 32 bit value in one namespace, but a 64 bit value in another), but a virtual interface needs to have unifying static types, which means clumsy and high-overhead indirection like boost::any or boost::variant or a worst case selection where high-order bits are sometimes meaningless
virtual dispatch often involves compromises between fat interfaces and clumsy error handling: with namespaces there's the option to simply not provide functionality in namespaces where it makes no sense, giving a compile-time enforcement of necessary client porting effort
Here is a good reason (apart from the obvious stated by you).
Since namespaces can be discontiguous and spread across translation units, they can also be used to separate interface from implementation details.
Definitions of names in a namespace can be provided either in the same namespace or in any of the enclosing namespaces (with fully qualified names).
Namespaces can also make code easier to understand.
e.g.:
std::func <- all functions/classes from the C++ standard library
lib1::func <- all functions/classes from a specific library
module1::func <- all functions/classes for a module of your system
You can also think of a namespace as a module in your system.
It can also be useful for writing documentation (e.g. you can easily document a namespace entity in Doxygen).
Aren't name collisions enough of a reason? ADL subtleties, especially with operator overloads, are another.
That's the easiest way. You can also prefix names with the namespace, e.g. my_namespace::name, when defining.
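Both ways of defining a namespace member can be sketched as follows; the functions twice and thrice are hypothetical names for the illustration.

```cpp
#include <cassert>

// Declarations, as they might appear in a header.
namespace my_namespace {
    int twice(int x);
    int thrice(int x);
}

// Option 1: reopen the namespace around the definition.
namespace my_namespace {
    int twice(int x) { return 2 * x; }
}

// Option 2: define outside the namespace with a qualified name.
int my_namespace::thrice(int x) { return 3 * x; }
```

The qualified form only works for names that were already declared inside the namespace, so a typo in the name or signature becomes a compile error instead of silently creating a new function.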
You can think of namespaces as logically separate units of your application. For example, suppose we have two different classes, each in its own file. If you notice that these classes have enough in common to be grouped under one category, that's a strong reason to put them in a namespace.
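Answer: If you ever want to overload new, placement new, or delete, keep the scope of the overload as narrow as possible. No one wants to be forced to use your version of new if they don't require the things you require. Note, however, that C++ does not allow operator new or operator delete to be declared inside a named namespace: they must be either global or class members, so class-specific overloads are the way to limit their reach.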
Yes