Compiling SML projects from multiple files - sml

I've got a project with many files in it and I want it to work with most popular compilers.
Unfortunately, PolyML and SML/NJ require use statements, while MosML additionally requires explicitly loading basis library structures using load, which is not recognised by either poly or sml.
On top of that, MLton and MLKit require a completely different .mlb file that simply lists filenames, and they also require an explicit import of the basis library, which is done differently from MosML:
$(SML_LIB)/basis/basis.mlb
Is there some standard universal "include this file" command, and if it doesn't exist, is there some other way to have all compilers read from one entry-point file?
P.S. Wouldn't mind someone going on a small rant about compiler differences. I'm always interested in what people think and there's not too much info available :-)

The use function is the standard universal "include this file" command, included in the top-level environment:
val use : string -> unit (* implementation dependent *)
I generally maintain the build environment in SML/NJ's CM, then convert to mlb with cm2mlb. cm2mlb defines a flag MLton when parsing the sources.cm file, so you can use that to work around differences in module loading behavior.
#if(defined(MLton))
runmain.sml
#endif
There is also a set of sml-buildscripts which converts from mlb to polyml. I am not familiar with them or with polyml; however, CM is convenient as the authoritative source, since it provides programmatic access from SML via the structure CM.
This is what cm2mlb uses. So while I'm not aware of any existing tool that converts from CM to polyml, it should be possible.

Related

Config file location and binaries and build systems like autoconf

Most build systems, like autoconf/automake, allow the user to specify a target directory to install the various files needed to run a program. Usually this includes binaries, configuration files, auxiliary scripts, etc.
At the same time, many executables often need to read from a configuration file in order to allow a user to modify runtime settings.
Ultimately, a program (let's say, a compiled C or C++ program) needs to know where to look to read in a configuration file. A lot of times I will just hardcode the path as something like /etc/MYPROGAM/myprog.conf, which of course is not a great idea.
But in the autoconf world, the user might specify an install prefix, meaning that the C/C++ code needs to somehow be aware of this.
One solution would be to specify a C header file with a .in suffix, which is simply used to define the location of the config file, like:
const char* config_file_path = "@CONFIG_FILE_PATH@"; // `CONFIG_FILE_PATH` is defined in `configure.ac`.
This file would be named something like constants.h.in, and it would have to be processed via configure.ac to output an actual header file, which could then be included by whatever .c or .cpp files need it.
Is that the usual way this sort of thing is handled? It seems a bit cumbersome, so I wonder if there is a better solution.
There are basically two choices for how to handle this.
One choice is to do what you've mentioned -- compile the relevant paths into the resulting executable or library. Here it's worth noting that if files are installed in different sub-parts of the prefix, then each such thing needs its own compile-time path. That's because the user might specify --prefix separately from --bindir, separately from --libexecdir, etc. Another wrinkle here is that if there are multiple installed programs that refer to each other, then this process probably should take into account the program name transform (see docs on --program-transform-name and friends).
That's all if you want full generality of course.
The other approach is to have the program be relocatable at runtime. Many GNU projects (at least gdb and gcc) take this approach. The idea here is for the program to attempt to locate its data in the filesystem at runtime. In the projects I'm most familiar with, this is done with the libiberty function make_relative_prefix; but I'm sure there are other ways.
This approach is often touted as being nicer because it allows the program's install tree to be tarred up and delivered to users; but in the days of distros it seems to me that it isn't as useful as it once was. I think the primary drawback of this approach is that it makes it very hard, if not impossible, to support both relocation and the full suite of configure install-time options.
Which one you pick depends, I think, on what your users want.
Also, to answer the above comment: I think changing the prefix between configure time and build time is not really supported, though it may work with some packages. Instead, the usual way to handle this is either to require the choice at configure time, or to support the somewhat more limited DESTDIR feature.
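To make the relocatable option a bit more concrete, here is a minimal Python sketch of the idea. It is not libiberty's make_relative_prefix, only an illustration of the same principle; the function name relocated_path and the example paths are made up. The assumption is that the program knows the bindir and the data path it was configured with, and can find out where its executable actually lives at runtime.

```python
import posixpath

def relocated_path(exe_path, configured_bindir, configured_target):
    """Keep the position of `configured_target` relative to
    `configured_bindir`, but anchor it at the directory the executable
    is actually running from."""
    actual_bindir = posixpath.dirname(exe_path)
    # e.g. "../etc/myprog.conf" when the config lives next to bin/
    rel = posixpath.relpath(configured_target, configured_bindir)
    return posixpath.normpath(posixpath.join(actual_bindir, rel))

# Configured for /usr/local, but the tree was unpacked under /opt/pkg.
print(relocated_path("/opt/pkg/bin/myprog",
                     "/usr/local/bin",
                     "/usr/local/etc/myprog.conf"))
```

This only works while the relative layout of the install tree is preserved, which is why it clashes with letting users set --bindir, --sysconfdir, etc. independently.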

SML more or less large systems: compilers and interpreters interoperability

This is about programming in the large with SML. First a summary of what seems to be available for that purpose, then a tiny summary, then finally, the simple question.
The use pseudo‑clause
Top-level type, exception, and value identifiers (standardml.org)
Note that the use function is special. Although not defined precisely, its intended purpose is to take the pathname of a file and treat the contents of the file as SML source code typed in by the user. It can be used as a simple build mechanism, especially for interactive sessions. Most implementations will provide a more sophisticated build mechanism for larger collections of source files. Implementations are not required to supply a use function.
Then later
val use : string -> unit (* implementation dependent *)
Its drawbacks are: it is not supported by MLton at least, and, while not standardized, it seems to behave the same way in all major SML systems, which is to reload a unit every time a use is encountered for it. That is not OK due to the generative semantics of SML: defining a structure multiple times results in as many distinct definitions, which is especially wrong for type definitions.
ML Basis Files
There exist so called “ML Basis Files”: MLBasis (mlton.org) and ML‑Kit ML Basis Files (sourceforge.net).
The load pseudo‑clause
MoscowML has load, which acts like a use that loads only once, i.e. it does not reload a unit if it's already loaded, which is what's expected when composing a system.
Summary
load is nice, but only recognized by MoscowML
MLBasis Files may be nice, but they are recognized by neither Poly/ML nor Moscow ML
MLton does not recognize use
Putting everything in a single big bundle file is the only interoperable thing that works with all compilers and interpreters; that works, but it quickly becomes a burden.
The question
Is there a known interoperable way to compose a system made of multiple SML source files?
One system you did not mention is SML/NJ's Compilation Manager (CM), which is quite powerful. And there are a few other, less known systems.
But that notwithstanding, the situation is indeed dire. There simply is no standardised separate compilation mechanism for SML. In practice that means that writing portable Makefiles or something alike is rather painful.
For HaMLet I went through that pain, in order to make it compile with 7 different SML implementations. The approach is to use a restricted (dependency-ordered) CM file and the necessary amount of make + sed hackery to generate meta files for other systems from that. It can also generate a file containing respective 'use' invocations for all the sources, for all other systems that at least support that. All in all it's not pretty, but works sufficiently well.
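As a rough illustration of that kind of generation step (this is a Python sketch, not HaMLet's actual make + sed machinery), the following reads a restricted CM description whose members are already in dependency order and emits a file of use calls and an .mlb file. The function names and the sample file names are hypothetical, and the sketch deliberately ignores CM preprocessor conditionals and multi-line comments.

```python
import re

def sources_from_cm(cm_text):
    """Return the source files listed after the `is` keyword of a
    restricted CM file (one member per line, dependency-ordered)."""
    files = []
    in_members = False
    for raw in cm_text.splitlines():
        line = re.sub(r"\(\*.*?\*\)", "", raw).strip()  # drop one-line comments
        if not in_members:
            if re.search(r"\bis$", line):
                in_members = True
            continue
        if line.endswith((".sml", ".sig", ".fun")):
            files.append(line)          # library members like $/basis.cm are skipped
    return files

def use_file(files):
    """One `use` call per source, for systems that support use."""
    return "".join(f'use "{f}";\n' for f in files)

def mlb_file(files):
    """MLB needs the basis library imported explicitly."""
    return "$(SML_LIB)/basis/basis.mlb\n" + "".join(f + "\n" for f in files)

cm = "Group is\n  $/basis.cm\n  util.sml (* helpers *)\n  main.sml\n"
files = sources_from_cm(cm)
print(use_file(files))
print(mlb_file(files))
```

Keeping the CM file as the single source of truth means the dependency order is written down once and every other build description is derived from it.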

parser generator that generates stand-alone C++ code

Is there a LALR parser generator that produces stand-alone C++ code? I am hoping that it would generate two files named something like "Parser.cpp" and "Parser.hpp," and the generated parser is implemented in a single class (that I can wrap in whatever namespace) that I can use for my parsing needs.
I want to use it for fun (i.e. small personal projects), and I'd like the output to be stand-alone (without any headers) so that I know I can compile it wherever I have a C++ compiler.
The search so far:
I've looked at flex/bison, but AFAIK they both require special headers and libraries. I've also looked at ANTLR a little bit, but it is not obvious to me that it can generate stand-alone C++ code. If someone can confirm that it can, then I might look more into it.
GOLD Parser (Bart Kiers mentioned the list on Wikipedia) has support for C and C++ languages. It does not generate a completely self-contained C/C++ source code file. All it does is the generation of Lexer/Parser tables which can be consumed by the "parsing engine".
To accomplish your task (or something similar) I did the following:
Prepare your LALR grammar in Gold's format
Generate parsing tables (one binary file)
Use an old trick to convert the binary file into a header file like
unsigned char ParseTable[] = { ... };
Modify the loader from the "parsing engine" sources (or use the C version which supports in-memory loading, as I remember)
Combine the sources for the GPEngine (if it is a C++ version) into the .h/.cpp pair.
Append the ParseTable to .cpp
Sure, it's not that straightforward, but all the steps can in principle be done within a single "combine" script which can be used with a number of grammars.
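The "old trick" step could look roughly like the Python sketch below, which turns the bytes of a table file into a C array definition (similar in spirit to what xxd -i prints). The function name, the extra length variable, and the file names in the usage note are assumptions for illustration; nothing here is specific to GOLD's file format.

```python
def bytes_to_c_array(data, name="ParseTable", per_line=12):
    """Render `data` (a bytes object) as a C unsigned char array."""
    lines = []
    for i in range(0, len(data), per_line):
        chunk = data[i:i + per_line]
        lines.append("    " + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    body = "\n".join(lines)
    return (f"unsigned char {name}[] = {{\n{body}\n}};\n"
            f"unsigned int {name}_len = {len(data)};\n")

print(bytes_to_c_array(b"\x00\x01\xff", per_line=2))
```

In a combine script you would read the generated table with open(path, "rb").read() and append the result to the .cpp file (path being whatever your table file is called).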
I guess the major drawback is the fact that GOLD is closed-source and Windows-only (it means that to produce the parsing tables you have to use a Windows machine).
ANTLR can generate C++ code, although IMHO its support for C++ is a bit weak; it is more like C code. Still, it is a good environment to work with, with ANTLRWorks giving you a graphical representation of your syntax tree.
The output from flex+bison consists of two .c files and one .h file. These are completely stand-alone, in that they are all you need to compile into your application to make use of the parser. There are no additional libraries or headers needed (beside the standard C ones).
Unless I've misunderstood your requirements, you definitely can do what you want with flex+bison.

Parsing c++ function headers from a file using GNU toolchain

I need to parse function headers from a .i file used by SWIG which contains all sorts of garbage besides the function headers. (The final output would be a list of function declarations.)
The best option for me would be using the GNU toolchain (GCC, Binutils, etc.) to do so, but I might be missing an easy way of doing it with SWIG. If I am, please tell me!
Thanks :]
edit: I also don't know how to do that with GCC toolchain, if you have an idea it will be great.
I would try getting an XML dump of the abstract syntax tree either from clang or from gccxml. From there you can easily extract the function declarations you are interested in.
Our DMS Software Reengineering Toolkit provides general purpose program parsing, analysis, and transformation capability. It has front ends for a wide variety of languages, including C++.
It has been used to analyze and transform very complex C++ programs and their header files.
You aren't clear as to what you will do after you "parse the function headers"; normally people want to extract some information or produce another artifact. DMS with its C++ front end can do the parsing; you can configure DMS to do the custom stuff.
As a practical matter, this isn't usually an afternoon's exercise; DMS is a complex beast, because it has to deal with complex beasts such as C++. And I'd expect you to face the same kind of complexity for any tool that can handle C++. The GCC toolchain can clearly handle C++, so you might be able to do it with that (at that same level of complexity), but GCC is designed to be a compiler, and IMHO you will find it a fight to get it to do what you want.
Your "output function declarations" goal isn't clear. You want just the function names? You want a function signature? You want all the type declarations on which the function depends? You want all the type declarations on which the function depends, if they are not already present in an existing include file you intend to use?
The best way to extract function decls from the garbage which is C header files is to substitute out what constitutes the most smelly garbage: macros. You can do that with:
cpp - The C Preprocessor

Where do I learn "what I need to know" about C++ compilers?

I'm just starting to explore C++, so forgive the newbiness of this question. I also beg your indulgence on how open ended this question is. I think it could be broken down, but I think that this information belongs in the same place.
(FYI -- I am working predominantly with the QT SDK and mingw32-make right now and I seem to have configured them correctly for my machine.)
I knew that there was a lot in the language which is compiler-driven -- I've heard about pre-compiler directives, but it seems like someone would be able to write books about the different C++ compilers and their respective parameters. In addition, there are commands which apparently precede make (like qmake, for example (is this something only in QT?)).
I would like to know if there is any place which gives me an overview of what compilers are out there, and what their different options are. I'd also like to know how each of them views Makefiles (it seems that there is a difference in syntax between them?).
If there is no website regarding, "Everything you need to know about C++ compilers but were afraid to ask," what would be the best way to go about learning the answers to these questions?
Concerning the "numerous options of the various compilers"
A piece of good news: you needn't worry about the detail of most of these options. You will, in due time, delve into this, only for the very compiler you use, and maybe only for the options that pertain to a particular set of features. But as a novice, generally trust the default options or the ones supplied with the make files.
The broad categories of these features (and I may be missing a few) are:
pre-processor defines (now, you may need a few of these)
code generation (target CPU, FPU usage...)
optimization (hints for the compiler to favor speed over size and such)
inclusion of debug info (which is extra data left in the object/binary and which enables the debugger to know where each line of code starts, what the variables names are etc.)
directives for the linker
output type (exe, library, memory maps...)
C/C++ language compliance and warnings (compatibility with previous version of the compiler, compliance to current and past C Standards, warning about common possible bug-indicative patterns...)
compile-time verbosity and help
Concerning an inventory of compilers with their options and features
I know of no such list, but I'm sure it probably exists on the web. However, I suggest that, as a novice, you worry little about these "details", use whatever free compiler you can find (gcc is certainly a great choice), and build experience with the language and the build process. C professionals may well argue, with good reason and at length, about the merits of various compilers and associated runtimes etc., but for generic purposes (and then some) the free stuff is all that is needed.
Concerning the build process
The most trivial applications, such as those made of a single unit of compilation (read: a single C/C++ source file), can be built with a simple batch file where the various compiler and linker options are hardcoded, and where the name of the file is specified on the command line.
For all other cases, it is very important to codify the build process so that it can be done
a) automatically and
b) reliably, i.e. with repeatability.
The "recipe" associated with this build process is often encapsulated in a make file or, as the complexity grows, possibly several make files, possibly bundled together in a script/bat file.
This (make file syntax) you need to get familiar with, even if you use alternatives to make/nmake, such as Apache Ant; the reason is that many (most?) source code packages include a make file.
In a nutshell, make files are text files that define targets and the commands that build each target. Each target is associated with its dependencies, which allows the make logic to decide which targets are out of date and should be rebuilt, and which dependencies should be rebuilt before them. That way, when you modify, say, an include file (and if the make file is properly configured), any C file that uses this header will be recompiled, and any binary which links with the corresponding obj file will be rebuilt as well. make also includes options to force all targets to be rebuilt, and this is sometimes handy to be sure that you truly have a current build (for example, in case some dependencies of a given object are not declared in the make file).
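The out-of-date decision can be modelled in a few lines. The Python sketch below is a toy model of the idea, not how any real make is implemented: file timestamps are plain numbers in a dictionary, all names are made up, and there is no cycle detection or command execution.

```python
def out_of_date(target, deps, mtime):
    """A target is stale if it is missing or any dependency is newer.
    `mtime` maps file name -> modification time."""
    if target not in mtime:
        return True
    return any(mtime.get(d, float("inf")) > mtime[target] for d in deps)

def rebuild_order(goal, rules, mtime):
    """`rules` maps target -> list of dependencies.
    Returns the targets that need rebuilding, dependencies first."""
    order = []

    def visit(target):
        deps = rules.get(target)
        if deps is None:                      # plain source file, nothing to build
            return False
        rebuilt = [visit(d) for d in deps]    # check every dependency first
        if any(rebuilt) or out_of_date(target, deps, mtime):
            if target not in order:
                order.append(target)
            return True
        return False

    visit(goal)
    return order

rules = {"app": ["main.o", "util.o"],
         "main.o": ["main.c", "util.h"],
         "util.o": ["util.c", "util.h"]}
# util.h was edited after both object files were built
mtime = {"main.c": 1, "util.c": 1, "util.h": 5, "main.o": 2, "util.o": 2, "app": 3}
print(rebuild_order("app", rules, mtime))
```

If util.h were missing from the rules, editing it would trigger nothing, which is exactly the undeclared-dependency case where forcing a full rebuild helps.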
On the Pre-processor:
The pre-processor is the first step toward compiling, although it is technically not part of the compilation. The purposes of this step are:
to remove any comment, and extraneous whitespace
to substitute any macro reference with the relevant C/C++ syntax. Some macros, for example, are used to define constant values such as, say, some email address used in the program; during pre-processing any reference to this constant value (btw, by convention such constants are named with ALL_CAPS_AND_UNDERSCORES) is replaced by the actual C string literal containing the email address.
to exclude all conditional compiling branches that are not relevant (#ifdef and the like)
What's important to know about the pre-processor is that the pre-processor directives are NOT part of the C language proper, and they serve several important functions such as the conditional compiling mentioned earlier (used for example to have multiple versions of the program, say for different operating systems, or indeed for different compilers).
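To get a feel for what this step does, here is a toy model of it in Python. It is only a sketch: it handles object-like #define, #ifdef/#ifndef/#else/#endif and identifier substitution, and unlike a real pre-processor it would also substitute inside string literals and does not strip comments. The macro names in the example are invented.

```python
import re

def preprocess(source, defines=None):
    """Toy pre-processor: object-like macros and #ifdef-style conditionals."""
    macros = dict(defines or {})
    out = []
    stack = []   # one bool per open conditional: is its current branch active?
    for line in source.splitlines():
        stripped = line.strip()
        active = all(stack)
        if stripped.startswith("#ifdef") or stripped.startswith("#ifndef"):
            directive, name = stripped.split()[:2]
            present = name in macros
            stack.append(present if directive == "#ifdef" else not present)
        elif stripped.startswith("#else"):
            stack[-1] = not stack[-1]
        elif stripped.startswith("#endif"):
            stack.pop()
        elif stripped.startswith("#define"):
            if active:
                parts = stripped.split(None, 2)
                macros[parts[1]] = parts[2] if len(parts) > 2 else ""
        elif active:
            # replace every identifier that names a macro with its value
            out.append(re.sub(r"\b[A-Za-z_]\w*\b",
                              lambda m: macros.get(m.group(0), m.group(0)),
                              line))
    return "\n".join(out)

src = """#define ADMIN_EMAIL "admin@example.com"
#ifdef TARGET_WINDOWS
int os = 1;
#else
int os = 2;
#endif
const char *mail = ADMIN_EMAIL;"""
print(preprocess(src))
print(preprocess(src, {"TARGET_WINDOWS": ""}))
```

The second call plays the role of passing a define on the compiler command line: the same source yields a different program text before the compiler proper ever sees it.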
Taking it from there...
After this manifesto of mine... I encourage you to read but little more, and to dive into programming and building binaries. It is a very good idea to try and get a broad picture of the framework etc., but this can be overdone, a bit akin to the exchange student who stays in his/her room reading the Webster dictionary to be "prepared" for meeting native speakers, rather than just "doing it!".
Ideally you shouldn't need to care what C++ compiler you are using. Compatibility with the standard has got much better in recent years (even from Microsoft).
Compiler flags obviously differ, but the same features are generally available; it's just a differently named option to, e.g., set the warning level on GCC and ms-cl.
The build system is independent of the compiler; you can use any make with any compiler.
That is a lot of questions in one.
C++ compilers are a lot like hammers: They come in all sizes and shapes, with different abilities and features, intended for different types of users, and at different price points; ultimately they all are for doing the same basic task as the others.
Some are intended for highly specialized applications, like high-performance graphics, and have numerous extensions and libraries to assist the engineer with those types of problems. Others are meant for general purpose use, and aren't necessarily always the greatest for extreme work.
The technique for using each type of hammer varies from model to model—and version to version—but they all have a lot in common. The macro preprocessor is a standard part of C and C++ compilers.
A brief comparison of many C++ compilers is here. Also check out the list of C compilers, since many programs don't use any C++ features and can be compiled by ordinary C.
C++ compilers don't "view" makefiles. The rules of a makefile may invoke a C++ compiler, but also may "compile" assembly language modules (assembling), process other languages, build libraries, link modules, and/or post-process object modules. Makefiles often contain rules for cleaning up intermediate files, establishing debug environments, obtaining source code, etc., etc. Compilation is one link in a long chain of steps to develop software.
Also, many development environments abstract the makefile into a "project file" which is used by an integrated development environment (IDE) in an attempt to simplify or automate many programming tasks. See a comparison here.
As for learning: choose a specific problem to solve and dive in. The target platform (Linux/Windows/etc.) and problem space will narrow the choices pretty well. Which you choose is often linked to other considerations, such as working for a particular company, or being part of a team. C++ has something like 95% commonality among all its flavors. Learn any one of them well, and learning the next is a piece of cake.