Writing a function with type 'a -> string - ocaml

For debugging purposes I'd like to have a function in OCaml that converts a value of arbitrary type to a string. The debugger already has one, but it'd be cool to have one I can call from my own code.
The sexplib library would be perfect, but the fact is that I can't modify all the types I would need to annotate with sexp, and I can't use camlp4 either.
Is there any such function? (It won't be on production code so I accept dirty solutions)
Something like Haskell's Show typeclass would be exactly what I mean.
Thanks for your time

The Std module in Batteries Included provides a dump function which converts arbitrary types to readable strings. It is somewhat limited - as it does not know about types, it cannot print constructors for variant types properly and replaces them with numbers - but it can still be pretty helpful. Since type information is not available at runtime, that's about as good as you can do. The debugger and toplevel use compiler trickery to obtain better representations, but that is difficult if not impossible to do in a general library.
I seem to remember also seeing a more sophisticated dumping library somewhere, but I do not recall where.

What is the purpose of boost::fusion?

I've spent the day reading notes and watching a video on boost::fusion and I really don't get some aspects of it.
Take for example, the boost::fusion::has_key<S> function. What is the purpose of having this in boost::fusion? Is the idea that we just try to move as much programming as possible to compile time? So pretty much any boost::fusion function is the same as the run-time version, except it now evaluates at compile time? (And we assume doing more at compile time is good?)
Related to boost::fusion, I'm also a bit confused about why metafunctions always return types. Why is this?
Another way to look at boost::fusion is to think of it as a "poor man's introspection" library. The original motivation for boost::fusion comes from the boost::spirit parser/generator framework, in particular the need to support what are called "parser attributes".
Imagine, you've got a CSV string to parse:
aaaa, 1.1
The type this string parses into can be described as "tuple of string and double". We can define such tuples in "plain" C++, either with old-school structs (struct { string a; double b; }) or with the newer tuple<string, double>. The only thing we lack is some sort of adapter that lets us pass tuples (and some other types) of arbitrary composition to a unified parser interface and expect it to make sense of them without passing any out-of-band information (such as the string parsing templates used by scanf).
That's where boost::fusion comes into play. The most straightforward way to construct a "fusion sequence" is to adapt a normal struct:
struct a {
    string s;
    double d;
};
BOOST_FUSION_ADAPT_STRUCT(a, (string, s)(double, d))
The "ADAPT_STRUCT" macro adds the information the parser framework (in this example) needs to "iterate" over the members of struct a, to the tune of the following questions:
I just parsed a string. Can I assign it to first member of struct a?
I just parsed a double. Can I assign it to second member of struct a?
Are there any other members in struct a or should I stop parsing?
Obviously, this basic example can be further extended (and boost::fusion supplies the capability) to address much more complex cases:
Variants - let's say the parser can encounter either a string or a double and wants to assign it to the right member of struct a. BOOST_FUSION_ADAPT_ASSOC_STRUCT comes to the rescue (now our parser can ask questions like "which member of struct a is of type double?").
Transformations - our parser may have been designed to accept certain types as parameters, but the rest of the program has changed quite a bit since. Fusion metafunctions can be conveniently used to adapt new types to old realities (or vice versa).
The rest of boost::fusion functionality naturally follows from the above basics. Fusion really shines when there's a need for conversion (in either direction) between "loose I/O data" and the strongly typed, structured data C++ programs operate upon (if efficiency is a concern). It is the enabling factor behind spirit::qi and spirit::karma being such efficient (probably the fastest) I/O frameworks.
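To make the "iterate over the members" idea concrete without depending on Boost, here is a minimal sketch that uses only the C++17 standard library. It is an analogue of what Fusion enables, not Fusion itself: std::tuple stands in for the adapted struct, and the function name to_csv is made up for this example.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <tuple>

// Writes every member of a tuple to a string, separated by ", ".
// The "loop" over the members is expanded at compile time by the fold
// expression, so each member is streamed with its own static type.
template <typename... Ts>
std::string to_csv(const std::tuple<Ts...>& row) {
    std::ostringstream out;
    std::size_t index = 0;
    std::apply(
        [&](const Ts&... field) {
            // Emit a separator before every field except the first.
            ((out << (index++ ? ", " : "") << field), ...);
        },
        row);
    return out.str();
}
```

Calling to_csv(std::make_tuple(std::string("aaaa"), 1.1)) produces the CSV line from the example above. A generator such as spirit::karma does the same kind of member-by-member walk, but over Fusion sequences and with far more control over the format.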
Fusion is there as a bridge between compile-time and run-time containers and algorithms. You may or may not want to move some of your processing to compile-time, but if you do want to then Fusion might help. I don't think it has a specific manifesto to move as much as possible to compile-time, although I may be wrong.
Meta-functions return types because template meta-programming wasn't invented on purpose. It was discovered more-or-less by accident that C++ templates can be used as a compile-time programming language. A meta-function is a mapping from template arguments to instantiations of a template. As of C++03 there were two kinds of template (class and function templates), therefore a meta-function has to "return" either a class or a function. Classes are more useful than functions, since you can put values etc. in their static data members.
C++11 adds another kind of template (for typedefs), but that is kind of irrelevant to meta-programming. More importantly for compile-time programming, C++11 adds constexpr functions. They're properly designed for the purpose and they return values just like normal functions. Of course, their input is not a type, so they can't be mappings from types to something else in the way that templates can. So in that sense they lack the "meta-" part of meta-programming. They're "just" compile-time evaluation of normal C++ functions, not meta-functions.
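The contrast between the two styles can be sketched with the usual factorial example. The names factorial_meta and factorial are chosen for this illustration only.

```cpp
// C++03 style: the input is a template argument and the result is a static
// member of the instantiated class. Recursion is expressed by instantiating
// the template again; the explicit specialization is the base case.
template <unsigned N>
struct factorial_meta {
    static const unsigned long value = N * factorial_meta<N - 1>::value;
};

template <>
struct factorial_meta<0> {
    static const unsigned long value = 1;
};

// C++11 style: an ordinary-looking function that the compiler may evaluate
// at compile time. Its input is a value, not a type.
constexpr unsigned long factorial(unsigned n) {
    return n == 0 ? 1 : n * factorial(n - 1);
}

// Both results are compile-time constants.
static_assert(factorial_meta<5>::value == 120, "template version");
static_assert(factorial(5) == 120, "constexpr version");
```

The constexpr version can also be called with a run-time argument, which the template version cannot; conversely, only the template version could take a type as its input.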

Creating serializeable unique compile-time identifiers for arbitrary UDT's

I would like a generic way to create unique compile-time identifiers for any C++ user defined types.
For example:
unique_id<my_type>::value == 0 // true
unique_id<other_type>::value == 1 // true
I've managed to implement something like this using preprocessor meta programming, the problem is, serialization is not consistent. For instance if the class template unique_id is instantiated with other_type first, then any serialization in previous revisions of my program will be invalidated.
I've searched for solutions to this problem, and found several ways to implement this with non-consistent serialization if the unique values are compile-time constants. If RTTI or similar methods, like boost::sp_typeinfo are used, then the unique values are obviously not compile-time constants and extra overhead is present. An ad-hoc solution to this problem would be, instantiating all of the unique_id's in a separate header in the correct order, but this causes additional maintenance and boilerplate code, which is not different than using an enum unique_id{my_type, other_type};.
A good solution to this problem would be using user-defined literals, unfortunately, as far as I know, no compiler supports them at this moment. The syntax would be 'my_type'_id; 'other_type'_id; with udl's.
I'm hoping somebody knows a trick that allows implementing serialize-able unique identifiers in C++ with the current standard (C++03/C++0x), I would be happy if it works with the latest stable MSVC and GNU-G++ compilers, although I expect if there is a solution, it's not portable.
I would like to make clear, that using mpl::set or similar constructs like mpl::vector and filtering, does not solve this problem, because the scope of the meta-set/vector is limited and actually causes more problems than just preprocessor meta programming.
A while back I added a build step to one project of mine, which allowed me to write #script_name(args) in a C++ source file and have it automatically replaced with the output of the associated script, for instance ./script_name.pl args or ./script_name.py args.
You may balk at the idea of polluting the language into nonstandard C++, but all you'd have to do is write #sha1(my_type) to get the unique integer hash of the class name, regardless of build order and without the need for explicit instantiation.
This is just one of many possible nonstandard solutions, and I think a fairly clean one at that. There's currently no great way to impose an arbitrary, consistent ordering on your classes without just specifying it explicitly, so I recommend you simply give in and go the explicit instantiation route; there's nothing really wrong with centralising the information, but as you said it's not all that different from an enumeration, which is what I'd actually use in this situation.
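For comparison, hashing the class name can also be sketched inside the language, provided the compiler supports C++11 constexpr (which was not widely available when this question was asked). This is an assumption-laden illustration, not the build-step approach described above: it uses FNV-1a instead of SHA-1, the macro REGISTER_TYPE_ID is hypothetical, and each type still has to be registered once by name.

```cpp
#include <cstdint>

// 32-bit FNV-1a hash, written recursively so it is valid C++11 constexpr.
constexpr std::uint32_t fnv1a(const char* s, std::uint32_t h = 2166136261u) {
    return *s == '\0'
               ? h
               : fnv1a(s + 1, (h ^ static_cast<std::uint8_t>(*s)) * 16777619u);
}

// Primary template is left undefined so that using an unregistered type
// is a compile-time error.
template <typename T>
struct unique_id;

// Hypothetical helper: derives the id from the spelling of the type name,
// so it does not depend on instantiation or build order.
#define REGISTER_TYPE_ID(T)                               \
    template <>                                           \
    struct unique_id<T> {                                 \
        static constexpr std::uint32_t value = fnv1a(#T); \
    }

struct my_type {};
struct other_type {};
REGISTER_TYPE_ID(my_type);
REGISTER_TYPE_ID(other_type);
```

The ids stay the same across builds as long as the registered spelling does not change, but two names can in principle collide, and renaming a class changes its id, so this trades the ordering problem for a naming one.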
Persistence of data is a very interesting problem.
My first question would be: do you really want serialization ? If you are willing to investigate an alternative, then jump to the next section.
If you're still there, I think you have not given the typeid solution all its due.
// static detection
template <typename T>
size_t unique_id()
{
    static size_t const id = some_hash(typeid(T)); // or boost::sp_typeinfo
    return id;
}
// dynamic detection
template <typename T>
size_t unique_id(T const& t)
{
    return some_hash(typeid(t)); // no memoization possible
}
Note: I am using a local static to avoid the order of initialization issue, in case this value is required before main is entered
It's pretty similar to your unique_id<some_type>::value, and even though it's computed at runtime, it's only computed once, and the result (for the static detection) is then memoized for future calls.
Also note that it's fully generic: no need to explicitly write the function for each type.
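The snippet leaves some_hash unspecified. One possible stand-in, purely as an assumption for illustration, is an FNV-1a hash over the name reported by std::type_info::name(). That name is implementation-defined, so ids computed this way are only comparable between builds made with the same compiler.

```cpp
#include <cstdint>
#include <typeinfo>

// Hypothetical implementation of `some_hash`: 64-bit FNV-1a over the
// implementation-defined type name.
inline std::uint64_t some_hash(const std::type_info& info) {
    std::uint64_t h = 14695981039346656037ull;  // FNV-1a 64-bit offset basis
    for (const char* p = info.name(); *p != '\0'; ++p) {
        h ^= static_cast<unsigned char>(*p);
        h *= 1099511628211ull;  // FNV-1a 64-bit prime
    }
    return h;
}

// Static detection: the hash is computed on the first call and memoized in
// a function-local static, which also sidesteps initialization-order issues.
template <typename T>
std::uint64_t unique_id() {
    static const std::uint64_t id = some_hash(typeid(T));
    return id;
}
```

Unlike std::type_info::hash_code(), whose value may differ from one run of the program to the next, a hash computed from the name is at least reproducible for a given compiler, which is what serialization needs.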
It may seem silly, but the issue of serialization is that you have a one-to-one mapping between the type and its representation:
you need to version the representation, so as to be able to decode "older" versions
dealing with forward compatibility is pretty hard
dealing with cyclic references is pretty hard (some frameworks handle it)
and then there is the issue of moving information from one to another --> deserializing older versions becomes messy and frustrating
For persistent saves, I usually recommend using a dedicated BOM. Think of the saved data as a message to your future self. And I usually go the extra mile and propose the awesome Google Protocol Buffers library:
Backward and Forward compatibility baked-in
Several format outputs -> human readable (for debug) or binary
Several languages can read/write the same messages (C++, Java, Python)
Pretty sure that you will have to implement your own extension to make this happen, I've not seen nor heard of any such construct for compile-time. MSVC offers __COUNTER__ for the preprocessor but I know of no template equivalent.

Is boost::variant rocket science? (And should I therefore avoid it for simple problems?)

OK, so I have this tiny little corner of my code where I'd like my function to return one of (int, double, CString) to clean up the code a bit.
So I think: No problem to write a little union-like wrapper struct with three members etc. But wait! Haven't I read of boost::variant? Wouldn't this be exactly what I need? This would save me from messing around with a wrapper struct myself! (Note that I already have the boost library available in my project.)
So I fire up my browser, navigate to Chapter 28. Boost.Variant and lo and behold:
The variant class template is a safe, generic, stack-based discriminated union container, offering a simple solution for manipulating an object from a heterogeneous set of types [...]
Great! Exactly what I need!
But then it goes on:
Boost.Variant vs. Boost.Any
Boost.Any makes little use of template metaprogramming techniques (avoiding potentially hard-to-read error messages and significant compile-time processor and memory demands).
[...]
Troubleshooting
"Internal heap limit reached" -- Microsoft Visual C++ -- The compiler option /ZmNNN can increase the memory allocation limit. The NNN is a scaling percentage (i.e., 100 denotes the default limit). (Try /Zm200.)
[...]
Uh oh. So using boost::variant may significantly increase compile-time and generate hard-to-read error messages. What if someone moves my use of boost::variant to a common header, will our project suddenly take lots longer to compile? Am I introducing an (unnecessarily) complex type?
Should I use boost::variant for my simple tiny problem?
Generally, use boost::variant if you do want a discriminated union (any is for unknown types -- think of it as some kind of equivalent to how void* is used in C).
Some advantages include exception handling, potential usage of less space than the sum of the type sizes, type discriminated "visiting". Basically, stuff you'd want to perform on the discriminated union.
However, for boost::variant to be efficient, at least one of the types used must be "easily" constructed (read the documentation for more details on what "easily" means).
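A minimal sketch of the discriminated union and "visiting" described above. To keep it free of external dependencies it uses std::variant and std::visit from C++17, the standard-library counterparts of boost::variant and boost::apply_visitor, and std::string stands in for CString. The function names lookup and describe are made up for this example.

```cpp
#include <string>
#include <variant>

// One return type that can hold exactly one of the three alternatives.
using Value = std::variant<int, double, std::string>;

// Hypothetical function returning a different alternative per key.
Value lookup(int key) {
    switch (key) {
        case 0:  return 42;
        case 1:  return 3.5;
        default: return std::string("text");
    }
}

// A visitor has one overload per alternative; forgetting one is a compile
// error, which is the type safety a hand-rolled union would not give you.
struct Describe {
    std::string operator()(int) const { return "int"; }
    std::string operator()(double) const { return "double"; }
    std::string operator()(const std::string&) const { return "string"; }
};

std::string describe(const Value& v) {
    return std::visit(Describe{}, v);
}
```

With boost::variant the shape is the same: the visitor would typically derive from boost::static_visitor<std::string> and be applied with boost::apply_visitor.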
Boost.Variant is not that complex, IMHO. Yes, it is template-based, but it doesn't use any really complex feature of C++. I've used it quite a bit with no problems at all. I think in your case it would help describe more clearly what your code is doing.
Another way of thinking is transforming what that function returns into a more semantically rich structure/class that allows interpreting which inner element is interesting, but that depends on your design.
This kind of boost element comes from functional programming, where you have variants around every corner.
It should be a way to have a type-safe approach to returning a value that can be of one of several precise types. This means it is useful for solving your problem, BUT you should consider whether it's really what you need to do.
The added value compared to other approaches that try to solve the same problem should be the type safety (you won't be able to place whatever you want inside a variant without noticing, as opposed to a void*).
I don't use it because, to me, it's a symptom of bad design.
Either your method should return an object that implements a given interface, or it should be split into more than one method. Either way, the design should be reviewed.

Clojure static typing

I know that this may sound like blasphemy to Lisp aficionados (and other lovers of dynamic languages), but how difficult would it be to enhance the Clojure compiler to support static (compile-time) type checking?
Setting aside the arguments for and against static and dynamic typing, is this possible (not "is this advisable")?
I was thinking that adding a new reader macro to force a compile-time type (an enhanced version of the #^ macro) and adding the type information to the symbol table would allow the compiler to flag places where a variable was misused. For example, in the following code, I would expect a compile-time error (#* is the "compile-time" type macro):
(defn get-length [#*String s] (.length s))
(defn test-get-length [] (get-length 2.0))
The #^ macro could even be reused with a global variable (*compile-time-type-checking*) to force the compiler to do the checks.
Any thoughts on the feasibility?
It's certainly possible. However, I do not think that Clojure will ever get any form of weak static typing - its benefits are too few.
Rich Hickey has however expressed on several occasions his like for the strong, optional, and expressive typing feature of the Qi language, http://www.lambdassociates.org/qilisp.htm
It's certainly possible. The compiler already does some static type checking around primitive argument types in the 1.3 development branch.
Yes! It looks like there is a project underway, core.typed, to make optional static type checking a reality. See the GitHub project and its documentation.
This work grew out of an undergraduate honours dissertation (PDF) by Ambrose Bonnaire-Sergeant, and is related to the Typed Racket system.
Since one form is read AND evaluated at a time, you cannot have forward references, which makes this somewhat limited.
Old question but two important points: I don't think Clojure supports reader macros, only ordinary lisp macros. And now we have core.typed option for typing in Clojure.
declare can have type hints, so it is possible to declare a var that "is" the type which has not been defined yet but contains data about the structure, but this would be really clunky and you would have to do it before any code path that could be executed before the type is defined. Basically, you would want to define all of your user defined types up front and then use them like normal. I think that makes library writing somewhat hackish.
I didn't mean to suggest earlier that this isn't possible, just that for user defined types it is a lot more complicated than for pre-defined types. The benefit of doing this vs. the cost is something that should be seriously considered. But I encourage anyone who is interested to try it out and see if they can make it work!

How to explain C++ templates to junior developers?

One could break the question into two: how to read and to write templated code.
It is very easy to say, "if you want an array of doubles, write std::vector<double>", but it won't teach them how the templates work.
I'd probably try to demonstrate the power of templates, by demonstrating the annoyance of not using them.
A good demonstration would be to write something simple like a stack of doubles (hand-written, not STL), with methods push, pop, and foldTopTwo, which pops off and adds together the top two values in the stack, and pushes the new value back on.
Then tell them to do the same for ints (or whatever, just some different numeric type).
Then show them how, by writing this stack as a template, you can significantly reduce the number of lines of code, and all of that horrible duplication.
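A sketch of where that exercise might end up. For brevity the storage is delegated to std::vector, which departs from the "hand-written, not STL" suggestion above; a classroom version would manage its own array. The class name Stack is chosen for this example.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// One definition replaces the separate "stack of doubles" and
// "stack of ints" classes; T is filled in by the compiler on use.
template <typename T>
class Stack {
public:
    void push(const T& value) { items_.push_back(value); }

    // Removes and returns the top value.
    T pop() {
        if (items_.empty()) throw std::out_of_range("pop on empty stack");
        T top = items_.back();
        items_.pop_back();
        return top;
    }

    // Pops the top two values, adds them, and pushes the sum back.
    void foldTopTwo() {
        if (items_.size() < 2) {
            throw std::out_of_range("foldTopTwo needs two values");
        }
        T a = pop();
        T b = pop();
        push(a + b);
    }

    std::size_t size() const { return items_.size(); }

private:
    std::vector<T> items_;
};
```

Stack<double> and Stack<int> now share every line of logic, and the same template works for any type that supports copying and operator+.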
There is a saying: "If you can't explain it, you don't understand it."
You can break it down further: How to write code that uses templated code, and how to write code that provides a templated service to others.
The basic explanation is that templates generate code based on a template. That is the source of the term "meta programming". It is programming how programming should be done.
The essential complexity of a vector is not that it is a vector of doubles (or type T), but that it is a vector. The basic structure is the same and templates separate that which is consistent from that which is not.
Further explanation depends on how much of that makes sense to you!
IMHO it is best to explain them as (very) fancy macros. They just work at higher level than C-style text substitution macros.
I found it very instructive to look at duck-typed languages. It doesn't matter, there, what type of argument you give a function, as long as they offer the right interface.
Templates allow you to do the same thing: you can take any type, as long as the right interface is present. The additional benefit over duck typing is that the interface is checked at compile time.
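A small sketch of that compile-time duck typing. The function name is_longer_than is made up for this example; the only thing it demands of its argument is a size() member.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Accepts any type that offers a size() member function. Nothing else
// about the type matters, and no common base class is required.
template <typename Container>
bool is_longer_than(const Container& c, std::size_t n) {
    return c.size() > n;
}

// is_longer_than(std::string("abc"), 2)    compiles: std::string has size()
// is_longer_than(std::vector<int>{1}, 3)   compiles: std::vector has size()
// is_longer_than(42, 1)                    is rejected by the compiler,
//                                          because int has no size()
```

In a duck-typed language the third call would only fail when that line actually runs; here the mistake is caught before the program exists.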
Present them as advanced macros. It's a programming language of its own that is executed during compilation.
I would get them to implement something themselves, then experiment with different variations until they understand it. Learning by doing is almost always the better option with programming.
For example, get them to make a template which compares two values and returns the higher one. Then have them see how passing ints or doubles or whatever still allows it to work. Then get them to tweak the code (or copy it) and have it return the minimum value. Again, experiment with variations: will the template allow them to pass an int and a double, or will it complain?
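A sketch of that first exercise. The names higher and lower are chosen here to avoid clashing with std::max and std::min.

```cpp
// Returns the larger of two values of the same type.
template <typename T>
T higher(const T& a, const T& b) {
    return a < b ? b : a;
}

// The "tweak it" step: same shape, comparison reversed.
template <typename T>
T lower(const T& a, const T& b) {
    return b < a ? b : a;
}

// higher(1, 2)            deduces T = int
// higher(1.5, 0.5)        deduces T = double
// higher(1, 2.5)          does not compile: T cannot be both int and double
// higher<double>(1, 2.5)  compiles: T is given explicitly, 1 is converted
```

The mixed int/double call is a useful thing to let them trip over, since it shows that template argument deduction does not apply implicit conversions.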
From there, you can have them pass in arrays of whatever type (int, double etc), and have it sort the array from highest to lowest, again encouraging experimentation. From there, start to move into templated class definitions, using the same kind of ideas but on a larger scale. This is pretty much how I learnt about templates, ending up with complex array manipulation classes for generic types.
When I was teaching myself C++ I used this site a lot. It explains templates in depth and very well. I would recommend having them read that and try implementing something simple.
For a shorter explanation: Templates are frameworks for complicated constructs that act on data without having to know what that data is. Give them some examples of a simple template (like a linked-list) and walk through how the template is used to generate the final class.
You can say that a template is a half-written piece of source code with parameters that get filled in when the template is instantiated.