What's the OCaml naming convention for "constructors"?

An OCaml module usually contains at least one abstract type whose idiomatic name is t. Also, there's usually a function that constructs a value of that type.
What is the usual / idiomatic name for this?
The standard library is not consistent here. For example:
There's Array.make and a deprecated function Array.create. So that function should be named make?
On the other hand, there's Buffer.create but not Buffer.make. So that function should be named create?

Some people find that this style of module design makes OCaml programming easier, but it is not a mandatory OCaml programming style, and I do not think there is an official name for it. I personally call it the "1-data-type-per-1-module" style. (I wrote a blog post about this, but it is in Japanese. I hope an automatic translator gives you some useful information ...)
Defining a module dedicated to one data type and fixing the name of that type as t has some benefits:
Nice namespacing
The module name already says what its type and values are about, so you do not need to repeat the type name inside: Buffer.add_string instead of add_string_to_buffer, and Buffer.create instead of create_buffer. You can also avoid typing the same module name repeatedly with a local module open:
let f () =
  let open Buffer in
  let b = create 10 in     (* instead of Buffer.create *)
  add_string b "hello";    (* instead of Buffer.add_string *)
  contents b               (* instead of Buffer.contents *)
Easy ML functor application
If an ML functor takes an argument module with a data type, we have a convention that the type should be called t. Modules with data type t are easily applied to these functors without renaming of the type.
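As a minimal sketch of the functor point: the Point module below is hypothetical (it is not from the answer), but because it names its type t and provides compare, it can be handed to the standard library's Set.Make without any adapter.

```ocaml
(* Hypothetical module for illustration: one data type, named t. *)
module Point = struct
  type t = { x : int; y : int }

  let make x y = { x; y }

  (* Lexicographic order on (x, y). *)
  let compare a b =
    match Int.compare a.x b.x with
    | 0 -> Int.compare a.y b.y
    | c -> c
end

(* Set.Make expects a module with [type t] and [compare]; Point fits as is. *)
module PointSet = Set.Make (Point)

(* The duplicate (1, 2) is stored only once. *)
let s = PointSet.of_list [ Point.make 1 2; Point.make 0 5; Point.make 1 2 ]
let n = PointSet.cardinal s
```

Had the type been called point instead of t, the call would need a small wrapper module that renames it.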
For Array.create and Array.make, I think this is to follow the distinction between String.create and String.make.
String.create is to create a string with uninitialized contents. The created string contains random bytes.
String.make is to create a string filled with the given char.
We had Array.create for a long time, to create an array filled with the given value. This behavior corresponds to String.make rather than String.create. That's why it was renamed to Array.make, and Array.create is obsolete.
We cannot have an Array.create in OCaml with the same behaviour as String.create. Unlike strings, arrays cannot be created without initialization, since random bytes do not in general represent valid OCaml values for the elements, which would lead to a program crash.
Following this, personally I use X.create for a function to create an X.t which does not require an initial value to fill it. I use X.make if it needs something to fill.
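A short sketch of that distinction, using only standard library calls. It assumes a recent OCaml, where the uninitialized/filled pair for mutable byte sequences is exposed as Bytes.create and Bytes.make; the variable names are just for illustration.

```ocaml
(* create: size only; the initial contents are unspecified. *)
let raw = Bytes.create 4

(* make: size plus the value used to fill every slot. *)
let filled = Bytes.make 4 'x'
let zeros = Array.make 3 0

(* A buffer starts empty, so there is nothing to fill: only create exists. *)
let buf = Buffer.create 16
let () = Buffer.add_string buf "hello"
```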

I had the same question when I picked up the language a long time ago. I never use make and I think few people do.
Nowadays I use create for heavy, often imperative or stateful values, e.g. a Unicode text segmenter. And I use v for functional, lighter values in DSL/combinator-based settings, e.g. the various constructors in Gg, such as those for 2D vectors or colors.
As camlspotter mentions in his answer the standard library distinguishes make and create for values that need an initial value to fill in. I think it's better to be regular here and always use create regardless. If your values support an optional initial fill value, add an optional argument to create rather than multiply the API entry points.
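One way the "single create with an optional fill" suggestion might look. The Row module and its ?fill argument are hypothetical names chosen for this sketch.

```ocaml
(* Hypothetical module: a fixed-size row of ints with a single entry point. *)
module Row : sig
  type t

  val create : ?fill:int -> int -> t
  val get : t -> int -> int
  val length : t -> int
end = struct
  type t = int array

  (* [fill] defaults to 0 when the caller does not pass it. *)
  let create ?(fill = 0) n = Array.make n fill
  let get = Array.get
  let length = Array.length
end

let a = Row.create 3          (* filled with the default, 0 *)
let b = Row.create ~fill:7 3  (* filled with 7 *)
```

Callers that do not care about the fill value never see it, and there is no separate make to document.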

Related

Is there a function that can make a string representation of any type?

I was desperately looking for the last hour for a method in the OCaml Library which converts an 'a to a string:
'a -> string
Is there something in the library which I just haven't found? Or do I have to do it different (writing everything by my own)?
It is not possible to write a printing function show of type 'a -> string in OCaml.
Indeed, types are erased after compilation in OCaml. (They are in fact erased after typechecking, which is one of the early phases of the compilation pipeline.)
Consequently, a function of type 'a -> _ can either:
ignore its argument:
let f _ = "<something>"
peek at the memory representation of a value
let f x = if Obj.is_block (Obj.repr x) then "<block>" else "<immediate>"
Even peeking at the memory representation of a value has limited utility since many different types will share the same memory representation.
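To make the shared-representation point concrete, here is a small sketch using the Obj module (an unsafe part of the standard library, used here only to observe values). The helper name is_immediate is made up for this example.

```ocaml
(* True when the value is stored as an unboxed machine word. *)
let is_immediate x = Obj.is_int (Obj.repr x)

(* Values of five different types, all stored as immediates ... *)
let all_immediate =
  is_immediate 0 && is_immediate false && is_immediate ()
  && is_immediate None && is_immediate []

(* ... and in fact all stored as the very same word. *)
let same_word =
  Obj.repr 0 == Obj.repr false
  && Obj.repr 0 == Obj.repr ()
  && Obj.repr 0 == Obj.repr None
  && Obj.repr 0 == Obj.repr []

(* Strings and non-empty lists are heap blocks instead. *)
let boxed = (not (is_immediate "abc")) && not (is_immediate [ 1 ])
```

At run time nothing distinguishes 0 from false or [], which is why a generic printer working from the representation can only guess.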
If you want to print values of a type, you need to create a printer for that type. You can do this either by hand using the Fmt library (or the Format module in the standard library)
type tree = Leaf of int | Node of { left:tree; right: tree }
let rec pp ppf tree = match tree with
| Leaf d -> Fmt.pf ppf "Leaf %d" d
| Node n -> Fmt.pf ppf "Node { left:%a; right:%a}" pp n.left pp n.right
or by using a ppx (a small preprocessing extension for OCaml) like https://github.com/ocaml-ppx/ppx_deriving.
type tree = Leaf of int | Node of { left:tree; right: tree } [@@deriving show]
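For readers without Fmt or ppx_deriving installed, here is a sketch of the hand-written route using only the standard library's Format module. The show function is a name chosen for this example; it turns the printer into the tree -> string function the question was after, for this one type.

```ocaml
type tree = Leaf of int | Node of { left : tree; right : tree }

(* A printer in the usual [formatter -> value -> unit] shape. *)
let rec pp ppf = function
  | Leaf d -> Format.fprintf ppf "Leaf %d" d
  | Node n ->
      (* %a takes a printer and a value, so pp can call itself. *)
      Format.fprintf ppf "Node { left:%a; right:%a }" pp n.left pp n.right

(* Render to a string by running the printer through asprintf. *)
let show t = Format.asprintf "%a" pp t

let example = show (Node { left = Leaf 1; right = Leaf 2 })
```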
If you just want a quick hacky solution, you can use dump from the Batteries library. It doesn't work for all cases, but it does work for primitives, lists, etc. It accesses the underlying raw memory representation, hence is able to overcome (to some extent) the difficulties mentioned in the other answers.
You can use it like this (after installing it via opam install batteries):
# #require "batteries";;
# Batteries.dump 1;;
- : string = "1"
# Batteries.dump 1.2;;
- : string = "1.2"
# Batteries.dump [1;2;3];;
- : string = "[1; 2; 3]"
If you want a more "proper" solution, use ppx_deriving as recommended by @octachron. It is much more reliable/maintainable/customizable.
What you are looking for is a meaningful function of type 'a. 'a -> string, with parametric polymorphism (i.e. a single function that can operate the same for all possible types 'a, even those that didn’t exist when the function was created). This is not possible in OCaml. Here are explanations depending on your programming background.
Coming from Haskell
If you were expecting such a function because you are familiar with the Haskell function show, then notice that its type is actually show :: Show a => a -> String. It uses an instance of the typeclass Show a, which is implicitly inserted by the compiler at call sites. This is not parametric polymorphism, this is ad-hoc polymorphism (show is overloaded, if you want). There is no such feature in OCaml (yet? there are projects for the future of the language, look for “modular implicits” or “modular explicits”).
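Until something like modular implicits exists, the dictionary that Haskell passes implicitly can be passed by hand as a first-class module. The sketch below is one way to do that; SHOW, Show_int, Show_bool and Show_list are hypothetical names invented for this example, not part of any library.

```ocaml
(* Hypothetical signature playing the role of Haskell's Show class. *)
module type SHOW = sig
  type t

  val show : t -> string
end

module Show_int = struct
  type t = int

  let show = string_of_int
end

module Show_bool = struct
  type t = bool

  let show = string_of_bool
end

(* An instance built from another instance, like [Show a => Show [a]]. *)
module Show_list (S : SHOW) = struct
  type t = S.t list

  let show xs = "[" ^ String.concat "; " (List.map S.show xs) ^ "]"
end

module Show_int_list = Show_list (Show_int)

(* The "dictionary" is an explicit first-class module argument. *)
let show (type a) (module S : SHOW with type t = a) (x : a) : string = S.show x

let s1 = show (module Show_int) 42
let s2 = show (module Show_bool) true
let s3 = show (module Show_int_list) [ 1; 2; 3 ]
```

The caller has to name the instance at every call site, which is exactly the part that Haskell's typeclass resolution does automatically.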
Coming from OOP
If you were expecting such a function because you are familiar with OO languages in which every value is an object with a method toString, then this is not the case in OCaml. OCaml does not use the object model pervasively, and the run-time representation of OCaml values retains no (or very little) notion of type. I refer you to @octachron’s answer.
Again, toString in OOP is not parametric polymorphism but overloading: there is not a single method toString which is defined for all possible types. Instead there are multiple — possibly very different — implementations of a method of the same name. In some OO languages, programmers try to follow the discipline of implementing a method by that name for every class they define, but it is only a coding practice. One could very well create objects that do not have such a method.
[ Actually, the notions involved in both worlds are pretty similar: Haskell requires an instance of a typeclass Show a providing a function show; OOP requires an object of a class Stringifiable (for instance) providing a method toString. Or, of course, an instance/object of a descendent typeclass/class. ]
Another possibility is to use https://github.com/ocaml-ppx/ppx_deriving, which will create a function of type Path.To.My.Super.Type.t -> string that you can then use with your value. You still need to track the path of the type by hand, but it is better than nothing.
Another project provides a feature similar to Batteries: https://github.com/reasonml/reason-native/blob/master/src/console/README.md (I haven't tested Batteries, so I can't give an opinion on it). Both have the same limitation: they introspect the runtime encoding, so they can't produce something really usable. I think it was written with Windows and the browser in mind, so if cross-platform support is required I would test this one first (unless Batteries is already pulled in). Even though the source code is in Reason, you can use the same API from OCaml.

Function definition syntax

I'm trying to implement a particular algorithm. The algorithm isn't very well described but I do have an OCaml implementation. Problem is I don't know OCaml and I'm finding the syntax strange. So here's the first of what might be many questions. Apologies for any mistakes in terminology.
One part of the code I have looks like this
type alternative_text = string
type indent = int
module Line =
struct
  type t = {s:alternative_text; i:indent}
  let make s i = {s;i}
  let text (l:t): alternative_text = l.s
  let length l = String.length l.s
  let indent l = l.i
end
My question concerns the line let text (l:t): alternative_text = l.s. I think I know what this is, a function Line.text which takes a Line.t object and returns the s field, which is a string.
My question concerns the (l:t): alternative_text syntax. This looks like it's specifying the type of the parameter and function result, but why is it necessary? As far as I know let text l = l.s would do exactly the same thing and the other functions are defined without using this extra syntax. So why is it being used here?
Thanks in advance.
The problem with records is that their field names have a scope that's outside the record. So if you have two records with the same field name a, they will clash. I.e., it won't be possible in general to tell whether x.a refers to a field in one record type or the other record type. Depending on the type of x, it could be either.
OCaml tries to give a lot of flexibility in this area by inferring the record type (of x in this example). But if it can't be inferred you need to specify which type you're talking about.
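A minimal sketch of that situation, with two hypothetical record types that share a field name. Without an annotation the compiler falls back to the most recently defined type that has the field; with one, it picks the type you name.

```ocaml
type point = { x : int; y : int }
type pixel = { x : int; color : string } (* also has a field x *)

(* No annotation: x resolves to the latest definition, so p is a pixel. *)
let pixel_x p = p.x

(* Annotation: tells the compiler that x is the field of point. *)
let point_x (p : point) = p.x

let a = point_x { x = 1; y = 2 }
let b = pixel_x { x = 3; color = "red" }
```

In the Line module from the question there appears to be only one record type in scope, so let text l = l.s should typecheck as well; there the annotation mostly serves as documentation and makes the alias alternative_text show up in the inferred signature.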
As a side note, @glennsl is correct. If you have a non-trivial amount of OCaml to figure out, and you're learning OCaml from scratch, it will be faster to learn OCaml from a book or an online tutorial than to ask individual questions here on StackOverflow.

Tuple Concatenation in Chapel

Let's say I'm generating tuples and I want to concatenate them as they come. How do I do this? The following does element-wise addition:
if ts = ("foo", "cat"), t = ("bar", "dog")
ts += t gives ts = ("foobar", "catdog"),
but what I really want is ts = (("foo","cat"),("bar","dog")).
So I guess the first question is "does Chapel support tuple concatenation?", then "is there a binary operator/function for it?", then "if not, what is a good way to do it?", and lastly "make my life easier if you know a better way of living".
Please address the questions in order.
I appreciate the help!!
the first question is "does Chapel support tuple concatenation?"
I believe the answer here is "no" for the following reason: (1) A Chapel variable has a single static type that cannot change over its lifetime, and (2) a tuple's type is defined as its static number of elements as well as the type of each element. Thus, given your variable ts
ts = ("foo", "cat")
its type is 2*string ("a 2-tuple of strings") and this would prevent it from ever being able to store the value (("foo","cat"),("bar","dog")) since its type is 2*(2*string) ("a 2-tuple of 2-tuples of strings"). So while these two tuples have the same number of elements (2), they differ in their element types ("string" vs. "2-tuple of string") and therefore aren't the same type (aren't compatible).
"is there a binary operator/function for it?"
Due to the above, no.
then "if not, what is a good way to do it?"
A few things come to mind, but may or may not be helpful depending on your specific situation. Rather than trying to re-assign ts, you could create a new tuple that was a tuple-of-tuples:
const ts2 = (ts, t);
and you could even do this recursively in a routine, though that would likely end up blowing up your code size if the tuple grew to any significant length (because each call to the function would generate a tuple of a different type and unique code for it).
From what I'm seeing in your question, I think you may want to use a list of tuples or a 1D array (vector) of tuples. Here's a list-based approach:
use List;
var listOfTups: list(2*string);
listOfTups.append(("foo", "cat"));
listOfTups.append(("bar", "dog"));
writeln(listOfTups);
And here's an array-based approach:
var arrOfTups: [1..0] 2*string;
arrOfTups.push_back(("foo", "cat"));
arrOfTups.push_back(("bar", "dog"));
writeln(arrOfTups);
Of the two, I would recommend the array-based approach because arrays are much more first-class and powerful in Chapel (they enjoy syntactic support, permit data parallelism, support promotion of scalar functions and operators, etc.) whereas lists are just a convenience library.
and lastly "make my life easier if you know a better way of living".
One other related thing I can think of to mention if you're not aware of it is that "varargs" functions in Chapel effectively convert those arguments to tuples. So given:
proc myFunc(x...) {
  writeln(x.type:string);
}
myFunc(("foo", "cat"), ("bar", "dog"));
the output is:
2*2*string

Is it possible to suppress the top level in a reference to a derived type?

This is hard to describe in words but an example should make it clear. Let's say I have a variable of a derived type, with the following components.
x%length
x%width
Is there any automatic way to refer to these without the top level? In other words to refer to them as simply
length
width
Of course, I could first do
length => x%length
width => x%width
for ALL individual components of the derived type. But my use case involves thousands of variables, so I'd prefer not to do it that way.
As an example from another language, python will essentially allow this suppression with:
from x import *
There is no such functionality in Fortran as far as I know, at least in the implementations that I have at hand. Besides that, the objective of my post is to make some other things clear.
The Python from x import * is the equivalent of use x in Fortran. I am not very Pythonic, but I do not think that you can import the members of a class directly. So, to my limited knowledge, that works as long as x is a Python module, not a Python class. Likewise, use x works as long as x is a Fortran module.
One programming language that I know of that implements the feature you are after is Pascal. Its handy with construct allows you to do that.
with x do
begin
  length ....
  width ....
end
Indeed, it is very helpful in that it allows you to strip a part of the object name and get directly to fields. I loved it when I was using pascal, but it's been a long time.
Delphi certainly allows that too.
How about the Fortran 2003 associate construct? This will, in a sense, manage for you the pointer assignments that you listed:
Program test
  Type :: t
    Integer :: length
    Integer :: width
  End Type
  Type (t) :: x = t(42, 43)
  Associate (length=>x%length, width=>x%width)
    Print *, length, width
  End Associate
End Program
Quoting from Fortran 2003 (e.g., at http://www.j3-fortran.org/doc/year/04/04-007.pdf): "The ASSOCIATE construct associates named entities with expressions or variables during the execution of its block."
The December 2015 ACM Fortran Forum "Compiler Support" article lists the associate construct as being fully supported by Cray, IBM, Intel and NAG and partially supported by gfortran.
I don't think there is any way to simplify this though if you have many type components to alias in this way.

Named parameter string formatting in C++

I'm wondering if there is a library like Boost Format, but which supports named parameters rather than positional ones. This is a common idiom in e.g. Python, where you have a context to format strings with that may or may not use all available arguments, e.g.
mouse_state = {}
mouse_state['button'] = 0
mouse_state['x'] = 50
mouse_state['y'] = 30
#...
"You clicked %(button)s at %(x)d,%(y)d." % mouse_state
"Targeting %(x)d, %(y)d." % mouse_state
Are there any libraries that offer the functionality of those last two lines? I would expect it to offer an API something like:
PrintFMap(string format, map<string, string> args);
In Googling I have found many libraries offering variations of positional parameters, but none that support named ones. Ideally the library has few dependencies so I can drop it easily into my code. C++ won't be quite as idiomatic for collecting named arguments, but probably someone out there has thought more about it than me.
Performance is important, in particular I'd like to keep memory allocations down (always tricky in C++), since this may be run on devices without virtual memory. But having even a slow one to start from will probably be faster than writing it from scratch myself.
The fmt library supports named arguments:
print("You clicked {button} at {x},{y}.",
arg("button", "b1"), arg("x", 50), arg("y", 30));
And as a syntactic sugar you can even (ab)use user-defined literals to pass arguments:
print("You clicked {button} at {x},{y}.",
"button"_a="b1", "x"_a=50, "y"_a=30);
For brevity the namespace fmt is omitted in the above examples.
Disclaimer: I'm the author of this library.
I've always been critical of C++ I/O (especially formatting) because in my opinion it is a step backward with respect to C. Formats need to be dynamic, and it makes perfect sense, for example, to load them from an external resource such as a file or a parameter.
However, I had never tried to actually implement an alternative before, and your question made me make an attempt, investing some weekend hours in this idea.
Sure, the problem was more complex than I thought (for example, just the integer formatting routine is 200+ lines), but I think that this approach (dynamic format strings) is more usable.
You can download my experiment from this link (it's just a .h file) and a test program from this link (test is probably not the correct term, I used it just to see if I was able to compile).
The following is an example
#include "format.h"
#include <iostream>
using format::FormatString;
using format::FormatDict;
int main()
{
    std::cout << FormatString("The answer is %{x}") % FormatDict()("x", 42);
    return 0;
}
It differs from the boost.format approach because it uses named parameters and because the format string and format dictionary are meant to be built separately (and, for example, passed around). Also, I think that formatting options should be part of the string (like printf) and not in the code.
FormatDict uses a trick for keeping the syntax reasonable:
FormatDict fd;
fd("x", 12)
("y", 3.141592654)
("z", "A string");
FormatString is instead just parsed from a const std::string& (I decided to preparse format strings but a slower but probably acceptable approach would be just passing the string and reparsing it each time).
The formatting can be extended for user defined types by specializing a conversion function template; for example
struct P2d
{
    int x, y;
    P2d(int x, int y)
        : x(x), y(y)
    {
    }
};
namespace format {
    template<>
    std::string toString<P2d>(const P2d& p, const std::string& parms)
    {
        return FormatString("P2d(%{x}; %{y})") % FormatDict()
            ("x", p.x)
            ("y", p.y);
    }
}
after that a P2d instance can be simply placed in a formatting dictionary.
Also it's possible to pass parameters to a formatting function by placing them between % and {.
For now I only implemented an integer formatting specialization that supports:
- Fixed size with left/right/center alignment
- Custom filling char
- Generic base (2-36), lower or uppercase
- Digit separator (with both custom char and count)
- Overflow char
- Sign display
I've also added some shortcuts for common cases, for example
"%08x{hexdata}"
is a hex number with 8 digits padded with '0's.
"%026/2,8:{bindata}"
is a 24-bit binary number (as required by "/2") with digit separator ":" every 8 bits (as required by ",8:").
Note that the code is just an idea, and for example for now I just prevented copies when probably it's reasonable to allow storing both format strings and dictionaries (for dictionaries it's however important to give the ability to avoid copying an object just because it needs to be added to a FormatDict, and while IMO this is possible it's also something that raises non-trivial problems about lifetimes).
UPDATE
I've made a few changes to the initial approach:
Format strings can now be copied
Formatting for custom types is done using template classes instead of functions (this allows partial specialization)
I've added a formatter for sequences (two iterators). Syntax is still crude.
I've created a github project for it, with boost licensing.
The answer appears to be, no, there is not a C++ library that does this, and C++ programmers apparently do not even see the need for one, based on the comments I have received. I will have to write my own yet again.
Well I'll add my own answer as well, not that I know (or have coded) such a library, but to answer to the "keep the memory allocation down" bit.
As always I can envision some kind of speed / memory trade-off.
On the one hand, you can parse "Just In Time":
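class Formater:
    def __init__(self, format):
        self._string = format

    def compute(self, context):
        # context: mapping from variable name to value.
        # __contains, __extract and __replace are left unspecified in this sketch.
        for k, v in context.items():
            while self.__contains(k):
                left, variable, right = self.__extract(k)
                self._string = left + self.__replace(variable, v) + right
        return self._string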
This way you don't keep a "parsed" structure at hand, and hopefully most of the time you'll just insert the new data in place (unlike Python, C++ strings are not immutable).
However it's far from being efficient...
On the other hand, you can build a fully constructed tree representing the parsed format. You will have several classes like: Constant, String, Integer, Real, etc... and probably some subclasses / decorators as well for the formatting itself.
I think, however, that the most efficient approach would be some kind of mix of the two:
- explode the format string into a list of Constant and Variable items
- index the variables in another structure (a hash table with open addressing would do nicely, or something akin to Loki::AssocVector).
There you are: you're done with only 2 dynamically allocated arrays (basically). If you want to allow a same key to be repeated multiple times, simply use a std::vector<size_t> as a value of the index: good implementations should not allocate any memory dynamically for small sized vectors (VC++ 2010 doesn't for less than 16 bytes worth of data).
When evaluating the context itself, look up the instances. You then parse the formatter "just in time", check it against the current type of the value with which to replace it, and process the format.
Pros and cons:
- Just In Time: you scan the string again and again
- One Parse: requires a lot of dedicated classes, possibly many allocations, but the format is validated on input. Like Boost it may be reused.
- Mix: more efficient, especially if you don't replace some values (allow some kind of "null" value), but delaying the parsing of the format delays the reporting of errors.
Personally I would go for the One Parse scheme, trying to keep the allocations down using boost::variant and the Strategy Pattern as much as I could.
Given that Python itself is written in C and that formatting is such a commonly used feature, you might be able (ignoring copyright issues) to rip the relevant code from the Python interpreter and port it to use STL maps rather than Python's native dicts.
I've written a library for this purpose; check it out on GitHub.
Contributions are welcome.