Polymorphic coercion to Word64 in Standard ML - sml

I would like to create a polymorphic function that converts 8-, 16-, and 32-bit words into a 64-bit word. How can I do it?
UPDATE1
In the basis library all word structures have functions toLarge and fromLarge to convert to/from the LargeWord, which is as far as I understand just a synonym for Word32.
UPDATE2
According to the spec, word size must be power of two, but in SML/NJ I have
Standard ML of New Jersey v110.84 [built: Mon Dec 03 10:23:14 2018]
- Word.wordSize;
val it = 31 : int
- Word32.wordSize;
val it = 32 : int
- Word.toLarge;
val it = fn : word -> Word32.word
- LargeWord.wordSize;
val it = 32 : int
while in PolyML
Poly/ML 5.7.1 Release
> Word.wordSize;
val it = 63: int
> Word64.wordSize;
val it = 64: int
> Word.toLarge;
val it = fn: word -> ?.word
> LargeWord.wordSize;
val it = 64: int
How is that? Why is Word.wordSize not a power of two? And why does the Word representation differ between these SML implementations?
UPDATE3
Actually, I want to be able to "promote" smaller words into larger ones using the (<<) operator, but I cannot figure out how to do it.
UPDATE4
It seems that Word and LargeWord depend on the architecture and represent a machine word. Because SML/NJ does not support 64-bit arch, it has different word size.

You are right in that the types Word8.word, Word32.word and Word64.word only share the common type 'a, which cannot generally be converted to a Word64.word via parametric polymorphism.
The exact function you are looking for could (and should) have been:
Word<N>.toLargeWord : word -> LargeWord.word
Unfortunately, as you have discovered, it appears that LargeWord.word is an alias for Word32 and not Word64 in SML/NJ. It doesn't look like the Basis specifies that LargeWord.word must do this, but that is the reality. In Poly/ML it appears that LargeWord.wordSize is 126 and in Moscow ML there is no LargeWord structure! Sigh. But at least in Poly/ML it can contain a Word64.word.
In light of this, I'd suggest one of two things:
You can use ad-hoc polymorphism. All three modules share the signature WORD, and this signature contains, among other things:
val toLargeInt : word -> LargeInt.int
So a hack may be to convert to a LargeInt.int and then down to a Word64.word. You can build a functor that takes a module with the WORD signature and returns a structure that contains the conversion to Word64.
functor ToWord64 (WordN : WORD) = struct
fun toWord64 (n : WordN.word) : Word64.word =
Word64.fromLargeInt (WordN.toLargeInt n)
end
You can then instantiate this functor for each of your cases:
structure Word8ToWord64 = ToWord64(Word8)
val myWord64 = Word8ToWord64.toWord64 myWord8
This is a bit messy and the hierarchy of existing modules that includes LargeWord was meant to avoid it.
Alternatively, if you'd rather avoid this extra functor and arbitrary-precision integers as an intermediate representation, since this is both inefficient and unnecessary, you could change your standard library's LargeWord :> WORD so that it assumes the use of Word64 instead.
This could have been avoided if the standard library had been written in a functorial style with LargeWord having/being a parameter fixed somewhere where you could override it. But it would also make the standard library more complex.
With regards to ML module system design, I think the choice of placing toLargeWord in the WORD signature is one approach which is very convenient because you don't need a lot of functor instances, but, as you have witnessed, not very extensible. You can see the different philosophies applied in Jane Street's OCaml libraries Base and Core, where in Core you have e.g. Char.Map.t (convenient) and in Base you have Map.M(Char).t (extensible).
I've assumed that your words are all unsigned.

I concluded that there's no way to do this in a polymorphic way. Instead, one has to use the appropriate toLarge/fromLarge functions, like this:
fun toWord64 (w : Word8.word) : Word64.word = Word64.fromLarge (Word8.toLarge w)
I could use toLarge directly, but I want to ensure that the resulting value is a Word64. This compiles with SML/NJ, but calling the function causes a runtime exception.
By the way, I did not find a way to extract a Word64 from a byte array in 32-bit SML/NJ.
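The promotion asked about in the question (widen each small word, then combine with <<) comes down to shift-and-or arithmetic. Here is a minimal sketch of that arithmetic, written in Python rather than SML purely for illustration. The helper name word64_from_bytes and the big-endian byte order are assumptions, not anything from the Basis library.

```python
MASK64 = (1 << 64) - 1  # keep results within 64 bits, like Word64 arithmetic


def word64_from_bytes(data, offset=0):
    """Assemble a big-endian 64-bit word from eight bytes of `data` (hypothetical helper)."""
    if len(data) - offset < 8:
        raise ValueError("need eight bytes")
    result = 0
    for byte in data[offset:offset + 8]:
        # Widen the 8-bit value, shift the accumulator left, then or the byte in.
        result = ((result << 8) | byte) & MASK64
    return result


raw = bytes([0x01, 0x23, 0x45, 0x67, 0x89, 0xAB, 0xCD, 0xEF])
print(hex(word64_from_bytes(raw)))  # prints 0x123456789abcdef
```

In SML the same loop would use the << and orb operations of Word64 on each byte after widening it, by whichever route (toLarge or toLargeInt) the implementation at hand supports.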

Related

Is there a function that can make a string representation of any type?

I was desperately looking for the last hour for a method in the OCaml Library which converts an 'a to a string:
'a -> string
Is there something in the library which I just haven't found? Or do I have to do it differently (writing everything on my own)?
It is not possible to write a printing function show of type 'a -> string in OCaml.
Indeed, types are erased after compilation in OCaml. (They are in fact erased after typechecking, which is one of the early phases of the compilation pipeline.)
Consequently, a function of type 'a -> _ can either:
ignore its argument:
let f _ = "<something>"
peek at the memory representation of a value
let f x = if Obj.is_block x then "<block>" else "<immediate>"
Even peeking at the memory representation of a value has limited utility since many different types will share the same memory representation.
If you want to print a type, you need to create a printer for this type. You can either do this by hand using the Fmt library (or the Format module in the standard library)
type tree = Leaf of int | Node of { left:tree; right: tree }
let rec pp ppf tree = match tree with
| Leaf d -> Fmt.pf ppf "Leaf %d" d
| Node n -> Fmt.pf ppf "Node { left:%a; right:%a}" pp n.left pp n.right
or by using a ppx (a small preprocessing extension for OCaml) like https://github.com/ocaml-ppx/ppx_deriving.
type tree = Leaf of int | Node of { left:tree; right: tree } [@@deriving show]
If you just want a quick hacky solution, you can use dump from the Batteries library. It doesn't work for all cases, but it does work for primitives, lists, etc. It accesses the underlying raw memory representation, hence is able to overcome (to some extent) the difficulties mentioned in the other answers.
You can use it like this (after installing it via opam install batteries):
# #require "batteries";;
# Batteries.dump 1;;
- : string = "1"
# Batteries.dump 1.2;;
- : string = "1.2"
# Batteries.dump [1;2;3];;
- : string = "[1; 2; 3]"
If you want a more "proper" solution, use ppx_deriving as recommended by @octachron. It is much more reliable/maintainable/customizable.
What you are looking for is a meaningful function of type 'a. 'a -> string, with parametric polymorphism (i.e. a single function that can operate the same for all possible types 'a, even those that didn’t exist when the function was created). This is not possible in OCaml. Here are explanations depending on your programming background.
Coming from Haskell
If you were expecting such a function because you are familiar with the Haskell function show, then notice that its type is actually show :: Show a => a -> String. It uses an instance of the typeclass Show a, which is implicitly inserted by the compiler at call sites. This is not parametric polymorphism, this is ad-hoc polymorphism (show is overloaded, if you want). There is no such feature in OCaml (yet? there are projects for the future of the language, look for “modular implicits” or “modular explicits”).
Coming from OOP
If you were expecting such a function because you are familiar with OO languages in which every value is an object with a method toString, then this is not the case in OCaml. OCaml does not use the object model pervasively, and the run-time representation of OCaml values retains no (or very little) notion of type. I refer you to @octachron’s answer.
Again, toString in OOP is not parametric polymorphism but overloading: there is not a single method toString which is defined for all possible types. Instead there are multiple — possibly very different — implementations of a method of the same name. In some OO languages, programmers try to follow the discipline of implementing a method by that name for every class they define, but it is only a coding practice. One could very well create objects that do not have such a method.
[ Actually, the notions involved in both worlds are pretty similar: Haskell requires an instance of a typeclass Show a providing a function show; OOP requires an object of a class Stringifiable (for instance) providing a method toString. Or, of course, an instance/object of a descendant typeclass/class. ]
Another possibility is to use https://github.com/ocaml-ppx/ppx_deriving, which will create a function of type Path.To.My.Super.Type.t -> string that you can then use with your value. You still need to track the path of the type by hand, but it is better than nothing.
Another project provides a feature similar to Batteries: https://github.com/reasonml/reason-native/blob/master/src/console/README.md (I haven't tested Batteries, so I can't give an opinion on it). Both have the same limitation: they introspect the runtime encoding, so they can't produce something really usable. I think it was designed with Windows and the browser in mind, so if cross-platform support is required I would test this one first (unless Batteries is already pulled in). Even though the source code is in Reason, you can use it with the same API from OCaml.

Is it possible to suppress the top level in a reference to a derived type?

This is hard to describe in words but an example should make it clear. Let's say I have a variable of a derived type, with the following components.
x%length
x%width
Is there any automatic way to refer to these without the top level? In other words to refer to them as simply
length
width
Of course, I could first do
length => x%length
width => x%width
for ALL individual components of the derived type. But my use case involves thousands of variables, so I'd prefer not to do it that way.
As an example from another language, python will essentially allow this suppression with:
from x import *
There is no such functionality in Fortran as far as I know, at least in the implementations that I have at hand. Besides that, the objective of my post is to make some other things clear.
The Python from x import * is the equivalent of use x in Fortran. I am not very Pythonic, but I do not think that you can import the members of a class directly. So, to my limited knowledge, that works as long as x is a Python module, not a Python class. Likewise, use x works as long as x is a Fortran module.
One programming language that I know of that implements the feature you are after is Pascal. Its handy with construct allows you to do that.
with x do
begin
length ....
width ....
end
Indeed, it is very helpful in that it allows you to strip a part of the object name and get directly to fields. I loved it when I was using pascal, but it's been a long time.
Delphi certainly allows that too.
How about the Fortran 2003 associate construct? This will, in a sense, manage for you the pointer assignments that you listed:
Program test
Type :: t
Integer :: length
Integer :: width
End Type
Type (t) :: x = t(42, 43)
Associate (length=>x%length, width=>x%width)
Print *, length, width
End Associate
End Program
Quoting from Fortran 2003 (e.g., at http://www.j3-fortran.org/doc/year/04/04-007.pdf): "The ASSOCIATE construct associates named entities with expressions or variables during the execution of its block."
The December 2015 ACM Fortran Forum "Compiler Support" article lists the associate construct as being fully supported by Cray, IBM, Intel and NAG and partially supported by gfortran.
I don't think there is any way to simplify this though if you have many type components to alias in this way.

What's the OCaml naming convention for "constructors"?

An OCaml module usually contains at least one abstract type whose idiomatic name is t. Also, there's usually a function that constructs a value of that type.
What is the usual / idiomatic name for this?
The StdLib is not consistent here. For example:
There's Array.make and a deprecated function Array.create. So that function should be named make?
On the other hand, there's Buffer.create but not Buffer.make. So that function should be named create?
Some people find that this way of module design makes OCaml programming easier, but it is not a mandatory OCaml programming style, and I do not think there is an official name for it. I personally call it "1-data-type-per-1-module" style. (I wrote a blog post about this but it is in Japanese. I hope some autotranslator gives some useful information to you ...)
Dedicating a module to one data type and fixing the name of that type as t has some benefits:
Nice namespacing
Module names explain about what its type and values are, therefore you do not need to repeat type names inside: Buffer.add_string instead of add_string_to_buffer, and Buffer.create instead of create_buffer. You can also avoid typing the same module names with local module open:
let f () =
let open Buffer in
let b = create 10 in (* instead of Buffer.create *)
add_string b "hello"; (* instead of Buffer.add_string *)
contents b (* instead of Buffer.contents *)
Easy ML functor application
If an ML functor takes an argument module with a data type, we have a convention that the type should be called t. Modules with data type t are easily applied to these functors without renaming of the type.
For Array.create and Array.make, I think this is to follow the distinction of String.create and String.make.
String.create is to create a string with uninitialized contents. The created string contains random bytes.
String.make is to create a string filled with the given char.
For a long time we had Array.create, which creates an array whose contents are filled with the given value. This behavior corresponds to String.make rather than String.create. That's why it is now renamed to Array.make, and Array.create is obsolete.
We cannot have Array.create in OCaml with the same behaviour of String.create. Unlike strings, arrays cannot be created without initialization, since random bytes may not represent a valid OCaml value for the content in general, which leads to a program crash.
Following this, personally I use X.create for a function to create an X.t which does not require an initial value to fill it. I use X.make if it needs something to fill.
I had the same question when I picked up the language a long time ago. I never use make and I think few people do.
Nowadays I use create for heavy, often imperative or stateful values, e.g. a Unicode text segmenter. And I use v for functional, lighter values in DSL/combinator-based settings, e.g. the various constructors in Gg, such as those for 2D vectors or colors.
As camlspotter mentions in his answer the standard library distinguishes make and create for values that need an initial value to fill in. I think it's better to be regular here and always use create regardless. If your values support an optional initial fill value, add an optional argument to create rather than multiply the API entry points.

User-defined type for strings which start with a letter

I want to have a user-defined type in OCaml which represents strings which start with an English letter and afterwards can have letters or digits. Is it possible to define such a custom type?
Jeffrey Scofield is right: there is no way in OCaml to define a type that would be the subset of strings verifying a given condition. You might however simulate that to some extent with a module and abstract or private data type, as in:
module Ident : sig
type t = private string
val create: string -> t
val (^): t -> t -> t
(* declare, and define below other functions as needed *)
end = struct
type t = string
let create s = (* do some check *) s
let (^) s1 s2 = create (s1 ^ s2)
end;;
Of course, the create function should check that the first char of s is a letter and that the other ones are letters or digits, and raise an exception if this is not the case, but this is left as an exercise. This way, you know that any s of type Ident.t respects the conditions checked in create: by making the type synonym private in the signature, you ensure that you must go through one of the functions of Ident to create such a value. Conversely, (s:>string) is recognized as a string, hence you can still use all built-in functions over it (but you'll get back a string, not an Ident.t).
Note however that there is a particular issue with strings: they are mutable (although that is bound to change in the upcoming 4.02 version), so you can alter an element of an Ident.t afterwards:
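The validity check described for create does not depend on the language. As a rough illustration, here it is sketched in Python rather than OCaml. The names is_valid_ident and create are hypothetical and only mirror the role of Ident.create; "letter" is assumed to mean ASCII a-z and A-Z.

```python
import string

LETTERS = frozenset(string.ascii_letters)
LETTERS_OR_DIGITS = LETTERS | frozenset(string.digits)


def is_valid_ident(s):
    """True if s starts with an English letter followed only by letters or digits."""
    return (
        len(s) > 0
        and s[0] in LETTERS
        and all(c in LETTERS_OR_DIGITS for c in s[1:])
    )


def create(s):
    """Plays the role of Ident.create: return s unchanged, or raise on invalid input."""
    if not is_valid_ident(s):
        raise ValueError("not a valid identifier: {!r}".format(s))
    return s
```

An OCaml version would perform the same two tests inside create, before returning s at the private type.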
let foo = "x0";;
let bar = Ident.create foo;;
foo.[0] <- '5';;
bar;;
will produce
- : Ident.t = "50"
If you restrict yourself to never modifying a string in place (again, this will be the default in the next OCaml version), this cannot happen.
It's a little hard to answer, but I think the most straightforward answer is no. You want the type to be constrained by values, and this isn't something that's possible in OCaml. You need a language with dependent types for that.
You can define an OCaml type that represents such strings, but its values wouldn't also be strings. You couldn't use strings like "a15" as values of the type, or use the built-in ^ operator on them, etc. A value might look like S(Aa, [B1; B5]) (say). This is far too cumbersome to be useful.

Named parameter string formatting in C++

I'm wondering if there is a library like Boost Format, but which supports named parameters rather than positional ones. This is a common idiom in e.g. Python, where you have a context to format strings with that may or may not use all available arguments, e.g.
mouse_state = {}
mouse_state['button'] = 0
mouse_state['x'] = 50
mouse_state['y'] = 30
#...
"You clicked %(button)s at %(x)d,%(y)d." % mouse_state
"Targeting %(x)d, %(y)d." % mouse_state
Are there any libraries that offer the functionality of those last two lines? I would expect it to offer an API something like:
PrintFMap(string format, map<string, string> args);
In Googling I have found many libraries offering variations of positional parameters, but none that support named ones. Ideally the library has few dependencies so I can drop it easily into my code. C++ won't be quite as idiomatic for collecting named arguments, but probably someone out there has thought more about it than me.
Performance is important, in particular I'd like to keep memory allocations down (always tricky in C++), since this may be run on devices without virtual memory. But having even a slow one to start from will probably be faster than writing it from scratch myself.
The fmt library supports named arguments:
print("You clicked {button} at {x},{y}.",
arg("button", "b1"), arg("x", 50), arg("y", 30));
And as a syntactic sugar you can even (ab)use user-defined literals to pass arguments:
print("You clicked {button} at {x},{y}.",
"button"_a="b1", "x"_a=50, "y"_a=30);
For brevity the namespace fmt is omitted in the above examples.
Disclaimer: I'm the author of this library.
I've always been critical of C++ I/O (especially formatting) because in my opinion it is a step backward with respect to C. Formats need to be dynamic, and it makes perfect sense, for example, to load them from an external resource such as a file or a parameter.
I had never actually tried to implement an alternative before, however, and your question prompted me to make an attempt, investing some weekend hours in this idea.
Sure the problem was more complex than I thought (for example just the integer formatting routine is 200+ lines), but I think that this approach (dynamic format strings) is more usable.
You can download my experiment from this link (it's just a .h file) and a test program from this link (test is probably not the correct term, I used it just to see if I was able to compile).
The following is an example
#include "format.h"
#include <iostream>
using format::FormatString;
using format::FormatDict;
int main()
{
std::cout << FormatString("The answer is %{x}") % FormatDict()("x", 42);
return 0;
}
It is different from the boost.format approach because it uses named parameters and because the format string and format dictionary are meant to be built separately (and, for example, passed around). Also, I think that formatting options should be part of the string (like printf) and not in the code.
FormatDict uses a trick for keeping the syntax reasonable:
FormatDict fd;
fd("x", 12)
("y", 3.141592654)
("z", "A string");
FormatString is instead just parsed from a const std::string& (I decided to preparse format strings but a slower but probably acceptable approach would be just passing the string and reparsing it each time).
The formatting can be extended for user defined types by specializing a conversion function template; for example
struct P2d
{
int x, y;
P2d(int x, int y)
: x(x), y(y)
{
}
};
namespace format {
template<>
std::string toString<P2d>(const P2d& p, const std::string& parms)
{
return FormatString("P2d(%{x}; %{y})") % FormatDict()
("x", p.x)
("y", p.y);
}
}
after that a P2d instance can be simply placed in a formatting dictionary.
Also it's possible to pass parameters to a formatting function by placing them between % and {.
For now I only implemented an integer formatting specialization that supports
Fixed size with left/right/center alignment
Custom filling char
Generic base (2-36), lower or uppercase
Digit separator (with both custom char and count)
Overflow char
Sign display
I've also added some shortcuts for common cases, for example
"%08x{hexdata}"
is a hex number with 8 digits, padded with '0's.
"%026/2,8:{bindata}"
is a 24-bit binary number (as required by "/2") with digit separator ":" every 8 bits (as required by ",8:").
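The width of 26 in that last shortcut follows from simple arithmetic: 24 binary digits plus two ':' separators. The small Python sketch below checks this. It is independent of the library described here, and the helper name group_bits is hypothetical.

```python
def group_bits(value, bits=24, group=8, sep=":"):
    """Render value as a zero-padded binary string with sep every `group` digits."""
    digits = format(value, "0{}b".format(bits))
    return sep.join(digits[i:i + group] for i in range(0, len(digits), group))


text = group_bits(0xABCDEF)
print(text)       # prints 10101011:11001101:11101111
print(len(text))  # prints 26
```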
Note that the code is just an idea, and for example for now I just prevented copies when probably it's reasonable to allow storing both format strings and dictionaries (for dictionaries it's however important to give the ability to avoid copying an object just because it needs to be added to a FormatDict, and while IMO this is possible it's also something that raises non-trivial problems about lifetimes).
UPDATE
I've made a few changes to the initial approach:
Format strings can now be copied
Formatting for custom types is done using template classes instead of functions (this allows partial specialization)
I've added a formatter for sequences (two iterators). Syntax is still crude.
I've created a github project for it, with boost licensing.
The answer appears to be, no, there is not a C++ library that does this, and C++ programmers apparently do not even see the need for one, based on the comments I have received. I will have to write my own yet again.
Well I'll add my own answer as well, not that I know (or have coded) such a library, but to answer to the "keep the memory allocation down" bit.
As always I can envision some kind of speed / memory trade-off.
On the one hand, you can parse "Just In Time":
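class Formatter:
    def __init__(self, format):
        self._string = format

    def compute(self, context):
        # Rescan the whole string for every key (placeholder syntax {key} is assumed).
        for k, v in context.items():
            placeholder = "{" + k + "}"
            while placeholder in self._string:
                left, _, right = self._string.partition(placeholder)
                self._string = left + str(v) + right
        return self._string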
This way you don't keep a "parsed" structure at hand, and hopefully most of the time you'll just insert the new data in place (unlike Python, C++ strings are not immutable).
However it's far from being efficient...
On the other hand, you can build a fully constructed tree representing the parsed format. You will have several classes like: Constant, String, Integer, Real, etc... and probably some subclasses / decorators as well for the formatting itself.
I think however that the most efficient approach would be some kind of mix of the two.
explode the format string into a list of Constant, Variable
index the variables in another structure (a hash table with open-addressing would do nicely, or something akin to Loki::AssocVector).
There you are: you're done with only 2 dynamically allocated arrays (basically). If you want to allow a same key to be repeated multiple times, simply use a std::vector<size_t> as a value of the index: good implementations should not allocate any memory dynamically for small sized vectors (VC++ 2010 doesn't for less than 16 bytes worth of data).
When evaluating the context itself, look up the instances. You then parse the formatter "just in time", check it against the current type of the value with which to replace it, and process the format.
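The mixed scheme above can be sketched briefly. This version is in Python rather than C++ and only shows the data layout: one list of parts and one index of variable positions. The class name ParsedFormat and the {name} placeholder syntax are assumptions, and per-variable format options are left out entirely.

```python
import re

# Assumed placeholder syntax: {name}. The thread itself does not fix one.
_PLACEHOLDER = re.compile(r"\{(\w+)\}")


class ParsedFormat:
    """Split a format string once into constants and variable slots."""

    def __init__(self, fmt):
        self.parts = []  # constants, with None marking a variable slot
        self.index = {}  # variable name -> list of positions in self.parts
        pos = 0
        for match in _PLACEHOLDER.finditer(fmt):
            if match.start() > pos:
                self.parts.append(fmt[pos:match.start()])
            # A repeated key simply gets several positions in the index.
            self.index.setdefault(match.group(1), []).append(len(self.parts))
            self.parts.append(None)
            pos = match.end()
        if pos < len(fmt):
            self.parts.append(fmt[pos:])

    def render(self, context):
        """Fill every slot from context; unused context keys are ignored."""
        out = list(self.parts)
        for name, positions in self.index.items():
            text = str(context[name])  # raises KeyError if a variable is missing
            for i in positions:
                out[i] = text
        return "".join(out)


mouse_state = {"button": 0, "x": 50, "y": 30}
print(ParsedFormat("You clicked {button} at {x},{y}.").render(mouse_state))
print(ParsedFormat("Targeting {x}, {y}.").render(mouse_state))
```

The same ParsedFormat object can be rendered against many contexts, so the string is scanned only once. A C++ version along these lines would need the two arrays mentioned above plus whatever per-slot format data it chooses to keep.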
Pros and cons:
- Just In Time: you scan the string again and again
- One Parse: requires a lot of dedicated classes, possibly many allocations, but the format is validated on input. Like Boost it may be reused.
- Mix: more efficient, especially if you don't replace some values (allow some kind of "null" value), but delaying the parsing of the format delays the reporting of errors.
Personally I would go for the One Parse scheme, trying to keep the allocations down using boost::variant and the Strategy Pattern as much as I could.
Given that Python itself is written in C and that formatting is such a commonly used feature, you might be able (ignoring copyright issues) to rip the relevant code from the Python interpreter and port it to use STL maps rather than Python's native dicts.
I've written a library for this purpose; check it out on GitHub.
Contributions are welcome.