How to convert an SML/NJ HTML4 representation to a string

When using the SML/NJ Library's HTML4 library, how do I convert the Standard ML representation of an HTML4 document into a string?
For example, if I have the HTML representation below, what function can I use to get a string similar to <html><head><title>Example</title></head><body><h1>Hello!</h1></body></html>?
(* CM.make "$/html4-lib.cm"; *)
open HTML4;
val myHTML = HTML {
    version = NONE,
    head = [Head_TITLE ([], [PCDATA "Example"])],
    content = BodyOrFrameset_BODY (BODY ([], [
        BlockOrScript_BLOCK (H1 ([], [CDATA [PCDATA "Hello!"]]))]))
  };
(SML/NJ version: 110.99.2)

According to the SML/NJ bug tracker, the following function can be used to convert an HTML4.html value to a string:
fun toString html =
  let
    val buf = CharBuffer.new 1024
  in
    HTML4Print.prHTML {
        putc = fn c => CharBuffer.add1 (buf, c),
        puts = fn s => CharBuffer.addVec (buf, s)
      } html;
    CharBuffer.contents buf
  end
To use HTML4Print.prHTML in the SML/NJ REPL, start the REPL with sml '$/html4-lib.cm', or enter CM.make "$/html4-lib.cm"; after starting it.
The function has the type val toString = fn : HTML4.html -> CharBuffer.vector. CharBuffer is an extension to the Basis Library (reference: Basis Library proposal 2018-001, Addition of monomorphic buffers). CharBuffer.vector is the same type as CharVector.vector, which is the same type as String.string, i.e. plain string.
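If CharBuffer is not available (it is a fairly recent Basis addition, so older SML/NJ releases may lack it), the same idea can be sketched with only the standard Basis: collect the emitted chunks in a list and join them with String.concat. This is not the bug-tracker code; collectOutput is a hypothetical helper name, and the demo uses a stand-in printer so the snippet runs without html4-lib.cm.

```sml
(* Hypothetical helper: run any printer that takes a {putc, puts} record
   and return everything it emitted as one string. Uses only the standard
   Basis Library (no CharBuffer). *)
fun collectOutput (printer : {putc : char -> unit, puts : string -> unit} -> unit) : string =
  let
    (* Chunks are consed onto the front, then reversed once at the end. *)
    val chunks : string list ref = ref []
    fun puts s = chunks := s :: !chunks
    fun putc c = puts (String.str c)
  in
    printer {putc = putc, puts = puts};
    String.concat (List.rev (!chunks))
  end

(* With html4-lib.cm loaded, this should behave like the CharBuffer version:
     fun toString html = collectOutput (fn pr => HTML4Print.prHTML pr html)
   Stand-in printer so the snippet runs without the HTML4 library: *)
val demo = collectOutput (fn {putc, puts} => (puts "<H1>"; putc #"x"; puts "</H1>"))
```

Here demo evaluates to "<H1>x</H1>". Building a list and concatenating once avoids repeated string appends, which would copy the accumulated text on every call.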

It seems you could use the HTML4Print structure (which appears in the export list in the CM file):
$ sml '$/html4-lib.cm'
Standard ML of New Jersey (64-bit) v110.99.2 [built: Thu Sep 23 13:44:44 2021]
[library $/html4-lib.cm is stable]
- open HTML4Print;
[autoloading]
[library $SMLNJ-LIB/Util/smlnj-lib.cm is stable]
[autoloading done]
opening HTML4Print
val prHTML : {putc:char -> unit, puts:string -> unit} -> HTML4.html -> unit
val prBODY : {putc:char -> unit, puts:string -> unit} -> HTML4.body -> unit
So, with your value, it produces:
- HTML4Print.prHTML { putc = print o String.str, puts = print } myHTML;
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<HTML>
<HEAD>
<TITLE>
Example
</TITLE>
</HEAD>
<BODY>
<H1>Hello!</H1>
</BODY>
</HTML>
val it = () : unit
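The {putc, puts} record can point anywhere, not just at print. As a hedged sketch, the hypothetical helper printToFile below sends a printer's output to a file through TextIO; it only assumes a printer of the shape HTML4Print.prHTML has in the transcript above.

```sml
(* Hypothetical helper: run a {putc, puts}-style printer against a file
   instead of stdout, using TextIO from the standard Basis Library. *)
fun printToFile (path : string)
                (printer : {putc : char -> unit, puts : string -> unit} -> unit) : unit =
  let
    val out = TextIO.openOut path
  in
    (printer {putc = fn c => TextIO.output1 (out, c),
              puts = fn s => TextIO.output (out, s)};
     TextIO.closeOut out)
    (* Close the file even if the printer raises, then re-raise. *)
    handle e => (TextIO.closeOut out; raise e)
  end

(* With html4-lib.cm loaded (the file name is just an example):
     printToFile "example.html" (fn pr => HTML4Print.prHTML pr myHTML) *)
```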

Related

why do some functions in the List structure require the "List" prefix and some do not?

(I am using SML/NJ)
The List structure (http://sml-family.org/Basis/list.html) includes
@, hd, tl, null, concat, etc.
Some of them are available without a prefix: @, hd, tl, [], concat.
But others, such as exists and nth, require the List prefix; see below:
Standard ML of New Jersey v110.79 [built: Tue Aug 8 23:21:20 2017]
- op @;
val it = fn : 'a list * 'a list -> 'a list
- concat;
val it = fn : string list -> string
- nth;
stdIn:3.1-3.4 Error: unbound variable or constructor: nth
- exists;
stdIn:1.2-2.1 Error: unbound variable or constructor: exists
- List.nth;
[autoloading]
[library $SMLNJ-BASIS/basis.cm is stable]
[library $SMLNJ-BASIS/(basis.cm):basis-common.cm is stable]
[autoloading done]
val it = fn : 'a list * int -> 'a
- List.exists;
val it = fn : ('a -> bool) -> 'a list -> bool
Why? I tried to find an answer in the "Definition of Standard ML (1997)", but I could not find anything related to this.
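Some names are available unqualified because they are also bound in the top-level environment of the SML Basis Library, including the ones you listed. The Definition itself only specifies a small initial basis and leaves the library to the separate Basis Library specification, which is likely why you could not find this there. The Basis Library documentation of the top-level environment gives the complete list. Note that the unqualified concat is String.concat, not List.concat, as the type string list -> string in your transcript shows.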
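A short illustration of the difference. The aliases exists and nth below are ordinary local bindings chosen for this example; the Basis does not define them at top level.

```sml
(* hd is bound both at top level and in List; both names refer to the
   same function. *)
val sameHd = hd [1, 2, 3] = List.hd [1, 2, 3]

(* Unqualified concat is String.concat (string list -> string);
   the list-of-lists version has to be written List.concat. *)
val joined = concat ["ab", "cd"]
val flat = List.concat [[1], [2, 3]]

(* exists and nth are only in List. Local aliases (names chosen for this
   example) make them usable unqualified without opening all of List. *)
val exists = List.exists
val nth = List.nth
val hasEven = exists (fn x => x mod 2 = 0) [1, 3, 4]
val third = nth (["a", "b", "c"], 2)
```

Binding just the names you need, rather than writing open List, avoids silently shadowing top-level names such as concat with List's versions.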

SML: Error: non-constructor applied to argument in pattern: -

I'm writing this function for a MOOC. Its job is to remove a string from the list and return the list without the string as a SOME, or return NONE if the string is not there.
I wrote the code below but whenever I try to run it I get the following error: Error: non-constructor applied to argument in pattern: -.
exception NotFound
fun all_except_option (str : string, strs : string list) =
  let
    fun remove_str (strs : string list) =
      case strs of
          [] => raise NotFound
        | str' :: strs' => if same_string (str, str') then strs' else str' :: remove_str strs'
  in
    SOME (remove_str strs) handle NotFound => NONE
  end
And here's one test to run it:
val test01-01 = all_except_option ("string", ["string"]) = SOME []
Edit: I forgot to include the same_string function that was provided to us to simplify types:
fun same_string (s1 : string, s2 : string) =
  s1 = s2
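I figured out the problem: SML doesn't allow hyphens in alphanumeric identifiers, like the one I had in the test:
val test01-01 = all_except_option ("string", ["string"]) = SOME []
SML reads test01-01 as test01 - 01, so the left-hand side of the val becomes a pattern applying the infix operator -, which is not a constructor; hence the error. I changed it to an underscore and now it works.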
val test01_01 = all_except_option ("string", ["string"]) = SOME []
Since you've already solved this task, here's a way to write it without using exceptions:
fun all_except_option (_, []) = NONE
  | all_except_option (t, s :: ss) =
      if s = t
      then SOME ss (* don't include s in result, and don't recurse further *)
      else case all_except_option (t, ss) of
               SOME ss' => SOME (s :: ss')
             | NONE => NONE
Having a recursive function return t option rather than t makes it more difficult to work with, since after every recursive call you must check whether it returned SOME ... or NONE. This can mean a lot of case ... of ... expressions!
These checks can be abstracted away with the library function Option.map, which is defined in the Basis Library and is equivalent to:
fun (*Option.*)map f opt =
  case opt of
      SOME v => SOME (f v)
    | NONE => NONE
This bit resembles the case ... of ... in all_except_option; rewriting it would look like:
fun all_except_option (_, []) = NONE
  | all_except_option (t, s :: ss) =
      if s = t
      then SOME ss (* don't include s in result, and don't recurse further *)
      else Option.map (fn ss' => s :: ss') (all_except_option (t, ss))
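A few sample calls for the Option.map version, restated so the snippet runs on its own. Only test01_01 comes from the question; the other test names are made up for illustration. Because this version compares with = instead of same_string, its type is ''a * ''a list -> ''a list option, so it also works on other equality types such as int.

```sml
(* The Option.map version from the answer, restated so this runs alone. *)
fun all_except_option (_, []) = NONE
  | all_except_option (t, s :: ss) =
      if s = t
      then SOME ss
      else Option.map (fn ss' => s :: ss') (all_except_option (t, ss))

(* test01_01 is the asker's test; the others are extra illustrative cases. *)
val test01_01 = all_except_option ("string", ["string"]) = SOME []
val test01_02 = all_except_option ("b", ["a", "b", "c"]) = SOME ["a", "c"]
val test01_03 = all_except_option ("z", ["a", "b"]) = NONE
(* Only the first match is removed: the function stops at the first hit. *)
val test01_04 = all_except_option ("a", ["a", "b", "a"]) = SOME ["b", "a"]
(* Comparing with = makes the function polymorphic over equality types. *)
val test01_05 = all_except_option (2, [1, 2, 3]) = SOME [1, 3]
```

Each test value should be true.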

What do questions marks mean in Standard ML types?

For instance:
vagrant@precise32:/vagrant$ rlwrap sml
Standard ML of New Jersey v110.76 [built: Mon May 12 17:11:57 2014]
- TextIO.StreamIO.inputLine ;
[autoloading]
[library $SMLNJ-BASIS/basis.cm is stable]
[autoloading done]
val it = fn : ?.TextIO.instream -> (string * ?.TextIO.instream) option
- val s = TextIO.openIn "README.md" ;
val s = - : TextIO.instream
- TextIO.StreamIO.inputLine s ;
stdIn:3.1-3.28 Error: operator and operand don't agree [tycon mismatch]
operator domain: ?.TextIO.instream
operand: TextIO.instream
in expression:
TextIO.StreamIO.inputLine s
-
I know that dummy type variables created due to the value restriction will have question marks in them, e.g.
- [] @ [];
stdIn:17.1-17.8 Warning: type vars not generalized because of
value restriction are instantiated to dummy types (X1,X2,...)
val it = [] : ?.X1 list
... but this doesn't apply to the example above as the value restriction isn't involved.
In these lecture notes, I found the following comment, on page 23:
In fact, as indicated by the question marks ? in the error
message, it now has a type that cannot even be named anymore,
since the new but identically named definition of mylist shadows
it.
But this is referring to a type checking error, and in any case it shouldn't apply to my TextIO.StreamIO example, as nothing is being shadowed.
edited to add
So I figured out my actual problem, which was how to get a ?.TextIO.instream from a filename, but I still don't really know what the question marks are about:
vagrant@precise32:/vagrant$ rlwrap sml
Standard ML of New Jersey v110.76 [built: Mon May 12 17:11:57 2014]
- val fromFile : string -> TextIO.StreamIO.instream =
= TextIO.getInstream o TextIO.openIn ;
[autoloading]
[library $SMLNJ-BASIS/basis.cm is stable]
[autoloading done]
val fromFile = fn : string -> ?.TextIO.instream
- TextIO.getInstream ;
val it = fn : TextIO.instream -> ?.TextIO.instream
- TextIO.StreamIO.input1 (fromFile "README.md") ;
val it = SOME (#"#",-) : (TextIO.StreamIO.elem * ?.TextIO.instream) option
-
second edit
I discovered that Poly/ML doesn't use these question marks when printing types, so I assume this is something specific to SML/NJ:
Poly/ML 5.5.1 Release
> TextIO.StreamIO.inputLine ;
val it = fn:
TextIO.StreamIO.instream -> (string * TextIO.StreamIO.instream) option
> val fromFile : string -> TextIO.StreamIO.instream =
TextIO.getInstream o TextIO.openIn ;
# val fromFile = fn: string -> TextIO.StreamIO.instream
> TextIO.getInstream ;
val it = fn: TextIO.instream -> TextIO.StreamIO.instream
>
I'd still be curious whether anyone knows under what circumstances SML/NJ prints these question marks and what the story behind them is...
I believe this is specific to SML/NJ, and the question marks are used when printing a type that does not have an accessible name (or probably, when the name that SML/NJ comes up with for it is not accessible, since SML/NJ seems to just use some heuristic for printing types at the REPL). The value restriction is one way that such types arise (there, SML/NJ chooses to unify the type with some useless new type). Here's another simple interaction that demonstrates a different way, where the only name for a type (S.t) is shadowed by a new declaration of S:
- structure S = struct datatype t = X end;
structure S :
sig
datatype t = X
end
- val y = S.X;
val y = X : S.t
- structure S = struct end;
structure S : sig end
- y;
val it = X : ?.S.t
I think that in your example, there are multiple substructures called TextIO in the basis, and the toplevel TextIO structure is probably shadowing the one you're accessing. SML/NJ may also just be choosing a bad name for the type and not realizing that there's a sharing declaration or something that makes it possible to write the type down.
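Not part of the answer, but since the asker's underlying goal was reading a file through TextIO.StreamIO, here is a hedged sketch of doing that functionally. readLines is a hypothetical helper; it uses TextIO.openIn, TextIO.getInstream and TextIO.StreamIO.inputLine as they appear in the transcripts above, plus TextIO.StreamIO.closeIn. Writing the type as TextIO.StreamIO.instream in source code works even though the REPL may print it with a ?. prefix.

```sml
(* Read every line of a file through the functional TextIO.StreamIO
   interface. TextIO.getInstream turns the imperative stream returned by
   TextIO.openIn into a TextIO.StreamIO.instream. *)
fun readLines (path : string) : string list =
  let
    fun loop (strm : TextIO.StreamIO.instream, acc : string list) =
      case TextIO.StreamIO.inputLine strm of
          NONE => (TextIO.StreamIO.closeIn strm; List.rev acc)
        | SOME (line, rest) => loop (rest, line :: acc)
  in
    loop (TextIO.getInstream (TextIO.openIn path), [])
  end
```

Each returned line keeps its trailing newline, as inputLine returns it. Because stream values are immutable, an older strm can be read again from the same position, which is the main reason to use StreamIO over plain TextIO.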

Passing command-line arguments to an SML script

How do I go about passing command-line arguments to an SML script? I'm aware that there is a CommandLine.arguments() function of the right type (unit -> string list), but invoking the interpreter like so:
$ sml script_name.sml an_argument another_one
doesn't give me anything. Pointers?
Try this.
(* arg.sml *)
val args = CommandLine.arguments()
fun sum l = foldr op+ 0 (map (valOf o Int.fromString) l)
val _ = print ("size: " ^ Int.toString (length args) ^ "\n")
val _ = print ("sum: " ^ Int.toString (sum args) ^ "\n")
val _ = OS.Process.exit(OS.Process.success)
The exit is important. Without it, sml goes on to process the remaining arguments as if they were files, and since they don't have an .sml extension, it treats them as compiler extensions and prints a bunch of warnings.
$ sml arg.sml 1 2 3
Standard ML of New Jersey v110.74 [built: Thu Jan 10 18:06:35 2013]
[opening arg.sml]
[autoloading]
[library $SMLNJ-BASIS/basis.cm is stable]
[autoloading done]
size: 3
sum: 6
val args = ["1","2","3"] : string list
val sum = fn : string list -> int
In programs compiled with MLton, command-line arguments are straightforward:
$ mlton arg.sml
$ ./arg a b c
size: 3
sum: 6
In SML/NJ it's more of a hassle to create a standalone executable.
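For SML/NJ, one route to a standalone program is SMLofNJ.exportFn, which saves a heap image whose entry point receives the program name and the argument list. The sketch below is a variant of arg.sml under that assumption, not the answerer's code: main and sumArgs are hypothetical names, and sumArgs uses List.mapPartial so that arguments Int.fromString rejects are skipped instead of making valOf raise Option. The exportFn call stays in a comment because it ends the SML session.

```sml
(* Sum the arguments that Int.fromString can parse; arguments for which
   it returns NONE are skipped rather than raising Option. *)
fun sumArgs (args : string list) : int =
  foldl op+ 0 (List.mapPartial Int.fromString args)

(* Entry point with the shape SMLofNJ.exportFn expects:
   (program name, argument list) -> OS.Process.status. *)
fun main (_ : string, args : string list) : OS.Process.status =
  ( print ("size: " ^ Int.toString (length args) ^ "\n")
  ; print ("sum: " ^ Int.toString (sumArgs args) ^ "\n")
  ; OS.Process.success )

(* Exporting a heap image (not run here, since exportFn ends the session):
     val _ = SMLofNJ.exportFn ("arg", main)
   The heap file gets a platform suffix (e.g. arg.amd64-linux) and can be
   run with something like: sml @SMLload=arg.amd64-linux 1 2 3 *)
```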

Loading module with dynlink re-initialises top-level values

I have a problem where I have a global hashtable, and then I load a .cma file with Dynlink, which registers a function in the hashtable.
However, the behaviour I seem to see is that when the module is dynamically linked, all the global bindings get re-initialised, so my hashtable is empty.
E.g.:
Table.extensions : (string, string -> string) Hashtbl.t
Extensions.load : unit -> unit (* loads the specified .cma files *)
Extensions.register : string -> (string -> string) -> unit
(* adds entry to Table.extensions, prints name of extension registered *)
Main:
let () =
  Extensions.load ();
  Hashtbl.iter (fun x _ -> print_endline x) Table.extensions;
  Printf.printf "%d extensions loaded\n" (Hashtbl.length Table.extensions)
My program loads one .cma file, so it should print:
Registered extension 'test'
test
1 extensions loaded
Instead I get:
Registered extension 'test'
0 extensions loaded
I've been fighting this for several hours now; no matter how I refactor my code, I get no closer to a working solution.
EDIT: Extensions.load:
Dynlink.allow_unsafe_modules true;;
let load () =
  try
    let exts = Sys.readdir "exts" in
    Array.iter begin fun name ->
      try
        Dynlink.loadfile (Filename.concat "exts" name);
        Printf.printf "Loaded %s\n" name;
      with
      | Dynlink.Error error -> print_endline (Dynlink.error_message error)
      | exn -> print_endline (Printexc.to_string exn)
    end exts
  with _ -> ()
@ygrek, you were right: there were two instances.
The solution was to build/load just the .cmo, not a .cma.