How to convert char list to string in OCaml?

I have a char list ['a';'b';'c'].
How do I convert this to the string "abc"?
thanks x

You could create a string whose length equals the length of the list, then fold over the list with a counter, filling the string with the contents of the list. However, since OCaml 4.02 the string type has been moving toward immutability (strings became immutable by default in 4.06), so you should treat strings as an immutable data structure. So let's try another solution. The Buffer module is designed specifically for building strings:
# let buf = Buffer.create 16;;
val buf : Buffer.t = <abstr>
# List.iter (Buffer.add_char buf) ['a'; 'b'; 'c'];;
- : unit = ()
# Buffer.contents buf;;
- : string = "abc"
Or, as a function:
let string_of_chars chars =
  let buf = Buffer.create 16 in
  List.iter (Buffer.add_char buf) chars;
  Buffer.contents buf
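The first idea mentioned above (allocate a string of the right length and fill it position by position) can still be written without mutating a string. Below are two sketches using only the standard library; the names string_of_chars_init and string_of_chars_bytes are hypothetical, chosen for illustration:

```ocaml
(* Sketch 1: String.init builds an immutable string of a known length from a
   function of the index; Array.of_list gives O(1) access to the i-th char. *)
let string_of_chars_init (chars : char list) : string =
  let a = Array.of_list chars in
  String.init (Array.length a) (Array.get a)

(* Sketch 2: allocate mutable Bytes, fill it with List.iteri (which supplies
   the counter), then copy it into an immutable string. *)
let string_of_chars_bytes (chars : char list) : string =
  let b = Bytes.create (List.length chars) in
  List.iteri (fun i c -> Bytes.set b i c) chars;
  Bytes.to_string b
```

String.init never exposes a mutable string, while the Bytes version is the direct translation of the "fill with a counter" idea; Bytes.to_string makes a copy, so the returned string cannot be changed afterwards.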

let cl2s cl = String.concat "" (List.map (String.make 1) cl)

Since OCaml 4.07, you can use sequences to do that easily:
let l = ['a';'b';'c'] in
let s = String.of_seq (List.to_seq l) in
assert ( s = "abc" )

The commonly used Base library also offers Base.String.of_char_list.

Related

How to read lists of numbers from a file using string formats in OCaml

I want to get the list of numbers present in a file in a specific format, but I could not find any format specifier (like %s or %d) for a list of numbers.
My file contains text as follows:
[1;2] [2] 5
[45;37] [9] 33
[3] [2;4] 1000
I tried the following:
value split_input str fmt = Scanf.sscanf str fmt (fun x y z -> (x,y,z));
value rec read_file chin acc fmt =
try let line = input_line chin in
let (a,b,c) = split_input line fmt in
let acc = List.append acc [(a,b,c)] in
read_file chin acc fmt
with
[ End_of_file -> do { close_in chin; acc}
];
value read_list =
let chin = open_in "filepath/filename" in
read_file chin [] "%s %s %d";
The problem is with the format that is specified towards the end. I used the same code for getting data from some other file, where the data was in the format (string * string * int).
To reuse the same code, I would have to read the above text as a string and then split it according to my requirements. My question is: is there a format specifier like %s or %d for a list of integers, so that I get the list directly from the file instead of writing extra code to convert a string to a list?
There is no built-in specifier for lists in Scanf. It is possible to use the %r specifier to delegate parsing to a custom scanner, but Scanf is not really designed for parsing complex formats:
let int_list b = Scanf.bscanf b "[%s@]" (fun s ->
    List.map int_of_string @@ String.split_on_char ';' s
  )
Then with this int_list parser, we can write
let test = Scanf.sscanf "[1;2]@[3;4]" "%r@%r" int_list int_list (@)
and obtain
val test : int list = [1; 2; 3; 4]
as expected. Note, though, that even here it was easier to use String.split_on_char to do the splitting. In general, parsing complicated formats is better done with a regexp library, a parser-combinator library, or a parser generator.
P.S.: you should probably avoid the revised syntax; it has fallen into disuse.
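As a sketch of how this could fit the question's line-by-line reader, here is a version in standard (non-revised) syntax that parses lines like "[1;2] [2] 5" with %r readers. The names parse_line and read_triples are hypothetical, and this int_list variant additionally tolerates empty lists and spaces around the numbers, which the version above does not:

```ocaml
(* Scanf reader for a bracketed, semicolon-separated list such as "[1;2]".
   Same idea as int_list above, but empty fields are dropped so that "[]"
   parses as [] instead of failing in int_of_string. *)
let int_list ib =
  Scanf.bscanf ib "[%s@]" (fun s ->
      String.split_on_char ';' s
      |> List.map String.trim
      |> List.filter (fun field -> field <> "")
      |> List.map int_of_string)

(* Parse one line such as "[1;2] [2] 5"; a space in a Scanf format
   matches any amount of whitespace, including none. *)
let parse_line (line : string) : int list * int list * int =
  Scanf.sscanf line " %r %r %d" int_list int_list (fun a b n -> (a, b, n))

(* Read all lines of a file into a list of triples, in file order. *)
let read_triples (path : string) : (int list * int list * int) list =
  let ic = open_in path in
  let rec loop acc =
    match input_line ic with
    | line -> loop (parse_line line :: acc)
    | exception End_of_file ->
        close_in ic;
        List.rev acc
  in
  loop []
```

Calling read_triples "filepath/filename" would return one (int list * int list * int) triple per line of the file. Note that the splitting inside int_list is still done by String.split_on_char, in line with the advice above.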

SML program to delete char from string

I am a newbie to SML, trying to write a recursive program to delete chars from a string:
remCharR: char * string -> string
So far I have written this non-recursive program. I need help writing a recursive one.
fun stripchars (string, chars) =
    let
      fun aux c =
          if String.isSubstring (str c) chars then
            ""
          else
            str c
    in
      String.translate aux string
    end;
You have already found a very idiomatic way to do this. Explicit recursion is not a goal in itself, except perhaps in a learning environment. That is, compared to your current solution, explicit recursion is encumbered with a description of the mechanics of how you achieve the result, rather than of what the result is.
Here is one way you can use explicit recursion by converting to a list:
fun remCharR (c, s) =
    let fun rem [] = []
          | rem (c'::cs) =
              if c = c'
              then rem cs
              else c'::rem cs
    in implode (rem (explode s)) end
The conversion to a list (using explode) is inefficient, since you can iterate over the elements of a string without creating a list of the same elements. Generating a list of non-removed chars is not necessarily a bad choice, though: with immutable strings, you don't know exactly how long your end result is going to be without first traversing the string. The String.translate function produces a list of strings, which it then concatenates. You could do something similar.
So if you replace the initial conversion to list with a string traversal (fold),
fun fold_string f e0 s =
    let val max = String.size s
        fun aux i e =
            if i < max
            then let val c = String.sub (s, i)
                 in aux (i+1) (f (c, e))
                 end
            else e
    in aux 0 e0 end
you could then create a string-based filter function (much like the String.translate function you already found, but less general):
fun string_filter p s =
    implode (fold_string (fn (c, res) => if p c then c::res else res) [] s)

fun remCharR (c, s) =
    string_filter (fn c' => c <> c') s
Except, you'll notice, it accidentally reverses the string because it folds from the left; you can fold from the right (efficient, but different semantics) or reverse the list (inefficient). I'll leave that as an exercise for you to choose between and improve.
As you can see, in avoiding String.translate I've built other generic helper functions so that the remCharR function does not contain explicit recursion, but rather depends on more readable high-level functions.
Update: String.translate actually does some pretty smart things with respect to memory use.
Here is Moscow ML's version of String.translate:
fun translate f s =
    Strbase.translate f (s, 0, size s);
with Strbase.translate looking like:
fun translate f (s,i,n) =
    let val stop = i+n
        fun h j res = if j>=stop then res
                      else h (j+1) (f(sub_ s j) :: res)
    in revconcat(h i []) end;
and with the helper function revconcat:
fun revconcat strs =
    let fun acc [] len = len
          | acc (v1::vr) len = acc vr (size v1 + len)
        val len = acc strs 0
        val newstr = if len > maxlen then raise Size else mkstring_ len
        fun copyall to [] = () (* Now: to = 0. *)
          | copyall to (v1::vr) =
            let val len1 = size v1
                val to = to - len1
            in blit_ v1 0 newstr to len1; copyall to vr end
    in copyall len strs; newstr end;
So it first calculates the total length of the final string by summing the length of each sub-string generated by String.translate, and then it uses compiler-internal, mutable functions (mkstring_, blit_) to copy the translated strings into the final result string.
You can achieve a similar optimization when you know that each character in the input string will result in 0 or 1 characters in the output string. String.translate can't assume this, since its function argument may map a character to a string of any length. So an alternative implementation uses CharArray. For example:
First, find the number of characters in the new string:
fun countP p s =
    fold_string (fn (c, total) => if p c
                                  then total + 1
                                  else total) 0 s
Then construct a temporary, mutable CharArray, update it, and convert it to a string:
fun string_filter p s =
    let val newSize = countP p s
        val charArr = CharArray.array (newSize, #"x")
        fun update (c, (newPos, oldPos)) =
            if p c
            then ( CharArray.update (charArr, newPos, c) ; (newPos+1, oldPos+1) )
            else (newPos, oldPos+1)
    in fold_string update (0,0) s
     ; CharArray.vector charArr
    end

fun remCharR (c, s) =
    string_filter (fn c' => c <> c') s
You'll notice that remCharR is unchanged; only the implementation of string_filter varies, thanks to some degree of abstraction. This implementation uses recursion via fold_string, but is otherwise comparable to a for loop that updates an index into an array. So while it is recursive, it is also not very abstract.
Considering that you get optimizations comparable to these using String.translate without the low-level complexity of mutable arrays, I don't think this is worthwhile unless you start to experience performance problems.

Tokenize string with parameterised delimiter

I need to tokenize a string into a list of words in Standard ML, based on a delimiter that is passed as a function parameter. This is the code I have so far:
val splitter = String.token(fn (c:string,x:char) => c=x);
I tried this, but I know it's wrong. Please help me modify it.
The type of c is string, while the type of x is char. They are not comparable. You can convert x to a string with Char.toString.
splitter = String.token(fn (c:string,x:char) => c=Char.toString x);
There is no standard library function called String.token, but maybe you mean String.tokens:
- String.tokens;
> val it = fn : (char -> bool) -> string -> string list
You don't say whether your separator is a string or a char, but assuming it's a char:
fun splitter sep s = String.tokens (fn c => c = sep) s
You could also define it like this:
fun curry f a b = f (a, b)
val splitter = String.tokens o curry op=

Empty character in OCaml

I am trying to do something fairly simple. I want to take a string such as "1,000" and return the string "1000".
Here was my attempt:
String.map (function x -> if x = ',' then '' else x) "1,000";;
However, I get a compiler error saying there is a syntax error at ''.
Thanks for the insight!
Unfortunately, there's no character like the one you're looking for. There is a string that's 0 characters long (""), but there's no character that's not there at all. Every character is (so to speak) exactly one character long.
To solve your problem you need a more general operation than String.map. The essence of a map is that its input and output have the same shape but different contents. For strings this means that the input and output are strings of the same length.
Unless you really want to avoid imperative coding (which is actually a great thing to avoid, especially when starting out with OCaml), you would probably do best using String.iter and a buffer (from the Buffer module).
Update
The string_map_partial function given by Andreas Rossberg is pretty nice. Here's another implementation that uses String.iter and a buffer:
let string_map_partial f s =
  let b = Buffer.create (String.length s) in
  let addperhaps c =
    match f c with
    | None -> ()
    | Some c' -> Buffer.add_char b c'
  in
  String.iter addperhaps s;
  Buffer.contents b
Just an alternate implementation with different stylistic tradeoffs. Not faster, probably not slower either. It's still written imperatively (for the same reason).
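For the specific case in the question, deleting every occurrence of one known character, a partial map is not strictly needed: String.split_on_char (available since OCaml 4.04) and String.concat do the job without explicit mutation. A sketch, with remove_char as a hypothetical name:

```ocaml
(* Remove every occurrence of [c] from [s]: split on c, then join the
   pieces with the empty separator. Requires String.split_on_char (4.04+). *)
let remove_char (c : char) (s : string) : string =
  String.concat "" (String.split_on_char c s)
```

Usage: remove_char ',' "1,000". This allocates one intermediate string per piece, so the Buffer-based versions may be preferable for long inputs with many separators.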
What you'd need here is a function like the following, which unfortunately is not in the standard library:
(* string_map_partial : (char -> char option) -> string -> string *)
let string_map_partial f s =
  (* strings are immutable, so build the result in a Bytes buffer *)
  let buf = Bytes.create (String.length s) in
  let j = ref 0 in
  for i = 0 to String.length s - 1 do
    match f s.[i] with
    | None -> ()
    | Some c -> Bytes.set buf !j c; incr j
  done;
  (* copy only the !j characters actually written *)
  Bytes.sub_string buf 0 !j
You can then write:
string_map_partial (fun c -> if c = ',' then None else Some c) "1,000"
(Note: I chose an imperative implementation for string_map_partial, because a purely functional one would require repeated string concatenation, which is fairly expensive in OCaml.)
A purely functional version could be this one:
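let string_map_partial f s =
  let n = String.length s in
  (* accumulate the result; each ^ builds a new string *)
  let rec map_str i acc =
    if i >= n then acc
    else
      match f s.[i] with
      | None -> map_str (i + 1) acc
      | Some c -> map_str (i + 1) (acc ^ String.make 1 c)
  in
  map_str 0 ""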
It is tail-recursive, but less efficient than the imperative version: each ^ copies the whole accumulator, so the total cost grows quadratically with the length of the string.
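Since OCaml 4.07, the Seq functions in the standard library allow a purely functional version that avoids the repeated ^. This is a sketch with the same type, (char -> char option) -> string -> string; remove_commas is a hypothetical helper for the question's example:

```ocaml
(* Purely functional sketch (OCaml >= 4.07): turn the string into a sequence
   of chars, let f drop (None) or replace (Some c) each one, and rebuild. *)
let string_map_partial (f : char -> char option) (s : string) : string =
  String.to_seq s |> Seq.filter_map f |> String.of_seq

(* The question's example: drop the thousands separator *)
let remove_commas (s : string) : string =
  string_map_partial (fun c -> if c = ',' then None else Some c) s
```

Each character is visited once, so this should scale linearly, unlike the accumulator-based version.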

print string buffer from file contents

I want to print the contents of a file. I tried to use a string buffer:
let ch = open_in "myfile.txt" in
let buf = Buffer.create 1024 in
(try Buffer.add_channel buf ch max_int with _ -> ());
close_in ch;
let string = Buffer.contents buf
print_endline string
This just gives me a syntax error.
How can I do this?
You need to give the right channel length:
let ic = open_in "foo" in
let len = in_channel_length ic in
let buf = Buffer.create len in
Buffer.add_channel buf ic len;
close_in ic;
let str = Buffer.contents buf in
print_endline str
The only syntax error I see is a missing in after let string = Buffer.contents buf.
The purpose of Buffer.add_channel is to add exactly the given number of characters from the given channel to the buffer. Unless your file "myfile.txt" is exceptionally large, the buffer will be empty when you print it out.
In fact on my system (a 64-bit system), max_int is so large that Buffer.add_channel doesn't even try to read that much data. It raises an Invalid_argument exception.
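A sketch of two ways to get a file's contents into a string with the standard library, assuming OCaml 4.08 or later (for Fun.protect). read_whole_file and read_all_buffered are hypothetical names. The first relies on in_channel_length, as in the answer above; the second keeps the question's Buffer but reads fixed-size chunks until end of file, so it never asks for more data than exists:

```ocaml
(* Read a whole regular file into a string. open_in_bin avoids newline
   translation on Windows, so the bytes read match in_channel_length. *)
let read_whole_file (path : string) : string =
  let ic = open_in_bin path in
  Fun.protect
    ~finally:(fun () -> close_in ic)
    (fun () -> really_input_string ic (in_channel_length ic))

(* Buffer-based variant closer to the question's code: read chunks with
   [input] until it returns 0 (end of file). It does not need the length
   up front, so it also works on pipes and stdin. *)
let read_all_buffered (ic : in_channel) : string =
  let buf = Buffer.create 4096 in
  let chunk = Bytes.create 4096 in
  let rec loop () =
    let n = input ic chunk 0 (Bytes.length chunk) in
    if n > 0 then begin
      Buffer.add_subbytes buf chunk 0 n;
      loop ()
    end
  in
  loop ();
  Buffer.contents buf
```

For the question's file, print_string (read_whole_file "myfile.txt") would print its contents. On OCaml 4.14 or later, In_channel.input_all from the standard library offers the same thing ready-made.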