I am trying to do something fairly simple. I want to take a string such as "1,000" and return the string "1000".
Here was my attempt:
String.map (function x -> if x = ',' then '' else x) "1,000";;
However, I get a compiler error reporting a syntax error at ''.
Thanks for the insight!
Unfortunately, there's no character like the one you're looking for. There is a string that's 0 characters long (""), but there's no character that's not there at all. All characters (so to speak) are 1 character.
To solve your problem you need a more general operation than String.map. The essence of a map is that its input and output have the same shape but different contents. For strings this means that the input and output are strings of the same length.
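To make the "same shape" point concrete, here is a small sketch using only the standard library. The names spaced and same_length are made up for this illustration.

```ocaml
(* String.map applies a char -> char function, so every input character
   yields exactly one output character. *)
let spaced = String.map (fun c -> if c = ',' then ' ' else c) "1,000"

(* The comma can be replaced, but not removed: the length is unchanged. *)
let same_length = String.length spaced = String.length "1,000"
```

Here spaced is "1 000" rather than "1000", which is why a different operation is needed to drop characters.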
Unless you really want to avoid imperative coding (which is actually a great thing to avoid, especially when starting out with OCaml), you would probably do best using String.iter and a buffer (from the Buffer module).
Update
The string_map_partial function given by Andreas Rossberg is pretty nice. Here's another implementation that uses String.iter and a buffer:
let string_map_partial f s =
  let b = Buffer.create (String.length s) in
  let addperhaps c =
    match f c with
    | None -> ()
    | Some c' -> Buffer.add_char b c'
  in
  String.iter addperhaps s;
  Buffer.contents b
Just an alternate implementation with different stylistic tradeoffs. Not faster, probably not slower either. It's still written imperatively (for the same reason).
What you'd need here is a function like the following, which unfortunately is not in the standard library:
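(* string_map_partial : (char -> char option) -> string -> string *)
let string_map_partial f s =
  (* Strings are immutable in current OCaml, so the result is built in a Bytes buffer. *)
  let buf = Bytes.create (String.length s) in
  let j = ref 0 in
  for i = 0 to String.length s - 1 do
    match f s.[i] with
    | None -> ()
    | Some c -> Bytes.set buf !j c; incr j
  done;
  (* Keep only the first !j bytes that were actually written. *)
  Bytes.sub_string buf 0 !j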
You can then write:
string_map_partial (fun c -> if c = ',' then None else Some c) "1,000"
(Note: I chose an imperative implementation for string_map_partial, because a purely functional one would require repeated string concatenation, which is fairly expensive in OCaml.)
A purely functional version could be this one:
let string_map_partial f s =
  let n = String.length s in
  let rec map_str i acc =
    if i >= n then acc
    else
      match f s.[i] with
      | None -> map_str (i + 1) acc
      | Some c -> map_str (i + 1) (acc ^ String.make 1 c)
  in map_str 0 ""
It is tail recursive, but slower than the imperative version, because every concatenation copies the accumulated string.
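If you want to stay purely functional without the repeated concatenation, one option is to go through a sequence. This is only a sketch, assuming OCaml 4.07 or later (for String.to_seq, Seq.filter_map and String.of_seq); the names string_filter_map and remove_commas are mine, not standard library functions.

```ocaml
(* string_filter_map : (char -> char option) -> string -> string
   Characters mapped to None are dropped, the others are replaced. *)
let string_filter_map f s =
  s |> String.to_seq |> Seq.filter_map f |> String.of_seq

(* The original problem: drop every comma. *)
let remove_commas =
  string_filter_map (fun c -> if c = ',' then None else Some c)
```

With this, remove_commas "1,000" evaluates to "1000".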
Related
I have been trying to write a loop that calls a function, changing, i times, but for some reason the loop itself always produces an error. I have also tried a recursive function that calls itself, but that didn't work either.
Is it even possible to make this work with for loops?
r is a list of lists.
a and b are two immutable variables.
(List.nth (r) (i)) gives a list.
let rec changing (lista: 'a zlista) (a:int) (b:int) =
  match lista with
  | Vazio -> failwith "NO"
  | Nodo (n, l, r) ->
    if a <= n && n <= b then n
    else if a < n && b < n then changing l a b
    else changing r a b
let rec call_changing (a: int) (b: int) =
  for i=0 to ort do
    changing (List.nth (r) (i)) (a) (b)
  done;;
changing returns an int. To call it in a for loop, you have to ignore the result of the function:
for i = 0 to ort do
  let _ = changing .... in ()
done
(* Or *)
for i = 0 to ort do
  ignore (changing ....)
done
EDIT:
If you want to print the result, you can do:
for i = 0 to ort do
  Printf.printf "Result for %d iteration : %d\n" i (changing ....)
done
See the Printf documentation for more information.
To perhaps generalize on Butanium's answer, OCaml is not a pure functional programming language. It does contain imperative features. Imperative features are all about side-effects. Functions which exist for the purpose of their side-effects on the system (like Printf.printf) by convention return () (the literal for the unit type).
A for loop is an imperative feature. As such, it expects that any expression (or expressions chained with ;) contained within will return unit. If they do not, you will receive warnings.
The for loop expression itself (for ... = ... to ... do ... done) returns unit so the warning can clue you in that any code in the loop which does not have side-effects is inconsequential, and while your code will compile and run, it may not do what you expect.
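A small sketch of what this means in practice. The function square is a hypothetical stand-in for any function that returns a non-unit value; total and squares are likewise names invented for the example.

```ocaml
(* square stands in for any function that returns a non-unit value. *)
let square x = x * x

let total = ref 0

let () =
  for i = 0 to 3 do
    (* Result thrown away: the loop compiles, but nothing observable happens. *)
    ignore (square i);
    (* Result consumed by a side effect, so the loop has a visible outcome. *)
    total := !total + square i
  done

(* If the results themselves are wanted, build them as a value instead. *)
let squares = List.init 4 square
```

After this runs, !total is 14 and squares is [0; 1; 4; 9]. Whether a loop with a mutable accumulator or a list-building function fits better depends on what the surrounding code needs.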
As a side note, I believe you may be a little overzealous with the parentheses, likely making your code harder to read.
let rec call_changing (a: int) (b: int) =
for i=0 to ort do
changing (List.nth (r) (i)) (a) (b)
done;;
Properly indented and with extraneous parens removed:
let rec call_changing (a: int) (b: int) =
  for i=0 to ort do
    changing (List.nth r i) a b
  done;;
I am a newbie to SML, trying to write a recursive program to delete chars from a string:
remCharR: char * string -> string
So far I have written this non-recursive program. I need help writing a recursive one.
- fun stripchars(string,chars) = let
= fun aux c =
= if String.isSubstring(str c) chars then
= ""
= else
= str c
= in
= String.translate aux string
= end
= ;
You have already found a very idiomatic way to do this. Explicit recursion is not a goal in itself, except perhaps in a learning environment. That is, explicit recursion is, compared to your current solution, encumbered with a description of the mechanics of how you achieve the result, but not what the result is.
Here is one way you can use explicit recursion by converting to a list:
fun remCharR (c, s) =
  let fun rem [] = []
        | rem (c'::cs) =
            if c = c'
            then rem cs
            else c'::rem cs
  in implode (rem (explode s)) end
The conversion to list (using explode) is inefficient, since you can iterate the elements of a string without creating a list of the same elements. Generating a list of non-removed chars is not necessarily a bad choice, though, since with immutable strings, you don't know exactly how long your end-result is going to be without first having traversed the string. The String.translate function produces a list of strings which it then concatenates. You could do something similar.
So if you replace the initial conversion to list with a string traversal (fold),
fun fold_string f e0 s =
  let val max = String.size s
      fun aux i e =
        if i < max
        then let val c = String.sub (s, i)
             in aux (i+1) (f (c, e))
             end
        else e
  in aux 0 e0 end
you could then create a string-based filter function (much like the String.translate function you already found, but less general):
fun string_filter p s =
  implode (fold_string (fn (c, res) => if p c then c::res else res) [] s)
fun remCharR (c, s) =
  string_filter (fn c' => c <> c') s
Except, you'll notice, it accidentally reverses the string because it folds from the left; you can fold from the right (efficient, but different semantics) or reverse the list (inefficient). I'll leave that as an exercise for you to choose between and improve.
As you can see, in avoiding String.translate I've built other generic helper functions so that the remCharR function does not contain explicit recursion, but rather depends on more readable high-level functions.
Update: String.translate actually does some pretty smart things wrt. memory use.
Here is Moscow ML's version of String.translate:
fun translate f s =
  Strbase.translate f (s, 0, size s);
with Strbase.translate looking like:
fun translate f (s,i,n) =
  let val stop = i+n
      fun h j res = if j>=stop then res
                    else h (j+1) (f(sub_ s j) :: res)
  in revconcat(h i []) end;
and with the helper function revconcat:
fun revconcat strs =
  let fun acc [] len = len
        | acc (v1::vr) len = acc vr (size v1 + len)
      val len = acc strs 0
      val newstr = if len > maxlen then raise Size else mkstring_ len
      fun copyall to [] = () (* Now: to = 0. *)
        | copyall to (v1::vr) =
            let val len1 = size v1
                val to = to - len1
            in blit_ v1 0 newstr to len1; copyall to vr end
  in copyall len strs; newstr end;
So it first calculates the total length of the final string by summing the length of each sub-string generated by String.translate, and then it uses compiler-internal, mutable functions (mkstring_, blit_) to copy the translated strings into the final result string.
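For readers who know OCaml better than SML, the same two-pass idea can be sketched there as well. This is my own transcription, not Moscow ML's code: rev_concat is a hypothetical name, and Bytes.create and Bytes.blit_string stand in for the compiler-internal mkstring_ and blit_.

```ocaml
(* Concatenate a list of strings that was accumulated in reverse order. *)
let rev_concat strs =
  (* First pass: total length of the result. *)
  let len = List.fold_left (fun n s -> n + String.length s) 0 strs in
  let buf = Bytes.create len in
  (* Second pass: copy each piece in, filling the buffer from the end. *)
  let rec copy_all pos = function
    | [] -> ()
    | s :: rest ->
      let n = String.length s in
      let pos = pos - n in
      Bytes.blit_string s 0 buf pos n;
      copy_all pos rest
  in
  copy_all len strs;
  Bytes.to_string buf
```

For example, rev_concat ["lo"; ""; "hel"] gives "hello": only one result buffer is allocated, however many pieces there are.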
You can achieve a similar optimization when you know that each character in the input string will result in 0 or 1 characters in the output string. The String.translate function can't, since the result of a translate can be multiple characters. So an alternative implementation uses CharArray. For example:
Find the number of elements in the new string,
fun countP p s =
  fold_string (fn (c, total) => if p c
                                then total + 1
                                else total) 0 s
Construct a temporary, mutable CharArray, update it and convert it to string:
fun string_filter p s =
  let val newSize = countP p s
      val charArr = CharArray.array (newSize, #"x")
      fun update (c, (newPos, oldPos)) =
        if p c
        then ( CharArray.update (charArr, newPos, c) ; (newPos+1, oldPos+1) )
        else (newPos, oldPos+1)
  in fold_string update (0,0) s
   ; CharArray.vector charArr
  end
fun remCharR (c, s) =
  string_filter (fn c' => c <> c') s
You'll notice that remCharR is the same, only the implementation of string_filter varied, thanks to some degree of abstraction. This implementation uses recursion via fold_string, but is otherwise comparable to a for loop that updates the index of an array. So while it is recursive, it's also not very abstract.
Considering that you get optimizations comparable to these using String.translate without the low-level complexity of mutable arrays, I don't think this is worthwhile unless you start to experience performance problems.
The function tally below is really simple: it takes a string s as argument, splits it on non-alphanumeric characters, and tallies the numbers of the resulting "words", case-insensitively.
open Core.Std
let tally s =
  let get m k =
    match Map.find m k with
    | None -> 0
    | Some n -> n
  in
  let upd m k = Map.add m ~key:k ~data:(1 + get m k) in
  let re = Str.regexp "[^a-zA-Z0-9]+" in
  let ws = List.map (Str.split re s) ~f:String.lowercase in
  List.fold_left ws ~init:String.Map.empty ~f:upd
I think this function is harder to read than it should be due to clutter. I wish I could write something closer to this (where I've indulged in some "fantasy syntax"):
(* NOT VALID SYNTAX -- DO NOT COPY !!! *)
open Core.Std
let tally s =
  let get m k =
        match find m k with
        | None -> 0
        | Some n -> n ,
      upd m k = add m k (1 + get m k) ,
      re = regexp "[^a-zA-Z0-9]+" ,
      ws = map (split re s) lowercase
  in fold_left ws empty upd
The changes I did above fall primarily into three groups:
get rid of the repeated let ... in's by consolidating all the bindings into a ,-separated sequence (which, AFAIK, is not valid OCaml);
get rid of the ~foo:-type noise in function calls;
get rid of the prefixes Str., List., etc.
Can I achieve similar effects using valid OCaml syntax?
Readability is difficult to achieve, it highly depends on the reader's abilities and familiarity with the code. I'll focus simply on the syntax transformations, but you could perhaps refactor the code in a more compact form, if this is what you are really looking for.
To remove the module qualifiers, simply open them beforehand:
open Str
open Map
open List
You must open them in that order to make sure the List values you are using there are still reachable, and not scope-overridden by the Map ones.
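Why the order matters can be seen with two toy modules; A, B, name and only_a are invented for this sketch. A later open shadows any name it shares with an earlier one, while names unique to the earlier module stay visible.

```ocaml
module A = struct
  let name = "A"
  let only_a = 1
end

module B = struct
  let name = "B"
end

open A
open B

(* name now refers to B.name; only_a is still reachable from A. *)
let visible = (name, only_a)
```

Here visible is ("B", 1).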
For labelled parameters, you may omit the labels if for each function call you provide all the parameters of the function in the function signature order.
To reduce the number of let...in constructs, you have several options:
Use a set of rec definitions:
let tally s =
  let rec get m k =
    match find m k with
    | None -> 0
    | Some n -> n
  and upd m k = add m k (1 + get m k)
  and re = regexp "[^a-zA-Z0-9]+"
  and ws = map lowercase (split re s)
  in fold_left ws empty upd
Make multiple definitions at once:
let tally s =
  let get, upd, ws =
    let re = regexp "[^a-zA-Z0-9]+" in
    (* The parentheses keep each fun from swallowing the rest of the tuple. *)
    (fun m k ->
      match find m k with
      | None -> 0
      | Some n -> n),
    (fun g m k -> add m k (1 + g m k)),
    map lowercase (split re s)
  in fold_left ws empty (upd get)
Use a module to group your definitions:
let tally s =
  let module M = struct
    let get m k =
      match find m k with
      | None -> 0
      | Some n -> n
    let upd m k = add m k (1 + get m k)
    let re = regexp "[^a-zA-Z0-9]+"
    let ws = map lowercase (split re s)
  end in fold_left M.ws empty M.upd
The latter is reminiscent of SML syntax, and perhaps better suited to proper optimization by the compiler, but it only gets rid of the in keywords.
Please note that since I am not familiar with the Core Api, I might have written incorrect code.
If you have a sequence of computations on the same value, OCaml has a |> operator that takes the value on its left and applies the function on its right to it. This can help you get rid of let and in. As for labeled arguments, you can get rid of them by falling back to the vanilla standard library, which makes your code smaller but less readable. There is also a small piece of sugar for labeled arguments: you can always write f ~key ~data instead of f ~key:key ~data:data. Finally, module names can be removed either with the local open syntax (let open List in ...) or by locally shortening them to smaller names (let module L = List in).
Anyway, I would like to show you some code that, in my opinion, contains less clutter:
open Core.Std
open Re2.Std
open Re2.Infix
module Words = String.Map
let tally s =
  Re2.split ~/"\\PL" s |>
  List.map ~f:(fun s -> String.uppercase s, ()) |>
  Words.of_alist_multi |>
  Words.map ~f:List.length
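The |> style also works with the vanilla standard library mentioned above. The following is a sketch under my own assumptions, not a drop-in replacement for the Core version: it assumes OCaml 4.06 or later (for Map.update), avoids Str by turning every non-alphanumeric character into a space before splitting, and the helper name words is invented here.

```ocaml
module Words = Map.Make (String)

(* Split on runs of non-alphanumeric characters, without the Str library. *)
let words s =
  s
  |> String.map (fun c ->
         match c with
         | 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9' -> c
         | _ -> ' ')
  |> String.split_on_char ' '
  |> List.filter (fun w -> w <> "")

(* Count the words case-insensitively. *)
let tally s =
  words s
  |> List.map String.lowercase_ascii
  |> List.fold_left
       (fun m w ->
         Words.update w (function None -> Some 1 | Some n -> Some (n + 1)) m)
       Words.empty
```

For instance, Words.bindings (tally "a b, A c") is [("a", 2); ("b", 1); ("c", 1)].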
Suppose I am writing an OCaml program and my input will be a large stream of integers separated by spaces i.e.
let string = input_line stdin;;
will return a string which looks like e.g. "2 4 34 765 5 ..." Now, the program itself will take a further two values i and j which specify a small subsequence of this input on which the main procedure will take place (let's say that the main procedure is the find the maximum of this sublist). In other words, the whole stream will be inputted into the program but the program will only end up acting on a small subset of the input.
My question is: what is the best way to translate the relevant part of the input stream into something usable i.e. a string of ints? One option would be to convert the whole input string into a list of ints using
let list = List.map int_of_string (Str.split (Str.regexp_string " ") string);;
and then once the bounds i and j have been entered one easily locates the relevant sublist and its maximum. The problem is that the initial pre-processing of the large stream is immensely time-consuming.
Is there an efficient way of locating the small sublist directly from the large stream i.e. processing the input along with the main procedure?
OCaml's standard library is rather small. It provides a necessary and sufficient set of orthogonal features, as any good standard library should. But this is usually not enough for a casual user. That is why there are libraries that cover the more common tasks.
I would like to mention the two most prominent libraries: Jane Street's Core library and Batteries Included (aka Core and Batteries).
Both libraries provide a bunch of high-level I/O functions, but there is a small problem. It is neither possible nor reasonable for a library to address every use case; otherwise its interface would not be terse and comprehensible. And your case is non-standard. There is a convention, a tacit agreement between data engineers, to represent a set of things as a set of lines in a file, and to represent one "thing" (or feature) with one line. So, if you have a dataset where each element is a scalar, you should represent it as a sequence of scalars separated by newlines. Several elements on a single line are only for multidimensional features.
So, with a proper representation, your problem can be solved as simply as this (with Core):
open Core.Std
let () =
  let filename = "data" in
  let max_number =
    let open In_channel in
    with_file filename
      ~f:(fold_lines ~init:0
            ~f:(fun m s -> Int.(max m @@ of_string s))) in
  printf "Max number in %s is %d\n" filename max_number
You can compile and run this program with corebuild test.byte --, assuming that the code is in a file named test.ml and the core library is installed (with opam install core if you're using opam).
Also, there is an excellent library, Lwt, that provides a monadic high-level interface to I/O. With this library, you can parse a set of scalars in the following way:
open Lwt
let program =
  let filename = "data" in
  let lines = Lwt_io.lines_of_file filename in
  Lwt_stream.fold (fun s m -> max m @@ int_of_string s) lines 0 >>=
  Lwt_io.printf "Max number in %s is %d\n" filename
let () = Lwt_main.run program
This program can be compiled and run with ocamlbuild -package lwt.unix test.byte --, if the lwt library is installed on your system (opam install lwt).
So, this is not to say that your problem cannot be solved (or is hard to solve) in OCaml; it is just to point out that you should start with a proper representation. But suppose you do not own the representation and cannot change it. Let's look at how this can be solved efficiently in OCaml. As the previous examples show, your problem can in general be described as a channel fold, i.e. a sequential application of a function f to each value in a file. So we can define a function fold_channel that reads an integer value from a channel and applies a function to it and the previously accumulated value. Of course, this function could be abstracted further by lifting the format argument, but for demonstration purposes I suppose this will be enough.
let rec fold_channel f init ic =
  try Scanf.fscanf ic "%u " (fun s -> fold_channel f (f s init) ic)
  with End_of_file -> init
let () =
  let max_value = open_in "atad" |> fold_channel max 0 in
  Printf.printf "max value is %u\n" max_value
I should note, though, that this implementation is not meant for heavy-duty work. It is not even tail-recursive. If you need a really efficient lexer, you can use OCaml's lexer generator, for example.
Update 1
Since there is the word "efficient" in the title, and everybody likes benchmarks, I've decided to compare these three implementations. Of course, since the pure OCaml implementation is not tail-recursive, it is not comparable to the others. You may wonder why it is not tail-recursive, as the call to fold_channel appears to be in tail position. The problem is the exception handler: on each call to fold_channel, we need to remember the init value, since we may have to return it. This is a common issue with recursion and exceptions; you may google it for more examples and explanations.
So, first we need to fix the third implementation. We will use a common trick with an option value.
let id x = x
let read_int ic =
  try Some (Scanf.fscanf ic "%u " id) with End_of_file -> None
let rec fold_channel f init ic =
  match read_int ic with
  | Some s -> fold_channel f (f s init) ic
  | None -> init
let () =
  let max_value = open_in "atad" |> fold_channel max 0 in
  Printf.printf "max value is %u\n" max_value
So, with the new tail-recursive implementation, let's try them all on big data. 100_000_000 numbers is big data for my seven-year-old laptop. I've also added a C implementation as a baseline, and an OCaml clone of the C implementation:
let () =
  let m = ref 0 in
  (* Open the channel outside the try so that the handler can still close it. *)
  let ic = open_in "atad" in
  try
    while true do
      let n = Scanf.fscanf ic "%d " (fun x -> x) in
      m := max n !m
    done
  with End_of_file ->
    Printf.printf "max value is %u\n" !m;
    close_in ic
Update 2
Yet another implementation, this one using ocamllex. It consists of two files. The lexer specification, lex_int.mll:
{}
let digit = ['0'-'9']
let space = [' ' '\t' '\n']*
rule next = parse
  | eof {None}
  | space {next lexbuf}
  | digit+ as n {Some (int_of_string n)}
{}
And the implementation:
let rec fold_channel f init buf =
  match Lex_int.next buf with
  | Some s -> fold_channel f (f s init) buf
  | None -> init
let () =
  let max_value = open_in "atad" |>
                  Lexing.from_channel |>
                  fold_channel max 0 in
  Printf.printf "max value is %u\n" max_value
And here are the results:
implementation   time    ratio   rate (MB/s)
plain C           22 s   1.0     12.5
ocamllex          33 s   1.5      8.4
Core              62 s   2.8      4.5
C-like OCaml      83 s   3.7      3.3
fold_channel      84 s   3.8      3.3
Lwt              143 s   6.5      1.9
P.S. You can see that in this particular case Lwt is an outlier. This doesn't mean that Lwt is slow; this workload is just not at its granularity. And I would like to assure you that, in my experience, Lwt is a well-suited tool for HPC. For example, in one of my programs it processes a 30 MB/s network stream in real time.
Update 3
By the way, I've tried to address the problem in an abstract way, and I didn't provide a solution for your particular example (with i and j). Since folding is a generalization of iteration, it can easily be solved by extending the state (the init parameter) to hold a counter and checking whether it falls within the range specified by the user. But this leads to an interesting question: what should you do once you have run past the range? Of course, you can continue to the end, just ignoring the remaining input. Or you can exit from the function non-locally with an exception, something like raise (Done m). The Core library provides such a facility with the with_return function, which allows you to break out of your computation at any point.
open Core.Std
let () =
  let filename = "data" in
  let b1,b2 = Int.(of_string Sys.argv.(1), of_string Sys.argv.(2)) in
  let range = Interval.Int.create b1 b2 in
  let _,max_number =
    let open In_channel in
    with_return begin fun call ->
      with_file filename
        ~f:(fold_lines ~init:(0,0)
              ~f:(fun (i,m) s ->
                  match Interval.Int.compare_value range i with
                  | `Below -> i+1,m
                  | `Within -> i+1, Int.(max m @@ of_string s)
                  | `Above -> call.return (i,m)
                  | `Interval_is_empty -> failwith "empty interval"))
    end in
  printf "Max number in %s is %d\n" filename max_number
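The exception-based exit mentioned above can also be sketched with the standard library alone. This is only an illustration under my own assumptions: Done, fold_ints and max_in_range are names invented here, the range is a pair of 0-based inclusive token indices, the input is whitespace-separated, and reading goes through Scanf.bscanf on a Scanf.Scanning buffer. Malformed input would raise Scanf.Scan_failure, which the sketch does not handle.

```ocaml
exception Done of int

(* Fold f over whitespace-separated ints read from a scanning buffer.
   match ... with exception keeps the recursive call in tail position. *)
let rec fold_ints f acc ib =
  match Scanf.bscanf ib " %d" (fun n -> n) with
  | n -> fold_ints f (f acc n) ib
  | exception End_of_file -> acc

(* Maximum of the tokens whose index lies in [lo, hi]. *)
let max_in_range lo hi ib =
  let step (i, m) n =
    if i > hi then raise (Done m)          (* past the range: stop reading *)
    else if i >= lo then (i + 1, max m n)  (* inside the range *)
    else (i + 1, m)                        (* before the range: skip *)
  in
  try snd (fold_ints step (0, min_int) ib) with Done m -> m
```

For example, max_in_range 1 3 (Scanf.Scanning.from_string "2 4 34 765 5") gives 765 without the final 5 taking part in the comparison. To read from standard input, Scanf.Scanning.stdin can be passed instead.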
You may use the Scanf family of functions. For instance, Scanf.fscanf lets you read tokens from a channel according to a format string (which is a special type in OCaml).
Your program can be decomposed into two functions:
one which skips a number i of tokens from the input channel,
one which extracts the maximum integer out of the next j tokens from a channel.
Let's write these:
let rec skip_tokens c i =
  match i with
  | i when i > 0 -> Scanf.fscanf c "%s " (fun _ -> skip_tokens c @@ pred i)
  | _ -> ()
let rec get_max c j m =
  match j with
  | j when j > 0 -> Scanf.fscanf c "%d " (fun x -> max m x |> get_max c (pred j))
  | _ -> m
Note the space after the token format indicator in the string which tells the scanner to also swallow all the spaces and carriage returns in between tokens.
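The effect of that space is easy to check on a string with Scanf.sscanf. In this small sketch the name pair is just for illustration.

```ocaml
(* A single space in the format matches any run of blanks, tabs and newlines. *)
let pair = Scanf.sscanf "12 \n\n   34" "%d %d" (fun a b -> (a, b))
```

Here pair is (12, 34), even though the two numbers are separated by several spaces and newlines.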
All you need to do now is combine them. Here's a small program you can run from the CLI which takes the i and j parameters, expects a stream of tokens, and prints out the maximum value as wanted:
let _ =
  let i = int_of_string Sys.argv.(1)
  and j = int_of_string Sys.argv.(2) in
  skip_tokens stdin (pred i);
  get_max stdin j min_int |> print_int;
  print_newline ()
You could probably write more flexible combinators by extracting the recursive part out. I'll leave this as an exercise for the reader.
I am absolute OCaml beginner and have an assignment about more code. I have got the following code, but I don't know how it works. If someone can help me out, I appreciate it.
# let explode str = (*defines function that explodes argument str witch is type
string into list of chars*)
let rec exp = function (*defines recursive function exp*)
| a, b when a < 0 -> b (*this part i dont know.is this pattern
matching ?is it function with arguments a and b
and they go into expression? when is a guard and
then we have if a is smaller than 0 then b *)
(*if a is not smaller than 0 then this function ? *)
| a, b -> exp (a-1, str.[a]::b) (*this i dont know, a and b are arguments
that go into recursive function in the way
that a is decreesed by one and b goes into
string a?? *)
in
exp ((String.length str)-1, []);; (*defined function exp on string lenght of
str decresed by one (why?) [ ]these
brackets mean or tell some kind of type ? *)
# let split lst ch =
let rec split = function (* defines recursive fun split *)
| [], ch, cacc', aacc' -> cacc'::aacc'(* if empty ...this is about what i got
so far :) *)
| c::lst, ch, cacc', aacc' when c = ch -> split (lst, ch, [], cacc'::aacc')
| c::lst, ch, cacc', aacc' -> split (lst, ch, c::cacc', aacc')
in
split (lst, ch, [], []);;
val split : 'a list -> 'a -> 'a list list = <fun>
This code is ugly. Whoever gave it to you is doing you a disservice. If a student of mine wrote that, I would ask them to rewrite it without using when guards, because they tend to be confusing and encourage writing pattern-matching-heavy code in places where it is not warranted.
As a rule of thumb, beginners should never use when. A simple if..then..else test is more readable.
Here are equivalent versions of those two functions, rewritten for readability:
let explode str =
  let rec exp a b =
    if a < 0 then b
    else exp (a - 1) (str.[a] :: b)
  in
  exp (String.length str - 1) []
let split input delim_char =
  let rec split input curr_word past_words =
    match input with
    | [] -> curr_word :: past_words
    | c :: rest ->
      if c = delim_char
      then split rest [] (curr_word :: past_words)
      else split rest (c :: curr_word) past_words
  in
  split input [] []
My advice for understanding them is to run them yourself, on a given example, on paper. Just write down the function call (e.g. explode "foo" and split ['a';'b';'c';'d'] 'b'), expand the definition, evaluate the code to get another expression, etc., until you get to the result. Here is an example:
explode "fo"
=>
exp (String.length "fo" - 1) []
=>
exp 1 []
=>
if 1 < 0 then [] else exp 0 ("fo".[1] :: [])
=>
exp 0 ("fo".[1] :: [])
=>
exp 0 ('o' :: [])
=>
exp 0 ['o']
=>
if 0 < 0 then ['o'] else exp (-1) ("fo".[0] :: ['o'])
=>
exp (-1) ("fo".[0] :: ['o'])
=>
exp (-1) ('f' :: ['o'])
=>
exp (-1) ['f'; 'o']
=>
if -1 < 0 then ['f'; 'o'] else exp (-2) ("fo".[-1] :: ['f'; 'o'])
=>
['f'; 'o']
Take care to do that, on a small example, for each of these functions and for any function you have trouble understanding. That's the best way to get a global view of what's going on.
(Later, when you grow more used to recursion, you'll find that you don't actually need to do that; you can reason inductively about the function: make an assumption about what it does and, assuming that the recursive calls actually do that, check that the whole function indeed does it. In more advanced cases, trying to hold all of the execution in one's head is just too hard, and this induction technique works better, but it is more high-level and requires more practice. Begin by simply running the code.)
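If writing out the steps by hand gets tedious, one option is to let the function log its own calls. The following is a sketch of that idea, not part of the original exercise; explode_traced is a made-up name, and it prints the arguments of every call to exp before doing the same work as explode.

```ocaml
let explode_traced str =
  let rec exp a b =
    (* Log the current arguments, e.g. "exp 0 [o]". *)
    Printf.printf "exp %d [%s]\n" a
      (String.concat "; " (List.map (String.make 1) b));
    if a < 0 then b
    else exp (a - 1) (str.[a] :: b)
  in
  exp (String.length str - 1) []
```

Calling explode_traced "fo" prints one line per call to exp and returns ['f'; 'o'], so the printed lines can be compared against the hand-written steps.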
If you're using the Core library you can just use
String.to_list "BKMGTPEZY"
This returns a list of chars. If you want strings, just map over it:
String.to_list "BKMGTPEZY" |> List.map ~f:Char.to_string
Outputs:
- : bytes list = ["B"; "K"; "M"; "G"; "T"; "P"; "E"; "Z"; "Y"]
As a function
let explode s = String.to_list s |> List.map ~f:Char.to_string
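Without Core, something similar can be sketched with the standard library, assuming OCaml 4.07 or later for String.to_seq and List.of_seq. The names explode_chars and explode_strings are only illustrative.

```ocaml
(* A string as a list of chars. *)
let explode_chars s = List.of_seq (String.to_seq s)

(* A string as a list of one-character strings. *)
let explode_strings s = List.map (String.make 1) (explode_chars s)
```

For example, explode_strings "BKM" gives ["B"; "K"; "M"].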
You can also implement it this way:
let rec strexp s =
  if String.length s = 0 then
    []
  else
    strexp (String.sub s 0 (String.length s - 1)) @ [s.[String.length s - 1]]
;;