How to list files of a given extension in OCaml - ocaml

I want to retrieve the list of direct files (i.e. no recursive search) of a given directory and a given extension in OCaml.
I tried the following, but:
- it does not feel like idiomatic OCaml;
- it does not work (import error).
let list_osc2 =
  let list_files = Sys.readdir "tests/osc2/expected/pp" in
  List.filter (fun x -> Str.last_chars x 4 = ".osc2") (Array.to_list list_files)
I got the error (I am using OCamlPro):
Required module `Str' is unavailable
Thanks

You can use Filename.extension instead of Str.last_chars:
let list_osc2 =
  let list_files = Sys.readdir "tests/osc2/expected/pp" in
  List.filter (fun x -> Filename.extension x = ".osc2") (Array.to_list list_files)
and then use the pipe operator to make it a bit more readable:
let list_osc2 =
  Sys.readdir "tests/osc2/expected/pp"
  |> Array.to_list
  |> List.filter (fun x -> Filename.extension x = ".osc2")
I don't know how you expect this to work in OCamlPro though, as it doesn't have a filesystem as far as I'm aware.

To use the Str module, you need to link with the str library. For example, with ocamlc, you need to pass str.cma, and with ocamlopt, you need to pass str.cmxa. I don't know how to do that with OCamlPro.
In any case, Str.last_chars is not particularly useful here. It raises an exception if the file name is shorter than the requested number of characters. By the way, your code would never match: ".osc2" is 5 characters long, so it can never equal the 4-character string returned by Str.last_chars x 4.
The Filename module from the standard library has functions to extract and check a file's extension. You don't need to do any string manipulation.
I don't know what you consider “ugly as hell”, but apart from the mistake with string manipulation, I don't see any problem with your code. Enumerating the matches and filtering them is perfectly idiomatic.
let list_osc2 =
  let list_files = Sys.readdir "tests/osc2/expected/pp" in
  List.filter (fun name -> Filename.check_suffix name ".osc2") (Array.to_list list_files)
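The approach in the answers above can be sketched with the filter separated from the directory read, so it can be checked without touching the filesystem. The names filter_by_extension and list_files_with_extension are illustrative and do not come from the original posts; Filename.extension assumes OCaml 4.04 or later.

```ocaml
(* Keep only the names whose extension (leading dot included) equals [ext]. *)
let filter_by_extension ext names =
  List.filter (fun name -> Filename.extension name = ext) names

(* Direct entries of [dir] with extension [ext]; no recursive search.
   Sys.readdir also returns sub-directories and gives no ordering
   guarantee, so the result is sorted to make it predictable. *)
let list_files_with_extension dir ext =
  Sys.readdir dir
  |> Array.to_list
  |> filter_by_extension ext
  |> List.sort compare
```

A call such as list_files_with_extension "tests/osc2/expected/pp" ".osc2" would then play the role of list_osc2, assuming that directory exists.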

Related

Combination takeWhile, skipWhile

In F#, I find when I want to use takeWhile, I usually also want to use skipWhile, that is, take the list prefix that satisfies a predicate, and also remember the rest of the list for subsequent processing. I don't think there is a standard library function that does both, but I can write one easily enough.
My question is, what should this combination be called? It's obvious enough that there should be a standard name for it; what is it? Best I've thought of so far is split, which seems consistent with splitAt.
span is another name I've seen for this function, for example in Haskell.
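For illustration, here is a minimal sketch of such a function, written in OCaml (the language of the main thread on this page) rather than F#. It mirrors the behaviour usually associated with the name span; the helper name go is just a local choice.

```ocaml
(* [span p xs] returns the longest prefix of [xs] whose elements all
   satisfy [p], paired with the rest of the list. *)
let span p xs =
  let rec go acc = function
    | x :: rest when p x -> go (x :: acc) rest
    | rest -> (List.rev acc, rest)
  in
  go [] xs
```

The first component is what takeWhile would return and the second is what skipWhile would return, computed in a single pass.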
This part of your question stood out to me (emphasis mine):
take the list prefix that satisfies a predicate, and also remember the rest of the list for subsequent processing
I am guessing that you want to recurse with the rest of the list and then apply this splitting function again. This is what I have wanted to do a few times before. Initially, I wrote the function that I think you are describing but after giving it more thought I realised that there might be a more general way to think about it and avoid the recursion completely, which usually makes code simpler. This is the function I came up with.
module List =
    let groupAdjacentBy f xs =
        let mutable prevKey, i = None, 0
        xs
        |> List.groupBy (fun x ->
            let key = f x
            if prevKey <> Some key then
                i <- i + 1
                prevKey <- Some key
            (i, key))
        |> List.map (fun ((_, k), v) -> (k, v))
let even x = x % 2 = 0
List.groupAdjacentBy even [1; 3; 2; 5; 4; 6]
// [(false, [1; 3]); (true, [2]); (false, [5]); (true, [4; 6])]
I found this one easier to name and more useful. Maybe it works for your current problem. If you don't need the group keys then you can get rid of them by adding |> List.map snd.
As much as I usually avoid mutation, using it here allowed me to use List.groupBy and avoid writing more code.
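For comparison, the same grouping can be sketched without mutation by folding from the right. This version is in OCaml rather than F#, and the name group_adjacent_by is an illustrative choice, not an existing library function.

```ocaml
(* Group consecutive elements that share the same key [f x].
   Returns (key, elements) pairs in the original order. *)
let group_adjacent_by f xs =
  let step x = function
    (* Same key as the group to the right: extend that group. *)
    | (k, group) :: rest when k = f x -> (k, x :: group) :: rest
    (* Different key (or no group yet): start a new group. *)
    | groups -> (f x, [x]) :: groups
  in
  List.fold_right step xs []
```

List.fold_right is not tail-recursive, so this sketch is meant for lists of moderate length.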
.slice could capture the intent of a contiguous range:
List.slice skipPredicate takePredicate

OCaml |> operator

Could someone explain what the |> operator does? This code was taken from the reference here:
let m = PairsMap.(empty |> add (0,1) "hello" |> add (1,0) "world")
I can see what it does, but I wouldn't know how to apply the |> operator otherwise.
For that matter, I have no idea what the Module.() syntax is doing either. An explanation on that would be nice too.
Module.(e) is equivalent to let open Module in e. It is a shorthand syntax to introduce things in scope.
The operator |> is defined in module Pervasives (called Stdlib in recent OCaml versions) as let (|>) x f = f x. (In fact, it is defined as an external primitive, which is easier to compile. This is unimportant here.) It is the reverse application function, which makes it easier to chain successive calls. Without it, you would need to write
let m = PairsMap.(add (1,0) "world" (add (0,1) "hello" empty))
that requires more parentheses.
The |> operator looks like the | in bash.
The basic idea is that
e |> f = f e
It is a way to write your applications in the order of execution.
As an example, you could use it (I don't particularly think you should, though) to avoid lets:
12 |> fun x -> e
instead of
let x = 12 in e
As for Module.(...), it lets you use the names of a given module inside the parentheses without prefixing them.
You probably have seen List.map before.
You could of course use open List and then only refer to the function with map. But if you also open Array afterwards, map is now referring to Array.map so you need to use List.map.
The |> operator represents reverse function application. It sounds complicated but it just means you can put the function (and maybe a few extra parameters) after the value you want to apply it to. This lets you build up something that looks like a Unix pipeline:
# let ( |> ) x f = f x;;
val ( |> ) : 'a -> ('a -> 'b) -> 'b = <fun>
# 0.0 |> sin |> exp;;
- : float = 1.
The notation Module.(expr) is used to open the module temporarily for the one expression. In other words, you can use names from the module directly in the expression, without having to prefix the module name.
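To make the equivalences in these answers concrete, here is a small self-contained sketch. The definition of PairsMap is an assumption (the question does not show it); it is built with Map.Make over pairs of ints, and the names m_nested, m_piped and m_local are illustrative.

```ocaml
(* A map keyed by pairs of ints (assumed definition). *)
module PairsMap = Map.Make (struct
  type t = int * int
  let compare = compare
end)

(* Nested calls: read from the inside out. *)
let m_nested = PairsMap.(add (1, 0) "world" (add (0, 1) "hello" empty))

(* The same map built with |>: read from left to right. *)
let m_piped = PairsMap.(empty |> add (0, 1) "hello" |> add (1, 0) "world")

(* Module.(e) written out as an explicit local open. *)
let m_local =
  let open PairsMap in
  empty |> add (0, 1) "hello" |> add (1, 0) "world"
```

All three bindings should hold the same two entries; only the way the expression is written differs.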

OCaml - Creating a function which prompts for floats and returns a list of floats

I'm teaching myself OCaml and I sometimes need to create a function where I'm not really sure what the proper solution should be. Here's one that I'm a little confused about.
I need a function that will prompt the user for individual float values and return everything entered in a float list. I can create this function but I'm not sure if it's the proper/best way to do it in OCaml.
Here's my attempt.
let rec get_floats() =
  match
    (
      try Some(read_float())
      with
      | float_of_string -> None
    )
  with
  | None -> []
  | Some s -> s :: get_floats();;
This code works, but I'm at a loss deciding if it's a 'proper OCaml' solution. Note: to exit the function and return the float list, just enter a value that is not a valid float.
(I hope that) this is a simple peephole rewrite involving no thought whatsoever of the function in your question:
let rec get_floats() =
  try
    let f = read_float() in (* as suggested by Martin Jambon *)
    f :: (get_floats())
  with
  | float_of_string -> []
The idea I tried to apply here is that you do not need to convert the success/failure of read_float into an option that you immediately match: just do what you have to do with the value read, and let the with handle the failure case.
Now that I think of it, I should point out that in both your question and my rewrite, float_of_string is a fresh variable. If you meant to match a specific exception, you failed at it: all exception constructors, like datatype constructors, are Capitalized. You might as well have written with _ -> instead of with float_of_string ->, and a recent version of OCaml with all warnings active should tell you that your function (or mine) binds a variable float_of_string without ever using it.
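A sketch of what matching the specific exceptions could look like is given below. It assumes OCaml 4.02 or later for the match ... with exception syntax, and it takes the line source as a parameter (the hypothetical name read_floats_from) so the logic can be exercised without a terminal. read_float is essentially float_of_string applied to read_line (), which is why the two handlers are Failure and End_of_file.

```ocaml
(* Read floats from [next_line] until a line is not a valid float
   or the input ends. *)
let rec read_floats_from next_line =
  match float_of_string (next_line ()) with
  | f -> f :: read_floats_from next_line
  | exception Failure _ -> []    (* raised by float_of_string on bad input *)
  | exception End_of_file -> []  (* raised by read_line at end of input *)

(* Interactive version, reading from standard input. *)
let get_floats () = read_floats_from read_line
```

Because only the read itself is covered by the handlers, an unrelated exception raised later in the recursion is not silently turned into an empty list.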
Thanks everyone for the help. This works.
let rec get_floats() =
  try
    let x = read_float() in
    x :: get_floats()
  with
  | _ -> [];;
List.iter (fun x -> print_endline(string_of_float x)) (get_floats());;

how to generate a list quickly iterating a file

Coming from a C# and Python background, I feel there must be a better way to read a file and populate a classic F# list. But then I know that an F# list is immutable. There must be an alternative using a List<string> object and calling its Add method.
So far what I have at hand:
open System.IO
open System.Collections.Generic
let ptr = new StreamReader("stop-words.txt")
let lst = new List<string>()
let ProcessLine line =
    match line with
    | null -> false
    | s ->
        lst.Add(s)
        true
while ProcessLine (ptr.ReadLine()) do ()
If I were to write something similar in Python, I'd do something like:
[x[:-1] for x in open('stop-words.txt')]
Simple solution
System.IO.File.ReadAllLines(filename) |> List.ofArray
Although you can write a recursive function
let processline (fname: string) =
    let file = new System.IO.StreamReader(fname)
    let rec dowork() =
        match file.ReadLine() with
        | null -> []
        | t -> t :: dowork()
    dowork()
If you want to read all lines from a file, you can just use ReadAllLines. The method returns the data as an array, but you can easily turn that into an F# list using List.ofArray or process it using the functions in the Seq module:
open System.IO
File.ReadAllLines("stop-words.txt")
Alternatively, if you do not want to read all the contents into memory, you can use File.ReadLines which reads the lines lazily.
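For readers following the OCaml threads on this page, the recursive line-reading loop above has a close analogue in OCaml's standard library. This is only a sketch: read_lines is a hypothetical helper name, and Fun.protect assumes OCaml 4.08 or later.

```ocaml
(* Read every line of [path] into a list, closing the channel even if
   reading fails. *)
let read_lines path =
  let ic = open_in path in
  let rec loop acc =
    match input_line ic with
    | line -> loop (line :: acc)
    | exception End_of_file -> List.rev acc
  in
  Fun.protect ~finally:(fun () -> close_in ic) (fun () -> loop [])
```

As with ReadAllLines, the whole file ends up in memory; the accumulator keeps the loop tail-recursive so long files do not exhaust the stack.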

Idiomatic way to use OCaml/Core modules?

I wrote the following function while following Real World OCaml, which uses the Core library.
open Core.Core_list
open Core.Option
open Core.Std
open Re2
let getMaxFilename target =
  let Ok pat = Regex.create "^.*(..)\\.txt$" in
  Sys.ls_dir target |>
  List.map ~f:(Regex.find_submatches pat) |>
  List.filter ~f:is_ok |>
  List.map ~f:(fun x -> ok_exn x |> Array.to_list |> (Fn.flip nth_exn) 1 |> fun x -> value_exn x) |>
  List.reduce ~f:max
It looks messy to me since I have a lot of "opens" at the top and I still have to name List, Array, Sys, Fn, and other modules in all the functions that I use. Is this the "right" way to write OCaml? Is there a standard style that dispenses with these?
I'm not sure this is the best way to do this, but here's a fairly straight-ahead stylistic cleanup, without really doing anything material.
open Core.Std
module Regex = Re2.Regex
let get_max_filename target =
  let pat = Regex.create_exn "^.*(..)\\.txt$" in
  Sys.ls_dir target
  |> List.map ~f:(Regex.find_submatches pat)
  |> List.filter_map ~f:Result.ok
  |> List.filter_map ~f:(fun x -> x.(1))
  |> List.reduce ~f:max
Generally speaking, heavy use of open is frowned upon.
The following might be yet clearer and easier to follow.
let get_max_filename target =
  let pat = Regex.create_exn "^.*(..)\\.txt$" in
  Sys.ls_dir target
  |> List.filter_map ~f:(fun entry ->
    match Regex.find_submatches pat entry with
    | Error _ -> None
    | Ok ar -> ar.(1))
  |> List.reduce ~f:max
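The filter_map-then-reduce shape of this answer can also be sketched with the standard library alone, with no Core or Re2. Note the swap: instead of the regular expression, this version uses Filename.check_suffix and String.sub to pick out the two characters before ".txt". The names tag_of_name and max_tag are hypothetical, and List.filter_map assumes OCaml 4.08 or later.

```ocaml
(* The two characters just before ".txt", if the name has that shape. *)
let tag_of_name name =
  if Filename.check_suffix name ".txt" then
    let stem = Filename.chop_suffix name ".txt" in
    let n = String.length stem in
    if n >= 2 then Some (String.sub stem (n - 2) 2) else None
  else None

(* Largest tag among [names], or None if no name matches. *)
let max_tag names =
  match List.filter_map tag_of_name names with
  | [] -> None
  | first :: rest -> Some (List.fold_left max first rest)
```

Applied to a directory listing, this would be used as max_tag (Array.to_list (Sys.readdir target)), assuming target is an existing directory.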