I am trying to read a large file (around 10 GB) into a list of lines. The file contains a sequence of integers, something like this:
0x123456
0x123123
0x123123
.....
I use the function below to read files throughout my codebase, but it turns out to be quite slow (about 12 minutes) in this scenario:
let lines_from_file (filename : string) : string list =
  let lines = ref [] in
  let chan = open_in filename in
  try
    while true do
      lines := input_line chan :: !lines
    done; []
  with End_of_file ->
    close_in chan;
    List.rev !lines;;
I guess I need to read the whole file into memory and then split it into lines (I am using a server with 128 GB of RAM, so memory should not be a problem). But after searching the documentation, I still could not tell whether OCaml provides such a facility.
So here are my questions:
Given my situation, how can I read a file into a string list quickly?
What about using a stream? I would need to adjust the related application code, though, and that would take some time.
First of all, you should consider whether you really need to have all the data in memory at once. Maybe it is better to process the file line by line?
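Line-by-line processing can be sketched as a fold over the file, so that no list of lines is ever built. The name fold_lines is hypothetical, and Fun.protect assumes OCaml 4.08 or later; treat this as one possible shape rather than a tuned implementation.

```ocaml
(* Hypothetical helper: fold a function over the lines of a file
   without keeping the lines in memory. *)
let fold_lines f init filename =
  let ic = open_in filename in
  let rec loop acc =
    match input_line ic with
    | line -> loop (f acc line)          (* tail call: constant stack *)
    | exception End_of_file -> acc
  in
  (* Close the channel even if f raises. *)
  Fun.protect ~finally:(fun () -> close_in_noerr ic) (fun () -> loop init)
```

For example, fold_lines (fun n _ -> n + 1) 0 "data.txt" would count the lines of a file named data.txt (a made-up file name).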
If you really want it all in memory at once, you can use Bigarray's map_file function to map the file as an array of characters, and then work on that array directly. (In recent OCaml versions this function lives in the Unix module as Unix.map_file.)
Also, as far as I can see, the file contains numbers. It may be better to allocate an array (or, even better, a bigarray), then process each line in order and store the integers in the (big)array.
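Storing the integers in a bigarray could look roughly like the sketch below. It assumes one integer per line in a format int_of_string accepts (the 0x prefix is fine), and it reads the file twice: once to count lines, once to fill the array. The names count_lines and ints_of_file are made up for this example.

```ocaml
(* Hypothetical helper: count the lines of a file in one pass. *)
let count_lines filename =
  let ic = open_in filename in
  let rec loop n =
    match input_line ic with
    | _ -> loop (n + 1)
    | exception End_of_file -> close_in ic; n
  in
  loop 0

(* Second pass: parse each line and store it in a bigarray of native ints.
   int_of_string raises Failure on a malformed line. *)
let ints_of_file filename =
  let n = count_lines filename in
  let arr = Bigarray.Array1.create Bigarray.int Bigarray.c_layout n in
  let ic = open_in filename in
  for i = 0 to n - 1 do
    Bigarray.Array1.set arr i (int_of_string (input_line ic))
  done;
  close_in ic;
  arr
```

The two passes trade some extra I/O for a single allocation of the right size; a growable buffer would avoid the second read at the cost of more code.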
I often use the following two functions to read the lines of a file. Note that lines_from_files is tail-recursive (through its helper lines_from_files_aux).
let read_line i = try Some (input_line i) with End_of_file -> None
let lines_from_files filename =
  let rec lines_from_files_aux i acc = match (read_line i) with
    | None -> close_in i; List.rev acc (* close the channel once the file is exhausted *)
    | Some s -> lines_from_files_aux i (s :: acc) in
  lines_from_files_aux (open_in filename) []
let () =
  lines_from_files "foo"
  |> List.iter (Printf.printf "lines = %s\n")
This should work:
let rec ints_from_file fdesc =
  try
    let l = input_line fdesc in
    let l' = int_of_string l in
    l' :: ints_from_file fdesc
  with | _ -> []
This solution converts the strings to integers as they are read in, which should be a bit more memory-efficient (I assume this conversion was going to happen eventually anyway).
Also, because the function is recursive, the file must be opened outside of the function call.
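One caveat for a file with many lines: the recursive call above sits inside a try and under a cons, so it is not a tail call, and the catch-all handler also hides malformed lines by silently ending the list. A possible tail-recursive variant, under the assumption that every line is a valid integer literal, is sketched here; ints_from_channel is a made-up name.

```ocaml
(* Sketch: tail-recursive reader that only treats End_of_file as the end.
   A malformed line makes int_of_string raise Failure instead of
   silently truncating the result. *)
let ints_from_channel ic =
  let rec loop acc =
    match input_line ic with
    | line -> loop (int_of_string line :: acc)
    | exception End_of_file -> List.rev acc
  in
  loop []
```

As before, the caller opens and closes the channel.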
Related
I want to process the data in the file "persons.txt".
I have tried everything I could think of to process all the lines of the text file, without success.
The only way I can process the data is by creating the list manually:
let myList = ["John";"23"]
I want the program to iterate through all the lines of the text file.
I have managed to read the whole content of the text file into a list, but I can't seem to move on from that stage.
My way of thinking is:
Read content from text file
Convert to OCaml list
Separate list into sublists
Iterate through sublists
Only print to screen text respecting conditions
Can you please guide me?
Thanks!!
open Printf
(* FILE CONTENTS *)
(*
John;23;
Mary;16;
Anne;21;
*)
let file = "data/persons.txt"
;;
(* READ FROM EXTERNAL FILE *)
let read_lines name : string list =
  if Sys.file_exists (name) then
    begin
      let ic = open_in name in
      try
        let try_read () =
          try Some (input_line ic) with End_of_file -> None in
        let rec loop acc = match try_read () with
          | Some s -> loop (s :: acc)
          | None -> close_in_noerr ic; List.rev acc in
        loop []
      with e ->
        close_in_noerr ic;
        []
    end
  else
    []
;;
(...)
Your question is not at all clear. Here are some observations:
First, your read_lines function doesn't return the input in the form you need.
What read_lines returns looks like this:
["John;23;"; "Mary;16;"; "Anne;21;"]
But what you want is something more like this:
[("John", "23"); ("Mary", "16"); ("Anne", "21")]
The key here is to split the strings into pieces using ; as a separator. You can probably use String.split_on_char for this.
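A minimal sketch of that splitting step, assuming every useful line has the shape name;age; as in the sample file. The names parse_person and parse_people are hypothetical, and List.filter_map assumes OCaml 4.08 or later.

```ocaml
(* Split one line such as "John;23;" on ';' and keep the first two fields.
   Lines with fewer than two fields are skipped. *)
let parse_person line =
  match String.split_on_char ';' line with
  | name :: age :: _ -> Some (name, age)
  | _ -> None

(* Turn the list returned by read_lines into a list of (name, age) pairs. *)
let parse_people lines = List.filter_map parse_person lines
```

Note that "John;23;" splits into three pieces, the last one empty, which is why the pattern ignores everything after the second field.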
Second, you are not defining a function that calculates an answer from parameters. Instead, your calculation is based on global variables. This won't generalize.
Instead of saying this:
let adult_check_condition =
... using global age and name ...
You need to define a function:
let adult_check_condition age name =
... use parameters age and name ...
Then you can call this function with different ages and names.
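Since the body of the original check is not shown, the following is only an illustration of the parameterized style. It assumes the condition is "age is at least 18", which is a guess, and the names adult_age, is_adult and adult_names are invented for the example.

```ocaml
(* Assumption: "adult" means 18 or older. Adjust to the real condition. *)
let adult_age = 18

(* The check depends only on its argument, not on global state. *)
let is_adult age = age >= adult_age

(* Keep the names of the people whose age parses and passes the check. *)
let adult_names people =
  List.filter_map
    (fun (name, age) ->
       match int_of_string_opt age with
       | Some a when is_adult a -> Some name
       | _ -> None)
    people
```

With the sample data, adult_names can be applied to the list of (name, age) string pairs and the result printed with List.iter print_endline.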
I'm a student and I've been given an exercise I've been struggling with for about a month.
I'm trying to write a function in OCaml. This function must read a text file that has one word per line and store all the words in a list.
The catch is that the function must be recursive (which means no loops, no "while").
All I've been able to do so far is write a function that reads the text file and prints it (pretty much like the shell command "cat"):
let dico filename =
  let f = open_in filename in
  let rec dico_rec () =
    try
      print_string (input_line f);
      print_newline ();
      dico_rec();
    with End_of_file -> close_in f
  in dico_rec() ;;
I just don't know how to do it. OCaml is hardly my favourite language.
Here's an alternate definition of build_list that is tail-recursive. You can use it instead of @MitchellGouzenko's definition if your inputs can have many lines.
let rec build_list l =
  match input_line ic with
  | line -> build_list (line :: l)
  | exception End_of_file -> close_in ic; List.rev l
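The definition above refers to a channel ic that must already be in scope. A self-contained way to package it, with the hypothetical wrapper name read_words, might look like this (the match ... with exception form needs OCaml 4.02 or later):

```ocaml
(* Hypothetical wrapper: open the file, then run the tail-recursive loop.
   The accumulator holds the lines in reverse order until the end. *)
let read_words filename =
  let ic = open_in filename in
  let rec build_list l =
    match input_line ic with
    | line -> build_list (line :: l)
    | exception End_of_file -> close_in ic; List.rev l
  in
  build_list []
```

Calling read_words on a file with one word per line returns the words in file order, without any loop construct.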
open Printf
let file = "example.dat"
let () =
  let ic = open_in file in
  let rec build_list infile =
    try
      let line = input_line infile in
      line :: build_list(infile)
    with End_of_file ->
      close_in infile;
      [] in
  let rec print_list = function
      [] -> ()
    | e::l -> print_string e ; print_string " " ; print_list l in
  print_list(build_list(ic))
Edit: The algorithm I previously proposed was unnecessarily complicated. Try to understand this one instead.
To understand this recursion, we assume that build_list works correctly. That is, assume build_list correctly takes an open file as an argument and returns a list of lines in the file.
Now, let's look at the function's body. It reads a line from the file and calls build_list again. If there are N lines in the file, calling build_list again should return a list of the remaining N-1 lines in the file (since we just read the first line ourselves). We prepend the line we just read to the list returned from build_list, and return the resulting list, which has all N lines.
The recursion continues until it hits the base case. The base case is when there's an End_of_file. In this case we return an empty list.
I have an input.txt file with a few lines of text. I am trying to store those lines in a list l. I think I am doing it correctly, but the list l is not getting updated. Please help.
let l = []
let () =
  let ic = open_in "input.txt"
  in
  try
    while true do
      let line = input_line ic
      in
      let rec append(a, b) = match a with
        |[] -> [b]
        |c::cs -> c::append(cs,b)
      in
      append(l, line)
      (* print_endline line *)
    done
  with End_of_file ->
    close_in ic;;
Apart from Warning 10, I am not getting any error.
let l = []
Variables in OCaml are immutable, so no matter what code you write after this line, l will always be equal to [].
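A small sketch of what this means in practice. Building a new list never changes l, so the result of append(l, line) in the question is simply discarded (this is most likely what Warning 10 is pointing at). If mutable state is really wanted, a ref cell is the explicit tool. The names l', acc and add_line are made up for this example.

```ocaml
(* A let-bound name never changes: consing onto l builds a new list. *)
let l = []
let l' = "first line" :: l   (* l itself is still [] *)

(* A ref cell is the explicit way to get mutable state. *)
let acc : string list ref = ref []
let add_line line = acc := line :: !acc   (* newest line first *)

let () =
  add_line "a";
  add_line "b"
```

After this runs, l is still the empty list, while !acc holds the two added lines in reverse order.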
It looks like you are stuck in an imperative mindset, so starting with OCaml is a good thing!
Typical functional and recursive programming would read a file like this:
Read a line, then cons it onto the result of "read the rest of the lines". At End_of_file you finish the list with [].
I have a function save that reads from standard input. On its own, it is used like this:
./try < input.txt (* save function is in try file *)
input.txt
2
3
10 29 23
22 14 9
Now I have put the function into another file called path.ml, which is part of my interpreter. I have a problem defining the type of the Save constructor: since the save function works on an in_channel, I wrote
type term = Save of in_channel
but ocamlc complains about the parameter in the command function.
How can I fix this error? This is why, in my last question posted on Stack Overflow, I asked for a way to express a variable that accepts any type. I understand the answers, but they didn't help much in getting the code to run.
This is my code:
(* Data types *)
open Printf
type term = Print_line_in_file of int*string
          | Print of string
          | Save of in_channel (* error here *)
;;
let input_line_opt ic =
  try Some (input_line ic)
  with End_of_file -> None
let nth_line n filename =
  let ic = open_in filename in
  let rec aux i =
    match input_line_opt ic with
    | Some line ->
      if i = n then begin
        close_in ic;
        (line)
      end else aux (succ i)
    | None ->
      close_in ic;
      failwith "end of file reached"
  in
  aux 1
(* get all lines *)
let k = ref 1
let first = ref ""
let second = ref ""
let sequence = ref []
let append_item lst a = lst @ [a]
let save () =
  try
    while true do
      let line = input_line stdin in
      if k = ref 1
      then
        begin
          first := line;
          incr k;
        end else
      if k = ref 2
      then
        begin
          second := line;
          incr k;
        end else
        begin
          sequence := append_item !sequence line;
          incr k;
        end
    done;
    None
  with
    End_of_file -> None;;
let rec command term = match term with
  | Print (n) -> print_endline n
  | Print_line_in_file (n, f) -> print_endline (nth_line n f)
  | Save () -> save ()
;;
EDIT
Error in code:
Save of in_channel:
Error: This pattern matches values of type unit
but a pattern was expected which matches values of type in_channel
Save of unit:
Error: This expression has type 'a option
but an expression was expected of type unit
There are many errors in this code, so it's hard to know where to start.
One problem is this: your save function has type unit -> 'a option. So it's not the same type as the other branches of your final match. The fix is straightforward: save should return (), not None. In OCaml these are completely different things.
The immediate problem seems to be that you have Save () in your match, but have declared Save as taking an input channel. Your current code doesn't have any way to pass the input channel to the save function, but if it did, you would want something more like this in your match:
| Save ch -> save ch
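Putting the two observations together, a consistent version could look roughly like the sketch below. It is a simplification, not the asker's full interpreter: the type is cut down to two constructors, and saved is a hypothetical storage cell. The points it illustrates are that save takes the channel as a parameter and returns (), so every branch of the match has type unit.

```ocaml
type term =
  | Print of string
  | Save of in_channel

(* Hypothetical storage for the lines read by save, newest first. *)
let saved : string list ref = ref []

(* save : in_channel -> unit. It returns (), not None. *)
let save ic =
  let rec loop () =
    match input_line ic with
    | line -> saved := line :: !saved; loop ()
    | exception End_of_file -> ()
  in
  loop ()

(* Both branches now have type unit. *)
let command term = match term with
  | Print s -> print_endline s
  | Save ch -> save ch
```

To keep the original behaviour of reading standard input, the caller would pass the channel explicitly, as in command (Save stdin).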
Errors like this suggest (to me) that you're not so familiar with OCaml's type system. It would probably save you a lot of trouble if you went through a tutorial of some kind before writing much more code. You can find tutorials at http://ocaml.org.
I am trying to learn input/output in SML. In an effort to copy the strings of ls that are the same as s1 into the file l2, I did the following. I am getting some errors I cannot really understand. Can someone help me out?
fun test(l2:string,ls:string list,s1:string) = if (String.isSubstring(s1 hd(ls))) then
(TextIO.openOut l2; TextIO.inputLine hd(ls))::test(l2,tl(ls),s1) else
test(l2,tl(ls),s1);
Here are some general hints:
Name your variables something meaningful, like filename, lines and line.
The function TextIO.inputLine takes as argument a value of type instream.
When you write TextIO.inputLine hd(ls), what this is actually interpreted as is
(TextIO.inputLine hd) ls, which means "treat hd as if it were an instream and
try and read a line from it, take that line and treat it as if it were a function,
and apply it on ls", which is of course complete nonsense.
The proper parenthesising in this case would be TextIO.inputLine (hd ls), which
still does not make sense, since we decided that ls is a string list, and so hd ls
will be a string and not an instream.
Here is something that resembles what you want to do, but opposite:
(* Open a file, read each line from file and return those that contain mySubstr *)
fun test (filename, mySubstr) =
  let val instr = TextIO.openIn filename
      fun loop () = case TextIO.inputLine instr of
                        SOME line => if String.isSubstring mySubstr line
                                     then line :: loop () else loop ()
                      | NONE => []
      val lines = loop ()
      val _ = TextIO.closeIn instr
  in lines end
To write to a file, you need to use TextIO.openOut and TextIO.output instead; TextIO.inputLine is the function that reads from files.