How to schedule a task in OCaml?

I have a task that needs to be done every 4 hours, or once a day.
In Java there is Quartz, or Spring, or Timer.
But in OCaml, how do I do that? Is there a good library for it?

I don't know of any library for that, but I think you can easily implement that kind of behavior using the Lwt library.
A little example that prints "Hello, world" every 4 hours:
let rec hello () =
  Lwt.bind (Lwt_unix.sleep 14400.)
    (fun () -> print_endline "Hello, world !"; hello ())

let () = Lwt.async hello
The Lwt.async function calls the given function (here, hello) in an asynchronous lightweight thread, so you're free to do other things in your program. As long as your program doesn't exit, "Hello, world !" will be printed every 4 hours.
If you want to be able to stop it, you can also launch the thread like this instead of using Lwt.async:
let a = hello ()
And then, to stop the thread :
Lwt.cancel a
Be aware that Lwt.cancel makes the cancelled thread fail with the Lwt.Canceled exception!
Then, to launch a task at a particular time of day, I can only encourage you to use functions from the Unix module, such as localtime and mktime.
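As a rough sketch of that idea (the function name is my own, and it assumes a fixed 86400-second day, so DST shifts are ignored), you can compute the delay until the next occurrence of a given hour with Unix.localtime and Unix.mktime:

```ocaml
(* Sketch: delay in seconds until the next time the local clock reads
   [hour]:00:00. Assumes an 86400-second day, so DST changes are ignored. *)
let seconds_until_hour ~now hour =
  let tm = Unix.localtime now in
  (* same day, but at [hour]:00:00 *)
  let target = { tm with Unix.tm_hour = hour; tm_min = 0; tm_sec = 0 } in
  let t, _ = Unix.mktime target in
  if t > now then t -. now
  else t +. 86400. -. now   (* [hour]:00 already passed today: aim for tomorrow *)
```

You could then replace the fixed Lwt_unix.sleep 14400. with something like Lwt_unix.sleep (seconds_until_hour ~now:(Unix.time ()) 3) to wake up around 3 o'clock local time.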

Related

File Server erlang response

I am trying to follow the very first example given in Programming Erlang: Software for a Concurrent World by Joe Armstrong. Here is the code:
-module(afile_server).
-export([start/1, loop/1]).

start(Dir) -> spawn(afile_server, loop, [Dir]).

loop(Dir) ->
    receive
        {Client, list_dir} ->
            Client ! {self(), file:list_dir(Dir)};
        {Client, {get_file, File}} ->
            Full = filename:join(Dir, File),
            Client ! {self(), file:read_file(Full)}
    end,
    loop(Dir).
Then I run this in the shell:
c(afile_server).
FileServer = spawn(afile_server, start, ".").
FileServer ! {self(), list_dir}.
receive X -> X end.
In the book a list of the files is returned, as expected; in my shell, however, it looks as if the program has frozen. Nothing is returned, yet the program is still running. I'm not familiar with Erlang at all, but I can understand how this should work.
I'm running this in Windows 7 64-bit. The directory is not empty as it contains a bunch of other erlang files.
So... what is the start/1 function doing here? It spawns a process that starts in loop/1, and in your shell you don't just call it, you spawn it too! So you have a chain of two spawned processes, and the process bound to FileServer dies immediately, because its only job is to spawn the actual file server, whose pid is unknown to you.
Just change the line:
FileServer = spawn(afile_server, start, ".").
to:
FileServer = afile_server:start(".").
The drawing below illustrates Lukasz's explanation.

Run part of program inside Fortran code for a limited time

I want to run some code (or an external executable) for a specified amount of time. For example, in Fortran I can
call system('./run')
Is there a way I can restrict its run to, let's say, 10 seconds? For example, as follows:
call system('./run', 10)
I want to do it from inside the Fortran code; the example above is for the system command, but I want to do it for some other subroutines of my code as well. For example,
call performComputation(10)
where performComputation will be able to run only for 10 seconds. The system it will run on is Linux.
thanks!
EDITED
Ah, I see: you want to call a part of the current program for a limited time. I see a number of options for that...
Option 1
Modify the subroutines you want to run for a limited time so they take an additional parameter, which is the number of seconds they may run. Then modify the subroutine to get the system time at the start, and then in their processing loop get the time again and break out of the loop and return to the caller if the time difference exceeds the maximum allowed number of seconds.
On the downside, this requires you to change every subroutine. It will exit the subroutine cleanly though.
Option 2
Take advantage of a threading library - e.g. pthreads. When you want to call a subroutine with a timeout, create a new thread that runs alongside your main program in parallel and execute the subroutine inside that thread of execution. Then in your main program, sleep for 10 seconds and then kill the thread that is running your subroutine.
This is quite easy and doesn't require changes to all your subroutines. It is not that elegant in that it chops the legs off your subroutine at some random point, maybe when it is least expecting it.
Imagine time running down the page in the following example, and the main program actions are on the left and the subroutine actions are on the right.
MAIN                              SUBROUTINE YOUR_SUB
... something ...
... something ...
f_pthread_create(,,,YOUR_SUB,)    start processing
sleep(10)                         ... calculate ...
                                  ... calculate ...
                                  ... calculate ...
f_pthread_kill()
... something ...
... something ...
Option 3
Abstract out the subroutines you want to call and place them into their own separate executables, then proceed as per my original answer below.
Whichever option you choose, you are going to have to think about how you get the results from the subroutine you are calling - will it store them in a file? Does the main program need to access them? Are they in global variables? The reason is that if you are going to follow options 2 or 3, there will not be a return value from the subroutine.
Original Answer
If you don't have timeout, you can do
call system('./run & sleep 10; kill $!')
Yes, there is a way: take a look at the Linux command timeout.
# run command for 10 seconds and then send it SIGTERM kill message
# if not finished.
call system('timeout 10 ./run')
Example
# finishes in 10 seconds with a return code of 0 to indicate success.
sleep 10
# finishes in 1 second with a return code of `124` to indicate timed out.
timeout 1 sleep 10
You can also choose the type of kill signal you want to send by specifying the -s parameter. See man timeout for more info.

Go concurrency and channel confusion

I'm new to Go and have a problem understanding concurrency and channels.
package main

import "fmt"

func display(msg string, c chan bool) {
    fmt.Println("display first message:", msg)
    c <- true
}

func sum(c chan bool) {
    sum := 0
    for i := 0; i < 10000000000; i++ {
        sum++
    }
    fmt.Println(sum)
    c <- true
}

func main() {
    c := make(chan bool)
    go display("hello", c)
    go sum(c)
    <-c
}
The output of the program is:
display first message: hello
10000000000
But I thought it should be only one line:
display first message: hello
So in the main function, <-c blocks and waits for the other two goroutines to send data to the channel. Once the main function receives data from c, it should proceed and exit.
display and sum run simultaneously, and sum takes longer, so display should send true to c and the program should exit before sum finishes...
I'm not sure I understand it clearly. Could someone help me with this? Thank you!
The exact output of your program is not defined and depends on the scheduler. The scheduler can choose freely between all goroutines that are currently not blocked. It tries to run those goroutines concurrently by switching the current goroutine at very short time intervals, so that the user gets the feeling that everything happens simultaneously. In addition to that, it can also execute more than one goroutine in parallel on different CPUs (if you happen to have a multicore system and increase runtime.GOMAXPROCS). One situation that might lead to your output is:
main creates two goroutines
the scheduler chooses to switch to one of the new goroutines immediately and chooses display
display prints out the message and is blocked by the channel send (c <- true) since there isn't a receiver yet.
the scheduler chooses to run sum next
the sum is computed and printed on the screen
the scheduler chooses to not resume the sum goroutine (it has already used a fair amount of time) and continues with display
display sends the value to the channel
the scheduler chooses to run main next
main quits and all goroutines are destroyed
But that is just one possible execution order. There are many others and some of them will lead to a different output. If you want to print just the first result and quit the program afterwards, you should probably use a result chan string and change your main function to print fmt.Println(<-result).

How to fully utilise `lwt` in this case

Here is what I am going to do:
I have a list of tasks and I need to run them all every hour (scheduling).
All those tasks are similar. For example, for one task, I need to download some data from a server (using the HTTP protocol, which takes 5-8 seconds) and then do a computation on the data (which takes 1-5 seconds).
I think I can use lwt to achieve this, but I can't figure out the most efficient approach.
For the task scheduling part, I can do like this (How to schedule a task in OCaml?):
let rec start () =
  (Lwt_unix.sleep 1.) >>= (fun () -> print_endline "Hello, world !"; start ())

let _ = Lwt_main.run (start ())
The questions come from the actual do_task part.
So a task involves http download and computation.
The HTTP download part has to wait for 5 to 8 seconds. If I really execute the tasks one by one, then it wastes bandwidth and, of course, I wish the downloads of all tasks to run in parallel. So should I put this download part into lwt? And will lwt handle all the downloads in parallel?
By code, should I do like this?:
let content = function
  | Some (_, body) -> Cohttp_lwt_unix.Body.string_of_body body
  | _ -> return ""

let download task =
  Cohttp_lwt_unix.Client.get ("http://dataserver/task?name=" ^ task.name)

let get_data task =
  (download task) >>= (fun response -> Lwt.return (Content response))

let do_task task =
  (get_data task) >>= (fun data -> Lwt.return (calculate data))
So, with the code above, will all tasks be executed in parallel, at least for the HTTP download part?
For the calculation part, will all calculations be executed in sequence?
Furthermore, can anyone briefly describe the mechanism of lwt? Internally, what is the logic of the lightweight threads? Why can it handle I/O in parallel?
To run tasks in parallel with lwt, you can check the Lwt_list module, and especially iter_p.
val iter_p : ('a -> unit Lwt.t) -> 'a list -> unit Lwt.t
iter_p f l calls the function f on each element of l, then waits for all the threads to terminate. For your purpose, it would look like:
let do_tasks tasks = Lwt_list.iter_p do_task tasks
Assuming that tasks is a list of task values.
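Putting the hourly scheduling loop and Lwt_list.iter_p together, a rough sketch (assuming the question's do_task and a tasks list; this is not a complete program) could look like:

```ocaml
(* Sketch only: [do_task] and [tasks] are assumed from the question.
   iter_p starts every do_task thread at once, so all the HTTP downloads
   are in flight concurrently, and it resolves only when every task has
   finished; then we sleep an hour and start over. *)
let rec run_every_hour tasks =
  Lwt_list.iter_p do_task tasks >>= fun () ->
  Lwt_unix.sleep 3600. >>= fun () ->
  run_every_hour tasks

let () = Lwt_main.run (run_every_hour tasks)
```

Note that Lwt threads are cooperative and run in a single OS thread: the I/O waits overlap, but pure OCaml computations still execute one at a time, interleaved at bind points.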

Is F# really faster than Erlang at spawning and killing processes?

Updated: This question contains an error which makes the benchmark meaningless. I will attempt a better benchmark comparing F# and Erlang's basic concurrency functionality and inquire about the results in another question.
I am trying to understand the performance characteristics of Erlang and F#. I find Erlang's concurrency model very appealing but am inclined to use F# for interoperability reasons. While out of the box F# doesn't offer anything like Erlang's concurrency primitives -- from what I can tell, async and MailboxProcessor only cover a small portion of what Erlang does well -- I've been trying to understand what is possible in F#, performance-wise.
In Joe Armstrong's Programming Erlang book, he makes the point that processes are very cheap in Erlang. He uses (roughly) the following code to demonstrate this fact:
-module(processes).
-export([max/1]).

%% max(N)
%%   Create N processes then destroy them
%%   See how much time this takes

max(N) ->
    statistics(runtime),
    statistics(wall_clock),
    L = for(1, N, fun() -> spawn(fun() -> wait() end) end),
    {_, Time1} = statistics(runtime),
    {_, Time2} = statistics(wall_clock),
    lists:foreach(fun(Pid) -> Pid ! die end, L),
    U1 = Time1 * 1000 / N,
    U2 = Time2 * 1000 / N,
    io:format("Process spawn time=~p (~p) microseconds~n",
              [U1, U2]).

wait() ->
    receive
        die -> void
    end.

for(N, N, F) -> [F()];
for(I, N, F) -> [F() | for(I+1, N, F)].
On my MacBook Pro, spawning and killing 100 thousand processes (processes:max(100000)) takes about 8 microseconds per process. I can raise the number of processes a bit further, but a million seems to break things pretty consistently.
Knowing very little F#, I tried to implement this example using async and MailboxProcessor. My attempt, which may be wrong, is as follows:
#r "System.dll"
open System.Diagnostics

type waitMsg =
    | Die

let wait =
    MailboxProcessor.Start(fun inbox ->
        let rec loop =
            async { let! msg = inbox.Receive()
                    match msg with
                    | Die -> return () }
        loop)

let max N =
    printfn "Started!"
    let stopwatch = new Stopwatch()
    stopwatch.Start()
    let actors = [for i in 1 .. N do yield wait]
    for actor in actors do
        actor.Post(Die)
    stopwatch.Stop()
    printfn "Process spawn time=%f microseconds." (stopwatch.Elapsed.TotalMilliseconds * 1000.0 / float(N))
    printfn "Done."
Using F# on Mono, starting and killing 100,000 actors/processes takes under 2 microseconds per process, roughly 4 times faster than Erlang. More importantly, perhaps, I can scale up to millions of processes without any apparent problems. Starting 1 or 2 million processes still takes about 2 microseconds per process. Starting 20 million processes is still feasible, but slows to about 6 microseconds per process.
I have not yet taken the time to fully understand how F# implements async and MailboxProcessor, but these results are encouraging. Is there something I'm doing horribly wrong?
If not, is there some place Erlang will likely outperform F#? Is there any reason Erlang's concurrency primitives can't be brought to F# through a library?
EDIT: The above numbers are wrong, due to the error Brian pointed out. I will update the entire question when I fix it.
In your original code, you only started one MailboxProcessor. Make wait() a function, and call it with each yield. Also, you are not waiting for the actors to spin up or to receive the messages, which I think invalidates the timing info; see my code below.
That said, I have had some success; on my box I can do 100,000 at about 25us each. Much beyond that, I think you possibly start fighting the allocator/GC as much as anything, but I was able to do a million too (at about 27us each, though at that point it was using about 1.5G of memory).
Basically, each 'suspended async' (which is the state a mailbox is in while waiting on a line like
let! msg = inbox.Receive()
) only takes some number of bytes while it's blocked. That's why you can have way, way more asyncs than threads; a thread typically takes a megabyte of memory or more.
OK, here's the code I'm using. You can use a small number like 10, and --define DEBUG, to check that the program's semantics are as desired (printf outputs may be interleaved, but you'll get the idea).
open System.Diagnostics

let MAX = 100000

type waitMsg =
    | Die

let mutable countDown = MAX
let mre = new System.Threading.ManualResetEvent(false)

let wait(i) =
    MailboxProcessor.Start(fun inbox ->
        let rec loop =
            async {
#if DEBUG
                printfn "I am mbox #%d" i
#endif
                if System.Threading.Interlocked.Decrement(&countDown) = 0 then
                    mre.Set() |> ignore
                let! msg = inbox.Receive()
                match msg with
                | Die ->
#if DEBUG
                    printfn "mbox #%d died" i
#endif
                    if System.Threading.Interlocked.Decrement(&countDown) = 0 then
                        mre.Set() |> ignore
                    return () }
        loop)

let max N =
    printfn "Started!"
    let stopwatch = new Stopwatch()
    stopwatch.Start()
    let actors = [for i in 1 .. N do yield wait(i)]
    mre.WaitOne() |> ignore // ensure they have all spun up
    mre.Reset() |> ignore
    countDown <- MAX
    for actor in actors do
        actor.Post(Die)
    mre.WaitOne() |> ignore // ensure they have all got the message
    stopwatch.Stop()
    printfn "Process spawn time=%f microseconds." (stopwatch.Elapsed.TotalMilliseconds * 1000.0 / float(N))
    printfn "Done."

max MAX
All this said, I don't know Erlang, and I have not thought deeply about whether there's a way to trim down the F# any more (though it's pretty idiomatic as-is).
Erlang's VM doesn't use OS threads or processes to switch to a new Erlang process. The VM simply counts function calls (reductions) in your code and jumps to another Erlang process after a certain number of them, staying within the same OS process and the same OS thread.
The CLR uses mechanics based on OS processes and threads, so F# has a much higher overhead cost for each context switch.
So the answer to your question is: "No, Erlang is much faster at spawning and killing processes".
P.S. You may find the results of that practical contest interesting.