Open a downloadable file - ocaml

I use open_in to open a local file with its path:
let f = open_in "/Users/SoftTimur/file.txt" in
...
Now, I would like to open a downloadable file with its URL:
let f = open_in "http://caml.inria.fr/distrib/ocaml-4.02/ocaml-4.02-refman.txt" in
...
This returns the error Fatal error: exception Sys_error("http://caml.inria.fr/distrib/ocaml-4.02/ocaml-4.02-refman.txt: No such file or directory").
Does anyone know which function I could use to open such a downloadable file? Do I have to first download it to local (how to do that by OCaml?)?

Well, there are plenty of libraries in OCaml that can deal with the HTTP protocol in particular and with network communication in general. None of them provides a function of type string -> in_channel, because the in_channel type is an abstraction owned by OCaml. The language doesn't allow us to create our own implementations of the channel type (1).
The libraries that I know and have used are:
cohttp - asynchronous library for http client and servers;
ocurl - a binding to the libcurl;
ocamlnet - all things network and even much more;
Presumably there are others, please feel free to edit this answer and add them.
I personally prefer the asynchronous, monadic cohttp, but it is easier to start with ocamlnet, which is also an excellent library with lots of features. This is how to play with it in the OCaml toplevel:
# #use "topfind";;
# #require "netclient";;
# module Client = Nethttp_client.Convenience;;
# let ocamldoc = Client.http_get "http://caml.inria.fr/distrib/ocaml-4.02/ocaml-4.02-refman.txt";;
Before you start playing, make sure that you have installed it with
opam install ocamlnet
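Since http_get hands back the body as a string rather than a channel, code that was written against in_channel needs a small bridge. Below is a minimal sketch, assuming a temporary file is acceptable; in_channel_of_string is a hypothetical helper name, not something provided by ocamlnet or the standard library.

```ocaml
(* Hypothetical helper: expose a string (e.g. a downloaded body) as an
   in_channel by writing it to a temporary file and reopening that file. *)
let in_channel_of_string (body : string) : in_channel =
  let path, oc = Filename.open_temp_file "body" ".tmp" in
  output_string oc body;
  close_out oc;
  let ic = open_in_bin path in
  (* On Unix the file may be removed while it is still open; on other
     systems this can fail, so the error is ignored. *)
  (try Sys.remove path with Sys_error _ -> ());
  ic
```

With this in place, something like in_channel_of_string (Client.http_get url) would give a channel that input_line and friends can read from.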
Footnotes:
(1) At least in pure OCaml. It is possible to create one from C, but I doubt anyone will pursue this direction; it isn't worth it.

In OCaml this is essentially a C-level file open on a local path, so there is no way to use it with an http source scheme.
You will have to download the file first, using one of the usual download tools such as wget or curl, which are used by your Linux package manager as well. Sys.command is your friend here.
It would not be too hard to write a module that checks the file name for a "scheme:"-like prefix and takes the appropriate action.
Maybe look at the opam sources for inspiration?
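The prefix check suggested above could look roughly like this. It is a sketch under assumptions: curl is installed and on the PATH, only http:// and https:// are recognised, and open_in_any, is_url and has_prefix are hypothetical names.

```ocaml
(* Does [s] start with [prefix]? *)
let has_prefix prefix s =
  String.length s >= String.length prefix
  && String.sub s 0 (String.length prefix) = prefix

(* Only these two schemes are treated as remote in this sketch. *)
let is_url s = has_prefix "http://" s || has_prefix "https://" s

(* Open a local path directly; download a URL to a temporary file first. *)
let open_in_any name =
  if is_url name then begin
    let tmp = Filename.temp_file "download" ".tmp" in
    (* Filename.quote protects both arguments from the shell. *)
    let cmd =
      Printf.sprintf "curl -fsSL -o %s %s"
        (Filename.quote tmp) (Filename.quote name)
    in
    if Sys.command cmd <> 0 then begin
      Sys.remove tmp;
      failwith ("download failed: " ^ name)
    end;
    open_in tmp
  end else
    open_in name
```

The temporary file is left on disk after a successful download, so a real program would want to remove it once the channel is closed.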

Does anyone know which function I could use to open such a downloadable file?
Install the wget program for your Linux distro, link the Unix library when you compile, and you can use:
let open_url_in url = Unix.open_process_in ("wget -O - " ^ url)
This runs the wget program with -O -, asking it to write the downloaded file to stdout, which open_process_in exposes to your process as the resulting channel.
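A channel obtained from open_process_in should be closed with Unix.close_process_in, which also reports the exit status of the command. The sketch below shows one way to read everything and check that status; it assumes the unix library is linked, and lines_of_command and lines_of_url are hypothetical helper names.

```ocaml
(* Read every line from a channel until End_of_file. *)
let read_lines ic =
  let rec loop acc =
    match input_line ic with
    | line -> loop (line :: acc)
    | exception End_of_file -> List.rev acc
  in
  loop []

(* Run a shell command; return its stdout as lines plus its exit status. *)
let lines_of_command cmd =
  let ic = Unix.open_process_in cmd in
  let lines = read_lines ic in
  let status = Unix.close_process_in ic in
  (lines, status)

(* Fetch a URL through wget. Filename.quote keeps characters such as
   '&' or '?' in the URL away from the shell. *)
let lines_of_url url =
  lines_of_command ("wget -q -O - " ^ Filename.quote url)
```

A status other than Unix.WEXITED 0 means that wget failed (for example, the host was unreachable), in which case the returned lines are likely empty or incomplete.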
Do I have to first download it to local (how to do that by OCaml?)?
No.

Related

"Embedding" a folder into a C/C++ program

I have a script library stored in .../lib/ that I want to embed into my program. So far, that sounds simple: on Windows, I'd use Windows Resource Files; on MacOS, I'd put them into a Resource folder and use the proper API to access the current bundle and its resources. On plain Linux, I am not too sure how to do it... But I want to be cross-platform anyway.
Now, I know that there are tools like IncBin (https://github.com/graphitemaster/incbin) and alike, but they are best used for single files. What I have, however, might even require some kind of file system abstraction.
So here are the few guesses and estimates I came up with. I'd like to know if there is possibly a better solution, or other solutions in general.
Create a Zip file and use MiniZ to read its contents off a char array. Basically, running the zip file through IncBin and passing it as a buffer to MiniZ to let me work on that.
Use an abstracted FS layer like PhysicsFS or TTVFS and add the possibility to work off a Zip file or any other kind of archive.
Are there other solutions? Thanks!
I had this same issue, and I solved it by locating the library relative to argv[0]. But that only works if you invoke the program by its absolute path -- i.e., not via $PATH in the shell. So I invoke my program by a one-line script in ~/bin, or any other directory that's in your search path:
exec /wherever/bin/program "$@"
When the program is run, argv[0] is set to "/wherever/bin/program", and it knows to look in "/wherever/lib" for the related scripts.
Of course if you're installing directly into standard locations, you can rely on the standard directory structure, such as /usr/local/bin/program for the executable and /etc/program for related scripts & config files. The technique above is just when you want to be able to install a whole bundle in an arbitrary place.
EDIT: If you don't want the one-line shell script, you can also say:
alias program=/wherever/bin/program
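The lookup relative to argv[0] described above is only path arithmetic, so it can be sketched compactly. The version below is written in OCaml, assumes the bin/ and lib/ directories are siblings as in the /wherever example, and uses the hypothetical name lib_dir_of_exe.

```ocaml
(* Given the path the program was started with, return the sibling
   "lib" directory: /wherever/bin/program -> /wherever/lib *)
let lib_dir_of_exe exe =
  let bin_dir = Filename.dirname exe in
  Filename.concat (Filename.dirname bin_dir) "lib"

(* Typical use: lib_dir_of_exe Sys.executable_name *)
```

As noted in the answer, the result is only meaningful when the program was started through a full path; a bare command name resolved via $PATH gives no directory to work from.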

Optional dependencies with ocamlbuild

I have an OCaml project that is currently built using OCamlMake. I am not happy with the current build system, since it leaves all build artefacts in the same directory as the source files, and it also requires manually specifying the order of dependencies between modules. I would like to switch to a build system that does not suffer from these issues. I decided to give Oasis a try but have run into issues.
The problems arise from the fact that the project is built in a very specific way. It supports several different database backends (PostgreSQL, MySQL, SQLite). Currently, to compile a database backend, the user must install the extra libraries required by that backend and enable it by setting an environment variable. This is how it looks in the Makefile:
ifdef MYSQL_LIBDIR
DB_CODE += mysql_database.ml
DB_AUXLIBS += $(MYSQL_LIBDIR)
DB_LIBS += mysql
endif
Notice that this also adds an extra module to the list of compiled modules. The important bit is that there is no dependency (in the sense of a module import) between any module reachable from the application's entry point and the database backend module. What happens instead is that each database backend module contains top-level code that runs when the module is initialised and registers itself, using side effects, with the main application.
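To make the registration pattern concrete, here is a minimal single-file sketch. The module names, the registry API and the connect functions are all hypothetical stand-ins for the project's real code; nested modules are used only so the example fits in one file.

```ocaml
(* Lives in the main application: a table of known backends. *)
module Registry = struct
  let backends : (string, string -> string) Hashtbl.t = Hashtbl.create 4

  let register name connect = Hashtbl.replace backends name connect

  (* Names of all registered backends, sorted for stable output. *)
  let available () =
    Hashtbl.fold (fun name _ acc -> name :: acc) backends []
    |> List.sort compare
end

(* A backend registers itself from top-level code. Nothing in the main
   application refers to this module by name. *)
module Mysql_backend = struct
  let connect uri = "mysql connection to " ^ uri
  let () = Registry.register "mysql" connect
end

module Sqlite_backend = struct
  let connect uri = "sqlite connection to " ^ uri
  let () = Registry.register "sqlite" connect
end
```

Because the application never mentions the backend modules, the build system has to be told explicitly to link them. When a backend is packaged as a library archive (.cma/.cmxa), units that nothing references are normally left out at link time unless -linkall is passed, which is one reason this setup is awkward to express.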
I cannot get this setup to work with Oasis. I declared each of the database backend modules as a separate library that can be enabled for compilation with a flag:
Library mysql-backend
  Path         : .
  Build$       : flag(mysql)
  Install      : false
  BuildTools   : ocamlbuild
  BuildDepends : main, mysql
  Modules      : Mysql_backend
However, I cannot figure out a way to tell Oasis to link the optional modules into the executable. I tried to figure out a way of doing this by modifying myocamlbuild.ml file but failed. Can I achieve this with the rule function described here?
If what I describe cannot be achieved with ocamlbuild, is there any other tool that would do the job and avoid the problems of OCamlMake?
Well, I guess that answers it: https://github.com/links-lang/links/pull/77 :)
I saw the question and started working on it before I noticed Drup's answer above. Below is a self-contained ocamlbuild solution that is essentially the same as Drup's.
open Ocamlbuild_plugin
let enable_plugin () =
  let plugins = try string_list_of_file "plugin.config" with _ -> [] in
  dep ["ocaml"; "link_with_plugin"; "byte"]
    (List.map (fun p -> p^".cmo") plugins);
  dep ["ocaml"; "link_with_plugin"; "native"]
    (List.map (fun p -> p^".cmx") plugins);
  ()
let () = dispatch begin
  function
  | Before_rules -> enable_plugin ()
  | _ -> ()
end
Using the tag link_with_plugin on an ocamlbuild target will make it depend on any module whose path (without extension) is listed in the file plugin.config. For example if you have plugins pluga.ml, plugb.ml and a file main.ml, then writing pluga plugb in plugin.config and having <main.{cmo,cmx}>: link_with_plugin will link both plugin modules in the main executable.
Unfortunately, this is beyond oasis' capabilities. This limitation has nothing to do with ocamlbuild; it is just that the oasis author tried to keep it simple and didn't provide optional dependencies as a feature.
As always, an extra level of indirection may solve your problem. What you need is a configuration script (configure) that will generate the _oasis file for you, depending on parameters provided by the user.
For example, in our project we have a similar setup, i.e., multiple different backends that can be chosen by the user during the configuration phase with --{enable,disable}-<feature>. We achieved this by writing our own ./configure script that generates the _oasis file depending on the configuration. The configuration script just concatenates the resulting _oasis file from pieces stored in the oasis folder.
An alternative solution would be to use m4 or just cpp, and have an _oasis.in file that is preprocessed.
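The "concatenate pieces" idea can be sketched as a small pure function. This is only an illustration: the feature names, the fragment contents and the function name render_oasis are hypothetical, and a real configure script would read the pieces from files and write the result to _oasis.

```ocaml
(* [pieces] pairs a feature name with the _oasis fragment it contributes.
   The result is the base text followed by the fragments of every enabled
   feature, in the order the pieces are listed. *)
let render_oasis ~base ~pieces ~enabled =
  let selected =
    List.filter (fun (feature, _) -> List.mem feature enabled) pieces
  in
  String.concat "\n" (base :: List.map snd selected)
```

For instance, calling it with ~enabled:["mysql"] keeps only the MySQL library section, so a disabled backend never appears in the generated file at all.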

Is there any way to open and read a file over an SSH connection?

I have access to a server that holds a lot of data. I can't copy all of that data to my computer.
I can't compile the program I want on the server because the server doesn't have all the libs I need.
I don't think the server admin would be very happy to see me coming and asking him to install some libs just for me...
So, I am trying to figure out whether there is a way to open a file like with,
FILE *fopen(const char *filename, const char *mode);
or
void std::ifstream::open(const char* filename, ios_base::openmode mode = ios_base::in);
but over an SSH connection, and then read the file as I would in a usual program.
Both my computer and the server are running Linux.
I assume you are working on your Linux laptop and the remote machine is some supercomputer.
First, some non-technical advice: ask permission before accessing the data remotely. In some workplaces you are not allowed to do that, even if it is technically possible.
You could sort-of use libssh for that purpose, but you'll need some coding and read its documentation.
You could consider using some FUSE file system (on your laptop), e.g. sshfs; you would then be able to access the supercomputer's files as /sshfilesystem/foo.bar. It is probably the slowest solution, and probably not a very reliable one. I don't really recommend it.
You could ask permission to use NFS mounts.
Maybe you might consider some HTTPS access (if the remote computer has it for your files) using some HTTP/HTTPS client library like libcurl (or the other way round, some HTTP/HTTPS server library like libonion)
And you might (but ask permission first!) use some TLS connection (e.g. manually start a server-like program on the remote supercomputer), perhaps through OpenSSL or libgnutls.
Lastly, you should consider installing (i.e. politely asking for the installation on the remote supercomputer) or using some database software (e.g. a PostgreSQL or MariaDB or Redis or MongoDB server) on the remote computer, and making your program a database client application ...
BTW, things might be different depending on whether you access a few dozen terabyte-sized files with random access (each run reading a few kilobytes inside them), or a million files of a reasonable size (a few megabytes each), of which a given run accesses only a dozen with sequential reads. In other words, DNA data, video films, HTML documents, source code, ... are all different cases!
Well, the answer to your question is no, as already stated several times (unless you think about implementing ssh yourself which is out of scope of sanity).
But as you also describe your real problem, it's probably just asking the wrong question, so -- looking for alternatives:
Alternative 1
Link the library you want to use statically to your binary. Say you want to link libfoo statically:
Make sure you have libfoo.a (the object archive of your library) in your library search path. Often, development packages for a library provided by your distribution already contain it, if not, compile the library yourself with options to enable the creation of the static library
Assuming the GNU toolchain, build your program with the following flags: -Wl,-Bstatic -lfoo -Wl,-Bdynamic (instead of just -lfoo)
Alternative 2
Create your binary as usual (linked against the dynamic library) and put that library (libfoo.so) e.g. in ~/lib on the server. Then run your binary there with LD_LIBRARY_PATH=~/lib ./a.out.
You can copy parts of a file to your computer over an SSH connection:
Copy part of the source file to a temporary file using the dd command.
Copy the temporary file to your local box using scp or rsync.
You can create a shell script to automate this if you need to do that multiple times.
Instead of fopen on a path, you can use popen on an ssh command. (Don't forget that FILE * streams obtained from popen are closed with pclose and not fclose).
You can simplify the interface by writing a function which wraps popen. The function accepts just the remote file name, and then generates the ssh command to fetch that file, properly escaping everything, like spaces in the file name, shell meta-characters and whatnot.
FILE *stream = popen("ssh user@host cat /path/to/remote/file", "r");
if (stream != NULL) {
    /* ... read from stream with fgets, fread, etc. ... */
    pclose(stream);
}
popen has some drawbacks because it processes a shell command. Because the argument to ssh is also a shell command that is processed on the remote end, it raises issues of double escaping: the remote command has to survive being passed through the local shell as part of another shell command.
To do something more robust, you can create a pipe using pipe, then fork and exec* the ssh process, installing the write end of the pipe as its stdout, and use fdopen to create a FILE * stream on the reading end of the pipe in the parent process. This way, there is accurate control over the arguments which are handed to the process: at least locally, you're not running a shell command.
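The pipe/fork/exec approach has a close counterpart in OCaml's Unix module, where Unix.create_process takes an argument vector and so bypasses the local shell. The following is a sketch under assumptions: the unix library is linked, and open_process_args_in and open_remote_file are hypothetical helper names.

```ocaml
(* Start [prog] with an explicit argument list (no local shell involved)
   and return a channel on its standard output plus the child's pid. *)
let open_process_args_in prog args =
  let rd, wr = Unix.pipe () in
  (* Keep the read end out of the child process. *)
  Unix.set_close_on_exec rd;
  let argv = Array.of_list (prog :: args) in
  let pid = Unix.create_process prog argv Unix.stdin wr Unix.stderr in
  (* The parent must close its copy of the write end, otherwise the
     reader never sees end-of-file. *)
  Unix.close wr;
  (Unix.in_channel_of_descr rd, pid)

(* The remote side still runs a shell, so the path is quoted for it. *)
let open_remote_file host path =
  open_process_args_in "ssh" [ host; "cat " ^ Filename.quote path ]
```

When done, close the channel with close_in and collect the child with Unix.waitpid [] pid; the returned status tells whether ssh (and the remote cat) succeeded.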
You can't directly(1) open a file over ssh with fopen() or ifstream::open. But you can leverage the existing ssh binary. Simply have your program read from stdin, and pipe the file to it via ssh:
ssh that_server cat /path/to/largefile | ./yourprogram
(1) Well, if you mount the remote system using sshfs you can access the files over ssh as if they were local.
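On the receiving side of such a pipe, the program only needs to consume a channel in fixed-size chunks, so the file never has to fit in memory or on the local disk. A minimal OCaml sketch follows; count_bytes is a hypothetical stand-in for whatever per-chunk processing the real program does.

```ocaml
(* Consume a channel in 64 KiB chunks and return the number of bytes
   seen. Replace the counting with real per-chunk processing. *)
let count_bytes ic =
  let buf = Bytes.create 65536 in
  let rec loop total =
    match input ic buf 0 (Bytes.length buf) with
    | 0 -> total                 (* 0 means end of input *)
    | n -> loop (total + n)
  in
  loop 0

(* In the piped setup, the program would call: count_bytes stdin *)
```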

OCaml - cannot install Core

My question is similar to this post:
OCaml: Can't run utop after installing it
I try to open the core library, and end up with the same problem:
$ open Core.Std
Couldn't get a file descriptor referring to the console
I have tried the following command with the correct quote marks:
eval `opam config env`
But nothing happens, and the problem persists even though I have installed core. I also tried to follow the installation instructions on this webpage (https://github.com/realworldocaml/book/wiki/Installation-Instructions), but it does not mention this strange problem.
I am using Ubuntu 24 in a virtual machine under Hyper-V on Windows 8. Another question I want to ask: many webpages, such as (http://kwangyulseo.com/2014/03/04/installing-ocamlopamutopcore-library-on-ubuntu-saucy/), suggest putting certain lines into the ".ocamlinit file", but I do not know where to find this file or how to modify it on Linux. I have been a Windows user for most of my life.
Sorry if the question's level is too low.
Oh, man. open Core.Std is not a bash command. You need to open the OCaml toplevel (i.e. run utop or ocaml) and execute this command there. Probably this is not written explicitly in the manual. If you see
#use "topfind";;
#thread;;
#camlp4o;;
#require "core.top";;
#require "core.syntax";;
It means that you should enter this (or add it to .ocamlinit) by hand, and that you should type the # too. So, if you use ocaml, you will see two # characters: the prompt and the one you typed. That's normal.
About the OCaml init file: as you can see, they refer to it as ~/.ocamlinit. The character ~ means the home directory on POSIX systems. So you will probably need some GUI text editor (gedit or kwrite, for example): create a new file, put the content there and save it in your home directory. N.B. POSIX systems have no concept of a file extension, i.e. the leading dot is part of the file name.

using OCaml Batteries Included as a vanilla cma

I am a bit frustrated and confused by the OCaml Batteries Included concept and the way most tutorials I could find proceed. Before I get to use "productivity" tools like GODI or replace invocations of ocamlc with ocamlfind batteries/ocamlc (which is, at this point, too magical for me) I was hoping to be able to simply use OCaml Batteries Included core set of libraries like any other library. To that end I downloaded the latest source from git (head hash: 9f94ecb) and did a make all. I noticed that I got three .cma libraries at ./_build/src/ together with 102 .cmi files in the same directory. So I assumed that compiling with the -I switch pointing to that directory and linking with one of the three .cma libraries found there would be enough without needing to "install" the Batteries or use the platform tools. To test that, I set out to produce an executable for the following simple program I found somewhere:
(* file euler001.ml *)
open BatEnum
open BatPervasives
let main () =
  (1--999)
  |> BatEnum.filter (fun i -> i mod 3 = 0 || i mod 5 = 0)
  |> BatEnum.reduce (+)
  |> BatInt.print stdout
let _ = main ()
I was able to compile it with:
ocamlc -c -I ../batteries-included/_build/src/ euler001.ml
but when I tried to link with:
ocamlc -o euler001 unix.cma nums.cma ../batteries-included/_build/src/batteries.cma euler001.cmo
I got:
File "_none_", line 1, characters 0-1:
Error: Error while linking ../batteries-included/_build/src/batteries.cma(BatBigarray):
The external function `caml_ba_reshape' is not available
The nums.cma and unix.cma I added at the command line because the linker complained about missing references to undefined global Big_int and (when that was added) to Unix. But after these two modules were added on the linker invocation I received the last message (on the missing external function 'caml_ba_reshape') which proved blocking for me. So I would like to ask:
how does one proceed in this particular case?
how does one proceed in the general case (i.e. when the linker complains about a missing external function)
is it viable to use Batteries Included in this fashion? Before I rely on platform tools I want to have the assurance that I can use the underlying artifacts (cma and cmi/mli files) with the standard OCaml compiler and linker if I run into problems.
caml_ba_reshape is a primitive of the Bigarray module, as you could guess from the name (though I agree it's not obvious). You should add bigarray.cma to your link command, before batteries.cma, which depends on it.
There is a reason why it is advised to use ocamlfind, which exists precisely to abstract over those dependencies. I don't think you are supposed to use ocamlfind batteries/ocamlc, but rather ocamlfind ocamlc -package batteries. If you insist on using the compiler without such support, then indeed you have to track the dependencies manually. I understand your frustration, but I hope you also understand that this is intrinsic to any sufficiently sophisticated OCaml library, and that it comes only from your self-imposed constraints.
how does one proceed in the general case (i.e. when the linker complains about a missing external function)
You have to know or guess where the primitive comes from. Looking at the META file provided by the library, which is used to inform ocamlfind of the dependencies, may help you. You can use the tool ocamlobjinfo to know which primitive a .cma provides, if you want to check your assumption. (Or better, use ocamlfind to spit the correct compile command, see below.)
is it viable to use Batteries Included in this fashion?
Compiling "by hand" is reasonable if you insist. Working only in the source repository, without installing the library, is not. It's easy to keep doing what you do after an install, just replace your -I ... by the chosen install path.
Before I rely on platform tools I want to have the assurance that I can use the underlying artifacts (cma and cmi/mli files) with the standard OCaml compiler and linker if I run into problems.
ocamlfind is not (only) a platform tool. It is the way to use third-party OCaml libraries, period. It should be standard on any OCaml-using platform. That it does not come with INRIA's distribution is a historical detail.
You can ask ocamlfind to show you its invocation of the bare compilers:
% ocamlfind ocamlc -linkpkg -package batteries t.ml -o test -verbose
Effective set of compiler predicates:
pkg_unix,pkg_num.core,pkg_num,pkg_bigarray,pkg_str,pkg_batteries,autolink,byte
+ ocamlc.opt -o test -verbose -I /usr/local/lib/ocaml/3.12.1/batteries /usr/lib/ocaml/unix.cma /usr/lib/ocaml/nums.cma /usr/lib/ocaml/bigarray.cma /usr/lib/ocaml/str.cma /usr/local/lib/ocaml/3.12.1/batteries/batteries.cma t.ml
I don't want to throw stones at you. The landscape of OCaml tools, besides the minimal nutshell of what's provided by the source distribution, is quite sparse and lacks a coherent point of entry. With time I've grown used to those tools and it's quite natural to use them, but I understand there is some cost of entry that we should try to lower.
PS: any advice on how to improve the batteries documentation is warmly welcome. Patches to add things to the documentation or fix it are even better. batteries-devel@lists.forge.ocamlcore.org is the place to go.