How do I get a TextIO.instream from a POSIX file descriptor? - sml

If I have a POSIX file descriptor of type Posix.FileSys.file_desc, how do I convert it into a TextIO.instream? I want to do the reverse of "How do I get the file descriptor of a file opened using TextIO.openIn?"

This'll do the trick. It requires a "name" for the instream, which the basis library claims is used for error messages shown to the user. So I would recommend using the name or path of the underlying file, if there is one.
fun fdToInstream (name: string, fd: Posix.FileSys.file_desc) : TextIO.instream =
  let
    val (flags, _) = Posix.IO.getfl fd
    val isNonBlockMode = Posix.IO.O.anySet (Posix.IO.O.nonblock, flags)
    val reader: TextIO.StreamIO.reader =
      Posix.IO.mkTextReader
        { fd = fd
        , name = name
        , initBlkMode = not isNonBlockMode
        }
    val stream_ins: TextIO.StreamIO.instream =
      TextIO.StreamIO.mkInstream (reader, "")
  in
    TextIO.mkInstream stream_ins
  end

Related

How do I un-gzip a file without saving it?

I am new to Rust and I am trying to port Go code that I had written previously. The Go code basically downloaded files from S3 and directly (without writing to disk) un-gzipped the files and parsed them.
Currently the only solution I found is to save the gzipped files on disk then ungzip and parse them.
Perfect pipeline would be to directly ungzip and parse them.
How can I accomplish this?
const ENV_CRED_KEY_ID: &str = "KEY_ID";
const ENV_CRED_KEY_SECRET: &str = "KEY_SECRET";
const BUCKET_NAME: &str = "bucketname";
const REGION: &str = "us-east-1";
use anyhow::{anyhow, bail, Context, Result}; // (xp) (thiserror in prod)
use aws_sdk_s3::{config, ByteStream, Client, Credentials, Region};
use std::env;
use std::io::{Write};
use tokio_stream::StreamExt;
#[tokio::main]
async fn main() -> Result<()> {
let client = get_aws_client(REGION)?;
let keys = list_keys(&client, BUCKET_NAME, "CELLDATA/year=2022/month=06/day=06/").await?;
println!("List:\n{}", keys.join("\n"));
let dir = Path::new("input/");
let key: &str = &keys[0];
download_file_bytes(&client, BUCKET_NAME, key, dir).await?;
println!("Downloaded {key} in directory {}", dir.display());
Ok(())
}
async fn download_file_bytes(client: &Client, bucket_name: &str, key: &str, dir: &Path) -> Result<()> {
// VALIDATE
if !dir.is_dir() {
bail!("Path {} is not a directory", dir.display());
}
// create file path and parent dir(s)
let mut file_path = dir.join(key);
let parent_dir = file_path
.parent()
.ok_or_else(|| anyhow!("Invalid parent dir for {:?}", file_path))?;
if !parent_dir.exists() {
create_dir_all(parent_dir)?;
}
file_path.set_extension("json");
// BUILD - aws request
let req = client.get_object().bucket(bucket_name).key(key);
// EXECUTE
let res = req.send().await?;
// STREAM result to file
let mut data: ByteStream = res.body;
let file = File::create(&file_path)?;
let Some(bytes)= data.try_next().await?;
let mut gzD = GzDecoder::new(&bytes);
let mut buf_writer = BufWriter::new( file);
while let Some(bytes) = data.try_next().await? {
buf_writer.write(&bytes)?;
}
buf_writer.flush()?;
Ok(())
}
fn get_aws_client(region: &str) -> Result<Client> {
// get the id/secret from env
let key_id = env::var(ENV_CRED_KEY_ID).context("Missing S3_KEY_ID")?;
let key_secret = env::var(ENV_CRED_KEY_SECRET).context("Missing S3_KEY_SECRET")?;
// build the aws cred
let cred = Credentials::new(key_id, key_secret, None, None, "loaded-from-custom-env");
// build the aws client
let region = Region::new(region.to_string());
let conf_builder = config::Builder::new().region(region).credentials_provider(cred);
let conf = conf_builder.build();
// build aws client
let client = Client::from_conf(conf);
Ok(client)
}
Your snippet doesn't tell where GzDecoder comes from, but I'll assume it's flate2::read::GzDecoder.
flate2::read::GzDecoder is already built in a way that it can wrap anything that implements std::io::Read:
GzDecoder::new expects an argument that implements Read => deflated data in
GzDecoder itself implements Read => inflated data out
Therefore, you can use it just like a BufReader: wrap your reader and use the wrapped value in its place:
use flate2::read::GzDecoder;
use std::fs::File;
use std::io::BufReader;
use std::io::Cursor;
fn main() {
let data = [0, 1, 2, 3];
// Something that implements `std::io::Read`
let c = Cursor::new(data);
// A dummy output
let mut out_file = File::create("/tmp/out").unwrap();
// Using the raw data would look like this:
// std::io::copy(&mut c, &mut out_file).unwrap();
// To inflate on the fly, "pipe" the data through the decoder, i.e. wrap the reader
let mut stream = GzDecoder::new(c);
// Consume the `Read`er somehow
std::io::copy(&mut stream, &mut out_file).unwrap();
}
You don't mention what "and parse them" entails, but the same concept applies: If your parser can read from an impl Read (e.g. it can read from a std::fs::File), then it can also read directly from a GzDecoder.
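The wrap-the-reader idea is not specific to Rust. As a language-neutral illustration, here is a minimal Python sketch of the same pipeline using only the standard library. It is not the flate2 or AWS SDK API: the in-memory `body` stands in for a downloaded object, and the `{"cell": 42}` payload and the helper name `parse_gzipped_json` are made up for the example.

```python
import gzip
import io
import json

# Stand-in for a downloaded object body: gzip-compressed JSON held in memory.
# (Hypothetical payload; a real body would come from the S3 response.)
compressed = gzip.compress(json.dumps({"cell": 42}).encode("utf-8"))
body = io.BytesIO(compressed)  # any file-like object with .read() works


def parse_gzipped_json(stream):
    """Wrap `stream` in a decompressor and parse it without touching disk."""
    with gzip.GzipFile(fileobj=stream) as inflated:
        # The parser reads from the decoder exactly as it would from a file.
        return json.load(inflated)


record = parse_gzipped_json(body)
print(record)
```

The parser never sees compressed bytes; it only sees whatever the decoder hands it, which is the same relationship as between a parser and GzDecoder in the Rust answer.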

persistent HTTP connection using cohttp library (or other)

Based on Wireshark, it seems that the cohttp client closes its connection automatically once the response to a GET request has been received.
Is there a way to keep this connection alive (to make it persistent)?
If not, is there any other HTTP library that can create persistent connections?
Looking at the code on GitHub, it doesn't look like there is such an option.
let call ?(ctx=default_ctx) ?headers ?(body=`Empty) ?chunked meth uri =
...
Net.connect_uri ~ctx uri >>= fun (conn, ic, oc) ->
let closefn () = Net.close ic oc in
...
read_response ~closefn ic oc meth
Where read_response is:
let read_response ~closefn ic oc meth =
...
match has_body with
| `Yes | `Unknown ->
let reader = Response.make_body_reader res ic in
let stream = Body.create_stream Response.read_body_chunk reader in
let closefn = closefn in
Lwt_stream.on_terminate stream closefn;
let gcfn st = closefn () in
Gc.finalise gcfn stream;
let body = Body.of_stream stream in
return (res, body)
If I am reading this correctly, the connection is closed when the body stream terminates or, failing that, when the GC finalises the stream.

How to write to a file in sml

I'm trying to write a string to a file, but I can't seem to get it working. I've read through all the questions like this on Stack Overflow, but none seem to address the issue. I'm from an imperative background, so usually I would write to the file, then close the output stream. However, this doesn't work in SML.
fun printToFile pathOfFile str = printToOutstream (TextIO.openOut pathOfFile) str;
//Here is where the issues start coming in
fun printToOutStream outstream str = TextIO.output (outstream, str)
TextIO.closeOut outstream
//will not work. I've also tried
fun printToOutStream outstream str = let val os = outStream
in
TextIO.output(os,str)
TextIO.closeOut os
end;
//also wont work.
I know I need to write to the file and close the output stream, but I can't figure out how to do it. Using my "sml brain" I'm telling myself I need to call the function recursively, stepping towards something, and then close the output stream when I reach it... but again I don't have a clue how I would do this.
You're almost there. Between the in and end you need to separate the expressions with a semicolon. In SML, ; is a sequencing operator: it evaluates the expressions in turn and returns only the value of the last one.
If you already have an outstream open, use:
fun printToOutStream outstream str = let val os = outstream
in
TextIO.output(os,str);
TextIO.closeOut os
end;
Used like this:
- val os = TextIO.openOut "C:/programs/testfile.txt";
val os = - : TextIO.outstream
- printToOutStream os "Hello SML IO";
val it = () : unit
Then when I go to "C:/programs" I see a brand new text file containing the text "Hello SML IO".
If you always read/write complete files at once, you could create some helper functions for this, like:
fun readFile filename =
let val fd = TextIO.openIn filename
val content = TextIO.inputAll fd handle e => (TextIO.closeIn fd; raise e)
val _ = TextIO.closeIn fd
in content end
fun writeFile filename content =
let val fd = TextIO.openOut filename
val _ = TextIO.output (fd, content) handle e => (TextIO.closeOut fd; raise e)
val _ = TextIO.closeOut fd
in () end

Replace a character with another in python

I am using the following code in python to receive data from a device.
from socket import *
HOST = 'localhost'
PORT = 30003 #our port from before
ADDR = (HOST,PORT)
BUFSIZE = 4096
sock = socket( AF_INET,SOCK_STREAM)
sock.connect((ADDR))
def readlines(sock, recv_buffer=4096, delim='\n'):
    buffer = ''
    data = True
    while data:
        data = sock.recv(recv_buffer)
        buffer += data
        while buffer.find(delim) != -1:
            line, buffer = buffer.split('\n', 1)
            yield line.strip('\r\n')
    return
for line in readlines(sock):
    print line
And I am getting the output in following format:
MSG,2,0,0,8963AB,0,2015/02/06,15:03:27.380,2015/02/06,15:03:27.380,,0,7.5,343.0,10.152763,76.390593,,,,,,-1
MSG,2,0,0,8963AB,0,2015/02/06,15:03:28.630,2015/02/06,15:03:28.630,,0,7.5,348.0,10.152809,76.390593,,,,,,-1
I should get the output in following format:
'MSG','2','0','0','8963AB','0','2015/02/06','15:03:27.380','2015/02/06','15:03:27.380','','0','7.5','343.0','10.152763','76.390593','','','','','','-1'
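Split the string using , as the delimiter and collect the elements in a list (here original_list is the line string yielded by readlines):
appended_list = [x for x in original_list.split(',')]
appended_list now contains each field as a separate string, empty fields included, which is what you asked for.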
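If the goal is to print the line exactly in the quoted form shown in the question, the split fields can be wrapped in quotes and joined again. A minimal sketch; the helper name `quote_fields` and the shortened sample record are made up for illustration:

```python
def quote_fields(line, delim=","):
    """Split one record on `delim` and wrap every field in single quotes."""
    fields = line.split(delim)  # empty fields are kept as ''
    return delim.join("'{}'".format(field) for field in fields)


# Shortened sample record in the same shape as the device output.
sample = "MSG,2,0,0,8963AB,0,2015/02/06,15:03:27.380,,0,7.5,,-1"
print(quote_fields(sample))
# prints 'MSG','2','0','0','8963AB','0','2015/02/06','15:03:27.380','','0','7.5','','-1'
```

In the question's loop this would be called as `quote_fields(line)` for each line yielded by `readlines(sock)`.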

Communication client-server with OCaml marshalled data

I want to write a client-side js_of_ocaml application with a server in OCaml, under the constraints described below, and I would like to know whether the approach below is right or whether there is a more efficient one. The server can sometimes send large quantities of data (> 30MB).
To make the communication between client and server safer and more efficient, I am sharing types in a .mli file like this:
type client_to_server =
| Say_Hello
| Do_something_with of int
type server_to_client =
| Ack
| Print of string * int
Then, this type is marshalled into a string and sent on the network. I am aware that on the client side, some types are missing (Int64.t).
Also, in an XMLHttpRequest sent by the client, we want to receive more than one marshalled object from the server, sometimes in a streaming mode (i.e. process each marshalled object as soon as it has been received during the loading state of the request, and not only during the done state).
These constraints force us to use the responseText field of the XMLHttpRequest with the content type application/octet-stream.
Moreover, when we get the response back from responseText, an encoding conversion is made because JavaScript strings are in UTF-16. Since the marshalled object is binary data, we do what is necessary to retrieve it (overriding the charset with x-user-defined and applying a mask to each character of the responseText string).
The server (HTTP server in OCaml) is doing something simple like this:
let process_request req =
let res = process_response req in
let s = Marshal.to_string res [] in
send s
However, on the client side, the current JavaScript primitive of js_of_ocaml for caml_marshal_data_size needs an MlString. In streaming mode, we don't want to convert the JavaScript string into an MlString (which may iterate over the full string); we prefer to do the size verification and unmarshalling (and apply the mask for the encoding problem) only on the bytes read. Therefore, I have written my own marshal primitives in JavaScript.
The client code for processing requests and responses is:
external marshal_total_size : Js.js_string Js.t -> int -> int = "my_marshal_total_size"
external marshal_from_string : Js.js_string Js.t -> int -> 'a = "my_marshal_from_string"
let apply (f:server_to_client -> unit) (str:Js.js_string Js.t) (ofs:int) : int =
let len = str##length in
let rec aux pos =
let tsize =
try Some (pos + My_primitives.marshal_total_size str pos)
with Failure _ -> None
in
match tsize with
| Some tsize when tsize <= len ->
let data = My_primitives.marshal_from_string str pos in
f data;
aux tsize
| _ -> pos
in
aux ofs
let reqcallback f req ofs =
match req##readyState, req##status with
| XmlHttpRequest.DONE, 200 ->
ofs := apply f req##responseText !ofs
| XmlHttpRequest.LOADING, 200 ->
ignore (apply f req##responseText !ofs)
| _, 200 -> ()
| _, i -> process_error i
let send (f:server_to_client -> unit) (order:client_to_server) =
let order = Marshal.to_string order [] in
let msg = Js.string (my_encode order) in (* Do some stuff *)
let req = XmlHttpRequest.create () in
req##_open(Js.string "POST", Js.string "/kernel", Js._true);
req##setRequestHeader(Js.string "Content-Type",
Js.string "application/octet-stream");
req##onreadystatechange <- Js.wrap_callback (reqcallback f req (ref 0));
req##overrideMimeType(Js.string "application/octet-stream; charset=x-user-defined");
req##send(Js.some msg)
And the primitives are:
//Provides: my_marshal_header_size
var my_marshal_header_size = 20;
//Provides: my_int_of_char
function my_int_of_char(s, i) {
return (s.charCodeAt(i) & 0xFF); // utf-16 char to 8 binary bit
}
//Provides: my_marshal_input_value_from_string
//Requires: my_int_of_char, caml_int64_float_of_bits, MlStringFromArray
//Requires: caml_int64_of_bytes, caml_marshal_constants, caml_failwith
var my_marshal_input_value_from_string = function () {
/* Quite the same thing but with a custom Reader which
will call my_int_of_char for each byte read */
}
//Provides: my_marshal_data_size
//Requires: caml_failwith, my_int_of_char
function my_marshal_data_size(s, ofs) {
function get32(s,i) {
return (my_int_of_char(s, i) << 24) | (my_int_of_char(s, i + 1) << 16) |
(my_int_of_char(s, i + 2) << 8) | (my_int_of_char(s, i + 3));
}
if (get32(s, ofs) != (0x8495A6BE|0))
caml_failwith("MyMarshal.data_size");
return (get32(s, ofs + 4));
}
//Provides: my_marshal_total_size
//Requires: my_marshal_data_size, my_marshal_header_size, caml_failwith
function my_marshal_total_size(s, ofs) {
if ( ofs < 0 || ofs > s.length - my_marshal_header_size )
caml_failwith("Invalid argument");
else return my_marshal_header_size + my_marshal_data_size(s, ofs);
}
Is this the most efficient way to transfer large OCaml values from server to client, or what would time- and space-efficient alternatives be?
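For readers who want to experiment with the framing logic outside the browser, the incremental scan performed by apply and my_marshal_total_size can be sketched in Python. This is only an illustration under stated assumptions: the frames are synthetic (a 20-byte header carrying the magic number and payload length, followed by raw bytes), not real marshalled OCaml values, and the names `total_size`, `drain` and `fake_frame` are hypothetical. Unlike the OCaml code, which treats any Failure as "not enough data yet", the sketch raises on a bad magic number.

```python
import struct

MAGIC = 0x8495A6BE   # magic number checked by my_marshal_data_size
HEADER_SIZE = 20     # same value as my_marshal_header_size


def total_size(buf, pos):
    """Return header + payload size of the frame at `pos`, or None if the header is incomplete."""
    if pos < 0 or pos > len(buf) - HEADER_SIZE:
        return None
    # First two big-endian 32-bit words: magic number, payload length.
    magic, data_len = struct.unpack_from(">II", buf, pos)
    if magic != MAGIC:
        raise ValueError("bad magic number at offset %d" % pos)
    return HEADER_SIZE + data_len


def drain(buf, pos, handle):
    """Pass every complete frame starting at `pos` to `handle`; return the new offset."""
    while True:
        size = total_size(buf, pos)
        if size is None or pos + size > len(buf):
            return pos  # incomplete frame: wait for more bytes
        handle(buf[pos + HEADER_SIZE:pos + size])
        pos += size


def fake_frame(payload):
    """Build a synthetic frame: magic, length, three unused header words, then the payload."""
    return struct.pack(">IIIII", MAGIC, len(payload), 0, 0, 0) + payload


stream = fake_frame(b"hello") + fake_frame(b"world!")
seen = []
offset = drain(stream[:30], 0, seen.append)  # first frame complete, second cut off
offset = drain(stream, offset, seen.append)  # the rest has arrived
print(seen, offset)
```

Carrying the returned offset from one call to the next is what keeps a frame from being handled twice when the buffer grows.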
Have you tried using EventSource? https://developer.mozilla.org/en-US/docs/Web/API/EventSource
You could stream JSON data instead of marshalled data.
Json.unsafe_input should be faster than unmarshalling.
class type eventSource =
object
method onmessage :
(eventSource Js.t, event Js.t -> unit) Js.meth_callback
Js.writeonly_prop
end
and event =
object
method data : Js.js_string Js.t Js.readonly_prop
method event : Js.js_string Js.t Js.readonly_prop
end
let eventSource : (Js.js_string Js.t -> eventSource Js.t) Js.constr =
Js.Unsafe.global##_EventSource
let send (f:server_to_client -> unit) (order:client_to_server) url_of_order =
let url = url_of_order order in
let es = jsnew eventSource(Js.string url) in
es##onmessage <- Js.wrap_callback (fun e ->
let d = Json.unsafe_input (e##data) in
f d);
()
On the server side, you then need to rely on deriving_json http://ocsigen.org/js_of_ocaml/2.3/api/Deriving_Json to serialize your data
type server_to_client =
| Ack
| Print of string * int
deriving (Json)
let process_request req =
let res = process_response req in
let data = Json_server_to_client.to_string res in
send data
Note 1: Deriving_json serializes OCaml values to JSON using the internal representation of values in js_of_ocaml. Json.unsafe_input is a fast deserializer for Deriving_json that relies on the browser's native JSON support.
Note 2: Deriving_json and Json.unsafe_input take care of OCaml string encoding.