Communication client-server with OCaml marshalled data - ocaml

I want to do a client-side js_of_ocaml application with a server in OCaml, with contraints described below, and I would like to know if the approach below is right or if there is a more efficient one. The server can sometimes send large quantities of data (> 30MB).
In order to make the communication between client and server safer and more efficient, I am sharing a type t in a .mli file like this :
type client_to_server =
| Say_Hello
| Do_something_with of int
type server_to_client =
| Ack
| Print of string * int
Then, this type is marshalled into a string and sent on the network. I am aware that on the client side, some types are missing (Int64.t).
Also, in a XMLHTTPRequest sent by the client, we want to receive more than one marshalled object from the server, and sometimes in a streaming mode (ie: process the marshal object received (if possible) during the loading state of the request, and not only during the done state).
These constraints force us to use the field responseText of the XMLHTTPRequest with the content-type application/octet-stream.
Moreover, when we get back the response from responseText, an encoding conversion is made because JavaScript's string are in UTF-16. But the marshalled object being binary data, we do what is necessary in order to retrieve our binary data (by overriding the charset with x-user-defined and by applying a mask on each character of the responseText string).
The server (HTTP server in OCaml) is doing something simple like this:
let process_request req =
let res = process_response req in
let s = Marshal.to_string res [] in
send s
However, on the client side, the actual JavaScript primitive of js_of_ocaml for caml_marshal_data_size needs an MlString. But in streaming mode, we don't want to convert the javascript's string in a MlString (which can iter on the full string), we prefer to do the size verification and unmarshalling (and the application of the mask for the encoding problem) only on the bytes read. Therefore, I have writen my own marshal primitives in javascript.
The client code for processing requests and responses is:
external marshal_total_size : Js.js_string Js.t -> int -> int = "my_marshal_total_size"
external marshal_from_string : Js.js_string Js.t -> int -> 'a = "my_marshal_from_string"
let apply (f:server_to_client -> unit) (str:Js.js_string Js.t) (ofs:int) : int =
let len = str##length in
let rec aux pos =
let tsize =
try Some (pos + My_primitives.marshal_total_size str pos)
with Failure _ -> None
in
match tsize with
| Some tsize when tsize <= len ->
let data = My_primitives.marshal_from_string str pos in
f data;
aux tsize
| _ -> pos
in
aux ofs
let reqcallback f req ofs =
match req##readyState, req##status with
| XmlHttpRequest.DONE, 200 ->
ofs := apply f req##responseText !ofs
| XmlHttpRequest.LOADING, 200 ->
ignore (apply f req##responseText !ofs)
| _, 200 -> ()
| _, i -> process_error i
let send (f:server_to_client -> unit) (order:client_to_server) =
let order = Marshal.to_string order [] in
let msg = Js.string (my_encode order) in (* Do some stuff *)
let req = XmlHttpRequest.create () in
req##_open(Js.string "POST", Js.string "/kernel", Js._true);
req##setRequestHeader(Js.string "Content-Type",
Js.string "application/octet-stream");
req##onreadystatechange <- Js.wrap_callback (reqcallback f req (ref 0));
req##overrideMimeType(Js.string "application/octet-stream; charset=x-user-defined");
req##send(Js.some msg)
And the primitives are:
//Provides: my_marshal_header_size
var my_marshal_header_size = 20;
//Provides: my_int_of_char
function my_int_of_char(s, i) {
return (s.charCodeAt(i) & 0xFF); // utf-16 char to 8 binary bit
}
//Provides: my_marshal_input_value_from_string
//Requires: my_int_of_char, caml_int64_float_of_bits, MlStringFromArray
//Requires: caml_int64_of_bytes, caml_marshal_constants, caml_failwith
var my_marshal_input_value_from_string = function () {
/* Quite the same thing but with a custom Reader which
will call my_int_of_char for each byte read */
}
//Provides: my_marshal_data_size
//Requires: caml_failwith, my_int_of_char
function my_marshal_data_size(s, ofs) {
function get32(s,i) {
return (my_int_of_char(s, i) << 24) | (my_int_of_char(s, i + 1) << 16) |
(my_int_of_char(s, i + 2) << 8) | (my_int_of_char(s, i + 3));
}
if (get32(s, ofs) != (0x8495A6BE|0))
caml_failwith("MyMarshal.data_size");
return (get32(s, ofs + 4));
}
//Provides: my_marshal_total_size
//Requires: my_marshal_data_size, my_marshal_header_size, caml_failwith
function my_marshal_total_size(s, ofs) {
if ( ofs < 0 || ofs > s.length - my_marshal_header_size )
caml_failwith("Invalid argument");
else return my_marshal_header_size + my_marshal_data_size(s, ofs);
}
Is this the most efficient way to transfer large OCaml values from server to client, or what would time- and space-efficient alternatives be?

Have you try to use EventSource https://developer.mozilla.org/en-US/docs/Web/API/EventSource
You could stream json data instead of marshaled data.
Json.unsafe_input should be faster than unmarshal.
class type eventSource =
object
method onmessage :
(eventSource Js.t, event Js.t -> unit) Js.meth_callback
Js.writeonly_prop
end
and event =
object
method data : Js.js_string Js.t Js.readonly_prop
method event : Js.js_string Js.t Js.readonly_prop
end
let eventSource : (Js.js_string Js.t -> eventSource Js.t) Js.constr =
Js.Unsafe.global##_EventSource
let send (f:server_to_client -> unit) (order:client_to_server) url_of_order =
let url = url_of_order order in
let es = jsnew eventSource(Js.string url) in
es##onmessage <- Js.wrap_callback (fun e ->
let d = Json.unsafe_input (e##data) in
f d);
()
On the server side, you then need to rely on deriving_json http://ocsigen.org/js_of_ocaml/2.3/api/Deriving_Json to serialize your data
type server_to_client =
| Ack
| Print of string * int
deriving (Json)
let process_request req =
let res = process_response req in
let data = Json_server_to_client.to_string res in
send data
note1: Deriving_json serialize ocaml value to json using the internal representation of values in js_of_ocaml. Json.unsafe_input is a fast deserializer for Deriving_json that rely on browser-native JSON support.
note2: Deriving_json and Json.unsafe_input take care of ocaml string encoding

Related

How do I un-gzip a file without saving it?

I am new to rust and I am trying to port golang code that I had written previosuly. The go code basically downloaded files from s3 and directly (without writing to disk) ungziped the files and parsed them.
Currently the only solution I found is to save the gzipped files on disk then ungzip and parse them.
Perfect pipeline would be to directly ungzip and parse them.
How can I accomplish this?
const ENV_CRED_KEY_ID: &str = "KEY_ID";
const ENV_CRED_KEY_SECRET: &str = "KEY_SECRET";
const BUCKET_NAME: &str = "bucketname";
const REGION: &str = "us-east-1";
use anyhow::{anyhow, bail, Context, Result}; // (xp) (thiserror in prod)
use aws_sdk_s3::{config, ByteStream, Client, Credentials, Region};
use std::env;
use std::io::{Write};
use tokio_stream::StreamExt;
#[tokio::main]
async fn main() -> Result<()> {
let client = get_aws_client(REGION)?;
let keys = list_keys(&client, BUCKET_NAME, "CELLDATA/year=2022/month=06/day=06/").await?;
println!("List:\n{}", keys.join("\n"));
let dir = Path::new("input/");
let key: &str = &keys[0];
download_file_bytes(&client, BUCKET_NAME, key, dir).await?;
println!("Downloaded {key} in directory {}", dir.display());
Ok(())
}
async fn download_file_bytes(client: &Client, bucket_name: &str, key: &str, dir: &Path) -> Result<()> {
// VALIDATE
if !dir.is_dir() {
bail!("Path {} is not a directory", dir.display());
}
// create file path and parent dir(s)
let mut file_path = dir.join(key);
let parent_dir = file_path
.parent()
.ok_or_else(|| anyhow!("Invalid parent dir for {:?}", file_path))?;
if !parent_dir.exists() {
create_dir_all(parent_dir)?;
}
file_path.set_extension("json");
// BUILD - aws request
let req = client.get_object().bucket(bucket_name).key(key);
// EXECUTE
let res = req.send().await?;
// STREAM result to file
let mut data: ByteStream = res.body;
let file = File::create(&file_path)?;
let Some(bytes)= data.try_next().await?;
let mut gzD = GzDecoder::new(&bytes);
let mut buf_writer = BufWriter::new( file);
while let Some(bytes) = data.try_next().await? {
buf_writer.write(&bytes)?;
}
buf_writer.flush()?;
Ok(())
}
fn get_aws_client(region: &str) -> Result<Client> {
// get the id/secret from env
let key_id = env::var(ENV_CRED_KEY_ID).context("Missing S3_KEY_ID")?;
let key_secret = env::var(ENV_CRED_KEY_SECRET).context("Missing S3_KEY_SECRET")?;
// build the aws cred
let cred = Credentials::new(key_id, key_secret, None, None, "loaded-from-custom-env");
// build the aws client
let region = Region::new(region.to_string());
let conf_builder = config::Builder::new().region(region).credentials_provider(cred);
let conf = conf_builder.build();
// build aws client
let client = Client::from_conf(conf);
Ok(client)
}
Your snippet doesn't tell where GzDecoder comes from, but I'll assume it's flate2::read::GzDecoder.
flate2::read::GzDecoder is already built in a way that it can wrap anything that implements std::io::Read:
GzDecoder::new expects an argument that implements Read => deflated data in
GzDecoder itself implements Read => inflated data out
Therefore, you can use it just like a BufReader: Wrap your reader and used the wrapped value in place:
use flate2::read::GzDecoder;
use std::fs::File;
use std::io::BufReader;
use std::io::Cursor;
fn main() {
let data = [0, 1, 2, 3];
// Something that implements `std::io::Read`
let c = Cursor::new(data);
// A dummy output
let mut out_file = File::create("/tmp/out").unwrap();
// Using the raw data would look like this:
// std::io::copy(&mut c, &mut out_file).unwrap();
// To inflate on the fly, "pipe" the data through the decoder, i.e. wrap the reader
let mut stream = GzDecoder::new(c);
// Consume the `Read`er somehow
std::io::copy(&mut stream, &mut out_file).unwrap();
}
playground
You don't mention what "and parse them" entails, but the same concept applies: If your parser can read from an impl Read (e.g. it can read from a std::fs::File), then it can also read directly from a GzDecoder.

different function type when using if statement in Rust

I'm new in rust and i need to make a small if statement on function option for example
use isahc::{
HttpClient,
config::{
RedirectPolicy,
VersionNegotiation,
SslOption},
prelude::*
};
use std::{
time::Duration
};
pub struct http {
pub timeout: u64
}
impl http {
pub fn send(&self) -> HttpClient {
let client =
HttpClient::builder()
.version_negotiation(VersionNegotiation::http11())
.redirect_policy(RedirectPolicy::None)
.timeout(Duration::from_secs(self.timeout));
.ssl_options(SslOption::DANGER_ACCEPT_INVALID_CERTS | SslOption::DANGER_ACCEPT_REVOKED_CERTS);
return client.build().unwrap();
}
}
fn main(){
let req = http{ timeout:"20".parse().unwrap()};
let test = req.send();
test.get("https://www.google.com");
}
now in my program the user will give me the options of the request (eg: follow redirects or not ) and this need if statement on these options so i tried to use it on this case but i always get a different function return type
impl http {
pub fn send(&self) -> HttpClient {
let client =
HttpClient::builder()
.version_negotiation(VersionNegotiation::http11())
.redirect_policy(RedirectPolicy::None)
.ssl_options(SslOption::DANGER_ACCEPT_INVALID_CERTS | SslOption::DANGER_ACCEPT_REVOKED_CERTS);
if 1 == 1 {
client.timeout(Duration::from_secs(self.timeout));
}
return client.build().unwrap();
}
}
Cargo Output
warning: type `http` should have an upper camel case name
--> src/sender.rs:14:12
|
14 | pub struct http {
| ^^^^ help: convert the identifier to upper camel case: `Http`
|
= note: `#[warn(non_camel_case_types)]` on by default
error[E0382]: use of moved value: `client`
--> src/sender.rs:32:16
|
24 | let client =
| ------ move occurs because `client` has type `HttpClientBuilder`, which does not implement the `Copy` trait
...
30 | client.timeout(Duration::from_secs(self.timeout));
| ------ value moved here
31 | }
32 | return client.build().unwrap();
| ^^^^^^ value used here after move
so what I'm doing wrong ?
i tired to change the function type exception but i can't use the function of the class client.get() for example
this a clear example in python for explain what i need to do
options : dict = {
"redirects": False,
"ssl_path":"~/cer.pem"
}
def send(opts):
# not real httplib !
r = httplib.get("http://stackoverflow.com")
if opts.get('redirects') == True:
r.redirects = True
if opts.get('cert_path',''):
r.ssl = opts.get('cert_path')
return r.send()
def main():
send(options)
Thanks
Since this is a builder pattern, each function call consumes the builder and returns it back. So you need to capture client from the return value of the timeout function in order to continue using it. Note that you also need to make client mutable.
Something like
impl http {
pub fn send(&self) -> HttpClient {
let mut client =
HttpClient::builder()
.version_negotiation(VersionNegotiation::http11())
.redirect_policy(RedirectPolicy::None)
.ssl_options(SslOption::DANGER_ACCEPT_INVALID_CERTS | SslOption::DANGER_ACCEPT_REVOKED_CERTS);
if 1 == 1 {
client = client.timeout(Duration::from_secs(self.timeout));
}
return client.build().unwrap();
}
}

Updating a nested record with new data in elm

I have two pieces of JSON that I've successfully decoded sequentially. I would like to take the new html_fragment and update my existing html_fragment. Generally this would be simple but my data structure is giving me difficulties:
type PostDataContainer
= PostDataContainer PostData
type alias PostData =
{ title : String
, comments : List Comment
}
type alias Comment =
{ comment_id : Int
, html_fragment : String
}
type alias CommentHtml =
{ id : Int
, html_fragment : String
}
I've just gotten CommentHtml and would like to update the existing html_fragment in Comment. This is what I have so far:
MergeCommentHtml commentHtmlData ->
case commentHtmlData of
Err err ->
Debug.log ("Error decoding CommentHtmlData" ++ toString err)
( mdl, Cmd.none )
Ok commentHtml ->
case mdl.maybePostDataContainer of
Just (PostDataContainer postData) ->
let
updatedCommentData = -- I dont know how to calculate this?
in
( { mdl | postData = { postData | comments = updatedCommentData } }, Cmd.none )
Note that commentHtml here is a List CommentHtml. Any thoughts on how to update my old comment.html_fragment with the new values in commentHtml?
Option 1:
just decode the data as it stands. When it's time to display it, arrange it appropriately via some function you write like rawJsonDataToNicerData.
Option 2:
Suppose you implement the following function:
-- given a new comment, and some PostData, return the new version of the PostData
updateData : CommentHtml -> PostData -> PostData
-- so now, assuming we can decode a CommentHtml with commentHtmlDeocder
-- we can do the following
dataUpdaterDecoder : Decoder (PostData -> PostData)
dataUpdaterDecoder
commentHtmlDecoder |> Decoder.andThen (\commentHtml -> updateData commentHtml)
Now wherever we were going to decode a commentHtmlDeocder we can decode a dataUpdaterDecoder instead, and use a bunch of these to update our data.
Here is an example of a relational data decoder in action using the idea above:
https://ellie-app.com/3KWmyJmMrDsa1
Given that commentHtmlData is a List according to a comment, I think the easiest approach is to convert it to a Dict keyed by id, then map over the existing comments looking for the comment_id in the dict. If it exists, replace html_fragment, if not then return the original unmodified:
let
commentHtmlDict =
commentHtmlData
|> List.map (\c -> (c.id, c))
|> Dict.fromList
updatedCommentData =
postData.comments
|> List.map (\comment ->
case Dict.get comment.comment_id commentHtmlDict of
Just commentHtml ->
{ comment | html_fragment = commentHtml.html_fragment }
Nothing ->
comment
)

persistent HTTP connection using cohttp library (or other)

It seems (based on wireshark), cohttp client closes its connection automatically after response to GET request was received.
Is there a way to keep this connection alive (to make it persistent)?
If no is there any other HTTP library to create persistent connections?
Looking at the code at github it doesn't look like there is such an option.
let call ?(ctx=default_ctx) ?headers ?(body=`Empty) ?chunked meth uri =
...
Net.connect_uri ~ctx uri >>= fun (conn, ic, oc) ->
let closefn () = Net.close ic oc in
...
read_response ~closefn ic oc meth
Where read_response is:
let read_response ~closefn ic oc meth =
...
match has_body with
| `Yes | `Unknown ->
let reader = Response.make_body_reader res ic in
let stream = Body.create_stream Response.read_body_chunk reader in
let closefn = closefn in
Lwt_stream.on_terminate stream closefn;
let gcfn st = closefn () in
Gc.finalise gcfn stream;
let body = Body.of_stream stream in
return (res, body)
If I am reading this correctly the connection will close as soon as the GC cleans up the stream.

different output with mono on linux as on visual studio on win7 with calling a webservice

I use the Exchange webservices to extract attachments from exchange mailserver.
When i call the code on linux with mono a certain text attachment contain some mixed-up strings.
like so
"sam winglin vz" becomes "sainglin vz" -> so it is missing "m w".
I see this about 3 times in a 150kb file. 3 bytes are missing in the linux output vs the windows output.
When i extract it from visual studio the text attachment is perfect.
It is like this example
Save attachments from exchange inbox
Any idea in what direction i should look to fix this?
Code:
#r "Microsoft.Exchange.WebServices.dll"
open Microsoft
open Microsoft.Exchange.WebServices.Data
open System
open System.Net
type PgzExchangeService(url,user,password) =
let service = new ExchangeService(ExchangeVersion.Exchange2007_SP1,
TimeZoneInfo.CreateCustomTimeZone("Central Standard Time",new TimeSpan(-6, 0, 0),"(GMT-06:00) Central Time (US & Canada)","Central Standard Time"))
do
ServicePointManager.ServerCertificateValidationCallback <- ( fun _ _ _ _ -> true )
service.Url <- new Uri(url)
service.Credentials <- new WebCredentials(user, password, "domain")
member this.Service with get() = service
member this.InboxItems = this.Service.FindItems(WellKnownFolderName.Inbox, new ItemView(10))
member this.GetFileAttachments ( item : Item ) =
let emailMessage =
EmailMessage.Bind( this.Service,
item.Id,
new PropertySet(BasePropertySet.IdOnly, ItemSchema.Attachments))
item, emailMessage.Attachments |> Seq.choose (fun attachment -> match box attachment with
| :? FileAttachment as x -> Some(x) | _ -> None)
let mailAtdomain = new PgzExchangeService("https://xx.xx.XX.XX/EWS/Exchange.asmx", "user", "passw")
let printsave (item : Item ,att : seq<FileAttachment>) =
if (Seq.length att) > 0 then
printfn "%A - saving %i attachments" item.Subject (Seq.length att)
att |> Seq.iter ( fun attachment -> printfn "%A" attachment.Name
attachment.Load(#"/tmp/test/" + attachment.Name ) )
// filter so we only have items with attachements and ...
let itemsWithAttachments = mailAtdomain.InboxItems
|> Seq.map mailAtdomain.GetFileAttachments
|> Seq.iter printsave
The code doesn't run on Windows with mono due to a bug in TimeZoneInfo
This sample code runs on linux but not on windows. because of the TimeZoneInfo bug.
But with this the code that works on linux to extract attachments.
Try csv attachments and see if the result is the same. i loose data ! about 3 bytes every somemany lines
mail me if you need the sample csv attachment that gives the problem
Here is a c# version that i used for testing. Running from VS2010 it works perfect but on linux with mono the attachment has wrong size , some bytes are missing?!.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Microsoft.Exchange.WebServices.Data;
using System.Net;
namespace Exchange_SDP_Attachment_Extracter
{
public class PgzExchangeService
{
public void Extract()
{
ExchangeService service = new ExchangeService (ExchangeVersion.Exchange2007_SP1,TimeZoneInfo.Local);
service.Credentials = new NetworkCredential("user", "pass", "domain");
service.Url = new Uri("https://xx.xx.xx.xx/EWS/Exchange.asmx");
ServicePointManager.ServerCertificateValidationCallback = delegate { return true; };
FindItemsResults<Item> findResults = service.FindItems(WellKnownFolderName.Inbox, new ItemView(10));
foreach (Item item in findResults.Items)
{
EmailMessage e = EmailMessage.Bind
(service,
item.Id,
new PropertySet(BasePropertySet.IdOnly, ItemSchema.Attachments));
foreach ( Attachment att in e.Attachments )
{
if (att is FileAttachment)
{
FileAttachment fileAttachment = (FileAttachment)att;
fileAttachment.Load(#"/tmp/testsdp/" + fileAttachment.Name);
}
}
}
}
}
class Program
{
static void Main(string[] args)
{
PgzExchangeService pgz = new PgzExchangeService();
pgz.Extract();
}
}
}
My suggestion would be to try examining the text attachment with a hex editor. There has be something unusual about those three occurrences. You need to find out what those three lines have in common before any of us can recommend a course of action to you.