Creating a serializable fixed size char array in F# - casting

I am dealing with a very large amount of data that I need to load from and save to disk, and speed is key.
I wrote this code:
// load from cache
let loadFromCacheAsync<'a when 'a: (new: unit -> 'a) and 'a: struct and 'a :> ValueType> filespec =
    async {
        let! bytes = File.ReadAllBytesAsync(filespec) |> Async.AwaitTask
        let result =
            use pBytes = fixed bytes
            let sourceSpan = Span<byte>(NativePtr.toVoidPtr pBytes, bytes.Length)
            MemoryMarshal.Cast<byte, 'a>(sourceSpan).ToArray()
        return result
    }
// save to cache
let saveToCacheAsync<'a when 'a: unmanaged> filespec (data: 'a array) =
    Directory.CreateDirectory cacheFolder |> ignore
    let sizeStruct = sizeof<'a>
    use ptr = fixed data
    let nativeSpan = Span<byte>(NativePtr.toVoidPtr ptr, data.Length * sizeStruct).ToArray()
    File.WriteAllBytesAsync(filespec, nativeSpan) |> Async.AwaitTask
and it requires the data structures to be unmanaged.
For example, I have:
[<Struct>]
[<StructLayout(LayoutKind.Explicit)>]
type ShortTradeData =
{
[<FieldOffset(00)>] Timestamp: DateTime
[<FieldOffset(08)>] Price: double
[<FieldOffset(16)>] Quantity: double
[<FieldOffset(24)>] Direction: int
}
or
[<Struct>]
[<StructLayout(LayoutKind.Explicit)>]
type ShortCandleData =
{
[<FieldOffset(00)>] Timestamp: DateTime
[<FieldOffset(08)>] Open: double
[<FieldOffset(16)>] High: double
[<FieldOffset(24)>] Low: double
[<FieldOffset(32)>] Close: double
}
etc...
I'm now facing a case where I need to store a string. I know the maximum length of the strings, but I'm trying to find out how I can do this with unmanaged types.
I'm wondering if I could do something like this (for 256 bytes):
[<Struct>]
[<StructLayout(LayoutKind.Explicit)>]
type TestData =
{
[<FieldOffset(00)>] Timestamp: DateTime
[<FieldOffset(08)>] Text: char
[<FieldOffset(264)>] Dummy: int
}
Would it be safe then to get a pointer to Text, cast it to a char array, read / write what I want in it and then save / load it as needed?
Or am I asking for some random troubles at some point?
As a side question, any way to speed up the loadFromCache function is very welcome too :)
Edit:
I came up with this for now. It converts a list of complex event objects into something serializable.
The line:
let bytes = Pipeline.serializeBinary event
turns the original event data into a byte array.
Then I create the struct that will hold the binary stream, write the length, create a span representing the struct and copy the bytes. Then I marshal the span into the struct type (ShortEventData).
I can't use Marshal.Copy since I can't specify a destination offset, so I have to copy the bytes with a loop. But there has to be a better way.
And I think, there has to be a better way for everything else in this as well :D Any suggestion would help, I just don't really like this solution.
[<Struct>]
[<StructLayout(LayoutKind.Explicit)>]
type ShortEventData =
{
[<FieldOffset(00)>] Timestamp: DateTime
[<FieldOffset(08)>] Event: byte
[<FieldOffset(1032)>] Length: int
}
events
|> List.map (fun event ->
let bytes = Pipeline.serializeBinary event
let serializableEvent : DataCache.ShortEventData =
{
Timestamp = event.GetTimestamp()
Event = byte 0
Length = bytes.Length
}
use ptr = fixed [|serializableEvent|]
let nativeSpan = Span<byte>(NativePtr.toVoidPtr ptr, serializableEvent.Length * sizeStruct)
for i = 0 to bytes.Length - 1 do
nativeSpan[8 + i] <- bytes[i]
MemoryMarshal.Cast<byte, DataCache.ShortEventData>(nativeSpan).ToArray()[0]
)
Edit:
Adding benchmarks for different serialization models:
open System
open System.IO
open System.Runtime.InteropServices
open BenchmarkDotNet.Attributes
open BenchmarkDotNet.Running
open MBrace.FsPickler
open Microsoft.FSharp.NativeInterop
open Newtonsoft.Json
#nowarn "9"
[<Struct>]
[<StructLayout(LayoutKind.Explicit)>]
type TestStruct =
{
[<FieldOffset(00)>] SomeValue: int
[<FieldOffset(04)>] AnotherValue: int
[<FieldOffset(08)>] YetAnotherValue: double
}
static member MakeOne(r: Random) =
{
SomeValue = r.Next()
AnotherValue = r.Next()
YetAnotherValue = r.NextDouble()
}
[<MemoryDiagnoser>]
type Benchmarks () =
let testData =
let random = Random(1000)
Array.init 1000 (fun _ -> TestStruct.MakeOne(random))
// inits, outside of the benchmarks
// FSPickler
let FSPicklerSerializer = FsPickler.CreateBinarySerializer()
// APEX
let ApexSettings = Apex.Serialization.Settings().MarkSerializable(typeof<TestStruct>)
let ApexBinarySerializer = Apex.Serialization.Binary.Create(ApexSettings)
[<Benchmark>]
member _.Thomas() = // thomas' save to disk
let sizeStruct = sizeof<TestStruct>
use ptr = fixed testData
Span<byte>(NativePtr.toVoidPtr ptr, testData.Length * sizeStruct).ToArray()
[<Benchmark>]
member _.Newtonsoft() =
JsonConvert.SerializeObject(testData)
[<Benchmark>]
member _.FSPickler() =
FSPicklerSerializer.Pickle testData
[<Benchmark>]
member _.Apex() =
let outputStream = new MemoryStream()
ApexBinarySerializer.Write(testData, outputStream)
[<EntryPoint>]
let main _ =
let _ = BenchmarkRunner.Run<Benchmarks>()
0
| Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|----------- |-------------:|-------------:|-------------:|---------:|--------:|--------:|----------:|
| Thomas | 878.4 ns | 11.74 ns | 10.41 ns | 2.5444 | 0.1411 | - | 16 KB |
| Newtonsoft | 880,641.2 ns | 16,346.50 ns | 15,290.52 ns | 103.5156 | 79.1016 | 48.8281 | 508 KB |
| FSPickler | 71,786.6 ns | 1,373.89 ns | 1,349.35 ns | 13.6719 | 2.0752 | - | 84 KB |
| Apex | 1,088.8 ns | 20.59 ns | 22.03 ns | 2.6093 | 0.0725 | - | 16 KB |
It looks like Apex is very close to what I did, but it's probably a lot more flexible and more optimized, so it could make sense to switch to it, UNLESS what I have can be a lot more optimized.
I have to see how @JL0PD's excellent comments can improve the speed.

Out of interest, I took the lambda at the end of your question and benchmarked three similar implementations with BenchmarkDotNet:
Reference - as you have shown
Mutable Struct - as I might have done it with a mutable struct
Record - using a plain old dumb record
See the results for yourself. The plain old dumb record is the fastest (only marginally faster than my attempt, and roughly 12x faster than your example). Write dumb code first. Benchmark it. Then try to improve.
#nowarn "9"
open System
open System.Runtime.InteropServices
open BenchmarkDotNet.Attributes
open BenchmarkDotNet.Running
open Microsoft.FSharp.NativeInterop
type ShortEventDataRec =
{
Timestamp: DateTime
Event: byte[]
Length: int
}
[<Struct>]
[<StructLayout(LayoutKind.Explicit)>]
type ShortEventData =
{
[<FieldOffset(00)>] Timestamp: DateTime
[<FieldOffset(08)>] Event: byte
[<FieldOffset(1032)>] Length: int
}
[<StructLayout(LayoutKind.Explicit)>]
type MutableShortEventData =
struct
[<FieldOffset(00)>] val mutable Timestamp: DateTime
[<FieldOffset(08)>] val mutable Event: byte
[<FieldOffset(1032)>] val mutable Length: int
end
[<MemoryDiagnoser>]
type Benchmarks () =
let event =
Array.init 1024 (fun i -> byte (i % 256))
let time = DateTime.Now
let sizeStruct = sizeof<ShortEventData>
[<Benchmark>]
member __.Reference() =
let bytes = event
let serializableEvent =
{
ShortEventData.Timestamp = time
Event = byte 0
Length = bytes.Length
}
use ptr = fixed [|serializableEvent|]
let nativeSpan = Span<byte>(NativePtr.toVoidPtr ptr, sizeStruct)
for i = 0 to bytes.Length - 1 do
nativeSpan.[8 + i] <- bytes.[i]
MemoryMarshal.Cast<byte, ShortEventData>(nativeSpan).[0]
[<Benchmark>]
member __.MutableStruct() =
let bytes = event
let targetBytes = GC.AllocateUninitializedArray(sizeStruct)
let targetSpan = Span(targetBytes)
let targetStruct = MemoryMarshal.Cast<_, MutableShortEventData>(targetSpan)
targetStruct.[0].Timestamp <- time
let targetEvent = bytes.CopyTo(targetSpan.Slice(8, 1024))
targetStruct.[0].Length <- event.Length
targetStruct.[0]
[<Benchmark>]
member __.Record() =
let bytes = event
let serializableEvent =
{
ShortEventDataRec.Timestamp = time
Event =
let eventBytes = GC.AllocateUninitializedArray(bytes.Length)
System.Array.Copy(bytes, eventBytes, bytes.Length)
eventBytes
Length = bytes.Length
}
serializableEvent
[<EntryPoint>]
let main _ =
let _ = BenchmarkRunner.Run<Benchmarks>()
0
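| Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Allocated |
|-------------- |----------:|---------:|---------:|-------:|-------:|----------:|
| Reference | 526.88 ns | 6.318 ns | 5.909 ns | 0.0629 | - | 1 KB |
| MutableStruct | 49.50 ns | 0.966 ns | 1.074 ns | 0.0636 | - | 1 KB |
| Record | 42.73 ns | 0.672 ns | 0.628 ns | 0.0650 | 0.0002 | 1 KB |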

How to force any decimal value (be it part of a type or not) generated with fscheck to be within a certain range?

I'm using FsCheck to write some unit tests, and I would like to narrow the range of automatically generated decimals, regardless of the parameter types I'm passing. For example, say I have the types below:
decimal
DecimalHolder
Nested records containing decimal fields
DU with cases with decimal fields
I don't want to define an Arbitrary for every single type. I just want any decimal that appears anywhere in the generated values to fall within a range, say between 0 and 300,000.
module Tests
open Xunit
open FsCheck.Xunit
open Swensen.Unquote
let addDecimals a b: decimal =
    a + b
[<Property>]
let ``test adding two decimals`` a b =
    let actual = addDecimals a b
    let expected = a + b
    test <@ actual = expected @>
type DecimalHolder =
    { Value: decimal }
let addDecimalHolders a b =
    { Value = a.Value + b.Value }
[<Property>]
let ``test adding two decimal holders`` a b =
    let actual = addDecimalHolders a b
    let expected = { Value = a.Value + b.Value }
    test <@ actual = expected @>
type DecimalStuff =
    | Value of decimal
    | Holder of DecimalHolder
    | Holders of DecimalHolder list
    // Whatever
etc.
How can I achieve that?
OK, it turns out that a single Arbitrary definition was enough, because it is applied recursively across the parameter types:
module Tests
open Xunit
open FsCheck // provides Gen and Arb
open FsCheck.Xunit
open Swensen.Unquote
type NotBigPositiveDecimalArbitrary =
    static member NotBigPositiveDecimal() =
        Gen.choose (1, 500)
        |> Gen.map (fun x -> decimal x)
        |> Arb.fromGen
let addDecimals a b: decimal =
    a + b
[<Property(Arbitrary = [| typeof<NotBigPositiveDecimalArbitrary> |])>]
let ``test adding two decimals`` a b =
    let actual = addDecimals a b
    let expected = a + b
    test <@ actual = expected @>
type DecimalHolder =
    { Value: decimal }
let addDecimalHolders a b =
    { Value = a.Value + b.Value }
[<Property(Arbitrary = [| typeof<NotBigPositiveDecimalArbitrary> |])>]
let ``test adding two decimal holders`` a b =
    let actual = addDecimalHolders a b
    let expected = { Value = a.Value + b.Value }
    test <@ actual = expected @>

Printing Lists in Haskell new

I'm brand new to Haskell, and I need to print the data below with each individual item on a separate row.
I'm unsure how to do it.
type ItemRegion = String
type ItemDescr = String
type ItemYear = Int
type ItemPrice = Int
type ItemSold = Int
type ItemSales = Int
type Item = (ItemRegion,ItemDescr,ItemYear,ItemPrice,ItemSold,ItemSales)
type ListItems = [Item]
rownumber x
  | x == 1 = ("Scotland","Desktop",2017,900,25,22500)
  | x == 2 = ("England","Laptop",2017,1100,75,82500)
  | x == 3 = ("Wales","Printer",2017,120,15,1800)
  | x == 4 = ("England","Printer",2017,120,60,7200)
  | x == 5 = ("England","Desktop",2017,900,50,45000)
  | x == 6 = ("Wales","Desktop",2017,900,20,18000)
  | x == 7 = ("Scotland","Printer",2017,25,25,3000)
showall
--print??
So for example on each individual line
show
"Scotland","Desktop",2017,900,25,22500
followed by the next record
Tip 1:
Store the data like this
items = [("Scotland","Desktop",2017,900,25,22500),
         ("England","Laptop",2017,1100,75,82500),
         ("Wales","Printer",2017,120,15,1800),
         ("England","Printer",2017,120,60,7200),
         ("England","Desktop",2017,900,50,45000),
         ("Wales","Desktop",2017,900,20,18000),
         ("Scotland","Printer",2017,25,25,3000)]
Tip 2:
Implement this function
toString :: Item -> String
toString = undefined -- do this yourselves
Tip 3:
Try to combine the following functions
unlines, already in the Prelude
toString, you just wrote it
map, does not need any explanation
putStrLn, not even sure if this is a real function, but you need it anyway.
($), you can do without this one, but it will give you bonus points

"Borrowed Value Does Not Live Long Enough" when pushing into a vector

I am trying a daily programmer problem to shuffle a list of arguments and output them.
I'm not sure if this is the correct approach but it sounded like a good idea: remove the element from the args vector so it doesn't get repeated, and insert it into the result vector.
extern crate rand; // 0.7.3
use std::io;
use std::cmp::Ordering;
use std::env;
use rand::Rng;
fn main() {
    let mut args: Vec<_> = env::args().collect();
    let mut result: Vec<_> = Vec::with_capacity(args.capacity());
    if args.len() > 1 {
        println!("There are(is) {} argument(s)", args.len() - 1)
    }
    for x in args.iter().skip(1) {
        let mut n = rand::thread_rng().gen_range(1, args.len());
        result.push(&args.swap_remove(n));
    }
    for y in result.iter() {
        println!("{}", y);
    }
}
I get the error:
error[E0716]: temporary value dropped while borrowed
  --> src/main.rs:18:22
   |
18 |         result.push(&args.swap_remove(n));
   |                      ^^^^^^^^^^^^^^^^^^^ - temporary value is freed at the end of this statement
   |                      |
   |                      creates a temporary which is freed while still in use
...
21 |     for y in result.iter() {
   |              ------ borrow later used here
   |
   = note: consider using a `let` binding to create a longer lived value
Older compilers said:
error[E0597]: borrowed value does not live long enough
  --> src/main.rs:18:42
   |
18 |         result.push(&args.swap_remove(n));
   |                      ------------------- ^ temporary value dropped here while still borrowed
   |                      |
   |                      temporary value created here
...
24 | }
   | - temporary value needs to live until here
   |
   = note: consider using a `let` binding to increase its lifetime
Let's start with a smaller example. This is called a Minimal, Reproducible Example, and it is very valuable both for you as a programmer and for us when answering your question. Additionally, it can run on the Rust Playground, which is convenient.
fn main() {
    let mut args = vec!["a".to_string()];
    let mut result = vec![];
    for _ in args.iter() {
        let n = args.len() - 1; // Pretend this is a random index
        result.push(&args.swap_remove(n));
    }
    for y in result.iter() {
        println!("{}", y);
    }
}
The problem arises because when you call swap_remove, the item is moved out of the vector and given to you: ownership is transferred. You then take a reference to the item and try to store that reference in the result vector. Nothing owns the item, so it is a temporary that is dropped at the end of that statement, as the error message says. If you were allowed to keep that reference, it would be a dangling reference, one that points to invalid memory. Using that reference could cause a crash, so Rust prevents it.
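To make the ownership transfer concrete, here is a minimal sketch. The helper name `take_owned` is hypothetical and exists only for this illustration. It shows that swap_remove returns the element by value, and that borrowing is fine once the value is held by a named binding that outlives the reference.

```rust
/// Removes the element at `index` and returns it by value.
/// (`take_owned` is a hypothetical helper name used only for this sketch.)
fn take_owned(items: &mut Vec<String>, index: usize) -> String {
    // `swap_remove` moves the last element into the vacated slot
    // and hands the removed element to the caller.
    items.swap_remove(index)
}

fn main() {
    let mut args = vec!["a".to_string(), "b".to_string(), "c".to_string()];

    // `removed` now owns the String; the vector no longer contains it.
    let removed: String = take_owned(&mut args, 0);

    // Borrowing from a named binding is fine: the reference cannot
    // outlive `removed`, which lives until the end of `main`.
    let borrowed: &String = &removed;

    // prints: removed = a, remaining = ["c", "b"]
    println!("removed = {}, remaining = {:?}", borrowed, args);
}
```

Note that the remaining elements are reordered: swap_remove fills the hole with the last element instead of shifting everything down.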
The immediate fix is to not take a reference, but instead transfer ownership from one vector to the other. Something like:
for _ in args.iter() {
    let n = args.len() - 1; // Pretend this is a random index
    result.push(args.swap_remove(n));
}
The problem with this is that you will get
error[E0502]: cannot borrow `args` as mutable because it is also borrowed as immutable
 --> src/main.rs:7:21
  |
5 |     for _ in args.iter() {
  |              -----------
  |              |
  |              immutable borrow occurs here
  |              immutable borrow later used here
6 |         let n = args.len() - 1;
7 |         result.push(args.swap_remove(n));
  |                     ^^^^^^^^^^^^^^^^^^^ mutable borrow occurs here
See the args.iter? That creates an iterator that refers to the vector. If you changed the vector, then the iterator would become invalid, and allow access to an item that may not be there, another potential crash that Rust prevents.
I'm not making any claim that this is a good way to do it, but one solution would be to iterate while there are still items:
while !args.is_empty() {
    let n = args.len() - 1; // Pretend this is a random index
    result.push(args.swap_remove(n));
}
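For completeness, here is a sketch of that loop with an index that actually varies, written without any external crate so it can be run as-is. The `XorShift64` type and the `drain_in_random_order` function are hypothetical names introduced for this illustration, and the tiny generator is only a stand-in for `rand`. It has modulo bias and is not suitable where the quality of the randomness matters.

```rust
/// Tiny xorshift64 generator. A hypothetical stand-in for the `rand` crate,
/// used here only to keep the sketch self-contained.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Moves every element out of `args` in a pseudo-random order.
/// Ownership is transferred on each step, so no references are stored.
fn drain_in_random_order(mut args: Vec<String>, seed: u64) -> Vec<String> {
    let mut rng = XorShift64(seed.max(1)); // xorshift state must not be zero
    let mut result = Vec::with_capacity(args.len());
    while !args.is_empty() {
        // Pick an index among the elements that are still left.
        let n = (rng.next_u64() % args.len() as u64) as usize;
        result.push(args.swap_remove(n));
    }
    result
}

fn main() {
    let args: Vec<String> = std::env::args().skip(1).collect();
    for y in drain_in_random_order(args, 0x9E3779B97F4A7C15) {
        println!("{}", y);
    }
}
```

Because the loop condition re-checks `args` on every pass and no iterator is held across the call to swap_remove, there is no overlapping borrow for the compiler to reject.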
I'd solve the overall problem by using shuffle:
use rand::seq::SliceRandom; // 0.8.3
use std::env;
fn main() {
    let mut args: Vec<_> = env::args().skip(1).collect();
    args.shuffle(&mut rand::thread_rng());
    for y in &args {
        println!("{}", y);
    }
}

Error while using Z3 module in OCaml

I am new to OCaml. I installed Z3 module as mentioned in this link
I am calling Z3 using the command:
ocamlc -custom -o ml_example.byte -I ~/Downloads/z3-unstable/build/api/ml -cclib "-L ~/Downloads/z3-unstable/build/ -lz3" nums.cma z3ml.cma $1
where $1 is replaced with file name.
type loc = int
type var = string
type exp =
| Mul of int * exp
| Add of exp * exp
| Sub of exp * exp
| Const of int
| Var of var
type formula =
| Eq of exp * exp
| Geq of exp
| Gt of exp
type stmt =
| Assign of var * exp
| Assume of formula
type transition = loc * stmt * loc
module OrdVar =
struct
type t = var
let compare = Pervasives.compare
end
module VarSets = Set.Make( OrdVar )
type vars = VarSets.t
module OrdTrans =
struct
type t = transition
let compare = Pervasives.compare
end
module TransitionSets = Set.Make( OrdTrans )
type transitionSet = TransitionSets.t
type program = vars * loc * transitionSet * loc
let ex1 () : program =
let vset = VarSets.empty in
let vset = VarSets.add "x" vset in
let vset = VarSets.add "y" vset in
let vset = VarSets.add "z" vset in
let ts = TransitionSets.empty in
(* 0 X' = X + 1 *)
let stmt1 = Assign( "x", Add( Var("x"), Const(1) ) ) in
let tr1 = (0,stmt1,1) in
let ts = TransitionSets.add tr1 ts in
(vset,0,ts,10)
In the above code I am defining some types. Now if I include the command "open Z3", I am getting "Error: Unbound module Set.Make".
I can run test code that uses the Z3 module without any difficulty, but I am unable to compile the code above.
The error message in this case is a little bit confusing. The problem is that Z3 also provides a module called Set, which doesn't have a Make functor. After open Z3, the name Set refers to Z3.Set and shadows the standard library's Set. This can be overcome simply by not importing everything from Z3, as there are a number of modules that might clash with others. For example,
open Z3.Expr
open Z3.Boolean
will work fine and opens only the Z3.Expr and Z3.Boolean modules, but not the Z3.Set module, so that we can write an example function:
let myfun (ctx:Z3.context) (args:expr list) =
mk_and ctx args
If Z3.Boolean is not opened, we would have to write Z3.Boolean.mk_and instead, and similarly we can still access Z3's Set module functions by prefixing them with Z3.Set.

Regex / subString to extract all matching patterns / groups

I get this as a response to an API hit.
1735 Queries
Taking 1.001303 to 31.856310 seconds to complete
SET timestamp=XXX;
SELECT * FROM ABC_EM WHERE last_modified >= 'XXX' AND last_modified < 'XXX';
38 Queries
Taking 1.007646 to 5.284330 seconds to complete
SET timestamp=XXX;
show slave status;
6 Queries
Taking 1.021271 to 1.959838 seconds to complete
SET timestamp=XXX;
SHOW SLAVE STATUS;
2 Queries
Taking 4.825584, 18.947725 seconds to complete
use marketing;
SET timestamp=XXX;
SELECT * FROM ABC WHERE last_modified >= 'XXX' AND last_modified < 'XXX';
I have extracted this from the response HTML and now have it as a string. I need to retrieve the values as concisely as possible, so that I get a map of the form Map(Query -> T1 to T2 seconds). This is the status of all the slow queries running on a MySQL slave server, and I am building an alert system on top of it. So from this entire paragraph, held as a String, I need to separate out the queries and save the corresponding time range with each of them.
1.001303 to 31.856310 is a time range, and the query corresponding to that time range is:
SET timestamp=XXX; SELECT * FROM ABC_EM WHERE last_modified >= 'XXX' AND last_modified < 'XXX';
This information I was hoping to save in a Map in scala. A Map of the form (query:String->timeRange:String)
Another example:
("use marketing; SET timestamp=XXX; SELECT * FROM ABC WHERE last_modified >= 'XXX' AND last_modified xyz ;"->"4.825584 to 18.947725 seconds")
"""###(.)###(.)\n\n(.*)###""".r.findAllIn(reqSlowQueryData).matchData foreach {m => println("group0"+m.group(1)+"next group"+m.group(2)+m.group(3)}
I am using the above statement to extract the repeating cells so that I can manipulate them later, but it doesn't seem to be working.
Thanks in advance! I know there are several ways to do this, but all the ones that occur to me are inefficient and tedious. I need to do this in Scala. Maybe I can extract recursively using the substring method?
If you want to use Scala, try this:
val regex = """(\d+).(\d+).*(\d+).(\d+) seconds""".r // extract range
val txt = """
|1735 Queries
|
|Taking 1.001303 to 31.856310 seconds to complete
|
|SET timestamp=XXX; SELECT * FROM ABC_EM WHERE last_modified >= 'XXX' AND last_modified < 'XXX';
|
|38 Queries
|
|Taking 1.007646 to 5.284330 seconds to complete
|
|SET timestamp=XXX; show slave status;
|
|6 Queries
|
|Taking 1.021271 to 1.959838 seconds to complete
|
|SET timestamp=XXX; SHOW SLAVE STATUS;
|
|2 Queries
|
|Taking 4.825584, 18.947725 seconds to complete
|
|use marketing; SET timestamp=XXX; SELECT * FROM ABC WHERE last_modified >= 'XXX' AND last_modified < 'XXX';
""".stripMargin
def logToMap(txt:String) = {
val (_,map) = txt.lines.foldLeft[(Option[String],Map[String,String])]((None,Map.empty)){
(acc,el) =>
val (taking,map) = acc // taking contains range
taking match {
case Some(range) if el.trim.nonEmpty => //Some contains range
(None,map + ( el -> range)) // add to map
case None =>
regex.findFirstIn(el) match { //extract range
case Some(range) => (Some(range),map)
case _ => (None,map)
}
case _ => (taking,map) // probably empty line
}
}
map
}
I modified ajozwik's answer to work for SQL commands that span multiple lines:
val regex = """(\d+).(\d+).*(\d+).(\d+) seconds""".r // extract range
def logToMap(txt:String) = {
val (_,map) = txt.lines.foldLeft[(Option[String],Map[String,String])]((None,Map.empty)){
(accumulator,element) =>
val (taking,map) = accumulator
taking match {
case Some(range) if element.trim.nonEmpty=> {
if (element.contains("Queries"))
(None, map)
else
(Some(range),map+(range->(map.getOrElse(range,"")+element)))
}
case None =>
regex.findFirstIn(element) match {
case Some(range) => (Some(range),map)
case _ => (None,map)
}
case _ => (taking,map)
}
}
println(map)
map
}