How to wrap a string in an input-stream? - clojure

How can I wrap a string in an input-stream in such a way that I can test the function below?
(defn parse-body [body]
(cheshire/parse-stream (clojure.java.io/reader body) true))
(deftest test-parse-body
(testing "read body"
(let [body "{\"age\": 28}"] ;; must wrap string
(is (= (parse-body body) {:age 28}))
)))

It is straightforward to construct an InputStream from a String using host interop, by converting to a byte-array first:
(defn string->stream
([s] (string->stream s "UTF-8"))
([s encoding]
(-> s
(.getBytes encoding)
(java.io.ByteArrayInputStream.))))
As another stream and byte interop example, here's a function that returns a vector of the bytes produced when encoding a String to a given format:
(defn show-bytes
[s encoding]
(let [buf (java.io.ByteArrayOutputStream.)
stream (string->stream s encoding)
;; worst case, 8 bytes per char?
data (byte-array (* (count s) 8))
size (.read stream data 0 (count data))]
(.write buf data 0 size)
(.flush buf)
(apply vector-of :byte (.toByteArray buf))))
user=> (string->stream "hello")
#object[java.io.ByteArrayInputStream 0x39b43d60 "java.io.ByteArrayInputStream#39b43d60"]
user=> (isa? (class *1) java.io.InputStream)
true
user=> (show-bytes "hello" "UTF-8")
[104 101 108 108 111]
user=> (show-bytes "hello" "UTF-32")
[0 0 0 104 0 0 0 101 0 0 0 108 0 0 0 108 0 0 0 111]
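To tie this back to the question, here is a minimal sketch of feeding the stream to clojure.java.io/reader, which is what parse-body does before handing off to cheshire. It uses slurp in place of cheshire/parse-stream so that it runs without extra dependencies; that substitution, and the name round-tripped, are assumptions made for illustration.

```clojure
(require '[clojure.java.io :as io])

(defn string->stream
  "Wrap string s in a ByteArrayInputStream, using UTF-8 unless told otherwise."
  ([s] (string->stream s "UTF-8"))
  ([^String s ^String encoding]
   (java.io.ByteArrayInputStream. (.getBytes s encoding))))

;; io/reader accepts an InputStream and decodes it as UTF-8 by default.
;; slurp stands in for cheshire/parse-stream here (assumption, see above).
(def round-tripped
  (with-open [r (io/reader (string->stream "{\"age\": 28}"))]
    (slurp r)))
```

In the test from the question, the same wrapper would go around the literal: (parse-body (string->stream body)).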

Related

using java.lang.invoke.MethodHandle in clojure

I'm following a tutorial here: https://www.baeldung.com/java-method-handles
In clojure, I've got a simple example:
(import (java.lang.invoke MethodHandles
MethodHandles$Lookup
MethodType
MethodHandle))
(defonce +lookup+ (MethodHandles/lookup))
(def ^MethodHandle concat-handle (.findVirtual +lookup+
String
"concat"
(MethodType/methodType String String)))
(.invokeExact concat-handle (into-array Object ["hello" "there"]))
which gives an error:
Unhandled java.lang.invoke.WrongMethodTypeException
expected (String,String)String but found (Object[])Object
Invokers.java: 476 java.lang.invoke.Invokers/newWrongMethodTypeException
Invokers.java: 485 java.lang.invoke.Invokers/checkExactType
REPL: 26 hara.object.handle/eval17501
REPL: 26 hara.object.handle/eval17501
Compiler.java: 7062 clojure.lang.Compiler/eval
Compiler.java: 7025 clojure.lang.Compiler/eval
core.clj: 3206 clojure.core/eval
core.clj: 3202 clojure.core/eval
main.clj: 243 clojure.main/repl/read-eval-print/f
Is there a way to get invoke working?
You can use .invokeWithArguments which will figure out the correct arity from the supplied arguments:
(.invokeWithArguments concat-handle (object-array ["hello" "there"]))
=> "hellothere"
Or you can use .invoke, but you'll need MethodHandle.asSpreader to apply the varargs correctly to String.concat which has fixed arity:
(def ^MethodHandle other-handle
(.asSpreader
concat-handle
(Class/forName "[Ljava.lang.String;") ;; String[].class
2))
(.invoke other-handle (into-array String ["hello" "there"]))
=> "hellothere"
I'm not sure how to make this work with .invokeExact from Clojure, if it's possible.
The symbolic type descriptor at the call site of invokeExact must exactly match this method handle's type. No conversions are allowed on arguments or return values.
This answer has more explanation on restrictions of .invoke and .invokeExact.
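As a small self-contained illustration of what the exact check compares against: the handle's type lists the receiver as its first parameter, so String.concat shows up with two String parameters, while the error message above reports that the Clojure call site was seen as (Object[])Object. The sketch below only uses calls already shown in this thread plus MethodHandle.type; the name handle-type is hypothetical.

```clojure
(import '(java.lang.invoke MethodHandles MethodType MethodHandle))

(def ^MethodHandle concat-handle
  (.findVirtual (MethodHandles/lookup)
                String
                "concat"
                (MethodType/methodType String String)))

;; The receiver (a String) is the first parameter of the handle's type.
(def handle-type (str (.type concat-handle)))
;; handle-type is "(String,String)String"

;; invokeWithArguments adapts the Object[] to that type at call time.
(def joined
  (.invokeWithArguments concat-handle (object-array ["hello" "there"])))
;; joined is "hellothere"
```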
Some interesting benchmarks based on @TaylorWood's answer:
(with-out-str
(time (dotimes [i 1000000]
(.concat "hello" "there"))))
=> "\"Elapsed time: 8.542214 msecs\"\n"
(with-out-str
(def concat-fn (fn [a b] (.concat a b)))
(time (dotimes [i 1000000]
(concat-fn "hello" "there"))))
=> "\"Elapsed time: 3600.357352 msecs\"\n"
(with-out-str
(def concat-anno (fn [^String a b] (.concat a b)))
(time (dotimes [i 1000000]
(concat-anno "hello" "there"))))
=> "\"Elapsed time: 16.461237 msecs\"\n"
(with-out-str
(def concat-reflect (.? String "concat" :#))
(time (dotimes [i 1000000]
(concat-reflect "hello" "there"))))
=> "\"Elapsed time: 1804.522226 msecs\"\n"
(with-out-str
(def ^MethodHandle concat-handle
(.findVirtual +lookup+
String
"concat"
(MethodType/methodType String String)))
(time (dotimes [i 1000000]
(.invokeWithArguments concat-handle (into-array Object ["hello" "there"])))))
=> "\"Elapsed time: 1974.824815 msecs\"\n"
(with-out-str
(def ^MethodHandle concat-spread
(.asSpreader concat-handle
(Class/forName "[Ljava.lang.String;") ;; String[].class
2))
(time (dotimes [i 1000000]
(.invoke concat-spread (into-array String ["hello" "there"])))))
=> "\"Elapsed time: 399.779913 msecs\"\n"

Composing a Buffy buffer from the middle of an array and finding out how much it has consumed

I'd like to use Buffy to interpret binary data starting from the middle of an array. I also need to find out how many bytes of the array have been consumed by Buffy.
Let's say I have a dynamic buffer definition like this:
(ns foo.core
(:refer-clojure :exclude [read])
(:use [byte-streams])
(:require [clojurewerkz.buffy.core :refer :all]
[clojurewerkz.buffy.frames :refer :all]
[clojurewerkz.buffy.types.protocols :refer :all])
(:import [io.netty.buffer Unpooled ByteBuf]))
(def dynbuf
(let [string-encoder (frame-encoder [value]
length (short-type) (count value)
string (string-type (count value)) value)
string-decoder (frame-decoder [buffer offset]
length (short-type)
string (string-type (read length buffer offset)))]
(dynamic-buffer (frame-type string-encoder string-decoder second))))
I hoped I could use a Netty ByteBuf to parse a bunch of bytes using dynbuf starting at an offset:
(def buf
(let [bytes (concat [0 0 0 4] (map #(byte %) "Foobar"))
offset 2]
(Unpooled/wrappedBuffer (byte-array bytes) offset (- (count bytes) offset))))
At this point, I can parse buf per dynbuf:
user> (decompose dynbuf buf)
["Foob"]
At this point, I was hoping that reading the short-type and the string-type from buf has moved its readerIndex by 6, but alas, it is not so:
user> (.readerIndex buf)
0
Is this because buffy/decompose makes some kind of shallow copy of the stream for its internal use, so the readerIndex of the outer buf is not updated? Or am I misunderstanding what readerIndex is supposed to be?
How can I achieve my original goal of passing a (byte-array) at a given offset to Buffy and learning how many bytes it has consumed?
Buffy uses the absolute versions of the getXXX methods, which do not modify the position of the buffer, so you cannot rely on .readerIndex.
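The difference between absolute and relative reads can be seen without Netty. The sketch below uses java.nio.ByteBuffer as a stand-in for Netty's ByteBuf (an assumption made so the example needs no dependencies): an absolute getShort with an index leaves the position alone, while the relative getShort advances it, much as getXXX and readXXX behave on a ByteBuf.

```clojure
(import 'java.nio.ByteBuffer)

(def positions
  (let [bb (ByteBuffer/wrap (byte-array [0 4 70 111 111 98]))
        absolute  (.getShort bb (int 0)) ; absolute read at index 0
        pos-after-absolute (.position bb)
        relative  (.getShort bb)         ; relative read at the current position
        pos-after-relative (.position bb)]
    [absolute pos-after-absolute relative pos-after-relative]))
;; positions is [4 0 4 2]: only the relative read moved the position
```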
I see two possible options, depending on what you are trying to achieve:
Use Buffy dynamic frames. Note that the clojurewerkz.buffy.frames namespace has a decoding-size function if you want to know how much the dynamic frame will take. Something like:
(defn read-from-middle [data f-type start-idx]
(let [tmp-buf (Unpooled/wrappedBuffer data start-idx (- (alength data) start-idx))
buffy-buffer (dynamic-buffer f-type)
total-size (decoding-size f-type tmp-buf 0)]
[(decompose buffy-buffer tmp-buf) (+ start-idx total-size)]))
(def f-type
(let [string-encoder (frame-encoder [value]
length (short-type) (count value)
string (string-type (count value)) value)
string-decoder (frame-decoder [buffer offset]
length (short-type)
string (string-type (read length buffer offset)))]
(frame-type string-encoder string-decoder second)))
(let [my-data (byte-array [0 1 0x61 0 2 0x62 0x63 0 1 0x64])
idx 0
[i1 idx] (read-from-middle my-data f-type idx)
[i2 idx] (read-from-middle my-data f-type idx)
[i3 idx] (read-from-middle my-data f-type idx)]
[i1 i2 i3])
Calculate the size of the frame as Buffy is doing and manually set the correct position in the buffer. Something like:
(import [io.netty.buffer Unpooled])
(require '[clojurewerkz.buffy.core :as buffy]
'[clojurewerkz.buffy.types.protocols :as ptypes])
(defn read-from-middle [data spec start-idx]
(let [total-size (reduce + (map ptypes/size (map second spec)))
tmp-buf (Unpooled/wrappedBuffer data start-idx (- (alength data) start-idx))
buffy-buffer (buffy/compose-buffer spec :orig-buffer tmp-buf)]
[(buffy/decompose buffy-buffer) (+ start-idx total-size)]))
(let [my-data (byte-array [0 0 0 1 0 0 0 2 0 0 0 3])
spec (buffy/spec :foo (buffy/int32-type))
idx 0
[i1 idx] (read-from-middle my-data spec idx)
[i2 idx] (read-from-middle my-data spec idx)
[i3 idx] (read-from-middle my-data spec idx)]
[i1 i2 i3])

Clojure flat sequence into tree

I have the following vector, [-1 1 2 -1 3 0 -1 2 -1 4 0 3 0 0]
which represents the tree [[1 2 [3] [2 [4] 3]]]
where -1 begins a new branch and 0 ends it. How can I convert the original vector into a usable tree-like clojure structure (nested vector, nested map)? I think clojure.zip/zipper might do it but I'm not sure how to build those function args.
Zippers are a good tool for this:
(require '[clojure.zip :as zip])
(def in [-1 1 2 -1 3 0 -1 2 -1 4 0 3 0 0])
(def out [[1 2 [3] [2 [4] 3]]])
(defn deepen [steps]
(->> steps
(reduce (fn [loc step]
(case step
-1 (-> loc
(zip/append-child [])
(zip/down)
(zip/rightmost))
0 (zip/up loc)
(zip/append-child loc step)))
(zip/vector-zip []))
(zip/root)))
(assert (= (deepen in) out))
Somehow this feels like cheating:
[(read-string
(clojure.string/join " "
(replace {-1 "[" 0 "]"}
[-1 1 2 -1 3 0 -1 2 -1 4 0 3 0 0])))]
This is not too hard with some recursion:
(defn numbers->tree [xs]
(letfn [(step [xs]
(loop [ret [], remainder xs]
(if (empty? remainder)
[ret remainder]
(let [x (first remainder)]
(case x
0 [ret (next remainder)]
-1 (let [[ret' remainder'] (step (next remainder))]
(recur (conj ret ret'), remainder'))
(recur (conj ret x) (next remainder)))))))]
(first (step xs))))
The idea is to have a function (step) that finds a sub-tree, and returns that tree as well as what numbers are left to be processed. It proceeds iteratively (via loop) for most inputs, and starts a recursive instance of itself when it runs into a -1. The only tricky part is making sure to use the remainder returned from these recursive invocations, rather than proceeding on with the list you were in the middle of.
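For comparison, the same bookkeeping can be done without recursion by keeping the open branches on an explicit stack. This is a different technique from the step function above, sketched under the assumption that the input is balanced (every -1 has a matching 0); numbers->tree-stack is a hypothetical name.

```clojure
(defn numbers->tree-stack
  "Build the nested vector by reducing over xs with a stack of open branches.
  Assumes balanced input: an unmatched 0 makes pop throw on an empty vector."
  [xs]
  (let [close (fn [stack]
                ;; Fold the finished branch into its parent.
                (let [done   (peek stack)
                      outer  (pop stack)]
                  (conj (pop outer) (conj (peek outer) done))))]
    (first
     (reduce (fn [stack x]
               (case x
                 -1 (conj stack [])                          ; open a branch
                 0  (close stack)                            ; close it
                 (conj (pop stack) (conj (peek stack) x))))  ; add a leaf
             [[]]
             xs))))

(numbers->tree-stack [-1 1 2 -1 3 0 -1 2 -1 4 0 3 0 0])
;; => [[1 2 [3] [2 [4] 3]]]
```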

what is the clojure way to do things

As part of a larger program, I'm testing a function that will turn a string of days on which a class occurs (such as "MWF") into a list of seven numbers: (1 0 1 0 1 0 0).
I first translate "TH" (Thursday) to "R" and "SU" (Sunday) to "N" to make things a bit easier.
I came up with the following code:
(defn days-number-maker
"Recursively compare first item in days of week with
first item in string of days. If matching, add a 1,
else add a zero to the result"
[all-days day-string result]
(if (empty? all-days) (reverse result)
(if (= (first all-days) (first day-string))
(recur (rest all-days)(rest day-string) (conj result 1))
(recur (rest all-days) day-string (conj result 0)))))
(defn days-to-numbers
"Change string like MTTH to (1 1 0 1 0 0 0)"
[day-string]
(let [days (clojure.string/replace
(clojure.string/replace day-string #"TH" "R") #"SU" "N")]
(days-number-maker "MTWRFSN" days (list))))
The good news: the code works. The bad news: I'm convinced I'm doing it wrong, in the moral purity sense of the word. Something inside of me says, "You could have just used (map...) to do this the right way," but I can't see how to do it with (map). So, my two questions are:
1) Is there such a thing as "the Clojure way," and if so,
2) How can I rewrite the code to be more Clojure-ish?
Using map and sets:
(defn days-number-maker
[all-days day-string]
(let [day-set (set day-string)]
(map (fn [day]
(if (day-set day)
1
0))
all-days)))
(defn days-to-numbers
"Change string like MTTH to (1 1 0 1 0 0 0)"
[day-string]
(let [days (clojure.string/replace
(clojure.string/replace day-string #"TH" "R") #"SU" "N")]
(days-number-maker "MTWRFSN" days)))
This is how I would do it a bit more succinctly:
(defn days-to-numbers
"Change string like MTTH to (1 1 0 1 0 0 0)"
[week-string]
(let [char-set (set (clojure.string/replace
(clojure.string/replace week-string "TH" "R") "SU" "N"))]
(map #(if (char-set %) 1 0)
"MTWRFSN")))
Tests:
=> (days-to-numbers "")
(0 0 0 0 0 0 0)
=> (days-to-numbers "MTWTHFSSU")
(1 1 1 1 1 1 1)
=> (days-to-numbers "MTHSU")
(1 0 0 1 0 0 1)
=> (days-to-numbers "FM")
(1 0 0 0 1 0 0)
Following on from @TheQuickBrownFox's answer ...
You don't need to recode "TH" and "SU": the second letters will do.
Use false or nil instead of 0, so that you can apply logical tests directly.
Return the result as a vector, as you're quite likely to want to index into it.
Giving ...
(defn days-to-numbers [ds]
(let [dns (->> ds
(partition-all 2 1)
(remove #{[\S \U] [\T \H]})
(map first)
set)]
(mapv dns "MTWHFSU")))
For example,
(days-to-numbers "MTTH")
;[\M \T nil \H nil nil nil]
Though the function is mis-named, as the elements are logical values, not numbers.
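To see why the second letters are enough, it may help to look at the intermediate values of the pipeline for "MTTH". This is only a walk-through of the steps already used above; the names pairs, kept and day-chars are hypothetical.

```clojure
;; Overlapping pairs of characters, with a final single-character group.
(def pairs (partition-all 2 1 "MTTH"))
;; => ((\M \T) (\T \T) (\T \H) (\H))

;; Dropping the (\T \H) pair discards the T that belongs to "TH",
;; so only the H remains to stand for Thursday.
(def kept (remove #{[\S \U] [\T \H]} pairs))
;; => ((\M \T) (\T \T) (\H))

;; The first character of each remaining group: the set of \M, \T and \H.
(def day-chars (set (map first kept)))
```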
I'd prefer to return the set of day numbers:
(def day-index (into {} (map-indexed (fn [x y] [y x]) "MTWHFSU")))
;{\M 0, \T 1, \W 2, \H 3, \F 4, \S 5, \U 6}
(defn day-numbers [ds]
(->> ds
(partition-all 2 1)
(remove #{[\S \U] [\T \H]})
(map (comp day-index first))
set))
For example,
(day-numbers "MTTH")
;#{0 1 3}

Clojure's equivalent to Python's encode('hex') and decode('hex')

Is there an idiomatic way of encoding and decoding a string in Clojure as hexadecimal? Example from Python:
'Clojure'.encode('hex')
# ⇒ '436c6f6a757265'
'436c6f6a757265'.decode('hex')
# ⇒ 'Clojure'
To show some effort on my part:
(defn hexify [s]
(apply str
(map #(format "%02x" (int %)) s)))
(defn unhexify [hex]
(apply str
(map
(fn [[x y]] (char (Integer/parseInt (str x y) 16)))
(partition 2 hex))))
(hexify "Clojure")
;; ⇒ "436c6f6a757265"
(unhexify "436c6f6a757265")
;; ⇒ "Clojure"
Since all posted solutions have some flaws, I'm sharing my own:
(defn hexify "Convert byte sequence to hex string" [coll]
(let [hex [\0 \1 \2 \3 \4 \5 \6 \7 \8 \9 \a \b \c \d \e \f]]
(letfn [(hexify-byte [b]
(let [v (bit-and b 0xFF)]
[(hex (bit-shift-right v 4)) (hex (bit-and v 0x0F))]))]
(apply str (mapcat hexify-byte coll)))))
(defn hexify-str [s]
(hexify (.getBytes s)))
and
(defn unhexify "Convert hex string to byte sequence" [s]
(letfn [(unhexify-2 [c1 c2]
(unchecked-byte
(+ (bit-shift-left (Character/digit c1 16) 4)
(Character/digit c2 16))))]
(map #(apply unhexify-2 %) (partition 2 s))))
(defn unhexify-str [s]
(apply str (map char (unhexify s))))
Pros:
High performance
Generic byte stream <--> string conversions with specialized wrappers
Handling leading zero in hex result
Your implementation(s) don't work for non-ASCII characters:
(defn hexify [s]
(apply str
(map #(format "%02x" (int %)) s)))
(defn unhexify [hex]
(apply str
(map
(fn [[x y]] (char (Integer/parseInt (str x y) 16)))
(partition 2 hex))))
(= "\u2195" (unhexify(hexify "\u2195")))
false ; should be true
To overcome this you need to serialize the bytes of the string using the required character encoding, which can be multi-byte per character.
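A quick way to see the mismatch, as a minimal sketch: the character U+2195 is one char but three bytes in UTF-8, and formatting its code point with "%02x" yields four hex digits, which the pairwise decoding then reads as two separate characters. The name char-vs-bytes is hypothetical.

```clojure
(def char-vs-bytes
  (let [s "\u2195"]
    {:chars        (count s)
     :utf-8-bytes  (count (.getBytes s "UTF-8"))
     :per-char-hex (format "%02x" (int (first s)))}))
;; char-vs-bytes is {:chars 1, :utf-8-bytes 3, :per-char-hex "2195"}
```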
There are a few issues with this.
Remember that all numeric types are signed on the JVM.
There is no unsigned byte type.
In idiomatic Java you would take the low byte of an integer and mask it wherever you use it, like this:
int intValue = 0x80;
byte byteValue = (byte)(intValue & 0xff); // use only the low byte
System.out.println("int:\t" + intValue);
System.out.println("byte:\t" + byteValue);
// output:
// int:	128
// byte:	-128
Clojure has unchecked-byte, which effectively does the same.
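The Clojure counterpart of the Java snippet, as a minimal sketch (the names as-byte and unsigned-again are hypothetical):

```clojure
;; unchecked-byte keeps only the low 8 bits, so 0x80 wraps to -128.
(def as-byte (unchecked-byte 0x80))
;; as-byte is -128

;; Masking with 0xFF recovers the unsigned value as a long.
(def unsigned-again (bit-and as-byte 0xFF))
;; unsigned-again is 128
```

By contrast, the checked (byte 0x80) throws an IllegalArgumentException because 128 is out of range for a byte.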
For example, using UTF-8 you can do this:
(defn hexify [s]
(apply str (map #(format "%02x" %) (.getBytes s "UTF-8"))))
(defn unhexify [s]
(let [bytes (into-array Byte/TYPE
(map (fn [[x y]]
(unchecked-byte (Integer/parseInt (str x y) 16)))
(partition 2 s)))]
(String. bytes "UTF-8")))
; with the above implementation:
;=> (hexify "\u2195")
"e28695"
;=> (unhexify "e28695")
"↕"
;=> (= "\u2195" (unhexify (hexify "\u2195")))
true
Sadly the "idiom" appears to be using the Apache Commons Codec, e.g. as done in buddy:
(ns name-of-ns
(:import org.apache.commons.codec.binary.Hex))
(defn str->bytes
"Convert string to byte array."
([^String s]
(str->bytes s "UTF-8"))
([^String s, ^String encoding]
(.getBytes s encoding)))
(defn bytes->str
"Convert byte array to String."
([^bytes data]
(bytes->str data "UTF-8"))
([^bytes data, ^String encoding]
(String. data encoding)))
(defn bytes->hex
"Convert a byte array to hex encoded string."
[^bytes data]
(Hex/encodeHexString data))
(defn hex->bytes
"Convert hexadecimal encoded string to bytes array."
[^String data]
(Hex/decodeHex (.toCharArray data)))
I believe your unhexify function is as idiomatic as it can be. However, hexify can be written in a simpler way:
(defn hexify [s]
(format "%x" (new java.math.BigInteger (.getBytes s))))
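One caveat worth checking before relying on the BigInteger version: BigInteger is a number, so leading zero bytes disappear from the output (and a first byte of 0x80 or above makes the value negative). A minimal sketch, where hexify-bigint is a hypothetical name for the function above with an explicit UTF-8 encoding added:

```clojure
(defn hexify-bigint
  "Hex-encode the UTF-8 bytes of s by treating them as one big integer."
  [^String s]
  (format "%x" (java.math.BigInteger. (.getBytes s "UTF-8"))))

(hexify-bigint "Clojure")
;; => "436c6f6a757265"

;; The bytes are 0x01 0x61, but the leading zero nibble is not printed.
(hexify-bigint "\u0001a")
;; => "161"
```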