RocksDB iterator seek until last matching prefix - clojure

How do I tell a RocksDB iterator to seek until the last matching prefix?
In Clojure using the RocksDB Java API:
(import '(org.rocksdb RocksDB Options ReadOptions RocksIterator Slice))
(let [opts (-> (ReadOptions.)
               (.setPrefixSameAsStart true)
               (.setTotalOrderSeek true))
      iter (.newIterator db)]
  (.seek iter (.getBytes "some-prefix:"))
  (.key iter))
=> "not-matching-prefix"
Do I have to manually check whether the next key matches the prefix? This seems suboptimal because I have to read the whole key to check it, when RocksDB could stop early.

You can give RocksDB an upper bound so that it stops iterating for you: set it with setIterateUpperBound on the ReadOptions.
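A minimal sketch of that, assuming rocksdbjni is on the classpath, db is an already open RocksDB handle, and the default bytewise comparator is in use. The names prefix-upper-bound and scan-prefix are made up for illustration. Two things worth noting about the snippet in the question: the ReadOptions only take effect if they are passed to newIterator, and as far as I know setPrefixSameAsStart only applies when a prefix extractor is configured and total order seek is off.

```clojure
(import '(org.rocksdb RocksDB ReadOptions Slice))

(defn prefix-upper-bound
  "Returns the smallest byte array that sorts after every key starting with
  `prefix` under unsigned bytewise ordering. Throws if no such bound exists
  (empty prefix, or every byte is 0xFF)."
  ^bytes [^bytes prefix]
  (loop [i (dec (alength prefix))]
    (if (neg? i)
      (throw (IllegalArgumentException. "prefix has no finite upper bound"))
      (let [b (long (aget prefix i))]
        (if (= b -1)                    ; 0xFF as a signed byte: drop it and carry
          (recur (dec i))
          (let [bound (java.util.Arrays/copyOf prefix (inc i))]
            (aset-byte bound i (unchecked-byte (inc b)))
            bound))))))

(defn scan-prefix
  "Eagerly collects [key value] byte-array pairs whose key starts with `prefix`.
  Assumption: `db` is an open org.rocksdb.RocksDB instance."
  [^RocksDB db ^bytes prefix]
  ;; with-open closes in reverse order, so the iterator is closed before the
  ;; ReadOptions and the Slice that backs the upper bound.
  (with-open [bound (Slice. (prefix-upper-bound prefix))
              opts  (doto (ReadOptions.) (.setIterateUpperBound bound))
              iter  (.newIterator db opts)]
    (.seek iter prefix)
    (loop [acc []]
      (if (.isValid iter)               ; becomes false once the bound is reached
        (let [entry [(.key iter) (.value iter)]]
          (.next iter)
          (recur (conj acc entry)))
        acc))))
```

Usage would look something like (map (fn [[k _]] (String. ^bytes k)) (scan-prefix db (.getBytes "some-prefix:"))). The upper bound is exclusive, so for "some-prefix:" the scan stops before "some-prefix;".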

Related

How to run an interactive CLI program from within Clojure?

I'd like to run an interactive CLI program from within Clojure (e.g., vim) and be able to interact with it.
In bash and other programming languages, I can do that with
vim > `tty`
I tried to do the same in Clojure:
(require '[clojure.java.shell :as shell])
(shell/sh "vim > `tty`")
but it just opens vim without giving me tty.
Background: I'm developing a Clojure CLI tool which parses emails and lets a user edit the parsed data before saving them on the disk. It works the following way:
1. Read a file with email content and parse it. Each email is stored as a separate file.
2. Show the user the parsed data and let them edit it in vim. Internally I create a temporary file with the parsed data, but I don't mind doing it another way if that would solve my issue.
3. After the user finishes editing the parsed data (they might decide to keep it as it is), append the data to a file on disk. All parsed data are saved to the same file.
4. Go to step 1 if there are any files with emails left.

This code relies on Clojure Java interop to make use of Java's ProcessBuilder class.
(require '[clojure.java.shell :refer [sh]]
         '[clojure.string :refer [trim-newline]])

(defn -main
  []
  ;; use doseq instead of for because for is lazily evaluated
  (doseq [i [1 2 3]]
    ;; extract the current directory from the PWD environment variable
    (let [file-name (str "test" i ".txt")
          working-directory (trim-newline (:out (sh "printenv" "PWD")))]
      (spit file-name "")
      ;; this is where the fun begins: we use ProcessBuilder to launch vim,
      ;; passing the command and its arguments to the constructor
      (let [process-builder (java.lang.ProcessBuilder. (list "vim" (str working-directory "/" file-name)))
            ;; INHERIT makes the child process share the JVM's own terminal streams
            inherit (java.lang.ProcessBuilder$Redirect/INHERIT)]
        ;; configure input, output and error redirection
        (.redirectOutput process-builder inherit)
        (.redirectError process-builder inherit)
        (.redirectInput process-builder inherit)
        ;; waitFor blocks execution until vim is closed
        (.waitFor (.start process-builder)))
      ;; additional processing here
      ))
  ;; not necessary, but the script tends to hang for around 30 seconds at the end
  ;; of its execution, so this call terminates it immediately
  (System/exit 0))
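A shorter variant of the same idea, as a sketch: ProcessBuilder's inheritIO sets all three redirects to INHERIT in one call. The name edit-in-terminal is made up for illustration, and the sketch assumes the editor is on the PATH and that the JVM itself was started from a terminal.

```clojure
(defn edit-in-terminal
  "Runs `editor` on `path` with the JVM's stdin, stdout and stderr handed to
  the child process, blocks until it exits, and returns its exit code."
  [editor path]
  (let [^java.util.List command [editor (str path)]]
    (-> (ProcessBuilder. command)
        (.inheritIO)     ; same effect as the three redirect* calls with INHERIT
        (.start)
        (.waitFor))))
```

Calling (edit-in-terminal "vim" "test1.txt") should open vim on the file and return once it is closed. The hang at exit described in the code comments most likely comes from clojure.java.shell/sh, which uses the agent thread pool; this sketch does not call sh, and (shutdown-agents) is the usual alternative to System/exit if sh is still used elsewhere.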

List buffers associated with files?

I'm very new to using lisp, so I'm sorry if this is a trivial question. I haven't been able to find solutions after a while googling, though I'm sure that this is a fault on my part.
So. I'm trying to write a command which will revert all open buffers. Simple. I just do
(setq revert-without-query (buffer-list))
(mapc 'revert-buffer (buffer-list))
Unfortunately, this ends up failing if there are any buffers which aren't associated with files, which is to say, always.
Doing C-x C-b to list-buffers prints something like
CRM Buffer      Size  Mode        File
    init.el     300   Emacs-lisp  ~/.spacemacs.d/init.el
    %scratch%   30    Test
Ok. Easy enough. If I was allowed to mix lisp and python, I'd do something like
(setq revert-without-query [b for b in buffer-list if b.File != ""])
;; Or would I test for nil? Decisions, decisions...
Upon some digging, I found that there exists remove-if. Unfortunately, being completely new to lisp, I have no idea how to access the list, their attributes, or... well... anything. Mind helping me out?

One possibility is to check buffer-file-name, which returns nil if the buffer isn't visiting a file, e.g.
(require 'cl-lib)
(cl-loop for buf in (buffer-list)
         if (buffer-file-name buf)
         collect buf)
or
(cl-remove-if-not 'buffer-file-name (buffer-list))
You probably want to revert dired directories also. Any type of buffer can have its own specialized revert (see revert-buffer-function). So you probably want to check for both buffer-file-name and dired-directory being non-nil.
(dolist (b (buffer-list))
  (when (buffer-live-p b)
    (with-current-buffer b
      (when (or buffer-file-name dired-directory)
        (revert-buffer 'ignore-auto 'noconfirm)))))
You can also use the ignore-errors hammer, but you're probably better off fixing corner cases as you encounter them.

How do I interactively read the input text for this Emacs function?

I am new to Emacs functions. Today is my first attempt to create a function.
I know that count-matches will tell me how many times a regex appears in the rest of the buffer, but most of the time I need to count from the beginning of the buffer. So I tried this:
(defun count-matches-for-whole-buffer (text-to-count)
  "Opens the ~/.emacs.d/init.el file"
  (interactive "sText-to-count:")
  (beginning-of-buffer)
  (count-matches text-to-count))
I put this in ~/.emacs.d/init.el and then do "eval-buffer" on that buffer.
So now I have access to this function. And if I run it, it will ask me for text to search for.
But the function only gets as far as this line:
beginning-of-buffer
I never get the count. Why is that?

Two things.
1. You should use (goto-char (point-min)) instead of beginning-of-buffer.
2. count-matches will not display messages when called from lisp code unless you provide a parameter indicating so.
Try this code:
(defun count-matches-for-whole-buffer (text-to-count)
  (interactive "sText-to-count:")
  (count-matches text-to-count (point-min) (point-max) t))

Getting a dump of all the user-created functions defined in a repl session in clojure

Is there a way to get a dump of all the source code I have entered into a repl session? I have created a bunch of functions using (defn ...) but did it 'on the fly' without entering them in a text file (IDE) first.
Is there a convenient way to get the source back out of the repl session?
I note that:
(dir user)
will give me a printed list of type:
user.proxy$java.lang.Object
so I can't seem to get that printed list into a Seq for mapping a function like 'source' over. And even if I could then:
(source my-defined-fn)
returns "source not found"...even though I personally entered it in to the repl session.
Any way of doing this? Thanks.

Sorry, but I suspect the answer is no :-/
The best you can do is scroll up in the repl buffer to where you defined it. The source function works by looking in the var's metadata for the file and line number where the function's code is (or was the last time it was evaluated), opening the file, and printing the lines. It looks like this:
...
(when-let [filepath (:file (meta v))]
  (when-let [strm (.getResourceAsStream (RT/baseLoader) filepath)]
    (with-open [rdr (LineNumberReader. (InputStreamReader. strm))]
      (dotimes [_ (dec (:line (meta v)))] (.readLine rdr))
...
Not including the full source in the metadata was done on purpose to save memory in the normal case, though it does make it less convenient here.
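The metadata that source relies on can be inspected directly. As a sketch (interned-var-locations is a made-up name): ns-interns returns a map from symbol to var, so unlike the printed output of dir it can be mapped over.

```clojure
(defn interned-var-locations
  "Returns a seq of maps describing where each var interned in the namespace
  named by `ns-sym` was defined, according to its :file and :line metadata."
  [ns-sym]
  (for [[sym v] (ns-interns ns-sym)]
    {:name sym
     :file (:file (meta v))
     :line (:line (meta v))}))
```

Calling (interned-var-locations 'user) lists everything defined in the session. For vars defined at the repl, :file usually does not name a real file on the classpath, which would explain why source reports "source not found" for them.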

What is the idiomatic way of iterating over a lazy sequence in Clojure?

I have the following functions to process large files with constant memory usage.
(defn lazy-helper
  "Processes a java.io.Reader lazily"
  [reader]
  (lazy-seq
    (if-let [line (.readLine reader)]
      (cons line (lazy-helper reader))
      (do (.close reader) nil))))
(defn lazy-lines
  "Return a lazy sequence with the lines of the file"
  [^String file]
  (lazy-helper (io/reader file)))
This works very well when the processing is filtering, mapping, or reducing, since those operations work well with lazy sequences.
The problem starts when I have to process the file and, for example, send every line over a channel to worker processes.
(thread
  (doseq [line lines]
    (blocking-producer work-chan line)))
The obvious downside is that this processes the file eagerly, which overflows the heap.
I was wondering what the best way is to iterate over each line in a file and do some IO with the lines.
It seems this might be unrelated to how the file IO is handled; doseq should not hold onto the head of the sequence.
As @joost-diepenmaat pointed out, this might not be related to the file IO, and he is right.
It seems the way I am working with JSON serialization and deserialization is the root cause here.

You can use (line-seq rdr), which "returns the lines of text from rdr as a lazy sequence of strings".
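A minimal sketch of how that could look for side-effecting work. The names process-lines! and handle-line! are made up for illustration; handle-line! stands in for something like the blocking-producer call from the question.

```clojure
(require '[clojure.java.io :as io])

(defn process-lines!
  "Calls `handle-line!` on every line of `file` for its side effects and
  returns the number of lines seen. The reader is closed when the body exits."
  [file handle-line!]
  (with-open [rdr (io/reader file)]
    ;; reduce walks the lazy seq one line at a time and keeps no reference
    ;; to lines it has already handled
    (reduce (fn [n line]
              (handle-line! line)
              (inc n))
            0
            (line-seq rdr))))
```

The lines have to be consumed inside with-open: returning the lazy sequence itself would hand the caller a seq backed by a closed reader.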
This turned out to be a problem with the JSON handling of the code and not the file IO. Explanation in the original post.