Read a file into a list of pairs in elisp

I am trying to write an elisp function to read each word in a file into a pair. I want the first item of the pair to be the string sorted lexicographically, and the second item to be untouched.
Given the example file:
cat
cow
dog
I want the list to look like:
(act cat)
(cow cow)
(dgo dog)
My best crack at it is:
(defun get-file (filename)
(with-open-file (stream filename)
(loop for word = (read-line stream nil)
while word
collect ((sort word #'char-lessp) word))))
It compiles correctly in Emacs lisp interaction mode. However, when I try to
run it by executing
(get-file "~/test.txt")
I end up in the Emacs debugger, and it's not telling me anything useful . . .
Debugger entered--Lisp error: (void-function get-file)
(get-file "~/test.txt")
eval((get-file "~/test.txt") nil)
eval-last-sexp-1(t)
eval-last-sexp(t)
eval-print-last-sexp(nil)
call-interactively(eval-print-last-sexp nil nil)
command-execute(eval-print-last-sexp)
I am a lisp beginner, and have no idea what is wrong.
Thanks,
Justin

Vanilla Emacs
First, let's use Emacs's built-in functions only. There's no built-in function to sort strings in Emacs, so you first should convert a string to a list, sort, then convert the sorted list back to a string. This is how you convert a string to a list:
(append "cat" nil) ; => (99 97 116)
A string converted to a list becomes a list of characters, and characters are represented as numbers in Elisp. Then you sort the list and convert it to a string:
(concat (sort (append "cat" nil) '<)) ; => "act"
There's no built-in function to load file contents directly into a variable, but you can load them into a temporary buffer. Then you can return the entire temporary buffer as a string:
(with-temp-buffer
  (insert-file-contents-literally "file.txt")
  (buffer-substring-no-properties (point-min) (point-max)))
This will return the string "cat\ncow\ndog\n", so you'll need to split it:
(split-string "cat\ncow\ndog\n") ; => ("cat" "cow" "dog")
Now you need to traverse this list and convert each item into a pair of sorted item and original item:
(mapcar (lambda (animal)
(list (concat (sort (append animal nil) '<)) animal))
'("cat" "cow" "dog"))
;; returns
;; (("act" "cat")
;; ("cow" "cow")
;; ("dgo" "dog"))
Full code:
(mapcar
(lambda (animal)
(list (concat (sort (append animal nil) '<)) animal))
(split-string
(with-temp-buffer
(insert-file-contents-literally "file.txt")
(buffer-substring-no-properties (point-min) (point-max)))))
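The question asks for a function that takes the file name, so here is a minimal sketch of the same pipeline wrapped in a defun. The name my-read-word-pairs is made up for illustration; everything inside it is the built-in code shown above.

```elisp
;; Sketch only: `my-read-word-pairs' is a hypothetical name, not from the question.
(defun my-read-word-pairs (filename)
  "Return a list of (SORTED-WORD WORD) pairs, one per word in FILENAME."
  (mapcar
   (lambda (word)
     ;; `append' copies WORD into a fresh list of characters, so the
     ;; destructive `sort' never touches the original string.
     (list (concat (sort (append word nil) #'<)) word))
   (split-string
    (with-temp-buffer
      (insert-file-contents-literally filename)
      (buffer-substring-no-properties (point-min) (point-max))))))
```

With the example file from the question, (my-read-word-pairs "~/test.txt") should return (("act" "cat") ("cow" "cow") ("dgo" "dog")). Remember to evaluate the defun itself (for example with C-M-x) before calling it; otherwise Emacs reports void-function.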
Common Lisp Emulation
One of the Emacs built-in packages is cl-lib (the successor of the now-deprecated cl.el), and there's no reason not to use it in your code; just (require 'cl-lib) first. So I lied when I said that there are no built-in functions to sort strings and that the above is the only way to do the task with built-in functions. Let's use cl-lib.
cl-sort a string (or any sequence):
(cl-sort "cat" '<) ; => "act"
cl-mapcar is more versatile than Emacs's built-in mapcar, but here you can use either of them.
There is a problem with cl-sort: it is destructive, meaning it may modify its argument in place. We use the local variable animal twice inside the anonymous function, and we don't want to garble the original animal. Therefore we should pass it a copy of the sequence:
(lambda (animal)
(list (cl-sort (copy-sequence animal) '<) animal))
The resulting code becomes:
(cl-mapcar
(lambda (animal)
(list (cl-sort (copy-sequence animal) '<) animal))
(split-string
(with-temp-buffer
(insert-file-contents-literally "file.txt")
(buffer-substring-no-properties (point-min) (point-max)))))
seq.el
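In Emacs 25 a new sequence manipulation library was added, seq.el. Its alternative to mapcar is seq-map, and its alternative to CL's cl-sort is seq-sort. Note that seq-sort takes the predicate first and returns a sorted copy, so no copy-sequence is needed. The full code becomes:
(seq-map
(lambda (animal)
(list (seq-sort '< animal) animal))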
(split-string
(with-temp-buffer
(insert-file-contents-literally "file.txt")
(buffer-substring-no-properties (point-min) (point-max)))))
dash, s, f
For working with sequences and files, a popular choice is to reach for these three third-party libraries:
dash for list manipulation
s for string manipulation
f for file manipulation.
Their GitHub pages explain how to install them (installation is very simple). However, for this particular problem they are a bit suboptimal. For example, -sort from dash only sorts lists, so we would have to go back to our string->list->string conversion:
(concat (-sort '< (append "cat" nil))) ; => "act"
s-lines from s leaves empty strings in its result. On GNU/Linux, text files usually end with a newline, so splitting your file would look like:
(s-lines "cat\ncow\ndog\n") ; => ("cat" "cow" "dog" "")
s-split supports an optional argument to omit empty strings, but its separator argument is a regex (note that you need both \n and \r for portability):
(s-split "[\n\r]" "cat\ncow\ndog\n" t) ; => ("cat" "cow" "dog")
Yet there are 2 functions which can simplify our code. -map is similar to mapcar:
(-map
(lambda (animal)
(list (cl-sort (copy-sequence animal) '<) animal))
'("cat" "cow" "dog"))
;; return
;; (("act" "cat")
;; ("cow" "cow")
;; ("dgo" "dog"))
However, dash also has anaphoric versions of the functions that accept a function as an argument, such as -map. Anaphoric versions start with two dashes and allow a shorter syntax by exposing the current element as the local variable it. E.g. the following are equivalent:
(-map (lambda (x) (+ x 1)) '(1 2 3)) ; => (2 3 4)
(--map (+ it 1) '(1 2 3)) ; => (2 3 4)
Another improvement is f-read-text from f, which simply returns contents of a file as a string:
(f-read-text "file.txt") ; => "cat\ncow\ndog\n"
Combine best of all worlds
(--map (list (cl-sort (copy-sequence it) '<) it)
(split-string (f-read-text "file.txt")))

In my Emacs, either C-j or C-x C-e evaluates the form as you said. When I try to do the same with (get-file "test"), the debugger complains about with-open-file being undefined. I cannot find with-open-file in the cl-lib (or cl) Emacs packages.
Did you require some other package? Also, I think the idiomatic way of opening a file in Emacs is to temporarily visit it in a buffer.
Anyway, if the code was Common Lisp it would be ok except for collect ((sort ...) word), where you are not building a list but using (sort ...) in a function position. I'd use (list (sort ...) word) instead.

Related

how to get a list out of list of lists in common lisp

I am new to Common Lisp and trying to get a list out of a split string.
For example:
["4-No 16dia","6-No 20dia"]
Now I want to collect only the third element like ["16","20"]
I have got the splitting part correctly using:
(defun my-split (string &key (delimiterp #'delimiterp)
)
(loop :for beg = (position-if-not delimiterp string)
:then (position-if-not delimiterp string :start (1+ end)
)
:for end = (and beg (position-if delimiterp string :start beg))
:when beg :collect (subseq string beg end)
:while end))
where :
(defun delimiterp (c) (position c " ,-:"))
but collecting only the third element into a list is the tricky part. I have tried:
(defparameter *list1*
(loop for i in (cdr list)
(append (parse-integer
(nth 0
(my-split (nth 3 i)
:delimiterp #'delimiterp))))))
P.S.: there are two lists because the example string is itself part of a list of lists.
Please help me, thanks in advance

I would use a regular expression, and I think I would do this largely irrespective of the language that I was using. Of course, some languages don't have regular expressions, but when they do it saves reinventing the wheel.
In Common Lisp, the regular expressions library is called Common Lisp - Practical Perl Compatible Regular Expressions, cl-ppcre. We load this with (ql:quickload "cl-ppcre").
Then the numbers can be returned using (ppcre:scan-to-strings "^(\\d*)-No (\\d*)dia$" x). The regular expression uses \d to match a digit, which in Lisp strings is written \\d. The asterisk matches zero or more of them. The parentheses in the regular expression mark the parts that we want returned: the numbers.
Doing this for a list of strings is then just a matter of using mapcar.
(defparameter text-match "")
(defparameter text-numbers "")
(defparameter test-text '("4-No 16dia" "6-No 23dia"))
(defun extract-numbers (text)
(setf (values text-match text-numbers)
(ppcre:scan-to-strings "^(\\d*)-No (\\d*)dia$" text))
text-numbers)
(defun extract-numbers-from-list (lst)
(mapcar #'extract-numbers lst))
(extract-numbers-from-list test-text) ; => (#("4" "16") #("6" "23"))
Edit: lexical bindings
When I was writing the above, I was trying to get the regular expression right AND trying to get the lexical bindings right at the same time. Having only limited time I put the effort into getting the regular expressions right, and used dynamic variables and setf. OK, it got the job done, but we can do better.
The classical lexical binding system is let, syntax (let ( (var1 val1) (var2 val2) ...) body). We can try (let ((x 0))), which is valid Lisp code, but which doesn't do much. As soon as the lexical scope ends, the variable x is unbound. Attempting to access x causes an error.
We can return multiple values from many functions, such as floor or scan-to-strings. We now have to bind these values to variables, using (multiple-value-bind (variable-list) values). Most websites don't really do a good job of explaining this. Having bound the variables, I was getting errors about unbound variables. OK, it's worth just saying -
multiple-value-bind binds variables lexically, just like let.
The full syntax is (multiple-value-bind (variable-list) values body) and your code goes into the body section, just like let. Hence the above code becomes:
(defparameter test-text '("4-No 16dia" "6-No 23dia"))
(defun extract-numbers (text)
(multiple-value-bind (text-match text-numbers)
(ppcre:scan-to-strings "^(\\d*)-No (\\d*)dia$" text)
text-numbers))
(defun extract-numbers-from-list (lst)
(mapcar #'extract-numbers lst))
(extract-numbers-from-list test-text) ; => (#("4" "16") #("6" "23"))

Just to add, cl-ppcre also has register-groups-bind to do regex matching, binding, and converting in a single form:
CL-USER> (cl-ppcre:register-groups-bind ((#'parse-integer no dia))
("(\\d+)-No (\\d+)dia" "4-No 16dia")
(values no dia))
4
16

Without cl-ppcre (this relies only on ql-util, which ships with Quicklisp), one could use:
(defun extract-nums (s)
(mapcar #'(lambda (x) (parse-integer x :junk-allowed t))
(ql-util:split-spaces s)))
And try it with:
(defparameter *s* (list "4-No 16dia" "6-No 20dia"))
(mapcar #'extract-nums *s*)
;; => ((4 16) (6 20))
parse-integer with :junk-allowed t helps a lot with extracting integers from the string.
But yes, in real life I would also just use cl-ppcre, mainly the functions cl-ppcre:split and cl-ppcre:scan-to-strings, e.g.:
(ql:quickload :cl-ppcre)
(defun extract-nums (s)
  ;; The second return value of scan-to-strings is the vector of captured groups.
  (map 'list #'parse-integer
       (nth-value 1 (cl-ppcre:scan-to-strings "(\\d+)-No (\\d+)dia" s))))
And from then on it is just
(mapcar #'second (mapcar #'extract-nums *s*))
;; => (16 20)

Beginner in clojure: Tokenizing lists of different characters

So I know this isn't the best method of solving this issue, but I'm trying to go through a list of lines from an input file, which end up being expressions. I've got a list of expressions, and each expression has its own list thanks to the split-the-list function. My next step is to replace characters with id, ints with int, and + or - with addop. I've got the regexes to find whether or not my symbols match any of those, but when I try to replace them, only the last for loop I call leaves any lasting results. I know it comes down to the way functional programming works, but I can't wrap my head around the trace of this program, and how to replace each separate type of input and keep the results all in one list.
(def reint #"\d++")
(def reid #"[a-zA-Z]+")
(def readdop #"\+|\-")
(def lines (into () (into () (clojure.string/split-lines (slurp "input.txt")) )))
(defn split-the-line [line] (clojure.string/split line #" " ))
(defn split-the-list [] (for [x (into [] lines)] (split-the-line x)))
(defn tokenize-the-line [line]
(for [x line] (clojure.string/replace x reid "id"))
(for [x line] (clojure.string/replace x reint "int"))
(for [x line] (clojure.string/replace x readdop "addop")))
(defn tokenize-the-list [] (for [x (into [] (split-the-list) )] (tokenize-the-line x)))
And as you can probably tell, I'm pretty new to functional programming, so any advice is welcome!

You're using a do block, which evaluates several expressions (normally for side effects) and then returns the value of the last one. You can't see it because fn (and hence defn) implicitly wraps its body in one. As such, the lines
(for [x line] (clojure.string/replace x reid "id"))
(for [x line] (clojure.string/replace x reint "int"))
are evaluated (into two different lazy sequences) and then thrown away.
In order for them to affect the return value, you have to capture their return values and use them in the next round of replacements.
In this case, I think the most natural way to compose your replacements is the threading macro ->:
(for [x line]
(-> x
(clojure.string/replace reid "id")
(clojure.string/replace reint "int")
(clojure.string/replace readdop "addop")))
This creates code which does the reid replace with x as the first argument, then does the reint replace with the result of that as the first argument and so on.
Alternatively, you could do this by using comp to compose anonymous functions like (fn [s] (clojure.string/replace s reid "id")) (a partial application of replace). In the imperative world we get pretty used to running several procedures that "bash the data in place"; in the functional world you more often combine several functions together to do all the operations and then run the result.

emacs org mode, search only headers

In emacs, I want to be able to search only the 'headers' in an org mode file.
Idea 1: Search only Visible
I could achieve this by hiding everything, then showing only the outline (S-TAB, S-TAB) and then maybe search all that is visible.(in this case it would be the whole table of content).
But how do I search only visible content? C-s searches everything.
Idea 2: use regex
I can potentially do:
C-c / / //opens regex search
\*.*heading //start with * (escaped), followed by any chars, then heading.
But at the moment it's cumbersome to type all of that. Considering I've started learning emacs like 3 hours ago, can I automate this somehow?
E.g, can I write a function to search with "*.*ARGUMENT" and tie it a hotkey? but still have the ability to go like 'next find, next find' etc..?
The use case for this is searching my notes. Some are like ~7000+ lines long and I commonly only search the headers.
[EDIT Solution 1]
@abo-abo's answer worked well for me. I now use helm-org-in-buffer-headings.
I.e, I installed Melpa:
https://github.com/milkypostman/melpa#usage
Then I installed helm from the package list:
M-x package-list-packages
Then I edited my .emacs and bound a hotkey to it:
(global-set-key (kbd "C-=") 'helm-org-in-buffer-headings) ;Outline search.
I reloaded emacs and now when pressing Ctrl+= a searchable outline pops up that automatically narrows down as I type in additional characters. The usual C-n, C-p , buttons work for navigation.
Thanks!
[Edit Solution 2]
Curiosity got the best of me. After enjoying helm's heading search, I messed around with worf also. It is like helm (it uses helm) but looks nicer and I can select a 'level' of outline by pressing the number key. I hacked out just the bits necessary for heading search, if of use:
;; ——— WORF Utilities ———————————————————————————————————————————————————————————————
;; https://github.com/abo-abo/worf/blob/master/worf.el
(defun worf--pretty-heading (str lvl)
"Prettify heading STR or level LVL."
(setq str (or str ""))
(setq str (propertize str 'face (nth (1- lvl) org-level-faces)))
(let (desc)
(while (and (string-match org-bracket-link-regexp str)
(stringp (setq desc (match-string 3 str))))
(setq str (replace-match
(propertize desc 'face 'org-link)
nil nil str)))
str))
(defun worf--pattern-transformer (x)
"Transform X to make 1-9 select the heading level in `worf-goto'."
(if (string-match "^[1-9]" x)
(setq x (format "^%s" x))
x))
(defun worf-goto ()
"Jump to a heading with `helm'."
(interactive)
(require 'helm-match-plugin)
(let ((candidates
(org-map-entries
(lambda ()
(let ((comp (org-heading-components))
(h (org-get-heading)))
(cons (format "%d%s%s" (car comp)
(make-string (1+ (* 2 (1- (car comp)))) ?\ )
(if (get-text-property 0 'fontified h)
h
(worf--pretty-heading (nth 4 comp) (car comp))))
(point))))))
helm-update-blacklist-regexps
helm-candidate-number-limit)
(helm :sources
`((name . "Headings")
(candidates . ,candidates)
(action . (lambda (x) (goto-char x)
(call-interactively 'show-branches)
(worf-more)))
(pattern-transformer . worf--pattern-transformer)))))
And then tied it to a hot key:
(global-set-key (kbd "<f3>") 'worf-goto)

worf-goto from worf can do this,
so can helm-org-in-buffer-headings from helm.
worf-goto actually uses helm as a back end. Compared with helm-org-in-buffer-headings, you additionally get:
headings are colored in the same way as in the original buffer
you can select all headings with the same level using the appropriate digit

If you have ivy installed, you can use counsel-org-goto to search headings in the current buffer or counsel-org-goto-all to search the headings in all open org-mode buffers.
It's a good option if you don't want to install the other things that come with worf.

If you don't want to rely on external packages, org, in fact, already offers this capability: the function is org-goto.
If you want it to behave in a way similar to helm-org-in-buffer-headings, you have to set org-goto-interface to outline-path-completion, for instance by adding to your init file:
(setq org-goto-interface (quote outline-path-completion))
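Another built-in route is close to "Idea 2" from the question: let a small command build the heading regexp and hand it to org-occur (the command behind C-c / /). This is only a sketch; the my/ function names and the C-c h key are assumptions chosen for illustration, not anything Org defines.

```elisp
(require 'org)

;; Hypothetical helper names; only `org-occur' and `regexp-quote' are Org/Emacs APIs.
(defun my/org-heading-regexp (term)
  "Return a regexp matching an Org heading line that contains TERM literally."
  (concat "^\\*+ .*" (regexp-quote term)))

(defun my/org-search-headings (term)
  "Show a sparse tree of the headings in this Org buffer that contain TERM."
  (interactive "sSearch headings for: ")
  (org-occur (my/org-heading-regexp term)))

;; The key choice is arbitrary; pick whatever is free in your setup.
(define-key org-mode-map (kbd "C-c h") #'my/org-search-headings)
```

After running it, M-g n and M-g p (next-error and previous-error) should step through the matches in the sparse tree, which covers the "next find, next find" part of the question.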

Scheme filters - "wrong value to apply: #f"

I'm trying to filter out a list based off of a predicate I wrote myself, but when I run the filter, I get
ERROR: Wrong value to apply: #f
The code of the predicate:
;;;Predicate for checking if a string is not empty or full of whitespaces
(define (notwhitespace? str)
(if (equal? str "") #F (
(call-with-current-continuation
(lambda (return)
(for-each
(lambda (c)
(if (not (char-whitespace? c)) #T #F))
(string->list str))
#F))
)
)
)
this is my implementation of the filter (it is in a let statement):
(updated-strlist(filter notwhitespace? strlist))
any ideas? thanks!

So (call-with-current-continuation ...) in your code is wrapped in extra parentheses, which means that Scheme should take the result and call it as a procedure the moment it gets it.
Usually in a Lisp evaluator, apply is the procedure that runs procedures, e.g.:
(define (test) (display "hello"))
(define (get-proc) test)
((get-proc)) ; ==> undefined, displays "hello"
Your code, however, tries to do this with (#f), and since #f is not a procedure, apply cannot run it as if it were one.
A comment on the rest: if you are not using return, you really shouldn't use call-with-current-continuation at all, and for-each does something entirely different from what you think. Even once you've fixed your problems, notwhitespace? will always evaluate to #f, because the last expression in the body of the continuation lambda is #f (the returned value).
I guess you are looking for something like:
;; needs to import (srfi :1)
(define (notwhitespace? str)
  (every (lambda (x) (not (char-whitespace? x)))
         (string->list str)))
;; needs to import (srfi :13)
(define (notwhitespace2? str)
(not (string-index str char-whitespace?)))

Don't write (#f), it should be #f.

In Clojure is an empty list a sequence of infinite nulls?

I am learning the concept of sequence and nil in Clojure. This was the result of a small experimentation.
1:6 user=> (first '())
nil
1:7 user=> (rest '())
()
1:8 user=> (first (rest '()))
nil
Does this mean that '() is actually a sequence of nils?

If you want to test whether the "rest" of a collection is empty, use next.
user> (next '(foo bar))
(bar)
user> (next '())
nil
user> (doc next)
-------------------------
clojure.core/next
([coll])
Returns a seq of the items after the first. Calls seq on its
argument. If there are no more items, returns nil.
"nil-punning" (treating an empty collection/seq and nil as the same thing) was removed last year in favor of fully-lazy sequences. See here for a discussion leading up to this change.

first and rest are functions that apply to a logical structure (a seq) and not on the linked cons structure of a list (as in other lisps).
Clojure defines many algorithms in terms of sequences (seqs). A seq is a logical list, and unlike most Lisps where the list is represented by a concrete, 2-slot structure, Clojure uses the ISeq interface to allow many data structures to provide access to their elements as sequences.
http://clojure.org/sequences
The behavior is a result of the definition of the function and not determined by the primitive structure of the data.

No - an empty list is not the same as an infinite sequence of nils
This is relatively easy to show. Suppose we have:
(def infinite-nils (repeat nil)) ; an infinite lazy sequence of nils
(def empty-list '()) ; an empty list
They have different numbers of elements:
(count infinite-nils) => doesn't terminate
(count empty-list) => 0
Taking from them:
(take 10 infinite-nils) => (nil nil nil nil nil nil nil nil nil nil)
(take 10 empty-list) => ()
If you call seq on them you get
(seq infinite-nils) => sequence of infinite nils
(seq empty-list) => nil
The confusion in the original can largely be resolved by understanding the following facts:
'() is a collection (a persistent list), not a sequence. However, it is sequential, so you can call seq on it to convert it into a sequence.
nil is the empty sequence - so therefore (seq '()) returns nil, as does (seq (rest '()))
first returns nil on an empty sequence - hence why (first (rest '())) is nil.

Also learning Clojure.
For empty sequences, rest returns a sequence for which seq returns nil.
That's why you get that behavior.
I assume this is to simplify recursing on sequences until they are empty, and probably other smartypants reasons...

Technically yes though not in a useful way.
"Does the sequence that is created by calling (seq '()) have an infinite number of nulls?"
The answer is yes, because the (rest) of an empty sequence is still an empty sequence, which itself can have a (rest).
This output is misleading by the way:
1:7 user=> (rest '())
()
The first '() in this is the empty list.
The second () in this is the empty sequence.
Sequences are printed the same as lists in the REPL even though they are not the same.
