How can I capture the results of splitting a string in elisp? - regex

I am working in elisp and I have a string that represents a list of items. The string looks like
"apple orange 'tasty things' 'my lunch' zucchini 'my dinner'"
and I'm trying to split it into
("apple" "orange" "tasty things" "my lunch" "zucchini" "my dinner")
This is a familiar problem. My obstacles to solving it are less about the regex, and more about the specifics of elisp.
What I want to do is run a loop like:
(while (> (length my-string) 0) do-work)
where do-work is:
applying the regex \('[^']*?'\|[[:alnum:]]+\)\([[:space:]]*\(.+\)\) to my-string
appending \1 to my results list
re-binding my-string to \2
However, I can't figure out how to get split-string or replace-regexp-in-string to do that.
How can I split this string into values I can use?
(alternatively: "which built-in emacs function that does this have I not yet found?")

Something similar, but w/o regexp:
(defun parse-quotes (string)
  (let ((i 0) result current quotep escapedp word)
    (while (< i (length string))
      (setq current (aref string i))
      (cond
       ;; an unquoted space ends the current word
       ((and (char-equal current ?\s)
             (not quotep))
        (when word (push word result))
        (setq word nil escapedp nil))
       ;; an unescaped quote outside a quoted run opens one
       ((and (char-equal current ?\')
             (not escapedp)
             (not quotep))
        (setq quotep t escapedp nil))
       ;; an unescaped quote inside a quoted run closes it
       ((and (char-equal current ?\')
             (not escapedp))
        (push word result)
        (setq quotep nil word nil escapedp nil))
       ;; a backslash escapes the next character; a doubled one is literal
       ((char-equal current ?\\)
        (when escapedp (push current word))
        (setq escapedp (not escapedp)))
       (t (setq escapedp nil)
          (push current word)))
      (setq i (1+ i)))
    (when quotep
      (error "Unbalanced quotes at %d"
             (- (length string) (length word))))
    ;; flush a trailing unquoted word
    (when word (push word result))
    ;; each word was accumulated as a reversed list of characters
    (mapcar (lambda (x) (apply #'string (reverse x)))
            (reverse result))))
(parse-quotes "apple orange 'tasty things' 'my lunch' zucchini 'my dinner'")
("apple" "orange" "tasty things" "my lunch" "zucchini" "my dinner")
(parse-quotes "apple orange 'tasty thing\\'s' 'my lunch' zucchini 'my dinner'")
("apple" "orange" "tasty thing's" "my lunch" "zucchini" "my dinner")
(parse-quotes "apple orange 'tasty things' 'my lunch zucchini 'my dinner'")
;; Debugger entered--Lisp error: (error "Unbalanced quotes at 52")
Bonus: it also allows escaping the quotes with "\" and will report it if the quotes aren't balanced (reached the end of the string, but didn't find the match for the opened quote).

Here is a straightforward way to implement your algorithm using a temporary buffer. I don't know if there would be a way to do this using replace-regexp-in-string or split-string.
(defun my-split (string)
(with-temp-buffer
(insert string " ") ;; insert the string in a temporary buffer
(goto-char (point-min)) ;; go back to the beginning of the buffer
(let ((result nil))
;; search for the regexp (and just return nil if nothing is found)
(while (re-search-forward "\\('[^']*?'\\|[[:alnum:]]+\\)\\([[:space:]]*\\(.+\\)\\)" nil t)
;; (match-string 1) is "\1"
;; append it after the current list
(setq result (append result (list (match-string 1))))
;; go back to the beginning of the second part
(goto-char (match-beginning 2)))
result)))
Example:
(my-split "apple orange 'tasty things' 'my lunch' zucchini 'my dinner'")
==> ("apple" "orange" "'tasty things'" "'my lunch'" "zucchini" "'my dinner'")

You might like to take a look at split-string-and-unquote.
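As a rough sketch of how that could look: split-string-and-unquote understands double-quoted substrings, not single-quoted ones, so the example below swaps the quote characters first. The name my-split-single-quoted is made up for illustration, and the sketch assumes the input contains no double quotes or backslashes of its own.

```elisp
(defun my-split-single-quoted (string)
  "Split STRING on whitespace, keeping single-quoted phrases together.
Hypothetical helper; assumes STRING has no double quotes or backslashes."
  (split-string-and-unquote
   ;; rewrite each ' as a double quote so the built-in can read the phrases
   (replace-regexp-in-string "'" "\"" string t t)))

(my-split-single-quoted
 "apple orange 'tasty things' 'my lunch' zucchini 'my dinner'")
;; => ("apple" "orange" "tasty things" "my lunch" "zucchini" "my dinner")
```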

If you manipulate strings often, you should install the s.el library via the package manager; it provides a large set of string utility functions under a consistent API. For this task you need the function s-match, whose optional third argument is the starting position. Then you need a correct regexp; try:
(concat "\\b[a-z]+\\b" "\\|" "'[a-z ]+'")
\| means alternation: match either a sequence of letters that constitutes a word (\b is a word boundary) or a sequence of letters and spaces inside quotes. Then use a loop:
;; let s = given string, r = regex
(loop for start = 0 then (+ start (length match))
for match = (car (s-match r s start))
while match
collect match)
For educational purposes, I also implemented the same functionality with a recursive function:
;; labels is Common Lisp's local function definition macro
(labels
((i
(start result)
;; s-match searches from start
(let ((match (car (s-match r s start))))
(if match
;; recursive call
(i (+ start (length match))
(cons match result))
;; push/nreverse idiom
(nreverse result)))))
;; recursive helper function
(i 0 '()))
As Emacs lacks tail call optimization, running it over a large input can overflow the stack. You can therefore rewrite it with the do* macro:
(do* ((start 0)
(match (car (s-match r s start)) (car (s-match r s start)))
(result '()))
((not match) (reverse result))
(push match result)
(incf start (length match)))
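For comparison, here is a minimal sketch of the same loop using only built-ins, with no s.el or cl macros. The function name my-split-quoted and the regexp are illustrative assumptions, not part of the answers above. It advances with match-end, which stays correct when whitespace is skipped between matches, and it strips the quotes via capture groups. An unmatched lone quote is silently skipped rather than reported.

```elisp
(defun my-split-quoted (string)
  "Split STRING into bare words and single-quoted phrases.
Hypothetical helper; the quotes themselves are dropped."
  (let ((start 0)
        (result '()))
    ;; group 1: text inside '...'; group 2: a run of non-space, non-quote chars
    (while (string-match "'\\([^']*\\)'\\|\\([^'[:space:]]+\\)" string start)
      (push (or (match-string 1 string) (match-string 2 string)) result)
      ;; continue from where this match ended, not from start + its length
      (setq start (match-end 0)))
    (nreverse result)))

(my-split-quoted "apple orange 'tasty things' 'my lunch' zucchini 'my dinner'")
;; => ("apple" "orange" "tasty things" "my lunch" "zucchini" "my dinner")
```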

Related

elisp function to change string case from camel to upcase snake case

I spent 3 hours trying to figure out how to change a string at point to a different case, e.g. isFailedUpgrade to IS_FAILED_UPGRADE.
I got to the point where I can get the string at point into a variable text, but I have no idea how to convert text to the desired case.
(defun change-case ()
(interactive)
(let* ((bounds (if (use-region-p)
(cons (region-beginning) (region-end))
(bounds-of-thing-at-point 'symbol)))
(text (buffer-substring-no-properties (car bounds) (cdr bounds))))
(when bounds
(delete-region (car bounds) (cdr bounds))
(insert (change-case-helper text)))))
;; the following code is rubbish
(defun change-case-helper (text)
(let ((output ""))
(dotimes (i (length text))
(concat output (char-to-string (aref text i))))
output))
Since I am trying to learn to write Emacs functions, I would prefer to write this function myself rather than use an existing magic function.
OK, after another 2 hours, I think I've figured it out:
(defun change-case ()
(interactive)
(let* ((bounds (if (use-region-p)
(cons (region-beginning) (region-end))
(bounds-of-thing-at-point 'symbol)))
(text (buffer-substring-no-properties (car bounds) (cdr bounds))))
(when bounds
(delete-region (car bounds) (cdr bounds))
(insert (change-case-helper text)))))
(defun change-case-helper (text)
(when (and text (> (length text) 0))
(let ((first-char (string-to-char (substring text 0 1)))
(rest-str (substring text 1)))
(concat (if (upcasep first-char) (string ?_ first-char) (string (upcase first-char)))
(change-case-helper rest-str))))
)
(defun upcasep (c) (and (= ?w (char-syntax c)) (= c (upcase c))))
This still feels pretty awkward; please comment and let me know if there is a better way of writing this function.
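One possible alternative to the recursive helper is a single regexp replacement followed by upcase. This is only a sketch: the name my-camel-to-upcase-snake is made up here, and it assumes plain ASCII identifiers. case-fold-search is bound to nil so that [a-z] and [A-Z] really distinguish case.

```elisp
(defun my-camel-to-upcase-snake (text)
  "Convert camelCase TEXT to UPPER_SNAKE_CASE.
Hypothetical helper; assumes ASCII identifiers such as isFailedUpgrade."
  (let ((case-fold-search nil))        ; make [a-z] and [A-Z] case-sensitive
    (upcase
     ;; put _ between a lowercase letter or digit and a following capital
     (replace-regexp-in-string "\\([a-z0-9]\\)\\([A-Z]\\)" "\\1_\\2" text t))))

(my-camel-to-upcase-snake "isFailedUpgrade")
;; => "IS_FAILED_UPGRADE"
```

The result can be passed to insert in change-case in place of the call to change-case-helper.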

How to use regexp in Elisp to match ',' in the line but not inside quotation mark

How could I write a regexp to match , in the line but not inside ""?
For example:
`uvm_info("body", $sformatf("Value: a = %d, b = %d, c = %d", a, b, c), UVM_MEDIUM)
Hope to match those with ^ under it:
`uvm_info("body", $sformatf("Value: a = %d, b = %d, c = %d", a, b, c), UVM_MEDIUM)
^ ^ ^ ^ ^
The following function doesn't use a regular expression but rather parses a region of the buffer as sexps and returns a list of buffer positions of all commas excluding those within strings, or nil if there are no such commas.
(defun find-commas (start end)
(save-excursion
(goto-char start)
(let (matches)
(while (< (point) end)
(cond ((= (char-after) ?,)
(push (point) matches)
(forward-char))
((looking-at "[]\\[{}()]")
(forward-char))
(t
(forward-sexp))))
(nreverse matches))))
It works for the example you show, but might need tweaking for other examples or languages. If your example is in a buffer by itself, calling
(find-commas (point-min) (point-max))
returns
(17 60 63 66 70)
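Try this:
"[^"]+"|(,)
The commas you want end up in capture group 1; quoted strings are consumed by the first alternative. (This is PCRE-style syntax; in an Emacs regexp the alternation is \| and the group is \(,\).)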
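A sketch of that idea in Emacs regexp syntax, applied to a string rather than a buffer. The function name my-comma-indices is hypothetical, and the sketch assumes the quoted strings contain no escaped double quotes. Each match is either a whole quoted string, which is skipped, or a comma captured in group 1.

```elisp
(defun my-comma-indices (string)
  "Return the 0-based indices of commas in STRING outside double quotes.
Hypothetical helper; does not handle escaped quotes inside strings."
  (let ((start 0)
        (indices '()))
    (while (string-match "\"[^\"]*\"\\|\\(,\\)" string start)
      ;; group 1 is only set when the comma alternative matched
      (when (match-beginning 1)
        (push (match-beginning 1) indices))
      (setq start (match-end 0)))
    (nreverse indices)))

(my-comma-indices
 "`uvm_info(\"body\", $sformatf(\"Value: a = %d, b = %d, c = %d\", a, b, c), UVM_MEDIUM)")
;; => (16 59 62 65 69)
```

These are string indices starting at 0, so each one is one less than the corresponding buffer position reported by find-commas.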
You can use the fact that font-lock first fontifies comments and strings, then applies your font-lock keywords.
The standard solution is to replace your regexp with a function that searches for the regexp and skips any occurrences in comments and strings.
The following is from my package lisp-extra-font-lock (a package that highlights variables bound by let, quoted expressions, etc.). It searches for quotes and backquotes, but the principle is the same:
(defun lisp-extra-font-lock-is-in-comment-or-string ()
"Return non-nil if point is in comment or string.
This assumes that Font Lock is active and has fontified comments
and strings."
(let ((props (text-properties-at (point)))
(faces '()))
(while props
(let ((pr (pop props))
(value (pop props)))
(if (eq pr 'face)
(setq faces value))))
(unless (listp faces)
(setq faces (list faces)))
(or (memq 'font-lock-comment-face faces)
(memq 'font-lock-string-face faces)
(memq 'font-lock-doc-face faces))))
(defun lisp-extra-font-lock-match-quote-and-backquote (limit)
"Search for quote and backquote in code.
Set match data 1 if character matched is backquote."
(let (res)
(while (progn (setq res (re-search-forward "\\(?:\\(`\\)\\|'\\)" limit t))
(and res
(or
(lisp-extra-font-lock-is-in-comment-or-string)
;; Don't match ?' and ?`.
(eq (char-before (match-beginning 0)) ??)))))
res))
The font-lock keyword is as follows:
(;; Quote and backquote.
;;
;; Matcher: Set match-data 1 if backquote.
lisp-extra-font-lock-match-quote-and-backquote
(1 lisp-extra-font-lock-backquote-face nil t)
;; ...)
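If font-lock is not active, a similar check can be made with the syntax parser instead of faces. The following is a sketch under assumptions: the function name my-commas-outside-strings is made up, and it relies on the buffer's syntax table treating " as a string delimiter, which the standard syntax table does. syntax-ppss returns the parse state at a position, and element 8 of that state is non-nil inside a string or comment.

```elisp
(defun my-commas-outside-strings (text)
  "Return buffer positions of commas in TEXT outside strings and comments.
Hypothetical helper; TEXT is parsed in a temporary buffer."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let ((positions '()))
      (while (search-forward "," nil t)
        (let ((pos (match-beginning 0)))
          ;; syntax-ppss moves point when given a position, hence save-excursion
          (unless (nth 8 (save-excursion (syntax-ppss pos)))
            (push pos positions))))
      (nreverse positions))))

(my-commas-outside-strings
 "`uvm_info(\"body\", $sformatf(\"Value: a = %d, b = %d, c = %d\", a, b, c), UVM_MEDIUM)")
;; => (17 60 63 66 70)
```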

how to do unit test on functions that take active region as input?

For example, here are two versions of function to count the instances of "a" in a region or in a string:
(defun foo (beg end)
  (interactive "r")
  (let ((count 0))
    (save-excursion
      (goto-char beg) ; start counting at the beginning of the region
      (while (/= (point) end)
        (if (equal (char-after) ?a)
            (setq count (1+ count)))
        (forward-char)))
    count))
(defun foo1 (str)
(let ((count 0))
(mapcar #'(lambda (x) (if (equal x ?a) (setq count (1+ count))))
str)
count))
This is the test to check the function foo1:
(require 'ert)
(ert-deftest foo-test ()
(should (equal (foo1 "aba") 2)))
but how can I test the function foo that takes a region as input, using ert framework of unit testing?
I would second @sds's suggestion, and additionally suggest something like:
(with-temp-buffer
(insert <text>)
(set-mark <markpos>)
(goto-char <otherend>)
(should (equal (call-interactively 'foo) <n>)))
I would do something like this:
(with-temp-buffer
(insert ....) ; prepare the buffer
(should (equal ... (foo (point-min) (point-max)))))
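Putting both suggestions together, a complete test might look like the sketch below. The names my-count-a and my-count-a-region-test are hypothetical stand-ins for foo and its test. The counting function moves to the start of the region before it scans, so the result does not depend on where point happens to be.

```elisp
(require 'ert)

(defun my-count-a (beg end)
  "Count occurrences of the character a between BEG and END."
  (interactive "r")
  (let ((count 0))
    (save-excursion
      (goto-char beg)
      (while (< (point) end)
        (when (eq (char-after) ?a)
          (setq count (1+ count)))
        (forward-char)))
    count))

(ert-deftest my-count-a-region-test ()
  (with-temp-buffer
    (insert "abracadabra")
    ;; direct call with explicit bounds
    (should (= (my-count-a (point-min) (point-max)) 5))
    ;; call through the interactive spec "r": the region covers "abra"
    (set-mark 1)
    (goto-char 5)
    (should (= (call-interactively #'my-count-a) 2))))
```

Run it with M-x ert, or in batch mode with ert-run-tests-batch-and-exit.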

Regexp Emacs for R comments

I would like to build a regexp in Emacs for cleaning up my R code.
One of the problems I ran into was that there are different types of comments:
You have those with a certain amount of whitespace (1), e.g.:
# This is a comment:
# This is also a comment
or you have situations like this (2):
require(lattice) # executable while the comment is informative
The idea is that I want to align the comments when they are of the second kind (after something that's executable), while excluding those of the first kind.
Ideally, it will align all the comments BETWEEN those of the first kind, but not those of the first kind.
Example:
funfun <- function(a, b) {
# This is a function
if (a == b) { # if a equals b
c <- 1 # c is 1
}
}
#
To:
funfun <- function(a, b) {
# This is a function
if (a == b) { # if a equals b
c <- 1 # c is 1
}
}
#
I found a regexp to do a replacement for those of the first kind, so then I was able to align them per paragraph (mark-paragraph). That worked kind of well.
Problem is then the backsubstitution:
(replace-regexp "^\\s-+#+" "bla" nil (point-min) (point-max))
This replaces from the start of a line, with any amount of whitespace and any amount of comment characters like:
#########
into
bla
The problem is that I would like to replace them back into what they are originally, so "bla" has to go back into the same amount of whitespace and same amount of #.
Hopefully someone understands what I am trying to do and has either a better idea for an approach or knows how to solve this regexp part.
Well, here's some crazy attempt at doing something I thought you were after. It seems to work, but it needs a lot of testing and polishing:
(defun has-face-at-point (face &optional position)
(unless position (setq position (point)))
(unless (consp face) (setq face (list face)))
(let ((props (text-properties-at position)))
(loop for (key value) on props by #'cddr
do (when (and (eql key 'face) (member value face))
(return t)))))
(defun face-start (face)
(save-excursion
(while (and (has-face-at-point face) (not (bolp)))
(backward-char))
(- (point) (save-excursion (move-beginning-of-line 1)) (if (bolp) 0 -1))))
(defun beautify-side-comments ()
(interactive)
;; Because this function does a lot of insertion, it would
;; be better to execute it in the temporary buffer, while
;; copying the original text of the file into it, such as
;; to prevent junk in the formatted buffer's history
(let ((pos (cons (save-excursion
(beginning-of-line)
(count-lines (point-min) (point)))
(- (save-excursion (end-of-line) (point)) (point))))
(content (buffer-string))
(comments '(font-lock-comment-face font-lock-comment-delimiter-face)))
(with-temp-buffer
(insert content)
(goto-char (point-min))
;; thingatpt breaks if there are overlays with their own faces
(let* ((commentp (has-face-at-point comments))
(margin
(if commentp (face-start comments) 0))
assumed-margin pre-comment commented-lines)
(while (not (eobp))
(move-end-of-line 1)
(cond
((and (has-face-at-point comments)
commentp) ; this is a comment continued from
; the previous line
(setq assumed-margin (face-start comments)
pre-comment
(buffer-substring-no-properties
(save-excursion (move-beginning-of-line 1))
(save-excursion (beginning-of-line)
(forward-char assumed-margin) (point))))
(if (every
(lambda (c) (or (char-equal c ?\ ) (char-equal c ?\t)))
pre-comment)
;; This is the comment preceded by whitespace
(setq commentp nil margin 0 commented-lines 0)
(if (<= assumed-margin margin)
;; The comment found starts on the left of
;; the margin of the comments found so far
(save-excursion
(beginning-of-line)
(forward-char assumed-margin)
(insert (make-string (- margin assumed-margin) ?\ ))
(incf commented-lines))
;; This could be optimized by going forward and
;; collecting as many comments there are, but
;; it is simpler to return and re-indent comments
;; (assuming there won't be many such cases anyway.
(setq margin assumed-margin)
(move-end-of-line (1- (- commented-lines))))))
((has-face-at-point comments)
;; This is the fresh comment
;; This entire block needs refactoring, it is
;; a repetition of half of the previous block
(setq assumed-margin (face-start comments)
pre-comment
(buffer-substring-no-properties
(save-excursion (move-beginning-of-line 1))
(save-excursion (beginning-of-line)
(forward-char assumed-margin) (point))))
(unless (every
(lambda (c)
(or (char-equal c ?\ ) (char-equal c ?\t)))
pre-comment)
(setq commentp t margin assumed-margin commented-lines 0)))
(commentp
;; This is the line directly after a block of comments
(setq commentp nil margin assumed-margin commented-lines 0)))
(unless (eobp) (forward-char)))
;; Retrieve the formatted content
(setq content (buffer-string))))
(erase-buffer)
(insert content)
(beginning-of-buffer)
(forward-line (car pos))
(end-of-line)
(backward-char (cdr pos))))
I've also duplicated it on pastebin for better readability: http://pastebin.com/C2L9PRDM
EDIT: This should restore the mouse position but will not restore the scroll position (could be worked to, perhaps, I'd just need to look for how scrolling is stored).
align-regexp is the awesome bit of emacs magic you need:
(defun align-comments ()
"align R comments depending on whether at start or in the middle."
(interactive)
(align-regexp (point-min) (point-max)
"^\\(\\s-*?\\)\\([^[:space:]]+\\)\\(\\s-+\\)#" 3 1 nil) ;type 2 regex
(align-regexp (point-min) (point-max)
"^\\(\\s-*\\)\\(\\s-*\\)#" 2 0 nil)) ;type 1 regex
before:
# a comment type 1
## another comment type 1
a=1 ###### and a comment type 2 with lots of #####'s
a.much.longer.variable.name=2 # and another, slightly longer type 2 comment
## and a final type 1
after:
# a comment type 1
## another comment type 1
a=1 ###### and a comment type 2 with lots of #####'s
a.much.longer.variable.name=2 # and another, slightly longer type 2 comment
## and a final type 1
Try
(replace-regexp "^\\(\\s-+\\)#" "\\1bla" nil (point-min) (point-max))
then
(replace-regexp "^\\(\\s-+\\)bla+" "\\1#" nil (point-min) (point-max))
but if I understood you correctly, I would probably do something like:
(align-string "\b\s-#" begin end)
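The mask-and-restore round trip can be sketched on a plain string as follows. replace-regexp is documented as intended for interactive use, so this sketch loops over re-search-forward and replace-match instead. The helper name my-replace-in-text is made up, [ \t]+ is used in place of \s-+ so that a match cannot run across a newline, and the placeholder bla is assumed not to occur at the start of any real line.

```elisp
(defun my-replace-in-text (regexp replacement text)
  "Return TEXT with each match of REGEXP replaced by REPLACEMENT.
Hypothetical helper; \\N in REPLACEMENT refers to capture group N."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (while (re-search-forward regexp nil t)
      ;; second argument t keeps the replacement's case as written
      (replace-match replacement t))
    (buffer-string)))

(let* ((code "  # first kind\nx <- 1 # second kind\n")
       ;; hide the leading-whitespace comments, keeping their indentation
       (masked (my-replace-in-text "^\\([ \t]+\\)#" "\\1bla" code))
       ;; ... align the remaining comments here, then restore ...
       (restored (my-replace-in-text "^\\([ \t]+\\)bla" "\\1#" masked)))
  (list masked (string= restored code)))
;; => ("  bla first kind\nx <- 1 # second kind\n" t)
```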

lisp - String to struct or list

I have a problem with Common Lisp.
I want to pass a string to a function
and have that string turned into a structure.
I can't use external libraries.
For example with this input:
(testfu "ftp/http.ok:3345")
This is the struct:
(defstruct test-struct
scheme
part
ans
port)
I want this result:
scheme: "ftp" part: "http" ans "ok" port "3345"
How can I write testfu?
Here is my bad try :(
(defun testfu (x)
(setq ur1 (make-test-struct :scheme frist x :host second x)))
I'd recommend using a regex to parse this. Using CL-PPCRE which is the Common Lisp regex library, the code would look like this:
(defun testfu (x)
(multiple-value-bind (result values)
(ppcre:scan-to-strings "^([a-z]+)/([a-z]+)\\.([a-z]+):([0-9]+)$" x)
(unless result
(error "String ~s is not valid" x))
(make-test-struct :scheme (aref values 0)
:part (aref values 1)
:ans (aref values 2)
:port (aref values 3))))
Note that you probably would have to adjust the regex to better represent the actual format of the input string, in particular if any of the fields are optional.
You will have to parse the data out of the string before you can use it for your struct. Lisp won't do that magically.
Split Sequence is a good library for doing that.
If you don't want a library, here is some code to get you on the right track. It tokenizes string based on a predicate function fn (which returns true when a character is a delimiter and false otherwise):
(defun split-by-fn (fn string)
(let* ((STARTING 0)
(TOKEN 1)
(DELIM 2)
(state STARTING)
(a-token "")
(the-list '())
(str-length (length string)))
(dotimes (i str-length)
(if (funcall fn (char string i))
(progn
(if (eq state TOKEN)
(progn
(setq the-list (cons a-token the-list))
(setq a-token "")))
(setq state DELIM))
(progn
(setq a-token
(concatenate 'string a-token (string (char string i))))
(setq state TOKEN))))
(if (eq state TOKEN)
(setq the-list (cons a-token the-list)))
(setq the-list (reverse the-list))))
I don't usually write code for people but here is an example parser, it's not the most lisp-y, there are better ways of doing this, but it works.
(defun parser ( string )
(labels ((set-field (state struct token)
(let ((SCHEME 0)
(PART 1)
(ANS 2)
(PORT 3))
(cond ((= state SCHEME)
(setf (example-struct-SCHEME struct) token))
((= state PART)
(setf (example-struct-PART struct) token))
((= state ANS)
(setf (example-struct-ANS struct) token))
((= state PORT)
(setf (example-struct-PORT struct) token))))))
(let ((state 0)
(token "")
(check 0)
(a-list '())
(struct (make-example-struct)))
(loop for char across string do
(progn
(setq check (position char "/.:"))
(if check
(progn
(set-field state struct token)
(setq token "")
(setq state (+ check 1)))
(setq token (concatenate 'string token (string char))))))
(progn
(if (/= 0 (length token))
(set-field state struct token))
struct))))