Regexp Emacs for R comments - regex

I would like to build a regexp in Emacs for cleaning up my R code.
One of the problems I ran into was that there are different types of comments:
You have those with a certain amount of whitespace (1), e.g.:
# This is a comment:
# This is also a comment
or you have situations like this (2):
require(lattice) # executable while the comment is informative
The idea is that I want to align the comments when they are of the second kind (after something that's executable), while excluding those of the first kind.
Ideally, it will align all the comments BETWEEN those of the first kind, but not those of the first kind.
Example:
funfun <- function(a, b) {
  # This is a function
  if (a == b) { # if a equals b
    c <- 1 # c is 1
  }
}
#
To:
funfun <- function(a, b) {
  # This is a function
  if (a == b) { # if a equals b
    c <- 1      # c is 1
  }
}
#
I found a regexp to do a replacement for those of the first kind, so then I was able to align them per paragraph (mark-paragraph). That worked kind of well.
Problem is then the backsubstitution:
(replace-regexp "^\\s-+#+" "bla" nil (point-min) (point-max))
This replaces from the start of a line, with any amount of whitespace and any amount of comment characters like:
#########
into
bla
The problem is that I would like to replace them back into what they are originally, so "bla" has to go back into the same amount of whitespace and same amount of #.
Hopefully someone understands what I am trying to do and has either a better idea for an approach or knows how to solve this regexp part.

Well, here's some crazy attempt at doing something I thought you were after. It seems to work, but it needs a lot of testing and polishing:
(defun has-face-at-point (face &optional position)
(unless position (setq position (point)))
(unless (consp face) (setq face (list face)))
(let ((props (text-properties-at position)))
(loop for (key value) on props by #'cddr
do (when (and (eql key 'face) (member value face))
(return t)))))
(defun face-start (face)
(save-excursion
(while (and (has-face-at-point face) (not (bolp)))
(backward-char))
(- (point) (save-excursion (move-beginning-of-line 1)) (if (bolp) 0 -1))))
(defun beautify-side-comments ()
(interactive)
;; Because this function does a lot of insertion, it would
;; be better to execute it in the temporary buffer, while
;; copying the original text of the file into it, such as
;; to prevent junk in the formatted buffer's history
(let ((pos (cons (save-excursion
(beginning-of-line)
(count-lines (point-min) (point)))
(- (save-excursion (end-of-line) (point)) (point))))
(content (buffer-string))
(comments '(font-lock-comment-face font-lock-comment-delimiter-face)))
(with-temp-buffer
(insert content)
(goto-char (point-min))
;; thingatpt breaks if there are overlays with their own faces
(let* ((commentp (has-face-at-point comments))
(margin
(if commentp (face-start comments) 0))
assumed-margin pre-comment commented-lines)
(while (not (eobp))
(move-end-of-line 1)
(cond
((and (has-face-at-point comments)
commentp) ; this is a comment continued from
; the previous line
(setq assumed-margin (face-start comments)
pre-comment
(buffer-substring-no-properties
(save-excursion (move-beginning-of-line 1))
(save-excursion (beginning-of-line)
(forward-char assumed-margin) (point))))
(if (every
(lambda (c) (or (char-equal c ?\ ) (char-equal c ?\t)))
pre-comment)
;; This is the comment preceded by whitespace
(setq commentp nil margin 0 commented-lines 0)
(if (<= assumed-margin margin)
;; The comment found starts on the left of
;; the margin of the comments found so far
(save-excursion
(beginning-of-line)
(forward-char assumed-margin)
(insert (make-string (- margin assumed-margin) ?\ ))
(incf commented-lines))
;; This could be optimized by going forward and
;; collecting as many comments there are, but
;; it is simpler to return and re-indent comments
;; (assuming there won't be many such cases anyway).
(setq margin assumed-margin)
(move-end-of-line (1- (- commented-lines))))))
((has-face-at-point comments)
;; This is the fresh comment
;; This entire block needs refactoring; it
;; repeats half of the previous block
(setq assumed-margin (face-start comments)
pre-comment
(buffer-substring-no-properties
(save-excursion (move-beginning-of-line 1))
(save-excursion (beginning-of-line)
(forward-char assumed-margin) (point))))
(unless (every
(lambda (c)
(or (char-equal c ?\ ) (char-equal c ?\t)))
pre-comment)
(setq commentp t margin assumed-margin commented-lines 0)))
(commentp
;; This is the line directly after a block of comments
(setq commentp nil margin assumed-margin commented-lines 0)))
(unless (eobp) (forward-char)))
;; Retrieve the formatted content
(setq content (buffer-string))))
(erase-buffer)
(insert content)
(goto-char (point-min))
(forward-line (car pos))
(end-of-line)
(backward-char (cdr pos))))
I've also duplicated it on pastebin for better readability: http://pastebin.com/C2L9PRDM
EDIT: This should restore the cursor position but will not restore the scroll position (that could perhaps be added; I'd just need to look up how scrolling is stored).

align-regexp is the awesome bit of Emacs magic you need:
(defun align-comments ()
  "Align R comments depending on whether at start or in the middle."
  (interactive)
  (align-regexp (point-min) (point-max)
                "^\\(\\s-*?\\)\\([^[:space:]]+\\)\\(\\s-+\\)#" 3 1 nil) ; type 2 regex
  (align-regexp (point-min) (point-max)
                "^\\(\\s-*\\)\\(\\s-*\\)#" 2 0 nil)) ; type 1 regex
before:
# a comment type 1
## another comment type 1
a=1 ###### and a comment type 2 with lots of #####'s
a.much.longer.variable.name=2 # and another, slightly longer type 2 comment
## and a final type 1
after:
# a comment type 1
## another comment type 1
a=1                           ###### and a comment type 2 with lots of #####'s
a.much.longer.variable.name=2 # and another, slightly longer type 2 comment
## and a final type 1

Try
(replace-regexp "^\\(\\s-+\\)#" "\\1bla" nil (point-min) (point-max))
then
(replace-regexp "^\\(\\s-+\\)bla+" "\\1#" nil (point-min) (point-max))
but if I understood you correctly, I would probably do something like:
(align-string "\b\s-#" begin end)

elisp function to change string case from camel to upcase snake case

I spent 3 hours trying to figure out how to change a string at point to a different case, e.g. isFailedUpgrade to IS_FAILED_UPGRADE.
I got to the point where I can get the string at point into a variable text, but I have no idea how to convert that string to the desired case.
(defun change-case ()
  (interactive)
  (let* ((bounds (if (use-region-p)
                     (cons (region-beginning) (region-end))
                   (bounds-of-thing-at-point 'symbol)))
         (text (buffer-substring-no-properties (car bounds) (cdr bounds))))
    (when bounds
      (delete-region (car bounds) (cdr bounds))
      (insert (change-case-helper text)))))
;; the following code is rubbish
(defun change-case-helper (text)
  (let ((output ""))
    (dotimes (i (length text))
      (concat output (char-to-string (aref text i))))
    output))
Since I am on a journey to learn a little Emacs Lisp myself, I would prefer to write this function myself instead of using an existing magical function.
OK, after another 2 hours, I think I've figured it out:
(defun change-case ()
  (interactive)
  (let* ((bounds (if (use-region-p)
                     (cons (region-beginning) (region-end))
                   (bounds-of-thing-at-point 'symbol)))
         (text (buffer-substring-no-properties (car bounds) (cdr bounds))))
    (when bounds
      (delete-region (car bounds) (cdr bounds))
      (insert (change-case-helper text)))))
(defun change-case-helper (text)
  (when (and text (> (length text) 0))
    (let ((first-char (string-to-char (substring text 0 1)))
          (rest-str (substring text 1)))
      (concat (if (upcasep first-char) (string ?_ first-char) (string (upcase first-char)))
              (change-case-helper rest-str)))))
(defun upcasep (c) (and (= ?w (char-syntax c)) (= c (upcase c))))
I still feel this is pretty awkward; please comment and let me know if there is a better way of writing this function.
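One possibly simpler approach, sketched here under the assumption that the input is plain ASCII camelCase, is to let a regexp insert the underscores and then upcase the whole result. The name my-camel-to-upcase-snake is hypothetical; replace-regexp-in-string and upcase are standard Emacs functions.

```elisp
(defun my-camel-to-upcase-snake (text)
  "Return TEXT converted from camelCase to UPCASE_SNAKE_CASE.
Hypothetical helper; assumes TEXT contains only ASCII letters and digits."
  ;; Matching must be case-sensitive so that [A-Z] only matches capitals.
  (let ((case-fold-search nil))
    (upcase
     ;; Insert an underscore between a lowercase letter or digit and the
     ;; capital that follows it.  FIXEDCASE is t so the replacement is
     ;; inserted without any case conversion.
     (replace-regexp-in-string "\\([a-z0-9]\\)\\([A-Z]\\)" "\\1_\\2" text t))))

(my-camel-to-upcase-snake "isFailedUpgrade") ; => "IS_FAILED_UPGRADE"
```

A function like this could be called from change-case in place of change-case-helper. Unlike the recursive version, it does not emit a leading underscore when the text already starts with a capital letter.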

How to use regexp in Elisp to match ',' in the line but not inside quotation mark

How could I write a regexp to match , in the line but not inside ""?
For example:
`uvm_info("body", $sformatf("Value: a = %d, b = %d, c = %d", a, b, c), UVM_MEDIUM)
Hope to match those with ^ under it:
`uvm_info("body", $sformatf("Value: a = %d, b = %d, c = %d", a, b, c), UVM_MEDIUM)
                ^                                          ^  ^  ^   ^
The following function doesn't use a regular expression but rather parses a region of the buffer as sexps and returns a list of buffer positions of all commas excluding those within strings, or nil if there are no such commas.
(defun find-commas (start end)
  (save-excursion
    (goto-char start)
    (let (matches)
      (while (< (point) end)
        (cond ((= (char-after) ?,)
               (push (point) matches)
               (forward-char))
              ((looking-at "[]\\[{}()]")
               (forward-char))
              (t
               (forward-sexp))))
      (nreverse matches))))
It works for the example you show, but might need tweaking for other examples or languages. If your example is in a buffer by itself, calling
(find-commas (point-min) (point-max))
returns
(17 60 63 66 70)
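Try this:
"[^"]+"|(,)
The comma you want ends up in capture group 1. Note that this is written in PCRE-style syntax; in an Emacs Lisp string the same pattern is "\"[^\"]+\"\\|\\(,\\)".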
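As a rough sketch of how that alternation trick could be used from Emacs Lisp: the pattern consumes whole quoted strings in its first branch, so a comma only lands in group 1 when it is outside quotes. The function name my-commas-outside-strings is hypothetical, the pattern uses * instead of + so that empty strings "" are skipped too, and escaped quotes inside strings are assumed not to occur.

```elisp
(defun my-commas-outside-strings (string)
  "Return the 0-based indices of commas in STRING outside double quotes.
Hypothetical helper; assumes quoted parts contain no escaped quotes."
  (let ((start 0)
        (positions '()))
    ;; Each match is either a complete quoted string (group 1 unset)
    ;; or a single comma captured in group 1.
    (while (string-match "\"[^\"]*\"\\|\\(,\\)" string start)
      (when (match-beginning 1)
        (push (match-beginning 1) positions))
      ;; Continue after the match, so quoted text is never rescanned.
      (setq start (match-end 0)))
    (nreverse positions)))

(my-commas-outside-strings
 "`uvm_info(\"body\", $sformatf(\"Value: a = %d, b = %d, c = %d\", a, b, c), UVM_MEDIUM)")
;; => (16 59 62 65 69)
```

The indices are 0-based string positions, so each one is one less than the corresponding 1-based buffer position.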
You can use the fact that font-lock first fontifies comments and strings, then applies your font-lock keywords.
The standard solution is to replace your regexp with a function that searches for the regexp and skips any occurrences in comments and strings.
The following is from my package lisp-extra-font-lock (a package that highlights variables bound by let, quoted expressions, etc.). It searches for quotes and backquotes, but the principle is the same:
(defun lisp-extra-font-lock-is-in-comment-or-string ()
"Return non-nil if point is in comment or string.
This assumes that Font Lock is active and has fontified comments
and strings."
(let ((props (text-properties-at (point)))
(faces '()))
(while props
(let ((pr (pop props))
(value (pop props)))
(if (eq pr 'face)
(setq faces value))))
(unless (listp faces)
(setq faces (list faces)))
(or (memq 'font-lock-comment-face faces)
(memq 'font-lock-string-face faces)
(memq 'font-lock-doc-face faces))))
(defun lisp-extra-font-lock-match-quote-and-backquote (limit)
"Search for quote and backquote in code.
Set match data 1 if character matched is backquote."
(let (res)
(while (progn (setq res (re-search-forward "\\(?:\\(`\\)\\|'\\)" limit t))
(and res
(or
(lisp-extra-font-lock-is-in-comment-or-string)
;; Don't match ?' and ?`.
(eq (char-before (match-beginning 0)) ??)))))
res))
The font-lock keyword is as follows:
(;; Quote and backquote.
;;
;; Matcher: Set match-data 1 if backquote.
lisp-extra-font-lock-match-quote-and-backquote
(1 lisp-extra-font-lock-backquote-face nil t)
;; ...)

how to do unit test on functions that take active region as input?

For example, here are two versions of function to count the instances of "a" in a region or in a string:
(defun foo (beg end)
  (interactive "r")
  (let ((count 0))
    (save-excursion
      (goto-char beg) ; start at the beginning of the region, wherever point is
      (while (< (point) end)
        (if (equal (char-after) ?a)
            (setq count (1+ count)))
        (forward-char)))
    count))
(defun foo1 (str)
  (let ((count 0))
    (mapc #'(lambda (x) (if (equal x ?a) (setq count (1+ count))))
          str)
    count))
This is the test to check the function foo1:
(require 'ert)
(ert-deftest foo-test ()
(should (equal (foo1 "aba") 2)))
But how can I test the function foo, which takes a region as input, using the ERT unit testing framework?
I would second @sds's suggestion, with the added suggestion to do something like:
(with-temp-buffer
  (insert <text>)
  (set-mark <markpos>)
  (goto-char <otherend>)
  (should (equal (call-interactively 'foo) <n>)))
I would do something like this:
(with-temp-buffer
  (insert ....) ; prepare the buffer
  (should (equal ... (foo (point-min) (point-max)))))
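To make the two suggestions above concrete, here is a minimal sketch of a complete ERT test. The names my-count-a and my-count-a-region-test and the sample text are assumptions for illustration; my-count-a is a stand-in for foo that moves to the start of the region before counting.

```elisp
(require 'ert)

(defun my-count-a (beg end)
  "Count occurrences of the character a between BEG and END.
Hypothetical stand-in for the function under test."
  (interactive "r")
  (let ((count 0))
    (save-excursion
      (goto-char beg)
      (while (< (point) end)
        (when (eq (char-after) ?a)
          (setq count (1+ count)))
        (forward-char)))
    count))

(ert-deftest my-count-a-region-test ()
  (with-temp-buffer
    (insert "aba cab")
    ;; Plain call with explicit bounds; no region is needed.
    (should (equal (my-count-a (point-min) (point-max)) 3))
    ;; Interactive call: the "r" spec reads point and mark as the region.
    (set-mark (point-min))
    (goto-char (point-max))
    (should (equal (call-interactively #'my-count-a) 3))))
```

The test can be run with M-x ert after evaluating the definitions. Calling the function with explicit bounds is usually enough; the call-interactively form additionally exercises the interactive spec.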

Looping over items from a list Emacs

Let's say we have a list like the following:
("These" "Are" "Some" "Words"), let us call it listy
How can I call a function on each item of the list?
Perhaps call a function like:
(defun messager (somelist)
(interactive)
(message somelist)
)
Running the function:
(messager listy)
I would expect to see a separate line in the buffer for each item of the list.
The part that is not working, though, is looping over the items of the list.
Use
(mapc 'messager listy)
or
(dolist (item listy)
(messager item))
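Put together, a self-contained sketch might look like the following. The names my-listy and my-messager are hypothetical stand-ins for listy and messager, and the function here takes a single item rather than the whole list.

```elisp
(defvar my-listy '("These" "Are" "Some" "Words")
  "Hypothetical example list.")

(defun my-messager (item)
  "Show ITEM in the echo area and return the displayed string."
  ;; Passing "%s" as the format keeps a literal % in ITEM from being
  ;; interpreted as a format directive.
  (message "%s" item))

;; `mapc' calls the function for its side effects and returns the list.
(mapc #'my-messager my-listy)

;; `dolist' binds ITEM to each element in turn.
(dolist (item my-listy)
  (my-messager item))
```

Each call to message adds its own line to the *Messages* buffer.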
Now, I'm going for self-advertising once again :P But in hopes that I will communicate some useful info on the way:
;; Here is what `dolist' expands to:
(dolist (item listy)
(messager item))
(identity
(catch (quote --cl-block-nil--)
(let ((--dolist-tail-- listy) item)
(while --dolist-tail--
(setq item (car --dolist-tail--))
(messager item)
(setq --dolist-tail-- (cdr --dolist-tail--))))))
;; And here is what `i-iterate' expands to:
(++ (for item in listy)
(messager item))
(let* ((--0 listy) item)
(while --0
(setq item (car --0) --0 (cdr --0))
(messager item)))
Some commentary: dolist will create a (catch ...) block whether or not there is a conditional exit, while i-iterate will try to do that only if such a conditional exit was identified. Generally, executing code inside a (catch ...) form is a little slower.
Also, dolist will wrap the code in a special "block" (which is basically just a call to the identity function). This, too, is a sort of cruft that is the default but is not always needed.
Now, to your other question about alist, you could use loop macro like so:
(loop for (key . value) in '((a . b) (c . d)) do
(message "key: %s -> value: %s" key value))
;; Which expands to:
(identity
(catch (quote --cl-block-nil--)
(let* ((--cl-var-- (quote ((a . b) (c . d)))) (value nil) (key nil))
(while (consp --cl-var--)
(setq value (car --cl-var--)
key (car (prog1 value (setq value (cdr value)))))
(message "key: %s -> value: %s" key value)
(setq --cl-var-- (cdr --cl-var--))) nil)))
;; Compared to i-iterate
(++ (for (key . value) in '((a . b) (c . d)))
(message "key: %s -> value: %s" key value))
;; Which expands to:
(let* ((--0 (quote ((a . b) (c . d)))) value key)
(while --0
(setq key (caar --0) value (cdar --0) --0 (cdr --0))
(message "key: %s -> value: %s" key value)))
Where, in this particular case using pop was not justified. Likewise the use of (catch ...) block (since there wasn't a conditional exit).
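For readers on a current Emacs, where the unprefixed loop macro from the old cl library is no longer available by default, the same destructuring iteration can be sketched with cl-loop from cl-lib. The function name my-alist-pairs is hypothetical, and the example collects the formatted strings instead of printing them so that the result is easy to inspect.

```elisp
(require 'cl-lib)

(defun my-alist-pairs (alist)
  "Return a list of \"key: K -> value: V\" strings for ALIST.
Hypothetical helper for illustration."
  ;; (key . value) destructures each cons cell of the alist.
  (cl-loop for (key . value) in alist
           collect (format "key: %s -> value: %s" key value)))

(my-alist-pairs '((a . b) (c . d)))
;; => ("key: a -> value: b" "key: c -> value: d")
```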
Oh, and the link to the library: http://code.google.com/p/i-iterate/ :)
The benefits and downsides of using mapc for this purpose: higher-order functions combine well with already existing functions. So, if you already had a function you wanted to apply to every element, mapc would probably be the best way to solve your problem. However, if you are going to create a function only to use it with a higher-order function, it rarely pays off, as you will create a "redundant" function that you could otherwise have avoided. This is not always the case, and sometimes, especially when used with macros, this can be a powerful tool, but in your case plain iteration seems better suited.

How can I capture the results of splitting a string in elisp?

I am working in elisp and I have a string that represents a list of items. The string looks like
"apple orange 'tasty things' 'my lunch' zucchini 'my dinner'"
and I'm trying to split it into
("apple" "orange" "tasty things" "my lunch" "zucchini" "my dinner")
This is a familiar problem. My obstacles to solving it are less about the regex, and more about the specifics of elisp.
What I want to do is run a loop like :
(while (> (length my-string) 0) do-work)
where that do-work is:
applying the regex \('[^']*?'\|[[:alnum:]]+\)\([[:space:]]*\(.+\)\) to my-string
appending \1 to my results list
re-binding my-string to \2
However, I can't figure out how to get split-string or replace-regexp-in-string to do that.
How can I split this string into values I can use?
(alternatively: "which built-in emacs function that does this have I not yet found?")
Something similar, but w/o regexp:
(defun parse-quotes (string)
  (let ((i 0) result current quotep escapedp word)
    (while (< i (length string))
      (setq current (aref string i))
      (cond
       ((and (char-equal current ?\ )
             (not quotep))
        (when word (push word result))
        (setq word nil escapedp nil))
       ((and (char-equal current ?\')
             (not escapedp)
             (not quotep))
        (setq quotep t escapedp nil))
       ((and (char-equal current ?\')
             (not escapedp))
        (push word result)
        (setq quotep nil word nil escapedp nil))
       ((char-equal current ?\\)
        (when escapedp (push current word))
        (setq escapedp (not escapedp)))
       (t (setq escapedp nil)
          (push current word)))
      (setq i (1+ i)))
    (when quotep
      (error "Unbalanced quotes at %d"
             (- (length string) (length word))))
    ;; Keep a trailing unquoted word
    (when word (push word result))
    ;; Each word is a reversed list of characters; turn it into a string
    (mapcar (lambda (x) (concat (reverse x)))
            (reverse result))))
(parse-quotes "apple orange 'tasty things' 'my lunch' zucchini 'my dinner'")
("apple" "orange" "tasty things" "my lunch" "zucchini" "my dinner")
(parse-quotes "apple orange 'tasty thing\\'s' 'my lunch' zucchini 'my dinner'")
("apple" "orange" "tasty thing's" "my lunch" "zucchini" "my dinner")
(parse-quotes "apple orange 'tasty things' 'my lunch zucchini 'my dinner'")
;; Debugger entered--Lisp error: (error "Unbalanced quotes at 52")
Bonus: it also allows escaping the quotes with "\" and will report it if the quotes aren't balanced (reached the end of the string, but didn't find the match for the opened quote).
Here is a straightforward way to implement your algorithm using a temporary buffer. I don't know if there would be a way to do this using replace-regexp-in-string or split-string.
(defun my-split (string)
  (with-temp-buffer
    (insert string " ")     ;; insert the string in a temporary buffer
    (goto-char (point-min)) ;; go back to the beginning of the buffer
    (let ((result nil))
      ;; search for the regexp (and just return nil if nothing is found)
      (while (re-search-forward "\\('[^']*?'\\|[[:alnum:]]+\\)\\([[:space:]]*\\(.+\\)\\)" nil t)
        ;; (match-string 1) is "\1"
        ;; append it after the current list
        (setq result (append result (list (match-string 1))))
        ;; go back to the beginning of the second part
        (goto-char (match-beginning 2)))
      result)))
Example:
(my-split "apple orange 'tasty things' 'my lunch' zucchini 'my dinner'")
==> ("apple" "orange" "'tasty things'" "'my lunch'" "zucchini" "'my dinner'")
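If the surrounding quotes should be dropped from the results, a variant that works directly on the string with string-match could look roughly like this. It is only a sketch: the name my-split-quoted is hypothetical, unquoted items are taken to be any run of characters other than whitespace and single quotes, and escaped or unbalanced quotes are not handled.

```elisp
(defun my-split-quoted (string)
  "Split STRING into words and single-quoted phrases, dropping the quotes.
Hypothetical helper; does not handle escaped or unbalanced quotes."
  (let ((start 0)
        (result '()))
    ;; Group 1 is the text inside a quoted item, group 2 a bare word.
    (while (string-match "'\\([^']*\\)'\\|\\([^[:space:]']+\\)" string start)
      (push (or (match-string 1 string) (match-string 2 string)) result)
      ;; Resume right after the match instead of re-binding the string.
      (setq start (match-end 0)))
    (nreverse result)))

(my-split-quoted "apple orange 'tasty things' 'my lunch' zucchini 'my dinner'")
;; => ("apple" "orange" "tasty things" "my lunch" "zucchini" "my dinner")
```

Tracking a start index with match-end plays the role of "re-binding my-string to \2" in the question, without copying the remainder of the string on every iteration.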
You might like to take a look at split-string-and-unquote.
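One caveat is that split-string-and-unquote understands double-quoted, Lisp-style strings rather than single quotes. A possible workaround, sketched below under the assumption that the input contains no double quotes, backslashes, or apostrophes inside words, is to convert the single quotes first. The wrapper name my-split-single-quoted is hypothetical.

```elisp
(defun my-split-single-quoted (string)
  "Split STRING, treating single-quoted phrases as single items.
Hypothetical wrapper; assumes STRING has no double quotes, backslashes,
or apostrophes inside words."
  ;; Turn every ' into \" so that `split-string-and-unquote' can read
  ;; each quoted phrase as a Lisp string.
  (split-string-and-unquote
   (replace-regexp-in-string "'" "\"" string t t)))

(my-split-single-quoted
 "apple orange 'tasty things' 'my lunch' zucchini 'my dinner'")
;; => ("apple" "orange" "tasty things" "my lunch" "zucchini" "my dinner")
```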
If you manipulate strings often, you should install the s.el library via the package manager; it introduces a huge load of string utility functions under a consistent API. For this task you need the function s-match; its optional third argument accepts a starting position. Then you need a correct regexp. Try:
(concat "\\b[a-z]+\\b" "\\|" "'[a-z ]+'")
\| means matching either a sequence of letters that constitutes a word (\b means a word boundary) or a sequence of letters and spaces inside quotes. Then use loop:
;; let s = given string, r = regex
(loop for start = 0 then (+ start (length match))
for match = (car (s-match r s start))
while match
collect match)
For educational purposes, I also implemented the same functionality with a recursive function:
;; labels is Common Lisp's local function definition macro
(labels
((i
(start result)
;; s-match searches from start
(let ((match (car (s-match r s start))))
(if match
;; recursive call
(i (+ start (length match))
(cons match result))
;; push/nreverse idiom
(nreverse result)))))
;; recursive helper function
(i 0 '()))
As Emacs lacks tail call optimization, executing it over a big list can cause a stack overflow. Therefore you can rewrite it with the do* macro:
(do* ((start 0)
(match (car (s-match r s start)) (car (s-match r s start)))
(result '()))
((not match) (reverse result))
(push match result)
(incf start (length match)))