Emacs Lisp Regular Expression Match everything until character sequence - regex

I am trying to write a regular expression in Emacs Lisp that will match multi-line comments.
For example:
{-
Some
Comment
Here
-}
Should match as a comment. Basically, anything between {- and -}. I can almost do it with the following:
"\{\-[^-]*\-\}"
However, this fails if the comment contains a - that is not immediately followed by }, because [^-]* cannot cross any dash.
So it will not match correctly in this case:
{-
Some -
Comment -
Here -
-}
Which should be valid.
Basically, I would like to match on everything (including newlines) up to the sequence -}
Thanks in advance!

Doesn't this work for you? {-[^-]*[^}]*-}
(You didn't specify things precisely, so I'm just guessing what you want. Must the {- and -} be at the line beginning? Must they be on lines by themselves? Must there be some other characters between them? Etc. For example, should it match a line like this? {--}?)
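One way to express "everything, including newlines, up to the first -}" is either a non-greedy match or an explicit pattern that lets dashes through unless they close the comment. The sketch below uses Python's re module rather than Emacs regexps, purely to illustrate the pattern logic; BLOCK_COMMENT, LAZY_COMMENT and find_comments are names made up for this example. An Emacs version would need translating to Emacs regexp syntax, where . does not match a newline.

```python
import re

# Explicit form: after "{-", consume either a non-dash character or a run of
# dashes followed by something that is neither "-" nor "}", then close on "-}".
BLOCK_COMMENT = re.compile(r"\{-(?:[^-]|-+[^-}])*-+\}")

# Lazy form: DOTALL lets "." cross newlines, and "*?" stops at the first "-}".
LAZY_COMMENT = re.compile(r"\{-.*?-\}", re.DOTALL)


def find_comments(text, pattern=BLOCK_COMMENT):
    """Return every block comment found in text (hypothetical helper)."""
    return pattern.findall(text)


sample = "x = 1 {-\nSome -\nComment -\nHere -\n-} y = 2 {- second -}"
print(find_comments(sample))
```

Both patterns stop at the first closing -}, so two comments in the same text come back as two separate matches.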

I made a toolkit for such cases. It comes with a parser, beg-end.el.
What remains is to write the functions that determine the beginning and the end of the object.
In pseudo-code:
(put 'MY-FORM 'beginning-op-at
     (lambda () (search-backward "{-")))
(put 'MY-FORM 'end-op-at
     (lambda () (search-forward "-}")))
Once that is done, the form should be available, i.e. copied and returned, like this:
(defun MY-FORM-atpt (&optional arg)
  " "
  (interactive "p")
  (ar-th 'MY-FORM arg))
Get it here:
https://launchpad.net/s-x-emacs-werkstatt/

Related

mit-scheme multiple regex matches in string

I'm using MIT/GNU Scheme 9.2. If I define a string:
(define a-string "00:00 11:11 22:22")
I can match and get the first time a pattern appears:
(re-match-extract a-string
                  (re-string-match "..:.." a-string)
                  0)
;Value 3: "00:00"
That's great, but I want to match the other times "..:.." appears. It seems like there should be some simple way, but am I missing something? Do I need to write a recursive function that matches the first pattern then cuts it off and runs the match on the rest of the string until it's exhausted?
What I would like to end up with is a list that looks like:
("00:00" "11:11" "22:22")
At some point between when 9.2 was current and now (11.2 is the most recent version as of this writing), the regular expression support of MIT Scheme was overhauled, and the question's re-XXX functions no longer exist. Instead, it supports some basic matching functions and SRFI-115 regular expressions, both using different variations of an s-expression-based syntax instead of traditional string REs. So this is really a "How to get a list of all matches using SRFI-115" answer.
The key here is regexp-fold, which invokes a function for each non-overlapping match in a string:
(define (all-matches re str)
  (regexp-fold re
               (lambda (i match s matches)
                 (cons (regexp-match-submatch match 0) matches))
               '()
               str
               (lambda (i match s matches) (reverse! matches))))
;;; Returns: ("00:00" "11:11" "22:22")
(all-matches (rx any any #\: any any) "00:00 11:11 22:22")
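For comparison, here is the same "collect every non-overlapping match" idea sketched in Python, not MIT Scheme. The explicit loop follows the approach the question describes (match, then continue after the match). The name all_matches is only chosen to mirror the function above. Python's re.findall does the same job in one call.

```python
import re


def all_matches(pattern, text):
    """Collect non-overlapping matches left to right, resuming after each one."""
    regex = re.compile(pattern)
    matches = []
    pos = 0
    while pos <= len(text):
        m = regex.search(text, pos)
        if m is None:
            break
        matches.append(m.group(0))
        # Resume after the match; advance at least one character so that an
        # empty match cannot keep the loop at the same position.
        pos = m.end() if m.end() > m.start() else m.end() + 1
    return matches


times = all_matches(r"..:..", "00:00 11:11 22:22")
print(times)
```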

AND operator for text search in Emacs

I am new to Emacs. I can search for text and show all lines in a separate buffer using "M-x occur". I can also search for multiple text items using the OR operator, as in one\|two, which will find lines with "one" or "two" (as explained in "Emacs occur mode search for multiple strings"). How can I search for lines with both "one" and "two"? I tried using \& and \&& but they do not work. Will I need to create a macro or function for this?
Edit:
I tried writing a function for the above in Racket (a Scheme derivative). The following works:
#lang racket
(define text '("this is line number one"
               "this line contains two keyword"
               "this line has both one and two keywords"
               "this line contains neither"
               "another two & one words line"))
(define (srch . lst) ; takes variable number of arguments
  (for ((i lst))
    (set! text (filter (λ (x) (string-contains? x i)) text)))
  text)
(srch "one" "two")
Output:
'("this line has both one and two keywords" "another two & one words line")
But how can I put this in Emacs Lisp?
Regex doesn't support "and" because it has very limited usefulness and weird semantics when you try to use it in any nontrivial regex. The usual fix is to just search for one.*two\|two.*one ... or in the case of *Occur* maybe just search for one and then M-x delete-non-matching-lines two.
(You have to mark the *Occur* buffer as writable before you can do this. read-only-mode is a toggle; the default keybinding is C-x C-q. At least in my Emacs, you have to move the cursor away from the first line or you'll get "Text is read-only".)
(defun occur2 (regex1 regex2)
  "Search for lines matching both REGEX1 and REGEX2 by way of `occur'.
We first (occur regex1) and then do (delete-non-matching-lines regex2) in the
*Occur* buffer."
  (interactive "sFirst term: \nsSecond term: ")
  (occur regex1)
  (save-excursion
    (other-window 1)
    (let ((buffer-read-only nil))
      (forward-line 1)
      (delete-non-matching-lines regex2))))
The save-excursion and other-window combination is a bit of a wart, but it seemed easier than hardcoding the name of the *Occur* buffer (which won't always be the same; you can have several occur buffers) or switching there just to fetch the buffer name, then Doing the Right Thing with set-buffer etc.
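The underlying idea, "AND" as a chain of filters where each term narrows the surviving lines, can be sketched outside Emacs as well. The following is Python rather than Emacs Lisp, and lines_matching_all is a made-up name; it mirrors the Racket function from the question but does not mutate the original list.

```python
import re


def lines_matching_all(lines, *patterns):
    """Keep only the lines in which every pattern is found somewhere."""
    compiled = [re.compile(p) for p in patterns]
    return [line for line in lines if all(rx.search(line) for rx in compiled)]


text = [
    "this is line number one",
    "this line contains two keyword",
    "this line has both one and two keywords",
    "this line contains neither",
    "another two & one words line",
]
print(lines_matching_all(text, "one", "two"))
```

Because each pattern is searched independently, the order of "one" and "two" within a line does not matter, which is what the one.*two\|two.*one alternation has to spell out by hand.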

Open file in racket and use regex on said file to print matches

I have been trying to use regular expressions in Racket on a text file full of random words separated by the end-of-line character \n. I'm trying to read in the file as a string or list (whichever is easiest and most intuitive) and use a regex to print all the words in the file of length 6 that do not contain a certain letter (in this case the letter t). Below you can see how I read in the file, but I am not sure how to use the resulting list because of the lack of variables. You can also see below that I try a test with a regex whose actual outcome is #f, when I want the words grumpy and foobar returned and stumpy excluded.
#lang racket
(require 2htdp/batch-io)
(require racket/match)
;(file->string "words.txt");;reads in a file to a string
;(file->list "words.txt);; reads in a file to a list
(define (listMatches)
(regexp-match #rx"\b[^<t> | ^<T> | ^<\n>]{<6>}\b" "grumpy\nstumpy\nfoobar" )
)
I am very new to Racket and would love some input, useful links, and any other help.
I would not use a regex at all, but rather use for/list in combination with string-length and string-contains? to solve the problem. The overall solution looks something like this:
(call-with-input-file* "words.txt"
  (lambda (f)
    (for/list ([i (in-lines f)]
               #:when (and (= (string-length i) 6)
                           (not (string-contains? i "t"))))
      i)))
call-with-input-file* takes a procedure, and in this case binds f to the open file port. The port is closed for us when the procedure returns, so we do not need to close the file ourselves when we are done with it.
Finally, string-contains? was added relatively recently to Racket. If you need to support older versions of Racket, you can use regexp-match to just search for "t", which is a much simpler pattern than the one in the question.
One of the things Racket regular expressions can take as a value to match a regular expression against is an input port. This means you can look for matches in a file without having to first read from it; the matching code will do that part for you. Combine with using multi-line mode so that ^ and $ match after and before newlines as well as the very beginning and end of the input, and you get a simple approach using regexp-match* and a RE that matches 6 non-t characters on a line by themselves:
#lang racket/base
(require racket/port)
;;; Using a string port to demonstrate
(define input "grumpy\nstumpy\nfoobar")
(define (list-matches inp)
  (map bytes->string/utf-8 (regexp-match* #px"(?m:^[^t]{6}$)" inp)))
(println (call-with-input-string input list-matches)) ; '("grumpy" "foobar")
The big thing to remember about using an input port is that what it returns are byte strings; you have to convert them to strings yourself.
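For comparison, a sketch of the same multi-line-mode match in Python (not Racket). SIX_NO_T and six_letter_words_without_t are illustrative names. The character class here also excludes T and the newline itself, which is a deliberate choice of this sketch so that a match can never straddle two short lines.

```python
import re

# (?m) makes ^ and $ match at line boundaries; [^tT\n]{6} is six characters
# that are not "t", "T", or a newline.
SIX_NO_T = re.compile(r"(?m)^[^tT\n]{6}$")


def six_letter_words_without_t(text):
    """Return the lines that are exactly six characters long and contain no t."""
    return SIX_NO_T.findall(text)


print(six_letter_words_without_t("grumpy\nstumpy\nfoobar"))
```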

Rewriting C macro code with VIM search & replace

I've got a file that uses an outdated macro to read 32 bit integers,
READ32(dest, src)
I need to replace all calls with
dest = readUint32(&src);
I'm trying to write a SED style Vim search & replace command, but not having luck.
I can match the 1st part using READ32([a-z]\+, cmd) using the / search prompt, but it does not seem to match in the :s syntax.
Here's what I finally figured out to work:
:%s/READ32(\(\a\+\),\(\a\+\))/\1 = readUint32(\&\2);/
The trick is wrapping the values you want to store in \1 and \2 in \( and \). The other trick is that you have to escape the & operator as \&, because a bare & in a Vim replacement means "the whole match". The closing ) has to be part of the pattern as well, otherwise it is left behind after the replacement.
EDIT: improved further as I refined it:
:%s/READ32(\(\w\+\),\s*\(\w\+\))/\1 = readUint32(\&\2);/
Changed \a to \w as I had variables with _ in them.
Added \s* to take care of white space issues between the , and second variable.
Now just trying to deal with c++ style variables of style class.variable.subvariable
EDIT 2:
replaced \w with [a-zA-Z0-9_.] to catch all of the ways my variables were named.
This should do what you want or at least get you started:
%s-READ32(\s*\(\i\+\)\s*,\s*\(\i\+\)\s*)-\1 = readUint32(\&\2);-g
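The capture-group mechanics are the same in most regex dialects. As a sketch only, here is the equivalent rewrite in Python (not Vim); READ32_CALL and rewrite_read32 are made-up names, and the sketch assumes each call is already followed by a semicolon in the source, so the replacement does not add one.

```python
import re

# Two capture groups: the destination and the source expression.
# [\w.]+ allows names such as class.variable.subvariable.
READ32_CALL = re.compile(r"READ32\(\s*([\w.]+)\s*,\s*([\w.]+)\s*\)")


def rewrite_read32(source):
    """Rewrite READ32(dest, src) calls as dest = readUint32(&src)."""
    # \1 and \2 refer to the captured groups; "&" is a literal character in
    # Python's replacement syntax, unlike in Vim.
    return READ32_CALL.sub(r"\1 = readUint32(&\2)", source)


print(rewrite_read32("READ32(dest, src);"))
```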
I'd do the macro style again: hit * to 'highlight' search for READ32.
Now we are going to record a macro into register q (qq to start, q to stop):
n (move to next match)
cwreadUint32Esc (change the function name)
wwdt, (delete the first argument)
"_dw (remove the redundant ,)
bbPa=Esc (insert the result variable appending = before readUint32)
A; (append ; to the end of the line)
Now you can just repeat the macro (1000@q).

regexp for elisp

In Emacs I would like to write some regexp that does the following:
First, return a list of all dictionary words that can be formed within "hex space". By this I mean:
#000000 - #ffffff
so #00baba would be a word (that can be looked up in the dictionary)
so would #baba00
and #abba00
and #0faded
...where trailing and leading 0's are considered irrelevant. How would I write this? Is my question clear enough?
Second, I would like to generate a list of words that can be made using numbers as letters:
0 = o
1 = i
3 = e
4 = a
...and so on. How would I write this?
First, load your dictionary. I'll assume that you're using /usr/share/dict/words, which is nearly always installed by default when you're running Linux. It lists one word per line, which is a very handy format for this sort of thing.
Next run M-x keep-lines. It'll ask you for a regular expression and then delete any line that doesn't match it. Use the regex ^[a-f]\{,6\}$ and it will filter out anything that can't be part of a color.
Specifically, the ^ makes the regex start at the beginning of the line, the [a-f] matches any one character that is between a and f (inclusive), the \{,6\} lets it match between 0 and 6 instances of the previous item (in this case the character class [a-f]), and finally the $ tells it that the next thing must be the end of the line.
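The same filter can be sketched outside Emacs. The following Python is illustrative only; HEX_WORD, hex_words and as_colour are made-up names, and the sketch uses {1,6} instead of {,6} so that empty lines are dropped too. Padding with trailing zeros follows the question's remark that leading and trailing 0's are irrelevant.

```python
import re

# One to six letters, all of them in the range a-f.
HEX_WORD = re.compile(r"^[a-f]{1,6}$")


def hex_words(words):
    """Keep words spelled only with a-f and at most six letters long."""
    return [w for w in words if HEX_WORD.match(w)]


def as_colour(word):
    """Pad a hex word with trailing zeros into a #rrggbb string."""
    return "#" + word.ljust(6, "0")


candidates = ["faded", "baba", "emacs", "accede", "defaced", ""]
print([as_colour(w) for w in hex_words(candidates)])
```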
This will return a list of all instances of #000000 - #ffffff in the buffer, although this pattern may not be restrictive enough for your purposes.
(let ((my-colour-list nil))
  (save-excursion
    (goto-char (point-min))
    (while (re-search-forward "#[0-9a-fA-F]\\{6\\}" nil t)
      (add-to-list 'my-colour-list (match-string-no-properties 0)))
    my-colour-list))
I'm not actually certain that this is what you were asking for. What do you mean by "dictionary"?
A form that will return you a hash table with all the elements you specify in it could be this:
(let ((hash-table (make-hash-table :test 'equal)))
  (dotimes (i (expt 256 3))
    (puthash (concat "#" (format "%06x" i)) t hash-table))
  hash-table)
I'm not sure how Emacs will cope with that many elements (about 16 million). As you don't want the 0's, you can generate the space without the zero-padded format and strip the leading and trailing 0's. I don't know what you want to do with the rest of the numbers. You can then write the function step by step, like this:
(defun find-hex-space ()
  (let (return-list)
    (dotimes (i (expt 256 3))
      (let* ((hex-number (strip-zeros (format "%x" i)))
             (found-word (gethash hex-number *dictionary*)))
        (if found-word (push found-word return-list))))
    return-list))
Function strip-zeros is easy to write, and here I suppose your words are in a hash called *dictionary*. strip-zeros could be something like this:
(defun strip-zeros (string)
  (let ((sm (string-match "^0*\\(.*?\\)0*$" string)))
    (if sm (match-string 1 string) string)))
I don't quite understand your second question. Would those words also have to fit in the hex space? Would you then consider only words formed from digits, or would you also include the letters in the word?
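If the second question means "which dictionary words can be spelled as a hex colour once o, i, e and a are written as 0, 1, 3 and 4", one possible reading can be sketched as follows. This is Python rather than Emacs Lisp, the mapping covers only the four pairs given in the question, and LETTER_TO_DIGIT and to_hex_spelling are made-up names.

```python
# Only the substitutions listed in the question; extend the table as needed.
LETTER_TO_DIGIT = str.maketrans({"o": "0", "i": "1", "e": "3", "a": "4"})
HEX_DIGITS = set("0123456789abcdef")


def to_hex_spelling(word):
    """Return the digit-substituted spelling if it fits in six hex digits, else None."""
    spelled = word.lower().translate(LETTER_TO_DIGIT)
    if spelled and len(spelled) <= 6 and set(spelled) <= HEX_DIGITS:
        return spelled
    return None


for word in ["coffee", "bodice", "emacs"]:
    print(word, to_hex_spelling(word))
```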