mit-scheme multiple regex matches in string

I'm using MIT/GNU Scheme 9.2. If I define a string:
(define a-string "00:00 11:11 22:22")
I can match and get the first time a pattern appears:
(re-match-extract a-string
                  (re-string-match "..:.." a-string)
                  0)
;Value 3: "00:00"
That's great, but I want to match the other times "..:.." appears. It seems like there should be some simple way, but am I missing something? Do I need to write a recursive function that matches the first pattern then cuts it off and runs the match on the rest of the string until it's exhausted?
What I would like to end up with is a list that looks like:
("00:00" "11:11" "22:22")

At some point between when 9.2 was current and now (11.2 is the most recent version as of this writing), the regular expression support of MIT-Scheme was overhauled, and the re-XXX functions used in the question no longer exist. Instead, it supports some basic matching functions and SRFI-115 regular expressions, both of which use variations of an s-expression-based syntax instead of traditional string-based REs. So this is really a "how to get a list of all matches using SRFI-115" answer.
The key here is regexp-fold, which invokes a function for each non-overlapping match in a string:
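(define (all-matches re str)
  (regexp-fold re
               ;; kons: called once per non-overlapping match;
               ;; prepend the whole match (submatch 0) to the accumulator
               (lambda (i match s matches)
                 (cons (regexp-match-submatch match 0) matches))
               '()
               str
               ;; finish: the list was built in reverse, so restore match order
               (lambda (i match s matches) (reverse! matches))))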
;;; Returns: ("00:00" "11:11" "22:22")
(all-matches (rx any any #\: any any) "00:00 11:11 22:22")
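For comparison, here is a rough Python analogue of the same "visit every non-overlapping match and accumulate" idea. It is only a sketch: re.finditer yields the matches left to right, so appending m.group(0) plays the role of the kons procedure above, and all_matches is just an illustrative name.

```python
import re


def all_matches(pattern, text):
    """Return the full text of every non-overlapping match, left to right."""
    matches = []
    for m in re.finditer(pattern, text):
        # m.group(0) corresponds to (regexp-match-submatch match 0)
        matches.append(m.group(0))
    return matches


print(all_matches(r"..:..", "00:00 11:11 22:22"))  # ['00:00', '11:11', '22:22']
```

When the pattern has no capture groups, re.findall(pattern, text) returns the same list directly.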

Related

(Clojure) Count how many times any character appear in a string

I am trying to write a function (char-count) which takes a pattern and a string, then returns a number (count) which represents how many times any of the characters in the pattern appear in the string.
For example:
(char-count "Bb" "Best buy")
would return 2, since there is 1 match for B and 1 match for b, so added together we get 2.
(char-count "AaR" "A Tale of Recursion")
would return 3, and so on.
I tried using re-seq in my function, but it seems to work only for contiguous strings: (re-seq #"Bb" "Best Buy") only looks for the sequence Bb, not for each individual character.
This is what my function looks like so far:
(defn char-count [pattern text]
  (count (re-seq (#(pattern)) text)))
But it does not do what I want. Can anybody help?
P.S. Very new to Clojure (and functional programming in general).
You don't need anything nearly as powerful as a regular expression here, so just use the simple tools your programming language comes with: sets and functions. Build a set of the characters you want to find, and count how many characters from the input string are in the set.
(defn char-count [chars s]
  ;; (set chars) doubles as a predicate: it returns the char if present, nil otherwise
  (count (filter (set chars) s)))
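The same set-membership approach carries over directly to Python. This is a sketch for comparison only; char_count simply mirrors the Clojure name.

```python
def char_count(chars, s):
    """Count the characters of s that appear anywhere in chars (no regex involved)."""
    wanted = set(chars)
    return sum(1 for c in s if c in wanted)


print(char_count("Bb", "Best buy"))              # 2
print(char_count("AaR", "A Tale of Recursion"))  # 3
```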
Try wrapping the characters in [...] within the RegEx:
(count (re-seq #"[Bb]" "Best buy"))
Or, since you need that pattern to be dynamic:
(count (re-seq (re-pattern (str "[" pattern "]")) text))
Note, however, that this might not work properly if the pattern contains characters that are special inside a character class, such as [, ], \, - or ^; you'd have to escape each of them with a backslash (written \\ inside a Clojure string literal).
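In Python, re.escape can do that escaping for you. The sketch below builds the character class from arbitrary input; count_in_class is an illustrative name, and the empty-input guard is needed because [] is not a valid class.

```python
import re


def count_in_class(chars, text):
    """Count characters of text that occur in chars, via an escaped regex character class."""
    if not chars:
        return 0  # "[]" would be a syntax error, so handle the empty set up front
    # re.escape backslash-escapes ], \, -, ^ and other specials
    pattern = "[" + re.escape(chars) + "]"
    return len(re.findall(pattern, text))


print(count_in_class("Bb", "Best buy"))  # 2
print(count_in_class("-^]", "a-b^c]d"))  # 3
```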

AND operator for text search in Emacs

I am new to Emacs. I can search for text and show all matching lines in a separate buffer using M-x occur. I can also search for multiple text items with the OR operator, as in one\|two, which finds lines containing "one" or "two" (as explained in "Emacs occur mode search for multiple strings"). How can I search for lines containing both "one" and "two"? I tried \& and \&&, but they do not work. Will I need to create a macro or function for this?
Edit:
I tried writing a function for above in Racket (a Scheme derivative). Following works:
#lang racket
(define text '("this is line number one"
               "this line contains two keyword"
               "this line has both one and two keywords"
               "this line contains neither"
               "another two & one words line"))
(define (srch . lst) ; takes variable number of arguments
  (for ((i lst))
    (set! text (filter (λ (x) (string-contains? x i)) text)))
  text)
(srch "one" "two")
Output:
'("this line has both one and two keywords" "another two & one words line")
But how can I put this in Emacs Lisp?
Regex doesn't support "and", because it has very limited usefulness and weird semantics when you try to use it in any nontrivial regex. The usual fix is to just search for one.*two\|two.*one, or, in the case of *Occur*, to search for one and then run M-x delete-non-matching-lines with two.
(You have to mark the *Occur* buffer as writable before you can do this. read-only-mode is a toggle; the default keybinding is C-x C-q. At least in my Emacs, you have to move the cursor away from the first line or you'll get "Text is read-only".)
(defun occur2 (regex1 regex2)
  "Search for lines matching both REGEX1 and REGEX2 by way of `occur'.
We first (occur regex1) and then do (delete-non-matching-lines regex2) in the
*Occur* buffer."
  (interactive "sFirst term: \nsSecond term: ")
  (occur regex1)
  (save-excursion
    (other-window 1)
    (let ((buffer-read-only nil))
      ;; skip the read-only header line; deletion works from point onward
      (forward-line 1)
      (delete-non-matching-lines regex2))))
The save-excursion and other-window combination is a bit of a wart, but it seemed easier than hardcoding the name of the *Occur* buffer (it won't always be *Occur*, since you can have several occur buffers) or switching there just to fetch the buffer name and then Doing the Right Thing with set-buffer etc.
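Outside Emacs, the two strategies from this answer (an alternation covering both orders, or filtering the lines that match the first term by the second term) can be compared with a short Python sketch using the sample lines from the question; the function names are illustrative.

```python
import re

LINES = [
    "this is line number one",
    "this line contains two keyword",
    "this line has both one and two keywords",
    "this line contains neither",
    "another two & one words line",
]


def and_by_alternation(lines):
    # No AND operator, so spell out both orders: one.*two OR two.*one
    pattern = re.compile(r"one.*two|two.*one")
    return [line for line in lines if pattern.search(line)]


def and_by_filtering(lines, *terms):
    # Like occur followed by delete-non-matching-lines: narrow the hits once per term
    hits = list(lines)
    for term in terms:
        hits = [line for line in hits if re.search(term, line)]
    return hits


# Both print the two lines that contain "one" and "two"
print(and_by_alternation(LINES))
print(and_by_filtering(LINES, "one", "two"))
```

The filtering version extends to any number of terms, whereas the alternation needs one branch per ordering of the terms.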

Open file in racket and use regex on said file to print matches

I have been trying to use regular expressions in Racket on a text file full of random words, one per line (separated by the end-of-line character \n). I'm trying to read in the file as a string or a list (whichever is easiest and most intuitive) and use a regex to print all the words in the file that are 6 characters long and do not contain a certain letter (in this case t). Below you can see how I read in the file, but I am not sure how to use the resulting list because of the lack of variables. You can also see a test regex whose result is #f, when I actually want the words grumpy and foobar returned, excluding stumpy.
#lang racket
(require 2htdp/batch-io)
(require racket/match)
;(file->string "words.txt") ;; reads in a file to a string
;(file->list "words.txt") ;; reads in a file to a list
(define (listMatches)
  (regexp-match #rx"\b[^<t> | ^<T> | ^<\n>]{<6>}\b" "grumpy\nstumpy\nfoobar"))
I am very new to Racket and would love some input, useful links, and any other help.
I would not use a regex at all, but rather for/list in combination with string-length and string-contains? to solve the problem. The overall solution looks something like this:
(call-with-input-file* "words.txt"
  (lambda (f)
    (for/list ([i (in-lines f)]
               #:when (and (= (string-length i) 6)
                           (not (string-contains? i "t"))))
      i)))
call-with-input-file* takes a procedure and, in this case, binds f to the open file port. The port is closed automatically when the procedure returns (or control escapes from it), so we do not need to close the file ourselves.
Finally, string-contains? was added to Racket relatively recently. If you need to support older versions of Racket, you can instead use regexp-match to search for just "t", which is a much simpler regex than the one in the question.
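A comparable Python sketch of this filter-without-regex approach: a with block closes the file automatically, much like call-with-input-file*. To stay self-contained it reads from an io.StringIO standing in for words.txt (an assumption for the demo); with a real file you would use open("words.txt") instead.

```python
import io


def six_letter_words_without_t(lines):
    """Keep lines that are exactly 6 characters long and contain no 't'."""
    result = []
    for line in lines:
        word = line.rstrip("\n")  # iterating a file keeps the trailing newline
        if len(word) == 6 and "t" not in word:
            result.append(word)
    return result


# io.StringIO stands in for open("words.txt"); the with block closes it either way
with io.StringIO("grumpy\nstumpy\nfoobar\n") as f:
    print(six_letter_words_without_t(f))  # ['grumpy', 'foobar']
```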
Racket's regular-expression functions can match against an input port as well as a string. This means you can look for matches in a file without first reading it yourself; the matching code does that part for you. Combine this with multi-line mode, so that ^ and $ match after and before newlines as well as at the very beginning and end of the input, and you get a simple approach using regexp-match* and a RE that matches lines consisting of exactly 6 non-t characters:
#lang racket/base
(require racket/port)
;;; Using a string port to demonstrate
(define input "grumpy\nstumpy\nfoobar")
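(define (list-matches inp)
  ;; \n is excluded from the class so a match can never span two lines;
  ;; matching on a port yields byte strings, so decode each one
  (map bytes->string/utf-8 (regexp-match* #px"(?m:^[^t\n]{6}$)" inp)))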
(println (call-with-input-string input list-matches)) ; '("grumpy" "foobar")
The big thing to remember when matching against an input port is that the matches are returned as byte strings; you have to convert them to strings yourself.
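Python's re has the same multi-line flag, so the regex-only variant can be sketched as follows. Given a str, Python returns str matches, so no decoding step is needed. In Python, [^t] would also match a newline, which is why \n is excluded from the class explicitly. SIX_NON_T and list_matches are illustrative names.

```python
import re

# re.MULTILINE makes ^ and $ also match just after and before each newline
SIX_NON_T = re.compile(r"^[^t\n]{6}$", re.MULTILINE)


def list_matches(text):
    # Excluding \n keeps a match from stitching two short lines together
    return SIX_NON_T.findall(text)


print(list_matches("grumpy\nstumpy\nfoobar"))  # ['grumpy', 'foobar']
```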

Emacs Lisp Regular Expression Match everything until character sequence

I am trying to write a regular expression in Emacs Lisp that will match multi-line comments.
For example:
{-
Some
Comment
Here
-}
Should match as a comment. Basically, anything between {- and -}. I am able to almost do it by doing the following:
"\{\-[^-]*\-\}"
However, this will fail if the comment includes a - that is not immediately followed by }.
So, it will not match correctly in this case:
{-
Some -
Comment -
Here -
-}
Which should be valid.
Basically, I would like to match on everything (including newlines) up to the sequence -}
Thanks in advance!
Doesn't this work for you? {-[^-]*[^}]*-}
(You didn't specify things precisely, so I'm just guessing what you want. Must the {- and -} be at the line beginning? Must they be on lines by themselves? Must there be some other characters between them? Etc. For example, should it match a line like this? {--}?)
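A different technique from the one in this answer, shown here only as a Python sketch: a non-greedy quantifier stops at the first -} after {-, and re.DOTALL lets . cross newlines, which covers the "Some -" case from the question. COMMENT is an illustrative name.

```python
import re

# {- then as few characters as possible (newlines included) up to the first -}
COMMENT = re.compile(r"\{-.*?-\}", re.DOTALL)

src = "{-\nSome -\nComment -\nHere -\n-}\ncode\n{- second -}"
for m in COMMENT.finditer(src):
    print(repr(m.group(0)))
```

In an Emacs Lisp string the same idea is commonly written "{-\\(?:.\\|\n\\)*?-}", because . never matches a newline in Emacs regexps.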
I made a toolkit for such cases; it comes with a parser, beg-end.el.
What remains is to write functions that determine the beginning and the end of the object, respectively.
In pseudo-code:
(put 'MY-FORM 'beginning-op-at
     (lambda () (search-backward "{-")))
(put 'MY-FORM 'end-op-at
     (lambda () (search-forward "-}")))
Once that is done, the form should be available, i.e. it can be copied and returned like this:
(defun MY-FORM-atpt (&optional arg)
  " "
  (interactive "p")
  (ar-th 'MY-FORM arg))
Get it here:
https://launchpad.net/s-x-emacs-werkstatt/
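The two properties in the pseudo-code boil down to "search backward for the opening delimiter, forward for the closing one". Here is that idea as a plain Python function, independent of beg-end.el; delimited_at is a hypothetical name, and the delimiters default to {- and -}.

```python
def delimited_at(text, pos, opener="{-", closer="-}"):
    """Return the opener...closer span that contains index pos, or None."""
    # Beginning: nearest opener that starts at or before pos (like search-backward)
    start = text.rfind(opener, 0, pos + len(opener))
    if start == -1:
        return None
    # End: first closer after that opener (like search-forward)
    end = text.find(closer, start + len(opener))
    if end == -1 or pos >= end + len(closer):
        return None  # no closer, or pos lies after the comment has already closed
    return text[start:end + len(closer)]


s = "x {- a\nb -} y"
print(delimited_at(s, 5))   # the whole "{- a\nb -}" span
print(delimited_at(s, 12))  # None: index 12 is after the comment
```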

Emacs: regular expression replacing to change case (in scripts)

This is related to
Emacs: regular expression replacing to change case
My additional problem is that I need to script the search-and-replace, but the "\,()" solution works (for me) only when used interactively (Emacs 24.2.1). Inside a script it gives the error: "Invalid use of \' in replacement text".
I usually write a "perform-replace" to some file to be loaded when needed. Something like:
(perform-replace "<\\([^>]+\\)>" "<\\,(downcase \1)>" t t nil 1 nil (point-min) (point-max))
It should be possible to call a function to generate the replacement (p. 741 of the Emacs Lisp manual), but I've tried many variations of the following with no luck:
(defun myfun ()
  (downcase (match-string 0)))
(perform-replace "..." (myfun . ()) t t nil)
Can anyone help?
Constructs like \,() are only allowed in interactive calls to query-replace, which is why Emacs complains in your case.
The documentation of perform-replace says you should not use it in Lisp code unless you want query-replace-style prompting, and proposes a simpler alternative, on which we can build the following code:
(while (re-search-forward "<\\([^>]+\\)>" nil t)
  ;; FIXEDCASE = t keeps our downcased text as is; LITERAL = t inserts it verbatim
  (replace-match (downcase (match-string 0)) t t))
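The same search-and-replace loop has a compact Python counterpart: re.sub accepts a function that receives each match object and returns the replacement string, which is inserted literally, so no template syntax like \,(...) is involved. A sketch for comparison; downcase_tags is an illustrative name.

```python
import re


def downcase_tags(text):
    # Like (downcase (match-string 0)) applied to every match of <...>
    return re.sub(r"<([^>]+)>", lambda m: m.group(0).lower(), text)


print(downcase_tags("<HTML><Body>Keep THIS</Body></HTML>"))  # <html><body>Keep THIS</body></html>
```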
If you still want to interactively query the user about the replacements, using perform-replace like you did is probably the right thing to do. There were a few different problems in your code:
As stated in the Elisp manual, the replacement function must take two arguments (the data you provide in the cons cell, and the number of replacements already made).
As stated in the documentation of query-replace-regexp (or the elisp manual), you need to ensure that case-fold-search or case-replace is set to nil so that the case pattern is not transferred to the replacement.
You need to quote the cons cell (myfun . nil), otherwise it will be interpreted as a function call and evaluated too early.
Here is a working version:
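(let ((case-fold-search nil))
  (perform-replace "<\\([^>]+\\)>"
                   ;; (FUNCTION . ARGUMENT): a one-element list is a cons with a nil cdr;
                   ;; both arguments are unused, hence the underscore prefixes
                   `(,(lambda (_data _count)
                        (downcase (match-string 0))))
                   t t nil))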
C-h f perform-replace says:
Don't use this in your own program unless you want to query and set the mark
just as `query-replace' does. Instead, write a simple loop like this:
(while (re-search-forward "foo[ \t]+bar" nil t)
  (replace-match "foobar"))
Now the "<\\,(downcase \1)>" needs to be replaced by an Elisp expression that builds the proper string, such as (format "<%s>" (downcase (match-string 1))).
If you do need the querying, then you might like to try C-M-% f\(o\)o RET bar \,(downcase \1) baz RET, and then C-x ESC ESC (repeat-complex-command) to see what arguments were constructed during the interactive call.
You'll discover (even better if you click on replace.el in C-h f perform-replace to see the source code of the function) that the replacements argument can take the form (FUNCTION . ARGUMENT). More specifically, the code includes a comment giving some details:
;; REPLACEMENTS is either a string, a list of strings, or a cons cell
;; containing a function and its first argument. The function is
;; called to generate each replacement like this:
;; (funcall (car replacements) (cdr replacements) replace-count)
;; It must return a string.
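To make that calling convention concrete, here is a small Python model of it. This is not how replace.el is implemented: perform_replace_model is a hypothetical name, only the string and (function, argument) forms are modelled (the list-of-strings form is left out), and because Python has no global match data, the current match is passed as an extra third argument, whereas the Emacs function reads it with match-string.

```python
import re


def perform_replace_model(pattern, replacements, text):
    """Replace each match of pattern, modelling two REPLACEMENTS forms:
    a plain string (inserted literally here) or a (function, argument) pair."""
    count = 0  # replacements made so far, playing the role of replace-count

    def repl(m):
        nonlocal count
        if isinstance(replacements, str):
            new = replacements
        else:
            func, arg = replacements
            # Emacs: (funcall (car replacements) (cdr replacements) replace-count)
            # The extra m argument is an assumption standing in for Emacs's match data
            new = func(arg, count, m)
        count += 1
        return new

    return re.sub(pattern, repl, text)


def downcase_match(data, count, m):
    return m.group(0).lower()


print(perform_replace_model(r"<([^>]+)>", (downcase_match, None), "<A> and <B>"))  # <a> and <b>
print(perform_replace_model(r"x", (lambda data, count, m: f"{data}{count}", "item"), "x x x"))  # item0 item1 item2
```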