Non-greedy parsing with fnparse - clojure

I'm trying to parse strings with fnparse and I need to act on a character differently if it is at the end of a word. For this I have rules thus:
(def a-or-s
  (rep* (alt (lit \a) (lit \s))))
(def ends-with-s
  (conc a-or-s (lit \s)))
I try to match the string "aas". This however doesn't parse because the rep* is greedy and swallows up the last character of the word and the conc rule doesn't work. How can I get round this and match these constructions properly?

For that you'll need the followed-by rule. Basically, you want to repeatedly match 'a' or 's' but without consuming the last token. Here's the code to do that:
(def a-or-s
  (lit-alt-seq "as")) ;; same as (alt (lit \a) (lit \s))
(def ends-with-s
  (conc
    (rep* (conc a-or-s (followed-by a-or-s)))
    (lit \s)))
We can refactor that code to create a non-greedy version of rep* like this:
(defn rep*? [subrule]
(rep* (conc subrule (followed-by subrule))))
Then use it instead of rep* and your original code should work as expected. After trying it though...
user> (rule-match (conc (rep*? a-or-s) (lit \s)) identity #(identity %2) {:remainder "aaaaaaaasss"})
([(\a \a) (\a \a) (\a \a) (\a \a) (\a \a) (\a \a) (\a \a) (\a \s) (\s \s) (\s \s)] \s)
...you might ask "what's happening to the output?" Well, rep*? is giving us pairs of tokens because that's what we asked for. This can be fixed by using invisi-conc instead of conc:
(defn rep*? [subrule]
  (rep* (invisi-conc subrule (followed-by subrule))))
user> (rule-match (conc (rep*? a-or-s) (lit \s)) identity #(identity %2) {:remainder "aaaaaaaasss"})
([\a \a \a \a \a \a \a \a \s \s] \s)
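The lookahead idea is not specific to fnparse. As a rough illustration only, here is a minimal Python sketch of the same technique. The helper names (`lit_alt`, `followed_by`, `rep_nongreedy`, `ends_with_s`) are made up for this example and do not come from any library. A rule is modelled as a function that takes a string and a position and returns `(value, new_position)`, or `None` on failure.

```python
def lit_alt(chars):
    """Rule that matches any single character from `chars`."""
    def rule(text, pos):
        if pos < len(text) and text[pos] in chars:
            return text[pos], pos + 1
        return None
    return rule


def followed_by(subrule):
    """Lookahead: succeeds where `subrule` succeeds, but consumes nothing."""
    def rule(text, pos):
        result = subrule(text, pos)
        return None if result is None else (result[0], pos)
    return rule


def rep_nongreedy(subrule):
    """Repeat `subrule` only while another `subrule` match follows it."""
    lookahead = followed_by(subrule)

    def rule(text, pos):
        values = []
        while True:
            result = subrule(text, pos)
            if result is None:
                break
            value, next_pos = result
            if lookahead(text, next_pos) is None:
                break  # leave the last token for the rule that comes next
            values.append(value)
            pos = next_pos
        return values, pos
    return rule


def ends_with_s(text):
    """Hypothetical analogue of (conc (rep*? a-or-s) (lit \\s))."""
    values, pos = rep_nongreedy(lit_alt("as"))(text, 0)
    if pos < len(text) and text[pos] == "s":
        return values, "s"
    return None


print(ends_with_s("aas"))  # (['a', 'a'], 's')
```

Like the invisi-conc version, the loop keeps only the matched token and throws the lookahead result away, so the output is a flat list rather than pairs.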

Related

Realizing a Clojure lazy sequence (string) in the REPL

I'm trying to realize a lazy sequence (which should generate a single string) in the REPL, with no luck. The original code works fine:
(def word_list ["alpha" "beta" "gamma" "beta" "alpha" "alpha" "beta" "beta" "beta"])
(def word_string (reduce str (interpose " " word_list)));
word_string ; "alpha beta gamma beta alpha alpha beta beta beta"
But not wanting to leave well enough alone, I wondered what else would work, and tried removing the reduce, thinking that str might have the same effect. It did not...
(def word_string (str (interpose " " word_list)))
word_string ; "clojure.lang.LazySeq@304a9790"
I tried the obvious, using reduce again, but that didn't work either. There's another question about realizing lazy sequences that seemed promising, but nothing I tried worked:
(reduce str word_string) ; "clojure.lang.LazySeq@304a9790"
(apply str word_string) ; "clojure.lang.LazySeq@304a9790"
(println word_string) ; "clojure.lang.LazySeq@304a9790"
(apply list word_string) ; [\c \l \o \j \u \r \e \. \l \a \n \g \. \L \a \z \y...]
(vec word_string) ; [\c \l \o \j \u \r \e \. \l \a \n \g \. \L \a \z \y...]
(apply list word_string) ; (\c \l \o \j \u \r \e \. \l \a \n \g \. \L \a \z \y...)
(take 100 word_string) ; (\c \l \o \j \u \r \e \. \l \a \n \g \. \L \a \z \y...)
The fact that some of the variations gave me the characters in "clojure.lang.LazySeq" also worries me - did I somehow lose the actual string value, and my reference just has the value "clojure.lang.LazySeq"? If not, how do I actually realize the value?
To clarify: given that word_string is assigned to a lazy sequence, how would I realize it? Something like (realize word_string), say, if that existed.
Update: based on the accepted Answer and how str works, it turns out that I can get the actual sequence value, not just its name:
(reduce str "" word_string) ; "alpha beta gamma beta alpha alpha beta beta beta"
Yes, this is terrible code. :) I was just trying to understand what was going on, why it was breaking, and whether the actual value was still there or not.
What you want is:
(def word_string (apply str (interpose " " word_list)))
Look at the documentation of str:
With no args, returns the empty string. With one arg x, returns
x.toString(). (str nil) returns the empty string. With more than
one arg, returns the concatenation of the str values of the args.
So you're calling .toString on the sequence, which generates that representation instead of applying str to the elements of the sequence as arguments.
BTW, the more idiomatic way of doing what you want is:
(clojure.string/join " " word_list)
Also, a string is not a lazy sequence. interpose returns a lazy sequence, and you're calling .toString on that.
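The same distinction shows up in other languages. As a loose analogy only (Python generators are not Clojure lazy seqs), the sketch below uses a hypothetical `interpose` generator written for this example: calling `str` on it gives the string form of the generator object itself, while `"".join` walks its elements.

```python
def interpose(separator, items):
    """Lazily yield the items with `separator` between neighbours."""
    first = True
    for item in items:
        if not first:
            yield separator
        first = False
        yield item


word_list = ["alpha", "beta", "gamma"]

# String form of the generator object, much like .toString on a LazySeq.
as_object = str(interpose(" ", word_list))

# Consumes the generator element by element, much like (apply str ...).
as_text = "".join(interpose(" ", word_list))

print(as_object.startswith("<generator object"))  # True
print(as_text)                                    # alpha beta gamma
```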
You don't need to do anything special to realize a lazy seq in Clojure; you just use it in any place that a seq is expected.
user=> (def word-list ["alpha" "beta" "gamma" "beta" "alpha" "alpha" "beta" "beta" "beta"])
#'user/word-list
user=> (def should-be-a-seq (interpose " " word-list))
#'user/should-be-a-seq
user=> (class should-be-a-seq)
clojure.lang.LazySeq
So we have a seq. If I use it in any way that goes through all of the values in the seq, it will end up fully realized, e.g.
user=> (seq should-be-a-seq)
("alpha" " " "beta" " " "gamma" " " "beta" " " "alpha" " " "alpha" " " "beta" " " "beta" " " "beta")
It's still a seq though, in fact it's still the same object that it was before.
user=> (str should-be-a-seq)
"clojure.lang.LazySeq@304a9790"
As Diego Basch mentioned, calling str on something is just like calling .toString, which for the LazySeq class is apparently just the default .toString inherited from Object.
Since it's a seq, you can use it like any seq, whether it's been fully realized previously or not. e.g.
user=> (apply str should-be-a-seq)
"alpha beta gamma beta alpha alpha beta beta beta"
user=> (reduce str should-be-a-seq)
"alpha beta gamma beta alpha alpha beta beta beta"
user=> (str/join should-be-a-seq)
"alpha beta gamma beta alpha alpha beta beta beta"
user=> (str/join (map #(.toUpperCase %) should-be-a-seq))
"ALPHA BETA GAMMA BETA ALPHA ALPHA BETA BETA BETA"

replace-regex to remove first element from camel case function name

I have code like this:
<?= $this->Article->getShowComments(); ?>
and need to convert it to
{{ Article.showComments }}
Everything is easy with a simple regex, except converting .getFooBar to .fooBar. How can I do this in Elisp? Is there something like JavaScript's replace with a function?
Set case-fold-search to nil:
M-x set-variable RET case-fold-search RET nil
Now, use the following command for the transformation:
M-x replace-regexp RET \_<\(?:[a-z0-9]\|\s_\)+\([A-Z]\) RET \,(downcase \1)
The first argument is the regexp to search for, the second is the replacement text.
The regexp is a little complicated, but essentially matches a symbol starting (\_< is symbol start) with either lowercase letters or digits ([a-z0-9]) or non-word symbol characters (\s_), followed by a single uppercase letter [A-Z]. The first non-grouping parenthesis \(?:…\) just groups the or-operator \|.
The second parenthesis around the uppercase letter is grouping, which creates the “reference” \1 for use in our replacement text.
We wrap the reference to the matched uppercase letter in the function downcase to convert it to lowercase. The \, in the replacement text just tells Emacs that the following text is a proper sexp to be evaluated and not just a simple string.
Edit 1) The rx variant of this RE is probably easier to understand:
(and symbol-start
     (one-or-more (or (any "a-z" "0-9") (syntax symbol)))
     (group-n 1 (any "A-Z")))
Unfortunately you can't use rx expressions in replace-regexp.
Edit 2) replace-regexp is intended for interactive use only. It should not be used non-interactively, i.e. from Emacs Lisp. Notably, when used non-interactively, this function will not compile the replacement text, so the special \, escape will not work!
From Emacs Lisp, use re-search-forward and replace-match:
(let ((case-fold-search nil)
      (regexp (rx symbol-start
                  (one-or-more (or (any "a-z" "0-9") (syntax symbol)))
                  (group-n 1 (any "A-Z")))))
  (while (re-search-forward regexp nil 'no-error)
    (replace-match (downcase (match-string 1)) 'fixed-case 'literal)))
Make sure to wrap this in with-current-buffer to make it operate on the right buffer.
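For comparison, the question mentions JavaScript-style replacement with a function. The same idea, a replacement computed from the match, can be sketched in Python with `re.sub` and a callable. This is only an illustration: the pattern below is an approximation (Python's `\b` is a word boundary, not Emacs's symbol start `\_<`), and `strip_prefix` is a name made up for the example.

```python
import re

# Assumed pattern: a run of lowercase letters, digits or underscores that
# starts at a word boundary, followed by one uppercase letter (group 1).
PREFIX_THEN_UPPER = re.compile(r"\b[a-z0-9_]+([A-Z])")


def strip_prefix(text):
    """Drop the lowercase prefix and lowercase the letter that followed it."""
    return PREFIX_THEN_UPPER.sub(lambda m: m.group(1).lower(), text)


print(strip_prefix("getShowComments"))  # showComments
```

As in the Emacs version, the whole match (prefix plus capital) is replaced by the lowercased capital, and matching is case-sensitive.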
Here's what I've got in my attic for this:
(defun CamelCase->underscore (str)
  (mapconcat 'identity (CamelCase->list str) "_"))
(defun CamelCase->list (str)
  (let ((case-fold-search nil)
        (pos 0)
        words)
    (while (string-match ".[^A-Z]*" str pos)
      (let ((word (downcase (match-string-no-properties 0 str))))
        (if (> (length word) 1)
            (push word words)
          (setq words (cons (concat (car words) word)
                            (cdr words)))))
      (setq pos (match-end 0)))
    (reverse words)))
(CamelCase->underscore "getShowComments")
;; => "get_show_comments"
Just needs a bit of adapting for your case.
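To make the splitting step easier to follow, here is a rough Python rendering of the same logic. It is a sketch under the assumption that the behaviour should mirror the Emacs Lisp above (split before each capital, glue one-character pieces onto the previous word); the function names are chosen for this example.

```python
import re


def camel_case_to_list(name):
    """Split before each capital letter and lowercase the pieces."""
    words = []
    for piece in re.findall(r".[^A-Z]*", name):
        piece = piece.lower()
        if len(piece) > 1 or not words:
            words.append(piece)
        else:
            # A lone capital is appended to the previous word.
            words[-1] += piece
    return words


def camel_case_to_underscore(name):
    return "_".join(camel_case_to_list(name))


print(camel_case_to_underscore("getShowComments"))  # get_show_comments
```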
And here's the adaptation:
(defun CamelCase->something (str)
  (let ((case-fold-search nil)
        (pos 0)
        words)
    (while (string-match ".[^A-Z]*" str pos)
      (let ((word (match-string-no-properties 0 str)))
        (if (> (length word) 1)
            (push word words)
          (setq words (cons (concat (car words) word)
                            (cdr words)))))
      (setq pos (match-end 0)))
    (setq words (cdr (reverse words)))
    (mapconcat 'identity
               (cons (downcase (car words)) (cdr words))
               "")))

re-search-backward using (?:x)|(?:y) doesn't work?

I'm trying to make an inferior mode derived from comint-mode automatically "linkify" two variations of file:line:col in the output.
To do so, I have one regexp with two subpatterns in non-capture groups, joined by |. Each subpattern has exactly three capture groups:
(concat
"\\(?:" ;; pattern 1 e.g. "; /path/to/file:1:1"
"; \\([^:]+\\):\\([0-9]+\\):\\([0-9]+\\)"
"\\)"
"\\|"
"\\(?:" ;; pattern 2 e.g. "location: #(<path:/path/to/file> 0 1"
"location: (#<path:\\([^>]+\\)> \\([0-9]+\\) \\([0-9]+\\)"
"\\)")
The regexp matches text that fits the first subpattern, but it never matches text that fits the second subpattern.
The mere presence of the first pattern seems to prevent the second (?: ...) pattern from ever matching. Only if I comment out the first pattern does the second one match.
If I remove the first subpattern, leaving
"\\(?:" ;; pattern 2
"location: (#<path:\\([^>]+\\)> \\([0-9]+\\) \\([0-9]+\\)"
"\\)"
it does match, so I know that the second subpattern is correct.
Or, if I retain a first subpattern but change it to be something like "XXX", with no captures:
"\\(?:" ;; pattern 1
"XXXX"
"\\)"
"\\|"
"\\(?:" ;; pattern 2
"location: (#<path:\\([^>]+\\)> \\([0-9]+\\) \\([0-9]+\\)"
"\\)"
it also works. The first subpattern doesn't match example input containing no "XXXX", and the second subpattern is tried next and does match.
I'm stumped. Am I misunderstanding something about regexps in general, or is this unique to Emacs?
More context in case it matters:
(define-derived-mode inferior-foo-mode comint-mode "Inferior Foo"
...
(add-hook 'comint-output-filter-functions 'linkify)
...)
(defun linkify (str)
(save-excursion
(end-of-buffer)
(re-search-backward (concat
"\\(?:" ;; pattern 1
"; \\([^:]+\\):\\([0-9]+\\):\\([0-9]+\\)"
"\\)"
"\\|"
"\\(?:" ;; pattern 2
"location: (#<path:\\([^>]+\\)> \\([0-9]+\\) \\([0-9]+\\)"
"\\)")
(- (buffer-size) (length str))
t)
(when (and (match-beginning 0)
(match-beginning 1) (match-beginning 2) (match-beginning 3))
(make-text-button
(match-beginning 1) (match-end 3)
'file (buffer-substring-no-properties (match-beginning 1) (match-end 1))
'line (buffer-substring-no-properties (match-beginning 2) (match-end 2))
'col (buffer-substring-no-properties (match-beginning 3) (match-end 3))
'action #'go-to-file-line-col
'follow-link t))))
You are counting wrongly. The capturing groups of the second non-capturing group are (match-string 4), (match-string 5), and (match-string 6).
Note also, that
(buffer-substring-no-properties (match-beginning 1) (match-end 1))
is equivalent to the short clear version
(match-string-no-properties 1)
I would propose something like:
(let ((m1 (or (match-string-no-properties 1) (match-string-no-properties 4)))
      (m2 (or (match-string-no-properties 2) (match-string-no-properties 5)))
      (m3 (or (match-string-no-properties 3) (match-string-no-properties 6))))
  (when (and m1 m2 m3) ...
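The numbering rule is not unique to Emacs: capture groups are numbered by their opening parenthesis across the whole regexp, regardless of which alternative they sit in. A small Python sketch shows the same effect. The pattern is a transliteration of the one in the question, assuming single spaces, and `file_line_col` is a name made up for this example.

```python
import re

LOCATION = re.compile(
    r"(?:; ([^:]+):([0-9]+):([0-9]+))"                    # groups 1-3
    r"|"
    r"(?:location: \(#<path:([^>]+)> ([0-9]+) ([0-9]+))"  # groups 4-6
)


def file_line_col(text):
    """Return (file, line, col) from whichever alternative matched."""
    m = LOCATION.search(text)
    if m is None:
        return None
    # Groups belonging to the alternative that did not match are None.
    file = m.group(1) or m.group(4)
    line = m.group(2) or m.group(5)
    col = m.group(3) or m.group(6)
    return file, int(line), int(col)


print(file_line_col("; /path/to/file:1:1"))
# ('/path/to/file', 1, 1)
print(file_line_col("location: (#<path:/path/to/file> 0 1"))
# ('/path/to/file', 0, 1)
```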
Your regexp does not correspond to its comment.
The comment has #(; the regexp has (#. The comment has two spaces after location:; the regexp has 3 spaces. If you make them correspond then it seems to work fine. E.g.:
(concat
"\\(?:" ;; pattern 1
"; \\([^:]+\\):\\([0-9]+\\):\\([0-9]+\\)"
"\\)"
"\\|"
"\\(?:" ;; pattern 2
"location: #(<path:\\([^>]+\\)> \\([0-9]+\\) \\([0-9]+\\)"
"\\)")

Elisp mechanism for converting PCRE regexps to emacs regexps

I admit significant bias toward liking PCRE regexps much better than Emacs regexps, if for no other reason than that when I type a '(' I pretty much always want a grouping operator. And, of course, \w and similar are SO much more convenient than the other equivalents.
It would be crazy to expect to change the internals of Emacs, of course. But it should be possible to convert a PCRE expression to an Emacs expression, I'd think, and do all the needed conversions so I can write:
(defun my-super-regexp-function ...
(search-forward (pcre-convert "__\\w: \\d+")))
(or similar).
Does anyone know of an Elisp library that can do this?
Edit: Selecting a response from the answers below...
Wow, I love coming back from 4 days of vacation to find a slew of interesting answers to sort through! I love the work that went into the solutions of both types.
In the end, it looks like both the exec-a-script and the straight elisp versions of the solutions would work, but from a pure speed and "correctness" standpoint the elisp version is certainly the one that people would prefer (myself included).
https://github.com/joddie/pcre2el is the up-to-date version of this answer.
pcre2el or rxt (RegeXp Translator or RegeXp Tools) is a utility for working with regular expressions in Emacs, based on a recursive-descent parser for regexp syntax. In addition to converting (a subset of) PCRE syntax into its Emacs equivalent, it can do the following:
convert Emacs syntax to PCRE
convert either syntax to rx, an S-expression based regexp syntax
untangle complex regexps by showing the parse tree in rx form and highlighting the corresponding chunks of code
show the complete list of strings (productions) matching a regexp, provided the list is finite
provide live font-locking of regexp syntax (so far only for Elisp buffers – other modes on the TODO list)
The text of the original answer follows...
Here's a quick and ugly Emacs lisp solution (EDIT: now located more permanently here). It's based mostly on the description in the pcrepattern man page, and works token by token, converting only the following constructions:
parenthesis grouping ( .. )
alternation |
numerical repeats {M,N}
string quoting \Q .. \E
simple character escapes: \a, \c, \e, \f, \n, \r, \t, \x, and \ + octal digits
character classes: \d, \D, \h, \H, \s, \S, \v, \V
\w and \W left as they are (using Emacs' own idea of word and non-word characters)
It doesn't do anything with more complicated PCRE assertions, but it does try to convert escapes inside character classes. In the case of character classes including something like \D, this is done by converting into a non-capturing group with alternation.
It passes the tests I wrote for it, but there are certainly bugs, and the method of scanning token-by-token is probably slow. In other words, no warranty. But perhaps it will do enough of the simpler part of the job for some purposes. Interested parties are invited to improve it ;-)
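Before the full code, the token-by-token approach can be illustrated with a much smaller Python sketch. It is not a port of the implementation: it assumes a tiny subset (grouping, alternation, braces, \d, \D, \w, \W), ignores character classes and every other construct listed above, and `pcre_to_emacs` is a name invented for this example.

```python
def pcre_to_emacs(pcre):
    """Translate a very small PCRE subset into Emacs regexp syntax."""
    # In PCRE these are operators when bare; Emacs needs a backslash.
    operators = {"(": r"\(", ")": r"\)", "|": r"\|", "{": r"\{", "}": r"\}"}
    classes = {"d": "[0-9]", "D": "[^0-9]"}
    out = []
    i = 0
    while i < len(pcre):
        ch = pcre[i]
        if ch == "\\" and i + 1 < len(pcre):
            nxt = pcre[i + 1]
            if nxt in classes:
                out.append(classes[nxt])
            elif nxt in operators:
                # An escaped operator is a literal in PCRE; in Emacs the
                # bare character is the literal.
                out.append(nxt)
            else:
                # \w, \W and any other escape are copied through unchanged
                # (an assumption; the real code handles many more cases).
                out.append("\\" + nxt)
            i += 2
        elif ch in operators:
            out.append(operators[ch])
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(pcre_to_emacs(r"__\w: \d+"))   # __\w: [0-9]+
print(pcre_to_emacs(r"(a|b){2,3}"))  # \(a\|b\)\{2,3\}
```

The printed results are the regexps themselves; inside an Elisp string literal each backslash would still need doubling. The full Emacs Lisp implementation from the original answer follows.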
(eval-when-compile (require 'cl))
(defvar pcre-horizontal-whitespace-chars
(mapconcat 'char-to-string
'(#x0009 #x0020 #x00A0 #x1680 #x180E #x2000 #x2001 #x2002 #x2003
#x2004 #x2005 #x2006 #x2007 #x2008 #x2009 #x200A #x202F
#x205F #x3000)
""))
(defvar pcre-vertical-whitespace-chars
(mapconcat 'char-to-string
'(#x000A #x000B #x000C #x000D #x0085 #x2028 #x2029) ""))
(defvar pcre-whitespace-chars
(mapconcat 'char-to-string '(9 10 12 13 32) ""))
(defvar pcre-horizontal-whitespace
(concat "[" pcre-horizontal-whitespace-chars "]"))
(defvar pcre-non-horizontal-whitespace
(concat "[^" pcre-horizontal-whitespace-chars "]"))
(defvar pcre-vertical-whitespace
(concat "[" pcre-vertical-whitespace-chars "]"))
(defvar pcre-non-vertical-whitespace
(concat "[^" pcre-vertical-whitespace-chars "]"))
(defvar pcre-whitespace (concat "[" pcre-whitespace-chars "]"))
(defvar pcre-non-whitespace (concat "[^" pcre-whitespace-chars "]"))
(eval-when-compile
(defmacro pcre-token-case (&rest cases)
"Consume a token at point and evaluate corresponding forms.
CASES is a list of `cond'-like clauses, (REGEXP FORMS
...). Considering CASES in order, if the text at point matches
REGEXP then moves point over the matched string and returns the
value of FORMS. Returns `nil' if none of the CASES matches."
(declare (debug (&rest (sexp &rest form))))
`(cond
,@(mapcar
(lambda (case)
(let ((token (car case))
(action (cdr case)))
`((looking-at ,token)
(goto-char (match-end 0))
,@action)))
cases)
(t nil))))
(defun pcre-to-elisp (pcre)
"Convert PCRE, a regexp in PCRE notation, into Elisp string form."
(with-temp-buffer
(insert pcre)
(goto-char (point-min))
(let ((capture-count 0) (accum '())
(case-fold-search nil))
(while (not (eobp))
(let ((translated
(or
;; Handle tokens that are treated the same in
;; character classes
(pcre-re-or-class-token-to-elisp)
;; Other tokens
(pcre-token-case
("|" "\\|")
("(" (incf capture-count) "\\(")
(")" "\\)")
("{" "\\{")
("}" "\\}")
;; Character class
("\\[" (pcre-char-class-to-elisp))
;; Backslash + digits => backreference or octal char?
("\\\\\\([0-9]+\\)"
(let* ((digits (match-string 1))
(dec (string-to-number digits)))
;; from "man pcrepattern": If the number is
;; less than 10, or if there have been at
;; least that many previous capturing left
;; parentheses in the expression, the entire
;; sequence is taken as a back reference.
(cond ((< dec 10) (concat "\\" digits))
((>= capture-count dec)
(error "backreference \\%s can't be used in Emacs regexps"
digits))
(t
;; from "man pcrepattern": if the
;; decimal number is greater than 9 and
;; there have not been that many
;; capturing subpatterns, PCRE re-reads
;; up to three octal digits following
;; the backslash, and uses them to
;; generate a data character. Any
;; subsequent digits stand for
;; themselves.
(goto-char (match-beginning 1))
(re-search-forward "[0-7]\\{0,3\\}")
(char-to-string (string-to-number (match-string 0) 8))))))
;; Regexp quoting.
("\\\\Q"
(let ((beginning (point)))
(search-forward "\\E")
(regexp-quote (buffer-substring beginning (match-beginning 0)))))
;; Various character classes
("\\\\d" "[0-9]")
("\\\\D" "[^0-9]")
("\\\\h" pcre-horizontal-whitespace)
("\\\\H" pcre-non-horizontal-whitespace)
("\\\\s" pcre-whitespace)
("\\\\S" pcre-non-whitespace)
("\\\\v" pcre-vertical-whitespace)
("\\\\V" pcre-non-vertical-whitespace)
;; Use Emacs' native notion of word characters
("\\\\[Ww]" (match-string 0))
;; Any other escaped character
("\\\\\\(.\\)" (regexp-quote (match-string 1)))
;; Any normal character
("." (match-string 0))))))
(push translated accum)))
(apply 'concat (reverse accum)))))
(defun pcre-re-or-class-token-to-elisp ()
"Consume the PCRE token at point and return its Elisp equivalent.
Handles only tokens which have the same meaning in character
classes as outside them."
(pcre-token-case
("\\\\a" (char-to-string #x07)) ; bell
("\\\\c\\(.\\)" ; control character
(char-to-string
(- (string-to-char (upcase (match-string 1))) 64)))
("\\\\e" (char-to-string #x1b)) ; escape
("\\\\f" (char-to-string #x0c)) ; formfeed
("\\\\n" (char-to-string #x0a)) ; linefeed
("\\\\r" (char-to-string #x0d)) ; carriage return
("\\\\t" (char-to-string #x09)) ; tab
("\\\\x\\([A-Za-z0-9]\\{2\\}\\)"
(char-to-string (string-to-number (match-string 1) 16)))
("\\\\x{\\([A-Za-z0-9]*\\)}"
(char-to-string (string-to-number (match-string 1) 16)))))
(defun pcre-char-class-to-elisp ()
"Consume the remaining PCRE character class at point and return its Elisp equivalent.
Point should be after the opening \"[\" when this is called, and
will be just after the closing \"]\" when it returns."
(let ((accum '("["))
(pcre-char-class-alternatives '())
(negated nil))
(when (looking-at "\\^")
(setq negated t)
(push "^" accum)
(forward-char))
(when (looking-at "\\]") (push "]" accum) (forward-char))
(while (not (looking-at "\\]"))
(let ((translated
(or
(pcre-re-or-class-token-to-elisp)
(pcre-token-case
;; Backslash + digits => always an octal char
("\\\\\\([0-7]\\{1,3\\}\\)"
(char-to-string (string-to-number (match-string 1) 8)))
;; Various character classes. To implement negative char classes,
;; we cons them onto the list `pcre-char-class-alternatives' and
;; transform the char class into a shy group with alternation
("\\\\d" "0-9")
("\\\\D" (push (if negated "[0-9]" "[^0-9]")
pcre-char-class-alternatives) "")
("\\\\h" pcre-horizontal-whitespace-chars)
("\\\\H" (push (if negated
pcre-horizontal-whitespace
pcre-non-horizontal-whitespace)
pcre-char-class-alternatives) "")
("\\\\s" pcre-whitespace-chars)
("\\\\S" (push (if negated
pcre-whitespace
pcre-non-whitespace)
pcre-char-class-alternatives) "")
("\\\\v" pcre-vertical-whitespace-chars)
("\\\\V" (push (if negated
pcre-vertical-whitespace
pcre-non-vertical-whitespace)
pcre-char-class-alternatives) "")
("\\\\w" (push (if negated "\\W" "\\w")
pcre-char-class-alternatives) "")
("\\\\W" (push (if negated "\\w" "\\W")
pcre-char-class-alternatives) "")
;; Leave POSIX syntax unchanged
("\\[:[a-z]*:\\]" (match-string 0))
;; Ignore other escapes
("\\\\\\(.\\)" (match-string 0))
;; Copy everything else
("." (match-string 0))))))
(push translated accum)))
(push "]" accum)
(forward-char)
(let ((class
(apply 'concat (reverse accum))))
(when (or (equal class "[]")
(equal class "[^]"))
(setq class ""))
(if (not pcre-char-class-alternatives)
class
(concat "\\(?:"
class "\\|"
(mapconcat 'identity
pcre-char-class-alternatives
"\\|")
"\\)")))))
I made a few minor modifications to a perl script I found on perlmonks (to take values from the command line) and saved it as re_pl2el.pl (given below). Then the following does a decent job of converting PCRE to elisp regexps, at least for the non-exotic cases that I tested.
(defun pcre-to-elre (regex)
(interactive "MPCRE expression: ")
(shell-command-to-string (concat "re_pl2el.pl -i -n "
(shell-quote-argument regex))))
(pcre-to-elre "__\\w: \\d+") ;-> "__[[:word:]]: [[:digit:]]+"
It doesn't handle a few "corner" cases like perl's shy {N,M}? constructs, and of course not code execution etc. but it might serve your needs or be a good starting place for such. Since you like PCRE I presume you know enough perl to fix any cases you use often. If not let me know and we can probably fix them.
I would be happier with a script that parsed the regex into an AST and then spit it back out in elisp format (since then it could spit it out in rx format too), but I couldn't find anything doing that and it seemed like a lot of work when I should be working on my thesis. :-) I find it hard to believe that no one has done it though.
Below is my "improved" version of re_pl2el.pl. -i means don't double escape for strings, and -n means don't print a final newline.
#! /usr/bin/perl
#
# File: re_pl2el.pl
# Modified from http://perlmonks.org/?node_id=796020
#
# Description:
#
use strict;
use warnings;
# version 0.4
# TODO
# * wrap converter to function
# * testsuite
#--- flags
my $flag_interactive; # true => no extra escaping of backslashes
if ( int(@ARGV) >= 1 and $ARGV[0] eq '-i' ) {
    $flag_interactive = 1;
    shift @ARGV;
}
if ( int(@ARGV) >= 1 and $ARGV[0] eq '-n' ) {
    shift @ARGV;
} else {
    $\="\n";
}
if ( int(@ARGV) < 1 ) {
    print "usage: $0 [-i] [-n] REGEX";
    exit;
}
my $RE='\w*(a|b|c)\d\(';
$RE='\d{2,3}';
$RE='"(.*?)"';
$RE="\0".'\"\t(.*?)"';
$RE=$ARGV[0];
# print "Perlcode:\t $RE";
#--- encode all \0 chars as escape sequence
$RE=~s#\0#\\0#g;
#--- substitute pairs of backslashes with \0
$RE=~s#\\\\#\0#g;
#--- hide escape sequences of \t,\n,... with
# corresponding ascii code
my %ascii=(
t =>"\t",
n=> "\n"
);
my $kascii=join "|",keys %ascii;
$RE=~s#\\($kascii)#$ascii{$1}#g;
#--- normalize needless escaping
# e.g. from /\"/ to /"/, since it's no difference in perl
# but might confuse elisp
$RE=~s#\\"#"#g;
#--- toggle escaping of 'backslash constructs'
my $bsc='(){}|';
$RE=~s#[$bsc]#\\$&#g; # escape them once
$RE=~s#\\\\##g; # and erase double-escaping
#--- replace character classes
my %charclass=(
w => 'word' , # TODO: emacs22 already knows \w ???
d => 'digit',
s => 'space'
);
my $kc=join "|",keys %charclass;
$RE=~s#\\($kc)#[[:$charclass{$1}:]]#g;
#--- unhide pairs of backslashes
$RE=~s#\0#\\\\#g;
#--- escaping for elisp string
unless ($flag_interactive){
$RE=~s#\\#\\\\#g; # ... backslashes
$RE=~s#"#\\"#g; # ... quotes
}
#--- unhide escape sequences of \t,\n,...
my %rascii= reverse %ascii;
my $vascii=join "|",keys %rascii;
$RE=~s#($vascii)#\\$rascii{$1}#g;
# print "Elispcode:\t $RE";
print "$RE";
#TODO whats the elisp syntax for \0 ???
The closest previous work on this has been extensions to M-x re-builder, see
http://www.emacswiki.org/emacs/ReBuilder
or the work of Ye Wenbin on PDE.
http://cpansearch.perl.org/src/YEWENBIN/Emacs-PDE-0.2.16/lisp/doc/pde.html
Possibly relevant is visual-regexp-steroids, which extends query-replace to use a live preview and allows you to use different regexp backends, including PCRE.

Rotate a list-of-list matrix in Clojure

I'm new to Clojure and functional programming in general. I'm at a loss in how to handle this in a functional way.
I have the following matrix:
(def matrix [[\a \b \c]
[\d \e \f]
[\g \h \i]])
I want to transform it into something like this (rotate counterclockwise):
((\a \d \g)
(\b \e \h)
(\c \f \i ))
I've hacked up this bit that gives me the elements in the correct order. If I could collect the data in a string this way I could then split it up with partition. However I'm pretty sure doseq is the wrong path:
(doseq [i [0 1 2]]
(doseq [row matrix]
(println (get (vec row) i))))
I've dabbled with nested map calls but keep getting stuck with that. What's the correct way to build up a string in Clojure or handle this in an even better way?
What you're trying to achieve sounds like transpose. I'd suggest
(apply map list matrix)
; => ((\a \d \g) (\b \e \h) (\c \f \i))
What does it do?
(apply map list '((\a \b \c) (\d \e \f) (\g \h \i)))
is equivalent to
(map list '(\a \b \c) '(\d \e \f) '(\g \h \i))
which takes the first elements of each of the three lists, calls list on them, then takes the second elements, calls list on them, and so on, and returns a sequence of all the lists generated this way.
A couple more examples of both apply and map can be found on ClojureDocs.
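For readers coming from other languages, the same "spread the rows as arguments, then walk them in lockstep" idea exists elsewhere. A minimal Python sketch, given purely as a comparison with `(apply map list matrix)`; the `transpose` helper is defined here for the example:

```python
matrix = [["a", "b", "c"],
          ["d", "e", "f"],
          ["g", "h", "i"]]


def transpose(rows):
    """zip(*rows) passes each row as a separate argument, like apply."""
    return [list(column) for column in zip(*rows)]


print(transpose(matrix))
# [['a', 'd', 'g'], ['b', 'e', 'h'], ['c', 'f', 'i']]
```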
Taking the matrix transposition solution directly from rosettacode:
(vec (apply map vector matrix))
To see what is going on consider:
(map vector [\a \b \c] [\d \e \f] [\g \h \i])
This works nicely with arbitrary matrix dimensions, although it is not good for significant number crunching; for that you would want to consider using a Java-based matrix manipulation library from Clojure.
You can use core.matrix to do these kinds of matrix manipulations very easily. In particular, there is already a transpose function that does exactly what you want:
Example:
(use 'clojure.core.matrix)
(def matrix [[\a \b \c]
[\d \e \f]
[\g \h \i]])
(transpose matrix)
=> [[\a \d \g]
[\b \e \h]
[\c \f \i]]
Here's one way:
(def transposed-matrix (apply map list matrix))
;=> ((\a \d \g) (\b \e \h) (\c \f \i))
(doseq [row transposed-matrix]
(doall (map println row)))
That produces the same output as your original (printing the columns of matrix).