Emacs (TeX): how to search and replace a whole region? - regex

I bother you to have some tips for this problem: I'm working in Latex with a very dirty code, generated by writer2latex (quite good programme, anyway) and, using Emacs, I'm trying to query-replace multiple lines of code, for instance:
{\centering [Warning: Image ignored] % Unhandled or unsupported graphics:
%\includegraphics[width=11.104cm,height=8.23cm]{img34}
have to become:
\begin{figure}[tpb]
\begin{center}
\includegraphics[width=\textwidth]{img34}
Using M-x re-builder, I found out that I could underline the whole region I need to query-replace with the string: \{.*centering.*c-qc-j.*cm] but, if I M-x replace-regexp using this, I only get: Invalid regexp: "Invalid content of \\{\\}"
Any suggestion about how to perform the query? I have a HUGE amount of lines like these to replace... :-)

You're getting this error message because in Emacs' regular expressions the curly braces\{ and \} have special meaning. These braces are used to specify that the part of the regexp immediately before the braces should be matched a certain number of times.
From the GNU Emacs documentation on regexps:
\{n\}
is a postfix operator specifying n repetitions [...]
\{n,m\}
is a postfix operator specifying between n and m repetitions [...]
If you want your regexp to actually match a curly brace, do not escape it with a leading slash:
{.*centering.*C-q C-j.*cm]
In order to use a backslash in the replacement string you have to escape it with another backslash. (When doing this in code, it quickly becomes quite ugly because inside a double-quoted string backslashes themselves have to be escaped already. However, since you are doing your replacements interactively, the double escaping is not necessary and thus two backslashs are enough.)
M-C-% {.*centering.*C-q C-j.*cm] RET \\begin{figure}[tpb]C-q C-j\\begin{center}C-q C-j\\includegraphics[width=\\textwidth] RET

Make sure the re-syntax is "read", C-c tab. Remove the initial backslash. Now the regexp should work if you yank it into replace-regexp

Related

How to do a negative lookbehind within a %r<…>-delimited regexp in Ruby?

I like the %r<…> delimiters because it makes it really easy to spot the beginning and end of the regex, and I don't have to escape any /. But it seems that they have an insurmountable limitation that other delimiters don't have?
Every other delimiter imaginable works fine:
/(?<!foo)/
%r{(?<!foo)}
%r[(?<!foo)]
%r|(?<!foo)|
%r/(?<!foo)/
But when I try to do this:
%r<(?<!foo)>
it gives this syntax error:
unterminated regexp meets end of file
Okay, it probably doesn't like that it's not a balanced pair, but how do you escape it such that it does like it?
Does something need to be escaped?
According to wikibooks.org:
Any single non-alpha-numeric character can be used as the delimiter,
%[including these], %?or these?, %~or even these things~.
By using this notation, the usual string delimiters " and ' can appear
in the string unescaped, but of course the new delimiter you've chosen
does need to be escaped.
Indeed, escaping is needed in these examples:
%r!(?<\!foo)!
%r?(\?<!foo)?
But if that were the only problem, then I should be able to escape it like this and have it work:
%r<(?\<!foo)>
But that yields this error:
undefined group option: /(?\<!foo)/
So maybe escaping is not needed/allowed? wikibooks.org does list %<pointy brackets> as one of the exceptions:
However, if you use
%(parentheses), %[square brackets], %{curly brackets} or
%<pointy brackets> as delimiters then those same delimiters
can appear unescaped in the string as long as they are in balanced
pairs
Is it a problem with balanced pairs?
Balanced pairs are no problem as long as you are doing something in the Regexp that requires them, like...
%r{(?<!foo{1})} # repetition quantifier
%r[(?<![foo])] # character class
%r<(?<name>foo)> # named capture group
But what if you need to insert a left-side delimiter ({, [, or <) inside the regex? Just escape it, right? Ruby seems to have no problem with escaped unbalanced delimiters most of the time...
%r{(?<!foo\{)}
%r[(?<!\[foo)]
%r<\<foo>
It's just when you try to do it in the middle of the "group options" (which I guess is what the <! characters are classified as here) following a (? that it doesn't like it:
%r<(?\<!foo)>
# undefined group option: /(?\<!foo)/
So how do you do that then and make Ruby happy? (without changing the delimiters)
Conclusion
The workaround is easy. I'll just change this particular regex to just use something else instead like %r{…} instead.
But the questions remain...
Is there really no way to escape the < here?
Are there really some regular expression that are simply impossible to write using certain delimiters like %r<…>?
Is %r<…> the only regular expression delimiter pair that has this problem (where some regular expressions are impossible to write when using it). If you know of a similar example with %r{…}/%r[…], do share!
Version info
Not that it probably matters since this syntax probably hasn't changed, but I'm using:
⟫ ruby -v
ruby 2.6.0p0 (2018-12-25 revision 66547) [x86_64-linux]
Reference:
https://ruby-doc.org/core-2.6.3/Regexp.html
% Notation
As others have mentioned, seems like an oversight based on how this character differs from other paired boundaries.
As far as "Is there really no way to escape the < here?" there is a way... but you're not going to like it:
%r<(?#{'<'}!foo)> == %r((?<!foo))
Using interpolation to insert the < character seems to work. But given that there are much better options, I would avoid it unless you were planning on splitting the regex into sections anyway...

Emacs: Replace regex returns 0 matches

I have a bunch of column names from a SQL query, and I want to get rid of everything before the AS using Emacs. In other words, I want to go from
MAX(CASE WHEN maintenance.work_order IS NULL THEN 1 ELSE 0 END) AS Has_work_order,
to
Has_work_order,
I used re-builder to create a simple regex: "\.\*AS " which highlights the appropriate parts of the buffer. However, when I select the entire buffer and run query-replace-regexp using M-x query-replace-regexp <RET> "\.\*AS " <RET> "" <RET>, Emacs displays a Replaced 0 occurrences message.
What am I doing wrong?
By using re-builder (which is a good idea) to create a regexp for interactive use, you are then getting confused between the different regexp syntax options. re-builder defaults to read syntax (which you would use when writing elisp code), whereas for interactive use you want string syntax.
Refer to Why do regular expressions created with the regex builder use syntax different from the interactive regular expressions? for explanation and clarification.
In read syntax, \.\*AS represents the regexp .*AS (because . and * are not special when reading strings, so those backslashes are redundant); but in string syntax \.\*AS is the regexp \.\*AS in which the . and * characters which are special to regexps have been escaped, and therefore lose their special meaning, and will instead match literal . and * characters in the text.
Note, however, that when entering a regexp interactively you should not include the surrounding double-quote characters " that are present in re-builder even for its string syntax mode. If you enter the " characters interactively, then the regexp will be matching text that contains those " characters.
I was able to do this with the following:
M-x query-replace-regexp <RET> \(.+\)AS <RET> <RET>
Note the cursor must be above the line(s) that need replacing. I've not used this before, but it's interactive (pressing 'y' for each replace, this may be able to be done automatically/globally, but I've not played around with it.

Basic Vim - Search and Replace text bounded by specific characters

Say I wanted to replace :
"Christoph Waltz" = "That's a Bingo";
"Gary Coleman" = "What are you talking about, dear Willis?";
to just have :
"Christoph Waltz"
"Gary Coleman"
i.e. I want to remove all the characters including and after the = and the ;
I thought the regex for finding the pattern would be \=.*?\;. In vim, I tried :
:%s/\=.*?\;$//g
but it gave me an Invalid Command error and Nothing after \=. How do I remove the above text? Apologies, but I'm new to this.
Vim's regular expression dialect is different; its escaping is optimized for text searches. See :help perl-patterns for a comparison with Perl regular expressions. As #EvergreenTree has noted, you can influence the escaping behavior with special atoms; cp. :help /\v.
For your example, the non-greedy match is .\{-}, not .*?, and, as mentioned, you mustn't escape several literal characters:
:%s/ =.\{-};$//
(The /g flag is superfluous, too; there can be only one match anchored to the end via $.)
This is because of vim's weird handling of regexes by default. Instead of \= interpreting as a literal =, it interprets it as a special regex character. To make vim's regex system work more normally, you can prefix it with \v, which is "very magic" mode. This should do the trick:
%s/\v\=.*\;$//g
I won't explain how vim enterprets every single character in very magic mode here, but you can read about it in this help topic:
:help /magic

"Or" operator in emacs regexp with `M-x occur`

I've wanted to get an overview of my Python program, so I ran:
M-x occur
and for the regexp I've supplied
(def)|(class)
which failed to match anything.
I've also looked at this post, and tried
(def)\\|(class)
but this failed to match anything either...
How do I get M-x occur to match class or def?
You have to use single backslash (without parentheses, or you should escape parentheses as well):
def\|class

regex implementation to replace group with its lowercase version

Is there any implementation of regex that allow to replace group in regex with lowercase version of it?
If your regex version supports it, you can use \L, like so in a POSIX shell:
sed -r 's/(^.*)/\L\1/'
In Perl, you can do:
$string =~ s/(some_regex)/lc($1)/ge;
The /e option causes the replacement expression to be interpreted as Perl code to be evaluated, whose return value is used as the final replacement value. lc($x) returns the lowercased version of $x. (Not sure but I assume lc() will handle international characters correctly in recent Perl versions.)
/g means match globally. Omit the g if you only want a single replacement.
If you're using an editor like SublimeText or TextMate1, there's a good chance you may use
\L$1
as your replacement, where $1 refers to something from the regular expression that you put parentheses around. For example2, here's something I used to downcase field names in some SQL, getting everything to the right of the 'as' at the end of any given line. First the "find" regular expression:
(as|AS) ([A-Za-z_]+)\s*,$
and then the replacement expression:
$1 '\L$2',
If you use Vim (or presumably gvim), then you'll want to use \L\1 instead of \L$1, but there's another wrinkle that you'll need to be aware of: Vim reverses the syntax between literal parenthesis characters and escaped parenthesis characters. So to designate a part of the regular expression to be included in the replacement ("captured"), you'll use \( at the beginning and \) at the end. Think of \ as—instead of escaping a special character to make it a literal—marking the beginning of a special character (as with \s, \w, \b and so forth). So it may seem odd if you're not used to it, but it is actually perfectly logical if you think of it in the Vim way.
1 I've tested this in both TextMate and SublimeText and it works as-is, but some editors use \1 instead of $1. Try both and see which your editor uses.
2 I just pulled this regex out of my history. I always tweak regexen while using them, and I can't promise this the final version, so I'm not suggesting it's fit for the purpose described, and especially not with SQL formatted differently from the SQL I was working on, just that it's a specific example of downcasing in regular expressions. YMMV. UAYOR.
Several answers have noted the use of \L. However, \E is also worth knowing about if you use \L.
\L converts everything up to the next \U or \E to lowercase. ... \E turns off case conversion.
(Source: https://www.regular-expressions.info/replacecase.html )
So, suppose you wanted to use rename to lowercase part of some file names like this:
artist_-_album_-_Song_Title_to_be_Lowercased_-_MultiCaseHash.m4a
artist_-_album_-_Another_Song_Title_to_be_Lowercased_-_MultiCaseHash.m4a
you could do something like:
rename -v 's/^(.*_-_)(.*)(_-_.*.m4a)/$1\L$2\E$3/g' *
In Perl, there's
$string =~ tr/[A-Z]/[a-z]/;
Most Regex implementations allow you to pass a callback function when doing a replace, hence you can simply return a lowercase version of the match from the callback.