Regex matching, but not inside a LaTeX environment

I want to replace quotation marks in a LaTeX document. It is written in German, so all quotation marks should have the form "`text"', but some editors of the document have used "text" or ``text'' instead.
The complication is that the document contains highlighted code in lstlisting environments, where the quotation marks must not be replaced.
I have a regex that matches text inside the unwanted quotes, even if it spans multiple words:
((``((\w+\s*)+)'')|("((\w+\s*)+)"))
I also have a regex that matches a string ("asdf" in this case) only if it is not inside an lstlisting environment:
"asdf"(?=((?!\\end\{lstlisting\}).)*\\begin\{lstlisting\}?)
They work fine on their own, but when I combine them like this:
((``((\w+\s*)+)'')|("((\w+\s*)+)"))(?=((?!\\end\{lstlisting\}).)*\\begin\{lstlisting\}?)
some of the quoted strings that should be matched are not, and in addition the whole document is matched.
PS: I am currently using Notepad++ for matching, because it allows . to match \n.
[EDIT]: It works fine as long as I limit the first part to single words:
((``((\w)+)'')|("((\w)+)"))(?=((?!\\end\{lstlisting\}).)*\\begin\{lstlisting\}?)

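To match words separated by whitespace, replace the nested quantifier (\w+\s*)+, which can backtrack excessively, with a single character class: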
(``[\w\s]+''|"[\w\s]+")(?=(?:(?!\\end\{lstlisting\}).)*\\begin\{lstlisting\}?)
If whitespace may appear only between words, and not directly after the opening `` or " or directly before the closing '' or ", unroll the [\w\s]+ part as \w+(?:\s+\w+)*.
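As an alternative to a single lookahead pattern, the same goal can be sketched in a script: split the document at lstlisting environments and rewrite quotes only in the pieces between them. This is a different technique from the regex above, not a translation of it. The name fix_quotes is made up for illustration, and the target form "`text"' is an assumption about the German shorthand the question asks for.

```python
import re

# A complete lstlisting environment, including its contents.
# The capturing group makes re.split keep the listings in the result.
LISTING = re.compile(r"(\\begin\{lstlisting\}.*?\\end\{lstlisting\})", re.DOTALL)

# ``text'' or "text", where text is one or more words separated by whitespace
# (the unrolled form \w+(?:\s+\w+)* mentioned above).
QUOTED = re.compile(r"``(\w+(?:\s+\w+)*)''|\"(\w+(?:\s+\w+)*)\"")


def to_german(match: re.Match) -> str:
    # Exactly one of the two groups is set, depending on the quote style.
    inner = match.group(1) or match.group(2)
    # Assumed target form: "`text"'
    return '"`' + inner + '"' + "'"


def fix_quotes(document: str) -> str:
    parts = LISTING.split(document)
    # With one capturing group, even indices are ordinary text and
    # odd indices are the listings, which are left untouched.
    for i in range(0, len(parts), 2):
        parts[i] = QUOTED.sub(to_german, parts[i])
    return "".join(parts)
```

Unlike the lookahead version, this sketch also handles quotes that appear after the last listing, because it does not require a following \begin{lstlisting}.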

Related

Regex to exclude quoted strings

I know there are tons of similar questions; I have read hundreds, but with my limited knowledge of English and my even more limited knowledge of regex, I am still in the fog.
I need to process a fairly large text file that contains paragraphs in two formats: enclosed in double quotes or not. In both cases a paragraph can contain one or more carriage returns. I have to process only the paragraphs enclosed in double quotes. So "This is \r a phrase" must be processed (I actually have to replace the \r with a dummy character such as '#'), while 'This is \r a comment' must be excluded.
I tried this pattern: "[\s\S(\r)]+"
This correctly selects only the enclosed paragraphs, but the regex debugger does not report the \r group to be replaced.
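Try this pattern: "[\s\S]*(\r)[\s\S]*"
Inside a character class, ( and ) are literal characters, so the (\r) in your pattern is not a capturing group; the group has to sit outside the brackets.
If the file contains a literal backslash followed by r rather than an actual carriage return, you need to escape the \ character (\\r), since \r on its own means a carriage return in regex.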
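If a script is an option, the replacement can be restricted to quoted paragraphs by matching each paragraph first and replacing inside the match. The following is a minimal sketch, assuming the paragraphs are delimited by double quotes, contain no nested double quotes, and hold real carriage return characters; the name mark_carriage_returns is hypothetical.

```python
import re

# One double-quoted paragraph; [^"] also matches \r and \n.
QUOTED = re.compile(r'"[^"]*"')


def mark_carriage_returns(text: str, marker: str = "#") -> str:
    # Replace every carriage return inside a double-quoted paragraph;
    # text outside double quotes is left as it is.
    return QUOTED.sub(lambda m: m.group(0).replace("\r", marker), text)
```

For example, mark_carriage_returns('"This is \r a phrase"') replaces the carriage return with '#', while a paragraph in single quotes is returned unchanged.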

Find and replace using regular expressions - remove double spaces between letters only

I am trying to do this in the Atom editor (1.39.1 x64, Ubuntu 18.04), though I assume it applies to other text editors that support regular expressions.
Say we have this text:
This text has some double-spaces. Lets try to remove them.
But not after a full-stop or if three or more spaces.
Which we would like to change to:
This text has some double-spaces. Lets try to remove them.
But not after a full-stop or if three or more spaces.
With regex enabled in Find (the .* button), all occurrences are correctly found using [a-zA-Z]  [a-zA-Z] (two spaces between the classes). But what goes in the Replace field to enforce the logic: 1st letter, single space, 2nd letter?
You can use this
([a-z])\s{2}([a-z])
and replace with $1 $2
If your editor supports lookarounds you can use
(?<=[a-z])\s{2}(?=[a-z])
and replace with a single space character.
Note: don't forget to use the i flag for case insensitivity, or change the character class to [a-zA-Z].
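Both variants can be tried outside the editor. The sketch below uses Python's re module, where backreferences in the replacement are written \1 and \2 instead of $1 and $2; the function names and sample strings are made up for illustration.

```python
import re


def collapse_with_groups(text: str) -> str:
    # Capture the letters on both sides and put them back around one space.
    return re.sub(r"([a-z])\s{2}([a-z])", r"\1 \2", text, flags=re.IGNORECASE)


def collapse_with_lookarounds(text: str) -> str:
    # Match only the two whitespace characters; the letters are checked
    # by the lookbehind and lookahead but are not part of the match.
    return re.sub(r"(?<=[a-z])\s{2}(?=[a-z])", " ", text, flags=re.IGNORECASE)


sample = "This text has  some double-spaces.  Lets try  to remove them. But not   three."
print(collapse_with_groups(sample))
print(collapse_with_lookarounds(sample))
```

One practical difference: the capturing version consumes the letters, so in a string like "a  b  c" the second double space is skipped because b was already part of the previous match. The lookaround version does not have this problem.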

Regular expression to find and replace wrong quotation marks

I have a document that was copied and pasted from MS Word. All the quotations were copied as ''something'', which makes a mess of my LaTeX document; they need to be ``something''.
Is it possible to make a regular expression that finds all these ''something'' where something can be anything (including symbols, numbers etc.), and a regular expression that replaces it with the correct quotation? I am using Sublime Text, which can use regex directly in the editor.
The regex below matches every string enclosed in pairs of single quotes and captures everything except the two opening single quotes, that is, the text together with the closing ''. Replacing each match with two backticks followed by the contents of group 1 gives the desired result.
Regex:
''(.*?'')
Replacement string:
``$1
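The same substitution can be checked in a script. This is a minimal sketch using Python's re module, where the backreference is written \1 rather than $1; the function name and the sample sentence are made up for illustration.

```python
import re


def fix_latex_quotes(text: str) -> str:
    # '' opens the quotation; the lazy .*? stops at the next '' pair,
    # which stays inside group 1 and is therefore kept in the output.
    return re.sub(r"''(.*?'')", r"``\1", text)


print(fix_latex_quotes("He said ''hello world'' and then ''42%''."))
```

Note that . does not match a newline by default, so a quotation that spans several lines is left unchanged unless the re.DOTALL flag is added.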

Using regex to replace unescaped quotes

I'm trying to use a regex search and replace to find any unescaped quotation marks and replace them with escaped quotation marks. This is not in any particular language - just using regex to search and replace in Sublime Text 2.
I can find them just fine with this regex:
([a-zA-Z0-9!##$%^&*()_+=-\?><:;\/])\"
Trying to replace is giving me some headaches. I thought this would work:
$0\\\"
but it's adding an extra quote in (or leaving the previous one there somehow).
e.g.,
e"
becomes
e"\"
instead of just
e\"
What the hey? I can't seem to find a combination in the replacement that will work!
In the replacement, $0 refers to the entire match, including the quote. You should use $1 instead, which refers to the first capturing group, that is, just the character immediately before the quote. So your replacement string would be $1\\\" (the same as yours, with $0 changed to $1).
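The difference between the whole match and the first group can be reproduced in a script. The sketch below uses Python's re module, whose replacement syntax differs from Sublime Text's (\g<0> and \1 instead of $0 and $1), and a simplified character class in place of the one in the question; both are assumptions made for illustration.

```python
import re

# A letter or digit directly followed by a double quote
# (simplified from the character class in the question).
PATTERN = re.compile(r'([a-zA-Z0-9])"')

text = 'e"'

# \g<0> is the whole match, so the original quote is kept
# and an escaped quote is added after it.
wrong = PATTERN.sub(r'\g<0>\\"', text)

# \1 is only the character before the quote, so the quote is replaced.
right = PATTERN.sub(r'\1\\"', text)

print(wrong)  # e"\"
print(right)  # e\"
```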

Regex to match word only in quotes

I'm trying to prepare a regex to match a word if it is in quotes.
That is, if the text is as follows, I want to match the HelloWorld inside the quotes but not the one outside them (the 2nd instance of HelloWorld should match, but not the 1st):
HelloWorld " Showing HelloWorld"
Use Case:
I need to find the text HelloWorld inside quotes, but not other HelloWorld instances used as variable or class names, when I search by regex in the Eclipse IDE.
You could use the following:
'HelloWorld " Showing HelloWorld"'.match(/"(.*?)"+/g); // ['" Showing HelloWorld"']
As a quick-and-dirty possibility you could scan the string with this:
/"[^"]*?"/
This will match the first pair of quotes (and the contained text) in the string (note the use of *? for a "minimal" match). Do that globally in a loop to extract text inside quotes. Every time you match a new quoted piece, check it to see if it contains what you're looking for (possibly with another regex).
This will fail if there is an escaped quote in the string, although this could of course be dealt with.
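The scan-then-check loop described above can be sketched as follows. This is only an illustration in Python (the function name quoted_occurrences is hypothetical), and it shares the stated limitation: escaped quotes inside a string are not handled.

```python
import re


def quoted_occurrences(text: str, word: str) -> list[int]:
    """Return the start offsets of `word` that lie inside double quotes."""
    word_re = re.compile(r"\b" + re.escape(word) + r"\b")
    hits = []
    # First pass: every double-quoted piece of the text.
    for quoted in re.finditer(r'"[^"]*"', text):
        # Second pass: look for the word inside that piece only.
        for found in word_re.finditer(quoted.group(0)):
            hits.append(quoted.start() + found.start())
    return hits


text = 'HelloWorld " Showing HelloWorld"'
print(quoted_occurrences(text, "HelloWorld"))  # [21]
```

Only the second HelloWorld is reported; the first one lies outside the quotes and is never scanned.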
If all patterns look like
" Showing HelloWorld"
you can match using this:
/".+\s(\w+)"/g
If your regex flavor supports POSIX classes such as [:alpha:], you can match letters only:
/".+\s([[:alpha:]]+)"/g
Matching against \w+ accepts letters, digits, and underscores.