Regular expression to find and replace wrong quotation marks - regex

I have a document which has been copy/pasted from MS Word. All the quotations were copied as ''something'', which creates a mess in my LaTeX document; they have to be ``something''.
Is it possible to make a regular expression that finds all these ''something'', where something can be anything (including symbols, numbers etc.), and a regular expression that replaces it with the correct quotation? I am using Sublime Text, which can use regex directly in the editor.

The regex below matches every string wrapped in pairs of single quotes and captures everything after the opening two single quotes, including the closing pair. Replacing each match with two backticks followed by the contents of group 1 gives the desired result.
Regex:
''(.*?'')
Replacement string:
``$1
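To try the same substitution outside the editor, here is a minimal Python sketch. The function name fix_latex_quotes and the sample sentence are made up for illustration; note that Python's re module writes the group reference as \1 rather than $1.

```python
import re

def fix_latex_quotes(text: str) -> str:
    """Turn ''something'' into ``something'' (hypothetical helper)."""
    # Group 1 captures the quoted text plus the closing '' pair,
    # so only the opening '' is swapped for two backticks.
    return re.sub(r"''(.*?'')", r"``\1", text)

sample = "He said ''hello'' and then ''bye 123!''."
print(fix_latex_quotes(sample))
```

Because . does not match newlines by default, a quotation that spans several lines would need the re.DOTALL flag.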

Related

Regex: How to find a string, then get characters on either side up to a delimiter?

I have a string like so:
foobar_something_alt=\"Brownfields1.png#asset:919\" /><p>MSG participat
And wish to find all occurrences via the substring #asset: then select the characters around the match up to the quote marks.
Trying to extract specific ALT tags from a SQL dump. Is this possible with a regular expression?
Put [^"]* before and after the string you want to match. This will match any sequence of characters that aren't ".
[^"]*#asset:[^"]*
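A small Python sketch of this pattern applied to the sample line from the question. Since the quotes in the dump are backslash-escaped, the plain pattern also picks up the backslash in front of the closing quote; the second pattern, which additionally excludes backslashes, is an assumption about what is wanted and not part of the answer above.

```python
import re

# Sample line from the question, kept as a raw string so the
# backslashes before the quotes survive as in the SQL dump.
dump = r'foobar_something_alt=\"Brownfields1.png#asset:919\" /><p>MSG participat'

# Pattern from the answer: everything around #asset: up to the quotes.
raw_matches = re.findall(r'[^"]*#asset:[^"]*', dump)

# Variant (assumption): also stop at backslashes, so the escape
# character in front of the closing quote is left out.
clean_matches = re.findall(r'[^"\\]*#asset:[^"\\]*', dump)

print(raw_matches)
print(clean_matches)
```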

Regex matching, but not inside latex environment

I want to replace quotation marks in a LaTeX document. It's written in German, which means that all quotation marks should be of the form "`text"', but some editors of the document have used these: "text", ``text''.
The complication here is that the document contains highlighted code using the lstlisting environment. In there the quotation marks should not be replaced.
I have a regex, that matches text inside the unwanted quotes, even if there are multiple words:
((``((\w+\s*)+)'')|("((\w+\s*)+)"))
I also have a regex, that matches a string ("asdf" in this case), only if it is not inside the lstlisting environment:
"asdf"(?=((?!\\end\{lstlisting\}).)*\\begin\{lstlisting\}?)
They work fine on their own, but when I combine them like this:
((``((\w+\s*)+)'')|("((\w+\s*)+)"))(?=((?!\\end\{lstlisting\}).)*\\begin\{lstlisting\}?)
some of the quoted strings that should be matched are not, and additionally the whole document is matched.
PS: I am currently using Notepad++ for matching, because it allows . to match \n
[EDIT]: It works fine, as long as I limit the first part to single words:
((``((\w)+)'')|("((\w)+)"))(?=((?!\\end\{lstlisting\}).)*\\begin\{lstlisting\}?)
To match words with whitespaces, you can use
(``[\w\s]+''|"[\w\s]+")(?=(?:(?!\\end\{lstlisting\}).)*\\begin\{lstlisting\}?)
See regex demo
If spaces should be allowed only between the words inside `` and '', or inside the "s (not at the start or end of the quoted text), you will need to unroll the [\w\s]+ part as \w+(?:\s+\w+)*.
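Outside an editor, the same goal can be reached with a different technique from the lookahead above: split the document on the lstlisting environments and substitute only in the pieces between them. The Python sketch below assumes the target form is the babel shorthand "`text"' and reuses the [\w\s]+ quote pattern; the function name and constants are illustrative.

```python
import re

# Capturing group so re.split keeps the listings in the result.
LISTING = re.compile(r"(\\begin\{lstlisting\}.*?\\end\{lstlisting\})", re.DOTALL)
# ``text'' or "text", with words and whitespace inside.
QUOTED = re.compile(r"``([\w\s]+)''" + "|" + r'"([\w\s]+)"')

OPEN_Q = '"`'    # assumed German opening shorthand
CLOSE_Q = "\"'"  # assumed German closing shorthand

def replace_outside_listings(doc: str) -> str:
    parts = LISTING.split(doc)
    # Even indices are text outside listings; odd indices are the listings.
    for i in range(0, len(parts), 2):
        parts[i] = QUOTED.sub(
            lambda m: OPEN_Q + (m.group(1) or m.group(2)) + CLOSE_Q,
            parts[i],
        )
    return "".join(parts)
```

This version also covers quotations that appear after the last listing, since it does not rely on a following \begin{lstlisting}.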

replace regular expression in sublime text

I have application where few labels are written like
ui-label-Display Not Masked
Now I want to replace it by
ui-label-Display_Not_Masked
So I wrote this search regex:
ui-label-(\w+ )*
This finds all the expressions, but I am not able to create an expression to replace this text as required.
I tried this replacement string:
$1_
which replaces
ui-label-Display Not Masked
by
ui-label-Display Not_Masked
This cannot be done with a single regex in a single iteration.
You have two choices:
Replace (ui-label-\w+) (note the space at the end) with $1_ repeatedly, until it no longer matches anything.
Make a long regex with as many capture groups as necessary, e.g. (ui-label-\w+) (?:(\w+)(?: (\w+))?)? and replace with $1_$2_$3. Note that labels with fewer words than groups end up with trailing underscores.
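The first choice, repeating the replacement until nothing matches, can be sketched in Python as follows. The function name is made up, and the sketch assumes each label is terminated by something other than a space followed by more text (for example a closing quote), otherwise the loop would keep joining the words that follow.

```python
import re

# "ui-label-" plus word characters, followed by one space.
# \w also matches "_", so already-joined words stay in group 1.
LABEL_SPACE = re.compile(r"(ui-label-\w+) ")

def join_label_words(text: str) -> str:
    """Replace spaces inside ui-label-... names with underscores."""
    while True:
        text, count = LABEL_SPACE.subn(r"\1_", text)
        if count == 0:  # nothing left to replace
            return text

print(join_label_words('class="ui-label-Display Not Masked"'))
```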

Regular expression to replace spaces with dashes within a sub string.

I've been struggling to find a way to replace spaces with dashes in a string but only spaces that are within a particular part of the string.
Source:
ABC *This is a sub string* DEF
My attempt at a regular expression:
/\s/g
If I use the regular expression to match spaces and replace I get the following result:
ABC-*This-is-a-sub-string*-DEF
But I only want to replace spaces within the text surrounded by the two asterisks.
Here is what I'm trying to achieve:
ABC *This-is-a-sub-string* DEF
Not sure what type of regular expressions I'm using, as I'm using the find and replace in TextMate with the Regular Expressions option enabled.
It's important to note that the strings that I will be running this regular expression search and replace on will have different text but it's just the spaces within the asterisks that I want to match.
Any help will be appreciated.
To identify spaces that are surrounded by asterisks, the key observation is that, if asterisks appear only in pairs, the spaces you are looking for are always followed by an odd number of asterisks.
The regex
\ (?=[^*]*\*([^*]*\*[^*]*\*)*[^*]*$)
will match the ones that should be replaced. TextMate would have to support look-ahead assertions for this to work.
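As a quick check of the odd-number-of-asterisks idea, here is a minimal Python sketch using the same lookahead (with the inner group made non-capturing). The second sample string is made up to show that several asterisk pairs on one line are handled too.

```python
import re

# A space followed by an odd number of asterisks before end of string:
# one asterisk, then any number of asterisk pairs.
SPACE_INSIDE = re.compile(r" (?=[^*]*\*(?:[^*]*\*[^*]*\*)*[^*]*$)")

def dash_inside_asterisks(text: str) -> str:
    return SPACE_INSIDE.sub("-", text)

print(dash_inside_asterisks("ABC *This is a sub string* DEF"))
print(dash_inside_asterisks("a *b c* d *e f* g"))
```

The $ anchor means the count runs to the end of the string, so multi-line input would need to be processed line by line (or with re.MULTILINE).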
s/(?<!\*)\s(?!\*)(?!$)/-/g
If TextMate supports Perl style regex commands (I have no experience with it all, sorry), this is a one-liner that should work.
try this one
/(?<=\*.*)\s(?=.*\*)/g
but it won't work in JavaScript engines that lack lookbehind support, since it relies on lookbehind (which was only added to JavaScript in ES2018)
Try this: \s(\*[^*]*\*)\s. It will match *This is a sub string* in group 1. Then replace to -$1-.
Use this regexp to get spaces from within asterisks
(.*)(\*(.*(\ ).*)\*)(.*)
Take the 4th element of the array provided by the regex (group 4) and replace it with dashes.
I find this site very good for creating regular expressions.
It depends on your programming language but in many of them you can use lambda functions with your regular expression replacement statements and thereby perform further replacement on substrings.
Here's an example in Python:
import re

string = "ABC *This is a sub string* DEF"
# For each *...* span, replace the spaces inside it with dashes
new_string = re.sub(r"\*(.*?)\*", lambda m: "*" + m.group(1).replace(" ", "-") + "*", string)
That should give you ABC *This-is-a-sub-string* DEF.

Regex: finding a string with an undetermined amount of words

I have a tag that is like
tag="text textwithdot. text text"
followed by a further tag that would resemble
tag="text text text"
I wanted to use the following regular expression
tag="\w+"
but that only finds one word. How do I find the whole string within the quotes? What wildcard does that?
This should work for you:
tag="([^"]*)"
That basically means tag=" followed by zero or more characters that are not a double quote, followed by a double quote.
BTW: I'm assuming that there is no such thing as a tag that contains the double quote character. If there is such a thing, it would need some escaping rule applied to it and the regular expression would be more complicated.
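A short Python sketch of this pattern, run on the two example tags from the question (joined into one made-up input string). It relies on the same assumption as above: no tag value contains a double quote.

```python
import re

text = 'tag="text textwithdot. text text" tag="text text text"'

# Group 1 holds everything between the quotes.
values = re.findall(r'tag="([^"]*)"', text)
print(values)
```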
Also,
tag=['"]([^'"]*)['"]
if the tags could switch between ' and " (assuming the value contains neither kind of quote).
You could use an ungreedy match everything.
tag="[\s\S]*?"
Or use . with the "dot matches newline" flag (assuming \n is a possibility).
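Both variants can be compared in Python, where the "dot matches newline" flag is re.DOTALL. The input string with an embedded newline is made up for illustration.

```python
import re

text = 'tag="first\nline" tag="second"'

# [\s\S] matches any character, including newlines, without a flag.
with_class = re.findall(r'tag="[\s\S]*?"', text)

# The same result using . together with the DOTALL flag.
with_flag = re.findall(r'tag=".*?"', text, flags=re.DOTALL)

print(with_class)
print(with_class == with_flag)
```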