Regex to match a word only in quotes

I'm trying to write a regex that matches a word only when it is inside quotes.
For example, given the text below, I want to match the HelloWorld inside the quotes but not the one outside them (the second instance of HelloWorld, not the first):
HelloWorld " Showing HelloWorld"
Use Case:
I need to find the text HelloWorld inside quotes, but not other HelloWorld instances used as variable or class names, when I search by regex in the Eclipse IDE.

You could use the following.
'HelloWorld " Showing HelloWorld"'.match(/"(.*?)"/g); // ['" Showing HelloWorld"']
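That call returns whole quoted strings, so a second step is still needed to pick out the ones that contain HelloWorld. A minimal sketch, assuming a JavaScript runtime such as Node.js; the sample line and the variable names are hypothetical:

```javascript
// Hypothetical Java-like line: HelloWorld appears as a class name and inside a string.
const line = 'HelloWorld h = new HelloWorld("Showing HelloWorld", "other");';

// Every double-quoted chunk, quotes included; the lazy .*? keeps each pair separate.
// match() returns null when nothing matches, hence the fallback to [].
const quoted = line.match(/"(.*?)"/g) || [];

// Keep only the chunks that contain HelloWorld as a whole word.
const hits = quoted.filter((chunk) => /\bHelloWorld\b/.test(chunk));

console.log(hits); // the single chunk "Showing HelloWorld", quotes included
```

The class-name occurrences outside the quotes never reach the filter, because only the quoted chunks are tested.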

As a quick-and-dirty possibility you could scan the string with this:
/"[^"]*?"/
This matches the first pair of quotes in the string, together with the text between them. (The lazy *? asks for a "minimal" match, although with [^"] the match cannot run past a closing quote anyway.) Run it globally in a loop to extract each quoted piece, and check every piece to see whether it contains what you're looking for (possibly with another regex).
This will fail if there is an escaped quote in the string, although this could of course be dealt with.
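One way to deal with escaped quotes, sketched in JavaScript: swap in the common pattern "(?:[^"\\]|\\.)*", which treats a backslash plus the following character as a unit, then loop over the matches with exec() and test each one. The helper name findQuotedContaining and the sample input are hypothetical.

```javascript
// Returns every double-quoted string in `text` that contains `word` as a whole word.
// A backslash escapes the next character, so \" does not close a string.
// Assumes `word` contains only word characters (it is not regex-escaped).
function findQuotedContaining(text, word) {
  const quoted = /"(?:[^"\\]|\\.)*"/g; // g flag: exec() resumes from lastIndex
  const wordRe = new RegExp("\\b" + word + "\\b");
  const results = [];
  let m;
  while ((m = quoted.exec(text)) !== null) {
    if (wordRe.test(m[0])) results.push(m[0]);
  }
  return results;
}

// Hypothetical Java-like input with an escaped quote inside a string literal.
const src = 'HelloWorld x = f("say \\"HelloWorld\\" now", "bye");';
console.log(findQuotedContaining(src, "HelloWorld"));
```

With the plain /"[^"]*?"/ the first match would stop at the escaped quote and yield only "say \", which is the failure case described above.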

If all your strings look like
" Showing HelloWorld"
you can match them with
/".+\s(\w+)"/g
If your regex flavor supports POSIX classes such as [:alpha:], you can match only letters:
/".+\s([[:alpha:]]+)"/g
Matching with \w+ captures letters, digits, and underscores.
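A quick JavaScript check of this answer. JavaScript's RegExp has no POSIX classes such as [[:alpha:]], so the letters-only line below uses [A-Za-z] as an ASCII-only stand-in (an assumption; it will not match accented letters). The second sample string is hypothetical.

```javascript
// Sample from the question: group 1 is the last word before the closing quote.
const question = 'HelloWorld " Showing HelloWorld"';
const lastWords = [...question.matchAll(/".+\s(\w+)"/g)].map((m) => m[1]);
console.log(lastWords); // contains the single word HelloWorld

// \w accepts letters, digits and underscore, so "v2" is captured here...
const sample = 'x "Version v2"';
console.log(sample.match(/".+\s(\w+)"/)[1]); // v2

// ...while the letters-only class rejects it ([A-Za-z] stands in for [[:alpha:]]).
console.log(sample.match(/".+\s([A-Za-z]+)"/)); // null
```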

Related

Regex: How to match a part of the text within two characters e.g. quotes

I need to match text that is surrounded by two characters, in this case ‘ and ’. Assume the whole string is:
Regarding the cat, I asked him ‘can you take care of my cat while I am away’ and he said ‘yes’.
Now, if I use the following regex
(?<=‘)(.*?)(?=’)
It will match
can you take care of my cat while I am away
and
yes
What if I want to search for a single character, e.g. "e" (which occurs in both quoted strings), or a word, e.g. "cat", within those two groups? How can I do that? I can't figure out what to replace (.*?) with in order to search for a substring or character within those special quotes.
You only need to replace the overly permissive dot with a negated character class that excludes the closing quote and the first character of your target:
(?<=‘)([^’e]*(e)[^’]*)(?=’)
or
(?<=‘)([^’c]*(?:(?:\Bc|c(?!at\b))[^’c]*)*\b(cat)\b[^’]*)(?=’)
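Both patterns can be tried as-is in JavaScript, assuming a runtime with lookbehind support (ES2018 or later, e.g. a current Node.js). This sketch lists the quoted segments each pattern accepts; group 2 holds the target itself.

```javascript
// The question's example, joined onto one line.
const text =
  "Regarding the cat, I asked him ‘can you take care of my cat while I am away’ and he said ‘yes’.";

// Quoted segments containing the letter "e"; group 2 is the first "e".
const withE = /(?<=‘)([^’e]*(e)[^’]*)(?=’)/g;

// Quoted segments containing the whole word "cat"; group 2 is "cat".
const withCat = /(?<=‘)([^’c]*(?:(?:\Bc|c(?!at\b))[^’c]*)*\b(cat)\b[^’]*)(?=’)/g;

// Group 1 spans the whole quoted segment.
const eSegments = [...text.matchAll(withE)].map((m) => m[1]);
const catSegments = [...text.matchAll(withCat)].map((m) => m[1]);

console.log(eSegments); // both quoted segments
console.log(catSegments); // only the first quoted segment
```

The "cat" in "Regarding the cat" is never considered, because the lookbehind only lets a match start right after ‘.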

Regex matching, but not inside a LaTeX environment

I want to replace quotation marks in a LaTeX document. It's written in German, which means that all quotation marks should be of the form "`text"', but some editors of the document have used these instead: "text", ``text''.
The complication is that the document contains highlighted code in the lstlisting environment. Inside it, the quotation marks must not be replaced.
I have a regex that matches text inside the unwanted quotes, even when it spans multiple words:
((``((\w+\s*)+)'')|("((\w+\s*)+)"))
I also have a regex that matches a string ("asdf" in this case) only if it is not inside the lstlisting environment:
"asdf"(?=((?!\\end\{lstlisting\}).)*\\begin\{lstlisting\}?)
They work fine on their own, but when I combine them like this:
((``((\w+\s*)+)'')|("((\w+\s*)+)"))(?=((?!\\end\{lstlisting\}).)*\\begin\{lstlisting\}?)
some of the quoted strings that should be matched are not, and in addition the whole document is matched.
PS: I am currently using Notepad++ for matching, because it allows . to match \n (newlines).
[EDIT]: It works fine as long as I limit the first part to single words:
((``((\w)+)'')|("((\w)+)"))(?=((?!\\end\{lstlisting\}).)*\\begin\{lstlisting\}?)
To match multiple words separated by whitespace, you can use:
(``[\w\s]+''|"[\w\s]+")(?=(?:(?!\\end\{lstlisting\}).)*\\begin\{lstlisting\}?)
If whitespace should appear only between words (not directly after the opening `` or " and not directly before the closing '' or "), unroll the [\w\s]+ part as \w+(?:\s+\w+)*.
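A sketch of the answer's regex in JavaScript, run on a hypothetical sample document. The s (dotAll) flag makes "." match newlines, playing the role of the Notepad++ option the asker mentions. The replacement step assumes the target form is the German "`text"' shorthand from the question.

```javascript
// The answer's pattern; s flag so "." in the lookahead can cross newlines.
const pattern =
  /(``[\w\s]+''|"[\w\s]+")(?=(?:(?!\\end\{lstlisting\}).)*\\begin\{lstlisting\}?)/gs;

// Hypothetical document: quotes before, inside and between two listings.
const doc = [
  "Er sagte ``Hallo Welt'' und \"gut\".",
  "\\begin{lstlisting}",
  'print("code")',
  "\\end{lstlisting}",
  'Danach "mehr Text" hier.',
  "\\begin{lstlisting}",
  "x = 1",
  "\\end{lstlisting}",
].join("\n");

// Quoted strings outside the listings; "code" inside the listing is skipped
// because the lookahead hits \end{lstlisting} before any \begin{lstlisting}.
const found = doc.match(pattern);

// Rewrite each match as "`text"' (strip the old quote marks first).
const fixed = doc.replace(pattern, (m) => {
  const inner = m.startsWith("``") ? m.slice(2, -2) : m.slice(1, -1);
  return '"`' + inner + "\"'";
});
console.log(found);
console.log(fixed);
```

Because the lookahead requires a later \begin{lstlisting}, quotes after the last listing in a file are not matched by this pattern.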

Regular expression to find and replace wrong quotation marks

I have a document that was copy/pasted from MS Word. All the quotations came across as ''something'', which makes a mess of my LaTeX document; they need to be ``something''.
Is it possible to write a regular expression that finds all these ''something'' (where something can be anything, including symbols, numbers, etc.), and a replacement that turns them into the correct quotation? I am using Sublime Text, which supports regular expressions directly in the editor.
The regex below matches every string enclosed in doubled single quotes and captures everything except the two opening quotes. Replacing each match with two backticks followed by the contents of capture group 1 gives the desired result.
Regex:
''(.*?'')
Replacement string:
``$1
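The same regex and replacement work in JavaScript's String.prototype.replace, where $1 also refers to group 1. A minimal sketch on a hypothetical input line:

```javascript
// Hypothetical text pasted from Word, using doubled single quotes.
const input = "He said ''hello'' and ''1, 2 & 3!''.";

// Group 1 keeps the quoted text plus the closing '', so only the opening pair changes.
const fixed = input.replace(/''(.*?'')/g, "``$1");

console.log(fixed); // He said ``hello'' and ``1, 2 & 3!''.
```

Without a dotAll-style flag, "." does not match newlines, so a quotation that spans a line break is not converted.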

Regex - Match string between quotes (") but do not match (\") before the string

I need to match a string that is in quotations, but make sure the first quotation is not escaped.
For example: First \"string\" is "Hello \"World\"!"
Should match only Hello \"World\"!
I am trying to modify (")(?:(?=(\\?))\2.)*?"
I tried adding [^\\"] to ("), and that sort of works, but it matches either only (") or every other letter that isn't (\"). I can't figure out a way to modify ([\\"]") so that it matches (") only when it is not (\").
This is what I have so far ([^\\"]")(?:(?=(\\?))\2.)*?"
I've been trying to figure it out using these two pages, but I still can't get it:
Can Regex be used for this particular string manipulation?
RegEx: Grabbing values between quotation marks
You can use a negative lookbehind like this:
(?<!\\)"(.*?)(?<!\\)"
The first capture group contains:
Hello \"World\"!
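The same regex runs unchanged in JavaScript on an ES2018+ runtime (lookbehind support). In the string literal below, each \" from the question is written with a doubled backslash.

```javascript
// The question's example text.
const text = 'First \\"string\\" is "Hello \\"World\\"!"';

// An opening " not preceded by \, then the shortest run up to an unescaped ".
const m = /(?<!\\)"(.*?)(?<!\\)"/.exec(text);

console.log(m[1]); // Hello \"World\"!
```

This simple lookbehind only looks one character back, so it cannot tell \\" (an escaped backslash followed by a real closing quote) from \".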

What can be the regex for the following string?

I am doing this in Groovy.
Input:
hip_abc_batch hip_ndnh_4_abc_copy_from_stgig abc_copy_from_stgig
hiv_daiv_batch hip_a_de_copy_from_staging abc_a_de_copy_from_staging
I want to get the last column, basically anything that starts with abc_.
I tried the following regex (it works for the second line but not the first):
/abc_.*/
but on the first line that gives me everything from abc_batch onward.
I am looking for a regex that will fetch anything that starts with abc_,
but I cannot use /^abc_.*/ since the whole string does not start with abc_.
It sounds like you're looking for "words" (i.e., sequences that don't include spaces) that begin with abc_. You might try:
/\babc_.*\b/
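The \b means "word boundary" (in most regular expression flavors). Because _ is a word character, \babc_ does not match the abc_ inside hip_abc_batch. Note that .* also runs past spaces to the end of the line; use \S* instead if the match should stop at the next space.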
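A quick check of this answer in JavaScript, applying the regex to each of the question's input lines separately:

```javascript
const lines = [
  "hip_abc_batch hip_ndnh_4_abc_copy_from_stgig abc_copy_from_stgig",
  "hiv_daiv_batch hip_a_de_copy_from_staging abc_a_de_copy_from_staging",
];

// \b before abc_ rejects hip_abc_batch and 4_abc_..., because "_" and "a"
// are both word characters, so there is no boundary between them.
const lastColumns = lines.map((line) => line.match(/\babc_.*\b/)[0]);
console.log(lastColumns); // abc_copy_from_stgig and abc_a_de_copy_from_staging
```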
Try this:
/\s(abc_.*)$/m
Here is a commented version so you can understand how it works:
\s # match one whitespace character
(abc_.*) # capture a string that starts with "abc_" and is followed
# by any character zero or more times
$ # match the end of a line (because of the m flag)
Because the regular expression uses the m (multiline) flag, $ matches at the end of each line rather than only at the end of the entire string.
You don't need to trim the whitespace, because capture group 1 contains just the text without the leading \s. I believe this is how to grab the value of a capture group in Groovy:
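def matcher = (yourString =~ /(?m)\s(abc_.*)$/)
// Groovy slashy strings take no trailing flags, so multiline mode is set inline with (?m).
// This is how you would extract the value of group 1 from the first match
// of the matcher object:
matcher[0][1]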
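For comparison, a JavaScript sketch of the same multiline regex over both input lines at once. Unlike reading only the first match, matchAll returns the value from every line.

```javascript
// Both input lines in one string, as when a whole file is read at once.
const input =
  "hip_abc_batch hip_ndnh_4_abc_copy_from_stgig abc_copy_from_stgig\n" +
  "hiv_daiv_batch hip_a_de_copy_from_staging abc_a_de_copy_from_staging";

// m flag: $ matches at the end of every line; g flag is required by matchAll.
const pattern = /\s(abc_.*)$/gm;

// Group 1 holds the value without the leading whitespace, so no trim() is needed.
const values = [...input.matchAll(pattern)].map((m) => m[1]);
console.log(values); // abc_copy_from_stgig and abc_a_de_copy_from_staging
```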
I think you are looking for this: \s(abc_[a-zA-Z_]*)$
If you are using Perl and you read all lines into one string, don't forget to set the m option on your regex (it stands for "Treat string as multiple lines").
Oh, and Regex Coach is your free friend.