I want to find a regex that catches all strings that are not inside a name('stringName') pattern.
For example I have this text:
fdlfksj "hello1" dsffsf "hello2\"hi" name("Tod").name('tod') 'hello3'
I want my regex to catch the strings:
"hello1", "hello2\"hi", 'hello3' (it should also catch "hello2\"hi" as a whole, because an escaped \" must not end the string).
I also want my regex to ignore "Tod", because it is inside the pattern name("...").
How should I do it?
Here is my regex, which doesn't work:
((?<!(name\())("[^"]*"|'[^']*'))
It doesn't handle the escapes \" and \'
and it also doesn't ignore name("Tod").
How can I fix it?
You can use the following regex:
(?<!name\()(["'])[^\)]+?(?<!\\)\1
It matches one or more characters other than a closing parenthesis ([^\)]+?):
preceded by (["']) - a quote symbol
followed by (?<!\\)\1 - the same quote symbol, which is not preceded by a backslash
In order to avoid getting the values that come after name(, there's a condition that checks that (?<!name\().
Check the demo here.
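As a quick check, the pattern above can be run against the sample text from the question. This is only a minimal sketch in Python (the question does not name a language, so that choice and the variable names are assumptions). finditer with group(0) is used because the pattern contains a capture group.

```python
import re

# Pattern from the answer above, unchanged.
pattern = re.compile(r"""(?<!name\()(["'])[^\)]+?(?<!\\)\1""")

# Sample text from the question.
text = r"""fdlfksj "hello1" dsffsf "hello2\"hi" name("Tod").name('tod') 'hello3'"""

matches = [m.group(0) for m in pattern.finditer(text)]
print(matches)

# Limitation: a quoted string that contains ")" is not matched at all,
# because [^\)] refuses to cross a closing parenthesis.
print(pattern.findall('"a)b"'))
```

Note that the second call finds nothing, so this pattern is only suitable when the quoted strings never contain a closing parenthesis.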
(["'])((?:\\\1)|[^\1]*?)\1
Regex Explanation
( Capturing group
["'] Match " (double) or ' (single) quote
) Close group
( Capturing group
(?: Non-capturing group
\\\1 Match \ followed by the quote by which it was started
) Close non-capturing group
| OR
[^\1]*? Non-greedy match anything except a quote by which it was started
) Close group
\1 Match the close quote
See the demo
You could get out of the way what you don't want, and use a capture group for what you want to keep.
The matches that you want are in capture group 2.
name\((['"]).*?\1\)|('[^'\\]*(?:\\.[^'\\]*)*'|"[^"\\]*(?:\\.[^"\\]*)*")
Explanation
name\((['"]).*?\1\) Match name and then from the opening parenthesis till closing parenthesis between the same type of quote
| Or
( Capture group 2
'[^'\\]*(?:\\.[^'\\]*)*' Match the single-quoted value, including escaped characters
| Or
"[^"\\]*(?:\\.[^"\\]*)*" The same for double quotes
) Close group 2
Regex demo
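The "match what you don't want, capture what you do want" idea could be applied like this. It is a sketch in Python, which the question does not specify; the variable names are illustrative. The only extra step is filtering out the matches where group 2 did not take part.

```python
import re

# Alternative 1 consumes name("...") / name('...') so its contents are skipped;
# alternative 2 (capture group 2) holds the quoted strings we want to keep.
pattern = re.compile(
    r"""name\((['"]).*?\1\)"""
    r"""|('[^'\\]*(?:\\.[^'\\]*)*'|"[^"\\]*(?:\\.[^"\\]*)*")"""
)

# Sample text from the question.
text = r"""fdlfksj "hello1" dsffsf "hello2\"hi" name("Tod").name('tod') 'hello3'"""

# Group 2 is None whenever the name(...) alternative matched, so drop those.
wanted = [m.group(2) for m in pattern.finditer(text) if m.group(2) is not None]
print(wanted)
```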
Related
Currently I have the regex ([^\[\][\[^\[\][\n"]+) to match text between "", but it does not capture whitespace; e.g. if I enter " hello ", it returns hello without the spaces before and after the word.
Is there some expression I can use to just simply catch anything between two quotation marks?
Thank you.
Maybe this will help:
(?<!\\)(\"|')(.+?)(?:(?<!\\)\1)
And to get the text inside the quotes, get the second capture group.
Proof.
Explanation
(?<!\\) - Negative lookbehind. Asserts that the position is not preceded by a literal backslash (\)
(\"|') - to test for the start of the "string"
(.+?) - . will match anything but newlines.
+? is lazy: it matches one or more characters, but only as many as needed for the rest of the pattern to match.
(?:(?<!\\)\1) - Non-capturing group.
Used here so that the (?<!\\) described earlier applies only to the closing quote rather than to the whole expression.
\1 is a backreference: it matches the same quote character that the first capture group ((\"|')) matched. In a replacement string, many engines write the same reference as $1.
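To see that the surrounding whitespace survives in the second capture group, the pattern can be tried as follows. This is a small sketch in Python; the helper name inner_text is made up for the illustration.

```python
import re

# Pattern from the answer above; group 2 is the text between the quotes.
pattern = re.compile(r"""(?<!\\)(\"|')(.+?)(?:(?<!\\)\1)""")

def inner_text(s):
    """Return the content of the first quoted string in s, or None."""
    m = pattern.search(s)
    return m.group(2) if m else None

print(repr(inner_text('say " hello " now')))      # whitespace is kept
print(repr(inner_text(r'"a \"b\" c" and more')))  # escaped quotes stay inside
```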
You should use the following regex:
\"\s*([^\"]+?)\s*\"
([^\"]+?) - the text you want is captured in group 1, between the optional whitespace and the quotes.
Demo & Explanation
I am using TextMate to replace expression [my_expression] consisting in characters between open and closed brackets by {my_expression}; so I tried to replace
\[[^]]*\]
by
{$1}
The regex matches the correct expression, but the replacement gives a literal {$1}, so the variable is not recognised. Does anyone have an idea?
You forgot to escape a character, [^]] should be [^\]].
You also need a capture group. $1 is back-referencing the 1st Capture Group, and you had no capture groups, so use the following Regex:
\[([^\]]*)\]
This adds () around [^\]]*, so the data inside the [] is captured. For more info, see this page on Capture Groups
However, this RegEx is shorter:
\[(.*?)\]
Also substituting with {$1}
Live Demo on Regex101
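The same capture-and-substitute step can be tried outside TextMate. The sketch below uses Python's re module purely as an illustration; note that Python writes the backreference in the replacement as \1 where TextMate uses $1, and the sample text is invented.

```python
import re

text = "replace [my_expression] and [another one] here"

# The parentheses capture the bracket contents; \1 in the replacement
# refers to that capture (Python's spelling of TextMate's $1).
result = re.sub(r"\[([^\]]*)\]", r"{\1}", text)
print(result)
```

Without the parentheses there is no group 1 to refer to, which is why the original pattern could not fill in the replacement.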
Use a capturing group (...):
\[([^\]]*)\]
The $1 is a backreference to the text enclosed with [...].
Here is the regex demo and also Numbered Backreferences.
Also, the TextMate docs:
1. Syntax elements
(...) group
20.4.1 Captures
To reference a capture, use $n where n is the capture register number. Using $0 means the entire match.
And also:
If you want to use [, -, ] as a normal character in a character class, you should escape these characters by \.
I want to match certain lines inside any text and, inside each match, replace a certain character every time it occurs.
Sample Text:
Any text and "much" "more" of it. Don't replace quotes here
CatchThis( no quotes here, "any more text" , "and so on and so forth...")
catchthat("some other text" , "or less")
some text in "between"
CatchAnything ( "even more" , "and more", no quotes there, "wall of text")
more ("text"""") and quotes after...
Now I want to replace every quote inside the round brackets with, let's say, a hash sign.
Desired outcome:
Any text and "much" "more" of it. Don't replace quotes here
CatchThis( no quotes here, #any more text# , #and so on and so forth...#)
catchthat(#some other text# , #or less#)
some text in "between"
CatchAnything ( #even more# , #and more#, no quotes there, #wall of text#)
more ("text"""") and quotes after...
Matching the lines is easy. Here's my pattern for that:
(?i)Catch(?:This|That|Anything)[ \t]*\(.+\)
Unfortunately, I have no idea how to match every quote and replace it...
A common approach to matching all occurrences of a pattern between two different delimiters is a regular expression based on the \G anchor.
(?i)(?:\G(?!\A)|Catch(?:This|That|Anything)\s*\()[^()"]*\K"
See the regex demo.
Explanation:
(?i) - case insensitive modifier
(?: - a non-capturing group matching 2 alternatives
\G(?!\A) - a place in the string right after the previous successful match (as \G also matches the start of the string, the (?!\A) is necessary to exclude that possibility)
| - or
Catch(?:This|That|Anything) - Catch followed with either This or That or Anything
\s* - 0+ whitespaces
\( - a literal ( symbol
) - end of the non-capturing group
[^()"]* - any 0+ chars other than (, ) and "
\K - a match reset operator
" - a double quote.
Do you really need to do the replacement inside the regex? If your regex already finds the lines you want, you can replace the character within each matched string.
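That two-step idea could look like the sketch below, assuming a language with a callback-style substitution (Python here; the question does not say which tool is in use). The function name hash_quotes and the shortened sample are made up for the illustration.

```python
import re

# Line-matching pattern from the question; (?i) is expressed as a flag.
line_pattern = re.compile(r"Catch(?:This|That|Anything)[ \t]*\(.+\)", re.IGNORECASE)

def hash_quotes(text):
    """Replace every double quote inside a matched Catch...(...) call with '#'."""
    return line_pattern.sub(lambda m: m.group(0).replace('"', "#"), text)

sample = '''Any text and "much" "more" of it.
catchthat("some other text" , "or less")
more ("text"""") and quotes after...'''

print(hash_quotes(sample))
```

Because .+ is greedy, the match runs to the last ) on the line, so quotes that follow an earlier closing bracket on the same line would also be replaced.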
I have a function, translate(), that takes multiple parameters. The first parameter is the only required one and is a string that I always wrap in single quotes, like this:
translate('hello world');
The other params are optional, but could be included like this:
translate('hello world', true, 1, 'foobar', 'etc');
And the string itself could contain escaped single quotes, like this:
translate('hello\'s world');
To the point, I now want to search through all code files for all instances of this function call, and extract just the string. To do so I've come up with the following grep, which returns everything between translate(' and either ') or ',. Almost perfect:
grep -RoPh "(?<=translate\(').*?(?='\)|'\,)" .
The problem with this though, is that if the call is something like this:
translate('hello \'world\', you\'re great!');
My grep would only return this:
hello \'world\
So I'm looking to modify this so that the part that currently looks for ') or ', instead looks for the first occurrence of ' that hasn't been escaped, i.e. doesn't immediately follow a \
Hopefully I'm making sense. Any suggestions please?
You can use this grep with PCRE regex:
grep -RoPh "\btranslate\(\s*\K'(?:[^'\\\\]*)(?:\\\\.[^'\\\\]*)*'" .
Here is a regex demo
RegEx Breakup:
\b # word boundary
translate # match literal translate
\( # match a (
\s* # match 0 or more whitespace
\K # reset the matched information
' # match starting single quote
(?: # start non-capturing group
[^'\\\\]* # match 0 or more chars that are not a backslash or single quote
) # end non-capturing group
(?: # start non-capturing group
\\\\. # match a backslash followed by char that is "escaped"
[^'\\\\]* # match 0 or more chars that are not a backslash or single quote
)* # end non-capturing group
' # match ending single quote
Here is a version without \K using look-arounds:
grep -oPhR "(?<=\btranslate\(')(?:[^'\\\\]*)(?:\\\\.[^'\\\\]*)*(?=')" .
RegEx Demo 2
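For testing the pattern without a shell (and without the doubled backslashes that the double-quoted grep argument needs), the same expression can be sketched in Python. Python's built-in re module has no \K as far as I know, so a capture group is used instead; the variable names and the sample input are assumptions based on the calls shown in the question.

```python
import re

# Same idea as the grep pattern, with a capture group instead of \K so the
# surrounding quotes are left out of the result.
pattern = re.compile(r"\btranslate\(\s*'([^'\\]*(?:\\.[^'\\]*)*)'")

code = r"""
translate('hello world');
translate('hello world', true, 1, 'foobar', 'etc');
translate('hello \'world\', you\'re great!');
"""

# With a single capture group, findall returns the captured text only.
strings = pattern.findall(code)
print(strings)
```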
I think the problem is the .*? part: the ? makes it a non-greedy pattern, meaning it'll take the shortest string that matches the pattern. In effect, you're saying, "give me the shortest string that's followed by quote+close-paren or quote+comma". In your example, "world\" is followed by a single quote and a comma, so it matches your pattern.
In these cases, I like to use something like the following reasoning:
A string is a quote, zero or more characters, and a quote: '.*'
A character is anything that isn't a quote (because a quote terminates the string): '[^']*'
Except that you can put a quote in a string by escaping it with a backslash, so a character is either "backslash followed by a quote" or, failing that, "not a quote": '(\\'|[^'])*'
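Put it all together and you get the following (the backslash is written as \\\\ because the pattern sits inside a double-quoted shell string, where \\ collapses to a single \):
grep -RoPh "(?<=translate\(')(\\\\'|[^'])*(?='\)|'\,)" .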
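The difference between the lazy pattern and the "backslash-quote or non-quote" pattern can be seen side by side. This is a sketch in Python rather than grep, so the regexes are written without shell escaping; the names lazy, fixed and call are illustrative.

```python
import re

# The problematic call from the question.
call = r"translate('hello \'world\', you\'re great!');"

# Original attempt: lazy .*? stops at the first quote followed by , or ),
# even though that quote is escaped.
lazy = re.compile(r"(?<=translate\(').*?(?='\)|',)")

# Revised: each "character" is either backslash + quote or any non-quote.
fixed = re.compile(r"(?<=translate\(')(?:\\'|[^'])*(?='\)|',)")

print(lazy.search(call).group(0))
print(fixed.search(call).group(0))
```

The first line printed is the truncated result described in the question; the second is the full string.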