How do I match all newline chars \n that are not surrounded by " in regex.
For example:
It should match each of these newlines:
"2547026616"
79587329
,"A","2547026616
"79587329","
But shouldn't match these (except for the last \n):
"254702661679587329"
"A"
"254702661679587329"
This regex might be what you're looking for:
(?:(?<!")\n(?!"))|(?:(?<=")\n(?!"))|(?:(?<!")\n(?="))
This will catch any newline that is not preceded by a quotation mark and/or not followed by a quotation mark, it uses negative and positive look-ahead and look-behind, you can read about them more here. Depending on your programming language these might not be supported, and/or you might need to do some escaping in the regex.
Here is an example on regex101, I've used x\n but this is just to make it easy to display the matches.
Try this:
([^"]\n.?|.?\n[^"])
It matches the 2 characters surrounding the \n as well (if present)
If you want also the last line to end with a double quote, even if it is not followed by \n, then extend the regexp to:
([^"](\n.?|$)|.?\n[^"])
... and if the first line must start with a double quote, then extend to:
([^"](\n.?|$)|(^|.?\n)[^"])
Related
Currently I have a regex expression ([^\[\][\[^\[\][\n"]+) to match text between "", but this does not capture whitespaces, for e.g. if I enter " hello ", it will return hello, without the spaces before and after the word.
Is there some expression I can use to just simply catch anything between two quotation marks?
Thank you.
Maybe this will help:
(?<!\\)(\"|')(.+?)(?:(?<!\\)\1)
And to get the text inside the quotes, get the second capture group.
Proof.
Explanation
(?<!\\) - Negative lookbehind. Looks for literal backslash ('')
(\"|') - to test for the start of the "string"
(.+?) - . will match anything but newlines.
+? means as much as possible but only as much needed to match.
(?:(?<!\\)\1) - Non capturing group.
Used here so we can use the (?<!\\) described earlier without looking behind the whole expression. The
\1 matches the first capture group ((\"|')). Can be replaced with $1
You should use following regex:
\"\s*([^\"]+?)\s*\"
([^\"]+?)The text you want to get will be between space and quote.
Demo & Explanation
I'm working in notepad++, and using its find-replace dialog box.
NP++ documentation states: Notepad++ regular expressions use the Boost regular expression library v1.70, which is based on PCRE (Perl Compatible Regular Expression) syntax. ref: https://npp-user-manual.org/docs/searching
What I'm trying to do should be simple, but I'm a regex novice, and after 2-3 hrs of web searches and playing with online regex testers, I give up.
I want to replace all single quotes ' with double quote " , but if and only if the ' is to the RIGHT of one or more #, ie inside a python comment.
For example,
list1 = ['apple','banana','pear'] # All 'single quotes' to LEFT of # remained unchanged.
list2 = ['tomato','carrot'] # All 'single quotes' to RIGHT of one or more # are replaced
# # with "double quotes", like this.
The np++ file is over 800 lines, manual replacement would be tedious & error prone. Advice appreciated.
This regex should do what you want:
(^[^#]*#|(?<!^)\G)[^'\n]*\K'
It looks for a ' which is preceded by either
^[^#]*# : start of line and some number of non-# characters followed by a #; or
(?<!^)\G : the start of line or the end of the previous match (\G), with a negative lookbehind for start of line (?<!^), meaning that it only matches at the end of the previous match
and then some number of non ' or newline (to prevent the match wrapping around the end of the previous line) characters [^'\n]*.
We then use \K to reset the match, so that everything before that is discarded from the match, and the regex only matches the '.
That can then be replaced with ".
Demo on regex101
Update
You can avoid matching apostrophes within words by only matching ones that are either preceded or followed by a non-word character:
(^[^#]*#|(?<!^)\G)[^'\n]*\K('(?=\W)|(?<=\W)')
Demo on regex101
Update 2
You can also deal with the case where there are # characters in strings by qualifying the first part of the regex with the requirement for there to be matched pairs of quotes beforehand:
(?:^[^'#]*(?:'[^']*'[^#']*)*[^'#]*#|(?<!^)\G)[^'\n]*\K(?:'(?=\W)|(?<=\W)')
Demo on regex101
With regular expression I would like to get all characters between round brackets, but \( and \) characters should be also included in the result.
Examples:
input: fo(ob)a)r
output: ob
input: foo(bar\(qwerty\))baz
output: bar\(qwerty\)
This is what I used for finding text between brackets:
(?<=\()([^\s\(\)]+)(?=\)), but I can't make exceptions for brackets preceded by \.
You could do something like this :
.*(?<!\\)\((.*?)(?<!\\)\)
Basically, it matches as many characters as possible until it sees an open parenthesis without a backslash (using a negative lookbehind), then groups the next matching characters until a closing parenthesis (still without a backslash).
Note that this regex may not work properly if you escape the backslashes.
Example : https://regex101.com/r/BqVKZp/1
This regex works for both your examples, without any lookaheads and lookbehinds:
\((.+[^\\])\)
A U flag is needed.
I am using Notepad++ to remove some unwanted strings from the end of a pattern and this for the life of me has got me.
I have the following sets of strings:
myApp.ComboPlaceHolderLabel,
myApp.GridTitleLabel);
myApp.SummaryLabel + '</b></div>');
myApp.NoneLabel + ')') + '</label></div>';
I would like to leave just myApp.[variable] and get rid of, e.g. ,, );, + '...', etc.
Using Notepad++, I can match the strings themselves using ^myApp.[a-zA-Z0-9].*?\b (it's a bit messy, but it works for what I need).
But in reality, I need negate that regex, to match everything at the end, so I can replace it with a blank.
You don't need to go for negation. Just put your regex within capturing groups and add an extra .*$ at the last. $ matches the end of a line. All the matched characters(whole line) are replaced by the characters which are present inside the first captured group. .
matches any character, so you need to escape the dot to match a literal dot.
^(myApp\.[a-zA-Z0-9].*?\b).*$
Replacement string:
\1
DEMO
OR
Match only the following characters and then replace it with an empty string.
\b[,); +]+.*$
DEMO
I think this works equally as well:
^(myApp.\w+).*$
Replacement string:
\1
From difference between \w and \b regular expression meta characters:
\w stands for "word character", usually [A-Za-z0-9_]. Notice the inclusion of the underscore and digits.
(^.*?\.[a-zA-Z]+)(.*)$
Use this.Replace by
$1
See demo.
http://regex101.com/r/lU7jH1/5
In regular expression, i know when use \s to represent a space, but, in following case, would they be different:
/a\sb/ ---with a \s
/a b/ ---with empty field
thanks a lot if you can explain to me.
The \s character class matches all "whitespace characters," not just spaces. This includes tabs (\t), and if multiline matching is allowed, it includes carriage return (\r) and newline (\n). Theoretically, if your regular expression engine handles unicode, there are also unicode whitespace characters that \s can match, though your mileage may vary.
So with a string like "a\t b", you can match it with the regex /a\s+b/, in case that is useful to you.