How to match new-line character using regex - regex

I'm trying to match the special characters and the breaks using a regex to cleanse the string so that I'm left with only the following string 'All release related activities' extracted from the following line:
{"Well":"\n\n\t\n\t\n\n\t\n\tAll release related activities\n\n\t"}
I've tried the regex ^{.Well":" and I'm able to match till the first colon appears. How do I match the \n characters in the string?

I'm not sure what the "Well": prefix can look like, so here is a basic regex:
^\{[^}]*?(?:\\[ntr])+([^}]+)\}
and replace by:
\1

Try:
/":"(?:\\[nt])*(.*)"}$/
":" matches ":".
(?:\\[nt])* matches 0 or more occurrences of either \n or \t.
(.*) matches 0 or more characters into Capture Group 1, until:
"}$ matches "} followed by the end of the string.
The string you are looking for is in Capture Group 1. Because (.*) is greedy, the group still includes the trailing \n\n\t, which has to be stripped separately.
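A minimal Python sketch of the same idea, assuming the line is held as a raw string (so \n and \t are literal backslash sequences, as in the question) and that it ends with "}. The lazy group, the trailing escape run, and the helper name extract_text are illustrative additions, not part of the answers above.

```python
import re

# Leading escapes, the text (lazy), trailing escapes, then the closing "}.
PATTERN = re.compile(r'":"(?:\\[nt])*(.*?)(?:\\[nt])*"}$')


def extract_text(line: str) -> str:
    """Return the text between the escaped newline/tab runs, or '' if no match."""
    match = PATTERN.search(line)
    return match.group(1) if match else ""


# Raw string: the backslashes are real characters, as in the question's sample.
line = r'{"Well":"\n\n\t\n\t\n\n\t\n\tAll release related activities\n\n\t"}'
print(extract_text(line))
```

Making the capture group lazy and matching the trailing escapes explicitly keeps them out of Capture Group 1.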

Related

Replace characters within a specific string

I have a text file with URLs where space is + and it needs to be %20 to work.
For example:
http://myserver/abc/this+is+my+document.doc
I want it to be:
http://myserver/abc/this%20is%20my%20document.doc
How to replace + with %20, but only when the string starts with http://myserver/abc? Don't want to replace any other +'s in the document.
Thanks in advance!
You can use the following regex:
(?:http://myserver/abc|\G(?!\A))[^\s+]*\K\+
Replace with %20
How the regex works:
(?:http://myserver/abc|\G(?!\A)) matches either http://myserver/abc literally or the position where the previous match ended (\G matches at the end of the previous match or at the start of the string, and (?!\A) prevents \G from matching at the start of the string).
[^\s+]* matches any character except whitespace and a literal +, any number of times.
\K resets the match: any previously consumed characters are excluded from the final match.
\+ matches a literal +.
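Python's standard re module does not support \G or \K, so the pattern above cannot be used there as written. A rough equivalent, swapping in a different technique (match each whole URL, then replace inside it with a callback), might look like this. The function name fix_urls and the sample text are hypothetical.

```python
import re

PREFIX = "http://myserver/abc"  # prefix taken from the question


def fix_urls(text: str) -> str:
    """Replace '+' with '%20' only inside URLs that start with PREFIX."""
    # A URL is assumed to run until the next whitespace character.
    url_pattern = re.compile(re.escape(PREFIX) + r"\S*")
    return url_pattern.sub(lambda m: m.group(0).replace("+", "%20"), text)


sample = "http://myserver/abc/this+is+my+document.doc and 1+1 http://other/a+b"
print(fix_urls(sample))
```

Any + outside a matching URL, including one in a URL on another server, is left untouched.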

How to replace specific character one time

I want to replace character - using regular expression in my text so it would work like this:
Original text: abcd-efg-hijk-lmno
Text after replacing: abcd-efg-hijk/lmno
As you can see, I want to replace only the last - (counting from the end, just once) with /.
Thanks in advance for any tips
Find what: -([^-]*)$
Replace with: /$1
Search Mode: Regular Expression
Explanation:
- : a dash
([^-]*) : zero or more characters that are not a dash, captured in $1
$ : the end of the line
/$1 : a literal "/" followed by the contents of $1
Good resource: http://www.grymoire.com/Unix/Regular.html
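For comparison outside Notepad++, the same find-and-replace could be sketched in Python, where the group reference is written \1 instead of $1. The function name replace_last_dash is made up for illustration.

```python
import re


def replace_last_dash(text: str) -> str:
    """Replace the last '-' in a single-line string with '/'."""
    # A dash followed only by non-dash characters up to the end is the last one.
    return re.sub(r"-([^-]*)$", r"/\1", text)


print(replace_last_dash("abcd-efg-hijk-lmno"))
```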
To replace characters in Notepad++, you can open the Replace window using Ctrl+H, or under the "Search" menu. Once open, enter the following regular expression:
(.{4}-.{3}-.{4})(-)(.{4})
This will find:
a group of four characters (the "." being any character, the "{4}" being the quantity),
a dash,
a group of three characters,
another dash,
a group of four characters,
again another dash,
then a group of four characters.
The parentheses group this search into captured groups, which we will use for the replacement part. See https://www.regular-expressions.info/brackets.html for more info.
If you want to restrict the search to lowercase letters as in your example, you would replace the "." with "[a-z]", or for upper and lower case "[a-zA-Z]" (no comma inside the brackets, otherwise a literal comma would match too).
Now for the replacement. The groups from earlier are referenced by the dollar sign then the number, e.g. $1 would be the first. So we will replace the characters found with the first group ($1), disregard the second group containing the dash and insert the "/" instead, then include the third group ($3):
$1/$3
The settings in the replace window need to have "Regular expression" and "Wrap around" checked, and ". matches newline" unchecked.
You can then click Replace all to replace all occurrences, or go through using Replace individually.
Since the beginning and end of line characters are not included, you can find multiple occurrences of this pattern on a single line.
Note: This answer follows the same procedure as Toto's, however uses a different regular expression.
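As a rough illustration of the point about multiple occurrences on one line, here is the fixed-width pattern in Python. The replacement is written \1/\3 rather than $1/$3, and the two-item sample line is invented.

```python
import re

# Same three groups as above: the text before the dash, the dash, the text after.
PATTERN = re.compile(r"(.{4}-.{3}-.{4})(-)(.{4})")


def swap_third_dash(text: str) -> str:
    """Replace the third dash of every 4-3-4-4 shaped run with '/'."""
    return PATTERN.sub(r"\1/\3", text)


print(swap_third_dash("abcd-efg-hijk-lmno qrst-uvw-xyza-bcde"))
```

Because the pattern is not anchored with ^ or $, both runs on the line are rewritten.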
Ctrl+H
Find what: ^(.+)-([^-]+)$
Replace with: $1/$2
check Wrap around
check Regular expression
DO NOT CHECK . matches newline
Replace all
Explanation:
^ : beginning of line
(.+) : 1 or more of any character, captured in group 1
- : a dash
([^-]+) : 1 or more of any character except a dash, captured in group 2
$ : end of line
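A sketch of how this anchored version might be applied to a whole multi-line text in Python. Two details are assumptions of the sketch rather than part of the answer: re.MULTILINE makes ^ and $ work per line, and \n is added to the negated class so that group 2 cannot run into the next line.

```python
import re

# Greedy (.+) backtracks to the last dash that is followed only by non-dash text.
PATTERN = re.compile(r"^(.+)-([^-\n]+)$", flags=re.MULTILINE)


def replace_last_dash_per_line(text: str) -> str:
    """Replace the last '-' on each line with '/'."""
    return PATTERN.sub(r"\1/\2", text)


print(replace_last_dash_per_line("abcd-efg-hijk-lmno\nq-r-s\nplain"))
```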

How to replace all text between the second and third occurrence of the pipe character using regex?

I would like to replace all the characters occurring between second and third pipe character. Example:
|hello|welcome,to|
Here I want to replace welcome,to with a blank value, and the replacement has to be applied across a huge file. I need a regex pattern that can be used in Notepad++.
Assuming you're parsing each line, i.e. a token cannot include a newline:
Find what:
^((?:[^|\n]*\|){2})[^|\n]*+\|
Replace with:
$1|
Description
^ - matches the start of line
((?:[^|\n]*\|){2}) - matches and captures in group 1:
(?:[^|\n]*\|){2} the following expression repeated twice
[^|\n]* - any character except a pipe or a newline
\| - followed by a pipe
[^|\n]*+ - the token you want to remove (any char except pipe or newline)
\| - followed by a pipe
In notepad++ use this regex for search:
^([^|]*\|[^|]*\|)[^|]*
And replace by:
$1
([^|]*\|[^|]*\|) matches and captures text before 2nd | into a capturing group #1.
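The same replacement can be sketched in Python for checking the pattern outside Notepad++. This version uses a plain * instead of the possessive *+, since possessive quantifiers are only available in the re module from Python 3.11 onward. The function name is hypothetical.

```python
import re

# Group 1: everything up to and including the second pipe on the line.
PATTERN = re.compile(r"^((?:[^|\n]*\|){2})[^|\n]*\|", flags=re.MULTILINE)


def blank_between_second_and_third_pipe(text: str) -> str:
    """Empty the field between the 2nd and 3rd pipe on every line."""
    return PATTERN.sub(r"\1|", text)


print(blank_between_second_and_third_pipe("|hello|welcome,to|"))
```

Lines with fewer than three pipes do not match and are left as they are.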

Remove all characters after a certain match

I am using Notepad++ to remove some unwanted strings from the end of a pattern, and I can't for the life of me figure it out.
I have the following sets of strings:
myApp.ComboPlaceHolderLabel,
myApp.GridTitleLabel);
myApp.SummaryLabel + '</b></div>');
myApp.NoneLabel + ')') + '</label></div>';
I would like to leave just myApp.[variable] and get rid of, e.g. ,, );, + '...', etc.
Using Notepad++, I can match the strings themselves using ^myApp.[a-zA-Z0-9].*?\b (it's a bit messy, but it works for what I need).
But in reality, I need to negate that regex so that it matches everything at the end, which I can then replace with a blank.
You don't need negation. Put your regex inside a capturing group and append .*$ to it. $ matches the end of a line, so the whole line is matched and then replaced by the contents of the first captured group.
An unescaped . matches any character, so you need to escape the dot to match a literal dot.
^(myApp\.[a-zA-Z0-9].*?\b).*$
Replacement string:
\1
OR
Match only the following characters and then replace it with an empty string.
\b[,); +]+.*$
I think this works equally as well:
^(myApp.\w+).*$
Replacement string:
\1
From difference between \w and \b regular expression meta characters:
\w stands for "word character", usually [A-Za-z0-9_]. Notice the inclusion of the underscore and digits.
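A short Python sketch of the capture-and-replace approach, run over the four sample lines from the question. It uses \w+ for the variable name with the dot escaped, and re.MULTILINE so that ^ and $ apply to each line. The variable names are illustrative.

```python
import re

# Keep "myApp.<name>" (group 1) and drop whatever follows on the same line.
PATTERN = re.compile(r"^(myApp\.\w+).*$", flags=re.MULTILINE)

lines = """myApp.ComboPlaceHolderLabel,
myApp.GridTitleLabel);
myApp.SummaryLabel + '</b></div>');
myApp.NoneLabel + ')') + '</label></div>';"""

cleaned = PATTERN.sub(r"\1", lines)
print(cleaned)
```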
(^.*?\.[a-zA-Z]+)(.*)$
Use this. Replace with:
$1
See demo.
http://regex101.com/r/lU7jH1/5

Is there a way to "recall" a char sequence already matched in the regex itself?

The regex I'm searching has the following constraints:
it starts with "//"
then "[" a non number sequence (called delimiter in this list) and "]"
next line "\n"
"[" 0 or more number separated by the delimiter previously found "]".
For example the following text matches the regex:
//[*#*]
[1*#*34*#*64]
and the following text doesn't match the regex:
//[*#*]
[1#34#64]
because the delimiter is not the same as the one matched in the first row.
The regex I currently have is
^//\[(\D)+\]\n\[[(\d)+(\D)+]*(\d)+\]$|^//\[(\D)+\]\n\[\]$|^//\[(\D)+\]\n\[(\d)+\]$
but obviously this regex matches both of the previous examples.
Is there a way to "recall" a char sequence already matched in the regex itself?
You need something called a back-reference.
Use this regex in Python:
r'^//\[([^\]]+)\]\n\[\d+(\1\d+)*\]'
Sample run:
>>> import re
>>> string = """//[*#*]
... [1*#*34*#*64]"""
>>> print(re.search(r'^//\[([^\]]+)\]\n\[\d+(\1\d+)*\]', string).group(0))
//[*#*]
[1*#*34*#*64]
So the pattern matches your string in Python.
You need to use a back-reference. In most languages you can reference a capturing group using \n, where n is the group number (for example \1 for the first group).
This pattern will work:
//\[([^]]++)]\n\[(?>\d++\1?)+]
To break it down:
// matches the literal slashes
\[([^]]++)] matches one or more characters in square brackets and captures them in group 1
\n matches the newline
\[(?>\d++\1?)+] matches, one or more times, one or more digits optionally followed by the text captured in group 1. (?>...) is an atomic group.
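A small Python sketch of the back-reference technique, checked against both examples from the question. It avoids possessive quantifiers and atomic groups (which the re module only supports from Python 3.11) and treats the delimiter as any run of characters that are neither digits nor ]. That delimiter definition, the optional group for an empty list, and the name is_valid are assumptions of this sketch.

```python
import re

# Group 1 captures the delimiter; \1 requires the same text between numbers.
# The optional outer group also lets an empty list "[]" match.
PATTERN = re.compile(r"//\[([^\d\]]+)\]\n\[(?:\d+(?:\1\d+)*)?\]")


def is_valid(text: str) -> bool:
    """True if the second line uses exactly the delimiter declared on the first."""
    return PATTERN.fullmatch(text) is not None


print(is_valid("//[*#*]\n[1*#*34*#*64]"))  # same delimiter on both lines
print(is_valid("//[*#*]\n[1#34#64]"))      # different delimiter
```

Using fullmatch means the whole input has to fit the pattern, so trailing or leading text is rejected as well.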