Notepad++ regular expressions and replace - regex

I have a couple of sentences that need processing using regular expressions. They're in a text file that I'm opening in Notepad++.
<tag>There are two tags here</tag>
<tag>How am i supposed to
feel when this is happening?</tag>
<tag>I'm not sure.
But oh well<tag>
Is it possible to use Notepad++'s regular expressions and replace functionality to produce an output like so:
<tag>There are two tags here</tag>
<tag>How am i supposed to feel when this is happening?</tag>
<tag>I'm not sure. But oh well<tag>
That is, sentences that span two or more lines should be joined, based on the fact that a complete sentence ends with a >. Thanks.

Replace this:
[\r\n]+(?!<)
with a space
Explanation:
[\r\n]+ - matches 1+ occurrences of a \r or \n
(?!<) - negative lookahead to validate that the above match is not followed by an opening tag <
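To try the same pattern outside the editor, here is a minimal sketch using Python's re module. The function name join_wrapped_lines and the second pattern are assumptions made for illustration, not part of the answer; the sample text is the one from the question, with plain \n line endings.

```python
import re

# Pattern from the answer: a run of line breaks not followed by "<".
JOIN_PATTERN = re.compile(r"[\r\n]+(?!<)")

# Hypothetical variant for files with \r\n line endings: the lookahead also
# rejects \r and \n, so the match cannot stop in the middle of a "\r\n" pair.
JOIN_PATTERN_CRLF = re.compile(r"[\r\n]+(?![\r\n<])")


def join_wrapped_lines(text, pattern=JOIN_PATTERN):
    """Replace every line break matched by `pattern` with one space."""
    return pattern.sub(" ", text)


sample = (
    "<tag>There are two tags here</tag>\n"
    "<tag>How am i supposed to\n"
    "feel when this is happening?</tag>\n"
    "<tag>I'm not sure.\n"
    "But oh well<tag>"
)

print(join_wrapped_lines(sample))
```

One thing worth checking on your own file: in Python's re, the original pattern can backtrack on a "\r\n" that precedes a tag and replace just the "\r", because the "\n" that follows it is not a "<". The variant above avoids that in Python; whether Notepad++ behaves the same way on Windows line endings is not verified here.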


Vim substitute - encase `\ref{eq:x}` with brackets

I have a latex document which has a bunch of strings of the form
Eq.~\ref{eq:x}
where x is in general a different string for each occurrence. I want to replace the above with
Eq.~(\ref{eq:x})
I can match some of the occurrences searching with /\\ref{eq:.*\} but this doesn't work if you have something like
blah Eq.~\ref{eq:x} something something \cite{this}
Note that I don't want to replace \ref{eq: with a latex macro which handles the brackets internally.
* is a greedy quantifier that will match as many characters as possible. So, if you have several } on the line, .*} will match every character up to the last } on the line.
You should use a non-greedy quantifier instead:
/\\ref{eq:.\{-}\}
See :help \{.
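The difference between the greedy and non-greedy quantifier can be seen in a small Python sketch. Python writes the non-greedy form as .*? where Vim writes .\{-}; the variable names are only illustrative, and the line is the example from the question.

```python
import re

line = r"blah Eq.~\ref{eq:x} something something \cite{this}"

# Greedy: .* runs on to the last "}" on the line.
greedy = re.search(r"\\ref\{eq:.*\}", line).group(0)

# Non-greedy: .*? stops at the first "}".
lazy = re.search(r"\\ref\{eq:.*?\}", line).group(0)

# Wrap each non-greedy match in parentheses; \g<0> is the whole match.
wrapped = re.sub(r"\\ref\{eq:.*?\}", r"(\g<0>)", line)

print(greedy)
print(lazy)
print(wrapped)
```

In Vim itself the substitution would presumably look like :%s/\\ref{eq:.\{-}}/(&)/g, where & stands for the matched text; treat that command as an untested suggestion.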

Find and replace using regular expressions - remove double spaces between letters only

I'm trying to do this in the Atom editor (1.39.1 x64, Ubuntu 18.04), though I assume this applies to other text editors that support regular expressions.
Say we have this text:
This text has some double-spaces. Lets try to remove them.
But not after a full-stop or if three or more spaces.
Which we would like to change to:
This text has some double-spaces. Lets try to remove them.
But not after a full-stop or if three or more spaces.
Using Find with Regex enabled (.*), all occurrences are correctly found using: [a-zA-Z] [a-zA-Z]. But what goes in the Replace row to enforce the logic:
1st letter, single space, 2nd letter?
You can use this
([a-z])\s{2}([a-z])
and replace by $1 $2
If your editor supports lookarounds you can use
(?<=[a-z])\s{2}(?=[a-z])
Replace by single space character
Note: don't forget to use the i flag for case insensitivity, or just change the character class to [a-zA-Z].
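Both variants can be compared in a short Python sketch. The sample string is an assumption: the double and triple spaces are placed where the question's description suggests they were, and the variable names are made up for the example.

```python
import re

# Assumed sample: double spaces between words, after a full stop,
# and one run of three spaces.
text = (
    "This text  has some double-spaces.  Lets try to  remove them.\n"
    "But   not after a full-stop."
)

# Capture-group version: both letters are matched and put back.
by_groups = re.sub(r"([a-z])\s{2}([a-z])", r"\1 \2", text, flags=re.IGNORECASE)

# Lookaround version: only the two spaces are matched.
by_lookaround = re.sub(r"(?<=[a-z])\s{2}(?=[a-z])", " ", text, flags=re.IGNORECASE)

# The two differ when gaps share a letter: the capture groups consume "b",
# so the second gap in "a  b  c" no longer has a letter in front of it.
overlap_groups = re.sub(r"([a-z])\s{2}([a-z])", r"\1 \2", "a  b  c")
overlap_lookaround = re.sub(r"(?<=[a-z])\s{2}(?=[a-z])", " ", "a  b  c")

print(by_groups)
print(by_lookaround)
```

Because of that overlap case, the lookaround form may be the safer choice when single-letter words can occur between double spaces.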

How to use regex replace whilst preserving content

I am using RegReplace https://github.com/facelessuser/RegReplace to run a regular expression find and replace in Sublime Text.
I want to add a new line either side of my tags. I know to select a tag the regex is <(.*?)(.)>.
What is the correct regex to add a new line on either side of the tag, without replacing the content? Something like \n <(.*?)(.)> \n?
Use a positive lookahead and \K
(?=<(.*?)(.)>)|<(.*?)(.)>\K
Replace the matched boundary with \n character.
Or, more simply:
(?=<[^<>]*>)|<[^<>]*>\K
Replace the matched boundary with \n character.
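For comparison, here is a sketch of the same effect in Python. Python's built-in re module does not support \K, so this uses a different technique: it matches the whole tag and re-inserts it with \g<0> between two newlines. The function name pad_tags is made up for the example.

```python
import re


def pad_tags(text):
    """Put a newline before and after every <...> tag, keeping the tag."""
    # \g<0> re-inserts the matched tag, so no content is lost.
    return re.sub(r"<[^<>]*>", r"\n\g<0>\n", text)


print(pad_tags("a<b>text</b>c"))
```

Re-inserting the whole match is the general way to "preserve content" in engines that lack \K; in editors that do support it, the boundary-only patterns above avoid touching the tag at all.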

Regular expression find/replace notepad++

I've a huge text file with lines like this:
080012;Bovalino;RC;CAL;0964;89034;B098;9021;http://www.website-most.en/000/000/
And I would like to extract only:
080012;***Bovalino***;***RC***;CAL;***0964***;***89034***;B098;9021;http://www.website-most.en/000/000/
And delete all other text.
Can this be done with regular expressions?
You can capture the stuff you want to keep and use a backreference in the replacement string:
Find what: ^\d*;(\w*;\w*);\w*;(\d*;\d*).*
Replace with: \1;\2
And make sure you do not tick the . matches newline option.
With Notepad++ 6 you can also use $1;$2 for the replacement (with the same meaning).
If the different fields may contain all sorts of characters and not just digits and letters, this is probably your best bet:
Find what: ^[^;]*;([^;]*;[^;]*);[^;]*;([^;]*;[^;]*).*
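The capture-and-backreference idea can be checked with a small Python sketch applied to one line at a time. The names FIELDS and keep_fields are assumptions for illustration; the split-based line is only a cross-check of which fields the two groups keep.

```python
import re

# Same pattern as the more general "Find what" above.
FIELDS = re.compile(r"^[^;]*;([^;]*;[^;]*);[^;]*;([^;]*;[^;]*).*")


def keep_fields(line):
    """Keep fields 2, 3, 5 and 6 of a semicolon-separated line."""
    return FIELDS.sub(r"\1;\2", line)


row = "080012;Bovalino;RC;CAL;0964;89034;B098;9021;http://www.website-most.en/000/000/"

# Cross-check without a regex: pick the same fields by position.
parts = row.split(";")
by_split = ";".join(parts[i] for i in (1, 2, 4, 5))

print(keep_fields(row))
```

A line with too few fields does not match the pattern and is returned unchanged, which mirrors what the editor's replace does with non-matching lines.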

Correction in regular expression

I have a string that contains a combination of words, with \r\n in a few places and \n in others.
This is a sample:
\r\nThis is an\nexample\nand I need\na\nsolution\r\nr\nOK\r\n
Now I need to match only This is an\nexample\nand I need\na\nsolution along with \n in it
This is the expression I tried, but it is not working:
\r\n([\s\w]+)\r\n
This matches the complete string. Could someone correct it, please?
If you don't want any \r inside the sentence you match, you could use a negated character class:
\r\n([^\r]+)\r\n
I believe \s matches both \r and \n as whitespace characters, so [\s\w]+ runs past the inner \r\n.
Try adding the non-greedy quantifier ?:
\r\n([\s\w]+?)\r\n
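A lazy match in which . also matches line breaks should work. That is the m flag in Ruby; most other flavors call it s, and there m only changes ^ and $:
/\r\n(.+?)\r\n/m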
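The suggestions above can be compared side by side in Python, using the sample string from the question. This is a sketch for Python's re module only; other regex flavors may name their flags differently, and the variable names are illustrative.

```python
import re

s = "\r\nThis is an\nexample\nand I need\na\nsolution\r\nr\nOK\r\n"
wanted = "This is an\nexample\nand I need\na\nsolution"

# Original attempt: greedy [\s\w]+ runs on to the last \r\n in the string.
greedy = re.search(r"\r\n([\s\w]+)\r\n", s).group(1)

# Negated class: the capture cannot contain \r, so it stops before "\r\n".
negated = re.search(r"\r\n([^\r]+)\r\n", s).group(1)

# Non-greedy quantifier: stop at the first \r\n that follows.
lazy = re.search(r"\r\n([\s\w]+?)\r\n", s).group(1)

# Lazy dot: in Python, "." needs re.DOTALL to cross the inner \n characters.
dotall = re.search(r"\r\n(.+?)\r\n", s, re.DOTALL).group(1)

# With re.MULTILINE alone, "." still stops at \n, so nothing matches here.
multiline_only = re.search(r"\r\n(.+?)\r\n", s, re.MULTILINE)

print(repr(negated))
```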