Regex Match That doesn't contain some text - regex

I am trying to create a regex that finds a Start Prefix and an End Prefix that have paragraph tags between them, but the one I have created is not working as expected.
%%%HL_START%%%(.*?)</p><p>(.*?)%%%HL_END%%%
Correctly Matches
<p>This Should %%%HL_START%%%Work</p><p>This%%%HL_END%%% SHould Match</p>
This also matches, but I don't want it to match because the </p><p> is not between the Start and End Prefix
<p>%%%HL_START%%%One%%%HL_END%%% Some More Text %%%HL_START%%%Here%%%HL_END%%%</p><p>Some more text %%%HL_START%%%Here%%%HL_END%%%</p>

I'm not entirely comfortable that regex is the right solution here; if you are getting into nested start and stop markers, you might not have a regular language...
For this specific example, try changing the regex to use [^%] instead of . so that the lazy .*? matching can't run past %%%HL_END%%%:
%%%HL_START%%%([^%]*?)</p><p>([^%]*?)%%%HL_END%%%
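To see the difference between the two patterns, here is a small Python sketch. Python's re module is my choice of engine (the question does not name one), and the variable names are only illustrative. It runs the original pattern and the [^%] version against both sample strings from the question.

```python
import re

# Original pattern: the lazy dot can still run across an earlier %%%HL_END%%%.
LAZY_DOT = re.compile(r"%%%HL_START%%%(.*?)</p><p>(.*?)%%%HL_END%%%")
# Suggested pattern: [^%] cannot cross any marker, because markers start with %.
NO_PERCENT = re.compile(r"%%%HL_START%%%([^%]*?)</p><p>([^%]*?)%%%HL_END%%%")

wanted = "<p>This Should %%%HL_START%%%Work</p><p>This%%%HL_END%%% SHould Match</p>"
unwanted = (
    "<p>%%%HL_START%%%One%%%HL_END%%% Some More Text "
    "%%%HL_START%%%Here%%%HL_END%%%</p><p>Some more text "
    "%%%HL_START%%%Here%%%HL_END%%%</p>"
)

lazy_hits = (bool(LAZY_DOT.search(wanted)), bool(LAZY_DOT.search(unwanted)))
strict_hits = (bool(NO_PERCENT.search(wanted)), bool(NO_PERCENT.search(unwanted)))
strict_groups = NO_PERCENT.search(wanted).groups()

print(lazy_hits, strict_hits, strict_groups)
```

One limitation to keep in mind: the [^%] version will also refuse to match when the highlighted text itself contains a literal % character.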

Related

Remove texts based on pattern

I have a file with lots of URLs:
domain1.com/blue
domain1.com/blue/
domain2.com/red
domain2.com/red/
...
[etc]
Is there a way for me to use a regex to keep ONLY the "domain1.com/blue" type of text, but DELETE "domain1.com/blue/"?
The pattern is that the URLs come in pairs that are identical except that one has a "/" at the end; basically I want to remove all the URLs that have the "/" at the end but keep the ones without it.
In the end the file should only contain these:
domain1.com/blue
domain2.com/red
...
[etc]
Thank you so much for the help! If anyone has an idea how to do this, it'd be awesome!
There are two things I can think of that you could do.
1. Match all the lines in the file that you do not want to keep (those ending in "/") and replace each with a single new line.
The regex for this is /^.*\/$/; replace each match with whatever you like.
2. Match only the lines you want to keep and save them to a new file.
The regex for this is /^((?!(\/$)).)*$/
Both need multiline mode so that ^ and $ anchor at line boundaries. You can paste these into a regex explainer for an in-depth explanation of what they're doing.
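Both options can be tried out in Python; this is only a sketch, since the question does not say which language or tool is in use. The sample text is the four URLs from the question, and re.MULTILINE is what makes ^ and $ work per line.

```python
import re

text = "domain1.com/blue\ndomain1.com/blue/\ndomain2.com/red\ndomain2.com/red/\n"

# Option 1: delete every line that ends with "/" (together with its newline).
removed = re.sub(r"^.*/$\n?", "", text, flags=re.MULTILINE)

# Option 2: collect only the lines that do not end with "/".
kept = [
    m.group(0)
    for m in re.finditer(r"^((?!(/$)).)*$", text, flags=re.MULTILINE)
    if m.group(0)  # skip the empty match after the final newline
]

print(removed)
print(kept)
```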
Unfortunately, you did not specify which language you are using with regex, so I can't give you language-specific details. But a line that ends with / followed by possibly one or more white space characters can be tested for with the following regular expression:
/\/\s*$/
So read in each line and test it against the above regex. If it matches, do not write it out to the new file.
See Regex Demo
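As a rough illustration of that read-and-test loop, here is a Python sketch (Python is an assumption on my part, and keep_without_trailing_slash is a made-up helper name). It works on a list of lines rather than real files to stay self-contained.

```python
import re

# A "/" followed only by optional whitespace up to the end of the line.
TRAILING_SLASH = re.compile(r"/\s*$")

def keep_without_trailing_slash(lines):
    """Return the lines that do not end in "/" (trailing whitespace ignored)."""
    return [line for line in lines if not TRAILING_SLASH.search(line)]

urls = [
    "domain1.com/blue",
    "domain1.com/blue/",
    "domain2.com/red",
    "domain2.com/red/ ",  # trailing space after the slash
]
filtered = keep_without_trailing_slash(urls)
print(filtered)
```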

Regex to get substring from right to left

I want to use a regex in PowerShell to remove certain substrings from inside an XML file. This file somewhat looks like this:
<Name>FixedString1 FixedString2 VariableString</Name><Name>FixedString1 SearchString VariableString</Name>
So in the file there are multiple occurrences of "FixedString1" and "FixedString2" inside "Name" tags. The "VariableString" is different in every occurrence.
The regex needs to find "SearchString", use this as the starting point to go backwards (right to left) until the closing bracket ">" of the "Name" tag, including "FixedString1" and the "SearchString" itself. So the output of the regex needs to be
FixedString1 SearchString
which I can later delete from the XML file using PowerShell, so that I'm left with
<Name>VariableString</Name>
in the XML file.
What I tried so far in regex101.com is
FixedString1 .*(?<= SearchString )
but this regex matches from the first occurrence of "FixedString1" in the file, meaning left to right, until "SearchString":
FixedString1 FixedString2 VariableString</Name><Name>FixedString1 SearchString
I want it to find the "SearchString" and from there go to the left until the first occurrence of "FixedString1", including both strings.
Can you please help me with this? Thanks!
I think you want the following:
FixedString1[^>]*SearchString
It matches from FixedString1 up to SearchString, but only strings that do not include >.
regex101 test
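The question is about PowerShell, but the behaviour of the pattern is easy to check in any engine; here is a Python sketch using the sample line from the question. Removing one trailing space along with the match is my own assumption, made so that the result looks like the desired <Name>VariableString</Name>.

```python
import re

xml = (
    "<Name>FixedString1 FixedString2 VariableString</Name>"
    "<Name>FixedString1 SearchString VariableString</Name>"
)

# [^>]* cannot cross a ">" so the match stays inside one tag's text.
pattern = re.compile(r"FixedString1[^>]*SearchString")
found = pattern.search(xml).group(0)

# Assumption: also drop the single space that follows the match.
cleaned = re.sub(r"FixedString1[^>]*SearchString ?", "", xml)

print(found)
print(cleaned)
```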

RegEx for transforming the next text using PhpStorm's search and replace dialog

I need to transform text using regex
TPI +2573<br>
NM$ +719<br>
Молоко +801<br>
Прод. жизнь +6.5<br>
Оплод-сть +3.6<br>
Л. отела 6.3/3.9<br>
Вымя +1.48<br>
Ноги +1.61<br>
to this one
<strong>TPI</strong> +2573<br>
<strong>NM$</strong> +719<br>
<strong>Молоко</strong> +801<br>
<strong>Прод. жизнь</strong> +6.5<br>
<strong>Оплод-сть</strong> +3.6<br>
<strong>Л. отела</strong> 6.3/3.9<br>
<strong>Вымя</strong> +1.48<br>
<strong>Ноги</strong> +1.61<br>
Is it possible with regex in PhpStorm's search and replace dialog?
Given your text, you can use this regex,
(.*) +
and replace it with <strong>$1</strong> (Notice there is a space after </strong>)
We're using (.*) to capture everything but stop just before the last run of one or more spaces, because that's the point after which we want the text to remain intact. Once we capture the text, we use the back-reference $1 so that only the captured text is placed within <strong> tags; the space in the replacement restores the space that the match consumed.
Regex Demo
Just in case, if this doesn't work for any of the samples you haven't included in your post, then please list the rules of replacement and I will give you a more robust solution, that will work flawlessly for your given rules.
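For anyone who wants to check the idea outside the PhpStorm dialog, here is a Python sketch. It uses a capture group so the separating space ends up outside the tags; the function name bold_label is made up for illustration, and Python writes the back-reference as \1 rather than $1.

```python
import re

# Greedy (.*) runs to the last space on the line; " +" then consumes the space(s).
LABEL = re.compile(r"^(.*) +")

def bold_label(line):
    """Wrap everything before the last space-separated value in <strong>."""
    return LABEL.sub(r"<strong>\1</strong> ", line, count=1)

lines = ["TPI +2573<br>", "Прод. жизнь +6.5<br>", "Л. отела 6.3/3.9<br>"]
converted = [bold_label(line) for line in lines]
print("\n".join(converted))
```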

Regular Expressions - Select the Second Match

I have a txt file with <i> and </i> between words that I would like to remove using Editpad.
For example, I'd like to keep when it's like this:
<i>Phrases and words.</i>
And I'd like to remove the </i> and <i> tags inside the phrase, when it's like this:
<i>Phrases</i>and<i> words.</i>
<i>Phrases</i>and <i>words.</i>
I was trying to do that using regex, but I couldn't get it to work.
Since the tag is next to a space or a word character, I can find the lines that have the doubled tags with
/ <i>|<\/i> /
but this way I can't simply replace every match with nothing; I have to edit each line I find by hand.
Is there any way to accomplish that?
* Edited *
Another example of lines found on the subtitle text
<i>- find me on the chamber.</i>
- What? <i>Go. Go, go, go!</i>
Rule number one: you can't parse html with regex.
That being said, if you know each line follows a certain pattern, you can usually hack something together to work. ;)
If I've understood correctly, it looks like you can simply remove all <i> and </i> that aren't either at the beginning or end of the lines. In that case, one method you could try is the following regex:
(?<=.)\<\/?i\>(?=.)
This will match the tags, with a lookahead and a lookbehind to make sure that we aren't at the end/start of a line (by checking that another character exists in front/behind). (Note that characters matched inside a lookahead/lookbehind typically aren't replaced when you search/replace.)
Disclaimer: this works on regex101, but Notepad++ may have some differences from the PCRE regex style.
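Here is a quick Python check of the lookaround approach on the sample lines from the question. Python is only a stand-in for the editor's regex engine, and < and > are left unescaped because they are ordinary characters in Python's re module.

```python
import re

# A tag that has at least one character before it and one after it on the line.
INNER_TAG = re.compile(r"(?<=.)</?i>(?=.)")

def strip_inner_italics(line):
    """Remove <i> and </i> tags that are not at the very start or end of the line."""
    return INNER_TAG.sub("", line)

samples = [
    "<i>Phrases and words.</i>",
    "<i>Phrases</i>and<i> words.</i>",
    "<i>Phrases</i>and <i>words.</i>",
]
stripped = [strip_inner_italics(s) for s in samples]
print(stripped)
```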
update to work with Editpad
EDIT: since the question is actually about how to do this in Editpad, below is a modified alternative:
Try searching for the regex: (.)\<\/?i\>(.). This will match (and capture) exactly one character before and after the <i> tags.
When replacing, use backreferences to replace the entire match with the two captured characters - a replacement string of \1\2 should work.
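The capture-and-put-back variant can be sketched in Python like this (again only as a stand-in for the editor; the function name is illustrative). Because the neighbouring characters are consumed by the match, two tags separated by a single character are not both removed in one pass.

```python
import re

# Capture one character on each side of the tag and write both back.
INNER_TAG_CAPTURED = re.compile(r"(.)</?i>(.)")

def strip_inner_italics_backref(line):
    """Replace 'X<tag>Y' with 'XY' for every inner <i> or </i> tag."""
    return INNER_TAG_CAPTURED.sub(r"\1\2", line)

result = strip_inner_italics_backref("<i>Phrases</i>and<i> words.</i>")
# Only one character between the tags: the second tag survives a single pass.
single_gap = strip_inner_italics_backref("a</i>b<i>c")
print(result, single_gap)
```

Running the replacement a second time takes care of such leftovers.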

Matching all occurrences of a html element attribute in notepad++ regex

I have a file which has hundreds of links like this:
<h3>aspnet</h3>
Ex 1
Ex 2
Ex 3
So I want to remove all the attributes
icon="..."
from all the lines. I went through the official Notepad++ regex wiki and have come up with this after several trials:
icon=\"[^\.]+\"
The problem with this is, it is selecting past the second double quote and stopping at the next occurring double quote. To illustrate, this will select the following content:
icon="data:image/png;base64,...jbvebich4sec9zgth1sfue1cdt...">EX 1</a> <a href="
If I modify the above regex to,
icon=\"[^\.]+\">
Then it is almost perfect, but it is also selecting the >:
icon="data:image/png;base64,...jbvebich4sec9zgth1sfue1cdt...">
The regex I am looking for would select like this:
icon="data:image/png;base64,...jbvebich4sec9zgth1sfue1cdt..."
I also tried the following, but it doesn't match anything at all
icon=\"[^\.]+\"$
Just match anything but a quote, followed by a quote:
icon="[^"]+"
Just tested with notepad++ 6.2.2 and confirmed that this matches correctly as written.
Broken down:
icon="
This is fairly obvious, match the literal text icon=".
[^"]+
This means to match any character that is not a ". Adding the + after it means "one or more times."
Finally we match another literal ".
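To illustrate why the negated quote class stops where it should, here is a Python sketch. The HTML line is a hypothetical stand-in (the real icon data in the question is elided), and stripping the whitespace in front of the attribute is my own addition so that no stray space is left behind.

```python
import re

# Hypothetical sample; the attribute values are placeholders.
html = (
    '<a href="x" icon="data:image/png;base64,AAAA">EX 1</a> '
    '<a href="y" icon="data:image/png;base64,BBBB">EX 2</a>'
)

# The question's pattern: [^.] happily matches quotes, so it overshoots.
too_greedy = re.search(r'icon="[^.]+"', html).group(0)

# The answer's pattern: [^"] stops at the attribute's closing quote.
exact = re.findall(r'icon="[^"]+"', html)

# Assumption: also remove the whitespace before the attribute.
cleaned = re.sub(r'\s*icon="[^"]+"', "", html)

print(too_greedy)
print(exact)
print(cleaned)
```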
I am not a Notepad++ user, so I don't know how Notepad++ handles regex, but can you try replacing
icon=\"[^>]* with (empty string)?
Try this solution:
I just checked that it works the way you wanted.
Here is how to achieve your goal:
Find what: (icon.*")|.*?
Replace with: $1