Regex to get substring from right to left

I want to use a regex in PowerShell to remove certain substrings from an XML file. The file looks roughly like this:
<Name>FixedString1 FixedString2 VariableString</Name><Name>FixedString1 SearchString VariableString</Name>
So in the file there are multiple occurrences of "FixedString1" and "FixedString2" inside "Name" tags. The "VariableString" is different in each one.
The regex needs to find "SearchString" and use it as the starting point to go backwards (right to left) up to the closing bracket ">" of the opening "Name" tag, including "FixedString1" and "SearchString" itself. So the output of the regex needs to be
FixedString1 SearchString
which I can later delete from the XML file using PowerShell, so that I'm left with
<Name>VariableString</Name>
in the XML file.
What I tried so far in regex101.com is
FixedString1 .*(?<= SearchString )
but this regex matches from the first occurrence of "FixedString1" in the file, i.e. left to right, until "SearchString":
FixedString1 FixedString2 VariableString</Name><Name>FixedString1 SearchString
I want it to find "SearchString" and from there go left to the nearest occurrence of "FixedString1", including both strings.
Can you please help me with this? Thanks!

I think you want the following:
FixedString1[^>]*SearchString
It matches from FixedString1 up to SearchString, but only when the text between them contains no >, so the match cannot run from one tag into the next.
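A quick way to check the pattern outside regex101 is a short Python sketch (Python is used only for illustration; PowerShell's -replace operator takes the same pattern). It uses the placeholder strings from the question. The trailing \s* in the removal pattern is an assumption about how the strings are separated, added so that the leftover is <Name>VariableString</Name>.

```python
import re

xml = ("<Name>FixedString1 FixedString2 VariableString</Name>"
       "<Name>FixedString1 SearchString VariableString</Name>")

# [^>]* cannot cross a '>', so a match can never span from one <Name> element
# into the next one.
pattern = re.compile(r"FixedString1[^>]*SearchString")
print(pattern.findall(xml))

# Remove the match plus the whitespace after it (the \s* is an assumption).
cleaned = re.sub(r"FixedString1[^>]*SearchString\s*", "", xml)
print(cleaned)
```

The first element has no SearchString before its closing ">", so the attempt starting there fails and only the second element is matched and cleaned.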

Related

Remove texts based on pattern

I have a file with lots of URLs:
domain1.com/blue
domain1.com/blue/
domain2.com/red
domain2.com/red/
...
[etc]
Is there a way to use a regex to keep ONLY the "domain1.com/blue" type of text, but DELETE "domain1.com/blue/"?
The pattern is that the URLs come in pairs that are identical except that one of them ends with a "/"; basically I want to remove all the URLs that end with "/" and keep the ones that don't.
In the end the file should only contain these:
domain1.com/blue
domain2.com/red
...
[etc]
Thank you so much for the help! If anyone has an idea how to do this, it'd be awesome!
There are two approaches I can think of.
1. Match all the lines you want to remove (those ending in /) and replace them with nothing.
The regex for this is /^.*\/$/, used in multiline mode so that ^ and $ match at the start and end of each line.
2. Match only the lines you want to keep and save them to a new file.
The regex for this is /^((?!(\/$)).)*$/, again in multiline mode.
You can paste these into an online regex explainer for an in-depth explanation of what they do.
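Both approaches can be sketched with Python's re module, assuming the whole file has been read into one string (here the sample URLs from the question, without a trailing newline). The \n? in the first pattern and the non-capturing (?:...) in the second are adjustments for Python, not part of the regexes above.

```python
import re

text = "domain1.com/blue\ndomain1.com/blue/\ndomain2.com/red\ndomain2.com/red/"

# Approach 1: delete every line that ends in "/". re.MULTILINE makes ^ and $
# match at line boundaries; the optional \n? also removes the line break, so
# no blank lines are left behind.
dropped = re.sub(r"^.*/$\n?", "", text, flags=re.MULTILINE)

# Approach 2: collect only the lines that do NOT end in "/". The group is
# non-capturing because re.findall would otherwise return only the last
# character captured by the group instead of the whole line.
kept = re.findall(r"^(?:(?!/$).)*$", text, flags=re.MULTILINE)

print(repr(dropped))
print(kept)
```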
Unfortunately, you did not specify which language you are using with regex, so I can't give you language-specific details. But a line that ends with / followed by possibly one or more white space characters can be tested for with the following regular expression:
/\/\s*$/
So read in each line and test it against the above regex. If it matches, do not write it out to the new file.
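The read-test-write loop described above might look like this in Python (a sketch; the choice of language and the function name keep_urls are illustrative). The sample lines keep their trailing "\n", as they would when read from a file, and \s* absorbs it.

```python
import re

# A "/" followed only by optional whitespace (including "\n") at the end.
TRAILING_SLASH = re.compile(r"/\s*$")

def keep_urls(lines):
    """Return only the lines that do not end in "/" (ignoring trailing whitespace)."""
    return [line for line in lines if not TRAILING_SLASH.search(line)]

urls = ["domain1.com/blue\n", "domain1.com/blue/\n",
        "domain2.com/red\n", "domain2.com/red/  \n"]
print(keep_urls(urls))
```

To apply it to a real file, pass the open file object (iterating over it yields lines with their newlines) and write the result to the new file with writelines; the file names are up to you.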

Perl, replace multiple matches in string

So, I'm parsing an XML file and ran into a problem. The XML has objects containing a script, which looks roughly like this:
return [
['measurement' : org.apache.commons.io.FileUtils.readFileToByteArray(new File('tab_2_1.png')),
'kpi' : org.apache.commons.io.FileUtils.readFileToByteArray(new File('tab_2_2.png'))]]
I need to replace every filename while keeping the file extension, for every match of the regex, because a string can look like this:
['measurement' : org.apache.commons.io.FileUtils.readFileToByteArray(new File('tab_2_1.png'))('tab_2_1.png'))('tab_2_1.png')),
and I still need to replace every image name before .png.
I used this regex: .*\(\'(.*)\.png\'\),
but it catches only the last match on each line (before \n), not every match in the string.
Can you help me correct this regex?
The problem is that .* is greedy: it matches everything it can. So .*x matches everything up to the very last x in the string, even if the text before it contains other x characters. You need the non-greedy version:
s/\('(.*?)\.png/('$replacement.png/g;
where the ? makes .* stop at the first .png. The \(' is needed to anchor the pattern at the start of the filename. This correctly replaces the filenames in the examples shown.
Another way to do this is \('([^.]*)\.png, where [^.] is a negated character class matching any character that is not a dot. With the * quantifier it again matches everything up to the first .png, provided the filename itself contains no dot.
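The difference between the greedy and non-greedy forms is easy to see on the problematic line from the question. This sketch uses Python's re.sub, which replaces every match by default (like Perl's /g); the name new_name is a hypothetical replacement.

```python
import re

line = ("['measurement' : org.apache.commons.io.FileUtils.readFileToByteArray("
        "new File('tab_2_1.png'))('tab_2_1.png'))('tab_2_1.png')),")
replacement = "new_name"  # hypothetical new file name (without extension)

# Greedy (.*) runs to the LAST ".png" on the line, so one replacement
# swallows everything between the first "('" and that point.
greedy = re.sub(r"\('(.*)\.png", "('" + replacement + ".png", line)

# Non-greedy (.*?) stops at the FIRST ".png" after each "('", so every
# filename is replaced separately.
lazy = re.sub(r"\('(.*?)\.png", "('" + replacement + ".png", line)

# Negated class: [^.]* cannot cross a dot, so it also stops at ".png"
# (assuming the file name itself contains no dot).
negated = re.sub(r"\('([^.]*)\.png", "('" + replacement + ".png", line)

print(greedy)
print(lazy)
```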
The question doesn't say how exactly you are "parsing an XML", but I dearly hope that it is with libraries like XML::LibXML or XML::Twig. Please do not attempt that with regex; the tool is just not adequate for the job, and you will find out the hard way. A lot has been written about this over the years; search SO.

Regular Expressions - Select the Second Match

I have a txt file with <i> and </i> tags between words that I would like to remove using Editpad.
For example, I'd like to keep lines like this unchanged:
<i>Phrases and words.</i>
And I'd like to remove the </i> and <i> tags inside the phrase, when it's like this:
<i>Phrases</i>and<i> words.</i>
<i>Phrases</i>and <i>words.</i>
I tried to do this with a regex, but couldn't get it to work.
Since the inner tags are next to a space or a word character, I could find the lines that contain them with
/ <i>|<\/i> /
but this way I can't just replace all matches with nothing; I have to edit each line I find by hand.
Is there any way to accomplish that?
* Edited *
Another example of lines found in the subtitle text:
<i>- find me on the chamber.</i>
- What? <i>Go. Go, go, go!</i>
Rule number one: you can't parse HTML with regex.
That being said, if you know each line follows a certain pattern, you can usually hack something together to work. ;)
If I've understood correctly, it looks like you can simply remove every <i> and </i> that is not at the beginning or end of a line. In that case, one method you could try is the following regex:
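(?<=.)<\/?i>(?=.)
This matches the tags, using a lookbehind and a lookahead to make sure we aren't at the start or end of a line (by checking that another character exists before and after the tag). Characters checked by a lookbehind or lookahead are not part of the match, so they are not replaced in a search and replace. The < and > need no escaping; in some flavors, such as the Boost engine used by Notepad++, \< and \> mean start and end of word instead.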
Disclaimer: this works on regex101, but Notepad++ may differ somewhat from the PCRE flavor.
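To see what the lookaround version removes, here is a small check using Python's re module on the sample lines from the question (Python only stands in for the editor here).

```python
import re

# Match <i> or </i> only when at least one character precedes it (?<=.) and
# at least one follows it (?=.). Since "." does not match a newline, tags at
# the very start or end of a line are left alone.
INNER_TAG = re.compile(r"(?<=.)</?i>(?=.)")

lines = [
    "<i>Phrases and words.</i>",
    "<i>Phrases</i>and<i> words.</i>",
    "<i>Phrases</i>and <i>words.</i>",
    "- What? <i>Go. Go, go, go!</i>",
]
for line in lines:
    print(INNER_TAG.sub("", line))
```

The last line shows a limitation: its <i> is not at the start of the line, so it is removed too, leaving an unmatched </i> at the end.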
EDIT: since this question actually asks how to do this in Editpad, here is a modified alternative:
Try searching for the regex: (.)<\/?i>(.). This will match (and capture) exactly one character before and after the <i> or </i> tag.
When replacing, use backreferences to replace the entire match with the two captured characters - a replacement string of \1\2 should work.
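The capture-and-restore version can be sketched the same way with Python's re.sub, which also accepts \1\2 in the replacement. The second input, a</i>b<i>c, is a made-up string chosen to show one difference from the lookaround version.

```python
import re

# Consume one character on each side of the tag, then put both back.
TAG_WITH_NEIGHBOURS = re.compile(r"(.)</?i>(.)")

print(TAG_WITH_NEIGHBOURS.sub(r"\1\2", "<i>Phrases</i>and<i> words.</i>"))

# Two tags separated by a single character: the "b" is consumed by the first
# match, so the second tag has no preceding character left and survives.
print(TAG_WITH_NEIGHBOURS.sub(r"\1\2", "a</i>b<i>c"))
```

Because this version consumes its neighbours, tags only one character apart need a second Replace All pass; the lookaround version does not have this problem.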

Notepad++ replace text with RegEx search result

I would like to replace a placeholder string in a file with the result of a regular expression. The placeholder text looks like:
<xsl:variable name="ServiceCode" select="###"/>
I would like to replace ### with a service code that appears later in the same file, in this URL:
<a href="/Services/xyz" target="_self">
The regular expression (?<=\/Services\/)(.*)(?=\" )
returns the required service code "xyz".
So, I opened Notepad++, added "###" to the "Find what" section and this regex to the "Replace with" section, and expected the ### text to be replaced by xyz.
But I got this result:
<xsl:variable name="ServiceCode" select="?<=/Services/.*?=" "/>
I am new to regex; do I need to use different syntax in the replace section than in the search? Can someone give me a hint on how to achieve the required result? The goal is to standardize tons of files with a similar structure, since currently all service codes are hardcoded in several places in each file. Thanks.
You could use a lookahead to capture the service code that appears further ahead in the file.
Search for: (?s)###(?=.*/Services/([^"]+)") and replace with: $1
(?s) makes the dot also match newlines (Notepad++ also has a ". matches newline" checkbox for this)
[^"] matches a character that is not "
The replacement $1 refers to the capture of the first parenthesized subpattern.
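The same search and replace can be tried in Python, whose re module also keeps groups captured inside a lookahead. The file content below is a shortened stand-in; the middle line is invented filler.

```python
import re

text = (
    '<xsl:variable name="ServiceCode" select="###"/>\n'
    '<p>other content</p>\n'
    '<a href="/Services/xyz" target="_self">\n'
)

# (?s) lets "." cross newlines, so the lookahead can scan forward to the link.
# The group inside the lookahead captures the code without consuming it, so
# only "###" itself is replaced.
result = re.sub(r'(?s)###(?=.*/Services/([^"]+)")', r"\1", text)
print(result)
```

Because .* is greedy, a file with several /Services/ links would take the code from the last one; .*? would take the first.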
I am no regex expert, but I think I may be able to help. It looks like you might be going about this the wrong way. A regex search and replace like yours would normally work like this:
Parentheses () in a regex let you capture part of the match and reuse it in the replace section.
You place (?<=\/Services\/)(.*)(?=\" ) into the "Find what" section in Notepad++.
Then in the "Replace with" section you could use \1 to insert what (.*) captured. Note that (?<=\/Services\/) and (?=\" ) are a lookbehind and a lookahead, not capturing groups, so they get no number; only plain (...) groups are numbered \1, \2, \3 and so on, from left to right.
Depending on the structure of your files, you would need a regex that selects both lines of code (and the specific parts you need), then use a combination of \1\2\3 etc. to put everything back exactly as it was, except for the ###, which you could replace with the \number of the group that captured xyz.
See http://docs.notepad-plus-plus.org/index.php/Regular_Expressions for more info.
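As a sketch of that capture-and-rebuild idea, here is one hypothetical pattern tried in Python (Notepad++ uses the same \1-style backreferences). It assumes the ### comes before the link, as in the question.

```python
import re

text = ('<xsl:variable name="ServiceCode" select="###"/>\n'
        '<a href="/Services/xyz" target="_self">')

# Hypothetical pattern: group 1 = text before ###, group 2 = everything from
# ### up to /Services/, group 3 = the service code itself.
pattern = r'(?s)(select=")###(".*?/Services/)([^"]+)'

# Rebuild the match unchanged, except that group 3 goes where ### was.
result = re.sub(pattern, r"\1\3\2\3", text)
print(result)
```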

Regex Match That doesn't contain some text

I am trying to create a regex that finds a Start Prefix and an End Prefix that have paragraph tags between them, but the one I have created does not work as I expect.
%%%HL_START%%%(.*?)</p><p>(.*?)%%%HL_END%%%
This correctly matches:
<p>This Should %%%HL_START%%%Work</p><p>This%%%HL_END%%% SHould Match</p>
This also matches, but I don't want it to, because the </p><p> is not between a Start and End Prefix pair:
<p>%%%HL_START%%%One%%%HL_END%%% Some More Text %%%HL_START%%%Here%%%HL_END%%%</p><p>Some more text %%%HL_START%%%Here%%%HL_END%%%</p>
I'm not convinced that regex is the right solution here; once you get into nested start and stop markers, you may no longer be dealing with a regular language.
For this specific example, try changing the regex to use [^%] instead of . so that the lazy .*? matching can't run past a %%%HL_END%%%:
%%%HL_START%%%([^%]*?)</p><p>([^%]*?)%%%HL_END%%%
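Both regexes can be compared on the two examples from the question with Python's re module.

```python
import re

original = re.compile(r"%%%HL_START%%%(.*?)</p><p>(.*?)%%%HL_END%%%")
fixed = re.compile(r"%%%HL_START%%%([^%]*?)</p><p>([^%]*?)%%%HL_END%%%")

should_match = ("<p>This Should %%%HL_START%%%Work</p><p>This%%%HL_END%%% "
                "SHould Match</p>")
should_not_match = ("<p>%%%HL_START%%%One%%%HL_END%%% Some More Text "
                    "%%%HL_START%%%Here%%%HL_END%%%</p><p>Some more text "
                    "%%%HL_START%%%Here%%%HL_END%%%</p>")

# The lazy .*? still expands past an HL_END until it reaches a </p><p>.
print(bool(original.search(should_not_match)))
# [^%] stops at the first "%", i.e. at the next marker.
print(bool(fixed.search(should_match)), bool(fixed.search(should_not_match)))
```

The trade-off is that [^%] also rejects a literal % inside the highlighted text, such as "50%".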