Notepad++ regular expression for XML replace

I need to replace the following line in an XML file:
hashName="'Miecz Nieb. Wojownika+5IMiecz Nieb. Wojownika+5" name="Miecz Nieb. Wojownika+5"
The line above is incorrect; I want it replaced with this:
hashName="'Miecz Nieb. Wojownika+5'" name="Miecz Nieb. Wojownika+5"
(It should take the item name from the name="" attribute.)
This is what I have at the moment. It's not working as expected, because it removes my name="..." attribute.
Search for:
hashName="(')(.*)"(.)name="(.*)"(.)/
Replace with:
hashName="'\4'" name="\4"

For this simple example, the following works.
Search for
hashName="[^"]*"\s*name="([^"]*)"
and replace with
hashName="'\1'" name="\1"
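If you don't need to capture or group characters, don't put parentheses around them; that is why I removed most of them.
To avoid matching too much, e.g. if there are two "name" attributes on one line, I used [^"]* instead of .*. It is not a lazy quantifier, but it has a similar effect here: a negated character class can never match past the next double quote.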
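For anyone who wants to check the pattern outside Notepad++, here is a minimal Python sketch of the same search and replace. The variable names are illustrative, and Python's re module is only a stand-in for Notepad++'s regex engine.

```python
import re

# Sample line from the question (with the broken hashName value).
line = '''hashName="'Miecz Nieb. Wojownika+5IMiecz Nieb. Wojownika+5" name="Miecz Nieb. Wojownika+5"'''

# [^"]* matches up to, but never past, the next double quote.
pattern = re.compile(r'hashName="[^"]*"\s*name="([^"]*)"')

# \1 is the captured value of the name attribute, used twice.
fixed = pattern.sub(r'''hashName="'\1'" name="\1"''', line)
print(fixed)
```

The only capturing group is the one around the name value, which is why \1 is the only backreference the replacement needs.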

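This should work, provided no other quoted attribute follows name="..." on the same line (.+ is greedy and would run on to the last quote):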
Search for: hashName=\".+\" name=\"(.+)\"
Replace with: hashName="'\1'" name="\1"

Related

Regular Expressions - Select the Second Match

I have a txt file with <i> and </i> between words that I would like to remove using Editpad
For example, I'd like to keep when it's like this:
<i>Phrases and words.</i>
And I'd like to remove the </i> and <i> tags inside the phrase, when it's like this:
<i>Phrases</i>and<i> words.</i>
<i>Phrases</i>and <i>words.</i>
I tried to do this with a regex, but couldn't get it to work.
Because the inner tags are next to a space or a word character, I could find the lines that contain a double tag with
/ <i>|<\/i> /
but this way I can't simply replace every match with nothing; I have to edit each line I find by hand.
Is there any way to accomplish that?
* Edited *
Another example of lines found on the subtitle text
<i>- find me on the chamber.</i>
- What? <i>Go. Go, go, go!</i>
Rule number one: you can't parse HTML with regex.
That being said, if you know each line follows a certain pattern, you can usually hack something together to work. ;)
If I've understood correctly, it looks like you can simply remove all <i> and </i> that aren't either at the beginning or end of the lines. In that case, one method you could try is the following regex:
(?<=.)</?i>(?=.)
This matches the tags, with a lookbehind and a lookahead to make sure we aren't at the start or end of a line (by checking that another character exists before and after the tag). Note that characters matched inside a lookahead or lookbehind are not part of the match, so they aren't replaced when you search and replace. The angle brackets and the slash need no escaping; in some flavors \< and \> are word boundaries rather than literal brackets.
Disclaimer: this works on regex101, but Notepad++ may differ in places from the PCRE flavor.
EDIT: since the question is actually about how to do this in Editpad, here is a modified alternative:
Try searching for the regex: (.)</?i>(.). This will match (and capture) exactly one character before and after the <i> tags.
When replacing, use backreferences to replace the entire match with the two captured characters - a replacement string of \1\2 should work.
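Both variants can be tried on the question's sample lines with a short Python sketch. This assumes Python's re module behaves like the editor's engine for these constructs, which may not hold for every flavor; the variable names are made up for the example.

```python
import re

lines = [
    "<i>Phrases and words.</i>",
    "<i>Phrases</i>and<i> words.</i>",
    "<i>Phrases</i>and <i>words.</i>",
]

# Lookaround version: a tag with at least one character on each side.
inner_tag = re.compile(r"(?<=.)</?i>(?=.)")
cleaned = [inner_tag.sub("", s) for s in lines]

# Capture-group version: put the two neighbouring characters back.
inner_tag_groups = re.compile(r"(.)</?i>(.)")
cleaned_groups = [inner_tag_groups.sub(r"\1\2", s) for s in lines]

for s in cleaned:
    print(s)
```

Two caveats: the capture-group version consumes the neighbouring characters, so two inner tags separated by a single character would not both be caught in one pass; and on a line such as "- What? <i>Go. Go, go, go!</i>" either pattern removes the opening tag, because it is not at the start of the line.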

Notepad++ replace text with RegEx search result

I would like to replace a standard string in a file with another string that is the result of a regular expression. The standard text looks like:
<xsl:variable name="ServiceCode" select="###"/>
I would like to replace ### with a servicecode, that I can find later in the same file, from this URL:
<a href="/Services/xyz" target="_self">
The regular expression (?<=\/Services\/)(.*)(?=\" )
returns the required service code "xyz".
So, I opened Notepad++, added "###" to the "Find what" and this RegEx to the "Replace with" section, and expected that the ### text will be replaced by xyz.
But I got this result:
<xsl:variable name="ServiceCode" select="?<=/Services/.*?=" "/>
I am new to regex. Do I need to use a different syntax in the replace field than in the search field? Can someone give me a hint on how to achieve the required result? The goal is to standardize tons of files with a similar structure, because right now all service codes are hardcoded in several places in each file. Thanks.
You could use a lookahead for capturing the part ahead.
Search for: (?s)###(?=.*/Services/([^"]+)") and replace with: $1
(?s) makes the dot also match newlines (Notepad++ also has a ". matches newline" checkbox for this)
[^"] matches a character that is not "
The replacement $1 corresponds to the capture of the first parenthesized subpattern.
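As a rough check of the idea, the same lookahead can be run in Python. This is a sketch only: the two-line sample is assembled from the snippets in the question, and Python writes the group reference as \1 where Notepad++ also accepts $1.

```python
import re

text = (
    '<xsl:variable name="ServiceCode" select="###"/>\n'
    '<a href="/Services/xyz" target="_self">\n'
)

# (?s) lets . cross line breaks. The lookahead captures the service code
# without consuming it, so only the ### itself is replaced.
pattern = re.compile(r'(?s)###(?=.*/Services/([^"]+)")')

result = pattern.sub(r"\1", text)
print(result)
```

Because .* is greedy, a file containing several /Services/ links would take the code from the last one; a lazy .*? would take it from the first link after the ###.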
I am no expert at regex, but I think I may be able to help. It looks like you might be going at this the wrong way: the regex belongs in the "Find what" field, not in "Replace with".
Capturing parentheses () in a regex let you select part of the match and reuse it in the replacement.
You would place (?<=\/Services\/)(.*)(?=\" ) into the "Find what" field in Notepad++.
Then, in the "Replace with" field, \1 refers to what (.*) matched. The lookarounds (?<=\/Services\/) and (?=\" ) are not capturing groups, so they get no number of their own.
Depending on the structure of your files, you would need a regex that matches both lines (with the parts you need in capturing groups), then use a combination of \1, \2, \3, etc. to put everything back exactly as it was, except for the ###, which you replace with the group number that holds xyz.
See http://docs.notepad-plus-plus.org/index.php/Regular_Expressions for more info.
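One possible shape for such a "match both lines and rebuild" pattern, sketched in Python. The pattern and the group layout are an illustration rather than something from the original posts, and they assume the link follows the variable in the file.

```python
import re

doc = (
    '<xsl:variable name="ServiceCode" select="###"/>\n'
    '<a href="/Services/xyz" target="_self">\n'
)

# Group 1: the text just before ###.
# Group 2: everything between ### and the service code.
# Group 3: the service code itself.
both_lines = re.compile(r'(?s)(select=")###(".*?/Services/)([^"]+)')

# Rebuild the match as it was, with group 3 in place of ###.
rebuilt = both_lines.sub(r"\1\3\2\3", doc)
print(rebuilt)
```

Everything the pattern consumes has to be written back in the replacement, which is why groups 2 and 3 reappear after the inserted code.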

Changing some XML tags names but leaving unchanged values between them

In one of my XML files I need to find and replace some opening tag names using regex and Notepad++, while leaving the text between the tags unchanged.
Example:
<uri>http://domain-name.com/41874_01_home_big.jpg</image_big>
I need to change into:
<image_big>http://domain-name.com/41874_01_home_big.jpg</image_big>
For various reasons I can't just rename every uri tag, because the document also has other closing tags such as /image_small (opened with uri, of course).
I tried to change it like:
<uri>.*?</image_big>
But I don't know with what I should replace it.
I tried with:
<image_big>\1</image_big>
but result is:
<image_big></image_big>
without any text inside.
I need your help. I'm not good with regex.
Just put .*? inside a capturing group.
<uri>(.*?)<\/image_big>
Then replace the match with <image_big>\1</image_big> or <image_big>$1</image_big>
Your regex <uri>.*?</image_big> matches correctly, but to retrieve the characters matched by .*? you need to put that part of the pattern inside a capturing group, so that it can be back-referenced in the replacement.
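A small Python sketch of the capturing-group fix follows. The second pattern is an assumption that goes beyond the answer: it also captures the closing tag name, so lines closed with /image_small could be handled in the same pass. The image_small sample URL is hypothetical.

```python
import re

line = "<uri>http://domain-name.com/41874_01_home_big.jpg</image_big>"

# With (.*?) the text between the tags becomes group 1 and can be reinserted.
pattern = re.compile(r"<uri>(.*?)</image_big>")
fixed = pattern.sub(r"<image_big>\1</image_big>", line)
print(fixed)

# Assumed generalisation: reuse whatever image_* closing tag is present.
# The sample line below is made up for illustration.
small = "<uri>http://domain-name.com/41874_01_home_small.jpg</image_small>"
any_image = re.compile(r"<uri>(.*?)</(image_\w+)>")
fixed_small = any_image.sub(r"<\2>\1</\2>", small)
print(fixed_small)
```

Without the parentheses there is no group 1, which is why the asker's replacement produced an empty element.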
Find:<uri>(.*?)</image_big>
Replace:<image_big>\1</image_big> or <image_big>$1</image_big>
See demo.
https://www.regex101.com/r/rK5lU1/19

Regex for the value of an HTML Property

I have a load of links that look like this:
Taboola - Content you may like
I want to delete the entire ICON and ADD_DATE attributes and their values.
I'm using Sublime Text with a regex find/replace, but I'm not sure how to write the regex to grab everything between ICON=" and the closing ".
Any help would be appreciated!
This should work (escaping quotes as necessary):
ICON="[^"]*"
The reason ICON="(.*)" won't work is that quantifiers are greedy by default: if the pattern can match more of the string and still succeed, it will, so .* runs on to the last quote on the line.
You can either use a non-greedy quantifier, as in ICON=".*?", or explicitly match only characters that are not quotes, as in the answer above.
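The difference between the three patterns is easy to see in a short Python sketch. The bookmark line below is hypothetical: the original markup is not shown in the question, so the attribute values and the trailing LAST_MODIFIED attribute are made up for illustration.

```python
import re

# Hypothetical bookmark line; all attribute values are invented.
link = (
    '<A HREF="http://example.com/" ADD_DATE="1400000000" '
    'ICON="data:image/png;base64,AAAA" LAST_MODIFIED="1400000001">'
    'Taboola - Content you may like</A>'
)

greedy = re.search(r'ICON=".*"', link).group(0)      # runs to the last quote
lazy = re.search(r'ICON=".*?"', link).group(0)       # stops at the first quote
negated = re.search(r'ICON="[^"]*"', link).group(0)  # cannot cross a quote

# Remove both attributes, together with the whitespace in front of them.
stripped = re.sub(r'\s+(?:ICON|ADD_DATE)="[^"]*"', "", link)
print(stripped)
```

Here greedy swallows the LAST_MODIFIED attribute as well, while lazy and negated both stop at the quote that closes ICON.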

multi-line xml regex in sublime

I have a large logfile (100,000+ lines) in XML like so:
<container>
<request:getApples xml="...">
...
</request:getApples>
<request:getOranges xml="...">
...
</request:getOranges>
</container>
...
I want to extract the :getXXXX part to
getApples
getOranges
by doing a regex find & replace in Sublime Text 2.
Something like
Find: [^(request:)]*(.*) xml
Replace: $1\n
Any regex masters that can assist?
Correcting mart1n's answer and actually using ST2 and your sample input, I came up with the following:
First, Ctrl+A to select all. Then, Ctrl+H,
Search: .*?(get\w+) .*
Replace: $1
Replace All
Then,
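Search: ^(?!get).*$ (a negative lookahead; the character class [^get] would only test whether the first character is g, e, or t)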
Replace: nothing
Replace All
Finally,
Search: ^\n
Replace: nothing
Replace All
And you're left with:
getApples
getOranges
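The three passes can also be reproduced outside the editor. The following Python sketch runs them on the sample from the question; it assumes Python's re module as a stand-in for Sublime Text's engine and uses a negative lookahead in the second pass.

```python
import re

log = """\
<container>
<request:getApples xml="...">
...
</request:getApples>
<request:getOranges xml="...">
...
</request:getOranges>
</container>
"""

# Pass 1: reduce each opening request line to its getXXX name.
step1 = re.sub(r".*?(get\w+) .*", r"\1", log)

# Pass 2: blank out every line that does not start with "get".
step2 = re.sub(r"^(?!get).*$", "", step1, flags=re.MULTILINE)

# Pass 3: drop the empty lines left behind.
step3 = re.sub(r"^\n", "", step2, flags=re.MULTILINE)
print(step3)
```

The closing tags survive pass 1 because the name there is followed by > rather than a space, so pass 2 is what removes them.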
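I'm not familiar with Sublime Text, but you can do it in two parts:
Find .*?\(get\w+\) .* and replace with \1 (this is sed/vim-style grouping; in flavors where plain parentheses group, write .*?(get\w+) .* instead). Now those get* strings are on separate lines with nothing else. All that remains is to remove the cruft.
There are many ways to do this. An easy one: find ^[^g][^e][^t].*$ and replace with nothing (an empty string). This works for the sample input, though it would also keep any line whose first character is g, second is e, or third is t.
Now you have a file that contains just the strings you want plus some empty lines, which (I hope) Sublime can get rid of with some delete-empty-lines function.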
You can quickly throw all of the above in a macro and execute at will for any input following the same format ;-)
If you're willing to take the problem out of Sublime Text, you can use the dotall flag along with lazy matching to extract only the getXXX parts.
Replacing
.*?(get\w*) .*?
with
$1\n
should get you most of the way, leaving only a few easily removable closing tags at the end of the file that I can't figure out how to handle at present.
Maybe someone could take this and figure out a way to remove the extra closing tags.
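One way around the leftover tags, if a scripting language is an option, is to extract the matches instead of replacing around them. The Python sketch below first reproduces the dotall replacement to show the leftover, then uses re.findall; the variable names are illustrative, and Python writes the group reference as \1 rather than $1.

```python
import re

log = """\
<container>
<request:getApples xml="...">
...
</request:getApples>
<request:getOranges xml="...">
...
</request:getOranges>
</container>
"""

# The dotall replacement from above: text after the last match is untouched.
replaced = re.sub(r"(?s).*?(get\w*) .*?", r"\1\n", log)

# Extracting instead of replacing leaves nothing to clean up.
names = re.findall(r"(get\w+) ", log)
print("\n".join(names))
```

The replacement leaves a tail because nothing after the last "getXXX " can match again; findall never looks at that tail in the first place.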
Try this
Find what: :(\w+)>|.\s?
Replace with: $1
If it doesn't work as intended, let me know.