I have an unformatted XML file in which I would like to delete elements of a specific name that contain some value.
Example:
<XmlElement1>
</XmlElement1>
<XmlElement2 ... >
...
<Xml1SubElement someParameter="...SearchTerm..."/>
...
</XmlElement2>
<XmlElement3/>
... stands for random characters and random multiple lines
In the above example I would like to delete all XmlElement2 elements that contain "SearchTerm" in the body. In other words, select all text between <XmlElement2 and </XmlElement2>, across multiple lines, where SearchTerm is in the middle, and replace it with "".
I'm using UltraEdit on MacOS and am flexible with what tools to use.
Your help is much appreciated!
The Perl regular expression search string for this task can be for example:
(?s)^[\t ]*<XmlElement2(?:.(?!</XmlElement2>))+?SearchTerm.+?</XmlElement2>[\t ]*(?:\r?\n|\r)
Explanation:
(?s) ... flag that makes the dot in the search expression also match newline characters.
^[\t ]* ... start search at beginning of a line and match 0 or more tabs or spaces.
<XmlElement2 ... the start tag of the element to remove if it contains SearchTerm.
(?:.(?!</XmlElement2>))+? ... a non-capturing group that matches any character one or more times, non-greedy, as long as the string after the current character is not </XmlElement2>. The negative lookahead (?!</XmlElement2>) prevents a match that starts at <XmlElement2 and runs across one or more </XmlElement2> and <XmlElement2 tags until SearchTerm is found somewhere later in the file.
SearchTerm ... string which must be found inside element XmlElement2.
.+? ... any character (including newline characters) one or more times, non-greedy. Non-greedy here means that matching stops at the next occurrence of </XmlElement2> and not at the last occurrence of </XmlElement2> in the file.
</XmlElement2> ... the end tag of the XML element to remove if it contains SearchTerm.
[\t ]*(?:\r?\n|\r) ... 0 or more tabs or spaces, followed by a DOS/Windows (carriage return + line-feed), UNIX (just line-feed) or classic Mac (just carriage return) line ending.
PS: The Perl regular expression replace was tested with UltraEdit for Windows v22.20.0.49 on Windows XP and v25.20.0.88 on Windows 7 as I don't have a Mac.
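For anyone who wants to try the expression outside an editor, here is a minimal Java sketch. It is not what was tested in UltraEdit: the class and method names are made up for illustration, and I am assuming that Java needs the extra (?m) flag so that ^ matches at the start of every line and not only at the start of the input.

```java
import java.util.regex.Pattern;

class RemoveElementSketch {
    // Same expression as in the answer above. (?m) is added on the assumption
    // that the Java engine is used, where ^ otherwise matches only at the
    // very start of the input.
    static final Pattern BLOCK = Pattern.compile(
        "(?sm)^[\\t ]*<XmlElement2(?:.(?!</XmlElement2>))+?SearchTerm.+?"
        + "</XmlElement2>[\\t ]*(?:\\r?\\n|\\r)");

    // Removes every XmlElement2 block whose body contains SearchTerm.
    static String removeMatchingElements(String xml) {
        return BLOCK.matcher(xml).replaceAll("");
    }

    public static void main(String[] args) {
        String xml = "<XmlElement1>\n</XmlElement1>\n"
            + "<XmlElement2 a=\"1\">\n"
            + "  <Xml1SubElement someParameter=\"xxSearchTermxx\"/>\n"
            + "</XmlElement2>\n"
            + "<XmlElement2 a=\"2\">\n"
            + "  <Xml1SubElement someParameter=\"other\"/>\n"
            + "</XmlElement2>\n"
            + "<XmlElement3/>\n";
        // Only the first XmlElement2 block should disappear.
        System.out.print(removeMatchingElements(xml));
    }
}
```

The second XmlElement2 block stays because the negative lookahead stops the search for SearchTerm at its own end tag.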
I would like to find a regular expression that matches any characters, including carriage returns.
Example: I would like to match the string from "Mytext.." to "EndofMyText":
Anytext
Mytextstartshere
more text
more text
EndofMyText
LastTet
My problem is the carriage returns.
What is the correct regex?
I started with: Mytext(.*?[\r\n])EndofMyText
You could repeat this part one or more times with (?:.*?[\r\n])+, which also makes it a non-capturing group:
Mytext(?:.*?[\r\n])+EndofMyText
or use [\s\S]+?
Mytext[\s\S]+?EndofMyText
If the "dot matches newline" flag is enabled, you could use Mytext.*?EndofMyText
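The suggestions above can be compared in a small Java sketch. This is only an illustration: the helper name firstMatch is made up, and Pattern.DOTALL is Java's name for the "dot matches newline" flag, which other tools call something else.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultiLineMatchSketch {
    // Returns the first match of regex in text, or null if there is none.
    static String firstMatch(String regex, int flags, String text) {
        Matcher m = Pattern.compile(regex, flags).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "Anytext\nMytextstartshere\nmore text\nmore text\n"
            + "EndofMyText\nLastTet\n";

        // Repeated "rest of line + line break" group.
        System.out.println(firstMatch("Mytext(?:.*?[\\r\\n])+EndofMyText", 0, text));
        // [\s\S] matches any character, line breaks included.
        System.out.println(firstMatch("Mytext[\\s\\S]+?EndofMyText", 0, text));
        // Plain dot, with the dot-matches-newline flag switched on.
        System.out.println(firstMatch("Mytext.*?EndofMyText", Pattern.DOTALL, text));
        // Plain dot without the flag: the dot stops at the first line break.
        System.out.println(firstMatch("Mytext.*?EndofMyText", 0, text));
    }
}
```

On this sample the first three calls return the same block from Mytextstartshere to EndofMyText, and the last one finds nothing.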
I want to delete all lines ending with |
I tried
.*[|;]
but it does not anchor the match to the end of the line.
Use the following regex:
.*\|$
This says "any character any number of times (.*), followed by a pipe (\| - you have to escape it), and then the end of a line ($)".
If you want to find lines ending with either ; or |, use:
.*[\|;]$
You don't have to escape the pipe inside a character class, but I prefer to do so anyway.
In either case, make sure you're in "Regular expression" search mode with ". matches newline" unchecked.
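The same two expressions can be tried in Java, as a rough sketch only. The class and method names are invented, and I am assuming Pattern.MULTILINE is needed so that $ matches at the end of every line; leaving Pattern.DOTALL off corresponds to ". matches newline" being unchecked.

```java
import java.util.regex.Pattern;

class PipeLineSketch {
    // Lines ending with a pipe.
    static final Pattern ENDS_WITH_PIPE =
        Pattern.compile(".*\\|$", Pattern.MULTILINE);
    // Lines ending with a pipe or a semicolon.
    static final Pattern ENDS_WITH_PIPE_OR_SEMICOLON =
        Pattern.compile(".*[\\|;]$", Pattern.MULTILINE);

    // Replaces the content of every matching line with nothing.
    // The line break itself is not part of the match, so an empty line remains.
    static String clearMatchingLines(Pattern pattern, String text) {
        return pattern.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        String text = "keep me\ndrop me|\na|b\nalso; drop|\n";
        System.out.print(clearMatchingLines(ENDS_WITH_PIPE, text));
    }
}
```

Note that the line a|b is kept, because its pipe is not the last character on the line.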
I'm reading tutorials and trying to understand this so I can apply it to what I'm doing.
I have a file with lines of text like:
line1blahblahblahblah
line2blahblahblahblah
...
line10blahblahblahblah
I want to go in and remove the word "line" and the number after it (which is incremented 1-1000 for each line) and replace it with new text, leaving all the text after it intact.
Can someone explain how to do this and explain the regular expression?
Search for
^line\d+
And replace with an empty string.
Explanation: The ^ matches the beginning of the line, the line matches a literal character sequence, and the \d matches any digit character. The + after the \d makes it match one or more digit characters.
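If you would rather do this in code than in the editor, the search and replace above might look like this in Java. Treat it as a sketch: the class name, method name and the replacement parameter are made up, and Pattern.MULTILINE is used on the assumption that the whole file is processed as one string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinePrefixSketch {
    // ^line\d+ with MULTILINE, so ^ matches at the start of every line.
    static final Pattern PREFIX = Pattern.compile("^line\\d+", Pattern.MULTILINE);

    // Replaces the leading "line<number>" of every line with the given text.
    // quoteReplacement keeps $ and \ in the new text from being interpreted.
    static String replacePrefix(String text, String replacement) {
        return PREFIX.matcher(text).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String text = "line1blahblahblahblah\nline2blahblahblahblah\nline10blahblahblahblah";
        // An empty replacement just removes the prefix.
        System.out.println(replacePrefix(text, ""));
        // Any other text is put in its place.
        System.out.println(replacePrefix(text, "row: "));
    }
}
```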
In the Notepad++ Replace dialog, make sure the search mode is set to "Regular expression".
I am trying to make a simple regex that will check if a line is blank or not.
Cases:
" some" // not blank
" " //blank
"" // blank
The pattern you want is something like this in multiline mode:
^\s*$
Explanation:
^ is the beginning of string anchor.
$ is the end of string anchor.
\s is the whitespace character class.
* is zero-or-more repetition of.
In multiline mode, ^ and $ also match the beginning and end of the line.
References:
regular-expressions.info/Anchors, Character Classes, and Repetition.
A non-regex alternative:
You can also check if a given string line is "blank" (i.e. containing only whitespace) by trim()-ing it, then checking if the resulting string isEmpty().
In Java, this would be something like this:
if (line.trim().isEmpty()) {
    // line is "blank"
}
The regex solution can also be simplified without anchors (because of how matches is defined in Java) as follows:
if (line.matches("\\s*")) {
    // line is "blank"
}
API references
String String.trim()
Returns a copy of the string, with leading and trailing whitespace omitted.
boolean String.isEmpty()
Returns true if, and only if, length() is 0.
boolean String.matches(String regex)
Tells whether or not this (entire) string matches the given regular expression.
Actually in multiline mode a more correct answer is this:
/((\r\n|\n|\r)$)|(^(\r\n|\n|\r))|^\s*$/gm
The accepted answer, ^\s*$, does not match when the last line is blank (in multiline mode).
Try this:
^\s*$
Full credit to bchr02 for this answer. However, I had to modify it a bit to handle lines that contain */ (end of comment) followed by an empty line: the regex was matching the non-empty line with */.
New: /(^(\r\n|\n|\r)$)|(^(\r\n|\n|\r))|^\s*$/gm
All I did was add ^ as the second character of the pattern to signify the start of a line.
The most portable regex would be ^[ \t\n]*$ to match an empty string (note that you would need to replace \t and \n with tab and newline accordingly) and [^ \n\t] to match a non-whitespace string.
It depends on what you mean by blank:
a line that consists only of whitespace, or a line that contains nothing at all.
If you want to match a line which contains nothing, use '/^$/'.
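To make the difference concrete, here is a small Java sketch that checks one line at a time. The class and method names are invented for illustration, and each line is assumed to be passed in without its line terminator.

```java
class BlankLineSketch {
    // True only for a line that contains nothing at all.
    static boolean containsNothing(String line) {
        return line.matches("^$");
    }

    // True for a line that contains nothing or only whitespace.
    static boolean isWhitespaceOnly(String line) {
        return line.matches("^\\s*$");
    }

    public static void main(String[] args) {
        String[] samples = { " some", " ", "" };
        for (String s : samples) {
            System.out.println("\"" + s + "\" nothing=" + containsNothing(s)
                + " whitespaceOnly=" + isWhitespaceOnly(s));
        }
    }
}
```

Only the last sample contains nothing, while both " " and "" count as whitespace-only.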
Somehow none of the answers from here worked for me when I had strings which were filled just with spaces and occasionally strings having no content (just the line terminator), so I used this instead:
if (str.trim().isEmpty()) {
    doSomethingWhenWhiteSpace();
}
Well... I tinkered around (using Notepad++) and this is the solution I found:
\n\s
\n matches the end of the line (where the match starts). The caret would not help in my case, as the beginning of the row is a string.
\s matches a whitespace character before the next string.
Hope it helps.
This regex matches a line break together with all the whitespace that follows it, so it can be used to remove blank lines and leading spaces or tabs from a file:
\n\s*
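The answer does not say what to replace the match with. As a sketch under the assumption that each match is replaced by a single line break (the class and method names are made up), it could look like this in Java:

```java
class CollapseBlankLinesSketch {
    // Replaces every line break plus the whitespace after it with one line break.
    // Assumption: "\n" is the replacement; the answer above does not state it.
    // Side effect: leading spaces and tabs of the following line are removed too.
    static String collapse(String text) {
        return text.replaceAll("\\n\\s*", "\n");
    }

    public static void main(String[] args) {
        String text = "a\n\n\n  b\n\tc\n";
        System.out.print(collapse(text));
    }
}
```

Replacing with an empty string would join all lines into one, which is probably not what is wanted.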