Using -replace and pattern matching with XML tags (PowerShell, regex)

I am trying to replace part of a string that contains XML tags, as shown below.
I want to replace the entire element below, where ABCDEF could be any value:
<originalFileName>ABCDEF</originalFileName>
How would I do this?

Solution
You can try this for your purpose:
(<originalFileName>[\w]*</originalFileName>)
Not recommended
However, note that parsing XML with a regex is not recommended.
Regular expressions are insufficient to understand the constructs employed by XML/HTML/XHTML. XML/HTML/XHTML is not a regular language and hence cannot be parsed by regular expressions. Regex queries are not equipped to break down XML/HTML/XHTML into its meaningful parts. Even the enhanced, irregular regular expressions used by Perl are not up to the task.
Further details: RegEx match open tags except XHTML self-contained tags
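As a rough sketch of the parser-based alternative, the Python snippet below sets the text of originalFileName with the standard library's xml.etree.ElementTree instead of a regex. The wrapping <file> element, the <id> element, the new value, and the function name are all invented for illustration; the question only shows the single element.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: only <originalFileName> comes from the question.
xml_text = "<file><id>1</id><originalFileName>ABCDEF</originalFileName></file>"


def replace_file_name(xml_source, new_name):
    """Set the text of every <originalFileName> element to new_name."""
    root = ET.fromstring(xml_source)
    for element in root.iter("originalFileName"):
        element.text = new_name
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(root, encoding="unicode")


result = replace_file_name(xml_text, "report.txt")
print(result)
```

Because the parser works on elements rather than characters, the old value can contain anything that is valid XML text.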

This could help you:
EDITED (this new one also accepts non-word characters between your tags, and I have escaped the / symbol, which can cause errors in some regex flavors if left unescaped):
(<originalFileName>.*<\/originalFileName>)
Check it here.
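One caveat with .* is that it is greedy: if a line holds two such elements, a single match runs from the first opening tag to the last closing tag. The small sketch below uses Python's re module rather than PowerShell's -replace, with invented sample values, to show what a lazy .*? changes.

```python
import re

# Invented sample: two elements on one line.
line = (
    "<originalFileName>ABC</originalFileName>"
    "<originalFileName>DEF</originalFileName>"
)

greedy = re.compile(r"<originalFileName>.*</originalFileName>")
lazy = re.compile(r"<originalFileName>.*?</originalFileName>")

greedy_matches = greedy.findall(line)  # one match spanning both elements
lazy_matches = lazy.findall(line)      # one match per element

# Replace each element separately using the lazy pattern.
replaced = lazy.sub("<originalFileName>NEW</originalFileName>", line)
print(len(greedy_matches), len(lazy_matches))
print(replaced)
```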

Related

Regular Expression to match parent and sub node

I want to develop a regular expression to match the tag:
<claim-text>aaaaaaa
<claim-text>bbbbbbb</claim-text>
<claim-text>ccccccc</claim-text>
</claim-text>
I tried
<claim-text>(.*)</claim-text>
But only bbbbbbb and ccccccc are matched. How can I capture aaaaaaa as well?
Thanks
For a generic solution with any depth, you will at least need a stack, which is not available in most regular expression implementations. However, if you know the structure will only have the depth you specified, you could use something like this:
<claim-text>([^<\r\n]*)
You can see a working example here: https://regex101.com/r/kbDbwF/1
It will search for your opening tag, and then find anything up to the next opening or closing tag [^<], or to the next line break [^\r\n]. I have combined both character classes to one definition [^<\r\n]. However, this is not a general solution!
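To make the behaviour concrete, here is the same pattern run with Python's re module against the sample from the question. This is only a sketch for this fixed, two-level input.

```python
import re

sample = (
    "<claim-text>aaaaaaa\n"
    "<claim-text>bbbbbbb</claim-text>\n"
    "<claim-text>ccccccc</claim-text>\n"
    "</claim-text>"
)

# Capture everything after an opening tag up to the next "<" or line break.
pattern = re.compile(r"<claim-text>([^<\r\n]*)")
texts = pattern.findall(sample)
print(texts)
```

The captures say nothing about nesting: the pattern cannot tell that bbbbbbb and ccccccc sit inside the element that starts with aaaaaaa.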
Do not under any circumstances try to parse HTML with a regex unless you wish to invoke rite 666 Ph'nglui mglw'nafh Cthulhu R'lyeh wgah'nagl fhtagn.
Use an HTML parsing library; see this page for some ways to do it.
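A minimal sketch of the parser route, assuming the fragment is well-formed XML so that Python's xml.etree.ElementTree can read it. The helper name collect_claims is made up for this example; it walks the tree recursively, which is the stack a plain regex lacks.

```python
import xml.etree.ElementTree as ET

sample = (
    "<claim-text>aaaaaaa\n"
    "<claim-text>bbbbbbb</claim-text>\n"
    "<claim-text>ccccccc</claim-text>\n"
    "</claim-text>"
)


def collect_claims(element, depth=0):
    """Yield (depth, own text) for element and every nested claim-text."""
    yield depth, (element.text or "").strip()
    for child in element:
        if child.tag == "claim-text":
            yield from collect_claims(child, depth + 1)


root = ET.fromstring(sample)
claims = list(collect_claims(root))
print(claims)
```

Unlike the regex, this keeps the parent/child relationship and works for any nesting depth.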

Regex force group order

I'm new to regex and I have a question.
In this example, https://regex101.com/r/Iak7cF/1/, how do I force
src="wow"
to be in group 1, and
title="toto"
to be in group 2?
I want to capture this kind of text in any order only if it contains:
class="formula"
Am I doing it right?
You'd be better off using an HTML parser.
But if you really want to use regex, you have to use named groups to achieve what you want.
<img(?=[^>]*class="formula")(?=[^>]*(?<src>src="[^"]*"))(?=[^>]*(?<title>title="[^"]*"))[^>]*>
DEMO
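For reference, a Python rendering of the named-group idea. Python spells named groups (?P<name>...), and this sketch uses [^>]* and [^"]* instead of .* so that each lookahead stays inside one tag and each value stops at its closing quote. The sample tags are invented, since the regex101 input is not reproduced here.

```python
import re

# Each lookahead starts right after "<img", so attribute order does not matter.
IMG_PATTERN = re.compile(
    r'<img(?=[^>]*\sclass="formula")'
    r'(?=[^>]*\s(?P<src>src="[^"]*"))'
    r'(?=[^>]*\s(?P<title>title="[^"]*"))'
    r'[^>]*>'
)

tags = [
    '<img class="formula" src="wow" title="toto">',
    '<img title="toto" src="wow" class="formula">',
    '<img src="wow" title="toto">',  # no class="formula": should not match
]

results = []
for tag in tags:
    match = IMG_PATTERN.search(tag)
    results.append((match.group("src"), match.group("title")) if match else None)
print(results)
```

Groups are numbered in the order they appear in the pattern, so src is also group 1 and title group 2 regardless of the order in the tag.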
Regular expressions are very flexible and powerful, but in general, they are not the right tool for parsing XML, HTML, or XHTML. From WinBatch:
Regular Expressions are only good for parsing text that is tightly defined. Since Regular Expressions don't really understand the context of matches, they can be fooled in a big way if the structure of the text changes. In particular, Regular Expressions have difficulty with hierarchy.
PerlMonks has a detailed explanation of why regex is not a good solution for all but the simplest of cases. They summarize it like this:
So I hope it is clear: Please, don't try to parse arbitrary XML/HTML with regexes!
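A sketch of what the parser approach could look like for the img question, using Python's standard html.parser module. The class name FormulaImageParser and the sample markup are invented for illustration.

```python
from html.parser import HTMLParser


class FormulaImageParser(HTMLParser):
    """Collect (src, title) for every <img> whose class list contains 'formula'."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if "formula" in classes:
            self.images.append((attributes.get("src"), attributes.get("title")))


# Invented sample markup with the attributes in different orders.
html = (
    '<p><img class="formula" src="wow" title="toto">'
    '<img title="toto" src="wow" class="formula">'
    '<img src="other" title="skip"></p>'
)

parser = FormulaImageParser()
parser.feed(html)
print(parser.images)
```

Attribute order, quoting style, and extra classes are handled by the parser, so none of them need to be encoded in a pattern.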

Is an online tool available to validate regex in Firestore?

There are tools available to validate regexes used in JavaScript / prolong etc., but I am writing rules in google-cloud-firestore. I want a tool to check my regex.
Please suggest one.
If you read my original answer, ignore it.
You can use the matches comparison.
matches
Performs a regular expression match, returns true if the whole
string matches the given regular expression. Uses Google RE2 syntax.
The full list of string validation rules available for Cloud Firestore is shown here.
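The "whole string" wording matters when testing a pattern elsewhere: a tester that reports a partial match can pass input that such a rule would reject. The sketch below only approximates that behaviour with Python's re.fullmatch. Python's re is a different engine from RE2, so it is a rough local check for simple patterns, not a validator for security rules, and the helper name and sample patterns are made up.

```python
import re


def matches_whole_string(pattern, value):
    """Approximate a whole-string match using Python's re.fullmatch."""
    return re.fullmatch(pattern, value) is not None


checks = {
    "abc": matches_whole_string(r"[a-z]+", "abc"),
    "abc123": matches_whole_string(r"[a-z]+", "abc123"),
    "user@example.com": matches_whole_string(r".+@.+\..+", "user@example.com"),
}
print(checks)
```

Here "abc123" fails because the trailing digits are not covered by the pattern, even though a partial search would find "abc".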

Regex to match format of valid markup language tags

I am trying to write a regex that matches all types of tags, whether HTML or XML.
I wrote two regexes for this:
<(\"[^\"]*\"|'[^']*'|[^'\">])*>
<html.*>(.*?)</html>
These match all valid tags, but they also match invalid tags such as:
<"font size=12">
So I want a regex that matches valid tags only. Can anybody please help?
Some people have worked on this, with test coverage, to get a good HTML/XML tag matcher (there are many traps!).
One working solution may be: http://haacked.com/archive/2004/10/25/usingregularexpressionstomatchhtml.aspx/
The Regex is <\/?\w+((\s+\w+(\s*=\s*(?:".*?"|'.*?'|[^'">\s]+))?)+\s*|\s*)\/?>
It matches opening and closing tags individually, which is useful if you want to remove tags, for instance (in fact, you cannot really expect more from a simple regex, as Jithin answered you).
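A quick way to try that pattern is to run it against a handful of candidate tags. The sketch below does so with Python's re module; the candidate list is invented, apart from the invalid tag taken from the question.

```python
import re

# The pattern quoted above, unchanged.
TAG_PATTERN = re.compile(
    r"""<\/?\w+((\s+\w+(\s*=\s*(?:".*?"|'.*?'|[^'">\s]+))?)+\s*|\s*)\/?>"""
)

candidates = [
    '<font size="12">',
    "</font>",
    "<br />",
    "<a href=x>",
    '<"font size=12">',  # the invalid tag from the question
]

# fullmatch requires the whole candidate to be exactly one tag.
verdicts = {tag: TAG_PATTERN.fullmatch(tag) is not None for tag in candidates}
for tag, ok in verdicts.items():
    print(tag, ok)
```

The invalid tag is rejected because the pattern requires a word character right after < (or after </).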

Problems with finding and replacing

Hey Stack Overflow community. I need help with a huge data file. Is it possible, with a regular expression, to find the following tag:
<category_name><![CDATA[Prekiniai ženklai&gt;Adler|Kita buitinė technika&gt;Buičiai naudingi prietaisai|Kita buitinė technika&gt;Lygintuvai]]></category_name>
and somehow replace all the other data, leaving only 'Adler' or 'Lygintuvai'? I'm using Altova to edit XML files, so I can't find any way other than find-and-replace. I'm new to regex, so I thought maybe you could help me.
#\<category_name\>.+?gt\;([\w]+?)\|.+?gt;([\w]+?)\]\]\>\<\/category_name\>#i
\1 - Adler
\2 - Lygintuvai
PHP
regex101.com
Fields may contain alphanumeric characters without spaces.
If you want to change the set of acceptable characters, replace [\w] with something else:
[a-z] - only letters
[0-9] - only digits
etc.
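For anyone testing outside PHP, here is a sketch of the same pattern in Python. It assumes the CDATA text contains literal &gt; separators, which is what the gt; in the pattern implies. The PHP #...#i delimiters become a re.IGNORECASE flag, and redundant backslashes are dropped.

```python
import re

# Assumed input: literal "&gt;" separators inside the CDATA section.
xml_line = (
    "<category_name><![CDATA[Prekiniai ženklai&gt;Adler|"
    "Kita buitinė technika&gt;Buičiai naudingi prietaisai|"
    "Kita buitinė technika&gt;Lygintuvai]]></category_name>"
)

pattern = re.compile(
    r"<category_name>.+?gt;(\w+?)\|.+?gt;(\w+?)\]\]></category_name>",
    re.IGNORECASE,
)

match = pattern.search(xml_line)
first, last = match.group(1), match.group(2)
print(first, last)
```

The middle entry is skipped because its leaf contains spaces, which \w does not match.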
It's possible, but use of regular expressions to process XML will never be 100% correct (you can prove that using computer science theory), and it may also be very inefficient. For example, the solution given by Luk is incorrect because it doesn't allow whitespace in places where XML allows it. Much better to use XQuery or XSLT, both of which are designed for the job (and both work in Altova). You can then use XPath expressions to locate the element or attribute nodes you are interested in, and you can still use regular expressions (e.g. in the XPath replace() function) to process the content of text or attribute nodes.
Incidentally, your input is rather strange because it uses escape sequences like &gt; within a CDATA section; but XML escape sequences are not recognized in a CDATA section.
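Outside Altova, the same idea (let an XML parser find the element, then use plain string handling on its text) can be sketched in Python. This assumes the CDATA text holds literal &gt; separators, which the parser leaves untouched as noted above; the function name leaf_categories is made up for the example.

```python
import xml.etree.ElementTree as ET

# Assumed input: literal "&gt;" separators inside the CDATA section.
xml_text = (
    "<category_name><![CDATA[Prekiniai ženklai&gt;Adler|"
    "Kita buitinė technika&gt;Buičiai naudingi prietaisai|"
    "Kita buitinė technika&gt;Lygintuvai]]></category_name>"
)


def leaf_categories(xml_source):
    """Return the last segment of each '|'-separated category path."""
    element = ET.fromstring(xml_source)
    paths = (element.text or "").split("|")
    # Inside CDATA, "&gt;" is plain text, so split on it literally.
    return [path.rsplit("&gt;", 1)[-1].strip() for path in paths]


leaves = leaf_categories(xml_text)
print(leaves)
```

Whitespace around the tags or a different CDATA layout would not affect this version, and leaf names containing spaces are kept.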