Regex: find all XML values based on a subvalue

I have the following XML code:
<quantity1 value="foo" name="bar">
<subquantity duration="2">
<parameter unit="meters" />
</subquantity>
</quantity1>
I want to export all names for further analysis in another document, but only if they have a certain subvalue. For example, how can I use regex to find all names where unit="meters"?
Bonus points if you can explain how to do this in Notepad++. I'm open to other suggestions or SO posts as well.

Regular expressions are the wrong tool for parsing XML.
Use XPath instead, through XSLT, a scripting language, or xmlstarlet.
Examples:
//quantity1[subquantity/parameter/@unit="meters"]/@name
//*[*/*/@unit="meters"]/@name
//*[.//@unit="meters"]/@name
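To do the same from a scripting language without any regex, here is a minimal Python sketch using the standard-library xml.etree.ElementTree module. ElementTree only supports a subset of XPath (it cannot return attributes and has no multi-step paths inside predicates), so the sketch walks every element that has a name attribute and keeps it when some descendant element carries the requested unit, which is close to the last expression above. The <root> wrapper, the second quantity, and the helper name names_with_unit are made-up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the question's snippet plus a second quantity that
# should NOT be selected, wrapped in an assumed <root> element.
SAMPLE = """<root>
  <quantity1 value="foo" name="bar">
    <subquantity duration="2">
      <parameter unit="meters" />
    </subquantity>
  </quantity1>
  <quantity2 value="baz" name="qux">
    <subquantity duration="3">
      <parameter unit="seconds" />
    </subquantity>
  </quantity2>
</root>"""


def names_with_unit(xml_text, unit):
    """Return the name attribute of every element that has a descendant
    element whose unit attribute equals `unit` (assumed to contain no quotes)."""
    root = ET.fromstring(xml_text)
    names = []
    for elem in root.iter():
        # Only elements that carry a name attribute are candidates.
        if "name" not in elem.attrib:
            continue
        # ".//*[@unit='...']" searches all descendants of elem.
        if elem.find(f".//*[@unit='{unit}']") is not None:
            names.append(elem.attrib["name"])
    return names


print(names_with_unit(SAMPLE, "meters"))  # ['bar']
```

The resulting list can then be written to whatever file the further analysis needs.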

Related

How to use '*' in XPath starts-with()?

We receive banking statements from the SAP system. Sometimes the file names do not follow the naming convention, and the files are rejected.
We want to validate the file name, which we receive in the name attribute, as in the example below.
Can the country ISO code be excluded from the validation?
We want an XPath that matches GLO_***_UPLOAD_STATEMENT, with *** standing for any code, so that the ISO code is not validated.
Example XML:
<?xml version="1.0" encoding="UTF-8"?>
<Details name="GLO_ZFA_UPLOAD_STATEMENT" type="banking" version="3.0">
<description/>
<object>
<encrypted data="b528f05b96102f5d99743ff6122bb0984aa16a02893984a9e427a44fcedae1612104a7df1173d9c61a99ebe0c34ea67a46aecc86f41f5924f74dd525"/>
</object>
</Details>
XPath tried:
Details[@type="banking"]/@name[not(starts-with(., "GLO_***_UPLOAD_STATEMENT"))]
which is not working :(
Can anyone help here, please :)
Thanks in advance!
Try using the matches() function for a regex like this:
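Details[@type="banking"]/@name[not(matches(., "^GLO_.{3}_UPLOAD_STATEMENT$"))]
starts-with() compares literal characters and doesn't recognize patterns, so the *** in your expression only matches three literal asterisks. Note that matches() requires XPath 2.0 or later, and that it succeeds on a partial match unless the regex is anchored with ^ and $.
If your XPath version doesn't support regex (XPath 1.0, which also lacks ends-with()), you can combine starts-with() with substring() instead. Because substring() is 1-based, substring(., 8) is everything after "GLO_" and the three-character code:
Details[@type="banking"]/@name[not(starts-with(., "GLO_") and substring(., 8) = "_UPLOAD_STATEMENT")]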
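Without regex, the wildcard check comes down to two string tests: the name starts with GLO_, and everything from the 8th character on (1-based, as in XPath) is _UPLOAD_STATEMENT, which leaves exactly three free characters for the country code. The Python sketch below mirrors that logic so candidate names can be tried out quickly; is_valid_statement_name is a hypothetical helper, not part of any SAP or XPath API.

```python
PREFIX = "GLO_"
SUFFIX = "_UPLOAD_STATEMENT"
CODE_LENGTH = 3  # assumed length of the country ISO code


def is_valid_statement_name(name):
    """Mirror of starts-with(., "GLO_") and substring(., 8) = "_UPLOAD_STATEMENT".

    XPath's substring() is 1-based, so position 8 is Python index 7,
    i.e. len(PREFIX) + CODE_LENGTH.
    """
    return name.startswith(PREFIX) and name[len(PREFIX) + CODE_LENGTH:] == SUFFIX


print(is_valid_statement_name("GLO_ZFA_UPLOAD_STATEMENT"))  # True
print(is_valid_statement_name("GLO_DE_UPLOAD_STATEMENT"))   # False
print(is_valid_statement_name("GLO_ZFA_UPLOAD_STATMENT"))   # False
```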
You can match regular expressions using the matches() function. For example:
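//Details[@type="banking" and not(matches(@name, "^GLO_[A-Z]*_UPLOAD_STATEMENT$"))]/@name
This selects the name attribute of every Details element that has type="banking" and whose name does not match the regular expression ^GLO_[A-Z]*_UPLOAD_STATEMENT$. The ^ and $ anchors force the whole name to match; without them, a name with extra leading or trailing characters would still pass. You can refine the regex as needed, for example [A-Z]{3} for a three-letter ISO code.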
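If the files pass through a script before the XPath-based validation, the same regex rule can be sketched in Python. This assumes the ISO code is exactly three uppercase letters ([A-Z]{3}) and that each file has a single Details root element as in the example; rejected_name is a hypothetical helper. re.fullmatch() anchors at both ends, which plays the role of ^...$ in matches().

```python
import re
import xml.etree.ElementTree as ET

# Assumption: the country code is exactly three uppercase letters.
VALID_NAME = re.compile(r"GLO_[A-Z]{3}_UPLOAD_STATEMENT")


def rejected_name(xml_text):
    """Return the name attribute if this is a banking Details document whose
    name does not follow the pattern; otherwise return None."""
    root = ET.fromstring(xml_text)
    if root.tag != "Details" or root.get("type") != "banking":
        return None
    name = root.get("name", "")
    # fullmatch() requires the entire name to match, like ^...$ in XPath.
    return None if VALID_NAME.fullmatch(name) else name


good = '<Details name="GLO_ZFA_UPLOAD_STATEMENT" type="banking" version="3.0"><description/></Details>'
bad = '<Details name="GLO_ZF_UPLOAD_STATEMENT" type="banking" version="3.0"><description/></Details>'
print(rejected_name(good))  # None
print(rejected_name(bad))   # GLO_ZF_UPLOAD_STATEMENT
```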

Parsing a complex XML tag

I have the XML file below:
<sect2>
<title>Prophylaxis</title>
<para><calc type="weight"/> EXAMPLE DATA</para>
<para>2 months</para>
</sect2>
I tried some regular expressions, but with no luck.
I want to extract "EXAMPLE DATA" using an XSLT template.
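I am assuming you want to use XPath to extract the data. Select the para that contains the calc element and normalize its whitespace, which drops the leading space before EXAMPLE DATA:
<xsl:value-of select="normalize-space(//sect2/para[calc])"/>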
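Outside XSLT, the same selection (the para that has a calc child, with whitespace normalized) can be sketched in Python with ElementTree. One detail worth knowing: ElementTree stores the text that follows <calc/> as the calc element's tail, so itertext() is used to collect all text inside the para. text_after_calc is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<sect2>
  <title>Prophylaxis</title>
  <para><calc type="weight"/> EXAMPLE DATA</para>
  <para>2 months</para>
</sect2>"""


def text_after_calc(xml_text):
    root = ET.fromstring(xml_text)
    # "para[calc]" selects a para element that has a calc child;
    # ElementTree's limited XPath supports this predicate form.
    para = root.find(".//para[calc]")
    if para is None:
        return None
    # itertext() includes calc's tail text (" EXAMPLE DATA").
    raw = "".join(para.itertext())
    # Collapse whitespace, similar to XPath's normalize-space().
    return " ".join(raw.split())


print(text_after_calc(SAMPLE))  # EXAMPLE DATA
```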

Regex - Verify multiline content

I am trying to verify an XML structure: I want to check that the ns22:statement tag with the value true is found within the postcode DataItem.
<ns21:DataItem name="country" default="false" />
<ns21:DataItem name="postcode" default="false">
<ns22:statement disabled>true</ns22:statement>
</ns21:DataItem>
I have tried this:
(?m)\b.*:DataItem name="postcode" (?s)\b.*>$\n.*\bstatement disabled>true\b
but when I change postcode to country (which should not match anything), it still matches everything: the country and postcode tags and the statement true tag.
I have also created this: https://regexr.com/3quso
Any suggestions on how to match only the postcode + statement true combination?
XPath really does look like the best tool for the job given you're trying to validate XML structure as well as content. So, ignoring namespaces, you could use the following XPath in a soapUI XPath Match assertion:
boolean(//*[local-name()='DataItem'][@name='postcode']/*[local-name()='statement' and .='true'])
Also, in <ns22:statement disabled>true</ns22:statement>, is disabled meant to be part of the element name or an attribute? As it stands, it makes the XML invalid, so I've ignored it.
For good reasons not to use regular expressions to parse XML/HTML, see Why it's not possible to use regex to parse HTML/XML: a formal explanation in layman's terms
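The same structural check can also be sketched in Python with ElementTree, for example to test it outside soapUI. The question's snippet would not parse as-is, so the sample below is a hypothetical, well-formed version: the ns21/ns22 prefixes are bound to made-up namespace URIs, a wrapper element is added, and the stray disabled token is dropped. The {*} wildcard (Python 3.8+) matches a tag in any namespace, much like local-name() in the XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed sample; the namespace URIs are invented.
SAMPLE = """<ns21:DataItems xmlns:ns21="urn:example:ns21" xmlns:ns22="urn:example:ns22">
  <ns21:DataItem name="country" default="false" />
  <ns21:DataItem name="postcode" default="false">
    <ns22:statement>true</ns22:statement>
  </ns21:DataItem>
</ns21:DataItems>"""


def has_true_statement(xml_text, item_name):
    """True if the DataItem with the given name has a statement child
    whose text is 'true' (item_name is assumed to contain no quotes)."""
    root = ET.fromstring(xml_text)
    # "{*}" ignores the namespace, similar to local-name() in XPath.
    path = f".//{{*}}DataItem[@name='{item_name}']/{{*}}statement"
    return any((s.text or "").strip() == "true" for s in root.iterfind(path))


print(has_true_statement(SAMPLE, "postcode"))  # True
print(has_true_statement(SAMPLE, "country"))   # False
```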

XML regex matching the whole XML file

I need a regular expression that, given the following XML, gives me all the products (productos) whose category (categoria) is 'Bebidas'. I have to do this in Sublime Text, so a regular expression is my only option (no dedicated XML parser allowed):
XML file: www.ethgf.com/electricos.xml
The problem is that (?s)<producto>(.+?Bebidas.+?)<\/producto> highlights almost the whole XML (from the first 'producto' tag through the last closing tag).
Since the question is about selecting whole <product> nodes, you can use the following regex (element names translated to English here; use producto, categoria and Bebidas for the actual file):
(?s)<product>(?:\s*<(\w+)>[^<]*?<\/\1>\s*)*?\s*<category>Drinks<\/category>(?:\s*<(\w+)>[^<]*?<\/\2>\s*)*?\s*<\/product>
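It highlights every <product> node that contains the Drinks category, regardless of where <category> appears among its children, as long as each child is a simple <tag>text</tag> element without attributes or nested elements.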
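To check the regex before using it in Sublime Text, it can be run through Python's re module, which accepts the same syntax here (backreferences \1 and \2, the inline (?s) flag, and the escaped slashes). The sample XML is hypothetical and uses the answer's English element names; a product is only matched as a whole because <(\w+)> can never consume a closing </product> tag, so a match cannot run into the next product.

```python
import re

# The answer's regex, unchanged, split over two raw strings for readability.
PRODUCT_WITH_DRINKS = re.compile(
    r"(?s)<product>(?:\s*<(\w+)>[^<]*?<\/\1>\s*)*?\s*<category>Drinks<\/category>"
    r"(?:\s*<(\w+)>[^<]*?<\/\2>\s*)*?\s*<\/product>"
)

# Hypothetical sample with <category> in different positions.
SAMPLE = """<products>
  <product>
    <name>Cola</name>
    <category>Drinks</category>
    <price>1.50</price>
  </product>
  <product>
    <name>Toaster</name>
    <category>Appliances</category>
  </product>
  <product>
    <category>Drinks</category>
    <name>Water</name>
  </product>
</products>"""

matches = [m.group(0) for m in PRODUCT_WITH_DRINKS.finditer(SAMPLE)]
for block in matches:
    print(block)
    print("---")
```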

Regular expression in an XML attribute

I need the text between single quotes (') to be shown in blue. What do I need to change in the BeginBlock attribute of my XML, using a regular expression?
This is my code:
<lexem BeginBlock='\'[.*]\'' Color="Blue" />
This is not working for me. Can anyone tell me where I made a mistake?
Help me!
You can try these regular expressions, but as Hamza said, you would be better off with an XML parser:
[^']\w+(?=')
(?<=').+(?=')
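Patterns that check the quotes with lookbehind and lookahead have a pitfall: the quotes are only tested, not consumed, so the text between two quoted strings also qualifies. The Python sketch below, on a made-up input line, contrasts such a lookaround pattern with a capturing group that consumes both quotes. The highlighter's own regex engine may behave differently, so treat this as an illustration rather than a drop-in fix.

```python
import re

TEXT = "x = 'a' + 'b'"  # hypothetical input line

# Lookaround version: quotes are not consumed, so " + " (which sits
# between a closing quote and the next opening quote) also matches.
LOOKAROUND = re.compile(r"(?<=')[^']+(?=')")

# Capturing version: each match consumes both quotes, so only the
# contents of quoted strings are returned.
CAPTURE = re.compile(r"'([^']*)'")

print(LOOKAROUND.findall(TEXT))  # ['a', ' + ', 'b']
print(CAPTURE.findall(TEXT))     # ['a', 'b']
```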
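The actual problem is the quoting: XML has no backslash escapes, so in BeginBlock='\'...' the ' right after the backslash already ends the attribute value. Delimit the attribute with double quotes instead: <lexem BeginBlock="'[.*]'" Color="Blue" />.
Note that [.*] is a character class matching a single . or *; to match any text between the quotes, use the even shorter <lexem BeginBlock="'.*'" Color="Blue" />.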
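A quick way to confirm the quoting issue is to parse both versions of the element with an XML parser and then try the resulting pattern. The Python sketch below does that with ElementTree and re on a made-up input line; the lexer that consumes BeginBlock is not identified in the question, so its regex flavor is an assumption. It also shows that the greedy '.*' spans from the first quote to the last one on a line, while '[^']*' stops at the next quote.

```python
import re
import xml.etree.ElementTree as ET

# Original attribute: single-quoted, with backslash "escapes" XML doesn't have.
BROKEN = r"""<lexem BeginBlock='\'[.*]\'' Color="Blue" />"""
# Fixed attribute: double-quoted, so single quotes can appear inside it.
FIXED = """<lexem BeginBlock="'.*'" Color="Blue" />"""


def parses(xml_text):
    """Return True if xml_text is well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(parses(BROKEN), parses(FIXED))  # False True

pattern = ET.fromstring(FIXED).get("BeginBlock")
line = "x = 'a' + 'b'"  # hypothetical input
print(re.findall(pattern, line))      # ["'a' + 'b'"]
print(re.findall(r"'[^']*'", line))   # ["'a'", "'b'"]
```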