how to use '*' in XPATH starts-with()? - regex

We receive banking statements from the SAP system. Sometimes the file names do not follow the naming convention, and those files are rejected.
We want to validate the file name, which arrives in the name attribute, as in the example below.
Can the country ISO code be excluded from the validation?
We want an XPath that matches names of the form GLO_***_UPLOAD_STATEMENT, so that the ISO code itself is not validated.
Example XML:
<?xml version="1.0" encoding="UTF-8"?>
<Details name="GLO_ZFA_UPLOAD_STATEMENT" type="banking" version="3.0">
<description/>
<object>
<encrypted data="b528f05b96102f5d99743ff6122bb0984aa16a02893984a9e427a44fcedae1612104a7df1173d9c61a99ebe0c34ea67a46aecc86f41f5924f74dd525"/>
</object>
</Details>
XPath tried:
Details[@type="banking"]/@name[not(starts-with(., "GLO_***_UPLOAD_STATEMENT"))]
which is not working :(
Can anyone help here, please :)
Thanks in advance!

Try using the matches() function for a regex like this:
Details[@type="banking"]/@name[not(matches(., "^GLO_(.){3}_UPLOAD_STATEMENT"))]

starts-with() compares literal characters; it does not recognize wildcards or patterns.
If you would rather avoid regex, you can check the fixed prefix and suffix instead. Note that ends-with(), like matches(), requires XPath 2.0 or later:
Details[@type="banking"]/@name[not(starts-with(., "GLO_") and ends-with(., "_UPLOAD_STATEMENT"))]

You can match regular expressions using the matches() function. For example:
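//Details[@type="banking" and not(matches(@name, "GLO_[A-Z]*_UPLOAD_STATEMENT"))]/@name
This selects the name attribute only for Details elements that have type="banking" and a name that does not match the regular expression "GLO_[A-Z]*_UPLOAD_STATEMENT". Note that matches() succeeds if the pattern matches anywhere in the string, so anchor it with ^ and $ if the whole name must conform. You can refine the regex as needed.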
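If no XPath 2.0 processor is at hand, the same validation can be sketched outside XPath. The following is only a minimal Python sketch using the standard library; the helper name invalid_statement_names and the assumption that the ISO code is exactly three uppercase letters are mine, not part of the question.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: the country code is exactly three uppercase letters.
NAME_PATTERN = re.compile(r"GLO_[A-Z]{3}_UPLOAD_STATEMENT")

# Shortened version of the XML from the question.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<Details name="GLO_ZFA_UPLOAD_STATEMENT" type="banking" version="3.0">
<description/>
</Details>"""


def invalid_statement_names(xml_text):
    """Return the name attributes of banking Details elements that break the convention."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    bad = []
    # iter() also visits the root element itself.
    for details in root.iter("Details"):
        if details.get("type") != "banking":
            continue
        name = details.get("name", "")
        # fullmatch() anchors the pattern at both ends of the name.
        if NAME_PATTERN.fullmatch(name) is None:
            bad.append(name)
    return bad
```

An empty list means every banking file name follows the convention; any returned name is one that would be rejected.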

Related

Xpath Matching a node and getting the value of it

Below is the xml file:
file1.xml
<?xml version="1.0" encoding="UTF-8"?><W4N xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:functx="http://www.functx.com"><LUNGROUP><OBJECT lungroupID="0" lunIds="0,221,228"/></LUNGROUP><LUNGROUP><OBJECT lungroupID="1" lunIds="1,3,5"/></LUNGROUP></W4N>
I want to match on lunIds. I have used the XPath expression /W4N/LUNGROUP/OBJECT[tokenize(@lunIds,',')='228']
It shows the result as Elements found: 1
Now my requirement is to get the lungroupID of the matched element. How can I do this using XPath? Any help is highly appreciated.
I don't see the XML you intended to post, but you should be able to add the attribute you want at the end of your xpath expression:
/W4N/LUNGROUP/OBJECT[tokenize(@lunIds,',')='228']/@lungroupID
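For comparison, here is a rough Python sketch of the same lookup with the standard library, whose XPath subset has no tokenize(). The function name lungroup_ids_for is hypothetical, and the sample assumes the second lunIds attribute was meant to be closed with a quote.

```python
import xml.etree.ElementTree as ET

# The question's XML, with the closing quote on the second lunIds in place
# and the unused namespace declarations dropped for brevity.
SAMPLE = (
    '<W4N>'
    '<LUNGROUP><OBJECT lungroupID="0" lunIds="0,221,228"/></LUNGROUP>'
    '<LUNGROUP><OBJECT lungroupID="1" lunIds="1,3,5"/></LUNGROUP>'
    '</W4N>'
)


def lungroup_ids_for(xml_text, lun_id):
    """Return the lungroupID of every OBJECT whose comma-separated lunIds contains lun_id."""
    root = ET.fromstring(xml_text)
    return [
        obj.get("lungroupID")
        for obj in root.findall("./LUNGROUP/OBJECT")
        # Splitting on ',' plays the role of tokenize() in the XPath above.
        if lun_id in obj.get("lunIds", "").split(",")
    ]
```

Because the comparison is against whole tokens, asking for "22" does not accidentally match "221" or "228".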

Regex find all XML values based on subvalue

I have the following XML code:
<quantity1 value="foo" name="bar">
<subquantity duration="2">
<parameter unit="meters" />
</subquantity>
</quantity1>
I want to export all names for further analysis in another document, but only if they have a certain subvalue. For example, how can I use regex to find all names based on if unit="meters"?
Bonus points if you can instruct how to do this in Notepad++. Open to other suggestions/SO posts as well.
Regular expressions are wrong for parsing XML.
Use XPath in XSLT or a scripting language or xmlstarlet instead.
Examples:
//quantity1[subquantity/parameter/@unit="meters"]/@name
//*[*/*/@unit="meters"]/@name
//*[.//@unit="meters"]/@name
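If a scripting language is an option, something along these lines might do. It is only a sketch with Python's standard library; the wrapping root element, the second quantity2 element and the function name names_with_unit are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: a wrapper element and a second, non-matching quantity were added
# so that the filter has something to reject.
SAMPLE = """<root>
<quantity1 value="foo" name="bar">
<subquantity duration="2">
<parameter unit="meters" />
</subquantity>
</quantity1>
<quantity2 value="baz" name="qux">
<subquantity duration="3">
<parameter unit="feet" />
</subquantity>
</quantity2>
</root>"""


def names_with_unit(xml_text, unit):
    """Return the name attribute of each element that has a descendant with the given unit."""
    root = ET.fromstring(xml_text)
    names = []
    for elem in root.iter():
        name = elem.get("name")
        if name is None:
            continue
        # ElementTree's XPath subset supports [@attr='value'] predicates.
        if elem.find(f".//*[@unit='{unit}']") is not None:
            names.append(name)
    return names
```

Unlike the last XPath above, this only looks at descendants, not at a unit attribute on the named element itself. The resulting list can then be written to another document for further analysis.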

Regex - Verify multiline content

I am trying to verify an XML structure, where I want to check that the ns22:statement true tag is found after the postcode DataItem.
<ns21:DataItem name="country" default="false" />
<ns21:DataItem name="postcode" default="false">
<ns22:statement disabled>true</ns22:statement>
</ns21:DataItem>
I have tried this
(?m)\b.*:DataItem name="postcode" (?s)\b.*>$\n.*\bstatement disabled>true\b
but when I change postcode to country (where it is supposed to return nothing), it matches everything: the country, postcode and statement true tags.
I have also created this https://regexr.com/3quso
Any suggestions on how to match only the postcode + statement true?
XPath really does look like the best tool for the job given you're trying to validate XML structure as well as content. So, ignoring namespaces, you could use the following XPath in a soapUI XPath Match assertion:
boolean(//*[local-name()='DataItem'][@name='postcode']/*[local-name()='statement' and .='true'])
Also, in <ns22:statement disabled>true</ns22:statement>, is disabled meant to be part of the element name or an attribute? As it stands, it makes the XML invalid, so I've ignored it.
For good reasons not to use regular expressions to parse XML/HTML, see Why it's not possible to use regex to parse HTML/XML: a formal explanation in layman's terms
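Outside soapUI, the same namespace-agnostic check could look roughly like this in Python. This is a sketch only: the ns21:Data wrapper and both namespace URIs are placeholders I invented, and the stray disabled has been dropped so that the fragment parses.

```python
import xml.etree.ElementTree as ET

# Assumptions: the wrapper element and namespace URIs are placeholders, and
# "disabled" was removed from the statement tag to make the XML well-formed.
SAMPLE = """<ns21:Data xmlns:ns21="urn:example:ns21" xmlns:ns22="urn:example:ns22">
<ns21:DataItem name="country" default="false" />
<ns21:DataItem name="postcode" default="false">
<ns22:statement>true</ns22:statement>
</ns21:DataItem>
</ns21:Data>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]


def item_has_true_statement(xml_text, item_name):
    """True if a DataItem with this name has a direct statement child whose text is 'true'."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if local_name(elem.tag) != "DataItem" or elem.get("name") != item_name:
            continue
        # Only direct children are checked, like the single '/' step in the XPath.
        for child in elem:
            if local_name(child.tag) == "statement" and child.text == "true":
                return True
    return False
```

With this sample, asking for "postcode" succeeds while "country" does not, which is the behaviour the regex could not deliver.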

Regex for matching a complete element in a xml file

I would like to know a regex that matches this kind of sequence:
<person name="the name I want" ....[other things]>
.... [other tags]
</person>
I tried with something like this:
<person +name="the name I want" +.*
But I can't get any further: it only matches the first line, not the complete element.
Would you like to help me?
Try this:
<person[^>]*name="the name I want"[^>]*>(.|[\r\n])*?<\/person>
If your language supports the "dotall" flag, you can use that and change (.|[\r\n])*? to just .*?.
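As an illustration of the dotall variant, here is a small Python sketch. The sample document and the variable names are hypothetical; only the pattern follows the regex above.

```python
import re

# Hypothetical sample with two person elements.
SAMPLE = """<people>
<person name="someone else" age="30">
<note>skip me</note>
</person>
<person name="the name I want" age="41">
<note>keep me</note>
</person>
</people>"""

# re.DOTALL lets '.' match newlines, so the lazy '.*?' can span lines
# and stops at the first closing tag it meets.
PERSON = re.compile(
    r'<person\b[^>]*\bname="the name I want"[^>]*>.*?</person>',
    re.DOTALL,
)

match = PERSON.search(SAMPLE)
matched_element = match.group(0) if match else None
```

This does not cope with person elements nested inside each other; for that, an XML parser is the safer route.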
I found this in another stackoverflow thread:
<person(.|\r\n)*?<\/person>
I hope it is useful
Edit:
I forgot to add the name attribute
(<person name="the name I want"(.|\r\n)*?<\/person>)

Search & replace regex - filtering files

little bit of background:
I work at a multilingual communication company, where we’re working with a CMS system. Since its last update, all the files I export out of the system are ‘polluted’ with metadata, which I don't want to see, use or replace. To filter and change a heap of xml files, I use Powergrep, which operates with regexes.
I want my regex to find, e.g. "there is no spoon", "oracle", "I know kung-fu" and "bending method" (all straight quotation marks) and replace it with “there is no spoon”, “oracle”, “I know kung-fu” and “bending method” (all with curly quotation marks).
I don’t want it to find the metadata "concept.dtd" and "map.dtd"
The following lines are the first lines of my xml file. It's this "concept.dtd" that I would like to ignore.
<?xml version="1.0" encoding="UTF-16" standalone="no"?>
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"[
]>
<?ish ishref="GUID-6B84EF92-DA99-4C54-BA91-FD0A113D4A96" version="1" lang="sv" srclng="en"?>
This is somewhere in the middle of the xml file
<row>
<entry colname="col1" valign="middle" align="left">"Bending method" </entry>
<entry colname="col2" valign="middle" align="left">another word</entry>
</row>
So, this is the original regex:
(?<!=)"\b(.+?)\b"(?! \[)
Replacement:
“\1”
Problem:
As the metadata "concept.dtd" and "map.dtd" are part of the file, I don’t want to replace their quotation marks in order not to change anything crucial. So I tried rewriting the regex:
(?<!=)"\b(.+?[\.d])\b"(?! \[)
It almost works: “concept.dtd” and “map.dtd” are skipped, most of the terms between quotation marks are found, but not all: “Bending method” is not found, for example.
What am I missing? Any help or opinions would be greatly appreciated!
Based on your last answers, here is a regexp that can help you:
(?<=<entry)[^>]+>[^<>]*?(".+?")[^<>]*?(?=<\x2Fentry>)
Demo
http://regex101.com/r/lX2cU3
Discussion
I assume that there is a single series of words between straight quotation marks, and that there are no carriage returns or line feeds inside an <entry> node.
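A different way to keep the metadata untouched, sketched here in Python rather than PowerGREP, is to split the file into tags and text and only rewrite the text. The function name curl_quotes_in_text is hypothetical, and the sketch assumes that no tag, DOCTYPE or processing instruction contains a '>' character inside it.

```python
import re

# Anything from '<' to the next '>' counts as markup, which covers element tags,
# the DOCTYPE and processing instructions as long as they contain no inner '>'.
TAG = re.compile(r"(<[^>]*>)")
QUOTED = re.compile(r'"([^"]+)"')

SAMPLE = '''<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"[
]>
<row>
<entry colname="col1" valign="middle" align="left">"Bending method" </entry>
<entry colname="col2" valign="middle" align="left">another word</entry>
</row>'''


def curl_quotes_in_text(xml_text):
    """Replace straight quote pairs with curly ones, but only outside markup."""
    parts = TAG.split(xml_text)
    # With one capturing group, split() returns text at even indexes and tags at odd ones.
    for i in range(0, len(parts), 2):
        parts[i] = QUOTED.sub("\u201c\\1\u201d", parts[i])
    return "".join(parts)
```

Attribute values such as colname="col1" and the "concept.dtd" reference sit inside markup, so they keep their straight quotes.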