Remove XML header from a string using QRegExp in Qt - C++

I want to remove <?xml version="1.0" encoding="UTF-8" ?> from a string in Qt. For this I wrote:
outputText.replace(QRegExp("<\?xml.*?\?>"),""); where outputText is a QString.
But the XML header was not removed. I also tried the regular expression "\\<\\?xml(.+?)\\?\\>", but it does not work either. Please let me know a valid regular expression that removes the XML header above from the string.

Try using:
QRegExp("\\<\\?xml[^(\\?\\>)]*\\?\\>");
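Only the ? strictly needs to be escaped with \ (escaping < and > as well is harmless), and each \ must itself be written as \\ inside a C/C++ string literal. Note that QRegExp has no lazy quantifiers such as .*?; non-greedy matching is switched on for the whole pattern with setMinimal(true).
This matches from <?xml up to the next ?>, as long as the header contains none of the characters (, ), ? or > in between: [^(\\?\\>)] is a character class that excludes those single characters, not the sequence ?>.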
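The intended "from <?xml up to the first ?>" behaviour can be checked outside Qt. Below is a minimal sketch in Python's re module, not QRegExp; the helper name strip_xml_declaration is hypothetical, and Python (unlike QRegExp) supports the lazy .*? quantifier directly.

```python
import re

# Lazy match from "<?xml" to the first "?>"; DOTALL lets "." span newlines.
XML_DECLARATION = re.compile(r"<\?xml.*?\?>", re.DOTALL)


def strip_xml_declaration(text: str) -> str:
    """Remove the first XML declaration found in text (hypothetical helper)."""
    return XML_DECLARATION.sub("", text, count=1)


if __name__ == "__main__":
    sample = '<?xml version="1.0" encoding="UTF-8" ?><a>b</a>'
    print(strip_xml_declaration(sample))
```

In Qt the comparable effect would presumably come from a greedy pattern combined with setMinimal(true), since the lazy quantifier is not available there.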

Related

Xpath Matching a node and getting the value of it

Below is the xml file:
file1.xml
<?xml version="1.0" encoding="UTF-8"?><W4N xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:functx="http://www.functx.com"><LUNGROUP><OBJECT lungroupID="0" lunIds="0,221,228"/></LUNGROUP><LUNGROUP><OBJECT lungroupID="1" lunIds="1,3,5"/></LUNGROUP></W4N>
I want to match on lunIds. I used the XPath expression /W4N/LUNGROUP/OBJECT[tokenize(@lunIds,',')='228']
It shows the result as Elements found: 1
Now I need to get the lungroupID of the matched element. How can I do this using XPath? Any help is highly appreciated.
I don't see the XML you intended to post, but you should be able to add the attribute you want at the end of your xpath expression:
/W4N/LUNGROUP/OBJECT[tokenize(@lunIds,',')='228']/@lungroupID
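For readers without an XPath 2.0 engine, the same lookup can be sketched procedurally. This is an illustration only, assuming Python's xml.etree.ElementTree, which has no tokenize(), so the comma split is done in plain Python. The function name lungroup_ids_for is hypothetical, and the sample XML is the question's document without its declaration.

```python
import xml.etree.ElementTree as ET

SAMPLE_XML = (
    '<W4N xmlns:xs="http://www.w3.org/2001/XMLSchema" '
    'xmlns:functx="http://www.functx.com">'
    '<LUNGROUP><OBJECT lungroupID="0" lunIds="0,221,228"/></LUNGROUP>'
    '<LUNGROUP><OBJECT lungroupID="1" lunIds="1,3,5"/></LUNGROUP>'
    '</W4N>'
)


def lungroup_ids_for(xml_text: str, lun_id: str) -> list:
    """Return lungroupID of every OBJECT whose lunIds list contains lun_id."""
    root = ET.fromstring(xml_text)
    matches = []
    for obj in root.findall("./LUNGROUP/OBJECT"):
        # Equivalent of tokenize(@lunIds, ',') = lun_id
        if lun_id in obj.get("lunIds", "").split(","):
            matches.append(obj.get("lungroupID"))
    return matches


if __name__ == "__main__":
    print(lungroup_ids_for(SAMPLE_XML, "228"))
```

Splitting on the comma before comparing matters: a plain substring test would wrongly treat "22" as present in "0,221,228".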

how to use '*' in XPATH starts-with()?

We receive banking statements from the SAP system. Sometimes the file name does not follow the naming standard, and the file is rejected.
We want to validate the file name, which we get in the name attribute, as in the example below.
Can the country ISO code be skipped in the validation?
We want an XPath that matches GLO_***_UPLOAD_STATEMENT, so that the ISO code itself is not validated.
Example XML:
<?xml version="1.0" encoding="UTF-8"?>
<Details name="GLO_ZFA_UPLOAD_STATEMENT" type="banking" version="3.0">
<description/>
<object>
<encrypted data="b528f05b96102f5d99743ff6122bb0984aa16a02893984a9e427a44fcedae1612104a7df1173d9c61a99ebe0c34ea67a46aecc86f41f5924f74dd525"/>
</object>
</Details>
Xpath tried:
Details[@type="banking"]/@name[not(starts-with(., "GLO_***_UPLOAD_STATEMENT"))]
which is not working :(
Can anyone help here, please :)
Thanks in advance!
Try using the matches() function with a regex like this:
Details[@type="banking"]/@name[not(matches(., "^GLO_(.){3}_UPLOAD_STATEMENT"))]
starts-with() compares literal characters; it does not recognize patterns or wildcards such as *.
If your XPath processor doesn't support regex, you can check the fixed prefix and suffix instead (note that ends-with() itself requires XPath 2.0):
Details[@type="banking"]/@name[not(starts-with(., "GLO_") and ends-with(., "_UPLOAD_STATEMENT"))]
You can match regular expressions using the matches() function. For example:
//Details[@type="banking" and not(matches(@name, "GLO_[A-Z]*_UPLOAD_STATEMENT"))]/@name
This selects the name attribute only of Details nodes that have type="banking" and a name that does not match the regular expression "GLO_[A-Z]*_UPLOAD_STATEMENT". You can refine the regex as needed.
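The pattern itself can be tried outside XPath before wiring it into the expression. Below is a small sketch in Python, assuming the ISO code is exactly three uppercase letters (an assumption, not stated in the question) and using a hypothetical helper name. fullmatch anchors both ends, which the unanchored patterns above do not.

```python
import re

# Assumption: the country code is exactly three uppercase ASCII letters.
NAME_PATTERN = re.compile(r"GLO_[A-Z]{3}_UPLOAD_STATEMENT")


def is_valid_statement_name(name: str) -> bool:
    """True if the whole name follows GLO_<code>_UPLOAD_STATEMENT."""
    return NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("GLO_ZFA_UPLOAD_STATEMENT", "GLO_ZF_UPLOAD_STATEMENT"):
        print(candidate, is_valid_statement_name(candidate))
```

In matches(), the comparable anchoring would be written with ^ and $ around the pattern.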

Regex to get comma within xml tags

I am new to regex. I have an xml like
<Root xmlns="rooter"><add>This is an example, test</add></Root>, 123, test, 8765
I want to find only the comma that is inside the XML tags.
I have tried
<Root.*\,.*</Root>
and
<Root.*>(\,)
These return the whole XML tag, but I want to match only the comma so that I can replace it with another character.
I want to do this replacement in the Atom text editor. After replacing, the text should look like
<Root xmlns="rooter"><add>This is an example# test</add></Root>, 123, test, 8765
The following regex will work if the text has the same format as the example above.
,(?=[^\/<]*<\/)
It uses a lookahead: the comma is matched only if it is followed, with no intervening < or /, by a closing tag. You can check the link for more details.
https://www.regular-expressions.info/lookaround.html
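To see the lookahead at work on the question's sample line, here is a short sketch in Python rather than Atom's find/replace (the function name and the default "#" replacement are illustrative choices). The comma is replaced only when the text after it reaches a closing tag before any other < or /.

```python
import re

# A comma followed, without crossing "<" or "/", by the start of a closing tag.
COMMA_IN_ELEMENT = re.compile(r",(?=[^/<]*</)")


def replace_inner_commas(text: str, replacement: str = "#") -> str:
    """Replace commas that sit inside element text, leaving others alone."""
    return COMMA_IN_ELEMENT.sub(replacement, text)


if __name__ == "__main__":
    line = ('<Root xmlns="rooter"><add>This is an example, test</add></Root>'
            ', 123, test, 8765')
    print(replace_inner_commas(line))
```

As the answer notes, this relies on the input format: element text that itself contains a / after the comma would not be matched.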

Trying to get v. simple Regex to work on XML file

This is my snippet of XML (the actual full file is 6964 lines):
<?xml version="1.0" encoding="UTF-8"?>
<listings xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://www.gstatic.com/localfeed/local_feed.xsd">
<language>en</language>
<id>43927</id>
<cell1>Andover House</cell1>
<cell2>28-30 Camperdown</cell2>
<cell3>Great Yarmouth</cell3>
<cell4>NR30 3JB</cell4>
<cell5>GB</cell5>
<cell6>52.6003767</cell6>
<cell7>1.7339649</cell7>
<cell8>+44 1493843490</cell8>
<category>British</cell9>
<cell10>http://contentadmin.livebookings.com/dynamaster/image_archive/original/f24c60a52e7ac0874be57e51bce30726.jpg</cell10>
<cell11>http://www.bookatable.co.uk/andover-house-great-yarmouth-norfolk</cell11>
For each category tag in the above snippet, I would simply like to add this text before the value: Restaurants - (with one space after the hyphen)
So the final result will be:
<category>Restaurants - British</category>
I am very new to Regex and find it very difficult, so this is what I've tried so far: https://regex101.com/r/yY5jB6/2
It looks like it works in regex101, but when I bring it into a text editor like Sublime Text 2 (on Mac) or Notepad++ (on Windows) using find/replace (with regex enabled in the settings), it says it can't find anything. Please help! Thanks!
Notepad++ uses \1 instead of $1. If you change your substitution from $1Restaurants - to \1Restaurants - then it should work.
If you search for
<category>([^<]*)<\/.*>
and replace it with
<category>Restaurants - $1</category>
it would even work with your strange input, where the category element is closed by a mismatched tag (</cell9>).
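The search-and-replace above can be sketched as a script for a file too large to edit by hand. This is an illustration in Python, not the editors' regex engines, and the function name is hypothetical. It differs from the pattern above in one respect: the closing tag is matched with [^>]* instead of .*, so it cannot run past the first > when several tags share a line.

```python
import re

# Capture the category text, then accept any closing tag (even a mismatched one).
CATEGORY = re.compile(r"<category>([^<]*)</[^>]*>")


def prefix_categories(text: str, prefix: str = "Restaurants - ") -> str:
    """Rewrite each category element with the prefix and a proper closing tag."""
    return CATEGORY.sub(
        lambda m: f"<category>{prefix}{m.group(1)}</category>", text
    )


if __name__ == "__main__":
    print(prefix_categories("<category>British</cell9>"))
```

A function is used as the replacement so that the prefix is inserted literally, whatever characters it contains.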

xslt 2.0 how replace $ by escaped dollar (for conversion to LaTeX)

I am new to XSLT. I googled extensively but couldn't figure out how to do the following:
I am transforming XML to LaTeX. Of course, LaTeX needs to escape characters such as $ and #. I tried the following in the replace function but it does not work. (They do work without the replace function.)
<xsl:template match="xyz:doc">
\subsubsection{<xsl:value-of select="replace( xyz:headline, '(\$)', '\$1' )"/>}
...
</xsl:template>
<xsl:template match="xyz:doc">
\subsubsection{<xsl:value-of select="replace( xyz:headline, '\$', '\$' )"/>}
...
</xsl:template>
Possible content to be escaped is:
"Locally defined field #931" or
"Locally defined subfield $b"
What am I doing wrong?
Many thanks for your answers!
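If you want to replace a dollar symbol $ in the input with \$ in the output then use replace(xyz:headline, '\$', '\\\$'). In the replacement string both \ and $ are special, so \\ produces a literal backslash and \$ a literal dollar sign; your first attempt, '\$1', therefore inserts the literal text $1 rather than a backslash followed by the captured group.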
If there are several characters that need the same escaping then replace(xyz:headline, '([$#])', '\\$1') should do.
Sample at http://xsltransform.net/bdxtqX/1
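The same capture-and-prefix idea can be tried on the two sample headlines outside an XSLT processor. This is a sketch in Python, whose replacement syntax uses \1 where XPath's replace() uses $1; the function name is hypothetical, and only $ and # are handled, as in the answer.

```python
import re


def escape_latex(text: str) -> str:
    """Prefix each "$" or "#" with a backslash for LaTeX output."""
    # In the replacement, "\\" is a literal backslash and "\1" is the
    # captured character, so "$b" becomes "\$b".
    return re.sub(r"([$#])", r"\\\1", text)


if __name__ == "__main__":
    for headline in ("Locally defined field #931",
                     "Locally defined subfield $b"):
        print(escape_latex(headline))
```

Other LaTeX specials such as % or & are not covered here and would need to be added to the character class.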