Remove XML attribute from string

Remove XML attribute from string - regex

I have a string like '<node attr="some_value">'. How can I remove attr="some_value" from this string? I know only the attribute name (attr); I don't know the value "some_value".
P.S. I'm using JavaScript, but a solution in any language would be great. Thanks in advance.

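Try this (needs jQuery). Note that $(xml) parses the string as HTML, so the result comes back with a closing tag:
var xml = '<node attr="some_value">';
var $node = $(xml).removeAttr('attr'); // removeAttr returns a jQuery object, not a string
var newXml = $node.prop('outerHTML');  // serialize back to a string: '<node></node>'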

Using regexes to manipulate XML is begging for disaster down the line. I'd use the built-in XML functionality to do this.
From w3schools.com
xmlDoc=loadXMLDoc("books.xml");
x=xmlDoc.getElementsByTagName('book');
document.write(x[0].getAttribute('category')); document.write("<br />");
x[0].removeAttribute('category');
document.write(x[0].getAttribute('category'));
Where the XML is
<bookstore>
<book category="cooking">
<title lang="en">Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
</book>
<book category="children">
<title lang="en">Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
<book category="web">
<title lang="en">XQuery Kick Start</title>
<author>James McGovern</author>
<author>Per Bothner</author>
<author>Kurt Cagle</author>
<author>James Linn</author>
<author>Vaidyanathan Nagarajan</author>
<year>2003</year>
<price>49.99</price>
</book>
<book category="web" cover="paperback">
<title lang="en">Learning XML</title>
<author>Erik T. Ray</author>
<year>2003</year>
<price>39.95</price>
</book>
</bookstore>
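The same parser-based idea can be sketched outside the browser. The following is a minimal Python illustration, assuming the input is well-formed XML (unlike the unclosed '<node attr="some_value">' fragment in the question); the shortened bookstore sample and the variable names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed sample modelled on the bookstore document above.
xml_text = (
    '<bookstore><book category="cooking">'
    '<title lang="en">Everyday Italian</title>'
    '</book></bookstore>'
)

root = ET.fromstring(xml_text)

# Drop the attribute from every <book>; pop() with a default does not
# fail when a particular element lacks the attribute.
for book in root.iter("book"):
    book.attrib.pop("category", None)

result = ET.tostring(root, encoding="unicode")
print(result)
```

Because the document is parsed first, quoting style, attribute order and whitespace inside the tag do not matter, which is the main advantage over string manipulation.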

Classic solution: use string functions. For example (this only works when attr is the last attribute in the tag):
str = str.substring(0,str.indexOf("attr")-1) + ">"
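If a regex really is what is wanted for a single tag string like the one in the question, a slightly more robust pattern than cutting the string at a fixed position might look like the sketch below. It is written in Python for illustration; the helper name remove_attribute is hypothetical, and the pattern assumes the attribute value is quoted and that the attribute name does not also appear, preceded by whitespace and followed by an equals sign and a quoted string, inside another attribute's value.

```python
import re

def remove_attribute(tag, name):
    """Remove name="..." or name='...' (plus leading whitespace) from a tag string."""
    pattern = r"\s+" + re.escape(name) + r"""\s*=\s*(?:"[^"]*"|'[^']*')"""
    return re.sub(pattern, "", tag)

result = remove_attribute('<node attr="some_value">', "attr")
print(result)
```

The leading \s+ means that an attribute such as data-attr is left alone, and the attribute can sit anywhere in the tag, not only at the end.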

Related

Can anyone explain how the XSL below works?

Can anyone please explain, with an example, how the XSL below works?
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- Remove empty elements or attributes -->
<xsl:template match="@*|node()">
<xsl:if test=". != '' or ./@* != ''">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:if>
</xsl:template>
</xsl:stylesheet>
When I use the above XSL on the XML below, which is not indented (note: it is shown indented here, but consider it unindented; the input text box did not let me enter unindented XML):
<Book Edition="1234" Type="Novel" TimeStamp="2021-07-09T14:02:55-05:00" Version="1.003">
<BOS>
<LIB>
<RequestorID ID="XXX" Type="10"/>
</LIB>
</BOS>
<Sections>
<Section CreateDateTime="2021-07-03T11:21:43-05:00" CreatorID="XXX" Status="Read">
<UniqueID ID="443791" Type="10"/>
<Chapters>
<Chapter>
<Paragraphs>
<Paragraph NumberOfUnits="10" Lines="100">
<Rates>
<Rate EffectiveDate="2021-12-12" ExpireDate="2021-12-13" RateTimeUnit="Day" UnitMultiplier="1">
<Base AmountBeforeTax="145.90" CurrencyCode="USD"/>
</Rate>
</Rates>
</Paragraph>
</Paragraphs>
<Readers>
<Reader Age="10" Count="1"/>
</Readers>
<TimeSpan End="2021-12-13" Start="2021-12-12"/>
<BasicInfo BookCode="1310"/>
</Chapter>
</Chapters>
<Authors>
<Author AuthorRPH="1">
<Profiles>
<ProfileInfo>
<UniqueID ID="44379" Type="1"/>
<Profile ProfileType="1">
<Author>
<PersonName>
<GivenName>TEST</GivenName>
<Surname>TEST</Surname>
</PersonName>
<Telephone PhoneNumber="0"/>
<Email>test@test.com</Email>
<Address Type="H">
<AddressLine>123 MAIN ST</AddressLine>
</Address>
</Author>
</Profile>
</ProfileInfo>
</Profiles>
</Author>
</Authors>
<GlobalInfo>
<ReadIds>
<ReadId ReadID_Source="ZZZ" ReadID_Type="10" ReadID_Value="1234"/>
</ReadIds>
</GlobalInfo>
</Section>
</Sections>
</Book>
Then I get the output below:
<Book Edition="1234" Type="Novel" TimeStamp="2021-07-09T14:02:55-05:00" Version="1.003">
<BOS>
<LIB>
<RequestorID ID="XXX" Type="10"/>
</LIB>
</BOS>
<Sections>
<Section CreateDateTime="2021-07-03T11:21:43-05:00" CreatorID="XXX" Status="Read">
<UniqueID ID="443791" Type="10"/>
<Authors>
<Author AuthorRPH="1">
<Profiles>
<ProfileInfo>
<UniqueID ID="44379" Type="1"/>
<Profile ProfileType="1">
<Author>
<PersonName>
<GivenName>TEST</GivenName>
<Surname>TEST</Surname>
</PersonName>
<Telephone PhoneNumber="0"/>
<Email>test@test.com</Email>
<Address Type="H">
<AddressLine>123 MAIN ST</AddressLine>
</Address>
</Author>
</Profile>
</ProfileInfo>
</Profiles>
</Author>
</Authors>
</Section>
</Sections>
</Book>
As seen above, it removes the Chapters element and the GlobalInfo element together with everything inside them.
But if I use the same XSL on the XML below, which is indented:
<Book Edition="1234" Type="Novel" TimeStamp="2021-07-09T14:02:55-05:00" Version="1.003">
<POS>
<Source>
<RequestorID ID="XXX" Type="10"/>
</Source>
</POS>
<Sections>
<Section CreateDateTime="2021-07-03T11:21:43-05:00" CreatorID="XXX" Status="Read">
<UniqueID ID="443791" Type="10"/>
<Chapters>
<Chapter>
<Paragraphs>
<Paragraph NumberOfUnits="10" Lines="100">
<Rates>
<Rate EffectiveDate="2021-12-12" ExpireDate="2021-12-13" RateTimeUnit="Day" UnitMultiplier="1">
<Base AmountBeforeTax="145.90" CurrencyCode="USD"/>
</Rate>
</Rates>
</Paragraph>
</Paragraphs>
<Readers>
<Reader Age="10" Count="1"/>
</Readers>
<TimeSpan End="2021-12-13" Start="2021-12-12"/>
<BasicInfo BookCode="1310"/>
</Chapter>
</Chapters>
<Authors>
<Author AuthorRPH="1">
<Profiles>
<ProfileInfo>
<UniqueID ID="44379" Type="1"/>
<Profile ProfileType="1">
<Author>
<PersonName>
<GivenName>TEST</GivenName>
<Surname>TEST</Surname>
</PersonName>
<Telephone PhoneNumber="0"/>
<Email>test@test.com</Email>
<Address Type="H">
<AddressLine>123 MAIN ST</AddressLine>
</Address>
</Author>
</Profile>
</ProfileInfo>
</Profiles>
</Author>
</Authors>
<GlobalInfo>
<ReadIds>
<ReadId ReadID_Source="ZZZ" ReadID_Type="10" ReadID_Value="1234"/>
</ReadIds>
</GlobalInfo>
</Section>
</Sections>
</Book>
Then I get the correct output:
<Book Edition="1234" Type="Novel" TimeStamp="2021-07-09T14:02:55-05:00" Version="1.003">
<BOS>
<LIB>
<RequestorID ID="XXX" Type="10"/>
</LIB>
</BOS>
<Sections>
<Section CreateDateTime="2021-07-03T11:21:43-05:00" CreatorID="XXX" Status="Read">
<UniqueID ID="443791" Type="10"/>
<Chapters>
<Chapter>
<Paragraphs>
<Paragraph NumberOfUnits="10" Lines="100">
<Rates>
<Rate EffectiveDate="2021-12-12" ExpireDate="2021-12-13" RateTimeUnit="Day" UnitMultiplier="1">
<Base AmountBeforeTax="145.90" CurrencyCode="USD"/>
</Rate>
</Rates>
</Paragraph>
</Paragraphs>
<Readers>
<Reader Age="10" Count="1"/>
</Readers>
<TimeSpan End="2021-12-13" Start="2021-12-12"/>
<BasicInfo BookCode="1310"/>
</Chapter>
</Chapters>
<Authors>
<Author AuthorRPH="1">
<Profiles>
<ProfileInfo>
<UniqueID ID="44379" Type="1"/>
<Profile ProfileType="1">
<Author>
<PersonName>
<GivenName>TEST</GivenName>
<Surname>TEST</Surname>
</PersonName>
<Telephone PhoneNumber="0"/>
<Email>test@test.com</Email>
<Address Type="H">
<AddressLine>123 MAIN ST</AddressLine>
</Address>
</Author>
</Profile>
</ProfileInfo>
</Profiles>
</Author>
</Authors>
<GlobalInfo>
<ReadIds>
<ReadId ReadID_Source="ZZZ" ReadID_Type="10"
ReadID_Value="1234"/>
</ReadIds>
</GlobalInfo>
</Section>
</Sections>
</Book>
If the XML documents are the same, why does indentation give different output? Does indentation affect the XSL transformation?

The code has a single template rule which matches all element, text, comment, processing instruction, and attribute nodes. If the node has a non-empty string value, or has an attribute with a non-empty string value, then it shallow-copies the node and processes its attributes and children recursively.
The overall effect is to copy the entire document except for elements that have no content and no non-empty attributes (such as <br/>) - plus a few other exceptions such as empty comments.

The XPath expression . != '' or ./@* != '' in the test might not behave as you expect. The XPath 1.0 specification (the version normally used with XSLT 1.0, which your stylesheet selects with version="1.0") says at https://www.w3.org/TR/1999/REC-xpath-19991116/#dt-string-value
For every type of node, there is a way of determining a string-value
for a node of that type. For some types of node, the string-value is
part of the node; for other types of node, the string-value is
computed from the string-value of descendant nodes.
and down in section "5.2 Element Nodes"
The string-value of an element node is the concatenation of the
string-values of all text node descendants of the element node
in document order.
So for a non-indented source element, the conversion to a string descends recursively through the child elements and concatenates their text (attributes are ignored). In your case, an element that has no attributes of its own is eliminated together with its whole subtree (i.e. not copied to the output) if nothing inside it carries text, such as <elem>value</elem>.
With indented source, there are also text nodes representing the whitespace between the elements. (XPath does not know that this whitespace is irrelevant to you, and assumes mixed content.) The string value of an element with indented sub-elements therefore contains at least that whitespace, which makes the . != '' test true, so the element is copied.
I hope this explains why the result depends on the indentation of the source.
You might want to have a look at https://www.w3.org/TR/1999/REC-xpath-19991116/#function-normalize-space for trimming the conversion result. Note that this would also treat "real" values as empty if they can consist entirely of whitespace.
Edit: Depending on what you want to achieve, you might consider having the XSLT processor eliminate the whitespace-only text nodes by using xsl:strip-space.
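To make the string-value rule concrete, here is a small Python sketch (not XSLT) that computes the concatenated descendant text for a compact and an indented version of the same fragment. The helper name string_value and the trimmed-down fragment are assumptions made for illustration; joining itertext() is used here as a stand-in for the XPath string-value of an element.

```python
import xml.etree.ElementTree as ET

def string_value(elem):
    """Concatenate all descendant text, like the XPath string-value of an element."""
    return "".join(elem.itertext())

compact = ET.fromstring(
    '<Chapters><Chapter><BasicInfo BookCode="1310"/></Chapter></Chapters>'
)
indented = ET.fromstring(
    '<Chapters>\n'
    '  <Chapter>\n'
    '    <BasicInfo BookCode="1310"/>\n'
    '  </Chapter>\n'
    '</Chapters>'
)

compact_value = string_value(compact)    # no text nodes at all
indented_value = string_value(indented)  # only the indentation whitespace

print(repr(compact_value))
print(repr(indented_value))
print(repr(indented_value.strip()))
```

The compact element has an empty string value, so a test like . != '' fails for it, while the indented one has a whitespace-only value that passes the test until it is normalized or the whitespace nodes are stripped.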

How to limit the number of words in XSLT?

Given the following XML document:
<books>
<book>
<name>The problem of the ages</name>
</book>
<book>
<name>Filtering the tap</name>
</book>
<book>
<name>Legend of Atlantis</name>
</book>
</books>
I want to take at most 2 words from the name of each book. Words can be assumed as being sequences of whitespace-separated characters. Example of output:
<library>
<record>The problem</record>
<record>Filtering the</record>
<record>Legend of</record>
</library>
How would I achieve this using a single XSLT?

Try (in 3.0 with expand-text enabled):
<xsl:template match="book/name">
<record>{tokenize(.) => subsequence(1, 2)}</record>
</xsl:template>
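For comparison, the tokenize-then-take-two logic can be sketched in Python. This is only an illustration of the word-limiting step, not a replacement for the stylesheet; the variable names are made up, and split() with no arguments is assumed to be an acceptable definition of "whitespace-separated words".

```python
import xml.etree.ElementTree as ET

books = ET.fromstring(
    "<books>"
    "<book><name>The problem of the ages</name></book>"
    "<book><name>Filtering the tap</name></book>"
    "<book><name>Legend of Atlantis</name></book>"
    "</books>"
)

library = ET.Element("library")
for name in books.iter("name"):
    record = ET.SubElement(library, "record")
    # split() tokenizes on runs of whitespace; the slice keeps at most 2 words.
    record.text = " ".join((name.text or "").split()[:2])

records = [record.text for record in library]
print(ET.tostring(library, encoding="unicode"))
```

A name with fewer than two words is simply kept as it is, since slicing past the end of a list is not an error.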

How to concatenate XML files with XSLT

I need to concatenate any number N of XML files into a single XML file.
For example:
I have this XML:
<BOOKS>
<BOOK>
<TITLE>Lord of the Rings</TITLE>
<AUTHORS>J. R. R. Tolkien</AUTHORS>
<AUTHORS>J. K. Rowling</AUTHORS>
<YEAR>2015</YEAR>
</BOOK>
<BOOK>
<TITLE>The Hobbit: The Battle of the Five Armies</TITLE>
<AUTHORS>J. R. R. Tolkien</AUTHORS>
<YEAR>2013</YEAR>
</BOOK>
</BOOKS>
and this:
<BOOKS>
<BOOK>
<TITLE>A Clash of Kings</TITLE>
<AUTHORS>George R. R. Martin</AUTHORS>
<AUTHORS>J. K. Rowling</AUTHORS>
<YEAR>2016</YEAR>
</BOOK>
<BOOK>
<TITLE>The Hobbit: The Battle of the Five Armies</TITLE>
<AUTHORS>J. R. R. Tolkien</AUTHORS>
<YEAR>2013</YEAR>
</BOOK>
</BOOKS>
I need to generate a new file like this:
<BOOKS>
<BOOK>
<TITLE>Lord of the Rings</TITLE>
<AUTHORS>J. R. R. Tolkien</AUTHORS>
<AUTHORS>J. K. Rowling</AUTHORS>
<YEAR>2015</YEAR>
</BOOK>
<BOOK>
<TITLE>The Hobbit: The Battle of the Five Armies</TITLE>
<AUTHORS>J. R. R. Tolkien</AUTHORS>
<YEAR>2013</YEAR>
</BOOK>
<BOOK>
<TITLE>A Clash of Kings</TITLE>
<AUTHORS>George R. R. Martin</AUTHORS>
<AUTHORS>J. K. Rowling</AUTHORS>
<YEAR>2016</YEAR>
</BOOK>
<BOOK>
<TITLE>The Hobbit: The Battle of the Five Armies</TITLE>
<AUTHORS>J. R. R. Tolkien</AUTHORS>
<YEAR>2013</YEAR>
</BOOK>
</BOOKS>
The XML files are in my directory E:\books. I want to concatenate all of them: if there are two files, the script should concatenate those two, and if there are three or more, it should concatenate them all. How do I do it?

In XSLT 2.0 with Saxon it's
<xsl:template name="main">
<BOOKS>
<xsl:sequence select="collection('dir?select=*.xml')/BOOKS/BOOK"/>
</BOOKS>
</xsl:template>
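If an XSLT 2.0 processor is not available, the same "gather every BOOK from every file in a directory" idea can be sketched in Python. This is an assumption-laden illustration rather than the answer's method: the function name merge_books is hypothetical, the element is spelled TITLE in the sample data, and a temporary directory with two tiny files stands in for E:\books.

```python
import glob
import os
import tempfile
import xml.etree.ElementTree as ET

def merge_books(directory):
    """Collect every BOOK element from the *.xml files in a directory."""
    merged = ET.Element("BOOKS")
    for path in sorted(glob.glob(os.path.join(directory, "*.xml"))):
        root = ET.parse(path).getroot()
        merged.extend(root.findall("BOOK"))
    return merged

# Demo: two throwaway files stand in for the real directory.
samples = {
    "a.xml": "<BOOKS><BOOK><TITLE>Lord of the Rings</TITLE></BOOK></BOOKS>",
    "b.xml": "<BOOKS><BOOK><TITLE>A Clash of Kings</TITLE></BOOK></BOOKS>",
}
with tempfile.TemporaryDirectory() as folder:
    for filename, content in samples.items():
        with open(os.path.join(folder, filename), "w", encoding="utf-8") as handle:
            handle.write(content)
    merged = merge_books(folder)

titles = [book.findtext("TITLE") for book in merged]
print(titles)
```

Sorting the file names keeps the output order predictable; the number of input files does not need to be known in advance.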

How to get two or more matching tags in a xml file using regular expression

I need help with a regular expression for the XML below. I want to extract two tags (title and price) at a time, so that my output looks like this:
Output required:
<title lang="en">Everyday Italian</title>
<price>30.00</price>
<title lang="en">Harry Potter</title>
<price>29.99</price>
<title lang="en">XQuery Kick Start</title>
<price>49.99</price>
<title lang="en">Learning XML</title>
<price>39.95</price>
Right now I am using:
^\s*<title>.*</title>
This fetches only the <title> elements:
<title lang="en">Everyday Italian</title>
<title lang="en">Harry Potter</title>
<title lang="en">XQuery Kick Start</title>
<title lang="en">Learning XML</title>
How can I get both tags at a time? Can someone help me?
XML:
<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
<book category="COOKING">
<title lang="en">Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
</book>
<book category="CHILDREN">
<title lang="en">Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
<book category="WEB">
<title lang="en">XQuery Kick Start</title>
<author>James McGovern</author>
<author>Per Bothner</author>
<author>Kurt Cagle</author>
<author>James Linn</author>
<author>Vaidyanathan Nagarajan</author>
<year>2003</year>
<price>49.99</price>
</book>
<book category="WEB">
<title lang="en">Learning XML</title>
<author>Erik T. Ray</author>
<year>2003</year>
<price>39.95</price>
</book>
</bookstore>

Your regex won't match the given XML because it does not allow for attributes on the title tag. You can use this regex to match both title and price tags with a single expression:
^\s*<(title|price)[^>]*>(.*)<\/\1>
Also you can get the tag-name and value using back-reference \1 and \2 to the captured groups.
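As a quick check of the pattern, here is a Python sketch that applies it to a shortened copy of the sample XML. The choice of Python and of the re.MULTILINE flag (so that ^ matches at the start of every line) are assumptions; the question does not say which regex engine is in use.

```python
import re

xml_text = """<bookstore>
  <book category="COOKING">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
</bookstore>"""

# Group 1 is the tag name, group 2 the text between the tags;
# \1 forces the closing tag to match the opening one.
pattern = re.compile(r"^\s*<(title|price)[^>]*>(.*)</\1>", re.MULTILINE)

pairs = pattern.findall(xml_text)
for tag, value in pairs:
    print(tag, value)
```

This relies on each element sitting on its own line, as in the sample; for XML laid out differently, a parser is the safer tool.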

What is your environment? You can do this easily with grep on a unix-like command line:
grep -E "<(title|price)"

How do I Filter an XML via an XSLT with xml params

Here is my input XML:
<Books>
<Book>
<BookId>1</BookId>
<Des>Dumm1</Des>
<Comments/>
<OrderDateTime>04/06/2009 12:37</OrderDateTime>
</Book>
<Book>
<BookId>2</BookId>
<Des>Dummy2</Des>
<Comments/>
<OrderDateTime>04/07/2009 12:37</OrderDateTime>
</Book>
<Book>
<BookId>3</BookId>
<Des>Dumm12</Des>
<Comments/>
<OrderDateTime>05/06/2009 12:37</OrderDateTime>
</Book>
<Book>
<BookId>4</BookId>
<Des>Dummy2</Des>
<Comments/>
<OrderDateTime>06/07/2009 12:37</OrderDateTime>
</Book>
</Books>
I pass an XML parameter to the stylesheet; the parameter XML is
<BookIDs>
<BookID>2</BookID>
<BookID>3</BookID>
</BookIDs>
My output should be like
<Books>
<Book>
<BookId>2</BookId>
<Des>Dummy2</Des>
<Comments/>
<OrderDateTime>04/07/2009 12:37</OrderDateTime>
</Book>
<Book>
<BookId>3</BookId>
<Des>Dumm12</Des>
<Comments/>
<OrderDateTime>05/06/2009 12:37</OrderDateTime>
</Book>
</Books>
How do I accomplish this using XSLT?

This works in Saxon 6.5.5...
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.1">
<xsl:param name="nodeset">
<BookIDs><BookID>2</BookID><BookID>3</BookID></BookIDs>
</xsl:param>
<xsl:template match="/Books">
<Books>
<xsl:variable name="Copy">
<wrap>
<xsl:copy-of select="Book"/>
</wrap>
</xsl:variable>
<xsl:for-each select="$nodeset/BookIDs/BookID">
<xsl:copy-of select="$Copy/wrap/Book[BookId=current()]"/>
</xsl:for-each>
</Books>
</xsl:template>
</xsl:stylesheet>
A pure XSLT solution will be pretty brittle though. Sub-query predicates didn't work, neither did a key. It is dependent upon the param being recognized as a node-set--which I was unable to achieve with a dynamic value (as opposed to the default in my example), even with exsl:node-set. This is also wasteful in that it copies all the Book elements from the source document.
There may be a better solution in XSLT 2.0. Alternately, if you are initiating your transform with some other language/tool, there may be better approaches available there. Another possibility could include the use of exsl:document to load your source document or params.
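As one example of doing the filtering in the host language instead, here is a minimal Python sketch. It assumes both the source document and the ID list are available as XML strings; the shortened sample data and variable names are made up for illustration.

```python
import xml.etree.ElementTree as ET

books = ET.fromstring(
    "<Books>"
    "<Book><BookId>1</BookId><Des>Dumm1</Des></Book>"
    "<Book><BookId>2</BookId><Des>Dummy2</Des></Book>"
    "<Book><BookId>3</BookId><Des>Dumm12</Des></Book>"
    "<Book><BookId>4</BookId><Des>Dummy2</Des></Book>"
    "</Books>"
)
wanted = ET.fromstring(
    "<BookIDs><BookID>2</BookID><BookID>3</BookID></BookIDs>"
)

# Note the differing case: BookID in the parameter, BookId in the source.
wanted_ids = {node.text for node in wanted.iter("BookID")}

filtered = ET.Element("Books")
filtered.extend(
    [book for book in books.findall("Book") if book.findtext("BookId") in wanted_ids]
)

kept = [book.findtext("BookId") for book in filtered]
print(ET.tostring(filtered, encoding="unicode"))
```

Building a set of the wanted IDs first keeps the lookup cheap and preserves the original document order of the matching Book elements.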
