I have a newbie question.
In my XML I have 3 elements:
...
<Person>
<Name>John</Name>
<Surname>Doe</Surname>
<AlternativeName>Unknown Person</AlternativeName>
</Person>
...
The business rules are very simple:
1. Concatenate Name and Surname i.e. concat(./Person/Name, ' ', ./Person/Surname)
2. If the result from point 1 is blank then use ./Person/AlternativeName
3. There must be no leading or trailing spaces in the final result
How do I implement the above rules, considering the following:
Name and/or Surname could be empty e.g. <Name></Name>
Name and/or Surname might not be present in the XML e.g.
...
<Person>
<AlternativeName>Unknown Person</AlternativeName>
</Person>
...
If this was Java or C# or Delphi, I would simply concatenate the fields, trim leading and trailing spaces and test the result...
You can use the normalize-space() function to remove leading/trailing spaces. It also collapses any run of internal whitespace to a single space, which helps if the concatenation leaves you with multiple spaces between values.
For example like this:
normalize-space(concat(/Person/Name,' ',/Person/Surname))
You can use either xsl:choose or xsl:if to determine whether or not to use the AlternativeName, as Martin mentioned in his comment.
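For comparison with the "concatenate, trim and test" approach the question mentions, here is a minimal sketch of the three rules outside XSLT, in Python with the standard xml.etree.ElementTree module. The function name display_name is hypothetical, and " ".join(s.split()) is only an approximation of normalize-space():

```python
import xml.etree.ElementTree as ET

def display_name(person):
    """Apply the three business rules to one <Person> element."""
    # findtext() returns the default when the child element is missing,
    # and an empty string when it is present but empty.
    name = person.findtext("Name", default="") or ""
    surname = person.findtext("Surname", default="") or ""
    # Rules 1 and 3: concatenate, then trim and collapse whitespace.
    full = " ".join(f"{name} {surname}".split())
    if full:
        return full
    # Rule 2: fall back to AlternativeName when the result is blank.
    alternative = person.findtext("AlternativeName", default="") or ""
    return " ".join(alternative.split())

person = ET.fromstring(
    "<Person><Name></Name><AlternativeName>Unknown Person</AlternativeName></Person>"
)
print(display_name(person))
```

An empty Name and a missing Surname are handled the same way here, which mirrors how concat() treats an empty node-set as an empty string in XPath 1.0.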
So I have several XML files that have persons with unique IDs and they each have a favorite food (a person can be in several xml files):
There are cases where the person with id=300 might have the food right in the beginning of the tag.
<person id="299">
<food>
<type> Hot Dog </type>
</food>
</person>
<person id="300">
<food>
<type> Burger</type>
</food>
</person>
Or there might be other tags before the food tag
<person id="300">
<year>
<birth> 1990 </birth>
<marriage> 2020 </marriage>
</year>
<food>
<type> Vegan </type>
</food>
</person>
I need to use a single Perl regex substitution to remove the food tags ONLY of the persons whose ID is 300, regardless of whether they are at the beginning, middle, or end of the person tag.
I know if it was for the whole person tag I could use something like :
$fileContents =~ s/<person id=\"300\"[^<]+<\/person>//g;
But I must leave the person tag intact; I must only remove the food tag inside the person tag, and I can't remove all the food tags because I need to keep them for people with other IDs.
Could you help me please? I've been struggling a lot with this D:
You can't safely do that with a substitution.
And even a half-assed approach is more complicated than using an existing XML parser.
$_->unbindNode()
for $doc->findnodes('//person[@id="300"]/food');
Full solution:
use XML::LibXML qw( );
# my $doc = XML::LibXML->new->parse_file(...);
# or
# my $doc = XML::LibXML->new->parse_string(...);
$_->unbindNode()
for $doc->findnodes('//person[@id="300"]/food');
# $doc->toFile(...)
# or
# $doc->toString(...)
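The same parser-based idea can be sketched without Perl, for readers who want to check the behaviour quickly. This is a rough Python equivalent using the standard xml.etree.ElementTree module, not a translation of the XML::LibXML API; the function name remove_food and the <list> wrapper element are assumptions made for the example:

```python
import xml.etree.ElementTree as ET

def remove_food(root, person_id="300"):
    """Drop every <food> child of <person> elements with the given id."""
    # Materialise the list first so the tree is not modified mid-iteration.
    for person in list(root.iter("person")):
        if person.get("id") == person_id:
            for food in person.findall("food"):
                person.remove(food)

# <list> is a hypothetical root element; the samples in the question have none.
root = ET.fromstring(
    '<list>'
    '<person id="299"><food><type> Hot Dog </type></food></person>'
    '<person id="300"><year><birth> 1990 </birth></year>'
    '<food><type> Vegan </type></food></person>'
    '</list>'
)
remove_food(root)
print(ET.tostring(root, encoding="unicode"))
```

The position of <food> inside <person> does not matter, because the element is located through the tree rather than through text patterns.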
perl -i.bk -pe'BEGIN{undef$/}s|<person (.*?)>.*?</person>|$p=$&;$1=~/id="300"/?$p=~s,<food>.*?</food>,,sr:$p|esg' files*.xml
...removes <food>.....</food> from persons with id="300" in one or more files*.xml. The original files are kept and renamed with .bk added to their file names, so only run this once if you need to keep the original files, or change -i.bk into, for example, -i.bk$(date +%Y%m%d%H%M%S).
Note: I think ikegami's answer is much better.
But sometimes one writes Perl for systems that don't allow extra modules, and XML::LibXML sadly isn't a core module. And sometimes half-assed XML might be best/fastest handled with half-assed methods, for instance "XML" written by something beyond your control. Maybe it's missing a root node for the list of persons, like in the first example here (the list of <person>s could be surrounded with <list>...</list> to make it readable to XML::LibXML). Or the ' or " around attribute values is missing, which also wouldn't be readable to XML::LibXML right away.
I'm trying to do a regular expression matching with REGEXP_LIKE and I'm looking for a regexp to find if the value of a specific tag is not a specific string.
For example:
<person>
<name>John</name>
<age>40</age>
</person>
My goal is to validate that the name tag's value is not John, so the REGEXP_LIKE would return true for input xmls where name is not John.
Thank you in advance for the help!
A quick and easy way to do this is to simply negate the regex search:
... WHERE NOT REGEXP_LIKE(column_name, '<name>John</name>')
However, as should be mentioned every time a question like this is posted, it's generally a bad idea to parse XML with regex. If you find yourself constructing more complex regex patterns to search this XML data, then you should:
Use an XML parser instead of regular expressions, or
Change how you are storing the data! Make person.age a separate table column; don't bung the entire XML structure into a single place.
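To illustrate the "use an XML parser" option, here is a minimal sketch in Python with the standard xml.etree.ElementTree module. It assumes the XML has already been read out of the database into a string; the function name name_is_not is hypothetical:

```python
import xml.etree.ElementTree as ET

def name_is_not(xml_text, forbidden="John"):
    """Return True when no <name> element has the forbidden value."""
    root = ET.fromstring(xml_text)
    # Compare the parsed text, so whitespace around the value or
    # attributes on the tag do not defeat the check.
    return all((n.text or "").strip() != forbidden for n in root.iter("name"))

print(name_is_not("<person><name>John</name><age>40</age></person>"))
print(name_is_not("<person><name>Jane</name><age>40</age></person>"))
```

Unlike the regex, this still works if the document is formatted differently, for example with a newline inside the name element.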
I have to edit a stored procedure that builds xml strings so that all the element values are wrapped in cdata. Some of the values have already been wrapped in cdata so I need to ignore those.
I figured this is a good attempt to learn some regex
From: <element>~DATA_04</element>
to: <element><![CDATA[~DATA_04]]></element>
What are my options on how to do this? I can do simple regex, this is way more advanced.
NOTE: The <element> is generic for illustration purposes, in reality, it could be anything and is unknown.
Sample text:
declare @sql nvarchar(max) =
' <data>
<header></header>
<docInfo>Blah</docInfo>
<someelement>~DATA_04</someelement>
<anotherelement><![CDATA[~DATA_05]]></anotherelement>
</data>
'
Using the sample xml, the regex would need to find someelement and add cdata to it like <someelement><![CDATA[~DATA_04]]></someelement> and leave the other elements alone.
Bear in mind, I did not write this horrible sql code, i just have to edit it.
This is c#:
string text = Regex.Replace( inputString, @"<element>~(.+)</element>", "<element><![CDATA[~$1]]></element>" , RegexOptions.None );
The find is:
<element>~(.+)</element>
The replace is:
<element><![CDATA[~$1]]></element>
I'm assuming there is a ~ at the start of the inside of the element tag.
You will also want to watch out for whitespace if that is an issue...
You may want to add some
\s*
Any whitespace characters, zero or more matches
Try with (<[^>]+>)(\~data_([^<]+))(<[^>]+>), matched case-insensitively because the pattern spells data_ in lowercase,
and replace with \1<![CDATA[\2]]>\4
this will give you: <element><![CDATA[~DATA_04]]></element>,
where element could be anything else.
Good luck
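The idea behind both answers (match an element whose content is plain text, leave anything already wrapped alone) can be sketched in Python with the standard re module. This is a sketch for flat, simple markup like the sample; it assumes tags without attributes and text without a literal "<", and the function name wrap_in_cdata is hypothetical:

```python
import re

# Opening tag, non-empty text with no "<" in it, then the matching closing tag.
# Content that already starts with "<![CDATA[" cannot match, since it contains "<".
PLAIN_ELEMENT = re.compile(r"(<([A-Za-z_][\w.-]*)>)([^<]+)(</\2>)")

def wrap_in_cdata(xml_text):
    """Wrap plain-text element values in CDATA, skipping wrapped ones."""
    return PLAIN_ELEMENT.sub(
        lambda m: f"{m.group(1)}<![CDATA[{m.group(3)}]]>{m.group(4)}",
        xml_text,
    )

sample = (
    "<data>\n"
    "<header></header>\n"
    "<someelement>~DATA_04</someelement>\n"
    "<anotherelement><![CDATA[~DATA_05]]></anotherelement>\n"
    "</data>\n"
)
print(wrap_in_cdata(sample))
```

The backreference \2 ties the closing tag to the opening tag, so container elements such as <data>, whose content is other elements, are left untouched.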
I'd appreciate pointers on how to get (non-element) text between tags. For example given the element ABC I'd like to get the text ABC.
Currently, I'm able to use DefaultHandler::characters(const XMLCh *const chars, const XMLSize_t length) to get the characters between two consecutive start or end tags. Unfortunately, I'm also getting unnecessary newlines and formatting spaces between parent tags and child elements. For example, in the bit of code below, I'm getting 5 extra formatting characters (one newline and four spaces):
<Parent> <!-- Newline here -->
<Child>XYX</Child> <!-- Four spaces here -->
</Parent>
What is the best (standard) way of filtering out these formatting characters?
Solved. For posterity's sake, here's how I did it.
Because the desired characters appear between the consecutive start and end tags of a single element, in the method DefaultHandler::startElement() I store the local name at the start of an element and compare it with the next local name that is encountered.
If the next local name encountered belongs to a new element then the intervening characters must be formatting characters and should be ignored.
If however the next element encountered has the same local name then the intervening characters form the desired string.
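The same bookkeeping can be sketched with Python's standard xml.sax module, whose ContentHandler callbacks mirror the Xerces-C++ ones. This is an illustration of the approach described above rather than the original C++ code; the class name LeafTextHandler and the leaf_text attribute are hypothetical:

```python
import xml.sax

class LeafTextHandler(xml.sax.ContentHandler):
    """Keep text only when a start tag is followed directly by its own end tag."""

    def __init__(self):
        super().__init__()
        self.leaf_text = []       # (element name, text) pairs
        self._buffer = []         # characters seen since the last tag
        self._last_opened = None  # name from the most recent start tag

    def startElement(self, name, attrs):
        # Anything buffered before a new start tag is formatting; discard it.
        self._buffer = []
        self._last_opened = name

    def characters(self, content):
        # May be called several times for one text run, so accumulate.
        self._buffer.append(content)

    def endElement(self, name):
        # Same name as the last start tag: the buffer is the element's text.
        if name == self._last_opened:
            self.leaf_text.append((name, "".join(self._buffer)))
        self._buffer = []
        self._last_opened = None

handler = LeafTextHandler()
xml.sax.parseString(b"<Parent>\n    <Child>XYX</Child>\n</Parent>", handler)
print(handler.leaf_text)
```

Note that this deliberately drops text in mixed content (text that sits next to child elements), which is the trade-off of the approach.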
I'm trying to use upper-case() in an XPath expression; my parser is MSXML 4.0, and I get:
upper-case is not a valid XSLT or XPath function.
Is it really not implemented?
upper-case() is an XPath 2.0 function, and MSXML only implements XSLT/XPath 1.0, which has no functions to convert to uppercase or lowercase. Instead do the following:
If it is required in a lot of places:
Declare these two xsl variables (this is to make the xslt more readable)
<!-- xsl variables up and lo and translate() are used to change case -->
<xsl:variable name="up" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>
<xsl:variable name="lo" select="'abcdefghijklmnopqrstuvwxyz'"/>
And use them in your translate function to change the case
<xsl:value-of select="translate(@name,$lo,$up)"/>
If you need to use it in just one place, no need to declare variables
<xsl:value-of select="translate(@name,'abcdefghijklmnopqrstuvwxyz','ABCDEFGHIJKLMNOPQRSTUVWXYZ')"/>
Maybe this can help you:
translate(string, string, string)
The translate function takes a string and, character by character, replaces characters that match the second string with the corresponding characters in the third string. This is the only way to convert from lower to upper case in XPath 1.0, and it only covers the letters you list. That would look like this (with extra white space added for readability). This code would translate the employee last names to upper case and then select those employees whose last names begin with A.
descendant::employee[
starts-with(
translate(@last-name,
"abcdefghijklmnopqrstuvwxyz",
"ABCDEFGHIJKLMNOPQRSTUVWXYZ"),
"A"
)
]
If the second string has more characters than the third string, these extra characters will be removed from the first string. If the third string has more characters than the second string, the extra characters are ignored.
(from http://tutorials.beginners.co.uk/professional-visual-basic-6-xml-part-1-using-xml-queries-and-transformations.htm?p=3)
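The length rules for translate() are easy to get wrong, so here is a small Python sketch that emulates them. It is an illustration of the XPath 1.0 semantics, not part of MSXML or any XSLT processor, and the function name xpath_translate is hypothetical:

```python
def xpath_translate(value, source, target):
    """Emulate XPath 1.0 translate(value, source, target)."""
    table = {}
    for index, char in enumerate(source):
        if ord(char) in table:
            continue  # the first occurrence in the second string wins
        # No counterpart in the third string means the character is removed.
        table[ord(char)] = target[index] if index < len(target) else None
    return value.translate(table)

lower = "abcdefghijklmnopqrstuvwxyz"
upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
print(xpath_translate("smith", lower, upper))
print(xpath_translate("--aaa--", "abc-", "ABC"))
```

The second call shows the removal rule: "-" is the fourth character of the second string and has no counterpart in the third, so it is stripped from the result.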