Find number of characters matching pattern in XSLT 1 - regex

I need to write a statement whose test passes if there is exactly one asterisk in a string from the source document.
Thus, something like
<xslt:if test="count(find('\*', @myAttribute)) = 1">
There is one asterisk in myAttribute
</xslt:if>
I need this for XSLT 1. Answers for XSLT 2 are appreciated as well, but won't be accepted unless it's impossible in XSLT 1.

In XPath 1.0, we can do it by removing all asterisks using translate and comparing the length:
string-length(@myAttribute) - string-length(translate(@myAttribute, '*', '')) = 1
In XPath 2.0, I'd probably do this:
count(string-to-codepoints(@myAttribute)[. = string-to-codepoints('*')]) = 1

Another solution that should work in XPath 1.0:
contains(@myAttribute, '*') and not(contains(substring-after(@myAttribute, '*'), '*'))
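To sanity-check the two XPath 1.0 tests above outside an XSLT processor, here is a minimal Python sketch that mirrors their logic. The function names are made up for illustration, and Python string methods stand in for translate(), contains() and substring-after():

```python
def count_by_removal(value: str, ch: str) -> int:
    """Mirror string-length(v) - string-length(translate(v, ch, ''))."""
    # Removing every occurrence of ch shortens the string by the number of hits.
    return len(value) - len(value.replace(ch, ""))


def has_exactly_one(value: str, ch: str) -> bool:
    """Mirror contains(v, ch) and not(contains(substring-after(v, ch), ch))."""
    # partition() splits at the first occurrence; sep is empty if ch is absent.
    _head, sep, tail = value.partition(ch)
    return sep != "" and ch not in tail
```

Both functions agree on whether a string holds exactly one asterisk; the first one additionally gives the actual count.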

Related

XQuery - How do I extract a substring before the second occurrence of a character?

Let's say I have this string: "123_12345_123456"
I would like to extract everything before the second "_" (underscore)
I tried:
fn:tokenize("123_1234_12345", '_')[position() le 2]
That returns:
123
1234
What I actually want is:
123_1234
How do I achieve that?
I am using XQuery 1.0
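Regular expressions are flexible and compact. This one strips the last underscore-delimited segment, so it gives the wanted result when the string contains exactly two underscores: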
replace('123_1234_12345', '_[^_]+$', '')
Another solution that may be more readable is to a) tokenize the string, b) keep the tokens you want to preserve and c) join them again:
string-join(
tokenize('123_1234_12345', '_')[position() = 1 to 2],
'_'
)
Taking the basic idea from Michael Kay's deleted answer, it could be implemented like this:
substring($input, 1, index-of(string-to-codepoints($input), 95)[2] - 1)
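To compare how these approaches behave on other inputs, here is a small Python sketch of the tokenize/join idea and of the regex idea. It is not XQuery; the function names are invented for illustration:

```python
import re


def before_second(text: str, sep: str = "_") -> str:
    """Mirror string-join(tokenize(text, sep)[position() = 1 to 2], sep)."""
    # Keep the first two tokens and glue them back together.
    return sep.join(text.split(sep)[:2])


def strip_last_segment(text: str) -> str:
    """Mirror replace(text, '_[^_]+$', '')."""
    # Removes only the final "_segment", however many segments there are.
    return re.sub(r"_[^_]+$", "", text)
```

With exactly two underscores both give the same result. With more, only the tokenize/join version stops before the second underscore.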

Removing extra zeros concatenated with the number in XSLT

I'm working with XSLT and trying to remove all zeros present before and after the numbers.
Examples:
000000004552000 needs to translate to 4552.
Any ideas how to get this done using xslt? Thanks in advance!
Please always say what XSLT version you are using.
In 2.0, you can use replace(num, '^0+|0+$', '').
In 1.0, it's more difficult (everything is).
To remove leading zeroes, use string(number(.)).
To remove trailing zeroes, I think you need a recursive named template with the logic:
if $param mod 10 = 0
then call yourself with param = $param div 10
else $param
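The recursive logic above can be sketched in Python to check it against the example from the question. This only illustrates the recursion and is not XSLT; the function names are invented, and the guard for a value of 0 is an added assumption (without it, 0 mod 10 = 0 would recurse forever):

```python
def strip_trailing_zeros(n: int) -> int:
    """Divide by 10 for as long as the number ends in a zero digit."""
    # Guard against 0, which is divisible by 10 indefinitely (assumption).
    if n != 0 and n % 10 == 0:
        return strip_trailing_zeros(n // 10)
    return n


def strip_zeros(text: str) -> int:
    """Leading zeros vanish on number conversion, like string(number(.))."""
    return strip_trailing_zeros(int(text))
```

For the question's input, strip_zeros("000000004552000") first becomes 4552000 through the conversion and is then divided down to 4552.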

How find XPATH with random number value in attribute?

I have div blocks on website like this: <div id="banner-XXX-1"></div>
So I need to query this banner, where XXX is any number.
How can I do that? Currently I do it this way:
//div[contains(@id,'banner-') and contains(@id,'-1')]
But this gives wrong matches if XXX starts with 1. So, is there any way to do something like this: //div[contains(@id,'banner-' + <any_decimal> + '-1')]?
It seems the match operator does not work in the popular Chrome plugin XPath Helper, so I am using XPath 1.0.
https://chrome.google.com/webstore/detail/xpath-helper/hgimnogjllphhhkhlmebbmlgjoejdpjl?hl=en
XPath 1.0
This XPath 1.0 expression,
//div[ starts-with(@id,'banner-')
and translate(substring(@id, 8, 3), '0123456789', '') = ''
and substring(@id, 11) = '-1']
selects all div elements whose id attribute value
starts with banner-,
followed by 3 digits (translate() deletes every digit, so an empty result means nothing but digits was there),
followed by -1,
as requested.
XPath 2.0
This XPath 2.0 expression,
//div[matches(@id,'^banner-\d{3}-1$')]
selects all div elements whose id attribute value matches the shown regex and
starts (^) with banner-,
followed by 3 digits, (\d{3}),
and ends ($) with -1,
as requested.
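To see that the two expressions accept and reject the same id values, here is a minimal Python sketch of both checks. It assumes, like the answer, that XXX is exactly 3 digits; the function names are invented for illustration:

```python
import re

DIGITS = "0123456789"


def matches_xpath1(id_value: str) -> bool:
    """Mirror the starts-with / translate / substring test."""
    # XPath substring() is 1-based, so substring(@id, 8, 3) is id_value[7:10].
    middle = id_value[7:10]
    only_digits = middle.translate(str.maketrans("", "", DIGITS)) == ""
    # substring(@id, 11) is everything from the 11th character onwards.
    return id_value.startswith("banner-") and only_digits and id_value[10:] == "-1"


def matches_xpath2(id_value: str) -> bool:
    """Mirror matches(@id, '^banner-\\d{3}-1$')."""
    return re.fullmatch(r"banner-\d{3}-1", id_value) is not None
```

An id such as banner-1-1 is rejected by both, because the three characters after the prefix are "1-1" and not three digits.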

Reverse a regex?

I am using AHK to automatically do something but it involves parsing XML. I am aware that it is a bad habit to parse XML with regex, however I pretty much have my regex working. The issue is AHK only has regexreplace as a method and I need something along the lines of regexkeep.
So what happens is the part I want to keep gets deleted and the part I want deleted gets kept.
Here is the code:
RegExReplace(response, "(?<=.dt.\n:)(.*)(?=\n..dt.)")
Is there a way to have everything but the match match? If not is there a better way to go about this?
Edit:
I have now attempted using the inverse regex and RegExMatch, but neither works in AHK. Both regexes work properly at regex101.com, yet not in AHK. RegExMatch returns 0 (meaning it found nothing) and the inverse regex returns nothing as well.
Here is a link to what is being searched by the regex: http://www.dictionaryapi.com/api/v1/references/collegiate/xml/Endoderm?key=17594df4-ff21-4045-88d9-a537fd4bcd61
Here is the entire code:
;responses := RegExReplace(response, "([\w\W])(?<=.dt.\n:)(.*)(?=\n..dt.)([\w\W])")
responses := RegExMatch(response, "(?<=.dt.\n:)(.*)(?=\n..dt.)")
MsgBox %responses%
Here is the "reversed" regex:
s).*dt.\n:|\n..dt.*
The parts in the look-arounds need matching with the * quantifier to match from the start and up to the end. To match the newline with a dot, use singleline mode.
Debuggex Demo (where endings are \r\n)
However, there is a better option with RegExMatch OutputVar:
If any capturing subpatterns are present inside NeedleRegEx, their
matches are stored in a pseudo-array whose base name is OutputVar.
Use
RegExMatch(response, "(?<=.dt.\n:)(?<Val>.*)(?=\n..dt.)", Match)
Then, with Match passed as the OutputVar, just refer to the captured value as MatchVal.
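The "keep the match instead of replacing it" idea is not specific to AHK. As a minimal sketch, here is the same pattern used with Python's re module; the response string is a hypothetical excerpt shaped the way the pattern expects (each dt tag on its own line), not the real API output:

```python
import re

# Hypothetical sample: <dt> on one line, ":definition" on the next, then </dt>.
response = "<def>\n<dt>\n:a tissue derived from this layer\n</dt>\n</def>"

# Python spells a named group (?P<Val>...); the look-arounds are unchanged.
pattern = re.compile(r"(?<=.dt.\n:)(?P<Val>.*)(?=\n..dt.)")

match = pattern.search(response)
# Read the captured text from the match object rather than replacing anything.
kept = match.group("Val") if match else ""
```

Here kept holds the text between the tags, which is the part a replace would have thrown away.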
Here's a solution that should work, assuming you want to get whatever's between the <dt> tags. Make sure you're using the latest version of AHK if possible.
xml =
(
<entry_list version="1.0">
<entry id="endoderm">
<ew>endoderm</ew>
<subj>EM#AN</subj>
<hw>en*do*derm</hw>
<sound>
<wav>endode01.wav</wav>
<wpr>!en-du-+durm</wpr>
</sound>
<pr>ˈen-də-ˌdərm</pr>
<fl>noun</fl>
<et>French
<it>endoderme,</it>from
<it>end-</it>+ Greek
<it>derma</it>skin
<ma>derm-</ma>
</et>
<def>
<date>1861</date>
<dt>:the innermost of the three primary germ layers of an embryo that is
the source of the epithelium of the digestive tract and
its derivatives and of the lower respiratory tract</dt>
<sd>also</sd>
<dt>:a tissue derived from this layer</dt>
</def>
<uro>
<ure>en*do*der*mal</ure>
<sound>
<wav>endode02.wav</wav>
<wpr>+en-du-!dur-mul</wpr>
</sound>
<pr>ˌen-də-ˈdər-məl</pr>
<fl>adjective</fl>
</uro>
</entry>
</entry_list>
)
; Remove linebreaks and indentation whitespace
xml := RegExReplace(xml, "\n|\s{2,}|\t", "")
matchArray := []
matchPos := 1
; Keep looping until we're out of matches
while ( matchPos := RegExMatch(xml, "<dt>:([^<]*)", matchVar, matchPos + StrLen(matchVar1)) )
{
; Add matches to array
matchArray.insert(matchVar1)
}
; Show what's in the array
for each, value in matchArray {
; Index = Each, Output = Value
msgBox, Ittr: %each%, Value: %value%
}
Esc::ExitApp
You really shouldn't use RegEx for parsing XML, though. It's very simple to read XML in AHK using COM. I know it's outside the scope of your question, but here's a simple example using a COM object to read the same data:
xmlData =
(LTrim
<?xml version="1.0" encoding="utf-8" ?>
<entry_list version="1.0">
<entry id="endoderm"><ew>endoderm</ew><subj>EM#AN</subj><hw>en*do*derm</hw><sound><wav>endode01.wav</wav><wpr>!en-du-+durm</wpr></sound><pr>ˈen-də-ˌdərm</pr><fl>noun</fl><et>French <it>endoderme,</it> from <it>end-</it> + Greek <it>derma</it> skin <ma>derm-</ma></et><def><date>1861</date><dt>:the innermost of the three primary germ layers of an embryo that is the source of the epithelium of the digestive tract and its derivatives and of the lower respiratory tract</dt> <sd>also</sd> <dt>:a tissue derived from this layer</dt></def><uro><ure>en*do*der*mal</ure><sound><wav>endode02.wav</wav><wpr>+en-du-!dur-mul</wpr></sound> <pr>ˌen-də-ˈdər-məl</pr> <fl>adjective</fl></uro></entry>
</entry_list>
)
xmlObj := ComObjCreate("MSXML2.DOMDocument.6.0")
xmlObj.loadXML(xmlData)
nodes := xmlObj.selectSingleNode("/entry_list/entry/def").childNodes
for node in nodes {
if (node.nodeName == "dt")
msgBox % node.text
}
Esc::ExitApp
For more information on how to use this, see this post: http://www.autohotkey.com/board/topic/56987-com-object-reference-autohotkey-v11/?p=367838
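For comparison, the same "use an XML parser instead of a regex" approach can be sketched in Python with the standard xml.etree.ElementTree module. The XML below is a shortened, hand-trimmed version of the data above, kept only to show the dt lookup:

```python
import xml.etree.ElementTree as ET

# Abbreviated sample based on the entry above (most elements left out).
xml_data = """<entry_list version="1.0">
  <entry id="endoderm">
    <def>
      <date>1861</date>
      <dt>:the innermost of the three primary germ layers of an embryo</dt>
      <sd>also</sd>
      <dt>:a tissue derived from this layer</dt>
    </def>
  </entry>
</entry_list>"""

root = ET.fromstring(xml_data)

# Select every <dt> under entry/def and collect its full text content.
definitions = ["".join(dt.itertext()) for dt in root.findall("./entry/def/dt")]
```

The definitions list then holds one string per dt element, in document order.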
If the given phrase only occurs once, you can probably just fetch everything around it, can't you?
RegExReplace(response, "([\w\W]*)(?<=.dt.\n:)(.*)(?=\n..dt.)([\w\W]*)", "$1$3")
This looks like the easiest solution to me, though surely not the prettiest.
Update: in your question update, you quoted responses := RegExReplace(response, "([\w\W])(?<=.dt.\n:)(.*)(?=\n..dt.)([\w\W])"), but it should be responses := RegExReplace(response, "([\w\W]*)(?<=.dt.\n:)(.*)(?=\n..dt.)([\w\W]*)", "$1$3"), meaning keep the first ($1) and the last ($3) pair of parentheses, which capture an arbitrary amount of any characters ([\w\W]*) around your initial phrase. The look-arounds are not capturing groups, so the pattern has only three groups. It seems you copied it wrong. I can't say for sure that it will work, though, since I don't have any code to test it on.
Edit: one thing I don't understand is how RegExMatch helps here. It just tells us if and where a substring is present, but surely doesn't replace anything?

XSLT transformation: Substring after a last special character

I have the fields "commercial register code: 1111111" and "commercial register code 2222". I need to take the characters after the last space: 1111111 and 2222. Is there a function to take the characters before a space in XSL?
Regards
Update from comments
I will also have a line like "commercial register 21 code:", and "code" can appear without the ":" symbol.
If there is going to be one and only one number, then you could use
translate($string,translate($string,'0123456789',''),'')
This will remove any non-digit character from the string.
If the prefixed label is stable, then you could use something like:
substring-after($string,'commercial register code:')
About the question:
Is there a function to take the characters before a space in XSL?
Answer: Yes, the substring-before() function.
Update
From comments, it looks like the string pattern would be:
'commercial register' number 'code' (':')? number
Then use:
translate(substring-after($string,'code'), ': ', '')
In XSLT 2.0, use tokenize($in, '\s+')[last()]
If you're stuck with 1.0, you need a recursive template: check out str:tokenize in the EXSLT library.
Can you use EXSLT functions? If so, there is a str:split function and then you can do:
str:split($string, ' ')[position()=last()]
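To try the double-translate trick and the "last token" idea on the sample strings, here is a minimal Python sketch. It is not XSLT; the function names are invented, and str.split() only approximates tokenize (it ignores leading and trailing whitespace):

```python
DIGITS = "0123456789"


def digits_only(text: str) -> str:
    """Mirror translate($s, translate($s, '0123456789', ''), '')."""
    # Inner step: what is left after deleting the digits is the "junk" set.
    non_digits = text.translate(str.maketrans("", "", DIGITS))
    # Outer step: delete every junk character, leaving only the digits.
    return text.translate(str.maketrans("", "", non_digits))


def last_token(text: str) -> str:
    """Approximate tokenize($in, '\\s+')[last()]."""
    return text.split()[-1]
```

Note that digits_only() glues together every digit in the string, so for a value like "commercial register 21 code: 1111" it yields 211111. That is why it only fits when there is a single number.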