AIML syntax for detecting a keyword present anywhere in a line?

Let the keyword be 'course'. I tried * COURSE *, _ COURSE * and various other permutations of this, but wasn't getting the desired result. (My line could be anything, but it definitely has the word "course" somewhere in it.)

The answer to this depends on whether you are using AIML v1 or AIML v2.
For AIML v1, you will need to set up 4 categories. One to detect "course" on its own, one for any input that starts with "course", one for any input that ends with "course" and one to handle input with "course" inside it:
<category>
<pattern>COURSE</pattern>
<template>Your input contained the word "course".</template>
</category>
<category>
<pattern>COURSE *</pattern>
<template><srai>course</srai></template>
</category>
<category>
<pattern>_ COURSE</pattern>
<template><srai>course</srai></template>
</category>
<category>
<pattern>_ COURSE *</pattern>
<template><srai>course</srai></template>
</category>
The reason you need 4 categories is that the * and _ wildcards each match one or more words, so neither can match "nothing" before or after the keyword. However, if you are using AIML v2 (which you should be!), you can do this with the # wildcard, which matches zero or more words:
<category>
<pattern># COURSE #</pattern>
<template>Your input contained the word "course".</template>
</category>
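To see why one-or-more wildcards force four categories while a zero-or-more wildcard needs only one, here is a minimal Python sketch of word-level wildcard matching. It is not an AIML interpreter: the `matches` helper is made up for illustration, and it ignores AIML's input normalization and wildcard priority rules.

```python
def matches(pattern: str, text: str) -> bool:
    """Word-level match: '*' and '_' need one or more words, '#' allows zero or more."""
    p = pattern.upper().split()
    w = text.upper().split()

    def go(i: int, j: int) -> bool:
        if i == len(p):
            return j == len(w)          # pattern used up: input must be too
        token = p[i]
        if token in ("*", "_"):
            # consume at least one word
            return any(go(i + 1, k) for k in range(j + 1, len(w) + 1))
        if token == "#":
            # consume zero or more words
            return any(go(i + 1, k) for k in range(j, len(w) + 1))
        return j < len(w) and w[j] == token and go(i + 1, j + 1)

    return go(0, 0)


inputs = ["course", "course details please", "which course", "is the course open"]
v1_patterns = ["COURSE", "COURSE *", "_ COURSE", "_ COURSE *"]

for text in inputs:
    hits = [p for p in v1_patterns if matches(p, text)]
    print(f"{text!r}: v1 -> {hits}, v2 '# COURSE #' -> {matches('# COURSE #', text)}")
```

Each input is caught by exactly one of the four v1 patterns, while the single # COURSE # pattern covers all of them.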

Related

XSLT white space in concat

I have the following code
<xsl:value-of select="concat(string($var15_cond_result_exists), string($var16_cond_result_exists))"/>
which concatenates 2 strings. Example: John and Smith become JohnSmith.
What I want is a space between the first name and the last name.
I can do this by adding ,' ', between them in concat. However, there is a possibility that there is no first name or no last name, in which case I don't need the white space.
How can I solve this problem?
Is it possible to use some conditions, or is there an easier solution?
Wrap the concat in normalize-space(), which will trim any excess spaces at the start or end (it also collapses runs of whitespace inside the string to a single space):
<xsl:value-of
select="normalize-space(concat(string($var15_cond_result_exists), ' ', string($var16_cond_result_exists)))"/>
Note, you may be able to drop the string function inside the concat. Try this too
<xsl:value-of
select="normalize-space(concat($var15_cond_result_exists, ' ', $var16_cond_result_exists))"/>
You don't say which XSLT version you are using. In XSLT 2.0 you can do
<xsl:value-of select="$var15_cond_result_exists, $var16_cond_result_exists"/>
which will automatically insert a space if and only if both items exist (an empty sequence contributes nothing, but note that an empty string still counts as an item). The conversion to string is automatic in both 1.0 and 2.0.
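If it helps to see the normalize-space(concat(...)) behaviour outside XSLT, here is a small Python sketch that mimics it. The helper names `normalize_space` and `full_name` are made up for illustration; this mirrors the XPath semantics rather than running any XSLT.

```python
import re

def normalize_space(value: str) -> str:
    """Mimic XPath normalize-space(): trim and collapse runs of space, tab, CR, LF."""
    return re.sub(r"[ \t\r\n]+", " ", value).strip(" \t\r\n")

def full_name(first: str, last: str) -> str:
    # Same shape as normalize-space(concat($first, ' ', $last))
    return normalize_space(first + " " + last)

print(full_name("John", "Smith"))
print(full_name("John", ""))
print(full_name("", "Smith"))
```

One side effect to be aware of: because inner whitespace is collapsed too, a value such as "Mary  Ann" (two spaces) comes out with a single space.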

Get position of specific word

I am new to XSLT. Is it possible to get the position of a specific word? For example, I have data like this:
<Data>The quick brown fox jumps over the lazy dog!</Data>
I want to get the positions of "brown", "over", "dog" and "!", and store each one under a different output name. For example, the position of brown is <foo>3</foo>, the position of over is <boo>6</boo>, dog is <hop>9</hop> and ! is <po_df>10</po_df>. Is it possible?
If you were only looking for words, you could use tokenize(., '\s+|\p{P}') (this needs XSLT 2.0):
<xsl:template match="Data">
<xsl:copy>
<xsl:variable name="words" select="tokenize(., '\s+|\p{P}')"/>
<xsl:for-each select="'brown', 'over', 'dog'">
<matched item="{.}" at-pos="{index-of($words, .)}"/>
</xsl:for-each>
</xsl:copy>
</xsl:template>
which gives
<Data>
<matched item="brown" at-pos="3"/>
<matched item="over" at-pos="6"/>
<matched item="dog" at-pos="9"/>
</Data>
so it has the right positions. (I am not sure where the names of the elements you posted, like hop, are supposed to come from, so I have not tried to implement that.)
As you also want to identify a punctuation character, I am not sure tokenize suffices, and even with analyze-string it is not straightforward to match and collect the position. Maybe someone else has a better idea.
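For comparison, here is how the same idea looks in Python when punctuation marks are kept as tokens of their own instead of being used as separators. This is only a sketch of the approach, not XSLT; the `position` helper is made up for illustration, and it reports the first occurrence only.

```python
import re

text = "The quick brown fox jumps over the lazy dog!"

# A token is either a run of word characters or a single punctuation character.
tokens = re.findall(r"\w+|[^\w\s]", text)

def position(token: str):
    """1-based position of the first occurrence, or None if absent."""
    return tokens.index(token) + 1 if token in tokens else None

for wanted in ["brown", "over", "dog", "!"]:
    print(wanted, position(wanted))
```

In XSLT 2.0 the equivalent would be to collect the matching substrings from analyze-string into a sequence and then use index-of on it, as in the template above.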

Reverse a regex?

I am using AHK to automate something that involves parsing XML. I am aware that it is a bad habit to parse XML with regex; however, I pretty much have my regex working. The issue is that AHK only has RegExReplace as a method, and I need something along the lines of a "RegExKeep".
So what happens is the part I want to keep gets deleted and the part I want deleted gets kept.
Here is the code:
RegExReplace(response, "(?<=.dt.\n:)(.*)(?=\n..dt.)")
Is there a way to have everything but the match match? If not is there a better way to go about this?
Edit:
I have now attempted using the inverse regex and RegExMatch, but neither works in AHK. Both regexes work properly at regex101.com, yet neither works in AHK. RegExMatch returns 0 (meaning it found nothing) and the inverse regex returns nothing as well.
Here is a link to what is being searched by the regex: http://www.dictionaryapi.com/api/v1/references/collegiate/xml/Endoderm?key=17594df4-ff21-4045-88d9-a537fd4bcd61
Here is the entire code:
;responses := RegExReplace(response, "([\w\W])(?<=.dt.\n:)(.*)(?=\n..dt.)([\w\W])")
responses := RegExMatch(response, "(?<=.dt.\n:)(.*)(?=\n..dt.)")
MsgBox %responses%
Here is the "reversed" regex:
s).*dt.\n:|\n..dt.*
The parts in the look-arounds need matching with the * quantifier to match from the start and up to the end. To match the newline with a dot, use singleline mode.
Debuggex Demo (where endings are \r\n)
However, there is a better option with RegExMatch OutputVar:
If any capturing subpatterns are present inside NeedleRegEx, their
matches are stored in a pseudo-array whose base name is OutputVar.
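Use (note the third parameter, Match, which is the OutputVar)
RegExMatch(response, "(?<=.dt.\n:)(?<Val>.*)(?=\n..dt.)", Match)
Then, just refer to this value as MatchVal (the OutputVar name followed by the subpattern name).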
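The "keep the match instead of deleting it" idea is the same in any regex engine: search, then read the captured group. Here is a minimal Python sketch using the same pattern with a named group. The sample `response` string is hypothetical (I am assuming a newline right after the opening tag and before the closing one, as the pattern implies), and Python's named-group syntax is (?P<Val>...) rather than (?<Val>...).

```python
import re

# Hypothetical sample; the real API response may be laid out differently.
response = "<def><dt>\n:the innermost germ layer\n</dt></def>"

pattern = re.compile(r"(?<=.dt.\n:)(?P<Val>.*)(?=\n..dt.)")

match = pattern.search(response)
kept = match.group("Val") if match else ""   # empty string when nothing matched
print(kept)
```

If the text between the tags can span several lines, the dot has to match newlines as well (re.DOTALL in Python, the s) option in AHK).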
Here's a solution that should work, assuming you want to get whatever's between the <dt> tags. Make sure you're using the latest version of AHK if possible.
xml =
(
<entry_list version="1.0">
<entry id="endoderm">
<ew>endoderm</ew>
<subj>EM#AN</subj>
<hw>en*do*derm</hw>
<sound>
<wav>endode01.wav</wav>
<wpr>!en-du-+durm</wpr>
</sound>
<pr>ˈen-də-ˌdərm</pr>
<fl>noun</fl>
<et>French
<it>endoderme,</it>from
<it>end-</it>+ Greek
<it>derma</it>skin
<ma>derm-</ma>
</et>
<def>
<date>1861</date>
<dt>:the innermost of the three primary germ layers of an embryo that is
the source of the epithelium of the digestive tract and
its derivatives and of the lower respiratory tract</dt>
<sd>also</sd>
<dt>:a tissue derived from this layer</dt>
</def>
<uro>
<ure>en*do*der*mal</ure>
<sound>
<wav>endode02.wav</wav>
<wpr>+en-du-!dur-mul</wpr>
</sound>
<pr>ˌen-də-ˈdər-məl</pr>
<fl>adjective</fl>
</uro>
</entry>
</entry_list>
)
; Remove linebreaks and indentation whitespace
xml := RegExReplace(xml, "\n|\s{2,}|\t", "")
matchArray := []
matchPos := 1
; Keep looping until we're out of matches
while ( matchPos := RegExMatch(xml, "<dt>:([^<]*)", matchVar, matchPos + StrLen(matchVar1)) )
{
; Add matches to array
matchArray.insert(matchVar1)
}
; Show what's in the array
for each, value in matchArray {
; Index = Each, Output = Value
msgBox, Ittr: %each%, Value: %value%
}
Esc::ExitApp
You really shouldn't use RegEx for parsing XML, though. It's very simple to read XML in AHK using COM. I know it's outside the scope of your question, but here's a simple example using a COM object to read the same data:
xmlData =
(LTrim
<?xml version="1.0" encoding="utf-8" ?>
<entry_list version="1.0">
<entry id="endoderm"><ew>endoderm</ew><subj>EM#AN</subj><hw>en*do*derm</hw><sound><wav>endode01.wav</wav><wpr>!en-du-+durm</wpr></sound><pr>ˈen-də-ˌdərm</pr><fl>noun</fl><et>French <it>endoderme,</it> from <it>end-</it> + Greek <it>derma</it> skin <ma>derm-</ma></et><def><date>1861</date><dt>:the innermost of the three primary germ layers of an embryo that is the source of the epithelium of the digestive tract and its derivatives and of the lower respiratory tract</dt> <sd>also</sd> <dt>:a tissue derived from this layer</dt></def><uro><ure>en*do*der*mal</ure><sound><wav>endode02.wav</wav><wpr>+en-du-!dur-mul</wpr></sound> <pr>ˌen-də-ˈdər-məl</pr> <fl>adjective</fl></uro></entry>
</entry_list>
)
xmlObj := ComObjCreate("MSXML2.DOMDocument.6.0")
xmlObj.loadXML(xmlData)
nodes := xmlObj.selectSingleNode("/entry_list/entry/def").childNodes
for node in nodes {
if (node.nodeName == "dt")
msgBox % node.text
}
Esc::ExitApp
For more information on how to use this, see this post: http://www.autohotkey.com/board/topic/56987-com-object-reference-autohotkey-v11/?p=367838
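The same parser-based approach carries over to other languages. As a rough illustration, here is the equivalent in Python with the standard library's xml.etree.ElementTree, run on a trimmed-down copy of the data above (the sample is shortened by me, not the full API response).

```python
import xml.etree.ElementTree as ET

# Trimmed sample of the dictionary entry, for illustration only.
xml_data = """<entry_list version="1.0">
  <entry id="endoderm">
    <def>
      <date>1861</date>
      <dt>:the innermost of the three primary germ layers of an embryo</dt>
      <sd>also</sd>
      <dt>:a tissue derived from this layer</dt>
    </def>
  </entry>
</entry_list>"""

root = ET.fromstring(xml_data)

# Select every <dt> under /entry_list/entry/def and drop the leading colon.
definitions = [dt.text.lstrip(":") for dt in root.findall("./entry/def/dt")]
for definition in definitions:
    print(definition)
```

Note that .text only returns the text before the first child element; if a dt can contain nested markup, "".join(dt.itertext()) is the safer choice.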
If the given phrase only occurs once, you can probably just fetch everything around it, can't you?
RegExReplace(response, "([\w\W]*)(?<=.dt.\n:)(.*)(?=\n..dt.)([\w\W]*)", "$1$3")
looks like the easiest solution to me, but surely not the prettiest.
Update: in your question update, you quoted responses := RegExReplace(response, "([\w\W])(?<=.dt.\n:)(.*)(?=\n..dt.)([\w\W])"), but it should be responses := RegExReplace(response, "([\w\W]*)(?<=.dt.\n:)(.*)(?=\n..dt.)([\w\W]*)", "$1$3"), meaning keep the first ($1) and the last ($3) capturing group, each of which matches an arbitrary amount of any characters ([\w\W]*) around your initial phrase. The look-arounds do not capture, so they are not counted when numbering the groups. It seems you copied it wrong. I can't say for sure that it will work, though, since I don't have any code to test it on.
Edit: one thing I don't understand is how RegExMatch helps here. It just tells us IF and WHERE there is a substring present, but surely doesn't replace anything?

xslt 1.0 substring-after to ignore case

I have 2 xml nodes like this, for example:
<Model>GRAND MODUS</Model>
<QualifiedDescription>2008 58 Reg Renault Grand Modus 1.2 TCE Dynamique 5drMetallic Flame Red</QualifiedDescription>
I'm trying to use substring-after to split the QualifiedDescription after the Grand Modus like this:
<xsl:variable name="something"><xsl:value-of select='substring-after(QualifiedDescription, Model)' /></xsl:variable>
But obviously it's not working because it is case sensitive. Is it possible to get substring-after to work case-insensitively, but still return the output with case preserved? E.g.
1.2 TCE Dynamique 5drMetallic Flame Red
Thanks.
You could convert the two strings to the same case using translate in order to work out the character offset of the first within the second, then take a substring of the original QualifiedDescription from that position.
<xsl:variable name="uc" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'" />
<xsl:variable name="lc" select="'abcdefghijklmnopqrstuvwxyz'" />
<xsl:variable name="substrStart" select="
string-length(substring-before(translate(QualifiedDescription, $uc, $lc),
translate(Model, $uc, $lc)))
+ string-length(Model)
+ 1" /><!-- +1 because string indexes in XPath are 1-based -->
<xsl:variable name="something"
select="substring(QualifiedDescription, $substrStart)" />
You'd need slightly more complex logic to take account of cases where the QualifiedDescription does not include the Model (since in this case both substring-before and substring-after return the empty string) but you get the idea.
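To make the offset arithmetic (and the "not found" case) concrete, here is the same logic as a small Python sketch. The function name `substring_after_ci` is made up for illustration, and it assumes that lower-casing does not change string lengths, which holds for ASCII input like the example.

```python
def substring_after_ci(text: str, needle: str) -> str:
    """Like XPath substring-after(), but matching case-insensitively.

    Returns the tail of the original text with its case preserved,
    or "" when the needle does not occur.
    """
    start = text.lower().find(needle.lower())   # 0-based offset of the match
    if start == -1:
        return ""                               # mirror substring-after on no match
    return text[start + len(needle):]

description = "2008 58 Reg Renault Grand Modus 1.2 TCE Dynamique 5drMetallic Flame Red"
print(repr(substring_after_ci(description, "GRAND MODUS")))
```

The search runs on the lower-cased copies, but the slice is taken from the original string, which is exactly what the translate/substring combination does in XSLT 1.0. In the stylesheet, the missing guard could be a contains() test on the translated strings before computing the offset.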
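You can do a case-insensitive match if you upper-case both strings first and take the substring of the upper-cased text (note that upper-case() requires XPath 2.0, and the result comes back in upper case rather than with the original case preserved):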
substring-after(upper-case(QualifiedDescription), upper-case(Model))

Xpath search for duplicate

I have the following xml:
<log>
<logentry revision="11956">
<author>avijendran</author>
<date>2013-05-20T10:25:19.678089Z</date>
<msg>
JIRA-1263 - did something
</msg>
</logentry>
<logentry revision="11956">
<author>avijendran</author>
<date>2013-05-20T10:25:19.678089Z</date>
<msg>
JIRA-1263 - did something 22 again
</msg>
</logentry>
</log>
I want to ignore any occurrence of the JIRA-1263 after the first one.
The XPath I am trying is below. It works if the duplicate nodes directly follow one another, but duplicates elsewhere (deeper down) are missed:
<xsl:variable name="uniqueList" select="//msg[not(normalize-space(substring-before(., '
')) = normalize-space(substring-before(following::msg, '
')))]" />
If you want to get each msg, use //msg[starts-with(normalize-space(.), 'JIRA-1263')] to get the output JIRA-1263 - did something and JIRA-1263 - did something 22 again.
And if you want to get any element with the same condition, use //*[starts-with(normalize-space(.), 'JIRA-1263')], which gives the same result as the previous one.
Finally, if you want to get only the first msg with the same condition, use //logentry/msg[starts-with(normalize-space(.), 'JIRA-1263')][not(preceding::msg)] to get the output JIRA-1263 - did something
You can define a key at the top level of your stylesheet that groups log entries by their first word:
<xsl:key name="logentryByCode" match="logentry"
use="substring-before(normalize-space(msg), ' ')" />
Now you need to select all logentry elements where either
the msg does not start with JIRA-nnnn (where nnnn is a number), or
this entry is the first one whose msg starts with this word (i.e. the first occurrence of "JIRA-1234 - anything" for each ticket number)
(note that these two conditions need not be mutually exclusive):
<xsl:variable name="uniqueList" select="log/logentry[
(
not(
starts-with(normalize-space(msg), 'JIRA-') and
boolean(number(substring-before(substring(normalize-space(msg), 6), ' ')))
)
)
or
(
generate-id() = generate-id(key('logentryByCode',
substring-before(normalize-space(msg), ' '))[1])
)
]/msg" />
The boolean(number(...)) part checks whether a string of text can be parsed as a valid non-zero number (the text in this case being the part of the first word of the message that follows JIRA-), and the generate-id trick is a special case of the technique known as Muenchian grouping.
Equally, you could group the msg elements instead of the logentry elements, using match="msg" in the key definition and normalize-space(.) instead of normalize-space(msg).
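If the key/generate-id combination is hard to follow, the selection rule itself is simple. Here it is as a Python sketch: keep a message if its first word is not a ticket code, or if it is the first message seen with that code. The function name and the two extra sample messages are made up for illustration, and the regex is a slightly stricter test than boolean(number(...)) (it accepts digits only).

```python
import re

def unique_messages(messages):
    """Keep non-ticket messages plus the first message for each JIRA ticket code."""
    seen = set()
    kept = []
    for raw in messages:
        msg = " ".join(raw.split())                       # roughly normalize-space()
        code = msg.split(" ", 1)[0] if " " in msg else ""  # like substring-before(msg, ' ')
        is_ticket = re.fullmatch(r"JIRA-\d+", code) is not None
        if is_ticket and code in seen:
            continue                                      # later occurrence of a known ticket
        seen.add(code)
        kept.append(msg)
    return kept

messages = [
    "\nJIRA-1263 - did something\n",
    "\nJIRA-1263 - did something 22 again\n",
    "Tidy up build script",          # hypothetical extra entries
    "JIRA-2000 - other change",
]
print(unique_messages(messages))
```

The `seen` set plays the role of the key lookup: comparing generate-id() against the first node returned by key() is the XSLT 1.0 way of asking "is this the first entry with that code?".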
And here is another interpretation of what you are trying to do:
find the first logentry for each code that starts with JIRA-XXXX.
If this is right, try this:
log/logentry[
starts-with(normalize-space(msg), 'JIRA-') and
not
(
substring-before( normalize-space(msg), ' ')= substring-before( normalize-space(preceding::msg), ' ')
)]
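This will find any logentry which starts with JIRA- but has no preceding one with the same substring before the first space (JIRA-XXXX), as in your example. Be aware that in XPath 1.0, normalize-space(preceding::msg) only uses the first msg of the preceding set (in document order), so the comparison is made against that single message rather than against every earlier one.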