AIML - context - why does context not have highest priority in all cases?

When using an AIML context (via <that>) I get some conversations I cannot explain.
I expected that a <that> context would have priority over anything else.
Below I first show the script, then a few conversations. I marked the unexpected parts with a // after the response.
I added this AIML file to the standard ALICE conversations.
The script:
<category><pattern>STEP 1</pattern>
<template>Step 2</template>
</category>
<category><pattern>YES</pattern><that>STEP 2</that>
<template>step 3</template>
</category>
<category><pattern>NO</pattern><that>STEP 2</that>
<template>step 3</template>
</category>
<category><pattern>*</pattern><that>STEP 2</that>
<template>step 3</template>
</category>
<category><pattern>*</pattern><that>STEP 3</that>
<template>Step 4! and you typed '<star/>'</template>
</category>
In the following conversation I marked the unexpected responses with // ?
Human : step 1
Robot : Step 2
Human : yes
Robot : step 3
Human : yes
Robot : Step 4! and you typed 'yes'
Human : step 1
Robot : Step 2
Human : no
Robot : step 3
Human : no
Robot : So. // ? I expected here step 4
Human : step 1
Robot : Step 2
Human : any
Robot : any is a name. // ? I expected here step 3
Can you explain both unexpected flows of conversation?

The <that> element takes priority over other patterns at the same pattern level. I don't know if you're using AIML v1 or v2, but broadly speaking there are three levels of patterns [but see the note below]:
Most important level = patterns including underscore wildcards (_)
Middle level = atomic patterns without any wildcards
Lowest level = patterns including star wildcards (*)
Your unexpected responses occur because there is an ALICE category at a higher priority level. E.g. when the robot replies "step 3" and the human says "no", you want the <pattern>*</pattern><that>STEP 3</that> category to take effect. But if ALICE has a category at a higher level (e.g. <pattern>NO</pattern> or <pattern>STEP _</pattern>), that ALICE response will take effect over your level-3 <pattern>*</pattern><that>STEP 3</that> category. The quickest way of finding the ALICE category is just to ask "NO" and see what the bot replies. You could also search the ALICE files, but that would be very time-consuming.
[Note] In AIML v2 there are at least two extra levels: a level 0 above underscore wildcards, and a level 2.5 using pattern-side sets. However, the simpler levels of AIML v1 explain your anomalies.
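A sketch of one possible fix, borrowing the same trick the topic question further below ends up using: replace the * wildcard with _, so the <that> category sits at the highest priority level and outranks ALICE's atomic categories such as NO. For example, for the STEP 3 case:
<category><pattern>_</pattern><that>STEP 3</that>
<template>Step 4! and you typed '<star/>'</template>
</category>
The same change to the <that>STEP 2</that> catch-all should cover the "any is a name." response as well.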

Related

AWS Metrics Filter pattern Extraction

I have awsService.log logs being sent to CloudWatch and I want to create a metric filter to extract the error value.
Example:
06/13/2020 07:35:33 : 578 : 3 : error occurs
05/13/2020 07:35:33 : 3 : 3 : error occurs
The error value I would like to extract is : 3
I tried many regex expressions like * : * : 3 : but it does not work.
Any help would be appreciated.
Unfortunately no complex patterning (such as Regex) is currently supported with Metric Filters.
According to the documentation you have 3 choices:
Trying to match based on an exact string ([": 3 :"])
Using JSON metric filters (not possible for your example as it requires JSON)
Filtering by treating this as a space-separated event ([date, time, separator1, int1, separator2, int2=3, ...])
Regarding extracting the error value: Metric Filters provide a count for every time the event occurs; they don't capture values from the event itself.
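For illustration, here is how that space-separated pattern from the third option lines up against the first sample event (a sketch; the field names are just positional labels):
event:  06/13/2020 07:35:33 : 578 : 3 : error occurs
filter: [date, time, separator1, int1, separator2, int2=3, ...]
binding: date=06/13/2020, time=07:35:33, separator1=:, int1=578, separator2=:, int2=3, and ... absorbs the remaining ": error occurs" tokens
The int2=3 condition restricts the filter to events whose sixth whitespace-separated field equals 3, and the metric then counts how often such events occur.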

AIML - topic - unexpected answer does not match with STAR (*)

When using the AB.jar Google reference (alice) bot, with this simple short script:
<category><pattern>TOPIC 1</pattern>
<template>Topic 2 with current topic '<get name="topic"/>'.<think><set name="topic">topic2</set></think></template>
</category>
<topic name="TOPIC2">
<category><pattern>YES</pattern>
<template>Going to topic3-yes <think><set name="topic">topic3-yes</set></think></template>
</category></topic>
<topic name="TOPIC2">
<category><pattern>*</pattern>
<template>Going to topic3-rest on '<star/>' <think><set name="topic">topic3-rest</set></think></template>
</category></topic>
... answering anything other than 'yes' does not navigate to the topic3 '*' pattern. Why is that?
This is the conversation. I marked the unexpected answer with '// here'
Human : topic 1
Robot : Topic 2 with current topic 'unknown'.
Human : any
Robot : any is a name. // here -- expected to go to topic-3-rest
Putting this '_' pattern (instead of the '*' pattern) inside a topic answers the question.
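For reference, the catch-all category that works looks like this (a sketch of the fix: same template as above, only the wildcard changed):
<topic name="TOPIC2">
<category><pattern>_</pattern>
<template>Going to topic3-rest on '<star/>' <think><set name="topic">topic3-rest</set></think></template>
</category></topic>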
Thanks to Ubercoder:
The <that> element takes priority over other patterns at the same pattern level. I don't know if you're using AIML v1 or v2, but broadly speaking there are three levels of patterns:
Most important level = patterns including underscore wildcards (_)
Middle level = atomic patterns without any wildcards
Lowest level = patterns including star wildcards (*)

Webscraping (potentially) ill-formatted HTML in R with xpath or regex

I'm trying to extract the abstract from this link. However, I'm unable to extract only the content of the abstract. Here's what I accomplished so far:
url <- "http://www.scielo.br/scielo.php?script=sci_abstract&pid=S1981-38212013000100001&lng=en&nrm=iso&tlng=en"
textList <- readLines(url)
text <- textList[grep("Abstract[^\\:]", textList)] # get the correct element
text1 <- gsub("\\b(.*?)\\bISSN", "" , text)
Up to this point I got almost what I want, but then I couldn't get rid of the rest of the string that isn't of interest to me.
I even tried another approach, with xpath, but unsuccessfully. I tried something like the code below, but to no effect whatsoever.
library(XML)
arg.xpath <- "//p/@xmlns"
doc <- htmlParse(url)  # parse the URL
linksAux <- xpathSApply(doc, arg.xpath)
free(doc)
How can I accomplish what I want, either with regex or xpath, or maybe both?
P.S.: my general aim is webscraping of several similar pages like the one I provided. I can already extract the link; I only need to get the abstract now.
I would strongly recommend the XML approach because regular expressions with HTML can be quite a headache. I think your xpath expression was just a bit off. Try
doc <- htmlParse(url)
xpathSApply(doc, "//p[@xmlns]", xmlValue)
This returns (clipped for length)
[1] "HOLLANDA, Cristina Buarque de. Human rights ..."
[2] "This article is dedicated to recounting the main ..."
[3] "Keywords\n\t\t:\n\t\tHuman rights; transitional ..."
[4] ""
Someone better could give you a better answer, but this kinda works:
reg <- regexpr("<p xmlns=\"\">(.*?)</p>", text1)
begin <- reg[[1]] + 12
end <- attr(reg, which = "match.length") + begin - 17
substr(text1, begin, end)
Here is another approach, which is clunky as written, but offers the technique of keeping the right parts after splitting at tag tokens:
text2 <- sapply(strsplit(x = text1, ">"), "[", 3)
text2
[1] "This article is dedicated to recounting the main initiative of Nelson Mandela's government to manage the social resentment inherited from the segregationist regime. I conducted interviews with South African intellectuals committed to the theme of transitional justice and with key personalities who played a critical role in this process. The Truth and Reconciliation Commission is presented as the primary institutional mechanism envisioned for the delicate exercise of redefining social relations inherited from the apartheid regime in South Africa. Its founders declared grandiose political intentions to the detriment of localized more palpable objectives. Thus, there was a marked disparity between the ambitious mandate and the political discourse about the commission, and its actual achievements.</p"
text3 <- sapply(strsplit(text2, "<"), "[", 1)

Counting unique login using Map Reduce

Let's say I have a very big log file with this kind of format (based on where a user logged in):
UserId1 , New York
UserId1 , New Jersey
UserId2 , Oklahoma
UserId3 , Washington DC
....
userId999999999, London
Note that UserId1 logged in from New York first, then flew to New Jersey and logged in again from there.
If I need to get the number of unique user logins (meaning two logins with the same user ID count as one), how should I map and reduce it?
My initial plan is that I want to map it first to this kind of format :
UserId1, 1
UserId1, 1
UserId2, 1
UserId3, 1
And then reduce it to
UserId1, 2
UserId2, 1
UserId3, 1
But wouldn't this still produce a large output (especially if the common behaviour of a user is to log in only once or twice a day)? Or is there a better way to implement this?
Do map-reduce.
For example, suppose you have 10,000 lines of data but can only process 1,000 lines at a time.
Then process the data 1,000 lines at a time, 10 times.
If the combined output of those 10 runs is still more than 1,000 lines:
do the above step again.
else:
use a set directly.
I recommend making use of a custom key in the map phase. You can refer to the tutorial here for writing and using custom keys. The custom key should have two parts: 1) userid, 2) placeid. So essentially, in the mapper phase you are doing this:
emit(<userid, place>, 1)
In the reduce phase, you just have to access the key and emit the two parts of the key separately.
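A minimal sketch of that idea against Hadoop's Java MapReduce API (assuming you are on Hadoop; for brevity it joins the two key parts with a tab in a plain Text key instead of writing a full custom WritableComparable as the tutorial does):
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class UniqueLogins {

    // Mapper: emit (<userid, place>, 1); here the composite key is "userid\tplace".
    public static class LoginMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text compositeKey = new Text();

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            // A line looks like: "UserId1 , New York"
            String[] parts = line.toString().split(",", 2);
            if (parts.length == 2) {
                compositeKey.set(parts[0].trim() + "\t" + parts[1].trim());
                context.write(compositeKey, ONE);
            }
        }
    }

    // Reducer: each distinct (userid, place) pair arrives exactly once as a key,
    // so splitting the key and emitting its two parts deduplicates repeated logins.
    public static class LoginReducer extends Reducer<Text, IntWritable, Text, Text> {
        @Override
        protected void reduce(Text compositeKey, Iterable<IntWritable> counts, Context context)
                throws IOException, InterruptedException {
            String[] parts = compositeKey.toString().split("\t", 2);
            context.write(new Text(parts[0]), new Text(parts[1]));
        }
    }
}
A second, much smaller pass (or any distinct count over the first column of the reducer output) can then give the number of unique users.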

How to do ANDing of conditions in a regular expression?

I want to match and modify part of a string if the following conditions are true:
I want to capture information regarding a project, like project duration, client, technologies used, etc.
So, I want to select a string starting with the word "project"; the string may also start with other words like "details of project" or "project details" or "project #1".
The regex should first look at the word "project", and it should select the string only when a few or all of the following words are found after the word "project":
1) client
2) duration
3) environment
4) technologies
5) role
I want to select a string if it matches at least 2 of the above words. Words can appear in any order and if the string contains ANY two or three of these words, then the string should get selected.
I have sample text given below.
Details of Projects :
Project #1: CVC – Customer Value Creation (Sep 2007 – till now) Time
Warner Cable is the world's leading
media and entertainment company, Time
Warner Cable (TWC) makes coaxial
quiver.
Client : Time Warner Cable,US. ETL
Tool : Informatica 7.1.4
Database : Oracle 9i.
Role : ETL Developer/Team Lead.
O/S : UNIX.
Responsibilities: Created Test Plan and Test Case Book. Peer reviewed team members Mappings. Documented Mappings. Leading the Development Team. Sending Reports to onsite. Bug fixing for Defects, Data and Performance related.
Details of Project #2: MYER – Sales
Analysis system (Nov 2005 – till now)
Coles Myer is one of Australia's largest retailers with more than 2,000 stores throughout Australia,
Client : Coles Myer
Retail, Australia. ETL Tool :
Informatica 7.1.3 Database : Oracle
8i. Role : ETL Developer. O/S :
UNIX. Responsibilities: Extraction,
Transformation and Loading of the data
using Informatica. Understanding the
entire source system.
Created and Run Sessions and
Workflows. Created Sort files using
Syncsort Application.
Does anyone know how to achieve this using regular expressions?
Any clues or regular expressions are welcome!
Many thanks!
(client|duration|environment|technologies|role).+(?!\1)(client|duration|environment|technologies|role)
I would break it down into a few simpler regexes to get these results. The first would select only the chunk of text between projects: (?=Project #).*(?<=Project #)
With the match that this produces, I would run a separate regex to ask if it contains any of those words: client | duration | environment | technologies | role
If this comes back with 2 or more distinct matches, you know to select the original string!
Edit:
string originalText;
MatchCollection projectDescriptions = Regex.Matches(originalText, "(?=Project #).(?:(?!Project #).)*", RegexOptions.IgnoreCase | RegexOptions.Singleline);
foreach (Match projectDescription in projectDescriptions)
{
    // Needs "using System.Linq;" for Cast/Select/Distinct.
    MatchCollection keyWordMatches = Regex.Matches(projectDescription.Value, "client|duration|environment|technologies|role", RegexOptions.IgnoreCase);
    if (keyWordMatches.Cast<Match>().Select(m => m.Value.ToLower()).Distinct().Count() >= 2)
    {
        // At this point, do whatever you need to with the original projectDescription match;
        // the Match object will give you the index etc. of the match inside the original string.
    }
}
Maybe you need to break that requirement into two steps: first, extract the key/value pairs from your string, then apply your filter.
string input = @"Project #...";
Regex projects = new Regex(@"(?<key>\S+).:.(?<value>.*?\.)");
foreach (Match project in projects.Matches(input))
{
    Console.WriteLine("{0} : {1}",
        project.Groups["key"].Value,
        project.Groups["value"].Value);
}
Try
^(details of )?project.*?((client|duration|environment|technologies|role).*?){2}.*$
One note: This will also match if only one of the terms appears twice.
In C#:
foundMatch = Regex.IsMatch(subjectString, @"\A(?:(details of )?project.*?((client|duration|environment|technologies|role).*?){2}.*)\Z", RegexOptions.Singleline | RegexOptions.IgnoreCase);