Pandorabots - Detect Phone, Email, etc - aiml

Good Day.
Does Pandorabots AIML support complex regular expressions? For example, a visitor types "+1 (555) 123.4567" (on its own or inside some text) and the chatbot must understand that it's a phone number.
Is it possible to use something similar to:
.* (+?\d[.-\s]?\(?\d{3}\)?[.-\s]?\d{3}[.-\s]?\d{4}) .*
GET ONLY -> 1$
If not, how can I correctly detect a phone number, email address or user name in the user's response?

You can't use regular expressions in Pandorabots, so you will have to write categories yourself to handle this. Here is a basic one that uses the built-in set called "number" to recognise phone numbers in the format +1 nnn nnn nnn:
<category>
<pattern>1 <set>number</set> <set>number</set> <set>number</set></pattern>
<template>Is that a phone number?</template>
</category>
You can create new sets yourself to validate input, so an improvement on my basic category would be to have a set of 3 digit numbers to validate against, rather than any numbers.
Similarly, you can check for emails by seeing if there are @ and . characters in the input. Assuming you are using the standard substitutions, you can create a category like this:
<category>
<pattern>* AT * DOT *</pattern>
<template>Thanks for your email.</template>
</category>
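
The email category above relies on the input being normalized before pattern matching. As a rough illustration only, the sketch below simulates that idea in JavaScript. It assumes the substitutions turn "@" into " at " and "." into " dot ", as the answer implies; the function names are hypothetical and this is not Pandorabots code.

```javascript
// Rough simulation of the normalization step (an assumption, not Pandorabots code):
// "@" becomes " at ", "." becomes " dot ", then whitespace is collapsed and uppercased.
function normalize(input) {
  return input
    .replace(/@/g, " at ")
    .replace(/\./g, " dot ")
    .replace(/\s+/g, " ")
    .trim()
    .toUpperCase();
}

// Mimics the AIML pattern "* AT * DOT *": each * must cover at least one word.
function looksLikeEmail(input) {
  return /^\S.* AT \S.* DOT \S.*$/.test(normalize(input));
}

console.log(normalize("bob@example.com"));
console.log(looksLikeEmail("bob@example.com"));
console.log(looksLikeEmail("hello there"));
```

Like the AIML category, this is a loose check: a sentence such as "look at this. ok" would also match, so treat a hit as a hint rather than as validation.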

Related

Strip dashes from a string?

For web scraping, I need to match the last part of a URL and replace "-" dashes with " " spaces.
Code looks like this...
<div class="tags">
<span class="tag" style="background-color: #5A214A;">
SA
</span>
</div>
I want to be left with "Service Assurance" (this part may contain multiple "-" dashes and require multiple replacements).
Currently being used:
Xpath:
//span[@class="tag"]/a/@href
Regex:
/.*/(.*)/
This produces "Service-Assurance", but does not strip out the "-".
I am told elsewhere that this replacement is not possible since I am already using Regex to find the string between the final "/" slashes.
Can I do both? Can I replace the "-" dashes at the end, too?
Regex is plain, inside an app called import.io, no particular language flavour.
Thank-you very much.
Try this XPath without the regex:
//*[@class='tag-wrapper']/input[1]/@value
Alternatively, you can also try these methods:
I scrape URLs in Google Sheets all the time with XPaths and regexes, so if you want to try:
=importXML("url goes here","//span[@class='tag']/a/@href")
If you at least get the URL string back, then you know it's working, and we can modify it to this to get what you want:
=SUBSTITUTE(REGEXEXTRACT(importXML("url goes here","//span[@class='tag']/a/@href"),".*\/(.*)\/$"),"-"," ")
Let me know if you have issues. There are a couple of weird quirks with Google, but if you share the URL you're pulling that XPath from, I can at least test it myself. I use this method now more than any other; I used to use import.io, OutWit Hub, etc. a ton.
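
For anyone doing the same two steps outside a spreadsheet, here is a minimal JavaScript sketch of the same idea: extract the last path segment with the regex, then replace the dashes in a second step. The function name and the sample URLs are made up for illustration.

```javascript
// Extract the segment between the final two slashes, then replace every dash with a space.
// Returns null when the href does not end with "/segment/".
function tagNameFromHref(href) {
  const match = href.match(/.*\/(.*)\/$/);
  if (!match) return null;
  return match[1].replace(/-/g, " ");
}

console.log(tagNameFromHref("/tags/Service-Assurance/"));
console.log(tagNameFromHref("https://example.com/tags/a-b-c/"));
```

The extraction and the replacement are separate operations, so using a regex for the first does not prevent doing the second, provided the tool allows a second step.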

CFINPUT - How to exclude O, o, I, i from input value

How can I force a format in ColdFusion for alphanumeric characters that excludes O, o, I, i? Mask apparently doesn't work.
Update from comments:
The following example is how to force a format with characters and numbers, but it allows O & I. I would like to exclude these two.
<cfinput type="text" name="newPart" mask="EB-9999-XX-999999" />
Leigh and Dan are both correct. More info is needed, and you'll ultimately likely end up with some RegEx.
What is the end goal? Do you want to keep users from entering any character that looks like a zero or a one (my assumption)? That should probably also include L. What about 0 and 1 themselves? Do you want to change these characters as they are entered into a form field or after they are submitted by the form? If you mask them before submission, how will you let the user know they're submitting something different than what they're entering?
Based on my assumptions above, you could start with replacing them in the form field itself with Javascript.
<input name="formStr" type='text'
onKeyUp="this.value = this.value.replace(/o/ig,'0')
.replace(/l|i/ig,'1').replace(/[^0-9a-z]/ig,'')
">
Then also replace the entry before you do anything with it. I've always been a fan of using the Java String functions for string manipulation. It's very fast.
inStr = FORM.formStr.replaceAll("(?i)[o]","0")
.replaceAll("(?i)[il]","1")
.replaceAll("(?i)[^0-9a-z]","")
;
You don't want to just do the masking in the form field. Someone can still Right-Click >> Paste into that field and submit the form with invalid characters. Ctrl-V pasting does involve an onKeyUp event, so the text will get masked that way, but the Java replaceAll() takes care of it on the submission end.
Or if replacing all of the above characters with an empty string:
onKeyUp="this.value=this.value.replace(/[^0-9a-hjkmnp-z]/ig,'')"
and
inStr2 = FORM.formStr.replaceAll("(?i)[^0-9a-hjkmnp-z]","");
But again, without knowing more about how this input will be used, it's hard to give accurate help.
NOTE: The onKeyUp worked in Chrome and IE, but not in Firefox. Odd. It's been a while since I've written JS and I haven't dug too deeply into this one, so I don't know what I missed to make it fail. Perhaps someone else can shed light.
And an added FYI: Ben Nadel has a bunch of excellent information about using RegEx. And plenty of other sources. It's worth checking out.
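
To make the two approaches above easier to compare, here is a small standalone JavaScript sketch of the same replacements as plain functions. The function names are hypothetical; the regexes are the ones from the answer.

```javascript
// Variant 1: map look-alike letters to digits, then drop anything non-alphanumeric.
function replaceLookalikes(value) {
  return value
    .replace(/o/gi, "0")
    .replace(/[il]/gi, "1")
    .replace(/[^0-9a-z]/gi, "");
}

// Variant 2: simply remove O, I and L (any case) along with non-alphanumerics.
function stripLookalikes(value) {
  return value.replace(/[^0-9a-hjkmnp-z]/gi, "");
}

console.log(replaceLookalikes("Oil-42"));
console.log(stripLookalikes("Oil-42 xyz"));
```

Either function could be called from a form event handler, but the same cleanup still needs to run on the server, as noted above.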

Regular expression for url rewriting to exclude strings beginning with a year

I have a news page that detects tags based on the query string. So for instance, to filter out all news articles with a tag of 'Popular' I'd have:
<mydomain>/news/?tag=popular
I've set up a url rewrite in my config with the following:
<add name="newsrewrite"
virtualUrl="^~/news/(.*)"
rewriteUrlParameter="ExcludeFromClientQueryString"
destinationUrl="~/news?tag=$1"
ignoreCase="true" />
This works fine. However I've noticed that I now can't access specific news article urls because it treats anything after /news/ as a querystring parameter.
ie. if I try to access /news/2015/news-article-1 then it won't work because the rewrite rule is essentially treating 2015/news-article-1 as the parameter.
Since I've structured my news articles under year folders, all news articles will always be accessed via /news/YYYY/article-title where YYYY is a 4-digit year.
Is there a regular expression I can use here that'll take anything after /news/ and use that as the querystring param EXCEPT those that begin with a 4-digit integer?
Thanks!
If you are looking for a regexp that will work like yours with the exception that it won't match /news/YYYY/.. have a look at this:
^\/news\/(?!\d{4})(.*)$
Note: it makes use of a negative lookahead (check if they are supported in your specific case). Also notice escape characters \.
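
A quick way to check what the negative lookahead does is to try the expression against a few paths. The sketch below uses JavaScript only for illustration (the rewriter's own regex flavour may differ), and the sample paths are invented.

```javascript
// Same expression as above: match /news/<anything> unless <anything> starts with four digits.
const tagRoute = /^\/news\/(?!\d{4})(.*)$/;

function extractTag(path) {
  const match = tagRoute.exec(path);
  return match ? match[1] : null; // null means "leave this URL alone"
}

console.log(extractTag("/news/popular"));
console.log(extractTag("/news/2015/news-article-1"));
```

Note that the lookahead rejects anything that begins with four digits, so a tag such as "2015review" would also be skipped.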
Reading your problem, I also thought about a different approach: what about having the rewrite rule map only the pages that match the actual tag structure? Something like this:
<add name="newsrewrite"
virtualUrl="^~/news/\?tag=(.*)"
rewriteUrlParameter="ExcludeFromClientQueryString"
destinationUrl="~/news?tag=$1"
ignoreCase="true" />
Note that $1 will contain only the tag (not ?tag=Popular), as in your code. This rule should match only URLs of the form /news/?tag=SOMETHING, so it does not match your article pages.

pattern property of <input> and regex to match against doesn't work

I have an input element of the type text that I wish to allow only input that has a . between every other character.
I'm a bit obsessive about data integrity and wish to build in as many failsafes as I can, starting with the users.
The form is set up to receive input like this:
Firstname: |John____________________|
First Letters: |J.W.H.__________________|
Lastname: |Baker___________________|
I have the following regexp that works beautifully normally, but won't work in the pattern field of the form.
/.{1}\./g
This is the form that I am using to match against, but it won't let me pass if I fill in the correct pattern.
<input type="text" name="firstletters" pattern="/.{1}\./g" title="First name letters with a . in between each letter.">
Anyone have a solution?
The length of the string can vary depending on how many first names the person being added to the database has.
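The pattern attribute takes a bare expression, so drop the / delimiters and the g flag. You should also use a group with ^ (start of string) and $ (end of string):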
^(.\.)+$
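
To see why the original attribute value fails, it can help to mimic how a browser applies the pattern attribute: the value is treated as the whole expression and must match the entire input. The helper below is a hypothetical approximation of that behaviour, not the browser's actual implementation.

```javascript
// Approximates pattern-attribute matching: the pattern must match the entire value.
function matchesPattern(pattern, value) {
  return new RegExp("^(?:" + pattern + ")$").test(value);
}

console.log(matchesPattern("(.\\.)+", "J.W.H."));    // bare expression
console.log(matchesPattern("/.{1}\\./g", "J.W.H.")); // slashes and flag are taken literally
```

With the delimiters left in, the slashes and the trailing g become literal characters that the input would have to contain, which is why a correctly formatted value is rejected.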

regex for all characters on yahoo pipes

I have an apparently simple regex query for Pipes: I need to truncate each item from its (<img>) tag onwards. I thought a loop with a string regex of <img[.]* replaced by a blank field would have taken care of it, but to no avail.
Obviously I'm missing something basic here - can someone point it out?
The item as it stands goes along something like this:
sample text title
<a rel="nofollow" target="_blank" href="http://example.com"><img border="0" src="http://example.com/image.png" alt="Yes" width="20" height="23"/></a>
<a.... (a bunch of irrelevant hyperlinks I don't need)...
Essentially I only want the title text and hyperlink that's why I'm chopping the rest off
Going one better because all I'm really doing here is making the item string more manageable by cutting it down before further manipulation - anyone know if it's possible to extract a href from a certain link in the page (in this case the 1st one) using Regex in Yahoo Pipes? I've seen the regex answer to this SO q but I'm not sure how to use it to map a url to an item attribute in a Pipes module?
You need to remove the line returns with a RegEx Pipe and replace the pattern [\r\n] with null text on the content or description field to make it a single line of text, then you can use the .* wildcard which will run to the end of the line.
http://www.yemkay.com/2008/06/30/common-problems-faced-in-yahoo-pipes/
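
The two-step fix described above can be sketched in JavaScript to show the effect on a sample item. This only illustrates the regex logic, not the Pipes modules themselves; the function name and the sample string are made up. Note also that [.] in the question's pattern matches only a literal dot, whereas an unbracketed . matches any character except a line break.

```javascript
// Step 1: remove line breaks so the item is a single line.
// Step 2: cut everything from the first <img onwards.
function truncateAtImg(html) {
  return html
    .replace(/[\r\n]+/g, "")
    .replace(/<img.*/, "");
}

const item = 'sample text title\n<a href="http://example.com"><img src="x.png"/></a>\n<a href="#">more</a>';
console.log(truncateAtImg(item));
```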