Dialogflow: Regexp entity not matched - regex

I am going crazy with this problem, I am sure I am missing something...
I would like to match words that start with 2 characters or digits, followed by 1 or more character/digit/slash.
Some examples:
AM9
B9C
AS/1
etc...
So I have created an entity, let's say EntityOne, as follows, based on some RegExp tests (I also tested the same regexp surrounded by "()"; everything was tested on https://regex-golang.appspot.com/assets/html/index.html, which seems to use RE2):
and a test Intent with params defined as follows:
REQUIRED | PARAM NAME | ENTITY | VALUE | IS LIST | PROMPTS
yes | name | #EntityOne | $value | no | test:
And inside this intent I try with words similar to the examples above that should be matched.
But I see the prompt "test:" over and over, the entity is never matched.
Any hints please? Tell me if you want me to share additional info, but I think that there is nothing much to share. Thanks in advance

Related

How to replace a period within string and not in Numeric using singlestore (MemSQL) DB REGEXP_REPLACE function

I have a scenario in which I want to replace a period when it is surrounded by letters, but not when it is surrounded by digits. I worked out a regular expression that matches only the periods in key names, but the pattern does not work in SQL:
SELECT REGEXP_REPLACE("Amount.fee:0.75,Amount.tot:645.55","(?<!\d)(\.)(?!\d)","_","ig");
Expected output: Amount_fee:0.75,Amount_tot:645.55
Note: I am trying this because in MemSQL I couldn't access a JSON key that has a period in it.
I also verified the pattern "(?<!\d)(\.)(?!\d)" using https://coding.tools/regex-replace and it works fine there, but it does not work in SQL. I am using MemSQL 7.1.9, and POSIX enhanced regular expressions are supposed to work. Any help is much appreciated.
Since it looks like you are trying to work around accessing a JSON key with a period in it, I will show you how to do that directly.
This can be done either by surrounding the JSON key name with backticks while using the shorthand JSON extract syntax:
select col::%`Amount.fee` from (select '{"Amount.fee":0.75,"Amount.tot":645.55}' col);
+--------------------+
| col::%`Amount.fee` |
+--------------------+
|               0.75 |
+--------------------+
or by using the json_extract_ builtins directly:
select json_extract_double('{"Amount.fee":0.75,"Amount.tot":645.55}', 'Amount.fee');
+------------------------------------------------------------------------------+
| json_extract_double('{"Amount.fee":0.75,"Amount.tot":645.55}', 'Amount.fee') |
+------------------------------------------------------------------------------+
|                                                                         0.75 |
+------------------------------------------------------------------------------+
Assuming you only want to target dots that are in between two non digit characters, where the dot is not the first or last character in the string, you may match on ([^\d])\.([^\d]) and replace with \1_\2:
SELECT REGEXP_REPLACE("Amount.fee:0.75,Amount.tot:645.55", "([^\d])\.([^\d])", "\1_\2", "ig");
Here is a regex demo showing that the replacement is working. Note that you might have to use $1_$2 instead of \1_\2 as the replacement, depending on the regex flavor of your SQL tool.
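As a quick cross-check outside the database, here is a minimal Python sketch of the two patterns discussed above. Python's re module is only a stand-in (it is not the engine MemSQL uses), and the function names are made up for illustration. It also shows one difference worth knowing: the capture-group version consumes the neighbouring characters, so it can miss the second dot in a chain such as a.b.c.

```python
import re

SAMPLE = "Amount.fee:0.75,Amount.tot:645.55"

def replace_with_lookarounds(text: str) -> str:
    # Lookbehind/lookahead: the dot must not touch a digit on either side.
    return re.sub(r"(?<!\d)\.(?!\d)", "_", text)

def replace_with_groups(text: str) -> str:
    # Capture-group variant for engines without lookarounds:
    # match non-digit, dot, non-digit and put the two neighbours back.
    return re.sub(r"(\D)\.(\D)", r"\1_\2", text)

print(replace_with_lookarounds(SAMPLE))  # Amount_fee:0.75,Amount_tot:645.55
print(replace_with_groups(SAMPLE))       # Amount_fee:0.75,Amount_tot:645.55
print(replace_with_groups("a.b.c"))      # a_b.c (the "b" was already consumed)
```

If chained dots can occur in your keys, running the capture-group replacement twice is one way around that limitation.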

How to remove everything but certain words in string variable (Stata)?

I have a string variable response, which contains text as well as categories that have already been coded (categories like "CatPlease", "CatThanks", "ExcuseMe", "Apology", "Mit", etc.).
I would like to erase everything in response except for these previously coded categories.
For example, I would like response to change from:
"I Mit understand CatPlease read it again CatThanks"
to:
"Mit CatPlease CatThanks"
This seems like a simple problem, but I can't get my regex code to work perfectly.
The code below attempts to store the categories in a variable cat_only. It only works if the category appears at the beginning of response. The local macro, cats, contains all of the words I would like to preserve in response:
local cats = "(CatPlease|CatThanks|ExcuseMe|Apology|Mit|IThink|DK|Confused|Offers|CatYG)?"
gen cat_only = strltrim(strtrim(ustrregexs(1)+" "+ustrregexs(2)+" "+ustrregexs(3))) if ustrregexm(response, "`cats'.+?`cats'.+?`cats'")
If I add characters to the beginning of the search pattern in ustrregexm, however, nothing will be stored in cat_only:
gen cat_only = strltrim(strtrim(ustrregexs(1)+" "+ustrregexs(2)+" "+ustrregexs(3))) if ustrregexm(response, ".+?`cats'.+?`cats'.+?`cats'")
Is there a way to fix my code to make it work, or should I approach the problem differently?
* Example generated by -dataex-. To install: ssc install dataex
clear
input str50 response
"I Mit understand CatPlease read it again CatThanks"
end
local regex "(?!CatPlease|CatThanks|ExcuseMe|Apology|Mit|IThink|DK|Confused|Offers|CatYG)\b[^\s]+\b"
gen wanted = strtrim(stritrim(ustrregexra(response, "`regex'", "")))
list
. list
     +------------------------------------------------------------------------------+
     |                                           response                    wanted |
     |------------------------------------------------------------------------------|
  1. | I Mit understand CatPlease read it again CatThanks   Mit CatPlease CatThanks |
     +------------------------------------------------------------------------------+
I don't regard myself as fluent with Stata's regex functions, but this may be helpful:
. clear
. set obs 1
number of observations (_N) was 0, now 1
. gen test = "I Mit understand CatPlease read it again CatThanks"
. local OK "(CatPlease|CatThanks|ExcuseMe|Apology|Mit|IThink|DK|Confused|Offers|CatYG)"
. ssc install moss
. moss test, match("`OK'") regex
. egen wanted = concat(_match*), p(" ")
. l wanted
     +-------------------------+
     |                  wanted |
     |-------------------------|
  1. | Mit CatPlease CatThanks |
     +-------------------------+
Spaces can be handled using regex:
local words = "(?!CatPlease|CatThanks|ExcuseMe|Apology|Mit|IThink|DK|Confused|Offers|CatYG)\b\S+\b"
gen wanted = ustrregexra(response, "`words' | ?`words'", "")
This uses an alternation (a regex OR which is coded |) to match trailing/leading spaces, with the leading space being optional to handle when the entire input is one of the target words.
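For comparison, the same result can be reached by collecting the wanted words instead of deleting the unwanted ones (the idea behind the moss answer). Below is a minimal Python sketch of that approach; it is not Stata code, and the names CATEGORIES and keep_categories are hypothetical.

```python
import re

# The category words from the question.
CATEGORIES = ["CatPlease", "CatThanks", "ExcuseMe", "Apology", "Mit",
              "IThink", "DK", "Confused", "Offers", "CatYG"]

# \b on both sides so that only whole words are matched.
CATEGORY_RE = re.compile(r"\b(?:" + "|".join(map(re.escape, CATEGORIES)) + r")\b")

def keep_categories(response: str) -> str:
    # Find every category word in order and join them with single spaces,
    # so no separate whitespace clean-up is needed.
    return " ".join(CATEGORY_RE.findall(response))

print(keep_categories("I Mit understand CatPlease read it again CatThanks"))
# Mit CatPlease CatThanks
```

Because the words are re-joined explicitly, leading, trailing and doubled spaces never appear in the result.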

Vim - regex for changing bool variable checking

I am working on a C project and I want to change all bool-variable checking from
if(!a)
to
if(a == false)
in order to make the code easier to read (I want to do the same with while statements).
Anyway I'm using the following regex, which searches for an exclamation mark followed by a lowercase character and for the last closing parenthesis on the line.
%s/\(.*\)!\([a-z]\)\(.*\))\([^)]+\)/\1\2\3 == false)\4/g
I'm sorry for asking you to look over it, but I can't understand why it fails.
Also, is there an easier way of solving this problem, and of using Vim regexes in general?
One solution should be this one:
%s/\(.*\)(\(\s*\)!\(\w\+\))/\1(\3 == false)/gc
Here, we do the following:
%s/\(.*\)(\(\s*\)!\(\w\+\))/\1(\3 == false)/gc
   \--+-/|\--+--/|\---+--/|
      |  |   |   |    |   finally test for a single `)`.
      |  |   |   |    (\3): then for one or more word characters (the var name).
      |  |   |   the single `!`
      |  |   (\2): then for any amount of white space before the `!`
      |  the single `(`
      (\1): test for any characters before a single `(`
The match is then replaced by the first group, an opening parenthesis, the third group, the text == false, and a closing parenthesis.
To do this in vim, you could use the following:
%s/\(if(\)!\([^)]\+\)/\1\2==false/c
The \(if(\) group makes sure that only if(!var) constructs are matched; you could change it to while for the next task.
The c flag asks for confirmation at every occurrence.
As @Kent said, this is not a small undertaking. However, for the simple case of just if(!a) it can be done.
:%s/\<if(\zs!\(\k\+\)\ze)/\1 == false/c
Explanation:
Start by making sure if is at a word bound by \<. This ensures it isn't part of some function name.
\zs and \ze set the start and end of the match respectively.
Capture the variable via the keyword class \k (\w works too) ending up with \(\k\+\)
For extra safety use the c flag to confirm each substitution.
Thoughts:
This will need to be updated for other constructs, e.g. while
May need to make alterations for extra white-space, e.g. \<if\s*(\s*\zs!\(\k\+\)\ze\s*)
May want to use [a-z0-9_] instead of \k or \w to avoid capturing macros
There are instances where you may not have a construct: foo = !a && b;
This only handles the false cases. Doing a == true may be far trickier
Depending on your case it might be safest to just do the following:
:%s/!\([a-z0-9]\+\)/\1 == false/gc
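If you would rather test the idea outside the editor first, here is a minimal Python sketch of the same substitution. It is an analogue of the Vim commands above, not a replacement for them: the names NEGATED_CHECK and expand_negations are made up, and it only handles the simple case where the whole condition is a single negated identifier.

```python
import re

# Group 1: "if(" or "while(" with optional white space, anchored at a word boundary.
# Group 2: the negated identifier.
# Group 3: optional white space and the closing parenthesis.
NEGATED_CHECK = re.compile(r"\b((?:if|while)\s*\(\s*)!\s*([A-Za-z_]\w*)(\s*\))")

def expand_negations(source: str) -> str:
    # Drop the "!" and append " == false", keeping the original spacing.
    return NEGATED_CHECK.sub(r"\1\2 == false\3", source)

print(expand_negations("if(!a)"))             # if(a == false)
print(expand_negations("while ( !done ) {"))  # while ( done == false ) {
print(expand_negations("foo = !a && b;"))     # foo = !a && b;  (left untouched)
```

Compound conditions such as if(!a && b) are deliberately left alone, which matches the caution in the list above.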
On top of the answers already presented, I would say that the code does not smell like it needs refactoring. For a global regex replacement, the primary problem is to find all bool variables and distinguish them from pointers, etc.

Regex for finding all namespaces in data

I need a regular expression (dubbed SOME_EXPRESSION below) that allows finding all namespaces for resources used as subject in a SPARQL 1.1 endpoint. The query should look like the following. How can I do this?
SELECT DISTINCT ?ns
WHERE
{
?s ?p ?o.
BIND(REPLACE(str(?s), SOME_EXPRESSION, "") AS ?ns)
Filter(isURI(?s))
}
Since the harder part of this is processing the IRI strings, I'll show how you can do this for properties (which must be IRIs, so we don't need an isIRI check). Adapting this to work with the IRIs of subjects won't be hard. However, there is one thing that needs some consideration: URIs for linked data typically (there's no hard requirement, but conventions do emerge) use prefixes that end in / or in #. Whether one is better than the other is the subject of plenty of debate and discussion (e.g., see section 4 of Cool URIs, or HashVsSlash). In general, you're going to want to replace everything after the final slash or hash with the final slash or hash. Since you can use groups in SPARQL's regex and replace, you can handle both cases with one replace:
select distinct ?ns where {
[] ?p [] .
bind( replace( str(?p), "(#|/)[^#/]*$", "$1" ) as ?ns )
}
This matches the regular expression (#|/)[^#/]*$ against the string form of the IRI, remembering # or / in the variable $1, and then grabs the rest of the characters (which must not contain # or /) up until the end of the string, and replaces the whole thing with $1, which is either # or /. For some data that I pulled from Linked Open British National Bibliography data, I get results like these:
$ sparql --query query.rq --data sample.nt
-----------------------------------------------------
| ns |
=====================================================
| "http://www.w3.org/2000/01/rdf-schema#" |
| "http://www.w3.org/1999/02/22-rdf-syntax-ns#" |
| "http://www.w3.org/2004/02/skos/core#" |
| "http://purl.org/ontology/bibo/" |
| "http://purl.org/dc/terms/" |
| "http://iflastandards.info/ns/isbd/elements/" |
| "http://www.bl.uk/schemas/bibliographic/blterms#" |
| "http://www.w3.org/2002/07/owl#" |
| "http://purl.org/NET/c4dm/event.owl#" |
-----------------------------------------------------
This seems like a reasonable set of namespace prefixes. In fact, when I look at the header of the RDF document, original namespaces included:
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:skos="http://www.w3.org/2004/02/skos/core#"
xmlns:bibo="http://purl.org/ontology/bibo/"
xmlns:dct="http://purl.org/dc/terms/"
xmlns:isbd="http://iflastandards.info/ns/isbd/elements/"
xmlns:blt="http://www.bl.uk/schemas/bibliographic/blterms#"
xmlns:owl="http://www.w3.org/2002/07/owl#"
xmlns:event="http://purl.org/NET/c4dm/event.owl#"
As applied to your code, we end up with the following query. It's almost exactly what you wanted, since there's just one regular expression that handles both cases (so just one thing to fill in for SOME_EXPRESSION). However, instead of replacing with "", you do have to replace with "$1". I hope that's not a terrible inconvenience, though.
SELECT DISTINCT ?ns
WHERE
{
?s ?p ?o.
BIND(REPLACE(str(?s), "(#|/)[^#/]*$", "$1") AS ?ns)
Filter(isURI(?s))
}
It's important to note, of course, that this is only a heuristic. A given IRI can be abbreviated using lots of different prefixes. This technique should give some relatively good results, though, because there are conventions that people tend to follow pretty well.
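To try the heuristic on a few IRIs without an endpoint, here is a minimal Python sketch using the same regular expression. The function name namespace_of is hypothetical, the property names in the sample IRIs are only illustrative, and Python writes the backreference as \1 where SPARQL uses $1.

```python
import re

def namespace_of(iri: str) -> str:
    # Keep everything up to and including the last "#" or "/";
    # IRIs that contain neither are returned unchanged.
    return re.sub(r"(#|/)[^#/]*$", r"\1", iri)

print(namespace_of("http://www.w3.org/2000/01/rdf-schema#label"))
# http://www.w3.org/2000/01/rdf-schema#
print(namespace_of("http://purl.org/dc/terms/title"))
# http://purl.org/dc/terms/
```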

How can I use a regular expression to match something in the form 'stuff=foo' 'stuff' = 'stuff' 'more stuff'

I need a regexp to match something like this,
'text' | 'text' | ... | 'text'(~text) = 'text' | 'text' | ... | 'text'
I just want to divide it up into two sections: the part to the left of the equals sign and the part to the right. Any of the 'text' entries can have "=" between the ' characters, though. I was thinking of trying to match an even number of 's followed by a =, but I'm not sure how to match an even number of something. Also note that I don't know how many entries there could be on either side. A couple of examples:
'51NL9637X33' | 'ISL6262ACRZ-T' | 'QFN'(~51NL9637X33) = '51NL9637X33' | 'ISL6262ACRZ-T' | 'INTERSIL' | 'QFN7SQ-HT1_P49' | '()'
Should extract,
'51NL9637X33' | 'ISL6262ACRZ-T' | 'QFN'(~51NL9637X33)
and,
'51NL9637X33' | 'ISL6262ACRZ-T' | 'INTERSIL' | 'QFN7SQ-HT1_P49' | '()'
'227637' | 'SMTU2032_1' | 'SKT W/BAT'(~227637) = '227637' | 'SMTU2032_1' | 'RENATA' | 'SKT28_5X16_1-HT5_4_P2' | '()' :SPECIAL_A ='BAT_CR2032', PART_NUM_A='202649'
Should extract,
'227637' | 'SMTU2032_1' | 'SKT W/BAT'(~227637)
and,
'227637' | 'SMTU2032_1' | 'RENATA' | 'SKT28_5X16_1-HT5_4_P2' | '()' :SPECIAL_A ='BAT_CR2032', PART_NUM_A='202649'
Also note the little tilde bit at the end of the first section is optional, so I can't just look for that.
Actually I wouldn't use a regex for that at all. Assuming your language has a split operation, I'd first split on the | character to get a list of:
'51NL9637X33'
'ISL6262ACRZ-T'
'QFN'(~51NL9637X33) = '51NL9637X33'
'ISL6262ACRZ-T'
'INTERSIL'
'QFN7SQ-HT1_P49'
'()'
Then I'd split each of them on the = character to get the key and (optional) value:
'51NL9637X33' <null>
'ISL6262ACRZ-T' <null>
'QFN'(~51NL9637X33) '51NL9637X33'
'ISL6262ACRZ-T' <null>
'INTERSIL' <null>
'QFN7SQ-HT1_P49' <null>
'()' <null>
You haven't specified why you think a regex is the right tool for the job but most modern languages also have a split capability and regexes aren't necessarily the answer to every requirement.
I agree with paxdiablo in that regular expressions might not be the most suitable tool for this task, depending on the language you are working with.
The question "How do I match an even number of characters?" is interesting nonetheless, and here is how I'd do it in your case:
(?:'[^']*'|[^=])*(?==)
This expression matches the left part of your entry by looking for a ' at its current position. If it finds one, it runs forward to the next ', thereby only ever consuming an even number of quotes. If it does not find a ', it matches any single character that is not an equals sign. Finally, the lookahead asserts that an equals sign follows the matched string. It works because the regex engine evaluates alternations from left to right.
You could get the left and right parts in two capturing groups by using
((?:'[^']*'|[^=])*)=(.*)
I recommend http://gskinner.com/RegExr/ for tinkering with regular expressions. =)
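Here is a minimal Python sketch of the two-group expression applied to the first example from the question. The question does not name a language, so Python and the names SPLIT_RE and split_entry are assumptions made for illustration.

```python
import re

# Group 1: quoted strings or non-"=" characters, i.e. everything up to
# the first "=" that is not inside single quotes. Group 2: the rest.
SPLIT_RE = re.compile(r"((?:'[^']*'|[^=])*)=(.*)")

def split_entry(line: str):
    match = SPLIT_RE.match(line)
    if match is None:
        return None  # no "=" at all
    return match.group(1).strip(), match.group(2).strip()

line = ("'51NL9637X33' | 'ISL6262ACRZ-T' | 'QFN'(~51NL9637X33) = "
        "'51NL9637X33' | 'ISL6262ACRZ-T' | 'INTERSIL' | 'QFN7SQ-HT1_P49' | '()'")
left, right = split_entry(line)
print(left)   # '51NL9637X33' | 'ISL6262ACRZ-T' | 'QFN'(~51NL9637X33)
print(right)  # '51NL9637X33' | 'ISL6262ACRZ-T' | 'INTERSIL' | 'QFN7SQ-HT1_P49' | '()'
```

One caveat: if a line has no = outside quotes but does have one inside quotes, backtracking lets the [^=] branch step over a quote character and the split lands inside the quoted text, so validate such lines separately.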
As paxdiablo said, you almost certainly don't want to use a regex here. The split suggestion isn't bad; I myself would probably use a parser here—there's a lot of structure to exploit. The idea here is that you formally specify the syntax of what you have—sort of like what you gave us, only rigorous. So, for instance: a field is a sequence of non-single-quote characters surrounded by single quotes; a fields is any number of fields separated by white space, a |, and more white space; a tilde is non-right-parenthesis characters surrounded by (~ and ); and an expr is a fields, optional whitespace, an optional tilde, a =, optional whitespace, and another fields. How you express this depends on the language you are using. In Haskell, for instance, using the Parsec library, you write each of those parsers as follows:
import Text.ParserCombinators.Parsec
field :: Parser String
field = between (char '\'') (char '\'') $ many (noneOf "'\n")
tilde :: Parser String
tilde = between (string "(~") (char ')') $ many (noneOf ")\n")
fields :: Parser [String]
fields = field `sepBy` (try $ spaces >> char '|' >> spaces)
expr :: Parser ([String],Maybe String,[String])
expr = do left <- fields
          spaces
          opt <- optionMaybe tilde
          spaces >> char '=' >> spaces
          right <- fields
          (char '\n' >> return ()) <|> eof
          return (left, opt, right)
Understanding precisely how this code works isn't really important; the basic idea is to break down what you're parsing, express it in formal rules, and build it back up out of the smaller components. And for something like this, it'll be much cleaner than a regex.
If you really want a regex, here you go (barely tested):
^\s*('[^']*'((\s*\|\s*)'[^'\n]*')*)?(\(~[^)\n]*\))?\s*=\s*('[^']*'((\s*\|\s*)'[^'\n]*')*)?\s*$
See why I recommend a parser? When I first wrote this, I got at least two things wrong which I picked up (one per test), and there's probably something else. And I didn't insert capturing groups where you wanted them because I wasn't sure where they'd go. Now yes, I could have made this more readable by inserting comments, etc. And after all, regexen have their uses! However, the point is: this is not one of them. Stick with something better.