I have an application where grammar school teachers can place an answer box on a page after a question. The answer box is configured by the teacher with an answer line that specifies acceptable answers to the question. I don't expect them to give me a valid regular expression for the answer so I let them write the answer in a simplified form, where a '*' represents 0 or more of anything and ',' separates multiple acceptable answers. So an answer line that contained
*cup,glass
would accept 'teacup', 'coffee cup', 'cup' or 'glass', but not 'cup holder'.
Is there a way I can map the answer line they provide to a single regex that I can compare the student's answer with to give me a true or false answer, i.e., it's an acceptable answer to the question, or it isn't?
Thanks
The language isn't specified in the question as I write this - the exact form of the answer will depend heavily on that. Let's assume JavaScript, as most of the poster's tags seem JavaScript-related.
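function toRegexp(e) {
    return new RegExp(
        "^(?:" +
        e.split(/,/).map(                 // one alternative per comma-separated answer
            function(x) {
                // escape regex metacharacters, then turn each '*' into '.*'
                return x.replace(/([.?+^$[\]\\(){}|-])/g, "\\$1").replace(/\*/g, ".*");
            }
        ).join("|") +
        ")$", "i");                       // anchor the whole answer; case-insensitive
}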
(With thanks to this answer for the bit that escapes the special characters.)
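For readers not using JavaScript, here is a rough Python port of the same idea. It is a sketch, not the answerer's code: to_regexp is a hypothetical name, and re.escape stands in for the hand-written escaping replace.

```python
import re

def to_regexp(answer_line):
    """Compile a teacher's answer line ('*' = anything, ',' = or) into a regex."""
    alternatives = []
    for term in answer_line.split(","):
        # Escape each literal chunk, then rejoin the chunks with '.*' for every '*'.
        alternatives.append(".*".join(re.escape(part) for part in term.split("*")))
    # Anchor the whole alternation so the student's answer must match completely.
    return re.compile("^(?:" + "|".join(alternatives) + ")$", re.IGNORECASE)

accept = to_regexp("*cup,glass")
for answer in ["teacup", "coffee cup", "cup", "glass", "cup holder"]:
    print(answer, bool(accept.match(answer)))
```

Splitting on '*' before escaping avoids having to un-escape the wildcard afterwards.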
I'm not sure what language you are using, but given an input string, e.g. *cup,glass:
add ^( to the start
add )$ to the end
replace all * with .*
replace all , with |
Giving ^(.*cup|glass)$.
All of those steps should be pretty trivial in any language.
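As an illustration, the four steps applied literally in Python might look like this (naive_answer_regex is a hypothetical name). It deliberately does no other escaping, so an answer containing characters such as '+' or '.' would still be interpreted as regex syntax; the JavaScript answer above escapes those first.

```python
import re

def naive_answer_regex(answer_line):
    # Steps 1-4 exactly as listed: wrap in ^( )$, '*' -> '.*', ',' -> '|'.
    return "^(" + answer_line.replace("*", ".*").replace(",", "|") + ")$"

pattern = naive_answer_regex("*cup,glass")
print(pattern)                                # ^(.*cup|glass)$
print(bool(re.match(pattern, "coffee cup")))  # True
print(bool(re.match(pattern, "cup holder")))  # False
```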
Thanks for all your input. The solution I arrived at is http://jsfiddle.net/76zXf/12/. This seems to do everything I asked for, plus allow any number of spaces before and after the correct answer, and allow "+" and "-" in the answer. (So an Answer Line of "3,+3,-3" works fine for the question "What is the square root of 9?") I was able to do it with a simpler build function than proprelkey posted:
var re;  // the compiled answer RegExp, used later to check student answers
$('#answerLine').change(function() {
    var s = this.value.replace(/\*/g, '.*');  // change every "*" to ".*"
    s = s.replace(/\+/g, '\\+');              // escape every "+" as "\+"
    s = s.replace(/-/g, '\\-');               // escape every "-" as "\-"
    var terms = s.split(/,/);                 // get individual answers into an array
    var patterns = terms.map(                 // for each answer...
        function(x) {
            return '^\\s*' + x + '\\s*$';     // ...anchor it, allowing spaces around it
        }
    );
    re = new RegExp(patterns.join('|'), 'i'); // our final, complete RegExp
});
I hope I'm not missing any important cases. Thanks again.
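The same shape (one ^\s*term\s*$ alternative per answer) can be sketched in Python to check the square-root example. This is an approximation: answer_regex is a hypothetical helper, and it escapes every metacharacter with re.escape rather than only '+' and '-'.

```python
import re

def answer_regex(answer_line):
    """Hypothetical helper: one anchored, space-tolerant alternative per answer."""
    alternatives = []
    for term in answer_line.split(","):
        body = ".*".join(re.escape(part) for part in term.split("*"))
        # Allow any amount of whitespace before and after the answer.
        alternatives.append(r"^\s*" + body + r"\s*$")
    return re.compile("|".join(alternatives), re.IGNORECASE)

roots = answer_regex("3,+3,-3")
for answer in ["3", " +3 ", "-3", "33", "+-3"]:
    print(repr(answer), bool(roots.search(answer)))
```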
I'm attempting to write a regex to prevent certain user input in mathematical expressions (e.g. '1+1' would be valid, whereas '1++1' should be invalidated).
Acceptable characters include digits 0-9 (\d works in lieu of 0-9), + - * / ( ) and whitespace.
I've attempted to put together a regex, but I can't find anything in Python regular expression syntax that would validate (or, consequently, invalidate) certain characters when typed together.
(( is ok
++, --, +-, */ are not
I hope there is a simple way to do this, but if there isn't, I anticipate having to write regexes for every possible combination of characters I don't want to allow together.
I've tried:
re.compile(r"[\d\s*/()+-]")
re.compile(r"[\d]\[\s]\[*]\[/]\[(]\[)]\[+]\[-]")
I expect to be able to invalidate the expression if someone were to type "1++1"
Edit: Someone suggested the below link is similar to my question...it is not :)
Validate mathematical expressions using regular expression?
Probably the way to go is to invert your logic:
abort if the regex detects any invalid combination, since there are far fewer invalid combinations than valid ones.
So, e.g.:
re.compile(r"\+\+")  # '+' must be escaped; r"++" itself raises re.error
Also, is it possible at all to enumerate all valid terms? If the length of a term is not limited, it is impossible to enumerate all valid terms.
Perhaps one option might be to check the string for the unwanted combinations:
[0-9]\s*(?:[+-][+-]|\*/)\s*[0-9]
For example
import re

pattern = r"[0-9]\s*(?:[+-][+-]|\*/)\s*[0-9]"  # digit, forbidden operator pair, digit
strings = [
    'This is test 1 -- 1',
    'This is test 2',
    'This is test 3+1',
    'This is test 4 */4'
]
for s in strings:
    res = re.search(pattern, s)
    if not res:  # no forbidden pair found
        print("Valid: " + s)
Result
Valid: This is test 2
Valid: This is test 3+1
Below is a snippet from my code. This is hardly the solution I was originally looking for, but it does accomplish what I was trying to do. When a user sends 1++1 to an API endpoint with curl, it returns "Forbidden" because of the regex assigned to math2 below. If the user sends 1+1, it returns "OK". I hope I am understanding how Stack Overflow works and have formatted this properly...
import re

# Returns True if a valid expression, False if not.
def validate_expression(calc):
    # Only digits, whitespace and * / ( ) + - are allowed.
    math = re.compile(r"^[\d\s*/()+-]+$")
    # Forbidden operator pairs such as ++, --, +-, -+ and */.
    math2 = re.compile(r"[+-]\s*[+-]|\*\s*/")
    if math.search(calc) is not None and math2.search(calc) is None:
        return True
    else:
        return False
I'm trying to determine whether a term appears in a string.
Before and after the term must appear a space, and a standard suffix is also allowed.
Example:
term: google
string: "I love google!!! "
result: found
term: dog
string: "I love dogs "
result: found
I'm trying the following code:
regexPart1 = "\s"
regexPart2 = "(?:s|'s|!+|,|.|;|:|\(|\)|\"|\?+)?\s"
p = re.compile(regexPart1 + term + regexPart2 , re.IGNORECASE)
and get the error:
raise error("multiple repeat")
sre_constants.error: multiple repeat
Update
Real code that fails:
term = 'lg incite" OR author:"http++www.dealitem.com" OR "for sale'
regexPart1 = r"\s"
regexPart2 = r"(?:s|'s|!+|,|.|;|:|\(|\)|\"|\?+)?\s"
p = re.compile(regexPart1 + term + regexPart2 , re.IGNORECASE)
On the other hand, the following term passes smoothly (+ instead of ++)
term = 'lg incite" OR author:"http+www.dealitem.com" OR "for sale'
The problem is that, in a non-raw string, \" is ".
You get lucky with all of your other unescaped backslashes—\s is the same as \\s, not s; \( is the same as \\(, not (, and so on. But you should never rely on getting lucky, or assuming that you know the whole list of Python escape sequences by heart.
Either print out your string and escape the backslashes that get lost (bad), escape all of your backslashes (OK), or just use raw strings in the first place (best).
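A quick demonstration of why raw strings are the safest choice (the strings are illustrative, not from the question). \b is used here because, unlike \s, it is a real Python escape sequence (backspace), so it is silently changed before re ever sees it.

```python
import re

# In a normal string literal, \" is an escaped quote: the backslash is gone.
print(len("\""), len(r"\""))                              # 1 2
# "\b" is the backspace character, not a regex word boundary...
print("\b" == "\x08")                                     # True
# ...so the non-raw pattern silently fails, while the raw one works.
print(re.search("\bdog\b", "I love dog") is not None)     # False
print(re.search(r"\bdog\b", "I love dog") is not None)    # True
```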
That being said, your regexp as posted won't match some expressions that it should, but it will never raise that "multiple repeat" error. Clearly, your actual code is different from the code you've shown us, and it's impossible to debug code we can't see.
Now that you've shown a real reproducible test case, that's a separate problem.
You're searching for terms that may have special regexp characters in them, like this:
term = 'lg incite" OR author:"http++www.dealitem.com" OR "for sale'
That p++ in the middle of a regexp means "1 or more of 1 or more of the letter p" (in other words, the same as "1 or more of the letter p") in some regexp languages, "always fail" in others, and "raise an exception" in others. Python's re falls into the last group up to Python 3.10 (since Python 3.11, ++ is a possessive quantifier, so this particular fragment compiles there). On those earlier versions, you can test this in isolation:
>>> re.compile('p++')
error: multiple repeat
If you want to put random strings into a regexp, you need to call re.escape on them.
One more problem (thanks to Ωmega):
. in a regexp means "any character". So ,|.|;|: (I've just extracted a short fragment of your longer alternation chain) means "a comma, or any character, or a semicolon, or a colon", which is the same as "any character". You probably wanted to escape the dot as \.
Putting all three fixes together:
import re
term = 'lg incite" OR author:"http++www.dealitem.com" OR "for sale'
regexPart1 = r"\s"
regexPart2 = r"(?:s|'s|!+|,|\.|;|:|\(|\)|\"|\?+)?\s"
p = re.compile(regexPart1 + re.escape(term) + regexPart2, re.IGNORECASE)
As Ωmega also pointed out in a comment, you don't need to use a chain of alternations if they're all one character long; a character class will do just as well, more concisely and more readably.
And I'm sure there are other ways this could be improved.
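To see both points in action, here is a small comparison using the term "dog" from the question (the test strings are made up). The first pattern keeps the unescaped '.', the second escapes it and folds the one-character alternatives into a character class.

```python
import re

term = "dog"
# Unescaped '.' inside the alternation: it matches ANY character.
loose = re.compile(r"\s" + re.escape(term) + r"(?:s|'s|!+|,|.|;|:|\(|\)|\"|\?+)?\s", re.IGNORECASE)
# Dot escaped, single characters collected into one character class.
strict = re.compile(r"\s" + re.escape(term) + r"(?:s|'s|!+|[,.;:()\"]|\?+)?\s", re.IGNORECASE)

for text in ["I love dogs ", "I love dog. ", "I love dogX "]:
    print(repr(text), bool(loose.search(text)), bool(strict.search(text)))
```

Only the loose pattern accepts "dogX", because its '.' alternative swallows the X.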
The other answer is great, but I would like to point out that using regular expressions to find strings in other strings is not the best way to go about it. In Python, simply write:
if term in string:
    pass  # do whatever
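Keep in mind that in is a plain substring test, so unlike the regex in the question it does not require spaces around the term. A quick check with illustrative strings:

```python
term = "dog"
for string in ["I love dogs ", "hotdog stand", "I love cats "]:
    # No regex, no escaping, no boundary rules: just "does it occur anywhere?"
    print(repr(string), term in string)
```

Here "hotdog stand" counts as a hit, which the space-delimited regex would reject.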
I had example_str = "i love you c++", and using it in a regex gave a "multiple repeat" error. The error occurs because the string contains "++", and + is a special character in regex syntax. My fix was to pass the search term through re.escape(); here is my code:
import re

example_str = "i love you c++"
word_filter = "c++"
# re.escape turns "c++" into "c\+\+"; (?<!\w) and (?!\w) act as word boundaries
# that also work next to '+', where \b would fail at the end of the string
regex_word = re.search(rf'(?<!\w){re.escape(word_filter)}(?!\w)', example_str)
Also make sure that your arguments are in the correct order!
I was trying to run a regular expression on some html code. I kept getting the multiple repeat error, even with very simple patterns of just a few letters.
Turns out I had the pattern and the html mixed up. I tried re.findall(html, pattern) instead of re.findall(pattern, html).
A general solution to "multiple repeat" is using re.escape to match the literal pattern.
Example:
>>> re.compile(re.escape("c++"))
re.compile('c\\+\\+')
However, if you want to match a literal word with a space before and after it, try this example:
>>> re.findall(rf"\s{re.escape('c++')}\s", "i love c++ you c++")
[' c++ ']
Scenario:
The user can enter any number of parenthesis pairs into an equation in String format. However, I need to make sure that all parentheses ( or ) have an adjacent multiplication symbol *. Hence 3( should be 3*( and )3 should be )*3.
I need to replace all occurrences of possible n( with n*( and )n with )*n.
Example:
1+5(3+4)7/2 ---> 1+5*(3+4)*7/2
What is the correct regex to do this?
I was thinking of something like [0-9]\( and \)[0-9].
But I don't know the full syntax for finding all occurrences of these patterns and inserting a * into each.
Without the regex pain (but maybe not the most beautiful solution) :
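equation = '1+5(3+4)7/2'
output = ''
for index, char in enumerate(equation):
    prev = equation[index - 1] if index > 0 else ''
    # n( or )(  ->  insert '*' before the '('
    # )n        ->  insert '*' before the digit
    if (char == '(' and (prev.isdigit() or prev == ')')) or (prev == ')' and char.isdigit()):
        output += '*'
    output += char
print('finally:', output)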
Turning a comment into an answer.
With the input given so far, the following seems to work:
For the original request regarding n( and )n:
(?<=\d)(?=\()|(?<=\))(?=\d)
To handle )( as well:
(?<=\d)(?=\()|(?<=\))(?=[\d(])
It uses two sets of positive lookarounds.
It does no verification of any sort to be in the middle of an equation / mathematical term.
Please comment if adjustment / further detail is required.
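The lookaround version can be tried in Python with re.sub, since every match is an empty position and replacing it with "*" simply inserts the operator. A sketch; add_multiplication is a hypothetical helper name, and the pattern is the one that also handles )(.

```python
import re

# Empty positions between digit and '(', between ')' and a digit, and between ')' and '('.
IMPLICIT_MUL = re.compile(r"(?<=\d)(?=\()|(?<=\))(?=[\d(])")

def add_multiplication(equation):
    return IMPLICIT_MUL.sub("*", equation)

print(add_multiplication("1+5(3+4)7/2"))  # 1+5*(3+4)*7/2
print(add_multiplication("(1+2)(3)"))     # (1+2)*(3)
```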
edit - misread the question.
Didn't see any regex capability in Swift, so I assume it's using something imported (or vice versa).
Not using advanced assertions, one way is to do it in two passes.
Pass 1:
Find: ([\d)])(\()
Replace: $1*$2
Pass 2:
Find: (\))([\d(])
Replace: $1*$2
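For comparison, the same two passes in Python (a sketch with a hypothetical function name). Python's re.sub writes group references in the replacement as \1 and \2 rather than $1 and $2.

```python
import re

def add_multiplication_two_pass(equation):
    # Pass 1: a digit or ')' followed by '(' gets a '*' in between.
    equation = re.sub(r"([\d)])(\()", r"\1*\2", equation)
    # Pass 2: ')' followed by a digit or '(' gets a '*' in between.
    equation = re.sub(r"(\))([\d(])", r"\1*\2", equation)
    return equation

print(add_multiplication_two_pass("1+5(3+4)7/2"))  # 1+5*(3+4)*7/2
```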
I want to use a regular expression that does the following (I extracted the part I'm having trouble with, to simplify):
any character for the first 1 to 5 characters, then an underscore, then some digits, then an underscore, then some digits or dots.
With a restriction on underscores, it would look something like this:
^([^_]{1,5})_([\\d]{2,3})_([\\d\\.]*)$
But I want to allow "_" within the first 1-5 characters, as long as the rest of the string still matches the end of the regular expression, for example if I had something like:
to_to_123_12.56
I think this is related to the regex engine being eager; nevertheless, I tried some lazy quantifiers as explained here, but without success.
Any idea ?
I used the following regex and it appeared to work fine for your task. I've simply replaced your initial [^_] with ..
^.{1,5}_\d{2,3}_[\d\.]*$
It's probably best to replace your final * with + too, unless you allow nothing after the final '_'. And note your final part allows multiple '.' (I don't know if that's what you want or not).
For the record, here's a quick Python script I used to verify the regex:
import re

strs = ["a_12_1",
        "abc_12_134",
        "abcd_123_1.",
        "abcde_12_1",
        "a_123_123.456.7890.",
        "a_12_1",
        "ab_de_12_1",
        ]
myre = r"^.{1,5}_\d{2,3}_[\d\.]+$"
for s in strs:                      # 's' avoids shadowing the built-in str
    m = re.match(myre, s)
    if m:
        print("Yes:", end=" ")
        if m.group(0) == s:         # the whole string was matched
            print("ALL", end=" ")
    else:
        print("No:", end=" ")
    print(s)
Output is:
Yes: ALL a_12_1
Yes: ALL abc_12_134
Yes: ALL abcd_123_1.
Yes: ALL abcde_12_1
Yes: ALL a_123_123.456.7890.
Yes: ALL a_12_1
Yes: ALL ab_de_12_1
^(.{1,5})_(\d{2,3})_([\d.]*)$
works for your example. The result doesn't change whether you use a lazy quantifier or not.
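A quick way to confirm that claim on the example string from the question: backtracking makes both the greedy and the lazy version settle on the same split.

```python
import re

text = "to_to_123_12.56"
greedy = re.match(r"^(.{1,5})_(\d{2,3})_([\d.]*)$", text)
lazy = re.match(r"^(.{1,5}?)_(\d{2,3})_([\d.]*)$", text)
print(greedy.groups())  # ('to_to', '123', '12.56')
print(lazy.groups())    # ('to_to', '123', '12.56')
```

The lazy version first tries "to" as the prefix, fails on the digits that must follow the next underscore, and keeps extending until "to_to" works.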
While answering the comment (writing the lazy expression), I saw that I had made a mistake... if I simply use the following classic regex, it works:
^(.{1,5})_([\\d]{2,3})_([\\d\\.]*)$
Thank you.
I'm using the following regular expression to find the exact occurrences in infinitives. Flag is global.
(?!to )(?<!\w) (' + word_to_search + ') (?!\w)
To give example of what I'm trying to achieve
looking for out should not bring : to outlaw
looking for out could bring : to be out of line
looking for to should not bring : to etc. just because it matches the first to
I've already done these steps, however, to cross out/off should be in the result list too. Is there any way to create an exception without compromising what I have achieved?
Thank you.
I'm still not sure I understand the question. You want to match something that looks like an infinitive verb phrase and contains the whole word word_to_search? Try this:
"\\bto\\s(?:\\w+[\\s/])*" + word_to_search + "\\b"
Remember, when you create a regex in the form of a string literal, you have to escape the backslashes. If you tried to use "\b" to specify a word boundary, it would have been interpreted as a backspace.
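The same pattern can be tried in Python, where a raw string avoids doubling the backslashes. This is a sketch: find_infinitive is a hypothetical helper, and wrapping the search word in re.escape is an added assumption in case it ever contains metacharacters.

```python
import re

def find_infinitive(word_to_search, text):
    # "to" + whitespace, then any number of words each followed by a space or '/',
    # then the search word as a whole word.
    pattern = r"\bto\s(?:\w+[\s/])*" + re.escape(word_to_search) + r"\b"
    return re.search(pattern, text)

print(bool(find_infinitive("out", "to outlaw")))          # False
print(bool(find_infinitive("out", "to be out of line")))  # True
print(bool(find_infinitive("to", "to etc.")))             # False
print(bool(find_infinitive("off", "to cross out/off")))   # True
```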
I know about the OR operator, but my question was rather how to organize the structure so it can look ahead and behind. Here is what I have done so far:
var strPattern:String = '(?!to )(?<!\w) (' + word_to_search + ') (?!\w)|';
strPattern+='(?!to )(?<!\w) (' + word_to_search + '\/)|';
strPattern+='(?!to )(\/' + word_to_search + ')';
var pattern:RegExp = new RegExp(strPattern, "g");
The first line is the same as the pattern in my question; it matches structures like to bail out when you search for out. The second line matches structures like to cross out/off. But we need something else to match to cross out/off if the word is off, so the third line adds that extra condition.