I am trying to automatically grade a student's answer to a question where the correct answer is:
read and execute for owner and read for everyone
The order of their answer doesn't matter so
read for everyone and read and execute for owner
is an acceptable answer. Then all the fluff (and, for) doesn't matter so I am really just looking for either of these
read execute owner read everyone
read everyone read execute owner
I can get a regex to accept either answer
(?=.*read.*execute.*owner)(?=.*read.*everyone)
But obviously that also accepts answers that are clearly wrong, like "read execute owner read execute everyone". So I tried using a negative look-ahead for "execute" with everyone, but then it still sees the "execute" for owner and reports no regex match.
Is there a way to accomplish what I am trying to do? Thanks.
Just make the and/for optional.
# (?=.*(\bread(?:\s+and)?\s+execute(?:\s+for)?\s+owner\b))(?=.*(\bread(?:\s+for)?\s+everyone\b))
(?=
.*
( # (1 start)
\b read
(?: \s+ and )?
\s+ execute
(?: \s+ for )?
\s+
owner \b
) # (1 end)
)
(?=
.*
( # (2 start)
\b read
(?: \s+ for )?
\s+ everyone \b
) # (2 end)
)
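As a rough illustration, the compact pattern above could be used from Python like this. The function name is_correct and the case-insensitive flag are assumptions made for this sketch; they are not part of the answer itself.

```python
import re

# Both lookaheads are tried from the start of the string, so the two
# phrases may appear in either order.
GRADE = re.compile(
    r"(?=.*(\bread(?:\s+and)?\s+execute(?:\s+for)?\s+owner\b))"
    r"(?=.*(\bread(?:\s+for)?\s+everyone\b))",
    re.IGNORECASE,  # assumption: capitalisation should not matter
)

def is_correct(answer: str) -> bool:
    """Return True when the answer contains both required phrases."""
    return GRADE.match(answer) is not None

if __name__ == "__main__":
    print(is_correct("read and execute for owner and read for everyone"))
    print(is_correct("read execute owner read execute everyone"))
```

The second call is the wrong answer from the question: "read" is never followed directly by "everyone" (or "for everyone"), so the second lookahead fails.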
Edit: You could also allow any optional filler words between the
keywords, as long as none of the filler is itself a keyword.
Like this:
# (?=.*(\bread(?:\s+(?:(?!\b(?:read|execute|owner|everyone)\b).)+?)?\s+execute(?:\s+(?:(?!\b(?:read|execute|owner|everyone)\b).)+?)?\s+owner\b))(?=.*(\bread(?:\s+(?:(?!\b(?:read|everyone|execute|owner)\b).)+?)?\s+everyone\b))
(?=
.*
( # (1 start)
\b read
(?:
# Optional words Between Keywords -
# not any of this or the other ones keywords
\s+
(?:
(?!
\b
(?:
read # this
| execute # this
| owner # this
| everyone # other
)
\b
)
.
)+?
)?
\s+ execute
(?:
# Optional words Between Keywords
\s+
(?:
(?!
\b
(?:
read
| execute
| owner
| everyone
)
\b
)
.
)+?
)?
\s+
owner \b
) # (1 end)
)
(?=
.*
( # (2 start)
\b read
(?:
# Optional words Between Keywords -
# not any of this or the other ones keywords
\s+
(?:
(?!
\b
(?:
read # this
| everyone # this
| execute # other
| owner # other
)
\b
)
.
)+?
)?
\s+ everyone \b
) # (2 end)
)
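Since the filler part is the same sub-pattern repeated between every pair of keywords, it may be easier to build the expression in code. Here is a minimal Python sketch of that idea; the helper names (phrase, accepts) are made up for illustration.

```python
import re

KEYWORDS = ("read", "execute", "owner", "everyone")

# One character that does not start any keyword.
NOT_KEYWORD = r"(?!\b(?:" + "|".join(KEYWORDS) + r")\b)."
# Optional filler between two keywords: whitespace, then non-keyword text.
FILLER = r"(?:\s+(?:" + NOT_KEYWORD + r")+?)?"

def phrase(*words: str) -> str:
    """Join keywords so that only non-keyword filler may sit between them."""
    return r"\b" + (FILLER + r"\s+").join(words) + r"\b"

PATTERN = re.compile(
    "(?=.*(" + phrase("read", "execute", "owner") + "))"
    "(?=.*(" + phrase("read", "everyone") + "))"
)

def accepts(answer: str) -> bool:
    """Return True when both keyword phrases are found, in any order."""
    return PATTERN.match(answer) is not None

if __name__ == "__main__":
    print(accepts("read as well as execute for the owner and read for everyone"))
    print(accepts("read execute owner read execute everyone"))
```

The generated pattern should be equivalent to the one-line regex shown above, just assembled from its repeated parts.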
I am trying to pull numbers between 2 specific words using regex. The problem is that they are multiline. I am trying to extract these from a PDF so it has to be between these 2 words only
WORD1:
(23)
(56)
(78)
END
I tried this
\((.*?)\) and it pulls the numbers between (), but I need it to search only between the words WORD1 and END instead of the whole PDF.
Is there a way to do it?
Expected Output:
23
56
78
Use the \G construct
(?s)(?:(WORD1:)(?=(?:(?!WORD1:|END).)*?\d(?:(?!WORD1:|END).)*END)|(?!^)\G)(?:(?!\d|WORD1:|END).)*?\K\d+
https://regex101.com/r/il00WG/1
Explained
(?s) # Dot-all inline modifier
(?:
( WORD1: ) # (1), Flag start of new set
(?= # Lookahead, must be a digit before the END
(?:
(?! WORD1: | END )
.
)*?
\d
(?:
(?! WORD1: | END )
.
)*
END
)
| # OR,
(?! ^ )
\G # Start where last match left off
)
(?:
(?! \d | WORD1: | END ) # Go past non-digits
.
)*?
\K # Ignore previous match up to here
\d+ # Digits, the only match
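Note that \G and \K are not available in every engine; Python's built-in re module, for instance, supports neither. A two-step approach gives a similar result there: first isolate each WORD1: ... END block, then pull the parenthesised numbers out of it. This is a swapped-in technique rather than a translation of the regex above, and the function name numbers_between is only illustrative.

```python
import re

def numbers_between(text: str, start: str = "WORD1:", end: str = "END") -> list[str]:
    """Collect the digits inside (...) that sit between start and end."""
    block_re = re.compile(re.escape(start) + r"(.*?)" + re.escape(end), re.DOTALL)
    found = []
    for block in block_re.finditer(text):
        # Only the text captured between the two markers is searched.
        found.extend(re.findall(r"\((\d+)\)", block.group(1)))
    return found

if __name__ == "__main__":
    sample = "intro (99)\nWORD1:\n(23)\n(56)\n(78)\nEND\noutro (11)\n"
    print(numbers_between(sample))
```

Numbers outside the block, such as (99) and (11) in the sample, are ignored because they never reach the inner findall.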
You need to add the g and m flags to your regex to match what you need.
https://regex101.com/r/c3VLdq/1
/(\(.*?\))/gm
g modifier: global. Returns all matches instead of stopping at the first one
m modifier: multi line. Causes ^ and $ to match the begin/end of each line
I had a similar issue; what I used was a lookahead (?=) and a lookbehind (?<=).
So in your case it would look like this (if lookbehind is supported):
(?<=WORD1:\n)(.*\n)+(?=END)
Note the newline symbol after WORD1: in the lookbehind. If it is omitted, the result will start with the line break.
tested here
https://regex101.com/r/qxPQqq/4
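For what it's worth, here is how the lookbehind/lookahead idea might look in Python, where fixed-width lookbehind is supported. The sketch uses a lazy +? so that it stops at the first END (a small tweak to the pattern above), and the function name block_numbers is an assumption for illustration.

```python
import re

# Lines that follow "WORD1:\n", up to (not including) the first END.
BLOCK = re.compile(r"(?<=WORD1:\n)(?:.*\n)+?(?=END)")

def block_numbers(text: str) -> list[str]:
    """Return the digit runs found between WORD1: and END."""
    match = BLOCK.search(text)
    if match is None:
        return []
    # The lookarounds only delimit the block; digits are extracted afterwards.
    return re.findall(r"\d+", match.group(0))

if __name__ == "__main__":
    sample = "header (99)\nWORD1:\n(23)\n(56)\n(78)\nEND\nfooter (11)\n"
    print(block_numbers(sample))
```

The lookarounds give you the block as a whole, so a second pass is still needed to get the individual numbers.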
I am trying to get this regex dialed-in to validate whether a URL begins with https and if a port is supplied the only valid values are 443 or 5443. This regex is pretty close but not quite there.
^(https:\/\/)([a-zA-Z\d\.]{2,})\.([a-zA-Z]{2,})(:5{0,1}443)?(.)*
How do I solve this problem?
This is a mainstream URL validator that tests whether the URL sits between whitespace boundaries.
It only allows the https scheme and the port numbers 5443 or 443.
(?<!\S)https://(?:\S+(?::\S*)?@)?(?:(?:(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\u00a1-\uffff0-9]+-?)*[a-z\u00a1-\uffff0-9]+)(?:\.(?:[a-z\u00a1-\uffff0-9]+-?)*[a-z\u00a1-\uffff0-9]+)*(?:\.(?:[a-z\u00a1-\uffff]{2,})))|localhost)(?::5?443)?(?:/[^\s]*)?(?!\S)
Readable version
(?<! \S )
https ://
(?:
\S+
(?: : \S* )?
@
)?
(?:
(?:
(?:
[1-9] \d?
| 1 \d\d
| 2 [01] \d
| 22 [0-3]
)
(?:
\.
(?: 1? \d{1,2} | 2 [0-4] \d | 25 [0-5] )
){2}
(?:
\.
(?:
[1-9] \d?
| 1 \d\d
| 2 [0-4] \d
| 25 [0-4]
)
)
| (?:
(?: [a-z\u00a1-\uffff0-9]+ -? )*
[a-z\u00a1-\uffff0-9]+
)
(?:
\.
(?: [a-z\u00a1-\uffff0-9]+ -? )*
[a-z\u00a1-\uffff0-9]+
)*
(?:
\.
(?: [a-z\u00a1-\uffff]{2,} )
)
)
| localhost
)
(?: : 5? 443 )?
(?: / [^\s]* )?
(?! \S )
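Because the pattern is bounded by whitespace rather than by ^ and $, it can also be used to pick URLs out of running text. A small Python sketch of that usage follows. It assumes the separator after the optional user:password part is the usual @ sign and adds a case-insensitive flag; the function name find_https_urls is hypothetical.

```python
import re

URL = re.compile(
    r"(?<!\S)https://(?:\S+(?::\S*)?@)?"          # scheme, optional user:pass@
    r"(?:(?:"
    r"(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])"         # IPv4, first octet
    r"(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}"    # IPv4, middle octets
    r"(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))"  # IPv4, last octet
    r"|(?:(?:[a-z\u00a1-\uffff0-9]+-?)*[a-z\u00a1-\uffff0-9]+)"        # host
    r"(?:\.(?:[a-z\u00a1-\uffff0-9]+-?)*[a-z\u00a1-\uffff0-9]+)*"      # domains
    r"(?:\.(?:[a-z\u00a1-\uffff]{2,}))"           # TLD
    r")|localhost)"
    r"(?::5?443)?(?:/[^\s]*)?(?!\S)",             # port 443/5443, path, boundary
    re.IGNORECASE,  # assumption: host names may be upper case
)

def find_https_urls(text: str) -> list[str]:
    """Return every whitespace-delimited https URL with an allowed port."""
    return URL.findall(text)

if __name__ == "__main__":
    print(find_https_urls(
        "see https://example.com:5443/a and https://example.com:8080/b "
        "or https://localhost:443"
    ))
```

The URL on port 8080 is skipped because the trailing (?!\S) cannot be satisfied while ":8080/b" is still unconsumed.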
You should append a / after this optional port group so it doesn't allow any digits before a /. Try using this regex,
^(https:\/\/)([a-zA-Z\d\.]{2,})\.([a-zA-Z]{2,})(:5?443)?\/\S*
Notice that I've also changed (:5{0,1}443)? to (:5?443)? and changed the last .* to \S*, so the match doesn't capture spaces, since spaces are not valid in a URL. Besides that, you can also get rid of most of the groups in your regex, unless you need them.
Regex Demo
Edit:
As you said in the comments, you also want to match the following URLs,
https://example.com
https:example.com
https:example.com:443
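so you need to make the \/\S* part optional by placing a ? after it. Once the path is optional, the pattern also needs a closing $, otherwise a URL with a different port would still match up to the host name. The modified regex becomes this, which will match the above URLs.
^https:\/\/([a-zA-Z\d\.]{2,})\.([a-zA-Z]{2,})(:5?443)?(\/\S*)?$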
Demo with filepath part being optional
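To try this outside of a regex tester, something like the following Python sketch could be used. It anchors the pattern with a closing $ so that an unexpected port cannot simply be left unmatched at the end; the function name is_valid_https_url is made up for illustration.

```python
import re

HTTPS_URL = re.compile(
    r"^https://([a-zA-Z\d.]{2,})\.([a-zA-Z]{2,})(:5?443)?(/\S*)?$"
)

def is_valid_https_url(url: str) -> bool:
    """True for https URLs whose port, if present, is 443 or 5443."""
    return HTTPS_URL.match(url) is not None

if __name__ == "__main__":
    for candidate in (
        "https://example.com",
        "https://example.com:443",
        "https://example.com:5443/some/path",
        "https://example.com:8080",
        "http://example.com",
    ):
        print(candidate, is_valid_https_url(candidate))
```

Only the first three candidates are accepted; the last two fail on the port and the scheme respectively.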
Your RegEx seems to work okay. You may try using this RegEx, which adds an extra boundary just for safety, if you wish:
^(https:\/\/)([a-zA-Z\d\.]{2,})\.([a-zA-Z]{2,})(:(5443|443))?$
I added a $ end anchor to bound your original expression on the right, and kept the colon inside the optional port group so that a URL without a port still matches. If you have more valid port numbers, you can simply add them to this capturing group:
(5443|443)
You can also remove unnecessary boundaries, if you wish.
I have this pattern (?<!')(\w*)\((\d+|\w+|.*,*)\) that is meant to match strings like:
c(4)
hello(54, 41)
Following some answers on SO, I added a negative lookbehind so that if the input string is preceded by a ', the string shouldn't match at all. However, it still partially matches.
For example:
'c(4) returns (4) even though it shouldn't match anything because of the negative lookbehind.
How do I make it so if a string is preceded by ' NOTHING matches?
Since nobody came along, I'll throw this out to get you started.
This regex will match things like
aa(a , sd,,,f,)
aa( as , " ()asdf)) " ,, df, , )
asdf()
but not
'ab(s)
This will fix the basic problem: (?<!['\w])\w*
Here (?<!['\w]) will not let the engine skip over a word char just
to satisfy the "not a quote" condition.
Then the optional \w* grabs the whole name.
So if a quote comes before it, as in 'aaa(, then it won't match at all.
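Here is a minimal Python sketch of just that basic fix. The body part [^()]* is a deliberately simple stand-in (an assumption for this example, not the body pattern from the question), and the name CALL is arbitrary.

```python
import re

# (?<!['\w]) : the match may not start right after a quote or a word char,
# so the engine cannot skip the quote by starting in the middle of the name.
CALL = re.compile(r"(?<!['\w])(\w*)\(([^()]*)\)")

if __name__ == "__main__":
    for text in ("c(4)", "hello(54, 41)", "'c(4)"):
        match = CALL.search(text)
        print(text, "->", match.groups() if match else None)
```

With the quoted input 'c(4) every possible start position is rejected: "c" follows a quote, and "(" follows a word character.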
This regex here embellishes what I think you are trying to accomplish
in the function body part of your regex.
It might be a little overwhelming to understand at first.
(?s)(?<!['\w])(\w*)\(((?:,*(?&variable)(?:,+(?&variable))*[,\s]*)?)\)(?(DEFINE)(?<variable>(?:\s*(?:"[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*')\s*|[^()"',]+)))
Readable version (via: http://www.regexformat.com)
(?s) # Dot-all modifier
(?<! ['\w] ) # Not a quote, nor word behind
# <- This will force matching a complete function name
# if it exists, thereby blocking a preceding quote '
( \w* ) # (1), Function name (optional)
\(
( # (2 start), Function body
(?: # Parameters (optional)
,* # Comma (optional)
(?&variable) # Function call, get first variable (required)
(?: # More variables (optional)
,+ # Comma (required)
(?&variable) # Variable (required)
)*
[,\s]* # Whitespace or comma (optional)
)? # End parameters (optional)
) # (2 end)
\)
# Function definitions
(?(DEFINE)
(?<variable> # (3 start), Function for a single Variable
(?:
\s*
(?: # Double or single quoted string
"
[^"\\]*
(?: \\ . [^"\\]* )*
"
|
'
[^'\\]*
(?: \\ . [^'\\]* )*
'
)
\s*
| # or,
[^()"',]+ # Not quote, paren, comma (can be whitespace)
)
) # (3 end)
)
I have this iframe code that I want to match for both the text right in the beginning of the string and continue with the code to find the "soundcloud" text:
<iframe src="https://w.soundcloud.com/player/?url=https%3A//api.soundcloud.com/tracks/297769462&color=%23ff5500&auto_play=false&hide_related=false&show_comments=true&show_user=true&show_reposts=false&show_teaser=true" width="100%" height="166" frameborder="no" scrolling="no"></iframe>
My regex is (<iframe.*?><\/iframe>), which tries to match the iframe and anything in between.
What I want is to match the <iframe at the beginning, then skip everything in between until it finds soundcloud. If both conditions are fulfilled, then it's a match.
Any help would be great, thank you.
Try this
(?i)<iframe(?=((?:[^>"']|"[^"]*"|'[^']*')*?\s(src\s*=\s*(['"])(?:(?!\3)[\S\s])*?soundcloud(?:(?!\3)[\S\s])*\3)(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>))\1\s*</iframe\s*>
https://regex101.com/r/KkJH6x/1
Formatted
(?i) # Case insensitive modifier
< iframe # The iframe tag
(?= # Assertion (a pseudo atomic group)
( # (1 start)
(?: [^>"'] | " [^"]* " | ' [^']* ' )*?
\s
( # (2 start), src attribute with 'soundcloud' in value
src \s* = \s*
( ['"] ) # (3), Quote
(?:
(?! \3 )
[\S\s]
)*?
soundcloud # 'Soundcloud'
(?:
(?! \3 )
[\S\s]
)*
\3 # Close quote
) # (2 end)
# The remainder of the tag parts
(?: " [\S\s]*? " | ' [\S\s]*? ' | [^>]*? )+
>
) # (1 end)
)
\1
\s*
</iframe \s* >
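If the full pattern is more than you need, a reduced version of the same idea can be sketched in Python. This is a simplification, not the regex above: it assumes attribute values never contain a ">" character, and the name find_soundcloud_iframe is hypothetical.

```python
import re

SOUNDCLOUD_IFRAME = re.compile(
    r"""<iframe\b[^>]*?\ssrc\s*=\s*(["'])"""  # src attribute, opening quote
    r"""(?:(?!\1).)*?soundcloud(?:(?!\1).)*"""  # value containing 'soundcloud'
    r"""\1[^>]*>\s*</iframe\s*>""",             # closing quote, rest of tag
    re.IGNORECASE | re.DOTALL,
)

def find_soundcloud_iframe(html: str):
    """Return the first iframe whose src mentions soundcloud, else None."""
    match = SOUNDCLOUD_IFRAME.search(html)
    return match.group(0) if match else None

if __name__ == "__main__":
    page = (
        '<iframe src="https://example.com/embed"></iframe> '
        "<iframe width=\"100%\" src='https://w.soundcloud.com/player/'></iframe>"
    )
    print(find_soundcloud_iframe(page))
```

The (?!\1) guard keeps the search for "soundcloud" inside the quoted src value, so a mention elsewhere in the tag does not count.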
I have an input string ("My Email id is abc @ gmail.com"). From the input string I need to find the Email id using Regex and replace it with (xxxxxxx).
I am using the below pattern, but it doesn't work if the Email Id contains white space.
\\w+([-+.']\\w+)*@\\w+([-.]\\w+)*\\.\\w+([-.]\\w+)*
Thanks.
If all you want to do is allow whitespace alongside the word characters and maintain the original
regex integrity, it starts to get ugly:
// (?=\\s*\\w)[\\w\\s]+(?:[-+.'](?=\\s*\\w)[\\w\\s]+)*@(?=\\s*\\w)[\\w\\s]+(?:[-.](?=\\s*\\w)[\\w\\s]+)*\\.(?=\\s*\\w)[\\w\\s]+(?:[-.](?=\\s*\\w)[\\w\\s]+)*
(?= \s* \w )
[\w\s]+
(?:
[-+.']
(?= \s* \w )
[\w\s]+
)*
@
(?= \s* \w )
[\w\s]+
(?:
[-.]
(?= \s* \w )
[\w\s]+
)*
\.
(?= \s* \w )
[\w\s]+
(?:
[-.]
(?= \s* \w )
[\w\s]+
)*
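One thing to watch with [\w\s]+ is that it can also swallow the ordinary words in front of the address. As a narrower sketch (an assumption, not the pattern above), the Python example below tolerates whitespace only around the @ sign and leaves the rest of the original regex untouched. The function name mask_email is made up for illustration.

```python
import re

# Original pattern, with optional whitespace allowed only around the @.
EMAIL = re.compile(r"\w+(?:[-+.']\w+)*\s*@\s*\w+(?:[-.]\w+)*\.\w+(?:[-.]\w+)*")

def mask_email(text: str, mask: str = "xxxxxxx") -> str:
    """Replace every email-like token in text with the mask."""
    return EMAIL.sub(mask, text)

if __name__ == "__main__":
    print(mask_email("My Email id is abc @ gmail.com"))
    print(mask_email("contact abc@gmail.com now"))
```

Because the local part is still plain \w+, the words before the address ("My Email id is") stay in place and only the address itself is replaced.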