Regular Expression for a single occurrence within a String - regex

I am new to regular expressions and can't work out the proper syntax for what I need. I need a regular expression for an alphanumeric string that is 1-8 characters long and contains at most one dash, but is not a single dash on its own.
Valid:
A-
-A
1234-678
ABC76-
Invalid:
-
F-1-
ABCD1234-
---
Thanks in advance!

Here is one way (sorry if this has already been posted).
# ^(?=[a-zA-Z0-9-]{1,8}$)(?=[^-]*-?[^-]*$)(?!-$).*$
^ # BOL
(?= [a-zA-Z0-9-]{1,8} $ ) # 1 - 8 alpha-num or dash
(?= [^-]* -? [^-]* $ ) # at most 1 dash
(?! - $ ) # not just a dash
.* $
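As a quick sanity check, the pattern above can be run against the examples from the question. This is only a sketch using Python's re module; re.VERBOSE lets the formatted layout be pasted as is, and the name is_valid is illustrative, not part of the answer.

```python
import re

# The formatted pattern from above. re.VERBOSE ignores the layout
# whitespace and treats "# ..." as comments.
PATTERN = re.compile(r"""
    ^                            # BOL
    (?= [a-zA-Z0-9-]{1,8} $ )    # 1 - 8 alpha-num or dash
    (?= [^-]* -? [^-]* $ )       # at most 1 dash
    (?! - $ )                    # not just a dash
    .* $
""", re.VERBOSE)


def is_valid(text):
    """Return True if text passes all three lookahead checks."""
    return PATTERN.match(text) is not None


# Examples taken from the question: the first four are valid, the rest are not.
for sample in ["A-", "-A", "1234-678", "ABC76-", "-", "F-1-", "ABCD1234-", "---"]:
    print(f"{sample!r:12} {is_valid(sample)}")
```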
Edit: To handle segments separated by commas, just extend it:
# ^(?!,)(?:(?=(?:^|,)[a-zA-Z0-9-]{1,8}(?:$|,))(?=(?:^|,)[^-]*-?[^-]*(?:$|,))(?!(?:^|,)-(?:$|,)),?[^,]*)+(?<!,)$
^ # BOL
(?! , ) # does not start with comma
(?: # Grouping
(?=
(?: ^ | , )
[a-zA-Z0-9-]{1,8} # 1 - 8 alpha-num or dash
(?: $ | , )
)
(?=
(?: ^ | , )
[^-]* -? [^-]* # at most 1 dash
(?: $ | , )
)
(?!
(?: ^ | , )
- # not just a dash
(?: $ | , )
)
,? [^,]* # consume the segment
)+ # Grouping, do many times
(?<! , ) # does not end with comma
$ # EOL
Edit2: If your engine doesn't support lookbehinds, this is the same thing without one:
# ^(?!,)(?:(?=(?:^|,)[a-zA-Z0-9-]{1,8}(?:$|,))(?=(?:^|,)[^-]*-?[^-]*(?:$|,))(?!(?:^|,)-(?:$|,))(?!,$),?[^,]*)+$
^ # BOL
(?! , ) # does not start with comma
(?: # Grouping
(?=
(?: ^ | , )
[a-zA-Z0-9-]{1,8} # 1 - 8 alpha-num or dash
(?: $ | , )
)
(?=
(?: ^ | , )
[^-]* -? [^-]* # at most 1 dash
(?: $ | , )
)
(?!
(?: ^ | , )
- # not just a dash
(?: $ | , )
)
(?! , $ ) # does not end with comma
,? [^,]* # consume the segment
)+ # End Grouping, do many times
$ # EOL
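The comma-separated variant can be exercised the same way. The sketch below uses the lookbehind-free form with Python's re module; the helper name all_segments_valid and the sample inputs are made up for illustration.

```python
import re

# Lookbehind-free pattern from Edit2, copied as a single line.
SEGMENTS = re.compile(
    r"^(?!,)(?:(?=(?:^|,)[a-zA-Z0-9-]{1,8}(?:$|,))"
    r"(?=(?:^|,)[^-]*-?[^-]*(?:$|,))"
    r"(?!(?:^|,)-(?:$|,))(?!,$),?[^,]*)+$"
)


def all_segments_valid(text):
    """Return True if every comma-separated segment obeys the rules."""
    return SEGMENTS.match(text) is not None


print(all_segments_valid("A-,1234-678"))   # every segment is fine
print(all_segments_valid("A-,-"))          # second segment is a lone dash
print(all_segments_valid("A-,F-1-"))       # second segment has two dashes
print(all_segments_valid("A,"))            # trailing comma
```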

Try this regex:
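/^(?!-$)(?!([^-]*-){2})[a-zA-Z0-9-]{1,8}$/
^ and $ anchor the match to the start and end of the string.
(?!-$) is a negative lookahead that rejects a string consisting of a single hyphen.
(?!([^-]*-){2}) is a negative lookahead that makes sure the string contains at most one hyphen.
[a-zA-Z0-9-]{1,8} matches 1 to 8 alphanumeric characters or hyphens.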
Reference: http://regular-expressions.info

how to match the iframe text, then skip and match another string in wordpress

I have this iframe code, and I want to match both the opening iframe text right at the beginning of the string and, further along in the code, the "soundcloud" text:
<iframe src="https://w.soundcloud.com/player/?url=https%3A//api.soundcloud.com/tracks/297769462&color=%23ff5500&auto_play=false&hide_related=false&show_comments=true&show_user=true&show_reposts=false&show_teaser=true" width="100%" height="166" frameborder="no" scrolling="no"></iframe>
My current regex is (<iframe.*?><\/iframe>), which matches the iframe and anything in between.
What I want is to match the opening <iframe, then skip everything in between until it finds soundcloud. If both conditions are fulfilled, then it's a match.
Any help would be great, thank you.
Try this
(?i)<iframe(?=((?:[^>"']|"[^"]*"|'[^']*')*?\s(src\s*=\s*(['"])(?:(?!\3)[\S\s])*?soundcloud(?:(?!\3)[\S\s])*\3)(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>))\1\s*</iframe\s*>
https://regex101.com/r/KkJH6x/1
Formatted
(?i) # Case insensitive modifier
< iframe # The iframe tag
(?= # Assertion (a pseudo atomic group)
( # (1 start)
(?: [^>"'] | " [^"]* " | ' [^']* ' )*?
\s
( # (2 start), src attribute with 'soundcloud' in value
src \s* = \s*
( ['"] ) # (3), Quote
(?:
(?! \3 )
[\S\s]
)*?
soundcloud # 'Soundcloud'
(?:
(?! \3 )
[\S\s]
)*
\3 # Close quote
) # (2 end)
# The remainder of the tag parts
(?: " [\S\s]*? " | ' [\S\s]*? ' | [^>]*? )+
>
) # (1 end)
)
\1
\s*
</iframe \s* >
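A short way to try the pattern outside regex101 is sketched below with Python's re module, which accepts the same syntax (a leading (?i), backreferences inside a lookahead). The sample tags are shortened, made-up variants of the one in the question, and is_soundcloud_iframe is a hypothetical helper name.

```python
import re

# The one-line pattern from above, in a raw triple-quoted string so that
# both quote characters can appear unescaped.
IFRAME_SOUNDCLOUD = re.compile(
    r'''(?i)<iframe(?=((?:[^>"']|"[^"]*"|'[^']*')*?\s(src\s*=\s*(['"])(?:(?!\3)[\S\s])*?soundcloud(?:(?!\3)[\S\s])*\3)(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>))\1\s*</iframe\s*>'''
)


def is_soundcloud_iframe(html):
    """Return True if html contains an iframe whose src mentions soundcloud."""
    return IFRAME_SOUNDCLOUD.search(html) is not None


soundcloud_tag = (
    '<iframe src="https://w.soundcloud.com/player/?url=https%3A//'
    'api.soundcloud.com/tracks/297769462" width="100%" height="166"></iframe>'
)
other_tag = '<iframe src="https://www.youtube.com/embed/abc" width="560"></iframe>'

print(is_soundcloud_iframe(soundcloud_tag))
print(is_soundcloud_iframe(other_tag))
```

Group 2 holds the whole src attribute, so `IFRAME_SOUNDCLOUD.search(soundcloud_tag).group(2)` can be used if the URL itself is needed.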

Powershell Regex to match between vertical bar ( | )

Below are just two lines of the string that I am matching against:
6 |UDP |ENABLED | |15006 |010.247.060.120 | UDP/IP Communications | UDP/IP Communications GH1870
10 |Gway |ONLINE | |41794 |127.000.000.001 | DM-MD64x64 | DM-MD64x64
Below is the regex I have so far, but it only matches the bottom line
(?i)(?<cipid>([\w\.]+))\s*\|\s*(?<ty>\w+)?\s*\|\s*(?<stat>[\w ]+)\s*\|\s*(?<devid>\w+)?\s*\|\s*(?<prt>\d+)\s*\|\s*(?<ip>([\d\.]+))\s*\|\s*(?<mdl>[\w-]+)\s*\|\s*(?<desc>.+)
I was wondering whether a regular expression could simply match every character between each pair of vertical bars, instead of my having to spell out explicitly what is between them.
Thanks all
This usually works:
(?:^|(?<=\|))[^|]*?(?=\||$)
https://regex101.com/r/KMNc47/1
Formatted
(?: ^ | (?<= \| ) ) # BOS or Pipe behind
[^|]*? # Optional non-pipe chars
(?= \| | $ ) # Pipe ahead or EOS
Here it is with whitespace trimming and a capture group.
(?:^|(?<=\|))\s*([^|]*?)\s*(?=\||$)
https://regex101.com/r/KMNc47/2
Formatted
(?: ^ | (?<= \| ) ) # BOS or Pipe behind
\s*
( [^|]*? ) # (1), Optional non-pipe chars
\s*
(?= \| | $ ) # Pipe ahead or EOS
Here it is in a Capture Collection configuration.
(?:(?:^|\|)\s*([^|]*?)\s*(?=\||$))+
https://regex101.com/r/KMNc47/3
Formatted
(?:
(?: ^ | \| ) # BOS or Pipe
\s*
( [^|]*? ) # (1), Optional non-pipe chars
\s*
(?= \| | $ ) # Pipe ahead or EOS
)+
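The trimming variant with the capture group can be checked on the first sample line. The sketch below uses Python's re module for illustration rather than PowerShell; the pattern itself is unchanged, and the variable names are arbitrary.

```python
import re

# Trimming variant from above: group 1 is the field without the
# surrounding whitespace.
FIELD = re.compile(r"(?:^|(?<=\|))\s*([^|]*?)\s*(?=\||$)")

line = ("6 |UDP |ENABLED | |15006 |010.247.060.120 "
        "| UDP/IP Communications | UDP/IP Communications GH1870")

# With a single capture group, findall returns the group 1 text of each match.
fields = FIELD.findall(line)
print(fields)

# For comparison: a plain split on the bar plus strip gives the same fields.
print([part.strip() for part in line.split("|")])
```

When nothing but splitting is needed, the plain split is the simpler choice; the regex earns its place when the fields have to be picked out as part of a larger pattern.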

Regex matching 2 similar phrases

I am trying to automatically grade a student's answer to a question where the correct answer is:
read and execute for owner and read for everyone
The order of their answer doesn't matter so
read for everyone and read and execute for owner
is an acceptable answer. The filler words (and, for) don't matter either, so I am really just looking for either of these:
read execute owner read everyone
read everyone read execute owner
I can get a regex to accept either answer
(?=.*read.*execute.*owner)(?=.*read.*everyone)
But obviously that also accepts answers that are clearly wrong, like "read execute owner read execute everyone". So I tried using a negative lookahead for "execute" with everyone, but then it also trips on the "execute" for owner and reports no match.
Is there a way to accomplish what I am trying to do? Thanks.
Just make the and/for optional.
# (?=.*(\bread(?:\s+and)?\s+execute(?:\s+for)?\s+owner\b))(?=.*(\bread(?:\s+for)?\s+everyone\b))
(?=
.*
( # (1 start)
\b read
(?: \s+ and )?
\s+ execute
(?: \s+ for )?
\s+
owner \b
) # (1 end)
)
(?=
.*
( # (2 start)
\b read
(?: \s+ for )?
\s+ everyone \b
) # (2 end)
)
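To see how this behaves on the answers from the question, here is a small sketch with Python's re module. The function name is_correct is hypothetical; the pattern is the one-line form above, split over two string literals.

```python
import re

# Two independent lookaheads: one for the owner clause, one for the
# everyone clause. The order in which they appear in the answer is free.
ANSWER = re.compile(
    r"(?=.*(\bread(?:\s+and)?\s+execute(?:\s+for)?\s+owner\b))"
    r"(?=.*(\bread(?:\s+for)?\s+everyone\b))"
)


def is_correct(answer):
    """Return True if both required clauses appear somewhere in answer."""
    return ANSWER.search(answer) is not None


print(is_correct("read and execute for owner and read for everyone"))
print(is_correct("read for everyone and read and execute for owner"))
print(is_correct("read execute owner read execute everyone"))
```

The third answer is rejected because "read" must now be followed directly by "everyone" (with at most "for" in between), so "read execute everyone" no longer satisfies the second lookahead.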
Edit: You could also allow any optional words between the keywords,
as long as none of those in-between words is itself a keyword.
Like this:
# (?=.*(\bread(?:\s+(?:(?!\b(?:read|execute|owner|everyone)\b).)+?)?\s+execute(?:\s+(?:(?!\b(?:read|execute|owner|everyone)\b).)+?)?\s+owner\b))(?=.*(\bread(?:\s+(?:(?!\b(?:read|everyone|execute|owner)\b).)+?)?\s+everyone\b))
(?=
.*
( # (1 start)
\b read
(?:
# Optional words Between Keywords -
# none of this group's or the other group's keywords
\s+
(?:
(?!
\b
(?:
read # this
| execute # this
| owner # this
| everyone # other
)
\b
)
.
)+?
)?
\s+ execute
(?:
# Optional words Between Keywords
\s+
(?:
(?!
\b
(?:
read
| execute
| owner
| everyone
)
\b
)
.
)+?
)?
\s+
owner \b
) # (1 end)
)
(?=
.*
( # (2 start)
\b read
(?:
# Optional words Between Keywords -
# none of this group's or the other group's keywords
\s+
(?:
(?!
\b
(?:
read # this
| everyone # this
| execute # other
| owner # other
)
\b
)
.
)+?
)?
\s+ everyone \b
) # (2 end)
)
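Because the same "filler" sub-pattern is repeated three times, the expression is easier to maintain when assembled from parts. The sketch below does that in Python; KEYWORD, FILLER and is_correct are illustrative names, and the sample answers other than those from the question are invented.

```python
import re

# Any of the four keywords, as a whole word.
KEYWORD = r"\b(?:read|execute|owner|everyone)\b"
# Optional filler: whitespace, then one or more characters, none of which
# is the start of a keyword.
FILLER = rf"(?:\s+(?:(?!{KEYWORD}).)+?)?"

ANSWER = re.compile(
    rf"(?=.*(\bread{FILLER}\s+execute{FILLER}\s+owner\b))"
    rf"(?=.*(\bread{FILLER}\s+everyone\b))"
)


def is_correct(answer):
    """Return True if both clauses are present, with any non-keyword filler."""
    return ANSWER.search(answer) is not None


print(is_correct("read plus execute for the owner and read only for everyone"))
print(is_correct("read execute owner read execute everyone"))
```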

Regex expression to search for token not in C++ comment

I have a code base of thousands of files and need to grep for headers in which a certain token, Q_OBJECT, is present but not in a comment. This includes single-line // comments and multi-line /* ... */ comments.
What is the regex for this search?
This should work.
Do a global search. Each match will be one of:
Group 1: a comment
Group 2: a quoted string, or non-token text
Group 3: the token text
You only care whether capture group 3 matched; it contains the token.
# (/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|//(?:[^\\]|\\\n?)*?\n)|("(?:\\[\S\s]|[^"\\])*"|'(?:\\[\S\s]|[^'\\])*'|(?!Q_OBJECT)[\S\s](?:(?!Q_OBJECT)[^/"'\\])*)|(Q_OBJECT)
# '(/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|//(?:[^\\\]|\\\\\n?)*?\n)|("(?:\\\[\S\s]|[^"\\\])*"|\'(?:\\\[\S\s]|[^\'\\\])*\'|(?!Q_OBJECT)[\S\s](?:(?!Q_OBJECT)[^/"\'\\\])*)|(Q_OBJECT)'
# "(/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/|//(?:[^\\\\]|\\\\\\n?)*?\\n)|(\"(?:\\\\[\\S\\s]|[^\"\\\\])*\"|'(?:\\\\[\\S\\s]|[^'\\\\])*'|(?!Q_OBJECT)[\\S\\s](?:(?!Q_OBJECT)[^/\"'\\\\])*)|(Q_OBJECT)"
( # (1 start), Comments
/\* # Start /* .. */ comment
[^*]* \*+
(?: [^/*] [^*]* \*+ )*
/ # End /* .. */ comment
|
// # Start // comment
(?: [^\\] | \\ \n? )*? # Possible line-continuation
\n # End // comment
) # (1 end)
|
( # (2 start), Non-comments
"
(?: \\ [\S\s] | [^"\\] )* # Double quoted text
"
| '
(?: \\ [\S\s] | [^'\\] )* # Single quoted text
'
|
(?! Q_OBJECT )
[\S\s] # Any other char, but not these special tokens
# Chars which don't start a comment, string, escape,
# or line continuation (escape + newline)
(?: # But not these special tokens
(?! Q_OBJECT )
[^/"'\\]
)*
) # (2 end)
|
( # (3 start), Special Tokens
Q_OBJECT
) # (3 end)
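The "global search, then look at group 3" idea can be sketched in Python as follows. The helper has_live_token and the sample sources are invented for illustration. The sketch also assumes every // comment ends with a newline, since the pattern requires one to close the comment.

```python
import re

# The raw one-line pattern from above, in a raw triple-quoted string so
# that both quote characters can appear unescaped.
TOKEN_SCANNER = re.compile(
    r'''(/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|//(?:[^\\]|\\\n?)*?\n)|("(?:\\[\S\s]|[^"\\])*"|'(?:\\[\S\s]|[^'\\])*'|(?!Q_OBJECT)[\S\s](?:(?!Q_OBJECT)[^/"'\\])*)|(Q_OBJECT)'''
)


def has_live_token(source):
    """Return True if Q_OBJECT occurs outside comments and quoted strings."""
    return any(m.group(3) is not None for m in TOKEN_SCANNER.finditer(source))


real_use = "\n".join([
    "// Q_OBJECT in a line comment",
    "/* Q_OBJECT in a",
    "   block comment */",
    "class Foo : public QObject {",
    "    Q_OBJECT",
    "};",
    "",
])
only_commented = "// Q_OBJECT\nclass Bar {};\n"

print(has_live_token(real_use))
print(has_live_token(only_commented))
```

Run over a code base, the same function could be applied to the text of each header file, keeping the file names for which it returns True.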

Splitting string having special characters, words, numbers and URL

I have a .txt file which contains:
"'the url address i checked is: https://www.google.com/ for 2times and it's awesome!."
After parsing, the expected output should be:
['"',"'",'the','url','address','i','checked','is',':','https://www.google.com/','for','2','times','and',"it's",'awesome','!','.','"']
How do I split this string to get that output using the re module?
I came up with this pattern:
pattern = re.compile(r"\d+|[a-zA-Z]+[a-zA-Z']*|[^\w\s]")
but this also splits my URL.
Can anyone please help?
Just pick a url regex from somewhere and make it first in the alternations.
An example only -
# (?!mailto:)(?:(?:https?|ftp)://)?(?:\S+(?::\S*)?@)?(?:(?:(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\u00a1-\uffff0-9]+-?)*[a-z\u00a1-\uffff0-9]+)(?:\.(?:[a-z\u00a1-\uffff0-9]+-?)*[a-z\u00a1-\uffff0-9]+)*(?:\.(?:[a-z\u00a1-\uffff]{2,})))|localhost)(?::\d{2,5})?(?:/[^\s]*)?|\d+|[a-zA-Z]+[a-zA-Z']*|[^\w\s]
(?! mailto: )
(?:
(?: https? | ftp )
://
)?
(?:
\S+
(?: : \S* )?
@
)?
(?:
(?:
(?:
[1-9] \d?
| 1 \d\d
| 2 [01] \d
| 22 [0-3]
)
(?:
\.
(?: 1? \d{1,2} | 2 [0-4] \d | 25 [0-5] )
){2}
(?:
\.
(?:
[1-9] \d?
| 1 \d\d
| 2 [0-4] \d
| 25 [0-4]
)
)
| (?:
(?: [a-z\u00a1-\uffff0-9]+ -? )*
[a-z\u00a1-\uffff0-9]+
)
(?:
\.
(?: [a-z\u00a1-\uffff0-9]+ -? )*
[a-z\u00a1-\uffff0-9]+
)*
(?:
\.
(?: [a-z\u00a1-\uffff]{2,} )
)
)
| localhost
)
(?: : \d{2,5} )?
(?: / [^\s]* )?
| \d+
| [a-zA-Z]+ [a-zA-Z']*
| [^\w\s]
Outputs:
['"',"'",'the','url','address','i','checked','is',':','https://www.google.com/','for','2','times','and',"it's",'awesome','!','.','"']
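The principle (URL alternative first, then the original alternatives) can be shown with a much shorter URL pattern. In the sketch below, `https?://\S+` is a deliberately simplified stand-in for the long expression above, not a replacement for it: it only recognises http and https URLs and would swallow punctuation glued to the end of a URL.

```python
import re

# URL alternative first, so it wins before the word and punctuation
# alternatives get a chance to split the URL apart.
TOKEN = re.compile(r"https?://\S+|\d+|[a-zA-Z]+[a-zA-Z']*|[^\w\s]")

text = '"\'the url address i checked is: https://www.google.com/ for 2times and it\'s awesome!."'

tokens = TOKEN.findall(text)
print(tokens)
```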