I have strings:
TH 8H 5C QS TC
9S 4S JS KS JS
I want the second one to be matched by a regex. Please help me construct the necessary expression.
What I have tried so far is S{5}, but of course it only looks for five consecutive S characters.
Can I avoid specifying which character comes before each S? I need 5 repetitions of any character followed by S. Could it be something like .{5}?
Thanks in advance!
If you have standalone strings, use
^\wS(?: \wS){4}$
See the regex demo
If these strings appear inside a larger text, replace the ^ and $ anchors with word boundaries \b:
\b\wS(?: \wS){4}\b
See another demo
Note that \w matches any alphanumeric or underscore character. If there can be any non-whitespace character, use \S instead:
\b\SS(?: \SS){4}\b
One more demo
\SS matches a non-whitespace character followed by an S, and (?: \SS){4} matches four more such sequences, each preceded by a space (so there are five 2-character sequences in total, each ending in S).
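The question doesn't name a language. As a quick check, this sketch runs both patterns with Python's re module. The sentence in `text` is invented for illustration.

```python
import re

# The two sample hands from the question.
hands = ["TH 8H 5C QS TC", "9S 4S JS KS JS"]

# Standalone strings: anchor the whole string with ^ and $.
standalone = re.compile(r"^\wS(?: \wS){4}$")
print([h for h in hands if standalone.search(h)])

# Inside a larger text: word boundaries instead of anchors, \S for the rank.
embedded = re.compile(r"\b\SS(?: \SS){4}\b")
text = "Hands: TH 8H 5C QS TC and 9S 4S JS KS JS."
m = embedded.search(text)
print(m.group() if m else None)
```

Only the all-spades hand is found in both cases. The first hand fails at its very first pair, because TH does not end in S.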
/[\w|A-Z]{1,3}[a-z]/g
but I want to match only the first 3 characters of each word.
For example:
I WANt THE FIRst 3 CHAr OF WORds ONLy.
It's for a speed-reading tool: only the beginning of each word is uppercased.
Ideally the match would be: (first 3 chars)(rest of the word or space)
https://regex101.com/r/PCi8Dn/2
Thank you!
Original answer
Use a positive lookahead, (?=[pattern]), to require a pattern without including it in the match:
[A-Z]{1,3}(?=[a-z])
appears to do what you want (if I've understood your spec correctly).
You can see it in action here.
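This sketch uses Python's re module, which is an assumption since the question's /.../g syntax suggests JavaScript. The lookahead pattern behaves the same way in both. It runs the pattern against the question's own example sentence:

```python
import re

sample = "I WANt THE FIRst 3 CHAr OF WORds ONLy."

# [A-Z]{1,3} grabs up to three capitals; the lookahead then requires a
# lowercase letter next, but that letter is not part of the match.
prefixes = re.findall(r"[A-Z]{1,3}(?=[a-z])", sample)
print(prefixes)  # ['WAN', 'FIR', 'CHA', 'WOR', 'ONL']
```

Short words such as I, THE and OF are not matched, because no lowercase letter follows them. That is why the spec needed clarifying.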
New answer following clarification on spec
I think this does what you want:
(\S{1,3})(\S*[\s\.]+)
The breakdown is:
1st capturing group: (\S{1,3})
Matches one to three non-space characters (\S is used instead of \w because I think you want to match characters with diacritics, like à, and punctuation in the middle of words, like ').
2nd capturing group: (\S*[\s\.]+)
Matches zero or more non-space characters (the remaining characters in each word) followed by one or more delimiter characters (whitespace or period). I included the period as a delimiter to match the last word. You might want to adjust that part depending on your exact needs.
See it in action here.
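Here is a sketch of the speed-reading transform built on this pattern. It assumes Python's re.sub with a replacement function; `emphasize` is a hypothetical helper name. Group 1 is uppercased and group 2 is kept as-is:

```python
import re

# The answer's pattern: (first 1-3 non-space chars)(rest of word + delimiters).
PATTERN = re.compile(r"(\S{1,3})(\S*[\s.]+)")

def emphasize(text):
    """Hypothetical helper: uppercase the first 3 characters of each word."""
    return PATTERN.sub(lambda m: m.group(1).upper() + m.group(2), text)

print(emphasize("I want the first 3 char of words only."))
# I WANt THE FIRst 3 CHAr OF WORds ONLy.
```

A final word that is not followed by whitespace or a period is left unchanged. This is the reason the period was added to the delimiter class.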
I'm trying to search for colons in a given string so as to split the string at the colon for preprocessing, based on the following conditions:
1. Match if the colon is preceded or followed by a word, e.g. A Book: Chapter 1 or A Book :Chapter 1
2. Do not match if it is part of an emoticon, e.g. :( or ): or :/ or :-)
3. Do not match if it is part of a time, e.g. 16:00
I've come up with a regex as such
(\:)(?=\w)|(?<=\w)(\:)
which satisfies conditions 1 and 2 but still fails condition 3, as it matches the colon in the string representation of a time. How do I fix this?
Edit: it has to be a single regex, if possible.
You can use
(:\b|\b:)(?!(?:(?<=\b\d:)|(?<=\b\d{2}:))\d{1,2}\b)
See the regex demo. Details:
(:\b|\b:) - Group 1: a : that is either preceded or followed by a word char
(?!(?:(?<=\b\d:)|(?<=\b\d{2}:))\d{1,2}\b) - if the : is preceded by one or two digits (with a word boundary before them), it must not be followed by one or two digits and a word boundary. This skips the colon in a time like 16:00.
Note that :\b is equivalent to :(?=\w) and \b: is equivalent to (?<=\w):.
If you need to get the same capturing groups as in your original pattern, replace (:\b|\b:) with (?:(:)\b|\b(:)).
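Here is a sketch of the splitting step built on the single-regex answer. It assumes Python's re module, which accepts these fixed-width lookbehinds. `split_on_colons` is a hypothetical helper, and the sample strings come from the question's conditions:

```python
import re

COLON = re.compile(r"(:\b|\b:)(?!(?:(?<=\b\d:)|(?<=\b\d{2}:))\d{1,2}\b)")

def split_on_colons(text):
    """Hypothetical helper: split text at every colon the pattern accepts."""
    parts, last = [], 0
    for m in COLON.finditer(text):
        parts.append(text[last:m.start()])
        last = m.end()
    parts.append(text[last:])
    return parts

for s in ["A Book: Chapter 1", "A Book :Chapter 1", "Meet at 16:00 :)", "Sad :( or :-)"]:
    print(repr(s), "->", split_on_colons(s))
```

The first two strings are split at the colon. The time and the emoticons come back as a single, unsplit part.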
More flexible solution
Note that excluding matches can also be done with a simpler pattern. It matches (without capturing) what you do not need, and matches and captures what you do need. This is often called the "best regex trick ever". So, you may use a regex like
8:|:[PD]|\d+(?::\d+)+|(:\b|\b:)
that will match 8:, :P, :D, one or more digits and then one or more sequences of : and one or more digits, or will match and capture into Group 1 a : char that is either preceded or followed with a word char. All you need to do is to check if Group 1 matched, and implement required extraction/replacement logic in the code.
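Here is a sketch of the "check if Group 1 matched" logic. It assumes Python's re module; `wanted_colon_positions` is a hypothetical helper, and the sample sentence is made up:

```python
import re

# Unwanted cases (8:, :P, :D, digit:digit runs) are matched but not captured;
# only a colon next to a word char lands in group 1.
trick = re.compile(r"8:|:[PD]|\d+(?::\d+)+|(:\b|\b:)")

def wanted_colon_positions(text):
    """Hypothetical helper: indexes of colons captured by group 1."""
    return [m.start(1) for m in trick.finditer(text) if m.group(1) is not None]

sample = "Time 16:00, A Book: Chapter 1 :P"
print(wanted_colon_positions(sample))  # [18]
```

The time 16:00 and the emoticon :P are consumed by the non-capturing alternatives. Only the colon after "Book" is reported.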
The word character class \w also matches digits and the underscore (in ASCII mode it is equivalent to [a-zA-Z0-9_]).
So just use [a-zA-Z] instead:
(\:)(?=[a-zA-Z])|(?<=[a-zA-Z])(\:)
Test Here
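Here is a quick check of the letters-only variant. It assumes Python's re module; `colon_positions` is a hypothetical helper:

```python
import re

letters_only = re.compile(r"(\:)(?=[a-zA-Z])|(?<=[a-zA-Z])(\:)")

def colon_positions(text):
    """Hypothetical helper: start index of every colon the pattern matches."""
    return [m.start() for m in letters_only.finditer(text)]

print(colon_positions("A Book: Chapter 1"))  # [6]
print(colon_positions("Meet at 16:00"))      # []
print(colon_positions("Nice :P"))            # [5]
```

Digits no longer count as word characters, so the time is skipped. Emoticons that use a letter, such as :P, are still matched.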
I have a simple regex, [0-9a-zA-Z]{32,45}, that matches 32 to 45 characters from 0-9, a-z and A-Z. Is there a way to have the regex skip a certain length range? For example, I don't want to match if there are exactly 40 characters.
One way to do that would be
\b[0-9a-zA-Z]{32,39}+(?:[0-9a-zA-Z]{2,6})?\b
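See proof. You match 32 to 39 occurrences possessively, then an optional occurrence of 2 to 6 repetitions of the pattern. The possessive quantifier never gives characters back. So a 40-character run leaves exactly one character for the optional group, which needs at least two. The group is skipped, the trailing \b fails, and the engine cannot backtrack into the first part, so the run is rejected. Runs of 32-39 and 41-45 characters still match. Possessive quantifiers are not available in every flavor; JavaScript and .NET, for example, lack them.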
Another way is to use an alternation (|) that repeats the character class either 41-45 times or 32-39 times, with a word boundary \b prepended and appended to the pattern:
\b(?:[0-9a-zA-Z]{41,45}|[0-9a-zA-Z]{32,39})\b
Regex demo
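Here is a sketch that checks which run lengths the alternation accepts. It assumes Python's re module. The test tokens are synthetic strings of repeated letters, not real data:

```python
import re

# Alternation from the answer: 41-45 or 32-39 alphanumerics, bounded by \b.
pattern = re.compile(r"\b(?:[0-9a-zA-Z]{41,45}|[0-9a-zA-Z]{32,39})\b")

# Try every run length from 30 to 47 on a token made of n "a" characters.
accepted = [n for n in range(30, 48) if pattern.fullmatch("a" * n)]
print(accepted)

# In running text, a 40-character run is skipped and a 35-character run is found.
text = "x" * 40 + " " + "y" * 35
print([len(tok) for tok in pattern.findall(text)])  # [35]
```

The printed list runs from 32 to 45 with 40 missing. The word boundaries stop the engine from matching a 32-39 character slice inside the longer run.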
I am using the .NET flavor of regex.
Suppose I have the string 123456789AB
and I want to match AB (it could be any two capital letters) only if the numeric part (123456789) contains both 5 and 8.
So what I came up with was
(?=5)(?=8)([A-Z]{2})
But this is not working.
After some trial and error on RegexStorm
I got to
(?=(.*5))(?=(.*8))[A-Z]{2}
I expected it to start matching at the beginning of the string, since a lookahead does not consume any characters.
But the part "[A-Z]{2}" does not move ahead to match AB in the input string.
My question is why is that so?
I know that replacing it with .*[A-Z]{2} makes it move ahead, but then the match contains the entire string.
What is the solution in this case, other than putting the letter part ([A-Z]{2}) in a separate group and then retrieving only that group?
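Lookaheads check for a pattern match immediately to the right of the current position in the string. (?=(.*5))(?=(.*8)) matches a location that is followed by any 0 or more characters other than line breaks (as many as possible) and then a 5. Then, at the same position, a second check requires an 8 after any 0 or more characters. Because the lookaheads consume nothing, [A-Z]{2} has to match at that very same position. At the start of the string the next characters are 12, not two letters. By the time the engine reaches AB, there is no 5 or 8 left to its right. So no position passes all three checks.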
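This sketch reproduces the failure in Python's re module. That is an assumption: the question uses .NET, but lookaheads work the same way here. Python's re does not support variable-length lookbehind, so the lookbehind answer cannot run in it. The second half instead shows the capture-group workaround the question mentions: the lookaheads are anchored at ^ and the letters are captured in group 1.

```python
import re

text = "123456789AB"

# The question's pattern: both lookaheads and [A-Z]{2} are tested at the
# same position, so no position satisfies all three.
original = re.compile(r"(?=(.*5))(?=(.*8))[A-Z]{2}")
print(original.search(text))  # None

# Capture-group workaround (not the lookbehind answer): check for 5 and 8
# from the start, skip the non-letters, capture the two letters.
anchored = re.compile(r"^(?=[^A-Z]*5)(?=[^A-Z]*8)[^A-Z]*([A-Z]{2})")
m = anchored.search(text)
print(m.group(1) if m else None)    # AB
print(anchored.search("1234679AB"))  # None: no 5 and no 8
```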
You may use as many lookbehinds as there are required substrings before the two letters:
(?s)(?<=5.*?)(?<=8.*?)[A-Z]{2}
See the regex demo
Details
(?s) - makes the . match newline characters, too
(?<=5.*?) - a location that is immediately preceded with 5 and then 0 or more chars as few as possible
(?<=8.*?) - a location that is immediately preceded with 8 and then 0 or more chars as few as possible
[A-Z]{2} - two ASCII uppercase letters.
An alternative would be to "unfold" what you expect to match, using negated character classes and an alternation that covers both orders (5 before 8, or 8 before 5). Not pretty, but pretty fast:
(?<=\b[^58]*?(?:5[^8]*8|8[^5]*5)[^A-Z]*?)[A-Z]{2}
One of these days I'll learn regex.
I have the following filename:
PE-run1000hbgmm3f1-job1000hbgmm3dt-Output-Workflow-1000hbgmm3fb-22.07.17.log
I'm able to get this much working:
(?<logtype>[^-]+)-(?<run_id>[^-]+)-(?<job_id>[^-]+)-(?<capability>[^(0-9\.0-9\.0-9)]+)
logtype: PE
run_id: run1000hbgmm3f1
job_id: job1000hbgmm3dt
But I'm getting
capability: Output-Workflow-
...though I want it to be
capability: Output-Workflow-1000hbgmm3fb
...that is, all the text after the job_id up to the timestamp HH.mm.ss. Any help please? Thanks!
This happens because a negated character class cannot negate a sequence of characters. [^(0-9\.0-9\.0-9)] matches any single character other than (, a digit, . and ). So the + stops right before the first digit it meets, the 1 in 1000hbgmm3fb.
You may replace your (?<capability>[^(0-9\.0-9\.0-9)]+) with (?<capability>.*?)-\d{2}\.\d{2}\.\d{2} to get the right value.
Now, (?<capability>.*?)-\d{2}\.\d{2}\.\d{2} matches any 0 or more characters and captures them into the "capability" group. It takes as few as possible, since *? is a lazy quantifier, up to the first occurrence of - followed by 2 digits and then 2 sequences of a dot (\.) followed by 2 digits.
See the regex demo at regex101.com.
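This sketch runs the original and fixed patterns side by side in Python's re module, which is an assumption: the question doesn't name its tool. Python requires the (?P<name>...) spelling for named groups, so the group syntax differs from the question while the logic is the same.

```python
import re

filename = "PE-run1000hbgmm3f1-job1000hbgmm3dt-Output-Workflow-1000hbgmm3fb-22.07.17.log"

# Shared prefix: three hyphen-separated fields.
head = r"(?P<logtype>[^-]+)-(?P<run_id>[^-]+)-(?P<job_id>[^-]+)-"

# Original attempt: the negated class stops at the first digit.
broken = re.compile(head + r"(?P<capability>[^(0-9\.0-9\.0-9)]+)")
# Fix: lazily match everything up to "-NN.NN.NN".
fixed = re.compile(head + r"(?P<capability>.*?)-\d{2}\.\d{2}\.\d{2}")

print(broken.match(filename).group("capability"))  # Output-Workflow-
print(fixed.match(filename).groupdict())
```

The fixed pattern yields logtype PE, run_id run1000hbgmm3f1, job_id job1000hbgmm3dt and capability Output-Workflow-1000hbgmm3fb.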