I have a problem with my regular expression: I am trying to extract a string/number/whatever after a special string.
I have this string:
TEST      3098
There are 6 spaces between TEST and its value, but I am not quite sure if it is always 6 spaces.
I am trying this regular expression (PCRE):
(?<=TEST\s\s\s\s\s\s).*?(?=\s)
The result should be 3098. With my regular expression I get the right result, but it is not robust enough: if the number of spaces changes, I won't be able to extract the value.
The lookbehind has to be of limited size.
Any suggestions?
You may use
TEST\s*\K\S+
If the number of whitespaces must fall within some min/max range, use a limiting quantifier: \s{2,} will match two or more, and \s{1,10} will allow 1 to 10 whitespaces.
Details
TEST - TEST
\s* - 0 or more whitespaces
\K - match reset operator that omits the text matched so far from the overall match memory buffer
\S+ - 1+ non-whitespaces
See the regex demo
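For anyone trying this outside PCRE: Python's standard re module does not support \K, so the sketch below swaps in a capturing group to get the same effect. The choice of Python and the helper name value_after_test are my own assumptions, not something from the question.

```python
import re

# Same idea as TEST\s*\K\S+, but with a capturing group instead of \K,
# because the standard-library re module has no \K operator.
VALUE_AFTER_TEST = re.compile(r"TEST\s*(\S+)")

def value_after_test(text):
    """Return the first run of non-whitespace after TEST, or None."""
    match = VALUE_AFTER_TEST.search(text)
    return match.group(1) if match else None

print(value_after_test("TEST      3098 "))
print(value_after_test("TEST 3098"))
```

The number of whitespace characters between TEST and the value no longer matters, since \s* absorbs all of them.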
Related
I would like to match 10 characters after the second pattern:
My String:
www.mysite.de/ep/3423141549/ep/B104RHWZZZ?something
What I want to be matched:
B104RHWZZZ
What the regex currently matches:
B104RHWZZZ?something
Currently, my Regex looks like this:
(?<=\/ep\/)(?:(?!\/ep\/).)*$
Could someone help me to change the regex that it only matches 10 characters after the second "/ep/" ("B104RHWZZZ")?
It depends on which characters you allow to match. If you want to allow 10 non-whitespace characters other than / or ?, then you could use:
(?<=\/ep\/)[^\/?\s]{10}(?=[^\/\s]*$)
Explanation
(?<=\/ep\/) Assert /ep/ directly to the left
[^\/?\s]{10} Match 10 times any non whitespace character except for / and ?
(?=[^\/\s]*$) Assert that no / or whitespace character occurs to the right, up to the end of the string
Regex demo
Or matching 1+ chars other than / ? & instead of exactly 10:
(?<=\/ep\/)[^\/?&\s]+(?=[^\/\s]*$)
Regex demo
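A quick way to check both lookaround patterns, sketched in Python (an assumption on my part, since the question does not name a language). The lookbehind here has a fixed width, so the standard re module accepts it. The slashes are left unescaped because Python patterns have no delimiter.

```python
import re

URL = "www.mysite.de/ep/3423141549/ep/B104RHWZZZ?something"

# Exactly 10 allowed characters after the last /ep/.
EXACTLY_TEN = re.compile(r"(?<=/ep/)[^/?\s]{10}(?=[^/\s]*$)")
# One or more allowed characters after the last /ep/.
ONE_OR_MORE = re.compile(r"(?<=/ep/)[^/?&\s]+(?=[^/\s]*$)")

def last_ep_id(pattern, text):
    """Return the matched id after the last /ep/, or None."""
    match = pattern.search(text)
    return match.group(0) if match else None

print(last_ep_id(EXACTLY_TEN, URL))
print(last_ep_id(ONE_OR_MORE, URL))
```

The first /ep/ segment is rejected because a / follows it, which makes the lookahead fail there.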
This would match the string as matching group 1:
ep\/\w+\/ep\/(\w+)
https://regex101.com/r/9tUjxG/1
While lookarounds can make this expression more sophisticated so that you won't require matching groups, they make (in my experience) the expression hard to read, understand and maintain/extend.
That's why I would always keep regexes as simple as possible.
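As a rough illustration of the capture-group approach, here is how it might look in Python (the language and the function name second_ep_segment are assumptions for the example):

```python
import re

# ep/<first segment>/ep/<second segment>; the second segment is group 1.
SECOND_EP = re.compile(r"ep/\w+/ep/(\w+)")

def second_ep_segment(text):
    """Return the word characters after the second /ep/, or None."""
    match = SECOND_EP.search(text)
    return match.group(1) if match else None

print(second_ep_segment("www.mysite.de/ep/3423141549/ep/B104RHWZZZ?something"))
```

Since \w does not match ?, the group stops right before the query string.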
I would like to check all the strings with the format hostname abc_pqr_xyz in a file. Need a regex for this. There should be exactly 2 _'s and 3 words in the string.
I have tried using the regex ^hostname \s+.*_.*_.*
But it is giving a positive result for abc_abc_abc_abc_abc, as it is considering abc_abc_abc as one word.
You may use a [^_] negated character class that matches any char but _ instead of .:
^hostname\s+[^_]*_[^_]*_[^_]*$
See the regex demo.
Note the $ at the end, which anchors the match at the end of the string.
Also, a literal space before \s+ requires one space followed by 1 or more whitespace chars, i.e. at least two whitespace characters in total. That space may be harmful, which is why I removed it from the expression.
Note you may group the _[^_]* and then set the number of repetitions that you may adjust in the future:
^hostname\s+[^_]*(?:_[^_]*){2}$
See this regex demo.
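To check the lines of a file with this pattern, something like the following Python sketch could work (Python and the helper name matching_lines are assumptions; the question does not say which tool is used). Lines are tested one at a time, because [^_] also matches newlines and could otherwise run across line boundaries.

```python
import re

# hostname, whitespace, then exactly two underscores in the rest of the line.
HOSTNAME_LINE = re.compile(r"^hostname\s+[^_]*(?:_[^_]*){2}$")

def matching_lines(lines):
    """Keep only the lines that contain exactly two underscores after hostname."""
    return [line for line in lines if HOSTNAME_LINE.match(line.rstrip("\n"))]

sample = [
    "hostname abc_pqr_xyz\n",
    "hostname abc_abc_abc_abc_abc\n",
    "hostname abc_def\n",
]
print(matching_lines(sample))
```

Note that [^_]* also accepts empty words and spaces, so a stricter word definition (for example \w in place of [^_], with _ excluded) may be needed depending on the data.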
I am trying to write some regex that will match a string that contains 4 or more letters in it that are not necessarily in sequence.
The input string can have a mix of upper and lowercase letters, numbers, non-alpha chars etc, but I only want it to pass the regex test if it contains at least 4 upper or lowercase letters.
An example of what I would like to be a valid input can be seen below:
a124Gh0st
I have currently written this piece of regex:
(?(?=[a-zA-Z])([a-zA-Z])| )
Which returns 5 matches successfully, but it will currently always pass as long as I have at least 1 letter in the input string. If I add {4,} to the end of it then it works, but only in situations where there are 4 letters in a row.
I am using the following website to test what I have been doing: regex101
Any help on this would be greatly appreciated.
You may use
(?s)^([^a-zA-Z]*[A-Za-z]){4}.*
or
^([^a-zA-Z]*[A-Za-z]){4}[\s\S]*
See the regex demo.
Details:
^ - start of string
([^a-zA-Z]*[A-Za-z]){4} - exactly 4 sequences of:
[^a-zA-Z]* - 0+ chars other than ASCII letters
[A-Za-z] - an ASCII letter
[\S\s]* - any 0+ chars (same as .* if the DOTALL modifier is enabled).
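A small Python sketch of the same check (Python is my assumption here). For a pass/fail test the trailing .* can be dropped, since only the first four letters matter, and a non-capturing group is enough:

```python
import re

# Four repetitions of: any non-letters, then one ASCII letter.
AT_LEAST_FOUR_LETTERS = re.compile(r"^(?:[^a-zA-Z]*[A-Za-z]){4}")

def has_four_letters(text):
    """True if text contains at least 4 ASCII letters, in any positions."""
    return AT_LEAST_FOUR_LETTERS.match(text) is not None

print(has_four_letters("a124Gh0st"))
print(has_four_letters("a1b2c3"))
```

The negated class [^a-zA-Z] also matches line breaks, so no DOTALL flag is needed for this shortened form.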
Why don't you just match the zero or more characters between each letter? For example,
(?:[A-Za-z].*){4}
You'll recognize the [A-Za-z]. The . matches any character, so .* is a run of any number (including zero) of any character. The group of a letter followed by any number of any characters is repeated four times, so this pattern matches if and only if at least four letters appear in the string. (Note that the trailing .* of the fourth repeat of the pattern is mostly inconsequential, since it can match zero characters).
If you are using a regex language that supports reluctant quantifiers, then using them will make this pattern considerably more efficient. For example, in Java or Perl, one might prefer to use
(?:[A-Za-z].*?){4}
The .*? still matches any number of any character, but the matching algorithm will match as few characters as possible with each such run. This will reduce the amount of backtracking it needs to perform. For this particular pattern, it will reduce the needed backtracking to zero.
If you do not have reluctant quantifiers in your regex dialect, then you can achieve the same desirable effect a bit more verbosely:
(?:[A-Za-z][^A-Za-z]*){4}
There, only non-letters are matched for the runs between letters.
Even with this, the pattern uses some regex features not present in all regex flavors -- non-capturing groups, enumerated quantifiers -- but these are present in your original regex. For a maximally-compatible form, you might write
[A-Za-z][^A-Za-z]*[A-Za-z][^A-Za-z]*[A-Za-z][^A-Za-z]*[A-Za-z]
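To convince yourself that these forms accept the same inputs, here is a hedged Python sketch that compares each of them against a plain letter count. Python, the VARIANTS table and the helper names are assumptions made for the example; the DOTALL flag is passed so that . also matches line breaks.

```python
import re
import string

# Four ways of asking "are there at least four ASCII letters?"
VARIANTS = {
    "greedy": r"(?:[A-Za-z].*){4}",
    "reluctant": r"(?:[A-Za-z].*?){4}",
    "negated class": r"(?:[A-Za-z][^A-Za-z]*){4}",
    "spelled out": r"[A-Za-z][^A-Za-z]*[A-Za-z][^A-Za-z]*[A-Za-z][^A-Za-z]*[A-Za-z]",
}

def count_ascii_letters(text):
    """Count ASCII letters without any regex, as a reference result."""
    return sum(ch in string.ascii_letters for ch in text)

def agrees_with_count(text):
    """True if every variant gives the same verdict as the letter count."""
    expected = count_ascii_letters(text) >= 4
    return all(
        (re.search(pattern, text, re.DOTALL) is not None) == expected
        for pattern in VARIANTS.values()
    )

for sample in ["a124Gh0st", "ab1c", "1234", "a-b-c-d", "ab\ncd"]:
    print(repr(sample), agrees_with_count(sample))
```

The variants differ in how much work the engine does, not in which strings they accept.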
Which regex needs to be used to extract 'Manchester City' from the string?
String is:
Aston Villa - Manchester City
I tried -(.*)\w|-(.), but it grabs - .
Note that -(.*)\w|-(.) matches - since both the alternatives here start with matching a hyphen. You can usually check if something is present or not with a lookaround.
However, in this case, I'd suggest
-\s*\K[^-]+$
Since you only need to match the substring after the last - with spaces trimmed off, you need something like a positive infinite-width lookbehind, (?<=-\s*). However, PCRE does not support infinite-width lookbehind. Instead, there is the \K operator, which makes the engine drop everything matched so far by the current pattern from the returned match.
See a regex demo
Breakdown:
- - a literal hyphen
\s* - zero or more whitespace characters
\K - operator that resets (empties) the match buffer kept so far
[^-]+ - one or more characters other than - up to ...
$ - the end of the string.
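In an engine without \K, a capturing group gives the same result. A minimal sketch in Python (my assumption; its standard re module has neither \K nor variable-width lookbehind), with a hypothetical helper name:

```python
import re

# Hyphen, optional whitespace, then everything up to the end of the string
# that is not a hyphen. Group 1 plays the role of the text after \K.
AFTER_LAST_HYPHEN = re.compile(r"-\s*([^-]+)$")

def text_after_last_hyphen(text):
    """Return the part after the last hyphen, without leading spaces."""
    match = AFTER_LAST_HYPHEN.search(text)
    return match.group(1) if match else None

print(text_after_last_hyphen("Aston Villa - Manchester City"))
```

Because [^-]+ must reach the end of the string, only the last hyphen can start a successful match.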
The simplest is .*- (.*) and your data is in $1 or \1 or something else, depending on your tool. That assumes the data is in the format xxxxx-xxxxxx.
Another simple option is - (.*) see: https://regex101.com/r/fY3oE7/1. Use the first capturing group in your language to get the part after the dash.
I need a regular expression that matches only numbers of length 7 (they can have leading zeros). I used the following super easy regex: \b[0-9]{7}\b. However, this regex also matches numbers in e.g. 5254-6408499 and (0241)4013999 (see https://regex101.com/r/zF5hV7/1).
How can I prevent them from being matched? I only want numbers of length 7 having leading and/or trailing spaces.
Depending on the regular expression flavor, you could create your own boundaries:
(?<=^| )\d{7}(?= |$)
This asserts that either the beginning of the string or a space precedes the match, then matches exactly 7 digits, and finally asserts that either a space or the end of the string follows.
You can use this regex:
(?:^|\s)([0-9]{7})(?:\s|$)
and grab captured group #1
Updated RegEx Demo
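One flavor-specific note, as a sketch only: Python's standard re module rejects (?<=^| ) because the lookbehind alternatives have different widths. The variant below swaps in negative lookarounds on \S, which is not taken from the answers above. It treats any whitespace (not just a space) or the string boundary as a delimiter. The helper name is hypothetical.

```python
import re

# 7 digits that are neither preceded nor followed by a non-whitespace char.
SEVEN_DIGITS = re.compile(r"(?<!\S)\d{7}(?!\S)")

def find_seven_digit_numbers(text):
    """Return all standalone 7-digit numbers, leading zeros included."""
    return SEVEN_DIGITS.findall(text)

print(find_seven_digit_numbers(
    "call 1234567 or 5254-6408499 or (0241)4013999 and 0012345 7654321"
))
```

Since lookarounds consume nothing, two numbers separated by a single space are both found, which the consuming (?:^|\s)...(?:\s|$) form would not do in a find-all loop.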