Regular Expression should not match other prefixes - regex

If, for instance, I have these words:
john=14
adam=21
ben=11
john=18
johan=17
john=141
...
and the task is to find all occurrences of john=14.
I came up with the following regular expression: .*=[^14].*\n, which matches every string without a leading 1 after the equals sign.
However, I want to match exactly john=14 and nothing else in this example (and also in permutations of this example). It doesn't matter whether john=14 occurs once or several times. I thought about negating the regular expression, so that it finds every string that isn't equal to the one I am looking for, but I had a problem with that regular expression ([^\bjohn\b=14]\n).
Any help would be appreciated :)!

You need to use a negative lookahead.
^(?!john=14$).*
The negative lookahead at the start asserts that the line about to be matched is not exactly john=14. If that assertion holds, .* matches all the characters of the line.
Alternatively, to reject every line that ends in =14, whatever the name is:
^(?!.*=14$).*
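
As a rough illustration, the two patterns above can be tried on the sample words from the question. This is a sketch in Python, which the thread itself does not specify; the variable names are made up for the example, and the last line (a plain anchored match for john=14) is an addition for contrast, not part of the answer.

```python
import re

# Sample words from the question, one per line.
lines = ["john=14", "adam=21", "ben=11", "john=18", "johan=17", "john=141"]

# Pattern 1: reject a line only if it is exactly "john=14".
not_exact = re.compile(r"^(?!john=14$).*")
# Pattern 2: reject any line that ends in "=14".
not_ending_in_14 = re.compile(r"^(?!.*=14$).*")

kept_by_first = [line for line in lines if not_exact.match(line)]
kept_by_second = [line for line in lines if not_ending_in_14.match(line)]

# For contrast (an assumption about what the asker ultimately wants):
# selecting the exact string directly, without any negation.
exact_hits = [line for line in lines if re.fullmatch(r"john=14", line)]

print(kept_by_first)
print(kept_by_second)
print(exact_hits)
```

On this sample both lookahead patterns keep the same five lines, because john=14 is the only entry ending in =14; they would differ on an input such as adam=14.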

Related

Regex not select word with character at the end

I have a simple question.
I need a regular expression to match a hexadecimal number that has no colon at the end.
For example:
0x85af6b9d: 0x00256f8a ;some more interesting code
// don't match 0x85af6b9d: at all, but match 0x00256f8a
My expression for a hexadecimal number is 0[xX][0-9A-Fa-f]{1,8}
A version with (?!:) does not work, because it just matches 0x85af6b9 instead (the {1,8} quantifier gives back one digit).
Using $ isn't possible either: a line can contain more than one number.
Thanks!
Here is one way to do so:
0[xX][0-9A-Fa-f]{1,8}(?![0-9A-Fa-f:])
We use a negative lookahead to match only hexadecimal numbers that are not followed by a colon. Because {1,8} lets the engine backtrack to a shorter match, the lookahead must also rule out a following hex digit. We therefore reuse the character set ([0-9A-Fa-f]) inside it to ensure that the number does not continue.
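
To make the backtracking issue concrete, here is a small Python sketch (Python is an assumption; the thread does not name a language) that runs both the (?!:) version from the question and the pattern from the answer on the example line. The variable names are illustrative only.

```python
import re

line = "0x85af6b9d: 0x00256f8a ;some more interesting code"

# Version from the question: the lookahead only forbids a colon, so the
# engine gives back one digit and still matches a truncated address.
naive = re.compile(r"0[xX][0-9A-Fa-f]{1,8}(?!:)")

# Version from the answer: the lookahead also forbids a following hex
# digit, so a shortened match is rejected as well.
fixed = re.compile(r"0[xX][0-9A-Fa-f]{1,8}(?![0-9A-Fa-f:])")

naive_hits = naive.findall(line)
fixed_hits = fixed.findall(line)

print(naive_hits)  # ['0x85af6b9', '0x00256f8a']
print(fixed_hits)  # ['0x00256f8a']
```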

Regex repetition

I'm trying to perform a regex replacement. Therefore I defined the following expression:
^(?:9903[0]*([0-9]*)){20}$
This expression should match
99030000000000000001
99030000000000000011
99030000000000000111
99030000000000001111
99031111111111111111
but not to
9903111111111111111
In fact, the expression above does not work unless I either use {1,20} as the quantifier or remove it completely. But I want to check the length of the whole string without knowing the length of [0]* or the length of the variable part, so there is something wrong with my expression.
Many thanks for your help in advance.
D
There are multiple things wrong with your initial regex.
The {} quantifier is applied to the entire group in parentheses. So your current regex requires that this part:
(?:9903[0]*([0-9]*))
is repeated 20 times in its entirety, which is not what you want.
Then this part:
[0]*([0-9]*)
makes little sense. Do you want to capture the number after 9903 without the leading zeros? Then require that the capture starts with a non-zero digit. Also, [0] is a character class with just one character, so it is equivalent to a plain 0.
In conclusion, I would do it like so:
^9903(?=[0-9]{16}$)0*([1-9][0-9]*)$
Edit: I realized later that if it's required to match 99030000000000000000 (get 0 in your capture group) then you need this:
^9903(?=[0-9]{16}$)0*([0-9]+)$
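
The difference between the two patterns shows up in what the capture group returns. The following is a minimal Python sketch (the language is an assumption, and `capture` is a hypothetical helper written for this example) using inputs built to the lengths given in the question.

```python
import re

# Capture must start with a non-zero digit.
strict = re.compile(r"^9903(?=[0-9]{16}$)0*([1-9][0-9]*)$")
# Capture may be a lone 0 when everything after 9903 is zeros.
with_zero = re.compile(r"^9903(?=[0-9]{16}$)0*([0-9]+)$")

def capture(pattern, text):
    """Return the captured number, or None if the string does not match."""
    m = pattern.match(text)
    return m.group(1) if m else None

one = "9903" + "0" * 15 + "1"   # 20 digits, value 1
ones = "9903" + "1" * 16        # 20 digits, no leading zeros
short = "9903" + "1" * 15       # 19 digits, must be rejected
zeros = "9903" + "0" * 16       # 20 digits, value 0

print(capture(strict, one))       # 1
print(capture(strict, ones))      # 1111111111111111
print(capture(strict, short))     # None
print(capture(strict, zeros))     # None
print(capture(with_zero, zeros))  # 0
```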
You can do that by first checking the length of the string with a lookahead:
^(?=[0-9]{20}$)99030*[0-9]*
The lookahead (?=...) is a zero-width assertion that checks what follows in the string. Here it is checking there are exactly 20 digits before the end of the string.
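
A quick way to see the length check at work is to compare it with the original expression from the question. This is a Python sketch under the assumption that Python's `re` behaves like the asker's engine for these constructs; the test strings are built to the lengths from the question.

```python
import re

# Pattern from this answer: the lookahead fixes the total length at 20 digits.
length_checked = re.compile(r"^(?=[0-9]{20}$)99030*[0-9]*")
# Original pattern from the question: the whole group must repeat 20 times.
original = re.compile(r"^(?:9903[0]*([0-9]*)){20}$")

twenty = "9903" + "0" * 15 + "1"   # 20 digits
nineteen = "9903" + "1" * 15       # 19 digits

ok_twenty = bool(length_checked.match(twenty))
ok_nineteen = bool(length_checked.match(nineteen))
original_twenty = bool(original.match(twenty))

print(ok_twenty)        # True
print(ok_nineteen)      # False
print(original_twenty)  # False: "9903" would have to occur 20 times
```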

Weird in a regular expression

I tried the following regular expression:
Pattern: ((.[^[0-9])+)(([0-9]{1,3}([.][0-9]{3})+)|([0-9]+))
My goal is to match any string (containing no digits) followed by a number in a specified format, e.g. MG2999, dasdassa33232
I used the above regular expression.
It's weird as follows:
V375 (not matched)
Vv375 (matched)
Vvv375 (matched, but without the first character)
Vvvv375 (matched)
...
I don't understand why the first character is sometimes not matched. Could you help me?
For your quick test, please try: http://regex101.com/
Thanks in advance!
--
Vu
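((.[^[0-9])+) matches any character (.) followed by any character except digits and [, and repeats that two-character unit. The prefix is therefore always consumed in pairs, which is why inputs with an odd number of leading letters either fail or lose their first character.
You probably want [^0-9]+ here, or, simpler, \D+.
The rest of the regular expression has similar problems, but since I don't know the number format you want to match, I cannot correct that.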
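
The pairing effect can be reproduced with a short Python sketch (Python is an assumption here; the asker tested on regex101). The bracket inside the character class is written as \[ only to avoid a Python warning about nested sets; the class is otherwise the same. The "fixed" pattern swaps in \D+ for the prefix and leaves the number part exactly as the asker wrote it, since its intended format is unknown.

```python
import re

number_part = r"(([0-9]{1,3}([.][0-9]{3})+)|([0-9]+))"

# Prefix from the question: one arbitrary character plus one non-digit,
# repeated, so letters are consumed two at a time.
original = re.compile(r"((.[^\[0-9])+)" + number_part)
# Prefix suggested in the answer: one or more non-digits.
fixed = re.compile(r"(\D+)" + number_part)

def found(pattern, text):
    """Return the matched substring, or None when nothing matches."""
    m = pattern.search(text)
    return m.group(0) if m else None

for sample in ["V375", "Vv375", "Vvv375", "Vvvv375"]:
    print(sample, found(original, sample), found(fixed, sample))
# V375 None V375
# Vv375 Vv375 Vv375
# Vvv375 vv375 Vvv375
# Vvvv375 Vvvv375 Vvvv375
```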

Can't use regular expression to match exact string

Given a string below:
String s = "sschk##123456sschk##123456gme##100&200&300&1,2,3,4,5$6,7,8,9,0sschk##123456";
I apply the pattern sschk##\\d+? or sschk##.+? because I want to get every sschk##123456 and replace it with an empty string. Please note that the number after sschk## might be different each time I get it, for example sschk##321321.
But I only got
[sschk##1, sschk##1, sschk##1]
What pattern should I apply to match each sschk##123456 exactly, so that I can do a find and replace later?
Thanks a lot.
The problem with your regex is the "?" marker, which toggles the greediness of the "+". Your regex "sschk##\d+?" means "the string sschk## followed by one or more digits, matching as few digits as possible". Removing the "?" changes that to "the string sschk## followed by one or more digits, matching as many digits as possible".
Your regex might look like this: sschk##\\d{6}, which matches the string "sschk##" followed by exactly 6 digits. If you want to match "sschk##" followed by a variable number of digits, but no more than 6, use sschk##\\d{1,6}. If you need to match any number of digits after "sschk##", use sschk##\\d+
I think I got it done.
Just apply the pattern like this
(sschk##\\d+)
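
The lazy versus greedy behaviour described above is easy to reproduce. The sketch below uses Python rather than the Java of the question, so the patterns are written as raw strings with a single backslash (\d) instead of the doubled \\d that a Java string literal needs; the variable names are illustrative.

```python
import re

s = ("sschk##123456sschk##123456gme##100&200&300&1,2,3,4,5$6,7,8,9,0"
     "sschk##123456")

# Lazy quantifier: stops after the first digit.
lazy_hits = re.findall(r"sschk##\d+?", s)
# Greedy quantifier: takes every consecutive digit.
greedy_hits = re.findall(r"sschk##\d+", s)
# Find and replace with an empty string, as the question intends.
cleaned = re.sub(r"sschk##\d+", "", s)

print(lazy_hits)    # ['sschk##1', 'sschk##1', 'sschk##1']
print(greedy_hits)  # ['sschk##123456', 'sschk##123456', 'sschk##123456']
print(cleaned)      # gme##100&200&300&1,2,3,4,5$6,7,8,9,0
```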

String negation using regular expressions

Is it possible to do string negation in regular expressions? I need to match all strings that do not contain the string "..". I know you can use ^[^\.]*$ to match all strings that do not contain "." but I need to match more than one character. I know I could simply match a string containing ".." and then negate the return value of the match to achieve the same result but I just wondered if it was possible.
You can use negative lookaheads:
^(?!.*\.\.).*$
That causes the expression to not match if it can find a sequence of two periods anywhere in the string.
^(?:(?!\.\.).)*$
will only match if there are no two consecutive dots anywhere in the string.
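
Both patterns can be checked side by side on a few inputs. This is a small Python sketch (the language is an assumption; the question does not name one), and the sample strings are invented for the example.

```python
import re

# Lookahead at the start: fail if ".." occurs anywhere later in the string.
lookahead_first = re.compile(r"^(?!.*\.\.).*$")
# Check at every position that the next two characters are not "..".
per_character = re.compile(r"^(?:(?!\.\.).)*$")

samples = ["a.b.c", "a..b", "", "..", "end."]

first_results = [bool(lookahead_first.match(s)) for s in samples]
second_results = [bool(per_character.match(s)) for s in samples]

print(first_results)   # [True, False, True, False, True]
print(second_results)  # [True, False, True, False, True]
```

Negating a plain search for ".." in the host language, as the question mentions, gives the same answers on these inputs.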