XML Regex - Negative match - regex

I have a problem with negative lookahead in XSD pattern.
When I specified:
<xs:pattern value="^(?!(00|\+\d))\d{6,}$"/>
then I got an error message:
Value '^(?!(00|\+\d))\d{6,}$' is not a valid XML regular expression.
Any idea why it does not work?
In an online JavaScript validator it works fine (for example, under the unit tests section, click "run test").
I need to validate phone numbers. The phone number must not start with an international prefix (+\d or 00).
Thanks

Try the following regex:
[1-9][0-9]{5,}|0[1-9][0-9]{4,}
The first alternative matches a number that does not begin with zero and is followed by 5 or more digits (zero included). The second matches a number that starts with zero, is not immediately followed by another zero, and then has 4 or more digits. Do not put spaces around |, because whitespace in an XSD pattern is matched literally.

I will add my deleted comment as an answer:
([1-9][0-9]|[0-9][1-9])[0-9]{4,}
See the regex demo.
The regex should work well for your scenario because
([1-9][0-9]|[0-9][1-9]) - matches 2 digits: either a digit from 1-9 followed by any digit, or (|) any digit followed by a digit other than 0
[0-9]{4,} - matches 4 or more digits.
This pattern only matches an entire string because all regex patterns inside an XSD pattern are anchored by default (so you do not have to, and cannot, enclose the pattern with ^ and $).
Right, there is no lookaround support in XSD regex (neither lookaheads nor lookbehinds). Besides, XSD regex has other interesting limitations/features:
^ and $ anchors are not supported
Non-capturing groups like (?:...) are not supported (use capturing ones instead)
/ should not be escaped, do not use \/
\d should be written as [0-9] to only match ASCII digits (same as in .NET)
Back-references like \1, \2 are not supported.
Word boundaries are not supported either.
See some more XSD regex description at regular-expressions.info.
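As a rough way to check the lookaround-free pattern outside an XSD validator, here is a minimal Python sketch. It assumes re.fullmatch is a fair stand-in for XSD's implicit anchoring; the function name is_valid_phone and the sample values are made up for illustration.

```python
import re

# Lookaround-free pattern from the answer above. XSD anchors patterns
# implicitly, so re.fullmatch is used here to imitate that behaviour.
XSD_STYLE = re.compile(r"([1-9][0-9]|[0-9][1-9])[0-9]{4,}")

# Lookahead version from the question: accepted by Python and JavaScript,
# rejected by XSD validators.
LOOKAHEAD = re.compile(r"^(?!(00|\+\d))\d{6,}$")


def is_valid_phone(value: str) -> bool:
    """Return True if value has 6+ digits and does not start with 00 or +."""
    return XSD_STYLE.fullmatch(value) is not None


# Hypothetical sample values, not taken from the question.
samples = ["123456", "012345", "100000", "001234", "+4912345678", "12345"]
for s in samples:
    print(s, is_valid_phone(s), LOOKAHEAD.match(s) is not None)
```

On these samples both patterns should agree, which suggests the rewrite keeps the intent of the original lookahead.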

Related

Regex match 10 characters after second pattern

I would like to match 10 characters after the second pattern:
My String:
www.mysite.de/ep/3423141549/ep/B104RHWZZZ?something
What I want to be matched:
B104RHWZZZ
What the regex currently matches:
B104RHWZZZ?something
Currently, my Regex looks like this:
(?<=\/ep\/)(?:(?!\/ep\/).)*$.
Could someone help me to change the regex that it only matches 10 characters after the second "/ep/" ("B104RHWZZZ")?
It depends on which characters you allow to match. If you want to allow 10 non-whitespace characters other than / or ?, then you could use:
(?<=\/ep\/)[^\/?\s]{10}(?=[^\/\s]*$)
Explanation
(?<=\/ep\/) Assert /ep/ directly to the left
[^\/?\s]{10} Match 10 times any non whitespace character except for / and ?
(?=[^\/\s]*$) Assert that no / or whitespace character occurs to the right, up to the end of the string
Regex demo
Or matching 1+ chars other than / ? & instead of exactly 10:
(?<=\/ep\/)[^\/?&\s]+(?=[^\/\s]*$)
Regex demo
This would match the string as matching group 1:
ep\/\w+\/ep\/(\w+)
https://regex101.com/r/9tUjxG/1
While lookarounds can make this expression more sophisticated so that you don't need capturing groups, in my experience they make the expression hard to read, understand and maintain/extend.
That's why I would always keep regexes as simple as possible.
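Both approaches can be compared side by side in Python. This is only a sketch: the variable names are illustrative, and the \/ escapes are dropped because Python does not need them.

```python
import re

url = "www.mysite.de/ep/3423141549/ep/B104RHWZZZ?something"

# Lookaround version: the whole match is the 10 wanted characters.
lookaround = re.compile(r"(?<=/ep/)[^/?\s]{10}(?=[^/\s]*$)")

# Capturing-group version: the wanted text is in group 1.
grouped = re.compile(r"ep/\w+/ep/(\w+)")

by_lookaround = lookaround.search(url).group(0)
by_group = grouped.search(url).group(1)

print(by_lookaround)
print(by_group)
```

The lookahead is what rejects the first /ep/ segment: after 3423141549 there is still a / to the right, so the match only succeeds at the last segment.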

Optional thousand-separator processes incomplete string

I need to process numbers that may have optional thousand-separators, such as 1234567 and 1,234,567
I naively assumed I could achieve this with
(\d{1,3}([,]?(\d{3}))*)
This, however, matches only 123456 (not the 7) and 1,234,567 (correctly)
However, if I specify an explicit number of matches (2 in this case)
(\d{1,3}([,]?(\d{3})){2})
or a bound (such as \b)
(\d{1,3}([,]?(\d{3}))*)\b
the full match is performed.
Why does the “greedy” * quantifier stop after the first match in the first regex?
If you want to match both numbers with, and without, proper comma thousands separators, then I would use an alternation:
^(\d{1,3}(?:,\d{3})*|\d+)$
Demo
The reason is that \d{1,3} is greedy, so it matches 123 at the beginning of the number. Then the rest of the regexp will only match groups of exactly 3 digits because it uses \d{3}. A regular expression doesn't try to match the longest possible string, so it won't backtrack and shorten the match for \d{1,3} to make the rest of the regexp go further.
But if you add a word boundary \b at the end, it no longer matches with that 3-digit prefix. That causes it to backtrack until it's able to match groups of 3 digits ending with a word boundary.
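The backtracking behaviour described above can be reproduced with Python's re module, which backtracks in the same way for these patterns. The variable names are illustrative only.

```python
import re

text = "1234567"

# Pattern from the question: nothing forces the engine to consume the last digit.
unbounded = re.compile(r"(\d{1,3}([,]?(\d{3}))*)")
# Same pattern with a trailing word boundary, which forces backtracking.
bounded = re.compile(r"(\d{1,3}([,]?(\d{3}))*)\b")
# Alternation from the answer: proper comma grouping, or plain digits.
alternation = re.compile(r"^(\d{1,3}(?:,\d{3})*|\d+)$")

without_boundary = unbounded.search(text).group(0)
with_boundary = bounded.search(text).group(0)

print(without_boundary)  # 123 + 456, the trailing 7 is left over
print(with_boundary)     # 1 + 234 + 567 after backtracking

for candidate in ["1234567", "1,234,567", "12,34"]:
    print(candidate, alternation.match(candidate) is not None)
```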

Regex filtering and excluding forbidden words

Trying to fulfill these requirements:
Alphanumeric allowed [a-zA-Z0-9] or \w+
Only numbers NOT allowed
At least 8 characters \S{8,}
Forbidden words: Test, pimba, vraw ^(!?.*Test|pimba|vraw).*$ or \b(?:(?!word)\w)+\b
The problem is I can't mix it all together.
Documentation read: Mozilla - Character Classes, Groups and Ranges, indicative Regex.
I'm using https://regex101.com/ to try the regex validation.
Tries:
\b(?:(?!word)\w)+\b(\S{8,})
^(?=\S*\w+)(\S{8,})\b$
^(?!.pimba|vraw|\d{8}).$
^(?=\S*\w+)(\S{8,})+(!?.*Test)$
You may use this regex:
^(?!\d+$)(?!.*(?:Test|pimba|vraw))\w{8,}$
RegEx Demo
RegEx Details:
^: Start
(?!\d+$): Negative lookahead to fail the match if we have all digits
(?!.*(?:Test|pimba|vraw)): Negative lookahead to fail the match if any of those substrings appear anywhere in input
\w{8,}: Match 8 or more word characters
$: End
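A quick way to sanity-check the pattern is to run it over a few inputs. The sketch below uses Python; the helper name is_acceptable and the sample strings are invented for illustration. Note that \w in Python also accepts the underscore and non-ASCII letters, and that the forbidden words are matched case-sensitively.

```python
import re

RULE = re.compile(r"^(?!\d+$)(?!.*(?:Test|pimba|vraw))\w{8,}$")


def is_acceptable(value: str) -> bool:
    """True if value is 8+ word characters, not all digits, no forbidden word."""
    return RULE.match(value) is not None


# Hypothetical inputs covering each requirement.
checks = {
    "abcdefg1": True,     # letters and digits, long enough
    "12345678": False,    # digits only
    "abc123": False,      # shorter than 8 characters
    "myTest1234": False,  # contains "Test"
    "xxpimbaxx": False,   # contains "pimba"
    "test12345": True,    # lowercase "test" is not forbidden
}
for value, expected in checks.items():
    print(value, is_acceptable(value), expected)
```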

RegEx: Excluding a pattern from the match

I know some basics of RegEx but am not a pro at it, and I am still learning. Currently, I am using the following very simple regex to match any digit in a given sentence.
\d
Now I want to match all digits except those that are part of patterns like e074663 OR e123444 OR e7736. So for the following input,
Edit 398e997979 the Expression 9798729889 & T900980980098ext to see e081815 matches. Roll over matches or e081815 the expression e081815 for details.e081815 PCRE & JavaScript flavors of RegEx are e081815 supported. Validate your expression with Tests mode e081815.
Only the other digits should be matched, and not any e081815. I tried the following without success.
(^[e\d])(\d)
Also, going forward, some more patterns need to be added for exclusion, for example cg636553 OR cg(any digits). Any help in this regard will be much appreciated. Thanks!
Try this:
(?<!\be)(?<!\d)\d+
Test it live on regex101.com.
Explanation:
(?<!\be) # make sure we're not right after a word boundary and "e"
(?<!\d) # make sure we're not right after a digit
\d+ # match one or more digits
If you want to match individual digits, you can achieve that using the \G anchor that matches at the position after a successful match:
(?:(?<!\be)(?<=\D)|\G)\d
Test it here
Another option is to use a capturing group with lookarounds
(?:\b(?!e|cg)|(?<=\d)\D)[A-Za-z]?(\d+)
(?: Non capture group
\b(?!e|cg) Word boundary, assert what is directly to the right is not e or cg
| Or
(?<=\d)\D Match any char except a digit, asserting what is directly on the left is a digit
) Close group
[A-Za-z]? Match an optional char a-zA-Z
(\d+) Capture 1 or more digits in group 1
Regex demo
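The first lookbehind pattern can be tried in Python as follows. The \G variant is left out because Python's re module does not support \G. The extended pattern with an extra (?<!\bcg) lookbehind is an assumption about how the cg exclusion could be added, not something taken from the answers.

```python
import re

# Pattern from the first answer.
base = re.compile(r"(?<!\be)(?<!\d)\d+")

# Assumed extension: one more negative lookbehind per excluded prefix.
extended = re.compile(r"(?<!\be)(?<!\bcg)(?<!\d)\d+")

sample = ("Edit 398e997979 the Expression 9798729889 & "
          "T900980980098ext to see e081815 matches.")

base_hits = base.findall(sample)
extended_hits = extended.findall("id cg636553 and 42 but e7736")

print(base_hits)
print(extended_hits)
```

In 398e997979 the digits after the e are still matched, because that e is not preceded by a word boundary.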

Visual Studio replace magic number integers with doubles

Can a regular expression be used to find all magic number integers in Visual Studio and convert them to doubles?
The regular expression to find them is beyond my skill, here is what I came up with so far:
(?!=[\s\(])(?<!\.)\d+(?=[\s\)])
This will find all the integers, but erroneously matches
15.527 = "1"5.5"2""7"
Even with a perfect match, is there a way to replace with same number? For example 7 would be replaced by 7.0 and 16 would be replaced by 16.0
It seems the pattern I suggested works in almost 99% of cases, so, let me explain the (?<![0-9]\.)\b[0-9]+\b(?!\.) pattern:
(?<![0-9]\.) - a negative lookbehind failing the match if there is a digit followed by a . symbol right before the digits matched with the [0-9]+ pattern
\b - a leading word boundary (i.e. the previous char cannot be a letter/digit/underscore)
[0-9]+ - 1 or more digits
\b - a trailing word boundary (i.e. the next char after the last matched digit cannot be a letter/digit/underscore)
(?!\.) - a negative lookahead that fails the match if the next char after the last digit matched by the [0-9]+ subpattern is a . symbol.
Why [0-9]+ instead of \d? See this SO thread, \d is Unicode aware and matches more than just 0..9 digits.
See the pattern demo.
Using the regex provided by @Wiktor Stribiżew:
Close all documents
Open the problematic documents
Ctrl+h
Set search to regex
Set scope to All Open Documents
Search for (?<![0-9]\.)\b[0-9]+\b(?![.0-9])
Replace with $&.0
Replace all
Review and fix any strings that should not have been affected
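The search-and-replace steps above can be rehearsed outside Visual Studio before touching real files. The sketch below uses Python, where \g<0> plays the role of $& (the whole match). The function name integers_to_doubles and the sample lines are illustrative assumptions.

```python
import re

# Same pattern as in the search step above.
INTEGER_LITERAL = re.compile(r"(?<![0-9]\.)\b[0-9]+\b(?![.0-9])")


def integers_to_doubles(source: str) -> str:
    """Append .0 to every standalone integer literal in source."""
    # \g<0> inserts the whole match, like $& in Visual Studio.
    return INTEGER_LITERAL.sub(r"\g<0>.0", source)


converted = integers_to_doubles("double x = 7 + 16 * 15.527;")
in_string = integers_to_doubles('s = "room 3";')

print(converted)
print(in_string)  # digits inside string literals are rewritten too
```

The second call shows why the final review step matters: the pattern cannot tell a magic number from a digit inside a string literal.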