I am using RegEx to assert the response of an API call, but it's currently a little too 'greedy' and ends up matching all kinds of responses. The RegEx bits are needed since the actual IDs in the response will be different each time.
The RegEx assertion is this:
{data:\[{"name":"Mat","~id":"(.*)"},{"name":"Laurie","~id":"(.*)"}\]},"something":true}
Which matches this correct response:
{data:[{"name":"Mat","~id":"4fd5ec146fc2ee0fff234234"},{"name":"Laurie","~id":"4fd5ec146fc2ee0fff234227"}]},"something":true}
as well as this incorrect response:
{data:[{"name":"Mat","~id":"4fd5ec146fc2ee0fff234234"},{"name":"Laurie","~id":"4fd5ec146fc2ee0fff234227"},{"name":"John","~id":"4fd5ec146fc2ee0fff234237"},{"name":"Paul","~id":"4fd5ec146fc2ee0fff234238"},{"name":"George","~id":"4fd5ec146fc2ee0fff234239"}]},"something":true}
The second (.*) is not just matching the ID of the second item, but it's matching the ID and all the other unwanted objects.
So I guess I need to make my RegEx a little stricter when it comes to the ~id fields. Since the IDs will always be 24 hex characters, I'd like to replace the (.*) with something more appropriate.
I am writing this in Go, so I'm using Go's regexp package, and I'm using http://regexpal.com/ to test the RegEx.
You can use [^"]*, [^"]{24} or [0-9a-fA-F]{24} instead of .* for your ID fields.
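The . (dot) is a special character in regular expressions: it matches any single character except a newline. That is why .* happily runs past the closing quote of the ID and into the following objects.
To match exactly 24 hex characters, use:
^[A-Fa-f0-9]{24}$
The ^ and $ anchors only make sense when you check an ID on its own; inside the larger assertion, use just [A-Fa-f0-9]{24}.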
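To see the difference, here is a rough sketch that runs both versions of the assertion against the two responses from the question. It uses Python's re module purely for illustration (an assumption: the asker is on Go); the tightened pattern only relies on escaped literals, character classes and {24} repetition, which Go's regexp syntax also accepts, so it should carry over. build_response is a hypothetical helper that rebuilds the sample strings.

```python
import re

# Hypothetical helper that rebuilds the sample responses from the question.
def build_response(pairs):
    items = ",".join('{"name":"%s","~id":"%s"}' % pair for pair in pairs)
    return '{data:[' + items + ']},"something":true}'

# Original assertion: each (.*) can run past the closing quote.
LOOSE = re.compile(
    r'\{data:\[\{"name":"Mat","~id":"(.*)"\},'
    r'\{"name":"Laurie","~id":"(.*)"\}\]\},"something":true\}'
)
# Tightened assertion: each ID must be exactly 24 hex digits.
STRICT = re.compile(
    r'\{data:\[\{"name":"Mat","~id":"([0-9a-fA-F]{24})"\},'
    r'\{"name":"Laurie","~id":"([0-9a-fA-F]{24})"\}\]\},"something":true\}'
)

expected = [("Mat", "4fd5ec146fc2ee0fff234234"),
            ("Laurie", "4fd5ec146fc2ee0fff234227")]
extra = [("John", "4fd5ec146fc2ee0fff234237"),
         ("Paul", "4fd5ec146fc2ee0fff234238"),
         ("George", "4fd5ec146fc2ee0fff234239")]

good = build_response(expected)
bad = build_response(expected + extra)

for label, pattern in (("loose", LOOSE), ("strict", STRICT)):
    # fullmatch: the whole response must match the assertion.
    print(label, bool(pattern.fullmatch(good)), bool(pattern.fullmatch(bad)))
```

This prints True True for the loose pattern and True False for the strict one: with [0-9a-fA-F]{24} the second ID can no longer absorb the extra objects.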
Apparently json schema doesn't like this regex: ^(?=.{1,63}$)([-a-z0-9]*[a-z0-9])?$
https://regex101.com/r/qsyUoQ/1
I get an error: pattern must be a valid regex. This error means the regex pattern I'm using is invalid according to json schema.
My regex seems to be valid for most other parsers, though, and JSON Schema supports positive and negative lookaheads and capture groups: https://json-schema.org/understanding-json-schema/reference/regular_expressions.html
Is there some JSON Schema-specific escaping I need to do with my pattern?
I'm at a loss to see what it doesn't like about my regex.
The regex I want will do the following:
Allow lower case chars, numbers and "-"
Can start with but not end with "-"
Max length of string cannot exceed 63 chars
You could simplify the pattern to use the character classes and quantifiers without using the lookahead and the capture group.
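You can change the quantifiers: match 0 to 62 characters that may include -, followed by a single character that may not be -. That final single character is necessarily the end of the string, so the string cannot end with -, and the total length is capped at 63.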
^[-a-z0-9]{0,62}[a-z0-9]$
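A quick sanity check of the simplified pattern, sketched with Python's re module (an assumption: this is not a JSON Schema validator, it only exercises the regex logic). The comparison with the original lookahead version covers a handful of hand-picked strings and is not a proof of equivalence; is_valid is a hypothetical helper name.

```python
import re

# Simplified pattern from the answer: 0-62 chars that may include "-",
# then one final char that may not be "-" (1-63 chars in total).
SIMPLE = re.compile(r"^[-a-z0-9]{0,62}[a-z0-9]$")
# Original lookahead version from the question, kept for comparison.
LOOKAHEAD = re.compile(r"^(?=.{1,63}$)([-a-z0-9]*[a-z0-9])?$")

def is_valid(name):
    # fullmatch guards against Python's "$" also matching before a final "\n".
    return SIMPLE.fullmatch(name) is not None

samples = ["abc", "-abc", "abc-", "a-b-c", "-", "", "Abc",
           "a" * 63, "a" * 64]
for s in samples:
    simple = is_valid(s)
    original = LOOKAHEAD.fullmatch(s) is not None
    # Show the verdict and whether both patterns agree on this sample.
    print(repr(s[:10]), simple, simple == original)
```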
I'm using this tool to evaluate my expression.
My test string: "Little" Timmy (tim) McGraw
My regex:
^[()"]|.["()]
It looks like I'm properly catching the characters I want, but my matches include whatever character comes just before the match. I'm not sure what, if anything, I'm doing wrong to catch the preceding characters like that. The goal is to catch characters we don't want in the name field of one of our systems.
Brief
Your current regex ^[()"]|.["()] says the following:
^[()"]|.["()] Match either of the following
^[()"] Match the following
^ Assert position at the start of the line
[()"] Match any character present in the list ()"
.["()] Match the following
. Match any character (this is the issue you were having)
["()] Match any character present in the list "()
Code
You can actually shorten your regex to just [()"].
Ultimately, however, it would be much easier to create a negated set that lists the valid characters rather than the invalid ones. This approach would get you something like [^\w ], which means "match anything not present in the set", that is, any character that is neither a word character (letter, digit or underscore) nor a space. In your sample string this matches the symbols ( ) and " since they are not in the set.
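Here is a small sketch, using Python's re.findall for illustration, that shows the three patterns side by side on the sample name. The extra characters in the first result come from the . in the second alternative.

```python
import re

name = '"Little" Timmy (tim) McGraw'

# Original: the second alternative .["()] consumes one extra character.
original = re.findall(r'^[()"]|.["()]', name)
# Just the unwanted characters.
listed = re.findall(r'[()"]', name)
# Negated set: anything that is not a word character or a space.
negated = re.findall(r'[^\w ]', name)

print(original)  # ['"', 'e"', ' (', 'm)']
print(listed)    # ['"', '"', '(', ')']
print(negated)   # ['"', '"', '(', ')']
```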
I am using ^[0-9()- ]+$ as a regular expression to validate a phone number.
Basically I want to allow only digits, hyphens and both parentheses, i.e. ( ).
I have added this as a model-level attribute (in MVC 3.0).
When I enter a valid string (say 5299912548), it is accepted, but the view throws the error "parsing "^[0-9()- ]+$" - [x-y] range in reverse order.".
Is there a problem in the Regex used or some problem with other MVC3 stuff?
^[0-9()\- ]+$
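You need to escape the hyphen; otherwise it's a range indicator. Unescaped, )- is read as a range from ) (U+0029) to the space (U+0020), and since the space comes first, the range is in reverse order.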
You could also do this:
^[0-9() -]+$
The hyphen and space have been switched. Hyphen placement in regex has bugged me before, and I sometimes need to shuffle the position in these situations.
If anyone can enlighten me as to why this is, I'd appreciate it.
But this will fix this issue.
edit:
Research reveals the answer: many regex flavors, .NET included, treat a hyphen as a literal when it is the first or last character in a character class.
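A minimal sketch in Python's re module (used here as a stand-in for .NET; both reject the reversed range, though the error message differs) showing that the unescaped pattern fails to compile while both fixes accept the same inputs. The sample phone numbers are made up for illustration.

```python
import re

# The unescaped pattern from the question: ")- " is parsed as a range from
# ")" (U+0029) to " " (U+0020), which is in reverse order.
try:
    re.compile(r"^[0-9()- ]+$")
    reversed_range_rejected = False
except re.error:
    reversed_range_rejected = True

ESCAPED = re.compile(r"^[0-9()\- ]+$")   # hyphen escaped
TRAILING = re.compile(r"^[0-9() -]+$")   # hyphen moved to the end

print(reversed_range_rejected)
for phone in ["5299912548", "(529) 991-2548", "529.991.2548"]:
    print(phone, bool(ESCAPED.match(phone)), bool(TRAILING.match(phone)))
```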
The problem is with this part:
[/-.]
That means "the range of characters from '/' to '.'" - but '/' comes after '.' in Unicode, so the range makes no sense.
If you wanted it to mean "slash, dash or period" then you want:
[/\-.]
In other words, you need to escape the dash. Note that if this is in a regular C# string literal, you'll need to perform another level of escaping too:
string pattern = "[/\\-.]";
Using a verbatim string literal means you don't need to escape the backslash:
string pattern = @"[/\-.]";
Alternatively, you can just put the dash at the start:
[-/.]
or end:
[/.-]
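The same behaviour can be sketched in Python, where a raw string r"..." plays the role of C#'s @"..." verbatim literal, so the backslash in [/\-.] needs no doubling. The sample string is made up; all three fixed classes should pick out the same three characters.

```python
import re

# "[/-.]" asks for the range "/" (U+002F) to "." (U+002E): reverse order.
try:
    re.compile(r"[/-.]")
    bad_range_compiles = True
except re.error:
    bad_range_compiles = False

# Three equivalent ways to say "slash, dash or period".
variants = [r"[/\-.]", r"[-/.]", r"[/.-]"]
sample = "2012/06-14.log"
results = [re.findall(v, sample) for v in variants]

print(bad_range_compiles)  # False
print(results)             # [['/', '-', '.'], ['/', '-', '.'], ['/', '-', '.']]
```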
I need some help with the following regular expression problem.
I have to split a string by another string; let me call it separatorString. However, if an escape sequence precedes separatorString, the string should not be split at that point. The escape sequence is also a string; let me call it escapeSequence.
Maybe it is better to start with some examples:
separatorString = "§§";
escapeSequence = "###";
inputString = "Part1§§Part2" ==> Desired output: "Part1", "Part2"
inputString = "Part1§§Part2§§ThisIs###§§AllPart3" ==> Desired output: "Part1", "Part2", "ThisIs###§§AllPart3"
Searching Stack Overflow, I found "Splitting a string that has escape sequence using regular expression in Java" and came up with the regular expression
"(?<!(###))§§".
This basically says: match "§§" unless it is preceded by "###".
This works fine with Regex.Split for the examples above, however, if inputString is "Part1###§§§§Part2" I receive "Part1###§", "§Part2" instead of "Part1###§§", "Part2".
I understand why: the second "§" gives a match, because the preceding chars are "##§" and not "###". I spent several hours trying to modify the regex, but the result only got worse. Does anyone have an idea?
Let's call the things that appear between the separators, tokens. Your regex needs to stipulate what the beginning and end of a token looks like.
In the absence of any stipulation, in other words, using the regex you have now, the regex engine is happy to say that the first token is Part1###§ and the second is §Part2.
The syntax you used, (?<!foo), is called a zero-width negative lookbehind assertion. In other words, it looks behind the current position and asserts that the text there must not match foo. Zero-width just indicates that the assertion does not advance the pointer or cursor in the subject string when it is evaluated.
If you require that a new token start with something specific (say, an uppercase letter), you can specify that with a zero-width positive lookahead assertion. It's similar to your lookbehind, but it says "the next bit has to match the following pattern", again without advancing the cursor or pointer.
To use it, put (?=[A-Z]) after the §§. The entire regex for the separator is then
(?<!###)§§(?=[A-Z]).
This would assert that the character following a separator sequence needs to be an uppercase alpha, while the characters preceding the separator sequence must not be ###. In your example, it would force the match on the §§ separator to be the pair of chars before Part2. Then you would get Part1###§§ and Part2 as the tokens, or group captures.
If you want to stipulate what a token is in the negative, in other words that a token begins with anything except a certain pattern, you can use a negative lookahead assertion. The syntax for this is (?!foo). It works just as you would expect: like your negative lookbehind, only looking forward.
The regular-expressions.info website has good explanations for all things regex, including for the lookahead and lookbehind constructs.
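Here is a minimal sketch of the lookaround separator using Python's re module (chosen for illustration; the question uses .NET's Regex.Split, and I have only reasoned about Python here). The name split_tokens is hypothetical. The pattern uses (?<!###) without the inner parentheses: in Python, a capturing group in a split pattern adds an extra entry (here None) to the result list.

```python
import re

# Split on "§§" only if it is not preceded by "###" and is followed by an
# uppercase letter. The (?=[A-Z]) part assumes every token starts that way.
SEPARATOR = re.compile(r"(?<!###)§§(?=[A-Z])")
# The questioner's version (without the inner capture group), for comparison.
NAIVE = re.compile(r"(?<!###)§§")

def split_tokens(text, pattern=SEPARATOR):
    return pattern.split(text)

print(split_tokens("Part1###§§§§Part2", NAIVE))   # ['Part1###§', '§Part2']
print(split_tokens("Part1###§§§§Part2"))          # ['Part1###§§', 'Part2']
print(split_tokens("Part1§§Part2§§ThisIs###§§AllPart3"))
```

The trade-off is the extra assumption about how tokens begin; if a token could start with a digit or lowercase letter, the lookahead class would have to be widened to match.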
How about doing the opposite: Instead of splitting the string at the separators match non-separator parts and separator parts:
/(?:[^§#]|§[^§#]|#(?:[^#]|#(?:[^#]|#§§)))+|§§/
Then you just have to remove every matched separator part to get the non-separator parts.
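A sketch of this match-instead-of-split approach in Python, reusing the pattern above verbatim (minus the /.../ delimiters). split_escaped is a hypothetical name, and I've only checked it against the three inputs from the question; other combinations of # and § may need more thought.

```python
import re

# Pattern from the answer: a run of text in which "###§§" is consumed as an
# escaped separator, or a bare "§§" separator.
PART = re.compile(r"(?:[^§#]|§[^§#]|#(?:[^#]|#(?:[^#]|#§§)))+|§§")

def split_escaped(text):
    # findall returns whole matches (the pattern has no capture groups);
    # drop the separator matches and keep the rest.
    return [part for part in PART.findall(text) if part != "§§"]

for text in ["Part1§§Part2",
             "Part1§§Part2§§ThisIs###§§AllPart3",
             "Part1###§§§§Part2"]:
    print(split_escaped(text))
```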
I'm searching for the pattern (.*)\\1 in the text blabl with regexec(). I get successful but empty matches in the regmatch_t structures. What exactly has been matched?
The regex .* can match successfully a string of zero characters, or the nothing that occurs between adjacent characters.
So your pattern is matching zero characters in the parens, and then matching zero characters immediately following that.
So if your regex were /(.*)\1/, on the string "foo" it would match the empty string before the 'f': an empty match at the very first position is found before any longer match further along.
You might try using .+ instead of .*, as that matches one or more instead of zero or more. (Using .+ you should match the 'oo' in 'foo')
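This can be checked quickly with Python's re module. Note that Python uses a backtracking engine while regexec() follows POSIX leftmost-longest rules; for these particular inputs both should report the same matches, but the engines are not interchangeable in general.

```python
import re

# (.*)\1 can succeed with an empty group at position 0,
# which is the leftmost possible match, so it wins.
empty = re.search(r"(.*)\1", "blabl")
print(empty.span(), repr(empty.group(1)))      # (0, 0) ''

# Requiring at least one character forces a real repetition.
doubled = re.search(r"(.+)\1", "foo")
print(doubled.group(0), doubled.group(1))      # oo o

# "blabl" contains no immediately repeated substring at all.
print(re.search(r"(.+)\1", "blabl"))           # None
```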
\1 is the backreference, typically used later in a replacement or to refine your regex by matching the same text again within the match. You should just use (.*); this will give you the results you want, and the group will automatically be given backreference number 1. I'm no regex expert, but these are my thoughts based on my limited knowledge.
As an aside, I always fall back on RegexBuddy when trying to see what's really happening.
\1 is the "re-match" instruction. The question is, do you want to re-match immediately (e.g., BLABLA)
/(.+)\1/
or later (e.g., BLAahemBLA)
/(.+).*\1/
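A short Python sketch contrasting the two patterns on the examples above plus the original blabl (IMMEDIATE and LATER are just illustrative names):

```python
import re

IMMEDIATE = re.compile(r"(.+)\1")    # repeat right away
LATER = re.compile(r"(.+).*\1")      # repeat anywhere later

for text in ["BLABLA", "BLAahemBLA", "blabl"]:
    now = IMMEDIATE.search(text)
    later = LATER.search(text)
    # Print the whole match for each pattern, or None if it fails.
    print(text,
          now.group(0) if now else None,
          later.group(0) if later else None)
```

Only the second pattern finds the repeated "bl" in blabl, because the two copies are separated by an "a".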