Regex that matches the following patterns
1. mrrtjjsf8907m5q29ui
2. 0?userid=y1arx6uxb1nidmz3tguv
3. bryj9itvwjmbyv3wg8ef
I am trying to pass these values to another variable col=?([a-zA-Z0-9]{1,20})|([a-zA-Z0-9]{1,20})
It takes the right values for the first and third patterns, but for the second one it takes the value 0.
Instead, it should take y1arx6uxb1nidmz3tguv.
I think you need to use this regex instead:
[a-zA-Z](?=[A-Za-z]*\d)[a-zA-Z0-9]{1,19}
This makes sure that the run of characters you match is built only from letters and digits and contains at least one digit after its leading letters, so 0 and userid are not considered matches.
Alternative:
However, the regex above does not handle a valid sequence that starts with a digit instead of a letter. In that case you can use the following regex, which handles both situations:
(?:[a-zA-Z](?=[A-Za-z]*\d)|\d(?=\d*[A-Za-z]))[a-zA-Z0-9]{1,19}
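As a quick check, both patterns above can be run against the three sample strings from the question. This is a minimal sketch using Python's re module; the helper name first_token and the extra sample 9abc are assumptions added for illustration.

```python
import re

# The three sample strings from the question.
samples = [
    "mrrtjjsf8907m5q29ui",
    "0?userid=y1arx6uxb1nidmz3tguv",
    "bryj9itvwjmbyv3wg8ef",
]

# First suggestion: the token must start with a letter.
LETTER_FIRST = re.compile(r"[a-zA-Z](?=[A-Za-z]*\d)[a-zA-Z0-9]{1,19}")

# Alternative: the token may start with a letter or a digit.
EITHER_FIRST = re.compile(
    r"(?:[a-zA-Z](?=[A-Za-z]*\d)|\d(?=\d*[A-Za-z]))[a-zA-Z0-9]{1,19}"
)


def first_token(pattern, text):
    """Return the first match of pattern in text, or None (hypothetical helper)."""
    m = pattern.search(text)
    return m.group(0) if m else None


letter_first_results = [first_token(LETTER_FIRST, s) for s in samples]
either_first_results = [first_token(EITHER_FIRST, s) for s in samples]

for s, token in zip(samples, either_first_results):
    print(s, "->", token)
```

For the second sample, neither 0 nor userid is returned, because 0 is followed by ? and userid is followed by = rather than by a digit.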
It appears you do not want to match anything before the equals sign =. You can use the end-of-line anchor $ so that the match is the run of allowed characters at the end of the string, which stops at any non-matching character such as =.
([a-zA-Z0-9]{1,20})$
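A small sketch of the anchored approach in Python (the function name trailing_token is hypothetical). It uses re.search rather than re.match, since the token sits at the end of the string rather than at the start.

```python
import re

# Capture up to 20 letters/digits that sit directly before the end of the string.
TAIL = re.compile(r"([a-zA-Z0-9]{1,20})$")


def trailing_token(text):
    """Return the alphanumeric run at the end of text, or None if there is none."""
    m = TAIL.search(text)
    return m.group(1) if m else None


print(trailing_token("mrrtjjsf8907m5q29ui"))
print(trailing_token("0?userid=y1arx6uxb1nidmz3tguv"))
print(trailing_token("bryj9itvwjmbyv3wg8ef"))
```

One thing to keep in mind with this variant: if the trailing run is longer than 20 characters, the pattern still matches and returns only the last 20 of them.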
I have a simple question.
I need a regular expression to match a hexadecimal number that has no colon at the end.
For example:
0x85af6b9d: 0x00256f8a ;some more interesting code
// don't match 0x85af6b9d: at all, but match 0x00256f8a
My expression for a hexadecimal number is 0[xX][0-9A-Fa-f]{1,8}
A version with (?!:) does not work, because it will just match 0x85af6b9 (the {1,8} quantifier backtracks by one digit).
Using $ isn't possible either, because a line can contain more than one number.
Thanks!
Here is one way to do so:
0[xX][0-9A-Fa-f]{1,8}(?![0-9A-Fa-f:])
See the online demo.
We use a negative lookahead to match all hexadecimal numbers without : at the end. Because of {1,8}, it is also necessary to ensure that the entire hexadecimal number is correctly matched. We therefore reuse the character set ([0-9A-Fa-f]) to ensure that the number does not continue.
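To see the difference between the two lookaheads, here is a short Python sketch run on the example line from the question (the names NAIVE and HEX_NO_COLON are made up for this illustration).

```python
import re

line = "0x85af6b9d: 0x00256f8a ;some more interesting code"

# Only forbids a colon: the engine backtracks one digit and still matches.
NAIVE = re.compile(r"0[xX][0-9A-Fa-f]{1,8}(?!:)")

# Forbids a colon AND a further hex digit, so a shortened match is rejected too.
HEX_NO_COLON = re.compile(r"0[xX][0-9A-Fa-f]{1,8}(?![0-9A-Fa-f:])")

naive_hits = NAIVE.findall(line)
fixed_hits = HEX_NO_COLON.findall(line)

print(naive_hits)  # ['0x85af6b9', '0x00256f8a']
print(fixed_hits)  # ['0x00256f8a']
```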
Let's say I need to match a pattern if it appears 3 or 6 times in a row. The closest I can get is something like \d{3,6} but that doesn't quite do what I need.
'123' should match
'123456' should match
'1234' should not match
^(\d{3}|\d{6})$
You have to have some sort of terminator otherwise \d{3} will match 1234. That's why I put ^ and $ above. One alternative is to use lookarounds:
(?<!\d)(\d{3}|\d{6})(?!\d)
to make sure it's not preceded by or followed by a digit (in this case). More in Lookahead and Lookbehind Zero-Width Assertions.
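Both variants can be tried out in a few lines of Python. This is only a sketch: the names EXACT, EMBEDDED and is_three_or_six, and the sample sentence, are assumptions for the example.

```python
import re

# Anchored version: the whole string must be 3 or 6 digits.
EXACT = re.compile(r"^(\d{3}|\d{6})$")

# Lookaround version: finds 3- or 6-digit runs inside longer text.
EMBEDDED = re.compile(r"(?<!\d)(\d{3}|\d{6})(?!\d)")


def is_three_or_six(text):
    """True if text consists of exactly 3 or exactly 6 digits."""
    return EXACT.match(text) is not None


# findall returns the contents of the single capture group for each match.
embedded_hits = EMBEDDED.findall("id 123, code 123456, skip 1234 and 12")

print(is_three_or_six("123"), is_three_or_six("123456"), is_three_or_six("1234"))
print(embedded_hits)  # ['123', '123456']
```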
How about:
(\d\d\d){1,2}
although you'll also need guards at either end which depend on your RE engine, something like:
[^\d](\d\d\d){1,2}[^\d]
or:
^(\d\d\d){1,2}$
For this case we can get away with this crafty method:
Clean Implementation
/(\d{3}){1,2}/
/(?:\d{3}){1,2}/
How?!
This works because we're looking for multiples of three that are consecutive in this case.
Note: There's no reason to capture the group in this case, so I add ?: to turn it into a non-capturing group.
This is similar to paxdiablo's implementation, but slightly cleaner.
Matching Hex
I was doing something similar for matching basic hex colors, since they can be 3 or 6 digits long. This allowed me to keep my hex color checker's matching DRY, i.e.:
/^0x(?:[\da-f]{3}){1,2}$/i
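A rough Python translation of the hex color check, with the /i flag expressed as re.IGNORECASE. The names HEX_COLOR, UNANCHORED and is_hex_color are hypothetical. The unanchored digit pattern is included to show why the ^ and $ guards matter.

```python
import re

# One or two groups of three hex digits after the 0x prefix.
HEX_COLOR = re.compile(r"^0x(?:[\da-f]{3}){1,2}$", re.IGNORECASE)

# Without anchors the same idea happily matches part of a longer run.
UNANCHORED = re.compile(r"(?:\d{3}){1,2}")


def is_hex_color(text):
    """True for 0x followed by exactly 3 or exactly 6 hex digits."""
    return HEX_COLOR.match(text) is not None


print(is_hex_color("0xfff"), is_hex_color("0xFFAA00"), is_hex_color("0xffff"))
print(UNANCHORED.search("1234").group(0))  # '123', a partial match of '1234'
```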
First one matches 3, 6 but also 9, 12, 15, .... Second looks right. Here's one more twist:
\d{3}(?:\d{3})?
Is there a way to match a fixed number of characters in a fixed length string via regex?
For example, I want to match all strings where the length of the string is 5 and there are exactly 3 letters and 2 exclamation marks (!). The exclamation marks can be anywhere in the string.
Example matches: abc!!, a!b!c, !!abc, a!!bc
I tried to match using lookahead but I wasn't able to limit the length. The following was the regex I used.
(?=\w*!\w*!\w*)[\w!]{5}
This matches a!!!b and a!!!! as well which I don't want.
You can do this using a lookahead based regular expression.
^(?=(?:\w*!){2}\w*$)[\w!]{5}$
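Here is a small Python check of that pattern against the strings from the question (the list names and the two extra failing samples, abc! and abcd!!, are assumptions added for the test). Keep in mind that \w also accepts digits and underscores, not only letters.

```python
import re

# Lookahead: exactly two '!' with only word characters around them.
# Main part: the whole string is 5 characters from [\w!].
TWO_BANGS = re.compile(r"^(?=(?:\w*!){2}\w*$)[\w!]{5}$")

should_match = ["abc!!", "a!b!c", "!!abc", "a!!bc"]
should_fail = ["a!!!b", "a!!!!", "abc!", "abcd!!"]

matched = [s for s in should_match + should_fail if TWO_BANGS.match(s)]
print(matched)  # ['abc!!', 'a!b!c', '!!abc', 'a!!bc']
```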
Probably easiest to just specify all possibilities.
(?=\w\w\w!!|\w\w!\w!|\w\w!!\w|\w!\w\w!|\w!\w!\w|\w!!\w\w|!\w\w\w!|!\w\w!\w|!\w!\w\w|!!\w\w\w)
Regex doesn't work well with combinations/permutations.
If the number of combinations is too large, do it in parts where the first regex gathers potential matches and the second (and beyond) continue to validate it.
[\w!]{5}
match.count('!') == 2
match.count('\w') == 3
(that isn't valid code -- just a concept)
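The concept above could be turned into runnable Python roughly like this. It is only a sketch: the function name is_valid is hypothetical, and the regex does the coarse filtering while plain counting does the rest.

```python
import re

# Step 1: coarse filter, exactly five characters from [\w!].
CANDIDATE = re.compile(r"[\w!]{5}")


def is_valid(text):
    """True if text has length 5 with exactly 2 '!' and 3 word characters."""
    if CANDIDATE.fullmatch(text) is None:
        return False
    # Step 2: validate the candidate by counting each kind of character.
    bangs = text.count("!")
    word_chars = len(re.findall(r"\w", text))
    return bangs == 2 and word_chars == 3


for sample in ["abc!!", "a!b!c", "a!!!b", "a!!!!"]:
    print(sample, is_valid(sample))
```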
I'm trying to make a single pattern that will validate an input string. The validation rule does not allow any character to be repeated 3 or more times in a row.
For example:
Aabcddee - is valid.
Aabcddde - is not valid, because of the 3 d characters.
The goal is to provide a RegExp pattern that could match one of above examples, but not both. I know I could use back-references such as ([a-z])\1{1,2} but this matches only sequential characters. My problem is that I cannot figure out how to make a single pattern for that. I tried this, but I don't quite get why it isn't working:
^(([a-z])\1{1,2})+$
Here I try to match any character that is repeated 1 or 2 times in the internal group, then I match that internal group if it's repeated multiple times. But it's not working that way.
Thanks.
To check that the string does not have a character (of any kind, even new line) repeated 3 times or more in a row:
/^(?!.*(.)\1{2})/s
You can also check that the input string does NOT have any match to this regex. In this case, you can also know the character being repeated 3 times or more in a row. Notice that this is exactly the same as above, except that the regex inside the negative look-ahead (?!pattern) is taken out.
/^.*(.)\1{2}/s
If you want to add validation that the string only contains characters from [a-z], and you consider aaA to be invalid:
/^(?!.*(.)\1{2})[a-z]+$/i
As you can see, the i flag (case-insensitive) affects how the captured text is compared against the current input.
Change + to * if you want to allow empty string to pass.
If you want to consider aaA to be valid, and you want to allow both upper and lower case:
/^(?!.*(.)\1{2})[A-Za-z]+$/
At first glance it might seem to be the same as the previous one, but since there is no i flag, the captured text will not be subject to case-insensitive matching.
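For anyone who wants to try these out, here is a sketch in Python, where the /s flag corresponds to re.DOTALL and /i to re.IGNORECASE (the constant names are made up). As far as I can tell, Python's re module also compares backreferences case-insensitively under re.IGNORECASE, which is what makes aaA fail in the first variant.

```python
import re

# Case-insensitive variant: 'aaA' counts as three of the same character.
NO_TRIPLE_CI = re.compile(r"^(?!.*(.)\1{2})[a-z]+$", re.IGNORECASE | re.DOTALL)

# Case-sensitive variant: 'a' and 'A' are different characters.
NO_TRIPLE_CS = re.compile(r"^(?!.*(.)\1{2})[A-Za-z]+$", re.DOTALL)

# Inverse check: finds a character repeated 3+ times and captures it.
TRIPLE = re.compile(r"^.*(.)\1{2}", re.DOTALL)

print(bool(NO_TRIPLE_CI.match("Aabcddee")))  # valid
print(bool(NO_TRIPLE_CI.match("Aabcddde")))  # invalid, 'ddd'
print(bool(NO_TRIPLE_CI.match("aaA")), bool(NO_TRIPLE_CS.match("aaA")))
print(TRIPLE.match("Aabcddde").group(1))  # the repeated character
```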
Below is a failed answer. You can ignore it, or read it for fun.
You can use this regex to check that the string does not contain any character (of any kind, even a new line) 3 or more times, whether or not the occurrences are adjacent.
/^(?!.*(.)(?:.*\1){2})/s
You can also check that the input string does NOT have any match to this regex. In this case, you can also know the character being repeated more than or equal to 3 times. Notice that this is exactly the same as above, except that the regex inside the negative look-ahead (?!pattern) is taken out.
/^.*(.)(?:.*\1){2}/s
If you want to add validation that the string only contains characters from [a-z], and you consider aaA to be invalid:
/^(?!.*(.)(?:.*\1){2})[a-z]+$/i
As you can see, the i flag (case-insensitive) affects how the captured text is compared against the current input.
If you want to consider aaA to be valid, and you want to allow both upper and lower case:
/^(?!.*(.)(?:.*\1){2})[A-Za-z]+$/
At first glance it might seem to be the same as the previous one, but since there is no i flag, the captured text will not be subject to case-insensitive matching.
From your question I get that you want to match
only strings consisting of chars from [A-Za-z] AND
only strings which have no sequence of the same character with a length of 3 or more
Then this regexp should work:
^(?:([A-Za-z])(?:(?!\1)|\1(?!\1)))+$
(Example in perl)
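The same regexp also seems to work unchanged in Python. A minimal sketch (the name NO_TRIPLE is made up for this example): each iteration takes one letter and then requires either that the next character differs, or that the letter repeats once and then differs.

```python
import re

# One letter, then either "next char is different" or
# "same letter once more, and then a different char (or the end)".
NO_TRIPLE = re.compile(r"^(?:([A-Za-z])(?:(?!\1)|\1(?!\1)))+$")

for sample in ["Aabcddee", "Aabcddde", "aaA", "abc1"]:
    print(sample, bool(NO_TRIPLE.match(sample)))
```

Since there is no case-insensitive flag here, aaA passes: a and A are treated as different characters.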
I need to have "or" logic in my regexp.
For example, from "foobar435" I would need the three numbers, so "435"
But from "barfoo543" I would need the three letters before the three numbers, so "foo"
Individually, the regexes would be "foobar([0-9]){3}" to get the first case, and "[a-zA-Z]{3}([0-9]{3})[a-zA-Z]{3}" to get the second case. How do I get both cases at once with one regexp? So, if the first regexp matches then return "435", but if not, return "foo"?
I am using Hive, so ideally I want to make only one call. So far I have...
REGEXP_EXTRACT(myString, 'foobar([0-9]){3}', 1) AS columnName
Not sure how to add the second case into this. Thanks!
You can use lookarounds for this.
In your first case, you want to match three digits preceded by "foobar" (use lookbehind):
(?<=foobar)[0-9]{3}
In your second case, you want to match three letters preceded by three letters (use lookbehind) and followed by three digits (use lookahead):
(?<=[a-zA-Z]{3})[a-zA-Z]{3}(?=\d{3})
Note that, if I interpreted your requirements correctly, it looks like you flipped the numeric part with the second alpha part in your expression.
Now that you have your two expressions, you just need to combine them with an 'or':
(?<=foobar)[0-9]{3}|(?<=[a-zA-Z]{3})[a-zA-Z]{3}(?=\d{3})
One thing to be aware of is that this will also match words with additional word characters on either end, e.g. "xfoobar435x". If this is undesirable, add a word boundary \b to the beginnings of the lookbehinds and to the end of the lookahead.
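When trying the combined pattern, it is worth keeping in mind that a regex search returns the leftmost match, not the match of the first alternative. The sketch below uses Python's re module rather than Hive, and the names COMBINED, GUARDED and extract are hypothetical. It suggests that on "foobar435" the letter branch matches "bar" at an earlier position than "435". One possible guard, which is my own assumption and not part of the expressions above, is a negative lookbehind (?<!foobar) that rejects the letter branch when the six letters before the digits are "foobar".

```python
import re

# The two lookaround expressions joined with '|', as described above.
COMBINED = re.compile(
    r"(?<=foobar)[0-9]{3}|(?<=[a-zA-Z]{3})[a-zA-Z]{3}(?=\d{3})"
)

# Assumed variant: the letter branch is skipped when the letters are "foobar".
GUARDED = re.compile(
    r"(?<=foobar)[0-9]{3}|(?<=[a-zA-Z]{3})[a-zA-Z]{3}(?<!foobar)(?=\d{3})"
)


def extract(pattern, text):
    """Return the first (leftmost) match of pattern in text, or None."""
    m = pattern.search(text)
    return m.group(0) if m else None


print(extract(COMBINED, "barfoo543"))  # 'foo'
print(extract(COMBINED, "foobar435"))  # 'bar', found before '435' is reached
print(extract(GUARDED, "foobar435"))   # '435'
print(extract(GUARDED, "barfoo543"))   # 'foo'
```

Hive's regex functions are backed by Java's regex engine, which also supports these fixed-width lookbehinds, but it would be worth testing the guarded pattern there before relying on it.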