Is there a special character to join two groups of rules in regex?
I need to match the first 2 characters and the last 2 digits in every row.
This matches the first 2 characters:
(^..)
This matches the last 2 digits:
([0-9][0-9]$)
How do I join those 2 rules?
I tried this without success:
(^..)([0-9][0-9]$)
Well you need to match the parts in between as well. Just allow for arbitrarily many arbitrary characters:
(^..).*([0-9][0-9]$)
Note that in most flavors . does not match line breaks. If your input may contain line breaks, use the s ("single line" or sometimes "dotall") modifier to change the meaning of the dot. Otherwise (e.g. in JavaScript), use [\s\S]*.
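A quick way to check the combined pattern and the line-break caveat is a short Python sketch. The sample rows and variable names are made up for illustration; re.DOTALL is Python's name for the s modifier.

```python
import re

pattern = r"(^..).*([0-9][0-9]$)"

# Single-line row: both groups are captured.
m = re.search(pattern, "ab-some-text-42")
single_line = m.groups() if m else None

# Row containing a line break: '.' stops at '\n' unless DOTALL is set.
row = "ab\ncd12"
without_dotall = re.search(pattern, row)
with_dotall = re.search(pattern, row, re.DOTALL)

print(single_line)           # ('ab', '42')
print(without_dotall)        # None
print(with_dotall.groups())  # ('ab', '12')
```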
Also note that it might be easier, more readable and more efficient to just use two regexes consecutively:
^..
[0-9][0-9]$
No need for grouping/capturing and repetition.
EDIT:
Note that these two aren't completely equivalent. The first one requires at least four characters (because the two characters matched by .. cannot be matched again by [0-9][0-9]) while the second one could just contain two digits (in which case the .. would match those same digits). It depends on which of these semantics you are looking for. A third solution that uses only one regex but is equivalent to the two-regex solution would use lookaheads:
^(?=(..))(?=.*([0-9][0-9])$)
This would allow you to match x12, the first capture being x1 and the second being 12.
Thanks to Alan Moore for pointing this out.
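The difference between the three variants can be seen side by side in a small Python sketch. The function names and the sample rows "ab1234" and "x12" are only illustrative.

```python
import re

def single_regex(row):
    # One regex; the two groups cannot overlap, so at least 4 characters are needed.
    m = re.search(r"(^..).*([0-9][0-9]$)", row)
    return m.groups() if m else None

def two_regexes(row):
    # Two independent searches; the results may overlap.
    first = re.search(r"^..", row)
    last = re.search(r"[0-9][0-9]$", row)
    return (first.group(), last.group()) if first and last else None

def lookahead_regex(row):
    # One regex with lookaheads; nothing is consumed, so overlap is allowed.
    m = re.search(r"^(?=(..))(?=.*([0-9][0-9])$)", row)
    return m.groups() if m else None

for row in ("ab1234", "x12"):
    print(row, single_regex(row), two_regexes(row), lookahead_regex(row))
# ab1234 ('ab', '34') ('ab', '34') ('ab', '34')
# x12 None ('x1', '12') ('x1', '12')
```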
You need to add "anything goes here", also known as .*
(^..).*([0-9][0-9]$)
(^..).*([0-9][0-9]$)
You can use .* to match 'everything in between'.
If the row contains additional characters between the "first two" and the "last two", then you'll need something in the regex to match the intervening characters; something like:
(^..).*([0-9][0-9]$)
Related
I'm trying to detect a price in regex with this:
^\-?[0-9]+(,[0-9]+)?(\.[0-9]+)?
This covers:
12
12.5
12.50
12,500
12,500.00
But if I pass it
12..50 or 12.5.0 or 12.0.
it still returns a match on the 12. I want it to reject the entire string and return no match at all if there is more than one period in the entire string.
I've been trying to get my head around negative lookaheads for an hour and have searched on Stack Overflow but can't seem to find the right answer. How do I do this?
What you are looking for is this:
^\d+(,\d{3})*(\.\d{1,2})?$
What it does:
^ start of line
\d+ one or more digits, followed by
(,\d{3})* zero or more occurrences of a , followed by three digits, followed by
(\.\d{1,2})? an optional . followed by one or two digits, followed by
$ end of line
This will only match valid prices. The comma (,) is not obligatory in this regex, but it will be matched.
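As a rough check, the pattern can be run against the inputs from the question. This is a Python sketch; the names PRICE, valid and invalid are just for illustration.

```python
import re

PRICE = re.compile(r"^\d+(,\d{3})*(\.\d{1,2})?$")

valid = ["12", "12.5", "12.50", "12,500", "12,500.00"]
invalid = ["12..50", "12.5.0", "12.0."]

for s in valid + invalid:
    # Prints True for every entry in `valid` and False for every entry in `invalid`.
    print(f"{s!r:12} -> {bool(PRICE.match(s))}")
```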
Look here: http://www.regextester.com/?fam=98001
If you work with prices and want to store them in a database, I recommend saving them as INT. So 1,234.56 becomes 123456 and 1,234 becomes 123400. After you have matched a valid price, all you have to do is remove the commas, split the value at the dot, and right-pad the fractional part (index [1]) with zeros using str_pad() (STR_PAD_RIGHT). This makes calculations easier, especially when you work with JavaScript or other languages.
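The answer describes this conversion in PHP terms (str_pad). A minimal Python sketch of the same idea follows; the helper name price_to_cents is hypothetical, and it assumes the string has already passed the validation regex above.

```python
def price_to_cents(price: str) -> int:
    """Convert an already-validated price such as '1,234.56' to integer cents."""
    # Drop thousands separators, then split into whole and fractional parts.
    whole, _, fraction = price.replace(",", "").partition(".")
    # Right-pad the fraction with zeros to two digits ('5' -> '50', '' -> '00').
    return int(whole + fraction.ljust(2, "0"))

print(price_to_cents("1,234.56"))  # 123456
print(price_to_cents("1,234"))     # 123400
print(price_to_cents("12.5"))      # 1250
```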
Your regex:
^\-?[0-9]+(,[0-9]+)?(\.[0-9]+)?
Note: The regex you provided does not seem to work for 12 (without "."). Since you didn't add a quantifier after \., it tries to match that pattern literally (.).
While there are multiple ways to solve this and the most "correct" answer will depend on your specific requirements, here's a regex that will not match 12..1, but will match 12.1:
(^\-?[0-9]+(?:,[0-9]+)?(?:\.[0-9]+))+
I surrounded the entire regex you provided in a capturing group (...), and added a one or more quantifier + at the end, so that the entire regex will fail if it does not satisfy that pattern.
Also (this may or may not be what you want), I modified the inner groups into non-capturing groups (?: ... ) so that it does not return unnecessary groups.
This site offers a deconstruction of regexes and explains them:
For the regex provided: https://regex101.com/r/EDimzu/2
Unit tests: https://regex101.com/r/EDimzu/2/tests (Note the 12 one's failure for multiple languages).
You can limit it by requiring there is only 0 or 1 periods like this:
^[0-9,]+[\.]{0,1}?[0-9,]+$
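Trying this pattern in Python suggests it does reject the double-period inputs, but it is fairly loose in other respects. The sample list is made up for illustration.

```python
import re

LOOSE = re.compile(r"^[0-9,]+[\.]{0,1}?[0-9,]+$")

samples = ["12.50", "12,500.00", "12..50", "12.5.0", "5", ",,"]
results = {s: bool(LOOSE.match(s)) for s in samples}

for s, ok in results.items():
    print(f"{s!r:12} -> {ok}")
# '12.50' and '12,500.00' match; '12..50' and '12.5.0' do not.
# A single digit such as '5' is rejected (the pattern needs at least two
# characters), while ',,' is accepted because commas are allowed anywhere.
```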
Trying to match all of these:
{_someWord1} ... $1=someWord, $2=1
{_another82} ... $1=another, $2=82 (item in question)
{_testX} ... $1=test, $2=X
My regex: {_(\w+)(\d+|X)} matches all three, but the groups for the 2nd item are:
{_another82} ... $1=another8, $2=2
I'd like to be able to have any number of digits be in $2, and keep just the words in $1. Do I need to have a look ahead of some sort?
In most regex flavors, you could use ungreedy repetition, which consumes as little as possible (as opposed to the default - as much as possible):
{_(\w+?)(\d+|X)}
However, if the part before the digits can never contain digits or underscores (both of which are included in \w), you could simply use a more specific character class:
{_([a-zA-Z]+)(\d+|X)}
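The three variants can be compared in Python as a sketch. The braces are escaped here as a precaution, since some flavors treat an unescaped { as the start of a quantifier; the dictionary keys are just labels.

```python
import re

tokens = ["{_someWord1}", "{_another82}", "{_testX}"]

patterns = {
    "greedy": r"\{_(\w+)(\d+|X)\}",
    "lazy": r"\{_(\w+?)(\d+|X)\}",
    "letters_only": r"\{_([a-zA-Z]+)(\d+|X)\}",
}

results = {
    name: [re.fullmatch(p, t).groups() for t in tokens]
    for name, p in patterns.items()
}

for name, groups in results.items():
    print(name, groups)
# greedy [('someWord', '1'), ('another8', '2'), ('test', 'X')]
# lazy [('someWord', '1'), ('another', '82'), ('test', 'X')]
# letters_only [('someWord', '1'), ('another', '82'), ('test', 'X')]
```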
Try using a non-greedy match (adding a ? after \w+) to consume as little as possible and still match:
{_(\w+?)(\d+|X)}
or if your language (unspecified) supports look-arounds, then:
{_(\w+)(?<=[a-zA-Z])(\d+|X)}
which asserts that the last character of group 1 must be a letter (although letters may appear elsewhere within group 1)
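A short Python check of the look-behind variant, again with the braces escaped as a precaution. The extra token "{_a1b22}" is a made-up input showing that digits may still appear inside group 1 as long as its last character is a letter.

```python
import re

pattern = re.compile(r"\{_(\w+)(?<=[a-zA-Z])(\d+|X)\}")

tokens = ["{_someWord1}", "{_another82}", "{_testX}", "{_a1b22}"]
groups = [pattern.fullmatch(t).groups() for t in tokens]

print(groups)
# [('someWord', '1'), ('another', '82'), ('test', 'X'), ('a1b', '22')]
```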
I need to have "or" logic in my regexp.
For example, from "foobar435" I would need the three numbers, so "435"
But from "barfoo543" I would need the three letters before the three numbers, so "foo"
Individually, the regexes would be "foobar([0-9]){3}" to get the first case, and "[a-zA-Z]{3}([0-9]{3})[a-zA-Z]{3}" to get the second case. How do I get both cases at once with one regexp? So, if the first regexp matches then return "435", but if not, return "foo"?
I am using hive so ideally I want to make one call only. So far I have...
REGEXP_EXTRACT(myString, 'foobar([0-9]){3}', 1) AS columnName
Not sure how to add the second case into this. Thanks!
You can use lookarounds for this.
In your first case, you want to match three digits preceded by "foobar" (use lookbehind):
(?<=foobar)[0-9]{3}
In your second case, you want to match three letters preceded by three letters (use lookbehind) and followed by three digits (use lookahead):
(?<=[a-zA-Z]{3})[a-zA-Z]{3}(?=\d{3})
Note that, if I interpreted your requirements correctly, it looks like you flipped the numeric part with the second alpha part in your expression.
Now that you have your two expressions, you just need to combine them with an 'or':
(?<=foobar)[0-9]{3}|(?<=[a-zA-Z]{3})[a-zA-Z]{3}(?=\d{3})
One thing to be aware of is that this will also match words with additional word characters on either end, e.g. "xfoobar435x". If this is undesirable, add a word boundary \b to the beginnings of the lookbehinds and to the end of the lookahead.
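It may be worth testing the combined expression on both inputs before relying on it. In Python's re module, used here only as a stand-in (Hive may behave differently), the search returns the leftmost match, so for "foobar435" the second alternative matches "bar" before the first alternative gets a chance at "435". The helper extract below is a hypothetical workaround that tries the first pattern and only then falls back to the second.

```python
import re

FIRST = r"(?<=foobar)[0-9]{3}"
SECOND = r"(?<=[a-zA-Z]{3})[a-zA-Z]{3}(?=\d{3})"
combined = FIRST + "|" + SECOND

print(re.search(combined, "barfoo543").group())  # foo
print(re.search(combined, "foobar435").group())  # bar (leftmost match wins)

def extract(s):
    # Try the "foobar + digits" case first; fall back to the letters case.
    m = re.search(FIRST, s) or re.search(SECOND, s)
    return m.group() if m else None

print(extract("foobar435"))  # 435
print(extract("barfoo543"))  # foo
```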
I have a string containing ones and zeroes. I want to determine if there are substrings of 1 or more characters that are repeated at least 3 consecutive times. For example, the string '000' has a length 1 substring consisting of a single zero character that is repeated 3 times. The string '010010010011' actually has 3 such substrings that each are repeated 3 times ('010', '001', and '100').
Is there a regex expression that can find these repeating patterns without knowing either the specific pattern or the pattern's length? I don't care what the pattern is nor what its length is, only that the string contains a 3-peat pattern.
Here's something that might work. However, it will only tell you whether there is a pattern repeated three times, and I don't think it can be extended to tell you if there are others:
/(.+).*?\1.*?\1/
Breaking that out:
(.+) matches any 1 or more characters, starting anywhere in the string
.*? allows any length of interposing other characters (0 or more)
\1 matches whatever was captured by the (.+) parentheses
.*? 0 or more of anything
\1 the original pattern, again
If you want the repetitions to occur immediately adjacent, then instead use
/(.+)\1\1/
... as suggested by @Buh Buh. The \1 vs. $1 notation may vary, depending on your regexp system.
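A small Python sketch of the adjacent-repetition pattern, using the strings from the question plus one made-up string with no triple repeat. The second helper is an assumption-laden extra: wrapping the pattern in a lookahead lets re.findall report one repeated unit per starting position, which is one possible way to list several repeats.

```python
import re

THREEPEAT = re.compile(r"(.+)\1\1")

def has_threepeat(bits):
    # True if some substring is immediately repeated three times.
    return THREEPEAT.search(bits) is not None

def threepeat_units(bits):
    # The lookahead consumes nothing, so every start position is tried;
    # findall returns the captured unit for each position that matches.
    return re.findall(r"(?=(.+)\1\1)", bits)

print(has_threepeat("000"))             # True
print(has_threepeat("01101001"))        # False
print(threepeat_units("010010010011"))  # ['010', '100', '001']
```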
(.+)\1\1
The \ might be a different character depending on your language choice. This means: match any string, then try to match it again twice more.
The \1 means repeat the 1st match.
It looks weird, but this could be the solution:
/000000000|100100100|010010010|001001001|110110110|011011011|101101101|111111111/
This contains all possible three-character combinations, each repeated three times. So the regular expression will match strings such as:
10010010011
00010010011
10110110110
But not for these:
101010101010
001110111110
111000111000
And it doesn't matter where the sequence appears in the whole string.
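For comparison, here is a Python sketch that runs the enumerated pattern next to the backreference pattern (.+)\1\1 from the other answers. It suggests the enumeration only covers repeated units of exactly three characters, so strings whose repeated unit has length 1 or 2 are not detected by it.

```python
import re

ENUMERATED = re.compile(
    r"000000000|100100100|010010010|001001001|"
    r"110110110|011011011|101101101|111111111"
)
GENERAL = re.compile(r"(.+)\1\1")

for bits in ("10010010011", "101010101010", "111000111000"):
    print(bits, bool(ENUMERATED.search(bits)), bool(GENERAL.search(bits)))
# 10010010011 True True
# 101010101010 False True   ('10' repeated three times)
# 111000111000 False True   ('1' repeated three times)
```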
I need a regex that will match strings of letters that do not contain two consecutive dashes.
I came close with this regex that uses lookaround (I see no alternative):
([-a-z](?<!--))+
Which given the following as input:
qsdsdqf--sqdfqsdfazer--azerzaer-azerzear
Produces three matches:
qsdsdqf-
sqdfqsdfazer-
azerzaer-azerzear
What I want however is:
qsdsdqf-
-sqdfqsdfazer-
-azerzaer-azerzear
So my regex loses the first dash, which I don't want.
Who can give me a hint or a regex that can do this?
This should work:
-?([^-]-?)*
It makes sure that there is at least one non-dash character between every two dashes.
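Applied to the sample input in Python, this pattern appears to give the desired pieces. Because the pattern can also match the empty string, the sketch filters out empty matches; the variable names are illustrative.

```python
import re

text = "qsdsdqf--sqdfqsdfazer--azerzaer-azerzear"

# group(0) is the whole match; skip the empty matches the pattern allows.
pieces = [m.group(0) for m in re.finditer(r"-?([^-]-?)*", text) if m.group(0)]

print(pieces)
# ['qsdsdqf-', '-sqdfqsdfazer-', '-azerzaer-azerzear']
```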
Looks to me like you do want to match strings that contain double hyphens, but you want to break them into substrings that don't. Have you considered splitting it between pairs of hyphens? In other words, split on:
(?<=-)(?=-)
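The splitting idea can be sketched in Python, assuming version 3.7 or later, where re.split accepts patterns that match the empty string.

```python
import re

text = "qsdsdqf--sqdfqsdfazer--azerzaer-azerzear"

# Split at the zero-width position between two adjacent hyphens.
parts = re.split(r"(?<=-)(?=-)", text)

print(parts)
# ['qsdsdqf-', '-sqdfqsdfazer-', '-azerzaer-azerzear']
```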
As for your regex, I think this is what you were getting at:
(?:[^-]+|-(?<!--)|\G-)+
The -(?<!--) will match one hyphen, but if the next character is also a hyphen the match ends. Next time around, \G- picks up the second hyphen because it's the next character; the only way that can happen (except at the beginning of the string) is if a previous match broke off at that point.
Be aware that this regex is more flavor dependent than most; I tested it in Java, but not all flavors support \G and lookbehinds.