Extract a sub-string from a matched string - regex

I am attempting to extract a sub-string from a string after matching for 24 at the beginning of the string. The substring is a MAC id starting at position 6 till the end of the string. I am aware that a sub string method can do the job. I am curious to know a regex implementation.
String = 2410100:80:a3:bf:72:d45
After much trial and error, this is the regex I have, which I think is convoluted.
[^24*$](?<=^\S{6}).*$
How can this reg-ex be modified to match for 24, then extract the substring from position 6 till the end of the line?
https://regex101.com/r/vcvfMx/2
Expected Results: 00:80:a3:bf:72:d45

You can use:
(?<=^24\S{3}).*$
Here's a demo: https://regex101.com/r/HqT0RV/1/
This will get you the result you expect (i.e., 00:80:a3:bf:72:d45). However, that doesn't appear to be a valid MAC address: the trailing 5 is not part of it. If you want to match only the MAC address, you should use something like this:
(?<=^24\S{3})(?:[0-9a-f]{2}:){5}[0-9a-f]{2}
Demo: https://regex101.com/r/HqT0RV/2
Breakdown:
(?<= # Start of a positive Lookbehind.
^ # Asserts position at the beginning of the string.
24 # Matches `24` literally.
\S{3} # Matches any three non-whitespace characters.
) # End of the Lookbehind (five characters so far).
(?: # Start of a non-capturing group.
[0-9a-f] # A number between `0` and `9` or a letter between `a` and `f` (at pos. #6).
{2} # Matches the previous character class exactly two times.
: # Matches `:` literally.
) # End of the non-capturing group.
{5} # Matches the previous group exactly five times.
[0-9a-f] # Any number between `0` and `9` or any letter between `a` and `f`.
{2} # Matches the previous character class exactly two times.
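As a quick check, here is a minimal Python sketch of both patterns from this answer applied to the string from the question. The helper name `extract` is made up for illustration. Python's `re` module only accepts fixed-width lookbehinds, which `^24\S{3}` satisfies.

```python
import re

# Both patterns keep the fixed-width lookbehind (2 + 3 characters) from the answer.
REST_AFTER_PREFIX = re.compile(r"(?<=^24\S{3}).*$")
MAC_AFTER_PREFIX = re.compile(r"(?<=^24\S{3})(?:[0-9a-f]{2}:){5}[0-9a-f]{2}")


def extract(pattern, text):
    """Return the matched substring, or None when the prefix or body does not match."""
    match = pattern.search(text)
    return match.group(0) if match else None


sample = "2410100:80:a3:bf:72:d45"  # input string from the question
print(extract(REST_AFTER_PREFIX, sample))  # everything from position 6 onward
print(extract(MAC_AFTER_PREFIX, sample))   # only the well-formed MAC address
```

The first pattern keeps the trailing 5, while the second stops after the sixth octet. A string that does not start with 24 yields no match with either pattern.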

Related

How can I limit the total length of 2 adjacent strings in Regular Expression?

Example word: name.surname#exm.gov.xx.en
I want to limit the name + surname's total length to 12.
Ex: If the name's length is 5, then the surname's length cannot be bigger than 7.
My regex is here: ([a-z|çöşiğü]{0,12}.[a-z|çöşiğü]{0,12}){0,12}#exm.gov.xx.en
Thx in advance
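If there should be a single dot present, which should not be at the start or right before the #, you could assert 13 characters (12 letters plus the dot) followed by a #. Note that {13} requires the total to be exactly 13; to allow shorter names as well, a range such as {3,13} could be used instead.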
^(?=[a-zçöşğü.]{13}#)[a-zçöşğü]+\.[a-zçöşğü]+#exm\.gov\.xx\.en$
In parts
^ Start of string
(?= Positive lookahead, assert what is on the right is
[a-zçöşğü.]{13}# Match 13 times any of the listed, followed by a #
) Close lookahead
[a-zçöşğü]+\.[a-zçöşğü]+ Match 1+ times any of the listed, a dot, then 1+ times any of the listed again
#exm\.gov\.xx\.en Match #exm.gov.xx.en
$ End of string
Note that I have omitted the pipe | from the character class, as inside a class it would match a literal | instead of meaning OR. If you meant to use it as a character, you could add it back. I have also removed the i, as it is already matched by a-z.
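A small Python sketch of the lookahead idea, for illustration only. `EXACT` is the pattern from this answer; `AT_MOST` is an assumed variant (not part of the original answer) that swaps `{13}` for `{3,13}` so that shorter name/surname pairs are accepted too. The sample addresses are made up.

```python
import re

# Pattern from the answer: exactly 13 characters (letters plus one dot) before '#'.
EXACT = re.compile(
    r"^(?=[a-zçöşğü.]{13}#)[a-zçöşğü]+\.[a-zçöşğü]+#exm\.gov\.xx\.en$"
)

# Assumed variant: between 3 and 13 characters before '#', i.e. at most 12 letters.
AT_MOST = re.compile(
    r"^(?=[a-zçöşğü.]{3,13}#)[a-zçöşğü]+\.[a-zçöşğü]+#exm\.gov\.xx\.en$"
)

samples = [
    "alice.johnson#exm.gov.xx.en",   # 5 + 7 = 12 letters
    "alice.johnsons#exm.gov.xx.en",  # 5 + 8 = 13 letters
    "al.jo#exm.gov.xx.en",           # 2 + 2 = 4 letters
]

for text in samples:
    print(text, bool(EXACT.match(text)), bool(AT_MOST.match(text)))
```

The lookahead only checks the length; the part after it still enforces the single dot between two non-empty names.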

How to use regular expression to use as few groups as possible to match as long string as possible

For example, this is the regular expression
([a]{2,3})
This is the string
aaaa // 1 match "(aaa)a" but I want "(aa)(aa)"
aaaaa // 2 match "(aaa)(aa)"
aaaaaa // 2 match "(aaa)(aaa)"
However, if I change the regular expression
([a]{2,3}?)
Then the results are
aaaa // 2 match "(aa)(aa)"
aaaaa // 2 match "(aa)(aa)a" but I want "(aaa)(aa)"
aaaaaa // 3 match "(aa)(aa)(aa)" but I want "(aaa)(aaa)"
My question: is it possible to use as few groups as possible to match as long a string as possible?
How about something like this:
(a{3}(?!a(?:[^a]|$))|a{2})
This looks for either the character a three times (not followed by a single a that is in turn followed by a different character or the end of the string) or the character a two times.
Breakdown:
( # Start of the capturing group.
a{3} # Matches the character 'a' exactly three times.
(?! # Start of a negative Lookahead.
a # Matches the character 'a' literally.
(?: # Start of the non-capturing group.
[^a] # Matches any character except for 'a'.
| # Alternation (OR).
$ # Asserts position at the end of the line/string.
) # End of the non-capturing group.
) # End of the negative Lookahead.
| # Alternation (OR).
a{2} # Matches the character 'a' exactly two times.
) # End of the capturing group.
Here's a demo.
Note that if you don't need the capturing group, you can actually use the whole match instead by converting the capturing group into a non-capturing one:
(?:a{3}(?!a(?:[^a]|$))|a{2})
Which would look like this.
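To see how the non-capturing version splits runs of a, here is a minimal Python sketch (the function name `split_runs` is hypothetical). Because the pattern has no capturing groups, `findall` returns the whole matches.

```python
import re

# Non-capturing form of the pattern from this answer (outer group dropped).
PATTERN = re.compile(r"a{3}(?!a(?:[^a]|$))|a{2}")


def split_runs(text):
    """Return the chunks of two or three 'a' characters found in text."""
    return PATTERN.findall(text)


for text in ("aaaa", "aaaaa", "aaaaaa", "aaaaaaa"):
    print(text, split_runs(text))
```

For four a's the three-character branch is rejected by the lookahead, so the match falls back to two chunks of two.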
Try this Regex:
^(?:(a{3})*|(a{2,3})*)$
Explanation:
^ - asserts the start of the line
(?:(a{3})*|(a{2,3})*) - a non-capturing group containing 2 sub-sequences separated by OR operator
(a{3})* - The first subsequence tries to match 3 occurrences of a. The * at the end allows this subsequence to match 0 or 3 or 6 or 9.... occurrences of a before the end of the line
| - OR
(a{2,3})* - matches 2 to 3 occurrences of a, as many as possible. The * at the end would repeat it 0+ times before the end of the line
$ - asserts the end of the line
Try this short regex:
a{2,3}(?!a([^a]|$))
How it's made:
I started with this simple regex: a{2}a?. It looks for 2 consecutive a's that may be followed by another a. If the 2 a's are followed by another a, it matches all three a's.
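This worked for most cases, such as aaaaa (aaa, aa) and aaaaaa (aaa, aaa).
However, it failed in cases like aaaa, where it matched aaa and left a single a unmatched.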
So now, I knew I had to modify my regex in such a way that it would match the third a only if the third a is not followed by a([^a]|$). So now, my regex looked like a{2}a?(?!a([^a]|$)), and it worked for all cases. Then I just simplified it to a{2,3}(?!a([^a]|$)).
That's it.
EDIT
If you want the capturing behavior, then add parentheses around the regex, like:
(a{2,3}(?!a([^a]|$)))
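The difference between the starting regex and the final one can be checked with a short Python sketch. The names `NAIVE`, `REFINED` and `chunks` are made up for illustration; `finditer` is used because the refined pattern contains an inner group, which would change what `findall` returns.

```python
import re

NAIVE = re.compile(r"a{2}a?")                # first attempt from this answer
REFINED = re.compile(r"a{2,3}(?!a([^a]|$))")  # final short regex


def chunks(pattern, text):
    """Return every whole match of pattern in text, in order."""
    return [match.group(0) for match in pattern.finditer(text)]


for text in ("aaaa", "aaaaa", "aaaaaa"):
    print(text, chunks(NAIVE, text), chunks(REFINED, text))
```

Only the four-character input differs: the naive pattern consumes three a's and strands the fourth, while the refined one backs off to two and two.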

RegEx: Get every word until last 4 words

I have strings like
wwww-wwww-wwww
wwww-www-ww-ww
Many w separated with -
But it's not regular wwww-wwww, it could be w-w-w-w as well
I am trying to find a regex that captures every word character (w) until the last 4.
So the result for example 1 would be the first 8w's (wwww-wwww)
For 2nd example the first 5w's (wwww-w)
Is it possible to do this in regex?
I have something like this right now:
^\w*(?=\w{4}$)
or maybe
[^-]*(?=\w{4}$)
I have 2 problems with my "solutions":
the last 4 words will not be captured for example 2. They are interrupted by the -
the words before the last 4 will not be captured. They are interrupted by the -.
Yes, it's possible with a slightly more sophisticated lookahead assertion:
/\w(?=(?:-*\w){4,}$)/x
Explanation:
/ # Start of regex
\w # Match a "word" character
(?= # only if the following can be matched afterwards:
(?: # (Start of non-capturing group)
-* # - zero or more separators
\w # - exactly one word character
){4,} # (End of non-capturing group), repeated 4 or more times.
$ # Then make sure we've reached the end of the string.
) # End of lookahead assertion
/x # End of regex; the x flag allows this free-spacing, commented layout
Test it live on regex101.com.
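Since the regex matches one word character at a time, the individual matches still have to be stitched together. A minimal Python sketch of one way to do that follows; the helper `head_before_last_four` is hypothetical and assumes the input consists only of word characters and `-` separators. `re.VERBOSE` plays the role of the `/x` flag.

```python
import re

HEAD_CHAR = re.compile(
    r"""
    \w                  # one word character
    (?=                 # only if what follows it is
        (?:-*\w){4,}    #   at least four more word characters, separators allowed
        $               #   reaching the end of the string
    )
    """,
    re.VERBOSE,
)


def head_before_last_four(text):
    """Return the prefix of text that ends at the last matching character."""
    matches = list(HEAD_CHAR.finditer(text))
    if not matches:
        return ""
    return text[: matches[-1].end()]


for text in ("wwww-wwww-wwww", "w-w-w-w-w-w", "w-w-w-w"):
    print(repr(text), "->", repr(head_before_last_four(text)))
```

Slicing up to the end of the last match keeps the separators that sit between the matched characters.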

Regex to match time ranges involving am/pm like 7am-10pm

I have written the following regex
(1[012]|[1-9])(am|pm)\-(1[012]|[1-9])(am|pm)
to match following kind of time formats:
7am-10pm (matches correctly and creates 4 match groups 7, am, 10, pm)
13am-10pm (this should not be matched, however it matches and creates 4 match groups 3, am, 10, pm)
10pm (this doesn't match as expected because it doesn't specify the time range end)
111am-10pm (this should not be matched, however it matches and creates 4 match groups 11, am, 10, pm)
How can I improve my regex such that I don't need to repeat the digits and am/pm pattern and also following things:
it captures only the time range components like in 7am-10am there should be only 2 match groups 7am, 10am.
it matches only proper hours for e.g. 111am or 13pm etc should be considered a no-match.
I don't know if it's possible with a regex, but can we make the regex match only correct time ranges? For example, 7am-1pm should match, but 4pm-1pm should be considered a no-match.
Note: I am using Ruby 2.2.1
Thanks.
First, let's see what you did wrong:
13am-10pm (this should not be matched, however it matches and creates 4 match groups 3, am, 10, pm)
it matches only proper hours for e.g. 111am or 13pm etc should be considered a no-match.
This matches because your pattern is not anchored: the engine can start at the 3 and match it as a single digit with [1-9] in (1[012]|[1-9]).
To fix this, the hour must be either a single [1-9] digit or a 1 followed by [0-2], and it must begin where a word begins. Since we do not know where in the text the time starts, we'll use a word boundary to be sure we are at a "word start".
Since you do not want to capture the hour on its own but the whole time including am|pm, you can put the hour alternatives in a non-capturing group and capture the whole time:
\b((?:1[0-2]|[1-9])[ap]m)
Then it's simply a matter of repeating ourselves and adding a dash:
\b((?:1[0-2]|[1-9])[ap]m)-((?:1[0-2]|[1-9])[ap]m)
Regarding point 3: yes, you could do this with a regex, but you are better off simply adding a logical check once you have groups 1 and 2 to see whether the time range really makes sense.
All in all, this is what you get:
# \b((?:1[0-2]|[1-9])[ap]m)-((?:1[0-2]|[1-9])[ap]m)
#
#
# Assert position at a word boundary «\b»
# Match the regular expression below and capture its match into backreference number 1 «((?:1[0-2]|[1-9])[ap]m)»
# Match the regular expression below «(?:1[0-2]|[1-9])»
# Match either the regular expression below (attempting the next alternative only if this one fails) «1[0-2]»
# Match the character “1” literally «1»
# Match a single character in the range between “0” and “2” «[0-2]»
# Or match regular expression number 2 below (the entire group fails if this one fails to match) «[1-9]»
# Match a single character in the range between “1” and “9” «[1-9]»
# Match a single character present in the list “ap” «[ap]»
# Match the character “m” literally «m»
# Match the character “-” literally «-»
# Match the regular expression below and capture its match into backreference number 2 «((?:1[0-2]|[1-9])[ap]m)»
# Match the regular expression below «(?:1[0-2]|[1-9])»
# Match either the regular expression below (attempting the next alternative only if this one fails) «1[0-2]»
# Match the character “1” literally «1»
# Match a single character in the range between “0” and “2” «[0-2]»
# Or match regular expression number 2 below (the entire group fails if this one fails to match) «[1-9]»
# Match a single character in the range between “1” and “9” «[1-9]»
# Match a single character present in the list “ap” «[ap]»
# Match the character “m” literally «m»
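The logical check suggested for point 3 could be sketched as follows. The question targets Ruby; Python is used here only for illustration. The helpers `to_hour24` and `parse_range` are hypothetical, and the sketch assumes a range is valid only when it ends later on the same day (no overnight ranges).

```python
import re

# Regex from this answer: two captured times separated by a dash.
RANGE = re.compile(r"\b((?:1[0-2]|[1-9])[ap]m)-((?:1[0-2]|[1-9])[ap]m)")


def to_hour24(label):
    """Convert a label such as '7am' or '12pm' to an hour in the range 0..23."""
    hour, meridiem = int(label[:-2]), label[-2:]
    hour %= 12  # 12am becomes 0; 12pm becomes 12 after the offset below
    return hour + 12 if meridiem == "pm" else hour


def parse_range(text):
    """Return (start, end) if text holds a sensible time range, else None."""
    match = RANGE.search(text)
    if match is None:
        return None
    start, end = match.group(1), match.group(2)
    # Assumption: the end must be later than the start on the same day.
    if to_hour24(start) >= to_hour24(end):
        return None
    return start, end


for text in ("7am-10pm", "7am-1pm", "4pm-1pm", "13am-10pm", "111am-10pm"):
    print(text, parse_range(text))
```

Keeping the ordering check outside the regex leaves the pattern short and makes the rule easy to change.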
You are missing ^ (start of the line) in your regex, and that's why it matches in the middle of the input.
You have to use:
^(1[012]|[1-9])(am|pm)\-(1[012]|[1-9])(am|pm)
Better solution: You can also use \b (boundary) if your pattern doesn't always start from new line.
\b(1[012]|[1-9])(am|pm)\-(1[012]|[1-9])(am|pm)\b

I need a regex to validate a name that can be 1, 2, or 3 words

In this example I try to validate for a city name. It works if I enter San Louis Obispo but not if I enter Boulder Creek or Boulder. I thought ? was supposed to make a block optional.
if (!/^[a-zA-Z'-]+\s[a-zA-Z'-]*\s([a-zA-Z']*)?$/.test(field)){
return "Enter City only a-z A-Z .\' allowed and not over 20 characters.\n";
}
I think the spaces (\s) are the problem. You made the second and third words optional (by using * instead of +), but not the spaces. The question mark is only applied to the third word because of the parentheses.
The issue with your regex is that, in English, it says to match a word that is required to be followed by a space, optionally followed by another word, then required to have another space, and then optionally another word. So a single word would not match, but a word followed by two spaces would. Additionally, two words with a space at the end would match, but neither case would match without the trailing spaces.
To fix your exact regex, wrap everything from the second word to the end in another group (a non-capturing one, written with (?: rather than a bare opening parenthesis) and make this group optional with ?. Also, move each \s inside its optional group.
Try this:
^[a-zA-Z'-]+(?:\s[a-zA-Z'-]+(?:\s[a-zA-Z']+)?)?$
Regex explained:
^ # beginning of line
[a-zA-Z'-]+ # first matching word
(?: # start of second-matching word
\s[a-zA-Z'-]+ # space followed by matching word
(?: # start of third-matching word
\s[a-zA-Z']+ # space followed by matching word
)? # third-matching word is optional
)? # second-matching word is optional
$ # end of line
Alternatively, you can try the following regex:
^([a-zA-Z'-]+(?:\s[a-zA-Z'-]+){0,2})$
This will match 1 through 3 words, or "cities", in a given line with the ability to adjust the range of words without having to further-duplicate the matching set for each new word.
Regex explained:
^( # start of line & capturing group
[a-zA-Z'-]+ # required first matching word
(?: # start a non-capturing group (required to "match", but not returned as an individual group)
\s # sub-group required to start with a space
[a-zA-Z'-]+ # sub-group matching word
){0,2} # sub-group can match 0 -> 2 times
)$ # end of capturing group & line
So, if you want to add the ability to match more than 3 words, you can change the 2 in the {0,2} range above to be the number of words you want to match minus 1 (i.e. if you want to match 4 words, you'll set it to {0,3}).
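The adjustable word count can be sketched like this. The question uses JavaScript; Python is used here only for illustration, and the helper `city_pattern` with its `max_words` parameter is hypothetical.

```python
import re


def city_pattern(max_words=3):
    """Build the 1..max_words pattern by setting the upper bound to max_words - 1."""
    return re.compile(rf"^[a-zA-Z'-]+(?:\s[a-zA-Z'-]+){{0,{max_words - 1}}}$")


three_words = city_pattern()   # same shape as the {0,2} regex above
four_words = city_pattern(4)   # {0,3}

for name in ("San Louis Obispo", "Boulder Creek", "Boulder", "One Two Three Four"):
    print(name, bool(three_words.match(name)), bool(four_words.match(name)))
```

Only the repetition bound changes between the two patterns; the character class for each word stays the same.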