Regex Get Number In Between Word & Digits - regex

I have the following string:
[Tag|String|WORD012311120151218]
How can I get 0123111 from WORD012311120151218 - the problem I'm having is that 0123111 is dynamic and can come through the system as any length.
My thought process is as follows but I can't seem to get it right after searching: extract any value from after WORD and ends before the date 20151218 - the date changes daily but it's always going to be 8 digits.
Is this possible?

You can use this regex:
.*WORD(\d+)\d{8}\].*
And get the group $1.
Explanation in this link.
In Java, it would be:
String input = "[Tag|String|WORD012311120151218]";
String num = input.replaceAll(".*WORD(\\d+)\\d{8}\\].*", "$1");
System.out.println(num); //prints 0123111

/\[.*WORD([0-9]+)[0-9]{8,8}\]$/
should do it, with suitable escapes depending on which regex flavor you're using. ([0-9]+) is what you want, and [0-9]{8,8} is 8 trailing digits for the date.

'WORD012311120151218'.match(/(\d+)(\d{8}$)/)[1]

Related

Is it possible to negate a group in a regular expression?

Let's say that we have this text:
2020-09-29
2020-09-30
2020-10-01
2020-10-02
2020-10-12
2020-10-16
2020-11-12
2020-11-23
2020-11-15
2020-12-01
2020-12-11
2020-12-30
I want to do something like this:
\d\d\d\d-(NOT10)-(30)
So i want to get all dates of any year, but not of the 10th month and it is important, that the day is 30.
I tried a lot to do this using negative lookahead asserations but i did not come up with any working regexes.
You can use negative lookaheads:
\d\d\d\d-(?!10)\d\d-30
The Part (?!10) ensures that no 10 follows at the point where it is inserted into the regex. Notice that you still need to match the following digits afterwards, thus the \d\d part.
Generally speaking you can not (to my knowledge) negate a part that then also matches parts of the string. But with negative lookaheads you can simulate this as I did above. The generalized idea looks something like:
(?!<special-exclusion-pattern>)<general-inclusion-pattern>
Where the special-exclusion-pattern matches a subset of the general-inclusion-pattern. In the above case the general inclusion pattern is \d\d and the special exclusion pattern ins 10.
Try :
/20\d{2}-(?:0[1-9]|1[12])-30/
Explanation :
20\d{2} it will match 20XX
(?:0[1-9]|1[12]) it will match 0X or 11, 12
30 it will match 30
Demo :https://regex101.com/r/O2F1eV/1
It's easiest to simply convert the substring (if present) that matches /^\d{4}-10-30$/ to an empty string, then split the resulting string on one or more newlines.
If your string were
2020-10-16
2020-10-30
2020-11-12
2020-11-23
and was held by the variable str, then in Ruby, for example,
str.sub(/^\d{4}-10-30$/,'')
#=> "2020-10-16\n\n2020-11-12\n2020-11-23\n"
so
str.sub(/^\d{4}-10-30$/,'').split
#=> ["2020-10-16", "2020-11-12", "2020-11-23"]
Whatever language you are using undoubtedly has similar methods.

Tricky regex validation

I need to validate string with 2 groups which are separated with one space with next rules:
Each group needs to be at least 2 character long but less or equal to 15
Both groups together can't be more than 20 chars long (not counting space)
Groups can only contain letters (that's simple, it's [a-zA-Z])
Following these rules, here are some examples
Firstname Lastname (Valid)
Somename T (Invalid, 2nd one is <2)
Somethingsomettt Here (Invalid, first one is > 15)
Somethingsome Somethingsome (Invalid, total > 20)
It'd be simple [a-zA-Z]{2,15} [a-zA-Z]{2,15} if it wasn't for that 2+2<=total<=20 condition.
Is it even possible to limit it this way? If it is - how?
UPDATE
Just for the sake of it, resulting regex was supposed to be ^(?=[a-zA-Z ]{5,21}$)[a-zA-z]{2,15} [a-zA-Z]{2,15}$, #vks was closest one to it. Nevertheless, thanks #popovitsj and #Avinash Raj too.
^(?=.{5,21}$)[a-zA-Z]{2,15} [a-zA-Z]{2,15}$
Try this.See demo.
http://regex101.com/r/nA6hN9/30
This can be done with lookahead. Something like this:
^(?=.{1,20}$)[a-zA-z]{2,14} [a-zA-Z]{2,14}$
You could try the below regex which uses negative lookahead,
(?!^.{22,})^[a-zA-Z]{2,15} [a-zA-Z]{2,15}$
DEMO

How to match a one of a set of numbers?

I am trying to match a group of numbers in regex that consist of one of the following:
1,2,3,4,5,6,7,8,9,10,11
But I am having trouble figuring out the regex.
For single digits this pattern worked fine "0|1|2|3|4|5|6|7|8|9" but it fails on double digit numbers. For example 12 passes as ok due to the regex finding the 1 in 12.
You can use begin and end anchors to force the whole string to be matched:
^(0|1|2|3|4|5|6|7|8|9|10|11)$
Which can be shortened to:
^(\d|10|11)$
This will work if you want to check if just one number is between 0 and 11.
^[0-9]$|^1?[0-1]$
If you want to match a string like:
1,2,3,12,32,5,1,6,8, 11
and match 0-11 then you can use the following:
(?<=,|^)([0-9]|1?[0-1])(?=,|$)
use this regex ^(0|1|2|3|4|5|6|7|8|9|(10)|(11))$

perl regular expression digit match

Doing the below regex match to verify whether date is in the YYYY_MM_DD Format. But the regular expression gives an error message if i have a value of 2012_07_7. Date part and month should be exactly 2 digits according to the regex pattern. Not sure why it's not working.
if ($cmdParams{RunId} !~ m/^\d{4}_\d{2}_\d{2}$/)
{
print "Not a valid date in the format YYYY_MM_DD";
}
Your regex specifies exactly 2 digits for the day component, if you want to allow either 1 or 2 digits you should use {1,2} rather than {2}
Well if you look at your data that you have: 2012_07_7 you can see that the day-part is not of two digits.
Obviously. Your pattern dictates that the last numeric chunk should be of two digits, whereas you are providing 1. So if you want your pattern to match this text, try something like:
if ($cmdParams{RunId} !~ m/^\d{4}_\d{2}_\d\d?$/)
My solution: ^\d{4}_(?:1[0-2]|0?[1-9])_(?:3[01]|[1-2]\d|0?[1-9])$
this pattern match: 2000_12_01 or 2001_1_1 or 2001_02_1

What is wrong with this Regular Expression?

I am beginner and have some problems with regexp.
Input text is : something idUser=123654; nick="Tom" something
I need extract value of idUser -> 123456
I try this:
//idUser is already 8 digits number
MatchCollection matchsID = Regex.Matches(pk.html, #"\bidUser=(\w{8})\b");
Text = matchsID[1].Value;
but on output i get idUser=123654, I need only number
The second problem is with nick="Tom", how can I get only text Tom from this expresion.
you don't show your output code, where you get the group from your match collection.
Hint: you will need group 1 and not group 0 if you want to have only what is in the parentheses.
.*?idUser=([0-9]+).*?
That regex should work for you :o)
Here's a pattern that should work:
\bidUser=(\d{3,8})\b|\bnick="(\w+)"
Given the input string:
something idUser=123654; nick="Tom" something
This yields 2 matches (as seen on rubular.com):
First match is User=123654, group 1 captures 123654
Second match is nick="Tom", group 2 captures Tom
Some variations:
In .NET regex, you can also use named groups for better readability.
If nick always appears after idUser, you can match the two at once instead of using alternation as above.
I've used {3,8} repetition to show how to match at least 3 and at most 8 digits.
API links
Match.Groups property
This is how you get what individual groups captured in a match
Use look-around
(?<=idUser=)\d{1,8}(?=(;|$))
To fix length of digits to 6, use (?<=idUser=)\d{6}(?=($|;))