Combining two Regular expressions into a single one using Vb.net - regex

I have two regular expressions, shown below.
Regex 1.
^\d{9}_[a-zA-Z]{1}_(0[1-9]|1[0-2]).(0[1-9]|[1-2][0-9]|3[0-1]).[0-9]{4}_[0-9]{3}_[0-9a-zA-Z]{2}(?:_[0-9a-zA-Z]*)?
I use this to validate strings such as
999999999_A_12.10.2015_010_2q_somedescription
If any part of the pattern fails, for example
999999999_12.10.2015_010_2q_somedescription
I need to report that the second part is missing. For this I use regex 2.
Regex 2.
^\d{9}_^[a-zA-Z]$_(0[1-9]|1[0-2]).(0[1-9]|[1-2][0-9]|3[0-1]).[0-9]{4}_^[0-9]{3}$_[0-9a-zA-Z]{2}$_[0-9a-zA-Z]*
I tried splitting regex 1 and the string into groups and comparing them. I am using the Regex.Match method in VB.NET, but even if my string contains
999999999_AB_12.10.2015_010_2q_somedescription
it reports success, so I wrote regex 2 for an exact match. Splitting regex 2 and the string and comparing the parts with Regex.Match works, but I don't want to maintain two regular expressions. I need to combine them into a single one.
Considered match:
999999999_A_12.10.2015_010_2q_somedescription
If anything is missing from the above string, like
999999999_12.10.2015_010_2q_somedescription
or if anything differs from the above format, like
999999999_AB_12.10.2015_010_2q_somedescription
it is considered a mismatch. I need to find which part is missing or wrong and notify the user.
Mismatch:
999999999_12.10.2015_010_2q_somedescription
999999999_AB_12.10.2015_010_2q_somedescription
999999999_AB_12.10.20_010_2q_somedescription
999999999_AB_12.10.2015_01_2q_somedescription
999999999_AB_12.10.2015_010_2_somedescription
9999_AB_12.10.2015_010_2q_somedescription

You should use a named group to capture the value of the part that can change. For example:
\d{9}_(?<X>[a-zA-Z]{1})_(0[1-9]|1[0-2]).(0[1-9]|[1-2][0-9]|3[0-1]).[0-9]{4}_[0-9]{3}_[0-9a-zA-Z]{2}(?:_[0-9a-zA-Z]*)?
Now in VB.NET you can check the value of the capture group X in your match. You can then use If or Select Case to do whatever you want.
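As a rough illustration of the named-group idea, here is a minimal Python sketch (Python spells named groups `(?P<name>...)`, while .NET accepts `(?<name>...)`). The group names `id`, `code`, `date`, `seq`, `tag` and `desc` are made up for this example. The dots are escaped so they match only a literal "." and a trailing `$` is added; both are assumptions, not part of the pattern above.

```python
import re

# Hypothetical group names; dots are escaped so they only match a literal ".".
PATTERN = re.compile(
    r"^(?P<id>\d{9})_"
    r"(?P<code>[a-zA-Z])_"
    r"(?P<date>(?:0[1-9]|1[0-2])\.(?:0[1-9]|[12][0-9]|3[01])\.[0-9]{4})_"
    r"(?P<seq>[0-9]{3})_"
    r"(?P<tag>[0-9a-zA-Z]{2})"
    r"(?:_(?P<desc>[0-9a-zA-Z]*))?$"
)


def parse(name):
    """Return a dict of the named parts, or None when the string does not match."""
    match = PATTERN.match(name)
    return match.groupdict() if match else None


parts = parse("999999999_A_12.10.2015_010_2q_somedescription")
print(parts["code"] if parts else "no match")
```

A strict pattern like this only says whether the whole string matched. To report which part is wrong, each part still has to be checked separately.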

Related

Trying to extract repeating pattern from string in php/javascript

The following is in PHP but the regex will also be used in javascript.
I'm trying to extract repeating patterns from a string.
The string can be any of the following:
"something arbitrary"
"D123"
"D111|something"
"D197|what.org|when.net"
"D297|who.197d234.whatever|when.net|some other arbitrary string"
I'm currently using the following regex: /^D([0-9]{3})(?:\|([^\|]+))*/
This correctly does not match the first string, and matches the second and third correctly. The problem is that for the fourth and fifth it only captures the Dxxx and the last string. I need each of the strings between the '|' characters to be captured.
I'm hoping to use a regex as it makes it a single step. I realize I could just detect the leading Dxxx then use explode or split as appropriate to break the strings out. I've just gotten stuck on wanting a single regular expression match step.
This same regex may be used in Python as well, so I just want a generic regex solution.
There is no way to have a dynamic number of capture groups in a regular expression, but if you know some upper limit to how many parts you would have in one string, you can just repeat the pattern that many times:
/^D([0-9]{3})(?:$|\|)(.*?)(?:$|\|)(.*?)(?:$|\|)(.*?)(?:$|\|)(.*?)(?:$|\|)/
So after the initial ^D([0-9]{3})(?:$|\|) you just repeat (.*?)(?:$|\|) as many times as you need it.
When the string has fewer elements, those remaining capture groups will match the empty string.
See regex tester.
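A minimal Python sketch of the repeated-slot idea, assuming an upper limit of four parts after the Dxxx prefix (the limit and the helper name `extract` are illustrative only):

```python
import re

# Prefix plus four optional slots; the slot count (4) is an assumed upper limit.
PATTERN = re.compile(r"^D([0-9]{3})(?:$|\|)" + r"(.*?)(?:$|\|)" * 4)


def extract(text):
    """Return (code, [parts]) or None; empty slots are dropped."""
    match = PATTERN.match(text)
    if match is None:
        return None
    code, *slots = match.groups()
    return code, [slot for slot in slots if slot]


print(extract("D197|what.org|when.net"))
print(extract("something arbitrary"))
```

Unused slots capture the empty string, so the helper filters them out before returning.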
Is something like preg_match_all() (the PHP variant of a global match) also acceptable for you?
Then you could use:
^(?|D([0-9]{3})|^.+$|(?!^)\|([^|\n]*)(?=\||$))
This will match everything in a string in different matches, e.g. take your string:
D197|what.org|when.net
It will then give you three matches:
D197
what.org
when.net
Running live: https://regex101.com/r/jL2oX6/4 (Everything in green are your group matches. Ignore what's in blue.)
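Branch reset groups, written `(?|...)`, are a PCRE feature and are not available in Python's built-in `re` module. If the pattern also has to work in Python, the two-step approach mentioned in the question may be simpler. A minimal sketch, with `split_parts` as a hypothetical helper name:

```python
import re

# Match the Dxxx prefix followed by either a "|" or the end of the string.
PREFIX = re.compile(r"^D([0-9]{3})(?:\||$)")


def split_parts(text):
    """Return [code, part, ...] for a Dxxx string, or [] if the prefix is absent."""
    match = PREFIX.match(text)
    if match is None:
        return []
    rest = text[match.end():]
    return [match.group(1)] + (rest.split("|") if rest else [])


print(split_parts("D197|what.org|when.net"))
```

This gives up the single-regex goal but handles any number of parts.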

A regular expression that replaces a group with hard coded text

First of all, I'm not sure if this is something you can even do in regular expressions. If you can, I have no idea how to search for how to do it.
Let's say I have text:
Click this link for more information.
And a regular expression:
<a[^>]*>([^<]*)</a>
The application of the regular expression would yield this for group 1:
this link
Let's say I wanted to write the regular expression to instead return hard coded text for group 1
<a[^>]*>(${{replacement text}}[^<]*)</a>
(this is made up syntax by the way)
So that the application of the regular expression to the text would yield this for group 1:
replacement text
Is this possible?
Here's another example just to solidify my objective:
Examples of text:
serverNode1/appPortal
serverNode1/appPortal2
serverNode1/appPortal3
My regular expression
appPortal((?:?{{"1"}}\b)|(?:\d))
(using the same made up syntax)
The expected output for the first character group should be
1
2
3
(The point of the expression is to match the word break and replace it with "1" or otherwise use the digit character class to match a digit. The sub-groups are made optional with the ?: so the outside group is still group 1).
What is the point of this you may ask? I am using Splunk to do field extractions, and I'd like for the field to be extracted as 1, 2, or 3, like in my above example, and I can only rely on the regular expression groups to give me the fields (as in, I don't have anywhere to put code to say if group 1 == "" then change to "1").
Basically, with regular expressions as they are defined, this is not possible. By definition, regular expressions match patterns in the text. To be clear, a regex engine returns matches that are always part of the original string, nothing more. Some regex extensions allow you to name a capturing group, but that does not transform the match.
The behaviour you describe can easily be achieved by post-processing the regex match in any programming language, but it can also be achieved by combining regex substitution and parsing.
For example, s/appPortal(?!\d)/appPortal1/ will replace an "appPortal" that has no digit after it with "appPortal1", and then you can apply another regex to build the match you want.
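The substitute-then-match idea can be sketched in Python as follows. This only illustrates the two regex steps and is not Splunk configuration; `portal_number` is a hypothetical helper name.

```python
import re


def portal_number(text):
    """Normalise a bare 'appPortal' to 'appPortal1', then capture the digit."""
    normalised = re.sub(r"appPortal(?!\d)", "appPortal1", text)
    match = re.search(r"appPortal(\d)", normalised)
    return match.group(1) if match else None


for line in ("serverNode1/appPortal", "serverNode1/appPortal2", "serverNode1/appPortal3"):
    print(portal_number(line))
```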

Find Regex mismatch part in a string using vb.net

I have a regular expression
^\d{9}_[a-zA-Z]{1}_(0[1-9]|1[0-2]).(0[1-9]|[1-2][0-9]|3[0-1]).[0-9]{4}_\d*_[0-9a-zA-Z]*_[0-9a-zA-Z]*
and a string that matches it
000066874_A_12.31.2014_001_2Q_ICAN14
If the user mistakenly enters a string in a different format, like
000066874_12.31.14_001_2Q_ICAN14
I need to find out which part of my regex failed. I tried Regex.Matches and Regex.Match, but with these I couldn't find which part of my string mismatched the regular expression. I am using VB.NET.
This is very complicated to do with regex. I managed to make this regex, but you still have to check the capture groups afterwards:
^(?:(?:(\d{9})|.*?)_)?(?:(?:([a-zA-Z]{1})|.*?)_)?(?:(?:((?:0[1-9]|1[0-2]).(?:0[1-9]|[1-2][0-9]|3[0-1]).[0-9]{4})|.*?)_)?(?:(?:(\d*)|.*?)_)?(?:(?:([0-9a-zA-Z]*)|.*?)_)?(?:([0-9a-zA-Z]*)|.*?)$
It works as seen in this demo: https://regex101.com/r/aJ1wG1/2
Each part before an underscore is a capture group; if a capture group is not present, there is an error in that part. As you can see in the demo, $3 is not present in the first example, hence there is a mistake in the date. In the second example, $2 is not present, hence the groups from $2 onward are not there. The third example is correct and all 6 capture groups are there.
When regexes get this massive, it's a sign that probably a different method should be used to solve the problem, but this might work for you with some additional code for group result checks.
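One possible "different method" is to split the string on underscores and validate each field on its own. Below is a minimal sketch in Python rather than VB.NET. The field labels are made up, the dots in the date are escaped, and the last three fields are required to be non-empty; all of these are assumptions that go beyond the original pattern.

```python
import re

# (label, pattern) per underscore-separated field; labels are hypothetical.
FIELDS = [
    ("id", r"\d{9}"),
    ("code", r"[a-zA-Z]"),
    ("date", r"(?:0[1-9]|1[0-2])\.(?:0[1-9]|[12][0-9]|3[01])\.[0-9]{4}"),
    ("sequence", r"\d+"),
    ("quarter", r"[0-9a-zA-Z]+"),
    ("suffix", r"[0-9a-zA-Z]+"),
]


def first_problem(text):
    """Return the label of the first failing field, 'extra' for surplus fields, else None."""
    values = text.split("_")
    for index, (label, pattern) in enumerate(FIELDS):
        if index >= len(values) or not re.fullmatch(pattern, values[index]):
            return label
    return "extra" if len(values) > len(FIELDS) else None


print(first_problem("000066874_12.31.14_001_2Q_ICAN14"))
```

Because fields are compared by position, a missing field is reported at the first position where the shifted value no longer fits.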

Regex match sequence more than once

How come for something that simple I can't find an answer after looking one hour in the internet?
I have this sentence:
HeLLo woRLd HOw are YoU
I want to capture all groups that consist of two consecutive capital letters
[A-Z]{2}
The regex above works but captures only LL (the first two capital letters), while I want LL in one group and RL and HO in the other groups.
Most regular expression engines expose some way to make your expression global. This means that your expression will be applied multiple times. This global flag is usually denoted with the /g marker at the end of your expression. This is your regular expression without the /g flag, while this is what happens when you apply said flag.
Different languages expose such functionality differently. In C#, for instance, this is done through the Regex.Matches method. In Java, you use while(matcher.find()), which keeps providing substrings that match the pattern provided.
EDIT: I am not a Python person, but judging from the example available here, you could do something like so:
import re

it = re.finditer(r"[A-Z]{2}", "HeLLo woRLd HOw are YoU")
for match in it:
    print("'{g}' was found between the indices {s}".format(g=match.group(), s=match.span()))
You cannot have multiple groups in this case, but you can have multiple matches. Add the global flag to your regex and use a method that applies the regex repeatedly.
For JavaScript, it would be /[A-Z]{2}/g.
The method most probably returns an array of matches, and you can use an index to access them.
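In Python, the equivalent of a global match that returns a list is `re.findall`. A minimal sketch of indexing into the result:

```python
import re

# findall returns every non-overlapping match as a list of strings.
matches = re.findall(r"[A-Z]{2}", "HeLLo woRLd HOw are YoU")
print(matches)
# Individual matches are reachable by index.
print(matches[1])
```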

Regular Expression to extract multiple parts when some string parts are absent

I am trying to create a regular expression that will capture several sections of a string. This is the expression I have created:
([0-9]{6}[-*][0-9xX]{7}).*([0-9]{1,3}-[0-9]{1,3}-[0-9]{1,3}).*([FPTSUCD])=?([01][*-])
The string that this runs against can appear in two different styles:
# 141803-6310114 #3-0-2 T0-jL
Or
]#0-7-4 C1-vU
When I use the first string I get all the parts I need.
141803-6310114
3-0-2
T
0-
When I use the second string I get no matches. This second string is basically the same as the first but without this part “141803-6310114”. I would like the expression to work with both strings but for the number sequence to be optional. Can anyone advise on what the expression should look like to do this?
This will get you the parts in both cases:
(?:(\d{6}[-*][\dxX]{7}))?[^\d]*(\d{1,3}-\d{1,3}-\d{1,3}) ([FPTSUCD])=?([01][*-])
I made the first group optional (?) and changed the "eat all" between the first two groups to an "eat all non-digits", plus some other clean-up to make it more readable (at least to me ;)).
Regards
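A quick way to check the optional-group pattern against both sample strings is a small Python sketch using `re.search`. The pattern is copied from the answer above; the loop around it is illustrative only.

```python
import re

PATTERN = re.compile(
    r"(?:(\d{6}[-*][\dxX]{7}))?[^\d]*(\d{1,3}-\d{1,3}-\d{1,3}) ([FPTSUCD])=?([01][*-])"
)

for line in ("# 141803-6310114 #3-0-2 T0-jL", "]#0-7-4 C1-vU"):
    match = PATTERN.search(line)
    # Group 1 is None when the leading number sequence is absent.
    print(match.groups() if match else None)
```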