Find Regex mismatch part in a string using vb.net - regex

I have a regular expression:
^\d{9}_[a-zA-Z]{1}_(0[1-9]|1[0-2]).(0[1-9]|[1-2][0-9]|3[0-1]).[0-9]{4}_\d*_[0-9a-zA-Z]*_[0-9a-zA-Z]*
and a string that matches it:
000066874_A_12.31.2014_001_2Q_ICAN14
If a user mistakenly enters a string in a different format, such as
000066874_12.31.14_001_2Q_ICAN14
I need to find out which part of my regex failed. I tried Regex.Matches and Regex.Match, but with them I couldn't find which part of my string failed to match the regex. I am using vb.net.

This is very complicated to do with regex. I managed to build the regex below, but you still have to check the capture groups after matching.
^(?:(?:(\d{9})|.*?)_)?(?:(?:([a-zA-Z]{1})|.*?)_)?(?:(?:((?:0[1-9]|1[0-2]).(?:0[1-9]|[1-2][0-9]|3[0-1]).[0-9]{4})|.*?)_)?(?:(?:(\d*)|.*?)_)?(?:(?:([0-9a-zA-Z]*)|.*?)_)?(?:([0-9a-zA-Z]*)|.*?)$
See the demo: https://regex101.com/r/aJ1wG1/2
Each part before an underscore is a capture group; if a capture group is not present in the match, that part contains an error. As you can see in the demo, $3 is not present in the 1st example, hence there is a mistake in the date. In the second example, $2 is not present, and neither are the groups after it. The 3rd example is correct and all 6 capture groups are there.
When regexes get this massive, it's probably a sign that a different method should be used to solve the problem, but this might work for you with some additional code to check the group results.
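One such different method is to validate each underscore-separated field on its own and report the first one that fails. The following is a minimal sketch in Python rather than VB.NET, assuming the fields are always separated by underscores. The field names and the first_mismatch helper are hypothetical; the sub-patterns are copied from the question's regex, with the date separators escaped as literal dots.

```python
import re

# Hypothetical field names paired with the sub-patterns from the question's regex.
# Assumption: the date separators are meant to be literal dots, so they are escaped.
FIELDS = [
    ("id", r"\d{9}"),
    ("letter", r"[a-zA-Z]"),
    ("date", r"(0[1-9]|1[0-2])\.(0[1-9]|[12][0-9]|3[01])\.[0-9]{4}"),
    ("sequence", r"\d*"),
    ("code1", r"[0-9a-zA-Z]*"),
    ("code2", r"[0-9a-zA-Z]*"),
]

def first_mismatch(text):
    """Return a message describing the first field that fails, or None if all pass."""
    parts = text.split("_")
    for index, (name, pattern) in enumerate(FIELDS):
        if index >= len(parts):
            return f"field {index + 1} ({name}) is missing"
        if re.fullmatch(pattern, parts[index]) is None:
            return f"field {index + 1} ({name}) is invalid: {parts[index]!r}"
    # Like the question's regex (which has no $ anchor), extra trailing parts are ignored.
    return None

print(first_mismatch("000066874_A_12.31.2014_001_2Q_ICAN14"))  # None
print(first_mismatch("000066874_12.31.14_001_2Q_ICAN14"))      # field 2 (letter) is invalid: '12.31.14'
```

Because a missing field shifts every later field one position to the left, the report points at the first position where the input stops following the format, not necessarily at every mistake.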

Related

Regex for value.contains() in Google Refine

I have a column of strings, and I want to use a regex to find commas or pipes in every cell and then perform an action. I tried this, but it doesn't work (no syntax error, it just matches neither commas nor pipes).
if(value.contains(/(,|\|)/), ...
The funny thing is that the same regex works with the same data in SublimeText. (Yes, I can work it there and then reimport, but I would like to understand what's the difference or what is my mistake).
I'm using Google Refine 2.5.
Since value.match should return captured texts, you need to define a regex with a capture group and check if the result is not null.
Also, pay attention to the regex itself: the string should be matched in its entirety:
Attempts to match the string s in its entirety against the regex pattern p and returns an array of capture groups.
So, add .* before and after the pattern you are looking for inside a larger string:
if(value.match(/.*([,|]).*/) != null)
You can use a combination of if and isNonBlank like:
if(isNonBlank(value.match(/your regex/)), ...
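The whole-string behaviour described above can be illustrated outside Refine. This is only an analogy in Python, not GREL: re.fullmatch stands in for a match that must cover the entire string, and re.search for a "contains"-style check. The sample value is made up.

```python
import re

value = "apples,pears|plums"  # made-up cell content

# A whole-string match fails when the pattern covers only the separator.
whole_without_padding = re.fullmatch(r"([,|])", value)

# Padding the pattern with .* on both sides lets the whole string match;
# group 1 then holds one of the separators.
whole_with_padding = re.fullmatch(r".*([,|]).*", value)

# A search-style match finds the separator anywhere, with no padding needed.
found_anywhere = re.search(r"[,|]", value)

print(whole_without_padding)         # None
print(whole_with_padding.group(1))   # |
print(found_anywhere.group(0))       # ,
```

The padded pattern captures the last separator because the leading .* is greedy, which does not matter when the result is only tested for null.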

Improving a regex

I am looking for alternate methods to get john from the provided example.
My expression works as is but was hoping for some examples of better methods.
Example: john&home
my regexp: [a-z]{3,6}[^&home]
I'm matching any characters of length 3-6 up to but not including &home.
Every item I run the regexp on is in the same format: 3-6 characters followed by &home.
I have looked at other posts but was hoping for a reply specific to my regexp.
Most regex engines allow you to capture parts of a regex with capture groups. For instance:
^([A-Za-z]{3,6})&home$
The parentheses here mean that you are interested in the part before the &home. The ^ and $ mean that you want to match the entire string. Without them, averylongname&homeofsomeone will be matched as well.
Since you use rubular, I assume you use the Ruby regex engine. In that case you can for instance use:
full = "john&home"
name = full.match(/^([A-Za-z]{3,6})&home$/)[1]
And name will in this case contain john.
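The same capture-group idea can be sketched in Python, together with a look at why the pattern from the question is fragile. The extract_name helper and the mike&home input are hypothetical and only serve as illustration.

```python
import re

def extract_name(text):
    """Return the 3-6 letter name before '&home', or None if the format differs."""
    match = re.fullmatch(r"([A-Za-z]{3,6})&home", text)
    return match.group(1) if match else None

# The question's pattern, kept for comparison. [^&home] is a character class:
# it matches ONE character that is not &, h, o, m, or e. It does not mean
# "not followed by &home".
ORIGINAL = re.compile(r"[a-z]{3,6}[^&home]")

print(extract_name("john&home"))                    # john
print(extract_name("averylongname&homeofsomeone"))  # None
print(ORIGINAL.search("john&home").group(0))        # john (via backtracking: "joh" + "n")
print(ORIGINAL.search("mike&home"))                 # None, because "e" is excluded by the class
```

The original pattern finds john only because the engine backtracks by one letter and n happens to be outside the excluded set, so names ending in h, o, m, or e would behave differently.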

Is there any upper limit for number of groups used or the length of the regex in Notepad++?

I am new to using regex. I am trying to use the regex find and replace option in Notepad++.
I have used the following regex:
((?:)|(\+)|(-))(\d)((?:)|(\+)|(-))(/)((?:)|(\+)|(-))(\d)((?:)|(\+)|(-))
For the following text:
2/2
+2/+2
-2/-2
2+/2+
2-/2-
But I get full matches only for the first three. For the last two, I get only partial matches that exclude the trailing "+" and "-". I am wondering whether there is an upper limit on the number of groups that can be used (which I doubt) or on the maximum length of the regex. I am not sure why my regex is failing. If there is anything wrong with my regex, please correct it.
This is not an issue with Notepad++'s regex engine. The problem is that when you have alternations like (?:)|(\+)|(-), the regex engine will attempt to match the different options in the order they are specified. Since you specified an empty group first, it will attempt to match an empty string first, only matching the + or - if it needs to backtrack. This essentially makes the alternation lazy—it will never match any character unless it has to.
vks's answer works perfectly well, but just in case you actually needed those capturing groups separated out, you can do the same thing just by rewriting your alternations like this:
((\+)|(-)|(?:))(\d)((\+)|(-)|(?:))(/)((\+)|(-)|(?:))(\d)((\+)|(-)|(?:))
or even more simply, like this:
((\+)|(-)|)(\d)((\+)|(-)|)(/)((\+)|(-)|)(\d)((\+)|(-)|)
You can use this simple regex to match all cases:
([-+]?)(\d)([-+]?)(/)([-+]?)(\d)([-+]?)
See the demo here:
https://www.regex101.com/r/fG5pZ8/19
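The effect of alternation order can be reproduced in a short Python sketch. Python's re module is a different engine from the one in Notepad++, but it is also a backtracking engine that tries alternatives left to right, so it is assumed here to behave the same way for these patterns. The matched_text helper is hypothetical.

```python
import re

# Empty alternative first (the question's pattern): the sign is matched only when forced.
EMPTY_FIRST = re.compile(r"((?:)|(\+)|(-))(\d)((?:)|(\+)|(-))(/)((?:)|(\+)|(-))(\d)((?:)|(\+)|(-))")
# Empty alternative last: the sign is tried before the empty string.
EMPTY_LAST = re.compile(r"((\+)|(-)|)(\d)((\+)|(-)|)(/)((\+)|(-)|)(\d)((\+)|(-)|)")
# Optional character class: greedy by default, so the sign is taken when present.
OPTIONAL_CLASS = re.compile(r"([-+]?)(\d)([-+]?)(/)([-+]?)(\d)([-+]?)")

def matched_text(pattern, line):
    """Return the text matched by pattern in line, or None if there is no match."""
    match = pattern.search(line)
    return match.group(0) if match else None

print(matched_text(EMPTY_FIRST, "2+/2+"))     # 2+/2  (trailing + is left out)
print(matched_text(EMPTY_LAST, "2+/2+"))      # 2+/2+
print(matched_text(OPTIONAL_CLASS, "2-/2-"))  # 2-/2-
```

Nothing after the last group forces it to backtrack, which is why only the trailing sign is dropped while the signs in the middle are still matched.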

Issue while formulating regex for the password field

I am trying to formulate a regexp for a password field, which accepts at least one special character and one alpha numeric character.
I have already tried with this regexp ((?=.*\d)(?=.*[a-zA-Z])(?=.*\W)) on Rubular, which I cooked up. But it's not working properly.
Test String : test#123
Kindly suggest a way to overcome this.
If you can please give some explanation as well.
Your regex actually does match your test string. It seems that you want the match to be in your capture group, though, since you wrapped the look-aheads in parentheses.
Wrapping a capture group around your look-aheads won't capture anything, as they only look ahead to verify and do not consume any characters. You'll have to add a capture group that captures the entire string after them, like this:
^(?=.*\d)(?=.*[a-zA-Z])(?=.*\W)(.{6,20})$
The ^ and $ anchor the pattern so that the entire string is checked. The . within the capture group () matches any character, so the group grabs the entire string. The {6,20} says it has to be between 6 and 20 characters long. You can change the numbers if you want.
Rubular
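The difference between the two patterns can be checked with a short Python sketch. The question used Rubular (Ruby), so Python is a stand-in here; the look-ahead syntax is the same in both. The valid_password helper is hypothetical.

```python
import re

# The question's pattern: a capture group wrapped around look-aheads only.
LOOKAHEADS_ONLY = re.compile(r"((?=.*\d)(?=.*[a-zA-Z])(?=.*\W))")
# The answer's pattern: look-aheads verify, then a group consumes the string.
WITH_CAPTURE = re.compile(r"^(?=.*\d)(?=.*[a-zA-Z])(?=.*\W)(.{6,20})$")

def valid_password(text):
    """Return the captured password if it satisfies WITH_CAPTURE, else None."""
    match = WITH_CAPTURE.search(text)
    return match.group(1) if match else None

print(repr(LOOKAHEADS_ONLY.search("test#123").group(1)))  # '' (matches, but captures nothing)
print(valid_password("test#123"))   # test#123
print(valid_password("test1234"))   # None, no special character
print(valid_password("t#1"))        # None, shorter than 6 characters
```

Note that \W treats the underscore as a word character, so a password whose only symbol is _ would be rejected by this pattern.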

capture with if-then-else in php regex

I'm very lost with a regular expression. It's just black magic to me. Here's what i need:
there is a filename: some_file.jpg
it might be in the following format: some_file_p250.jpg
the regex to match the file in simple format: /^([a-zA-Z_-0-9]+).(jpg|jpeg|png)$/
the regex to match the file in advanced format: /^([a-zA-Z_-0-9]+)(_[a-z]?[0-9]{2,3}).(jpg|jpeg|png)$/
My question is as follows: how do I make the "(_[a-z]?[0-9]{3,4})" part optional? I've tried adding a question mark to the second group like this:
/^([a-zA-Z_\-0-9]+)(_[a-z]?[0-9]{3,4})?\.(jpg|jpeg|png)$/
Even though the pattern works, it always captures the contents of the second group in the first group and leaves the second empty.
How can I make this work to capture the filename, the advanced part (_p250) and the extension separately? I'm thinking it has something to do with the greediness of the first group, but I might be completely wrong, and even if I'm right, I still don't know how to solve it.
Thanks for your thoughts
Adding a question mark after the first plus will make the first capturing expression non-greedy. This worked for me using your test case:
/^([a-zA-Z_\-0-9]+?)(_[a-z]?[0-9]{3,4})?\.(jpg|jpeg|png)$/
I tested in JavaScript, not PHP, but here's my test:
"some_file_p250.jpg".match(/^([a-zA-Z_\-0-9]+?)(_[a-z]?[0-9]{3,4})?\.(jpg|jpeg|png)$/)
and my results:
["some_file_p250.jpg", "some_file", "_p250", "jpg"]
In my experience, making a capturing expression non-greedy makes regular expressions a lot more intuitive and will often make them work the way I expect them to work. In your case, it was doing what you suspected; the first expression was capturing everything and never gave the second expression a chance to capture anything.
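For comparison, here is a minimal Python sketch of the greedy and non-greedy variants side by side. It uses Python rather than PHP or JavaScript; the patterns are the ones from the question and from this answer, and only the constant names are made up.

```python
import re

# Greedy first group (the question's attempt): it swallows the "_p250" suffix.
GREEDY = re.compile(r"^([a-zA-Z_\-0-9]+)(_[a-z]?[0-9]{3,4})?\.(jpg|jpeg|png)$")
# Non-greedy first group (this answer): it stops as soon as the rest can match.
LAZY = re.compile(r"^([a-zA-Z_\-0-9]+?)(_[a-z]?[0-9]{3,4})?\.(jpg|jpeg|png)$")

print(GREEDY.match("some_file_p250.jpg").groups())  # ('some_file_p250', None, 'jpg')
print(LAZY.match("some_file_p250.jpg").groups())    # ('some_file', '_p250', 'jpg')
print(LAZY.match("some_file.jpg").groups())         # ('some_file', None, 'jpg')
```

With the non-greedy group, any trailing part of the base name that looks like the suffix (an underscore, an optional letter, and 3-4 digits) is reported as the second group.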
I think this is what you want (note that the first group has to be non-greedy, otherwise it swallows the suffix):
/^([a-zA-Z_\-0-9]+?)(|_[a-z]?[0-9]{3,4})\.(jpg|jpeg|png)$/
or
/^([\w\-]+?)(|_[a-z]?[0-9]{3,4})\.(jpg|jpeg|png)$/