Unable to get regex working - regex

I have a third-party application that lets me enter a regular expression to validate user input in a text box. I need the user to enter the data in this format: C:64GB, F:128GB H:32GB. This is the regex I wrote:
\b[A-z]{1}[:]?\d*(GB|gb)\b
This regex works, but it only validates the first block. If I enter C:64GB F:128, the input is marked as valid: F:128 is never checked, even though it should make the input invalid. The input should be C:64GB F:128GB.
When I change my regex to ^\b[A-z]{1}[:]?\d*(GB|gb)\b$, it only allows me to enter C:64GB.
What am I doing wrong here?

You can use this regex:
^(\W*[A-Za-z]:?\d+(?:GB|gb)\W*)+$

You can use the case-insensitive i flag to simplify the pattern:
/^([A-Z]:?\d+GB[\s,]*)+$/i
here's a demo on regex101.com
This will be quite permissive with whitespace/commas thanks to [\s,]*
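A minimal Python sketch of this whole-string check, assuming the third-party application uses a comparable regex flavor. The function name is_valid_drive_list and the sample inputs are illustrative, not part of the application:

```python
import re

# Pattern from the answer above; re.IGNORECASE plays the role of the /i flag.
DRIVE_LIST = re.compile(r"^([A-Z]:?\d+GB[\s,]*)+$", re.IGNORECASE)

def is_valid_drive_list(text: str) -> bool:
    """Return True only if the entire input consists of drive:size entries."""
    return DRIVE_LIST.match(text) is not None

print(is_valid_drive_list("C:64GB, F:128GB H:32GB"))  # True
print(is_valid_drive_list("C:64GB F:128"))            # False
```

Because the anchors force the repeated group to cover the whole input, the incomplete F:128 entry makes the second call fail.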

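You could use something like this: ^[A-Z]:?\d+(GB|gb),( [A-Z]:?\d+(GB|gb)){2}$. This matches only if the entire input fits the pattern: one entry followed by a comma, then exactly two more entries, each preceded by a space. You can see a working example here.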

That's because the RegEx is eager. It will find the first match and then stop. You need to loop through all the matches or apply a Global modifier (which finds all the matches)

Your input is reported as valid because the regex only needs to find at least one matching occurrence, and as you saw, the first block satisfies it. If you want the whole input to be checked, you could split the input string into individual entries and check them one by one. Or do the equivalent with a single regex, which would give this:
^([A-Z]:?\d+[Gg][Bb] ?)+$
Side notes:
I removed the {1} after your character class because matching exactly once is the regex default behavior,
I replaced your \b with ^ and $ because you need to validate the full string and not part of it,
I removed the [] around the : because a character class containing a single character is equivalent to the character itself,
I added the space as a separator, but you can change it to whichever character you prefer,
I replaced your (GB|gb) with [Gg][Bb] so that this part of the input is matched case-insensitively.
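To see the difference the anchors make, here is a small Python sketch (Python's re module is an assumption; the third-party application may use a different regex flavor). It also shows a detail not covered in the notes above: the range [A-z] in the original pattern spans the ASCII punctuation between Z and a, so it accepts characters such as _.

```python
import re

original = r"\b[A-z]{1}[:]?\d*(GB|gb)\b"   # pattern from the question
anchored = r"^([A-Z]:?\d+[Gg][Bb] ?)+$"    # pattern from this answer

text = "C:64GB F:128"

# Unanchored: a single matching block anywhere is enough to report success.
print(bool(re.search(original, text)))              # True, matched "C:64GB"

# Anchored: every block up to the end of the string must fit the pattern.
print(bool(re.search(anchored, text)))              # False
print(bool(re.search(anchored, "C:64GB F:128GB")))  # True

# [A-z] covers more than letters: "_" sits between "Z" and "a" in ASCII.
print(bool(re.fullmatch(r"[A-z]", "_")))            # True
```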

Related

Is there any upper limit for number of groups used or the length of the regex in Notepad++?

I am new to using regex. I am trying to use the regex find and replace option in Notepad++.
I have used the following regex:
((?:)|(\+)|(-))(\d)((?:)|(\+)|(-))(/)((?:)|(\+)|(-))(\d)((?:)|(\+)|(-))
For the following text:
2/2
+2/+2
-2/-2
2+/2+
2-/2-
But I am able to get matches only for the first three. For the last two, it gives only partial matches, excluding the last "+" and "-". I am wondering if there is an upper limit on the number of groups that can be used (which I doubt) or on the maximum length of the regex. I am not sure why my regex is failing. If there is anything wrong with my regex, please correct it.
This is not an issue with Notepad++'s regex engine. The problem is that when you have alternations like (?:)|(\+)|(-), the regex engine will attempt to match the different options in the order they are specified. Since you specified an empty group first, it will attempt to match an empty string first, only matching the + or - if it needs to backtrack. This essentially makes the alternation lazy—it will never match any character unless it has to.
vks's answer works perfectly well, but just in case you actually needed those capturing groups separated out, you can do the same thing just by rewriting your alternations like this:
((\+)|(-)|(?:))(\d)((\+)|(-)|(?:))(/)((\+)|(-)|(?:))(\d)((\+)|(-)|(?:))
or even more simply, like this:
((\+)|(-)|)(\d)((\+)|(-)|)(/)((\+)|(-)|)(\d)((\+)|(-)|)
([-+]?)(\d)([-+]?)(/)([-+]?)(\d)([-+]?)
You can use this simple regex to match all cases. See here:
https://www.regex101.com/r/fG5pZ8/19
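The effect of alternation order can be reproduced outside Notepad++. The following is a sketch in Python, whose re module also tries alternatives from left to right; the variable names are illustrative:

```python
import re

# Empty alternative first (pattern from the question): each sign group
# prefers to match nothing and only takes "+" or "-" when forced to backtrack.
empty_first = r"((?:)|(\+)|(-))(\d)((?:)|(\+)|(-))(/)((?:)|(\+)|(-))(\d)((?:)|(\+)|(-))"

# Optional character class: "?" is greedy, so a sign is taken when present.
optional_sign = r"([-+]?)(\d)([-+]?)(/)([-+]?)(\d)([-+]?)"

for line in ["2/2", "+2/+2", "-2/-2", "2+/2+", "2-/2-"]:
    a = re.search(empty_first, line).group(0)
    b = re.search(optional_sign, line).group(0)
    print(f"{line!r}: empty-first -> {a!r}, optional sign -> {b!r}")
```

For 2+/2+ the first pattern stops at 2+/2, because the final group is already satisfied by the empty alternative, while the second pattern returns the full 2+/2+.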

Regular Expression - Matching part of a word, with one exception

I need a regular expression that will find "ship" in any instance, e.g. ship, spaceships, starship, shipping etc. However, it must not find "warship". It also needs to be case insensitive. At the moment I've got:
(?!(warship))(?i)ship
...which looks up "ship" but still looks up "warship" thanks to it containing "ship". I've tried:
(?!(warship))^(?i)ship
...which works to an extent but then "starship" doesn't get returned for example. I'm sure the answer is super-simple but I can't see it just now. Your help would be great!
First I wanted to try negative lookbehind:
/(?<!war)ship/
It should match every word except warship. But it returns only the ship part. So it is fine if you just test whether the string matches, but it doesn't work if you want to get the whole matched word.
I suggest the search string:
(?i)(\w*ships?)(?<!warship)(?<!warships)
(?i) ... enables case-insensitive search.
(\w*ships?) ... matches 0 or more word characters followed by ship and an optional plural s, in a capturing group. Also possible would be (\b\w*ship\w*\b) or (\b[a-z]*ship[a-z]*\b) to find only entire words containing ship anywhere inside.
(?<!warship)(?<!warships) ... two negative lookbehinds checking that the found word is neither warship nor warships.
It appears you may be using the .NET engine or something similarly expressive, so you can use lookbehind.
First you need a regex to match the entire word:
\w*ship\w*
Then you can easily modify it to not match anything where war comes before ship, using negative lookbehind.
\w*(?<!war)ship\w*
Also, there's probably no reason to specify the case insensitivity flag in the regex itself, just apply it to the regex object when you create it.
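As a rough illustration of this approach, here is a Python sketch. Python's re module supports fixed-width lookbehind, and passing re.IGNORECASE when compiling stands in for applying the flag to the regex object. The sample sentence is made up.

```python
import re

# Whole-word match containing "ship", unless "war" comes right before it.
SHIP_WORD = re.compile(r"\w*(?<!war)ship\w*", re.IGNORECASE)

sample = "ship spaceships Starship shipping warship Warships"
print(SHIP_WORD.findall(sample))
# ['ship', 'spaceships', 'Starship', 'shipping']
```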
I think you want something like this,
(?i)^(?!warship$)(?=.*ship).*
It matches any string that contains ship, but not the string warship itself.
OR
(?i)\b\w*?(?<!war)ship\w*?\b

regex up to a list of strings (without capturing that last string)

I am trying to form a regular expression to match text between a start-word and the first of a list of stop-words. However, I do not want to include the stop-word in my match.
(The use case is replacing a section of a document, stopping before the keyword signifying the next section)
My regular expression is:
(StartWord)[\s\S]*?(StopWord1|StopWord2|$)
However, this match includes the stop-word. See the example here: http://regexr.com/38pb9
Any thoughts? Thank you!
If your regex engine supports look aheads, you could just use this:
((StartWord)[\s\S]*?(?=StopWord1|StopWord2|$))
The lookahead makes the match stop when a stop word or the end of the string is encountered, without consuming it as part of the match.
If you also need to exclude the start word, you can use a look behind (again, assuming your regex engine supports it):
((?<=StartWord)[\s\S]*?(?=StopWord1|StopWord2|$))
But of course the simplest method may just be to use your existing pattern but use a group to extract only the parts that you need:
(StartWord)([\s\S]*?)(StopWord1|StopWord2|$)
Here, group 1 will contain the start word, group 2 will contain the body of the match, and group 3 will contain the stop word. In whatever language you're using, you can extract group 2 to get just the body.
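For the stated use case (replacing a section while leaving the next section's keyword in place), the lookahead version could be used roughly as follows. This is a Python sketch; the document text and the replacement string are invented for illustration.

```python
import re

doc = "intro StartWord old body StopWord2 tail"

# The lookahead checks for a stop word (or end of string) without consuming it.
section = re.compile(r"StartWord[\s\S]*?(?=StopWord1|StopWord2|$)")

print(repr(section.search(doc).group(0)))       # 'StartWord old body '
print(section.sub("StartWord new body ", doc))  # intro StartWord new body StopWord2 tail

# Alternative: keep the stop word in the match and read the body from group 2.
grouped = re.compile(r"(StartWord)([\s\S]*?)(StopWord1|StopWord2|$)")
print(repr(grouped.search(doc).group(2)))       # ' old body '
```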

What is wrong with my simple regex that accepts empty strings and apartment numbers?

So I wanted to limit a textbox which contains an apartment number which is optional.
Here is the regex in question:
([0-9]{1,4}[A-Z]?)|([A-Z])|(^$)
Simple enough eh?
I'm using these tools to test my regex:
Regex Analyzer
Regex Validator
Here are the expected results:
Valid
"1234A"
"Z"
"(Empty string)"
Invalid
"A1234"
"fhfdsahds527523832dvhsfdg"
Obviously, since I'm here, the invalid ones are accepted by the regex. The goal of this regex is to accept either 1 to 4 digits with an optional letter, a single letter, or an empty string.
I just can't seem to figure out what's not working, I mean it is a simple enough regex we have here. I'm probably missing something as I'm not very good with regexes, but this syntax seems ok to my eyes. Hopefully someone here can point to my error.
Thanks for all help, it is greatly appreciated.
You need to use the ^ and $ anchors for your first two options as well. Also you can include the second option into the first one (which immediately matches the third variant as well):
^[0-9]{0,4}[A-Z]?$
Without the anchors your regular expression matches because it will just pick a single letter from anywhere within your string.
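A quick way to compare the two patterns against the question's test cases, sketched in Python (the variable names are illustrative):

```python
import re

unanchored = re.compile(r"([0-9]{1,4}[A-Z]?)|([A-Z])|(^$)")  # from the question
anchored = re.compile(r"^[0-9]{0,4}[A-Z]?$")                 # from this answer

cases = ["1234A", "Z", "", "A1234", "fhfdsahds527523832dvhsfdg"]
for value in cases:
    print(repr(value),
          "unanchored:", bool(unanchored.search(value)),
          "anchored:", bool(anchored.search(value)))
```

The unanchored pattern reports a match for all five inputs, while the anchored one accepts only the first three.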
Depending on the language, you can also use a negative look ahead.
^[0-9]{0,4}[A-Za-z](?!.*[0-9])
Breakdown:
^[0-9]{0,4} = This looks for a digit 0 to 4 times at the beginning of the string
[A-Za-z] = This looks for a single letter (either case)
(?!.*[0-9]) = This only allows the letter if there are no digits anywhere after it.
I haven't quite figured out how to validate against an empty value, but that might be easier to do with tools from whatever language you are using. Something along this logic:
if String doesn't equal $null then check the Regex
Adjust that for however you would do it in your language.
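One way the "check for empty first, then apply the regex" logic might look, sketched in Python. The function name is hypothetical, and treating the empty string as valid is an assumption taken from the question's expected results. Note that, as written, the pattern requires a letter, so a digits-only value such as 1234 is rejected.

```python
import re

APARTMENT = re.compile(r"^[0-9]{0,4}[A-Za-z](?!.*[0-9])")

def is_valid_apartment(value: str) -> bool:
    # An empty value is accepted up front, since the field is optional.
    if value == "":
        return True
    return APARTMENT.search(value) is not None

for value in ["1234A", "Z", "", "A1234", "1234"]:
    print(repr(value), is_valid_apartment(value))
```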
I used RegEx Skinner to validate the answers.

Parse with Regex without trailing characters

How can I parse the text below so that I extract just
To: User <test#test.com>
and
To: <test#test.com>
When I try to parse the text below with
/To:.*<[A-Z0-9._+-]+#[A-Z0-9.-]+\.[A-Z]{2,4}>/mi
It grabs
Message-ID <CC2E81A5.6B9%test#test.com>,
which I don't want in my result.
I have tried using $ and \z, and neither works. What am I doing wrong?
Information to parse
To: User <test#test.com> Message-ID <CC2E81A5.6B9%test#test.com>
To:
<test#test.com>
This is my parsing information in Rubular http://rubular.com/r/DQMQC4TQLV
Since you haven't specified exactly what your tool/language is, assumptions must be made.
In general, regex quantifiers are greedy: they match the longest possible string. Your pattern starts off with .*, which means that you're going to match the longest possible string that ENDS WITH the remainder of your pattern <[A-Z0-9._+-]+#[A-Z0-9.-]+\.[A-Z]{2,4}>, which was matched with <CC2E81A5.6B9%test#test.com> from the Message-ID.
Both Apalala's and nhahtdh's comments give you something to try. Avoid the all-inclusive .* at the start and use something that's a bit more specific: match leading spaces, or match anything EXCEPT the first part of what you're really interested in.
You need to make the wildcard match non greedy by adding a question mark after it:
To:.*?<[A-Z0-9._+-]+#[A-Z0-9.-]+\.[A-Z]{2,4}>
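The difference between the greedy and non-greedy wildcard can be seen in a short Python sketch. The sample line is hypothetical (a second bracketed address on the same line), chosen so that the greedy version visibly overshoots; the # in the addresses follows the question's text.

```python
import re

line = "To: User <test#test.com> Cc: <other#test.com>"  # hypothetical input
address = r"<[A-Z0-9._+-]+#[A-Z0-9.-]+\.[A-Z]{2,4}>"
flags = re.IGNORECASE | re.MULTILINE

greedy = re.search(r"To:.*" + address, line, flags)
lazy = re.search(r"To:.*?" + address, line, flags)

print(greedy.group(0))  # To: User <test#test.com> Cc: <other#test.com>
print(lazy.group(0))    # To: User <test#test.com>
```

The greedy .* runs to the end of the line and backs up only as far as the last address that fits, whereas .*? stops at the first one.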