Regular Expression - Matching part of a word, with one exception

I need a regular expression that will match "ship" in any instance, so: ship, spaceships, starship, shipping etc. However, it must not match "warship". It also needs to be case-insensitive. At the moment I've got:
(?!(warship))(?i)ship
...which matches "ship" but still matches "warship", because it contains "ship". I've tried:
(?!(warship))^(?i)ship
...which works to an extent, but then "starship" doesn't get matched, for example. I'm sure the answer is super simple, but I can't see it just now. Your help would be great!

First I wanted to try negative lookbehind:
/(?<!war)ship/
It should match every word except warship. But it only matches the ship part. So it is fine if you just want to test your string against the regex, but it doesn't work if you want to retrieve the matched word.

I suggest the search string:
(?i)(\w*ships?)(?<!warship)(?<!warships)
(?i) ... enables case-insensitive search.
(\w*ships?) ... matches zero or more word characters followed by ship and an optional plural s, in a capturing group. Alternatively, (\b\w*ship\w*\b) or (\b[a-z]*ship[a-z]*\b) would find only whole words that contain ship anywhere inside them.
(?<!warship)(?<!warships) ... two negative lookbehinds that check that the matched word is neither warship nor warships.
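A minimal sketch in Python's re module (an assumption: the question doesn't name its engine, but these lookbehinds behave the same way in .NET and PCRE) comparing the plain lookbehind with the suggested pattern and the whole-word variant. The sample text is made up for illustration.

```python
import re

text = "Ship, spaceships, starship, shipping, warship, Warships"

# Lookbehind alone: only the "ship" fragments are returned, not whole words
fragments = re.findall(r"(?i)(?<!war)ship", text)

# Suggested pattern: \w* plus ship/ships, then two lookbehinds reject warship(s)
words = re.findall(r"(?i)(\w*ships?)(?<!warship)(?<!warships)", text)

# Whole-word variant, with the same two lookbehinds appended
whole = re.findall(r"(?i)(\b\w*ship\w*\b)(?<!warship)(?<!warships)", text)

print(fragments)
print(words)
print(whole)
```

With (\w*ships?), "shipping" yields only "ship", because the pattern stops after ship or ships; the \b...\b variant returns the whole word.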

It appears you may be using the .NET engine or something similarly expressive, so you can use lookbehind.
First you need a regex to match the entire word:
\w*ship\w*
Then you can easily modify it to not match anything where war comes before ship, using negative lookbehind.
\w*(?<!war)ship\w*
Also, there's probably no reason to specify the case-insensitivity flag in the regex itself; just apply it to the regex object when you create it.
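A minimal Python sketch of the lookbehind version, with case-insensitivity passed as a flag when compiling rather than written inline. Python's re stands in for the .NET engine here, and the sample text is invented.

```python
import re

# Case-insensitivity is applied to the compiled regex object, not inline
pattern = re.compile(r"\w*(?<!war)ship\w*", re.IGNORECASE)

text = "Ship spaceships starship shipping warship Warships"
matches = pattern.findall(text)
print(matches)
```

The lookbehind is checked right before ship, so every way of splitting "warship" or "Warships" fails and neither word is returned.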

I think you want something like this,
(?i)^(?!warship$)(?=.*ship).*
It matches any string that contains ship, unless the string is exactly warship.
OR
(?i)\b\w*?(?<!war)ship\w*?\b
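A quick check of both patterns in Python's re (assumed to behave like the engine used in the demos). The word list and sentence are made up for illustration.

```python
import re

# Anchored version: the string must contain "ship" and must not be exactly "warship"
line_pattern = re.compile(r"(?i)^(?!warship$)(?=.*ship).*")
# Word version: lazy \w*? between word boundaries, lookbehind rejects "war" before "ship"
word_pattern = re.compile(r"(?i)\b\w*?(?<!war)ship\w*?\b")

words = ["ship", "spaceships", "Starship", "shipping", "warship", "WARSHIP", "boat"]
accepted = [w for w in words if line_pattern.search(w)]
print(accepted)

word_hits = word_pattern.findall("A starship, a warship and some shipping")
print(word_hits)
```

The anchored form works on one word (or line) at a time. The \b...\b form can pull words out of running text.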

Related

How to capture the word that comes after certain text and ends before some other text in regex?

I would like to find something like this:
-(IBOutlet)UIView *aView;
I would like to extract aView. What I can be sure of is that -(IBOutlet) is always a prefix, but it is not guaranteed to be followed by a space or by any particular string. After that comes the string I need, which begins with '*' and runs until the ;.
So my regex looks like this:
(IBOutlet)*\*?;
Of course, it doesn't capture what I want. Any advice?
You just have to build it up incrementally. The best reference that I have found (by far) is http://www.regular-expressions.info. After learning the basics, you can use one of the many online pattern-matching tools; here is one:
https://regex101.com
With that, your goal is easily determined (with some allowances for free space):
^\s*-\s*\(IBOutlet\)(\w*)\s*(\*\w*)
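Running the incremental pattern in Python's re (an assumption, since the question doesn't name an engine) shows what each group holds. Note that group 2 keeps the leading asterisk.

```python
import re

line = "-(IBOutlet)UIView *aView;"
# Pattern from the answer above; \s* allows optional whitespace around the tokens
m = re.search(r"^\s*-\s*\(IBOutlet\)(\w*)\s*(\*\w*)", line)
print(m.groups())

# Group 2 includes the '*', so strip it to get the bare variable name
name = m.group(2).lstrip("*")
print(name)
```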
First problem: you don't have a capturing group so how do you get aView back after the match?
Second, the \*? means "match the * character literally, 0 or 1 times", which I guess isn't what you want either.
Try this pattern:
(IBOutlet)*\*(.+);
RegEx 101 can explain what each component means.
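A small Python re sketch of both points above. With no capturing group and an optional \*, the original pattern only ends up matching the semicolon. The suggested pattern captures the name in group 2.

```python
import re

line = "-(IBOutlet)UIView *aView;"

# Original attempt: no group around the name, and \*? makes the asterisk optional
original = re.search(r"(IBOutlet)*\*?;", line)
print(repr(original.group(0)))  # only the semicolon is matched

# Suggested pattern: group 2 captures everything between '*' and ';'
fixed = re.search(r"(IBOutlet)*\*(.+);", line)
print(fixed.group(2))
```

Group 1 is never used, because (IBOutlet)* can match zero times. The match effectively starts at the asterisk.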

Smallest possible match / nongreedy regex search

I first thought that this answer would totally solve my issue, but it did not.
I have a URL string like this one:
http://www.someurl.com/some-text-1-0-1-0-some-other-text.htm#id_76
I would like to extract some-other-text, so I came up with the following regex:
/0-(.*)\.htm/
Unfortunately, this matches 1-0-some-other-text because regexes are greedy. I cannot make it non-greedy with .*?; it doesn't change anything, as you can see here.
I also tried the U modifier, but it did not help.
Why doesn't the "non-greedy" tip work?
In case you need to get the closest match, you can make use of a tempered greedy token.
0-((?:(?!0-).)*)\.htm
The lazy version of your regex does not work because the regex engine scans the string from left to right. It always starts at the leftmost position and checks whether a match can begin there. In your case, it found the first 0- and was happy with it. Laziness only affects where the match ends (the rightmost position). In your case, there is only one possible end position (the single .htm), so lazy matching cannot produce the expected result.
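A short Python re comparison of the greedy, lazy and tempered patterns on the URL from the question. Python is used as a stand-in for the asker's engine.

```python
import re

url = "http://www.someurl.com/some-text-1-0-1-0-some-other-text.htm#id_76"

greedy = re.search(r"0-(.*)\.htm", url).group(1)
lazy = re.search(r"0-(.*?)\.htm", url).group(1)
# Tempered greedy token: each character is consumed only if "0-" does not start there
tempered = re.search(r"0-((?:(?!0-).)*)\.htm", url).group(1)

print(greedy, lazy, tempered, sep="\n")
```

The greedy and lazy versions return the same text because both start at the first 0-. Only the tempered token rejects that start and moves on to the last 0-.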
You also can use
0-((?!.*?0-).*)\.htm
This works if each string you extract from holds a single URL, because the lookahead checks everything that follows on the line.
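A Python re sketch of the lookahead variant and its caveat. The second URL appended on the same line is hypothetical, added only to show the failure mode.

```python
import re

pattern = re.compile(r"0-((?!.*?0-).*)\.htm")

single = "http://www.someurl.com/some-text-1-0-1-0-some-other-text.htm#id_76"
single_value = pattern.search(single).group(1)
print(single_value)

# The lookahead scans to the end of the line, so a later URL on the same line
# that contains "0-" stops the earlier one from matching.
two_on_one_line = single + " http://www.someurl.com/a-1-0-b.htm"
both_values = pattern.findall(two_on_one_line)
print(both_values)
```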
Do you want to exclude the 1-0? If so, you can use a non-capturing group:
(?:1-0-)+(.*?)\.htm
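The same idea in Python's re (assumed equivalent for this pattern). Because the repeated 1-0- is wrapped in a non-capturing group, the wanted text is group 1.

```python
import re

url = "http://www.someurl.com/some-text-1-0-1-0-some-other-text.htm#id_76"
# (?:...) groups the repeated "1-0-" without creating a capture
m = re.search(r"(?:1-0-)+(.*?)\.htm", url)
print(m.group(1))
```

This relies on the 1-0- runs being the only thing before the wanted text, so it is tied to this URL layout.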

Unable to get regex working

I have a third-party application that lets me enter a regular expression to validate user input in a text box. I need the user to enter data in this format: C:64GB, F:128GB H:32GB. This is the regex I wrote:
\b[A-z]{1}[:]?\d*(GB|gb)\b
This regex works, but it only validates the first block. So if I write C:64GB F:128, it marks the input as valid. It does not check F:128, which should make the input invalid; it should be C:64GB F:128GB.
When I change my regex to ^\b[A-z]{1}[:]?\d*(GB|gb)\b$, it only allows me to enter C:64GB.
What am I doing wrong here?
You can use this regex:
^(\W*[A-Za-z]:?\d+(?:GB|gb)\W*)+$
You can use the i (case-insensitive) flag to simplify the pattern:
/^([A-Z]:?\d+GB[\s,]*)+$/i
Here's a demo on regex101.com.
This is quite permissive with whitespace and commas, thanks to [\s,]*.
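A minimal Python check of the /i pattern, with re.IGNORECASE standing in for the /i flag. The sample inputs are made up, apart from the two taken from the question.

```python
import re

# re.IGNORECASE plays the role of /i, so GB also accepts gb or Gb
pattern = re.compile(r"^([A-Z]:?\d+GB[\s,]*)+$", re.IGNORECASE)

samples = ["C:64GB, F:128GB H:32GB", "C:64GB F:128", "c:64gb,f:128Gb"]
results = {s: bool(pattern.match(s)) for s in samples}
print(results)
```

Because of ^ and $, the incomplete F:128 block makes the whole input fail instead of being ignored.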
You could use something like this: ^[A-Z]:?\d+(GB|gb),( [A-Z]:?\d+(GB|gb)){2}$. It has to match the entire input, and it expects exactly three entries: the first one followed by a comma, then two more separated by spaces. You can see a working example here.
That's because the regex is eager: it finds the first match and then stops. You need to loop through all the matches or apply a global modifier (which finds all the matches).
Your input is accepted because the regex only looks for at least one valid occurrence, and as you saw, the first block matches, so the input is considered valid. If you want all of the user's input to be checked, you should split the input string into separate entries and check them one by one. Or you can do the equivalent with a single regex, which would give this:
^([A-Z]:?\d+[Gg][Bb] ?)+$
Side notes:
I removed the {1} after your [A-z] because matching once is the default behavior, and I changed [A-z] to [A-Z], because [A-z] also matches the ASCII characters between Z and a ([, \, ], ^, _ and `).
I replaced your \b with ^ and $ because you need to validate the full string, not just part of it.
I removed the [] around the : because it was useless (a character class chooses between several values, and you only have one).
I added the space as a separator, but you can replace it with whichever character you like.
I replaced your (GB|gb) with [Gg][Bb] so that this part of the user input is case-insensitive.
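Both approaches from this answer, sketched in Python's re: splitting on whitespace and checking each entry, or one anchored regex over the whole input. The names ENTRY, WHOLE and valid_by_split are hypothetical, chosen for illustration.

```python
import re

# One entry: drive letter, optional colon, digits, then GB in any letter case
ENTRY = re.compile(r"[A-Z]:?\d+[Gg][Bb]")
# Whole-input version from the answer: entries separated by optional single spaces
WHOLE = re.compile(r"^([A-Z]:?\d+[Gg][Bb] ?)+$")


def valid_by_split(text: str) -> bool:
    """Split on whitespace and require every piece to be a complete entry."""
    parts = text.split()
    return bool(parts) and all(ENTRY.fullmatch(p) for p in parts)


for value in ["C:64GB F:128GB H:32GB", "C:64GB F:128"]:
    print(value, valid_by_split(value), bool(WHOLE.match(value)))
```

Both versions use a space as the separator, as in the answer. Input written with commas, as in the question's example, would need the separator widened.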

Is there any upper limit for number of groups used or the length of the regex in Notepad++?

I am new to using regex. I am trying to use the regex find and replace option in Notepad++.
I have used the following regex:
((?:)|(\+)|(-))(\d)((?:)|(\+)|(-))(/)((?:)|(\+)|(-))(\d)((?:)|(\+)|(-))
For the following text:
2/2
+2/+2
-2/-2
2+/2+
2-/2-
But I only get full matches for the first three lines. For the last two, I only get partial matches that leave out the final "+" or "-". I am wondering whether there is an upper limit on the number of groups that can be used (which I doubt) or on the maximum length of the regex. I am not sure why my regex is failing; if there is anything wrong with it, please correct it.
This is not an issue with Notepad++'s regex engine. The problem is that when you have alternations like (?:)|(\+)|(-), the regex engine tries the options in the order they are specified. Since you specified an empty group first, it will attempt to match an empty string first, only matching the + or - if it needs to backtrack. This essentially makes the alternation lazy: it will never match any character unless it has to. That is why the final + or - is left out: nothing after the last group forces the engine to backtrack into it.
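A small Python re sketch of the ordering effect, using simplified two-way alternations. Python is assumed here to try alternatives left to right, like the Boost engine in Notepad++.

```python
import re

text = "2+/2+"

# Empty alternative first: "" is tried before "+", so "+" is used only when forced
empty_first = re.match(r"(|\+)(\d)(|\+)(/)(|\+)(\d)(|\+)", text).group(0)
# Same alternatives, "+" tried first
plus_first = re.match(r"(\+|)(\d)(\+|)(/)(\+|)(\d)(\+|)", text).group(0)

print(empty_first)
print(plus_first)
```

The middle + is still matched in the first case, because the / that follows forces a backtrack. The trailing + has nothing after it, so it is dropped.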
vks's answer works perfectly well, but just in case you actually needed those capturing groups separated out, you can do the same thing just by rewriting your alternations like this:
((\+)|(-)|(?:))(\d)((\+)|(-)|(?:))(/)((\+)|(-)|(?:))(\d)((\+)|(-)|(?:))
or even more simply, like this:
((\+)|(-)|)(\d)((\+)|(-)|)(/)((\+)|(-)|)(\d)((\+)|(-)|)
([-+]?)(\d)([-+]?)(/)([-+]?)(\d)([-+]?)
You can use this simple regex to match all cases. See here:
https://www.regex101.com/r/fG5pZ8/19
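A Python re check (assumed to agree with Notepad++ on these constructs) that both the reordered alternation and the compact [-+]? form match all five sample lines in full.

```python
import re

lines = ["2/2", "+2/+2", "-2/-2", "2+/2+", "2-/2-"]

# Reordered alternations: try "+" and "-" before the empty alternative
reordered = re.compile(
    r"((\+)|(-)|(?:))(\d)((\+)|(-)|(?:))(/)((\+)|(-)|(?:))(\d)((\+)|(-)|(?:))"
)
# Compact form: [-+]? tries to consume a sign before settling for none
compact = re.compile(r"([-+]?)(\d)([-+]?)(/)([-+]?)(\d)([-+]?)")

reordered_hits = [reordered.match(line).group(0) for line in lines]
compact_hits = [compact.match(line).group(0) for line in lines]
print(reordered_hits == lines, compact_hits == lines)
```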

Simple regex for matching up to an optional character?

I'm sure this is a simple question for someone at ease with regular expressions:
I need to match everything up until the character #
I don't want the string following the # character, just the stuff before it, and the character itself should not be matched. This is the most important part, and what I'm mainly asking. As a second question, I would also like to know how to match the rest, after the # character. But not in the same expression, because I will need that in another context.
Here's an example string:
topics/install.xml#id_install
I want only topics/install.xml. And for the second question (separate expression) I want id_install
First expression:
^([^#]*)
Second expression:
#(.*)$
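Applying the two expressions to the example string with Python's re (used here as an assumed stand-in for the asker's C# code):

```python
import re

s = "topics/install.xml#id_install"

# Everything from the start up to (not including) the first '#'
before = re.match(r"^([^#]*)", s).group(1)
# Everything after the first '#' up to the end
after = re.search(r"#(.*)$", s).group(1)

print(before)
print(after)
```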
[a-zA-Z0-9]*[\#]
If your string contains any other special characters, you need to add them, escaped, inside the first pair of square brackets.
I don't use C#, but I will assume that it uses PCRE... if so,
"([^#]*)#.*"
with a call to 'match'. A call to 'search' does not need the trailing ".*".
The parens define the 'keep group'; the [^#] means any character that is not a '#'.
You probably tried something like
"(.*)#.*"
and found that it fails when multiple '#' signs are present (keeping the leading '#'s)?
That is because ".*" is greedy, and will match as much as it can.
Your matcher should have a method that looks something like 'group(...)'. Most matchers return the entire matched sequence as group(0), the first paren-matched group as group(1), and so forth.
PCRE is so important that I strongly encourage you to search for it on Google, learn it, and always keep it in your programming toolkit.
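A Python re sketch of the greedy pitfall described above. The second '#' and the text after it ("#extra") are made up to trigger the problem. Python's re.match happens to anchor at the start, like the 'match' call mentioned in the answer.

```python
import re

s = "topics/install.xml#id_install#extra"

greedy = re.match(r"(.*)#.*", s).group(1)      # .* runs on to the last '#'
negated = re.match(r"([^#]*)#.*", s).group(1)  # [^#]* cannot cross a '#'

print(greedy)
print(negated)
```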
Use look ahead and look behind:
To get all characters up to, but not including the pound (#): .*?(?=\#)
To get all characters following, but not including the pound (#): (?<=\#).*
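The two lookaround expressions, tried with Python's re (assumed equivalent here; \# is a valid escape in Python patterns):

```python
import re

s = "topics/install.xml#id_install"

# Lookahead: requires a following '#' without including it in the match
before = re.search(r".*?(?=\#)", s).group(0)
# Lookbehind: requires a preceding '#' without including it in the match
after = re.search(r"(?<=\#).*", s).group(0)

print(before)
print(after)
```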
If you don't mind using groups, you can do it all in one shot:
(.*?)\#(.*)
Your answers will be in group(1) and group(2). Notice the non-greedy construct, *?, which will attempt to match as little as possible instead of as much as possible.
If you want to allow for a missing # section, use ([^\#]*)(?:\#(.*))?. It uses a non-capturing group for the second half and, if that half is present, returns everything after the pound in group 2.
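Both group-based forms, sketched with Python's re. The short string "a#b#c" is invented to show the non-greedy split at the first '#'.

```python
import re

# Non-greedy first group stops at the first '#'; the rest goes to group 2
both = re.match(r"(.*?)\#(.*)", "a#b#c").groups()

# The whole (?:\#(.*)) part is optional, so group 2 is None when there is no '#'
optional = re.compile(r"([^\#]*)(?:\#(.*))?")
with_hash = optional.match("topics/install.xml#id_install").groups()
without_hash = optional.match("topics/install.xml").groups()

print(both, with_hash, without_hash)
```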
Honestly though, for your situation, it is probably easier to use the Split method provided by String.
first:
/[^\#]*(?=\#)/ (edit: this is faster than /.*?(?=\#)/)
second:
/(?<=\#).*/
For something like this in C# I would usually skip the regular expressions stuff altogether and do something like:
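// Split on the first '#' only, so any later '#' stays in the second part
string[] split = exampleString.Split(new[] { '#' }, 2);
string firstString = split[0];
// split has only one element when exampleString contains no '#'
string secondString = split.Length > 1 ? split[1] : string.Empty;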