How do I exclude a word from a regular expression search? - regex

How can I create a regular expression for the following problem:
I have a string,
name1=value1;name2=value2;.....;
Somewhere, there exists a pair,
"begin=10072011;"
I need, with regular expressions, to parse from the string all name=value; pairs, where the value is a number. However, I want to ignore the name begin
Currently I have the following regexp:
([\\w]+)=([\\d]+);
Mine selects the begin name. How can I change it to not include begin?

(?!begin)\b(\w+)=(\d+);
This uses negative lookahead, so it will not match if the name starts with "begin" (which also excludes names such as "beginning"). The \b is necessary so that the regex does not just skip the "b" and match "egin=...".
Note that when describing a regex you should use only a single backslash for escapes, although in some languages you will need to double the backslashes inside a string literal.

This should do it:
\b(?!begin=)(\w+)=(\d+)\b
As a C++ string literal it would look like this:
"\\b(?!begin=)(\\w+)=(\\d+)\\b"
\b is a word boundary; you use it to make sure you're matching a whole word (as "word" is defined in the context of regexes; read that page carefully). For example, without the first \b the regex would correctly fail to match
begin=1234 // OK
...but then it would skip ahead one position and match:
egin=1234 // oops!
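A quick way to check the effect of the leading \b is to run both variants over a sample string. The sketch below uses Python's re module. The sample string and the variable names are made up for illustration, and the trailing ; from the question's pattern is kept in place of the final \b.

```python
import re

# Hypothetical sample input in the name=value; format from the question.
text = "count=42;begin=10072011;end=7;label=abc;"

# With the leading \b, the lookahead is tested at the start of each name.
with_boundary = re.compile(r"\b(?!begin=)(\w+)=(\d+);")

# Without \b, the engine can start one character later and match "egin".
without_boundary = re.compile(r"(?!begin=)(\w+)=(\d+);")

pairs = with_boundary.findall(text)
leaky = without_boundary.findall(text)

print(pairs)  # [('count', '42'), ('end', '7')]
print(leaky)  # [('count', '42'), ('egin', '10072011'), ('end', '7')]
```

The label=abc; pair is skipped in both cases because its value is not numeric.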

I think (?<=begin=)\d+(?=;) will be a better choice.
If you keep all the information in XML format, the work will be much easier than now.

Related

Allowing words picked up in regex in certain cases only

I have a regex expression to look for people just sticking "N/A" or similar into a form field.
^(?!(\b(N/A|NA|n/a|na|Yes|yes|YES|No|no|NO)\b))
Probably not the most elegant I am sure. However I cannot for the life of me get it to allow the above words if followed by something.
So if someone just types "yes" then I want it to fail the regex check. But if someone types "yes, I have blah blah etc etc" I want it to pass.
The expression I have allows the word to be used as long as it isn't the first word in the sentence. I just want to disallow the listed words as the ONLY words in the field.
Any ideas?
Thanks
You may remove the first \b (it is redundant between the start of string and a word char) and replace the second one with $ (end of string):
^(?!(?:N/A|NA|n/a|na|Yes|yes|YES|No|no|NO)$)
With a case insensitive option, you may reduce the pattern to
^(?!(?:n/?a|yes|no)$)
Details
^ - start of string, then...
(?!(?:n/?a|yes|no)$) - a location in string that is not immediately followed with n/?a (na, n/a), yes or no that are followed with the end of string.
In plain words, only the start of string is matched if the whole string is not equal to one of the alternatives inside the alternation group.
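As a rough illustration of the case-insensitive pattern, here is a minimal Python sketch. The helper name is_acceptable and the sample inputs are assumptions made for this example, not part of the original form.

```python
import re

# Reject the field only when the whole value is one of the placeholder words.
not_placeholder = re.compile(r"^(?!(?:n/?a|yes|no)$)", re.IGNORECASE)

def is_acceptable(value: str) -> bool:
    """Return True unless value is exactly n/a, na, yes or no (any case)."""
    return not_placeholder.match(value) is not None

print(is_acceptable("yes"))                # False
print(is_acceptable("N/A"))                # False
print(is_acceptable("yes, I have a car"))  # True
print(is_acceptable("nah"))                # True
```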
The easiest way would be to match all the forbidden strings exactly and invert the result.
Try ^(n/?a|yes|no)$ with a case-insensitive option and invert the result.
^ matches the beginning of the string. $ matches the end of the string.
When you don't have a case-insensitive option, use ^([nN]/?[aA]|[yY][eE][sS]|[nN][oO])$.
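The match-and-invert idea might look like this in Python, using the character-class variant so that no case-insensitive flag is needed. The function name is_real_answer is hypothetical.

```python
import re

# Case is handled by explicit character classes, so no flag is required.
placeholder = re.compile(r"^([nN]/?[aA]|[yY][eE][sS]|[nN][oO])$")

def is_real_answer(value: str) -> bool:
    """Invert the match: anything that is not a bare placeholder passes."""
    return placeholder.match(value) is None

print(is_real_answer("nO"))         # False
print(is_real_answer("n/A"))        # False
print(is_real_answer("no thanks"))  # True
```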

Regex: ignore characters that follow

I'd like to know how I can ignore characters that follow a particular pattern in a regex.
I tried positive lookaheads, but they do not work because they preserve those characters for other matches, while I want them to be just... discarded.
For example, a part of my regex is: (?<DoubleQ>\"\".*?\"\")|(?<SingleQ>\".*?\")
in order to match some "key-parts" of this string:
This is a ""sample text"" just for "testing purposes": not to be used anywhere else.
I want to capture the entire ""sample text"", but then I want to "extract" only sample text and the same with testing purposes. That is, I want the group to match to be ""sample text"", but then I want the full match to be sample text. I partially achieved that with the use of the \K option:
(?<DoubleQ>\"\"\K.*?\"\")|(?<SingleQ>\"\K.*?\")
Which ignores the first "" (or ") from the full match but takes it into account when matching the group. How can I ignore the following "" (")?
Note: positive lookahead does not work: it does not ignore characters from the following matches, it just does not include them in the current match.
Thanks a lot.
I hope I got your question right. So you want to match the whole string including the quotes, but you want to replace/extract only the expression without the quotes, right?
You typically can use the regex replace functionality to extract just a part of the match.
This is the regex expression:
""?(.*?)""?
And this is the replacement expression:
$1
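The same capture-group idea can be tried in Python, as a sketch. Python's re module does not support \K, and its replacement strings refer to the first group as \1 rather than $1. The variable names are made up for this example.

```python
import re

sample = 'This is a ""sample text"" just for "testing purposes": not to be used anywhere else.'

# One or two opening quotes, a lazy capture, one or two closing quotes.
quoted = re.compile(r'""?(.*?)""?')

# Group 1 holds the text without the surrounding quotes.
inner = quoted.findall(sample)
print(inner)  # ['sample text', 'testing purposes']

# Replace each quoted run with just its captured contents.
stripped = quoted.sub(r"\1", sample)
print(stripped)
```

The quotes are consumed by the match, so they are not available to later matches, yet they stay out of the captured group.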

How do I match a certain string that does not contain a certain word?

I have a pretty specific set of requirements defining which strings I want to match, and I have the following working regex:
/^#\s*([-a-zA-Z]+)(?=\s|$)/
This matches: '# keyword' ... As well as: '# Static keyword'
For my final condition, I want to ignore a string if it contains the word: "Static".. I've done a lot of digging, and I can't figure this one out.
The following is my best attempt:
/^#\s*(?!Static)([-a-zA-Z]+)(?=\s|$)/
However, it seems as though I'm woefully far from the solution.
You need to look for Static in more places than just right after # and whitespace:
/^#\s*(?!.*Static)([-a-zA-Z]+)(?=\s|$)/
By the way, you might want to replace (?=\s|$) with \b (a word boundary anchor that matches after an alphanumeric word). That would also match if punctuation or something other than whitespace delimits the word you're matching.
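To see how the (?!.*Static) lookahead behaves, here is a small Python sketch. The helper extract_keyword and the test lines are assumptions for illustration only.

```python
import re
from typing import Optional

# The lookahead scans the rest of the line for "Static" before capturing.
keyword = re.compile(r"^#\s*(?!.*Static)([-a-zA-Z]+)(?=\s|$)")

def extract_keyword(line: str) -> Optional[str]:
    """Return the first keyword, or None if the line contains 'Static'."""
    m = keyword.match(line)
    return m.group(1) if m else None

print(extract_keyword("# keyword"))         # keyword
print(extract_keyword("# Static keyword"))  # None
print(extract_keyword("# keyword Static"))  # None
print(extract_keyword("# electroStatic"))   # None (substring also counts)
```

Because .*Static looks for the substring anywhere, words that merely contain "Static" are rejected too.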
You can use this:
/^(?>[^S]++|S++(?!tatic\b))+$/
Or this that allows "electroStatic":
/^(?>[^S]++|\BS++|\bS(?!tatic\b))+$/

Regex for deleting characters before a certain character?

I'm very new at regex, and to be completely honest it confounds me. I need to grab the string after a certain character is reached in said string. I figured the easiest way to do this would be using regex, however like I said I'm very new to it. Can anyone help me with this or point me in the right direction?
For instance:
I need to check the string "23444:thisstring" and save "thisstring" to a new string.
If this is your string:
I'm very new at regex, and to be completely honest it confounds me
and you want to grab everything after the first "c", then this regular expression will work:
/c(.*)/s
It will return this match in the first matched group:
"ompletely honest it confounds me"
Explanation:
The c is the character you are looking for
.* (in combination with /s) matches everything left
(.*) captures what .* matched, making it available in $1 and returned in list context.
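Applied to the string from the question, the same capture-after-a-character approach could be sketched in Python as follows. The function name text_after is hypothetical, and re.DOTALL plays the role of the /s modifier.

```python
import re
from typing import Optional

def text_after(separator: str, text: str) -> Optional[str]:
    """Return everything after the first occurrence of separator, or None."""
    # re.escape guards against separators that are regex metacharacters.
    pattern = re.escape(separator) + r"(.*)"
    m = re.search(pattern, text, re.DOTALL)
    return m.group(1) if m else None

print(text_after(":", "23444:thisstring"))  # thisstring
print(text_after(":", "no separator"))      # None
```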
Regex for deleting characters before a certain character!
You can use lookahead like this
.*(?=x)
where x is a particular character, word, or string. (Characters such as ., $, ^, *, and + have special meaning in a regex, so don't forget to escape them when they appear in x.)
EDIT
for your sample string it would be
.*(?=thisstring)
.* matches zero or more characters up to thisstring
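A short Python sketch of the lookahead approach, using the sample string from the question. The variable names are illustrative only.

```python
import re

original = "23444:thisstring"

# .* is greedy, so it backs off until "thisstring" follows the match.
prefix = re.match(r".*(?=thisstring)", original).group(0)
print(prefix)  # 23444:

# Deleting that prefix leaves the part the question wants to keep.
remainder = re.sub(r".*(?=thisstring)", "", original, count=1)
print(remainder)  # thisstring
```

This only helps when the text to keep is known in advance. When only the separator is known, capturing the part after it is simpler.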
Here is a one-line solution for matching everything after "before"
print $1."\n" if "beforeafter" =~ m/before(.*)/;
Edit:
While using lookbehind is possible, it's not required. Grouping provides an easier solution.
To get the string after : in your example, you can use [^:][^:]*:\(.*\) (written in the basic regex syntax used by tools such as sed, where \( and \) delimit a group). Notice that this requires at least one [^:] followed by any number of [^:]s, followed by an actual :, the character you are searching for.

Detecting specific string whether in start, middle or end of a string with Regular Expressions

I've been reading some Q&A about regular expressions, but I haven't found one that answers my question. I'll be using ra as the searched string.
My problem is that I want to find the string 'ra' in any string and replace it with 'RA', but only when it is not part of another word. For example, order_ra should become order RA, but camera must not become cameRA.
I already tried [\s|_]ra(?:[\s|_]), and it does not work, because it only matches something like order_ra or order ra with a space at the end. I would like to match order ra or order_ra whether or not it has white space after it. Can anyone help me with this? I'm not too literate with regular expressions.
The reason I need this is that I want to capitalize 'ra' dynamically in a string sent by a user interaction, but not if it belongs to a word like came*ra* or *ra*dical. I don't know if I've explained myself clearly; excuse me if I haven't.
Usually, you would use word boundaries: \bra\b only matches ra on its own, not inside a word. Unfortunately, the underscore is treated as an alphanumeric character, so index_ra would not be matched.
Therefore you need to implement this yourself. Assuming that your regex dialect supports Unicode and lookaround assertions, use
(?<!\p{L})foo(?!\p{L})
This matches foo, but not foobar or bazfoo:
(?<!\p{L}) # Assert that there is no letter before the current position
foo # Match foo
(?!\p{L}) # Assert that there is no letter after the current position
If you can't use Unicode character classes, try this:
(?<![^\W\d_])foo(?![^\W\d_])
This is a bit contorted logic (triple negative for teh win!): [^\W\d_] matches a letter (= a character that is not a non-alphanumeric character and not a digit or underscore), so the negative lookaround assertions make sure that there are no letters around the search string ("not a not a (non-alphanumeric or digit or underscore)"). Twisted but necessary, since we also want the start and end of the string to match here.
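Python's built-in re module does not support \p{L}, so the [^\W\d_] variant is the one to try there. A minimal sketch, with ra substituted for foo and a hypothetical helper name:

```python
import re

# [^\W\d_] means "a letter"; the lookarounds forbid letters on either side.
standalone_ra = re.compile(r"(?<![^\W\d_])ra(?![^\W\d_])")

def capitalize_ra(text: str) -> str:
    """Upper-case 'ra' only where no letter touches it."""
    return standalone_ra.sub("RA", text)

print(capitalize_ra("order_ra"))      # order_RA
print(capitalize_ra("order ra now"))  # order RA now
print(capitalize_ra("camera"))        # camera
print(capitalize_ra("radical"))       # radical
```

This sketch keeps the underscore. Turning order_ra into order RA, as the question describes, would need a separate replacement step.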
If I understand what you are looking for, the following will perform the match. The non-capturing group is specified in the parens with (?:...). It is similar to the OP but also includes beginning and end-of-line anchors.
(?:^|\s|_)ra(?:$|\s|_)
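For a replacement, the delimiters matched by this pattern have to be put back. The Python sketch below is an adaptation rather than the answer's exact pattern: it swaps the non-capturing groups for capturing ones, and the function name is made up.

```python
import re

# Capturing groups so the delimiters can be restored in the replacement.
delimited_ra = re.compile(r"(^|\s|_)ra($|\s|_)")

def capitalize_ra_delimited(text: str) -> str:
    """Upper-case 'ra' when it sits between start/end, whitespace or '_'."""
    return delimited_ra.sub(r"\g<1>RA\g<2>", text)

print(capitalize_ra_delimited("order_ra"))      # order_RA
print(capitalize_ra_delimited("order ra now"))  # order RA now
print(capitalize_ra_delimited("camera"))        # camera
print(capitalize_ra_delimited("ra ra"))         # RA ra
```

The last line shows a limitation: the pattern consumes the delimiter, so the second of two adjacent occurrences is missed. A lookaround-based pattern does not have this problem.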