Regular expression for a-b, a-c but not a-a?

I am trying to find method definitions, excluding constructors.
To simplify, I'm looking for abc::def and foo::bar, but not foo::foo.
I already know how to write an expression like this:
\w[\w\d_]+::\w[\w\d_]+
But how do I make sure the left part of the :: does not match the right part?
By the way, I cannot check whether there is a type definition to the left of the qualified method name. I have a very old project where it was fine to omit the type if it was int.

Note that \w already matches \d and _, so \w[\w\d_]+ is equivalent to \w{2,}.
You can capture the first "word" (before ::) and check with a negative lookahead that the "word" after :: is not equal to it:
\b(\w+)::(?!\b\1\b)\w+\b
See the regex demo
Explanation:
\b - leading word boundary
(\w+) - Group 1: one or more alphanumeric and underscore characters
:: - 2 consecutive colons
(?!\b\1\b) - the next "word" cannot be the same as the value in Group 1
\w+\b - one or more alphanumeric and underscore characters followed by a trailing word boundary.
If you are not looking to match 1-character "words", you can use
\b(\w{2,})::(?!\b\1\b)\w{2,}\b
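A minimal sketch of the first pattern above, using Python's built-in re module. The sample text and the function name find_methods are made up for illustration and are not taken from the asker's project.

```python
import re

# Pattern from the answer: the name after "::" must differ from Group 1.
METHOD_RE = re.compile(r"\b(\w+)::(?!\b\1\b)\w+\b")

def find_methods(source):
    """Return every qualified name whose two parts differ."""
    return [m.group(0) for m in METHOD_RE.finditer(source)]

# Hypothetical sample input.
sample = "abc::def() foo::bar() foo::foo() foo::foobar()"
print(find_methods(sample))  # ['abc::def', 'foo::bar', 'foo::foobar']
```

Because of the \b after \1, foo::foobar is still reported: only an exact repeat of the first name is rejected.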

You can capture the first part and check whether it is repeated by using a back-reference, like this.
Regex: \b(\w[\w\d_]+)::(?!\1)\w[\w\d_]+
Explanation:
\b(\w[\w\d_]+) matches the first part.
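(?!\1) negative lookahead for the first part. If the first part is repeated right after ::, the whole match is discarded. Note that without a trailing \b this also rejects names that merely start with the first part, such as foo::foobar.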
\w[\w\d_]+ If not repeated then this part will match.
Regex101 Demo
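A quick Python check of how (?!\1) behaves on names that only share a prefix. The BOUNDED variant with a trailing \b is an assumption added here for comparison; it is not part of the answer above.

```python
import re

# Pattern as given in the answer.
ORIGINAL = re.compile(r"\b(\w[\w\d_]+)::(?!\1)\w[\w\d_]+")
# Assumed variant: the trailing \b makes the lookahead reject exact repeats only.
BOUNDED = re.compile(r"\b(\w[\w\d_]+)::(?!\1\b)\w[\w\d_]+")

def matches(pattern, text):
    """True if the pattern is found anywhere in text."""
    return pattern.search(text) is not None

for name in ("foo::bar", "foo::foo", "foo::foobar"):
    print(name, matches(ORIGINAL, name), matches(BOUNDED, name))
```

Both patterns accept foo::bar and reject foo::foo; they differ on foo::foobar, which only the bounded variant accepts.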

Related

Nesting capture groups

I have the following strings:
'TwoOrMoreDimensions'
'LookLikeVectors'
'RecentVersions'
'= getColSums'
'=getColSums'
I would like to capture all occurrences of an uppercase letter that is preceded by a lowercase letter in all strings but the last two.
I can use ([a-z]+)([A-Z]) to capture all such occurrences but I don't know how to exclude matches from the last two strings.
The last two strings can be excluded using the negative lookahead ^(?!>\s|\=) - is it possible to combine this with the expression above?
I tried ^(?!>\s|\=)(([a-z]+)([A-Z])) but it doesn't yield any matches. I'm not sure why because ^(?!>\s|\=)(.+) captures all characters after the start of the matching string as a group. So why can't this capture group be further divided into group 2 ([a-z]+) and group 3 ([A-Z])?
Link to tester

The issue with your current regex is that the ^ anchors it to the start of string, so it can only match a sequence of lower case letters followed by an upper case letter at the start of the string, and none of your strings have that.
One way to do what you want is to use the \G anchor, which forces the current match to start where the previous one ended. That can be used in an alternation with ^(?!=) which will match any string which doesn't start with an = sign, and then a negated character class ([^a-z]) to skip any non-lower case characters:
(?:^(?!=)|\G)[^a-z]*(([a-z]+)([A-Z]))
This will give the same capture groups as your original regex.
Demo on regex101

Another solution (may not be the most efficient but meets the task) would be (?:^=\s*\w*)|([a-z]+)([A-Z])
If the string begins with =, the first alternative greedily consumes the =, any whitespace and the word that follows. This happens in a non-capturing group, so it still counts as the full match but leaves nothing for the capture groups.
Regex101 Demo Link
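A small Python sketch of the alternation approach. Python's built-in re module has no \G anchor, so only this second pattern is shown; the function name lower_upper_pairs is made up for illustration. Matches that came from the "starts with =" branch have empty groups and are filtered out.

```python
import re

PATTERN = re.compile(r"(?:^=\s*\w*)|([a-z]+)([A-Z])")

def lower_upper_pairs(text):
    """Return (lowercase run, uppercase letter) pairs, skipping the '=' branch."""
    return [
        (m.group(1), m.group(2))
        for m in PATTERN.finditer(text)
        if m.group(2) is not None  # groups are None when the '=' branch matched
    ]

for s in ("TwoOrMoreDimensions", "LookLikeVectors", "RecentVersions",
          "= getColSums", "=getColSums"):
    print(repr(s), lower_upper_pairs(s))
```

This relies on \w* swallowing the rest of the string after the =, which holds for the two sample strings but would not for a string with several words after the =.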

Regex. Replace particular characters in string

In the following string I need to replace (with regex only) all _ with dots, except the ones that are surrounded by digits. So this:
_this_is_a_2_2_replacement_
Should become
.this.is.a.2_2.replacement.
I tried lots of things. This is how far I got:
([a-z]*(_)[a-z]*(_))*(?=\d_\d)...(_)\w*(_)
But it obviously doesn't work.

Try finding the following regex pattern:
(?<=\D)_|_(?=\D)
And then just replace that with a dot. The logic is that an underscore is replaced with a dot whenever it has a non-digit on at least one side. The regex pattern asserts precisely this:
(?<=\D)_ an underscore preceded by a non digit
| OR
_(?=\D) an underscore followed by a non digit
Demo
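For illustration, the same replacement in Python with re.sub. The function name underscores_to_dots is made up; the pattern is the one from the answer.

```python
import re

def underscores_to_dots(text):
    """Replace every underscore that has a non-digit on at least one side."""
    return re.sub(r"(?<=\D)_|_(?=\D)", ".", text)

print(underscores_to_dots("_this_is_a_2_2_replacement_"))
# .this.is.a.2_2.replacement.
```

An underscore at the very start or end of the string is only replaced when its single neighbour is a non-digit, because both look-arounds need an actual character to inspect.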

If you are using PCRE, you could assert a digit on the left of the underscore and match a digit after it, then make use of (*SKIP)(*FAIL).
In the replacement use a dot:
(?<=\d)_\d(*SKIP)(*FAIL)|_
(?<=\d) Positive lookbehind, assert what is on the left is a digit
\d(*SKIP)(*FAIL) Consume the digit which should not be part of the match result
| Or
_ Match a single underscore
Regex demo

How to create proper regular expression to find last character which I want to?

I need to create a regex that finds the last underscore in a string like 012344_2.0224.71_3 or 012354_5.00123.AR_3.335_8
I wanted to find the last part with the expression [^.]+$ and then find the underscore within that part, but I cannot get it to work.
I hope you can help me :)

Just use a negated character class [^_], which matches any character except an underscore (this ensures that no other underscore follows), and the end-of-string anchor $.
Pattern would look as such:
(_)[^_]*$
The final underscore _ is in a capturing group, so the underscore itself is available as a submatch. To replace only the underscore, replace Group 1.
See it live: Regex101
Notice the green highlighted portion on Regex101, this is your submatch and is what would be replaced.
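A sketch of "replace Group 1 only" in Python, where re.sub would replace the whole match rather than a single group. The function name and the replacement character are assumptions for illustration.

```python
import re

def replace_last_underscore(text, replacement):
    """Replace only the captured underscore (Group 1), keeping the tail."""
    m = re.search(r"(_)[^_]*$", text)
    if m is None:
        return text  # no underscore at all
    # m.start(1) and m.end(1) delimit Group 1, i.e. the last underscore.
    return text[:m.start(1)] + replacement + text[m.end(1):]

print(replace_last_underscore("012344_2.0224.71_3", "-"))
# 012344_2.0224.71-3
```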

The simplest solution I can imagine is using .*\K_, however not all regex flavours support \K.
If not, another idea would be to use _(?=[^_]*$)
You have a demo of the first and second option.
Explanation:
.*\K_: Matches any characters up to an underscore. Since the * quantifier is greedy, it will match up to the last underscore. Then \K discards what was matched so far, and only the underscore is kept as the match.
_(?=[^_]*$): Matches an underscore that is followed only by non-underscore characters up to the end of the line
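Python's built-in re module does not support \K, so the sketch below only covers the lookahead option. The function name last_underscore_index is made up for illustration.

```python
import re

def last_underscore_index(text):
    """Index of the last underscore, or -1 if there is none."""
    m = re.search(r"_(?=[^_]*$)", text)
    return m.start() if m else -1

for s in ("012344_2.0224.71_3", "012354_5.00123.AR_3.335_8"):
    idx = last_underscore_index(s)
    print(s, idx, s[idx:])
```

The result agrees with str.rfind("_"), which is a handy way to cross-check the pattern.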

If you want nothing but the "net" (i.e., nothing matched except the last underscore), use positive lookahead to check that no more underscores are in the string:
/_(?=[^_]*$)/gm
Demo

The pattern [^.]+$ matches any character except a dot 1+ times and then asserts the end of the string. That will give you the matches 71_3 and 335_8
What you want to match is an underscore when there are no more underscores following.
One way to do that is using a negative lookahead (?!.*_) if that is supported which asserts what is at the right does not match any character followed by an underscore
_(?!.*_)
Pattern demo
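To see both patterns side by side, here is a short Python sketch. The helper names are hypothetical; the patterns are the ones discussed above.

```python
import re

def last_dot_segment(text):
    """What the asker's [^.]+$ returns: the text after the last dot."""
    m = re.search(r"[^.]+$", text)
    return m.group(0) if m else ""

def split_at_last_underscore(text):
    """Split around the underscore that has no underscore after it."""
    m = re.search(r"_(?!.*_)", text)
    if m is None:
        return text, ""
    return text[:m.start()], text[m.end():]

s = "012344_2.0224.71_3"
print(last_dot_segment(s))          # 71_3
print(split_at_last_underscore(s))  # ('012344_2.0224.71', '3')
```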

how to get sub-string using regex if I specify start and end, without start characters?

I have string like this:
12abcc?p_auth=123ABC&ABC&s
The substring starts after "p_auth=" and ends at the first "&" symbol.
P.S. The '&' symbol and 'p_auth=' must not be included.
I have written this regex:
(p_auth).+?(?=&)
OK, that works well; it gets this substring:
p_auth=123ABC
But how do I get the string without 'p_auth='?

Use look-arounds:
(?<=p_auth=).*?(?=&)
See regex demo
The look-behind (?<=p_auth=) and the look-ahead (?=&) do not consume characters as they are zero-width assertions. They just check for the substring presence either before or after a certain subpattern.
A couple more words about (?<=p_auth=). It is a positive look-behind: positive because it requires the pattern inside it to appear on the left, before the "main" subpattern. If the look-behind subpattern is found, the result is just "true" and the regex goes on checking the rest of the subpatterns. If not, the match fails at that position and the engine goes on looking for another match at the next index.
Here is some description from regular-expressions.info:
It [the look-behind] tells the regex engine to temporarily step backwards in the string, to check if the text inside the lookbehind can be matched there. (?<!a)b matches a "b" that is not preceded by an "a", using negative lookbehind. It doesn't match cab, but matches the b (and only the b) in bed or debt. (?<=a)b (positive lookbehind) matches the b (and only the b) in cab, but does not match bed or debt.
In most cases, you do not really need look-arounds. In this case, you could just use
p_auth=(.*?)&
and get the first capturing group value.
The .*? pattern will look for any number of characters other than a newline, but as few as possible that are required to find a match. It is called lazy dot matching, because the ? symbol makes the * quantifier stop before the first symbol that is matched by the subsequent subpattern in the regular expression.
The .*& would match all the substring until the last & because * quantifier is greedy - it will consume as many characters it can match as possible.
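A short Python sketch comparing the look-around version, a capture-group version and the greedy variant on the asker's string. The function names are made up; the = is kept in the literal prefix so that it stays out of the captured value.

```python
import re

TEXT = "12abcc?p_auth=123ABC&ABC&s"

def with_lookarounds(text):
    """Whole match is the value; prefix and '&' are only asserted."""
    m = re.search(r"(?<=p_auth=).*?(?=&)", text)
    return m.group(0) if m else None

def with_group(text):
    """Lazy .*? stops at the first '&'; the value is Group 1."""
    m = re.search(r"p_auth=(.*?)&", text)
    return m.group(1) if m else None

def with_greedy_group(text):
    """Greedy .* runs to the last '&' instead."""
    m = re.search(r"p_auth=(.*)&", text)
    return m.group(1) if m else None

print(with_lookarounds(TEXT))   # 123ABC
print(with_group(TEXT))         # 123ABC
print(with_greedy_group(TEXT))  # 123ABC&ABC
```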
See more at Repetition with Star and Plus regular-expressions.info page.

p_auth=(.+?)(?=&)
Simply use this and grab group 1 (capture 1).

Regular expression to match non-integer values in a string

I want to match the following rules:
One dash is allowed at the start of a number.
Only values between 0 and 9 should be allowed.
I currently have the following regex pattern. I'm matching the inverse so that I can throw an exception upon finding a match that doesn't follow the rules:
[^-0-9]
The downside to this pattern is that it works for all cases except a hyphen in the middle of the String will still pass. For example:
"-2304923" is allowed correctly but "9234-342" is also allowed and shouldn't be.
Please let me know what I can do to specify the first character as [^-0-9] and the rest as [^0-9]. Thanks!

This regex will work for you:
^-?\d+$
Explanation: ^ anchors the start of the string, then comes an optional hyphen (-?), then the digit \d repeated one or more times (+), and the string must end there ($).
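A minimal validation sketch in Python. It uses re.fullmatch instead of explicit ^ and $ anchors, and the re.ASCII flag so that \d means 0-9 only; both choices are assumptions about the target environment, and the function name parse_integer is made up.

```python
import re

# Optional leading hyphen followed by one or more ASCII digits.
INTEGER_RE = re.compile(r"-?\d+", re.ASCII)

def parse_integer(text):
    """Return the integer value, or raise ValueError if the format is wrong."""
    if INTEGER_RE.fullmatch(text) is None:
        raise ValueError(f"not a valid integer: {text!r}")
    return int(text)

print(parse_integer("-2304923"))  # -2304923
try:
    parse_integer("9234-342")
except ValueError as exc:
    print(exc)
```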

You can do this:
(?:^|\s)(-?\d+)(?:["'\s]|$)
(?:^|\s) - non-capturing group for start of line or whitespace
(-?\d+) - captures the number
(?:["'\s]|$) - non-capturing group for end of line, whitespace or quote
See it work
This will capture all strings of numbers in a line with an optional hyphen in front.
-2304923" "9234-342" 1234 -1234
++++++++ captured
           ^^^^^^^^ NOT captured
                     ++++ captured
                          +++++ captured

I don't understand how your pattern - [^-0-9] is matching those strings you are talking about. That pattern is just the opposite of what you want. You have simply negated the character class by using caret(^) at the beginning. So, this pattern would match anything except the hyphen and the digits.
Anyway, for your requirement, you first need to match an optional hyphen at the beginning, so keep it outside the character class. Then, to match the digits that follow, you can use [0-9]+ or \d+.
So, your pattern to match the required format should be:
-?[0-9]+ // or -?\d+
The above regex finds the pattern inside a larger string. If you want the entire string to match this pattern, add anchors at both ends of the regex:
^-?[0-9]+$

For a regular expression like this, it's sometimes helpful to think of it in terms of two cases.
Is the first character messed up somehow?
If not, are any of the other characters messed up somehow?
Combine these with |
(^[^-0-9]|^.+?[^0-9])
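A sketch of this "match the inverse" approach in Python: a match means the string breaks the rules. The function name is_malformed is made up for illustration.

```python
import re

# Alternative 1: bad first character. Alternative 2: bad character later on.
INVALID_RE = re.compile(r"(^[^-0-9]|^.+?[^0-9])")

def is_malformed(text):
    """True if the inverse pattern finds an offending character."""
    return INVALID_RE.search(text) is not None

for s in ("-2304923", "9234-342", "x123", "12x"):
    print(repr(s), is_malformed(s))
```

Note that this pattern does not flag an empty string or a lone "-", so those cases would need a separate check if they should be rejected.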
