Regex. Replace particular characters in string

In the following string I need to replace (with regex only) every _ with a dot, except those that are surrounded by digits. So this:
_this_is_a_2_2_replacement_
Should become
.this.is.a.2_2.replacement.
I tried lots of things. This is how far I got:
([a-z]*(_)[a-z]*(_))*(?=\d_\d)...(_)\w*(_)
But it obviously doesn't work.

Try finding the following regex pattern:
(?<=\D)_|_(?=\D)
And then just replace each match with a dot. The logic here is that an underscore gets replaced whenever it has a non-digit on at least one side. The regex pattern I used here asserts precisely this:
(?<=\D)_ an underscore preceded by a non digit
| OR
_(?=\D) an underscore followed by a non digit
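As a quick check of the pattern above, here is a minimal Python sketch. The function name dots_for_underscores is made up for illustration. One thing worth noting: \D needs an actual character to match, so an underscore that sits between a digit and the start or end of the string is left alone by this pattern.

```python
import re

# An underscore is replaced when a non-digit stands directly before it
# or directly after it.
PATTERN = re.compile(r"(?<=\D)_|_(?=\D)")

def dots_for_underscores(text):
    """Replace every matching underscore with a dot (illustrative helper)."""
    return PATTERN.sub(".", text)

print(dots_for_underscores("_this_is_a_2_2_replacement_"))
# → .this.is.a.2_2.replacement.

# Edge case: no character on one side and a digit on the other, so no match.
print(dots_for_underscores("_2x"))
# → _2x
```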

If you are using PCRE, you could assert a digit on the left of the underscore and match a digit after it. Then make use of (*SKIP)(*FAIL) to skip that match.
In the replacement use a dot:
(?<=\d)_\d(*SKIP)(*FAIL)|_
(?<=\d) Positive lookbehind, assert what is on the left is a digit
\d(*SKIP)(*FAIL) Consume the digit which should not be part of the match result
| Or
_ Match a single underscore
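Python's built-in re module does not support the (*SKIP)(*FAIL) verbs, so the following is only a rough emulation of the same idea, not the PCRE pattern itself. It matches the digit-surrounded case first and leaves it untouched, and captures every other underscore in group 1. The function name replace_outside_digits is an assumption for this sketch.

```python
import re

# First alternative: an underscore between two digits (to be kept as-is).
# Second alternative: any other underscore, captured in group 1.
PATTERN = re.compile(r"(?<=\d)_\d|(_)")

def replace_outside_digits(text):
    """Emulate the skip/fail idea with a replacement callback."""
    def repl(match):
        # Group 1 is only set when the second alternative matched.
        return "." if match.group(1) else match.group(0)
    return PATTERN.sub(repl, text)

print(replace_outside_digits("_this_is_a_2_2_replacement_"))
# → .this.is.a.2_2.replacement.
```

Unlike a lookaround-only pattern, this version also replaces an underscore that sits between a digit and the start or end of the string.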

Related

Regex match last occurrence of substring among the same substrings in the string

For example we have a string:
asd/asd/asd/asd/1#s_
I need to match this part: /asd/1#s_ or asd/1#s_
How is it possible to do with plain regex?
I've tried a negative lookahead like this, but it didn't work:
\/(?:.(?!\/))?(asd)(\/(([\W\d\w]){1,})|)$
it matches this '/asd/asd/asd/asd/asd/asd/1#s_'
from this 'prefix/asd/asd/asd/asd/asd/asd/1#s_'
and I need to match '/asd/1#s_' without all preceding /asd/'s
Match should work with plain regex
Without any helper functions of any programming language
https://regexr.com/
I use this site to check if regex matches or not
here's the possible strings:
prefix/asd/asd/asd/1#s
prefix/asd/asd/asd/1s#
prefix/asd/asd/asd/s1#
prefix/asd/asd/asd/s#1
prefix/asd/asd/asd/#1s
prefix/asd/asd/asd/#s1
and asd part could be replaced with any word like
prefix/a1sd/a1sd/a1sd/1#s
prefix/a1sd/a1sd/a1sd/1s#
...
So I need to match the last repeating part together with everything to the right of it.
Everything to the right can be word characters, non-word characters, and digits, in any order.
A more complicated string example:
prefix/a1sd/a1sd/a1sd/1s#/ds/dsse/a1sd/22$$#!/123/321/asd
this should match that part:
/a1sd/22$$#!/123/321/asd
If you want the match only, you can use \K to reset the match buffer right before the parts that you want to match:
^.*\K/a\d?sd/\S+
The pattern will match
^ Start of string
.* Match any char except a newline until end of the line
\K Forget what is matched until now
/a\d?sd/ Match a, an optional digit and sd between forward slashes
\S+ Match 1+ non whitespace chars
See a regex demo
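Python's built-in re module has no \K, so a sketch in Python has to fall back on a capture group instead. The greedy .* still does the work of pushing the match to the last occurrence. The function name last_repeated_part is hypothetical, and the hard-coded a\d?sd is taken from the pattern above; it does not generalise to arbitrary words.

```python
import re

# Greedy .* consumes as much as possible, then backtracks to the last
# position where /a\d?sd/ followed by non-whitespace can still match.
PATTERN = re.compile(r"^.*(/a\d?sd/\S+)")

def last_repeated_part(text):
    """Return the last '/asd/...' style part, or None if there is none."""
    match = PATTERN.search(text)
    return match.group(1) if match else None

print(last_repeated_part("asd/asd/asd/asd/1#s_"))
# → /asd/1#s_
print(last_repeated_part("prefix/a1sd/a1sd/a1sd/1s#/ds/dsse/a1sd/22$$#!/123/321/asd"))
# → /a1sd/22$$#!/123/321/asd
```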

This regex to match a word surrounded by {} does not work

So here's my regex to match a word after "define" or "define:"
((?<=define |define: )\w+)
That part works well. But when I add the part where it should also match a word between {} if it can, it matches everything.
((?<=define |define: )\w+)|([^{][A-Z]+[^}])
The regex with the examples
What I noticed is that when I add ^ to the first [{], it ruins everything, and I don't understand why.
Why does using [^{] not work?
By using [^{], your regex becomes:
[^{][A-Z]+[^}]
In words, this translates to:
character that's not a {
a bunch of letters
character that's not a }
Note how nothing in your regex enforces the idea that the "a bunch of letters" part has to be between {}s. It just says that it has to be after a character that is not {, and before a character that is not }. By this logic, even something like ABC would match because A is not {, B is the bunch of letters, and C is not }.
How to match a word between {}?
You can use this regex:
{([A-Z]+)}
And get group 1.
I don't think that you should combine this with the regex that matches a word after define. You should use 2 separate regexes because these are two completely different things.
So split it into two regexes:
(?<=define |define: )\w+
and
{([A-Z]+)}
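A small Python sketch of the two separate regexes, under a couple of assumptions: Python's re module only accepts fixed-width lookbehinds, so (?<=define |define: ) is split into two lookbehinds here, and the braces are escaped for clarity. The sample text is made up for illustration.

```python
import re

# Word after "define " or "define: ". Two lookbehinds are used because
# Python's re rejects a single lookbehind with alternatives of different width.
AFTER_DEFINE = re.compile(r"(?:(?<=define )|(?<=define: ))\w+")

# Upper-case word between literal braces; the word itself is group 1.
IN_BRACES = re.compile(r"\{([A-Z]+)\}")

# The pattern from the question, kept only to show why it over-matches.
FAULTY = re.compile(r"[^{][A-Z]+[^}]")

text = "define foo, define: bar and {BAZ}"
print(AFTER_DEFINE.findall(text))   # → ['foo', 'bar']
print(IN_BRACES.findall(text))      # → ['BAZ']
print(FAULTY.search("ABC").group()) # → ABC
```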
You are using negated character classes the way we would use positive lookbehind (?<=) and positive lookahead (?=). They are fundamentally different and, as opposed to lookbehind or lookahead, character classes consume characters.
Hence:
[^{][A-Z] matches a capital letter that is preceded by a character other than {.
[A-Z][^}] matches a capital letter that is followed by a character other than }.
So if you try to match the letters in {OO} with the regex [^{][A-Z]+[^}], it is totally normal that your regex won't match anything because you have two letters, one preceded by a {, the other followed by a }.
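To make the difference between consuming and asserting concrete, here is a short Python comparison. The lookaround pattern is one possible way to write what the answer describes, not something taken from the question.

```python
import re

# Negated classes consume one character on each side of the letters.
CONSUMING = re.compile(r"[^{][A-Z]+[^}]")

# Lookarounds only assert the braces; they are not part of the match.
LOOKAROUND = re.compile(r"(?<=\{)[A-Z]+(?=\})")

print(CONSUMING.search("{OO}"))            # → None
print(LOOKAROUND.search("{OO}").group())   # → OO

# The neighbours become part of the match when they are consumed.
print(CONSUMING.search("xABCy").group())   # → xABCy
```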

How to create proper regular expression to find last character which I want to?

I need to create a regex to find the last underscore in a string like 012344_2.0224.71_3 or 012354_5.00123.AR_3.335_8
I wanted to find the last part with the expression [^.]+$ and then find the underscore in the matched part, but I can't get it to work.
I hope you can help me :)
Just use a negated character class [^_], which matches any character except an underscore (this ensures no other underscores are found afterwards), followed by the end of string $.
Pattern would look as such:
(_)[^_]*$
The final underscore _ is in a capturing group, so the submatch you want is group 1. You would replace group 1 (your underscore).
See it live: Regex101
Notice the green highlighted portion on Regex101, this is your submatch and is what would be replaced.
The simplest solution I can imagine is using .*\K_, however not all regex flavours support \K.
If not, another idea would be to use _(?=[^_]*$)
You have a demo of the first and second option.
Explanation:
.*\K_: Matches any characters up to an underscore. Since the * quantifier is greedy, it will match up to the last underscore. Then \K discards what has been matched so far, and then we match the underscore.
_(?=[^_]*$): Matches an underscore followed only by non-underscore characters until the end of the line.
If you want nothing but the "net" (i.e., nothing matched except the last underscore), use positive lookahead to check that no more underscores are in the string:
/_(?=[^_]*$)/gm
The pattern [^.]+$ matches any character except a dot 1+ times and then asserts the end of the string. That will give you the matches 71_3 and 335_8.
What you want to match is an underscore when there are no more underscores following.
One way to do that is using a negative lookahead (?!.*_), if that is supported, which asserts that what is on the right is not any number of characters followed by an underscore:
_(?!.*_)
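The lookahead and capture-group variants from this thread all work in Python's re module (the \K variant does not), so they can be compared side by side. This is a sketch on the two sample strings from the question; the helper name last_underscore_positions is invented here, and str.rfind serves as the reference answer.

```python
import re

POSITIVE = re.compile(r"_(?=[^_]*$)")  # underscore followed only by non-underscores
NEGATIVE = re.compile(r"_(?!.*_)")     # underscore with no later underscore
CAPTURE = re.compile(r"(_)[^_]*$")     # underscore in group 1, rest consumed

def last_underscore_positions(text):
    """Return the index found by each pattern (None if there is no match)."""
    found = []
    for pattern in (POSITIVE, NEGATIVE, CAPTURE):
        match = pattern.search(text)
        found.append(match.start() if match else None)
    return found

samples = ["012344_2.0224.71_3", "012354_5.00123.AR_3.335_8"]
for sample in samples:
    # Each pattern should agree with the plain string search.
    print(sample.rfind("_"), last_underscore_positions(sample))

# Replacing only the last underscore:
print(POSITIVE.sub("-", samples[0]))
# → 012344_2.0224.71-3
```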

RegEx negative lookahead on pattern

I want to find all expressions that don't end with ":"
I tried to do it like that:
[a-z]{2,}(?!:)
On this text:
foobar foobaz:
foobaz
foobaz:
The problem is that it just takes away the last character before the ":" and not the whole match.
Here is the example: https://regex101.com/r/jtLRvz/1
How can I get the negative lookahead work for the whole regular expression?
When [a-z]{2,}(?!:) is tried against foobaz:, [a-z]{2,} first grabs all the lowercase ASCII letters at once (foobaz) and the negative lookahead (?!:) checks the char immediately to the right. It is :, so the lookahead fails and the engine looks for another way to match. Since {2,} only needs two chars and currently holds more, it backtracks by one letter to fooba; the next char is now z, not :, so the lookahead succeeds and a valid match is found.
Add a-z to the lookahead pattern to make sure the char right after 2 or more lowercase ASCII letters is not a letter and not a colon:
[a-z]{2,}(?![a-z:])
             ^^^
See the regex demo
If your regex engine supports possessive modifiers, or atomic groups, you may use them to prevent backtracking into the [a-z]{2,} subpattern:
[a-z]{2,}+(?!:)
(?>[a-z]{2,})(?!:)
See another regex demo.
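The backtracking described above can be observed directly in Python on the sample text from the question. This sketch only covers the original pattern and the [a-z:] fix; possessive quantifiers and atomic groups are left out because Python's re module accepts them only from version 3.11 onwards.

```python
import re

text = "foobar foobaz:\nfoobaz\nfoobaz:"

# Original attempt: the quantifier gives back one letter so that
# the lookahead sees 'z' instead of ':'.
NAIVE = re.compile(r"[a-z]{2,}(?!:)")

# Fix: the next char must be neither a letter nor a colon, so giving
# back letters never helps.
FIXED = re.compile(r"[a-z]{2,}(?![a-z:])")

naive_matches = NAIVE.findall(text)
fixed_matches = FIXED.findall(text)

print(naive_matches)  # → ['foobar', 'fooba', 'foobaz', 'fooba']
print(fixed_matches)  # → ['foobar', 'foobaz']
```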

Regex pattern to match string that's not followed by a colon

Using regex, I'm trying to match any string of characters that meets the following conditions (in the order displayed):
Contains a dollar sign $; then
at least one letter [a-zA-Z]; then
zero or more letters, numbers, underscores, periods (dots), opening brackets, and/or closing brackets [a-zA-Z0-9_.\[\]]*; then
one pipe character |; then
one hash sign #; then
at least one letter [a-zA-Z]; then
zero or more letters, numbers, and/or underscores [a-zA-Z0-9_]*; then
zero colons :
In other words, if a colon is found at the end of the string, then it should not count as a match.
Here are some examples of valid matches:
$tmp1|#hello
$x2.h|#hi_th3re
Valid match$here|#in_the middle of other characters
And here are some examples of invalid matches:
$tmp2|#not_a_match:"because there is a colon"
$c.4a|#also_no_match:
Here are some of the patterns I've tried:
(\$[a-zA-Z])([a-zA-Z0-9_.\[\]]*)(\|#)([a-zA-Z][a-zA-Z0-9_]*(?!.[:]))
(\$[a-zA-Z])([a-zA-Z0-9_.\[\]]+)?(\|#)([a-zA-Z][a-zA-Z0-9_]*(?![:]))
(\$[a-zA-Z])([a-zA-Z0-9_.\[\]]+)?(\|#)([a-zA-Z][a-zA-Z0-9_]*)([^:])
This pattern will do what you need:
\$[A-Za-z]+[\w.\[\]]*[|]#[A-Za-z]+[\w]*+(?!:)
I am using possessive quantifiers to cut down the backtracking using [\w]*+. You can also use atomic groups instead of possessive quantifiers like
\$[A-Za-z]+[\w.\[\]]*[|]#[A-Za-z]+(?>[\w]*)(?!:)
NOTE
\w => [A-Za-z0-9_]
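For Python, where possessive quantifiers and atomic groups are only available from version 3.11, one alternative is to put \w into the negative lookahead so that backtracking cannot produce a shorter match. This is a sketch of that substitute technique, not the pattern above; re.ASCII is used so that \w means [A-Za-z0-9_], and the helper name find_token is hypothetical.

```python
import re

# (?![\w:]) fails if the next char is a word char or a colon, so the
# engine cannot "give back" letters to get around the colon check.
PATTERN = re.compile(r"\$[A-Za-z]+[\w.\[\]]*[|]#[A-Za-z]+\w*(?![\w:])", re.ASCII)

def find_token(text):
    """Return the matched token, or None when the string is not valid."""
    match = PATTERN.search(text)
    return match.group(0) if match else None

print(find_token("$tmp1|#hello"))
# → $tmp1|#hello
print(find_token("Valid match$here|#in_the middle of other characters"))
# → $here|#in_the
print(find_token("$c.4a|#also_no_match:"))
# → None
```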
I tested your third pattern on Regex101, and with anchors added it matches the valid examples:
^.*(\$[a-zA-Z])([a-zA-Z0-9_.\[\]]+)?(\|#)([a-zA-Z][a-zA-Z0-9_]*)([^:]).*$
The only change I made was to add the anchors ^ and $ to the start and end of the regex, with .* on either side so that your pattern can occur as a substring in the middle of a larger string. Be aware that ([^:]) consumes a character: the preceding [a-zA-Z0-9_]* can backtrack and hand its last letter to ([^:]), so an input such as $c.4a|#also_no_match: is still matched.
By the way, you had the following example as a string which should not match:
$tmp2|#not_a_match:"because there is a colon"
However, even if we remove the colon from this string it will still not match because it contains quotes which are not allowed.