Regex to match characters to the right of a colon - regex

I'm stuck on a regex. I'm trying to match words in any language to the right of a colon without matching the colon itself.
The basic rule:
For a line to be valid, it must not begin with or contain any characters outside of [a-z0-9_] until after :.
Any characters to the right of : should match as long as the line begins with the set of characters defined above.
For instance, given a string such as these:
this string should not match
bob_1:Hi. I'm Bob. I speak русский and this string should match
alice:Hi Bob. I speak 한국어 and this string should also match
http://example.com - would prefer to not match URLs
This string:should not match because no spaces or capital letters are allowed left of the colon
Only 2 of the 5 strings above need to match. And only to the right of the colon.
Hi. I'm Bob. I speak русский and this string should match
Hi Bob. I speak 한국어 and this string should also match
I'm currently using (^[a-z0-9_]+(?=:)) to match characters to the left of :. I just can't seem to reverse the logic.
The closest I have at the moment is (?!(?!:)).+. This seems to match everything to right of the colon as well as the colon itself. I just can't figure out how to not include : in the match.
Can one of you regex wizards help me out? If anything is unclear please let me know.

Short regex pattern (case insensitive):
^\w+:(\w.*)
\w - matches any word character (equal to [a-zA-Z0-9_])
https://regex101.com/r/MZhqSL/6
As you marked pcre, here's the pattern you need (only to the right of the colon):
^\w+:\K\w.*
\K - resets the starting point of the reported match. Any previously consumed characters are no longer included in the final match
https://regex101.com/r/E1yHVY/1

You can use this regex:
^[a-z0-9_]+:\K(?!//).*
RegEx Demo
RegEx Breakup:
^: Start
[a-z0-9_]+: Match 1+ of [a-z0-9_] characters
:: Match a colon
\K: Reset matched info so far
(?!//): Negative lookahead to disallow // right after colon to avoid matching potential URLs
.*: Match anything until end
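For flavours without \K, the same idea can be sketched with a capturing group. The snippet below is only an illustration in Python, whose built-in re module does not support \K; the helper name right_of_colon is made up for this example, and group 1 stands in for the text that \K would leave as the match.

```python
import re

# Python's built-in re module has no \K, so group 1 captures the text
# to the right of the colon instead.
PATTERN = re.compile(r"^[a-z0-9_]+:(?!//)(.*)")

def right_of_colon(line):
    """Return the text after the colon, or None if the line is not valid."""
    m = PATTERN.match(line)
    return m.group(1) if m else None

lines = [
    "this string should not match",
    "bob_1:Hi. I'm Bob. I speak русский and this string should match",
    "alice:Hi Bob. I speak 한국어 and this string should also match",
    "http://example.com - would prefer to not match URLs",
    "This string:should not match because no spaces or capital letters are allowed left of the colon",
]

for line in lines:
    print(repr(right_of_colon(line)))
```

Only the bob_1 and alice lines should return text; the other three should return None.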

You can use the regex: ^.*?:(.*)$
^.*?: - from the beginning of the line, any character until the colon (non-greedy) included
(.*)$ - use a matching group to anything that follows it till the end of the line
Link to DEMO
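One thing worth checking with this pattern is that ^.*? accepts any characters before the colon, so it does not enforce the [a-z0-9_] rule from the question. A small Python check (the helper name capture_after_first_colon is hypothetical) suggests it also captures from the URL line and from the line with spaces and a capital letter:

```python
import re

# The pattern from the answer above: anything up to the first colon,
# then capture the rest of the line in group 1.
LOOSE = re.compile(r"^.*?:(.*)$")

def capture_after_first_colon(line):
    """Return group 1 of the loose pattern, or None when there is no colon."""
    m = LOOSE.match(line)
    return m.group(1) if m else None

print(capture_after_first_colon("http://example.com - would prefer to not match URLs"))
print(capture_after_first_colon("This string:should not match"))
print(capture_after_first_colon("this string should not match"))
```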

Related

Regex match last occurrence of substring among the same substrings in the string

For example we have a string:
asd/asd/asd/asd/1#s_
I need to match this part: /asd/1#s_ or asd/1#s_
How is it possible to do with plain regex?
I've tried a negative lookahead like this, but it didn't work:
\/(?:.(?!\/))?(asd)(\/(([\W\d\w]){1,})|)$
it matches this '/asd/asd/asd/asd/asd/asd/1#s_'
from this 'prefix/asd/asd/asd/asd/asd/asd/1#s_'
and I need to match '/asd/1#s_' without all preceding /asd/'s
Match should work with plain regex
Without any helper functions of any programming language
https://regexr.com/
I use this site to check if regex matches or not
here's the possible strings:
prefix/asd/asd/asd/1#s
prefix/asd/asd/asd/1s#
prefix/asd/asd/asd/s1#
prefix/asd/asd/asd/s#1
prefix/asd/asd/asd/#1s
prefix/asd/asd/asd/#s1
and asd part could be replaced with any word like
prefix/a1sd/a1sd/a1sd/1#s
prefix/a1sd/a1sd/a1sd/1s#
...
So I need to match last repeating part with everything to the right
And everything to the right could be character, not character, digit, in any order
A more complicated string example:
prefix/a1sd/a1sd/a1sd/1s#/ds/dsse/a1sd/22$$#!/123/321/asd
this should match that part:
/a1sd/22$$#!/123/321/asd
If you want the match only, you can use \K to reset the match buffer right before the parts that you want to match:
^.*\K/a\d?sd/\S+
The pattern will match
^ Start of string
.* Match any char except a newline until end of the line
\K Forget what is matched until now
/a\d?sd/ Match a, an optional digit, and sd between forward slashes
\S+ Match 1+ non whitespace chars
See a regex demo
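Where \K is not available, a capturing group after the greedy .* gives the same result. This is a rough Python sketch (the built-in re module has no \K); the function name last_repeated_tail is invented for illustration, and the pattern keeps the a\d?sd word from the answer above rather than handling an arbitrary repeated word.

```python
import re

# Greedy .* runs to the end of the line and backtracks, so group 1 starts
# at the LAST "/a<optional digit>sd/" that is followed by non-whitespace.
TAIL = re.compile(r"^.*(/a\d?sd/\S+)")

def last_repeated_tail(text):
    """Return the last /asd/... or /a1sd/... part with everything after it."""
    m = TAIL.match(text)
    return m.group(1) if m else None

print(last_repeated_tail("asd/asd/asd/asd/1#s_"))
print(last_repeated_tail("prefix/a1sd/a1sd/a1sd/1s#/ds/dsse/a1sd/22$$#!/123/321/asd"))
```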

Regex - Discard the entire string if any part of the string doesn't match the pattern

I have a comma-separated string which I want to validate using a regex. What I have written still gives me a match if a part later in the string is wrong. I want to discard it completely if any part is wrong.
My regex : ^(?:[\w\.]+,{1}(?:STR|INT|REAL){1},{1}(\s*|$))+
Positive Case : Component,STR,YoungGenUse,STR,YoungGenMax,STR,OldGenUse,INT,OldGenMax,INT,PermGenUse,INT,PermGenMax,INT,MajCollCnt,INT,MinCollDur,REAL,MinCollCnt,INT,
Negative Case :
Component,STR,YoungGenUse,STR,YoungGenMax,TEST,OldGenUse,INT,OldGenMax,INT,PermGenUse,INT,PermGenMax,INT,MajCollCnt,INT,MinCollDur,REAL,MinCollCnt,INT,
For the second case, my regex gives a match for the portion before the error, even though there is an incorrect part (TEST) later. How can I modify my regex to discard the entire string?
The pattern that you tried would not match TEST in YoungGenMax,TEST because the alternatives STR|INT|REAL do not match it.
It would show until the last successful match in the repetition which would be Component,STR,YoungGenUse,STR,
You have to add the anchor at the end, outside of the repetition of the group, to indicate that the whole pattern should be followed by asserting the end of the string.
There are no spaces or dots in your string, so you might leave out \s* and use \w+ without the dot in the character class. Note that \s could also possibly match a newline.
^(?:\w+,(?:STR|INT|REAL),)+$
Regex demo
If you want to keep matching optional whitespace chars and the dot:
^(?:[\w.]+,(?:STR|INT|REAL),\s*)+$
Regex demo
Note that by repeating the group with the comma at the end, the string should always end with a comma. You can omit {1} from the pattern as it is superfluous.
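The effect of the end anchor can be seen in a short Python sketch. It assumes Python's re module and uses re.fullmatch in place of ^...$, which is a slightly stricter choice because $ in Python also matches just before a trailing newline; the names is_valid and partial_match are hypothetical.

```python
import re

# One "name,TYPE," pair, repeated one or more times.
PAIRS = r"(?:\w+,(?:STR|INT|REAL),)+"

def is_valid(text):
    """True only when the whole string consists of valid pairs."""
    return re.fullmatch(PAIRS, text) is not None

def partial_match(text):
    """What the pattern matches without an end anchor (prefix only)."""
    m = re.match(PAIRS, text)
    return m.group(0) if m else None

good = "Component,STR,YoungGenUse,STR,YoungGenMax,STR,OldGenUse,INT,"
bad = "Component,STR,YoungGenUse,STR,YoungGenMax,TEST,OldGenUse,INT,"

print(is_valid(good), is_valid(bad))
print(partial_match(bad))  # the prefix that matched before TEST
```

Without the anchor, the bad string still yields a match for the pairs that precede TEST, which is the behaviour described in the question.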
Your regex must keep matching until the end of the string, so you must use $ to indicate the end of the line:
^(?:[\w.]+,{1}(?:STR|INT|REAL){1},{1}(\s*|$))+$
Regex Demo

Match all characters after the last instance of a string in regex

I am looking to capture all characters after the last instance of a string in regex.
The string (that which we're searching after the last instance of) is as follows, sans quotes: " - ", or \b\s\-\s\b: boundary(whitespace character, preceded by -, preceded by whitespace character).
Test string as follows:
One Thing - Two Things - Three Things - Four Things
Desired match:
Four Things
This regex only matches everything after the first instance of the string:
(?<=\b\s\-\s\b)(.*)$
(Returns, sans quotes: "Two Things - Three Things - Four Things")
Whereas this matches everything after the last single character -:
[^\-]+$
(Returns, sans quotes: " Four Things")
Thoughts?
Try using a positive lookbehind then negating on the - delimiter and taking the last result
(?<=- )[^-]+$
https://regex101.com/r/sMX9FC/1
I think you could get your match without using lookarounds.
You could match any char except a newline from the start of the string followed by matching your pattern. That will match the last instance.
Then capture in a group matching 0+ times any char except a newline until the end of the string.
^.*\b\s\-\s\b(.*)$
^ Start of string
.* Match any char except a newline
\b\s\-\s\b Match your pattern
(.*) Capture in group 1 matching 0+ times any char except a newline
$ End of string
Regex demo
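As a quick check of the capture-group approach, here is a minimal Python sketch. The function name after_last_separator is made up for this example, and the hyphen is left unescaped because it needs no escaping outside a character class.

```python
import re

# Greedy .* consumes as much as possible, so the " - " that finally
# matches is the last one; group 1 holds everything after it.
LAST_PART = re.compile(r"^.*\b\s-\s\b(.*)$")

def after_last_separator(text):
    """Return the text after the last ' - ', or None if there is none."""
    m = LAST_PART.match(text)
    return m.group(1) if m else None

print(after_last_separator("One Thing - Two Things - Three Things - Four Things"))
```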
There is no tool or programming language listed, but if \K is supported to forget what was matched, you might also use:
^.*\b\s\-\s\b\K.*$
Regex demo
This matches the end of a string, everything that is not a - after a -.
-\s*([^-]+)$
It's the simplest regex I could think of.
.*(?<=\b\s\-\s\b)(.*)$, or putting a .* before your current regex should achieve what you're after, since that's a greedy match by default.

How to create proper regular expression to find last character which I want to?

I need to create a regex to find the last underscore in a string like 012344_2.0224.71_3 or 012354_5.00123.AR_3.335_8
I wanted to find the last part with the expression [^.]+$ and then find the underscore in the found element, but I cannot get it to work.
I hope you can help me :)
Just use a negated character class [^_] that will match everything except an underscore (this helps to ensure no other underscores are found afterwards) and end of string $
Pattern would look as such:
(_)[^_]*$
The final underscore _ is in a capturing group, so you want the submatch rather than the whole match. You would replace group 1 (your underscore).
See it live: Regex101
Notice the green highlighted portion on Regex101, this is your submatch and is what would be replaced.
The simplest solution I can imagine is using .*\K_, however not all regex flavours support \K.
If not, another idea would be to use _(?=[^_]*$)
You have a demo of the first and second option.
Explanation:
.*\K_: Fetches any character until an underscore. Since the * quantifier is greedy, It will match until the last underscore. Then \K discards the previous match and then we match the underscore.
_(?=[^_]*$): Fetch an underscore followed by non-underscore characters until the end of the line
If you want nothing but the "net" (i.e., nothing matched except the last underscore), use positive lookahead to check that no more underscores are in the string:
/_(?=[^_]*$)/gm
Demo
The pattern [^.]+$ matches not a dot 1+ times and then asserts the end of the string. That will give you the matches 71_3 and 335_8
What you want to match is an underscore when there are no more underscores following.
One way to do that is using a negative lookahead (?!.*_) if that is supported which asserts what is at the right does not match any character followed by an underscore
_(?!.*_)
Pattern demo
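The lookahead variants work in flavours without \K. Below is a small Python sketch under that assumption; the helper names last_underscore_index and replace_last_underscore are hypothetical, and the "-" replacement is only there to make the matched position visible.

```python
import re

# An underscore that has no further underscore before the end of the string.
LAST_UNDERSCORE = re.compile(r"_(?=[^_]*$)")

def last_underscore_index(text):
    """Index of the last underscore, or -1 when there is none."""
    m = LAST_UNDERSCORE.search(text)
    return m.start() if m else -1

def replace_last_underscore(text, replacement):
    """Replace only the last underscore; earlier ones are left alone."""
    return LAST_UNDERSCORE.sub(replacement, text)

for s in ("012344_2.0224.71_3", "012354_5.00123.AR_3.335_8"):
    print(last_underscore_index(s), replace_last_underscore(s, "-"))
```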

regex for a whole word containing dots within a sentence

I am looking for a regular expression to catch a whole word or expression within a sentence that contains dots:
this is an example test.abc.123 for what I am looking for
In this case i want to catch "test.abc.123"
I tried with this regex:
(.*)(\b.+\..++\b)(.*)
(.*) some signs or not
(\b.+\..++\b) a word containing some signs followed by at least one dot that is followed by some signs and this at least once
(.*) some more signs or not
but it gets me: "abc.123 for what I am looking for"
I see that I got something completely wrong, can anyone enlighten me?
If you need to match part of a string you don't need to match entire string (unless you are restricted by a functionality).
Your regex is too greedy. It also has dots everywhere (.+ is not a good choice most of the time). It doesn't have a precise point to start and finish either. You only need:
\w+(?:\.+\w+)+
It looks for strings that begin and end with word characters and contain at least a period. See live demo here
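A minimal Python sketch of this pattern, run against the sentence from the question. The variable names are invented for the example; findall returns whole matches here because the only group is non-capturing.

```python
import re

# Word characters, then one or more ".word" parts (dots may repeat).
DOTTED_WORD = re.compile(r"\w+(?:\.+\w+)+")

sentence = "this is an example test.abc.123 for what I am looking for"
matches = DOTTED_WORD.findall(sentence)
print(matches)
```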
This regex pattern matches strings with two or more dots:
.*\..*\..*
"." matches any character except line-breaks
"*" repeats previous tokens 0 or more times
"\." matches a single dot, the backslash is used for escape
.* Match any character and continue matching until next token
test.abc.123
\. Match a single dot
test. abc.123
.* Again, any character and continue matching until next token
test.example.com
\. Matches a single dot
test.example. com
.* Matches any character and continue matching until next token
test.example.com
Try this pattern: (?=\w+\.{1,})[^ ]+.
Details: (?=\w+\.{1,}) - positive lookahead to locate starting of a word with at least one dot (.). Then, start matching from that position, until space with this pattern [^ ]+.
Demo