Regular expression using non-greedy matching -- confusing result

Regular expression using non-greedy matching -- confusing result - regex

I thought I understood how the non-greedy modifier works, but am confused by the following result:
Regular Expression: (,\S+?)_sys$
Test String: abc,def,ghi,jkl_sys
Desired result: ,jkl_sys <- last field including comma
Actual result: ,def,ghi,jkl_sys
Use case is that I have a comma separated string whose last field will end in "_sys" (e.g. ,sometext_sys). I want to match only the last field and only if it ends with _sys.
I am using the non-greedy (?) modifier to return the shortest possible match (only the last field including the comma), but it returns all but the first field (i.e. the longest match).
What am I missing?
I used https://regex101.com/ to test, in case you want to see a live example.

You can use
,[^,]+_sys$
The pattern matches:
, Match the last comma
[^,]+ Match 1 + occurrences of any char except ,
_sys Match literally
$ End of string
See a regex demo.
If you don't want to match newlines and whitespaces:
,[^\s,]+_sys$

It sounds like you're looking for the a string that ends with "_sys" and it has to be at the end of the source string, and it has to be preceded by a comma.
,\s*(\w+_sys)$
I added the \s* to allow for optional whitespace after the comma.
No non-greedy modifiers necessary.
The parens are around \w+_sys so you can capture just that string, without the comma and optional whitespace.

Related

Regex expression to ignore first and last character

So I am trying to make a regex match for strings of the form:
"catalog.schema.'tablename'" .
The output I am looking for is just catalog.schema.'tablename' leaving out the quotes at the end position.
Can anyone help me out
I tried to do it with the expression
/(?!^|.$)+[^\s]/ which leaves out the end quotes but matches each character.
So I modified it to /(?!^|.$)+[^\s]+/g . This matches the whole sentence but doesn't ignore the end quote.

Depends on the data arround your string and quotationmarks may be within the string.
Why not just this: "(.*?)"
https://regex101.com/r/oaS8o0/1
To answer the question in the title you might simply use:
^.(.*)?.$
https://regex101.com/r/FxJgtW/1

You can just use
(?<=.).+(?=.)
Or, if you cannot use lookbehind:
(?!^).+(?!$)
See the regex demo #1 and regex demo #2.
Since . matches any char other than line break chars, the patterns just match any strings without their start and end chars.

If you don't want to match the first and the last character, you can just use a capture group instead of lookarounds and use the group 1 value.
The first . matches the first of (any) characters, the (.+) is a capture group that matches 1 or more characters, and the . at the end matches the last character of the string.
.(.+).
Regex demo
Or to get the text between the double quotes at the start and the end of the string using a negated character class and a capture group:
^"([^"]+)"$
Regex demo

Notepad++ regex extract two options

I've a list below:
7080508136242611718:7080508978035787525:7549dda86ba9af19:31050:install_id=7080508978035787525; store-country-code=us; store-idc=useast5; ttreq=1$fd2f36282a10633c5638a02cc54c19ff13f60755; passport_csrf_token=13bf74c4e5fe04307f0a99de9aed53f9; passport_csrf_token_default=13bf74c4e5fe04307f0a99de9aed53f9; odin_tt=11ed1b48fba2d7a9fe3d86929b3d52cebbad0ca7f7dbd127e220cfb3be279621ba04487517b536050a6ded9fbe50e300cd11615e2e9551523478e5484896a9dda800e55e428842872fcf862e8c57d439:1648559503:351451268482810:3f:49:8c:b7:8c:cb:c5379d41-6cf3-4152-9d48-7aa45f7f611c:79375640-197c-4aaa-86cf-4ef8e7238be2:1:AgICAw0AFockF-RPsNA-7qeIMtk5-CKdkW2eP4TZYMDY7A
7080507996291827206:7080508977079666438:6742591cc0d20580:31050:install_id=7080508977079666438; store-country-code=us; store-idc=useast5; ttreq=1$a119611bfe79541b0b4c029fe910b6507123eec2; passport_csrf_token=fb42bbd472462c17f45acb531deb057a; passport_csrf_token_default=fb42bbd472462c17f45acb531deb057a; odin_tt=6c3b06ff01fd67f42e3dccb60a1e69ca67cb8654f49662017acc209f7176517bcd13a374311f7a1b3538e6407fb237267abf43578d3180d8c834e7df886fa4377a9b950dbb6ff146e3fabf37158dcfa8:1648559508:351451233766930:dd:9e:82:59:5f:7f:596da881-89e8-4f60-b644-5fef23f0a422:f04adc87-56de-4191-a25f-843bec1d5818:1:AgICAw0AFockF-RPsNA-7qeIMtk5-CKdsYPWv4TZYMDY7A
7080509102451394054:7080509820378072837:e36dc9aceecfc1cc:31050:install_id=7080509820378072837; store-country-code=us; store-idc=useast5; ttreq=1$d94700921d5ee2b21992910a2a4e84dd0ade1ec8; passport_csrf_token=2d4f4eca772dbfcbb37548ff02da3166; passport_csrf_token_default=2d4f4eca772dbfcbb37548ff02da3166; odin_tt=53d6999ebe29c0d5144a9669331ce3307a290891370914dabadbfa0520114e6e76b9103c9a6db5476e139251ee478f3a305577a89e3fa07288b7aca00774d3fccbd03566687dbcfdce31700065295939:1648559700:351451299637010:71:de:41:2b:ad:b4:1eba1ae9-3216-40e1-be7f-00303e524c27:2713cbd3-7a4f-493e-b76f-ac6d56ab8045:5:AgMNAgIAhyQWF-RPsNA-7qeIMtk5-CKcsBcWP4TZYMDY7w
7080509086894851590:7080509909225604870:98be64e38551984d:31050:install_id=7080509909225604870; store-country-code=us; store-idc=useast5; ttreq=1$05929375d8605739d8ebdbb5ce15eb406da5c467; passport_csrf_token=c95c71ad206a1d371e5b67505ae25be8; passport_csrf_token_default=c95c71ad206a1d371e5b67505ae25be8; odin_tt=6ddaa02f6133e61a4c591ef2a872f0ec2339d8b6a3fc480575fe279b13ded615e1fa7de979e18565f3ac8b8229a19a98bdf79aa1804071dcc025e1a4cd5314522cf40a62ca961770baea1d5d653d6d64:1648559720:351451292934660:9d:cf:c3:92:f6:f5:787dfb42-f4bf-43fa-9c64-ded19a1b1660:366c3024-217d-4f85-90dd-d95a0fd3e296:4:AgICAw0AFockF-RPsNA-7qeIMtk5-CKcs7bUP4TZYMDY7w
7080509183397299718:7080509974838085382:f39db5d314071713:31050:install_id=7080509974838085382; store-country-code=us; store-idc=useast5; ttreq=1$561ee2083cb13f0849a9f09e7f89edfe08c7ce6c; passport_csrf_token=721a8fee6f4f97c16ed1923ad3bbc72d; passport_csrf_token_default=721a8fee6f4f97c16ed1923ad3bbc72d;
I'd like to extract first two options aka below:
7080508136242611718:7080508978035787525
7080507996291827206:7080508977079666438
7080509102451394054:7080509820378072837
7080509086894851590:7080509909225604870
7080509183397299718:7080509974838085382
I've tried: *.: but its remove the reset of text. and keeps only first.
I've tried ^.*[0-9]+.*$ to get the second one. but no success.
Hopefully somebody can help me with accurate regex.
Thank you in advance.

This pattern *.: by itself is not a valid regex, and this pattern ^.*[0-9]+.*$ matches the whole string with at least a single digit.
If you want to match the digits and : you could make use of \K to forget what is matched so far and then match the rest of the line.
In the replacement use an empty string.
^\d+:\d+\K.*
^ Start of string
\d+:\d+ Match 1+ digits with : in between
\K.* Clear the current match, and match the rest of the line
Regex demo

^[^:]*:[^:]*\K.*
When matching things with delimiters I will use a negated character set to match the contents. In this case, the delimiter is a colon, so I want to match everything that isn't a colon until there's a colon. Then I want to match everything that isn't a colon. This will match everything up until the second colon. Because I want to keep what I just matched, I am using .* after \K, which resets the match at that point and matches everything else.
That pattern can be replaced with nothing, and the result is the first two columns of each line left.

You can use
Find: ^(\d+:\d+).*
Replace: $1
See this regex demo online.
The ^(\d+:\d+).* regex matches and captures into Group 1 one or more digits + : + one or more digits (with (\d+:\d+)) at the beginning of a line (^) and then matches the rest of the line (with .*).
The $1 replacement replaces the match with the Group 1 value.
See the demo and settings screenshot:
As an alternative, if there are chars other than digits you can also use
^([^:\v]+:[^:\v]+).*
where [^:\v]+ matches one or more chars other than a comma and any vertical whitespace.

Regex - Discard the entire string if any part of the string doesn't match the pattern

I have a comma separated string which I want to validate using a regex. What I have written is gives me a match if there a part wrong later in the string. I want to discard it completely if any part is wrong.
My regex : ^(?:[\w\.]+,{1}(?:STR|INT|REAL){1},{1}(\s*|$))+
Positive Case : Component,STR,YoungGenUse,STR,YoungGenMax,STR,OldGenUse,INT,OldGenMax,INT,PermGenUse,INT,PermGenMax,INT,MajCollCnt,INT,MinCollDur,REAL,MinCollCnt,INT,
Negative Case :
Component,STR,YoungGenUse,STR,YoungGenMax,TEST,OldGenUse,INT,OldGenMax,INT,PermGenUse,INT,PermGenMax,INT,MajCollCnt,INT,MinCollDur,REAL,MinCollCnt,INT,
For the second case, my regex gives a match for the bold portion eventhough, later there is an incorrect part (TEST). How can I modify my regex to discard the entire string?

The pattern that you tried would not match TEST in YoungGenMax,TEST because the alternatives STR|INT|REAL do not match it.
It would show until the last successful match in the repetition which would be Component,STR,YoungGenUse,STR,
You have to add the anchor at the end, outside of the repetition of the group, to indicate that the whole pattern should be followed by asserting the end of the string.
There are no spaces or dots in your string, so you might leave out \s* and use \w+ without the dot in the character class. Note that \s could also possibly match a newline.
^(?:\w+,(?:STR|INT|REAL),)+$
Regex demo
If you want to keep matching optional whitespace chars and the dot:
^(?:[\w.]+,(?:STR|INT|REAL),\s*)+$
Regex demo
Note that by repeating the group with the comma at the end, the string should always end with a comma. You can omit {1} from the pattern as it is superfluous.

your regex must keep matching until end of the string, so you must use $ to indicate end of the line:
^(?:[\w.]+,{1}(?:STR|INT|REAL){1},{1}(\s*|$))+$
Regex Demo

Regex in middle of text doesn't match

I have a regex to find url's in text:
^(?!:\/\/)([a-zA-Z0-9-_]+\.)*[a-zA-Z0-9][a-zA-Z0-9-_]+\.[a-zA-Z]{2,11}?$
However it fails when it is surrounded by text:
https://regex101.com/r/0vZy6h/1
I can't seem to grasp why it's not working.

Possible reasons why the pattern does not work:
^ and $ make it match the entire string
(?!:\/\/) is a negative lookahead that fails the match if, immediately to the right of the current location, there is :// substring. But [a-zA-Z0-9-_]+ means there can't be any ://, so, you most probably wanted to fail the match if :// is present to the left of the current location, i.e. you want a negative lookbehind, (?<!:\/\/).
[a-zA-Z]{2,11}? - matches 2 chars only if $ is removed since the {2,11}? is a lazy quantifier and when such a pattern is at the end of the pattern it will always match the minimum char amount, here, 2.
Use
(?<!:\/\/)([a-zA-Z0-9-_]+\.)*[a-zA-Z0-9][a-zA-Z0-9-_]+\.[a-zA-Z]{2,11}
See the regex demo. Add \b word boundaries if you need to match the substrings as whole words.
Note in Python regex there is no need to escape /, you may replace (?<!:\/\/) with (?<!://).

The spaces are not being matched. Try adding space to the character sets checking for leading or trailing text.

regex match till a character from a second occurance of a different character

My question is pretty similar to this question and the answer is almost fine. Only I need a regexp not only for character-to-character but for a second occurance of a character till a character.
My purpose is to get password from uri, example:
http://mylogin:mypassword#mywebpage.com
So in fact I need space from the second ":" till "#".

You could give the following regex a go:
(?<=:)[^:]+?(?=#)
It matches any consecutive string not containing any : character, prefixed by a : and suffixed by a #.
Depending on your flavour of regex you might need something like:
:([^:]+?)#
Which doesn't use lookarounds, this includes the : and # in the match, but the password will be in the first capturing group.
The ? makes it lazy in case there should be any # characters in the actual url string, and as such it is optional. Please note that that this will match any character between : and # even newlines and so on.

Here's an easy one that does not need look-aheads or look-behinds:
.*:.*:([^#]+)#
Explanation:
.*:.*: matches everything up to (and including) the second colon (:)
([^#]+) matches the longest possible series of non-# characters
# - matches the # character.
If you run this regex, the first capturing group (the expression between parentheses) will contain the password.
Here it is in action: http://regex101.com/r/fT6rI0

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Regular expression using non-greedy matching -- confusing result - regex

You can use ,[^,]+_sys$ The pattern matches: , Match the last comma [^,]+ Match 1 + occurrences of any char except , _sys Match literally $ End of string See a regex demo. If you don't want to match newlines and whitespaces: ,[^\s,]+_sys$

Related

Regex expression to ignore first and last character

Notepad++ regex extract two options

Regex - Discard the entire string if any part of the string doesn't match the pattern

Regex in middle of text doesn't match

regex match till a character from a second occurance of a different character

Categories

Resources