Notepad++ regex extract two options - regex

I've a list below:
7080508136242611718:7080508978035787525:7549dda86ba9af19:31050:install_id=7080508978035787525; store-country-code=us; store-idc=useast5; ttreq=1$fd2f36282a10633c5638a02cc54c19ff13f60755; passport_csrf_token=13bf74c4e5fe04307f0a99de9aed53f9; passport_csrf_token_default=13bf74c4e5fe04307f0a99de9aed53f9; odin_tt=11ed1b48fba2d7a9fe3d86929b3d52cebbad0ca7f7dbd127e220cfb3be279621ba04487517b536050a6ded9fbe50e300cd11615e2e9551523478e5484896a9dda800e55e428842872fcf862e8c57d439:1648559503:351451268482810:3f:49:8c:b7:8c:cb:c5379d41-6cf3-4152-9d48-7aa45f7f611c:79375640-197c-4aaa-86cf-4ef8e7238be2:1:AgICAw0AFockF-RPsNA-7qeIMtk5-CKdkW2eP4TZYMDY7A
7080507996291827206:7080508977079666438:6742591cc0d20580:31050:install_id=7080508977079666438; store-country-code=us; store-idc=useast5; ttreq=1$a119611bfe79541b0b4c029fe910b6507123eec2; passport_csrf_token=fb42bbd472462c17f45acb531deb057a; passport_csrf_token_default=fb42bbd472462c17f45acb531deb057a; odin_tt=6c3b06ff01fd67f42e3dccb60a1e69ca67cb8654f49662017acc209f7176517bcd13a374311f7a1b3538e6407fb237267abf43578d3180d8c834e7df886fa4377a9b950dbb6ff146e3fabf37158dcfa8:1648559508:351451233766930:dd:9e:82:59:5f:7f:596da881-89e8-4f60-b644-5fef23f0a422:f04adc87-56de-4191-a25f-843bec1d5818:1:AgICAw0AFockF-RPsNA-7qeIMtk5-CKdsYPWv4TZYMDY7A
7080509102451394054:7080509820378072837:e36dc9aceecfc1cc:31050:install_id=7080509820378072837; store-country-code=us; store-idc=useast5; ttreq=1$d94700921d5ee2b21992910a2a4e84dd0ade1ec8; passport_csrf_token=2d4f4eca772dbfcbb37548ff02da3166; passport_csrf_token_default=2d4f4eca772dbfcbb37548ff02da3166; odin_tt=53d6999ebe29c0d5144a9669331ce3307a290891370914dabadbfa0520114e6e76b9103c9a6db5476e139251ee478f3a305577a89e3fa07288b7aca00774d3fccbd03566687dbcfdce31700065295939:1648559700:351451299637010:71:de:41:2b:ad:b4:1eba1ae9-3216-40e1-be7f-00303e524c27:2713cbd3-7a4f-493e-b76f-ac6d56ab8045:5:AgMNAgIAhyQWF-RPsNA-7qeIMtk5-CKcsBcWP4TZYMDY7w
7080509086894851590:7080509909225604870:98be64e38551984d:31050:install_id=7080509909225604870; store-country-code=us; store-idc=useast5; ttreq=1$05929375d8605739d8ebdbb5ce15eb406da5c467; passport_csrf_token=c95c71ad206a1d371e5b67505ae25be8; passport_csrf_token_default=c95c71ad206a1d371e5b67505ae25be8; odin_tt=6ddaa02f6133e61a4c591ef2a872f0ec2339d8b6a3fc480575fe279b13ded615e1fa7de979e18565f3ac8b8229a19a98bdf79aa1804071dcc025e1a4cd5314522cf40a62ca961770baea1d5d653d6d64:1648559720:351451292934660:9d:cf:c3:92:f6:f5:787dfb42-f4bf-43fa-9c64-ded19a1b1660:366c3024-217d-4f85-90dd-d95a0fd3e296:4:AgICAw0AFockF-RPsNA-7qeIMtk5-CKcs7bUP4TZYMDY7w
7080509183397299718:7080509974838085382:f39db5d314071713:31050:install_id=7080509974838085382; store-country-code=us; store-idc=useast5; ttreq=1$561ee2083cb13f0849a9f09e7f89edfe08c7ce6c; passport_csrf_token=721a8fee6f4f97c16ed1923ad3bbc72d; passport_csrf_token_default=721a8fee6f4f97c16ed1923ad3bbc72d;
I'd like to extract the first two fields, as shown below:
7080508136242611718:7080508978035787525
7080507996291827206:7080508977079666438
7080509102451394054:7080509820378072837
7080509086894851590:7080509909225604870
7080509183397299718:7080509974838085382
I've tried *.: but it removes the rest of the text and keeps only the first field.
I've tried ^.*[0-9]+.*$ to get the second one, but with no success.
Hopefully somebody can help me with an accurate regex.
Thank you in advance.

The pattern *.: by itself is not a valid regex, and the pattern ^.*[0-9]+.*$ matches any whole line that contains at least one digit.
To keep the leading digits and the : between them, you could use \K to forget what has been matched so far and then match the rest of the line.
In the replacement use an empty string.
^\d+:\d+\K.*
^ Start of string
\d+:\d+ Match 1+ digits with : in between
\K.* Clear the current match, and match the rest of the line

^[^:]*:[^:]*\K.*
When matching things with delimiters I will use a negated character set to match the contents. In this case, the delimiter is a colon, so I want to match everything that isn't a colon until there's a colon. Then I want to match everything that isn't a colon. This will match everything up until the second colon. Because I want to keep what I just matched, I am using .* after \K, which resets the match at that point and matches everything else.
Replace the match with nothing, and only the first two columns of each line are left.
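The same negated-character-class idea can be sketched in Python (an assumption for illustration, since the question is about Notepad++). Python's built-in re module does not support \K, so rather than deleting the remainder this sketch keeps the match itself. The name first_two_fields and the shortened sample line are hypothetical.

```python
import re

# Everything that is not a colon, one colon, then everything that is not a colon.
FIRST_TWO = re.compile(r"[^:]*:[^:]*")

def first_two_fields(line):
    """Return the text up to (not including) the second colon, or None."""
    m = FIRST_TWO.match(line)  # match() anchors at the start, like ^
    return m.group(0) if m else None

# Shortened, hypothetical sample line in the same shape as the question's data.
sample = "7080508136242611718:7080508978035787525:7549dda86ba9af19:31050"
print(first_two_fields(sample))
```

A line without any colon returns None here, which mirrors the fact that the Notepad++ pattern would leave such a line untouched.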

You can use
Find: ^(\d+:\d+).*
Replace: $1
The ^(\d+:\d+).* regex matches and captures into Group 1 one or more digits + : + one or more digits (with (\d+:\d+)) at the beginning of a line (^) and then matches the rest of the line (with .*).
The $1 replacement replaces the match with the Group 1 value.
As an alternative, if there are chars other than digits you can also use
^([^:\v]+:[^:\v]+).*
where [^:\v]+ matches one or more chars other than a colon and any vertical whitespace.
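For readers who want to try the capture-group replacement outside Notepad++, here is a minimal Python sketch. It assumes Python's re module, where the replacement is written \1 instead of $1 and where \v means only a vertical tab, so \r and \n stand in for "vertical whitespace". The function names and the shortened sample lines are hypothetical.

```python
import re

def keep_first_two(text):
    # \1 is Python's spelling of the $1 replacement used in Notepad++.
    return re.sub(r"^(\d+:\d+).*", r"\1", text, flags=re.MULTILINE)

def keep_first_two_any_chars(text):
    # Python's re treats \v as a vertical tab only, so \r and \n are
    # listed explicitly to keep each field on a single line.
    return re.sub(r"^([^:\r\n]+:[^:\r\n]+).*", r"\1", text, flags=re.MULTILINE)

# Shortened, hypothetical lines in the same shape as the question's data.
sample = (
    "7080508136242611718:7080508978035787525:7549dda86ba9af19:31050\n"
    "7080507996291827206:7080508977079666438:6742591cc0d20580:31050"
)
print(keep_first_two(sample))
```

The re.MULTILINE flag makes ^ match at the start of every line, which is how Notepad++ treats it by default.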

Related

Regex expression to ignore first and last character

So I am trying to make a regex match for strings of the form:
"catalog.schema.'tablename'" .
The output I am looking for is just catalog.schema.'tablename', leaving out the quotes at the start and end positions.
Can anyone help me out?
I tried to do it with the expression
/(?!^|.$)+[^\s]/ which leaves out the end quotes but matches each character.
So I modified it to /(?!^|.$)+[^\s]+/g . This matches the whole sentence but doesn't ignore the end quote.
It depends on the data around your string and on whether quotation marks may appear within the string.
Why not just this: "(.*?)"
https://regex101.com/r/oaS8o0/1
To answer the question in the title you might simply use:
^.(.*)?.$
https://regex101.com/r/FxJgtW/1
You can just use
(?<=.).+(?=.)
Or, if you cannot use lookbehind:
(?!^).+(?!$)
Since . matches any char other than line break chars, the patterns just match any strings without their start and end chars.
If you don't want to match the first and the last character, you can just use a capture group instead of lookarounds and use the group 1 value.
.(.+).
The first . matches the first character (any character), (.+) is a capture group that matches 1 or more characters, and the . at the end matches the last character of the string.
Or to get the text between the double quotes at the start and the end of the string using a negated character class and a capture group:
^"([^"]+)"$
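A short Python sketch (Python is an assumption here; the question does not name a language) comparing three of the patterns from the answers on the question's sample string. The variable names are hypothetical.

```python
import re

s = "\"catalog.schema.'tablename'\""

# Capture group between a leading and a trailing double quote.
quoted = re.search(r'^"([^"]+)"$', s)
# Capture group between any first and any last character.
inner = re.search(r".(.+).", s)
# Lookarounds: the match itself excludes the first and last character.
look = re.search(r"(?<=.).+(?=.)", s)

print(quoted.group(1))
print(inner.group(1))
print(look.group(0))
```

With the capture-group variants the wanted text is in group 1, while the lookaround variant returns it as the whole match.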

Regular expression using non-greedy matching -- confusing result

I thought I understood how the non-greedy modifier works, but am confused by the following result:
Regular Expression: (,\S+?)_sys$
Test String: abc,def,ghi,jkl_sys
Desired result: ,jkl_sys <- last field including comma
Actual result: ,def,ghi,jkl_sys
Use case is that I have a comma separated string whose last field will end in "_sys" (e.g. ,sometext_sys). I want to match only the last field and only if it ends with _sys.
I am using the non-greedy (?) modifier to return the shortest possible match (only the last field including the comma), but it returns all but the first field (i.e. the longest match).
What am I missing?
I used https://regex101.com/ to test, in case you want to see a live example.
You can use
,[^,]+_sys$
The pattern matches:
, Match the last comma
[^,]+ Match 1+ occurrences of any char except ,
_sys Match literally
$ End of string
If you don't want to match newlines and whitespaces:
,[^\s,]+_sys$
It sounds like you're looking for a string that ends with "_sys", that has to be at the end of the source string, and that has to be preceded by a comma.
,\s*(\w+_sys)$
I added the \s* to allow for optional whitespace after the comma.
No non-greedy modifiers necessary.
The parens are around \w+_sys so you can capture just that string, without the comma and optional whitespace.
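To see why the lazy version returns the longer result, it helps to remember that a regex engine reports the leftmost position where a match can succeed; the ? modifier only controls how far the match extends from that starting point. Starting at the first comma, \S+? can keep expanding until _sys$ fits, so the match begins there. A small Python sketch (Python is an assumption, the question used regex101) comparing the three patterns:

```python
import re

s = "abc,def,ghi,jkl_sys"

# Lazy quantifier: still starts at the FIRST comma, then expands as needed.
lazy = re.search(r"(,\S+?)_sys$", s)
# Negated class: the field cannot cross a comma, so only the last field fits.
negated = re.search(r",[^,]+_sys$", s)
# Capture group: the last field without the comma.
captured = re.search(r",\s*(\w+_sys)$", s)

print(lazy.group(0))
print(negated.group(0))
print(captured.group(1))
```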

Regex - Discard the entire string if any part of the string doesn't match the pattern

I have a comma separated string which I want to validate using a regex. What I have written gives me a match even if a part later in the string is wrong. I want to discard it completely if any part is wrong.
My regex : ^(?:[\w\.]+,{1}(?:STR|INT|REAL){1},{1}(\s*|$))+
Positive Case : Component,STR,YoungGenUse,STR,YoungGenMax,STR,OldGenUse,INT,OldGenMax,INT,PermGenUse,INT,PermGenMax,INT,MajCollCnt,INT,MinCollDur,REAL,MinCollCnt,INT,
Negative Case :
Component,STR,YoungGenUse,STR,YoungGenMax,TEST,OldGenUse,INT,OldGenMax,INT,PermGenUse,INT,PermGenMax,INT,MajCollCnt,INT,MinCollDur,REAL,MinCollCnt,INT,
For the second case, my regex gives a match for the leading portion of the string even though there is an incorrect part (TEST) later. How can I modify my regex to discard the entire string?
The pattern that you tried would not match TEST in YoungGenMax,TEST because the alternatives STR|INT|REAL do not match it.
It would match up to the last successful repetition of the group, which would be Component,STR,YoungGenUse,STR,
You have to add the anchor at the end, outside of the repetition of the group, to indicate that the whole pattern should be followed by asserting the end of the string.
There are no spaces or dots in your string, so you might leave out \s* and use \w+ without the dot in the character class. Note that \s could also possibly match a newline.
^(?:\w+,(?:STR|INT|REAL),)+$
If you want to keep matching optional whitespace chars and the dot:
^(?:[\w.]+,(?:STR|INT|REAL),\s*)+$
Note that by repeating the group with the comma at the end, the string should always end with a comma. You can omit {1} from the pattern as it is superfluous.
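The effect of the end anchor can be checked with a small Python sketch (Python is an assumption; the question does not name a language). The two test strings are shortened versions of the question's positive and negative cases, and the constant names are hypothetical.

```python
import re

# The question's pattern, without an anchor at the end.
UNANCHORED = re.compile(r"^(?:[\w\.]+,{1}(?:STR|INT|REAL){1},{1}(\s*|$))+")
# The simplified pattern with $ outside the repeated group.
ANCHORED = re.compile(r"^(?:\w+,(?:STR|INT|REAL),)+$")

# Shortened versions of the positive and negative cases.
good = "Component,STR,YoungGenUse,STR,YoungGenMax,STR,OldGenUse,INT,"
bad = "Component,STR,YoungGenUse,STR,YoungGenMax,TEST,OldGenUse,INT,"

# Without $, the match simply stops before the first bad pair.
print(UNANCHORED.match(bad).group(0))
# With $, the whole string has to fit or nothing matches.
print(ANCHORED.match(good) is not None)
print(ANCHORED.match(bad) is not None)
```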
Your regex must keep matching until the end of the string, so you must use $ to indicate the end of the line:
^(?:[\w.]+,{1}(?:STR|INT|REAL){1},{1}(\s*|$))+$

How to create proper regular expression to find last character which I want to?

I need to create a regex to find the last underscore in strings like 012344_2.0224.71_3 or 012354_5.00123.AR_3.335_8
I wanted to find the last part with the expression [^.]+$ and then find the underscore in the found element, but I cannot get it to work.
I hope you can help me :)
Just use a negated character class [^_] that will match everything except an underscore (this helps to ensure no other underscores are found afterwards), followed by the end of string $
Pattern would look as such:
(_)[^_]*$
The final underscore _ is in a capturing group, so what you want is the submatch. You would replace group 1 (your underscore).
See it live: Regex101
Notice the green highlighted portion on Regex101, this is your submatch and is what would be replaced.
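As a hedged illustration of replacing only the captured underscore, here is a minimal Python sketch (Python and the helper name replace_last_underscore are assumptions, not part of the answer). It uses the span of group 1 to splice in a replacement.

```python
import re

def replace_last_underscore(s, replacement):
    """Replace only the underscore captured in group 1, if there is one."""
    m = re.search(r"(_)[^_]*$", s)
    if m is None:
        return s  # no underscore at all
    # Splice around group 1 so the trailing [^_]* text is preserved.
    return s[:m.start(1)] + replacement + s[m.end(1):]

print(replace_last_underscore("012344_2.0224.71_3", "-"))
print(replace_last_underscore("012354_5.00123.AR_3.335_8", "-"))
```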
The simplest solution I can imagine is using .*\K_, however not all regex flavours support \K.
If not, another idea would be to use _(?=[^_]*$)
Explanation:
.*\K_: Matches any characters up to an underscore. Since the * quantifier is greedy, it will match up to the last underscore. Then \K discards what was matched so far, and then we match the underscore.
_(?=[^_]*$): Matches an underscore followed only by non-underscore characters until the end of the line
If you want nothing but the "net" (i.e., nothing matched except the last underscore), use positive lookahead to check that no more underscores are in the string:
/_(?=[^_]*$)/gm
The pattern [^.]+$ matches any character except a dot 1+ times and then asserts the end of the string. This will give you the matches 71_3 and 335_8
What you want to match is an underscore when there are no more underscores following.
One way to do that is using a negative lookahead (?!.*_), if that is supported, which asserts that what is to the right does not contain any characters followed by an underscore
_(?!.*_)
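Both lookahead variants can be compared in a short Python sketch (Python is an assumption; its re module supports lookaheads but not \K). The constant names are hypothetical, and str.rfind is used only as an independent check of the position.

```python
import re

# Positive lookahead: only non-underscores may follow until the end.
POSITIVE = re.compile(r"_(?=[^_]*$)")
# Negative lookahead: no further underscore may follow anywhere to the right.
NEGATIVE = re.compile(r"_(?!.*_)")

for s in ("012344_2.0224.71_3", "012354_5.00123.AR_3.335_8"):
    # All three numbers printed per line should be the same index.
    print(POSITIVE.search(s).start(), NEGATIVE.search(s).start(), s.rfind("_"))
```

Because the match is only the underscore itself, either pattern can also be passed straight to re.sub to replace just that character.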

Regex to match characters to the right of a colon

I'm stuck on a regex. I'm trying to match words in any language to the right of a colon without matching the colon itself.
The basic rule:
For a line to be valid, it must not begin with or contain any characters outside of [a-z0-9_] until after :.
Any characters to the right of : should match as long as the line begins with the set of characters defined above.
For instance, given a string such as these:
this string should not match
bob_1:Hi. I'm Bob. I speak русский and this string should match
alice:Hi Bob. I speak 한국어 and this string should also match
http://example.com - would prefer to not match URLs
This string:should not match because no spaces or capital letters are allowed left of the colon
Only 2 of the 5 strings above need to match. And only to the right of the colon.
Hi. I'm Bob. I speak русский and this string should match
Hi Bob. I speak 한국어 and this string should also match
I'm currently using (^[a-z0-9_]+(?=:)) to match characters to the left of :. I just can't seem to reverse the logic.
The closest I have at the moment is (?!(?!:)).+. This seems to match everything to right of the colon as well as the colon itself. I just can't figure out how to not include : in the match.
Can one of you regex wizards help me out? If anything is unclear please let me know.
Short regex pattern (case insensitive):
^\w+:(\w.*)
\w - matches any word character (equal to [a-zA-Z0-9_])
https://regex101.com/r/MZhqSL/6
As you marked pcre, here's the pattern you need (only to the right of the colon):
^\w+:\K\w.*
\K - resets the starting point of the reported match. Any previously consumed characters are no longer included in the final match
https://regex101.com/r/E1yHVY/1
You can use this regex:
^[a-z0-9_]+:\K(?!//).*
RegEx Breakup:
^: Start
[a-z0-9_]+: Match 1+ of [a-z0-9_] characters
:: Match a colon
\K: Reset matched info so far
(?!//): Negative lookahead to disallow // right after colon to avoid matching potential URLs
.*: Match anything until end
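The breakup above targets PCRE. As a rough sketch for an engine without \K, such as Python's built-in re module (an assumption, since the question is tagged pcre), a capture group can stand in for \K. The constant name RIGHT_OF_COLON is hypothetical; the five test lines are the ones from the question.

```python
import re

# Same pattern, but the text after the colon is captured instead of using \K.
RIGHT_OF_COLON = re.compile(r"^[a-z0-9_]+:(?!//)(.*)", re.MULTILINE)

text = "\n".join([
    "this string should not match",
    "bob_1:Hi. I'm Bob. I speak русский and this string should match",
    "alice:Hi Bob. I speak 한국어 and this string should also match",
    "http://example.com - would prefer to not match URLs",
    "This string:should not match because no spaces or capital letters are allowed left of the colon",
])

# findall() returns the contents of the single capture group per matching line.
matches = RIGHT_OF_COLON.findall(text)
for m in matches:
    print(m)
```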
You can use the regex: ^.*?:(.*)$
^.*?: - from the beginning of the line, any character until the colon (non-greedy) included
(.*)$ - use a matching group to anything that follows it till the end of the line