Return the first occurrence using Regex [closed] - regex

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 1 year ago.
I have the following expression:
[Document[_id=5f9ecf8ca9bec5549493ba7d,·policy_name=xxx,·is_mobile=false, Document[_id=6090fead53bc363849fce989,·policy_name=yyy,·is_mobile=true, Document[_id=619cf036761c281e3ad12327,·policy_name=zzz,·is_mobile=false, Document[_id=619cf729ea016d1e3336e903,·policy_name=xyz,·is_mobile=false]
I would like to capture ONLY the first Document id (i.e. 5f9ecf8ca9bec5549493ba7d).
I tried this regex: (?<=Document\[_id=).*?[^,]* but it returns all the Document ids.
1) How can I capture the first / second (Nth match) Document id from the expression?
2) Is it possible to do a regex AND operation to find the Document id with "is_mobile=true"?
(i.e. 5f9ecf8ca9bec5549493ba7d & true)
I would really appreciate any help.
EDIT:
I'm using https://regex101.com/
This is the link where I tried to capture the first / second (Nth occurrence) of the Document id (I need only the id value): https://regex101.com/r/ZnYRhq/1

There is no language listed, but one approach could be to use a capture group for the value that you want, and to start the pattern with the anchor ^ to assert the start of the string.
For the first Document id:
^.*?\bDocument\[_id=([^\]\[\s,]+)
Regex demo
For the first Document id that has is_mobile=true (assuming that the order of the key-value pairs is as given in the example and is within the same opening and closing square brackets)
^.*?\bDocument\[_id=([^\]\[\s,]+),[^\]\[]*\bis_mobile=true\b
The pattern matches:
^ Start of string
.*?\bDocument\[_id= Match the first occurrence of Document[_id=
( Capture group 1
[^\]\[\s,]+ Match 1+ times any char except ] [ whitespace char or ,
) Close group 1
,[^\]\[]* Match a comma and optional chars other than ] and [
\bis_mobile=true\b Match is_mobile=true between word boundaries
Regex demo
Or using lookarounds for a single (not global) match:
(?<=Document\[_id=)[^,]*(?=,[^][]*\bis_mobile=true\b)
Regex demo
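Since no language was given, here is a minimal Python sketch of the two capture-group patterns above. It assumes the "·" characters in the sample are just how the tester displays spaces, so the sample string below uses plain spaces; the variable names are made up for illustration.

```python
import re

# Sample input from the question; spaces instead of "·" is an assumption.
text = (
    "[Document[_id=5f9ecf8ca9bec5549493ba7d, policy_name=xxx, is_mobile=false, "
    "Document[_id=6090fead53bc363849fce989, policy_name=yyy, is_mobile=true, "
    "Document[_id=619cf036761c281e3ad12327, policy_name=zzz, is_mobile=false, "
    "Document[_id=619cf729ea016d1e3336e903, policy_name=xyz, is_mobile=false]"
)

# First Document id: re.search stops at the first match, group 1 holds the id.
first = re.search(r"^.*?\bDocument\[_id=([^\]\[\s,]+)", text)
first_id = first.group(1) if first else None

# First Document id whose own key-value pairs contain is_mobile=true.
mobile = re.search(
    r"^.*?\bDocument\[_id=([^\]\[\s,]+),[^\]\[]*\bis_mobile=true\b", text
)
mobile_id = mobile.group(1) if mobile else None

print(first_id)
print(mobile_id)
```

In both cases the value is read from group 1 rather than from the whole match, which is why no lookbehind is needed.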

How about this one?
(?:^\[Document\[_id=)([^,]+) for the first?
For the n-th match you need to use a capturing group, but how to do this is language/framework dependent.
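As one illustration of the language-dependent part, in Python the n-th id could be picked out of the list of all capture-group matches. This is only a sketch: the helper name nth_document_id and the shortened sample string are hypothetical.

```python
import re

def nth_document_id(text, n):
    """Return the n-th (1-based) Document id in text, or None if there are fewer."""
    # findall returns the contents of group 1 for every match, in order.
    ids = re.findall(r"Document\[_id=([^\]\[\s,]+)", text)
    return ids[n - 1] if 1 <= n <= len(ids) else None

# Shortened, hypothetical sample in the same shape as the question's input.
sample = "[Document[_id=aaa111, is_mobile=false, Document[_id=bbb222, is_mobile=true]"
print(nth_document_id(sample, 1))
print(nth_document_id(sample, 2))
```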

txt="""
[Document[_id=5f9ecf8ca9bec5549493ba7d,·policy_name=xxx,·is_mobile=false, Document[_id=6090fead53bc363849fce989,·policy_name=yyy,·is_mobile=true, Document[_id=619cf036761c281e3ad12327,·policy_name=zzz,·is_mobile=false, Document[_id=619cf729ea016d1e3336e903,·policy_name=xyz,·is_mobile=false]
"""
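# Split on commas, keep the pieces that contain '_id', and take the text after '='.
ids = [part.split('=')[1].strip() for part in txt.split(',') if '_id' in part]
print(ids[0])  # first Document id
Output: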
5f9ecf8ca9bec5549493ba7d

Related

Regex to pull last 2 segments from FQDN

Working on trying to figure out some regex to pull out the last 2 segments of an FQDN.
^.*\shostname=[\w-]+\.(?P<myfield>[^\t]+)
This RegEx works and takes out the first segment of an FQDN.
www.aaa.bbb.someurl.net --> aaa.bbb.someurl.net
But… I only want to keep the last 2 segments of any FQDN.
I need it to be --> someurl.net
Other restrictions:
The hostname field will always be at least 3 segments - don't know the max.
This is for Splunk so I can't use a script. I need it to be PCRE compatible regex.
Here is an example of data:
2021-07-20 18:19:14 reason=Not allowed to browse this category event_id=12345 protocol=HTTP action=Blocked transactionsize=16051 responsesize=789 requestsize=456 urlcategory=Blocked serverip=1.2.4.5 clienttranstime=0 requestmethod=GET refererURL=None useragent=Microsoft-Delivery location=Internal ClientIP=5.6.7.8 status=403 user=John url=dl.delivery.mp.microsoft.com/filestreamingservice/files/abcd-efgh-ijkl/pieceshash vendor=Zscaler hostname=dl.delivery.mp.microsoft.com
From this data I need the field “myfield” to be: microsoft.com.
The original answer with a much simpler regex ((?:\s|^)hostname=(?:[^\s.]+\.)*(?P<myfield>[^\s.]+\.[^\s.]+)) that worked for OP is in the question history.
You can use
(?:\s|^)hostname=(?:[^\s.]+\.)*?(?P<myfield>[^\s.]+\.(?:(?:ac|co)\.uk|govt?\.uk|judiciary\.uk|l(?:ea|td)\.uk|m(?:e|il|od)\.uk|n(?:et|hs|ic)\.uk|orgn?\.uk|p(?:arliament|lc|olice)\.uk|(?:royal|sch)\.uk|[^\s.]+)(?!\S))
Or, to match the last hostname=... value on a line:
^.*\shostname=(?:[^\s.]+\.)*?(?P<myfield>[^\s.]+\.(?:(?:ac|co)\.uk|govt?\.uk|judiciary\.uk|l(?:ea|td)\.uk|m(?:e|il|od)\.uk|n(?:et|hs|ic)\.uk|orgn?\.uk|p(?:arliament|lc|olice)\.uk|(?:royal|sch)\.uk|[^\s.]+)(?!\S))
See the regex #1 demo and regex #2 demo. Details:
(?:\s|^) - either a whitespace or start of string
hostname= - a literal substring
(?:[^\s.]+\.)*? - zero or more (but as few as possible) occurrences of one or more chars other than whitespace and dot and then a dot
(?P<myfield>[^\s.]+\.(?:(?:ac|co)\.uk|govt?\.uk|judiciary\.uk|l(?:ea|td)\.uk|m(?:e|il|od)\.uk|n(?:et|hs|ic)\.uk|orgn?\.uk|p(?:arliament|lc|olice)\.uk|(?:royal|sch)\.uk|[^\s.]+)(?!\S)) - Group "myfield": one or more chars other than whitespace and dot, then a dot, then any second-level domain or any one or more chars other than whitespace and dot and then either a whitespace or end of string.
Note: the \.(?:(?:ac|co)\.uk|govt?\.uk|judiciary\.uk|l(?:ea|td)\.uk|m(?:e|il|od)\.uk|n(?:et|hs|ic)\.uk|orgn?\.uk|p(?:arliament|lc|olice)\.uk|(?:royal|sch)\.uk pattern part (built from a regex trie) matches this list:
.ac.uk
.co.uk
.gov.uk
.judiciary.uk
.ltd.uk
.me.uk
.mod.uk
.net.uk
.nhs.uk
.nic.uk
.org.uk
.parliament.uk
.plc.uk
.police.uk
.royal.uk
.sch.uk
.govt.uk
.orgn.uk
.lea.uk
.mil.uk
If you want to add more second-level domain names, add more to the list and use https://www.myregextester.com or a similar service to build the word list regex.
You could match all following non-whitespace chars after hostname= and then use a capture group to capture the last part with a single dot.
^.*\shostname=(?:\S+\.)?([^\s.]+\.[^\s.]+)
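^.*\shostname= Match up to the last hostname= on the line that is preceded by a whitespace char
(?:\S+\.)? Optionally match non-whitespace chars followed by a dot (greedy, so it consumes every leading segment that the capture group does not need)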
( Capture group 1
[^\s.]+\.[^\s.]+ Match 2 non dot parts with a . in between
) Close group
Regex demo
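The question targets Splunk, but the simple pattern can be checked outside it. Below is a minimal Python sketch (Python's re also accepts the (?P<name>...) group syntax) using the simpler regex quoted earlier, without the .uk second-level list; the helper name last_two_segments and the shortened log line are assumptions for illustration.

```python
import re

# Greedy (?:[^\s.]+\.)* eats leading segments, leaving the last two for myfield.
HOSTNAME_RE = re.compile(
    r"(?:\s|^)hostname=(?:[^\s.]+\.)*(?P<myfield>[^\s.]+\.[^\s.]+)"
)

def last_two_segments(line):
    """Return the last two dot-separated segments of the hostname= value, or None."""
    match = HOSTNAME_RE.search(line)
    return match.group("myfield") if match else None

# Shortened version of the sample event from the question.
event = "status=403 user=John vendor=Zscaler hostname=dl.delivery.mp.microsoft.com"
print(last_two_segments(event))
```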
If you would like to account for country codes, I've previously answered this at: Get Domain Extension From Hostname
The regular expression would look something like (shortened version): \w+((\.[a-z]{2,3})(\.(uk|au))?)$
The full expression with all country codes: \w+((\.[a-z]{2,3})(\.(ad|ae|af|ag|ai|al|am|ao|aq|ar|as|at|au|aw|ax|az|ba|bb|bd|be|bf|bg|bh|bi|bj|bl|bm|bn|bo|bq|br|bs|bt|bv|bw|by|bz|ca|cc|cd|cf|cg|ch|ci|ck|cl|cm|cn|co|cr|cu|cv|cw|cx|cy|cz|de|dj|dk|dm|do|dz|ec|ee|eg|er|es|et|fi|fj|fk|fm|fo|fr|ga|gb|gd|ge|gf|gg|gh|gi|gl|gm|gn|gp|gq|gr|gs|gt|gu|gw|gy|hk|hm|hn|hr|ht|hu|id|ie|il|im|in|io|iq|ir|is|it|je|jm|jo|jp|ke|kg|kh|ki|km|kn|kp|kr|kw|ky|kz|la|lb|lc|li|lk|lr|ls|lt|lu|lv|ly|ma|mc|md|me|mf|mg|mh|mk|ml|mm|mn|mo|mp|mq|mr|ms|mt|mu|mv|mw|mx|my|mz|na|nc|ne|nf|ng|ni|nl|no|np|nr|nu|nz|om|pa|pe|pf|pg|ph|pk|pl|pm|pn|pr|ps|pt|pw|py|qa|re|ro|rs|ru|rw|sa|sb|sc|sd|se|sg|sh|si|sj|sk|sl|sm|sn|so|sr|ss|st|sv|sx|sy|sz|tc|td|tf|tg|th|tj|tk|tl|tm|tn|to|tr|tt|tv|tw|tz|ua|ug|uk|us|uy|uz|va|vc|ve|vg|vi|vn|vu|wf|ws|ye|yt|za|zm|zw))?)$

Regular expression Regex to extract a string [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
Please can somebody help me? I'm new to regex and have no idea how to do this!
I'm trying to extract from a list which looks like this...
Joe-Age23-46737-251.aspx
Tim-Age18-46909-451.aspx
Roger-Age41-59768-251.aspx
What I want is this...
46737-251.aspx
46909-451.aspx
59768-251.aspx
so basically anything after the second to last hyphen.
Cheers
Let's translate "everything after the second-to-last hyphen" into regex:
(?<=-)[^-]*-[^-]*$
Explanation:
(?<=-) # Assert starting position right after a hyphen
[^-]* # Match zero or more characters except hyphens
- # Match a single hyphen
[^-]* # see above
$ # until end of string.
Test it live on regex101.com.
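For a quick local check, the same pattern can be run in Python against the three sample names from the question. This is a sketch only; the helper name tail_after_second_last_hyphen is made up.

```python
import re

TAIL_RE = re.compile(r"(?<=-)[^-]*-[^-]*$")

def tail_after_second_last_hyphen(name):
    """Return everything after the second-to-last hyphen, or None if no match."""
    match = TAIL_RE.search(name)
    return match.group(0) if match else None

names = ["Joe-Age23-46737-251.aspx", "Tim-Age18-46909-451.aspx", "Roger-Age41-59768-251.aspx"]
tails = [tail_after_second_last_hyphen(n) for n in names]
print(tails)
```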
Step 1: Split each string on the hyphen (-). You will get an array of strings.
Step 2: Take the last two elements of that array (for Joe-Age23-46737-251.aspx these are 46737 and 251.aspx).
Step 3: Join those two elements with a hyphen.
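A split-based approach without regex could look like the following Python sketch, which splits each name on hyphens and keeps the last two pieces. The function name last_two_parts is hypothetical, and it assumes every name contains at least two hyphens.

```python
def last_two_parts(name):
    """Split on '-' and rejoin the final two pieces with a hyphen."""
    parts = name.split("-")
    return "-".join(parts[-2:])

names = ["Joe-Age23-46737-251.aspx", "Tim-Age18-46909-451.aspx", "Roger-Age41-59768-251.aspx"]
results = [last_two_parts(n) for n in names]
print(results)
```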

Regex: Matching only groups that have a specific word embedded [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
I cannot figure out how to match only on groups that contain a certain word ('test' for example below). It is a big text file and the groups start with a line 'Group x' and include text with an empty line separation to the next group. I think I need to use lookaheads and lookbehinds but don't know how. I can use vb.net for this but trying to test out different expressions in the regex testers and can't get anywhere.
Group 1
adfdf
dd test ddfdf
dfdfadf
Group 2
ddfadfa
Group 3
add test
adfdff
Group 4
adfdf
Expected 2 matches:
Group 1
adfdf
dd test ddfdf
dfdfadf
Group 3
add test
adfdff
Start your pattern with ^Group \d+$ and end with (?:^$|\Z). In the middle, match test, but do not let the match cross an empty line on the way there: (?:.(?!^$))*? (see Regular expression to match a line that doesn't contain a word? for details on how the latter works). Don't forget the m and s modifiers:
^Group \d+$(?:.(?!^$))*?test.*?(?:^$|\Z)
Demo: https://regex101.com/r/kM9qB3/2
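The question mentions vb.net, but the pattern can be tried in any engine with multiline and dot-all flags. Here is a minimal Python sketch, assuming the groups in the file are separated by single empty lines as the question describes (the blank lines are restored in the sample below).

```python
import re

text = """Group 1
adfdf
dd test ddfdf
dfdfadf

Group 2
ddfadfa

Group 3
add test
adfdff

Group 4
adfdf"""

# re.M makes ^ and $ match at line boundaries; re.S lets . match newlines.
pattern = re.compile(r"^Group \d+$(?:.(?!^$))*?test.*?(?:^$|\Z)", re.M | re.S)

# Each match ends at the empty line, so strip the trailing newline.
groups = [m.group(0).strip() for m in pattern.finditer(text)]
for g in groups:
    print(g)
    print("---")
```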

Regex for matching just the first occurrence of a comma in each line [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 years ago.
What does the regex look like for matching only the first instance of a comma, and nothing but that comma?
I have tried things like ,{1} and I think it has something to do with non-greedy quantifiers like this ,(.*?), but I have had no success.
I'm using Notepad++ to try to convert code from another language to JavaScript. I want to turn the first comma into a colon. It looks like this:
'TJ', 'Tajikistan' ,
'TZ', 'Tanzania' ,
'TH', 'Thailand' ,
'TL', 'Timor-Leste' ,
'TG', 'Togo' ,
'TK', 'Tokelau' ,
'TO', 'Tongo' ,
'TT', 'Trinidad and Tobago' ,
Find what: /,/
Replace with: :
0 occurrences were replaced
What you can do is, instead of just replacing the first comma with a colon, you can automatically replace the comma and everything after it with the colon plus everything that was after the comma. (For example, in 'TZ', 'Tanzania' ,, this approach would replace , 'Tanzania' , with : 'Tanzania' ,.) After that, since the rest of the line has already undergone replacement, Notepad++ doesn't re-examine it to see whether it contains a comma.
The way you do that is by using a capture group, which lets the replacement-string incorporate part of what the regex matched.
Specifically, you would replace this ("Find what"):
,(.*)
meaning "a comma (,), plus zero or more characters (.*), and capture the latter (())", with this ("Replace with"):
:$1
meaning "a colon (:), plus whatever was captured by the first capture group ($1)".
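The same capture-and-reinsert idea can be sketched in Python for comparison. Note that Python's re.sub writes the group reference as \1 rather than $1; the two sample lines are taken from the question.

```python
import re

lines = "'TJ', 'Tajikistan' ,\n'TZ', 'Tanzania' ,"

# ',' matches the first comma on a line; (.*) captures the rest of that line
# (. does not match a newline by default), so later commas are never re-examined.
converted = re.sub(r",(.*)", r":\1", lines)
print(converted)
```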

Need a regular expression to return text from last "/" to last "-" [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
I just can't seem to figure out how to match the following
in the string /hello/there-my-friend
I need to capture everything after the last / and before the last -
So it should capture there-my.
Here's the Regular Expression you're looking for:
#(?<=/)[^/]+(?=-[^-/]*$)#
I'll break it down in a minute, but there are probably better ways to do this.
I might do something like this:
$str = "/hello/there-my-friend";
$pieces = explode('/', $str);
$afterLastSlash = $pieces[count($pieces)-1];
$dashes = explode('-', $afterLastSlash);
unset($dashes[count($dashes)-1]);
$result = implode('-', $dashes);
The performance here is guaranteed linear (the limiting factor being the length of $str plus the length of $afterLastSlash). The regular expression is likely to be slower (as much as polynomial time, I think; it can get a little dicey with lookarounds).
The code above could easily be pared down, but the naming makes it more clear. Here it is as a one liner:
$result = implode('-', array_slice(explode('-', array_slice(explode('/', $str), -1)[0]), 0, -1));
But gross, don't do that. Find a middle ground.
As promised, a breakdown of the regular expression:
#
(?<= Look behind and ensure there's a...
/ Literal forward slash.
) Okay, done looking behind.
[^/] Match any character that's not a forward slash
+ ...One or more times.
(?= Now look ahead, and ensure there's...
- a hyphen.
[^-/] followed by any non-hyphen, non-forward slash character
* zero or more times
$ until the end of the string.
) Okay, done looking ahead.
#
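The lookaround pattern is not PHP-specific. As a quick check, here is a Python sketch using the same expression without the # delimiters; the helper name between_last_slash_and_dash is made up for illustration.

```python
import re

PATTERN = re.compile(r"(?<=/)[^/]+(?=-[^-/]*$)")

def between_last_slash_and_dash(path):
    """Return the text after the last '/' and before the last '-', or None."""
    match = PATTERN.search(path)
    return match.group(0) if match else None

result = between_last_slash_and_dash("/hello/there-my-friend")
print(result)
```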
^.*/([^/]*)-[^/-]*$
The wanted text is in capture group 1.
Syntax may vary depending on which flavor of RE you are using.
Try this short regex :
/\K\w+-\w+
Your regex engine needs to support \K
or
(?<=/)\w+-\w+
(more portable)
Explanations
\K resets the start of the reported match, so /\K behaves much like the lookbehind (?<=/)
\w is the same as [a-zA-Z0-9_], feel free to adapt it
This will do it:
(?!.*?/).*(?=-)
Depending on your language, you might need to escape the /
Breakdown:
1. (?!.*?/) - Negative look ahead. It will start collecting characters after the last `/`
2. .* - Looks for all characters
3. (?=-) - Positive look ahead. It means step 2 should only go up to the last `-`
Edited after comment: No longer includes the / and the last - in the results.
This is not an exact answer to your question (it's not a regex), but if you are using C# you might use this:
string str = "/hello/there-my-friend";
int lastSlashIndex = str.LastIndexOf('/');
int lastDashIndex = str.LastIndexOf('-');
// Start one past the slash so the slash itself is not included.
return str.Substring(lastSlashIndex + 1, lastDashIndex - lastSlashIndex - 1);
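The same index-based idea, sketched in Python with str.rfind and slicing. It assumes the string contains both a '/' and a later '-'; the function name slice_between is hypothetical.

```python
def slice_between(path):
    """Return the text after the last '/' and before the last '-'."""
    last_slash = path.rfind("/")
    last_dash = path.rfind("-")
    # Start one past the slash; the slice end is exclusive, so the dash is dropped.
    return path[last_slash + 1:last_dash]

value = slice_between("/hello/there-my-friend")
print(value)
```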