Regex to extract the text before and after a string

I want to extract the text before and after a particular word. The content is below.
Content:
1. http://www.example.com/myplan/mp/public/pl_be?Id=543543&timestamp=06280435435
2. http://www.example.com/course/df/public/pl_de?Id=454354&timestamp=0628031746
3. http://www.example.com/book/rg/public/pl_fo?Id=4445577&timestamp=0628031734
4. http://www.example.com/trip/tr/public/pl_ds?Id=454354&timestamp=06280314546
5. http://www.example.com/trip/tr/public/pl_ds
I want to capture the data from the strings above as follows:
1. http://www.example.com/myplan/mp/public/?Id=543543
2. http://www.example.com/course/df/public/?Id=454354
3. http://www.example.com/book/rg/public/?Id=4445577
4. http://www.example.com/trip/tr/public/?Id=454354
5. http://www.example.com/trip/tr/public/
I have tried (./(?![A-Za-z]{2}_[A-Za-z]{2}).(?=&)), but it doesn't work.
I hope somebody can help me with this.

This pattern will catch what you want in two groups. It's safer than the other examples that have been suggested so far because it allows for some variance in the URL.
(.*)\w\w_\w\w.*?(?:(?:[&?]\w+=\d+|%\w*)*?(\?Id=\d+)(?:.*))?
(.*) captures everything up until your xx_xx part (capture group 1)
\w\w_\w\w.*? matches xx_xx and everything up until the next capture section
(?:[&?]\w+=\d+|%\w*)*? allows for there to be other & % or ? properties in your URL before your ?Id= property
(\?Id=\d+) captures your Id property (capture group 2)
(?:.*) is unnecessary but it bugs me when not all of the text is highlighted on regex101 ¯\_(ツ)_/¯
the optional non-capturing group here (?:(?:[&?]\w+=\d+|%\w*)*?(\?Id=\d+)(?:.*))? allows it to match URLs that don't have ID properties.
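A minimal JavaScript sketch of how the two capture groups could be stitched back together. The pattern is the one above, unchanged; the helper name `stripPlan` is hypothetical and only there for illustration.

```javascript
// Pattern from the answer above: group 1 is the path prefix, group 2 the ?Id=... part.
const pattern = /(.*)\w\w_\w\w.*?(?:(?:[&?]\w+=\d+|%\w*)*?(\?Id=\d+)(?:.*))?/;

// Hypothetical helper: rebuild the URL from the two groups.
function stripPlan(url) {
  const m = url.match(pattern);
  if (!m) return url;          // leave non-matching input untouched
  return m[1] + (m[2] ?? "");  // group 2 is undefined when there is no Id
}

console.log(stripPlan("http://www.example.com/myplan/mp/public/pl_be?Id=543543&timestamp=06280435435"));
// → http://www.example.com/myplan/mp/public/?Id=543543
console.log(stripPlan("http://www.example.com/trip/tr/public/pl_ds"));
// → http://www.example.com/trip/tr/public/
```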

Response updated:
This pattern will do the work for you:
(.*\/)[^?]*(?:(\?[^&]*).*)?
Explanation:
(.*\/) -> Matches and captures everything up to and including the last / character (.* is greedy, so it takes as much as it can and backtracks from the end).
[^?]* -> Matches every character that is not a ?.
(?:(\?[^&]*).*)? -> (?: ... ) is a non-capturing group, and the ? after it makes the group optional. Inside it, (\?[^&]*) matches and captures the ? character and every non-& character that follows, and the trailing .* matches everything after the first param in the URL.
Then you can replace the string using only the first and second capture groups.
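As a rough JavaScript illustration of that replacement step (the `urls` array just reuses two of the sample URLs from the question, and `cleaned` is a name picked for this sketch):

```javascript
// Pattern from the answer above: group 1 = path up to the last "/",
// group 2 = the first query parameter, if there is one.
const pattern = /(.*\/)[^?]*(?:(\?[^&]*).*)?/;

const urls = [
  "http://www.example.com/course/df/public/pl_de?Id=454354&timestamp=0628031746",
  "http://www.example.com/trip/tr/public/pl_ds",
];

// "$1$2" keeps only the captured parts; in JavaScript an unmatched group
// is substituted as an empty string.
const cleaned = urls.map((url) => url.replace(pattern, "$1$2"));

console.log(cleaned[0]); // → http://www.example.com/course/df/public/?Id=454354
console.log(cleaned[1]); // → http://www.example.com/trip/tr/public/
```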
Edit 2:
As emsimpson92 mentioned in the comments, the Id might not always be the first param, so you can use this pattern to match the Id param:
(.*\/)[^?]*(?:(\?).*?(Id=[^&]*).*)?
The important part here is that .*?(Id=[^&]*).* matches the Id param no matter its position.
.*? -> Matches all the characters until Id= is found. The trick here is that .* is a greedy quantifier, but when it is followed by ? it becomes a lazy one.
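A small JavaScript sketch of the Edit 2 pattern. The first URL is a made-up variant of the question's data with the parameters swapped, so that Id is no longer the first param; `cleanUrl` is a hypothetical helper name.

```javascript
// Edit 2 pattern: group 1 = path, group 2 = "?", group 3 = the Id param wherever it sits.
const pattern = /(.*\/)[^?]*(?:(\?).*?(Id=[^&]*).*)?/;

// Hypothetical helper that keeps only the captured pieces.
function cleanUrl(url) {
  return url.replace(pattern, "$1$2$3");
}

// Assumed input: same shape as the question's URLs, but with Id as the second param.
console.log(cleanUrl("http://www.example.com/book/rg/public/pl_fo?timestamp=0628031734&Id=4445577"));
// → http://www.example.com/book/rg/public/?Id=4445577
console.log(cleanUrl("http://www.example.com/trip/tr/public/pl_ds"));
// → http://www.example.com/trip/tr/public/
```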

Related

Regex to extract static text and number using only regular expression

I am completely new to regular expressions.
I tried to write a regular expression to get some static text and a phone number from the text below:
"password":"password123:cityaddress:mailaddress:9233321110:gender:45"
I wrote the pattern below to extract this: "password":9233321110
(([\"]password[\"][\s]*:{1}[\s]*))(\d{10})?
regex link for demo:
https://regex101.com/r/2vNpMU/2
The correct regexp should give the full match "password":9233321110 in the regex tool.
I am not using any programming language here; this is for network packet capture at the F5 level.
Please help me with the regexp.

I would use /^([^:]+)(?::[^:]+){3}:([^:]+)/ for this.
Explained (more detailed explanation at regex101):
^ matches from the start of the string
(…) is the first capture group. This will collect that initial "password"
[^:]+ matches one or more non-colon characters
(?:…) is a non-capturing group (it collects nothing for later)
:[^:]+ matches a colon and then 1+ non-colons
{3} instructs us to match the previous item (the non-capturing group) 3 times
: matches a literal colon
([^:]+) captures a match of 1+ non-colons, which will get us 9233321110 in this example
The first capture group is typically stored as $1 or as the first item of the returned array. (In JavaScript, the zeroth item is the full match and the item at index 1 is the first capture group.) The second capture group is $2, etc.
To always match the "password" key, hard-code it: /^("password")(?::[^:]+){3}:([^:]+)/
Here's a live snippet demonstrating it:
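// Sample input from the question
const x = `"password":"password123:cityaddress:mailaddress:9233321110:gender:45"`;
// Group 1 is the key; group 2 is the fourth colon-separated field after it
const match = x.match(/^([^:]+)(?::[^:]+){3}:([^:]+)/);
if (match) console.log(match[1] + ":" + match[2]);
else console.log("no match");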

Regex - optional capture group after wildcard

Say I have the following list:
No 1 And Your Bird Can Sing (4)
No 2 Baby, You're a Rich Man (5)
No 3 Blue Jay Way S
No 4 Everybody's Got Something to Hide Except Me and My Monkey (1)
And I want to extract the number, the title and the number of weeks in the parenthesis if it exists.
Works, but the last group is not optional (regstorm):
No (?<no>\d{1,3}) (?<title>.*?) \((?<weeks>\d)\)
Last group optional, only matches number (regstorm):
No (?<no>\d{1,3}) (?<title>.*?)( \((?<weeks>\d)\))?
Combining a pattern with the week capture and a pattern without it works, but there has to be a better way:
(No (?<no>\d{1,3}) (?<title>.*) \((?<weeks>\d)\))|(No (?<no>\d{1,3}) (?<title>.*))
I use C# and javascript but I guess this is a general regex question.

Your regex is almost there!
First and most importantly, you should add a $ at the end. This forces (?<title>.*?) to keep expanding until the rest of the pattern can match up to the end of the string. Currently, (?<title>.*?) matches an empty string and then stops, because it realises that it has reached a point where the rest of the regex matches. Why does the rest of the regex match? Because the optional group can match the empty string. By adding the $, you are making the rest of the regex "harder" to match.
Secondly, you forgot to match an open parenthesis \(.
This is what your regex should look like:
No (?<no>\d{1,3}) (?<title>.*?)( \((?<weeks>\d)\))?$
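To see the effect of the $ in practice, here is a short JavaScript sketch that runs the pattern with and without the anchor on two lines from the question. The names `withoutAnchor`, `withAnchor` and `parse` are just labels chosen for this example.

```javascript
// Same pattern twice; the only difference is the trailing $.
const withoutAnchor = /No (?<no>\d{1,3}) (?<title>.*?)( \((?<weeks>\d)\))?/;
const withAnchor = /No (?<no>\d{1,3}) (?<title>.*?)( \((?<weeks>\d)\))?$/;

// Hypothetical helper: return the named groups, or null if nothing matched.
function parse(pattern, line) {
  const m = line.match(pattern);
  return m ? { no: m.groups.no, title: m.groups.title, weeks: m.groups.weeks } : null;
}

const lines = ["No 1 And Your Bird Can Sing (4)", "No 3 Blue Jay Way S"];
for (const line of lines) {
  // Without $, the lazy title stops immediately and comes back empty.
  console.log(parse(withoutAnchor, line));
  // With $, the title runs up to the optional " (n)" suffix or the end of the line.
  console.log(parse(withAnchor, line));
}
```

For lines without a week count, `weeks` is `undefined` in JavaScript, so the caller has to check for that.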

You may use this regex with an optional last part:
^No (?<no>\d{1,3}) (?<title>.*?\S)(?: \((?<weeks>\d)\))?$

Another option could be for the title to match either not ( or when it does encounter a ( it should not be followed by a digit and a closing parenthesis.
^No (?<no>\d{1,3}) (?<title>(?:[^(\r\n]+|\((?!\d\)))+)(?:\((?<weeks>\d)\))?
In parts
^No
(?<no>\d{1,3}) Group no and space
(?<title>
(?: Non capturing group
[^(\r\n]+ Match any char except ( or newline
| Or
\((?!\d\)) Match ( if not directly followed by a digit and )
)+ Close group and repeat 1+ times
) Close group title
(?: Non capturing group
\((?<weeks>\d)\) Group weeks between parenthesis
)? Close group and make it optional
If you don't want to trim the last space of the title you could exclude it from matching before the weeks.
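A possible JavaScript sketch of this approach. It keeps the pattern above as written and trims the trailing space in code instead; `parseLine` is a hypothetical helper, and the "(Live)" line is an invented input added only to show a title that itself contains parentheses.

```javascript
// Pattern from the answer above: the title may contain "(" as long as it is
// not directly followed by a digit and ")".
const pattern = /^No (?<no>\d{1,3}) (?<title>(?:[^(\r\n]+|\((?!\d\)))+)(?:\((?<weeks>\d)\))?/;

// Hypothetical helper: parse one line into numbers and a trimmed title.
function parseLine(line) {
  const m = line.match(pattern);
  if (!m) return null;
  const { no, title, weeks } = m.groups;
  return {
    no: Number(no),
    title: title.trimEnd(), // the pattern leaves the space before "(n)" in the title
    weeks: weeks === undefined ? null : Number(weeks),
  };
}

console.log(parseLine("No 2 Baby, You're a Rich Man (5)"));
console.log(parseLine("No 3 Blue Jay Way S"));
console.log(parseLine("No 5 Some Song (Live) (3)")); // invented input
```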

Regex to check only if the group is present

I have a string which may have values like the ones below.
854METHYLDOPA
041ALDOMET /00000101/
133IODETO DE SODIO [I 131]
From these I need to get the text starting from the 4th character up to the point where one of these patterns is found: /00000101/ or [I 131]
Expected Output:
METHYLDOPA
ALDOMET
IODETO DE SODIO
I have tried the RegEx below:
(?:^.{3})(.*)(?:[[/][A-Z0-9\s]+[]/\s+])
This RegEx works if the string contains [ or /, but it doesn't work for case 1, where these patterns don't exist.
I have tried adding ? at the end; that works for case 1 but doesn't work for cases 2 and 3.
Could anyone please help me get the regex to work?

Your logic is difficult to phrase. My interpretation is that you always want to capture from the 4th character onwards. What else gets captured depends on the remainder of the input. Should either /00000101/ or [I 131] occur, then you want to capture up until that point. Otherwise, you want to capture the entire string. Putting this all together yields this regex:
^.{3}(?:(.*)(?=/00000101/|\[I 131\])|(.*))
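Here is one way this could look in JavaScript. Two things are assumptions of this sketch rather than part of the answer: the `/` characters are escaped because they sit inside a regex literal, and the result is trimmed because group 1 still includes the space before the marker. `extractName` is a hypothetical helper name.

```javascript
// Same alternation as above: group 1 when a marker follows, group 2 otherwise.
const pattern = /^.{3}(?:(.*)(?=\/00000101\/|\[I 131\])|(.*))/;

// Hypothetical helper: pick whichever group took part in the match.
function extractName(value) {
  const m = value.match(pattern);
  if (!m) return null;
  return (m[1] ?? m[2]).trim(); // trim() is an addition for this sketch
}

console.log(extractName("854METHYLDOPA"));              // → METHYLDOPA
console.log(extractName("041ALDOMET /00000101/"));      // → ALDOMET
console.log(extractName("133IODETO DE SODIO [I 131]")); // → IODETO DE SODIO
```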

You may try this:
^.{3}(.*?)($|(?:\s*\/00000101\/)|(?:\s*\[I\s+131\])).*$
and replace by this to get the exact output you want.
\1
Explanation:
^ --> start of the string
.{3} --> followed by 3 characters
(.*?) --> followed by anything; the ? makes it lazy, so it consumes only as much as is needed for what follows to match. It is also captured as group 1 --> \1
($|(?:\s*\/00000101\/)|(?:\s*\[I\s+131\])) --> one of the following:
$ --> the end of the string, which means that neither of the patterns you mentioned is present
| or
(?:\s*\/00000101\/) --> one of your patterns, extended with \s* to cover zero or more whitespace characters
| or
(?:\s*\[I\s+131\]) --> your other pattern, extended with \s+, which means 1 or more whitespace characters. ?: indicates that we will not capture it.
.*$ --> .* just matches anything that follows, and $ marks the end of the string.
So the text we want ends up in group 1, and replacing everything with group 1 gives your target output.
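The replace step could be sketched in JavaScript like this. Note that JavaScript writes the group reference in the replacement string as `$1` rather than `\1`; the `inputs` and `names` variables are just names for this example.

```javascript
// Pattern from the answer above, written as a JavaScript regex literal.
const pattern = /^.{3}(.*?)($|(?:\s*\/00000101\/)|(?:\s*\[I\s+131\])).*$/;

// The three sample values from the question.
const inputs = [
  "854METHYLDOPA",
  "041ALDOMET /00000101/",
  "133IODETO DE SODIO [I 131]",
];

// Replace the whole match with group 1 only.
const names = inputs.map((value) => value.replace(pattern, "$1"));

console.log(names); // → [ 'METHYLDOPA', 'ALDOMET', 'IODETO DE SODIO' ]
```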

You could get the values you are looking for in group 1:
^.{3}(.+?)(?=$| ?\[I 131\]| ?\/00000101\/)
Explanation
From the beginning of the string ^
Match the first 3 characters .{3}
Match in a capturing group (where your values will be) any character one or more times non greedy (.+?)
A positive lookahead (?=
To assert what follow is either the end of the string $
or |
an optional space ? followed by [I 131] \[I 131\]
or |
an optional space ? followed by /00000101/ \/00000101\/
If your regex engine supports \K, you could try it like this and the values you are looking for are not in a group but the full match:
^.{3}\K.+?(?=$| ?\[I 131\]| ?\/00000101\/)
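JavaScript's regex engine does not support \K, so a JavaScript sketch would have to use the first variant and read group 1. `extractName` below is a hypothetical helper name.

```javascript
// Group-1 variant from above; the lookahead stops the lazy match
// at the end of the string or just before one of the two markers.
const pattern = /^.{3}(.+?)(?=$| ?\[I 131\]| ?\/00000101\/)/;

// Hypothetical helper: return group 1, or null when nothing matches.
function extractName(value) {
  const m = pattern.exec(value);
  return m ? m[1] : null;
}

console.log(extractName("854METHYLDOPA"));              // → METHYLDOPA
console.log(extractName("041ALDOMET /00000101/"));      // → ALDOMET
console.log(extractName("133IODETO DE SODIO [I 131]")); // → IODETO DE SODIO
```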

Middle-portion regex

I'm trying to write a regex that matches the start and end of a string.
start:
https://www.example.com.au/
end:
-end
Example input/match:
Input                                      IsMatch
https://www.example.com.au/hithere-end     Y
https://www.example.com.au/hi-there-end    Y
https://www.example.com.au/hithere-endx    N
https://www.example.com.au/end             N
This is what I have so far:
^https?://(www\.)?example\.com\.au/[A-z](\-end)$
Any help?
Thanks.

Try this pattern:
^https?:\/\/(?:www\.)?example\.com\.au\/(.+)-end$
Changes from your pattern:
/ are escaped (with \, 3 times).
The first group changed to a non-capturing one (?:).
[A-z] matches only a single character (anything in the ASCII range from A to z, which also includes [ \ ] ^ _ and `). Changed to (.+) (a capturing group).
Removed parentheses from the last group (you don't want to capture it), hence \ is also not needed.
The "middle part" you want to capture is in group 1.
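A quick JavaScript check of this pattern against the four inputs from the question (the `middlePart` helper is a hypothetical name used only here):

```javascript
const pattern = /^https?:\/\/(?:www\.)?example\.com\.au\/(.+)-end$/;

// Hypothetical helper: return the captured middle part, or null if there is no match.
function middlePart(url) {
  const m = url.match(pattern);
  return m ? m[1] : null;
}

console.log(middlePart("https://www.example.com.au/hithere-end"));  // → hithere
console.log(middlePart("https://www.example.com.au/hi-there-end")); // → hi-there
console.log(middlePart("https://www.example.com.au/hithere-endx")); // → null
console.log(middlePart("https://www.example.com.au/end"));          // → null
```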

Check this:
^(https?://(www\.)?example\.com\.au/)[A-Za-z-]*(-end)$
Should work.

Try this C# code:
bool isMatch = Somestring.StartsWith("https://www.example.com.au/") && Somestring.EndsWith("-end");

Regex optional group

I am using this regex:
((?:[a-z][a-z]+))_(\d+)_((?:[a-z][a-z]+)\d+)_(\d{13})
to match strings like this:
SH_6208069141055_BC000388_20110412101855
separating into 4 groups:
SH
6208069141055
BC000388
20110412101855
Question: How do I make the first group optional, so that the resulting group is an empty string?
I want to get 4 groups in every case, when possible.
Input string for this case (no underscore after the first group):
6208069141055_BC000388_20110412101855

To make a non-capturing group optional (matching zero or one time), you must append ?.
(?: ..... )?
^          ^____ optional
|____ group

You can easily simplify your regex to be this:
(?:([a-z]{2,})_)?(\d+)_([a-z]{2,}\d+)_(\d+)$
^              ^^
|--------------||
| first group  ||- quantifier for 0 or 1 time (essentially making it optional)
I'm not sure whether the input string without the first group will have the underscore or not, but you can use the above regex if it's the whole string.
regex101 demo
As you can see, the matched group 1 in the second match is empty and starts at matched group 2.
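For illustration, a small JavaScript sketch that always returns four strings. Two things here are assumptions: the `i` flag (the sample input is upper case while the pattern uses [a-z]) and the `splitCode` helper name. In JavaScript an unmatched optional group comes back as `undefined`, so it is mapped to an empty string explicitly.

```javascript
// Simplified pattern from above; the i flag is an assumption of this sketch.
const pattern = /(?:([a-z]{2,})_)?(\d+)_([a-z]{2,}\d+)_(\d+)$/i;

// Hypothetical helper: always return four strings, using "" for a missing first group.
function splitCode(code) {
  const m = code.match(pattern);
  if (!m) return null;
  return m.slice(1).map((group) => group ?? "");
}

console.log(splitCode("SH_6208069141055_BC000388_20110412101855"));
// → [ 'SH', '6208069141055', 'BC000388', '20110412101855' ]
console.log(splitCode("6208069141055_BC000388_20110412101855"));
// → [ '', '6208069141055', 'BC000388', '20110412101855' ]
```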
