Regular expression to return string from filename - regex

How to write a generic regular expression that will
1) capture string after first _ and before second _ as group 1
2) capture string after last _ as group 2
Example
ASIA_JAP_TOKYO_201109
OUTPUT Would be
group 1 - JAP
group 2 - 201109

You can do:
^[^_]*_([^_]*).*_([^_]*)$
Here first captured group will be "JAP" and second will be "201109".
^[^_]*_ matches up to the first _ from the start.
The first captured group, ([^_]*), captures the string up to the next _.
.*_ greedily matches up to the last _.
([^_]*)$ matches the string after the last _ and puts it in captured group 2.
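As a quick check of the pattern above, here is a minimal Python sketch. The function name extract_fields is hypothetical, and Python is only one possible host language; the regex itself is the one from the answer.

```python
import re

# Pattern from the answer above: group 1 is the field between the first
# and second underscore, group 2 is the field after the last underscore.
PATTERN = re.compile(r"^[^_]*_([^_]*).*_([^_]*)$")


def extract_fields(name):
    """Return (group 1, group 2), or None when the name does not match."""
    match = PATTERN.match(name)
    return match.groups() if match else None


print(extract_fields("ASIA_JAP_TOKYO_201109"))  # ('JAP', '201109')
```

A name with fewer than two underscores does not match, so the sketch returns None for it.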

You can use a regex like:
/^[^_]+_([^_]+).*_([^_]+)$/

For readability, I might use two separate regular expressions for this:
First regex:
^[^_]*_([^_]*?)_(.*)$
Second regex:
^(.*)_([^_]*)$
But if you are using a language such as Java or Perl, I would much rather split the string on the underscore and extract the pieces you want.
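The split-based approach could look roughly like this in Python (a sketch only; the name extract_by_split and the rule that at least two underscores are required are assumptions made here to mirror the regex answers):

```python
def extract_by_split(name):
    """Return (second field, last field) of an underscore-separated name."""
    parts = name.split("_")
    # Require at least two underscores so the result mirrors the regex,
    # which needs a first and a second underscore to match.
    if len(parts) < 3:
        return None
    return parts[1], parts[-1]


print(extract_by_split("ASIA_JAP_TOKYO_201109"))  # ('JAP', '201109')
```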

Related

replaceAll regex to remove last - from the output

I get close to the desired output, but not exactly. I am using replaceAll with a regex; the sample code is below.
final String label = "abcs-xyzed-abc-nyd-request-xyxpt--1-cnaq9";
System.out.println(label.replaceAll(
"([^-]+)-([^-]+)-(.+)-([^-]+)-([^-]+)", "$3"));
I want this output:
abc-nyd-request-xyxpt
but I am getting:
abc-nyd-request-xyxpt-
Here is the code: https://ideone.com/UKnepg
You may use this .replaceFirst solution:
String label = "abcs-xyzed-abc-nyd-request-xyxpt--1-cnaq9";
label.replaceFirst("(?:[^-]*-){2}(.+?)(?:--1)?-[^-]+$", "$1");
//=> "abc-nyd-request-xyxpt"
RegEx Details:
(?:[^-]*-){2}: Match 2 repetitions of a non-hyphenated string (possibly empty) followed by a hyphen
(.+?): Lazily match 1+ of any characters and capture in group #1
(?:--1)?: Match optional --1
-: Match a -
[^-]+: Match a non-hyphenated string
$: End
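For readers who want to try the same pattern outside Java, here is a minimal Python sketch. It assumes that re.sub with count=1 is an acceptable stand-in for Java's replaceFirst; the function name strip_affixes is hypothetical.

```python
import re

# Same pattern as the replaceFirst answer above.
PATTERN = re.compile(r"(?:[^-]*-){2}(.+?)(?:--1)?-[^-]+$")


def strip_affixes(label):
    """Replace the first match with capture group 1 (like replaceFirst)."""
    return PATTERN.sub(r"\1", label, count=1)


label = "abcs-xyzed-abc-nyd-request-xyxpt--1-cnaq9"
print(strip_affixes(label))  # abc-nyd-request-xyxpt
```

Because (?:--1)? is optional, a label without the --1 part keeps only the fields between the first two and the last one.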
The following works for your example case
([^-]+)-([^-]+)-(.+[^-])-+([^-]+)-([^-]+)
https://regex101.com/r/VNtryN/1
We don't want to capture any trailing -, so group 3 must end in a non-hyphen character ([^-]); the -+ that follows accepts one or more hyphens, which is what lets it match the double --.
With your shown samples and attempts, please try the following regex. It creates 1 capturing group, which can be used in the replacement: use $1 as the replacement in your function.
^(?:.*?-){2}([^-]*(?:-[^-]*){3})--.*
Explanation of the regex above:
^(?:.*?-){2} ##From the start of the value, a non-capturing group lazily matches up to the nearest -, repeated 2 times.
([^-]*(?:-[^-]*){3}) ##The first and only capturing group: a run of non-hyphen characters, followed by 3 repetitions of a - and another run of non-hyphen characters. This is the required output.
--.* ##Matches -- and everything after it to the end of the value.
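A small Python sketch of this last pattern, under the assumption that the wanted part always has exactly four hyphen-separated fields and is followed by a double hyphen (the function name keep_middle is hypothetical):

```python
import re

# Pattern from the answer above; it only matches when a "--" follows
# the four hyphen-separated fields captured in group 1.
PATTERN = re.compile(r"^(?:.*?-){2}([^-]*(?:-[^-]*){3})--.*")


def keep_middle(label):
    """Return group 1 of the match; leave non-matching labels unchanged."""
    return PATTERN.sub(r"\1", label)


print(keep_middle("abcs-xyzed-abc-nyd-request-xyxpt--1-cnaq9"))
# abc-nyd-request-xyxpt
```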

Regex to extract static text and number using only regular expression

I am completely new to regular expressions.
I tried to write a regular expression to get some static text and a phone number from the text below:
"password":"password123:cityaddress:mailaddress:9233321110:gender:45"
I wrote the following to extract "password":9233321110 :
(([\"]password[\"][\s]*:{1}[\s]*))(\d{10})?
regex link for demo:
https://regex101.com/r/2vNpMU/2
The correct regexp should give the full match "password":9233321110 in the regex tool.
I am not using any programming language here; this is for network packet capture at the F5 level.
Please help me with the regexp.
I would use /^([^:]+)(?::[^:]+){3}:([^:]+)/ for this.
Explained (more detailed explanation at regex101):
^ matches from the start of the string
(…) is the first capture group. This will collect that initial "password"
[^:]+ matches one or more non-colon characters
(?:…) is a non-capturing group (it collects nothing for later)
:[^:]+ matches a colon and then 1+ non-colons
{3} instructs us to match the previous item (the non-capturing group) 3 times
: matches a literal colon
([^:]+) captures a match of 1+ non-colons, which will get us 9233321110 in this example
The first capture group is typically stored as $1 or as the first item of the returned array. (In JavaScript, the zeroth item is the full match and the item at index 1 is the first capture group.) The second capture group is $2, etc.
To always match the "password" key, hard-code it: /^("password")(?::[^:]+){3}:([^:]+)/
Here's a snippet demonstrating it:
const x = `"password":"password123:cityaddress:mailaddress:9233321110:gender:45"`;
const match = x.match(/^([^:]+)(?::[^:]+){3}:([^:]+)/);
// Join the two capture groups with a colon.
if (match) console.log(match[1] + ":" + match[2]);
else console.log("no match");
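The hard-coded "password" variant mentioned above can be checked the same way. This is a hedged Python sketch rather than anything F5-specific; the function name key_and_number is hypothetical.

```python
import re

# Variant from the answer that hard-codes the "password" key.
PATTERN = re.compile(r'^("password")(?::[^:]+){3}:([^:]+)')


def key_and_number(text):
    """Return key:number built from groups 1 and 2, or None if no match."""
    match = PATTERN.match(text)
    return f"{match.group(1)}:{match.group(2)}" if match else None


sample = '"password":"password123:cityaddress:mailaddress:9233321110:gender:45"'
print(key_and_number(sample))  # "password":9233321110
```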

Can a regular expression assert that two submatches are equal?

Let's say, for this simple regexp,
(?P<first>\d+)\.(?P<second>\d+)
it can match strings like "123.456" so that,
first -> 123, second -> 456
Based on this example, is there a way to assert "first" should equal "second", otherwise the input string won't be a match?
You could capture the first digits before the dot in a capturing group and use a backreference after the dot to group 1:
(?P<first>\d+)\.(?P<second>\1)
Or you can refer to the first capturing group by name:
(?P<first>\d+)\.(?P<second>(?P=first))
As per comment from UnbearableLightness you could use word boundaries \b or use anchors ^ and $ to assert the start and the end of the line.
\b(?P<first>\d+)\.(?P<second>(?P=first))\b
You can use a backreference to capture group one, with this expression:
^(?P<first>\d+)\.(?P<second>\1)$
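The (?P<name>...) and (?P=name) syntax used in these answers is what Python's re module accepts, so the anchored named-backreference variant can be sketched as follows (the function name halves_equal is hypothetical):

```python
import re

# Anchored pattern: the digits after the dot must repeat the digits
# captured before it, via the named backreference (?P=first).
PATTERN = re.compile(r"^(?P<first>\d+)\.(?P<second>(?P=first))$")


def halves_equal(text):
    """Return True when both sides of the dot are the same digit string."""
    return PATTERN.match(text) is not None


print(halves_equal("123.123"))   # True
print(halves_equal("123.456"))   # False
print(halves_equal("123.1234"))  # False, because of the $ anchor
```

Without the anchors, an input such as 123.1234 would still contain a match (123.123), which is why the anchors or word boundaries matter.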

Middle-portion regex

I'm trying to write a regex that matches the start and the end of a string.
start:
https://www.example.com.au/
end:
-end
Example input/match:
Input                                       IsMatch
https://www.example.com.au/hithere-end      Y
https://www.example.com.au/hi-there-end     Y
https://www.example.com.au/hithere-endx     N
https://www.example.com.au/end              N
This is what I have so far:
^https?://(www\.)?example\.com\.au/[A-z](\-end)$
Any help?
Thanks.
Try this pattern:
^https?:\/\/(?:www\.)?example\.com\.au\/(.+)-end$
Changes from your pattern:
The / characters are escaped with \ (3 times).
The first group is changed to a non-capturing one, (?:).
[A-z] matches only a single character, and its range also includes the punctuation between Z and a ([, \, ], ^, _ and `). It is changed to (.+) (a capturing group).
The parentheses are removed from the last group (you don't want to capture it), hence \ is also not needed.
The "middle part" you want to capture is in group 1.
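To check this pattern against the four example inputs, here is a minimal Python sketch. The function name middle_part is hypothetical, and the / characters are left unescaped because Python raw strings do not need them escaped.

```python
import re

# Same pattern as above, without the \/ escapes.
PATTERN = re.compile(r"^https?://(?:www\.)?example\.com\.au/(.+)-end$")


def middle_part(url):
    """Return the captured middle part (group 1), or None if no match."""
    match = PATTERN.match(url)
    return match.group(1) if match else None


for url in (
    "https://www.example.com.au/hithere-end",
    "https://www.example.com.au/hi-there-end",
    "https://www.example.com.au/hithere-endx",
    "https://www.example.com.au/end",
):
    print(url, "->", middle_part(url))
```

The first two inputs yield hithere and hi-there; the last two yield None.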
Check this:
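^(https?://(www\.)?example\.com\.au/)[A-Za-z-]+(-end)$
This should work for the examples shown. Note that the hyphen has to be part of the character class; [A-z] alone would not match hi-there.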
Try this C# code:
bool isMatch = Somestring.StartsWith("https://www.example.com.au/")
    && Somestring.EndsWith("-end");

Regex optional group

I am using this regex:
((?:[a-z][a-z]+))_(\d+)_((?:[a-z][a-z]+)\d+)_(\d{13})
to match strings like this:
SH_6208069141055_BC000388_20110412101855
separating into 4 groups:
SH
6208069141055
BC000388
20110412101855
Question: How do I make the first group optional, so that the resulting group is an empty string?
I want to get 4 groups in every case, when possible.
Input string for this case (no underscore after the first group):
6208069141055_BC000388_20110412101855
To make a non-capturing group optional (matched zero or one time), append ? to it.
(?: ..... )?
^          ^____ optional
|____ group
You can easily simplify your regex to be this:
(?:([a-z]{2,})_)?(\d+)_([a-z]{2,}\d+)_(\d+)$
^              ^^
|--------------||
| first group  ||- quantifier for 0 or 1 time (essentially making it optional)
I'm not sure whether the input string without the first group will have the underscore or not, but you can use the above regex if it's the whole string.
In the second match, group 1 is empty and the match starts at group 2.
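A hedged Python sketch of the simplified regex on both inputs follows. Two things are assumptions added here and not part of the answer: the re.IGNORECASE flag (the sample strings use uppercase letters while the pattern uses [a-z]) and the function name four_groups. In Python an optional group that does not take part in the match is None, so groups(default="") is used to get an empty string.

```python
import re

# Regex from the answer above; re.IGNORECASE is an assumption added here
# because the sample inputs use uppercase letters.
PATTERN = re.compile(
    r"(?:([a-z]{2,})_)?(\d+)_([a-z]{2,}\d+)_(\d+)$", re.IGNORECASE
)


def four_groups(text):
    """Return the four groups, with "" for an absent optional group."""
    match = PATTERN.search(text)
    # groups(default="") substitutes "" for groups that did not participate.
    return match.groups(default="") if match else None


print(four_groups("SH_6208069141055_BC000388_20110412101855"))
print(four_groups("6208069141055_BC000388_20110412101855"))
```

Both calls return a 4-tuple; in the second one the first element is the empty string.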