Regex greedy to pull only required information - regex

I have the following scenario:
CF-123/NAME-ANUBHAV/RT-INR 450/SI-No smoking/SC-123
The regex should be compatible with Java, and it needs to be done in one statement.
I have to pick certain pieces of information from this string, which are prefixed with predefined tags, and put them in named groups:
(CF-) confirmationNumber = 123
(Name-) name = ANUBHAV
(RT-) rate = INR 450
(SI-) specialInformation = No smoking
(SC-) serviceCode = 123
I have written the regex below:
^(CF-(?<confirmationNumber>.*?)(\/|$))?(([^\s]+)(\/|$))?(NAME-(?<name>.*?)(\/|$))?([^\s]+(\/|$))?(RT-(?<rate>.*?)(\/|$))?([^\s]+(\/|$))?(SI-(?<specialInformation>.*?)(\/|$))?([^\s]+(\/|$))?(SC-(?<serviceCode>.*)(\/|$))?
There can be several scenarios:
**1st:** CF-123/**Ignore**/NAME-ANUBHAV/RT-INR 450/SI-No smoking/SC-123
**2nd:** CF-123//NAME-ANUBHAV/RT-INR 450/SI-No smoking/SC-123
**3rd:** CF-123/NAME-ANUBHAV/RT-INR 450/**Ignore**/SI-No smoking/SC-123
There can be other tags in the string, separated by /, which we don't need to capture in our named groups.
Basically, we need to pick CF-, NAME-, RT-, SI- and SC- and assign them to confirmationNumber, name, rate, specialInformation and serviceCode. Anything else that appears in the string need not be captured.

To find the five pieces of information you are interested in, you can use a pattern with named groups and compile it with java.util.regex.Pattern.
Then you can use a java.util.regex.Matcher to find the groups:
String line = "CF-123/**Ignore**/NAME-ANUBHAV/RT-INR 450/SI-No smoking/SC-123";
String pattern = "CF-(?<confirmationNumber>[^/]+).*NAME-(?<name>[^/]+).*RT-(?<rate>[^/]+).*SI-(?<specialInformation>[^/]+).*SC-(?<serviceCode>[^/]+).*";
// Create a Pattern object
Pattern r = Pattern.compile(pattern);
// Now create matcher object.
Matcher m = r.matcher(line);
After that, you can work with the matched groups:
if (m.find( )) {
String confirmationNumber = m.group("confirmationNumber");
String name = m.group("name");
String rate = m.group("rate");
String specialInformation = m.group("specialInformation");
String serviceCode = m.group("serviceCode");
// continue with your processing
} else {
System.out.println("NO MATCH");
}
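As a rough, self-contained sketch of the approach above, the pattern can be wrapped in a small helper and run against the three scenarios from the question. The class name TagExtractor and the method extract are made up for illustration. Note one assumption baked into this pattern: all five tags must be present and in this order, otherwise nothing matches and the helper returns an empty map.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class TagExtractor {
    // Same pattern as in the answer above, compiled once and reused.
    private static final Pattern TAGS = Pattern.compile(
        "CF-(?<confirmationNumber>[^/]+).*NAME-(?<name>[^/]+).*RT-(?<rate>[^/]+)"
        + ".*SI-(?<specialInformation>[^/]+).*SC-(?<serviceCode>[^/]+).*");

    private static final String[] GROUPS = {
        "confirmationNumber", "name", "rate", "specialInformation", "serviceCode"
    };

    // Returns the five named groups, or an empty map if the line does not match.
    static Map<String, String> extract(String line) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = TAGS.matcher(line);
        if (m.find()) {
            for (String group : GROUPS) {
                result.put(group, m.group(group));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] samples = {
            "CF-123/**Ignore**/NAME-ANUBHAV/RT-INR 450/SI-No smoking/SC-123",
            "CF-123//NAME-ANUBHAV/RT-INR 450/SI-No smoking/SC-123",
            "CF-123/NAME-ANUBHAV/RT-INR 450/**Ignore**/SI-No smoking/SC-123"
        };
        for (String sample : samples) {
            System.out.println(extract(sample));
        }
    }
}
```

If some tags can be missing entirely, this single pattern is not enough, and splitting on / and checking each prefix may be easier to maintain.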

Related

.NET Core - regex matches whole string instead of group [duplicate]

I tested my regex on regex101.com, it returns 3 groups
text :
<CloseResponse>SESSION_ID</CloseResponse>
regex :
(<.*>)([\s\S]*?)(<\/.*>)
In C#, I get only one match and one group, which contains the whole string instead of just the SESSION_ID.
I expect the code to return only SESSION_ID.
I tried to find a global option, but there doesn't seem to be one.
Here is my code:
Regex rg = new Regex(@"<.*>([\s\S]*?)<\/.*>");
MatchCollection matches = rg.Matches(tag);
if (matches.Count > 0) ////////////////////////////////// only one match
{
if (matches[0].Groups.Count > 0)
{
Group g = matches[0].Groups[0];
return g.Value; //////////////////// = <CloseResponse>SESSION_ID</CloseResponse>
}
}
return null;
Thanks for helping me with this.
I managed to make it work this way:
string input = "<OpenResult>SESSION_ID</OpenResult>";
// ... Use named group in regular expression.
Regex expression = new Regex(@"(<.*>)(?<middle>[\s\S]*)(<\/.*>)");
// ... See if we matched.
Match match = expression.Match(input);
if (match.Success)
{
// ... Get group by name.
string result = match.Groups["middle"].Value;
Console.WriteLine("Middle: {0}", result);
}
// Done.
Console.ReadLine();
Use non-capturing groups, (?:...), if you only want the whole string as the result:
(?:<.*>)(?:[\s\S]*?)(?:<\/.*>)
If you just want to capture session id use this:
(?:<.*>)([\s\S]*?)(?:<\/.*>)

regex for validating input C 200 50

How do I write a regex for the input below?
C 200 50
C/c can be upper case or lower case.
200 - 0 to 200 range
50 - 0 to 50 range
All three words are separated by spaces, and there can be one or more spaces between them.
This is what I have tried so far:
public static void main(String[] args) {
String input = "C 200 50";
String regex = "C{1} ([01]?[0-9]?[0-9]|2[0-9][0]|20[0]) ([01]?[0-5]|[0-5][0])";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(input);
boolean found = false;
while (matcher.find()) {
System.out.println("I found the text "+matcher.group()+" starting at index "+
matcher.start()+" and ending at index "+matcher.end());
found = true;
}
}
I'm not sure how to allow multiple spaces, or an upper- or lower-case first 'C'.
If you are validating a string, you must be expecting a whole string match. It means you should use .matches() rather than .find() method as .matches() requires a full string match.
To make c match both c and C you may use Pattern.CASE_INSENSITIVE flag with Pattern.compile, or prepend the pattern with (?i) embedded flag option.
To match one or more spaces, one would use ' +' (a space followed by +) or \\s+.
To match leading zeros, you may prepend the number matching parts with 0*.
Hence, you may use
String regex = "(?i)C\\s+0*(\\d{1,2}|1\\d{2}|200)\\s+0*([1-4]?\\d|50)";
For example, in Java:
String input = "C 200 50";
String regex = "(?i)C +0*(\\d{1,2}|1\\d{2}|200) +0*([1-4]?\\d|50)";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(input);
boolean found = false;
if (matcher.matches()) {
System.out.println("I found the text "+matcher.group()+" starting at index "+
matcher.start()+" and ending at index "+matcher.end());
found = true;
}
Output:
I found the text C 200 50 starting at index 0 and ending at index 8
If you need a partial match, use the pattern with .find() method in a while block. To match whole words, wrap the pattern with \\b:
String regex = "(?i)\\bC\\s+0*(\\d{1,2}|1\\d{2}|200)\\s+0*([1-4]?\\d|50)\\b";
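To see where the range limits fall, here is a small sketch that wraps the \\s+ variant of the pattern in a validator and tries a few boundary inputs. The names CommandValidator and isValid are invented for this example, and the sample inputs are my own.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class CommandValidator {
    // (?i) makes the leading C case-insensitive; 0* tolerates leading zeros.
    private static final Pattern COMMAND = Pattern.compile(
        "(?i)C\\s+0*(\\d{1,2}|1\\d{2}|200)\\s+0*([1-4]?\\d|50)");

    // matches() requires the whole input to fit the pattern.
    static boolean isValid(String input) {
        return COMMAND.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "C 200 50", "c   0 0", "C 007 050", "C 201 50", "C 200 51", "C200 50"
        };
        for (String sample : samples) {
            System.out.println("\"" + sample + "\" -> " + isValid(sample));
        }
    }
}
```

The first three samples should be accepted; the last three should be rejected (201 and 51 are out of range, and C200 has no space after the letter).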

how to get a number between two characters?

I have this string:
String values="[52,52,73,52],[23,32],[40]";
How to only get the number 40?
I tried the pattern "\\[^[0-9]*$\\]", but had no luck.
Can someone provide me with the appropriate pattern?
There is no need to use ^.
The correct regex here is \\[([0-9]+)\\]$
If you are sure that only one pair of brackets contains a single number, this simpler regex would do:
\\[(\\d+)\\]
You could update your pattern to use a capturing group and a quantifier + after the character class, and omit the ^ anchor, which asserts the start of the string.
Move the end-of-string anchor $ to the end of the pattern:
\\[([0-9]+)\\]$
   ^     ^^
For example:
String regex = "\\[([0-9]+)\\]$";
String string = "[52,52,73,52],[23,32],[40]";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(string);
if(matcher.find()) {
System.out.println(matcher.group(1)); // 40
}
Given that you appear to be using Java, I recommend taking advantage of String#split here:
String values = "[52,52,73,52],[23,32],[40]";
String[] parts = values.split("(?<=\\]),(?=\\[)");
String[][] contents = new String[parts.length][];
for (int i=0; i < parts.length; ++i) {
contents[i] = parts[i].replaceAll("[\\[\\]]", "").split(",");
}
// now access any element at any position, e.g.
String forty = contents[2][0];
System.out.println(forty);
What the above snippet generates is a jagged 2D Java String array, where the first index corresponds to the array in the initial CSV, and the second index corresponds to the element inside that array.
Why not just use String.substring, if you need the content between the last [ and the last ]?
String values = "[52,52,73,52],[23,32],[40]";
String wanted = values.substring(values.lastIndexOf('[')+1, values.lastIndexOf(']'));
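One caveat with the substring approach: lastIndexOf returns -1 when the character is missing, and substring then throws a StringIndexOutOfBoundsException. A guarded version might look like the sketch below; the class LastBracket and the method lastBracketContent are names I made up for illustration.

```java
import java.util.Optional;

// Hypothetical helper, not part of any library.
class LastBracket {
    // Returns the text between the last '[' and the last ']',
    // or Optional.empty() if the brackets are missing or out of order.
    static Optional<String> lastBracketContent(String values) {
        int open = values.lastIndexOf('[');
        int close = values.lastIndexOf(']');
        if (open < 0 || close < open) {
            return Optional.empty();
        }
        return Optional.of(values.substring(open + 1, close));
    }

    public static void main(String[] args) {
        System.out.println(lastBracketContent("[52,52,73,52],[23,32],[40]"));
        System.out.println(lastBracketContent("no brackets here"));
    }
}
```

Unlike the regex, this does not check that the content is a number, so it only fits if the input format is already trusted.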

Cannot retrieve a group from Scala Regex match

I am struggling with regexps in Scala (2.11.5). I have the following string to parse (example):
val string = "http://sth.com/sth/56,57597,14058913,Article_title,,5.html"
I want to extract the third numeric value in the string above (it needs to be the third after a slash, because other groups can follow). To do that, I have the following regex pattern:
val pattern = """\/\d+,\d+,(\d+)""".r
I have been trying to retrieve the group for the third sequence of digits, but nothing seems to work for me.
val matchList = pattern.findAllMatchIn(string).foreach(println)
val matchListb = pattern.findAllIn(string).foreach(println)
I also tried using pattern matching:
string match {
case pattern(a) => println(a)
case _ => "What's going on?"
}
and got the same results. Either the whole regexp match is returned or nothing.
Is there an easy way to retrieve a group from a regexp pattern in Scala?
You can use the group method of scala.util.matching.Regex.Match to get the result.
val string = "http://sth.com/sth/56,57597,14058913,Article_title,,5.html"
val pattern = """\/\d+,\d+,(\d+)""".r
val result = pattern.findAllMatchIn(string) // returns iterator of Match
.toArray
.headOption // returns None if match fails
.map(_.group(1)) // select first regex group
// or simply
val result = pattern.findFirstMatchIn(string).map(_.group(1))
// result = Some(14058913)
// result will be None if the string does not match the pattern.
// if you have more than one group, for instance:
// val pattern = """\/(\d+),\d+,(\d+)""".r
// result will be Some(56)
Pattern matching is usually the easiest way to do it, but it requires a match on the full string, so you'll have to prefix and suffix your regex pattern with .*:
val string = "http://sth.com/sth/56,57597,14058913,Article_title,,5.html"
val pattern = """.*\/\d+,\d+,(\d+).*""".r
val pattern(x) = string
// x: String = 14058913

Regular expression to extract n-values from a string

Can regex extract the values embedded within a string, as identified by a variable template defined earlier within the same string? Or is this better handled in Java?
For example: "2012 Ferrari [F12] - Ostrich Leather interior [F12#OL] - Candy Red Metallic [F12#3]" The variable template is the first string encountered with square brackets, e.g. [F12], and the desired variables are found within subsequent instances of that template, e.g. 'OL' and '3'.
Since you are mentioning Java, I'll assume you are using the Java implementation, Pattern.
Java's Pattern supports so-called back references, which can be used to match the same value that a previous capturing group matched.
Unfortunately you cannot extract multiple values from a single capturing group, so you'll have to hardcode the number of templates you want to match, if you want to do this with a single pattern.
For one variable, it could look like this:
\[(.*?)\].*?\[\1#(.*?)\]
  ^^^^^          ^^^^^ variable
  template    ^^ back reference to whatever template matched
You can add more optional matches by wrapping them in optional non-capturing groups like this:
\[(.*?)\].*?\[\1#(.*?)\](?:.*?\[\1#(.*?)\])?(?:.*?\[\1#(.*?)\])?
                        ^ optional group    ^ another one
This would match up to three variables:
String s = "2012 Ferrari [F12] - Ostrich Leather interior [F12#OL] - Candy Red Metallic [F12#3]";
String pattern = "\\[(.*?)\\].*?\\[\\1#(.*?)\\](?:.*?\\[\\1#(.*?)\\])?(?:.*?\\[\\1#(.*?)\\])?";
Matcher matcher = Pattern.compile(pattern).matcher(s);
if (matcher.find()) {
for (int i = 1; i <= matcher.groupCount(); i++) {
System.out.println(matcher.group(i));
}
}
// prints F12, OL, 3, null
If you need to match any number of variables, however, you'll have to resort to extracting the template in a first pass and then embedding it in a second pattern:
// compile once and store in a static variable
Pattern templatePattern = Pattern.compile("\\[(.*?)\\]");
String s = "2012 Ferrari [F12] - Ostrich Leather interior [F12#OL] - Candy Red Metallic [F12#3]";
Matcher templateMatcher = templatePattern.matcher(s);
if (!templateMatcher.find()) {
return;
}
String template = templateMatcher.group(1);
Pattern variablePattern = Pattern.compile("\\[" + Pattern.quote(template) + "#(.*?)\\]");
Matcher variableMatcher = variablePattern.matcher(s);
while (variableMatcher.find()) {
System.out.println(variableMatcher.group(1));
}
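If the values are needed as a collection rather than printed, the same two-pass idea can be wrapped in a method, roughly like this. TemplateVariables and extract are hypothetical names; the logic is the same as in the snippet above, and Pattern.quote keeps regex metacharacters in the template (such as a dot) from being interpreted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class TemplateVariables {
    // First pass: the first bracketed token is taken as the template.
    private static final Pattern TEMPLATE = Pattern.compile("\\[(.*?)\\]");

    // Returns every value found as [template#value]; empty if there is no template.
    static List<String> extract(String s) {
        List<String> variables = new ArrayList<>();
        Matcher templateMatcher = TEMPLATE.matcher(s);
        if (!templateMatcher.find()) {
            return variables;
        }
        String template = templateMatcher.group(1);
        // Second pass: embed the quoted template in a new pattern.
        Pattern variablePattern =
            Pattern.compile("\\[" + Pattern.quote(template) + "#(.*?)\\]");
        Matcher variableMatcher = variablePattern.matcher(s);
        while (variableMatcher.find()) {
            variables.add(variableMatcher.group(1));
        }
        return variables;
    }

    public static void main(String[] args) {
        String s = "2012 Ferrari [F12] - Ostrich Leather interior [F12#OL]"
            + " - Candy Red Metallic [F12#3]";
        System.out.println(extract(s));
    }
}
```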