How to use \R in Java 8 regex [duplicate]

This question already has answers here:
Difference between matches() and find() in Java Regex
(5 answers)
Closed 5 years ago.
I am trying to use the new \R regex matcher from Java 8.
However, with the following code:
public static void main(String[] args)
{
String s = "This is \r\n a String with \n Different Newlines \r and other things.";
System.out.println(s);
System.out.println(Pattern.matches("\\R", s));
if (Pattern.matches("\\R", s)) // <-- is always false
{
System.out.println("Matched");
}
System.out.println(s.replaceAll("\\R", "<br/>")); // This is <br/> a String with <br/> Different Newlines <br/> and other things.
}
Pattern.matches always returns false, whereas the replaceAll method does find the line breaks and does what I want. How do I make Pattern.matches work?
I have also tried the long way round and still can't get it to work:
Pattern p = Pattern.compile("\\R");
Matcher m = p.matcher(s);
boolean b = m.matches();
System.out.println(b);

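matches (in the String, Pattern and Matcher classes alike) attempts to match the complete input string, so it only returns true when the whole input is a single line break.
You need to use Matcher.find instead, which looks for a match anywhere in the input: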
Pattern p = Pattern.compile("\\R");
Matcher m = p.matcher(s);
boolean b = m.find();
System.out.println(b);
From Java docs:
\R Matches any Unicode line-break sequence, that is equivalent to \u000D\u000A|[\u000A\u000B\u000C\u000D\u0085\u2028\u2029]
PS: If you just want to know whether the input contains a line break, this one-liner will work for you:
boolean b = s.matches("(?s).*?\\R.*");
Note the use of .* on either side of \R so that the pattern covers the complete input. You also need (?s) to enable DOTALL mode; without it, . does not match line terminators and .* cannot span a multiline string.
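To make the difference concrete, here is a minimal sketch that puts matches() and find() side by side. The class and method names (LineBreakCheck, isOnlyLineBreak, containsLineBreak) are made up for illustration and are not part of any API; only Pattern and Matcher come from java.util.regex.

```java
import java.util.regex.Pattern;

public class LineBreakCheck {
    // Compiled once and reused; \R is available from Java 8 onwards.
    private static final Pattern LINE_BREAK = Pattern.compile("\\R");

    // Hypothetical helper: true only if the WHOLE input is one line-break sequence.
    static boolean isOnlyLineBreak(String s) {
        return LINE_BREAK.matcher(s).matches();
    }

    // Hypothetical helper: true if a line break occurs anywhere in the input.
    static boolean containsLineBreak(String s) {
        return LINE_BREAK.matcher(s).find();
    }

    public static void main(String[] args) {
        String s = "This is \r\n a String with \n Different Newlines \r and other things.";
        System.out.println(isOnlyLineBreak(s));      // false: the input is more than a line break
        System.out.println(containsLineBreak(s));    // true: a line break occurs somewhere
        System.out.println(isOnlyLineBreak("\r\n")); // true: the input is exactly one line break
    }
}
```

If the check runs often, keeping the compiled Pattern in a constant as above avoids recompiling the regex on every call, which is what String.matches and Pattern.matches do internally.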

Related

Capturing a delimiter that isn't in between single quotes

Like the question says, is it possible to use a single Regex string to get a delimiter that isn't in between some quotes?
For example, I want to split this string with the delimiter &:
"example=3&testing='f&tmp'"
should produce
["example=3", "testing='f&tmp'"]
Essentially, things inside single quotes (' ') should remain untouched.
I found out how to get things within quotes with expression: (?:'.*?')
The closest I could get to a tangible solution was: (.[^']&[^'])
This is not an easy task for String#split, but it is quite feasible with Matcher#find if you use
[^&\s=]+=(?:'[^']*'|[^\s&]*)
and this Java code:
String text = "example=3&testing='f&tmp'";
Pattern p = Pattern.compile("[^&\\s=]+=(?:'[^']*'|[^\\s&]*)");
Matcher m = p.matcher(text);
List<String> res = new ArrayList<>();
while(m.find()) {
res.add(m.group());
}
System.out.println(res);
// => [example=3, testing='f&tmp']
Details
[^&\s=]+ - one or more chars other than &, = and whitespace
= - a = char
(?:'[^']*'|[^\s&]*) - a non-capturing group matching either ', zero or more chars other than ' and then a ', or zero or more chars other than whitespace and &.

How to return/print matches on a string in RegEx in Flutter/Dart? [duplicate]

This question already has an answer here:
How to put all regex matches into a string list
(1 answer)
Closed 1 year ago.
I want to return the text matched by a RegExp in Flutter every time it is found. I tested the regex on the same string in an online tester, where it matched everything from 'text:' up to the '}' character, but the Flutter application does not print the matches.
The code I am using:
String myString = '{boundingBox: 150,39,48,25, text: PM},';
RegExp exp = RegExp(r"text:(.+?(?=}))");
print("allMatches : "+exp.allMatches(myString).toString());
The output print statement is printing I/flutter ( 5287): allMatches : (Instance of '_RegExpMatch', Instance of '_RegExpMatch')
instead of text: PM
Instead of using a non greedy match with a lookahead, I would suggest using a negated character class matching any char except } in capture group 1, and match the } after the group to prevent some backtracking.
\b(text:[^}]+)}
You can loop the result from allMatches and print group 1:
String myString = '{boundingBox: 150,39,48,25, text: PM},';
RegExp exp = RegExp(r"\b(text:[^}]+)}");
for (var m in exp.allMatches(myString)) {
print(m[1]);
}
Output
text: PM
You need to use the map method to retrieve the matched strings from the matches:
String myString = '{boundingBox: 150,39,48,25, text: PM},';
RegExp exp = RegExp(r"text:(.+?(?=}))");
final matches = exp.allMatches(myString).map((m) => m.group(0)).toString();
print("allMatches : $matches");

.NET Core - regex matches whole string instead of group [duplicate]

This question already has answers here:
Returning only part of match from Regular Expression
(4 answers)
Closed 2 years ago.
I tested my regex on regex101.com, it returns 3 groups
text :
<CloseResponse>SESSION_ID</CloseResponse>
regex :
(<.*>)([\s\S]*?)(<\/.*>)
In C#, I get only one match and one group that contains the whole string instead of just the SESSION_ID.
I expect the code to return only SESSION_ID.
I tried finding a global option, but there doesn't seem to be one.
Here is my code:
Regex rg = new Regex(@"<.*>([\s\S]*?)<\/.*>");
MatchCollection matches = rg.Matches(tag);
if (matches.Count > 0) ////////////////////////////////// only one match
{
if (matches[0].Groups.Count > 0)
{
Group g = matches[0].Groups[0];
return g.Value; //////////////////// = <CloseResponse>SESSION_ID</CloseResponse>
}
}
return null;
thanks for helping me on this
I managed to make it work this way
string input = "<OpenResult>SESSION_ID</OpenResult>";
// ... Use named group in regular expression.
Regex expression = new Regex(@"(<.*>)(?<middle>[\s\S]*)(<\/.*>)");
// ... See if we matched.
Match match = expression.Match(input);
if (match.Success)
{
// ... Get group by name.
string result = match.Groups["middle"].Value;
Console.WriteLine("Middle: {0}", result);
}
// Done.
Console.ReadLine();
Use non-capturing groups, written (?:...), if you want only the whole string as the result:
(?:<.*>)(?:[\s\S]*?)(?:<\/.*>)
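If you just want to capture the session ID, keep a single capturing group around it and read group 1 rather than group 0 (group 0 is always the whole match):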
(?:<.*>)([\s\S]*?)(?:<\/.*>)
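The group numbering works the same way in most regex engines. As a rough sketch in Java (used here for consistency with the other examples on this page, not because the question is about Java), the helper name innerText is hypothetical; in .NET the corresponding value would be read from match.Groups[1].Value instead of Groups[0].

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    // Same pattern as above, with one capturing group around the inner text.
    private static final Pattern TAG = Pattern.compile("(?:<.*>)([\\s\\S]*?)(?:</.*>)");

    // Hypothetical helper: returns the text between the tags, or null if nothing matches.
    static String innerText(String tag) {
        Matcher m = TAG.matcher(tag);
        if (!m.find()) {
            return null;
        }
        // m.group(0) would be the whole match; m.group(1) is the first capturing group.
        return m.group(1);
    }

    public static void main(String[] args) {
        System.out.println(innerText("<CloseResponse>SESSION_ID</CloseResponse>"));
    }
}
```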

Match longest substring with regex [duplicate]

I tried looking for an answer to this question but couldn't find anything, and I hope there's an easy solution. I am using the following code in C#:
String pattern = ("(hello|hello world)");
Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);
var matches = regex.Matches("hello world");
Is there a way for the Matches method to return the longest pattern first? In this case, I want to get "hello world" as my match rather than just "hello". This is only an example; my real pattern list consists of a fairly large number of words.
If you already know the lengths of the words beforehand, then put the longest first. For example:
String pattern = ("(hello world|hello)");
The longest will be matched first. If you don't know the lengths beforehand, this isn't possible.
An alternative approach would be to store all the matches in an array/hash/list and pick the longest one manually, using the language's built-in functions.
Regular expression engines try the alternatives from left to right. If you want to make sure you get the longest possible match first, you'll need to change the order of your patterns. The leftmost alternative is tried first. If it matches, the engine attempts to match the rest of the pattern against the rest of the string; the next alternative is tried only if no match can be found.
String pattern = ("(hello world|hello wor|hello)");
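When the word list is not fixed in advance, the reordering can be done in code before the pattern is built. The following is a minimal sketch in Java, not the original poster's C# code; the class and method names (LongestFirst, longestFirst, firstMatch) are invented for illustration. It sorts the words by length, longest first, quotes each one so it is treated literally, and joins them with |.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class LongestFirst {
    // Hypothetical helper: builds an alternation with the longest words first.
    static Pattern longestFirst(List<String> words) {
        String alternation = words.stream()
                .sorted((a, b) -> Integer.compare(b.length(), a.length())) // longest first
                .map(Pattern::quote)                                       // treat words literally
                .collect(Collectors.joining("|"));
        return Pattern.compile(alternation, Pattern.CASE_INSENSITIVE);
    }

    // Hypothetical helper: returns the first match in the input, or null if there is none.
    static String firstMatch(List<String> words, String input) {
        Matcher m = longestFirst(words).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("hello", "hello world");
        System.out.println(firstMatch(words, "hello world"));
    }
}
```

Note that this only decides between alternatives that start at the same position. A shorter word that starts earlier in the input is still found before a longer word that starts later.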
Make two different regex matches. The first will match your longer option, and if that does not work, the second will match your shorter option.
string input = "hello world";
string patternFull = "hello world";
Regex regexFull = new Regex(patternFull, RegexOptions.IgnoreCase);
var matches = regexFull.Matches(input);
if (matches.Count == 0)
{
string patternShort = "hello";
Regex regexShort = new Regex(patternShort, RegexOptions.IgnoreCase);
matches = regexShort.Matches(input);
}
At the end, matches will hold the result of either the "full" or the "short" pattern; "full" is checked first, and "short" is only tried if "full" finds nothing.
You can wrap the logic in a function if you plan on calling it many times. This is something I came up with (there are plenty of other ways to do it).
public bool HasRegexMatchInOrder(string input, params string[] patterns)
{
foreach (var pattern in patterns)
{
Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);
if (regex.IsMatch(input))
{
return true;
}
}
return false;
}
string input = "hello world";
bool hasAMatch = HasRegexMatchInOrder(input, "hello world", "hello", ...);

Using RegEx split the string

I have a string like '[1]-[2]-[3],[4]-[5],[6,7,8],[9]' or '[Computers]-[Apple]-[Laptop],[Cables]-[Cables,Connectors],[Adapters]', I'd like the Pattern to get the list result, but don't know how to figure out the pattern. Basically the comma is the split, but [6,7,8] itself contains the comma as well.
the string: [1]-[2]-[3],[4]-[5],[6,7,8],[9]
the result:
[1]-[2]-[3]
[4]-[5]
[6,7,8]
[9]
or
the string: [Computers]-[Apple]-[Laptop],[Cables]-[Cables,Connectors],[Adapters]
the result:
[Computers]-[Apple]-[Laptop]
[Cables]-[Cables,Connectors]
[Adapters]
,(?=\[)
This pattern splits on any comma that is followed by a bracket, but keeps the bracket within the result text.
The (?=stuff) construct is known as a "lookahead assertion". It acts as a condition for the match but is not itself part of the match.
In C# code:
String inputstring = "[Computers]-[Apple]-[Laptop],[Cables]-[Cables,Connectors],[Adapters]";
foreach(String s in Regex.Split(inputstring, @",(?=\[)"))
System.Console.Out.WriteLine(s);
In Java code:
String inputstring = "[Computers]-[Apple]-[Laptop],[Cables]-[Cables,Connectors],[Adapters]";
Pattern p = Pattern.compile(",(?=\\[)");
for(String s : p.split(inputstring))
System.out.println(s);
Either produces:
[Computers]-[Apple]-[Laptop]
[Cables]-[Cables,Connectors]
[Adapters]
Although I believe the best approach here is to use split (as presented in @j__m's answer), here's an approach that uses matching rather than splitting.
Regex:
(\[.*?\](?!-))
Example usage:
String input = "[Computers]-[Apple]-[Laptop],[Cables]-[Cables,Connectors],[Adapters]";
Pattern p = Pattern.compile("(\\[.*?\\](?!-))");
Matcher m = p.matcher(input);
while (m.find()) {
System.out.println(m.group(1));
}
Resulting output:
[Computers]-[Apple]-[Laptop]
[Cables]-[Cables,Connectors]
[Adapters]
An answer that doesn't use regular expressions (if that's worth something in ease of understanding what's going on) is:
substitute "]#[" for "],["
split on "#"
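Those two steps might look like this in Java. This is only a sketch: the class and method names (BracketSplit, splitGroups) are made up, and it assumes the "#" character never appears in the input, since any existing "#" would also be treated as a separator.

```java
import java.util.Arrays;
import java.util.List;

public class BracketSplit {
    // Hypothetical helper: splits on the commas that sit between "]" and "[".
    static List<String> splitGroups(String input) {
        // Step 1: mark the separating commas. String.replace works on literal text, not regex.
        String marked = input.replace("],[", "]#[");
        // Step 2: split on the marker. Assumes '#' does not occur in the original input.
        return Arrays.asList(marked.split("#"));
    }

    public static void main(String[] args) {
        for (String s : splitGroups("[1]-[2]-[3],[4]-[5],[6,7,8],[9]")) {
            System.out.println(s);
        }
    }
}
```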