Java regex: Pattern.compile vs Matcher

I'm trying to find out whether a word contains consecutive identical characters, using java.util.regex.Pattern. When I test a regex with a Matcher, it returns true. But if I only use it like this:
System.out.println("test:" + scanner.hasNext(Pattern.compile("(a-z)\\1")));
it returns false.
public static void test2() {
    String[] strings = { "Dauresselam", "slab", "fuss", "boolean", "clap", "tellme" };
    String regex = "([a-z])\\1";
    Pattern pattern = Pattern.compile(regex);
    for (String string : strings) {
        Matcher matcher = pattern.matcher(string);
        if (matcher.find()) {
            System.out.println(string);
        }
    }
}
This returns true. Which one is correct?

The pattern ([a-z])\\1 uses a capturing group to match a single lowercase character which is then followed by a backreference to what is captured in group 1.
If you have Dauresselam, for example, it would capture the first s in the group, and the backreference then matches the second s. So if you want to match consecutive identical characters, you could use that pattern.
The pattern (a-z)\\1 uses a capturing group to match a-z literally and then uses a backreference to what is captured in group 1. So that would match a-za-z.
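The difference between the two patterns can be checked directly. The following is only a minimal sketch: the class and method names (BackreferenceDemo, hasDoubledLetter, hasLiteralAzTwice) are made up for illustration and do not come from the question.

```java
import java.util.regex.Pattern;

class BackreferenceDemo {
    // Character class inside the group: any lowercase letter, then the same letter again.
    static final Pattern DOUBLED_LETTER = Pattern.compile("([a-z])\\1");
    // No character class: the literal text "a-z", then the same text again.
    static final Pattern LITERAL_AZ = Pattern.compile("(a-z)\\1");

    static boolean hasDoubledLetter(String s) {
        return DOUBLED_LETTER.matcher(s).find();
    }

    static boolean hasLiteralAzTwice(String s) {
        return LITERAL_AZ.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(hasDoubledLetter("Dauresselam")); // true ("ss")
        System.out.println(hasDoubledLetter("slab"));        // false
        System.out.println(hasLiteralAzTwice("fuss"));       // false
        System.out.println(hasLiteralAzTwice("xa-za-zx"));   // true
    }
}
```

Only the second pattern needs the literal characters a, -, and z in the input, which is why it fails on ordinary words.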

It depends on what you want. Here you use parentheses only:
Pattern.compile("(a-z)\\1").
Here you use square brackets inside parentheses:
String regex = "([a-z])\\1";
To compare, you should obviously use the same pattern.
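One further difference may matter when comparing the two snippets. As the Javadoc describes it, Scanner.hasNext(Pattern) checks whether the whole next token matches the pattern, whereas Matcher.find() looks for a match anywhere in the input. So the two calls can disagree even with identical patterns. A minimal sketch, with made-up class and method names and a string-backed Scanner standing in for whatever input the question actually read:

```java
import java.util.Scanner;
import java.util.regex.Pattern;

class ScannerVsFind {
    // Matcher.find(): true if the regex matches any substring of the word.
    static boolean findAnywhere(String regex, String word) {
        return Pattern.compile(regex).matcher(word).find();
    }

    // Scanner.hasNext(Pattern): true only if the complete next token matches.
    static boolean nextTokenMatches(String regex, String word) {
        try (Scanner scanner = new Scanner(word)) {
            return scanner.hasNext(Pattern.compile(regex));
        }
    }

    public static void main(String[] args) {
        System.out.println(findAnywhere("([a-z])\\1", "fuss"));         // true
        System.out.println(nextTokenMatches("([a-z])\\1", "fuss"));     // false
        System.out.println(nextTokenMatches(".*([a-z])\\1.*", "fuss")); // true
        System.out.println(nextTokenMatches("([a-z])\\1", "ss"));       // true
    }
}
```

If the Scanner form is needed, wrapping the pattern in .* on both sides is one way to let the rest of the token match.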

Related

How to build regex to match values if they exists

I have a requirement to match the complete string whether or not a leading part of the value exists.
For example, here is the list of strings that should be matched:
en.key.value
fr.key.value
es.key.value
pt.key.value
key.value
So the part of the string before the first . must be at least 2 characters long.
Below are some values which should not be accepted:
.key.value
z.key.value
Could someone please help ?
Thanks in advance
^[^.]{2,}\..+$
Matches
en.key.value
fr.key.value
es.key.value
pt.key.value
key.value
Does not match
.key.value
z.key.value
See for yourself on Regexr.com.
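The question does not name a language, so here is a rough Java sketch of the same pattern. The class and method names (PrefixLengthCheck, isAccepted) are assumptions for illustration only.

```java
import java.util.regex.Pattern;

class PrefixLengthCheck {
    // At least two non-dot characters, a literal dot, then at least one more character.
    static final Pattern AT_LEAST_TWO_BEFORE_DOT = Pattern.compile("^[^.]{2,}\\..+$");

    static boolean isAccepted(String key) {
        return AT_LEAST_TWO_BEFORE_DOT.matcher(key).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"en.key.value", "key.value", ".key.value", "z.key.value"};
        for (String sample : samples) {
            System.out.printf("[%b] %s%n", isAccepted(sample), sample);
        }
    }
}
```

Note that [^.] accepts any character other than a dot, including digits and spaces. If only lowercase letters are allowed before the first dot, [a-z] is the stricter choice.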
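You could use the following regex: /[a-z]{2,}\.[a-z]+\.[a-z]+/
[a-z]{2,} matches 2 or more repetitions of characters in the range between a and z.
\. matches the dot character.
[a-z]+ matches 1 or more repetitions of characters between a and z.
Note that this pattern requires two dots, so it does not match key.value. The g flag is left out because test() on a global regex keeps its lastIndex between calls, which makes repeated tests unreliable.
// No g flag: a global regex is stateful across test() calls.
const regex = /[a-z]{2,}\.[a-z]+\.[a-z]+/;
console.log(regex.test("fr.key.value")); // true
console.log(regex.test("z.key.value")); // false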
You don't need to use regular expressions. You can split the string on the dots and check the length of the first part.
String[] strings = {"en.key.value",
                    "fr.key.value",
                    "es.key.value",
                    "pt.key.value",
                    "key.value",
                    ".key.value",
                    "z.key.value"};
for (String string : strings) {
    String[] parts = string.split("\\.");
    System.out.printf("[%b] %s%n", (parts[0].length() >= 2), string);
}
The above code produces the following output.
[true] en.key.value
[true] fr.key.value
[true] es.key.value
[true] pt.key.value
[true] key.value
[false] .key.value
[false] z.key.value
However, if you insist on using regular expressions, consider the following.
String[] strings = {"en.key.value",
                    "fr.key.value",
                    "es.key.value",
                    "pt.key.value",
                    "key.value",
                    ".key.value",
                    "z.key.value"};
Pattern pattern = Pattern.compile("^[a-z]{2,}\\.");
for (String string : strings) {
    Matcher matcher = pattern.matcher(string);
    System.out.printf("[%b] %s%n", matcher.find(), string);
}
Explanation of regular expression ^[a-z]{2,}\\.
^ start of string
[a-z] any lower-case letter of the English alphabet
{2,} two or more occurrences of the preceding
\\. literal dot
In other words, the above pattern matches strings that start with two or more lower-case characters followed by a single dot.

regex to extract substring for special cases

I have a scenario where I want to extract a substring based on the following conditions:
Search for any pattern like myvalue=123& and extract myvalue=123.
If myvalue is present at the end of the line without &, extract myvalue=123.
For example:
The string is abcdmyvalue=123&xyz => then it should return myvalue=123
The string is abcdmyvalue=123 => then it should return myvalue=123
The first scenario works for me with the following regex: myvalue*=(.*?(?=[&,""]))
I am looking for how to modify this regex to include my second scenario as well. I am using https://regex101.com/ to test this.
Thanks in advance!
Some notes about the pattern that you tried:
If you only want to match, you can omit the capture group.
e* matches the character e 0 or more times.
The part .*?(?=[&,""]) matches as few characters as possible until it can assert either & , or " to the right. The positive lookahead therefore expects a single character to be present to the right, which is why the pattern fails at the end of the line.
You could shorten the pattern to a match only, using a negated character class that matches, 0 or more times, any character except a whitespace character or &:
myvalue=[^&\s]*
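The question does not say which language is in use, so the following Java sketch is just one way to try the negated character class. It also shows a variant that keeps the lookahead idea from the question but lets it succeed at the end of the line as well, via (?=&|$). The class name, the extract helper, and the second pattern are assumptions added for illustration.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MyValueExtractor {
    // Stop at the first '&' or whitespace character, or at the end of the input.
    static final Pattern NEGATED_CLASS = Pattern.compile("myvalue=[^&\\s]*");
    // Lazy match that ends right before '&' or at the end of the line.
    static final Pattern LOOKAHEAD = Pattern.compile("myvalue=.*?(?=&|$)");

    // Returns the first match of the pattern in the line, if there is one.
    static Optional<String> extract(Pattern pattern, String line) {
        Matcher matcher = pattern.matcher(line);
        return matcher.find() ? Optional.of(matcher.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        for (String line : new String[] {"abcdmyvalue=123&xyz", "abcdmyvalue=123"}) {
            System.out.println(extract(NEGATED_CLASS, line).orElse("(none)"));
            System.out.println(extract(LOOKAHEAD, line).orElse("(none)"));
        }
    }
}
```

Both patterns return myvalue=123 for the two sample strings. The negated class is usually the simpler choice because it involves no backtracking.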
function regex(data) {
    var test = data.match(/=(.*)&/);
    if (test === null) {
        return data.split('=')[1]
    } else {
        return test[1]
    }
}
console.log(regex('abcdmyvalue=123&3e')); //123
console.log(regex('abcdmyvalue=123')); //123
Here is working code. If there is no & after the value, match returns null and the if branch simply splits the string on = to get the value. If & is present, the regex extracts the value between = and &.
If you want to keep a single regex, you can do it like this:
var test = data.match(/=(.*)&|=(.*)/);
const result = test[1] ? test[1] : test[2];
console.log(result);

Pattern match for (length)%code with before length

I have a pattern like x%c, where x is a single-digit integer and c is an alphanumeric code of length x. % is just a token that separates the length from the code.
For instance, 2%74 is valid since 74 has 2 characters. Similarly, 1%8 and 4%3232 are also valid.
I have tried a regex of the form ^([0-9])(%)([A-Z0-9]){\1}, where I am trying to limit the length by the value of group 1. It does not work, apparently because the group is treated as a string, not a number.
If I change the above regex to ^([0-9])(%)([A-Z0-9]){2}, it works for 2%74, but it is of no use since the length has to be controlled by the first group, not by a fixed digit.
If it is not possible with a regex, is there a better approach in Java?
One way could be using 2 capture groups, and convert the first group to an int and count the characters for the second group.
\b(\d+)%(\d+)\b
\b Word boundary
(\d+) Capture group 1, match 1+ digits
% Match literally
(\d+) Capture group 2, match 1+ digits
\b Word boundary
For example
String regex = "\\b(\\d+)%(\\d+)\\b";
Pattern pattern = Pattern.compile(regex);
String[] strings = { "2%74", "1%8", "4%3232", "5%123456", "6%0" };
for (String s : strings) {
    Matcher matcher = pattern.matcher(s);
    if (matcher.find()) {
        // Compare the declared length (group 1) with the actual length of the code (group 2).
        if (Integer.parseInt(matcher.group(1)) == matcher.group(2).length()) {
            System.out.println("Match for " + s);
        } else {
            System.out.println("No match for " + s);
        }
    }
}
Output
Match for 2%74
Match for 1%8
Match for 4%3232
No match for 5%123456
No match for 6%0
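Since x is a single digit, a regex-only alternative is also feasible: enumerate one alternative per allowed length. The sketch below assumes lengths 1 to 9 and the [A-Z0-9] character class from the question's own attempt. The class and method names are invented for illustration.

```java
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class LengthPrefixedCode {
    // One alternative per length: 1%[A-Z0-9]{1}|2%[A-Z0-9]{2}|...|9%[A-Z0-9]{9}
    static final Pattern ENUMERATED = Pattern.compile(
            IntStream.rangeClosed(1, 9)
                    .mapToObj(n -> n + "%[A-Z0-9]{" + n + "}")
                    .collect(Collectors.joining("|")));

    // matches() requires the whole input to fit one of the alternatives.
    static boolean isValid(String input) {
        return ENUMERATED.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"2%74", "1%8", "4%3232", "5%123456", "6%0", "3%A1B"};
        for (String sample : samples) {
            System.out.printf("[%b] %s%n", isValid(sample), sample);
        }
    }
}
```

This only works because the set of lengths is small and fixed. For multi-digit lengths, comparing the parsed number with the code length in Java code, as above, scales better.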

Scala regex : capture between group

In the regex below I need "test" as the output, but it gives the complete string that matches the regex. How can I capture the string between two groups?
val pattern = """\{outer.*\}""".r
println(pattern.findAllIn(s"try {outer.test}").matchData.map(step => step.group(0)).toList.mkString)
Input : "try {outer.test}"
expected Output : test
current output : {outer.test}
You may capture that part using:
val pattern = """\{outer\.([^{}]*)\}""".r.unanchored
val s = "try {outer.test}"
val result = s match {
  case pattern(i) => i
  case _ => ""
}
println(result)
The pattern matches
\{outer\. - a literal {outer. substring
([^{}]*) - Capturing group 1: zero or more (*) chars other than { and } (see [^{}] negated character class)
\} - a } char.
NOTE: if your regex must match the whole string, remove the .unanchored I added to also allow partial matches inside a string.
See the Scala demo online.
Or, you may change the pattern so that the first part is no longer a consuming pattern (it matches a string of fixed length, so a lookbehind is possible):
val pattern = """(?<=\{outer\.)[^{}]*""".r
val s = "try {outer.test}"
println(pattern.findFirstIn(s).getOrElse(""))
// => test
See this Scala demo.
Here, (?<=\{outer\.), a positive lookbehind, matches {outer. but does not put it into the match value.

C# regex: split or replace?

I am trying to replace a certain group to "" by using regex.
I was searching and doing my best, but it's over my head.
What I want to do is,
string text = "(12je)apple(/)(jj92)banana(/)cat";
string resultIwant = {apple, banana, cat};
Inside the first pair of parentheses there must be 4 characters, which may include digits,
and '(/)' closes the item.
Here's my code. (I was using matches function)
string text = @"(12dj)apple(/)(88j1)banana(/)cat";
string pattern = @"\(.{4}\)(?<value>.+?)\(/\)";
Regex rex = new Regex(pattern);
MatchCollection mc = rex.Matches(text);
if (mc.Count > 0)
{
    foreach (Match str in mc)
    {
        print(str.Groups["value"].Value.ToString());
    }
}
However, the result was
apple
banana
So I think I should use replace or something else instead of Matches.
The regex below captures the word characters that come right after a ):
(?<=\))(\w+)
Your C# code would be:
{
    string str = "(12je)apple(/)(jj92)banana(/)cat";
    Regex rgx = new Regex(@"(?<=\))(\w+)");
    foreach (Match m in rgx.Matches(str))
        Console.WriteLine(m.Groups[1].Value);
}
Explanation:
(?<=\)) A positive lookbehind is used here. It sets the match position just after the ) symbol.
() A capturing group.
\w+ Then it captures all the following word characters. It won't capture the following ( symbol because that isn't a word character.