Regular expression: how to stop at the first match

My regex pattern looks like this:
(?<=_)(.*?)(?=_)
I want to replace only the first match between the two underscores (it's not always AVVI, it can be different; the same goes for AVVIDI):
T_AVVI_EINZELPOSTEN_TEST -> T_AVVIDI_EINZELPOSTEN_TEST
My regex pattern matches both AVVI and EINZELPOSTEN. How can I modify my regex to find only the first match, AVVI?
Code:
private Identifier addPrefix(final Identifier identifier) {
    if (isExcluded(identifier)) {
        return identifier;
    }
    Pattern p = Pattern.compile("(?<=_)(.*?)(?=_)");
    Matcher m = p.matcher(identifier.getText());
    return Identifier.toIdentifier(m.replaceAll(prefix));
}

You can do this with .replaceFirst, using a start anchor and a capture group:
String line = identifier.getText()
    .replaceFirst("^([^_]*_)[^_]*(?=_)", "$1AVVIDI");
RegEx breakdown:
^: Start
([^_]*_): Capture group #1 to match 0 or more chars that are not _ followed by a _
[^_]*: Match 0 or more chars that are not _
(?=_): Positive lookahead to assert presence of _ at next position
$1AVVIDI: to replace with value in capture group #1 followed by text AVVIDI
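The breakdown above can be tried out with a small standalone sketch. The class and method names (FirstSegmentReplacer, replaceFirstSegment) are made up for illustration and are not part of the asker's code. Matcher.quoteReplacement is added on the assumption that the new text might contain $ or \ characters.

```java
import java.util.regex.Matcher;

final class FirstSegmentReplacer {

    // Hypothetical helper: replaces only the first segment that sits between two underscores.
    static String replaceFirstSegment(String text, String replacement) {
        // "$1" re-inserts the captured leading part (e.g. "T_"); quoteReplacement keeps
        // '$' and '\' in the new text from being read as group references.
        return text.replaceFirst("^([^_]*_)[^_]*(?=_)",
                "$1" + Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        // prints T_AVVIDI_EINZELPOSTEN_TEST
        System.out.println(replaceFirstSegment("T_AVVI_EINZELPOSTEN_TEST", "AVVIDI"));
        // Fewer than two underscores: no segment to replace, the input comes back unchanged.
        System.out.println(replaceFirstSegment("T_TEST", "AVVIDI"));
    }
}
```

Because the pattern is anchored with ^, it can match at most once, so replaceAll would give the same result here; replaceFirst just states the intent more clearly.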


How do I make a regular expression that matches text with an open parenthesis only not preceded by a space?

How do I craft a regular expression with a group that includes text with an open parenthesis not preceded by a space, but does not include an open parenthesis preceded by a space (and everything after that)?
Some examples:
Matching: "Yasmani Grandal (1B 1.84)"
Would return: "Yasmani Grandal"
Matching: "J.T. Realmuto"
Would return: "J.T. Realmuto"
Matching: "WillD. Smith(LAD)"
Would return: "WillD. Smith(LAD)"
Matching: "Adley(round/1/2019) Rutschman"
Would return: "Adley(round/1/2019) Rutschman"
Attempted solutions:
(.+)(?:\s\(.*)
This regular expression returns "Yasmani Grandal" as group 1 when matching "Yasmani Grandal (1B 1.84)", but doesn't match "J.T. Realmuto" because the second (non-capturing) group is not optional.
But if I make it optional: (.+)(?:\s\(.*)?
...then group 1 when matching "Yasmani Grandal (1B 1.84)" is "Yasmani Grandal (1B 1.84)".
You may use
^(.*?)(?:\s+\(.*\))?$
Details
^ - start of string
(.*?) - Capturing group 1: any 0 or more chars other than line break chars as few as possible
(?:\s+\(.*\))? - an optional non-capturing group matching 1 or 0 occurrences of
\s+ - 1+ whitespaces
\( - a ( char
.* - any 0 or more chars other than line break chars as many as possible
\) - a ) char
$ - end of string.
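As a rough check of the pattern described above, here is a minimal Java sketch that runs it against the four sample strings from the question. NameExtractor and extractName are hypothetical names chosen for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NameExtractor {

    // Lazy group 1, followed by an optional " (...)" tail that must reach the end of the string.
    private static final Pattern NAME = Pattern.compile("^(.*?)(?:\\s+\\(.*\\))?$");

    // Hypothetical helper: returns group 1, or the input itself if the pattern does not match
    // (for example when the input contains line breaks, which '.' does not match).
    static String extractName(String input) {
        Matcher m = NAME.matcher(input);
        return m.matches() ? m.group(1) : input;
    }

    public static void main(String[] args) {
        String[] names = {
            "Yasmani Grandal (1B 1.84)",
            "J.T. Realmuto",
            "WillD. Smith(LAD)",
            "Adley(round/1/2019) Rutschman"
        };
        for (String name : names) {
            System.out.println(extractName(name));
        }
    }
}
```

The lazy (.*?) only stops early when the optional tail can match all the way to $, which is why the names with a parenthesis not preceded by whitespace come back whole.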
You could use the following regular expression to convert matches to empty strings. (I've escaped the leading space merely for readability.)
\ +\((?!.* \)).*
The converted string is presumably what you want, so there seems no point to saving it to a capture group. If you need to capture the part of the string that is converted to an empty string, replace .* with
(.*).
As this regex contains nothing more exotic than a negative lookahead, it should work with most regex engines.
Start your engine!
The regex engine performs the following operations.
\ + : match 1+ spaces
\( : match '('
(?!.* \)) : use a negative lookahead to assert the remainder of
the line does not contain the string ' )'
.* : match 0+ characters other than line terminators
I've assumed you want to remove all spaces preceding the left parenthesis that is preceded by at least one space. If, for example, the string were:
Yasmani Grandal (1B 1.84)
               ^^^^^^^^^^
the part identified by the party hats would be converted to an empty string.
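The removal approach can be sketched in Java as follows. SuffixStripper and stripParenSuffix are names invented for this example; the pattern is the one from this answer, with the escaped space written as a plain space.

```java
final class SuffixStripper {

    // Hypothetical helper: deletes the run of spaces, the '(' after it and the rest of the line,
    // provided the remainder does not contain the two-character string " )".
    static String stripParenSuffix(String input) {
        return input.replaceAll(" +\\((?!.* \\)).*", "");
    }

    public static void main(String[] args) {
        System.out.println(stripParenSuffix("Yasmani Grandal (1B 1.84)"));
        System.out.println(stripParenSuffix("Yasmani Grandal    (1B 1.84)"));
        // No space before the parenthesis, so nothing is removed.
        System.out.println(stripParenSuffix("WillD. Smith(LAD)"));
    }
}
```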
Can you try this and let me know if this works?
(.+)\s\(.*
public class HelloWorld {
    public static void main(String[] args) {
        String[] names = new String[] {"Yasmani Grandal (1B 1.84)", "J.T. Realmuto", "WillD. Smith(LAD)", "Adley(round/1/2019) Rutschman"};
        for (String in : names)
            System.out.println(in.replaceAll("(.+)\\s\\(.*", "$1"));
    }
}
Please note I wrote a minimal expression for this. You can extend it as per your additional requirements. The above code works just fine.

Regex to extract string if there is or not a specific word

Hi I'm a regex noob and I'd like to make a regex in order to extract the penultimate string from the URL if the word "xxxx" is contained or the last string if the word "xxxx" is not contained.
For example, I could have 2 scenarios:
www.hello.com/aaaa/1adf0023efae456
www.hello.com/aaaa/1adf0023efae456/xxxx
In both cases I want to extract the string 1adf0023efae456.
I've tried something like (?=(\w*xxxx\w*)\/.*\/(.*?)\/|[^\/]+$) but it doesn't work properly.
You can match the digits and assert that what follows is either /xxxx or the end of the string.
\d+(?=/xxxx|$)
If there should be a / before matching the digits, you could use a capturing group and get the value from group 1
/(\d+)(?=/xxxx|$)
/ Match /
(\d+) Capture group 1, match 1+ digits
(?=/xxxx|$) Positive lookahead, assert what is on the right is either /xxxx or end of string
Edit
If there could possibly also be alphanumeric characters instead of digits, you could use a character class [a-z0-9]+ with an optional non capturing group.
/([a-z0-9]+)(?:/xxxx)?$
To match any char except a whitespace char or a forward slash, use [^\s/]+
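For illustration, the optional-group pattern /([a-z0-9]+)(?:/xxxx)?$ can be exercised with a short Java sketch. UrlIdExtractor and extractId are hypothetical names, and returning an Optional is a choice made for this example only.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UrlIdExtractor {

    // A slash, the alphanumeric id (group 1), an optional "/xxxx", then end of string.
    private static final Pattern ID = Pattern.compile("/([a-z0-9]+)(?:/xxxx)?$");

    // Hypothetical helper: empty result if the URL has no matching trailing segment.
    static Optional<String> extractId(String url) {
        Matcher m = ID.matcher(url);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extractId("www.hello.com/aaaa/1adf0023efae456").orElse("(none)"));
        System.out.println(extractId("www.hello.com/aaaa/1adf0023efae456/xxxx").orElse("(none)"));
    }
}
```

find() is used rather than matches() because the pattern only describes the tail of the URL.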
Using lookarounds, you could assert a / on the left, match 1+ alphanumerics and assert what is at the right is either /xxxx or the end of the string which did not end with /xxxx
(?<=/)[a-z0-9]+(?=/xxxx$|$(?<!/xxxx))
You could avoid Regex:
string[] strings =
{
    "www.hello.com/aaaa/1adf0023efae456",
    "www.hello.com/aaaa/1adf0023efae456/xxxx"
};
var x = strings.Select(s => s.Split('/'))
    .Select(arr => new { upper = arr.GetUpperBound(0), arr })
    .Select(z => z.arr[z.upper] == "xxxx" ? z.arr[z.upper - 1] : z.arr[z.upper]);

how to capture from group from end line in js regex?

I'm trying to capture a text into 3 groups. I have managed to capture 2 groups, but I'm having an issue with the 3rd group.
This is the text :
<13>Apr 5 16:09:47 node2 Services: 2016-04-05 16:09:46,914 INFO [3]
Drivers.KafkaInvoker - KafkaInvoker.SendMessages - After sending
itemsCount=1
I'm using the following regex:
(?=- )(.*?)(?= - )|(?=])(.*?)(?= -)
My 3rd group should be: "After sending itemsCount=1"
Any suggestions?
Your original expression is fine, just missing a $:
(?=- )(.*?)(?= - |$)|(?=])(.*?)(?= -)
It could also be modified slightly, to an expression similar to:
(?=-\s+).*?([A-Z].*?)(?=\s+-\s+|$)|(?=]\s+).*?([A-Z].*?)(?=\s+-)
You have 2 capturing groups. You don't get a match for the third part because the positive lookahead in the first alternative does not consider the end of the string. You can solve that by using an alternation in the lookahead that either matches ' - ' or asserts the end of the string:
(?=[-\]] )(.*?)(?= - |$)
                     ^^
If those matches are ok, you could simplify that pattern by making use of a character class to match either - or ] like [-\]] and omit the alternation and the group as you now have only the matches.
Your pattern then might look like (also capturing the leading hyphen like the first 2 matches)
(?=[-\]] ).*?(?= - |$)
If this is your string and you want to have 3 capturing groups, you might use:
^.*?\[\d+\]([^-]+)-([^-]+)-\s*([^-]+)$
^ Start of string
.*? Match any char except a newline non greedy
\[\d+\] match [ 1+ digits ]
([^-]+)- Capture group 1, match 1+ times not -, then match -
([^-]+)- Capture group 2, match 1+ times not -, then match -
\s* Match 0+ whitespace chars
([^-]+) Capture group 3, match 1+ times not -
$ End of string
Regex demo
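The question is about JavaScript, but the three-group pattern above behaves the same way in java.util.regex, so here is a small Java sketch of it. LogLineParser and parse are invented names, and trimming the groups is an addition for this example, since groups 1 and 2 capture the spaces around the hyphens.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LogLineParser {

    private static final Pattern LINE =
            Pattern.compile("^.*?\\[\\d+\\]([^-]+)-([^-]+)-\\s*([^-]+)$");

    // Hypothetical helper: returns the three trimmed groups, or an empty array on no match.
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return new String[0];
        }
        return new String[] { m.group(1).trim(), m.group(2).trim(), m.group(3).trim() };
    }

    public static void main(String[] args) {
        String line = "<13>Apr 5 16:09:47 node2 Services: 2016-04-05 16:09:46,914 INFO [3] "
                + "Drivers.KafkaInvoker - KafkaInvoker.SendMessages - After sending itemsCount=1";
        for (String part : parse(line)) {
            System.out.println(part);
        }
    }
}
```

Note that the sample is passed as a single line; the leading .*? does not match line breaks, so a wrapped log entry would need to be joined first.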
For example, to create the desired object from the comments, you could first collect all the matches from match[0] in an array.
After you have all the values, assemble the object using the keys and the values.
var output = {};
var regex = new RegExp(/(?=[-\]] ).*?(?= - |$)/g);
var str = `<13>Apr 5 16:09:47 node2 Services: 2016-04-05 16:09:46,914 INFO [3] Drivers.KafkaInvoker - KafkaInvoker.SendMessages - After sending itemsCount=1`;
var match;
var values = [];
var keys = ['Thread', 'Class', 'Message'];
while ((match = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (match.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    values.push(match[0]);
}
keys.forEach((key, index) => output[key] = values[index]);
console.log(output);

Extract only the first occurence of the search string and ignore everything after /

I'm new to regex and want to display all the folders that contain the string name, but ignore the characters or inner directories after "/".
Using regex only
(*spark?/)
Below are the set of directories:
/app-logs/spark/logs/application_15262_85484
/user/oozie/share/lib/lib_36456456/spark
/app-logs/spark/logs
/app-logs/spark
/apps/spark/warehouse
My result should be:
/app-logs/spark
/user/oozie/share/lib/lib_36456456/spark
/app-logs/spark
/apps/spark
The expression we might be looking for here, would be:
(spark)\/?.*
which we would replace with our first capturing group, $1.
Test
const regex = /(spark)\/?.*/gm;
const str = `/app-logs/spark/logs/application_15262_85484
/user/oozie/share/lib/lib_36456456/spark
/app-logs/spark/logs
/app-logs/spark
/apps/spark/warehouse`;
const subst = `$1`;
// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);
console.log(result);
Your pattern (*spark?/) is not valid because the quantifier * directly follows the opening parenthesis of the capturing group, so it has nothing to repeat. The question mark after the k means that the character k is optional.
You could use a repeating pattern to match a forward slash followed by 1+ chars other than a forward slash, then match /spark
^(?:/[^/\n]+)+/spark
Explanation
^ Assert start of string
(?: Non capturing group
/[^/\n]+ Match /, then match 1+ times not / or a newline
)+ Close non capturing group and repeat 1+ times
/spark Match /spark
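A minimal Java sketch of this pattern applied to the directory list from the question follows. SparkDirFinder and sparkDirs are made-up names, and the sketch returns one result per input path rather than a de-duplicated list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SparkDirFinder {

    // One or more "/segment" parts, then "/spark". A bare "/spark" with no parent
    // directory would not match, because the group must repeat at least once.
    private static final Pattern SPARK_DIR = Pattern.compile("^(?:/[^/\\n]+)+/spark");

    // Hypothetical helper: keeps the part of each path up to and including "/spark".
    static List<String> sparkDirs(List<String> paths) {
        List<String> result = new ArrayList<>();
        for (String path : paths) {
            Matcher m = SPARK_DIR.matcher(path);
            if (m.find()) {
                result.add(m.group());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList(
                "/app-logs/spark/logs/application_15262_85484",
                "/user/oozie/share/lib/lib_36456456/spark",
                "/app-logs/spark/logs",
                "/app-logs/spark",
                "/apps/spark/warehouse");
        sparkDirs(paths).forEach(System.out::println);
    }
}
```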

How do I match the contents of parenthesis in a scala regular expression

I'm trying to get at the contents of a string like this (2.2,3.4) with a scala regular expression to obtain a string like the following 2.2,3.4
This will get me the string with parenthesis and all from a line of other text:
"""\(.*?\)"""
But I can't seem to find a way to get just the contents of the parenthesis.
I've tried: """\((.*?)\)""" """((.*?))""" and some other combinations, without luck.
I've used this one in the past in other Java apps: \\((.*?)\\), which is why I thought the first attempt in the line above """\((.*?)\)""" would work.
For my purposes, this looks something like:
var points = "pointA: (2.12, -3.48), pointB: (2.12, -3.48)"
var parenth_contents = """\((.*?)\)""".r;
val center = parenth_contents.findAllIn(points(0));
var cxy = center.next();
val cx = cxy.split(",")(0).toDouble;
Use Lookahead and Lookbehind
You can use this regex:
(?<=\()\d+\.\d+,\d+\.\d+(?=\))
Or, if you don't need precision inside the parentheses:
(?<=\()[^)]+(?=\))
Explanation
The lookbehind (?<=\() asserts that what precedes is a (
\d+\.\d+,\d+\.\d+ matches the string
or, in Option 2, [^)]+ matches any chars that are not a closing parenthesis
The lookahead (?=\)) asserts that what follows is a )
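The lookaround variant can be sketched in Java; since scala.util.matching.Regex is built on java.util.regex, the same pattern string should behave the same way from Scala. ParenContents and contents are names invented for this example, and the input is the sample string from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ParenContents {

    // Match 1+ chars other than ')' that are preceded by '(' and followed by ')'.
    private static final Pattern INSIDE = Pattern.compile("(?<=\\()[^)]+(?=\\))");

    // Hypothetical helper: collects the text inside every pair of parentheses.
    static List<String> contents(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = INSIDE.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String points = "pointA: (2.12, -3.48), pointB: (2.12, -3.48)";
        List<String> all = contents(points);
        System.out.println(all);
        // Split the first pair into its two coordinates.
        String[] xy = all.get(0).split(",");
        double cx = Double.parseDouble(xy[0].trim());
        double cy = Double.parseDouble(xy[1].trim());
        System.out.println(cx + " / " + cy);
    }
}
```

Because the parentheses are only asserted, not consumed, the whole match is already the wanted text and no capture group is needed.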
Reference
Lookahead and Lookbehind Zero-Length Assertions
Mastering Lookahead and Lookbehind
Maybe try this:
val parenth_contents = "\\(([^)]+)\\)".r
parenth_contents: scala.util.matching.Regex = \(([^)]+)\)
val parenth_contents(r) = "(123, abc)"
r: String = 123, abc
An even simpler regex that matches every occurrence of the parentheses together with the content inside them:
(\([^)]+\)+)
1st Capturing Group (\([^)]+\)+)
\( matches the character ( literally (case sensitive)
Match a single character not present in the list below [^)]+
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
) matches the character ) literally (case sensitive)
\)+ matches the character ) literally (case sensitive)
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
Global pattern flags
g modifier: global. All matches (don't return after first match)
m modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
https://regex101.com/r/MMNRRo/1
\((.*?)\) works - you just need to extract the matched group. The easiest way to do that is to use the unapplySeq method of scala.util.matching.Regex:
scala> val wrapped = raw"\((.*?)\)".r
wrapped: scala.util.matching.Regex = \((.*?)\)
val wrapped(r) = "(123,abc)"
r: String = 123,abc