Optional grouping in Scala Regular Expressions - regex

Say I want to make a regex that splits an optional version number from a file name, i.e.:
val regex(name, ver) = "file.jar" // name = file, ver = empty
val regex(name, ver) = "some-software.jar" // name = some-software, ver = empty
val regex(name, ver) = "software-1.0.jar" // name = software, ver = 1.0
val regex(name, ver) = "some-file-1.0.jar" // name = some-file, ver = 1.0
How is such a regular expression written in Scala/Java? In Perl I would do something along the lines of:
(.*)(-(\d|.)*)?.jar
but Scala does not seem to support optional groups with this syntax.

I am not sure what your question now is.
I assume it is not matching the second group, because the first one is greedy and since the second is optional, the first matches everything.
Try this:
(.*?)(?:-(?=\d)(.*))?\.jar
See it here on Regexr
I made the first group a lazy match with the .*?
The second set of parentheses is a non-capturing group (the one starting with (?:), so it does not get a group number. You will find the name in group 1 and the version in group 2.
I put a lookahead after the dash, so that it searches for a dash followed by a digit.
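To see the lazy first group and the optional version group working together, here is a minimal Java sketch (Scala's Regex is built on java.util.regex, so the pattern behaves the same there). The class and method names are made up for illustration, and the dot before jar is escaped so that it only matches a literal dot. In Java an optional group that took no part in the match is reported as null, which the sketch turns into an empty string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class JarNameSplitter {
    // Group 1: lazy name. Group 2: version, present only if a dash is followed by a digit.
    private static final Pattern JAR =
            Pattern.compile("(.*?)(?:-(?=\\d)(.*))?\\.jar");

    // Returns {name, version}; version is "" when the file name has none.
    static String[] split(String fileName) {
        Matcher m = JAR.matcher(fileName);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a jar name: " + fileName);
        }
        // An optional group that did not participate yields null in Java.
        String version = m.group(2) == null ? "" : m.group(2);
        return new String[] { m.group(1), version };
    }

    public static void main(String[] args) {
        String[] names = {"file.jar", "some-software.jar",
                          "software-1.0.jar", "some-file-1.0.jar"};
        for (String f : names) {
            String[] parts = split(f);
            System.out.println(f + " -> name=" + parts[0] + ", ver=" + parts[1]);
        }
    }
}
```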

How about:
/^(.+?)(?:-([\d.]*))?\.jar$/
Assuming the version is always a mix of digits and dots.

Extract everything between pipes in key value pair

I have the following source string:
|User=gmailUser1|login with password=false|addition information=|source IP location=DE|
I want to extract everything between pipes in key value pair. In this case
User=gmailUser1
Login with password=false
addition information=
Source IP location=DE
My regex pattern is giving me the entire string.
\|(\b+)=(\b+)\|
Try with the expression:
/\|([^=|]+)=([^|]*)/g
or if you just want the pattern:
\|([^=|]+)=([^|]*)
Depending on your environment you will be able to get captures of group 1 and 2 for each key-value pair.
(I'm not able to test it out right now.)
Update 1: I did a short test and adapted it with the optimization of Wiktor Stribizew.
Update 2: Short explanation of the regex used:
The \b in your pattern means word boundary; it is a zero-width assertion and does not represent a character. You cannot combine it with +. See also What is a word boundary.
The first group ([^=|]+) matches anything that is not a = or a |, with at least one character.
The second group ([^|]*) matches anything that is not a |, with zero or more characters (addition information has an empty value).
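As a rough illustration of how the two groups can be collected per pair, here is a small Java sketch. The class name PipePairs and the choice of a LinkedHashMap are assumptions made for the example, not part of the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class PipePairs {
    // Key: one or more characters that are neither '=' nor '|'.
    // Value: zero or more characters that are not '|'.
    private static final Pattern PAIR = Pattern.compile("\\|([^=|]+)=([^|]*)");

    static Map<String, String> parse(String source) {
        Map<String, String> pairs = new LinkedHashMap<>(); // keeps insertion order
        Matcher m = PAIR.matcher(source);
        while (m.find()) {
            pairs.put(m.group(1), m.group(2));
        }
        return pairs;
    }

    public static void main(String[] args) {
        String s = "|User=gmailUser1|login with password=false"
                 + "|addition information=|source IP location=DE|";
        parse(s).forEach((k, v) -> System.out.println(k + " -> '" + v + "'"));
    }
}
```

Each call to find() resumes after the previous match, so the trailing pipe of one pair is still available as the leading pipe of the next.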
Try this:
\w+(=|\s|\w+)
This matches:
\w+ = one or more word characters (letters, digits, underscore)
(=|\s|\w+) = a capturing group holding a = sign, a whitespace character, or another run of word characters

Surrounding one group with special characters using substitute in Vim

Given string:
some_function(inputId = "select_something"),
(...)
some_other_function(inputId = "some_other_label")
I would like to arrive at:
some_function(inputId = ns("select_something")),
(...)
some_other_function(inputId = ns("some_other_label"))
The key change here is the element ns( ... ) that surrounds the string available in the "" after the inputId
Regex
So far, I have come up with this regex:
:%substitute/\(inputId\s=\s\)\(\"[a-zA-Z]"\)/\1ns(/2/cgI
However, when deployed, it produces an error:
E488: Trailing characters
A simpler version of that regex works, the syntax:
:%substitute/\(inputId\s=\s\)/\1ns(/cgI
would correctly insert ns( after finding inputId = and create the string
some_other_function(inputId = ns("some_other_label")
Challenge
I'm struggling to match the remaining part of the string, ex. "select_something") and return it as:
"select_something")).
You have many problems with your regex.
[a-zA-Z] will only match one letter. Presumably you want to match everything up to the next ", so you'll need a \+ and you'll also need to match underscores too. I would recommend \w\+. Unless more than [a-zA-Z_] might be in the string, in which case I would do .\{-}.
You have a /2 instead of \2. This is why you're getting E488.
I would do this:
:%s/\(inputId = \)\(".\{-}"\)/\1ns(\2)/cgI
Or use the start-of-match atom, \zs:
:%s/inputId = \zs\".\{-}"/ns(&)/cgI
You can use a negated character class "[^"]*" to match a quoted string:
%s/\(inputId\s*=\s*\)\("[^"]*"\)/\1ns(\2)/g
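For readers outside Vim, the same negated-character-class idea can be sketched in Java (this is a translation for illustration, not Vim syntax; the class name NsWrapper is made up). Java writes groups as plain parentheses and back-references in the replacement as $1 and $2.

```java
// Hypothetical helper, for illustration only.
class NsWrapper {
    // Group 1: "inputId", optional spaces, "=", optional spaces.
    // Group 2: a double-quoted string, matched with the negated class [^"]*.
    static String wrap(String line) {
        return line.replaceAll("(inputId\\s*=\\s*)(\"[^\"]*\")", "$1ns($2)");
    }

    public static void main(String[] args) {
        System.out.println(wrap("some_function(inputId = \"select_something\"),"));
        System.out.println(wrap("some_other_function(inputId = \"some_other_label\")"));
    }
}
```

Because the closing quote is inside group 2, the whole quoted string ends up inside ns( ... ).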

Regex to grab formulas

I am trying to parse a file that contains parameter attributes. The attributes are set up like this:
w=(nf*40e-9)*ng
but also like this:
par_nf=(1) * (ng)
The issue is, all of these parameter definitions are on a single line in the source file, and they are separated by spaces. So you might have a situation like this:
pd=2.0*(84e-9+(1.0*nf)*40e-9) nf=ng m=1 par=(1) par_nf=(1) * (ng) plorient=0
The current algorithm just splits the line on spaces and then for each token, the name is extracted from the LHS of the = and the value from the RHS. My thought is if I can create a Regex match based on spaces within parameter declarations, I can then remove just those spaces before feeding the line to the splitter/parser. I am having a tough time coming up with the appropriate Regex, however. Is it possible to create a regex that matches only spaces within parameter declarations, but ignores the spaces between parameter declarations?
Try this RegEx:
(?<=^|\s) # Start of each formula (start of line OR [space])
(?:.*?) # Attribute Name
= # =
(?: # Formula
(?!\s\w+=) # DO NOT Match [space] Word Characters = (Attr. Name)
[^=] # Any Character except =
)* # Formula Characters repeated any number of times
When checking formula characters, it uses a negative lookahead to check for a Space, followed by Word Characters (Attribute Name) and an =. If this is found, it will stop the match. The fact that the negative lookahead checks for a space means that it will stop without a trailing space at the end of the formula.
Live Demo on Regex101
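A minimal Java sketch of how this pattern could be applied to the sample line from the question follows. The pattern is the one above written on a single line; the class name ParamSplitter and the step that strips the inner spaces from each match are assumptions added for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class ParamSplitter {
    // Name, '=', then formula characters up to (not including)
    // a space that is followed by the next "name=".
    private static final Pattern PARAM =
            Pattern.compile("(?<=^|\\s)(?:.*?)=(?:(?!\\s\\w+=)[^=])*");

    static List<String> split(String line) {
        List<String> params = new ArrayList<>();
        Matcher m = PARAM.matcher(line);
        while (m.find()) {
            // Drop the spaces inside one declaration, e.g. "(1) * (ng)" becomes "(1)*(ng)".
            params.add(m.group().replaceAll("\\s+", ""));
        }
        return params;
    }

    public static void main(String[] args) {
        String line = "pd=2.0*(84e-9+(1.0*nf)*40e-9) nf=ng m=1 par=(1) "
                    + "par_nf=(1) * (ng) plorient=0";
        split(line).forEach(System.out::println);
    }
}
```

Each element of the returned list is one complete name=value declaration, so the existing splitter can work on them directly.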
Thanks to @Andy for the tip:
In this case I'll probably just match on the parameter name and equals, but replace the preceding whitespace with some other "parse-able" character to split on, like so:
(\s*)\w+[a-zA-Z_]=
Now my first capturing group can be used to insert something like a colon, semicolon, or line-break.
You need to add Perl tag. :-( Maybe this will help:
I ended up using this in C#. The idea was to break the line into name/value pairs, using a negative lookahead for the next key to stop one match and start a new one. I hope this helps.
var data = @"pd=2.0*(84e-9+(1.0*nf)*40e-9) nf=ng m=1 par=(1) par_nf=(1) * (ng) plorient=0";
var pattern = @"
(?<Key>[a-zA-Z_\s\d]+) # Key is any alpha, digit and _
= # = is a hard anchor
(?<Value>[.*+\-\\\/()\w\s]+) # Value is any combinations of text with space(s)
(\s|$) # Soft anchor of either a \s or EOB
((?!\s[a-zA-Z_\d\s]+\=)|$) # Negative lookahead to stop matching if a space then key then equal found or EOB
";
Regex.Matches(data, pattern, RegexOptions.IgnorePatternWhitespace | RegexOptions.ExplicitCapture)
.OfType<Match>()
.Select(mt => new
{
LHS = mt.Groups["Key"].Value,
RHS = mt.Groups["Value"].Value
});

Music Chord part splitting Regex

This is a follow-up question to this one: Regex for matching a music Chord, asked by me.
Now that I have a regex to know whether a String representation of a chord is valid or not (previous question), how can I effectively get the three different parts of the chord (the root note, accidentals and chord type) into separate variables?
I could do simple string manipulation, but I guess that it would be easier to build on the previous code and use regex for that, or am I wrong?
Here is the updated code from the aforementioned question:
public static void regex(String chord) {
    String notes = "^[CDEFGAB]";
    String accidentals = "(#|##|b|bb)?";
    String chords = "(maj7|maj|min7|min|sus2)";
    String regex = notes + accidentals + chords;
    Pattern pattern = Pattern.compile(regex);
    Matcher matcher = pattern.matcher(chord);
    System.out.println("regex is " + regex);
    if (matcher.find()) {
        int i = matcher.start();
        int j = matcher.end();
        System.out.println("i:" + i + " j:" + j);
    }
    else {
        System.out.println("no match!");
    }
}
Thanks.
Enclosing something with parentheses (except in cases with special meaning) creates a capturing group, or subpattern.
You already have accidentals and chords grouped as subpatterns like that, but you need to add parentheses to notes to capture that as a subpattern too.
String notes = "^([CDEFGAB])";
String accidentals = "(#|##|b|bb)?";
String chords = "(maj7|maj|min7|min|sus2)";
By convention, the string that is matched by the entire pattern is group 0, then every subpattern is captured as group 1, group 2, and so on.
I'm not a Java guy, but after reading the docs it looks like you would access your subpattern matches using .group():
String note = matcher.group(1);
String acci = matcher.group(2);
String chor = matcher.group(3);
Edit:
Originally, I suggested String accidentals = "((?:#|##|b|bb)?)";, because I was worried that the second subpattern being optional would cause a group numbering problem if no match existed for it. The numbering is not affected either way: group 2 always exists. In Java, however, matcher.group(2) returns null (not an empty string) when an optional group did not take part in the match. So
... = "(#|##|b|bb)?"; suffices if you check for null; keep the wrapped form ((?:#|##|b|bb)?) if you want an empty string in group 2 instead.
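Putting the three groups together, a small Java sketch might look like this. The class name ChordParts and the choice to return an array are assumptions for illustration. In Java, group(2) is null when the optional accidental is absent, so the sketch maps that to an empty string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class ChordParts {
    private static final Pattern CHORD = Pattern.compile(
            "^([CDEFGAB])(#|##|b|bb)?(maj7|maj|min7|min|sus2)");

    // Returns {root, accidental, type}, or null if the chord does not match.
    static String[] parse(String chord) {
        Matcher m = CHORD.matcher(chord);
        if (!m.find()) {
            return null;
        }
        // group(2) is null when the optional accidental did not participate.
        String accidental = m.group(2) == null ? "" : m.group(2);
        return new String[] { m.group(1), accidental, m.group(3) };
    }

    public static void main(String[] args) {
        for (String c : new String[] {"C#maj7", "Gmin", "Ebsus2"}) {
            String[] p = parse(c);
            System.out.println(c + " -> root=" + p[0]
                    + " accidental='" + p[1] + "' type=" + p[2]);
        }
    }
}
```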
You've already done the work. Just add one more capture group so that your final regex becomes:
^([CDEFGAB])(#|##|b|bb)?(maj7|maj|min7|min|sus2)?$
And your note, accidental, and chord will be in the first, second, and third captures, respectively.
I like the accepted answer, but as a guitar player I do encounter chords with an extra bass note added, such as G/D, A/D, or D/F#. Of course, there are a number of other chord names you might encounter, such as 5, 6, min6, 9, min9, sus4, etc. You might consider extending the list of possible chord strings, and then adding something for the bass note (and its accidental, if any):
String chords =
"(maj|maj7|maj9|maj11|maj13|maj9#11|maj13#11|6|add9|maj7b5|maj7#5|min|m7|m9|m11|m13|"
+ "m6|madd9|m6add9|mmaj7|mmaj9|m7b5|m7#5|7|9|11|13|7sus4|7b5|7#5|7b9|7#9|7b5b9|7b5#9|"
+ "7#5b9|9#5|13#11|13b9|11b9|aug|dim|dim7|sus4|sus2|sus2sus4|-5|)";
String bass = "/([CDEFGAB])";
In order to complete the "String chords" definition, you might want to consult a chord dictionary. CHEERS!

Get second part of a string using RegEx

I have a string like this: "first#second", and I wonder how to get the "second" part, without the "#" symbol, as the result of the regex itself, not as a capture group using brackets.
upd: I forgot to add one more special char at the end of string, real string is "first#second*"
Simple regex:
/#(.*)$/
If you really don't want it to be a match capture, and you know there's a # in the string but none in the part you want, you can do
/[^#]*$/
and the whole regex is what you want.
If you must use regex, and you insist on not using capturing groups, you can use lookbehind in flavors that support them like this:
(?<=#).*
Or you can also capture just anything but #, to the end of the string, so something like this:
[^#]*$
The capturing group option, of course, is:
#(.*)
\__/
1
This matches the # too, but group 1 captures the part that you want.
Lastly, a non-regex alternative may look something like this:
secondPart = wholeString.substring( wholeString.indexOf("#") + 1 )
There may be issues with some of these solutions if # can also appear (perhaps escaped) anywhere else in the string.
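As a quick check of the two capture-free patterns, here is a minimal Java sketch (the class and method names are made up for illustration). In both cases the wanted text is the whole match, so group() with no argument is enough.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class SecondPart {
    // Returns the whole match (group 0) of the first hit, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String s = "first#second";
        // Lookbehind: match starts right after a '#', which is not consumed.
        System.out.println(firstMatch("(?<=#).*", s));
        // Negated class anchored at the end: the last run without '#'.
        System.out.println(firstMatch("[^#]*$", s));
        // Non-regex alternative.
        System.out.println(s.substring(s.indexOf("#") + 1));
    }
}
```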
References
regular-expressions.info
Lookarounds, Brackets for Capturing, Anchors
/[a-z]+#([a-z]+)/
You can use lookaround to exclude parts of an expression.
http://www.regular-expressions.info/lookaround.html
If you're using Java, you can consider the Pattern and Matcher classes. Pattern gives you a compiled version of the regular expression, and Matcher exposes the details of each match (start, end, groups).
Pattern.split and String.split give the same result; reusing a precompiled Pattern can be faster when the same regex is applied many times.
For example:
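String s = "first#second#third";
String re = "#";
Pattern p = Pattern.compile(re);
Matcher m = p.matcher(s); // the matcher needs the input string
int ms = 0; // start of the current segment
int me = 0; // end of the current segment
while (m.find()) {
    System.out.println("start " + m.start() + " end " + m.end() + " group " + m.group());
    me = m.start();
    System.out.println(s.substring(ms, me));
    ms = m.end();
}
System.out.println(s.substring(ms)); // the last segment, after the final '#'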
In other languages you can also consider using groups and back-references, for example if you need to find repetitions.