Using RegEx split the string - regex

I have a string like '[1]-[2]-[3],[4]-[5],[6,7,8],[9]' or '[Computers]-[Apple]-[Laptop],[Cables]-[Cables,Connectors],[Adapters]'. I'd like a pattern that produces the list result below, but I can't figure out how to write it. Basically the comma is the delimiter, but [6,7,8] itself contains commas as well.
the string: [1]-[2]-[3],[4]-[5],[6,7,8],[9]
the result:
[1]-[2]-[3]
[4]-[5]
[6,7,8]
[9]
or
the string: [Computers]-[Apple]-[Laptop],[Cables]-[Cables,Connectors],[Adapters]
the result:
[Computers]-[Apple]-[Laptop]
[Cables]-[Cables,Connectors]
[Adapters]

,(?=\[)
This pattern splits on any comma that is followed by an opening bracket, but keeps the bracket within the result text.
The (?=stuff) construct is known as a "lookahead assertion". It acts as a condition for the match but is not itself part of the match.
In C# code:
// requires: using System.Text.RegularExpressions;
String inputstring = "[Computers]-[Apple]-[Laptop],[Cables]-[Cables,Connectors],[Adapters]";
foreach (String s in Regex.Split(inputstring, @",(?=\[)"))
    System.Console.Out.WriteLine(s);
In Java code:
// requires: import java.util.regex.Pattern;
String inputstring = "[Computers]-[Apple]-[Laptop],[Cables]-[Cables,Connectors],[Adapters]";
Pattern p = Pattern.compile(",(?=\\[)");
for (String s : p.split(inputstring))
    System.out.println(s);
Either produces:
[Computers]-[Apple]-[Laptop]
[Cables]-[Cables,Connectors]
[Adapters]
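
For comparison, the same lookahead split can be sketched in JavaScript. This is an illustrative port rather than part of the answer above; the function name splitGroups is made up for the example.

```javascript
// Split on every comma that is immediately followed by "[".
// The lookahead (?=\[) only checks for the bracket; it does not consume it,
// so each piece keeps its leading "[".
function splitGroups(input) {
  return input.split(/,(?=\[)/);
}

// Commas inside a bracket pair (as in "[6,7,8]") are followed by a digit,
// not by "[", so they are left alone.
console.log(splitGroups('[1]-[2]-[3],[4]-[5],[6,7,8],[9]').join('\n'));
```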

Although I believe the best approach here is to use split (as presented by @j__m's answer), here's an approach that uses matching rather than splitting.
Regex:
(\[.*?\](?!-))
Example usage:
// requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
String input = "[Computers]-[Apple]-[Laptop],[Cables]-[Cables,Connectors],[Adapters]";
Pattern p = Pattern.compile("(\\[.*?\\](?!-))");
Matcher m = p.matcher(input);
while (m.find()) {
    System.out.println(m.group(1));
}
Resulting output:
[Computers]-[Apple]-[Laptop]
[Cables]-[Cables,Connectors]
[Adapters]

An answer that doesn't use regular expressions (if that's worth something in ease of understanding what's going on) is:
1. substitute "]#[" for every "],[" (this assumes "#" never occurs in the data)
2. split on "#"
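
The two steps above could look roughly like this in JavaScript. It is only a sketch: the function name splitWithoutRegex and the marker parameter are invented for the example, and it assumes the marker character never appears in the input and that "],[" only occurs between top-level groups, as in the sample strings.

```javascript
// Non-regex split: rewrite each "],[" boundary to "]#[", then split on "#".
function splitWithoutRegex(input, marker = '#') {
  // Assumption: the marker must not already be present in the data.
  if (input.includes(marker)) {
    throw new Error('marker occurs in input; pick a different one');
  }
  // split/join performs a plain-text "replace all" without any regex.
  const marked = input.split('],[').join(']' + marker + '[');
  return marked.split(marker);
}

console.log(
  splitWithoutRegex('[Computers]-[Apple]-[Laptop],[Cables]-[Cables,Connectors],[Adapters]').join('\n')
);
```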

Related

I want to exact match characters using regex in JS? [duplicate]

What is the regular expression (in JavaScript if it matters) to only match if the text is an exact match? That is, there should be no extra characters at either end of the string.
For example, if I'm trying to match for abc, then 1abc1, 1abc, and abc1 would not match.
Use the start and end delimiters: ^abc$
It depends. You could use
string.match(/^abc$/)
But that would not match the following string: 'the first 3 letters of the alphabet are abc. not abc123'
I think you would want to use \b (word boundaries):
var str = 'the first 3 letters of the alphabet are abc. not abc123';
var pat = /\b(abc)\b/g;
console.log(str.match(pat));
Live example: http://jsfiddle.net/uu5VJ/
If the former solution (^abc$) works for you, I would advise against using a regular expression at all.
It means you are comparing whole strings, so you may have something like the following:
var strs = ['abc', 'abc1', 'abc2'];
for (var i = 0; i < strs.length; i++) {
    if (strs[i] == 'abc') {
        //do something
    }
    else {
        //do something else
    }
}
While you could use
if (strs[i].match(/^abc$/)) {
    //do something
}
It would be more resource-intensive than the plain comparison. For me, a general rule of thumb is: for a simple string comparison, use a conditional expression; for a more dynamic pattern, use a regular expression.
More on JavaScript regexes: https://developer.mozilla.org/en/JavaScript/Guide/Regular_Expressions
"^" matches the beginning of the input and "$" the end of it (with the m flag, the beginning and end of each line). E.g.:
var re = /^abc$/;
Would match "abc" but not "1abc" or "abc1". You can learn more at https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions
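
As a quick check of the anchored pattern, here is a small sketch that runs it against the strings from the question. The helper name isExactAbc is made up for the example.

```javascript
// ^ and $ anchor the pattern to both ends of the input,
// so nothing may come before or after "abc".
function isExactAbc(text) {
  return /^abc$/.test(text);
}

for (const s of ['abc', '1abc1', '1abc', 'abc1']) {
  console.log(s + ' -> ' + isExactAbc(s));
}
```

For a fixed literal like this, isExactAbc(s) gives the same answer as s === 'abc'; the regex form only pays off once the pattern becomes more dynamic.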

Match longest substring with regex [duplicate]

I tried looking for an answer to this question but couldn't find anything, and I hope there's an easy solution. I am using the following code in C#:
String pattern = ("(hello|hello world)");
Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);
var matches = regex.Matches("hello world");
The question is: is there a way for the Matches method to return the longest pattern first? In this case, I want to get "hello world" as my match as opposed to just "hello". This is just an example; my actual pattern list consists of a decent number of words.
If you already know the lengths of the words beforehand, then put the longest first. For example:
String pattern = ("(hello world|hello)");
The alternatives are tried left to right, so the longest will be matched first. If you don't know the lengths beforehand, you cannot order them by hand this way.
An alternative approach would be to store all the matches in an array/hash/list and pick the longest one manually, using the language's built-in functions.
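
The pick-the-longest idea can be sketched as follows, in JavaScript rather than the question's C#. The names longestMatch and escapeRegExp are hypothetical helpers, and the sketch assumes the alternatives are literal phrases rather than arbitrary sub-patterns.

```javascript
// Escape regex metacharacters so a phrase is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Try every candidate phrase on its own and keep the longest matched text.
// Returns null when none of the candidates occurs in the input.
function longestMatch(input, candidates) {
  let best = null;
  for (const phrase of candidates) {
    const m = new RegExp(escapeRegExp(phrase), 'i').exec(input);
    if (m && (best === null || m[0].length > best.length)) {
      best = m[0];
    }
  }
  return best;
}

console.log(longestMatch('hello world', ['hello', 'hello world']));
```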
A regular expression engine tries the alternatives from left to right. If you want to make sure you get the longest possible match first, you'll need to change the order of your alternatives. The leftmost alternative is tried first. If it matches, the engine goes on to match the rest of the pattern against the rest of the string; the next alternative is tried only if no match can be found.
String pattern = ("(hello world|hello wor|hello)");
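
When the alternatives are plain words whose lengths are only known at run time, the longest-first ordering can be produced by sorting before the pattern is built. The following JavaScript sketch shows one way to do that; buildLongestFirst and escapeRegExp are invented names, and the words are assumed to be literals, not sub-patterns.

```javascript
// Escape regex metacharacters so each word is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Sort the words by descending length, then join them into one alternation.
// Because alternatives are tried left to right, the longest one wins.
function buildLongestFirst(words) {
  const ordered = [...words].sort((a, b) => b.length - a.length);
  return new RegExp('(' + ordered.map(escapeRegExp).join('|') + ')', 'i');
}

const regex = buildLongestFirst(['hello', 'hello world']);
console.log(regex.exec('hello world')[1]);
```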
Make two different regex matches. The first will match your longer option, and if that does not work, the second will match your shorter option.
string input = "hello world";
string patternFull = "hello world";
Regex regexFull = new Regex(patternFull, RegexOptions.IgnoreCase);
var matches = regexFull.Matches(input);
if (matches.Count == 0)
{
    string patternShort = "hello";
    Regex regexShort = new Regex(patternShort, RegexOptions.IgnoreCase);
    matches = regexShort.Matches(input);
}
At the end, matches will hold the result of either the "full" or the "short" pattern. The "full" pattern is checked first, and the "short" one is only tried if it finds nothing.
You can wrap the logic in a function if you plan on calling it many times. This is something I came up with (but there are plenty of other ways you can do this).
public bool HasRegexMatchInOrder(string input, params string[] patterns)
{
    foreach (var pattern in patterns)
    {
        Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);
        if (regex.IsMatch(input))
        {
            return true;
        }
    }
    return false;
}
string input = "hello world";
bool hasAMatch = HasRegexMatchInOrder(input, "hello world", "hello", ...);
