Dart RegEx is not splitting String - regex

I'm new to regex.
I want to extract all syllables from my string using this regex:
/[^aeiouy]*[aeiouy]+(?:[^aeiouy]*\$|[^aeiouy](?=[^aeiouy]))?/gi
And I implemented it in Dart like this:
void main() {
String test = 'hairspray';
final RegExp syllableRegex = RegExp("/[^aeiouy]*[aeiouy]+(?:[^aeiouy]*\$|[^aeiouy](?=[^aeiouy]))?/gi");
print(test.split(syllableRegex));
}
The problem:
I get the whole word back in the list; it is not split.
What do I need to change to get the word divided into a list of syllables?
I tested the regex on regex101 and it finds matches there.
But when I use it in Dart with firstMatch, I get null.

You need to:
Use a plain string pattern in Dart, without the /.../ regex delimiters (a raw string, r"...", keeps $ from being treated as interpolation).
Drop the flags: i is expressed as the caseSensitive: false option of RegExp, and g corresponds to the RegExp#allMatches method.
Match and extract with your pattern instead of splitting: split removes whatever the pattern matches.
You can use
String test = 'hairspray';
final RegExp syllableRegex = RegExp(r"[^aeiouy]*[aeiouy]+(?:[^aeiouy]*$|[^aeiouy](?=[^aeiouy]))?",
    caseSensitive: false); // equivalent of the i flag
for (Match match in syllableRegex.allMatches(test)) {
  print(match.group(0));
}
Output:
hair
spray
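For comparison, here is a minimal Python sketch of the same match-and-extract idea. It is not part of the Dart answer; the names SYLLABLE and syllables are made up for illustration. It also shows why splitting with this pattern returns nothing useful: split throws away the matched text.

```python
import re

# Same pattern as in the Dart answer, without the /.../gi delimiters.
# re.IGNORECASE plays the role of the i flag.
SYLLABLE = re.compile(
    r"[^aeiouy]*[aeiouy]+(?:[^aeiouy]*$|[^aeiouy](?=[^aeiouy]))?",
    re.IGNORECASE,
)


def syllables(word):
    """Return every non-overlapping match (the role of g / allMatches)."""
    return SYLLABLE.findall(word)


print(syllables("hairspray"))       # ['hair', 'spray']
# Splitting keeps only the text between matches, which is empty here.
print(SYLLABLE.split("hairspray"))  # ['', '', '']
```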

Related

How to get all sub-strings of a specific format from a string

I have a large string and I want to get all sub-strings of format [[someword]] from it.
Meaning, get all words (list) which are wrapped in opening and closing square brackets.
One way to do this is to split the string on spaces and then filter the list, but the problem is that sometimes [[someword]] does not stand alone as a word: it might have a comma, a space or a period right before or after it.
What is the best way to do this?
I will appreciate a solution in Scala but as this is more of a programming problem, I will convert your solution to Scala if it's in some other language I know e.g. Python.
This question is different from the marked duplicate because the regex needs to be able to accommodate characters other than English letters between the brackets.
You can use this (?<=\[{2})[^[\]]+(?=\]{2}) regex to match and extract all the words you need that are contained in double square brackets.
Here is a Python solution,
import re
s = 'some text [[someword]] some [[some other word]]other text '
print(re.findall(r'(?<=\[{2})[^[\]]+(?=\]{2})', s))
Prints,
['someword', 'some other word']
I have never worked in Scala, but here is a solution in Java. Scala runs on the JVM and can call java.util.regex directly, so this may help.
String s = "some text [[someword]] some [[some other word]]other text ";
Pattern p = Pattern.compile("(?<=\\[{2})[^\\[\\]]+(?=\\]{2})");
Matcher m = p.matcher(s);
while(m.find()) {
System.out.println(m.group());
}
Prints,
someword
some other word
Let me know if this is what you were looking for.
Scala solution:
val text = "[[someword1]] test [[someword2]] test 1231"
// match the double brackets and capture the letters and digits between them
// (\p{L} alone would reject the digit in someword1)
val pattern = "\\[\\[([\\p{L}\\p{N}]+)\\]\\]".r
val values = pattern
  .findAllIn(text)
  .matchData
  .map(_.group(1)) // get 1st group
  .toList
println(values) // List(someword1, someword2)
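A small Python sketch of the capture-group variant, assuming the content may contain non-English letters and spaces. The sample text and the names DOUBLE_BRACKETS and bracketed are hypothetical. The brackets are matched, but only the captured inside is returned.

```python
import re

# [^\[\]]+ accepts any run of characters except square brackets,
# so non-English letters and spaces inside the brackets are allowed.
DOUBLE_BRACKETS = re.compile(r"\[\[([^\[\]]+)\]\]")


def bracketed(text):
    """Return the contents of every [[...]] occurrence in text."""
    return DOUBLE_BRACKETS.findall(text)


# Hypothetical sample: punctuation sits directly after the closing brackets.
sample = "see [[someword]], then [[слово]] and [[naïve café]]."
print(bracketed(sample))  # ['someword', 'слово', 'naïve café']
```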

Simple Regex text replace then add suffix

I have a program that can only handle basic regex (no C#, VB.NET, etc.).
This is my situation.
I have a set of start Urls.
http://www.foo.com?code=234654
I need to remove the ?code= and replace with a / then add the letter t at the end.
Like this:
http://www.foo.com/234654t
I would appreciate any help with this.
Thanks
Sean
For the dialect that is used in java.util.regex you can use this regular expression, for example:
// "?", then the parameter name and "=", then the digits (captured as group 1) up to the end
String regex = "\\?+[A-Za-z=]+([0-9]+)$";
String replacement = "/$1t";
Pattern pattern = Pattern.compile(regex);
Matcher m = pattern.matcher(line);
if (m.find()) {
System.out.println(m.replaceAll(replacement));
}
Another example, by using replaceAll:
line.replaceAll("\\?+[A-Za-z=]+", "/").replaceAll("(?<=[0-9])$", "t");
For the string:
String line = "http://www.foo.com?code=234654";
You'll get:
http://www.foo.com/234654t
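The same replacement as a minimal Python sketch, assuming the parameter is always named code and the value is all digits. The function name rewrite_url is made up for illustration. In a tool that uses $1 for back-references, the equivalent pair would presumably be the pattern \?code=([0-9]+)$ with the replacement /$1t.

```python
import re


def rewrite_url(url):
    """Turn '...?code=123' into '.../123t'; other URLs are returned unchanged."""
    # \?code= matches the literal query prefix, (\d+)$ captures the trailing digits.
    # \g<1> is group 1; the explicit form keeps the following 't' unambiguous.
    return re.sub(r"\?code=(\d+)$", r"/\g<1>t", url)


print(rewrite_url("http://www.foo.com?code=234654"))  # http://www.foo.com/234654t
```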

Regex in C# only returns the first match

I am trying to just simply disassemble a comma-separated string using the Regex below:
[^,]+
However, I get a different result from this Regex in C# than other engines such as online Regex compilers.
C# for some reason only detects the first element in the string and that's all.
Sample comma-separated string compiled online.
The code I use in C# which returns: Foo
var longString = "Foo, \nBar, \nBaz, \nQux";
var match = Regex.Match(longString, @"[^,]+");
var cutStrings = new List<string>();
if (match.Success)
{
foreach (var capture in match.Captures)
{
cutStrings.Add(capture.ToString());
}
}
Regex.Match returns only the first match, and match.Captures holds the captures of that single match, not the later matches. Use Regex.Matches to get the collection of all matches.
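The same first-match versus all-matches distinction exists in other regex APIs. As an analogy only (this is Python, not the C# API), re.search returns one match while re.findall returns all of them:

```python
import re

long_string = "Foo, \nBar, \nBaz, \nQux"

# One match only, comparable to Regex.Match.
first = re.search(r"[^,]+", long_string)
print(first.group(0))  # Foo

# Every non-overlapping match, comparable to Regex.Matches.
all_parts = re.findall(r"[^,]+", long_string)

# The later matches still carry the space and newline that follow each comma.
cut_strings = [part.strip() for part in all_parts]
print(cut_strings)  # ['Foo', 'Bar', 'Baz', 'Qux']
```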

Dart: RegExp by example

I'm trying to get my Dart web app to: (1) determine if a particular string matches a given regex, and (2) if it does, extract a group/segment out of the string.
Specifically, I want to make sure that a given string is of the following form:
http://myapp.example.com/#<string-of-1-or-more-chars>[?param1=1&param2=2]
Where <string-of-1-or-more-chars> is just that: any string of 1+ chars, and where the query string ([?param1=1&param2=2]) is optional.
So:
Decide if the string matches the regex; and if so
Extract the <string-of-1-or-more-chars> group/segment out of the string
Here's my best attempt:
String testURL = "http://myapp.example.com/#fizz?a=1";
String regex = "^http://myapp.example.com/#.+(\?)+\$";
RegExp regexp= new RegExp(regex);
Iterable<Match> matches = regexp.allMatches(regex);
String viewName = null;
if(matches.length == 0) {
// testURL didn't match regex; throw error.
} else {
// It matched, now extract "fizz" from testURL...
viewName = ??? // (ex: matches.group(2)), etc.
}
In the above code, I know I'm using the RegExp API incorrectly (I'm not even using testURL anywhere), and on top of that, I have no clue how to use the RegExp API to extract (in this case) the "fizz" segment/group out of the URL.
The RegExp class comes with a convenience method for a single match:
RegExp regExp = RegExp(r"^http://myapp.example.com/#([^?]+)");
var match = regExp.firstMatch("http://myapp.example.com/#fizz?a=1");
// firstMatch returns null when the string does not match
if (match != null) {
  print(match[1]); // fizz
}
Note: I used anubhava's regular expression (yours was not escaping the ? correctly).
Note 2: even though it's not necessary here, it is usually a good idea to use raw strings for regular expressions, since you don't need to escape $ and \ in them. Sometimes triple-quoted raw strings are convenient too: new RegExp(r"""some'weird"regexp\$""").
Try this regex:
String regex = "^http://myapp.example.com/#([^?]+)";
Then run it against testURL (not against the regex string) and grab group 1 of the first match: matches.first.group(1)
String regex = "^http://myapp.example.com/#([^?]+)";
Then:
var match = matches.elementAt(0);
print("${match.group(1)}"); // output : fizz
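Both steps from the question (check that the string matches, then extract the segment) can be sketched in Python as follows. The function name view_name and the choice to raise ValueError are assumptions for illustration. Unlike the patterns quoted in this thread, this sketch escapes the dots in the host name.

```python
import re

# ([^?]+) captures everything after '#' up to an optional query string.
URL_PATTERN = re.compile(r"^http://myapp\.example\.com/#([^?]+)")


def view_name(url):
    """Return the segment after '#', or raise ValueError if the URL does not match."""
    m = URL_PATTERN.match(url)
    if m is None:
        raise ValueError(f"unexpected URL: {url}")
    return m.group(1)


print(view_name("http://myapp.example.com/#fizz?a=1"))  # fizz
```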

Using Regex is there a way to match outside characters in a string and exclude the inside characters?

I know I can exclude outside characters in a string using look-ahead and look-behind, but I'm not sure about characters in the center.
What I want is to get a match of ABCDEF from the string ABC 123 DEF.
Is this possible with a Regex string? If not, can it be accomplished another way?
EDIT
For more clarification, in the example above I can use the regex string /ABC.*?DEF/ to sort of get what I want, but this includes everything matched by .*?. What I want is to match with something like ABC(match whatever, but then throw it out)DEF resulting in one single match of ABCDEF.
As another example, I can do the following (in pseudocode and regex):
string myStr = "ABC 123 DEF";
string tempMatch = RegexMatch(myStr, "(?<=ABC).*?(?=DEF)"); //Returns " 123 "
string FinalString = myStr.Replace(tempMatch, ""); //Returns "ABCDEF". This is what I want
Again, is there a way to do this with a single regex string?
Since the regex replace feature in most languages does not change the string it operates on (but produces a new one), you can do it as a one-liner in most languages. Firstly, you match everything, capturing the desired parts:
^.*(ABC).*(DEF).*$
(Make sure to use the single-line/"dotall" option if your input contains line breaks!)
And then you replace this with:
$1$2
That will give you ABCDEF in one assignment.
Still, as outlined in the comments and in Mark's answer, the engine does match the stuff in between ABC and DEF. It's only the replacement convenience function that throws it out. But that is supported in pretty much every language, I would say.
Important: this approach will of course only work if your input string contains the desired pattern only once (assuming ABC and DEF are actually variable).
Example implementation in PHP:
$output = preg_replace('/^.*(ABC).*(DEF).*$/s', '$1$2', $input);
Or JavaScript (which had no single-line mode before ES2018 added the s flag, hence [\s\S] instead of the dot):
var output = input.replace(/^[\s\S]*(ABC)[\s\S]*(DEF)[\s\S]*$/, '$1$2');
Or C#:
string output = Regex.Replace(input, @"^.*(ABC).*(DEF).*$", "$1$2", RegexOptions.Singleline);
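A Python sketch of the same one-liner, with re.DOTALL as the single-line option. The function name keep_outer is made up for illustration. The last call illustrates the caveat about the pattern occurring more than once: the whole input still collapses into a single ABCDEF.

```python
import re


def keep_outer(text):
    """Replace the whole input with the two captured parts, ABC and DEF."""
    # re.DOTALL lets . match line breaks, like the /s modifier in the PHP example.
    return re.sub(r"^.*(ABC).*(DEF).*$", r"\1\2", text, flags=re.DOTALL)


print(keep_outer("ABC 123 DEF"))              # ABCDEF
print(keep_outer("x\nABC 1\n2 DEF\ny"))       # ABCDEF
# Two occurrences: the greedy match covers the whole string once.
print(keep_outer("ABC 1 DEF and ABC 2 DEF"))  # ABCDEF
```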
A regular expression can contain multiple capturing groups. Each group must consist of consecutive characters so it's not possible to have a single group that captures what you want, but the groups themselves do not have to be contiguous so you can combine multiple groups to get your desired result.
Regular expression
(ABC).*(DEF)
Captures
ABC
DEF
See it online: rubular
Example C# code
string myStr = "ABC 123 DEF";
Match m = Regex.Match(myStr, "(ABC).*(DEF)");
if (m.Success)
{
string result = m.Groups[1].Value + m.Groups[2].Value; // Gives "ABCDEF"
// ...
}