Formatting regex in Dart on several lines - regex

I have
Pattern pattern = r'^((?:19|20)\d\d)[- /.]
(0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])$';
My editor shows an error on this regexp.
How can I fix it?

You entered a line break inside a string literal, that is why you get a syntax issue.
If you want to split a pattern into several lines, just use string concatenation:
Pattern pattern = r'^((?:19|20)\d\d)[- /.]' +
r'(0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])$';
Or, since string literals separated only with whitespace characters are concatenated automatically:
Pattern pattern = r'^((?:19|20)\d\d)[- /.]'
r'(0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])$';
Or, if you plan to re-use a long pattern, you may define this part as a variable, and just use string interpolation:
String y = r'((?:19|20)\d\d)';
String M = r'(0[1-9]|1[012])';
String d = r'(0[1-9]|[12][0-9]|3[01])';
String sep = r'[- /.]';
Pattern pattern = '^$y$sep$M$sep$d\$';
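For comparison, here is a minimal sketch of the same "build the pattern from named parts" idea in Java, using plain string concatenation. The class, constant and method names are made up for illustration; only the sub-patterns come from the answer above.

```java
import java.util.regex.Pattern;

class DatePatternDemo {
    // Hypothetical names; the sub-patterns are the ones used in the answer above.
    static final String YEAR = "((?:19|20)\\d\\d)";
    static final String MONTH = "(0[1-9]|1[012])";
    static final String DAY = "(0[1-9]|[12][0-9]|3[01])";
    static final String SEP = "[- /.]";

    // Concatenate the parts once and compile the result.
    static final Pattern DATE =
            Pattern.compile("^" + YEAR + SEP + MONTH + SEP + DAY + "$");

    static boolean isDate(String s) {
        return DATE.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(isDate("2019-03-15")); // true
        System.out.println(isDate("2019-13-15")); // false, 13 is not a valid month
    }
}
```

Keeping each part in its own constant makes it easier to reuse the year, month or day fragment in other patterns.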

Related

Capturing a delimiter that isn't in between single quotes

Like the question says, is it possible to use a single Regex string to get a delimiter that isn't in between some quotes?
For example, I want to split this string with the delimiter &:
"example=3&testing='f&tmp'"
should produce
["example=3", "testing='f&tmp'"]
Essentially, things inside single quotes (' ') should remain untouched.
I found out how to get things within quotes with expression: (?:'.*?')
The closest I could get to a tangible solution was: (.[^']&[^'])
It is not an easy task for a String#split, but is quite a feasible task for Matcher#find if you use
[^&\s=]+=(?:'[^']*'|[^\s&]*)
together with this Java code:
String text = "example=3&testing='f&tmp'";
Pattern p = Pattern.compile("[^&\\s=]+=(?:'[^']*'|[^\\s&]*)");
Matcher m = p.matcher(text);
List<String> res = new ArrayList<>();
while(m.find()) {
res.add(m.group());
}
System.out.println(res);
// => [example=3, testing='f&tmp']
Details
[^&\s=]+ - one or more chars other than &, = and whitespace
= - a = char
(?:'[^']*'|[^\s&]*) - a non-capturing group matching either ', zero or more chars other than ' and then a ', or zero or more chars other than whitespace and &.

Matlab: using regexp to get a string that has a whitespace in between

I want to use Regex to acquire some ID's in a cellstring array, the array looks like this:
myString = '(['US04650Y1001', 'US90274P3029', 'HON WI', 'US41165F1012'])';
My pattern for regex is as follows:
pattern = '[A-Za-z0-9.^_]+';
newArr = regexp(myString, pattern,'match');
I'd like to get the ID called 'HON WI', but with my current pattern it gets split in two because the pattern can't deal with the whitespace properly. I would like to get the whole "HON WI", as well as my other strings, i.e. everything that's inside ''. These might contain special characters like ^, . or _, but I don't know how to add the whitespace.
I already tried stuff like this, without success:
pattern = '[A-Za-z0-9.^_\s]+';
My new array should have, in each cell, the strings/ID's contained in myString (US04650Y1001, US90274P3029, HON WI and US41165F1012) with dimensions 1x4.
Another approach that seems to work but not entirely sure:
myString = strrep(myString,'([','');
myString = strrep(myString,'])','');
myString = regexp(myString,',','split');
myString = strrep(myString,'''','');
This seems to get me what I want, but I would like to know how can I alter the regex on my first approach.
Many thanks in advance.
You may use a mere '([^']+)' regex and use 'tokens' to get the captures:
myString = '([''US04650Y1001'', ''US90274P3029'', ''HON WI'', ''US41165F1012''])';
pattern = '''([^'']+)''';
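tokens = regexp(myString, pattern, 'tokens'); % request only the captured groups
newArr = [tokens{:}]; % flatten the nested 1x1 cells into a 1x4 cell array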
The newArr will look like
{
[1,1] = 'US04650Y1001'
[1,2] = 'US90274P3029'
[1,3] = 'HON WI'
[1,4] = 'US41165F1012'
}
Another option is to use lookaround assertions. The following will match any string made of alphanumeric characters or underscores (\w), spaces (' ') or the characters . or ^ that is located between quotes. This specifically excludes the blank space next to the comma in the separation between tokens, i.e. ', ' does not give a match.
Note that \s will match any blank space character (including tab, newline), this is why a space is preferred here:
pattern2='(?<='')[\w.^ ]+(?='')';
pattern2 =
(?<=')[\w.^ ]+(?=')
newArr = regexp(myString, pattern2,'match');
newArr'
ans =
'US04650Y1001'
'US90274P3029'
'HON WI'
'US41165F1012'
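The capture-group idea from the first answer ('([^']+)' plus reading group 1) is not specific to Matlab. Below is a rough Java sketch of it; the class and method names are hypothetical, and the input is the string from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedIdsDemo {
    // Collects whatever sits between pairs of single quotes (group 1 of each match).
    static List<String> quotedIds(String text) {
        Matcher m = Pattern.compile("'([^']+)'").matcher(text);
        List<String> ids = new ArrayList<>();
        while (m.find()) {
            ids.add(m.group(1));
        }
        return ids;
    }

    public static void main(String[] args) {
        String s = "(['US04650Y1001', 'US90274P3029', 'HON WI', 'US41165F1012'])";
        System.out.println(quotedIds(s));
        // [US04650Y1001, US90274P3029, HON WI, US41165F1012]
    }
}
```

Because each match consumes both the opening and the closing quote, the ", " between two IDs is never picked up as a token.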

.Net Regular Expression(Regex)

VB.NET separate strings using regex split?
I'm having a logical error with the pattern string variable; the error occurs after I extend the string from "(-)" to "(-)(+)(/)(*)".
Dim input As String = txtInput.Text
Dim pattern As String = "(-)(+)(/)(*)"
Dim substrings() As String = Regex.Split(input, pattern)
For Each match As String In substrings
lstOutput.Items.Add(match)
Next
This is my output when my pattern string variable is "-"; it works fine:
input: dog-
output: dog
-
My desired output (this is what I want to happen) is shown below, but there is something wrong with the code: it throws an error after I changed the pattern to "(-)(+)(/)(*)", and even with this:
"(-)" + "(+)" + "(/)" + "(*)"
input: dog+cat/tree
output: dog
+
cat
/
tree
When the input from the textbox contains a space character, it should be kept in the listbox:
input: dog+cat/ tree
output: dog
+
cat
/
tree
You need a character class, not a sequence of subpatterns inside separate capturing groups:
Dim pattern As String = "([+/*-])"
This pattern will match and capture into Group 1 (and thus, all the captured values will be part of the resulting array) a char that is either a +, /, * or -. Note the position of the hyphen: since it is the last char in the character class, it is treated as a literal -, not a range operator.
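The same character class works in other regex flavors, but keeping the delimiters is engine-specific. Java's String.split, for instance, does not add captured groups to the result, so the sketch below swaps in a different technique: it splits at the zero-width positions just before and just after an operator, using a lookahead and a lookbehind around the same [+/*-] class. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class OperatorSplitDemo {
    // Split before and after any of + / * - so the operators survive as separate items.
    static List<String> splitKeepingOperators(String input) {
        return Arrays.asList(input.split("(?<=[+/*-])|(?=[+/*-])"));
    }

    public static void main(String[] args) {
        System.out.println(splitKeepingOperators("dog+cat/tree"));
        // [dog, +, cat, /, tree]
    }
}
```

As in the VB.NET pattern, the hyphen is placed last inside the class so that it is read as a literal character.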

Using RegEx split the string

I have a string like '[1]-[2]-[3],[4]-[5],[6,7,8],[9]' or '[Computers]-[Apple]-[Laptop],[Cables]-[Cables,Connectors],[Adapters]'. I'd like a pattern that produces the list results below, but I can't figure it out. Basically the comma is the separator, but [6,7,8] itself contains commas as well.
the string: [1]-[2]-[3],[4]-[5],[6,7,8],[9]
the result:
[1]-[2]-[3]
[4]-[5]
[6,7,8]
[9]
or
the string: [Computers]-[Apple]-[Laptop],[Cables]-[Cables,Connectors],[Adapters]
the result:
[Computers]-[Apple]-[Laptop]
[Cables]-[Cables,Connectors]
[Adapters]
,(?=\[)
This pattern splits on any comma that is followed by a bracket, but keeps the bracket within the result text.
The (?=stuff) construct is known as a "lookahead assertion". It acts as a condition for the match but is not itself part of the match.
In C# code:
String inputstring = "[Computers]-[Apple]-[Laptop],[Cables]-[Cables,Connectors],[Adapters]";
foreach(String s in Regex.Split(inputstring, @",(?=\[)"))
System.Console.Out.WriteLine(s);
In Java code:
String inputstring = "[Computers]-[Apple]-[Laptop],[Cables]-[Cables,Connectors],[Adapters]";
Pattern p = Pattern.compile(",(?=\\[)");
for(String s : p.split(inputstring))
System.out.println(s);
Either produces:
[Computers]-[Apple]-[Laptop]
[Cables]-[Cables,Connectors]
[Adapters]
Although I believe the best approach here is to use split (as presented in @j__m's answer), here's an approach that uses matching rather than splitting.
Regex:
(\[.*?\](?!-))
Example usage:
String input = "[Computers]-[Apple]-[Laptop],[Cables]-[Cables,Connectors],[Adapters]";
Pattern p = Pattern.compile("(\\[.*?\\](?!-))");
Matcher m = p.matcher(input);
while (m.find()) {
System.out.println(m.group(1));
}
Resulting output:
[Computers]-[Apple]-[Laptop]
[Cables]-[Cables,Connectors]
[Adapters]
An answer that doesn't use regular expressions (if that's worth something in ease of understanding what's going on) is:
substitute "]#[" for "],["
split on "#"
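The two steps above can be sketched in Java as follows. This assumes that "#" never occurs in the input; the class and method names are made up for illustration. Note that String.split technically takes a regex in Java, but "#" has no special meaning there, so it is treated as a plain character.

```java
import java.util.Arrays;

class BracketSplitDemo {
    // Step 1: turn every "],[" into "]#[". Step 2: split on the marker "#".
    // Assumption: '#' does not appear anywhere in the input string.
    static String[] splitGroups(String input) {
        return input.replace("],[", "]#[").split("#");
    }

    public static void main(String[] args) {
        String s = "[Computers]-[Apple]-[Laptop],[Cables]-[Cables,Connectors],[Adapters]";
        System.out.println(Arrays.toString(splitGroups(s)));
        // [[Computers]-[Apple]-[Laptop], [Cables]-[Cables,Connectors], [Adapters]]
    }
}
```

The comma inside [Cables,Connectors] is left alone because it is not surrounded by "]" and "[".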

how to remove double characters and spaces from string

Please let me know how to remove double spaces and repeated characters from the string below.
String = Test----$$$$19****45#### Nothing
Clean String = Test-$19*45# Nothing
I have used the regex "\s+", but it only removes the double spaces. I have tried other regex patterns, but they are too complex... please help me.
I am using vb.net
What you'll want to do is create a backreference to any character, and then remove the following characters that match that backreference. It's usually possible using the pattern (.)\1+, which should be replaced with just that backreference (once). It depends on the programming language how it's exactly done.
Dim text As String = "Test###_&aa&&&"
Dim result As String = New Regex("(.)\1+").Replace(text, "$1")
result will now contain Test#_&a&. Alternatively, you can use a lookaround to not remove that backreference in the first place:
Dim text As String = "Test###_&aa&&&"
Dim result As String = New Regex("(?<=(.))\1+").Replace(text, "")
For a faster alternative try:
Dim text As String = "Test###_&aa&&&"
Dim sb As New StringBuilder(text.Length)
Dim lastChar As Char
For Each c As Char In text
If c <> lastChar Then
sb.Append(c)
lastChar = c
End If
Next
Console.WriteLine(sb.ToString())
Here is a Perl way to replace every run of a repeated non-word char with a single one:
my $String = 'Test----$$$$19****45#### Nothing';
$String =~ s/(\W)\1+/$1/g;
print $String;
output:
Test-$19*45# Nothing
Here's how it would look in Java...
String raw = "Test----$$$$19****45#### Nothing";
String cleaned = raw.replaceAll("(.)\\1+", "$1");
System.out.println(raw);
System.out.println(cleaned);
prints
Test----$$$$19****45#### Nothing
Test-$19*45# Nothing