MATLAB regexp skip if part of word - regex

Given the following example:
str = 'deriv*dot(N,iv)';
expr = 'iv';
idx = regexp(str,expr);
This returns idx with 4 and 13. How do I only find the 'iv' that is NOT part of a word?
I tried messing around with Lookaround Operators for expr, but could not get the result I desired. Thanks for the help.

MATLAB has its own word-boundary escape sequences, \< (start of word) and \> (end of word):
expr = '\<iv\>';
That defines a word as anything that consists of letters, digits and underscores. If you want your own definition (e.g. letters only), then you need lookarounds:
expr = '(?<![a-zA-Z])iv(?![a-zA-Z])';
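For comparison, here is a minimal sketch of the same lookaround idea in JavaScript (chosen only for illustration; the question itself is about MATLAB). The variable names are assumptions. JavaScript indices are 0-based, so the position MATLAB reports as 13 appears as 12 here.

```javascript
// Sketch: find 'iv' only where it is not preceded or followed by a letter.
const str = 'deriv*dot(N,iv)';
const standaloneIv = /(?<![a-zA-Z])iv(?![a-zA-Z])/g;

// matchAll yields one match object per hit; m.index is the 0-based start.
const indices = [...str.matchAll(standaloneIv)].map((m) => m.index);
console.log(indices);
```

The 'iv' inside 'deriv' is rejected by the lookbehind because it is preceded by the letter 'r'.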


Regex: not all BLANKS but allow certain characters, with limit

I am trying to come up with a regex, or a combination of regexes, that returns false if the user a) entered only blanks, or b) entered "non-legal" characters. In addition, the number of characters has a set limit.
The closest I have so far is below. It fails because it does not count leading spaces; only the non-blanks are counted. I am using JavaScript.
const reg = /^([ ]*[!-~\u2018-\u201d\u2013\u2014]){1,10}$/;
EDIT: I think the above is incorrect, and I meant to post this:
const re4 = /^(?!\s*$)[!-~\u2018-\u201d\u2013\u2014]{1,10}$/;
EDIT 2: this has less clutter; allow space and all other 'standard' keyboard chars:
const re5 = /^(?!\s*$)[!-~]{1,10}$/;
So, this says you can enter a bunch of spaces, and must include at least 1 other character from the list following; but the {1,10} only counts the non-spaces and so I can end up with too many in total.
EDIT:
So, using re5 above --
s = ' '; // should fail
s = ' blah blah'; // should pass
s = '  blah blah'; // should fail, as there are 11 characters
Try ^(?:\s*\S){1,10}\s*$
This allows 1-10 non-whitespace characters; change \S to a character class of the allowed characters.
Update 2: After learning that you cannot invert the match result in code, here's one last suggestion using negative lookahead (like you already tried yourself).
This regex matches only strings of 1-10 non-banned characters that are not all whitespace:
const re4 = /^(?!\s+$)[^\!-\~\u2018-\u201d\u2013\u2014]{1,10}$/
Update 1: Use this regex to match all-whitespace string OR strings longer than 10 chars OR strings containing bad characters:
const re4 = /(^\s+$|^.{11,}$|[\!-\~\u2018-\u201d\u2013\u2014])/
I understand that you want to impose a length restriction via regex. I would suggest against that and recommend using str.length instead.
This regex will match whitespace-only strings and strings containing one or more bad characters:
const re4 = /(^\s+$|[\!-\~\u2018-\u201d\u2013\u2014])/;
Regarding prohibition of all-whitespace strings: Instead of packing it into a regex, you might consider using something more explicit like if (s.trim().length == 0). IMO this makes your intention clearer and your code probably more readable, leaving you with this easy-to-read regex:
// matches any string containing a *bad* character
const re4 = /[\!-\~\u2018-\u201d\u2013\u2014]/;
If you use trim for the all-whitespace check, you might convert your regex into a positive assertion, even with length restriction:
// matches any string consisting of 1-10 characters not considered *bad*
const re4 = /^[^\!-\~\u2018-\u201d\u2013\u2014]{1,10}$/;
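A minimal sketch of the explicit approach, combining trim() and length with a small character check. The helper name isValidInput and the allowed-character class (space plus the asker's list) are assumptions for illustration, not something either poster wrote.

```javascript
// Hypothetical helper: explicit checks instead of one large regex.
function isValidInput(s, maxLen = 10) {
  if (s.trim().length === 0) return false; // empty or all whitespace
  if (s.length > maxLen) return false;     // total length, spaces included
  // Assumed whitelist: space through '~' plus curly quotes and dashes.
  return /^[ -~\u2018-\u201d\u2013\u2014]+$/.test(s);
}

console.log(isValidInput(' '));
console.log(isValidInput(' blah blah'));
console.log(isValidInput('  blah blah'));
```

Because the length is taken from s.length, leading spaces count toward the limit.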
To match the input when it is 1 to 10 characters long and not all blanks, use a negative lookahead to assert that it is not all blanks:
^(?! *$).{1,10}$
If you want to restrict allowable chars, change the dot to a suitable character class of allowable chars.
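As a sketch of that suggestion, the following uses the lookahead together with an assumed character class of space through '~' in place of the dot. The constant name is hypothetical, and the end anchor $ is what makes the length limit apply to the whole input.

```javascript
// Sketch: 1-10 allowed characters in total, and not all blanks.
// The class [ -~] (printable ASCII including space) is an assumption.
const notBlankMax10 = /^(?! *$)[ -~]{1,10}$/;

console.log(notBlankMax10.test(' '));
console.log(notBlankMax10.test(' blah blah'));
console.log(notBlankMax10.test('  blah blah'));
```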

Regex expression that excludes underscores, but only if they are right before a number or an uppercase letter?

I have a string that represents a filename with an embedded business unit code and a timestamp, like this: alfin_cf_cashflowcomposition_X0826_20180726122003.csv
I want to exclude the BU code and the timestamp and end up with the stripped name, like this: alfin_cf_cashflowcomposition.csv
So far I've managed to match only the lowercase letters, the dots and the underscores (thus excluding the uppercase X and the digits of the code and timestamp).
I used this simple expression: /[a-z._] and got this result: alfin_cf_cashflowcomposition__.csv
Notice that there are 2 underscores just before ".csv". I do not want those there.
I only want the underscores if the next character is a lowercase letter.
I need to write a regex that ignores underscores if the next character is either an uppercase letter or a number.
Any idea how I can achieve that?
I would not use regex for this. You can strip the extension, split the rest of the filename and take all parts except the last two.
I implemented this simple code in javascript.
const orig_filename = "alfin_cf_cashflowcomposition_X0826_20180726122003.csv";
function strip_codes (orig){
  const extpos = orig.lastIndexOf('.');
  const nameparts = orig.substr(0,extpos).split('_');
  const ext = orig.substr(extpos);
  const name = nameparts
    .slice(0, nameparts.length - 2)
    .join('_');
  return name + ext;
}
console.log(strip_codes(orig_filename));
You can use the regex _.[A-Z,0-9].*(?=.csv) and replace the match with an empty string; it works for this filename.
You can test it online here: https://regexr.com/
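For a regex-based variant, here is a minimal sketch with a stricter pattern. It assumes the BU code consists of uppercase letters and digits, the timestamp is all digits, and the extension is .csv; the variable names are hypothetical.

```javascript
// Sketch: remove "_<CODE>_<TIMESTAMP>" right before the ".csv" extension.
const original = 'alfin_cf_cashflowcomposition_X0826_20180726122003.csv';

// _[A-Z0-9]+  -> underscore plus the BU code (assumed uppercase/digits)
// _\d+        -> underscore plus the numeric timestamp
// (?=\.csv$)  -> only when followed by the extension at the end
const stripped = original.replace(/_[A-Z0-9]+_\d+(?=\.csv$)/, '');
console.log(stripped);
```

Underscores followed by lowercase letters are left alone because neither part of the pattern can match them.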

split text into words and exclude hyphens

I want to split a text into its individual words using regular expressions. The obvious solution would be to use the regex \\b; unfortunately, this also splits words on the hyphen.
So I am looking for an expression that does exactly the same as \\b but does not split on hyphens.
Thanks for your help.
Example:
String s = "This is my text! It uses some odd words like user-generated and need therefore a special regex.";
String [] b = s.split("\\b+");
for (int i = 0; i < b.length; i++){
System.out.println(b[i]);
}
Output:
This
is
my
text
!
It
uses
some
odd
words
like
user
-
generated
and
need
therefore
a
special
regex
.
Expected output:
...
like
user-generated
and
....
@Matmarbon's solution is already quite close, but not a 100% fit; it gives me
...
like
user-
generated
and
....
This should do the trick, even if lookaheads are not available:
[^\w\-]+
For anyone who needs this for another purpose (e.g. inserting something), the following is closer to an equivalent of the \b solutions:
([^\w\-]|$|^)+
because:
There are three different positions that qualify as word boundaries:
Before the first character in the string, if the first character is a word character.
After the last character in the string, if the last character is a word character.
Between two characters in the string, where one is a word character and the other is not a word character.
--- http://www.regular-expressions.info/wordboundaries.html
You can use this:
(?<!-)\\b(?!-)
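To illustrate the [^\w\-]+ suggestion, here is a minimal JavaScript sketch (the question uses Java; the variable names are assumptions). Unlike a \b split, this drops punctuation and whitespace instead of returning them as separate tokens.

```javascript
// Sketch: split on runs of characters that are neither word characters nor hyphens.
const text = 'This is my text! It uses some odd words like user-generated and need therefore a special regex.';

// filter(Boolean) removes the empty string left after the trailing period.
const words = text.split(/[^\w\-]+/).filter(Boolean);
console.log(words);
```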

Extract number not in brackets from this string using regular expressions [70-(90)]

[15-]
[41-(32)]
[48-(45)]
[70-15]
[40-(64)]
[(128)-42]
[(128)-56]
I have these values, from which I want to extract the value not in round brackets (parentheses). If there is more than one, then add them together.
What is the regular expression to do this?
So the solution would look like this:
[15-] -> 15
[41-(32)] -> 41
[48-(45)] -> 48
[70-15] -> 85
[40-(64)] -> 40
[(128)-42] -> 42
[(128)-56] -> 56
You would be overcomplicating things if you went for a regex approach (in this case, at least). Also, regular expressions do not support mathematical operations, as pointed out by @richardtallent.
You can use an approach as shown here to extract a substring that omits the initial and final square brackets, then use the Split function (as shown here) to split the string in two at the dash. Lastly, use the InStr function (as shown here) to see whether either of the substrings that the split yielded contains a bracket.
Any substring that contains a bracket is omitted from the addition; the others are added up.
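The strip, split and filter steps described above can be sketched as follows. This is in JavaScript rather than VBA, purely for illustration, and the function name sumUnbracketed is hypothetical.

```javascript
// Sketch: sum the parts of "[a-b]" that are plain numbers (not parenthesized).
function sumUnbracketed(entry) {
  // Drop the surrounding square brackets.
  const inner = entry.trim().replace(/^\[|\]$/g, '');
  return inner
    .split('-')                            // "41-(32)" -> ["41", "(32)"]
    .filter((part) => /^\d+$/.test(part))  // keep plain numbers only
    .reduce((sum, part) => sum + Number(part), 0);
}

console.log(sumUnbracketed('[70-15]'));
console.log(sumUnbracketed('[(128)-42]'));
```

Empty parts, as in [15-], are filtered out together with the parenthesized ones.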
Regular expressions do not support performing math on the terms. You can loop through the groups that are matched and perform the math outside of the regex.
Here's the pattern to extract any number within the square brackets that is not in parentheses:
\[
(?:(?:\d+|\([^\)]*\))-)*
(\d+)
(?:-[^\]]*)*
\]
Each number will be returned in $1.
This works by looking for a number that is prefixed by any number of "words" separated by dashes, where the "words" are either numbers themselves or parenthesized strings, and followed by, optionally, a dash and some other stuff before hitting the closing bracket.
If VBA's RegEx doesn't support non-capturing groups (?:), remove all of the ?:'s and your captured numbers will be in $3 instead.
A simpler pattern also works:
\[
(?:[^\]]*-)*
(\d+)
(?:-[^\]]*)*
\]
This simply looks for numbers delimited by dashes and allowing for the number to be at the beginning or end.
Private Sub regEx()
    ' Requires a reference to "Microsoft VBScript Regular Expressions 5.5"
    Dim RegexObj As New VBScript_RegExp_55.RegExp
    RegexObj.Pattern = "\[(\(?[0-9]*?\)?)-(\(?[0-9]*?\)?)\]"
    Dim str As String
    str = "[15-]"
    Dim Match As Object
    Set Match = RegexObj.Execute(str)
    Dim result As Integer
    Dim value1 As Integer
    Dim value2 As Integer
    ' InStr returns 0 when "(" is absent. "Not InStr(...)" is a bitwise Not
    ' and is non-zero for every position, so compare against 0 instead.
    If InStr(1, Match.Item(0).SubMatches.Item(0), "(", 1) = 0 And Match.Item(0).SubMatches.Item(0) <> "" Then
        value1 = Match.Item(0).SubMatches.Item(0)
    End If
    If InStr(1, Match.Item(0).SubMatches.Item(1), "(", 1) = 0 And Match.Item(0).SubMatches.Item(1) <> "" Then
        value2 = Match.Item(0).SubMatches.Item(1)
    End If
    result = value1 + value2
    MsgBox (result)
End Sub
Replace [15-] with the other strings to test them.
Ok! It's been 6 years and 6 months since the question was posted. Still, for anyone looking for something like that maybe now or in the future...
Step 1:
Trim Leading and Trailing Spaces, if any
Step 2:
Find/Search:
\]|\[|\(.*\)
Replace With:
<Leave this field Empty>
Step 3:
Trim Leading and Trailing Spaces, if any
Step 4:
Find/Search:
^-|-$
Replace With:
<Leave this field Empty>
Step 5:
Find/Search:
-
Replace With:
\+
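The find/replace steps above can be sketched as a chain of replacements. JavaScript is used here as an assumption (the answer targets an editor's find/replace dialog), and the function name is hypothetical. Note that the result is an expression such as 70+15, so the addition itself still has to be evaluated separately.

```javascript
// Sketch: apply steps 1-5 in order to one entry.
function toSumExpression(entry) {
  return entry
    .trim()                            // step 1
    .replace(/\]|\[|\(.*\)/g, '')      // step 2: drop brackets and "(...)"
    .trim()                            // step 3
    .replace(/^-|-$/g, '')             // step 4: drop a leading or trailing dash
    .replace(/-/g, '+');               // step 5: remaining dashes become "+"
}

console.log(toSumExpression('[70-15]'));
console.log(toSumExpression('[(128)-42]'));
```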

Regex help. I need ideas for solve the String Calculator kata with Groovy

I'm working on String Calculator code kata with Groovy.
There are several scenarios to handle on the way to the solution:
I have:
//;\n1;2;3
//#\n1#2#3
//+\n1+2+3
//*\n1*2*3
//?\n1?2?3
I want:
1,2,3
My implementation:
String numbers = "//;\n1;2;3"
numbers.find(/\/\/\S[\n]/) { match ->
def delimeter = match[2]
numbers = numbers.minus(match).replaceAll(delimeter, ",")
}
With this solution I solved the first and second expressions, but I don't know how to solve the others. They fail with:
java.util.regex.PatternSyntaxException: Dangling meta character '+' near index 0
The problem is that we must also handle any delimiter that is a regular-expression metacharacter, such as +, * or ?.
Finally I have the solution:
String numbers = "//+\n1+2+3"
numbers.find(/(?s)\/\/(.*)\n/) { match ->
def delimeter = match[1] // also match[0][2]
numbers = numbers.minus(match[0]).replace(delimeter, ",")
}
An important point is the (?s) flag:
In dotall mode, the expression . matches any character, including a line terminator. By default this expression does not match line terminators.
Dotall mode can also be enabled via the embedded flag expression (?s)
But the real fix was here: .replace(delimeter, ","), which treats the delimiter as a literal string, whereas replaceAll treats it as a regular expression.
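The same idea, treating the delimiter as literal text rather than as a pattern, can be sketched in JavaScript. This is an illustration only (the kata here is solved in Groovy), and the function name normalizeDelimiters is hypothetical.

```javascript
// Sketch: read the delimiter from the "//<delim>\n" header and
// replace it literally, so regex metacharacters like + * ? are safe.
function normalizeDelimiters(input) {
  const header = input.match(/^\/\/(.+)\n/);
  if (!header) return input;             // no custom delimiter header
  const delimiter = header[1];
  const body = input.slice(header[0].length);
  // split/join works on plain strings, so no escaping is needed.
  return body.split(delimiter).join(',');
}

console.log(normalizeDelimiters('//+\n1+2+3'));
```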
//(.)\n(\d)\1(\d)\1(\d)
You need to use backreferences.
(.) matches any single character, and \1 matches that same character again.
For the next example you can apply this: //\[(.*?)\]\\n(\d)\1(\d)\1(\d)
It matches this:
//[*]\n1*2*3
And lastly: //\[(.*?)\]\[(.*?)\]\\n(\d)\1(\d)\2(\d)
//[*][%%]\n1*2%%3
And finally:
//\[(.*?)\](?:\[(.*?)\])?\\n(\d)\1(\d)(?:\2|\1)(\d)
I think it can work everywhere.
P.S.: you can replace (\d) with whatever you need. I think you need (\d*).
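A minimal sketch of the first backreference pattern in JavaScript, assuming the input contains a real newline after the delimiter. The names are hypothetical. A backreference matches the captured text literally, so a delimiter such as * needs no escaping.

```javascript
// Sketch: (.) captures the delimiter, \1 requires the same character again.
const sameDelimiter = /^\/\/(.)\n(\d)\1(\d)\1(\d)$/;

const hit = '//*\n1*2*3'.match(sameDelimiter);
// Groups 2-4 hold the three digits.
const digits = hit ? hit.slice(2) : [];
console.log(digits.join(','));

// A different second delimiter does not match.
const miss = '//*\n1*2;3'.match(sameDelimiter);
console.log(miss);
```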