Regex expression not behaving as expected in matlab - regex

In MATLAB I have a string like:
y = '[3-G]]3|25+3|[3-G]4|25+4|G5|25+5|F'
Then I have a variable named intHit, which should be a cell array containing every integer that is followed by a sign. So if we define it as:
intHit = regexp(y,'(\d*)([+-])','Match');
It returns something like:
intHit =
1×5 cell array
'3-' '25+' '3-' '25+' '25+'
However, depending on the input y, my intHit call sometimes returns minus (-) signs with no integer in front of them. I think my regular expression is faulty. Can someone help me write it so that it only returns an integer followed by a plus (+) or minus (-) sign, always like the example above? Thanks in advance.

Try this instead:
intHit = regexp(y,'(\d+)([+-])','Match');
The only change is \d* to \d+.
I think you had the wrong quantifier. * will match 0 or more times consecutively (which means it could match lone pluses and minuses), while + will match 1 or more times consecutively.
Also, if you're not capturing tokens, you can simplify your match expression to just '\d+[+-]'.
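To see the difference between the two quantifiers, here is a small sketch using Python's re module rather than MATLAB. The sample string is made up for illustration; the meaning of * and + is the same in both engines.

```python
import re

# Hypothetical input containing a sign with no digit in front of it.
y = "[3-G]|-F|25+3"

# \d* allows zero digits, so a bare sign still matches.
loose = re.findall(r"\d*[+-]", y)

# \d+ requires at least one digit before the sign.
strict = re.findall(r"\d+[+-]", y)

print(loose)   # ['3-', '-', '25+']
print(strict)  # ['3-', '25+']
```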

Related

Find group of strings starting and ending by a character using regular expression

I have a string, and I want to extract, using regular expressions, groups of characters that are between the character : and the other character /.
Typically, here is an example of a string I get:
'abcd:45.72643,4.91203/Rou:hereanotherdata/defgh'
So I want to retrieve 45.72643,4.91203 and also hereanotherdata,
as they are both between the characters : and /.
I tried this syntax on a simpler string in which the pattern occurs only once:
[tt]=regexp(str,':(\w.*)/','match')
tt = ':45.72643,4.91203/'
But it works only if the pattern occurs once. If I use it on a string that contains the pattern several times, I get everything between the first : and the last /.
How can I specify that the pattern will occur multiple times, and how can I retrieve each occurrence?
Use lookaround and a lazy quantifier:
regexp(str, '(?<=:).+?(?=/)', 'match')
Example (Matlab R2016b):
>> str = 'abcd:45.72643,4.91203/Rou:hereanotherdata/defgh';
>> result = regexp(str, '(?<=:).+?(?=/)', 'match')
result =
1×2 cell array
'45.72643,4.91203' 'hereanotherdata'
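For comparison, the same greedy versus lazy behaviour can be reproduced with Python's re module. This is only a sketch in a different engine, but lookarounds and lazy quantifiers behave the same way here.

```python
import re

s = "abcd:45.72643,4.91203/Rou:hereanotherdata/defgh"

# Greedy: .* runs to the last '/', so both fields end up in one match.
greedy = re.findall(r":(.*)/", s)

# Lazy + lookarounds: stop at the first '/' after each ':'.
lazy = re.findall(r"(?<=:).+?(?=/)", s)

print(greedy)  # ['45.72643,4.91203/Rou:hereanotherdata']
print(lazy)    # ['45.72643,4.91203', 'hereanotherdata']
```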
In most languages this is hard to do with a single regexp. Ultimately you'll only ever get back the one string, and you want to get back multiple strings.
I've never used Matlab, so it may be possible in that language, but based on other languages, this is how I'd approach it...
I can't give you the exact code, but a search indicates that in Matlab there is a function called strsplit, example...
C = strsplit(data,':')
That should break your original string up into an array of strings, using the ":" as the break point. You can then ignore the first array element (as it contains the text before the first ":"), loop over the rest of the array, and use a regexp to extract everything that comes before a "/".
So for instance...
'abcd:45.72643,4.91203/Rou:hereanotherdata/defgh'
Breaks down into an array with parts...
1 - 'abcd'
2 - '45.72643,4.91203/Rou'
3 - 'hereanotherdata/defgh'
Then ignore 1, and extract everything before the "/" in 2 and 3.
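The split-then-trim idea described here could look roughly like this in Python (a sketch of the approach, not MATLAB code; the variable names are illustrative):

```python
s = "abcd:45.72643,4.91203/Rou:hereanotherdata/defgh"

# Split on ':' and drop the first piece, which precedes any ':'.
pieces = s.split(":")[1:]

# Keep only what comes before the first '/' in each remaining piece.
# Pieces without a '/' are skipped because they were never closed.
fields = [p.split("/", 1)[0] for p in pieces if "/" in p]

print(fields)  # ['45.72643,4.91203', 'hereanotherdata']
```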
As John Mawer and Adriaan mentioned, strsplit is a good place to start with. You can use it for both ':' and '/', but then you will not be able to determine where each of them started. If you do it with strsplit twice, you can know where the ':' starts :
A='abcd:45.72643,4.91203/Rou:hereanotherdata/defgh';
B=cellfun(@(x) strsplit(x,'/'),strsplit(A,':'),'uniformoutput',0);
Now each cell of B holds one ':'-delimited piece, itself split on '/'. Pieces that contained a '/' have more than one element, so you can keep only those and take the first element of each:
C=cellfun(@(x) x{1},B(cellfun('length',B)>1),'uniformoutput',0)
C =
1×2 cell array
'45.72643,4.91203' 'hereanotherdata'
Starting in R2016b you can use extractBetween:
>> str = 'abcd:45.72643,4.91203/Rou:hereanotherdata/defgh';
>> result = extractBetween(str,':','/')
result =
2×1 cell array
{'45.72643,4.91203'}
{'hereanotherdata' }
If all your text elements have the same number of delimiters this can be vectorized too.

Spotfire: count the number of a certain character in a string

I am trying to add a new calculated column that counts the number of semi colons in a string and adds one to it. So the column i have contains a bunch of aliases and I need to know how many for each row.
For example,
A; B; C; D
So basically this means there are 4 aliases (3 semi colons + 1)
Need to do this for over 2 million rows. Help please!
The basic idea is to subtract the length of your string without the ; characters from its original length:
len([columnName])-len(Substitute([columnName],";",""))+1
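The arithmetic behind that expression can be checked with a small Python sketch. The function name count_aliases is made up for illustration and is not a Spotfire function.

```python
def count_aliases(value):
    # Removing every ';' shortens the string by exactly the number of ';'.
    semicolons = len(value) - len(value.replace(";", ""))
    # One more alias than there are separators.
    return semicolons + 1

print(count_aliases("A; B; C; D"))  # 4
```

Note that with this formula an empty string still counts as one alias, which may or may not be what you want.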
Here it is with a regular expression:
Len(RXReplace([Column 1], "(?!;).", "", "gis"))+1
RXReplace takes as arguments:
The string you want to work on (in this case, Column 1)
The regular expression you want to use (here it is (?!;). )
What you want to replace matches with (blank in this situation, so that everything that matches the regex is removed)
Finally, a parameter saying how you want it to work (we pass in gis, which means replace all matches rather than just the first, ignore case, and let . match newlines as well)
We wrap this in a Len which gives us the amount of semicolons since that is all that is left and finally we add 1 to it to get the final result.
You can read more about the regular expression here: https://msdn.microsoft.com/en-us/library/az24scfc(v=vs.110).aspx but in a nutshell it says match everything that isn't a semi colon.
You can read more about RXReplace and Len here: https://docs.tibco.com/pub/spotfire/6.0.0-november-2013/userguide-webhelp/ncfe/ncfe_text_functions.htm
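The same "delete everything that is not a semicolon, then measure what is left" idea can be tried out in Python. This is a sketch using Python's re module, not Spotfire's RXReplace, and count_aliases_rx is a made-up name.

```python
import re

def count_aliases_rx(value):
    # (?!;). matches any single character that is not ';'.
    # re.DOTALL lets '.' match newlines too, like the s in "gis".
    only_semicolons = re.sub(r"(?!;).", "", value, flags=re.DOTALL)
    return len(only_semicolons) + 1

print(count_aliases_rx("A; B;\nC; D"))  # 4
```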

Find numbers in string using Golang regexp

I want to find all numbers in a string with the following code:
re:=regexp.MustCompile("[0-9]+")
fmt.Println(re.FindAllString("abc123def", 0))
I also tried adding delimiters to the regex, using a positive number as second parameter for FindAllString, using a numbers only string like "123" as first parameter...
But the output is always []
I seem to miss something about how regular expressions work in Go, but cannot wrap my head around it. Is [0-9]+ not a valid expression?
The problem is with your second integer argument. Quoting from the package doc of regexp:
These routines take an extra integer argument, n; if n >= 0, the function returns at most n matches/submatches.
You pass 0 so at most 0 matches will be returned; that is: none (not really useful).
Try passing -1 to indicate you want all.
Example:
re := regexp.MustCompile("[0-9]+")
fmt.Println(re.FindAllString("abc123def987asdf", -1))
Output:
[123 987]
Try it on the Go Playground.
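To make the role of n concrete, here is a Python imitation of the semantics quoted above. find_all_string is a hypothetical helper written for this illustration; it is not part of Go or of Python's re module.

```python
import re
from itertools import islice

def find_all_string(pattern, s, n):
    """Hypothetical helper imitating the n argument of Go's FindAllString."""
    matches = (m.group(0) for m in re.finditer(pattern, s))
    if n < 0:
        return list(matches)         # negative n: no limit
    return list(islice(matches, n))  # n >= 0: at most n matches

print(find_all_string("[0-9]+", "abc123def987asdf", 0))   # []
print(find_all_string("[0-9]+", "abc123def987asdf", 1))   # ['123']
print(find_all_string("[0-9]+", "abc123def987asdf", -1))  # ['123', '987']
```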
@icza's answer is perfect for fetching positive numbers, but if you have a string that also contains negative numbers, like the one below
"abc-123def987asdf"
and you are expecting output like this
[-123 987]
replace the regular expression with the following:
re := regexp.MustCompile(`[-]?\d[\d,]*[\.]?[\d{2}]*`)
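If only integers are needed, a shorter pattern may be enough. The sketch below uses Python's re module and the pattern -?\d+, which is my simplification and not the pattern from the answer. It assumes a leading - always belongs to the number, so "5-3" would be read as 5 and -3.

```python
import re

# Optional minus sign followed by one or more digits.
numbers = re.findall(r"-?\d+", "abc-123def987asdf")

print(numbers)  # ['-123', '987']
```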

How can I add the multiplication sign to an algebraic expression through regex?

I am writing a mathematical parser in which a user can enter answers to be evaluated. How can I convert something like 'xe^x + xyz' to 'x*e^x + x*y*z' through Regex?
Alternative methods would be welcome too. Thank you!
Look for each occurrence of:
(?<=[a-zA-Z0-9])(?=[a-zA-Z])
Replace by:
*
Edit:
As pointed out by @ChristopherCreutzig, this regex will also handle cases like 23xy in the most probable expected way. That is:
considering a sequence of digits as part of a single expression,
considering a digit followed by a letter as a multiplication,
considering a letter followed by a digit as part of a single expression.
For example, for this input:
2x1 + 3xy
The resulting output is:
2*x1 + 3*x*y
See it in action and try it out live here on regex101.
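The replacement can also be tried outside regex101. Here is a minimal sketch in Python, whose re module supports the same lookbehind and lookahead syntax; add_multiplication is just an illustrative name.

```python
import re

# Zero-width position: a letter or digit behind, a letter ahead.
IMPLICIT_MUL = re.compile(r"(?<=[a-zA-Z0-9])(?=[a-zA-Z])")

def add_multiplication(expr):
    # Insert '*' at every such position; no characters are consumed.
    return IMPLICIT_MUL.sub("*", expr)

print(add_multiplication("xe^x + xyz"))  # x*e^x + x*y*z
print(add_multiplication("2x1 + 3xy"))   # 2*x1 + 3*x*y
```

Keep in mind that multi-letter names are split as well, so an input such as sin(x) would become s*i*n(x) and needs separate handling.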
(?<=\w)(?=\w)
(looking one letter before and after)
replace by
*
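Since \w also matches digits (and the underscore), this shorter pattern separates the digits of a number too. A quick Python check, given only as an illustration, shows how it differs from a letters-only lookahead:

```python
import re

# \w on both sides: every pair of adjacent word characters gets a '*'.
loose = re.sub(r"(?<=\w)(?=\w)", "*", "23xy")

# Letter-only lookahead: digits stay together.
strict = re.sub(r"(?<=[a-zA-Z0-9])(?=[a-zA-Z])", "*", "23xy")

print(loose)   # 2*3*x*y
print(strict)  # 23*x*y
```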

can any one tell me regular expression for this? UPDATED

I tried many syntaxes in Visual Studio, and on this site, but nothing helped.
The expression would be _ct_(anyDigitHere)_
like
adkasjflasdjfsdf asdfkl sjfdsf _ct150_ asdfasd // so it would match this _ct150_
any thing here doens't matter Random stuff..afd a&r9qwr89 ((
_ct415487_ anything here doesn't matter // this will match _ct415487_
basically any _ctAndAnyNumberHere_ (underscore at start and end)
A couple I tried ^*(_ct)(:z)(+_)+*$, ^*(_ct[0-9]+_)+*$. But none helps!
EDIT
Thanks for the replies. That did work, but my problem now is replacing those matched elements with a hidden field value. Say:
if the value in hidden field is of 1 digit, (any value from 0-9), I have to take last digit from that matched expression and replace it with the value in hidden field.
if the value in hidden field is of 2 digit, (any value from 0-99), I have to take last two digits from that matched expression and replace it with the value in hidden field.
so, basically..
if the value in hidden field is of n digit, I have to take last n digits from that matched expression and replace it with the value in hidden field.
How do I do that?
I don't know which Visual Studio language you're using, but this should work:
_ct\d+_
or this:
_ct([0-9]+)_
EDIT:
// Capture the digits between "_ct" and the closing "_".
Regex rg = new Regex("_ct([0-9]+)_");
string text = "any thing here doens't matter Random stuff..afd a&r9qwr89 ((_ct415487_ anything here doesn't matter";
var match = rg.Match(text).Groups[1].Value;
int sizeHiddenField = HiddenField1.Value.Length;
// Keep the leading digits and overwrite the last sizeHiddenField digits with the hidden field value.
var newString = text.Replace(match, match.Substring(0, match.Length - sizeHiddenField) + HiddenField1.Value);
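The "overwrite the last n digits" rule can also be written as a single substitution with a callback. The following Python sketch is a different technique from the C# snippet above (it rewrites every _ct<digits>_ token rather than the first match), and replace_tail is a made-up name; the hidden argument stands in for the hidden field value.

```python
import re

def replace_tail(text, hidden):
    """Overwrite the last len(hidden) digits of each _ct<digits>_ token."""
    def swap(m):
        digits = m.group(1)
        # Number of leading digits to keep; never negative.
        keep = max(len(digits) - len(hidden), 0)
        return "_ct" + digits[:keep] + hidden + "_"
    return re.sub(r"_ct([0-9]+)_", swap, text)

print(replace_tail("stuff _ct415487_ more", "99"))  # stuff _ct415499_ more
print(replace_tail("_ct150_", "7"))                 # _ct157_
```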
/_ct\d*_/
This is the regular expression syntax for your given problem. Try this