Find numbers in string using Golang regexp - regex

I want to find all numbers in a string with the following code:
re:=regexp.MustCompile("[0-9]+")
fmt.Println(re.FindAllString("abc123def", 0))
I also tried adding delimiters to the regex, using a positive number as the second parameter for FindAllString, using a numbers-only string like "123" as the first parameter...
But the output is always []
I seem to be missing something about how regular expressions work in Go, but cannot wrap my head around it. Is [0-9]+ not a valid expression?

The problem is with your second integer argument. Quoting from the package doc of regexp:
These routines take an extra integer argument, n; if n >= 0, the function returns at most n matches/submatches.
You pass 0 so at most 0 matches will be returned; that is: none (not really useful).
Try passing -1 to indicate you want all.
Example:
re := regexp.MustCompile("[0-9]+")
fmt.Println(re.FindAllString("abc123def987asdf", -1))
Output:
[123 987]
Try it on the Go Playground.
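To make the effect of the second argument concrete, here is a minimal sketch that calls FindAllString with n = 0, n = 1 and n = -1 on the same input. The helper name findNumbers is only an illustration and is not part of the regexp package.

```go
package main

import (
	"fmt"
	"regexp"
)

// digits matches one or more consecutive ASCII digits.
var digits = regexp.MustCompile(`[0-9]+`)

// findNumbers is a hypothetical helper that forwards n to FindAllString.
func findNumbers(s string, n int) []string {
	return digits.FindAllString(s, n)
}

func main() {
	s := "abc123def987asdf"
	fmt.Println(findNumbers(s, 0))  // n == 0: at most zero matches, prints []
	fmt.Println(findNumbers(s, 1))  // n == 1: only the first match, prints [123]
	fmt.Println(findNumbers(s, -1)) // n < 0: every match, prints [123 987]
}
```

Passing 1 is a cheap way to get only the first number, while any negative value asks for all of them.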

@icza's answer is perfect for fetching positive numbers, but suppose your string also contains negative numbers, like this:
"abc-123def987asdf"
and you are expecting output like below
[-123 987]
then replace the regular expression with the following:
re := regexp.MustCompile(`-?\d[\d,]*\.?\d*`)
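If only plain integers are needed (no thousands separators or decimal parts), a shorter pattern may be enough. The following is a sketch under that assumption; findSignedInts is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"regexp"
)

// signedInt matches an optional leading minus sign followed by one or more digits.
var signedInt = regexp.MustCompile(`-?\d+`)

// findSignedInts is a hypothetical helper returning every signed integer in s.
func findSignedInts(s string) []string {
	return signedInt.FindAllString(s, -1)
}

func main() {
	fmt.Println(findSignedInts("abc-123def987asdf")) // prints [-123 987]
}
```

Note that any hyphen directly in front of a digit is treated as a sign, so an input such as "10-20" yields 10 and -20.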

Related

Regex expression not behaving as expected in matlab

In matlab I have a string like :
y = '[3-G]]3|25+3|[3-G]4|25+4|G5|25+5|F'
Then I have a variable named intHit, which I need to return a cell array, containing an int if it is followed by a sign. So if we define it as:
intHit = regexp(y,'(\d*)([+-])','Match');
It returns something like:
intHit =
1×5 cell array
'3-' '25+' '3-' '25+' '25+'
However, depending on the input y, my intHit call sometimes returns minus (-) signs without an integer in front of them. I think my regular expression is faulty. Can someone help me fix it so that it only returns an integer followed by a plus (+) or minus (-) sign, as in the example above? Thanks in advance.
Try this instead:
intHit = regexp(y,'(\d+)([+-])','Match');
(the change is \d+ in place of \d*)
I think you had the wrong quantifier. * will match 0 or more times consecutively (which means it could match lone pluses and minuses), while + will match 1 or more times consecutively.
Also, if you're not capturing tokens, you can simplify your match expression to just '\d+[+-]'.
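The difference between the two quantifiers can be reproduced outside MATLAB. Below is a small Go sketch, assuming an illustrative input in which one sign is not preceded by a digit; the names loose, strict and matches are made up for the example.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// loose allows zero digits before the sign, so a lone sign matches.
	loose = regexp.MustCompile(`\d*[+-]`)
	// strict requires at least one digit before the sign.
	strict = regexp.MustCompile(`\d+[+-]`)
)

// matches is a hypothetical helper returning all matches of re in s.
func matches(re *regexp.Regexp, s string) []string {
	return re.FindAllString(s, -1)
}

func main() {
	s := "[A-G]3|25+3" // illustrative input: the first '-' has no digit before it
	fmt.Println(matches(loose, s))  // prints [- 25+]
	fmt.Println(matches(strict, s)) // prints [25+]
}
```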

regex interval with possible characters before and after number VBA

I'm trying to produce a regular expression that can identify a number within an interval in a string in VBA. Sometimes this number has characters around it, other times not (non-consistent notation from a supplier). The expression should identify that 1413 in the three examples below are within the number range 500-2000 (or alternatively that it's not in the number range 0-50 or 51-499).
Example:
Test 12/2014. Tot.flow:1413 m3 or
Test 12/2014. Tot.flow:1413m3 or
Test 12/2014. Tot.flow: 1413
These strings have some identifiers:
there will always be a colon before the number
there may be a white space between the colon and the number
there may be a white space between the number and the m3
m3 is not necessarily always present, and if not, the number is at the end of the string
So far, my attempt at a regex that finds the number range is ([5-9][0-9][0-9]|[1]\d{3}|2000), but this matches all three-digit numbers as well (2001 gives a match on 200). However, I understand that I'm missing a couple of concepts needed to achieve the ultimate goal here. I guess my problems are as follows:
How to start the interval at something not being zero (found lots of questions on intervals starting on zero)
How to take into account the variations in notation both for flow: and m3?
I'm only interested in checking that the number lies within the number range. This is driving me bonkers, all help is highly appreciated!
You can just extract the number with regExp.Replace() using the following regex:
^.*:\s*(\d+).*$
The replacement part is $1.
Then, use an ordinary numeric comparison to check whether the value is in the expected range (e.g. If CLng(result) > 499 And CLng(result) < 2001 Then ...).
Test macro:
Dim re As RegExp, tgt As String, src As String
Set re = New RegExp
With re
.pattern = "^.*:\s*(\d+).*$"
.Global = False
End With
src = "Test 12/2014. Tot.flow: 1413"
tgt = re.Replace(src, "$1")
MsgBox (CLng(tgt) > 499 And CLng(tgt) < 2001)
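The same extract-then-compare idea can be sketched in Go, using a capture group instead of a replacement. This is only an illustration of the approach, not a translation of the VBA API; flowRe and flowInRange are hypothetical names, and the 500 to 2000 bounds come from the question.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// flowRe captures the digits that follow the last colon, allowing optional whitespace.
var flowRe = regexp.MustCompile(`^.*:\s*(\d+).*$`)

// flowInRange is a hypothetical helper: it extracts the number after the colon
// and reports whether it lies in the inclusive range [lo, hi].
func flowInRange(s string, lo, hi int) (bool, error) {
	m := flowRe.FindStringSubmatch(s)
	if m == nil {
		return false, fmt.Errorf("no number after a colon in %q", s)
	}
	n, err := strconv.Atoi(m[1])
	if err != nil {
		return false, err
	}
	return n >= lo && n <= hi, nil
}

func main() {
	for _, s := range []string{
		"Test 12/2014. Tot.flow:1413 m3",
		"Test 12/2014. Tot.flow:1413m3",
		"Test 12/2014. Tot.flow: 1413",
	} {
		ok, err := flowInRange(s, 500, 2000)
		fmt.Println(ok, err) // prints "true <nil>" for each of the three samples
	}
}
```

Keeping the range check in ordinary arithmetic means the bounds can change without touching the pattern.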
You can try with:
:\s?([5-9]\d\d|1\d{3}|2000)\s?(m3|\n)
Also, your regex ([5-9][0-9][0-9]|[1]\d{3}|2000) is fine in my opinion; it should not match numbers below 500 or above 2000.

Simplest way to find out if at least one cell in a cell array matches a regular expression

I need to search a cell array and return a single boolean value indicating whether any cell matches a regular expression.
For example, suppose I want to find out if the cell array strs contains foo or -foo (case-insensitive). The regular expression I need to pass to regexpi is ^-?foo$.
Sample inputs:
strs={'a','b'} % result is 0
strs={'a','foo'} % result is 1
strs={'a','-FOO'} % result is 1
strs={'a','food'} % result is 0
I came up with the following solution based on How can I implement wildcard at ismember function of matlab? and Searching cell array with regex, but it seems like I should be able to simplify it:
~isempty(find(~cellfun('isempty', regexpi(strs, '^-?foo$'))))
The problem I have is that it looks rather cryptic for such a simple operation. Is there a simpler, more human-readable expression I can use to achieve the same result?
NOTE: The answer refers to the original regexp in the question: '-?foo'
You can avoid the find:
any(~cellfun('isempty', regexpi(strs, '-?foo')))
Another possibility: concatenate first all cells into a single string:
~isempty(regexpi([strs{:}], '-?foo'))
Note that you can remove the "-" sign in any of the above:
any(~cellfun('isempty', regexpi(strs, 'foo')))
~isempty(regexpi([strs{:}], 'foo'))
And that allows using strfind (with lower) instead of regexpi:
~isempty(strfind(lower([strs{:}]),'foo'))
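For comparison, the "does any element match" check with the anchored pattern from the question can be written as a short loop. This is a Go sketch rather than MATLAB; anyMatch is a hypothetical helper, and (?i) is Go's inline flag for case-insensitive matching.

```go
package main

import (
	"fmt"
	"regexp"
)

// fooRe matches exactly "foo" or "-foo", ignoring case.
var fooRe = regexp.MustCompile(`(?i)^-?foo$`)

// anyMatch is a hypothetical helper that reports whether at least one
// element of strs matches re. It stops at the first hit.
func anyMatch(re *regexp.Regexp, strs []string) bool {
	for _, s := range strs {
		if re.MatchString(s) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(anyMatch(fooRe, []string{"a", "b"}))    // false
	fmt.Println(anyMatch(fooRe, []string{"a", "foo"}))  // true
	fmt.Println(anyMatch(fooRe, []string{"a", "-FOO"})) // true
	fmt.Println(anyMatch(fooRe, []string{"a", "food"})) // false
}
```

Testing each element separately keeps the ^ and $ anchors meaningful; once the elements are concatenated into one string, the element boundaries are lost.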

Part of a string from a string using regular expressions

I have a string of 5 characters out of which the first two characters should be in some list and next three should be in some other list.
How could I validate them with regular expressions?
Example:
List for first two characters {VBNET, CSNET, HTML}
List for next three characters {BEGINNER, EXPERT, MEDIUM}
My Strings are going to be: VBBEG, CSBEG, etc.
My regular expression should find that the input string first two characters could be either VB, CS, HT and the rest should also be like that.
Would the following expression work for you in a more general case (so that you don't have hardcoded values): (^..)(.*$)
- returns the first two letters in the first group, and the remaining letters in the second group.
something like this:
^(VB|CS|HT)(BEG|EXP|MED)$
This recipe works for me:
^(VB|CS|HT)(BEG|EXP|MED)$
I guess (VB|CS|HT)(BEG|EXP|MED) should do it.
If your strings are as well-defined as this, you don't even need regex - simple string slicing would work.
For example, in Python we might say:
mystring = "HTEXP"
prefix = mystring[0:2]
suffix = mystring[2:5]
if prefix in ['HT', 'CS', 'VB'] and suffix in ['BEG', 'MED', 'EXP']:
    pass  # valid!
else:
    pass  # not valid. :(
Don't use regex where elementary string operations will do.
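Both approaches from the answers, the anchored alternation and plain slicing, can be put side by side. The following Go sketch assumes the abbreviations VB, CS, HT and BEG, EXP, MED used in the answers; the function and variable names are illustrative.

```go
package main

import (
	"fmt"
	"regexp"
)

// codeRe accepts a two-letter prefix followed by a three-letter suffix, nothing else.
var codeRe = regexp.MustCompile(`^(VB|CS|HT)(BEG|EXP|MED)$`)

// Allowed values for the slicing approach.
var (
	prefixes = map[string]bool{"VB": true, "CS": true, "HT": true}
	suffixes = map[string]bool{"BEG": true, "EXP": true, "MED": true}
)

// validByRegexp checks the whole string against the anchored pattern.
func validByRegexp(s string) bool {
	return codeRe.MatchString(s)
}

// validBySlicing checks the length, then looks up each part in its set.
func validBySlicing(s string) bool {
	if len(s) != 5 {
		return false
	}
	return prefixes[s[:2]] && suffixes[s[2:]]
}

func main() {
	for _, s := range []string{"VBBEG", "HTEXP", "XXBEG", "VBBEGX"} {
		// The first two inputs print true true, the last two print false false.
		fmt.Println(s, validByRegexp(s), validBySlicing(s))
	}
}
```

Without the ^ and $ anchors, the regular expression would also accept longer strings that merely contain a valid code, such as VBBEGX.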

Delphi TRegEx bug?

I try to validate the input '3a' for regex '[_a-zA-Z][_a-zA-Z0-9]*' with that source:
len := TRegEx.Create([_a-zA-Z][_a-zA-Z0-9]*).Match('3a').Length;
I expected 0 for len variable, but it was 2. Is that correct?
This is not your real code. For a start it does not compile. You have omitted the quote marks. If we fix that then we have:
len := TRegEx.Create('[_a-zA-Z][_a-zA-Z0-9]*').Match('3a').Length;
But that returns a value of 1 and not 2 as you stated. This return value is correct because the a matches [_a-zA-Z] and then the input string ends.
I expect that you have the wrong regex. Perhaps you should be using
^[_a-zA-Z][_a-zA-Z0-9]*$
The ^ matches the beginning of the input string, the $ matches the end. Presumably the input is taken from a source code tokenizer.
So the conclusion is that there is no bug evident in the Delphi regex code from this pattern and input.
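The effect of the anchors is not specific to Delphi. As an illustration, here is the same pair of patterns in Go, where the unanchored one finds the a inside 3a and the anchored one rejects the input; matchLen is a hypothetical helper that mirrors reading the Length of the first match.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// unanchored may match an identifier anywhere inside the input.
	unanchored = regexp.MustCompile(`[_a-zA-Z][_a-zA-Z0-9]*`)
	// anchored requires the whole input to be an identifier.
	anchored = regexp.MustCompile(`^[_a-zA-Z][_a-zA-Z0-9]*$`)
)

// matchLen is a hypothetical helper returning the length of the first
// match of re in s, or 0 when there is no match.
func matchLen(re *regexp.Regexp, s string) int {
	return len(re.FindString(s))
}

func main() {
	fmt.Println(matchLen(unanchored, "3a")) // 1: the match is "a"
	fmt.Println(matchLen(anchored, "3a"))   // 0: no match
	fmt.Println(matchLen(anchored, "a3"))   // 2: the whole input matches
}
```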