Regex: This range OR that range - regex

So I am trying to match a certain postcode range:
CB1 *, CB2 *, CB3 *, CB4 *, CB5 *, CB21 *, CB22 *, CB23 *, CB24 *, CB25 *
So I am trying to use range 1-5 OR 21-25.
This is my current regex:
^[CBcb].([1-5]|[21-25]).+$
I want to make sure the post code parts contains the following
[CB OR cb],[1-5 OR 21-25] and [Any combination]
Have a tinker: https://regex101.com/r/aP9uG3/2
How do you specify two ranges?

Since the patterns are the same and it is just the 2 that may or may not occur, you can say something like:
CB2?[1-5] # add ^ and $ if required
If you want to specify two ranges, you can always group them with parentheses common_pattern(pattern1|pattern2).

Your Regex pattern:
^[CBcb].([1-5]|[21-25]).+$
is being interpreted as:
^[CBcb].([12345]|[2125]).+$
You need:
^CB2?[1-5].+$
here ? means zero or one match of the preceding token, 2 in this case.
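A quick way to check this is to run the pattern against a few sample postcodes. The sketch below uses Python's re module; the sample postcodes and the stricter variant (which assumes the district digits are followed by whitespace) are illustrative assumptions, not part of the original answer.

```python
import re

# The pattern from the answer, made case-insensitive.
loose = re.compile(r"^CB2?[1-5].+$", re.IGNORECASE)

# Hypothetical stricter variant: assumes whitespace follows the district digits,
# so the trailing ".+" cannot absorb an extra digit such as the 6 in "CB26".
strict = re.compile(r"^CB2?[1-5]\s.+$", re.IGNORECASE)

samples = ["CB1 1AA", "cb24 9ZZ", "CB6 1AA", "CB26 1AA"]
for s in samples:
    print(s, bool(loose.match(s)), bool(strict.match(s)))
```

With the loose pattern, CB26 1AA still matches, because [1-5] can match the 2 and .+ then consumes the 6. Whether that matters depends on how the input is formatted.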

^cb2?[1-5].+$ and use the i flag as well.
The first error was that you were only matching one character from the list [cbCB]. The second is that there's a stray . in the middle. And the third is that you did not specify a range of numbers, but a range of characters. 21 is not a character, it is a sequence of characters. A pattern that matches any non-negative integer would be [0-9]+. What you want is an optional 2 followed by a character from the range [1-5].
You should read up on what lists and ranges mean in regular expressions, because you misused both of them. Everyone makes mistakes, obviously, but this is one of the basics you should get the hang of.

Having characters inside [] makes it a character class. This means that it matches any single character inside the brackets (unless it's negated). It doesn't understand numbers, only characters.
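To see what a class like [21-25] really contains, one can test it against each digit in turn. This is a small Python sketch for illustration; the variable names are made up here.

```python
import re

digits = "0123456789"

# [21-25] is read as: the character 2, the range 1-2, and the character 5,
# which is just the set of characters 1, 2 and 5.
matched = [d for d in digits if re.fullmatch(r"[21-25]", d)]
print(matched)

# A character class matches exactly one character, so "21" can never match it.
print(re.fullmatch(r"[21-25]", "21"))
```

The first print shows only 1, 2 and 5, and the second shows None.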
If you want to match CB or cb, you separate them by | like CB|cb. Or even better - make your regex case independent. This is done in different ways in different regex flavors. In javascript for example, attach the character i to the regex: /cb/i.
As for the rest of the pattern, if 1-5 and 21-25 is literally what you want, matching 1-5 is done with a character class (which you now are familiar with ;) like [1-5], meaning match any character in the ASCII range between the characters 1 and 5 inclusive.
Make the preceding 2 optional, and your regex looks like this
CB2?[1-5]
It matches your postcode and without a terminating $, it allows for your [Any combination].
Hope this helps.
Regards

Related

RegEx for finding strings with chars and numbers

I am trying to match strings that are part numbers mixed with normal text.
Here are a few examples.
Towing Cntrl Ecu,Gl3t-19H378-Ac
Assy,Pwr,Tested Gd,Priv-M50t3
Left,Rear,Brn-Tan,Pwr,4DR,Mju1
T-Case Ecu,56029590AE
Right,Blind Spot Module,284K0 9HS0F
In these examples I am trying to match.
Gl3t-19H378-Ac
Priv-M50t3
Mju1
56029590AE
284K0 and 9HS0F
I am in .Net and this is the Regex I have been using.
(\b[a-zA-Z0-9][a-zA-Z0-9\-]{1,32}(\b|$)(?<=[0-9]))
It works for what I need if the match ends in a number. The rule I want is to match any string between word boundaries that is either all numbers or numbers and chars mixed, but never just chars.
This should do it:
\b[a-zA-Z0-9-]*\d[a-zA-Z0-9-]*\b
If you need to restrict the length to a maximum of 32, add a look ahead:
\b(?=[a-zA-Z0-9-]{1,32}\b)[a-zA-Z0-9-]*\d[a-zA-Z0-9-]*\b
If the underscore character is OK too, you can use [\w-] instead of [a-zA-Z0-9-].
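As a rough check, the first pattern can be run over the sample lines from the question. This is a Python sketch rather than .NET (the pattern uses nothing that differs between the two engines here); the variable names are illustrative.

```python
import re

# At least one digit, surrounded by any mix of letters, digits and hyphens.
part_re = re.compile(r"\b[a-zA-Z0-9-]*\d[a-zA-Z0-9-]*\b")

lines = [
    "Towing Cntrl Ecu,Gl3t-19H378-Ac",
    "Assy,Pwr,Tested Gd,Priv-M50t3",
    "Left,Rear,Brn-Tan,Pwr,4DR,Mju1",
    "T-Case Ecu,56029590AE",
    "Right,Blind Spot Module,284K0 9HS0F",
]

for line in lines:
    print(part_re.findall(line))
```

Note that 4DR in the third line is also returned, since it satisfies the stated rule of mixing digits and letters. If such tokens should be excluded, the rule itself would need to be tightened.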

Regex for string representation of a method call

I have a string that follows a specific pattern like so
operator(field,value)
and I'd like to use regex to extract out all three of operator, field and value. I'm struggling to come up with the syntax for how to capture these. In this case value can be alphanumeric as well, for example
"contains(name, Joe)"
or "lt(quantity, 2.5)"
Use something like this to capture groups, you may want to limit the characters accepted with [], note the use of ` and the use of \ escaping for () within the regexp:
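package main
import (
	"fmt"
	"regexp"
)
var tests = []string{"contains(field, value)", "contains(name, Joe)", "lt(quantity, 2.5)", "plus(no,44)"}
func main() {
	re := regexp.MustCompile(`(.+)\((.+),\s?(.+)\)`) // raw string: \( and \) match literal parentheses
	for _, t := range tests {
		fmt.Println("result", re.FindStringSubmatch(t))
	}
}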
https://play.golang.org/p/43YLTafgQt
output:
result [contains(field, value) contains field value]
result [contains(name, Joe) contains name Joe]
result [lt(quantity, 2.5) lt quantity 2.5]
result [plus(no,44) plus no 44]
Depending on how strict you want to be you could use [a-z]+ or similar instead of .+ to match only certain characters but if you are not worried about bogus values this would probably be fine.
I don't know golang, but I do know regex's, so I'll do what I can here.
You probably want a group each for the "operator", "field", and "value". I'm going to assume for now that each of these can be represented as any combination of alphabetic, numeric, or underscore characters, with length of at least one character. In regex, we have a shortcut for that: \w represents a single alpha-numeric or underscore character, and the + modifier means "one or more". So \w+ means one or more such character in a row. If you want a more complex definition of what these fields can be named, I'll let you specify that in your question.
You say that you want to support "operator(field,value)". I'll start without whitespace anywhere, because it's simpler and you can easily remove all whitespace yourself before running the regex. We'll later add some whitespace support to the regex if you want it, but it'll make life difficult.
To do this, we want three groups, "1(2,3)" where 1 is the operator name, 2 is the field name, and 3 is the value name. Each of these, as given above, will be \w+ in our regex. We'll want to match the open and close parentheses as well as the comma, but we'll throw them away because they're really just delimiters. The parentheses will need to be escaped in the regex, since regex's have a special meaning for parentheses. The result looks like:
(\w+)\((\w+),(\w+)\)
\ 1 /  \ 2 / \ 3 /
Where the second line shows you where the groups are each defined.
If you want to support some whitespace, you'll need to add \s* in all such locations. This gets hairy, but you can do it as such:
(\w+)\s*\(\s*(\w+)\s*,\s*(\w+)\s*\)
\ 1 /        \ 2 /       \ 3 /
You give an example of wanting to support floating point values, and I presume other kinds of values too. You can accomplish this using the "or" pipe, |. For example, group 3, instead of just being \w+, could be defined as
[a-zA-Z_]\w*|\d+\.?|\d*\.\d+
This string will support alphanumeric+underscore strings where the first character must be alphabetic or underscore, OR integers, OR floating point (defined as an integer string with a period at the beginning, middle, or end). Clearly, this can go on and on to support more complex string values, but you get the idea.
So the final regex might look like:
(\w+)\s*\(\s*(\w+)\s*,\s*([a-zA-Z_]\w*|\d+\.?|\d*\.\d+)\s*\)
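For illustration, here is how the three groups could be pulled out with this kind of pattern. The sketch is in Python rather than Go, and parse_call is a hypothetical helper name; the value group uses [a-zA-Z_]\w* so that single-letter values are accepted as well.

```python
import re

call_re = re.compile(
    r"(\w+)\s*\(\s*(\w+)\s*,\s*([a-zA-Z_]\w*|\d+\.?|\d*\.\d+)\s*\)"
)

def parse_call(text):
    """Return (operator, field, value), or None if the text does not match."""
    m = call_re.fullmatch(text)
    return m.groups() if m else None

for sample in ["contains(name, Joe)", "lt(quantity, 2.5)", "plus(no,44)"]:
    print(parse_call(sample))
```

Using fullmatch means the whole string has to fit the pattern, so trailing junk after the closing parenthesis is rejected instead of silently ignored.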
Sorry for not giving any golang help, I hope someone else can edit my answer and fill in that major gap.

Need regex expression with multiple conditions

I need a regex with the following conditions:
It should accept maximum of 5 digits then up to 3 decimal places
it can be negative
it can be zero
it can be only numbers (max. up to 5 digit place)
it can be null
I have tried the following, but it does not fulfil all the conditions:
@"^([\-\+]?)\d{0,5}(.[0-9]{1,3})?)$"
E.g. maximum value can hold is from -99999.999 to 99999.999
Use this regex:
^[-+]?\d{0,5}(\.[0-9]{1,3})?$
I only made two changes here. First, most characters do not need escaping inside a character class; the exceptions are the closing bracket, the backslash, a caret at the start, and a hyphen between two characters. A hyphen placed first or last is literal, so we can use [-+] to match an initial plus or minus. Second, you need to escape the dot in your regex, to tell the engine that you want to match a literal dot.
However, I would probably phrase this regex as follows:
^[-+]?\d{1,5}(\.[0-9]{1,3})?$
This will match one to five digits, optionally followed by a decimal point and one to three digits.
Note that we want to capture things like:
0.123
But not
.123
i.e. we don't want to capture a leading decimal point should it not be prefixed by at least one number.
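The behaviour of this second pattern can be checked with a few inputs. The sketch below is in Python instead of C# purely for brevity, and is_valid is a hypothetical helper name.

```python
import re

number_re = re.compile(r"^[-+]?\d{1,5}(\.[0-9]{1,3})?$")

def is_valid(text):
    """True if text has 1-5 integer digits and at most 3 decimal places."""
    return number_re.match(text) is not None

for sample in ["0.123", ".123", "-99999.999", "123456", "1.1234", ""]:
    print(repr(sample), is_valid(sample))
```

Because at least one leading digit is now required, both .123 and the empty string are rejected.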
Demo here:
Regex101
I assume you're doing this in C# given the notation. Here's a little code you can use to test your expression, with two corrections:
You have to escape the dot, otherwise it means "any character". So, \. instead of .
There was an extraneous close parenthesis that prevented the expression from compiling
C#:
using System;
using System.Text.RegularExpressions;

var expr = @"^([\-\+]?)\d{0,5}(\.[0-9]{1,3})?$";
var re = new Regex(expr);
string[] samples = {
"",
"0",
"1.1",
"1.12",
"1.123",
"12.3",
"12.34",
"12.345",
"123.4",
"12345.123",
".1",
".1234"
};
foreach(var s in samples) {
Console.WriteLine("Testing [{0}]: {1}", s, re.IsMatch(s) ? "PASS" : "FAIL");
}
Results:
Testing []: PASS
Testing [0]: PASS
Testing [1.1]: PASS
Testing [1.12]: PASS
Testing [1.123]: PASS
Testing [12.3]: PASS
Testing [12.34]: PASS
Testing [12.345]: PASS
Testing [123.4]: PASS
Testing [12345.123]: PASS
Testing [.1]: PASS
Testing [.1234]: FAIL
It should accept maximum of 5 digits
[0-9]{1,5}
then up to 3 decimal places
[0-9]{1,5}(\.[0-9]{1,3})?
it can be negative
[-]?[0-9]{1,5}(\.[0-9]{1,3})?
it can be zero
Already covered.
it can be only numbers (max. up to 5 digit place)
Already covered. 'Up to 5 digit place' contradicts your first rule, which allows 5.3.
it can be null
Not covered. I strongly suggest you remove this requirement. Even if you mean 'empty', as I sincerely hope you do, you should detect that case separately and beforehand, as you will certainly have to handle it differently.
Your regular expression contains ^ and $. I don't know why. There is nothing about start of line or end of line in the rules you specified. It also allows a leading +, which again isn't specified in your rules.
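One way to follow the advice about treating the empty case separately is to check for it before the regex is applied. This is only a sketch in Python; parse_amount and its return conventions are assumptions made for the example, not something from the question.

```python
import re
from typing import Optional

# The pattern built up step by step above, applied to the whole string.
number_re = re.compile(r"-?[0-9]{1,5}(\.[0-9]{1,3})?")

def parse_amount(text: Optional[str]) -> Optional[float]:
    """Return None for missing or empty input, a float for valid input.

    Raises ValueError for anything else.
    """
    if text is None or text == "":
        return None  # the "null"/empty case is handled before the regex runs
    if number_re.fullmatch(text) is None:
        raise ValueError(f"not a valid number: {text!r}")
    return float(text)
```

Using fullmatch makes the whole-string requirement explicit without putting ^ and $ in the pattern itself.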

Comma Separated Numbers Regex

I am trying to validate a comma separated list for numbers 1-8.
i.e. 2,4,6,8,1 is valid input.
I tried [0-8,]* but it seems to accept 1234 as valid. It is not requiring a comma and it is letting me type in a number larger than 8. I am not sure why.
[0-8,]* will match zero or more consecutive instances of 0 through 8 or ,, anywhere in your string. You want something more like this:
^[1-8](,[1-8])*$
^ matches the start of the string, and $ matches the end, ensuring that you're examining the entire string. It will match a single digit, plus zero or more instances of a comma followed by a digit after it.
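A short Python sketch to confirm the behaviour; is_valid_list is a hypothetical helper name and the inputs are made up for illustration.

```python
import re

list_re = re.compile(r"^[1-8](,[1-8])*$")

def is_valid_list(text):
    """True if text is one or more digits 1-8 separated by single commas."""
    return list_re.match(text) is not None

for sample in ["2,4,6,8,1", "5", "1234", "9", "1,,4", "0,1", ""]:
    print(repr(sample), is_valid_list(sample))
```

Only the first two samples are accepted; the rest fail because of a missing comma, an out-of-range digit, a doubled comma, or empty input.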
/^\d+(,\d+)*$/
for at least one digit, otherwise you will accept 1,,,,,4
[0-9]+(,[0-9]+)+
This works better for me for comma separated numbers in general, like: 1,234,933
You can try with this Regex:
^[1-8](,[1-8])+$
If you are using python and looking to find out all possible matching strings like
XX,XX,XXX or X,XX,XXX
or 12,000, 1,20,000 using regex
import re
string = "I spent 1,20,000 on new project "
re.findall(r'(\b[1-8]*(,[0-9]*[0-9])+\b)', string, re.IGNORECASE)
Result will be ---> [('1,20,000', ',000')]
You need a number + comma combination that can repeat:
^[1-8](,[1-8])*$
If you don't want the parentheses to capture (remember) what they match, add ?: after the opening parenthesis, like so:
^[1-8](?:,[1-8])*$
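The difference between the two forms shows up in what the match object records. A small Python sketch, with made-up variable names:

```python
import re

capturing = re.fullmatch(r"[1-8](,[1-8])*", "2,4,6")
non_capturing = re.fullmatch(r"[1-8](?:,[1-8])*", "2,4,6")

# A repeated capturing group keeps only its last repetition.
print(capturing.groups())

# With (?:...) nothing is recorded, so the tuple of groups is empty.
print(non_capturing.groups())
```

Both patterns accept exactly the same strings; the non-capturing form just avoids storing a group you do not need.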

Python: RE only captures first and last match

I'm trying to make a Regular Expression that captures the following:
- XX or XX:XX, up to 6 repetitions (XX:XX:XX:XX:XX:XX), where X is a hexadecimal number.
In other words, I'm trying to capture MAC addresses that can range from 1 to 6 bytes.
regex = re.compile("^([0-9a-fA-F]{2})(?:(?:\:([0-9a-fA-F]{2})){0,5})$")
The problem is that if I enter for example "11:22:33", it only captures the first match and the last, which results in ["11", "33"].
The question: is there any way to make the {0,5} repetition capture every repetition, and not just the last one?
Thanks!
Not in Python, no. But you can first check the correct format with your regex, and then simply split the string at the colons:
result = s.split(':')
Also note that you should always write regular expressions as raw strings (otherwise you get problems with escaping). And your outer non-capturing group does nothing.
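Put together, the validate-then-split approach could look like the sketch below. The function name split_bytes is an assumption for the example; the pattern is the one from the question written as a raw string and without the capturing groups, since the split does the extraction.

```python
import re

# One hex byte, followed by up to five more bytes, each preceded by a colon.
mac_re = re.compile(r"^[0-9a-fA-F]{2}(?::[0-9a-fA-F]{2}){0,5}$")

def split_bytes(s):
    """Return the list of hex bytes, or None if s is not in the expected format."""
    if mac_re.match(s) is None:
        return None
    return s.split(":")

print(split_bytes("11:22:33"))
print(split_bytes("11:2"))
```

The regex only validates here, so there is no need to worry about which repetition a group remembers.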
Technically there is a way to do it with regex only, but the regex is quite horrible:
r"^([0-9a-fA-F]{2})(?::([0-9a-fA-F]{2}))?(?::([0-9a-fA-F]{2}))?(?::([0-9a-fA-F]{2}))?(?::([0-9a-fA-F]{2}))?(?::([0-9a-fA-F]{2}))?$"
But here you would always get six captures, just that some might be empty.
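To illustrate the six-capture behaviour, the sketch below builds a pattern of this shape by repeating the optional part five times, assuming colon-separated input as in the question. The variable names are made up for the example.

```python
import re

byte = r"([0-9a-fA-F]{2})"

# First byte is mandatory; each of the next five is an optional ":" + byte.
six_groups_re = re.compile("^" + byte + (r"(?::" + byte + r")?") * 5 + "$")

m = six_groups_re.match("11:22:33")
print(m.groups())
```

In Python the groups that did not take part in the match come back as None, so filtering them out gives the actual bytes.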