Why doesn't this regex pattern work?

I'm trying to select commas that are not preceded by a number of 4 digits or by the word "id". I tried this:
( ? < ! [ \ d { 5 } | id ] ) ,
The problem: if the input string is, for example, "1999,", that comma is not selected, and I don't understand why.

Try this pattern:
(?<!\d{5}|id),
Your pattern, (?<![\d{5}|id]), looks for a comma that is not preceded by a digit, {, }, |, i, or d. Those alternatives should not be in a character class ([]). If anything, (?<![\d]{5}|id), will also work, but the brackets are redundant.

First of all, unless you're using the /x flag, each space will attempt to match a space. So take those out.
Second, you're presumably using [...] to group an alternation (|), but square brackets actually denote a character class: [\d{5}|id] is equivalent to [\d{}|id] and matches any one of those characters (a digit, {, }, |, i, or d), but never more than one. What you mean is this:
(?<!\d{5}|id),
The final problem might be that many regex implementations (you haven't specified which one you're using) don't support variable-width lookbehind assertions. In that case you may need to pad the shorter alternative so that both have the same width, for example:
(?<!\d{5}|...id),
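A quick way to see these points in practice is Python's built-in re module, which only accepts fixed-width lookbehind. The sketch below is an illustration, not part of either answer; comma_positions and the sample strings are made up for the demo. It shows why the bracketed pattern skips the comma in "1999,", that re rejects the variable-width alternation outright, and two fixed-width workarounds: splitting the alternation into two lookbehinds, and the padded ...id form.

```python
import re

def comma_positions(pattern, text):
    """Return the index of every comma that the compiled pattern selects."""
    return [m.start() for m in pattern.finditer(text)]

# The question's pattern with the spaces removed: [...] is a character class,
# so any comma preceded by a digit, {, }, |, i or d is rejected.
bracketed = re.compile(r"(?<![\d{5}|id]),")

# re only accepts fixed-width lookbehind; \d{5}|id has widths 5 and 2.
try:
    re.compile(r"(?<!\d{5}|id),")
    variable_width_supported = True
except re.error:
    variable_width_supported = False

# Workaround 1: two separate fixed-width negative lookbehinds.
split_lookbehind = re.compile(r"(?<!\d{5})(?<!id),")
# Workaround 2: pad "id" to five characters so both branches have width 5.
padded_lookbehind = re.compile(r"(?<!\d{5}|...id),")

sample = "1999,12345,abcid,x,"
print(comma_positions(bracketed, "1999,"))          # []
print(variable_width_supported)                      # False
print(comma_positions(split_lookbehind, sample))     # [4, 18]
print(comma_positions(padded_lookbehind, sample))    # [4, 18]
# The padded form needs five characters before the comma, so a short "id,"
# slips through where the split form still rejects it.
print(comma_positions(split_lookbehind, "id,"))      # []
print(comma_positions(padded_lookbehind, "id,"))     # [2]
```

The last two lines show the cost of padding: "...id" cannot match when fewer than three characters precede "id", so splitting into two lookbehinds is the safer choice where the engine allows it.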

Related

Regex for string representation of a method call

I have a string that follows a specific pattern like so
operator(field,value)
and I'd like to use regex to extract out all three of operator, field and value. I'm struggling to come up with the syntax for how to capture these. In this case value can be alphanumeric as well, for example
"contains(name, Joe)"
or "lt(quantity, 2.5)"
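Use something like this to capture the groups. You may want to restrict the accepted characters with [...]. Note the backquotes around the pattern (a raw string literal, so backslashes are not treated as escape sequences) and the \ escaping of the literal parentheses within the regexp: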
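package main

import "fmt"
import "regexp"

func main() {
	tests := []string{"contains(field, value)", "contains(name, Joe)", "lt(quantity, 2.5)", "plus(no,44)"}
	re := regexp.MustCompile(`(.+)\((.+),\s?(.+)\)`)
	for _, t := range tests {
		fmt.Println("result", re.FindStringSubmatch(t))
	}
}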
https://play.golang.org/p/43YLTafgQt
output:
result [contains(field, value) contains field value]
result [contains(name, Joe) contains name Joe]
result [lt(quantity, 2.5) lt quantity 2.5]
result [plus(no,44) plus no 44]
Depending on how strict you want to be you could use [a-z]+ or similar instead of .+ to match only certain characters but if you are not worried about bogus values this would probably be fine.
I don't know golang, but I do know regexes, so I'll do what I can here.
You probably want a group each for the "operator", "field", and "value". I'm going to assume for now that each of these can be represented as any combination of alphabetic, numeric, or underscore characters, with a length of at least one character. In regex, we have a shortcut for that: \w represents a single alphanumeric or underscore character, and the + modifier means "one or more". So \w+ means one or more such characters in a row. If you want a more complex definition of what these fields can be named, please specify that in your question.
You say that you want to support "operator(field,value)". I'll start without whitespace anywhere, because it's simpler and you can easily remove all whitespace yourself before running the regex. We'll later add some whitespace support to the regex if you want it, but it'll make life difficult.
To do this, we want three groups, "1(2,3)", where 1 is the operator name, 2 is the field name, and 3 is the value name. Each of these, as given above, will be \w+ in our regex. We'll want to match the open and close parentheses as well as the comma, but we'll throw them away because they're really just delimiters. The parentheses will need to be escaped in the regex, since regexes give parentheses a special meaning. The result looks like:
(\w+)\((\w+),(\w+)\)
\ 1 /  \ 2 / \ 3 /
Where the second line shows you where the groups are each defined.
If you want to support some whitespace, you'll need to add \s* in all such locations. This gets hairy, but you can do it as such:
(\w+)\s*\(\s*(\w+)\s*,\s*(\w+)\s*\)
\ 1 /        \ 2 /       \ 3 /
You give an example of wanting to support floating point values, and I presume other kinds of values too. You can accomplish this using the "or" pipe, |. For example, group 3, instead of just being \w+, could be defined as
[a-zA-Z_]\w*|\d+\.?|\d*\.\d+
This string will support alphanumeric+underscore strings where the first character must be alphabetic or underscore, OR integers, OR floating point (defined as an integer string with a period at the beginning, middle, or end). Clearly, this can go on and on to support more complex string values, but you get the idea.
So the final regex might look like:
(\w+)\s*\(\s*(\w+)\s*,\s*([a-zA-Z_]\w*|\d+\.?|\d*\.\d+)\s*\)
Sorry for not giving any golang help, I hope someone else can edit my answer and fill in that major gap.
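To partly fill the gap mentioned above, here is a minimal sketch in Python of applying that final regex; the pattern only uses \w, \s, \d, character classes, and alternation, which Go's regexp package also supports. parse_call is a hypothetical helper name, and fullmatch is used on the assumption that the whole string should be a single call.

```python
import re

# Final pattern from the answer above: operator, field, and a value that is
# an identifier, an integer (optionally with a trailing dot), or a decimal.
CALL_RE = re.compile(
    r"(\w+)\s*\(\s*(\w+)\s*,\s*([a-zA-Z_]\w*|\d+\.?|\d*\.\d+)\s*\)"
)

def parse_call(text):
    """Return (operator, field, value), or None if text is not one call."""
    m = CALL_RE.fullmatch(text)
    return m.groups() if m else None

for s in ["contains(name, Joe)", "lt(quantity, 2.5)", "plus(no,44)",
          "lt( quantity , .5 )", "bad(x, 1.2.3)"]:
    print(s, "->", parse_call(s))
```

Note that for "2.5" the engine first tries \d+\.? ("2."), fails on the closing parenthesis, and backtracks into the \d*\.\d+ alternative, so the order of the alternatives does not lose the decimal.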

Regex for comparing Strings with spaces

I'm trying to check whether a string is present in a list of strings using regex.
I tried using the following...
(?!MyDisk1$|MyDisk2$)
But this isn't working for scenarios like
(?!My disk1$|My Disk2$)
Can you suggest a better approach to deal with such situations?
I get the list of strings from an SQL query, so I am not sure where the spaces are. The strings vary, like My Disk1, MyDisk2, My_Disk3, ABCD123, XYZ_123, MNP 123, etc., or any other string made of [a-zA-Z0-9_ ].
You can make the spaces optional using a zero-or-one quantifier (?):
(?!My ?disk1$|My ?Disk2$)
This assertion will reject substrings like MyDisk2 or My Disk2. Or to handle potentially many spaces, use a zero-or-more quantifier (*):
(?!My *disk1$|My *Disk2$)
Note that if you're running this in an engine which ignores whitespace in the pattern you may need to use a character class, like this:
(?!My[ ]*disk1$|My[ ]*Disk2$)
Or to handle spaces or underscores:
(?!My[ _]*disk1$|My[ _]*Disk2$)
Unfortunately, if the spaces can be anywhere in the string (but you still care about matching the other letters in order), you'd have to do something like this:
(?! *M *y *d *i *s *k *1$| *M *y *D *i *s *k *2$)
Or to handle spaces or underscores:
(?![ _]*M[ _]*y[ _]*d[ _]*i[ _]*s[ _]*k[ _]*1$|[ _]*M[ _]*y[ _]*D[ _]*i[ _]*s[ _]*k[ _]*2$)
But to be honest, at that point, you may be better off preprocessing your data before you try to use your regex with it.
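The Python sketch below contrasts the lookahead with the preprocessing route suggested in the last paragraph. It rests on assumptions: the two excluded names stand in for whatever the SQL query returns, normalize() and is_excluded() are hypothetical helpers, and normalize() also lowercases, which assumes the comparison should be case-insensitive. The lookahead itself stays case-sensitive, exactly as written above.

```python
import re

# Lookahead from the answer: reject "My", then any run of spaces or
# underscores, then the disk name (case-sensitive, as written above).
NOT_MY_DISK = re.compile(r"(?!My[ _]*disk1$|My[ _]*Disk2$)")

def passes_lookahead(name):
    """True if the lookahead accepts name (match() anchors it at the start)."""
    return NOT_MY_DISK.match(name) is not None

def normalize(name):
    """Drop spaces and underscores and lowercase: 'My_Disk 1' -> 'mydisk1'."""
    return re.sub(r"[ _]+", "", name).lower()

# Hypothetical stand-in for the list of strings returned by the SQL query.
EXCLUDED = {normalize(n) for n in ["MyDisk1", "MyDisk2"]}

def is_excluded(name):
    """Set lookup after normalizing, so new names need no new pattern."""
    return normalize(name) in EXCLUDED

for name in ["My Disk2", "My__disk1", "My Disk3", "ABCD123", "my disk 2", "MNP 123"]:
    print(f"{name!r}: lookahead={passes_lookahead(name)}, excluded={is_excluded(name)}")
```

With preprocessing, the spaces can sit anywhere ("my disk 2" still counts as excluded) without the per-letter [ _]* pattern.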
Use this regex, appending i at the end, which makes the regex case-insensitive:
/my\s?disk[12]$/i
This will match MyDisk1, My Disk1, mydisk2, and similar variants with at most one whitespace character between "My" and "disk".
You can do this:
/(?:[^\s_-]+(\s|_|-)?[^\s_-]*?$)/i
The ? quantifier means 0 or 1 of the preceding pattern.
/i makes the match case-insensitive. The separator can be a space, an underscore, or a dash. I have replaced My and disk with strings of one or more characters that contain no space, underscore, or dash. Now it will match "Shikhar Subedi", "dprpradeep", or "MyDisk 54".
The + quantifier means 1 or more, ^ at the start of a character class means "not", and * means 0 or more, so the string after the separator is optional.

Regex for Regex validation decimal[19,3]

I want to validate a decimal number (decimal[19,3]). I used this
@"[\d]{1,16}|[\d]{1,16}[\.]\d{1,3}"
but it didn't work.
Below are valid values:
1234567890123456.123
1234567890123456.12
1234567890123456.1
1234567890123456
1234567
0.0
.1
Simplification:
The \d doesn't have to be in []. Use [] only when you want to check whether a character is one of multiple characters or character classes.
. doesn't need to be escaped inside []. In most regex flavors [\.] matches only a literal ., but in POSIX bracket expressions the backslash is itself literal, so [\.] would also accept a \ in place of the . (that is the language-dependent part). Either drop the brackets and keep the escape (\.), or keep the brackets and drop the escape ([.]).
So we get to:
\d{1,16}|\d{1,16}\.\d{1,3}
This can be shortened with the optional ("once or not at all") quantifier ?, giving:
\d{1,16}(\.\d{1,3})?
Corrections:
You probably want to make the second \d{1,16} optional, or equivalently simply make it \d{0,16}, so something like .1 is allowed:
\d{1,16}|\d{0,16}\.\d{1,3}
If something like 1. should also be allowed, you'll need to add an optional . to the first part:
\d{1,16}\.?|\d{0,16}\.\d{1,3}
Edit: I was under the impression [\d] matches \ or d, but it actually matches the character class \d (corrected above).
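To see what each step changes, the Python sketch below runs the three versions from this answer against a few made-up inputs. It uses re.fullmatch, assuming the goal is to validate the whole string; the last lines show why that matters, since an unanchored search stops at the first alternative that matches.

```python
import re

# The successive versions from the answer above.
STEPS = {
    "simplified": r"\d{1,16}|\d{1,16}\.\d{1,3}",
    "optional integer part": r"\d{1,16}|\d{0,16}\.\d{1,3}",
    "optional trailing dot": r"\d{1,16}\.?|\d{0,16}\.\d{1,3}",
}
INPUTS = ["123", "1.5", ".1", "1."]

# fullmatch requires the entire string to match one of the alternatives.
accepted = {name: [s for s in INPUTS if re.fullmatch(p, s)] for name, p in STEPS.items()}
for name, values in accepted.items():
    print(f"{name}: {values}")

# Unanchored search stops at the first alternative that matches: here "1".
partial = re.search(STEPS["optional integer part"], "1.5").group()
print(partial)
```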
This would match your three scenarios:
^(\d{1,16}|(\d{0,16}\.)?\d{1,3})$
first part: a 1 to 16 digit number
second: a 0 to 16 digit number with 1 to 3 decimals
third: nothing before a dot and then 1 to 3 decimals
The ^ and $ are anchors that match the start and end of the input (or of a line, in multiline mode), so if you need to search for numbers inside lines of text, you should remove them.
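Usage in C#
// Requires: using System.Text.RegularExpressions; (subjectString holds the input)
string resultString = null;
try {
    // Unanchored: for "1.25" this returns "1." because the first alternative wins;
    // wrap the pattern in ^(?:...)$ to validate the whole string instead.
    resultString = Regex.Match(subjectString, @"\d{1,16}\.?|\d{0,16}\.\d{1,3}").Value;
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}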
Slight optimization
A slightly more complicated but more precise version uses the ?: notation on the inner group to make it non-capturing, if you don't need its value:
^(\d{1,16}|(?:\d{0,16}\.)?\d{1,3})$
The following regex should help you out:
@"^(\d{1,16}(\.\d{1,3})?|\.\d{1,3})$"
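As a sanity check, the Python sketch below runs two of the anchored patterns from these answers (the ^(\d{1,16}|(\d{0,16}\.)?\d{1,3})$ version and the one just above) against the valid values listed in the question, plus a few invalid inputs chosen here for illustration. check() is a hypothetical helper.

```python
import re

# Two anchored candidates from the answers above.
PATTERNS = {
    "alternation": re.compile(r"^(\d{1,16}|(\d{0,16}\.)?\d{1,3})$"),
    "optional fraction": re.compile(r"^(\d{1,16}(\.\d{1,3})?|\.\d{1,3})$"),
}

# Valid values taken from the question.
VALID = ["1234567890123456.123", "1234567890123456.12", "1234567890123456.1",
         "1234567890123456", "1234567", "0.0", ".1"]
# Invalid inputs chosen for illustration: 17 integer digits, 4 decimals,
# a bare dot, an empty string, and two dots.
INVALID = ["12345678901234567", "1.2345", ".", "", "1.2.3"]

def check(pattern):
    """True if pattern accepts every VALID value and rejects every INVALID one."""
    return (all(pattern.match(v) for v in VALID)
            and not any(pattern.match(s) for s in INVALID))

for name, pattern in PATTERNS.items():
    print(f"{name}: {check(pattern)}")
```

Both also reject "1.", which the \d{1,16}\.? variant earlier in the thread deliberately accepts, so pick whichever matches your spec.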
Try something like this:
(\d{0,16}\.\d{0,3})|(\d{0,16})
It works with all your examples.
Edit: new version ;)
You can try:
^\d{0,16}(?:\.|$)(?:\d{0,3}|)$
match 0 to 16 digits
then match a dot or the end of the string
and then match 0 to 3 more digits before the end of the string

Perl regex | Match second from the right

I'm trying to parse an OID and extract the 18, but I'm unsure how to write the regex so that it counts from right to left, using a dot as the delimiter:
1.3.6.1.2.1.31.1.1.1.18.10035
This regex will grab the last value
my $ifindex = ($_=~ /^.*[.]([^.]*)$/);
I haven't found a way to tweak it to get the value I need yet.
How about:
use feature 'say';
my $str = "1.3.6.1.2.1.31.1.1.1.18.10035";
say ((split(/\./, $str))[-2]);
output:
18
If the format is always the same (i.e. the value is always second from the right), you can either use:
m/(\d+)\.\d+$/;
and the answer will end up in $1.
Or a different approach would be to split the string into an array on the dots and examine the penultimate value in the array.
What you need is simpler:
my $ifindex;
if (/(\d+)\.\d+$/)
{
$ifindex = $1;
}
A couple of comments:
You don't need to match the entire string, only the part you care about. Thus, no need to anchor to the beginning with ^ and use .*. Anchor to the end only.
[.] is a character class, intended for matching one character out of a set, e.g. [abc] will match either a, b, or c. It should be avoided when matching a single specific character; just match that character instead. In this case you do need to escape it, since . is a special character: \.
I have assumed based on your example that all of the terms have to be numbers. Hence, I used \d+ for the terms.
my ($ifindex) = ($_ =~ /^.*[.]([^.]*)[.][^.]*$/); # parentheses on the left give list context, so $ifindex receives the captured group
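For comparison, here is a Python translation of the two approaches above (the end-anchored capture and splitting on dots), assuming the OID is already in a string. It is a sketch of the same idea, not Perl code.

```python
import re

oid = "1.3.6.1.2.1.31.1.1.1.18.10035"

# Regex approach: anchor only at the end and capture the digits before the
# final ".<digits>" term; no leading ^.* is needed.
m = re.search(r"(\d+)\.\d+$", oid)
ifindex_regex = m.group(1) if m else None

# Split approach: the second element from the right of the dot-separated list.
ifindex_split = oid.split(".")[-2]

print(ifindex_regex, ifindex_split)  # 18 18
```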

Regular expression help - comma delimited string

I don't write many regular expressions, so I'm going to need some help on this one.
I need a regular expression that can validate that a string is a comma-delimited alphanumeric string.
Examples:
123, 4A67, GGG, 767 would be valid.
12333, 78787&*, GH778 would be invalid
fghkjhfdg8797< would be invalid
This is what I have so far, but isn't quite right: ^(?=.*[a-zA-Z0-9][,]).*$
Any suggestions?
Sounds like you need an expression like this:
^[0-9a-zA-Z]+(,[0-9a-zA-Z]+)*$
POSIX character classes allow a more self-descriptive version:
^[[:alnum:]]+(,[[:alnum:]]+)*$
^[[:alnum:]]+([[:space:]]*,[[:space:]]*[[:alnum:]]+)*$ // allow whitespace
If you're willing to admit underscores, too, search for entire words (\w+):
^\w+(,\w+)*$
^\w+(\s*,\s*\w+)*$ // allow whitespaces around the comma
Try this pattern: ^([a-zA-Z0-9]+,?\s*)+$
I tested it with your cases, as well as just a single number "123". I don't know if you will always have a comma or not.
The [a-zA-Z0-9]+ means match 1 or more of these symbols
The ,? means match 0 or 1 commas (basically, the comma is optional)
The \s* handles 0 or more spaces after the comma
and finally the outer + says match 1 or more of the pattern.
This will also match 123 123 abc (no commas), which might be a problem.
This will also match 123, (ends with a comma), which might be a problem.
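The trade-off between the strict pattern earlier in this thread and this looser one is easy to check. The Python sketch below uses an ASCII translation of the whitespace-tolerant POSIX version ([[:alnum:]] written as [0-9a-zA-Z] and [[:space:]] as \s, assuming ASCII input) alongside the pattern from this answer, on the question's examples plus the two caveat cases.

```python
import re

# ASCII translation of ^[[:alnum:]]+([[:space:]]*,[[:space:]]*[[:alnum:]]+)*$
strict = re.compile(r"^[0-9a-zA-Z]+(\s*,\s*[0-9a-zA-Z]+)*$")
# The looser pattern from this answer: every comma is optional.
loose = re.compile(r"^([a-zA-Z0-9]+,?\s*)+$")

cases = ["123, 4A67, GGG, 767", "12333, 78787&*, GH778", "fghkjhfdg8797<",
         "123", "123 123 abc", "123,"]
results = {s: (bool(strict.match(s)), bool(loose.match(s))) for s in cases}
for s, (st, lo) in results.items():
    print(f"{s!r}: strict={st}, loose={lo}")
```

A further reason to prefer the strict form: because ([a-zA-Z0-9]+,?\s*)+ nests one quantifier inside another with everything between them optional, a backtracking engine can take exponentially long to reject a long run of letters followed by a bad character, whereas each repetition in the strict form must consume a comma.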
Try the following expression:
/^([a-z0-9\s]+,)*([a-z0-9\s]+){1}$/i
This will work for:
test
test, test
test123,Test 123,test
I would strongly suggest trimming the whitespaces at the beginning and end of each item in the comma-separated list.
You seem to be lacking repetition. How about:
^(?:[a-zA-Z0-9 ]+,)*[a-zA-Z0-9 ]+$
I'm not sure how you'd express that in VB.Net, but in Python:
>>> import re
>>> x = [ "123, 4A67, GGG, 767", "12333, 78787&*, GH778" ]
>>> r = '^(?:[a-zA-Z0-9 ]+,)*[a-zA-Z0-9 ]+$'
>>> for s in x:
...     print(re.match(r, s))
...
<re.Match object; span=(0, 19), match='123, 4A67, GGG, 767'>
None
>>>
You can use shortcuts instead of listing the [a-zA-Z0-9 ] part, but this is probably easier to understand.
Analyzing the highlights:
[a-zA-Z0-9 ]+ : match one or more (but not zero) of the listed ranges, and space.
(?:[...]+,)* : in non-capturing parentheses, match one or more of the characters, plus a comma at the end. Match such sequences zero or more times. Matching zero times allows for no comma.
[...]+ : match at least one of these. This does not include a comma, which ensures that a trailing comma is not accepted. If a trailing comma is acceptable, the expression is easier: ^[a-zA-Z0-9 ,]+$ (though that also accepts leading and repeated commas).
Yes, when you want to match comma-separated things where a trailing comma is not legal, and each thing matches $LONGSTUFF, you have to repeat $LONGSTUFF:
$LONGSTUFF(,$LONGSTUFF)*
If $LONGSTUFF is really long and itself contains comma-repeated items, it might be a good idea not to build the regexp by hand and instead let the computer do it for you, even if it's just through string concatenation. For example, I just wanted to build a regular expression to validate the CPUID parameter of a Xen configuration file, of the ['1:a=b,c=d','2:e=f,g=h'] type. I... believe this mostly fits the bill (whitespace notwithstanding!):
import re
xend_fudge_item_re = r"""
e[a-d]x= #register of the call return value to fudge
(
0x[0-9A-F]+ | #either hardcode the reply
[10xks]{32} #or edit the bitfield directly
)
"""
xend_string_item_re = r"""
(0x)?[0-9A-F]+: #leafnum (the contents of EAX before the call)
%s #one fudge
(,%s)* #repeated multiple times
""" % (xend_fudge_item_re, xend_fudge_item_re)
xend_syntax = re.compile(r"""
\[ #a list of
'%s' #string elements
(,'%s')* #repeated multiple times
\]
$ #and nothing else
""" % (xend_string_item_re, xend_string_item_re), re.VERBOSE | re.MULTILINE)
Try ^(?!,)((, *)?([a-zA-Z0-9]+)\b)*$
Step by step description:
Don't match a leading comma (good for the upcoming "loop").
Match an optional comma and spaces.
Match one or more of the characters you like.
The word boundary makes sure that a comma is necessary if more items are stacked in the string.
Please use: ^((([a-zA-Z0-9\s]){1,45},)+([a-zA-Z0-9\s]){1,45})$
Here, I have set the maximum word size to 45, as the longest word in English is 45 characters; this can be changed as required.