Matching negative and non-negative numbers in perl - regex

I have a Perl script that matches lines that start with (alphanumeric or underscore), followed by any number of spaces, followed by another (alphanumeric or underscore). I realize now that I need to also include, for the second (alphanumeric or underscore), a possibility that this could be a negative number (for instance -50). How can I accomplish this?
Original code:
if ( /^\w[\s]+\w/ and not /^A pdb file/ ) {
...doSomething
}
Unsuccessfully tried things like:
if ( /^\w[\s]+\-*w/ and not /^A pdb file/ )
if ( /^\w[\s]+\-{0,1}w/ and not /^A pdb file/ )
if ( /^\w[\s]+\w|-\w/ and not /^A pdb file/ )
Thanks.

Does this meet your needs?
/^\w+\s*-?\w+$/
It says match:
\w+: any number of alphanumeric characters (including underscore)
\s*: any number of spaces (if you need at least one space, use \s+)
-?: optional dash
\w+: any number of alphanumeric characters (including underscore). If this set of characters can only be numbers, then use \d+ instead.
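A quick way to sanity-check the pattern above is sketched here in Python rather than Perl. The sample lines are invented for illustration, and the re.ASCII flag is an assumption that keeps \w limited to [A-Za-z0-9_].

```python
import re

# Same pattern as in the answer; re.ASCII keeps \w to ASCII word characters.
PATTERN = re.compile(r"^\w+\s*-?\w+$", re.ASCII)

def matches(line: str) -> bool:
    """True if the line is: word, optional spaces, optional dash, word."""
    return PATTERN.search(line) is not None

# Hypothetical sample lines, not taken from the asker's data.
samples = ["ATOM 12", "x   -50", "x  --50", "x -"]
results = {s: matches(s) for s in samples}
```

Because the pattern uses \s* rather than \s+, a single run of word characters with no spaces at all would also match; switch to \s+ if the separator is mandatory.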

Try:
m{
\A # start of the string
\w # a single alphanumeric or underscore
\s+ # one or more white space
(?: # non-capturing grouping
\- # a minus sign
\d+ # one or more digits
| # or
\w # a single alphanumeric or underscore
) # end of grouping
}msx;

Regex catching adjacent characters with a single character set

I am trying to construct a regex statement that matches a string conforming to the following conditions:
3-63 lowercase alphanumeric characters, plus "." and "-"
May not start or end with . or -
Dashes and periods cannot be adjacent to each other.
abc-123.xyz <- should match
abc123-.xyz <- should not match
I have been able to put this regex together, but it does not catch the third requirement. I've tried adding another negative lookahead/lookbehind, i.e. (?!.-|-.), but it's still matching strings with adjacent periods and dashes. Here's the regex I came up with that fulfills conditions 1 and 2:
^(?!\.|-)([a-z0-9]|\.|-){3,63}(?<!\.|-)$
FYI, this regex is for validating input when specifying an AWS S3 bucket name in a CloudFormation template.
How about:
^(?=.{3,63}$)[a-z0-9]+(?:[-.][a-z0-9]+)*$
Use this pattern: ^(?!.*[.-](?=[.-]))[^.-][a-z0-9.-]{1,61}[^.-]$
# ^(?!.*[.-](?=[.-]))[^.-][a-z0-9.-]{1,61}[^.-]$
^ # Start of string/line
(?! # Negative Look-Ahead
. # Any character except line break
* # (zero or more)(greedy)
[.-] # Character in [.-] Character Class
(?= # Look-Ahead
[.-] # Character in [.-] Character Class
) # End of Look-Ahead
) # End of Negative Look-Ahead
[^.-] # Character not in [.-] Character Class
[a-z0-9.-] # Character in [a-z0-9.-] Character Class
{1,61} # (repeated {1,61} times)
[^.-] # Character not in [.-] Character Class
$ # End of string/line
^[a-z0-9](?:[a-z0-9]|[.\-](?=[a-z0-9])){2,62}$
We match a lowercase alphanumeric character, followed by between 2 and 62 repetitions of either:
a lowercase alphanumeric character, or
a . or - (which must be followed by a lowercase alphanumeric character).
The last restriction makes sure that you can't have two ./- characters in a row, or a ./- at the end of the string.
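As a rough check, two of the patterns above (the length-lookahead one and the counted-repetition one) can be run side by side in Python. The names LOOKAHEAD_LENGTH, COUNTED_REPEAT and check are made up for this sketch, and the test strings beyond the two from the question are assumptions.

```python
import re

# Length enforced by a lookahead, structure by the main pattern.
LOOKAHEAD_LENGTH = re.compile(r"^(?=.{3,63}$)[a-z0-9]+(?:[-.][a-z0-9]+)*$")
# Length enforced by the repetition count; each . or - must precede an alphanumeric.
COUNTED_REPEAT = re.compile(r"^[a-z0-9](?:[a-z0-9]|[.\-](?=[a-z0-9])){2,62}$")

def check(name: str) -> tuple:
    """Return (lookahead_result, counted_result) for a candidate name."""
    return (
        LOOKAHEAD_LENGTH.match(name) is not None,
        COUNTED_REPEAT.match(name) is not None,
    )

good = check("abc-123.xyz")      # from the question: should match
bad = check("abc123-.xyz")       # from the question: should not match
too_short = check("ab")
too_long = check("a" * 64)
trailing_dot = check("abc.")
```

Both patterns agree on these inputs, which suggests they encode the same three conditions in different ways.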

RegEx solution for 1.23+1.23+1

I searched a lot but cannot find this regular expression. My problem is that I made a calculator but cannot fully validate its display. My issue is with the dot.
I need my regular expression to accept: digit, dot, digit, operator, digit, dot (e.g. 1.23+1.23+1.). The dot may appear only once per number, so input like (1..23+ 1.1.1) must be rejected. I have found similar regular expressions, but they didn't cover the case (1.23 +1.)
Here is my regEx -> /[0-9-+/*]+(\.[0-9][0-9]?)?/g
Could use this
^[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[+-](?:\d+(?:\.\d*)?|\.\d+))*$
Expanded:
^ # BOS
[+-]? # Optional Plus or minus
(?: # Decimal term
\d+
(?: \. \d* )?
| \. \d+
)
(?: # Optionally, many more terms
[+-] # Required Plus or minus
(?: # Decimal term
\d+
(?: \. \d* )?
| \. \d+
)
)*
$ # EOS
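The expanded pattern above repeats the same "decimal term" twice, so one way to try it out is to build it from a single term string. This is only a sketch in Python: TERM, EXPR and is_valid are names invented here, and note that the pattern as given only allows + and - as operators.

```python
import re

# One decimal term: digits with an optional fraction, or a bare fraction.
TERM = r"(?:\d+(?:\.\d*)?|\.\d+)"
# Optional leading sign, one term, then any number of (sign, term) pairs.
EXPR = re.compile(rf"^[+-]?{TERM}(?:[+-]{TERM})*$")

def is_valid(display: str) -> bool:
    """True if the calculator display is a well-formed +/- expression."""
    return EXPR.match(display) is not None

ok_trailing_dot = is_valid("1.23+1.23+1.")
ok_short = is_valid("1.23+1.")
double_dot = is_valid("1..23+1.1.1")
multiply = is_valid("1.23*2")   # rejected: * is not in the operator class
```

Extending the operator class to [-+*/] in both places would be the natural next step if multiplication and division should validate too.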
Check this out(demo):
/^(([-+*\/ ]+)?(\b(\d+\.\d+)\b|\d))+$/
but it will work only if there is one equation per string, since it matches at the beginning (^) and at the end ($) of the string. However, you can also use it with the /m and/or /g modifiers.
EDIT
If it is only about the '–' character, it is enough to add it to the character class:
/^(([-–+*\/ ]+)?(\b(\d+\.\d+)\b|\d))+$/

RegEx to replace prefix and postfix

I would like to build a regular expression to replace the prefix and postfix of a string. The general string is built from:
a known prefix string
some letter a-z or A-Z
some unknown string with letters, hyphens, backslash, slash and numbers.
a hyphen
an integer number
the symbols #.
some string of letters
Examples:
KnownStringr/df-2e\d-3724#.Gkjsu
KnownStringEd\e4v-bn-824#.YKfg
KnownStringa-YK224E\yy-379924#.awws
I would like to replace the prefix and postfix of the NUMBER so that I get:
MyPrefix3724MyPostfix
MyPrefix824MyPostfix
MyPrefix379924MyPostfix
This regex should do the trick, but you should always specify the language/framework you're using, because not all regex engines support the same features.
The number that you want to capture will be in capture group #3 ((\d+)), which most languages reference as \3.
(?:KnownString)([a-zA-Z])(.*?)-(\d+)\#\.[a-zA-Z]+
Explanation:
(?: # Opens NCG
KnownString # Literal KnownString
) # Closes NCG
( # Opens CG1
[a-zA-Z] # Character class (any of the characters within)
# Anything between a and z
# Anything between A and Z
) # Closes CG1
( # Opens CG2
.*? # . denotes any single character, except for newline
# * repeats zero or more times
# ? as few times as possible
)- # Closes CG2
# Literal -
( # Opens CG3
\d+ # Token: \d (digit)
# + repeats one or more times
) # Closes CG3
\# # Literal #
\. # Literal .
[a-zA-Z]+ # Character class (any of the characters within)
# Anything between a and z
# Anything between A and Z
# + repeats one or more times
You haven't specified what the known prefix is, so be careful to escape any special characters in the known string, especially period, plus sign, asterisk, question mark, and parentheses.
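Since the question does not name a language, here is a minimal sketch of the replacement in Python. KNOWN_PREFIX, PATTERN and rewrite are illustrative names; re.escape is used on the assumption that the real prefix may contain regex metacharacters, and \g<3> is Python's spelling of a reference to capture group 3.

```python
import re

# "KnownString", "MyPrefix" and "MyPostfix" are the placeholders from the question.
KNOWN_PREFIX = "KnownString"

# re.escape guards against special characters in the real prefix.
PATTERN = re.compile(
    re.escape(KNOWN_PREFIX) + r"([a-zA-Z])(.*?)-(\d+)#\.[a-zA-Z]+"
)

def rewrite(text: str) -> str:
    """Replace each matching string with MyPrefix<number>MyPostfix."""
    # Group 3 is the integer that sits just before "#.".
    return PATTERN.sub(r"MyPrefix\g<3>MyPostfix", text)

inputs = [
    r"KnownStringr/df-2e\d-3724#.Gkjsu",
    r"KnownStringEd\e4v-bn-824#.YKfg",
    r"KnownStringa-YK224E\yy-379924#.awws",
]
outputs = [rewrite(s) for s in inputs]
```

The lazy .*? still ends up spanning earlier hyphens (as in "2e\d-3724") because the engine backtracks until the digits are followed directly by "#.".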

regex pattern - what is ((?=.*\d)|(?=.*\W+)) and (?![.\n])

Can someone kindly explain the regex pattern below to me?
(?=^.{8,}$)((?=.*\d)|(?=.*\W+))(?![.\n])(?=.*[A-Z])(?=.*[a-z]).*$
what exactly is
((?=.*\d)|(?=.*\W+))
&
(?![.\n])
thank you
These are all lookahead assertions (positive and negative) that make sure the following text respects some rules without actually consuming the text.
# assert that
(?=^.{8,}$) # there are at least 8 characters
( # and
(?=.*\d) # there is at least a digit
| # or
(?=.*\W+) # there is one or more "non word" characters (\W is equivalent to [^a-zA-Z0-9_])
) # and
(?![.\n]) # the next character (here, the first one) is not a literal . or a newline and
(?=.*[A-Z]) # there is at least an upper case letter and
(?=.*[a-z]) # there is at least a lower case letter
.*$ # in a string of any characters
(?! ... ) is the syntax for a negative lookahead (match if there is no ...), (?= ... ) is for a positive lookahead (match if there is ...). This looks a lot like password validation!
String matches to end of line
At least 8 characters in length
At least one digit or non-word character exists (not a-zA-Z0-9_)
Does not start with a . or a newline (the lookahead only inspects the first character)
At least one uppercase letter exists
At least one lowercase letter exists
This seems to be a RegEx for validating a password.
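To see what each assertion contributes, the pattern can be tried against a few candidate strings. This is a sketch in Python (the original engine is not stated); the candidates are invented, and is_valid is a name made up for the example.

```python
import re

# The pattern from the question, unchanged.
PASSWORD = re.compile(
    r"(?=^.{8,}$)((?=.*\d)|(?=.*\W+))(?![.\n])(?=.*[A-Z])(?=.*[a-z]).*$"
)

def is_valid(candidate: str) -> bool:
    """True if the candidate satisfies every lookahead in the pattern."""
    return PASSWORD.match(candidate) is not None

# Hypothetical candidates chosen to trip one rule at a time.
checks = {
    "Passw0rd": is_valid("Passw0rd"),    # all rules satisfied
    "Password!": is_valid("Password!"),  # non-word character instead of a digit
    "Password": is_valid("Password"),    # no digit and no non-word character
    "password1": is_valid("password1"),  # no upper case letter
    "Passw0r": is_valid("Passw0r"),      # only 7 characters
    ".Passw0rd": is_valid(".Passw0rd"),  # starts with a period
    "Pass.w0rd": is_valid("Pass.w0rd"),  # period in the middle
}
```

The last two cases show that (?![.\n]) is evaluated at the start of the string: a leading period is rejected, while a period elsewhere is accepted.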

Regular expression captures unwanted string

I have created the following expression: (.NET regex engine)
((-|\+)?\w+(\^\.?\d+)?)
hello , hello^.555,hello^111, -hello,+hello, hello+, hello^.25, hello^-1212121
It works well except that :
it captures the term 'hello+' but without the '+' : this group should not be captured at all
the last term 'hello^-1212121' as 2 groups 'hello' and '-1212121' both should be ignored
The strings to capture are as follows :
word can have a + or a - before it
or word can have a ^ that is followed by a positive number (not necessarily an integer)
words are separated by commas and any number of white spaces (both not part of the capture)
A few examples of valid strings to capture :
hello^2
hello^.2
+hello
-hello
hello
EDIT
I have found the following expression which effectively captures all these terms, it's not really optimized but it just works :
([a-zA-Z]+(?= ?,))|((-|\+)[a-zA-Z]+(?=,))|([a-zA-Z]+\^\.?\d+)
Ok, there are some issues to tackle here:
((-|+)?\w+(\^.?\d+)?)
    ^        ^
The + and . should be escaped like this:
((-|\+)?\w+(\^\.?\d+)?)
Now, you'll also get -1212121 there. If your string hello is always letters, then you would change \w to [a-zA-Z]:
((-|\+)?[a-zA-Z]+(\^\.?\d+)?)
\w includes letters, numbers and underscore. So, you might want to restrict it down a bit to only letters.
And finally, to avoid matching the unwanted terms at all, you'll have to use lookarounds. I don't know of any other way to check the delimiters without hindering the matches:
(?<=^|,)\s*((-|\+)?[a-zA-Z]+(\^\.?\d+)?)\s*(?=,|$)
EDIT: If it cannot be something like -hello^2, and if another valid string is hello^9.8, then this one will fit better:
(?<=^|,)\s*((?:-|\+)?[a-zA-Z]+|[a-zA-Z]+\^(?:\d+)?\.?\d+)(?=\s*(?:,|$))
And lastly, if capturing the words is sufficient, we can remove the lookarounds:
([-+]?[a-zA-Z]+|[a-zA-Z]+\^(?:\d+)?\.?\d+)
It would be better if you first state what it is you are looking to extract.
You also don't indicate which Regular Expression engine you're using, which is important since they vary in their features, but...
Assuming you want to capture only:
words that have a leading + or -
words that have a trailing ^ followed by an optional period followed by one or more digits
and that words are sequences of one or more letters
I'd use:
([a-zA-Z]+\^\.?\d+|[-+][a-zA-Z]+)
which breaks down into:
( # start capture group
[a-zA-Z]+ # one or more letters - note \w matches numbers and underscores
\^ # literal
\.? # optional period
\d+ # one or more digits
| # OR
[-+] # literal plus or minus
[a-zA-Z]+ # one or more letters
) # end of capture group
EDIT
To also capture plain words (without leading or trailing chars) you'll need to rearrange the regexp a little. I'd use:
([+-][a-zA-Z]+|[a-zA-Z]+\^(?:\.\d+|\d+\.\d+|\d+)|[a-zA-Z]+)
which breaks down into:
( # start capture group
[+-] # literal plus or minus
[a-zA-Z]+ # one or more letters - note \w matches numbers and underscores
| # OR
[a-zA-Z]+ # one or more letters
\^ # literal
(?: # start of non-capturing group
\. # literal period
\d+ # one or more digits
| # OR
\d+ # one or more digits
\. # literal period
\d+ # one or more digits
| # OR
\d+ # one or more digits
) # end of non-capturing group
| # OR
[a-zA-Z]+ # one or more letters
) # end of capture group
Also note that, per your updated requirements, this regexp captures both true non-negative numbers (e.g. 0, 1, 1.2, 1.23) and those lacking a leading digit (e.g. .1, .12).
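One way to apply this pattern to a whole comma-separated line is to split first and then require each term to match in full. That split-then-fullmatch step is an alternative approach assumed for this sketch, not part of the answer; it is written in Python rather than .NET, and TERM and valid_terms are invented names.

```python
import re

# The three-way alternation from above: signed word, word^number, plain word.
TERM = re.compile(
    r"([+-][a-zA-Z]+|[a-zA-Z]+\^(?:\.\d+|\d+\.\d+|\d+)|[a-zA-Z]+)"
)

def valid_terms(line: str) -> list:
    """Split on commas, trim whitespace, keep terms that match entirely."""
    terms = (part.strip() for part in line.split(","))
    return [t for t in terms if TERM.fullmatch(t)]

# The sample line from the question.
sample = "hello , hello^.555,hello^111, -hello,+hello, hello+, hello^.25, hello^-1212121"
found = valid_terms(sample)
```

Requiring a full match per term is what rejects hello+ and hello^-1212121 here, since no alternative can cover the trailing + or the negative exponent.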
FURTHER EDIT
This regexp will only match the following patterns delimited by commas:
word
word with leading plus or minus
word with trailing ^ followed by a positive number of the form \d+, \d+\.\d+, or \.\d+
([+-][A-Za-z]+|[A-Za-z]+\^(?:\.\d+|\d+(?:\.\d+)?)|[A-Za-z]+)(?=,|\s|$)
Please note that the useful match will appear in the first capture group, not the entire match.
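So, in JavaScript, you'd write:
var src = "hello , hello ,hello,+hello,-hello,hello+,hello-,hello^1,hello^1.0,hello^.1",
    RE = /([+-][A-Za-z]+|[A-Za-z]+\^(?:\.\d+|\d+(?:\.\d+)?)|[A-Za-z]+)(?=,|\s|$)/g,
    match;
// With the g flag, each exec() call resumes after the previous match
while ((match = RE.exec(src)) !== null) {
  console.log(match[1]); // the first capture group
}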
which produces:
hello
hello
hello
+hello
-hello
hello^1
hello^1.0
hello^.1