GPA regex in Perl - regex

I'm attempting to make a GPA validation regex in Perl, and I seem to have something wrong with my logic. You should be able to enter a digit 0-3 followed by a . and one more digit in the range 0-9, or, if the first digit is a 4, it must be followed by .0. Here's my code:
$get_gpa_input =~ m/[0-3]\.\d[0-9]|[4].[0]/

m/(?: [0-3] [.] [0-9] ) | 4[.]0 /x

Remove the [0-9]. You've also got some extra brackets and you should escape the decimal in '4.0'.
$get_gpa_input =~ m/[0-3]\.\d|4\.0/

If you are doing validation, you don't want to search within a string but rather to force the entire string to match your regex; you do this by adding anchors to the beginning and ending:
/\A (?: [0-3]\.[0-9] | 4\.0 ) \z/x
\A matches only before the first character of the string, \z matches only after the last character of the string.
Avoid using \d in most code since it can match any number of Unicode "digits" that aren't 0 through 9 (though in newer perls, the /a flag reverts it to its old ASCII meaning).
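To illustrate the anchoring idea, here is a minimal sketch in Python rather than Perl (a swap made only to keep the example self-contained; the name is_valid_gpa is hypothetical). Python's re.fullmatch requires the whole string to match, which plays the role of \A ... \z here, and [0-9] is used instead of \d to stay ASCII-only.

```python
import re

# Same alternation as the anchored Perl pattern; fullmatch() supplies the anchoring.
GPA_RE = re.compile(r"(?:[0-3]\.[0-9]|4\.0)")


def is_valid_gpa(text):
    """Return True only if the entire string is 0.0-3.9 or exactly 4.0."""
    return GPA_RE.fullmatch(text) is not None
```

With this sketch, "3.7" and "4.0" are accepted, while "4.1", "3.75" and "13.5" are rejected because no part of the input may be left unmatched.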

You have \d[0-9] which would require two digits following 0-3. You also don't escape the decimal in the 4 alternate, which may make a difference.
[0-3]\.\d|4\.0

Related

Regex perl with letters and numbers

I need to extract a string from a text file that contains both letters and numbers. The lines start like this
Report filename: ABCL00-67900010079415.rpt ______________________
All I need is the last 8 numbers so in this example that would be 10079415
while (<DATA>) {
    if (/Report filename/) {
        my ($bagID) = ( m/(\d{8}+)./ );
        print $bagID;
    }
}
Right now this prints out the first 8 but I want the last 8.
You just need to escape the dot, so that the pattern matches the 8 digits that come immediately before the dot character.
my ($bagID) = ( m/(\d{8}+)\./ );
. is a special character in regex which matches any character. In order to match a literal dot, you need to escape it.
To match the last of anything, just precede it with a wildcard that will match as many characters as possible
my ($bag_id) = / .* (\d{8}) /x;
Note that I have also used the /x modifier so that the regex can contain insignificant whitespace for readability. Also, your \d{8}+ is what is called a possessive quantifier; it is used for optimising some regex constructions and makes no difference at the end of the pattern.
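Both suggestions can be tried side by side. The following is a rough sketch in Python rather than Perl (chosen only so the example is self-contained; the function names are hypothetical), using the sample line from the question.

```python
import re

line = "Report filename: ABCL00-67900010079415.rpt ______________________"


def last_eight_digits(text):
    """Return the rightmost 8 consecutive digits in text, or None."""
    # The greedy .* consumes as much as it can, so the capture group is
    # pushed as far right as possible while still matching 8 digits.
    match = re.search(r".*(\d{8})", text)
    return match.group(1) if match else None


def eight_digits_before_dot(text):
    """Return the 8 digits immediately before a literal dot, or None."""
    match = re.search(r"(\d{8})\.", text)
    return match.group(1) if match else None
```

For the sample line, both functions return "10079415".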

sed making digit optional

I am attempting to replace a date in the format 08/09/2014 but at the same time also the format 8/9/14 using sed. I know the + sign is supposed to match one or more occurrences, and ? 0 or more. I've tried both but none of the dates are being replaced with "testing". I was expecting this would find 1 or more digits followed by a slash, 1 or more digits followed by a slash, 4 digits.
Do I need to escape the special character, or what is wrong here?
sed -f mySed.sed dates.csv
# mySed.sed file
s#[0-9]+/[0-9]+/[0-9][0-9][0-9][0-9]#testing#g
# sample line in dates.csv
...,20/01/2001,2/1/2009,...
You have made several mistakes. Here is a working example:
echo '20/01/2001,2/1/2009' | sed 's~[0-9]\{1,2\}/[0-9]\{1,2\}/\([0-9]\{2\}\)\{1,2\}~toto~g'
Note that the ? means "optional" (in other words 0 or 1 time) and must be escaped.
To be more precise, I have chosen to use the quantifier \{m,n\} instead of +. But if you use +, don't forget to escape it as \+; otherwise it will be treated as a literal character.
You need to escape the + quantifiers in your regular expression, and you can use a range for the last set.
s#[0-9]\+/[0-9]\+/[0-9]\{2,4\}#testing#g
Or you can use the range quantifier throughout your pattern.
s#[0-9]\{1,2\}/[0-9]\{1,2\}/[0-9]\{2,4\}#testing#g
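For comparison, here is the same replacement as a small Python sketch (Python is swapped in for sed purely for illustration; mask_dates is a hypothetical name). Python's re syntax resembles sed's extended mode (sed -E), so +, ? and {m,n} are written without backslashes.

```python
import re

# One or two digits, slash, one or two digits, slash, then a 2- or 4-digit year.
DATE_RE = re.compile(r"[0-9]{1,2}/[0-9]{1,2}/(?:[0-9]{2}){1,2}")


def mask_dates(text, replacement="testing"):
    """Replace every date-like field in text with the replacement string."""
    return DATE_RE.sub(replacement, text)
```

Under these assumptions, mask_dates("a,20/01/2001,2/1/2009,8/9/14,b") gives "a,testing,testing,testing,b".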

regex: find one-digit number

I need to find all the one-digit numbers in a text.
My code:
$string = 'text 4 78 text 558 my.name#gmail.com 5 text 78998 text';
$pattern = '/ [\d]{1} /';
(result: 4 and 5)
Everything works perfectly; I just wanted to ask whether it is correct to use spaces.
Maybe there is some other way to pick out one-digit numbers.
Thanks
First of all, [\d]{1} is equivalent to \d.
As for your question, it would be better to use a zero width assertion like a lookbehind/lookahead or word boundary (\b). Otherwise you will not match consecutive single digits because the leading space of the second digit will be matched as the trailing space of the first digit (and overlapping matches won't be found).
Here is how I would write this:
(?<!\S)\d(?!\S)
This means "match a digit only if there is not a non-whitespace character before it, and there is not a non-whitespace character after it".
I used the double negative like (?!\S) instead of (?=\s) so that you will also match single digits that are at the beginning or end of the string.
I prefer this over \b\d\b for your example because it looks like you really only want to match when the digit is surrounded by spaces, and \b\d\b would match the 4 and the 5 in a string like 192.168.4.5
To allow punctuation at the end, you could use the following:
(?<!\S)\d(?![^\s.,?!])
Add any additional punctuation characters that you want to allow after the digit to the character class (inside of the square brackets, but make sure it is after the ^).
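The difference between the three approaches shows up on a few small inputs. This is a sketch in Python (not the PHP of the question; the lookaround and \b syntax used here is the same in both, and the constant names are made up for the example).

```python
import re

text = "text 4 78 text 558 my.name#gmail.com 5 text 78998 text"

SPACE_DELIMITED = re.compile(r" \d ")         # the pattern from the question
LOOKAROUND = re.compile(r"(?<!\S)\d(?!\S)")   # whitespace or string edge on both sides
WORD_BOUNDARY = re.compile(r"\b\d\b")         # word boundary on both sides

# Consecutive single digits: the space-delimited pattern consumes the
# separating space, so neighbouring digits cannot both be matched.
consecutive = "1 2 3"
ip_address = "192.168.4.5"
```

On text, LOOKAROUND finds 4 and 5. On consecutive, SPACE_DELIMITED only finds " 2 " while LOOKAROUND finds all three digits. On ip_address, WORD_BOUNDARY finds 4 and 5 while LOOKAROUND finds nothing.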
Use word boundaries. Note that the quantifier {1} is redundant (a single \d only matches one digit), and so is the character class [], because it contains only one item.
\b\d\b
Search around word boundaries:
\b\d\b
As explained by the others, this will extract single digits meaning that some special characters might not be respected like "." in an ip address. To address that, see F.J and Mike Brant's answer(s).
It really depends on where the numbers can appear and whether you care if they are adjacent to other characters (like . at the end of a sentence). At the very least, I would use word boundaries so that you can get numbers at the beginning and end of the input string:
$pattern = '/\b\d\b/';
But you might consider punctuation at the end like:
$pattern = '/\b\d(\b|\.|\?|\!)/';
If one-digit numbers can be preceded or followed by characters other than digits (e.g., "a1 cat" or "Call agent 7, pronto!") use
(?<!\d)\d(?!\d)
The regular expression reads, match a digit (\d) that is neither preceded nor followed by digit, (?<!\d) being a negative lookbehind and (?!\d) being a negative lookahead.
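A quick way to check this pattern is the following Python sketch (Python is used only for illustration; lone_digits is a hypothetical helper name).

```python
import re

# A digit that is neither preceded nor followed by another digit.
LONE_DIGIT = re.compile(r"(?<!\d)\d(?!\d)")


def lone_digits(text):
    """Return every digit in text that has no digit directly next to it."""
    return LONE_DIGIT.findall(text)
```

With this, lone_digits("a1 cat") returns ["1"] and lone_digits("Call agent 7, pronto!") returns ["7"], while the digits of 78 or 558 are skipped.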

Decimal number regular expression, where digit after decimal is optional

I need a regular expression that validates a number, but doesn't require a digit after the decimal.
ie.
123
123.
123.4
would all be valid
123..
would be invalid
Any help would be greatly appreciated!
Use the following:
/^\d*\.?\d*$/
^ - Beginning of the line;
\d* - 0 or more digits;
\.? - An optional dot (escaped, because in regex, . is a special character);
\d* - 0 or more digits (the decimal part);
$ - End of the line.
This allows for .5 decimal rather than requiring the leading zero, such as 0.5
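One thing worth checking with this pattern is that every part of it is optional. A small Python sketch (Python is an assumption here, since the question names no language; accepts is a hypothetical helper) makes that visible:

```python
import re

# Equivalent of /^\d*\.?\d*$/; fullmatch() supplies the anchoring.
LOOSE_RE = re.compile(r"\d*\.?\d*")


def accepts(text):
    """Return True if the whole string matches the loose pattern."""
    return LOOSE_RE.fullmatch(text) is not None
```

It accepts 123, 123., 123.4 and .5 and rejects 123.., but it also accepts the empty string and a lone ".", so you may want an extra check if those should be invalid.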
/\d+\.?\d*/
One or more digits (\d+), optional period (\.?), zero or more digits (\d*).
Depending on your usage or regex engine you may need to add start/end line anchors:
/^\d+\.?\d*$/
You need a regular expression like the following to do it properly:
/^[+-]?((\d+(\.\d*)?)|(\.\d+))$/
The same expression with whitespace, using the extended modifier (as supported by Perl):
/^ [+-]? ( (\d+ (\.\d*)?) | (\.\d+) ) $/x
or with comments:
/^ # Beginning of string
[+-]? # Optional plus or minus character
( # Followed by either:
( # Start of first option
\d+ # One or more digits
(\.\d*)? # Optionally followed by: one decimal point and zero or more digits
) # End of first option
| # or
(\.\d+) # One decimal point followed by one or more digits
) # End of grouping of the OR options
$ # End of string (i.e. no extra characters remaining)
/x # Extended modifier (allows whitespace & comments in regular expression)
For example, it will match:
123
23.45
34.
.45
-123
-273.15
-42.
-.45
+516
+9.8
+2.
+.5
And will reject these non-numbers:
. (single decimal point)
-. (negative decimal point)
+. (plus decimal point)
(empty string)
The simpler solutions can incorrectly reject valid numbers or match these non-numbers.
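The lists above can be checked mechanically. Here is a sketch in Python instead of Perl (a swap for the sake of a self-contained example; is_number is a hypothetical name). re.VERBOSE plays the role of /x, re.ASCII restricts \d to 0-9, and fullmatch replaces the ^ and $ anchors.

```python
import re

NUMBER_RE = re.compile(
    r"""
    [+-]?                # optional plus or minus sign
    (?:
        \d+ (?:\.\d*)?   # digits, optionally followed by a point and more digits
      |
        \.\d+            # or a point followed by at least one digit
    )
    """,
    re.VERBOSE | re.ASCII,
)


def is_number(text):
    """Return True if the whole string is a signed or unsigned decimal number."""
    return NUMBER_RE.fullmatch(text) is not None
```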
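This matches integers and decimals that have at least one digit after the point (note that it rejects 123., which the question wanted to allow):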
^\d+(\.\d+)?$
Try this regex:
\d+\.?\d*
\d+ digits before optional decimal
\.? optional decimal (optional due to the ? quantifier)
\d* optional digits after decimal
I ended up using the following:
^\d*\.?\d+$
This makes the following invalid:
.
3.
This is what I did. It's more strict than any of the above (and more correct than some):
^0$|^[1-9]\d*$|^\.\d+$|^0\.\d*$|^[1-9]\d*\.\d*$
Strings that pass:
0
0.
1
123
123.
123.4
.0
.0123
.123
0.123
1.234
12.34
Strings that fail:
.
00000
01
.0.
..
00.123
02.134
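The pass and fail lists can be verified with a short Python sketch (Python is an assumption, as the answer names no language; is_strict_decimal is a hypothetical name). The alternatives are the same as in the pattern above, with the repeated ^ and $ replaced by a single fullmatch call.

```python
import re

# Same five alternatives as the strict pattern, anchored by fullmatch().
STRICT_RE = re.compile(r"0|[1-9]\d*|\.\d+|0\.\d*|[1-9]\d*\.\d*")


def is_strict_decimal(text):
    """Return True if text is a decimal with no superfluous leading zeros."""
    return STRICT_RE.fullmatch(text) is not None
```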
you can use this:
^\d+(\.\d)?\d*$
matches:
11
11.1
0.2
does not match:
.2
2.
2.6.9
^[+-]?(([1-9][0-9]*)?[0-9](\.[0-9]*)?|\.[0-9]+)$
should reflect what people usually think of as a well formed decimal number.
The digits before the decimal point can be either a single digit, in which case it can be from 0 to 9, or more than one digit, in which case the first cannot be 0.
If there are any digits present before the decimal sign, then the decimal and the digits following it are optional. Otherwise, a decimal has to be present followed by at least one digit. Note that multiple trailing 0's are allowed after the decimal point.
grep -E '^[+-]?(([1-9][0-9]*)?[0-9](\.[0-9]*)?|\.[0-9]+)$'
correctly matches the following:
9
0
10
10.
0.
0.0
0.100
0.10
0.01
10.0
10.10
.0
.1
.00
.100
.001
as well as their signed equivalents, whereas it rejects the following:
.
00
01
00.0
01.3
and their signed equivalents, as well as the empty string.
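If grep is not at hand, the same check can be sketched in Python (an assumption made for illustration; is_well_formed is a hypothetical name). The pattern is the one given above, with non-capturing groups and fullmatch instead of ^ and $.

```python
import re

WELL_FORMED_RE = re.compile(
    r"[+-]?(?:(?:[1-9][0-9]*)?[0-9](?:\.[0-9]*)?|\.[0-9]+)"
)


def is_well_formed(text):
    """Return True for decimals without leading zeros, with an optional sign."""
    return WELL_FORMED_RE.fullmatch(text) is not None


accepted = ["9", "0", "10", "10.", "0.", "0.0", "0.100", "0.10", "0.01",
            "10.0", "10.10", ".0", ".1", ".00", ".100", ".001"]
rejected = [".", "00", "01", "00.0", "01.3"]
```

Prefixing each entry of accepted or rejected with + or - should give the same verdict as the unsigned form.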
What language? In Perl style: ^\d+(\.\d*)?$
What you asked is already answered, so this is just additional info for those who want exactly 2 decimal digits whenever the optional decimal point is entered:
^\d+(\.\d{2})?$
^ : start of the string
\d : a digit (equal to [0-9])
+ : between one and unlimited times
Capturing group (\.\d{2})?
? : zero or one time
\. : the literal character .
\d : a digit (equal to [0-9])
{2} : exactly 2 times
$ : end of the string
1 : match
123 : match
123.00 : match
123. : no match
123.. : no match
123.0 : no match
123.000 : no match
123.00.00 : no match
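The table above can be reproduced with a few lines of Python (chosen here only as a convenient way to run the pattern; has_two_decimals is a hypothetical name).

```python
import re

# Digits, optionally followed by a point and exactly two more digits.
TWO_DECIMALS_RE = re.compile(r"\d+(?:\.\d{2})?")


def has_two_decimals(text):
    """Return True for an integer or a number with exactly two decimal digits."""
    return TWO_DECIMALS_RE.fullmatch(text) is not None
```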
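Try this: ^[0-9]\d{0,9}(\.\d{1,3})?%?$ It allows one to ten digits, an optional decimal point followed by one to three digits, and an optional trailing % sign. It is tested and worked for me.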
Regular expression:
^\d+((\.)|(\.\d{0,1})?)$
The dot is escaped as \. so that it matches only a literal decimal point; an unescaped . would match any character. Use \d{0,2} instead of \d{0,1} if you want to allow up to two digits after the decimal point, or \d+ if you want to allow any number of digits:
^\d+((\.)|(\.\d{0,2})?)$
or
^\d+((\.)|(\.\d+)?)$
Explanation
(Adapted from the explanation generated by regex101)
^ asserts position at start of a line
\d matches a digit (equivalent to [0-9])
+ matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
1st Capturing Group ((\.)|(\.\d{0,1})?)
1st Alternative (\.)
2nd Capturing Group (\.)
\. matches the character . literally
2nd Alternative (\.\d{0,1})?
3rd Capturing Group (\.\d{0,1})?
? matches the previous token between zero and one times, as many times as possible, giving back as needed (greedy)
\. matches the character . literally
\d matches a digit (equivalent to [0-9])
{0,1} matches the previous token between zero and one times, as many times as possible, giving back as needed (greedy)
$ asserts position at the end of a line
Sandbox
Play with regex here: https://regex101.com/
(?<![^d])\d+(?:\.\d+)?(?![^d])
clean and simple.
This uses Suffix and Prefix, RegEx features.
It directly returns true - false for IsMatch condition
^\d+(()|(\.\d+)?)$
Came up with this. Allows both integer and decimal, but forces a complete decimal (leading and trailing numbers) if you decide to enter a decimal.
In Perl, use Regexp::Common which will allow you to assemble a finely-tuned regular expression for your particular number format. If you are not using Perl, the generated regular expression can still typically be used by other languages.
Printing the result of generating the example regular expressions in Regexp::Common::Number:
$ perl -MRegexp::Common=number -E 'say $RE{num}{int}'
(?:(?:[-+]?)(?:[0123456789]+))
$ perl -MRegexp::Common=number -E 'say $RE{num}{real}'
(?:(?i)(?:[-+]?)(?:(?=[.]?[0123456789])(?:[0123456789]*)(?:(?:[.])(?:[0123456789]{0,}))?)(?:(?:[E])(?:(?:[-+]?)(?:[0123456789]+))|))
$ perl -MRegexp::Common=number -E 'say $RE{num}{real}{-base=>16}'
(?:(?i)(?:[-+]?)(?:(?=[.]?[0123456789ABCDEF])(?:[0123456789ABCDEF]*)(?:(?:[.])(?:[0123456789ABCDEF]{0,}))?)(?:(?:[G])(?:(?:[-+]?)(?:[0123456789ABCDEF]+))|))
For those who wanna match the same thing as JavaScript does:
[-+]?(\d+\.?\d*|\.\d+)
Matches:
1
+1
-1
0.1
-1.
.1
+.1
Drawing: https://regexper.com/#%5B-%2B%5D%3F%28%5Cd%2B%5C.%3F%5Cd*%7C%5C.%5Cd%2B%29

Match a number in a string with letters and numbers

I need to write a Perl regex to match numbers in a word with both letters and numbers.
Example: test123. I want to write a regex that matches only the number part and capture it
I am trying this \S*(\d+)\S* and it captures only the 3 but not 123.
Regex atoms will match as much as they can.
Initially, the first \S* matched "test123", but the regex engine had to backtrack to allow \d+ to match. The result is:
  +--------------------- Matches "test12"
  |     +--------------- Matches "3"
  |     |     +--------- Matches ""
  |     |     |
 ---  -----  ---
 \S*  (\d+)  \S*
All you need is:
my ($num) = "test123" =~ /(\d+)/;
It'll try to match at position 0, then position 1, ... until it finds a digit, then it will match as many digits as it can.
The * quantifiers in your regex are greedy, which is why they also "eat" digits. Exactly what #Marc said: you don't need them.
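The backtracking behaviour is easy to observe. Here is a small sketch in Python rather than Perl (the greedy matching rules involved are the same in both; the variable names are made up for the example).

```python
import re

word = "test123"

# The leading \S* grabs everything, then gives back just enough for \d+ to match.
greedy = re.search(r"\S*(\d+)\S*", word)

# Without the surrounding \S*, the search starts matching at the first digit.
simple = re.search(r"(\d+)", word)
```

Here greedy.group(1) is "3" although the overall match is the whole of "test123", whereas simple.group(1) is "123".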
perl -e '$_ = "qwe123qwe"; s/(\d+)/$numbers=$1/e; print $numbers . "\n";'
"something122320" =~ /(\d+)/ will return 122320; this is probably what you're trying to do ;)
\S matches any non-whitespace characters, including digits. You want \d+:
my ($number) = 'test123' =~ /(\d+)/;
Were it a case where a non-digit was required (say before, per your example), you could use the following non-greedy expressions:
/\w+?(\d+)/ or /\S+?(\d+)/
(The second one is more in tune with your \S* specification.)
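To see what the non-greedy prefix does, here is a short Python sketch (Python stands in for Perl here; lazy quantifiers behave the same way, and digits_after_prefix is a hypothetical name).

```python
import re


def digits_after_prefix(text):
    """Return the digits captured after a lazily matched non-space prefix, or None."""
    # \S+? takes as few characters as it can, but it must take at least one.
    match = re.search(r"\S+?(\d+)", text)
    return match.group(1) if match else None
```

For "test123" this returns "123". One caveat: since \S also matches digits, an input of just "123" returns "23", because the prefix has to consume the first character.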
Your expression is satisfied by any string containing one or more digits, and that may be what you want. It could even be a string of digits surrounded by spaces (" 123 "), because the border between the last space and the first digit satisfies zero-or-more non-space characters, and the same is true of the border between the '3' and the following space.
Chances are that you don't need any specification and capturing the first digits in the string is enough. But when it's not, it's good to know how to specify expected patterns.
I think parentheses signify capture groups, which is exactly what you don't want. Remove them. You're looking for /\d+/ or /[0-9]+/