Regex expected to match one letter matches whole word [duplicate] - regex

I have this expression in a switch statement for a script control panel:
$ctrlinpt=read-host ">>>"
switch -regex ( $ctrlinpt )
{
<#1#>'^\w{2,}|\d{2,}$' { do this }
<#2#>'^c{1}|C{1}$' { do that }
<#3#>'^l{1}|L{1}$' { do something else }
<#4#>'^s{1}|S{1}$' { take a break }
<#5#>'^exit$'{ go home }
}
The first condition should match any input with more than one character, and it does. However, if the input starts with one of the letters in the other conditions, it also matches their cases.
For example:
"Task1" matches only case 1
"Legacy" matches case 1 and case 3
"supper" matches case 1 and case 4
I expect cases 2, 3, and 4 to match only an input of exactly the one character stated. What am I doing wrong?

The pattern '^l{1}|L{1}$' is interpreted as "a string starting with l OR a string ending with L" - and since PowerShell does case-insensitive comparisons by default, "a string starting with l" perfectly describes Legacy.
To fix this, you can nest the alternation in a grouping construct:
^(?:l{1}|L{1})$
... or use a character class:
^[lL]$
... or, since the matches are case-insensitive by default anyway, simply match on a single character literal:
^l$
Thus, your switch becomes:
$ctrlinpt = read-host ">>>"
switch -regex ( $ctrlinpt ) {
'^(?:\w{2,}|\d{2,})$' { do this }
'^c$' { do that }
'^l$' { do something else }
'^s$' { take a break }
'^exit$' { go home }
}
Since regex isn't really required for the last 4 cases, I'd suggest removing the -regex mode switch altogether and using a predicate for the first case instead:
$ctrlinpt = read-host ">>>"
switch ( $ctrlinpt ) {
{ $_ -match '^(?:\w{2,}|\d{2,})$' } { do this }
'c' { do that }
'l' { do something else }
's' { take a break }
'exit' { go home }
}
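To see the precedence issue in isolation, here is a minimal sketch in Python rather than PowerShell. The helper name matches is hypothetical, and re.IGNORECASE is used on the assumption that it approximates PowerShell's default case-insensitive matching.

```python
import re

def matches(pattern: str, text: str) -> bool:
    # re.IGNORECASE stands in for PowerShell's default case-insensitive matching.
    return re.search(pattern, text, re.IGNORECASE) is not None

# Ungrouped alternation reads as "^l" OR "L$", so a word starting with l/L matches.
print(matches(r'^l{1}|L{1}$', 'Legacy'))  # True
# Grouped alternation: both anchors now apply to every alternative.
print(matches(r'^(?:l|L)$', 'Legacy'))    # False
# A single anchored literal is enough when matching is case-insensitive.
print(matches(r'^l$', 'L'))               # True
```

The alternation operator has the lowest precedence in a regex, which is why the anchors bind to only one side unless a group is added.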

Related

Is it possible to use Regex to make the first of each word Capital, and replace underscores with spaces?

I can do each of them using regex, but I'm unsure how to combine them into a single regex to achieve both.
For example, this is what I'd like to happen:
HELLO_THERE -> Hello There
I found this, but it does the opposite of what I want: it lowercases the first letter, when I'd like it to lowercase everything AFTER the first:
function changeStr(string){
return string.replace(/(?:_| |\b)(\w)/g, function($1){return $1.toLowerCase().replace('_',' ');});
}
changeStr(HELLO_THERE) -> hELLO tHERE
Not in a single regex, no. But you can make use of an anonymous function for the replacement:
function changeStr( string ){
// Match underscores or the alphabet
return string.replace(/[A-Za-z]+|_/g, function( match ){
if( match === '_' ){
// If we have an underscore then return a space
return ' ';
}
else{
// For everything else we capitalize the first char and lowercase the rest
return match.substr(0, 1).toUpperCase()+match.substr(1).toLowerCase();
}
});
}
console.log( changeStr( 'HELLO_THERE_my_frIenD_MonkeyZeus' ) );
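For comparison, the same callback-based replacement can be sketched in Python. This is only an illustration of the technique, not part of the original answer; change_str and fix are hypothetical names, and str.capitalize() is used because it uppercases the first character and lowercases the rest.

```python
import re

def change_str(text: str) -> str:
    def fix(match):
        token = match.group()
        # Underscores become spaces; every run of letters is re-cased.
        return ' ' if token == '_' else token.capitalize()
    return re.sub(r'[A-Za-z]+|_', fix, text)

print(change_str('HELLO_THERE'))  # Hello There
```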

A regular expression to exclude a word or character at the start and end of a user inputted answer

Goal:
I have an answer box that will take inputs from a user. To reduce the amount of possible correct answers I enter into the backend I would like to use a RegExp to ignore certain words/characters at the start and at the end of the answer.
So if the answer was: mountain.
I would also accept: the mountain OR a mountain OR the mountains...
What I have tried:
bool checkAnswer(String answer) {
List _answers = widget.answers;
for (var i = 0; i < _answers.length; ++i) {
// This
_correctAnswer = RegExp("?!(the)|(an)|(a)?${widget.answers[i]}s?").hasMatch(answer);
if (_correctAnswer) {
return _correctAnswer;
}
}
return false;
}
This function looks at the user input and compares it to the answers I've entered on the backend. The problem is that it also accepts individual characters such as 't', 'h', 'e', etc.
So you want to remove a leading "the " or "a ", and a trailing plural "s".
Maybe:
bool checkAnswer(String answer, List<String> validAnswers) {
for (var i = 0; i < validAnswers.length; i++) {
var re = RegExp("^(?:the |an? )?${validAnswers[i]}s?\$", caseSensitive: false);
if (re.hasMatch(answer)) return true;
}
return false;
}
This checks that the string is one of the valid answers, optionally prefixed by "the ", "a " or "an ", and optionally suffixed by "s".
The leading ^ and trailing $ (\$ because it's in a non-raw string literal) ensure that the regexp matches the entire input (^ matches the beginning of the string and $ matches the end). The | matches either the thing before it or the thing after it (the "alternatives").
If you want to match more different suffixes, say "es" or ".", you change the s? to accept those too:
var re = RegExp("^(?:the |an? )?${validAnswers[i]}(?:e?s)?\\.?\$", caseSensitive: false);
I'm guessing you want to accept s or es and, independently of that, a trailing ".".
Your original RegExp was not valid. If it runs at all, I cannot predict what it would do. (For example, a RegExp must not start with ?).
I recommend reading a RegExp tutorial.
This is how I ended up solving the problem:
Ignore at the start: the, an, a
Ignore at the end: s, es and .
bool checkAnswer(String answer) {
List _answers = widget.answers;
for (var i = 0; i < _answers.length; i++) {
var re = RegExp("^(?:the |an |a )?${_answers[i]}(?:s|es)?\\.?\$", caseSensitive: false);
if (re.hasMatch(answer)) return true;
}
return false;
}
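The same check can be sketched in Python to make the pattern's parts easier to test in isolation. This is an illustration under assumptions, not the Dart code above: check_answer and its parameters are hypothetical names, and re.escape is an addition that keeps any regex metacharacters in a stored answer literal.

```python
import re
from typing import List

def check_answer(answer: str, valid_answers: List[str]) -> bool:
    for valid in valid_answers:
        # Optional article, the escaped answer, optional s/es, optional literal dot.
        pattern = rf'(?:the |an? )?{re.escape(valid)}(?:e?s)?\.?'
        # fullmatch requires the whole input to match, like ^...$ in the Dart version.
        if re.fullmatch(pattern, answer, re.IGNORECASE):
            return True
    return False

print(check_answer('The mountains.', ['mountain']))  # True
print(check_answer('t', ['mountain']))               # False
```

Without the anchoring (or fullmatch), a single letter such as "t" could match one of the optional pieces on its own, which is the symptom described in the question.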

How to check if a Jenkins param contains a character

I am trying to check if my Jenkins parameter contains a hostname.
But when I use Regular Expressions to see if it contains the name it doesn't check.
I would guess I have an error in the way I am checking or how I have it wrapped in brackets.
Below is a sample of what I am working with
stage('Release 1') {
when {
expression { params.SECRET_NAME != "" && params.STAGING_ENV != ("*some.host.name*") }
}
steps {
echo "Release 1"
}
}
stage('Release 2') {
when {
expression {params.STAGING_ENV == ("*some.host.name*") && params.SECRET_NAME == ("*+*") }
}
steps {
echo "Release 2"
}
}
}
I want it to skip the stage in my Jenkins pipeline if it does not meet the conditions
Ok, you need multiple changes here, from inside out:
Replace the * with .*. In regex, * repeats the preceding character or group zero or more times (abc* matches ab, abc, and abccccc), whereas .* matches any character any number of times (abc.* matches abccccc, abcdefg, abcadkhsdalksd, etc.).
Remove the double quotes " surrounding the regex patterns; otherwise they are interpreted as plain string literals.
Wrap the regex patterns in delimiters, usually /, to mark where each pattern starts and ends.
The brackets () themselves are optional here.
To match regular expressions, replace the equality operator == with the match operator ==~ (strict: the entire string must match), which returns a boolean.
There is no "NOT match" operator in Groovy. To invert the match, you need to negate the result of the entire expression.
If the + in *+* should be a literal, then you must escape it as \+.
Stitching these together, your pipeline should look like:
stage('Release 1') {
when {
expression {
params.SECRET_NAME != "" && !(params.STAGING_ENV ==~ /.*some.host.name.*/)
}
}
steps {
echo "Release 1"
}
}
stage('Release 2') {
when {
expression {
params.STAGING_ENV ==~ /.*some.host.name.*/ && params.SECRET_NAME ==~ /.*\+.*/
}
}
steps {
echo "Release 2"
}
}
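The reason the pattern needs .* on both sides is that a strict match must cover the entire string. A rough Python analogy, offered as a sketch rather than as Groovy semantics verbatim: re.fullmatch behaves like the strict operator and re.search like a find-anywhere match. The host value and helper names are hypothetical, and the dots are escaped here so they match literally.

```python
import re

def full_match(pattern: str, value: str) -> bool:
    # Comparable to a strict match: the whole string must match the pattern.
    return re.fullmatch(pattern, value) is not None

def find(pattern: str, value: str) -> bool:
    # A match anywhere in the string is enough.
    return re.search(pattern, value) is not None

host = 'db1.some.host.name.example'  # hypothetical parameter value

print(full_match(r'some\.host\.name', host))      # False
print(full_match(r'.*some\.host\.name.*', host))  # True
print(find(r'some\.host\.name', host))            # True
```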
Further reading:
http://docs.groovy-lang.org/latest/html/documentation/core-operators.html
http://web.mit.edu/hackl/www/lab/turkshop/slides/regex-cheatsheet.pdf

Regex performance: validating alphanumeric characters

When trying to validate that a string is made up of alphabetic characters only, two possible regex solutions come to my mind.
The first one checks that every character in the string is alphabetic:
/^[a-z]+$/
The second one tries to find a character somewhere in the string that is not alphabetic:
/[^a-z]/
(Yes, I could use character classes here.)
Is there any significant performance difference for long strings?
(If anything, I'd guess the second variant is faster.)
Just by looking at it, I'd say the second method is faster.
However, I made a quick non-scientific test, and the results seem to be inconclusive:
Regex Match vs. Negation.
P.S. I removed the group capture from the first method. It's superfluous, and would only slow it down.
Wrote this quick Perl code:
@testStrings = qw(asdfasdf asdf as aa asdf as8up98;n;kjh8y puh89uasdf ;lkjoij44lj 'aks;nasf na ;aoij08u4 43[40tj340ij3 ;salkjaf; a;lkjaf0d8fua ;alsf;alkj
a a;lkf;alkfa as;ldnfa;ofn08h[ijo ok;ln n ;lasdfa9j34otj3;oijt 04j3ojr3;o4j ;oijr;o3n4f;o23n a;jfo;ie;o ;oaijfoia ;aosijf;oaij ;oijf;oiwj;
qoeij;qwj;ofqjf08jf0 ;jfqo;j;3oj4;oijt3ojtq;o4ijq;onnq;ou4f ;ojfoqn;aonfaoneo ;oef;oiaj;j a;oefij iiiii iiiiiiiii iiiiiiiiiii);
print "test 1: \n";
foreach my $i (1..1000000) {
foreach (@testStrings) {
if ($_ =~ /^([a-z])+$/) {
#print "match"
} else {
#print "not"
}
}
}
print `date` . "\n";
print "test 2: \n";
foreach my $j (1..1000000) {
foreach (@testStrings) {
if ($_ =~ /[^a-z]/) {
#print "match"
} else {
#print "not"
}
}
}
then ran it with:
date; <perl_file>; date
It isn't 100% scientific, but it gives us a good idea. The first regex took 10 or 11 seconds to execute; the second took 8 seconds.
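A comparable measurement can be sketched in Python with timeit. This is an assumption-laden illustration, not a port of the Perl test: the function names and the sample string are hypothetical, and timings will vary by regex engine and input, so no numbers are claimed here.

```python
import re
import timeit

ALL_LETTERS = re.compile(r'^[a-z]+$')
NON_LETTER = re.compile(r'[^a-z]')

def valid_by_match(s: str) -> bool:
    # Valid when the whole string consists of one or more lowercase letters.
    return ALL_LETTERS.search(s) is not None

def valid_by_negation(s: str) -> bool:
    # Valid when no character outside a-z can be found.
    return NON_LETTER.search(s) is None

if __name__ == '__main__':
    # Hypothetical worst case: the only offending character is the last one.
    sample = 'a' * 10_000 + '8'
    for fn in (valid_by_match, valid_by_negation):
        seconds = timeit.timeit(lambda: fn(sample), number=1_000)
        print(f'{fn.__name__}: {seconds:.3f}s')
```

Note that the two checks are not strictly equivalent: the first rejects an empty string, while the second accepts it because there is no offending character to find.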

Why does my regex fail when the number ends in 0?

This is a really basic regex question but since I can't seem to figure out why the match is failing in certain circumstances I figured I'd post it to see if anyone else can point out what I'm missing.
I'm trying to pull out the 2 sets of digits from strings of the form:
12309123098_102938120938120938
1321312_103810312032123
123123123_10983094854905490
38293827_1293120938129308
I'm using the following code to process each string:
if($string && $string =~ /^(\d)+_(\d)+$/) {
if(IsInteger($1) && IsInteger($2)) { print "success ('$1','$2')"; }
else { print "fail"; }
}
Where the IsInteger() function is as follows:
sub IsInteger {
my $integer = shift;
if($integer && $integer =~ /^\d+$/) { return 1; }
return;
}
This function seems to work most of the time but fails on the following for some reason:
1287123437_1268098784380
1287123437_1267589971660
Any ideas on why these fail while others succeed? Thanks in advance for your help!
This is an add-on to the answers from unicornaddict and ZyX: what are you trying to match?
If you're trying to match the sequences left and right of '_', unicornaddict is correct and your regex needs to be ^(\d+)_(\d+)$. Also, you can get rid of the first condition ($string &&) and the IsInteger() function altogether: you already know it's an integer, because it matched (\d+).
if ($string =~ /^(\d+)_(\d+)$/) {
print "success ('$1','$2')";
} else {
print "fail\n";
}
If you're trying to match the last digit in each and wondering why it's failing, it's the first check in IsInteger() (if($integer &&). It's redundant anyway (you know it's an integer) and fails on 0 because, as ZyX notes, it evaluates to false.
Same thing applies though:
if ($string =~ /^(\d)+_(\d)+$/) {
print "success ('$1','$2')";
} else {
print "fail\n";
}
This will output success ('8','8') given the input 12309123098_102938120938120938
Because the second number ends in 0, and (\d)+ stores only the last repetition in the $N variable, $2 is the string "0", which Perl treats as false.
When in doubt, check what your regex is actually capturing.
use strict;
use warnings;
my @data = (
'1321312_103810312032123',
'123123123_10983094854905490',
);
for my $s (@data){
print "\$1=$1 \$2=$2\n" if $s =~ /^(\d)+_(\d)+$/;
# Output:
# $1=2 $2=3
# $1=3 $2=0
}
You probably intended the second of these two approaches.
(\d)+ # Repeat a regex group 1+ times,
# capturing only the last instance.
(\d+) # Capture 1+ digits.
In addition, both in your main loop and in IsInteger (which seems unnecessary, given the initial regex in the main loop), you are testing for truth rather than something more specific, such as defined or length. Zero, for example, is a valid integer but false.
Shouldn't + be included in the grouping:
^(\d+)_(\d+)$ instead of ^(\d)+_(\d)+$
Many people have commented on your regex, but the actual problem is in your IsInteger (which you really don't need for your example). You checked for "truth" when you really want to check for defined:
sub IsInteger {
my $integer = shift;
if( defined $integer && $integer =~ /^\d+$/) { return 1; }
return;
}
You don't need most of the infrastructure in that subroutine though:
sub IsInteger {
defined $_[0] && $_[0] =~ /^\d+$/
}
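The difference between (\d)+ and (\d+) can also be checked quickly outside Perl. The sketch below uses Python, where a repeated group likewise keeps only its last repetition; the function names are hypothetical. One caveat: Python treats the string "0" as true, so the truthiness pitfall itself is specific to Perl and is not reproduced here.

```python
import re

def last_digits(s: str):
    # (\d)+ repeats a one-digit group; only the final repetition is kept.
    m = re.fullmatch(r'(\d)+_(\d)+', s)
    return m.groups() if m else None

def whole_numbers(s: str):
    # (\d+) captures the full run of digits.
    m = re.fullmatch(r'(\d+)_(\d+)', s)
    return m.groups() if m else None

print(last_digits('1287123437_1268098784380'))    # ('7', '0')
print(whole_numbers('1287123437_1268098784380'))  # ('1287123437', '1268098784380')
```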