Match parentheses in PowerShell using regex

I'm trying to check for invalid filenames. I want a filename to only contain lowercase, uppercase, numbers, spaces, periods, underscores, dashes and parentheses. I've tried this regex:
$regex = [regex]"^([a-zA-Z0-9\s\._-\)\(]+)$"
$text = "hel()lo"
if($text -notmatch $regex)
{
write-host 'not valid'
}
I get this error:
Error: "parsing "^([a-zA-Z0-9\s\._-\)\(]+)$" - [x-y] range in reverse order"
What am I doing wrong?

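Move the - to the end of the character class:
^([a-zA-Z0-9\s\._\)\(-]+)$
In the middle of a character class an unescaped - defines a range. Here _-\) is read as the range from _ to ), which is in reverse order, so the - must be escaped or moved to the start or end of the class.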
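The same pitfall can be reproduced outside PowerShell. Here is a minimal sketch in Python (not the asker's language; the names `compiles` and `is_valid_filename` are made up for illustration), showing that the hyphen is treated as a range operator in the middle of the class and as a literal at the end:

```python
import re

# Hyphen between "_" and "\)" is parsed as a range from "_" to ")", which is reversed.
BROKEN = r"^([a-zA-Z0-9\s\._-\)\(]+)$"
# Hyphen moved to the end of the class, where it is a literal character.
FIXED = r"^([a-zA-Z0-9\s\._\)\(-]+)$"


def compiles(pattern):
    """Return True if the pattern compiles, False if the regex engine rejects it."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


def is_valid_filename(name):
    """Hypothetical helper: True if name contains only the allowed characters."""
    return re.fullmatch(FIXED, name) is not None
```

Python rejects the first pattern with a "bad character range" error, which is its counterpart of the .NET "[x-y] range in reverse order" message.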

You can replace a-zA-Z0-9 and _ with \w.
$regex = [regex]"^([\w\s\.\-\(\)]+)$"
From get-help about_Regular_Expressions:
\w
Matches any word character.
Equivalent to the Unicode
character categories [\p{Ll}
\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}].
If ECMAScript-compliant behavior
is specified with the ECMAScript
option, \w is equivalent to
[a-zA-Z_0-9].
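Python's re module has a similar split between a Unicode-aware \w and an ASCII-only one, so the difference is easy to try out there. This is only an analogy: the exact Unicode categories covered by .NET and Python may differ, and `is_word_only` is a made-up helper name.

```python
import re


def is_word_only(text, ascii_only=False):
    """True if text consists solely of \\w characters.

    With ascii_only=True, \\w is restricted to [a-zA-Z0-9_] (re.ASCII),
    roughly comparable to the ECMAScript option described above.
    """
    flags = re.ASCII if ascii_only else 0
    return re.fullmatch(r"\w+", text, flags) is not None
```

With the default flags an accented name such as "café" passes; with ascii_only=True it is rejected. That matters if the filename rule is really meant to be limited to a-z, A-Z and 0-9.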

I guess, add a backslash before the lone hyphen:
$regex = [regex]"^([a-zA-Z0-9\s\._\-\)\(]+)$"

My regexp has anorexia

I'm trying to get multiple key/value pairs from a string where the key is on the left of an = character and the value is on the right. So the following code
$line = <<END;
names='bob,jane, Alexander the Great' colors = "red,green" test= %results
END
my %hash = ($line =~ m/(\w+)\s*=\s*(.+?)/g);
for (keys %hash) { print "$_: $hash{$_}\n"; }
Should output
names: 'bob,jane, Alexander the Great'
colors: "red,green"
test: %results
But my regexp is just returning the first character of the value like
names: '
colors: "
and so on. If I change the second match to (.+) then it matches the whole line after the first =. Can someone fix this regexp?
Because .+? is non-greedy: with no pattern following it, it stops as soon as it has matched a single character. Give it something to stop at, for example a lookahead:
my %hash = ($line =~ m/(\w+)\s*=\s*(.+?)(?=\h+\w+\h*=|$)/gm);
(?=\h+\w+\h*=|$) is a positive lookahead, which asserts that the match must be followed by
\h+ one or more horizontal spaces.
\w+ one or more word characters.
\h* zero or more horizontal spaces.
= equal symbol.
| OR
$ End of the line anchor.
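The lookahead approach carries over to other engines. Below is a rough Python sketch of it; Python's re has no \h, so [ \t] is used as a stand-in for "horizontal space" (that substitution and the name `parse_pairs_lookahead` are mine, not part of the answer above).

```python
import re

LINE = """names='bob,jane, Alexander the Great' colors = "red,green" test= %results"""

# Lazy value, terminated by a lookahead for the next "key =" or the end of the line.
PAIR = re.compile(r"(\w+)\s*=\s*(.+?)(?=[ \t]+\w+[ \t]*=|$)", re.MULTILINE)


def parse_pairs_lookahead(line):
    """Return a dict of key -> raw value (quotes are kept)."""
    return dict(PAIR.findall(line))
```

The lazy .+? keeps growing one character at a time until the lookahead succeeds, which is why the value no longer stops after its first character.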
.+? says match one or more non-newline characters, preferring as few as possible.
You want .+ which matches one or more non-newline characters, preferring as many as possible.
Then it looks like you also need to stop at a matching quote, so
/(\w+)\s*=\s*('.+?'|".+?"|.+)/g
Though if spaces aren't allowed in unquoted values, you want \S+ instead of .+
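For comparison, here is a small Python sketch of the alternation idea, using the \S+ variant for unquoted values. The sample line is the one from the question; `parse_pairs_quoted` is a hypothetical name.

```python
import re

LINE = """names='bob,jane, Alexander the Great' colors = "red,green" test= %results"""

# A value is a single-quoted run, a double-quoted run, or a run of non-space characters.
PAIR = re.compile(r"""(\w+)\s*=\s*('.+?'|".+?"|\S+)""")


def parse_pairs_quoted(line):
    """Return a dict of key -> raw value (quotes are kept)."""
    return dict(PAIR.findall(line))
```

Here the lazy .+? is safe because the closing quote right after it gives the engine something to stop at.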

Why do these two regexes behave differently?

Why do the following two regexes behave differently?
$millisec = "1391613310.1";
$millisec =~ s/.*(\.\d+)?$/$1/;
vs.
$millisec =~ s/\d*(\.\d+)?$/$1/;
This code prints nothing:
perl -e 'my $mtime = "1391613310.1"; my $millisec = $mtime; $millisec =~ s/.*(\.\d+)?$/$1/; print "$millisec";'
While this prints the decimal portion of the string:
perl -e 'my $mtime = "1391613310.1"; my $millisec = $mtime; $millisec =~ s/\d*(\.\d+)?$/$1/; print "$millisec";'
In the first regex, the .* is taking up everything to the end of the string, so there's nothing the optional (.\d+)? can pick up. $1 will be empty, so the string is replaced by an empty string.
In the second regex, only digits are grabbed from the beginning so that \d* stops in front of the dot. (.\d+)? will pick the dot, including the trailing digits.
You're using .\d+ inside parentheses, which will match any character plus digits. If you want to match a dot explicitly, you have to use \..
To make the first regex behave similarly to the second one you would have to write
$millisec =~ s/.*?(\.\d+)?$/$1/;
so that the initial .* doesn't take up everything.
Greed.
Perl's regex engine will match as much as possible with each term before moving on to the next term. So for .*(.\d+)?$ the .* matches the entire string, then (.\d+)? matches nothing, which is allowed because it is optional.
\d*(.\d+)?$ can match only up to the dot, so (.\d+)? then has to match the .1
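The greedy/lazy difference is not specific to Perl. Here is a quick Python sketch that checks what the optional group ends up capturing for each of the three patterns discussed (the helper name `captured_fraction` is made up):

```python
import re


def captured_fraction(pattern, text):
    """Return what group 1 captured, or "" if the optional group took no part in the match."""
    m = re.search(pattern, text)
    if m is None:
        return ""
    return m.group(1) or ""


MTIME = "1391613310.1"

greedy_any = captured_fraction(r".*(\.\d+)?$", MTIME)    # .* swallows everything
digits_only = captured_fraction(r"\d*(\.\d+)?$", MTIME)  # \d* has to stop at the dot
lazy_any = captured_fraction(r".*?(\.\d+)?$", MTIME)     # .*? gives the group a chance
```

Only the first pattern leaves the group empty, because nothing forces .* to give back the characters it has already consumed.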

Regex to remove whatever comes in front of "\" using PowerShell

I need a regex that removes a "\" and whatever comes before it.
The input is "vmvalidate\administrator"
and the output should be just "administrator"
$result = $subject -creplace '^[^\\]*\\', ''
This removes any run of non-backslash characters at the start of the string, together with the backslash that follows.
Explanation:
^ # Start of string
[^\\]* # Match zero or more non-backslash characters
\\ # Match a backslash
This means that if there is more than one backslash in the string, only the first one (and the text leading up to it) will be removed. If you want to remove everything until the last backslash, use
$result = $subject -creplace '(?s)^.*\\', ''
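To see the difference between the two patterns on a string with more than one backslash, here is a small Python sketch using the same regexes (Python rather than PowerShell; the function names are hypothetical):

```python
import re


def strip_to_first_backslash(s):
    """Remove everything up to and including the first backslash."""
    return re.sub(r"^[^\\]*\\", "", s, count=1)


def strip_to_last_backslash(s):
    """Remove everything up to and including the last backslash."""
    return re.sub(r"(?s)^.*\\", "", s, count=1)
```

For "vmvalidate\administrator" both give "administrator"; for a path-like value such as a\b\c the first keeps b\c and the second keeps only c.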
No need to use regex, try the split method:
$string.Split('\')[-1]
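The regex-free route looks much the same in other languages. Here is a sketch in Python, assuming plain string methods are acceptable (the function names are made up): taking the last piece after splitting keeps what follows the last backslash, while partitioning keeps what follows the first one.

```python
def after_last_backslash(s):
    """Like Split('\\')[-1]: the text after the last backslash, or s itself if there is none."""
    return s.rsplit("\\", 1)[-1]


def after_first_backslash(s):
    """The text after the first backslash, or s itself if there is none."""
    head, sep, tail = s.partition("\\")
    return tail if sep else s
```

The two only differ when the input contains more than one backslash.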
"vmvalidate\administrator" -replace "^.*?\\"
^ - from the beginning of the string
.* - any number of any characters
? - makes the quantifier lazy
\\ - a literal backslash, escaped with the escape character "\"
All together it means "replace all characters from the beginning of the string up to and including the first backslash".
This is the way I used to do things before I learned about regex or splitting.
"vmvalidate\administrator".SubString("vmvalidate\administrator".IndexOf('\')+1)

How can I allow a literal dot in a Perl regular expression?

I use this condition to check whether the value is alphanumeric:
$value =~ /^[a-zA-Z0-9]+$/
How can I modify this regex to account for a possible dot . in the value without accepting any other special characters?
$value =~ /^[a-zA-Z0-9.]+$/
Using the alnum POSIX character class, one char shorter :)
$value =~ /^[[:alnum:].]+$/;
Don't forget the /i option and the \d character class.
$value =~ /^[a-z\d.]+$/i
If you don't want to allow any characters other than those allowed in the character class, you shouldn't use the $ end of line anchor since that allows a trailing newline. Use the absolute end-of-string anchor \z instead:
$value =~ /^[a-z0-9.]+\z/i;
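The trailing-newline behaviour of $ can be checked quickly. Here is a sketch in Python, where the end-of-string anchor that works across versions is spelled \Z rather than Perl's \z (the constant names are made up):

```python
import re

# "$" also matches just before a newline at the very end of the string.
LOOSE = re.compile(r"^[a-z0-9.]+$", re.IGNORECASE)
# "\Z" matches only at the very end of the string.
STRICT = re.compile(r"^[a-z0-9.]+\Z", re.IGNORECASE)

value_with_newline = "abc.1\n"
loose_accepts = LOOSE.search(value_with_newline) is not None
strict_accepts = STRICT.search(value_with_newline) is not None
```

The loose pattern accepts the value with the trailing newline and the strict one rejects it, which is the distinction the \z advice is about.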
Look at Perl regular expressions:
\w Match "word" character (alphanumeric plus "_")
$value =~ /^[\w.]+$/;
Note that \w also accepts the underscore.

Removing nonnumeric and nonalpha characters from a string?

What is the best way to remove all the special characters from a string - like these:
!##$%^&*(){}|:"?><,./;'[]\=-
The strings these characters are removed from would be rather short, so would it be better to use a regex on each or just use string manipulation?
Thx
Environment == C#/.NET
It's generally better to have a whitelist than a blacklist.
Regex has a convenient \w that effectively means alphanumeric plus underscore (some variants also add accented characters such as á, é, ô to the list; others don't).
You can invert that by using \W to mean everything that's not alphanumeric or underscore.
So replacing \W with the empty string will remove all 'special' characters.
Alternatively, if you do need a different set of characters to alphanumeric, you can use a negated character class: [^abc] will match everything that is not a or b or c, and [^a-z] will match everything that is not in the range a,b,c,d...x,y,z
The equivalent to \w is [A-Za-z0-9_] and thus \W is [^A-Za-z0-9_]
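As a rough illustration of the whitelist idea, here is a Python sketch (the question is about C#/.NET, so treat this as an analogy; the function names are hypothetical) that keeps only the allowed characters by replacing everything in the negated class:

```python
import re


def keep_word_chars(text):
    """Keep only A-Z, a-z, 0-9 and underscore."""
    return re.sub(r"[^A-Za-z0-9_]", "", text)


def keep_alnum(text):
    """Keep only A-Z, a-z and 0-9 (underscore is removed too)."""
    return re.sub(r"[^A-Za-z0-9]", "", text)
```

Spelling the class out makes the allowed set explicit and avoids surprises from engines whose \w also covers accented letters.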
in php:
$tests = array(
'hello, world!'
,'this is a test'
,'and so is this'
,'another test with /slashes/ & (parenthesis)'
,'l3375p34k stinks'
);
function strip_non_alphanumerics( $subject )
{
return preg_replace( '/[^a-z0-9]/i', '', $subject );
}
foreach( $tests as $test )
{
printf( "%s\n", strip_non_alphanumerics( $test ) );
}
output would be:
helloworld
thisisatest
andsoisthis
anothertestwithslashesparenthesis
l3375p34kstinks
I prefer regex because the syntax is simpler to read and maintain:
# in Python
import re
re.sub("[abcdef]", "", text)
where abcdef are the properly escaped characters to be removed.
Alternatively, if you want only alphanumeric characters (plus the underscore), you could use:
re.sub(r"\W", "", text)
where \W represents a non-word character, i.e. [^a-zA-Z_0-9].
Here's a simple regex:
[^\w]
This matches every non-word character, so replacing its matches leaves a-z, A-Z, 0-9 and _ in place. A space is not a word character, so it is matched as well. _ was not in your list, so this works; if you want to catch _ too, I would do something like this:
/[^a-z0-9]/i
This is the PHP format for everything except a-z and 0-9; the i makes it case-insensitive.
When you just want to have alphanumeric characters, you could just express this by using an inverted character class:
[^A-Za-z0-9]+
This means: every character that is not alphanumeric.
In what language are you doing the regex?
For example, in Perl you can do a transliteration that deletes any of the chars in your list:
e.g. this will delete every 'a', 'b', 'c' or 'd' (the /d modifier is required; without it nothing is removed):
$sentence =~ tr/abcd//d;
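For comparison, character-by-character deletion without a regex can be sketched in Python with str.translate (`delete_chars` is a made-up helper name):

```python
def delete_chars(text, unwanted):
    """Return text with every character listed in unwanted removed."""
    # The third argument of str.maketrans lists characters to map to None (deleted).
    return text.translate(str.maketrans("", "", unwanted))
```

For example, delete_chars("abracadabra", "abcd") leaves "rr". Because the unwanted characters are passed as a plain string, none of them need regex escaping.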
Use the "tr" command?
You don't say what environment you're in... shell? C program? Java? Each of those would have different best solutions.
You could instead validate them at the front end by checking the ASCII values of the characters as they are keyed in.
The ideal approach in PHP would be...
$text = "ABCDEF...Á123";
$text = preg_replace( '/[^\p{L}]/u', '', $text); // the u modifier makes PCRE treat the string as UTF-8
print($text); # Output: ABCDEFÁ
Or, in Perl...
use utf8; # the source contains a literal Á
binmode(STDOUT, ':encoding(UTF-8)');
my $text = "ABCDEF...Á123";
$text =~ s/[^\p{L}]//g;
print($text); # Output: ABCDEFÁ
If you simply match on [^a-zA-Z], you will miss all accented characters, which (for the most part), I imagine you would want to retain.
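Python's built-in re module does not support \p{L}, but str.isalpha() covers the Unicode letter categories, so a comparable letters-only filter can be sketched like this (`keep_letters` is a hypothetical name, and the sketch assumes the accented characters are precomposed):

```python
def keep_letters(text):
    """Keep only characters that Unicode classifies as letters; digits and punctuation go."""
    return "".join(ch for ch in text if ch.isalpha())
```

Like the \p{L} examples, this drops digits as well as punctuation, so it is only a fit when letters alone should survive.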