Validating file name input in Powershell - regex

I would like to validate file name input in PowerShell and check whether it contains invalid characters. I tried the following approach, which works when just one of these characters is entered, but doesn't seem to work when an alphanumeric string contains them. I believe I didn't construct the regex correctly. What would be the right way to validate whether a given string contains these characters? Thanks in advance.
#Validate file name whether it contains invalid characters: \ / : * ? " < > |
$filename = "\?filename.txt"
if($filename -match "^[\\\/\:\*\?\<\>\|]*$")
{Write-Host "$filename contains invalid characters"}
else
{Write-Host "$filename is valid"}

I would use Path.GetInvalidFileNameChars() rather than hardcoding the characters in a regex pattern, and then use the String.IndexOfAny() method to test if the file name contains any of the invalid characters:
function Test-ValidFileName
{
param([string]$FileName)
$IndexOfInvalidChar = $FileName.IndexOfAny([System.IO.Path]::GetInvalidFileNameChars())
# IndexOfAny() returns the value -1 to indicate no such character was found
return $IndexOfInvalidChar -eq -1
}
and then:
$filename = "\?filename.txt"
if(Test-ValidFileName $filename)
{
Write-Host "$filename is valid"
}
else
{
Write-Host "$filename contains invalid characters"
}
If you don't want to define a new function, this could be simplified as:
if($filename.IndexOfAny([System.IO.Path]::GetInvalidFileNameChars()) -eq -1)
{
Write-Host "$filename is valid"
}
else
{
Write-Host "$filename contains invalid characters"
}

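To fix the regex:
Remove the ^ and $, which anchor the pattern to both ends of the string, and drop the * quantifier as well; with * the pattern can match the empty string, so every name would be reported as invalid. The character class should also include the double quote, which the comment lists but the pattern omits.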
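To see why the anchors matter, here is a minimal sketch of the same check. It is written in Python rather than PowerShell purely for illustration; the names INVALID_CHARS, ANCHORED and has_invalid_chars are made up for this example and are not part of any answer above.

```python
import re

# Characters the question lists as invalid: \ / : * ? " < > |
INVALID_CHARS = re.compile(r'[\\/:*?"<>|]')

# Equivalent of the question's pattern: anchored at both ends and quantified
# with '*', so it only matches strings made up entirely of invalid characters
# (or the empty string).
ANCHORED = re.compile(r'^[\\/:*?<>|]*$')


def has_invalid_chars(name):
    """Return True if any invalid character appears anywhere in name."""
    return INVALID_CHARS.search(name) is not None


print(has_invalid_chars(r"\?filename.txt"))            # invalid characters present
print(has_invalid_chars("filename.txt"))               # clean name
print(ANCHORED.search(r"\?filename.txt") is not None)  # anchored pattern misses it
```

The unanchored character class without a quantifier asks "is there at least one such character anywhere?", which is the question actually being posed.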

Related

PowerShell regex to get just hex part in strings

I'm working on a function that builds a map from a string key to its hex value. I got the string key part working, but I'm having trouble getting the hex part to work. This is my function so far:
function Get-Contents4_h{
[cmdletbinding()]
Param ([string]$fileContent)
#define Error_Failed_To_Do_A 0x81A0 /* random comments */
#define Error_Failed_To_Do_B 0x810A
# create an ordered hashtable to store the results
$errorMap = [ordered]@{}
# process the lines one-by-one
switch -Regex ($fileContent -split '\r?\n') {
'define ([\w]*)' { # Error_Failed_To_Do_ #this works fine
$key = ($matches[1]).Trim()
}
'([0x\w]*)' { # 0x04A etc #this does not work
$errorMap[$key] = ($matches[1]).Trim()
}
}
# output the completed data as object
#[PsCustomObject]$errorMap
return $errorMap
}
I'm going to be looping through the returned map and matching the hex value with the key in another object.
This is what the string parameter to the function looks like:
#define Error_Failed_To_Do_A 0x81A0 /* random comments */
#define Error_Failed_To_Do_B 0x810A
For some reason my 0x\w regex is not returning anything in regex101.com. I've had luck with that with other hex numbers but not this time.
I've tried this and other variations as well: ^[\s\S]*?#[\w]*[\s\S]+([0x\w]*)
This is with powershell 5.1 and VS Code.
You need to remove the [...] character class around 0x\w. The 0x occurs exactly once in the input string and is followed by at least one more character, but the expression [0x\w]* can be satisfied by an empty string (because * is the 0-or-more quantifier), so it matches nothing at the very start of the line.
I'd suggest matching the whole line at once with a single pattern instead:
switch -Regex ($fileContent -split '\r?\n') {
'^\s*#define\s+(\w+)\s+(0x\w+)' {
$key,$value = $Matches[1,2] |ForEach-Object Trim
$errorMap[$key] = $value
}
}
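As a rough cross-check of the single-pattern idea, here is a sketch in Python (used only for illustration, not PowerShell). The names DEFINE_RE and parse_defines are hypothetical, and the hex part uses [0-9A-Fa-f]+ instead of \w+, which is a slightly stricter variation and not what the answer above wrote.

```python
import re

# One pattern for the whole line: "#define", the key, then the hex literal.
DEFINE_RE = re.compile(r'^\s*#define\s+(\w+)\s+(0x[0-9A-Fa-f]+)')


def parse_defines(text):
    """Map each #define name to its hex literal, in file order."""
    result = {}
    for line in text.splitlines():
        m = DEFINE_RE.match(line)
        if m:
            result[m.group(1)] = m.group(2)
    return result


sample = (
    "#define Error_Failed_To_Do_A 0x81A0 /* random comments */\n"
    "#define Error_Failed_To_Do_B 0x810A"
)

print(parse_defines(sample))

# The character-class version matches the empty string at position 0,
# because '#' is not in the class and '*' allows zero repetitions.
print(repr(re.search(r'[0x\w]*', sample).group(0)))
```

Matching the key and value in one pass also avoids the original problem of a second switch clause firing on every line.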
This works for me. The square brackets match any one character inside them at a time. The pattern with the square brackets has 18 matches in this line, the first match being empty string ''. Regex101.com says the same thing (null). https://regex101.com/r/PZ8Y8C/1 This would work 0x[\w]*, but then you might as well drop the brackets. I made an example data file and then a script on how I would do it.
'#define Error_Failed_To_Do_A 0x81A0 /* random comments */' |
select-string [0x\w]* -AllMatches | % matches | measure | % count
18
'#define Error_Failed_To_Do_A 0x81A0 /* random comments */
#define Error_Failed_To_Do_B 0x810A' |
set-content file.txt
# Get-Contents4_h.ps1
Param ($file)
switch -Regex -File $file {
'define (\w+).*(0x\w+)' {
[pscustomobject]@{
Error = $matches[1]
Hex = $matches[2]
}
}
}
.\Get-Contents4_h file.txt
Error Hex
----- ---
Error_Failed_To_Do_A 0x81A0
Error_Failed_To_Do_B 0x810A

How can I properly stop and start metacharacter interpolation in regexp in Perl

Editing to be more concise, pardon.
I need to be able to grep from an array using a string that may contain any of the following characters: '.', '+', '/', '-'. The string is captured from the user. The array contains each line of the file I'm searching through (I read the file into the array to avoid keeping it open while the user is interacting with the program, because it is on a cron and I do not want to have it open when the cron runs), and each line has a unique identifier within it, which is the basis for the search string used in the regexp. The code below shows the grep statement I am using. I use our and my in my programs so that the variables I want access to in all namespaces are available, and the ones I use only in subroutines are not. If you do want to try and replicate the issue:
#!/usr/bin/perl -w
use strict;
use Switch;
use Data::Dumper;
our $pgm_path = "/tmp/";
our $device_info = "";
our @new_filetype1 = ();
our @new_filetype2 = ();
our @dev_info = ();
our @pgm_files = ();
our %arch_rtgs = ();
our $file = "/path/file.csv";
open my $fh, '<', $file or die "Couldn't open $file!\n";
chomp(our @source_file = <$fh>);
close $fh;
print "Please enter the device name:\n";
chomp(our $dev = <STDIN>);
while ($device_info eq "") {
# Grep the device info from the sms file
my @sms_device = grep(/\Q$dev\E/, @source_file);
if (scalar(@sms_device) > 1) {
my $which_dup = find_the_duplicate(\@sms_device);
if ($which_dup eq "program") {
print "\n-> $sms_dev <- must be a program name instead of a device name." .
"\nChoose the device from the list you are working on, specifically.\n";
foreach my $fix(@sms_device) {
my @fix_array = split(',', $fix);
print "$fix_array[1]\n";
undef @fix_array;
}
chomp($sms_dev = <STDIN>);
} else { $device_info = $which_dup; }
} elsif (scalar(@sms_device) == 1) {
($device_info) = @sms_device;
@sms_device = ();
}
}
When I try the code with an anchor:
my @sms_device = grep(/\Q$dev\E^/, @source_file);
No more activity from the program is noticed. It just sits there like it's waiting on some more input from the user. This is not what I expected to happen. The reason I would like to anchor the search pattern is because there are many, many examples of similarly named devices that have the same character order as the search pattern, but also include additional characters that are ignored in the regexp evaluation. I don't want them to be ignored, in the sense that they are included in matches. I want to force an exact match of the string in the variable.
Thanks in advance for wading through my terribly inexperienced code and communication attempts at detailing my problem.
The device id followed by the start of the string? /\Q$dev\E^/ makes no sense. You want the device id to be preceded by the start of the string and followed by the end of the string.
grep { /^\Q$dev\E\z/ }
Better yet, let's avoid spinning up the regex engine for nothing.
grep { $_ eq $dev }
For example,
$ perl -e'my $dev = "ccc"; CORE::say for grep { /^\Q$dev\E\z/ } qw( accc ccc ccce );'
ccc
$ perl -e'my $dev = "ccc"; CORE::say for grep { $_ eq $dev } qw( accc ccc ccce );'
ccc
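The same two approaches can be sketched outside Perl as well. The following is an illustrative Python version, where re.escape plays the role of \Q...\E and \Z plays the role of \z; the function names find_exact_regex and find_exact_eq are invented for this example.

```python
import re


def find_exact_regex(dev, lines):
    """Exact match via an anchored regex; re.escape neutralises metacharacters."""
    pattern = re.compile(r'^' + re.escape(dev) + r'\Z')
    return [line for line in lines if pattern.search(line)]


def find_exact_eq(dev, lines):
    """Exact match via plain string equality, no regex engine involved."""
    return [line for line in lines if line == dev]


candidates = ["accc", "ccc", "ccce"]
print(find_exact_regex("ccc", candidates))
print(find_exact_eq("ccc", candidates))

# A search string full of metacharacters still only matches itself.
print(find_exact_regex("a.c+", ["abcc", "a.c+"]))
```

Both functions return the same result for any input; the equality version is simply shorter and has no escaping to get wrong.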
I would use quotemeta. Here is an example of how it compares:
my $regexp = '\t';
my $metaxp = quotemeta ($regexp);
while (<DATA>) {
print "match \$regexp - $_" if /$regexp/;
print "match \$metaxp - $_" if /$metaxp/;
}
__DATA__
This \t is not a tab
This is a tab
(there is literally a tab in the second line)
The meta version will match line 1, as quotemeta turned "\t" into essentially "\\t" (a literal backslash followed by t), and the non-meta (original) version will match line 2, because there \t is interpreted as a tab.
match $metaxp - This \t is not a tab
match $regexp - This is a tab
Hopefully you get my meaning.
I think adding $regexp = quotemeta ($regexp) (or doing it when you capture the standard input) should meet your need.

Replacing a string in between known strings over multiple lines in PowerShell

I'm entirely new to PowerShell and regex, but I need help to do the following:
I need to replace 'VALUE' in a file, given that I know the strings before and after it. The text to match runs over multiple lines as well, e.g.:
<knownvalue1>
<knownvalue2>VALUE<knownvalue3>
knownvalue2 and knownvalue3 are not unique, so I need to include knownvalue1 as a kind of identifier.
Also, to keep a similar format to what has been done previously, it needs to be along the lines of:
(gc $filename)-replace "(SEARCHPATTERN)","(REPLACEVAL)" | sc $filename
If it can't be done this way, then alternative ways will be okay.
I'm going crazy over this one so any help would be great.
Thanks.
Well, when you're importing the file and reading it line by line, you could split each line up like this:
Basically, first we split on the delimiter >. After doing so, the string is split into these parts: <knownvalue2, VALUE<knownvalue3, and an empty string.
After that, the second part needs to be split again, this time on <, which makes it tricky. This is also pretty dirty, at least in my opinion.
$value = @()
$value += "<UniqueValue1>"
$value += "<knownvalue2>This Belongs to UniqueValue1<knownvalue3>"
$value += "</UniqueValue1>"
$value += "<UniqueValue2>"
$value += "<knownvalue2>This Belongs to UniqueValue2<knownvalue3>"
$value += "</UniqueValue2>"
[int]$count = 0
foreach($line in $value)
{
if($line.Equals("<UniqueValue1>"))
{
$parts = $value[($count + 1)].Split('>');
"Should be This Belongs to UnqiueValue1<knownvalue3"
$parts[1]
""
"Second Split"
$val = $parts[1].Split('<')
$val[0] + " #should be This belongs to UniqueValue1"
$val[1] + " #should be knownvalue3"
""
"Should be <knownvalue2"
$parts[0]
}
if($line.Equals("<UniqueValue2>"))
{
$parts = $value[($count + 1)].Split('>');
"Should be This belongs to UniqueValue2<knownvalue3"
$parts[1]
""
"Second Split"
$val = $parts[1].Split('<')
$val[0] + " #should be This Belongs to UniqueValue2"
$val[1] + " #should be knownvalue3"
""
"Should be <knownvalue2"
$parts[0]
}
"finished Reading line: $count"
$count++
}
Read the file as one string (gc outputs an array which is a very poor choice) and use a simple regex:
[IO.File]::ReadAllText($filename) -replace `
'(<knownvalue1>[\s\r\n]*<knownvalue2>)VALUE(<knownvalue3>)',
'$1(REPLACEVAL)$2'
Notes:
In PS3.0+ you can use (gc $filename -raw) instead of [IO.File]::ReadAllText($filename)
If other content sits between the two tags, as in <knownvalue1>SKIP SOME OTHER TAGS<knownvalue2>, you can use <knownvalue1>(?s).*?<knownvalue2> to skip over that section: with (?s), the . matches any character, including newlines (CR/LF).
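To show the idea of matching across the line break on a single string, here is a small sketch in Python (chosen only for illustration, not PowerShell). The sample text and the name PATTERN are made up; the tag names follow the placeholders from the question.

```python
import re

# Hypothetical file contents read as ONE string, not as a list of lines.
text = (
    "<knownvalue1>\n"
    "<knownvalue2>VALUE<knownvalue3>\n"
    "<other>\n"
    "<knownvalue2>VALUE<knownvalue3>\n"
)

# \s* spans the newline between <knownvalue1> and <knownvalue2>.
# The two groups keep the surrounding tags so that only VALUE is replaced.
PATTERN = re.compile(r'(<knownvalue1>\s*<knownvalue2>)VALUE(<knownvalue3>)')

result = PATTERN.sub(r'\g<1>REPLACEVAL\g<2>', text)
print(result)
```

Only the occurrence preceded by <knownvalue1> changes; the second VALUE, which follows a different tag, is left alone. This works only because the input is one string, which is the reason for reading the file whole.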

How to extract a substring in perl

I am new to Perl and need your help.
I am reading the contents of files in a directory.
I need to extract the substrings ending in .dat from the files.
Sample strings:
1) # ** Template Name: IFDOS_ARCHIVE.dat
2) # ** profile for IFNEW_UNIX_CMD.dat template **
3) # ** Template IFWIN_MV.dat **
Need to Extract:
1) IFDOS_ARCHIVE.dat
2) IFNEW_UNIX_CMD.dat
3) IFWIN_MV.dat
My code:
if(open(my $jobprofile, "./profiles/$vitem[0].profile")) {
my @jobprofiletemp = <$jobprofile>;
close($jobprofile);
@proftemplates = grep /.dat/,@jobprofiletemp;
my $strproftemp = $proftemplates[0];
my ($tempksh) = $strproftemp =~ / ([^_]*)./;
print "tempksh: $tempksh","\n";
} else { warn "problem opening ./$_\n\n"; }
my regex is not working.
what do you suggest?
I think you'll be better off with something like:
while (<$jobprofile>) {
if ( /(\S+\.dat)/ ) {
print "$1\n";
}
}
(The while loop is there to make sure you parse every single line.)
The regular expression looks for a sequence of non-whitespace characters (\S+) followed by a literal .dat.
The parentheses capture the whole name, including the .dat suffix, into the special variable $1.
Try this:
open my $fh, "<", "file.txt" or die "Cannot open file.txt: $!";
while (<$fh>)
{
next if /^\s*$/; # skip empty lines
my ($match) = /\s(\w+\.dat)/; # capture the .dat file name, if the line has one
print "$match\n" if defined $match;
}
or simply try the perl one liner
perl -ne' print $1,"\n" if (m/(\w+\.dat)/) ' file.txt
Or, if you are working on Linux, use the grep command to do it:
grep -ohP '\w+\.dat' one.txt
-o prints only the matched part of each line
-h prints the output without the file name
-P enables Perl-compatible regular expressions
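For comparison, the same \w+\.dat extraction can be sketched in Python (used here only for illustration, not Perl). The function name extract_dat_names is hypothetical; the first three sample lines are the ones from the question.

```python
import re

# One or more word characters followed by a literal ".dat".
DAT_NAME = re.compile(r'\w+\.dat')


def extract_dat_names(lines):
    """Return the first .dat name found on each line that has one."""
    names = []
    for line in lines:
        match = DAT_NAME.search(line)
        if match:
            names.append(match.group(0))
    return names


sample_lines = [
    "# ** Template Name: IFDOS_ARCHIVE.dat",
    "# ** profile for IFNEW_UNIX_CMD.dat template **",
    "# ** Template IFWIN_MV.dat **",
    "a line without a template name",
]

print(extract_dat_names(sample_lines))
```

Escaping the dot matters: an unescaped . would also match names such as "Xdat" with any character in front of "dat".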

perl match multiple numbers from a variable similar to egrep

I want to match a number exactly against a variable that holds multiple numbers separated by the pipe symbol, similar to egrep. Below is the code I tried.
#!/usr/bin/perl
my $searchnum = $ARGV[0];
my $num = "148|1|0|256";
print $num;
if ($searchnum =~ /$num/)
{
print "found";
}
else
{
print "not-found";
}
Expected output:
perl number_match.pl 1
found
perl number_match.pl 1432
not-found
The regex /148|1|0|256/ matches if the string that this regex is bound to contains a substring that is either 148, 1, 0 or 256. This means that the option 148 is superfluous, as this matches a subset of strings that match 1.
You probably want to test if the given string is equal to one of these options. If you want to use regexes, you have to anchor the regex at the start and the end of the string:
/^ (?:148|1|0|256) $/x
You could also use the grep builtin:
my $number = ...;
if (grep {$number eq $_} qw/148 1 0 256/) { say "found" }
else { say "not-found" }
The grep function takes a block that has to return a boolean value. It returns all elements from the list on the right where the condition returns true. If at least one element matches, then the whole expression evaluates to true.
You could also use a hash that contains all possible options:
my $number = ...;
my %options = map { $_ => undef } qw/148 1 0 256/;
if ( exists $options{$number} ) { say "found" }
else { say "not-found" }
This is more efficient than grep.
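The difference between the unanchored alternation, the anchored one, and a plain lookup can be checked side by side. This is a sketch in Python rather than Perl, for illustration only; the names ALLOWED, UNANCHORED, ANCHORED and is_allowed are invented for the example.

```python
import re

ALLOWED = {"148", "1", "0", "256"}

# Unanchored: true for any string that merely CONTAINS one of the options.
UNANCHORED = re.compile(r'148|1|0|256')

# Anchored: the whole string must be exactly one of the options.
ANCHORED = re.compile(r'^(?:148|1|0|256)\Z')


def is_allowed(number):
    """Set membership, the counterpart of the hash lookup above."""
    return number in ALLOWED


for candidate in ["1", "1432", "256"]:
    print(
        candidate,
        UNANCHORED.search(candidate) is not None,
        ANCHORED.search(candidate) is not None,
        is_allowed(candidate),
    )
```

For "1432" the unanchored pattern reports a match because the string contains 1, while the anchored pattern and the lookup both reject it.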
Use:
my $num = '^(148|1|0|256)$';
Here is a oneliner:
perl -e "$_=$ARGV[0]; exit if !/^\d+$/; print \"not-\" unless /^(14|156|0|89)$/;print \"found\n\";"