Powershell Find String Between Characters and Replace - regex

In a PowerShell script, I have a hashtable that contains personal information. The hashtable looks like this:
{first = "James", last = "Brown", phone = "12345"...}
Using this hashtable, I would like to replace strings in a template text file. For each string that matches the #key# format, I want to replace it with the value that corresponds to that key in the hashtable. Here is a sample input and output:
input.txt
My first name is #first# and last name is #last#.
Call me at #phone#
output.txt
My first name is James and last name is Brown.
Call me at 12345
Could you advise me how to return the "key" string between the "#"s, so I can look up its value for the string replacement? Any other ideas for this problem are welcome.

You could do this with pure regex, but for the sake of readability, I prefer to use more code and less regex:
$tmpl = 'My first name is #first# and last name is #last#.
Call me at #phone#'
$h = @{
first = "James"
last = "Brown"
phone = "12345"
}
$new = $tmpl
foreach ($key in $h.Keys) {
$escKey = [Regex]::Escape($key)
$new = $new -replace "#$escKey#", $h[$key]
}
$new
Explanation
$tmpl contains the template string.
$h is the hashtable.
$new will contain the replaced string.
We enumerate through each of the keys in the hash.
We store a regex escaped version of the key in $escKey.
We replace $escKey surrounded by # characters with the hashtable lookup for the particular key.
One of the nice things about doing it this way is that you can change your hashtable and your template and never have to update the regex. It also gracefully handles the case where a key has no corresponding replaceable section in the template (and vice versa).
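For comparison, here is the same per-key loop as a short Python sketch (an illustration in another language, not part of the original answer). `re.escape` plays the role of `[Regex]::Escape`, and the function name `fill_template` is made up for this example.

```python
import re
from typing import Mapping


def fill_template(template: str, values: Mapping[str, str]) -> str:
    """Replace every #key# in template with values[key], one key at a time."""
    result = template
    for key, value in values.items():
        # Escape the key so characters such as '.' or '+' are matched literally
        pattern = "#" + re.escape(key) + "#"
        # A function replacement inserts the value verbatim (no backslash escapes)
        result = re.sub(pattern, lambda _m, v=value: v, result)
    return result


tmpl = "My first name is #first# and last name is #last#.\nCall me at #phone#"
h = {"first": "James", "last": "Brown", "phone": "12345"}
print(fill_template(tmpl, h))
```

Placeholders whose key is not in the dictionary are simply never visited, which matches the "graceful" behaviour described above.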

You can create a template using an expandable (double-quoted) here-string:
$hash = @{first = "James"; last = "Brown"; phone = "12345"}
# The here-string is expanded when it is assigned, so $hash must be defined first
$Template = @"
My first name is $($hash.first) and last name is $($hash.last).
Call me at $($hash.phone)
"@
$Template
My first name is James and last name is Brown.
Call me at 12345
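The asker's original idea, capturing the key between the # characters and looking it up, can be done in a single pass with a replacement callback. Below is a Python sketch of that idea. It assumes keys consist only of word characters (`\w+`), and leaving unknown placeholders untouched is a choice made for this sketch, not something the question specifies.

```python
import re

KEY_PATTERN = re.compile(r"#(\w+)#")


def expand(template, values):
    # group(1) is the key between the '#' delimiters; keys missing from
    # values are left exactly as they appeared (group(0) is the whole match)
    return KEY_PATTERN.sub(lambda m: str(values.get(m.group(1), m.group(0))), template)


h = {"first": "James", "last": "Brown", "phone": "12345"}
print(expand("My first name is #first# and last name is #last#.\nCall me at #phone#", h))
```

Unlike the per-key loop, the template drives the lookups here, so the regex never has to be rebuilt per key.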

Related

replace not working with regex

I'm trying to use -replace on a string entered by a user. The input is in the form "firstname, lastname":
John, Doe
I am using the following code:
$userInput = $userInput -replace '\s',''
$firstName = $userInput -replace ",*$",""
$lastName = $userInput -replace "^*,",""
Output looks like the following:
$userInput = John,Doe
$firstName = John,Doe
$lastName = JohnDoe
I need the output to look like this:
$userInput = John,Doe
$firstName = John
$lastName = Doe
What am I doing wrong?
,*$ says to find 0 or more commas at the very end of the string (not what you want).
^*, is... well, I'm not really sure it would be considered valid regex. Judging by your output, .NET accepts it, and it effectively means 0 or more "beginning of string" anchors followed by a comma, which ends up matching just the comma (that's why you got JohnDoe).
So for first name, you would really want something like this:
$firstName = $userInput -replace ',.*$',''
So that says, find a comma followed by 0 or more of any character followed by the end of the string (then replace it with nothing).
For last name:
$lastName = $userInput -replace '^.*?,',''
And this says: find the beginning of the string, followed by 0 or more of any character (non-greedy, that's what the ? after the * means), followed by a comma, then replace all of that with nothing.
Aaaand as I'm writing this, @PetSerAl commented what my last solution was going to be, which is to use a split:
$firstName, $lastName = $userInput -split ',\s*'
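The corrected patterns behave the same way in other regex engines. Here is a Python sketch (Python's `re.sub` and `re.split` standing in for PowerShell's `-replace` and `-split`) that checks each step on the sample input:

```python
import re

user_input = re.sub(r"\s", "", "John, Doe")      # remove all whitespace -> "John,Doe"
first_name = re.sub(r",.*$", "", user_input)      # drop the comma and everything after it
last_name = re.sub(r"^.*?,", "", user_input)      # drop everything up to the first comma
first2, last2 = re.split(r",\s*", "John, Doe")    # the split alternative

print(user_input, first_name, last_name, first2, last2)
```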

Perl regex - Having the delimiter as part of the string itself

I have a long string in the format
id1:2014-08-05 11:24;Does this work?,id2:2014-08-04 13:22; Does this work,too?,id3:2014-07-25 16:56 ...
I am trying to extract the 'date' and 'comment' part out of this, based on the id, which is the input.
For example, if the input is id2, I'd want the comment as 'Does this work, too?' and date as '2014-08-04 13:22'. Here is the regex I have so far.
if($string =~ m/\b$id:(.*?);(.*,?)/){
my $date = $1;
my $comment = substr($2,0,-1); #to remove the last ,
}
Now since there is a ',' as part of the string itself, my regex treats it as a delimiter and just returns 'Does this work' as the comment, leaving out the ',too?' part.
Any help on how to handle a string that contains the delimiter within itself would be appreciated.
I think the best way to do this is to form a hash out of the string. If you start by splitting the string on any comma that's immediately followed by some alphanumeric characters and a colon, then the commas within the comments will be ignored and most of your work is done.
Then just use a regex to divide each split into three chunks: the ID, the date/time, and the comment, and put them into a hash. After that you can get the date/time for an ID as $data{id1}[0] and the comment as $data{id1}[1].
This program demonstrates the approach:
use strict;
use warnings;
my $s = 'id1:2014-08-05 11:24;Does this work?,id2:2014-08-04 13:22; Does this work,too?,id3:2014-07-25 16:56 ...';
my %data;
for (split /,(?=\w+:)/, $s) {
my ($id, $date, $comment) = /^([^:]+):([^;]+);\s*(.*)/ or next;   # skip records with no ';'
$data{$id} = [ $date, $comment ];
}
print $data{id2}[1], "\n";
output
Does this work,too?
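The same split-then-parse idea translates directly to other languages. Below is a Python sketch using the sample string from the question: `re.split` with a lookahead splits only on commas that start a new `id:` record, and records without a `;` (like the truncated `id3` one) are skipped, which is an assumption about how malformed records should be treated.

```python
import re

s = ("id1:2014-08-05 11:24;Does this work?,id2:2014-08-04 13:22; Does this work,too?,"
     "id3:2014-07-25 16:56 ...")

data = {}
# Split only on commas followed by "<word characters>:", i.e. the start of a new id
for record in re.split(r",(?=\w+:)", s):
    m = re.match(r"([^:]+):([^;]+);\s*(.*)", record)
    if m:  # the truncated id3 record has no ';' and is skipped
        data[m.group(1)] = (m.group(2), m.group(3))

print(data["id2"][1])
```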
$str = "id1:2014-08-05 11:24;Does this work?,id2:2014-08-04 13:22; Does this work,too?,id3:2014-07-25 16:56; bla";
$id = "id2";
# I need comma to set the end of the last "record"
$str = $str . ",";
if ($str =~ /$id:([\d\-\: ]+);([ \w\?\,]+)\,/) {
print "date = $1\n";
print "comment = $2\n";
}
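Instead of appending a trailing comma, the end of a record can also be found with a lazy match that stops at the next `,<id>:` or at the end of the string. A Python sketch of that variant follows; `lookup` is a hypothetical helper name, and it assumes ids are word characters followed by a colon.

```python
import re


def lookup(s, record_id):
    """Hypothetical helper: return (date, comment) for record_id, or None."""
    # The date runs up to ';'; the comment runs lazily up to the next ",<id>:" or the end
    pattern = r"\b" + re.escape(record_id) + r":([^;]+);\s*(.*?)(?=,\w+:|$)"
    m = re.search(pattern, s)
    return (m.group(1), m.group(2)) if m else None


s = ("id1:2014-08-05 11:24;Does this work?,id2:2014-08-04 13:22; Does this work,too?,"
     "id3:2014-07-25 16:56; bla")
print(lookup(s, "id2"))
```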

Powershell: Replacing regex named groups with variables

Say I have a regular expression like the following, but I loaded it from a file into a variable $regex, and so have no idea at design time what its contents are, but at runtime I can discover that it includes the "version1", "version2", "version3" and "version4" named groups:
"Version (?<version1>\d),(?<version2>\d),(?<version3>\d),(?<version4>\d)"
...and I have these variables:
$version1 = "3"
$version2 = "2"
$version3 = "1"
$version4 = "0"
...and I come across the following string in a file:
Version 7,7,0,0
...which is stored in a variable $input, so that ($input -match $regex) evaluates to $true.
How can I replace the named groups from $regex in the string $input with the values of $version1, $version2, $version3, $version4 if I do not know the order in which they appear in $regex (I only know that $regex includes these named groups)?
I can't find any references describing the syntax for replacing a named group with the value of a variable by using the group name as an index to the match - is this even supported?
EDIT:
To clarify - the goal is to replace templated version strings in any kind of text file where the version string in a given file requires replacement of a variable number of version fields (could be 2, 3, or all 4 fields). For example, the text in a file could look like any of these (but is not restricted to these):
#define SOME_MACRO(4, 1, 0, 0)
Version "1.2.3.4"
SomeStruct vs = { 99,99,99,99 }
Users can specify a file set and a regular expression to match the line containing the fields, with the original idea being that the individual fields would be captured by named groups. The utility has the individual version field values that should be substituted in the file, but has to preserve the original format of the line that will contain the substitutions, and substitute only the requested fields.
EDIT-2:
I think I can get the result I need with substring calculations based on the position and extent of each of the matches, but was hoping Powershell's replace operation was going to save me some work.
EDIT-3:
So, as Ansgar correctly and succinctly describes below, there isn't a way (using only the original input string, a regular expression about which you only know the named groups, and the resulting matches) to use the "-replace" operation (or other regex operations) to perform substitutions of the captures of the named groups, while leaving the rest of the original string intact. For this problem, if anybody's curious, I ended up using the solution below. YMMV, other solutions possible. Many thanks to Ansgar for his feedback and options provided.
In the following code block:
$input is a line of text on which substitution is to be performed
$regex is a regular expression (of type [string]) read from a file that has been verified to contain at least one of the supported named groups
$regexToGroupName is a hash table that maps a regex string to an array of group names ordered according to the order of the array returned by [regex]::GetGroupNames(), which matches the left-to-right order in which they appear in the expression
$groupNameToVersionNumber is a hash table that maps a group name to a version number.
Constraints on the named groups within $regex are only (I think) that the expression within the named groups cannot be nested, and should match at most once within the input string.
# This will give us the index and extent of each substring
# that we will be replacing (the parts that we will not keep)
$matchResults = ([regex]$regex).match($input)
# This will hold substrings from $input that were not captured
# by any of the supported named groups, as well as the replacement
# version strings, properly ordered, but will omit substrings captured
# by the named groups
$lineParts = @()
$startingIndex = 0
foreach ($groupName in $regexToGroupName.$regex)
{
# Excise the substring leading up to the match for this group...
$lineParts = $lineParts + $input.Substring($startingIndex, $matchResults.groups[$groupName].Index - $startingIndex)
# Instead of the matched substring, we'll use the substitution
$lineParts = $lineParts + $groupNameToVersionNumber.$groupName
# Set the starting index of the next substring that we will keep...
$startingIndex = $matchResults.groups[$groupName].Index + $matchResults.groups[$groupName].Length
}
# Keep the end of the original string (if there's anything left)
$lineParts = $lineParts + $input.Substring($startingIndex, $input.Length - $startingIndex)
$newLine = ""
foreach ($part in $lineParts)
{
$newLine = $newLine + $part
}
$input= $newLine
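The same splice-by-position technique can be sketched in Python, where a match object exposes `start(name)` and `end(name)` for each named group. Two differences from the PowerShell code above, both choices made for this sketch: Python spells named groups `(?P<name>...)`, and the groups are ordered by their start position in the match instead of by a precomputed map like `$regexToGroupName`. The same constraint applies: the named groups must not nest.

```python
import re


def replace_named_groups(line, pattern, replacements):
    """Swap the text captured by each named group for replacements[name],
    keeping everything outside those groups. Groups must not nest."""
    m = re.search(pattern, line)
    if m is None:
        return line
    # Only groups that exist in the pattern and actually took part in the match
    names = [n for n in replacements
             if n in m.re.groupindex and m.start(n) != -1]
    names.sort(key=m.start)  # left-to-right position in the input line
    parts, pos = [], 0
    for name in names:
        parts.append(line[pos:m.start(name)])  # keep text before the group
        parts.append(str(replacements[name]))  # substitute the captured text
        pos = m.end(name)
    parts.append(line[pos:])                   # keep the rest of the line
    return "".join(parts)


VERSION_RE = r"Version (?P<version1>\d),(?P<version2>\d),(?P<version3>\d),(?P<version4>\d)"
new_values = {"version1": "3", "version2": "2", "version3": "1", "version4": "0"}
print(replace_named_groups("Version 7,7,0,0", VERSION_RE, new_values))
```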
Simple Solution
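In the scenario where you simply want to replace a version number found somewhere in your $input text, you could do this (the braces in ${1} keep the group number from running into the first version digit, which would otherwise be read as $13, $12 and so on):
$input -replace '(Version\s+)\d+,\d+,\d+,\d+',"`${1}$Version1,$Version2,$Version3,$Version4"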
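When a group reference in a replacement string is followed directly by a digit, as with a `$1` reference immediately followed by a version number here, the engine can read both as one larger group number. Python's `re.sub` has the same pitfall: `\1` followed by `3` becomes `\13`, a reference to a group 13 that does not exist. Python's delimited form is `\g<1>`, sketched below with the version values from the question.

```python
import re

versions = [3, 2, 1, 0]
# \g<1> ends the group reference explicitly, so the following digit stays literal
replacement = r"\g<1>" + ",".join(str(v) for v in versions)
result = re.sub(r"(Version\s+)\d+,\d+,\d+,\d+", replacement, "Version 7,7,0,0")
print(result)
```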
Using Named Captures in PowerShell
Regarding your question about named captures, you can reference them in the replacement string by using curly braces, e.g.:
'dogcatcher' -replace '(?<pet>dog|cat)','I have a pet ${pet}. '
Gives:
I have a pet dog. I have a pet cat. cher
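For reference, Python's `re.sub` spells a named-group reference in the replacement as `\g<name>` (and the group itself as `(?P<name>...)`). A minimal sketch reproducing the example above:

```python
import re

pets = re.sub(r"(?P<pet>dog|cat)", r"I have a pet \g<pet>. ", "dogcatcher")
print(pets)
```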
Issue with multiple captures & solution
You can't replace multiple values in the same replace statement, since the same replacement string is used for every match. For example, if you did this:
'dogcatcher' -replace '(?<pet>dog|cat)|(?<singer>cher)','I have a pet ${pet}. I like ${singer}''s songs. '
You'd get:
I have a pet dog. I like 's songs. I have a pet cat. I like 's songs. I have a pet . I like cher's songs.
...which is probably not what you're hoping for.
Rather, you'd have to do a match per item:
'dogcatcher' -replace '(?<pet>dog|cat)','I have a pet ${pet}. ' -replace '(?<singer>cher)', 'I like ${singer}''s songs. '
...to get:
I have a pet dog. I have a pet cat. I like cher's songs.
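A different technique, when the regex API accepts a function as the replacement, is to check which named group matched and build the text per match. The sketch below does this in Python with a hypothetical `describe` callback, producing the same output in one pass:

```python
import re


def describe(m):
    # Exactly one alternative matched, so exactly one named group is set
    if m["pet"] is not None:
        return f"I have a pet {m['pet']}. "
    return f"I like {m['singer']}'s songs. "


result = re.sub(r"(?P<pet>dog|cat)|(?P<singer>cher)", describe, "dogcatcher")
print(result)
```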
More Complex Solution
Bringing this back to your scenario, you're not actually using the captured values; rather, you're hoping to replace the text they matched with new values. For that, you'd want this:
$input = 'I''m running Programmer''s Notepad version 2.4.2.1440, and am a big fan. I also have Chrome v 56.0.2924.87 (64-bit).'
$version1 = 1
$version2 = 3
$version3 = 5
$version4 = 7
$v1Pattern = '(?<=\bv(?:ersion)?\s+)\d+(?=\.\d+\.\d+\.\d+)'
$v2Pattern = '(?<=\bv(?:ersion)?\s+\d+\.)\d+(?=\.\d+\.\d+)'
$v3Pattern = '(?<=\bv(?:ersion)?\s+\d+\.\d+\.)\d+(?=\.\d+)'
$v4Pattern = '(?<=\bv(?:ersion)?\s+\d+\.\d+\.\d+\.)\d+'
$input -replace $v1Pattern, $version1 -replace $v2Pattern, $version2 -replace $v3Pattern,$version3 -replace $v4Pattern,$version4
Which would give:
I'm running Programmer's Notepad version 1.3.5.7, and am a big fan. I also have Chrome v 1.3.5.7 (64-bit).
NB: The above could be written as a one-liner, but I've broken it down to make it simpler to read.
This takes advantage of regex lookarounds: a way of checking the content before and after the text you're matching without including it in the match. That means when we select what to replace, we can say "match the number that appears after the word version" without saying "replace the word version".
More info on those here: http://www.regular-expressions.info/lookaround.html
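Not every engine supports these patterns: Python's `re` only allows fixed-width lookbehind, so `(?<=\bv(?:ersion)?\s+)` will not compile there. A common workaround, swapped in here in place of lookbehind, is to capture the prefix and put it back in a replacement callback. This Python sketch replaces all four fields at once and reproduces the output above:

```python
import re

versions = [1, 3, 5, 7]
text = ("I'm running Programmer's Notepad version 2.4.2.1440, and am a big fan. "
        "I also have Chrome v 56.0.2924.87 (64-bit).")

# Capture the "version "/"v " prefix instead of looking behind for it, then restore it
pattern = re.compile(r"(\bv(?:ersion)?\s+)\d+\.\d+\.\d+\.\d+")
new_version = ".".join(str(v) for v in versions)
result = pattern.sub(lambda m: m.group(1) + new_version, text)
print(result)
```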
Your Example
Adapting the above to work for your example (i.e. where versions may be separated by commas or dots, and there's no consistency to their format beyond being 4 sets of numbers):
$input = @'
#define SOME_MACRO(4, 1, 0, 0)
Version "1.2.3.4"
SomeStruct vs = { 99,99,99,99 }
'@
$version1 = 1
$version2 = 3
$version3 = 5
$version4 = 7
$v1Pattern = '(?<=\b)\d+(?=\s*[\.,]\s*\d+\s*[\.,]\s*\d+\s*[\.,]\s*\d+\b)'
$v2Pattern = '(?<=\b\d+\s*[\.,]\s*)\d+(?=\s*[\.,]\s*\d+\s*[\.,]\s*\d+\b)'
$v3Pattern = '(?<=\b\d+\s*[\.,]\s*\d+\s*[\.,]\s*)\d+(?=\s*[\.,]\s*\d+\b)'
$v4Pattern = '(?<=\b\d+\s*[\.,]\s*\d+\s*[\.,]\s*\d+\s*[\.,]\s*)\d+\b'
$input -replace $v1Pattern, $version1 -replace $v2Pattern, $version2 -replace $v3Pattern,$version3 -replace $v4Pattern,$version4
Gives:
#define SOME_MACRO(1, 3, 5, 7)
Version "1.3.5.7"
SomeStruct vs = { 1,3,5,7 }
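The same "preserve the original separators" result can be reached without lookbehind by capturing the separators and reusing them. Here is a Python sketch of that alternative (Python's `re` rejects the variable-width lookbehinds above); `SEP` and `swap` are names made up for the example:

```python
import re

versions = [1, 3, 5, 7]
text = '#define SOME_MACRO(4, 1, 0, 0)\nVersion "1.2.3.4"\nSomeStruct vs = { 99,99,99,99 }'

SEP = r"(\s*[.,]\s*)"  # a comma or dot, with optional spaces around it
pattern = re.compile(r"\b\d+" + SEP + r"\d+" + SEP + r"\d+" + SEP + r"\d+\b")


def swap(m):
    # Keep the three captured separators exactly as they were; swap only the numbers
    a, b, c, d = (str(v) for v in versions)
    return a + m.group(1) + b + m.group(2) + c + m.group(3) + d


result = pattern.sub(swap, text)
print(result)
```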
Regular expressions don't work that way, so you can't. Not directly, that is. What you can do (short of using a more appropriate regular expression that groups the parts you want to keep) is to extract the version string and then in a second step replace that substring with the new version string:
$oldver = $input -replace $regexp, '$1,$2,$3,$4'
$newver = $input -replace $oldver, "$Version1,$Version2,$Version3,$Version4"
Edit:
If you don't even know the structure, you must extract that from the regular expression as well.
$version = @($version1, $version2, $version3, $version4)
$input -match $regexp
$oldver = $regexp
$newver = $regexp
for ($i = 1; $i -le 4; $i++) {
$oldver = $oldver -replace "\(\?<version$i>\\d\)", $matches["version$i"]
$newver = $newver -replace "\(\?<version$i>\\d\)", $version[$i-1]
}
$input -replace $oldver, $newver
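The rewrite-the-regex-text approach above can be sketched in Python as well. Like the PowerShell version, it assumes each group is written exactly as `(?P<versionN>\d)` (Python's named-group syntax) and that the rest of the pattern is plain literal text, so the substituted pattern also works as a replacement string. `rewrite_version` is a hypothetical name.

```python
import re


def rewrite_version(text, regexp, new_values):
    """Sketch: replace the group definitions in the pattern text itself."""
    m = re.search(regexp, text)
    if not m:
        return text
    old_pat, new_text = regexp, regexp
    for i, value in enumerate(new_values, start=1):
        group_def = f"(?P<version{i}>\\d)"  # the literal text of group i in the pattern
        old_pat = old_pat.replace(group_def, m.group(f"version{i}"))
        new_text = new_text.replace(group_def, str(value))
    # old_pat now matches the original version string; new_text is what replaces it
    return re.sub(old_pat, lambda _m: new_text, text)


regexp = r"Version (?P<version1>\d),(?P<version2>\d),(?P<version3>\d),(?P<version4>\d)"
print(rewrite_version("Version 7,7,0,0", regexp, [3, 2, 1, 0]))
```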

Can anyone explain the meaning of this regex?

I would like to understand what this expression means:
$req_msg =~ s/ \${$toReplace}/$replacements->{$toReplace}/g;
The prerequisites for this to work are two variables:
$toReplace - contains an arbitrary value
$replacements - a HASH ref containing, erm, replacements
Given $toReplace contains "foo", the contents of $req_msg are searched for ${foo} (with a single leading space), and every occurrence of this (leading space included) is replaced with $replacements->{foo}.
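To make the pattern concrete, here is a Python sketch of the same substitution (a translation for illustration, not Perl's own semantics). The `$`, `{` and `}` are escaped so they match literally, the key is escaped with `re.escape`, and the function name `substitute` is hypothetical. Note that the leading space is part of the match, so it disappears along with the placeholder.

```python
import re


def substitute(req_msg, to_replace, replacements):
    # Matches a literal " ${<key>}", like s/ \${$toReplace}/.../g in the Perl code
    pattern = r" \$\{" + re.escape(to_replace) + r"\}"
    return re.sub(pattern, lambda _m: replacements[to_replace], req_msg)


print(substitute("Hello ${foo}, bye ${foo}!", "foo", {"foo": "X"}))
```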
$req_msg =~ s/ \${$toReplace}/$replacements->{$toReplace}/g;
s is used for substitution: $content =~ s/old_value/new_value/modifier; (the modifier can be i, g, x, etc., alone or in combination)
Ex:
$content = "Hi I am a coder and I like coding very much!";
$content =~ s/i/eye/i;
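Now $content will contain "Heye I am a coder and I like coding very much!". Without the g modifier only the first match is replaced; with s/i/eye/gi every i or I would become eye.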
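The difference between replacing the first match and replacing every match is easy to check. In Python, `count=1` corresponds to leaving off Perl's `g` modifier and `re.IGNORECASE` to its `i` modifier; a minimal sketch:

```python
import re

content = "Hi I am a coder and I like coding very much!"
first_only = re.sub("i", "eye", content, count=1, flags=re.IGNORECASE)  # like s/i/eye/i
every_one = re.sub("i", "eye", content, flags=re.IGNORECASE)            # like s/i/eye/gi
print(first_only)
print(every_one)
```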
In the same way, ${$toReplace}, read as a scalar dereference, is the old value that needs to be replaced, and $replacements->{$toReplace} means $replacements is a hash reference and $toReplace is the key being looked up.
It is similar to $hash_value = $hash_ref->{key};
Wherever the old value is found in $req_msg, it is replaced with the value stored in the hash under that key. (Note that in your pattern the $ is escaped as \$, so Perl actually matches a literal ${...} there; the dereference reading applies to the unescaped form used in the snippet below.)
But I guess you asked this question because you got a blank replacement. That may be due to a scalar reference problem.
This code snippet may help clear up your doubt:
#!/usr/bin/perl
use strict;
use warnings;
my $value = "Jassi";
my $scalar_ref = \$value;
print "scalar_ref = $scalar_ref \n and value = $value and ${$scalar_ref}\n";
my %hash = ("Jassi", "aliencoders");
my $hash_ref = \%hash;
my $reg_msg = "Hi this is Jassi";
print "reg_msg = $reg_msg \n";
$reg_msg =~ s/${$scalar_ref}/$hash_ref->{${$scalar_ref}}/;
print "reg_msg after s = $reg_msg\n";
See the second-to-last line!
It replaces every occurrence of the text ${blabla} with whatever is stored in the hash reference $replacements under the key blabla, e.g.:
$replacements = { 'blabla' => 'blubb' };
will cause every ${blabla} in $req_msg to be replaced by blubb.

Perl Arrays and grep

I think it's more about characters. Anyway, I have a text file that consists of something like this:
COMPANY NAME

City

Addresss,
Address number

Email

phone number
and so on... (it repeats itself, but with different data...). Let's assume this text is now in the $string variable.
I want to have an array (@row), for example:
$row[0] = "COMPANY NAME";
$row[1] = "City";
$row[2] = "Addresss,
Address number";
$row[3] = "Email";
$row[4] = "phone number";
At first I thought, well, that can easily be done with grep, something like this:
1) @row = grep (/^^$/, $string);
No GO!
2) @row = grep (/\n/, $string);
Still no go. I also tried split and such, still no go.
Any ideas?
Thanks,
FM has given an answer that works using split, but I wanted to point out that Perl makes this really easy if you're reading this data from a filehandle. All you need to do is to set the special variable $/ to an empty string. This puts Perl into "paragraph mode". In this mode each record returned by the file input operator will contain a paragraph of text rather than the usual line.
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
local $/ = '';
my @row = <DATA>;
chomp @row;
print Dumper(\@row);
__DATA__
COMPANY NAME

City

Addresss,
Address number

Email

phone number
The output is:
$ ./addr
$VAR1 = [
'COMPANY NAME',
'City',
'Addresss,
Address number',
'Email ',
'phone number'
];
The way I understand your question, you want to grab the items separated by at least one blank line. Although /\n{2,}/ would be correct in a literal sense (split on two or more consecutive newlines), I would suggest the regex below, because it also handles nearly blank lines (those containing only whitespace characters).
use strict;
use warnings;
my $str = 'COMPANY NAME

City

Addresss,
Address number

Email

phone number';
my @items = split /\n\s*\n/, $str;
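The blank-line split is not Perl-specific. Here is a Python sketch with the same pattern, where one separator line deliberately contains only spaces to show that "nearly blank" lines are handled too (the sample text is made up for the demonstration):

```python
import re

text = "COMPANY NAME\n\nCity\n   \nAddresss,\nAddress number\n\nEmail\n\nphone number"
# \n\s*\n: a newline, optional whitespace (including a whitespace-only line), a newline
items = re.split(r"\n\s*\n", text)
print(items)
```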
use strict;
use warnings;
my $string = "COMPANY NAME

City

Addresss,
Address number

Email

phone number";
my @string_parts = split /\n\n+/, $string;
foreach my $test (@string_parts) {
    print "$test\n";
}
OUTPUT:
COMPANY NAME
City
Addresss,
Address number
Email
phone number
grep cannot split a string: it filters a list, so given $string it sees a one-element list and returns either that whole string or nothing.
This is why you need to split the string on the token that you're after (as FM shows).
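The filter-versus-split distinction can be illustrated with a Python analogy, where a list comprehension plays the role of grep (an analogy for illustration, not Perl behaviour itself):

```python
text = "COMPANY NAME\n\nCity"

# grep-style filtering keeps or drops whole elements; it never splits them
kept = [item for item in [text] if "\n" in item]

# splitting on the blank-line separator is what actually produces the parts
parts = text.split("\n\n")
print(kept, parts)
```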
While it isn't clear what you need this for, I would strongly recommend considering the Tie::File module: