PowerShell - Regex date range replace

I have an input file that contains some start dates. If a date is before a specific date, 1995-01-01 (YYYY-MM-DD format), I want to replace it with that minimum value, e.g.
<StartDate>1970-12-23</StartDate>
would be changed to
<StartDate>1995-01-01</StartDate>
<StartDate>1996-05-12</StartDate> is ok and would remain unchanged.
I was hoping to use a regex replace, but checking the date range isn't working as expected. I tried something like this for the range check:
\b(?:1900-01-(?:3[01]|2[1-31])|1995/01/01)\b

You can use a simple regex like '<StartDate>(\d{4}-\d{2}-\d{2})</StartDate>' to match <StartDate>, 4 digits, -, 2 digits, -, 2 digits, and </StartDate>, and then use a callback method that parses the date captured into group 1 and uses Martin's code to compare the dates. If the date is before the one defined, use the minimum date; otherwise, use the one captured.
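$callback = {
    param($match)
    # Group 1 holds the date text captured by the regex
    $current = [DateTime]$match.Groups[1].Value
    $minimum = [DateTime]'1995-01-01'
    if ($minimum -gt $current)
    {
        # Date is too early: substitute the minimum date
        '<StartDate>1995-01-01</StartDate>'
    }
    else {
        # Date is in range: rebuild the element unchanged
        '<StartDate>' + $match.Groups[1].Value + '</StartDate>'
    }
}
$text = '<StartDate>1970-12-23</StartDate>'
$rex = [regex]'<StartDate>(\d{4}-\d{2}-\d{2})</StartDate>'
# PowerShell converts the script block to a MatchEvaluator delegate
$rex.Replace($text, $callback)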
To use it with Get-Content and ForEach-Object, define $callback as above and use:
$rex = [regex]'<StartDate>(\d{4}-\d{2}-\d{2})</StartDate>'
(Get-Content $path\$xml_in) | ForEach-Object {$rex.Replace($_, $callback)} | Set-Content $path\$outfile
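For comparison, here is a minimal Python sketch of the same callback idea, using re.sub with a function as the replacement. The names MINIMUM, START_DATE, clamp_start_date and clamp_all are illustrative, not from the answer, and the sketch assumes every captured date is a valid calendar date (date.fromisoformat raises ValueError otherwise).

```python
import re
from datetime import date

# Illustrative constants: the minimum start date from the question and the same pattern.
MINIMUM = date(1995, 1, 1)
START_DATE = re.compile(r"<StartDate>(\d{4}-\d{2}-\d{2})</StartDate>")

def clamp_start_date(match):
    """Callback for re.sub: replace dates earlier than MINIMUM."""
    current = date.fromisoformat(match.group(1))
    if current < MINIMUM:
        return f"<StartDate>{MINIMUM.isoformat()}</StartDate>"
    return match.group(0)  # in range: keep the original text unchanged

def clamp_all(text):
    # Every <StartDate> element in the text is passed through the callback.
    return START_DATE.sub(clamp_start_date, text)

print(clamp_all("<StartDate>1970-12-23</StartDate>"))  # <StartDate>1995-01-01</StartDate>
print(clamp_all("<StartDate>1996-05-12</StartDate>"))  # <StartDate>1996-05-12</StartDate>
```

As in the PowerShell version, the regex only locates the dates; the actual range check is done on parsed date values.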

You don't have to use regex here. Just cast the dates to DateTime and compare them:
$currentDate = [DateTime]'1970-12-23'
$minDate = [DateTime]'1995-01-01'
if ($minDate -gt $currentDate)
{
$currentDate = $minDate
}
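The comparison above amounts to a clamp: keep whichever of the two dates is later. A minimal Python sketch of that idea using max() on datetime.date values (clamp is a hypothetical helper name):

```python
from datetime import date

def clamp(current, minimum):
    # Same test as "if ($minDate -gt $currentDate)": the later of the two dates wins.
    return max(current, minimum)

min_date = date(1995, 1, 1)
print(clamp(date(1970, 12, 23), min_date))  # 1995-01-01
print(clamp(date(1996, 5, 12), min_date))   # 1996-05-12
```

Because zero-padded YYYY-MM-DD strings sort in date order, comparing the raw strings would give the same result, but parsing makes the intent explicit and rejects malformed dates.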


How to remove an ID from a string

I have a string that looks like this, they are ids in a table:
1,2,3,4,5,6,7,8,9
If someone deletes something from the database, I will need to update the string. I know that doing this will remove the value, but not the commas. Any idea how I can check whether the id has a comma before and after it, so my string doesn't break?
$new_values = $original_values[0];
$new_values =~ s/$car_id//;
Result: 1,2,,4,5,6,7,8,9 using the above sample (bad). It should be 1,2,4,5,6,7,8,9.
To remove the $car_id from the string:
use feature 'say';  # say() needs Perl 5.10+ with this feature enabled
my $car_id = 3;
my $new_values = q{1,2,3,4,5,6,7,8,9};
$new_values = join q{,}, grep { $_ != $car_id }
split /,/, $new_values;
say $new_values;
# Prints:
# 1,2,4,5,6,7,8,9
If you already removed the id(s), and you need to remove the extra commas, reformat the string like so:
my $new_values = q{,,1,2,,4,5,6,7,8,9,,,};
$new_values = join q{,}, grep { /\d/ } split /,/, $new_values;
say $new_values;
# Prints:
# 1,2,4,5,6,7,8,9
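The split/filter/join approach carries over directly to other languages. A minimal Python sketch (remove_id is a hypothetical name; it assumes every non-empty field is an integer id, and it drops empty fields, which plays the role of the grep { /\d/ } cleanup above):

```python
def remove_id(values, car_id):
    # Split on commas, drop empty fields and the id itself, then re-join.
    # Comparing whole fields means removing 3 never touches 13 or 34.
    kept = [v for v in values.split(",") if v and int(v) != car_id]
    return ",".join(kept)

print(remove_id("1,2,3,4,5,6,7,8,9", 3))      # 1,2,4,5,6,7,8,9
print(remove_id(",,1,2,,4,5,6,7,8,9,,,", 3))  # 1,2,4,5,6,7,8,9
```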
You can use
s/^$car_id,|,$car_id\b//
Details
^ - start of string
$car_id - variable value
, - comma
| - or
, - comma
$car_id - variable value
\b - word boundary.
To match $car_id literally, even if it contains regex metacharacters, quote it with \Q...\E:
s/^\Q$car_id\E,|,\Q$car_id\E\b//
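The same alternation can be sketched in Python, where re.escape takes the place of \Q...\E (remove_id_regex is a hypothetical name). Like the Perl pattern, it removes only the first occurrence and leaves a list that consists of the id alone (e.g. "3") untouched.

```python
import re

def remove_id_regex(values, car_id):
    # re.escape plays the role of Perl's \Q...\E: the id is matched literally.
    cid = re.escape(car_id)
    # Either the id at the start followed by a comma, or a comma followed by the id
    # as a whole token (\b stops "3" from matching the start of "34").
    return re.sub(rf"^{cid},|,{cid}\b", "", values, count=1)

print(remove_id_regex("1,2,3,4,5,6,7,8,9", "3"))  # 1,2,4,5,6,7,8,9
print(remove_id_regex("1,34,3", "3"))             # 1,34
```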
Another approach is to store an extra leading and trailing comma (,1,2,3,4,5,6,7,8,9,).
The main benefit is that it makes it easier to search for the id using SQL, since you can search for ,$car_id,. The same goes for editing it.
On the Perl side, you'd use:
s/,\K\Q$car_id\E,// # To remove
substr($_, 1, -1) # To get actual string
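A minimal Python sketch of the sentinel-comma idea (remove_id_delimited and actual_list are hypothetical names): because every stored id is wrapped in commas, a plain string replacement of ",3," can only hit the id 3 itself.

```python
def remove_id_delimited(stored, car_id):
    # stored looks like ",1,2,3,": ",3," can never match part of 13 or 34.
    # Replace it with a single comma so the neighbouring ids stay delimited.
    return stored.replace(f",{car_id},", ",", 1)

def actual_list(stored):
    # Equivalent of Perl's substr($_, 1, -1): strip the sentinel commas.
    return stored[1:-1]

stored = remove_id_delimited(",1,2,3,4,5,6,7,8,9,", "3")
print(stored)               # ,1,2,4,5,6,7,8,9,
print(actual_list(stored))  # 1,2,4,5,6,7,8,9
```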
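Ugly way: use a regex to remove the value, then tidy up the commas
$new_values = $original_values[0];
$new_values =~ s/\b\Q$car_id\E\b//;   # remove the id as a whole number (not the 3 in 13)
$new_values =~ s/,+/,/g;              # collapse the doubled comma left behind
$new_values =~ s/^,|,$//g;            # drop a stray comma at either end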
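Nice way: split and merge
$new_values = $original_values[0];
my @values = split(/,/, $new_values);
my $index = 0;
# Find the position of $car_id, stopping at the end of the list if it is absent
$index++ while $index < @values && $values[$index] ne $car_id;
splice(@values, $index, 1) if $index < @values;
$new_values = join(',', @values);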

PowerShell Normalize List of Names

I have some really messed-up names from a system, and I'm trying to match their first and last names in AD. I just need to parse the strings. I have names such as:
Hagstrom, N.P., Ana (Analise)
Banas, R.N., Cynthia
Saltzmann, N.P., April
Lee, Christopher
Rajaram, Pharm.D., Sharmee
Goode Jr, John (Jack) L
Reyes, R.N., Meghan
Miller, M.S., Adrienne M
Chavez, Gabriela
Stevens, MS, CCC-SLP, Christopher
Lockwood Flores, R.N., Jessica
I have tried this, but for some reason, the GivenName isn't being returned properly.
$Name = "Saltzmann, N.P., April"
$GivenName = $Name.Split(",")[$Name.Split(",").GetUpperBound(0)]
$SN = $Name.Split(",")[0]
If ($SN.IndexOf("-") -gt -1) {
$HypenLast = $SN.Split("-")[0]
$SNName = $SN.Split("-")[1]
}
If ($GivenName.IndexOf(" ") -gt -1) {
$GivenName = $GivenName.Replace("(","").Replace(")","").Split(" ")[0]
$MiddleName =$GivenName.Replace("(","").Replace(")","").Split(" ")[1]
}
I'm trying to take everything before the first comma and everything after the last comma, but only keep the letters before the second space of the first name.
I'm trying to get LastName FirstName, but then need to flip it to FirstName LastName. Thanks.
All of the names could be piped to a script block that uses a regex with some named capture groups. The named capture group values can be extracted to rebuild the name you need using string interpolation.
$nameList | ForEach-Object {
    # last = everything before the first comma; first = the first word after the last comma
    $match = [System.Text.RegularExpressions.Regex]::Match($_, "(?<last>[\w\s]+),(?:.*,)?(?:\s*)(?<first>\w+)")
    $lastName = $match.Groups["last"].Value
    $firstName = $match.Groups["first"].Value
    "$firstName $lastName"
}
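The same named-group pattern works in Python, which spells named groups (?P<name>...); (?:\s*) is written as the equivalent \s*. A minimal sketch (NAME and normalize are hypothetical names; returning unmatched input unchanged is an assumption, not part of the answer):

```python
import re

# last: letters/spaces up to the first comma; (?:.*,)? greedily skips to the last
# comma; first: the first word after it.
NAME = re.compile(r"(?P<last>[\w\s]+),(?:.*,)?\s*(?P<first>\w+)")

def normalize(name):
    m = NAME.match(name)
    if m is None:
        return name  # assumption: leave names the pattern can't parse as they are
    return f"{m.group('first')} {m.group('last')}"

for n in ["Hagstrom, N.P., Ana (Analise)",
          "Goode Jr, John (Jack) L",
          "Stevens, MS, CCC-SLP, Christopher",
          "Lockwood Flores, R.N., Jessica"]:
    print(normalize(n))
```

Because the greedy .* backtracks only as far as the last comma, credentials such as "MS, CCC-SLP" between the surname and the given name are skipped regardless of how many there are.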

PowerShell regular expression on logfile is capturing too much

I am trying to extract some text from a logfile, and I'm having problems.
Example text I am working on is:
ahksjhadjsadhsah
sakdsjakdjks
ksajdksaj
REF=35464
sadsad
213213
213
2
13
I need to extract the value "35464" (the REF number). I have limited knowledge of regular expressions, but thought 'REF=([0-9]+)' would do this.
I'm not sure how best to read this file, so I've tried a couple of ways:
select-string -path e:\powershell\log.txt -pattern 'REF=([0-9]+)' | % { $_.Matches } | % { $_.Value }
This gives me "REF=35464", which I don't understand (why is REF= included?), because I thought the 'capture' was only the part in parentheses.
I also tried:
$data=Get-Content e:\powershell\log.txt
$data -match 'REF=([0-9]+)'
$Matches
But $Matches is empty.
I also tried a similar method to the above, but line by line, for example:
foreach ($line in $data)
{
$line -match 'REF=([0-9]+)'
}
I either get no matches or the full match (including the REF= part). I've also tried groups (that is, '(REF=)([0-9]+)'), and I can't get what I need.
How should I be reading the file? What is wrong with my regular expression?
I just need this extracted value as a usable variable.
It may be the way you are trying to access the capture group
I put this quick static class together to illustrate how to get the match you are looking for.
Note: I am using the @ symbol on the regex and your input string to make them verbatim string literals.
using System;
using System.Text.RegularExpressions;
namespace SkunkWorks.RegexPractice
{
    public static class RegexPractice2
    {
        public static string input = @"ahksjhadjsadhsah
sakdsjakdjks
ksajdksaj
REF=35464
sadsad
213213
213
2
13";
        static string pat = @"REF=([0-9]+)";
        public static void Do()
        {
            Regex r = new Regex(pat, RegexOptions.IgnoreCase);
            Match m = r.Match(input);
            int matchCount = 0;
            while (m.Success)
            {
                Console.WriteLine("Match" + (++matchCount));
                // Groups[0] is the whole match (REF=35464); Groups[1] is the capture (35464)
                for (int i = 1; i < m.Groups.Count; i++)
                {
                    Group g = m.Groups[i];
                    Console.WriteLine("Group" + i + "='" + g + "'");
                    CaptureCollection cc = g.Captures;
                    for (int j = 0; j < cc.Count; j++)
                    {
                        Capture c = cc[j];
                        Console.WriteLine("Capture" + j + "='" + c + "', Position=" + c.Index);
                    }
                }
                m = m.NextMatch();
            }
        }
    }
}
What I usually do when I need to extract a substring from an array of strings is to use the automatic variable $Matches that is generated from using the -match operator in a Where statement. Like this:
$Data | Where{$_ -match "REF=([0-9]+)"} | ForEach{$Matches[1]}
The $Matches variable there is a hashtable keyed by group number: $Matches[0] holds the entire text that matched (REF=35464), and $Matches[1] holds just the captured text, which is why I specify [1]. (-match only populates $Matches when its left-hand side is a single string; applied to an array, as in your $data -match attempt, it filters the array instead.) Now, about the regex you're matching on: it's acceptable, but not very specific. [0-9]+ means one or more characters in the range 0-9, and because + is greedy it already captures every consecutive digit; what it doesn't guarantee is that nothing else follows the number on that line. You can require that by anchoring the match to the end of the line with $: REF=([0-9]+)$. We can't tell whether there's any whitespace after the numbers, so you might want to allow for that too using \s, which matches any whitespace character (spaces, tabs, and so on), followed by an asterisk, which means zero or more. Then it becomes REF=([0-9]+)\s*$, which gets you exactly what you were looking for. Lastly, I would use \d instead of [0-9]: it's shorter and made for the job (in .NET it also matches other Unicode decimal digits, which rarely matters in a log file). So, we have:
$Data | Where{$_ -match "REF=(\d+)\s*$"} | ForEach{$Matches[1]}
And that is broken down step by step and explained here: https://regex101.com/r/dG7jC7/1
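The whole-match versus capture-group distinction that confused the asker is the same in most regex engines. A minimal Python sketch on a shortened copy of the sample log (the variable names are illustrative):

```python
import re

log = """ahksjhadjsadhsah
sakdsjakdjks
ksajdksaj
REF=35464
sadsad
213213"""

# re.MULTILINE makes $ match at the end of each line, like matching line by line.
m = re.search(r"REF=(\d+)\s*$", log, re.MULTILINE)
if m:
    print(m.group(0))  # REF=35464  (the whole match, like $Matches[0])
    print(m.group(1))  # 35464      (capture group 1, like $Matches[1])
```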

Perl Replace 26 characters with numeric

I would like to replace a string with the numerical correspondent.
For example (one-liner on Windows):
perl -e "$_ = \"abcdefghijklmnopqrstuvwxyz\"; tr\a-z\1-9\;"
The result is:
12345678999999999999999999
This works up to 9, but how can I assign the numeric correspondent for the characters after i?
I would like to know how I can map one character to a two-digit number,
for example,
j -> 10, k -> 11, etc.
To keep the numerical values distinguishable, it would make sense to output
"1-", "2-", ... "25-", "26".
perl -E"$_ = 'abcdefghijklmnopqrstuvwxyz'; s/([a-z])/ord($1)-96/ge; say;"
or, if you have Perl 5.14+ (which added the /r modifier):
perl -E"say 'abcdefghijklmnopqrstuvwxyz' =~ s/([a-z])/ord($1)-96/ger;"
You can substitute any rule instead of ord($1) - 96.
I don't believe tr/// can do that unfortunately - it's a one-to-one character substitution. So you're going to have to go the long way round:
my $string = 'abcdefghijklmnopqrstuvwxyz';
# Build a lookup table: a => 1, b => 2, ..., z => 26
my %indices = map { $_ => (ord($_) - ord('a')) + 1 } ('a' .. 'z');
my $result = join '', map { $indices{$_} } split(//, $string);
Unfortunately that's not a one-liner.
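A minimal Python sketch of the same letter-to-number rule, with an optional separator to produce the "1-2-...-26" form asked about (letters_to_numbers is a hypothetical name; passing non a-z characters through unchanged is an assumption):

```python
def letters_to_numbers(text, sep=""):
    # 'a' -> 1, ..., 'z' -> 26: the same rule as ord($1) - 96 in the Perl answers.
    # Characters outside a-z are passed through unchanged (an assumption).
    parts = [str(ord(c) - ord("a") + 1) if "a" <= c <= "z" else c for c in text]
    return sep.join(parts)

print(letters_to_numbers("abcdefghijklmnopqrstuvwxyz"))
print(letters_to_numbers("abcdefghijklmnopqrstuvwxyz", "-"))
```

Without a separator the output is ambiguous (is "12" "ab" or "l"?), which is why the separator form is useful if the string must be decoded later.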

In Perl, how many groups are in the matched regex?

I would like to tell the difference between a number 1 and string '1'.
The reason I want to do this is that I want to determine the number of capturing parentheses in a regular expression after a successful match. According to the perlop doc, the list (1) is returned when there are no capturing groups in the pattern. So if I get a successful match and the list (1), I cannot tell whether the pattern has no parens or has one paren that matched a '1'. I could resolve that ambiguity if there were a difference between the number 1 and the string '1'.
You can tell how many capturing groups are in the last successful match by using the special @+ array. $#+ is the number of capturing groups; if that's 0, there were no capturing parentheses.
For example, bitwise operators behave differently for strings and integers:
~1 = 18446744073709551614 (on a 64-bit perl)
~'1' = Î ('1' = 0x31, ~'1' = ~0x31 = 0xce = 'Î')
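#!/usr/bin/perl
use strict;
use warnings;

# Defined without a prototype: isstring() takes one argument
sub isstring {
    return ($_[0] & ~$_[0]);
}

my ($v) = ('1' =~ /(1)/);    # one capture group: returns the captured string '1'
print isstring($v) ? "string\n" : "int\n";
($v) = ('1' =~ /1/);         # no capture groups: returns the list (1)
print isstring($v) ? "string\n" : "int\n";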
isstring returns either 0 (the result of a numeric bitwise op), which is false, or "\0" (the result of a bitwise string op; see perldoc perlop), which is true because it is a non-empty string.
If you want to know the number of capture groups a regex matched, just count them. Don't look at the values they return, which appears to be your problem:
You can get the count from the result of the list assignment, which in scalar context returns the number of items on the right-hand side of the list assignment:
my $count = my @array = $string =~ m/.../g;
If you don't need to keep the capture buffers, assign to an empty list:
my $count = () = $string =~ m/.../g;
Or do it in two steps:
my @array = $string =~ m/.../g;
my $count = @array;
You can also use the @+ or @- variables, using some of the tricks I show in the first pages of Mastering Perl. These arrays hold the starting and ending positions of each of the capture buffers. The values in index 0 apply to the entire pattern, the values in index 1 are for $1, and so on. The last index, then, is the total number of capture buffers. See perlvar.
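For contrast, Python exposes the group count directly on the compiled pattern, roughly the analogue of checking $#+ in Perl, so the "no groups" versus "one group that matched '1'" ambiguity never arises. A minimal sketch (group_count is a hypothetical helper; Pattern.groups and Match.groups() are standard re features):

```python
import re

def group_count(pattern):
    # Number of capturing groups in the pattern, known without matching anything.
    return re.compile(pattern).groups

print(group_count(r"1"))      # 0
print(group_count(r"(1)"))    # 1
print(group_count(r"(?:1)"))  # 0  (non-capturing group)

# Match.groups() also keeps the two cases apart:
print(re.match(r"1", "1").groups())    # ()
print(re.match(r"(1)", "1").groups())  # ('1',)
```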
Perl converts between strings and numbers automatically as needed. Internally, it tracks the values separately. You can use Devel::Peek to see this in action:
use Devel::Peek;
$x = 1;
$y = '1';
Dump($x);
Dump($y);
The output is:
SV = IV(0x3073f40) at 0x3073f44
REFCNT = 1
FLAGS = (IOK,pIOK)
IV = 1
SV = PV(0x30698cc) at 0x3073484
REFCNT = 1
FLAGS = (POK,pPOK)
PV = 0x3079bb4 "1"\0
CUR = 1
LEN = 4
Note that the dump of $x has a value in the IV slot, while the dump of $y doesn't but does have a value in the PV slot. Also note that simply using the values in a different context can trigger stringification or numification and populate the other slots. For example, if you did $x . '' or $y + 0 before peeking at the value, you'd get this:
SV = PVIV(0x2b30b74) at 0x3073f44
REFCNT = 1
FLAGS = (IOK,POK,pIOK,pPOK)
IV = 1
PV = 0x3079c5c "1"\0
CUR = 1
LEN = 4
At which point 1 and '1' are no longer distinguishable at all.
Check for the definedness of $1 after a successful match. The logic goes like this:
If the list is empty then the pattern match failed
Else if $1 is defined then the list contains all the captured substrings
Else the match was successful, but there were no captures
Your question doesn't make a lot of sense, but it appears you want to know the difference between:
$a = "foo";
@f = $a =~ /foo/;
and
$a = "foo1";
@f = $a =~ /foo(1)?/;
They both return the same thing, regardless of whether a capture was made.
The answer is: don't try to use the returned array. Check whether $1 is not equal to "".