PowerShell Normalize List of Names - regex

I have some badly formatted names from a system, and I'm trying to match their first and last names against AD. I just need to parse the strings. The names look like this:
Hagstrom, N.P., Ana (Analise)
Banas, R.N., Cynthia
Saltzmann, N.P., April
Lee, Christopher
Rajaram, Pharm.D., Sharmee
Goode Jr, John (Jack) L
Reyes, R.N., Meghan
Miller, M.S., Adrienne M
Chavez, Gabriela
Stevens, MS, CCC-SLP, Christopher
Lockwood Flores, R.N., Jessica
I have tried this, but for some reason, the GivenName isn't being returned properly.
$Name = "Saltzmann, N.P., April"
$GivenName = $Name.Split(",")[$Name.Split(",").GetUpperBound(0)]
$SN = $Name.Split(",")[0]
If ($SN.IndexOf("-") -gt -1) {
$HypenLast = $SN.Split("-")[0]
$SNName = $SN.Split("-")[1]
}
If ($GivenName.IndexOf(" ") -gt -1) {
$GivenName = $GivenName.Replace("(","").Replace(")","").Split(" ")[0]
$MiddleName =$GivenName.Replace("(","").Replace(")","").Split(" ")[1]
}
I'm trying to take everything before the first comma and everything after the last comma, but only the letters before the second space of the first name.
That gives me LastName FirstName, which I then need to flip to FirstName LastName. Thanks.

All of the names could be piped to a script block that uses a regex with some named capture groups. The named capture group values can be extracted to rebuild the name you need using string interpolation.
$nameList | ForEach-Object {
$match = [Text.RegularExpressions.Regex]::Match($_, "(?<last>[\w\s]+),(?:.*,)?(?:\s*)(?<first>\w+)")
$lastName = $match.Groups["last"].Value
$firstName = $match.Groups["first"].Value
"$firstName $lastName"
}
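For readers who want to check the pattern outside PowerShell, here is a minimal Python sketch of the same idea. The sample list is taken from the names in the question; the `normalize` helper and the `NAME_RE` constant are names made up for this illustration, and the pattern is the one from the answer with Python's `(?P<name>...)` syntax for named groups.

```python
import re

# Sample input copied from the names listed in the question.
names = [
    "Hagstrom, N.P., Ana (Analise)",
    "Goode Jr, John (Jack) L",
    "Stevens, MS, CCC-SLP, Christopher",
    "Lockwood Flores, R.N., Jessica",
]

# last:     everything before the first comma
# (?:.*,)?  greedily skips any credentials up to the last comma
# first:    the first word after the last comma
NAME_RE = re.compile(r"(?P<last>[\w\s]+),(?:.*,)?\s*(?P<first>\w+)")


def normalize(name):
    """Return 'First Last' for a 'Last, [credentials,] First ...' string (hypothetical helper)."""
    m = NAME_RE.search(name)
    if m is None:
        return None
    return f"{m.group('first')} {m.group('last').strip()}"


normalized = [normalize(n) for n in names]
print(normalized)
```

Because `[\w\s]+` does not include a hyphen, a hyphenated last name would only be captured from the hyphen onward; the character class would need extending if such names occur.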

Related

Matching string between two markers that are filepaths and contain special characters

I'm trying to write a Ruby script that will return the text between two other strings. The issue is that the two marker strings contain special characters, and escaping the special characters is not solving the problem.
I've tried escaping special characters, different matching patterns, and passing the marker strings in as variables, without much luck.
I've also tested a simplified match by using only ODS and NAME as delimiters. That seemed to work.
####Example contents of logfile
#### 'aaaaaaaaa ODS | Filename = /tmp/bbbbbb | NAME = ccccc'
log_to_scan = 'logfile'
marker1 = 'ODS | FILENAME = /tmp/'
marker2 = ' | NAME'
contents = File.read(log_to_scan)
print contents.match(/ODS \| FILENAME = \/tmp\/(.*) \| NAME/m[1].strip
print contents.match(/marker1(.*)marker2/m)[1].strip
Given the sample contents above, I am expecting the output to be bbbbbb. However, I am getting either nothing or a NoMethodError. I'm not sure what else to try or what mistake I'm making.
str = 'aaaaaaaaa ODS | Filename = /tmp/bbbbbb | NAME = ccccc'
marker1 = 'ODS | FILENAME = /tmp/'
marker2 = ' | NAME'
r = /(?<=#{Regexp.escape(marker1)}).*(?=#{Regexp.escape(marker2)})/i
#=> /(?<=ODS\ \|\ FILENAME\ =\ \/tmp\/).*(?=\ \|\ NAME)/i
str[r]
#=> "bbbbbb"
or
r = /#{Regexp.escape(marker1)}(.*)#{Regexp.escape(marker2)}/i
str[r,1]
#=> "bbbbbb"
or, if the string to be matched is known to be lower-case, or it is permissible to return that string downcased:
s = str.downcase
#=> "aaaaaaaaa ods | filename = /tmp/bbbbbb | name = ccccc"
m1 = marker1.downcase
#=> "ods | filename = /tmp/"
m2 = marker2.downcase
#=> " | name"
id1 = s.index(m1) + m1.size
#=> 32
id2 = s.index(m2, id1+1) - 1
#=> 37
str[id1..id2]
#=> "bbbbbb"
See Regexp::escape. In #1,
(?<=#{Regexp.escape(marker1)})
is a positive lookbehind, requiring marker1 to appear immediately before the match.
(?=#{Regexp.escape(marker2)})
is a positive lookahead, requiring marker2 to immediately follow the match.
In #3, I used the form of String#index that takes a second argument ("offset").
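The escape-and-capture approach (#2 above) carries over to other regex engines. As a rough sketch in Python, assuming the same sample line and markers as in the question, `re.escape` plays the role of `Regexp.escape` and `re.IGNORECASE` the role of the `i` flag; the `between` function is a hypothetical helper, not part of any library.

```python
import re

line = "aaaaaaaaa ODS | Filename = /tmp/bbbbbb | NAME = ccccc"
marker1 = "ODS | FILENAME = /tmp/"
marker2 = " | NAME"


def between(text, start, end):
    """Return the text between two literal markers, ignoring case, or None."""
    # re.escape neutralises |, / and other metacharacters in the markers.
    pattern = re.escape(start) + r"(.*?)" + re.escape(end)
    m = re.search(pattern, text, flags=re.IGNORECASE)
    return m.group(1) if m else None


result = between(line, marker1, marker2)
print(result)  # bbbbbb
```

The case-insensitive flag matters here because the sample line contains `Filename` while the marker contains `FILENAME`.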
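Your original expression is fine. If the input might contain extra or missing spaces around the delimiters, it can be modified slightly to tolerate them:
^.+?ODS(\s+)?\|(\s+)?FILENAME(\s+)?=(\s+)?\/tmp\/(.+?)(\s+)?\|(\s+)?NAME(\s+)?=(\s+)?(.+?)$
The desired outputs are in the two (.+?) capturing groups: the first holds the filename and the second holds the value after NAME =. The i flag is needed because the sample line contains Filename while the pattern contains FILENAME.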
Test
re = /^.+?ODS(\s+)?\|(\s+)?FILENAME(\s+)?=(\s+)?\/tmp\/(.+?)(\s+)?\|(\s+)?NAME(\s+)?=(\s+)?(.+?)$/mi
str = 'aaaaaaaaa ODS | Filename = /tmp/bbbbbb | NAME = ccccc'
# Print the match result
str.scan(re) do |match|
puts match.to_s
end
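How about String#scanf? (scanf was removed from Ruby's standard library in 2.7, so newer versions need the scanf gem installed.)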
> require 'scanf'
> str = 'ODS | FILENAME = /tmp/ | NAME'
> str.scanf('ODS | FILENAME = %s | NAME')
=> ["/tmp/"]

How to Extract people's last name start with "S" and first name not start with "S"

As the title says, how do I capture a person whose:
Last name starts with the letter "S"
First name does NOT start with the letter "S"
The expression should match the entire last name, not just the first letter, and the first name should NOT be part of the match.
Input string is like the following:
(Last name) (First name)
Duncan, Jean
Schmidt, Paul
Sells, Simon
Martin, Jane
Smith, Peter
Stephens, Sheila
This is my regular expression:
/([S].+)(?:, [^S])/
Here is the result I have got:
Schmidt, P
Smith, P
The result includes the ",", the space, and the letter "P", all of which should be excluded.
The ideal match would be
Schmidt
Smith
You can try this pattern: ^S\w+(?=, [A-RT-Z]).
^S\w+ matches a word (the last name, in your case) that starts with S at the beginning of the line.
(?=, [A-RT-Z]) is a positive lookahead. It makes sure that what follows is a comma, a space, and a word (the first name) that does not start with S ([A-RT-Z] covers all capital letters except S), without including any of that in the match.
I did something similar to catch the initials. I've just updated the code to fit your need. Check it:
public static void Main(string[] args)
{
//Your code goes here
Console.WriteLine(ValidateName("FirstName LastName", 'L'));
}
private static string ValidateName(string name, char letter)
{
// Split name by space
string[] names = name.Split(new string[] {" "}, StringSplitOptions.RemoveEmptyEntries);
if (names.Count() > 0)
{
var firstInitial = names.First().ToUpper().First();
var lastInitial = names.Last().ToUpper().First();
if(!firstInitial.Equals(letter) && lastInitial.Equals(letter))
{
return names.Last();
}
}
return string.Empty;
}
In your current regex you capture the last name in a capturing group and match the rest in a non-capturing group. A non-capturing group still consumes text, so the ", P" ends up in the overall match.
If you change your non capturing group (?: into a positive lookahead (?= you would only capture the lastname.
([S].+)(?=, [^S]) or a bit shorter S.+(?=, [^S])
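To see the lookahead variant in action, here is a small Python sketch run against the sample list from the question. It combines the `^S\w+` prefix from the earlier answer with the `(?=, [^S])` lookahead; `LAST_NAME_RE` and `matches` are names chosen for this illustration.

```python
import re

people = [
    "Duncan, Jean",
    "Schmidt, Paul",
    "Sells, Simon",
    "Martin, Jane",
    "Smith, Peter",
    "Stephens, Sheila",
]

# ^S\w+       a last name starting with S
# (?=, [^S])  must be followed by ", " and a first name not starting with S;
#             the lookahead checks this without adding it to the match
LAST_NAME_RE = re.compile(r"^S\w+(?=, [^S])")

matches = []
for person in people:
    m = LAST_NAME_RE.match(person)
    if m:
        matches.append(m.group(0))

print(matches)  # ['Schmidt', 'Smith']
```

Each entry is matched separately here; if the names arrive as one multi-line string, the `re.MULTILINE` flag would be needed for `^` to match at every line start.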
Your regex worked fine for me once I added a second capturing group for the first name:
$array = ["Duncan, Jean","Schmidt, Paul","Sells, Simon","Martin, Jane","Smith, Peter","Stephens, Sheila"];
foreach($array as $el){
if(preg_match('/([S].+)(?:,)( [^S].+)/',$el,$matches))
echo $matches[2]."<br/>";
}
The output I got is
Paul
Peter

Search for multiple string occurrence in String via PHP

I am working on an ecommerce website using MVC,php. I have a field called description. The user can enter multiple product id's in the description field.
For example {productID = 34}, {productID = 58}
I am trying to get all product ID's from this field. Just the product ID.
How do I go about this?
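This solution doesn't use a capture group. Instead, it uses \K so that the full-match elements contain exactly what would otherwise be captured with parentheses. This is good practice because preg_match_all() then fills only the full-match array, instead of a full-match array plus a capture-group array, which halves the number of elements returned.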
$description="{productID = 34}, {productID = 58}";
if(preg_match_all('/productID = \K\d+/',$description,$ids)){
var_export($ids[0]);
}
// output: array(0=>'34',1=>'58')
// \K resets the start of the reported match: everything matched before it is dropped from the result
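Not every regex engine supports \K. As a point of comparison, Python's built-in re module does not, so the sketch below swaps in a fixed-width lookbehind, which gives the same "prefix required but not returned" effect for this input. The `description` string is the example from the question.

```python
import re

description = "{productID = 34}, {productID = 58}"

# (?<=productID = ) is a fixed-width lookbehind: the prefix must be present
# but is not part of the reported match, similar to what \K achieves in PCRE.
ids = re.findall(r"(?<=productID = )\d+", description)
print(ids)  # ['34', '58']
```

A lookbehind in Python must have a fixed width, so this only works when the spacing around `=` is constant; with variable spacing, a capture group such as `productID\s*=\s*(\d+)` would be the simpler choice.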
Without using regex, something like this should work for returning the string positions:
$html = "dddasdfdddasdffff";
$needle = "asdf";
$lastPos = 0;
$positions = array();
while (($lastPos = strpos($html, $needle, $lastPos)) !== false) {
    $positions[] = $lastPos;
    $lastPos = $lastPos + strlen($needle);
}
// Displays 3 and 10
foreach ($positions as $value) {
    echo $value . "<br />";
}

Powershell - Regex date range replace

I have an input file that contains some start dates. If a date is before a specific date, 1995-01-01 (YYYY-MM-DD format), I want to replace it with that minimum value, e.g.
<StartDate>1970-12-23</StartDate>
would be changed to
<StartDate>1995-01-01</StartDate>
<StartDate>1996-05-12</StartDate> is ok and would remain unchanged.
I was hoping to use regex replace but checking for the date range isn't working as expected. I was hoping to use something like this for the range check
\b(?:1900-01-(?:3[01]|2[1-31])|1995/01/01)\b
You can use a simple regex like '<StartDate>(\d{4}-\d{2}-\d{2})</StartDate>' to match <StartDate>, 4 digits, -, 2 digits, -, 2 digits, and </StartDate>. Then use a callback to parse the date captured in group 1 and compare it using Martin's DateTime code. If the date is before the minimum, emit the minimum date; otherwise, emit the captured one.
$callback = {
param($match)
$current = [DateTime]$match.Groups[1].Value
$minimum = [DateTime]'1995-01-01'
if ($minimum -gt $current)
{
'<StartDate>1995-01-01</StartDate>'
}
else {
'<StartDate>' + $match.Groups[1].Value + '</StartDate>'
}
}
$text = '<StartDate>1970-12-23</StartDate>'
$rex = [regex]'<StartDate>(\d{4}-\d{2}-\d{2})</StartDate>'
$rex.Replace($text, $callback)
To use it with Get-Content and Foreach-Object, you may define the $callback as above and use
$rex = [regex]'<StartDate>(\d{4}-\d{2}-\d{2})</StartDate>'
(Get-Content $path\$xml_in) | ForEach-Object {$rex.Replace($_, $callback)} | Set-Content $path\$outfile
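The same "match loosely, decide in a callback" technique exists in most regex libraries. The following Python sketch is only an illustration of the idea, not a translation of the script above: `clamp`, `MINIMUM` and the two-line sample `text` are assumptions made for the example.

```python
import re
from datetime import date

MINIMUM = date(1995, 1, 1)
START_DATE_RE = re.compile(r"<StartDate>(\d{4}-\d{2}-\d{2})</StartDate>")


def clamp(match):
    """Replacement callback: raise any date earlier than MINIMUM to MINIMUM."""
    current = date.fromisoformat(match.group(1))
    value = max(current, MINIMUM)
    return f"<StartDate>{value.isoformat()}</StartDate>"


# Hypothetical sample input: one date before the minimum, one after it.
text = "<StartDate>1970-12-23</StartDate>\n<StartDate>1996-05-12</StartDate>"
result = START_DATE_RE.sub(clamp, text)
print(result)
```

The regex only checks the shape of the date, so a value such as 1970-13-40 would reach `date.fromisoformat` and raise a `ValueError`; whether to catch that depends on how much the input is trusted.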
You don't have to use regex here. Just cast the dates to DateTime and compare them:
$currentDate = [DateTime]'1970-12-23'
$minDate = [DateTime]'1995-01-01'
if ($minDate -gt $currentDate)
{
$currentDate = $minDate
}

Matching the pattern with foreign character

Here I build a regular expression where _pattern is the list of team names and _name is the keyword I want to test against _pattern.
The result shows a match. I'm wondering how that is possible, because the keyword is totally different from the _pattern. I suspect it is related to the é symbol.
string _pattern = "Ipswich Town F.C.|Ipswich Town Football Club|Ipswich|The Blues||Town|The Tractor Boys|Ipswich Town";
string _name = "Estudiantes de Mérida";
regex = new Regex(@"(" + _pattern + @")", RegexOptions.IgnoreCase);
Match m = regex.Match(_name);
if (m.Success)
{
var g = m.Groups[1].Value;
break;
}
It has nothing to do with the é symbol. Let's go over a few things.
Is it intentional that your pattern contains two consecutive | characters, as in
The Blues||Town
Also, the dot has a special meaning in a regex, so you should escape it:
Ipswich Town F\.C\.
Each alternative can also be enclosed in parentheses to make the grouping explicit:
(Ipswich Town F.C.)|(Ipswich Town Football Club)|(Ipswich)|
The parentheses added in the following C# line are not necessary once the pattern has its own outer group:
regex = new Regex(@"(" + _pattern + @")"
Anyway, the reason it matches is the double ||. It creates an empty alternative, and an empty alternative matches the empty string at the start of any input, so the match succeeds with an empty group 1.
The regex that I would rewrite for your purposes is:
^((Ipswich Town F\.C\.)|(Ipswich Town Football Club)|(Ipswich)|(The Blues)|(Town)|(The Tractor Boys)|(Ipswich Town))$
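As you can see, there are quite a few differences: the empty alternative is gone, the dots are escaped, and the ^ and $ anchors require the whole input to be one of the names.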
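The effect of an empty alternative is easy to reproduce in other engines. Here is a short Python sketch, using a shortened version of the pattern from the question; the `strict` pattern at the end is one possible way to build a safer alternation and is an assumption of this example, not the code from the answer.

```python
import re

pattern = "Ipswich Town F.C.|Ipswich|The Blues||Town"
name = "Estudiantes de Mérida"

# The empty alternative between "||" matches the empty string at position 0,
# so the search succeeds even though no team name occurs in the input.
m = re.search("(" + pattern + ")", name, flags=re.IGNORECASE)
print(m is not None, repr(m.group(1)))  # True ''

# Drop empty entries, escape each literal, and anchor the whole alternation.
teams = [t for t in pattern.split("|") if t]
strict = re.compile(
    "^(?:" + "|".join(re.escape(t) for t in teams) + ")$",
    re.IGNORECASE,
)
strict_match = strict.search(name)
print(strict_match)  # None
```

Escaping each entry also stops the dots in "F.C." from matching arbitrary characters.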