Replace last comma with "or" using ColdFusion - regex

What is the best way to convert an array of values in ColdFusion, such as
[ Fed Jones, John Smith, George King, Wilma Abby]
into a list where the last comma is replaced by "or", like this:
Fed Jones, John Smith, George King or Wilma Abby
I thought REReplace might work but haven't found the right expression yet.

If you've got an array, combining the last element with an ArrayToList is the simplest way (as per Henry's answer).
If you've got it as a string, using rereplace is a valid method, and would work like so:
<cfset Names = rereplace( Names , ',\s*(?=[^,]+$)' , ' or ' ) />
Which says: match a comma (plus any whitespace after it), then check (without matching) that there are no more commas until the end of the string. That is only true for the last comma, so only it is replaced. Consuming the space after the comma avoids ending up with a double space before the last name.
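For comparison, here is a minimal Java sketch of the same lookahead idea (the class and method names are hypothetical). It also consumes the whitespace after the last comma, so "King, Wilma" becomes "King or Wilma" rather than "King or  Wilma":

```java
import java.util.regex.Pattern;

class LastCommaRegex {
    // A comma (plus any following whitespace) that is followed only by
    // non-comma characters up to the end of the string, i.e. the last comma.
    private static final Pattern LAST_COMMA = Pattern.compile(",\\s*(?=[^,]+$)");

    static String replaceLastComma(String names) {
        // Strings without a comma are returned unchanged.
        return LAST_COMMA.matcher(names).replaceFirst(" or ");
    }

    public static void main(String[] args) {
        System.out.println(replaceLastComma("Fed Jones, John Smith, George King, Wilma Abby"));
    }
}
```

It assumes the individual names never contain commas themselves.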

It'd be easier to manipulate the data at the array level first, before converting it into a list.
names = ["Fed Jones", "John Smith", "George King", "Wilma Abby"];
lastIndex = arrayLen(names);
last = names[lastIndex];
arrayDeleteAt(names, lastIndex);
result = arrayToList(names, ", ") & " or " & last;
// result == "Fed Jones, John Smith, George King or Wilma Abby"
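The same array-level approach as a rough Java sketch (hypothetical names). Unlike the CFScript snippet, it does not delete the last element from the input; it joins a sublist view instead, and it guards the empty and single-element cases, which the snippet above would turn into " or Fed Jones":

```java
import java.util.List;

class JoinWithOr {
    // Join all but the last element with ", ", then append " or " + last.
    static String joinWithOr(List<String> names) {
        if (names.isEmpty()) return "";
        if (names.size() == 1) return names.get(0);
        int last = names.size() - 1;
        // subList is a view, so the caller's list is left untouched
        return String.join(", ", names.subList(0, last)) + " or " + names.get(last);
    }

    public static void main(String[] args) {
        List<String> names = List.of("Fed Jones", "John Smith", "George King", "Wilma Abby");
        System.out.println(joinWithOr(names));
    }
}
```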

Another option is to work with the list/string, using listLast and the Java lastIndexOf() method of the result string.
<cfscript>
names = ["Fed Jones", "John Smith", "George King", "Wilma Abby"];
result = arraytoList(names,', ');
last = listLast(result);
result = listLen(result) gt 1 ? mid(result, 1, result.lastIndexOf(',')) & ' or' & last : result;
</cfscript>
<cfoutput>#result#</cfoutput>
Result:
Fed Jones, John Smith, George King or Wilma Abby
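The lastIndexOf() trick works the same way in plain Java. This sketch (hypothetical names) builds the ", "-separated string first and then splices " or " in at the last separator, assuming no name contains ", " itself:

```java
class LastIndexOfOr {
    // Replace the last ", " separator with " or ".
    static String replaceLastSeparator(String list) {
        int i = list.lastIndexOf(", ");
        if (i < 0) return list;   // zero or one item: nothing to replace
        return list.substring(0, i) + " or " + list.substring(i + 2);
    }

    public static void main(String[] args) {
        String list = String.join(", ", "Fed Jones", "John Smith", "George King", "Wilma Abby");
        System.out.println(replaceLastSeparator(list));
    }
}
```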

Related

Regex: cut optional right side

Given the following:
"John Smith"
"John Smith (123)"
"John Smith (123) (456)"
I'd like to capture:
"John Smith"
"John Smith", "123"
"John Smith (123)", "456"
What Java regex would allow me to do that?
I've tried (.+)\s\((\d+)\)$ and it works fine for "John Smith (123)" and "John Smith (123) (456)" but not for "John Smith". How can I change the regex to work for the first input as well?
You may turn the first .+ lazy, and wrap the later part with a non-capturing optional group:
(.+?)(?:\s\((\d+)\))?$
Actually, if you are using the regex with String#matches(), the trailing $ is redundant, because matches() already requires the entire string to match.
Details:
(.+?) - Group 1 capturing one or more characters other than a line break, as few as possible (thus allowing the subsequent subpattern to "fall" into a group)
(?:\s\((\d+)\))? - an optional sequence of a whitespace, (, Group 2 capturing 1+ digits and a )
$ - end of string anchor.
A Java demo:
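import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Demo {
    public static void main(String[] args) {
        String[] lst = {"John Smith", "John Smith (123)", "John Smith (123) (456)"};
        // No trailing $: matches() already requires the whole string to match
        Pattern p = Pattern.compile("(.+?)(?:\\s\\((\\d+)\\))?");
        for (String s : lst) {
            Matcher m = p.matcher(s);
            if (m.matches()) {
                System.out.println(m.group(1));       // name part
                if (m.group(2) != null)               // optional trailing number
                    System.out.println(m.group(2));
            }
        }
    }
}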

PowerShell Normalize List of Names

I have some really messed-up names from a system, and I'm trying to match them to first and last names in AD. I just need to parse the strings. I have names such as:
Hagstrom, N.P., Ana (Analise)
Banas, R.N., Cynthia
Saltzmann, N.P., April
Lee, Christopher
Rajaram, Pharm.D., Sharmee
Goode Jr, John (Jack) L
Reyes, R.N., Meghan
Miller, M.S., Adrienne M
Chavez, Gabriela
Stevens, MS, CCC-SLP, Christopher
Lockwood Flores, R.N., Jessica
I have tried this, but for some reason, the GivenName isn't being returned properly.
$Name = "Saltzmann, N.P., April"
$GivenName = $Name.Split(",")[$Name.Split(",").GetUpperBound(0)]
$SN = $Name.Split(",")[0]
If ($SN.IndexOf("-") -gt -1) {
$HypenLast = $SN.Split("-")[0]
$SNName = $SN.Split("-")[1]
}
If ($GivenName.IndexOf(" ") -gt -1) {
$GivenName = $GivenName.Replace("(","").Replace(")","").Split(" ")[0]
$MiddleName =$GivenName.Replace("(","").Replace(")","").Split(" ")[1]
}
I'm trying to take everything before the first comma and everything after the last comma, but only keep the letters before the second space of the first name.
That gives me LastName FirstName, but then I need to flip it to FirstName LastName. Thanks.
All of the names could be piped to a script block that uses a regex with some named capture groups. The named capture group values can be extracted to rebuild the name you need using string interpolation.
$nameList | ForEach-Object {
$match = [Text.RegularExpressions.Regex]::Match($_, "(?<last>[\w\s]+),(?:.*,)?(?:\s*)(?<first>\w+)")
$lastName = $match.Groups["last"].Value
$firstName = $match.Groups["first"].Value
"$firstName $lastName"
}
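The same named-capture pattern carries over to Java almost unchanged, since java.util.regex also supports (?<name>...) groups. A minimal sketch with hypothetical names; returning unparseable input unchanged is my own assumption:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NormalizeName {
    // "last" = text before the first comma; an optional ", credentials," run is
    // skipped greedily up to the last comma; "first" = first word after it.
    private static final Pattern NAME =
        Pattern.compile("(?<last>[\\w\\s]+),(?:.*,)?(?:\\s*)(?<first>\\w+)");

    static String firstLast(String raw) {
        Matcher m = NAME.matcher(raw);
        if (!m.find()) return raw;   // assumption: leave unparseable input as-is
        return m.group("first") + " " + m.group("last");
    }

    public static void main(String[] args) {
        String[] names = {"Hagstrom, N.P., Ana (Analise)", "Goode Jr, John (Jack) L",
                          "Stevens, MS, CCC-SLP, Christopher"};
        for (String n : names) System.out.println(firstLast(n));
    }
}
```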

Extract a substring if it has an exact match in another vector

Update: the first version of this question was implicitly asking how to extract a substring if it has ANY match in another vector, for which @Colonel Beauvel provided an elegant response:
This does the trick, base R:
newname = sapply(nametitle, function(u){
bool = sapply(name, function(x) grepl(x, u))
if(any(bool)) name[bool][1] else NA })
newname
John Smith, MD PhD Jane Doe, JD
"John" "Jane"
However, I did not realize that I was actually asking for a way to find exact matches until the function kindly contributed did not work for all elements in my vector. Therefore, the following is my revised question.
Say I have the following character vector of generic names and their academic degrees:
nametitle <- c("John Smith, MD PhD", "Jane Doe, JD", "John-Paul Jones, MS")
And I have a "look-up" vector of first names:
name <- c("John", "Jane", "Mark", "Steve")
What I want to do is search each element of nametitle, and if part of the element (i.e., a substring of each string) is an exact match of an element from name, write that element of name into a new vector newname; if there is no exact match, write the original value from nametitle.
Therefore, what I'd expect the proper function to do is return newname with the three elements below:
[1] "John" [2] "Jane" [3] "John-Paul Jones, MS"
I've attempted the following using the function contributed above:
newname = sapply(nametitle, function(u){
bool = sapply(name, function(x) grepl(x, u))
if(any(bool)) name[bool][1] else NA })
Which performs just fine for elements "John Smith, MD PhD" and "Jane Doe, JD", but not for "John-Paul Jones, MS": this element is replaced with "John" in the new vector newname.
There may be a simple change that can be made to the original function contributed by @Colonel Beauvel to resolve this issue, but using nested sapply functions is throwing me for a loop (pun intended?). Thanks.
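To see why the third element goes wrong, here is a rough Java analogue of the nested sapply/grepl logic (hypothetical names). One swap to note: grepl treats each name as a regex, while String.contains() is a plain substring test, which behaves the same for these simple names. Because "John" occurs inside "John-Paul", any substring test reports a hit:

```java
import java.util.List;

class SubstringLookup {
    // Return the first look-up name that occurs anywhere inside the entry,
    // or null (standing in for R's NA) if none does.
    static String firstContained(String entry, List<String> lookup) {
        for (String name : lookup) {
            if (entry.contains(name)) return name;
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> lookup = List.of("John", "Jane", "Mark", "Steve");
        for (String e : List.of("John Smith, MD PhD", "Jane Doe, JD", "John-Paul Jones, MS"))
            System.out.println(firstContained(e, lookup));
    }
}
```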
Here's an easy way. First, create a regex pattern based on your name vector:
pattern <- paste0(".*(?<=\\s|^)(", paste(name, collapse = "|"), ")(?=\\s|$).*")
# [1] ".*(?<=\\s|^)(John|Jane|Mark|Steve)(?=\\s|$).*"
If you use this pattern, a single sub command will do the trick:
sub(pattern, "\\1", nametitle, perl = TRUE)
# [1] "John" "Jane" "John-Paul Jones, MS"
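A Java sketch of the same single-pattern idea (hypothetical names). I've written the boundaries as (?<!\S) and (?!\S), which mean "preceded/followed by whitespace or a string edge" just like (?<=\s|^) and (?=\s|$), and quoted each name so it is matched literally. replaceFirst("$1") plays the role of sub(): on a match the whole entry becomes group 1, otherwise it is left unchanged:

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class ExactNameMatch {
    // Builds ".*(?<!\S)(John|Jane|...)(?!\S).*" from the look-up names.
    static Pattern buildPattern(List<String> names) {
        String alternation = names.stream()
                .map(Pattern::quote)              // treat each name literally
                .collect(Collectors.joining("|"));
        return Pattern.compile(".*(?<!\\S)(" + alternation + ")(?!\\S).*");
    }

    static String extract(String entry, Pattern p) {
        return p.matcher(entry).replaceFirst("$1");
    }

    public static void main(String[] args) {
        Pattern p = buildPattern(List.of("John", "Jane", "Mark", "Steve"));
        for (String e : List.of("John Smith, MD PhD", "Jane Doe, JD", "John-Paul Jones, MS"))
            System.out.println(extract(e, p));
    }
}
```

"John-Paul" is left alone because the character after "John" is "-", which is not whitespace.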

Remove trailing punctuation from concatenated string

I have several string variables that I would like to turn into a comma-separated string in one variable. When I use egen concat with the punct(", ") option, I get trailing commas whenever some of the variables in that row are missing, which is common in my data.
I thought that I could remove the trailing commas with regexm() and a for loop, but my concatenated string variable doesn't change.
How do I get this REGEX to match in Stata? (Or maybe I'm on totally the wrong path.)
clear
input str5 name1 str5 name2 str5 name3
Tom Dick Harry
Tom "" ""
end
ds name*
local n: word count `r(varlist)'
display `n'
egen names = concat(name*), punct(", ")
generate names2 = names
forvalues i = 1/`n' {
replace names2 = regexr(names2, ",.$", "")
}
list
This provides:
. list
+-------------------------------------------------------------+
| name1 name2 name3 names names2 |
|-------------------------------------------------------------|
1. | Tom Dick Harry Tom, Dick, Harry Tom, Dick, Harry |
2. | Tom Tom, , Tom, , |
+-------------------------------------------------------------+
egen's concat() function just implements a loop. You can write your own instead:
gen names = name1
forval j = 2/3 {    // name1-name3 in the example data; adjust to your number of variables
replace names = cond(mi(names), name`j', names + "," + name`j') if !mi(name`j')
}
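The hand-written loop above boils down to "only add a separator when the next value is non-missing". A minimal Java sketch of that idea, assuming an empty string stands in for Stata's missing string (names are hypothetical):

```java
import java.util.List;

class ConcatNonMissing {
    // Append each non-empty value, inserting the separator only between
    // values that are actually present, so no trailing separator appears.
    static String concat(List<String> values, String sep) {
        StringBuilder sb = new StringBuilder();
        for (String v : values) {
            if (v == null || v.isEmpty()) continue;   // skip "missing" values
            if (sb.length() > 0) sb.append(sep);
            sb.append(v);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(concat(List.of("Tom", "", "", "Hank"), ", "));
    }
}
```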
Does something like this work for your data?
clear
input str5 name1 str5 name2 str5 name3 str5 name4
Tom Dick Harry Hank
Tom "" "" Hank
Tom "" Harry "" Hank
Tom "" "" ""
end
list
egen names = concat(name*), punct(" ")
gen names2 = subinstr(itrim(names), " ", ", ", .)
list
If your string variables have spaces, e.g. "Hank and Gloria", that will fail.
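To make the caveat concrete, here is a rough Java analogue of the concat-with-spaces, itrim, subinstr chain (hypothetical names). The trim() call is an extra assumption of mine to drop leading and trailing blanks; itrim itself only collapses internal runs:

```java
class CollapseThenSubstitute {
    // Join with single spaces, collapse runs of blanks, then turn every
    // remaining space into ", ". Spaces inside a value get replaced too.
    static String concat(String... values) {
        String joined = String.join(" ", values);
        String collapsed = joined.trim().replaceAll(" {2,}", " ");
        return collapsed.replace(" ", ", ");
    }

    public static void main(String[] args) {
        System.out.println(concat("Tom", "", "", "Hank"));
        System.out.println(concat("Hank and Gloria"));   // shows the failure case
    }
}
```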

Parse 'family' names into people + last name with regex

Given the following string, I'd like to parse into a list of first names + a last name:
Peter-Paul, Mary & Joël Van der Winkel
(and the simpler versions)
I'm trying to work out if I can do this with a regex. I've got this far
(?:([^, &]+))[, &]*(?:([^, &]+))
But the problem here is that I'd like the last name to be captured in a different capture.
I suspect I'm beyond what's possible, but just in case...
UPDATE
Extracting captures from the group was new for me, so here's the (C#) code I used:
using System;
using System.Text.RegularExpressions;

string familyName = "Peter-Paul, Mary & Joël Van der Winkel";
string firstperson = @"^(?<First>[-\w]+)"; // .NET syntax for named capture
string lastname = @"\s+(?<Last>.*)";
string others = @"(?:(?:\s*[,|&]\s*)(?<Others>[-\w]+))*";
var reg = new Regex(firstperson + others + lastname);
var groups = reg.Match(familyName).Groups;
Console.WriteLine("LastName=" + groups["Last"].Value);
Console.WriteLine("First person=" + groups["First"].Value);
foreach (Capture firstname in groups["Others"].Captures)
    Console.WriteLine("Other person=" + firstname.Value);
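Java's java.util.regex has no equivalent of .NET's Captures collection: a repeated group only keeps its last repetition (here it would hold just "Joël"). A workaround, sketched below with hypothetical names, is to capture the whole run of ", name" / "& name" pieces in one group and split it afterwards. Two assumptions: I dropped the literal | from the [,|&] class, and UNICODE_CHARACTER_CLASS is needed so \w matches the "ë" in "Joël":

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FamilyNameParser {
    private static final Pattern FAMILY = Pattern.compile(
        "^(?<first>[-\\w]+)(?<others>(?:\\s*[,&]\\s*[-\\w]+)*)\\s+(?<last>.*)",
        Pattern.UNICODE_CHARACTER_CLASS);

    // Returns the first names followed by the last name (empty list if no match).
    static List<String> parse(String family) {
        Matcher m = FAMILY.matcher(family);
        if (!m.find()) return List.of();
        List<String> parts = new ArrayList<>();
        parts.add(m.group("first"));
        for (String other : m.group("others").split("\\s*[,&]\\s*")) {
            if (!other.isEmpty()) parts.add(other);   // split leaves a leading ""
        }
        parts.add(m.group("last"));
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(parse("Peter-Paul, Mary & Jo\u00ebl Van der Winkel"));
    }
}
```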
I had to tweak the accepted answer slightly to get it to cover cases such as:
Peter-Paul&Joseph Van der Winkel
Peter-Paul & Joseph Van der Winkel
Assuming a first name cannot be two words separated by a space (otherwise Peter Paul Van der Winkel is not automatically parsable), the following set of rules applies:
(first name), then any number of (, first name) or (& first name)
Everything left is the last name.
^([-\w]+)(?:(?:\s?[,|&]\s)([-\w]+)\s?)*(.*)
Seems that this might do the trick:
((?:[^, &]+\s*[,&]+\s*)*[^, &]+)\s+([^,&]+)
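A quick way to check that two-group pattern is to run it unchanged (apart from Java string escaping) with find(). In this sketch (hypothetical names), group 1 is the run of first names with their separators and group 2 is the last name; splitting group 1 into individual first names would still be a separate step:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TwoGroupFamilyName {
    private static final Pattern P =
        Pattern.compile("((?:[^, &]+\\s*[,&]+\\s*)*[^, &]+)\\s+([^,&]+)");

    // Returns {firstNamesPart, lastName}, or null if the pattern does not match.
    static String[] split(String family) {
        Matcher m = P.matcher(family);
        return m.find() ? new String[] {m.group(1), m.group(2)} : null;
    }

    public static void main(String[] args) {
        String[] r = split("Peter-Paul, Mary & Jo\u00ebl Van der Winkel");
        System.out.println(r[0] + " | " + r[1]);
    }
}
```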