convert comma-separated string-pairs with regex


I have a comma-separated list of first and last names which I need to convert to SQL
(whitespace exists after the comma):
joe, cool
alice, parker
etc.
should become:
( firstname ='joe' and lastname = 'cool' ) or
( firstname ='alice' and lastname = 'parker' )
How can I achieve this with a regular expression?

In Perl you can do this:
s/(\S+),\s*(\S+)/( firstname ='\1' and lastname = '\2' )/
From the command line:
> perl -pe "s/(\S+),\s*(\S+)/( firstname ='\1' and lastname = '\2' )/" input.txt
Input:
joe, cool
alice, parker
Output:
( firstname ='joe' and lastname = 'cool' )
( firstname ='alice' and lastname = 'parker' )
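The Perl substitution prints each condition on its own line, but it does not add the or between them. Below is a rough Python equivalent, used here only for illustration; pairs_to_sql is a hypothetical name. It collects every pair with the same (\S+),\s*(\S+) pattern and joins the conditions with or. The names are pasted into the SQL verbatim, so a name containing a single quote would break the statement. For real queries, parameterized queries are the safer route.

```python
import re

# Assumption: each pair is "first, last" with no spaces inside a name,
# mirroring the \S+ groups in the Perl answer.
PAIR = re.compile(r"(\S+),\s*(\S+)")


def pairs_to_sql(text: str) -> str:
    """Build one condition per pair and join the conditions with 'or'."""
    conditions = [
        f"( firstname ='{first}' and lastname = '{last}' )"
        for first, last in PAIR.findall(text)
    ]
    return " or\n".join(conditions)


print(pairs_to_sql("joe, cool\nalice, parker"))
```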

Related

PowerShell Normalize List of Names

I have some really messy names from a system, and I'm trying to match them to first and last names in AD. I just need to parse the strings. I have names such as:
Hagstrom, N.P., Ana (Analise)
Banas, R.N., Cynthia
Saltzmann, N.P., April
Lee, Christopher
Rajaram, Pharm.D., Sharmee
Goode Jr, John (Jack) L
Reyes, R.N., Meghan
Miller, M.S., Adrienne M
Chavez, Gabriela
Stevens, MS, CCC-SLP, Christopher
Lockwood Flores, R.N., Jessica
I have tried this, but for some reason, the GivenName isn't being returned properly.
$Name = "Saltzmann, N.P., April"
$GivenName = $Name.Split(",")[$Name.Split(",").GetUpperBound(0)]
$SN = $Name.Split(",")[0]
If ($SN.IndexOf("-") -gt -1) {
    $HypenLast = $SN.Split("-")[0]
    $SNName = $SN.Split("-")[1]
}
If ($GivenName.IndexOf(" ") -gt -1) {
    $GivenName = $GivenName.Replace("(","").Replace(")","").Split(" ")[0]
    $MiddleName =$GivenName.Replace("(","").Replace(")","").Split(" ")[1]
}
I'm trying to take everything before the first comma and everything after the last comma, but keep only the letters before the second space of the first name.
I'm trying to get LastName FirstName, but then I need to flip it to FirstName LastName. Thanks.

All of the names could be piped to a script block that uses a regex with some named capture groups. The named capture group values can be extracted to rebuild the name you need using string interpolation.
$nameList | ForEach-Object {
    # "last" = everything before the first comma; "first" = first word after the last comma
    $match = [Text.RegularExpressions.Regex]::Match($_, "(?<last>[\w\s]+),(?:.*,)?\s*(?<first>\w+)")
    $lastName = $match.Groups["last"].Value
    $firstName = $match.Groups["first"].Value
    "$firstName $lastName"
}
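For comparison, here is a minimal Python sketch of the same named-group approach; normalize is a hypothetical helper name. It also explains why the attempt in the question comes back empty. Split(",") leaves a leading space on " April", so Split(" ")[0] is the empty string. The regex avoids this by consuming that whitespace with \s* before it captures the first name. Strings without a comma return None instead of an empty name.

```python
import re

# Same pattern as the PowerShell answer, using Python's (?P<name>...) syntax.
NAME = re.compile(r"(?P<last>[\w\s]+),(?:.*,)?\s*(?P<first>\w+)")


def normalize(name: str):
    """Return 'First Last', or None when the string does not fit the pattern."""
    m = NAME.match(name)
    if m is None:
        return None
    return f"{m.group('first')} {m.group('last')}"


for raw in ["Saltzmann, N.P., April", "Goode Jr, John (Jack) L", "Stevens, MS, CCC-SLP, Christopher"]:
    print(normalize(raw))
```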

Remove trailing punctuation from concatenated string

I have several string variables that I would like to turn into a comma-separated string in one variable. When I use egen concat with the punct(", ") option, I get trailing commas whenever the last variables in a row are empty, which is common in my data.
I thought that I could remove the trailing commas with regexm() and a for loop, but my concatenated string variable doesn't change.
How do I get this REGEX to match in Stata? (Or maybe I'm on totally the wrong path.)
clear
input str5 name1 str5 name2 str5 name3
Tom Dick Harry
Tom "" ""
end
ds name*
local n: word count `r(varlist)'
display `n'
egen names = concat(name*), punct(", ")
generate names2 = names
forvalues i = 1/`n' {
    replace names2 = regexr(names2, ",.$", "")
}
list
This provides:
. list

     +-------------------------------------------------------------+
     | name1   name2   name3              names             names2 |
     |-------------------------------------------------------------|
  1. |   Tom    Dick   Harry   Tom, Dick, Harry   Tom, Dick, Harry |
  2. |   Tom                             Tom, ,             Tom, , |
     +-------------------------------------------------------------+

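egen's concat() function just implements a loop. You can write your own instead, which adds a separator only between non-missing values:
gen names = name1
forval j = 2/3 {
    // append the j-th name only if it is non-missing; add ", " only if names already holds something
    replace names = cond(mi(names), name`j', names + ", " + name`j') if !mi(name`j')
}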
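The loop above skips empty values instead of cleaning up separators afterwards. Here is the same idea as a small Python sketch. join_nonempty is a hypothetical name, and blank strings stand in for Stata's missing string values. Empty or whitespace-only values are dropped before joining, so no trailing or doubled separators appear, and values that contain spaces survive intact.

```python
def join_nonempty(values, sep=", "):
    """Join only the non-blank values, so no stray separators are produced."""
    # Blank strings play the role of Stata's missing string values here.
    return sep.join(v for v in values if v.strip())


rows = [
    ["Tom", "Dick", "Harry"],
    ["Tom", "", ""],
    ["Tom", "", "Harry"],
]
for row in rows:
    print(join_nonempty(row))
```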

Does something like this work for your data?
clear
input str5 name1 str5 name2 str5 name3 str5 name4
Tom Dick Harry Hank
Tom "" "" Hank
Tom "" Harry Hank
Tom "" "" ""
end
list
egen names = concat(name*), punct(" ")
gen names2 = subinstr(trim(itrim(names)), " ", ", ", .)
list
This will fail if the string values themselves contain spaces, e.g. "Hank and Gloria".
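If you do want a regex, note what the pattern in the question does. ,.$ requires exactly one character after the final comma and removes at most one separator per call. A single pass can remove the whole trailing run if you repeat the separator group and anchor it at the end of the string. The sketch below is Python rather than Stata, and strip_trailing_separators is a hypothetical name. It only handles trailing separators, so an empty value in the middle (Tom, , Harry) is left alone.

```python
import re

# One or more "comma + optional whitespace" separators at the very end.
TRAILING_SEPS = re.compile(r"(?:,\s*)+$")


def strip_trailing_separators(s: str) -> str:
    return TRAILING_SEPS.sub("", s)


print(strip_trailing_separators("Tom, , "))
```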

How can I use Regular Expression to convert String to different substring?

I have a text file containing lines similar to these:
000001 , Line 1 of text , customer 1 name
000002 , Line 2 of text , customer 2 name
000003 , Line 3 of text , customer 3 name
= = =
= = =
= = =
000087 , Line 87 of text, customer 87 name
= = =
= = =
001327 , Line 1327 of text, customer 1327 name
= = =
= = =
= = =
I can write a program that reads each line of the above file to convert it to the following format:
000001 , 1st Line , 1st Customer name
000002 , 2nd Line , 2nd Customer name
000003 , 3rd Line , 3rd Customer name
= = =
= = =
= = =
000087 , 87th Line, 87th Customer name
= = =
= = =
001327 , 1327th Line, 1327th Customer name
= = =
= = =
= = =
My question: is there a straightforward method to achieve the same output using a regular expression?
I tried the following:
Dim pattern As String = "(\d{6}) , (Line \d+ of text) , (customer \d name)"
Dim replacement As String = " $1 , $2 Line , $3 Customer name "
Dim rgx As New Regex(pattern)
Dim result As String = rgx.Replace(my_input_file, replacement)
but the result is far from the desired output.
Please help

Your regex captures too much. The groups should capture only digits:
Dim pattern As String = "(\d{6}) , Line (\d+) of text , customer (\d+) name"
Also, since you want to replace the numbers with ordinal numbers, you should use String.Format to do the formatting instead, line by line:
Dim rgx As New Regex(pattern)
Dim m As Match = rgx.Match(my_input_file_line)
If m.Success Then
    Dim outputLine As String = String.Format(" {0} , {1} Line , {2} Customer name", _
        m.Groups(1).Value, GetOrdinal(m.Groups(2).Value), GetOrdinal(m.Groups(3).Value))
End If
where GetOrdinal is a helper method (not shown) that converts a numeric string to its ordinal form, e.g. "1" to "1st".
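The answer leaves GetOrdinal undefined. Here is a minimal Python sketch of such a helper plus the per-line conversion; get_ordinal and convert_line are hypothetical names. The last digit picks st, nd, rd or th, except that numbers ending in 11, 12 or 13 always take th. The pattern adds \s* before the second comma. That is an assumption based on sample lines such as 000087, which have no space between "text" and the comma. The output uses " , " spacing throughout.

```python
import re

# Pattern from the answer, plus \s* so "of text," and "of text ," both match.
LINE = re.compile(r"(\d{6}) , Line (\d+) of text\s*, customer (\d+) name")


def get_ordinal(number: str) -> str:
    """Hypothetical GetOrdinal: '1' -> '1st', '12' -> '12th', '1327' -> '1327th'."""
    n = int(number)
    if n % 100 in (11, 12, 13):
        suffix = "th"  # 11th, 12th, 13th are exceptions to the last-digit rule
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def convert_line(line: str) -> str:
    m = LINE.match(line)
    if m is None:
        return line  # leave non-matching lines untouched
    return "{} , {} Line , {} Customer name".format(
        m.group(1), get_ordinal(m.group(2)), get_ordinal(m.group(3)))


print(convert_line("000087 , Line 87 of text, customer 87 name"))
```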

Your capturing groups are too big. What you want to capture are the numbers.
Replace (\d{6}) , Line (\d+) of text , customer (\d+) name
by $1 , $2th Line , $3th Customer name
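Then replace 1th by 1st
Then replace 2th by 2nd
Then replace 3th by 3rd
(Careful: done naively, these replacements also turn 11th, 12th and 13th into 11st, 12nd and 13rd.)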
I do not know if you intended to match the actual customer name and reorder it. Did you?
If so, you could use (with the global and multiline flags)
^(\d{6}) , Line (\d+) of text , ([^ ]+) (\d+) ([^ ]+)$
and replace with $1 , $2th Line , $4th $3 $5
Tip: I always use http://www.gskinner.com/RegExr/ to test my patterns and experiment with them!
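Here is the same two-step replace idea as a Python sketch; to_ordinal_text is a hypothetical name. The first substitution captures only the numbers and appends th to each one. The follow-up substitutions use a negative lookbehind, (?<!1), so that 11, 12 and 13 keep th while 1, 21, 101 and so on become st, nd or rd. The sketch assumes the line layout from the question, with optional whitespace before the second comma.

```python
import re


def to_ordinal_text(text: str) -> str:
    # Step 1: capture only the numbers and append "th" to each of them.
    text = re.sub(
        r"(?m)^(\d{6}) , Line (\d+) of text\s*, customer (\d+) name$",
        r"\1 , \2th Line , \3th Customer name",
        text,
    )
    # Step 2: fix the suffix for numbers ending in 1, 2 or 3; the lookbehind
    # skips 11, 12 and 13 (and 111, 212, ...), which correctly keep "th".
    text = re.sub(r"(?<!1)1th\b", "1st", text)
    text = re.sub(r"(?<!1)2th\b", "2nd", text)
    text = re.sub(r"(?<!1)3th\b", "3rd", text)
    return text


print(to_ordinal_text("000011 , Line 11 of text , customer 11 name"))
```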

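Is there a reason for using regex? Maybe I have misunderstood the requirement, but it seems to be a fixed format where only the first part matters, so you could use this simple query:
using System.Collections.Generic;
using System.IO;
using System.Linq;
IEnumerable<string> lines = File.ReadLines(@"folder\input_text.txt");
IEnumerable<string> result = lines
    .Where(l => l.Trim().Length > 0)
    .Select(l => int.Parse(l.Split(',').First().Trim()))
    .Select(num => string.Format("{0} , {1} Line , {1} Customer name"
        , num.ToString("D6")
        , num + OrdinalSuffix(num)));
// English ordinal suffix: 11, 12 and 13 take "th"; otherwise the last digit decides.
static string OrdinalSuffix(int n) =>
    n % 100 >= 11 && n % 100 <= 13 ? "th"
    : n % 10 == 1 ? "st" : n % 10 == 2 ? "nd" : n % 10 == 3 ? "rd" : "th";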
You can use File.WriteAllLines to write the result to the output file:
File.WriteAllLines(@"folder\desired_output.txt", result);

Replace last comma with or using ColdFusion

What is the best way to convert an array of values in ColdFusion
[ Fed Jones, John Smith, George King, Wilma Abby]
into a list where the last comma is replaced by "or":
Fed Jones, John Smith, George King or Wilma Abby
I thought REReplace might work but haven't found the right expression yet.

If you've got an array, combining the last element with an ArrayToList is the simplest way (as per Henry's answer).
If you've got it as a string, using rereplace is a valid method, and would work like so:
<cfset Names = rereplace( Names , ',\s*(?=[^,]+$)' , ' or ' ) />
This matches a comma plus any whitespace after it, then checks (without consuming anything) that no more commas occur before the end of the string. That is only true for the last comma, so only that one is replaced.
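Here is a Python sketch of the same lookahead; replace_last_comma is a hypothetical name. Consuming the whitespace after the comma with \s* prevents a double space before the last name. A string with no comma at all is returned unchanged.

```python
import re

# The last comma: followed by non-comma characters all the way to the end.
LAST_COMMA = re.compile(r",\s*(?=[^,]+$)")


def replace_last_comma(names: str) -> str:
    return LAST_COMMA.sub(" or ", names, count=1)


print(replace_last_comma("Fed Jones, John Smith, George King, Wilma Abby"))
```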

It'd be easier to work at the array level first, before converting into a list.
names = ["Fed Jones", "John Smith", "George King", "Wilma Abby"];
lastIndex = arrayLen(names);
last = names[lastIndex];
arrayDeleteAt(names, lastIndex);
result = arrayToList(names, ", ") & " or " & last;
// result == "Fed Jones, John Smith, George King or Wilma Abby"

Another option is to work with a list/string, using listLast() and the Java lastIndexOf() method of the result string.
<cfscript>
names = ["Fed Jones", "John Smith", "George King", "Wilma Abby"];
result = arraytoList(names,', ');
last = listLast(result);
result = listLen(result) gt 1 ? mid(result, 1, result.lastIndexOf(',')) & ' or' & last : result;
</cfscript>
<cfoutput>#result#</cfoutput>
Result:
Fed Jones, John Smith, George King or Wilma Abby

Regex Find One Word But Not Another

How would I detect if a string contained for instance my first name but not my last name using only a regular expression?

The easiest way to use regular expressions for this is to use simple regexes and logical connectives. Since you only want simple matches and since you didn't list a language, here is a basic implementation in Perl:
my $str1="firstname lastname blah blah blah";
my $str2="blurg firstname etc";
foreach($str1,$str2)
{
    if(/firstname/ and !/lastname/)
    {
        print "$_ matched firstname and not lastname!\n";
    }
    else
    {
        print "No match for $_\n";
    }
}
As expected, the output is:
No match for firstname lastname blah blah blah
blurg firstname etc matched firstname and not lastname!

How about:
#!/usr/bin/perl
use Modern::Perl;
while (<DATA>) {
chomp;
    say /^(?=.*\bfirstname\b)(?!.*\blastname\b)/ ? "OK : $_" : "KO : $_";
}
__DATA__
jhjg firstname jkhkjh lastname kljh
jhgj lastname kjhjk firstname kjhkjh
jhgdf firstname sjhdfg not_my_lastname
jshgdf no_my_lastname jhg firstname jkhghg
output:
KO : jhjg firstname jkhkjh lastname kljh
KO : jhgj lastname kjhjk firstname kjhkjh
OK : jhgdf firstname sjhdfg not_my_lastname
OK : jshgdf no_my_lastname jhg firstname jkhghg
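Here is a Python sketch of the lookahead pattern above; first_without_last is a hypothetical name. The names pass through re.escape, so characters such as . in a real name are matched literally. The \b word boundaries are why not_my_lastname does not count as lastname: _ is a word character, so there is no boundary before "lastname".

```python
import re


def first_without_last(line: str, first: str = "firstname", last: str = "lastname") -> bool:
    # (?=...) requires `first` somewhere as a whole word;
    # (?!...) rejects the line if `last` appears anywhere as a whole word.
    pattern = rf"^(?=.*\b{re.escape(first)}\b)(?!.*\b{re.escape(last)}\b)"
    return re.search(pattern, line) is not None


for line in [
    "jhjg firstname jkhkjh lastname kljh",
    "jhgdf firstname sjhdfg not_my_lastname",
]:
    print("OK" if first_without_last(line) else "KO", ":", line)
```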

^(?=.*firstname)(?=.*lastname)
Many regex flavors support something like this. I'm using zero-width lookaheads to search for your firstname and your lastname. They don't "move" the regex cursor, so both are scanned starting from the first character. The regex will fail if firstname or lastname isn't present. To require instead that lastname is absent, as the question asks, make the second lookahead negative: ^(?=.*firstname)(?!.*lastname)
In practice, the regex should be more complex, otherwise you could have these situations:
firstname = `name`
lastname = `lastname`
lastname // ok with the given rules
and with
firstname = `firstname`
lastname = `lastname`
firstnamelastname // ok with the given rules, even without the space
xfirstnamex xlastnamex // ok with the given rules
The regex would need to be:
^(.*\bfirstname\b.*\blastname\b)|(.*\blastname\b.*\bfirstname\b)
This checks both orders of firstname and lastname, and requires a word boundary before and after each name.
I'll add that what I have shown are perfect examples of what not to do. You want to use regexes? You don't! First try your language's string functions. Then, if they fall short, you can try regexes.
