Tidy up names using regexp in google sheets script


I'm looking for a neat way to standardize names, using a Google Sheets script and a regexp.
Each value (all in caps) has a first name composed of 1 or 2 words.
Then comes the second name, given only as initial(s). If it is a compound name, there are 2 initials, and in that case the formatting varies greatly.
What I want to achieve, using Google Apps Script: keep the first name in full, and standardize the second name as one or two characters followed by a dot.
"LE FARD MA." , "LE FARD M.A." , "LE FARD M-A" , "LE FARD M A." , "LE FARD M. A." , "LE FARD M.-A." , "LE FARD M-A." : should all return "LE FARD MA."
"FARD P." , "FARD P" : should return "FARD P."
I have been trying to learn about regexp for the past few hours, but can't manage to find an answer. Any help is much appreciated.

Thanks to the comment of s1c0j1, I managed to get a code working.
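function Test() {
  var a = "LE FARD M-A.";
  // space(s), first initial, optional separator (dot, space, hyphen), optional second initial, optional dot
  var regex = /\s+(\w?)(?:\.?\s?-?)(\w?)\.?$/;
  // exec() returns the whole match in res[0] and the two initials in res[1] and res[2]
  var res = regex.exec(a);
  var haha = a.replace(res[0], " " + res[1] + res[2] + ".");
  Logger.log(haha); // LE FARD MA.
}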
Maybe not the most elegant way, but problem solved, thanks.
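
For reference, the same clean-up can probably be done in a single replace call with a callback. This is only a sketch under some assumptions: tidyName is a hypothetical helper name, the input is all caps as described in the question, and the pattern differs slightly from the one above (it uses [A-Z] and a character class for the separators).

```javascript
// Hypothetical helper: normalise the trailing initial(s) of an all-caps name.
function tidyName(name) {
  // space(s), first initial, any mix of dots/spaces/hyphens,
  // optional second initial, optional trailing dot, end of string
  var pattern = /\s+([A-Z])[.\s-]*([A-Z]?)\.?\s*$/;
  return name.trim().replace(pattern, function (match, first, second) {
    return " " + first + second + ".";
  });
}

// Sample values taken from the question
var samples = ["LE FARD MA.", "LE FARD M.A.", "LE FARD M-A", "LE FARD M A.",
               "LE FARD M. A.", "LE FARD M.-A.", "LE FARD M-A.", "FARD P.", "FARD P"];
var tidied = samples.map(tidyName);
```

A value with no trailing initial that fits the pattern (for example "LE FARD" on its own) is returned unchanged.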

Related

Google Sheets RegexpReplace with computable replacers

I'm trying to replace a pattern with a string computed by other Google Sheets functions. For example, I want to make every integer in the string ten times larger: "I want to multiply 2 numbers in this string by 10" should turn into "I want to multiply 20 numbers in this string by 100".
Assuming, for brevity, that my string is in cell A1, I tried this construction:
REGEXREPLACE(A1, "([0-9]+)", TEXT(10*VALUE("$1"),"###"))
But it seems REGEXREPLACE evaluates its arguments first and only then applies the regular expression. So the 3rd argument is converted as follows:
TEXT(10*VALUE("$1"),"###") => TEXT(10*1,"###") => "10"
and then every integer in the string is simply replaced with 10.
It turns out I need the group $1 to be substituted BEFORE the outer functions in the 3rd argument are evaluated. Is there any way to do such a thing?

Maybe there's another way. See if this works
=join(" ", ArrayFormula(if(isnumber(split(A1, " ")), split(A1, " ")*10, split(A1, " "))))

try:
=ARRAYFORMULA(JOIN(" ", IFERROR(SPLIT(A1, " ")*10, SPLIT(A1, " "))))
or:
=ARRAYFORMULA(JOIN(" ", IF(ISNUMBER(SPLIT(A1, " ")), SPLIT(A1, " ")*10, SPLIT(A1, " "))))
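
If a script is acceptable, a per-match computed replacement is what String.replace with a callback does. The following is a sketch, not a built-in: MULTIPLY_NUMBERS is a hypothetical name, and it assumes the numbers are plain non-negative integers. Defined in the spreadsheet's Apps Script project, it could be called from a cell as a custom function, e.g. =MULTIPLY_NUMBERS(A1, 10).

```javascript
// Hypothetical custom function: multiply every run of digits in the text by a factor.
function MULTIPLY_NUMBERS(text, factor) {
  return String(text).replace(/\d+/g, function (digits) {
    // The callback runs once per match, so the replacement is computed from each number
    return String(Number(digits) * factor);
  });
}

var multiplied = MULTIPLY_NUMBERS("I want to multiply 2 numbers in this string by 10", 10);
// multiplied is "I want to multiply 20 numbers in this string by 100"
```

Unlike the SPLIT-based formulas, this also handles numbers that are not separated from the surrounding text by spaces.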

"Dave&#39 s Market" to "Daves Market"

I have some strings like "Dave&#39 s Market" or "C&#39 est la vie" that I would like to convert to "Daves Market" and "Cest la vie" respectively. I know it is something like '[&#39]+', but I cannot get the optional " s" to be just "s".

The regex substitution s/&#39 //g should work.
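
As an illustration, here is that substitution written out in JavaScript. It is a sketch that assumes the entity always appears exactly as "&#39" followed by one space, as in the two sample strings; stripApostropheEntity is a hypothetical name.

```javascript
// Hypothetical helper: drop every "&#39 " (entity plus the space after it).
function stripApostropheEntity(text) {
  return text.replace(/&#39 /g, "");
}

var market = stripApostropheEntity("Dave&#39 s Market"); // "Daves Market"
var vie = stripApostropheEntity("C&#39 est la vie");     // "Cest la vie"
```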

What you'd rather want is to replace all those escaped literals with what they should actually be, e.g.:
Dave&#39;s -> Dave's
Me &amp; Her -> Me & Her
For that you'll need some replacement code and a regex.
An example (in JavaScript):
var m = new Map();
m.set("&#39;", "'");
m.set("&amp;", "&");
// and so on
m.forEach(function(value, key) {
  // text contains your text; the "g" flag replaces every occurrence, not just the first
  text = text.replace(new RegExp(key, "g"), value);
});

Remove everything except period and numbers from string regex in R

I know there are many questions on stack overflow regarding regex but I cannot accomplish this one easy task with the available help I've seen. Here's my data:
a<-c("Los Angeles, CA","New York, NY", "San Jose, CA")
b<-c("c(34.0522, 118.2437)","c(40.7128, 74.0059)","c(37.3382, 121.8863)")
df<-data.frame(a,b)
df
                a                    b
1 Los Angeles, CA c(34.0522, 118.2437)
2    New York, NY  c(40.7128, 74.0059)
3    San Jose, CA c(37.3382, 121.8863)
I would like to remove everything but the numbers and the period (i.e. remove "c", ")" and "("). This is what I've tried thus far:
str_replace(df$b,"[^0-9.]","" )
[1] "(34.0522, 118.2437)" "(40.7128, 74.0059)" "(37.3382, 121.8863)"
str_replace(df$b,"[^\\d\\)]+","" )
[1] "34.0522, 118.2437)" "40.7128, 74.0059)" "37.3382, 121.8863)"
Not sure what's left to try. I would like to end up with the following:
[1] "34.0522, 118.2437" "40.7128, 74.0059" "37.3382, 121.8863"
Thanks.

If I understand you correctly, this is what you want:
df$b <- gsub("[^[:digit:]., ]", "", df$b)
or:
df$b <- strsplit(gsub("[^[:digit:]. ]", "", df$b), " +")
> df
                a                 b
1 Los Angeles, CA 34.0522, 118.2437
2    New York, NY  40.7128, 74.0059
3    San Jose, CA 37.3382, 121.8863
or if you want all the "numbers" as a numeric vector:
as.numeric(unlist(strsplit(gsub("[^[:digit:]. ]", "", df$b), " +")))
[1] 34.0522 118.2437 40.7128 74.0059 37.3382 121.8863

Try this:
gsub("[c()]", "", df$b)
#[1] "34.0522, 118.2437" "40.7128, 74.0059" "37.3382, 121.8863"

Not a regular expression solution, but a simple one.
The elements of b are R expressions, so loop over each element, parsing it, then creating the string you want.
vapply(
  b,
  function(bi)
  {
    toString(eval(parse(text = bi)))
  },
  character(1)
)

Here is another option with str_extract_all from stringr. Extract the numeric part using str_extract_all into a list, convert to numeric, rbind the list elements and cbind it with the first column of 'df':
library(stringr)
cbind(df[1], do.call(rbind,
  lapply(str_extract_all(df$b, "[0-9.]+"), as.numeric)))

Regex pattern for ex "w l" should match sentences starting with 2 words one with "w" and another with "l"

For example:
Let's say "Welcome Lion" is the string; then if the user types "W L", that string should match.
Please give a regex pattern for this scenario. Thank you.

I guess you could try something like this:
function matchesInitials(enteredString, matchingString) {
  // Remove everything except uppercase letters and spaces, then compare
  return matchingString.replace(/[^A-Z ]/g, "") === enteredString;
}
matchesInitials("W L", "Welcome Lion"); // true
Note that this is only a rough sketch, written here as JavaScript.
matchingString.replace(/[^A-Z ]/g, "")
This part removes every character that is neither an uppercase letter nor a space, which leaves "W L".
But then, as jwodder said, I don't get what you were wondering about the 2 \b.

Finally, the following pattern works:
var char1 = "w";
var char2 = "l";
var regex_str = "\\b" + char1 + "(\\w).*?" + "\\b" + char2;
var regex = new RegExp(regex_str, "gim");
Now change char1 and char2 according to the user input, and this code should work for any case.
@yami please update if you find this answer helpful
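
A stricter variant can be sketched as follows. It assumes the user types space-separated prefixes (such as "W L"), that the candidate string must start with words beginning with those prefixes in that order, and that matching is case-insensitive. The names escapeRegExp and buildInitialsRegex are hypothetical.

```javascript
// Escape regex metacharacters so user input is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a pattern such as /^\s*\bW\w*\s+\bL\w*/i from the input "W L".
function buildInitialsRegex(input) {
  var parts = input.trim().split(/\s+/).map(function (prefix) {
    return "\\b" + escapeRegExp(prefix) + "\\w*";
  });
  // No "g" flag: test() then keeps no state between calls
  return new RegExp("^\\s*" + parts.join("\\s+"), "i");
}

var initialsRegex = buildInitialsRegex("W L");
var matchesWelcomeLion = initialsRegex.test("Welcome Lion"); // true
var matchesLionWelcome = initialsRegex.test("Lion Welcome"); // false
```

Because the pattern is anchored with ^, a string such as "Welcome to Lion" does not match; drop the anchor and join the parts with ".*?" if other words may appear in between.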

Parse 'family' names into people + last name with regex

Given the following string, I'd like to parse into a list of first names + a last name:
Peter-Paul, Mary & Joël Van der Winkel
(and the simpler versions)
I'm trying to work out if I can do this with a regex. I've got this far
(?:([^, &]+))[, &]*(?:([^, &]+))
But the problem here is that I'd like the last name to be captured in a different capture.
I suspect I'm beyond what's possible, but just in case...
UPDATE
Extracting captures from the group was new for me, so here's the (C#) code I used:
string familyName = "Peter-Paul, Mary & Joël Van der Winkel";
string firstperson = @"^(?<First>[-\w]+)"; //.Net syntax for named capture
string lastname = @"\s+(?<Last>.*)";
string others = @"(?:(?:\s*[,|&]\s*)(?<Others>[-\w]+))*";
var reg = new Regex(firstperson + others + lastname);
var groups = reg.Match(familyName).Groups;
Console.WriteLine("LastName=" + groups["Last"].Value);
Console.WriteLine("First person=" + groups["First"].Value);
foreach(Capture firstname in groups["Others"].Captures)
    Console.WriteLine("Other person=" + firstname.Value);
I had to tweak the accepted answer slightly to get it to cover cases such as:
Peter-Paul&Joseph Van der Winkel
Peter-Paul & Joseph Van der Winkel

Assuming a first name cannot be two words separated by a space (otherwise "Peter Paul Van der Winkel" cannot be parsed automatically), the following set of rules applies:
(first name), then any number of (, first name) or (& first name)
Everything left is the last name.
^([-\w]+)(?:(?:\s?[,|&]\s)([-\w]+)\s?)*(.*)

Seems that this might do the trick:
((?:[^, &]+\s*[,&]+\s*)*[^, &]+)\s+([^,&]+)
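
For comparison, here is a sketch of the same rules in JavaScript. JavaScript does not keep every capture of a repeated group the way .NET does, so this version captures the whole run of first names and splits it afterwards. It assumes a runtime that supports the "u" flag with \p{L} (needed for letters such as ë), and parseFamilyName is a hypothetical name.

```javascript
// Hypothetical helper: split "A, B & C Last name" into first names and a last name.
function parseFamilyName(text) {
  var name = "[-\\p{L}]+"; // one first name: letters and hyphens, no spaces
  var pattern = new RegExp(
    "^(" + name + "(?:\\s*[,&]\\s*" + name + ")*)\\s+(.*)$", "u");
  var m = pattern.exec(text.trim());
  if (!m) {
    return null; // input does not follow the assumed shape
  }
  return {
    firstNames: m[1].split(/\s*[,&]\s*/),
    lastName: m[2]
  };
}

var family = parseFamilyName("Peter-Paul, Mary & Jo\u00ebl Van der Winkel");
// family.firstNames holds the three first names; family.lastName is "Van der Winkel"
```

The same call also accepts the tighter variants from the question, such as "Peter-Paul&Joseph Van der Winkel".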
