Bulk find-and-replace regexes in Google Sheets

Is there a function, script, or add-on that can apply a large series of regex replacements to a range of data in Google Sheets? I have one sheet with a list of addresses and another with several dozen pairs of regular expressions in two columns (e.g. "St.$" and "Street"), and I want to replace all instances of the first column of phrases in the address list with the corresponding phrase in the other.
This question was initially closed as being answered by this, but even after some significant tweaking to fit my situation, I can only get that formula to replace one phrase per address, and any other matching phrases will be replaced by the first word (so "1234 N. Main St." becomes "1234 North Main North" instead of "1234 North Main Street"). Is there a method that specifically doesn't do that?

Assuming all your data is in a single column, here's a script (much cleaner and more extensible than the formula approach):
Note: this is not an in-place replacement.
function processColumn(column)
{
  // Add more as needed:
  // [Regex, Replacement]
  let replaceTable =
  [
    [/\bN\./g, 'North'],
    [/\bS\./g, 'South'],
    [/\bSt\./g, 'Street']
  ];
  // A column is an array of rows
  // A row is an array of values.
  return column.map(row =>
    // This is an optimization to skip over blank values
    row[0]
      ? replaceTable.reduce(
          // Replace one value at a time, in order
          (curString, tableEntry) => curString.replace(tableEntry[0], tableEntry[1]),
          row[0]
        )
      : ''
  );
}
Usage Example:
All your data is in A:A, like so:
A
123 N. Main St.
124 N. Main St.
19 S. Main St.
Then in column B:
=processColumn(A:A)
This will place all the processed values in column B:
B
123 North Main Street
124 North Main Street
19 South Main Street
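Since the question keeps its regex pairs in a second sheet, the same idea can be sketched with the table passed in as a range instead of being hard-coded. This is only a sketch under assumptions: the name processColumnWithTable and the sheet name Rules are hypothetical, the patterns are assumed to be written in JavaScript regex syntax, and a "$" in a replacement cell keeps its special meaning in String.prototype.replace.

```javascript
// Hypothetical helper (not part of the answer above): applies every
// [pattern, replacement] pair from `table` to each cell of `column`.
// Both arguments are 2D arrays, which is how Sheets passes ranges.
function processColumnWithTable(column, table) {
  // Compile each non-blank pattern once, with the global flag.
  const rules = table
    .filter(pair => pair[0] !== '' && pair[0] != null)
    .map(pair => [new RegExp(String(pair[0]), 'g'), String(pair[1])]);

  return column.map(row => {
    const value = row[0];
    // Leave blank cells blank.
    if (value === '' || value == null) return [''];
    // Apply the rules in order, feeding each result into the next rule.
    return [rules.reduce((text, [re, rep]) => text.replace(re, rep), String(value))];
  });
}
```

With the pairs in a sheet called Rules (an assumed name), the call would look like =processColumnWithTable(A2:A, Rules!A2:B). Because every rule runs on the output of the previous one, "1234 N. Main St." goes through both the "N." and the "St." replacements rather than stopping at the first match.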

Possibly this:
// Assumes the data lives on the active sheet
const sheet = SpreadsheetApp.getActiveSheet();
let values = sheet.getDataRange().getDisplayValues();
values = values.map(outer => outer.map(inner => inner
  .replaceAll(/\bN\./g, 'North')
  .replaceAll(/\bS\./g, 'South')
  .replaceAll(/\bSt\./g, 'Street')
));
Change sheet.getDataRange().getDisplayValues() to whatever range you need.

Related

PHPOffice/PHPPresentation (PowerPoint) - how to prevent paragraph text from wrapping to a new line when in a table cell? Is there a dedicated method?

Inside a table cell I have a text - a number in fact but with space thousands separators like:
123 456 789
This data is dynamic and the value can vary a lot. I need a method to prevent this number from wrapping to a new line inside cell like:
123 456
789
I need to achieve something like when using the CSS property 'white-space: nowrap'.
Here is my code:
$oShape = $currentSlide->createTableShape(3); // 3 columns
$oShape->setWidth(180);
//...
foreach ($seriesData as $data) {
    $oRow = $oShape->createRow();
    //...
    $oCell = $oRow->nextCell();
    // $oCell->setWidth(70); // I don't want to define fixed width here
    $oCell->createTextRun($data['string'])->getFont()->setBold(true)->setSize(8);
    $oCell->getActiveParagraph()->getAlignment()->setHorizontal(Alignment::HORIZONTAL_RIGHT)->setMarginLeft(1)->setMarginTop(0)->setMarginRight(2)->setMarginBottom(0);
    // ... etc. the rest of the cells that I don't have a problem with
}
The issue is that the paragraph inside this cell often wraps to a new line.
Is there a dedicated method or property for no wrap in such case?

Extract line-by-line info from multi-line cells in Google Sheets

A Google Sheet has multi-line information in its cells, e.g. address.
Each address can have a different number of lines, but we know:
1st line is always the name
Penultimate line is always the post code and city
Last line is always the country
And we are trying to split the address into 4 columns:
Name
Street address (i.e. the address left over after extracting name, post code & city, country)
Post code and city
Country
So col B (name) reads the first line: =REGEXEXTRACT(A1,"(\w.*)")
Of course, I can figure out how many lines there are in each cell by counting newline characters:
=LEN(A1)-LEN(SUBSTITUTE(A1,CHAR(10),""))
If the formula returns 6, then there are 7 lines.
How do we get columns C (street address), D (postcode and city) and E (country) formulaically?
I mean, sure, I can get country for this cell with
=REGEXEXTRACT(A2,"(\n.*){6}")
but I can't copy the formula over... the 6 above is manual input, which defeats the purpose. Since this is regex, it obviously can't take cell references instead of 6, e.g.
=REGEXEXTRACT(Amazon!B4,"(\n.*){F1}")
(if for example, I stored in column F the number of next line characters in column A)
try:
=ARRAYFORMULA(IFNA({REGEXEXTRACT(A2:A, "(.*)\n"),
REGEXREPLACE(REGEXREPLACE(REGEXREPLACE(A2:A, "\n(.*)$", ), "\n(.*)$", ), "^(.*)\n", ),
REGEXEXTRACT(REGEXREPLACE(A2:A, "\n(.*)$", ), "(.*)$"),
REGEXEXTRACT(A2:A, "(.*)$")}))
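The same split (first line, everything in between, penultimate line, last line) can also be sketched in Apps Script-style JavaScript. This is a minimal sketch, not the answer's method: the function name splitAddress is hypothetical, and it assumes every address has at least three non-empty lines, falling back to the name alone otherwise.

```javascript
// Hypothetical helper: splits one multi-line address into
// [name, street address, post code and city, country].
function splitAddress(address) {
  // Break the cell into trimmed, non-empty lines.
  const lines = String(address)
    .split(/\r?\n/)
    .map(line => line.trim())
    .filter(line => line !== '');

  if (lines.length < 3) {
    // Too short to follow the stated layout; return the name only.
    return [lines[0] || '', '', '', ''];
  }

  const name = lines[0];
  const country = lines[lines.length - 1];
  const postCodeAndCity = lines[lines.length - 2];
  // Whatever is left between the name and the last two lines is the street
  // address; its internal line breaks are kept, as in the formula above.
  const street = lines.slice(1, -2).join('\n');

  return [name, street, postCodeAndCity, country];
}
```

Returning a flat array places the four parts in adjacent columns when the function is called on a single cell.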

Google Sheets: How can I extract partial text from a string based on a column of different options?

Goal: I have a bunch of keywords I'd like to categorise automatically based on topic parameters I set. Categories that match must be in the same column so the keyword data can be filtered.
e.g. If I have "Puppies" as a first topic, it shouldn't appear as a secondary or third topic otherwise the data cannot be filtered as needed.
Example Data: https://docs.google.com/spreadsheets/d/1TWYepApOtWDlwoTP8zkaflD7AoxD_LZ4PxssSpFlrWQ/edit?usp=sharing
Video: https://drive.google.com/file/d/11T5hhyestKRY4GpuwC7RF6tx-xQudNok/view?usp=sharing
Parameters Tab: I will add words in columns D-F that change based on the keyword data set and there will often be hundreds, if not thousands, of options for larger data sets.
Categories Tab: I'd like to have a formula or script that goes down the columns D-F in Parameters and fills in a corresponding value (in Categories! columns D-F respectively) based on partial match with column B or C (makes no difference to me if there's a delimiter like a space or not. Final data sheet should only have one of these columns though).
Things I've Tried:
I've tried a bunch of things. Nested IF formula with regexmatch works but seems clunky.
e.g. this formula in Categories! column D
=IF(REGEXMATCH($B2,LOWER(Parameters!$D$3)),Parameters!$D$3,IF(REGEXMATCH($B2,LOWER(Parameters!$D$4)),Parameters!$D$4,""))
I nested more statements, swapping in the next cell of the Parameters!D column each time (as in, manually adding $D$5, $D$6, etc.), but this seems inefficient for a list thousands of words long. E.g. the third topic will get very long once all dog breed types are added.
Any tips?
Functionality I haven't worked out:
if a string in Categories B or C contains more than one topic in the parameters I set out, is there a way I can have the first 2 to show instead of just the first one?
e.g. Cell A14 in Categories, how can I get a formula/automation to add both "Akita" & "German Shepherd" into the third topic? Concatenation with a CHAR(10) to add to new line is ideal format here. There will be other keywords that won't have both in there in which case these values will just show up individually.
Since this data set has a bunch of mixed breeds and all breeds are added as a third topic, it would be great to differentiate interest in mixes vs pure breeds without confusion.
Any ideas will be greatly appreciated! Also, I'm open to variations in layout and functionality of the spreadsheet in case you have a more creative solution. I just care about efficiently automating a tedious task!!
Try using a custom function.
To create the custom function:
1. Create or open a spreadsheet in Google Sheets.
2. Select the menu item Tools > Script editor.
3. Delete any code in the script editor and copy and paste the code below into the script editor.
4. At the top, click Save.
To use the custom function:
1. Click the cell where you want to use the function.
2. Type an equals sign (=) followed by the function name and any input value (for example, =DOUBLE(A1)) and press Enter.
3. The cell will momentarily display Loading..., then return the result.
Code:
function matchTopic(p, str) {
  // Convert the 2D range into a 1D array and drop blank cells,
  // which would otherwise produce empty groups that match everywhere
  var params = p.flat().filter(i => i !== "");
  if (params.length === 0) return "";
  // Build a series of capturing groups, e.g. (Dog)|(Puppies)
  var buildRegex = params.map(i => '(' + i + ')').join('|');
  var regex = new RegExp(buildRegex, "gi");
  var results = String(str).match(regex);
  if (results) {
    // Convert the first character of each word to uppercase
    for (var i = 0; i < results.length; i++) {
      var words = results[i].split(" ");
      for (let j = 0; j < words.length; j++) {
        words[j] = words[j].charAt(0).toUpperCase() + words[j].slice(1);
      }
      results[i] = words.join(" ");
    }
    return results.join(","); // return with comma separator
  } else {
    return ""; // return blank if result is null
  }
}
Example Usage: (screenshots of the Parameters sheet and of the First Topic, Second Topic, and Third Topic results)
Reference:
Custom Functions
I've added a new sheet ("Erik Help") with separate formulas (highlighted in green currently) for each of your keyword columns. They are each essentially the same except for specific column references, so I'll include only the "First Topic" formula here:
=ArrayFormula({"First Topic";IF(A2:A="",,IFERROR(REGEXEXTRACT(LOWER(B2:B&C2:C),JOIN("|",LOWER(FILTER(Parameters!D3:D,Parameters!D3:D<>""))))) & IFERROR(CHAR(10)&REGEXEXTRACT(REGEXREPLACE(LOWER(B2:B&C2:C),IFERROR(REGEXEXTRACT(LOWER(B2:B&C2:C),JOIN("|",LOWER(FILTER(Parameters!D3:D,Parameters!D3:D<>""))))),""),JOIN("|",LOWER(FILTER(Parameters!D3:D,Parameters!D3:D<>""))))))})
This formula first creates the header (which can be changed within the formula itself as you like).
The opening IF condition leaves any row in the results column blank if the corresponding cell in Column A of that row is also blank.
JOIN is used to form a concatenated string of all keywords separated by the pipe symbol, which REGEXEXTRACT interprets as OR.
IFERROR(REGEXEXTRACT(LOWER(B2:B&C2:C),JOIN("|",LOWER(FILTER(Parameters!D3:D,Parameters!D3:D<>""))))) will attempt to extract any of the keywords from each concatenated string in Columns B and C. If none is found, IFERROR will return null.
Then a second-round attempt is made:
& IFERROR(CHAR(10)&REGEXEXTRACT(REGEXREPLACE(LOWER(B2:B&C2:C),IFERROR(REGEXEXTRACT(LOWER(B2:B&C2:C),JOIN("|",LOWER(FILTER(Parameters!D3:D,Parameters!D3:D<>""))))),""),JOIN("|",LOWER(FILTER(Parameters!D3:D,Parameters!D3:D<>"")))))
Only this time, REGEXREPLACE is used to replace the results of the first round with null, thus eliminating them from being found in round two. This will cause any second listing from the JOIN clause to be found, if one exists. Otherwise, IFERROR again returns null for round two.
CHAR(10) is the new-line character.
I've written each of the three formulas to return up to two results for each keyword column. If that is not your intention for "First Topic" and "Second Topic" (i.e., if you only wanted a maximum of one result for each of those columns), just select and delete the entire round-two portion of the formula shown above from the formula in each of those columns.
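The two-round logic described above (extract the first keyword, blank it out, then extract again) can be sketched in JavaScript as well. This is only an illustration: the name firstTwoKeywords is hypothetical, and unlike the formula it escapes regex metacharacters in the keywords, which is an added assumption about what the keyword list should mean.

```javascript
// Hypothetical helper mirroring the two-round formula: returns up to two
// matching keywords, lower-cased and separated by a newline.
function firstTwoKeywords(text, keywords) {
  const haystack = String(text).toLowerCase();
  const list = keywords
    .map(k => String(k).toLowerCase())
    .filter(k => k !== '');
  if (list.length === 0) return '';

  // Escape metacharacters so each keyword is matched literally.
  const escapeRegex = s => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const pattern = new RegExp(list.map(escapeRegex).join('|'));

  // Round one: the leftmost keyword found in the text.
  const first = haystack.match(pattern);
  if (!first) return '';

  // Round two: remove every occurrence of the first hit, then search again.
  const remaining = haystack.split(first[0]).join('');
  const second = remaining.match(pattern);

  return second ? first[0] + '\n' + second[0] : first[0];
}
```

As with the formula, the result is lower-case, and a keyword that appears twice in the text is only reported once because round two runs on the text with the first hit removed.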

Counting number of guests in delimited string in Google Sheets

I have a Google Sheets cell containing a list of people attending an event. Some of the guests will bring friends. So the cell (A1) can look like this:
Ben, Sarah + 2, James , Mary + 5
I need to count the total number of people attending, which in this case is 11. And so I was thinking of using a formula along these lines:
=count(SPLIT(SUBSTITUTE(A1,"+",","),","))
But this doesn't work because it's only counting the numbers as 1 item, and the COUNT function doesn't appear to work.
How can I make this work, so that it correctly gives the number of attendees as 11?
You can use the following formula
=IF(LEN(A2),
SUM(COUNTA(SPLIT(A2,",")),
IFERROR(SPLIT(REGEXREPLACE(A2,"\D"," ")," "))),"")
Functions used:
IF
LEN
SUM
COUNTA
SPLIT
IFERROR
REGEXREPLACE
You can obtain the total sum by doing this:
=if(regexmatch(A1,"\+"),sum(ArrayFormula(query(split(transpose(split(A1,",")),"\+"),"select count(Col1), sum(Col2)",0))),if(isblank(A1),"",counta(split(A1,","))))
Explanation:
if(regexmatch(A1,"\+"), something, if(isblank(A1),"",counta(split(A1,",")))) => if the cell contains a + sign, calculate the total including the plus-ones (explained below); otherwise, check whether the cell is empty: if it is, print blank, else just count how many names are separated by commas.
split(A1,",") => red area => will separate the cell by commas, and the result can be seen in the red area in the picture
split(TRANSPOSE(split(A1,",")),"\+") => green area => will loop through each of the results above and separate to a cell on the right the values that have a + sign between them, can be seen in the green area in the image
query(split(TRANSPOSE(split(A1,",")),"\+"),"select count(Col1), sum(Col2)",0) => blue area => then we will query the 2 columns, in the left one we want to count the number of rows in that column (the columns with the names), on the next column we want to sum the values (the plus ones)
sum(ArrayFormula(query(split(transpose(split(A1,",")),"\+"),"select count(Col1), sum(Col2)",0))) => yellow area => then we will sum the values of the 2 columns to get the final result
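The steps above (split on commas, split each entry on "+", count the names, sum the plus-ones) can be sketched as a small JavaScript function. The name countGuests is hypothetical, and unlike the formulas it returns 0 rather than a blank for an empty cell.

```javascript
// Hypothetical helper: counts attendees in a string such as
// "Ben, Sarah + 2, James , Mary + 5".
function countGuests(cell) {
  return String(cell)
    .split(',')                      // one entry per invited person
    .map(entry => entry.trim())
    .filter(entry => entry !== '')   // ignore empty entries
    .reduce((total, entry) => {
      // The part after "+" (if any) is the number of extra guests.
      const plusOnes = Number(entry.split('+')[1]) || 0;
      return total + 1 + plusOnes;
    }, 0);
}
```

For the sample cell in the question, this counts four named guests plus 2 and 5 extra guests, giving 11.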

How to create new column that parses correct values from a row to a list

I am struggling to create a formula in Power BI that would split a single row's value into a list of the values that I want.
So I have a column called ID, and it has values such as:
"ID001122, ID223344" or "IRRELEVANT TEXT ID112233, MORE IRRELEVANT;ID223344 TEXT"
What is important is to keep each "ID" and the 6 digits after it. The first example would turn into a list like this: {"ID001122","ID223344"}. The second example would be handled the same way, just with all the irrelevant text in between dropped.
I was looking for some type of loop formula where you could use a text-find function to locate each ID's starting point and a middle function to extract 8 characters from there, but I made no progress in finding one. I tried making lists by splitting on the comma separator, but I noticed that not all rows had commas separating the IDs.
The end result would be that the original value is in one column next to the list of parsed values, which could then be expanded into new rows.
ID | Parsed ID
"Random ID123456, Text;ID23456" | List {"ID123456","ID23456"}
Does any of you have prior experience with this?
Hey, I found the answer myself with the help of a good article about a similar problem.
Here is my solution, without any further text parsing, which I can do later on.
each let
    PosList = Text.PositionOf([ID], "ID", Occurrence.All),
    List = List.Transform(PosList, (x) => Text.Middle([ID], x, 8))
in List
For example, this would turn "(ID343137,ID352973) ID358388" into {ID343137,ID352973,ID358388}.
It ended up being easier than I thought. I suppose the solution once again relied on lists!
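For readers who do not use Power Query, the same position-based idea (find every occurrence of "ID", then take 8 characters from each position) can be sketched in JavaScript. This is a translation for illustration only, not the Power Query code itself; the function name extractIds is hypothetical, and like the original it does not check that the six characters after "ID" are digits.

```javascript
// Hypothetical helper: collects the 8-character slice starting at every
// occurrence of "ID" in the text.
function extractIds(text) {
  const source = String(text);
  const ids = [];
  let pos = source.indexOf('ID');
  while (pos !== -1) {
    // Take "ID" plus the following six characters (fewer at the end of the text).
    ids.push(source.substring(pos, pos + 8));
    pos = source.indexOf('ID', pos + 1);
  }
  return ids;
}
```

Because the search is purely positional, any other upper-case "ID" in the surrounding text (for example inside a word such as "VALID") would also be picked up, so a follow-up filter on the extracted values may be needed.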