Extract line-by-line info from multi-line cells in Google Sheets - regex

A Google Sheet has multi-line information in its cells, e.g. address.
Each address can have a different number of lines, but we know:
1st line is always the name
Penultimate line is always the post code and city
Last line is always the country
And we are trying to split the address into 4 columns:
Name
Street address (i.e. the address left over after extracting name, post code & city, country)
Post code and city
Country
So col B (name) reads the first line: =REGEXEXTRACT(A1,"(\w.*)")
Of course, I can figure out how many lines there are in each cell by counting the newline characters:
=LEN(A1)-LEN(SUBSTITUTE(A1,CHAR(10),""))
If the formula returns 6, then there are 7 lines.
How do we get columns C (street address), D (postcode and city) and E (country) formulaically?
I mean, sure, I can get country for this cell with
=REGEXEXTRACT(A2,"(\n.*){6}")
but I can't copy the formula down: the 6 above is manual input, which defeats the purpose. Since this is regex, it obviously can't take a cell reference instead of the 6, e.g.
=REGEXEXTRACT(Amazon!B4,"(\n.*){F1}")
(if for example, I stored in column F the number of next line characters in column A)

try:
=ARRAYFORMULA(IFNA({REGEXEXTRACT(A2:A, "(.*)\n"),
REGEXREPLACE(REGEXREPLACE(REGEXREPLACE(A2:A, "\n(.*)$", ), "\n(.*)$", ), "^(.*)\n", ),
REGEXEXTRACT(REGEXREPLACE(A2:A, "\n(.*)$", ), "(.*)$"),
REGEXEXTRACT(A2:A, "(.*)$")}))
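For comparison, the same split can be sketched in Apps Script-style JavaScript. This only illustrates the logic of the formula above and is not part of the original answer. The name splitAddress is hypothetical, and the sketch assumes every cell has at least three lines (name, post code and city, country).

```javascript
// Hypothetical helper: splits one multi-line address into
// [name, street address, post code and city, country].
function splitAddress(cell) {
  // Split on line feeds (CHAR(10) in Sheets) and trim stray spaces.
  const lines = String(cell).split('\n').map(line => line.trim());
  if (lines.length < 3) {
    return ['', '', '', '']; // assumption: too short to classify
  }
  const name = lines[0];
  const country = lines[lines.length - 1];
  const postCodeCity = lines[lines.length - 2];
  // Everything between the first line and the last two lines.
  const street = lines.slice(1, -2).join('\n');
  return [name, street, postCodeCity, country];
}
```

Because the street address is whatever remains between the first line and the last two, the number of lines per cell never has to be counted explicitly.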

Related

Google Sheets: How can I extract partial text from a string based on a column of different options?

Goal: I have a bunch of keywords I'd like to categorise automatically based on topic parameters I set. Categories that match must be in the same column so the keyword data can be filtered.
e.g. If I have "Puppies" as a first topic, it shouldn't appear as a secondary or third topic otherwise the data cannot be filtered as needed.
Example Data: https://docs.google.com/spreadsheets/d/1TWYepApOtWDlwoTP8zkaflD7AoxD_LZ4PxssSpFlrWQ/edit?usp=sharing
Video: https://drive.google.com/file/d/11T5hhyestKRY4GpuwC7RF6tx-xQudNok/view?usp=sharing
Parameters Tab: I will add words in columns D-F that change based on the keyword data set and there will often be hundreds, if not thousands, of options for larger data sets.
Categories Tab: I'd like to have a formula or script that goes down the columns D-F in Parameters and fills in a corresponding value (in Categories! columns D-F respectively) based on partial match with column B or C (makes no difference to me if there's a delimiter like a space or not. Final data sheet should only have one of these columns though).
Things I've Tried:
I've tried a bunch of things. Nested IF formula with regexmatch works but seems clunky.
e.g. this formula in Categories! column D
=IF(REGEXMATCH($B2,LOWER(Parameters!$D$3)),Parameters!$D$3,IF(REGEXMATCH($B2,LOWER(Parameters!$D$4)),Parameters!$D$4,""))
I nested more statements, swapping in the next cell in column Parameters!D each time (i.e. manually adding $D$5, $D$6, etc.), but this seems inefficient for a list thousands of words long. For example, the third topic will get very long once all dog breed types are added.
Any tips?
Functionality I haven't worked out:
If a string in Categories B or C contains more than one of the topics I set out in the parameters, is there a way to show the first two instead of just the first one?
e.g. Cell A14 in Categories, how can I get a formula/automation to add both "Akita" & "German Shepherd" into the third topic? Concatenation with a CHAR(10) to add to new line is ideal format here. There will be other keywords that won't have both in there in which case these values will just show up individually.
Since this data set has a bunch of mixed breeds and all breeds are added as a third topic, it would be great to differentiate interest in mixes vs pure breeds without confusion.
Any ideas will be greatly appreciated! Also, I'm open to variations in layout and functionality of the spreadsheet in case you have a more creative solution. I just care about efficiently automating a tedious task!!
Try using a custom function.
To create a custom function:
1. Create or open a spreadsheet in Google Sheets.
2. Select the menu item Tools > Script editor.
3. Delete any code in the script editor, then copy and paste the code below into it.
4. At the top, click Save.
To use the custom function:
1. Click the cell where you want to use the function.
2. Type an equals sign (=) followed by the function name and any input value (for example, =DOUBLE(A1)) and press Enter.
3. The cell will momentarily display Loading..., then return the result.
Code:
function matchTopic(p, str) {
  // Convert the 2D range into a 1D list and drop blank cells
  var params = p.flat().filter(v => v !== '');
  if (params.length === 0) return ""; // nothing to match against
  // Convert the list into a series of capturing groups. Example: (Dog)|(Puppies)
  var buildRegex = params.map(i => '(' + i + ')').join('|');
  var regex = new RegExp(buildRegex, "gi");
  var results = str.match(regex);
  if (results) {
    // The loops below convert the first character of each word to uppercase
    for (var i = 0; i < results.length; i++) {
      var words = results[i].split(" ");
      for (let j = 0; j < words.length; j++) {
        // charAt/slice also cope with empty words (e.g. double spaces)
        words[j] = words[j].charAt(0).toUpperCase() + words[j].slice(1);
      }
      results[i] = words.join(" ");
    }
    return results.join(","); // return with comma separator
  } else {
    return ""; // return blank if result is null
  }
}
Example Usage (screenshots not preserved): Parameters, First Topic, Second Topic, Third Topic.
Reference:
Custom Functions
I've added a new sheet ("Erik Help") with separate formulas (highlighted in green currently) for each of your keyword columns. They are each essentially the same except for specific column references, so I'll include only the "First Topic" formula here:
=ArrayFormula({"First Topic";IF(A2:A="",,IFERROR(REGEXEXTRACT(LOWER(B2:B&C2:C),JOIN("|",LOWER(FILTER(Parameters!D3:D,Parameters!D3:D<>""))))) & IFERROR(CHAR(10)&REGEXEXTRACT(REGEXREPLACE(LOWER(B2:B&C2:C),IFERROR(REGEXEXTRACT(LOWER(B2:B&C2:C),JOIN("|",LOWER(FILTER(Parameters!D3:D,Parameters!D3:D<>""))))),""),JOIN("|",LOWER(FILTER(Parameters!D3:D,Parameters!D3:D<>""))))))})
This formula first creates the header (which can be changed within the formula itself as you like).
The opening IF condition leaves any row in the results column blank if the corresponding cell in Column A of that row is also blank.
JOIN is used to form a concatenated string of all keywords separated by the pipe symbol, which REGEXEXTRACT interprets as OR.
IFERROR(REGEXEXTRACT(LOWER(B2:B&C2:C),JOIN("|",LOWER(FILTER(Parameters!D3:D,Parameters!D3:D<>""))))) will attempt to extract any of the keywords from each concatenated string in Columns B and C. If none is found, IFERROR will return null.
Then a second-round attempt is made:
& IFERROR(CHAR(10)&REGEXEXTRACT(REGEXREPLACE(LOWER(B2:B&C2:C),IFERROR(REGEXEXTRACT(LOWER(B2:B&C2:C),JOIN("|",LOWER(FILTER(Parameters!D3:D,Parameters!D3:D<>""))))),""),JOIN("|",LOWER(FILTER(Parameters!D3:D,Parameters!D3:D<>"")))))
Only this time, REGEXREPLACE is used to replace the results of the first round with null, thus eliminating them from being found in round two. This will cause any second listing from the JOIN clause to be found, if one exists. Otherwise, IFERROR again returns null for round two.
CHAR(10) is the new-line character.
I've written each of the three formulas to return up to two results for each keyword column. If that is not your intention for "First Topic" and "Second Topic" (i.e., if you only wanted a maximum of one result for each of those columns), just select and delete the entire round-two portion of the formula shown above from the formula in each of those columns.
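The two-round idea described above can also be sketched in JavaScript. This is only an illustration under stated assumptions, not the formula's actual implementation. The name firstTwoTopics is hypothetical, and escaping the keywords is an addition the formula does not make. As in the formula, the text is lowercased before matching, so the results come back in lower case.

```javascript
// Hypothetical helper: returns up to two keyword matches from `text`,
// separated by a line feed, mirroring the two-round formula above.
function firstTwoTopics(text, keywords) {
  // Escape regex metacharacters so keywords are matched literally
  // (an addition; the formula passes keywords to the regex unescaped).
  const escaped = keywords
    .filter(k => k !== '')
    .map(k => String(k).toLowerCase().replace(/[.*+?^${}()|[\]\\]/g, '\\$&'));
  if (escaped.length === 0) return '';
  const pattern = new RegExp(escaped.join('|'));
  const haystack = String(text).toLowerCase();

  // Round one: the first keyword found in the text.
  const first = haystack.match(pattern);
  if (!first) return '';

  // Round two: remove every round-one hit, then search again.
  const remainder = haystack.split(first[0]).join('');
  const second = remainder.match(pattern);
  return second ? first[0] + '\n' + second[0] : first[0];
}
```

Removing the first hit before the second search is what stops the same keyword from being reported twice.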

Bulk find-and-replace regexs in Google Sheets

Is there a function, script, or add-on that can apply a large series of regex replacements to a range of data in Google Sheets? I have one sheet with a list of addresses and another with several dozen pairs of regular expressions in two columns (e.g. "St.$" and "Street"), and I want to replace all instances of the first column of phrases in the address list with the corresponding phrase in the other.
This question was initially closed as already answered by another question, but even after some significant tweaking to fit my situation, I can only get that formula to replace one phrase per address, and any other matching phrases are replaced by the first replacement word (so "1234 N. Main St." becomes "1234 North Main North" instead of "1234 North Main Street"). Is there a method that specifically doesn't do that?
Assuming all your data is in a single column, here's a script (much cleaner and more extensible than the formula approach):
Note: this is not an in-place replacement.
function processColumn(column)
{
  // Add more as needed:
  // [Regex, Replacement]
  let replaceTable =
  [
    [/\bN\./g, 'North'],
    [/\bS\./g, 'South'],
    [/\bSt\./g, 'Street']
  ];
  // A column is an array of rows
  // A row is an array of values.
  return column.map(row =>
    // This is an optimization to skip over blank values
    row[0]
      ? replaceTable.reduce(
          // Replace one value at a time, in order
          (curString, tableEntry) => curString.replace(tableEntry[0], tableEntry[1]),
          row[0]
        )
      : ''
  );
}
Usage Example:
All your data is in A:A, like so:
A
123 N. Main St.
124 N. Main St.
19 S. Main St.
Then in column B:
=processColumn(A:A)
This will place all the processed values in column B:
B
123 North Main Street
124 North Main Street
19 South Main Street
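Since the question keeps its pattern and replacement pairs in a two-column sheet, a variant that reads the table from a range instead of hard-coding it might look like the sketch below. The function name processColumnWithTable is hypothetical, and the patterns are assumed to be valid JavaScript regular expressions stored as plain text.

```javascript
// Hypothetical variant of processColumn: the [pattern, replacement]
// pairs are passed in as a second range instead of being hard-coded.
function processColumnWithTable(column, table) {
  // Compile each non-blank pattern once; 'g' replaces every occurrence.
  const rules = table
    .filter(pair => pair[0] !== '')
    .map(pair => [new RegExp(pair[0], 'g'), String(pair[1])]);

  // Apply every rule, in order, to each non-blank cell.
  return column.map(row =>
    row[0]
      ? [rules.reduce((text, rule) => text.replace(rule[0], rule[1]), String(row[0]))]
      : ['']
  );
}
```

In a sheet this could be called as =processColumnWithTable(A2:A, Rules!A2:B), where Rules is an assumed name for the tab holding the pairs.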
Possibly this:
// `sheet` is assumed to be a Sheet object obtained elsewhere
let values = sheet.getDataRange().getDisplayValues();
values = values.map(outer => outer.map(inner => inner
  .replaceAll(/\bN\./g, 'North')
  .replaceAll(/\bS\./g, 'South')
  .replaceAll(/\bSt\./g, 'Street')
));
Change values = sheet.getDataRange().getDisplayValues() to whatever range you need.

Counting number of guests in delimited string in Google Sheets

I have a Google Sheets cell containing a list of people attending an event. Some of the guests will bring friends. So the cell (A1) can look like this:
Ben, Sarah + 2, James , Mary + 5
I need to count the total number of people attending, which in this case is 11. And so I was thinking of using a formula along these lines:
=count(SPLIT(SUBSTITUTE(A1,"+",","),","))
But this doesn't work because it's only counting the numbers as 1 item, and the COUNT function doesn't appear to work.
How can I make this work, so that it correctly gives the number of attendees as 11?
You can use the following formula
=IF(LEN(A2),
SUM(COUNTA(SPLIT(A2,",")),
IFERROR(SPLIT(REGEXREPLACE(A2,"\D"," ")," "))),"")
Functions used:
IF
LEN
SUM
COUNTA
SPLIT
IFERROR
REGEXREPLACE
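The logic of the formula above (count the comma-separated entries, then add every number in the text) can be sketched in JavaScript as follows. The name countGuests is hypothetical, and the sketch assumes names never contain digits.

```javascript
// Hypothetical helper mirroring the formula above: count the
// comma-separated entries, then add every number found in the text.
function countGuests(cell) {
  const text = String(cell).trim();
  if (text === '') return ''; // blank in, blank out
  // Skip blank entries (e.g. after a trailing comma).
  const names = text.split(',').filter(part => part.trim() !== '');
  // Every run of digits is treated as extra guests.
  const extras = (text.match(/\d+/g) || []).map(Number);
  return names.length + extras.reduce((sum, n) => sum + n, 0);
}
```

For the example cell, this gives four named guests plus 2 + 5 extras, i.e. 11.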
You can obtain the total sum by doing this:
=if(regexmatch(A1,"\+"),sum(ArrayFormula(query(split(transpose(split(A1,",")),"\+"),"select count(Col1), sum(Col2)",0))),if(isblank(A1),"",counta(split(A1,","))))
Explanation:
if(regexmatch(A1,"\+"), something, if(isblank(A1),"",counta(split(A1,",")))) => if the cell contains a + sign, calculate the total including the plus-ones ("something", explained below). Otherwise, check whether the cell is empty: if it is, return blank; if not, just count the comma-separated names.
split(A1,",") => red area => separates the cell at the commas; the result can be seen in the red area in the picture
split(TRANSPOSE(split(A1,",")),"\+") => green area => goes through each of the results above and, where there is a + sign, moves the value after it into the cell to the right; this can be seen in the green area in the image
query(split(TRANSPOSE(split(A1,",")),"\+"),"select count(Col1), sum(Col2)",0) => blue area => queries the two columns: in the left one (the names) we count the rows, and in the right one we sum the values (the plus-ones)
sum(ArrayFormula(query(split(transpose(split(A1,",")),"\+"),"select count(Col1), sum(Col2)",0))) => yellow area => sums the two values to get the final result
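The steps above (split on commas, split each entry on +, count the names, sum the plus-ones) can be followed one by one in a short JavaScript sketch. The name countGuestsByEntry is hypothetical, and the sketch assumes at most one + per entry.

```javascript
// Hypothetical helper following the steps above: split on commas,
// split each entry on "+", count the names and sum the plus-ones.
function countGuestsByEntry(cell) {
  const text = String(cell);
  if (text.trim() === '') return '';
  let names = 0;
  let plusOnes = 0;
  for (const entry of text.split(',')) {
    const [name, extra] = entry.split('+').map(part => part.trim());
    if (name !== '') names += 1;               // like count(Col1)
    if (extra) plusOnes += Number(extra) || 0; // like sum(Col2)
  }
  return names + plusOnes;
}
```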

How to count the number of blank cells in one column based on the first blank row in another column

I have a spreadsheet set up with TV program titles in column B; the next 20 or so columns track different information about each title. I need to count the number of blank cells in column R within the range of column B that contains titles (i.e., up to the first blank row in column B).
I can easily set up a formula to count the number of empty cells in a given range in column R, the problem is as I add more titles to the sheet I would have to keep updating the range in the formula [a simple =COUNTIF(R3:R1108, "")]. I've done a little googling of the problem but haven't quite found anything that fits the situation. I thought I would be able to get the following to work but I didn't fully understand what was going on with them and they weren't giving the expected results.
I've tried these formulas:
=ArrayFormula(sum(MIN("B3:B"&MIN(IF((R3:R)>"",ROW(B3:B)-1)))))
=ArrayFormula(sum(INDIRECT("B3:B"&MIN(IF((R3:R)>"",ROW(B3:B)-1)))))
And
=if(SUM(B3:B)="","",SUM(R3:R))
All of the above formulas give "0" as the result. Based on the COUNTIF formula I have set up it should be 840, which is a number I would expect. Currently, there are 1106 rows containing data and 840 is a reasonable number to expect in this situation.
Is this what you're looking for?
=COUNTBLANK(INDIRECT(CONCATENATE("R",3,":R",(2+COUNTA(B3:B)))))
This counts the number of non-blank rows in the B column (starting at B3) and uses that to determine the rows in column R (starting at R3) on which to perform COUNTBLANK. With n titles starting in row 3, the last title sits in row 2+n, so the range ends there. CONCATENATE builds the range by joining strings together, and INDIRECT allows the range reference to be given as a string.
A proper way would be:
=ARRAYFORMULA(COUNTBLANK(INDIRECT(ADDRESS(3, 18, 4)&":"&
ADDRESS(MAX(IF(B3:B<>"", ROW(B3:B), )), 18, 4))))
or shorter:
=ARRAYFORMULA(COUNTBLANK(INDIRECT("R3:"&
ADDRESS(MAX(IF(B3:B<>"", ROW(B3:B), )), 18, 4))))
or shorter:
=ARRAYFORMULA(COUNTBLANK(INDIRECT("R3:R"&MAX(IF(B3:B<>"", ROW(B3:B), )))))
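The idea behind these formulas (find the last row that has a title, then count blanks in the other column down to that row) can be sketched in JavaScript as below. The name countBlanksUpToLastTitle is hypothetical, and both arguments are assumed to be single-column ranges passed as 2D arrays that start on the same row, e.g. B3:B and R3:R.

```javascript
// Hypothetical helper: counts blank cells in `targetCol` from the
// first row down to the last non-blank row of `titleCol`.
function countBlanksUpToLastTitle(titleCol, targetCol) {
  // Index of the last row whose title is not blank (-1 if none).
  let last = -1;
  titleCol.forEach((row, i) => {
    if (row[0] !== '') last = i;
  });
  // Count blanks in the target column within rows 0..last.
  let blanks = 0;
  for (let i = 0; i <= last; i++) {
    const value = targetCol[i] ? targetCol[i][0] : '';
    if (value === '') blanks += 1;
  }
  return blanks;
}
```

Using the last non-blank title rather than a count of titles keeps the result correct even if column B has gaps.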

How to populate a value when comparing two columns, VLOOKUP or IF?

I'm trying to create "Sales Rep" summaries by "Shop", where I can simply filter a column by the rep's name, then populate the total sales for each shop next to the relevant filter result.
I'm using this to filter all the Stores by Scott:
=(filter(D25:D47,A25:A47 = "Scott"))
Next, I want to look up each Store/Account in F and populate G with the corresponding value from E. So, G25 should be populated with the value of E25 ($724), G26 with E26 ($822), and G27 with E38 ($511.50).
I don't know how to write the formula correctly, but something like this is what I'm trying to do: =IF(F25=D25:D38),E25. I know that's not right, and it won't work in a fill down. But I'm basically trying to find the matching value in D and copy the corresponding E value into G. So, Misty Mountain Medicince in F27 will be matched to the value of E38, which is populated in G27.
The filter is what's throwing me off, because it's not a simple fill down. And I don't know how to match filtered results from one column to a matched value in another.
Hope the screenshot helps. (Screenshot of table not preserved.)
Change the cell that reads "Field Rep: Scott" (F24) to just "Scott", and then you might apply:
=query(A25:E38,"select D,E where A='"&F24&"'")
// Enter the following into G25 and copy down column G
=(filter(E25:E47, D25:D47 = F25))
or
// Entering the following into G25 will expand with the content in F up to row 47
=ArrayFormula(IF(F25:F47 <> 0, VLOOKUP(F25:F47, D25:E47, 2, FALSE),))
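The filter-then-look-up logic of these formulas can be sketched in JavaScript as follows. The name salesByRep is hypothetical, and the column layout (rep in column A, store in D, sales in E) is an assumption taken from the ranges in the question.

```javascript
// Hypothetical helper: returns [store, sales] pairs for one rep.
// `rows` is a 2D array laid out like A25:E47 (rep in column A,
// store in column D, sales in column E).
function salesByRep(rows, rep) {
  return rows
    .filter(row => row[0] === rep)  // like FILTER(..., A25:A47 = "Scott")
    .map(row => [row[3], row[4]]);  // like "select D,E"
}
```

Returning the store and its sales together, as the QUERY version does, avoids having to match the filtered store names back to column E in a second step.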