Splitting up data in Google Spreadsheet for adjacent columns - regex

I would like to split the data in one spreadsheet column (work ID#, last name, first name) across two additional adjacent columns, so that I end up with 3 separate columns for this data.
Here is a link to my spreadsheet: https://docs.google.com/spreadsheets/d/1gsLYnrNHEMbZTML1YZG-NhtDYO_FAQaAPwGFvtqn2kg/edit?usp=sharing

I'm working on the basis that column B in your sheet (last_name) contains your merged data, (employee_ID last_name, first_name).
You can do this with three formulas, using regexreplace and regexextract.
employee_ID
Put this in cell A1:
=arrayformula({"employee_ID";iferror(if($B2:$B="","",value(regexextract($B2:$B,"^.*\d"))),"")})
first_name
Put this in cell C1:
=arrayformula({"first_name";if($B2:$B="","",regexreplace($B2:$B,"^.*\,\ ",""))})
last_name
I created a new column D in front of department. This goes in cell D1:
=arrayformula({"last_name";if($B2:$B="","",regexreplace(regexreplace($B2:$B,".*\d\ ",""),"\,.*",""))})
Summary
Each one builds an array using {}. The first element of the array is the column heading in quotes, like "employee_ID". This lets you keep the formula in row 1 in case you want to add a filter view on the dataset. The ; then starts a new row, and the rest of the formula fills the cells below: iferror(if($B2:$B="","",value(regexextract($B2:$B,"^.*\d"))),"").
regexextract($B2:$B,"^.*\d") matches anything from the start of the cell ^.* up to a digit \d. Because .* is greedy, the match runs to the last digit in the cell.
value() converts the result to a number. iferror handles where a number can't be found.
On first_name, anything from the beginning ^.* up to and including a comma \, and a space \ is replaced with "" (i.e. removed).
On last_name, the inner replace removes anything .* up to and including a digit \d and the space \ after it, then the outer replace removes the comma \, and everything after it .* from that result.
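The same three patterns can be tried outside Sheets. This is a minimal Python sketch, not part of the original answer; the function name split_employee and the sample cell "10234 Smith, John" are made up for illustration, and Python's re module stands in for the Sheets regex functions.

```python
import re

def split_employee(cell: str):
    """Split 'ID last_name, first_name' into (employee_id, last_name, first_name)."""
    try:
        # "^.*\d" is greedy: it matches from the start up to the last digit.
        # int() plays the role of value(); the except plays the role of iferror().
        employee_id = int(re.search(r"^.*\d", cell).group(0))
    except (AttributeError, ValueError):
        employee_id = None
    # Remove everything up to and including ", " to leave the first name.
    first_name = re.sub(r"^.*, ", "", cell)
    # Remove up to the digit plus space, then the comma and everything after it.
    last_name = re.sub(r",.*", "", re.sub(r".*\d ", "", cell))
    return employee_id, last_name, first_name

print(split_employee("10234 Smith, John"))  # (10234, 'Smith', 'John')
```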
Formula in one cell
If you want to do the split in one go, in say new columns F,G and H, put this in cell F1:
=arrayformula(iferror(trim(split({"employee_ID,last_name,first_name";if($B2:$B="","",value(regexextract($B2:$B,"^.*\d"))&", "&regexreplace($B2:$B,"^.*\d\ ",""))},",")),""))
However, employee_ID is then formatted as text.

I believe the easiest for you is to use the following logic and single formula
=ArrayFormula(IFERROR(SPLIT(REGEXREPLACE(A2:A,"(\d+) (.+), (.+)","$1#$2#$3"),"#")))
Please write the headings manually. Including them in the formula unnecessarily complicates things. Just make sure that everything below B2, C2 and D2 is cleared.
Please let us know if you need more information on how the formula works.
Functions used:
ArrayFormula
IFERROR
SPLIT
REGEXREPLACE
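To see what the single formula does step by step, here is a rough Python sketch of the same idea: rewrite the cell with "#" between the three captured groups, then split on "#". The function name split_row and the sample value are hypothetical. Unlike SPLIT in Sheets, Python keeps the ID as a string.

```python
import re

PATTERN = re.compile(r"(\d+) (.+), (.+)")

def split_row(cell: str):
    # Rewrite "ID last, first" as "ID#last#first" using the three capture groups.
    joined = PATTERN.sub(r"\1#\2#\3", cell)
    # Split on the "#" separator, as SPLIT(..., "#") does.
    return joined.split("#")

print(split_row("10234 Smith, John"))  # ['10234', 'Smith', 'John']
```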

Related

SUM multiple values after a substring within all cells in a column in Google Sheets

For an open source chat analyser in Google Sheets, I need to extract all numeric values after a substring (Example), then total them.
For example, if a cell contains Example1 another text 123 Example500 text, Example1 and Example500 should be extracted out, and their numeric values summed to 501.
This is complicated further by needing to obtain the total for a column of messages.
What I've tried already:
=REGEXEXTRACT(A1, "Example(\d+)"): This only extracts the first matching value, but works!
=SUM(SPLIT(A1, "Example")): This works for messages that only include my target string, but falls apart when other strings are included. The output could possibly be filtered to results that start with a number, but this is very messy and possibly a red herring.
CONCATENATEing all my cells together, then searching for numbers. This is error-prone due to additional numbers within messages.
Another idea is to replace each Example(\d+) match with $1 followed by a space (the captured digits), and to replace every other character, matched by the |. alternative, with just a space (regex101 demo). This works because $1 is unset on the right side of the alternation. Then split on space and sum the numbers (any other digits in the text have already been removed). If Example is a placeholder, replace it with e.g. [[:alpha:]]+ for one or more alphabetic characters.
=IF(ISTEXT(A1);SUM(SPLIT(REGEXREPLACE(A1;"Example(\d+)|.";"$1 ");" "));0)
I added IF(ISTEXT(A1);...) so that only text in the source field is processed (to avoid errors). If the field is empty or not text, the result is set to 0. If the field always contains text, the check is unneeded and can be removed.
Edit from #TheMaster: As an array formula, we can use BYROW
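The alternation trick can be checked with a small Python sketch. It assumes Python 3.5 or later, where an unmatched group in the replacement is substituted as an empty string, which mirrors how $1 is unset in the formula. The function name sum_example_values is made up for this illustration.

```python
import re

def sum_example_values(message: str) -> int:
    # "Example(\d+)" keeps its digits plus a space; any other single
    # character (the "." alternative) has no group 1, so it becomes a space.
    spaced = re.sub(r"Example(\d+)|.", r"\1 ", message)
    # Split on whitespace and total whatever numbers are left.
    return sum(int(token) for token in spaced.split())

print(sum_example_values("Example1 another text 123 Example500 text"))  # 501
```

The stray 123 is consumed one character at a time by the "." alternative, so it never reaches the sum.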
=BYROW(A:A; LAMBDA(row; IF(ISTEXT(row); SUM(SPLIT(
REGEXREPLACE(row;"Example(\d+)|.";"$1 ");" "));)))
try:
=LAMBDA(x, REGEXEXTRACT(A1, "(\w+)\d+")&
SUMPRODUCT(IF(IFERROR(REGEXMATCH(x, "\w+\d+")),
REGEXEXTRACT(x, "\w+(\d+)"), )))(SPLIT(A1, " "))
update 1:
=LAMBDA(x, REGEXEXTRACT(A1, "(\D+)\d+")&
SUMPRODUCT(IF(IFERROR(REGEXMATCH(x, "\D+\d+")),
REGEXEXTRACT(x, "\D+(\d+)"), )))(SPLIT(A1, " "))
update 2:
=INDEX(LAMBDA(xx, REGEXEXTRACT(xx, "(\D+)\d+")&
BYROW(LAMBDA(x, IF(IFERROR(REGEXMATCH(x, "\D+\d+")),
REGEXEXTRACT(x, "\D+(\d+)"), ))(SPLIT(xx, " ")), LAMBDA(x, SUMPRODUCT(x))))
(A1:INDEX(A:A, MAX((A:A<>"")*ROW(A:A)))))
If your data starts from A2, just change A1: to A2:
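For a single cell, the logic of the "update 1" formula can be sketched in Python as follows. The function name label_with_total is hypothetical. In Sheets, SPLIT turns the token 123 into a number, so REGEXMATCH errors and IFERROR skips it; in this sketch the token is skipped simply because it has no non-digit prefix.

```python
import re

def label_with_total(message: str) -> str:
    # Text in front of the first number, like REGEXEXTRACT(A1, "(\D+)\d+").
    prefix = re.search(r"(\D+)\d+", message).group(1)
    total = 0
    for token in message.split(" "):
        # Only tokens with non-digits followed by digits contribute.
        match = re.search(r"\D+(\d+)", token)
        if match:
            total += int(match.group(1))
    return f"{prefix}{total}"

print(label_with_total("Example1 another text 123 Example500 text"))  # Example501
```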

Conditional Formatting Cells in a Column Only Containing Letters F-Z Excluding RR

How could I highlight cells using conditional formatting if I want to only highlight cells containing letters F-Z? The formula would also need to exclude RR.
For example, if a cell in that column contains RR or any letter from A to E, I don't want to highlight it. But, if it contains any letter from F all the way to Z, I want it automatically highlighted.
Tried using regexmatch, but it's not working.
use:
=REGEXMATCH(A2, "[F-Z]")*(REGEXMATCH(A2, "RR")=FALSE)
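The rule can be tested on a few sample values with a short Python sketch. The function name should_highlight and the sample cells are made up. Multiplying the two REGEXMATCH results in the formula acts as a logical AND, which is written as "and" here. Note that the match is case-sensitive, so lowercase letters do not trigger the highlight.

```python
import re

def should_highlight(cell: str) -> bool:
    # True when the cell contains any uppercase letter from F to Z...
    has_f_to_z = re.search(r"[F-Z]", cell) is not None
    # ...unless the cell also contains RR anywhere.
    has_rr = re.search(r"RR", cell) is not None
    return has_f_to_z and not has_rr

for value in ["G", "RR", "A", "B7", "XRR"]:
    print(value, should_highlight(value))
```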

If string contains text from list, write matched text to cell

I'm working on a large Google spreadsheet with key phrases, strings of text, in column A. I want to search column A based on a list of keywords that live in another sheet. When a keyword matches a word in a string in column A, I want to print that word in an adjacent cell to column A.
Here's a simple spreadsheet to work with that I think demonstrates what I'm trying to do.
https://docs.google.com/spreadsheets/d/1tNcroABVP0UdP4CiJldxLZgdrJF33TYT4mL1DZJfD1Q/edit?usp=sharing
I want to print that word in an adjacent cell to column A
=ARRAYFORMULA(IFNA(REGEXEXTRACT(A2:A, LOWER(TEXTJOIN("|", 1, 'KEYWORD LIST'!A2:A)))))
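As a rough illustration of what the formula builds, this Python sketch joins the keywords into one lowercase alternation pattern and returns the first match in the phrase. The function name first_keyword and the sample keywords are assumptions, not data from the linked sheet. Like the formula, it does not escape regex metacharacters in the keywords and it matches substrings, not whole words.

```python
import re

def first_keyword(phrase: str, keywords: list) -> str:
    # TEXTJOIN("|", 1, ...) skips empty cells; LOWER lowercases the pattern.
    pattern = "|".join(k for k in keywords if k).lower()
    match = re.search(pattern, phrase)
    # IFNA(...) leaves the cell empty when nothing matches.
    return match.group(0) if match else ""

print(first_keyword("big blue car", ["Red", "Blue"]))  # blue
```

Because the pattern is lowercased, it only finds keywords that appear in lowercase in column A.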
Try with this (put at B2):
=arrayformula(if(ISNUMBER(find(
transpose({'KEYWORD LIST'!$A$2:$A$6}),A2:A6)),
transpose({'KEYWORD LIST'!$A$2:$A$6}),""))

How to fix a concatenate formula conditional on a non-blank cell

I'm trying to concatenate cells from columns C, D & E as long as Cell A is not blank (has a value)
Cell A is a formula based on listing weekdays:
=IF(WORKDAY(A$28-1;ROW(21:21))>A$29;"";WORKDAY(A$28-1;ROW(21:21)))
My concatenate formula is as follows:
=IF(ISBLANK(A24);"";CONCATENATE(E24;CHAR(10);D24;CHAR(10);C24))
The formula is still concatenating even if Cell A is blank
Blank
Blank (empty cell) means there is nothing inside the cell: no formula and no value. A cell containing an empty string ("") is therefore not blank. As soon as you put a formula or any value into a cell, it isn't blank anymore.
You should use this formula:
=IF(A24="";"";CONCATENATE(E24;CHAR(10);D24;CHAR(10);C24))
Here A24="" is also TRUE for truly blank cells, so you don't need ISBLANK as well.

How can I separate a string by underscore (_) in google spreadsheets using regex?

I need to create some columns from a cell that contains text separated by "_".
The input would be:
campaign1_attribute1_whatever_yes_123421
And the output has to be in different columns (one per field), with no "_" and excluding the final number, as it follows:
campaign1 attribute1 whatever yes
It must be done using a regex formula!
help!
Thanks in advance (and sorry for my english)
=REGEXEXTRACT("campaign1_attribute1_whatever_yes_123421","(("&REGEXREPLACE("campaign1_attribute1_whatever_yes_123421","((_)|(\d+$))",")$1(")&"))")
What this does is replace every _ with parentheses to create capture groups, while also excluding the digit string at the end, and then surround the whole string with parentheses.
We then use REGEXEXTRACT to actually pull the pieces out; the groups automatically push them into their own cells/columns.
To solve this you can use the SPLIT and REGEXREPLACE functions
Solution:
Text - A1 = "campaign1_attribute1_whatever_yes_123421"
Formula - A3 = =SPLIT(REGEXREPLACE(A1,"_+\d*$",""), "_", TRUE)
Explanation:
In cell A3 we use SPLIT(text, delimiter, [split_by_each]). The text in this case is first cleaned with the regex REGEXREPLACE(A1,"_+\d*$","") to remove _123421, which gives you one column for each word delimited by "_".
A1 = "campaign1_attribute1_whatever_yes_123421"
A2 = "=REGEXREPLACE(A1,"_+\d*$","")" //This gives you: campaign1_attribute1_whatever_yes
A3 = SPLIT(A2, "_", TRUE) //This gives you: campaign1 attribute1 whatever yes, each in a separate column.
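The two steps above can be reproduced in Python as a quick check. This is only a sketch with a made-up function name, split_campaign, using re.sub in place of REGEXREPLACE and str.split in place of SPLIT.

```python
import re

def split_campaign(cell: str):
    # Step A2: strip the trailing underscore(s) and the final number.
    trimmed = re.sub(r"_+\d*$", "", cell)
    # Step A3: split what is left on "_", one item per column.
    return trimmed.split("_")

print(split_campaign("campaign1_attribute1_whatever_yes_123421"))
# ['campaign1', 'attribute1', 'whatever', 'yes']
```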
I finally figured it out yesterday in stackoverflow (spanish): https://es.stackoverflow.com/questions/55362/c%C3%B3mo-separo-texto-por-guiones-bajos-de-una-celda-en...
It was simple enough after all...
The reason I asked for a regex-only solution for Google Sheets is that I need to use it in Google Data Studio (same regex functions as spreadsheets)
To get each column just use this regex extract function:
1st column: REGEXP_EXTRACT(Campaña, '^(?:[^_]*_){0}([^_]*)_')
2nd column: REGEXP_EXTRACT(Campaña, '^(?:[^_]*_){1}([^_]*)_')
3rd column: REGEXP_EXTRACT(Campaña, '^(?:[^_]*_){2}([^_]*)_')
etc...
The only thing that has to be changed in the formula to switch columns is the number inside {}, which is (column number - 1).
If you do not have the final number, just don't put the last "_".
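The pattern family above can be checked quickly in Python. This sketch assumes a hypothetical helper nth_field that fills in the {} count from a 1-based column number; Python's re.search stands in for REGEXP_EXTRACT.

```python
import re

def nth_field(cell: str, column: int):
    # Skip (column - 1) "field_" groups, then capture the next field,
    # which must itself be followed by an underscore.
    pattern = r"^(?:[^_]*_){%d}([^_]*)_" % (column - 1)
    match = re.search(pattern, cell)
    return match.group(1) if match else None

text = "campaign1_attribute1_whatever_yes_123421"
print([nth_field(text, n) for n in range(1, 5)])
# ['campaign1', 'attribute1', 'whatever', 'yes']
```

Asking for column 5 returns None here, because the final number is not followed by an underscore; that is what excludes it from the output.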
Lastly, remember to do all the calculated fields again, because (for example) it gets an error with CPC, CTR and other Adwords metrics that are calculated automatically.
Hope it helps!