How to reshape timestamp in Google Sheets? - regex

I have an imported cell in Google Sheets with the following string representing a time/date format:
2019-03-30T14:39:07-03:00
What would be the correct REGEXEXTRACT or DATEVALUE formula so that it results in a valid Google Sheets date & time format?
The -03:00 at the end of the string can be ignored.
2019-03-30T14:39:07-03:00
should result in
yyyy-mm-dd hh:mm:ss

You can use REGEXREPLACE with this regex:
.*?(\d{4}(?:-\d{2}){2}).(\d{2}(?::\d{2}){2}).*
and replace the match with $1 $2.
I've tested it and it works well in Google Sheets.
Put this formula in the cell where you want the result:
=REGEXREPLACE(A1, ".*?(\d{4}(?:-\d{2}){2}).(\d{2}(?::\d{2}){2}).*", "$1 $2")
Because of the leading .*? and the trailing .*, this regex finds the date even when it is surrounded by other text. In each case, the extracted date ends up in whichever column holds the formula.

All you need is:
=SUBSTITUTE(LEFT(A1, 19), "T", " ")
For the whole column:
=ARRAYFORMULA(SUBSTITUTE(LEFT(A1:A, 19), "T", " "))
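For anyone who wants to check the logic outside Sheets, here is a minimal Python sketch of both approaches (the regex replacement and the "first 19 characters" shortcut). The function names are made up for illustration, and Python's re module is not the regex engine Sheets uses, so treat this as a way to reason about the pattern rather than as a Sheets test.

```python
import re

SAMPLE = "2019-03-30T14:39:07-03:00"

def reshape_with_regex(value):
    # Same pattern as the REGEXREPLACE formula; \1 and \2 play the role of $1 and $2.
    pattern = r".*?(\d{4}(?:-\d{2}){2}).(\d{2}(?::\d{2}){2}).*"
    return re.sub(pattern, r"\1 \2", value)

def reshape_with_slice(value):
    # Same idea as SUBSTITUTE(LEFT(A1, 19), "T", " "):
    # keep the first 19 characters and swap the T for a space.
    return value[:19].replace("T", " ")

print(reshape_with_regex(SAMPLE))
print(reshape_with_slice(SAMPLE))
```

The slice version only works when the timestamp starts the string; the regex version also copes with text before or after it.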

Related

Filter with REGEXMATCH in Google sheet to filter out containing text in cells

Right now I have some data, and I'm trying to filter it for the rows containing the text in cells C3, C4, etc.
I have no problem filtering with REGEXMATCH against a single cell,
but I can't get REGEXMATCH to work for two or more cells. I can't make the pipe (|) work between cells: I get a parse error. I also tried "C3|C4".
I can only get the wanted output by hardcoding the text to match, which isn't what I'm looking for. I'm hoping for some tips on how to REGEXMATCH against more than one cell, so that it matches the text in cell C3 (Apple) and C4 (Pear) and shows the wanted output.
You need to use TEXTJOIN for a dynamic list in column C:
=IF(TEXTJOIN( , 1, C3:C)<>"", FILTER(A2:A, REGEXMATCH(LOWER(A2:A),
TEXTJOIN("|", 1, LOWER(C3:C)))), "no input")
You may use
=IF(C3<>"", FILTER(A2:A,REGEXMATCH(A2:A, TEXTJOIN("|", TRUE, C3:C4) )), "no input")
Or, you may go a step further and match Apple or Pear as whole words using \b word boundaries and a grouping construct around the alternatives:
=IF(C3<>"", FILTER(A2:A,REGEXMATCH(A2:A, "\b(?:" & TEXTJOIN("|", TRUE, C3:C4) & ")\b")), "no input")
And if you need to make the search case insensitive, just prepend (?i) to the pattern:
=IF(C3<>"", FILTER(A2:A,REGEXMATCH(A2:A, "(?i)\b(?:" & TEXTJOIN("|", TRUE, C3:C4) & ")\b")), "no input")
See what the TEXTJOIN documentation says:
Combines the text from multiple strings and/or arrays, with a specifiable delimiter separating the different texts.
So, when you pass TRUE as the second argument, you do not have to worry if the range contains empty cells, and the regex won't be ruined by extraneous |||.
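The pattern-building idea (join the non-empty terms with |, optionally wrap them in \b(?:...)\b, optionally ignore case) can be sketched in Python. This is only an illustration under some assumptions: the function names and sample rows are invented, and the re.escape call is an extra safety step that the Sheets formulas do not perform.

```python
import re

def build_pattern(terms, whole_words=True, ignore_case=True):
    # Drop empty entries, as TEXTJOIN does when its second argument is TRUE.
    # re.escape is an addition here so that terms are matched literally.
    parts = [re.escape(t) for t in terms if t]
    if not parts:
        return None  # corresponds to the "no input" branch of the formula
    body = "(?:" + "|".join(parts) + ")"
    if whole_words:
        body = r"\b" + body + r"\b"
    return re.compile(body, re.IGNORECASE if ignore_case else 0)

def filter_rows(rows, terms):
    pattern = build_pattern(terms)
    if pattern is None:
        return "no input"
    # Keep only the rows in which the pattern is found, like FILTER + REGEXMATCH.
    return [row for row in rows if pattern.search(row)]

# Hypothetical data standing in for A2:A and C3:C.
rows = ["Apple pie", "Pear tart", "Banana bread", "Pineapple juice"]
print(filter_rows(rows, ["Apple", "", "Pear"]))
```

With whole-word matching on, "Pineapple juice" is not returned, because there is no word boundary before "apple" inside "Pineapple".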

Extracting Google Ads CIDs using Google Sheets Regex

I am trying to create a regex that will extract CIDs from a cell that contains both text and CIDs. Here is an example field of text I would like to extract the CIDs from:
Example1:
Pokemon FY19 - Instream
261-963-9423
Pokemon - pokedex FY19 - Bumper
334-724-7943
Example2:
Instream: 856-613-9156
Bumper: 999-448-5246
The CIDs are the XXX-XXX-XXXX ids.
I have tried =REGEXEXTRACT(A2, "\d{3}-\d{3}-\d{4}"), but it only returns the first CID in the field when I need it to return all of them.
I expect the output to be 261-963-9423 334-724-7943, but the output is just 261-963-9423.
You can try removing all lines except those that start with digits:
=REGEXREPLACE(A2,"(?:^|\n)[A-Za-z_-].*",)
Or replace every matching ID with a (.*) capture group, quote the rest of the text literally with \Q...\E, and extract the groups:
=REGEXEXTRACT(A2, "\Q"&REGEXREPLACE(A2, "\d{3}-\d{3}-\d{4}","\\E(.*)\\Q")&"\E")
=ARRAYFORMULA(SUBSTITUTE(TRIM(TRANSPOSE(QUERY(TRANSPOSE(IFERROR(
REGEXEXTRACT(IFERROR(SPLIT(A1:A, CHAR(10))), "\d+-\d+-\d+")))
,,999^99))), " ", CHAR(10)))
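The underlying limitation is that REGEXEXTRACT stops at the first match. In a language with a "find all" operation the task is direct; here is a minimal Python sketch using the question's pattern and sample text (the function and variable names are invented for illustration).

```python
import re

# Same pattern as in the question: XXX-XXX-XXXX.
CID_RE = re.compile(r"\d{3}-\d{3}-\d{4}")

def extract_cids(cell_text):
    # findall returns every non-overlapping match, not just the first one.
    return CID_RE.findall(cell_text)

example1 = (
    "Pokemon FY19 - Instream\n"
    "261-963-9423\n"
    "Pokemon - pokedex FY19 - Bumper\n"
    "334-724-7943"
)
example2 = "Instream: 856-613-9156\nBumper: 999-448-5246"

print(" ".join(extract_cids(example1)))
print(" ".join(extract_cids(example2)))
```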

How can I separate a string by underscore (_) in google spreadsheets using regex?

I need to create some columns from a cell that contains text separated by "_".
The input would be:
campaign1_attribute1_whatever_yes_123421
And the output has to be in different columns (one per field), with no "_" and excluding the final number, as follows:
campaign1 attribute1 whatever yes
It must be done using a regex formula!
help!
Thanks in advance (and sorry for my English)
=REGEXEXTRACT("campaign1_attribute1_whatever_yes_123421","(("&REGEXREPLACE("campaign1_attribute1_whatever_yes_123421","((_)|(\d+$))",")$1(")&"))")
What this does is replace each _ with parentheses to create capture groups, while also excluding the digit string at the end, and then surround the whole string with parentheses.
We then use REGEXEXTRACT to actually pull the pieces out; the groups automatically push them into their own cells/columns.
To solve this you can use the SPLIT and REGEXREPLACE functions.
Solution:
Text - A1 = "campaign1_attribute1_whatever_yes_123421"
Formula - A3: =SPLIT(REGEXREPLACE(A1,"_+\d*$",""), "_", TRUE)
Explanation:
In cell A3 we use SPLIT(text, delimiter, [split_by_each]). The text is first cleaned with REGEXREPLACE(A1,"_+\d*$","") to remove the trailing _123421, and SPLIT then gives you one column for each word delimited by "_".
Step by step:
A1 = "campaign1_attribute1_whatever_yes_123421"
A2: =REGEXREPLACE(A1,"_+\d*$","") // This gives you: campaign1_attribute1_whatever_yes
A3: =SPLIT(A2, "_", TRUE) // This gives you: campaign1 attribute1 whatever yes, each in a separate column.
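The two steps (strip the trailing number, then split on the underscore) can be checked with a short Python sketch. The function name is made up, and Python's re stands in for the Sheets regex engine, so this only illustrates the logic.

```python
import re

def split_fields(value):
    # Mirror of REGEXREPLACE(A1, "_+\d*$", ""):
    # drop the trailing underscore(s) and any digits at the end.
    trimmed = re.sub(r"_+\d*$", "", value)
    # Mirror of SPLIT(..., "_"): one item per field.
    return trimmed.split("_")

print(split_fields("campaign1_attribute1_whatever_yes_123421"))
```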
I finally figured it out yesterday on Stack Overflow (Spanish): https://es.stackoverflow.com/questions/55362/c%C3%B3mo-separo-texto-por-guiones-bajos-de-una-celda-en...
It was simple enough after all...
The reason I asked for a regex-only solution for Google Sheets is that I need to use it in Google Data Studio (same regex functions as spreadsheets).
To get each column, just use this regex extract function:
1st column: REGEXP_EXTRACT(Campaña, '^(?:[^_]*_){0}([^_]*)_')
2nd column: REGEXP_EXTRACT(Campaña, '^(?:[^_]*_){1}([^_]*)_')
3rd column: REGEXP_EXTRACT(Campaña, '^(?:[^_]*_){2}([^_]*)_')
etc...
The only thing that has to change in the formula to switch columns is the number inside {}, which is (column number - 1).
If you do not have the final number, just leave out the last "_".
Lastly, remember to redo all the calculated fields, because (for example) you get an error with CPC, CTR and other AdWords metrics that are calculated automatically.
Hope it helps!
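To see how the {n} quantifier selects a column, here is a small Python sketch of the same pattern. The function name and the 1-based column argument are assumptions made for illustration; Data Studio's REGEXP_EXTRACT is not called here.

```python
import re

def nth_field(value, column):
    # column is 1-based; the quantifier skips (column - 1) leading "field_" chunks,
    # then the capture group grabs the next field, which must be followed by "_".
    pattern = rf"^(?:[^_]*_){{{column - 1}}}([^_]*)_"
    match = re.search(pattern, value)
    return match.group(1) if match else None

sample = "campaign1_attribute1_whatever_yes_123421"
for column in range(1, 6):
    print(column, nth_field(sample, column))
```

The last column returns None because the final number is not followed by an underscore, which is how the trailing number gets excluded.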

Google Sheets Trim after either of characters

I have a column of cells in Google Sheets with text in the following formats:
PLAYBILL59; Code Description Here
BROADWAYBOX59: Code Description Here
TICKETCODE: Code Description Here
I want to create a formula that deletes everything after and including either a colon or semi-colon, that would leave:
PLAYBILL59
BROADWAYBOX59
TICKETCODE
I've been trying for hours with no luck.
Any suggestions are very much appreciated.
Let's say that your column is A; then you can use REGEXEXTRACT in your formula like this:
=REGEXEXTRACT(A1; "[A-Z0-9-a-z]+")
Assuming your text string is in A1, try:
=SUBSTITUTE(SUBSTITUTE(A1, "; Code Description Here",""), ": Code Description Here", "")
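Another way to think about the task is "keep everything before the first colon or semicolon", which does not depend on the description text being identical in every row. A minimal Python sketch of that idea follows; the function name is invented. In Sheets the same idea could probably be expressed as REGEXEXTRACT(A1, "^[^:;]+"), though that formula is a suggestion and was not tested here.

```python
import re

def code_only(value):
    # Split once at the first ":" or ";" and keep the part before it.
    return re.split(r"[:;]", value, maxsplit=1)[0].strip()

samples = [
    "PLAYBILL59; Code Description Here",
    "BROADWAYBOX59: Code Description Here",
    "TICKETCODE: Code Description Here",
]
for text in samples:
    print(code_only(text))
```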

Use RegEx to find dates and increment year by a value

I have a large number of files that contain dates. I would like to use a Regular Expression to find the dates and if possible increment the year of the date by 10.
The files can have multiple date formats ..
04/22/78
06-OCT-14
How would one write a regular expression that could find, increment, and replace the dates, or even just the year of the dates?
I plan to use a text editor like Text Pad, UltraEdit, or Notepad++ to search the files
Assume each date has the form date.month.year, where . can be any field separator, and that the year is the last thing on the line.
You can use a simple Perl one-liner to do this:
perl -ne 's/(\d+)$/($1+10)/e && print' filename
This will add 10 to the year, and print the date.
Output for this is:
04/22/88
06-OCT-24
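I just wrote this Python snippet to get it done.
import re

def add_ten_years(date):
    # Groups: whole match, first field, separator, middle field, separator, 2-digit year
    reg = r"((\d{2})(.)(\w{2,4})(.)(\d{2}))"
    mat = re.search(reg, date)
    if mat:
        mat = mat.groups()
        # Keep everything before the year and append the year plus 10
        return ''.join(mat[1:5]) + str(int(mat[5]) + 10)

print(add_ten_years("04/22/78"))
print(add_ten_years("06-OCT-14"))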
You can adjust the regex pattern to generalize it further, and it can easily be translated to other languages. Hope it helps!
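If the dates are embedded in longer text, one possible variation is to let re.sub call a function for every match, so all dates in a file's contents are rewritten in one pass. The sketch below makes several assumptions that are not in the question: the names are invented, only the two formats shown (04/22/78 and 06-OCT-14) are handled, and a year that passes 99 wraps around to two digits (95 becomes 05).

```python
import re

# Two-digit field, a / or - separator, a two-digit or three-letter field,
# the same separator again, and a two-digit year.
DATE_RE = re.compile(r"\b(\d{2})([/-])(\d{2}|[A-Za-z]{3})\2(\d{2})\b")

def shift_years(text, years=10):
    def bump(match):
        first, sep, middle, yy = match.groups()
        # Assumption: stay within two digits, so 95 + 10 becomes 05.
        new_yy = (int(yy) + years) % 100
        return f"{first}{sep}{middle}{sep}{new_yy:02d}"
    # re.sub calls bump() once per date found and splices in its return value.
    return DATE_RE.sub(bump, text)

print(shift_years("born 04/22/78, hired 06-OCT-14"))
```

Whether wrapping is the right behaviour depends on the files; widening the year to four digits would need a rule for choosing the century.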