Google Sheets Regexreplace for adding year to months - regex

I have a dataset like this:
2019-01-02-03-04-05-06-07-08-09-10-11-12
2020-01-02-03-04-05-06-07-08-09-10-11-12
2021-01-02-03-04-05-06-07-08-09-10-11-12
2022-01-02-03-04-05-06-07-08-09-10-11-12
2023-01-02-03-04-05-06-07-08-09-10-11-12
2024-01-02-03-04-05-06-07-08-09-10-11-12
All of the content for each row is in one single column.
I want to apply a regexreplace() to have a result like this:
2019-01,2019-02,2019-03,2019-04,2019-05,2019-06,2019-07,2019-08,2019-09,2019-10,2019-11,2019-12
2020-01,2020-02,2020-03,2020-04,2020-05,2020-06,2020-07,2020-08,2020-09,2020-10,2020-11,2020-12
2021-01,2021-02,2021-03,2021-04,2021-05,2021-06,2021-07,2021-08,2021-09,2021-10,2021-11,2021-12
2022-01,2022-02,2022-03,2022-04,2022-05,2022-06,2022-07,2022-08,2022-09,2022-10,2022-11,2022-12
2023-01,2023-02,2023-03,2023-04,2023-05,2023-06,2023-07,2023-08,2023-09,2023-10,2023-11,2023-12
2024-01,2024-02,2024-03,2024-04,2024-05,2024-06,2024-07,2024-08,2024-09,2024-10,2024-11,2024-12
That is basically replacing each "-" after the first one with a "," followed by the first 4 digits of the corresponding row and a "-".
Since I know the first part is the year and the remaining parts are month numbers, I know I can use the following formula to get the expected result:
=regexreplace(A1,"^([0-9]{4})-([0-9]{1,2})-([0-9]{1,2})-([0-9]{1,2})-([0-9]{1,2})-([0-9]{1,2})-([0-9]{1,2})-([0-9]{1,2})-([0-9]{1,2})-([0-9]{1,2})-([0-9]{1,2})-([0-9]{1,2})-([0-9]{1,2})","$1-$2,$1-$3,$1-$4,$1-$5,$1-$6,$1-$7,$1-$8,$1-$9,$1-$10,$1-$11,$1-$12,$1-$13")
PS.: my data is in A1
But how can I make this more dynamic, so that several parts of the string are replaced using one portion of the same string?

try:
=ARRAYFORMULA(IFERROR(TEXT(INDEX(SPLIT(A1:A, "-"),,1)&"-"&
TRANSPOSE(QUERY(TRANSPOSE(SPLIT(A1:A, "-")), "offset 1", 0)), "yyyy-mm")))
then:
=ARRAYFORMULA(REGEXREPLACE(TRANSPOSE(QUERY(TRANSPOSE(IFERROR(
TEXT(INDEX(SPLIT(A1:A, "-"),,1)&"-"&
TRANSPOSE(QUERY(TRANSPOSE(SPLIT(A1:A, "-")), "offset 1", 0)),
"yyyy-mm")&",")),,999^99)), ",$", ))
or without spaces:
=ARRAYFORMULA(REGEXREPLACE(TRANSPOSE(QUERY(TRANSPOSE(IFERROR(
TEXT(INDEX(SPLIT(A1:A, "-"),,1)&"-"&
TRANSPOSE(QUERY(TRANSPOSE(SPLIT(A1:A, "-")), "offset 1", 0)),
"yyyy-mm")&",")),,999^99)), " |,$", ))

Do you really need REGEXREPLACE? Alternatively:
=ARRAYFORMULA(TEXTJOIN(",",TRUE,LEFT(A1,5)&SPLIT(MID(A1,6,LEN(A1)),"-")))
Using REGEXREPLACE, maybe something like:
=LEFT(A1,7)&REGEXREPLACE(A1,"(\d{4}-\d{2})?(-)",","&LEFT(A1,5))
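For readers who want to check the logic outside Sheets, here is a minimal Python sketch of the same two ideas: split-and-join, and the optional-group regex from the last formula. The function names are made up for illustration, and Python's `re` module is not the RE2 engine that Sheets uses, so treat this as a check of the pattern rather than of the formula itself.

```python
import re


def prefix_months_split(row: str) -> str:
    """Split on '-', then re-join every month with the year in front."""
    year, *months = row.split("-")
    return ",".join(f"{year}-{month}" for month in months)


def prefix_months_regex(row: str) -> str:
    """Mirror of LEFT(A1,7) & REGEXREPLACE(A1, "(\\d{4}-\\d{2})?(-)", "," & LEFT(A1,5)).

    The optional group swallows the leading 'YYYY-MM' together with the
    hyphen after it, so that first match is replaced as a whole; every
    later hyphen is replaced on its own. The leading 'YYYY-MM' is then
    put back in front.
    """
    year_dash = row[:5]  # e.g. '2019-'
    replaced = re.sub(r"(\d{4}-\d{2})?-", "," + year_dash, row)
    return row[:7] + replaced


row = "2024-01-02-03-04-05-06-07-08-09-10-11-12"
print(prefix_months_split(row))
print(prefix_months_regex(row))
```

Both functions work for any number of month fields, which is the "dynamic" part the question asks about.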

Related

Google Query Trim string data

I am using the following data:
cat1-001A
cat1-001B
cat1-001C
dog2-001A
etc.
the query I used is
=query(sheet1!A1:A,"select A where A is like '%cat%'",1)
This query returns the three cat rows. Is there a way, within the query itself, to trim the text so that only the word cat is returned? I do not want the number and the suffix that follow it,
so the value returned will be just cat.
Not within QUERY itself. Use:
=INDEX(REGEXREPLACE(QUERY(Sheet1!A1:A,
"select A where A contains 'cat'", 1), "-\d+$", ))
update:
=INDEX(REGEXREPLACE(QUERY(Sheet1!A1:A,
"select A where lower(A) contains 'cat'", 1), "-\d+.+", ))
=INDEX(REGEXREPLACE(QUERY(Sheet1!A1:A,
"select A where lower(A) contains 'cat'", 1), "\d+-\d+.+", ))
Try this:
=REGEXREPLACE(QUERY(Sheet1!A1:A,
"select A where A contains 'cat'", 1), "\d-\d+.*",)

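The pattern `\d+-\d+.*` from the formulas above can be sanity-checked with a short Python sketch. It assumes the sample values from the question; `base_name` is a hypothetical helper name, and Python's `re` stands in for the RE2 engine used by Sheets.

```python
import re


def base_name(value: str) -> str:
    """Remove the first 'digits-digits' run and everything after it."""
    return re.sub(r"\d+-\d+.*", "", value)


rows = ["cat1-001A", "cat1-001B", "cat1-001C", "dog2-001A"]

# Filter first (like the QUERY 'contains' clause), then trim each hit.
cats = [base_name(row) for row in rows if "cat" in row.lower()]
print(cats)
```

Filtering and trimming stay separate steps here for the same reason as in the formulas: the query only selects rows, and the regex does the trimming afterwards.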
Importhtml Query Extract Between String

I'm trying to find a formula that fits two tables
=QUERY(IMPORTHTML(A1,"table", 16), "Select Col4")
output is
Page 1/10Page 2/10Page 3/10Page 4/10Page 5/10Page 6/10Page 7/10Page 8/10Page 9/10Page 10/10
Another:
=QUERY(IMPORTHTML(A2,"table", 16), "Select Col4")
output is
Page 1/3Page 2/3Page 3/3
I want to extract the digits between the space and the "/". Is there a way to do this within the formula itself?
I then tried this:
=transpose(SPLIT(REGEXREPLACE(A2,"Page|/10","~"),"~",0,1))
This doesn't work either, since I have to manually change /10 to /3 in the second formula.
Is there any way to achieve this for both tables?
The sheet is here
try:
=ARRAYFORMULA(IF(ROW(A1:B)<=(1*{
REGEXEXTRACT(IMPORTXML(A1, "//option[@value='21']"), "\d+$"),
REGEXEXTRACT(IMPORTXML(B1, "//option[@value='21']"), "\d+$")}),
ROW(A1:B), ))
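As a separate illustration of what the question asks for (the digits between the space and the "/"), here is a small Python sketch. It does not reproduce the IMPORTXML approach above; it only shows that a pattern with `\d+` after the slash works for both "/10" and "/3" without being edited. `page_numbers` is a hypothetical name.

```python
import re


def page_numbers(text: str) -> list[int]:
    """Return the page index from every 'Page N/M' token in the text."""
    # \s* tolerates a space or a line break between 'Page' and the number.
    return [int(n) for n in re.findall(r"Page\s*(\d+)/\d+", text)]


print(page_numbers("Page 1/3Page 2/3Page 3/3"))
print(page_numbers("Page 1/10Page 2/10Page 10/10"))
```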

How to find and return a value based on more than one text value in a column in Google Sheets

I am trying to extract relevant links from a huge list of links based on the text that is present in A:A. I have succeeded in extracting relevant links based on the value in A:A using the following formula:
=ArrayFormula({"Profile";if(len("*"&A2:A&"*"),iferror(vlookup(substitute("*"&C2:C&"*"," ","-")&"-"&"*"&A2:A&"*",{regexextract(D2:D,"^.+/(.+)\..+$"),D2:D},2,)),)})
Here is the URL to the Google Sheet
https://docs.google.com/spreadsheets/d/1Y1emSB2G2h_d1AIHNAqP6pIsG6-tK4sIIVCBGGjVd4g/edit?usp=sharing
The challenge I'm facing is that the formula returns blank results when a row in A:A contains more than one name, i.e., first and last names. I have tried everything I can think of, but I can't get it to work when the value in the first column contains more than one name.
Please assist me if you know the solution to this.
try:
={"Profile"; ARRAYFORMULA(IFNA((VLOOKUP(REGEXEXTRACT(LOWER(A2:A), TEXTJOIN("|", 1,
IFNA(REGEXEXTRACT(E2:E, LOWER(TEXTJOIN("|", 1, SUBSTITUTE(A2:A, " ", "|"))))))), {
IFNA(REGEXEXTRACT(E2:E, LOWER(TEXTJOIN("|", 1, SUBSTITUTE(A2:A, " ", "|"))))), E2:E}, 2, 0))))}

Looking for the proper way to format the text in a column and compare that with the value of a cell?

I am trying to format the information from a column that I am querying and compare that to information in a cell. I have tried to hack together various ways to do this, but I am not a proficient SQL/spreadsheet user.
In COLUMN I there is nothing.
In COLUMN K there is a match on A2.
In COLUMN N there is Information formatted like 31'-40' and 41'+.
I would prefer to use = instead of contains.
The REPLACE function seems to work when I substitute a string for N and run it on the W3Schools website.
The REGEXREPLACE seems to work on D2. I would expect the two results to match, but they do not.
COUNT( QUERY( '2019'!A2:P, "select D where I='' and upper(K) contains '" & UPPER(A2) & "' and REPLACE(REPLACE(REPLACE(N, '-', ''), '''', ''), '+','') contains '"& Regexreplace(D2,"[[:punct:]]","") &"' ")
I get 0 matches.
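You almost had it. Try it like this (a sheet name made of digits needs single quotes, and every range should point at that sheet):
=COUNTA(FILTER('2019'!D2:D, '2019'!I2:I="",
REGEXMATCH(UPPER('2019'!K2:K), UPPER(A2)),
REGEXMATCH(UPPER('2019'!N2:N), UPPER(D2))))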
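The question prefers an exact "=" comparison after stripping punctuation from both sides. A minimal Python sketch of that idea follows; the rows are invented sample data, the dictionary keys simply reuse the column letters from the question, and `normalize` and `count_matches` are hypothetical names.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT = str.maketrans("", "", string.punctuation)


def normalize(value: str) -> str:
    """Strip punctuation and upper-case, e.g. "31'-40'" becomes "3140"."""
    return value.translate(_PUNCT).upper()


def count_matches(rows: list[dict], name: str, size: str) -> int:
    """Count rows where I is empty, K contains name, and N equals size."""
    return sum(
        1
        for row in rows
        if row["I"] == ""
        and name.upper() in row["K"].upper()
        and normalize(row["N"]) == normalize(size)
    )


# Invented sample rows, for illustration only.
rows = [
    {"I": "", "K": "Alpha Marine", "N": "31'-40'"},
    {"I": "x", "K": "Alpha Marine", "N": "31'-40'"},
    {"I": "", "K": "Beta", "N": "31'-40'"},
    {"I": "", "K": "alpha", "N": "41'+"},
]
print(count_matches(rows, "alpha", "31-40"))
```

Normalizing both sides the same way is what makes the equality test safe; stripping punctuation on only one side is a likely reason for getting 0 matches.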

How to split column in Power Query by the first space?

I am using Power Query and have a column called LandArea; example data is "123.5 sq mi". It is of data type text. I want to remove the "sq mi" part so I just have the number value, 123.5. I tried the Replace function to replace "sq mi" with blank but that doesn't work because it looks at the entire text. So I tried to use Split where I split it on the space and it generated this formula below, and it did create a new column, but with null for all values. The original column still had "123.5 sq mi".
Table.SplitColumn(#"Reordered Columns1","LandArea",Splitter.SplitTextByDelimiter(" ", QuoteStyle.None),{"LandArea.1", "LandArea.2"})
When just splitting at the left-most delimiter:
Table.SplitColumn(#"Reordered Columns1","LandArea",Splitter.SplitTextByEachDelimiter({" "}, QuoteStyle.None, false),{"LandArea.1", "LandArea.2"})
I have also tried changing to QuoteStyle.Csv. Any idea how I can get this to work?
Use this to create a custom column:
= Table.AddColumn(
#"Reordered Columns1",
"NewColumn",
each Text.Start([LandArea],Text.PositionOf([LandArea]," "))
)
UPDATE: Since every value appears to end in "sq mi":
= Table.AddColumn(#"Changed Type", "Custom", each Text.Replace([LandArea]," sq mi",""),type number)
Hope it helps.
This is what I ended up using:
Table.AddColumn(#"Reordered Columns1", "LandArea2",
each Text.Start([LandArea], Text.PositionOf([LandArea], "sq")-1))
I avoided trying to find whitespace.
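The "text before the first space" logic in the answers above can be sketched in Python for comparison. This is not Power Query code; `leading_number` is a hypothetical helper, and it assumes the value always starts with a number written with a "." decimal separator.

```python
def leading_number(land_area: str) -> float:
    """Return the numeric part before the first space, e.g. '123.5 sq mi'."""
    # partition splits at the first space only; if there is no space,
    # the whole string is returned as the first element.
    number_text, _, _unit = land_area.partition(" ")
    return float(number_text)


print(leading_number("123.5 sq mi"))
```

Splitting on the first delimiter only, rather than on every space, keeps "sq mi" together as one unit, which is the same goal as the left-most-delimiter split attempted in the question.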