Counting and adding multiple variables from single cell in sheets - regex

I have a Sheets document with cells that users enter data into. They know to enter the data in a fixed format: a 'number' and a 'letter', followed by a space, a SKU number, and then a comma.
I'd like a formula that counts how many times each 'letter' occurs and then sums the 'numbers' for each letter.
There are only five 'letters' users can choose from: M, E, T, W, B.
The data they enter isn't restricted to a set order, and there is no limit on how much they can enter, as long as it follows the syntax above.
I attached a screenshot of an example of how this should look.
The yellow cell is the user-entered data, and the green cells are data created by formula.
Or here's a link to a live version: link
I tried doing it with COUNTIF but that didn't work. I'm guessing it would be done with an array, but I don't know where to start. If I can see an example of something similar, I could probably do the rest.

yes:
=INDEX(REGEXREPLACE(SPLIT(REGEXREPLACE(FLATTEN(QUERY(TRANSPOSE(QUERY(TRANSPOSE(SORT(TRANSPOSE(QUERY(SPLIT(
FLATTEN(REGEXREPLACE(TRIM(SPLIT(A2:A9, ",")), "\b(\d+(?:\.\d+)?)(.+?)\b(.*)", ROW(A2:A9)&"×$1$2×$1×$2")), "×"),
"select count(Col2),sum(Col3) where Col2 is not null group by Col1 pivot Col4 label count(Col2)''")))),
"offset 1", 0)*1&TRIM(REGEXREPLACE(TRANSPOSE(SORT(FLATTEN(QUERY(SPLIT(
FLATTEN(REGEXREPLACE(TRIM(SPLIT(A2:A9, ",")), "\b(\d+(?:\.\d+)?)(.+?)\b(.*)", ROW(A2:A9)&"×$1$2×$1×$2")), "×"),
"select count(Col2),sum(Col3) where Col2 is not null group by Col1 pivot Col4 limit 0 label count(Col2)''")))),
".*sum", ))),,9^9)), "([^ ]+ [^ ]+) ", "$1×"), "×"), "(\d+(?:\.\d+)?)$", "($1)"))

I've added a new sheet ("Erik Help") with the following solution:
=ArrayFormula(FILTER( SPLIT("B E M T W", " ") & " (" & IFERROR(VLOOKUP(ROW(A1:A) & SPLIT("B E M T W", " "), QUERY(FLATTEN(SPLIT(QUERY(FLATTEN(IFERROR(REPT(ROW(A1:A) & REGEXEXTRACT(SPLIT(REGEXREPLACE(A1:A&",", "\d+,", ""), " ", 0, 1), "\D") & "~", 1*REGEXEXTRACT(SPLIT(REGEXREPLACE(A1:A&",", "\d+,", ""), " ", 0, 1), "\d+")))), "WHERE Col1 <>'' "), "~", 1, 1)), "Select Col1, COUNT(Col1) GROUP BY Col1"), 2, FALSE), 0)&")", A1:A<>""))
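
For readers who want to check the expected numbers outside Sheets, the same tally can be sketched in Python. This is only an illustration of the logic, not a translation of either formula above. It assumes each entry looks like `2M 1001,` (quantity, letter, space, SKU, comma), and the names `ENTRY` and `tally` are made up for this sketch.

```python
import re
from collections import defaultdict

# One entry: quantity, one of the five letters, whitespace, then the SKU up to the comma.
ENTRY = re.compile(r"(\d+)\s*([METWB])\s+[^,]+")

def tally(cell):
    """Return {letter: (number of entries, sum of quantities)} for one cell."""
    counts = defaultdict(int)
    totals = defaultdict(int)
    for qty, letter in ENTRY.findall(cell):
        counts[letter] += 1
        totals[letter] += int(qty)
    # Always report all five letters, even those that do not occur.
    return {letter: (counts[letter], totals[letter]) for letter in "METWB"}

print(tally("2M 1001, 1E 1002, 3M 1003,"))
```

For the sample cell, `M` occurs in two entries with a total quantity of 5, and `E` occurs once with a total of 1.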

How to search a row for cell value(s) then output the header?

I'm new here and trying to automate. I have a work roster and would like it to output who is on duty each day. Ideally, it would check today's date and then search the corresponding table row for the relevant personnel.
Screenshot
Spreadsheet: here
Desired output:
On today's date, AM shifts are Person 1 (duty) Person 2, PM shifts are Person 3
Current formula:
="On "&textjoin("",TRUE,B7)&": AM shifts are "
&OFFSET(INDEX(A3:E3,MATCH("AM Duty",A3:E3,0)),1-row(vlookup(today(),A1:E5,2,0)),0)&" (duty) "
&OFFSET(INDEX(A3:E3,MATCH("AM Reg",A3:E3,0)),1-row(vlookup(today(),A1:E5,2,0)),0)
&", PM shifts are "
&OFFSET(INDEX(A3:E3,MATCH("PM Reg",A3:E3,0)),1-row(vlookup(today(),A1:E5,2,0)),0)
Some problems with formula:
Row needs to adjust according to today's date as it goes down the list, currently it's hardcoded A3:E3
Unsure how to capture repeated AM Reg in each row
Not sure if I'm overcomplicating things here, and open to better solutions. Thank you in advance!
try:
=INDEX(TEXT(TODAY(), "On dd mmmm yy: A\M \s\hift\s ar\e ")&
TEXTJOIN(", ", 1, IF(REGEXMATCH(VLOOKUP(TODAY(), A2:E5, {2,3,4,5}, ), "AM Duty"), B1:E1&" (duty), ", ))&
TEXTJOIN(", ", 1, IF(REGEXMATCH(VLOOKUP(TODAY(), A2:E5, {2,3,4,5}, ), "AM Reg"), B1:E1, ))&" and PM shifts are "&
TEXTJOIN(", ", 1, IF(REGEXMATCH(VLOOKUP(TODAY(), A2:E5, {2,3,4,5}, ), "PM Duty"), B1:E1&" (duty), ", ))&
TEXTJOIN(", ", 1, IF(REGEXMATCH(VLOOKUP(TODAY(), A2:E5, {2,3,4,5}, ), "PM Reg"), B1:E1, )))
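
The lookup the formula performs (find today's row, then collect the header names whose cell matches each shift code) can be sketched in Python. The data layout is an assumption: `people` stands in for the header row B1:E1 and `roster` maps each date to that row's shift codes. `duty_summary` is a hypothetical helper name.

```python
from datetime import date

def duty_summary(people, roster, day):
    """Build the sentence for one day from a {date: [shift codes]} roster."""
    shifts = roster[day]

    def names(code):
        # Header names whose cell in this day's row equals the shift code.
        return [p for p, s in zip(people, shifts) if s == code]

    am = [f"{p} (duty)" for p in names("AM Duty")] + names("AM Reg")
    pm = [f"{p} (duty)" for p in names("PM Duty")] + names("PM Reg")
    return (f"On {day.isoformat()}: AM shifts are {', '.join(am)}"
            f" and PM shifts are {', '.join(pm)}")

people = ["Person 1", "Person 2", "Person 3", "Person 4"]
roster = {date(2024, 1, 1): ["AM Duty", "AM Reg", "PM Reg", "Off"]}
print(duty_summary(people, roster, date(2024, 1, 1)))
```

Because the names are collected into a list, a shift code that repeats within a row (for example two people on "AM Reg") is handled without extra work.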

Athena- extract substring from string - comma delimited

I want to create Athena view from Athena table.
In the table, the column value is "lastname, firstname", so I want to extract these values as 'lastname' and 'firstname' and store them in separate columns in a view. For example, firstname needs to be stored in a new column 'first_name' and lastname in a new column 'last_name'.
What's the SQL function I can use here? I tried the split function, but it gives me an array.
Assuming the input strings have a fixed and known number of elements, you can do something like this:
WITH data(value) AS (
VALUES ('Aaa,Bbb')
)
SELECT elements[1], elements[2]
FROM (
SELECT split(value, ',') AS elements
FROM data
)
=>
_col0 | _col1
-------+-------
Aaa | Bbb
(1 row)
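You can also use SPLIT_PART, which returns a single element instead of an array:
create or replace view "names" as
select
-- the column holds "lastname, firstname", so part 1 is the last name
trim(SPLIT_PART("column_name", ',', 1)) as last_name
, trim(SPLIT_PART("column_name", ',', 2)) as first_name
from myTable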
You can use UNNEST on the split result:
WITH dataset AS (
SELECT * FROM (VALUES
('aaa,bbb'),
('aaa1,bbb1')
) AS t (str))
SELECT str_col
FROM dataset
CROSS JOIN UNNEST(split(str, ',')) as tmp(str_col)
Output:
str_col
aaa
bbb
aaa1
bbb1
UPD
If at least one comma is guaranteed, then it is as easy as:
WITH dataset AS (
SELECT * FROM (VALUES
('aaa,bbb'),
('aaa1,bbb1')
) AS t (str))
SELECT splt[1] last_name, splt[2] first_name
FROM
(SELECT split(str, ',') as splt
FROM dataset)
Output:
 last_name | first_name
-----------+------------
 aaa       | bbb
 aaa1      | bbb1
If the number of commas can vary but is limited to some maximum, you can use TRY:
WITH dataset AS (
SELECT * FROM (VALUES
('aaa,bbb'),
('aaa1,bbb1,ddd1')
) AS t (str))
SELECT splt[1], splt[2], TRY(splt[3])
FROM
(SELECT split(str, ',') as splt
FROM dataset)
Output:
 _col0 | _col1 | _col2
-------+-------+-------
 aaa   | bbb   |
 aaa1  | bbb1  | ddd1
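
To make the behaviour of the split-and-index approach concrete, here is a small Python sketch of the same idea: split on the comma, trim the pieces, and fill missing positions with an empty value, much as TRY turns an out-of-range index into NULL. `split_name` is a hypothetical helper, not an Athena function.

```python
def split_name(value, parts=2):
    """Split a comma-delimited string into exactly `parts` trimmed pieces."""
    pieces = [p.strip() for p in value.split(",")]
    # Pad with None when the string has fewer elements than expected.
    pieces += [None] * (parts - len(pieces))
    return pieces[:parts]

last_name, first_name = split_name("lastname, firstname")
print(last_name, first_name)
print(split_name("aaa,bbb", parts=3))
```

The second call returns `None` in the third position, which corresponds to the empty `_col2` cell for the `aaa,bbb` row.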

generate a one-column table that contains hundreds of different categories using M or DAX

I need to split my products into a total of 120 predefined price clusters/buckets. These clusters can overlap and look somewhat like this:
As I don't want to write out all of these strings manually: is there a convenient way to generate them directly in M or DAX with a bit of code?
Thanks in advance!
Dave
With M (Power Query) you can create a function. Open the query editor, right-click and create an empty query. Create a function (ignore the warning) and call it RowGenerator.
Open the advanced editor and paste the following code:
let
Bron = (base as number, start as number, end as number) => let
Bron = Table.FromList(List.Generate(() => start, each _ <= end, each _ + 1), Splitter.SplitByNothing(), null, null, ExtraValues.Error),
#"Aangepaste kolom toegevoegd" = Table.AddColumn(Bron, "Aangepast", each Number.ToText(base) & " - " & Number.ToText([Column1]))
in
#"Aangepaste kolom toegevoegd"
in
Bron
This function creates a table where base is your first number and start and end define the range.
Add another empty query, open the advanced editor and paste:
let
Bron = List.Generate(() => 0, each _ < 5, each _ + 1),
#"Geconverteerd naar tabel" = Table.FromList(Bron, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
#"Aangeroepen aangepaste functie" = Table.AddColumn(#"Geconverteerd naar tabel", "test", each RowGenerator(_[Column1], _[Column1] + 1, 5)),
#"test uitgevouwen" = Table.ExpandTableColumn(#"Aangeroepen aangepaste functie", "test", {"Column1", "Aangepast"}, {"Column1.1", "Price Cluster"}),
#"Kolommen verwijderd" = Table.RemoveColumns(#"test uitgevouwen",{"Column1", "Column1.1"})
in
#"Kolommen verwijderd"
This first creates a list of 5 rows, then calls the previously created function for each row; the last step expands the resulting tables and removes the columns that are not needed.
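
The rows this query produces can be checked with a short Python sketch of the same enumeration: every lower bound is paired with every larger upper bound. `price_clusters` is a hypothetical name, and the unit step between bounds is an assumption, since the original bucket list is not shown.

```python
def price_clusters(max_value):
    """Return labels "lo - hi" for all 0 <= lo < hi <= max_value."""
    return [
        f"{lo} - {hi}"
        for lo in range(max_value)
        for hi in range(lo + 1, max_value + 1)
    ]

clusters = price_clusters(5)
print(len(clusters), clusters[0], clusters[-1])
print(len(price_clusters(15)))
```

With an upper limit of 5 this yields 15 labels, from "0 - 1" to "4 - 5". Under this scheme an upper limit of 15 gives exactly 120 overlapping clusters, which matches the count in the question.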
Enjoy:
You can create this bucket by DAX (New Table):
Table = SELECTCOLUMNS(
GENERATE(SELECTCOLUMNS(GENERATESERIES(0,10,1),"FirstPart",[Value]), SELECTCOLUMNS(GENERATESERIES(0,10,1),"SecondPart",[Value]))
,"Bucket", [FirstPart] & " - " & [SecondPart]
)
Table = SELECTCOLUMNS(
GENERATE(SELECTCOLUMNS(GENERATESERIES(0,9,1),"FirstPart",[Value]), TOPN([FirstPart], SELECTCOLUMNS(GENERATESERIES(1,9,1),"SecondPart",[Value]), [SecondPart],ASC))
,"Bucket", [FirstPart] & " - " & [SecondPart]
)

Concatenate cells between dynamic start and end row

I'm trying to concatenate a number of cells into one if they are between two cells with a certain string.
For example: In the Element column there are modalOpen and modalClose and in between those are modalFields. Between modalOpen and modalClose I need to add the Name of each row with Element modalField into the Output column for the modalOpen row.
The number of modalFields can vary from 2 - 20.
Delete everything in column C and paste this into cell C2:
=ARRAYFORMULA(TRIM(SUBSTITUTE(IFERROR(VLOOKUP(B2:B,
SPLIT(TRANSPOSE(SPLIT(QUERY(IF(B2:B<>"",
IF(A2:A="modalOpen", "♥"&B2:B&"♦"&B2:B&" with",
IF(A3:A="modalClose", "& <"&B2:B&">", "<"&B2:B&">,")), )
,,999^99), "♥")), "♦"), 2, 0)), ">, & ", "> & ")))
=ARRAYFORMULA(REGEXREPLACE(TRIM(TRANSPOSE(SPLIT(QUERY(FILTER(IF(A2:A="modalClose","",IF(A2:A="modalOpen","♠"&B2:B&" with ","<"&B2:B&">,")),A2:A<>""),,2^99),"♠"))),"(, )(\<[^<>]\>),$"," and $2"))
The result:
Test1 with <1>, <2> and <3>
Test2 with <1>, <2>, <3> and <4>
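
The grouping both formulas perform can also be written as a plain loop, which may help when checking the expected output. This Python sketch assumes the sheet is read as (Element, Name) pairs in row order. `join_modal_fields` is a made-up name.

```python
def join_modal_fields(rows):
    """Collect modalField names between each modalOpen and modalClose."""
    results = []
    current = None  # (title, fields) of the modal currently open
    for element, name in rows:
        if element == "modalOpen":
            current = (name, [])
        elif element == "modalField" and current is not None:
            current[1].append(f"<{name}>")
        elif element == "modalClose" and current is not None:
            title, fields = current
            if len(fields) > 1:
                listed = ", ".join(fields[:-1]) + " and " + fields[-1]
            else:
                listed = "".join(fields)
            results.append(f"{title} with {listed}")
            current = None
    return results

rows = [
    ("modalOpen", "Test1"),
    ("modalField", "1"), ("modalField", "2"), ("modalField", "3"),
    ("modalClose", ""),
]
print(join_modal_fields(rows))
```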

How do I extract time from a sentence and transform it into a number?

Can I use this method to extract the timelines from this string:
1 month in role 1 year 11 months in company
and transform them into a number of months?
E.g.
1 month = 1
1 year 11 months = 23
Any help greatly appreciated!
I have tried the SPLIT formula, but all the sentences are slightly different.
You could try using regular expressions.
=IFERROR(INDEX(SPLIT(REGEXEXTRACT(A1,"(\d+ years? (\d+ months? )?in company)"), " "), 0, 1), 0) * 12 + IFERROR(INDEX(SPLIT(REGEXEXTRACT(A1,"(\d+ months? in company)"), " "), 0, 1), 0)
(A1 in the formula above represents the cell containing the timeline string. Adjust it as needed.)
Basically, this formula looks for "x year(s) (x month(s) )in company". If such a string is found, it splits it by spaces and takes the first portion (the x in x years). If no such pattern is found (for example, when the string is "1 month in role 1 month in company"), the year part is ignored.
For robustness, it is necessary to check that the year is followed by an optional month component and then "in company". Otherwise, "1 year 1 month in role 2 years 11 months in company" would return 1 for the year, which is not what we want.
The second part of the formula looks for "x month(s) in company". If it is not found, the month portion is ignored (e.g. "1 year in role 1 year in company").
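
The same matching logic can be tried out in Python before adapting the Sheets formula. This is a sketch, not a translation of the formula: it assumes every duration is written as an optional "x year(s)" followed by an optional "x month(s)" and then "in role" or "in company". `SPAN` and `months_by_scope` are hypothetical names.

```python
import re

# Optional years, optional months, then the scope the duration belongs to.
SPAN = re.compile(r"(?:(\d+) years? ?)?(?:(\d+) months? ?)?in (role|company)")

def months_by_scope(text):
    """Return {"role": months, "company": months} for the scopes found."""
    result = {}
    for years, months, scope in SPAN.findall(text):
        # Missing components come back as empty strings, so default to 0.
        result[scope] = int(years or 0) * 12 + int(months or 0)
    return result

print(months_by_scope("1 month in role 1 year 11 months in company"))
```

For the sample string this gives 1 month in role and 23 months in company. Tying each duration to the scope word that follows it avoids picking up the year from the "in role" part when only the company total is wanted.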
formula in B2 cell:
=ARRAYFORMULA(IFERROR(IF(REGEXMATCH(A2:A, "role"),
IF(REGEXEXTRACT(A2:A, "(year|mont)")="mont", REGEXEXTRACT(A2:A, "\d+"),
IF(REGEXEXTRACT(A2:A, "(year|mont)")="year", REGEXEXTRACT(A2:A, "\d+")*12+
IFERROR(REGEXEXTRACT(A2:A, "(\d+) mont")), )), )))
formula in C2 cell:
=ARRAYFORMULA(IFERROR(
IF(REGEXEXTRACT(REGEXEXTRACT(A2:A, "role (.+)"), "(year|mont)")="mont",
REGEXEXTRACT(REGEXEXTRACT(A2:A, "role (.+)"), "\d+"),
IF(REGEXEXTRACT(REGEXEXTRACT(A2:A, "role (.+)"), "(year|mont)")="year",
REGEXEXTRACT(REGEXEXTRACT(A2:A, "role (.+)"), "\d+")*12+
IFERROR(REGEXEXTRACT(REGEXEXTRACT(A2:A, "role (.+)"), "(\d+) mont")), )),
IF(REGEXMATCH(A2:A, "company"),
IF(REGEXEXTRACT(A2:A, "(year|mont)")="mont", REGEXEXTRACT(A2:A, "\d+"),
IF(REGEXEXTRACT(A2:A, "(year|mont)")="year", REGEXEXTRACT(A2:A, "\d+")*12+
IFERROR(REGEXEXTRACT(A2:A, "(\d+) mont")), )), )))