Parse Days in Status field from Jira Cloud for Google Sheets - regex

I am using the Jira Cloud for Sheets add-on to get the Days in Status field from Jira. It seems to have the following syntax, from this post:
<STATUS_ID>_*:*_<NUMBER_OF_TIMES_ISSUE_WAS_IN_THIS_STATUS>_*:*_<SECONDS>_*|
Here is an example:
10060_*:*_1_*:*_1121033406_*|*_3_*:*_1_*:*_7409_*|*_10000_*:*_1_*:*_270003163_*|*_10088_*:*_1_*:*_2595005_*|*_10087_*:*_1_*:*_1126144_*|*_10001_*:*_1_*:*_0
I am trying to extract, for example, how many times the issue was in the In QA status and how long it spent in a given status. I am struggling to parse this pattern and return the result with an ARRAYFORMULA. The Days in Status field is populated only once the issue is completed (is in Done status); otherwise it is empty. If the issue is in Done status but never transitioned through a given status, that status does not appear in the Days in Status string.
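For checking results outside Sheets, here is a minimal Python sketch of how the string above could be parsed. It assumes the entry separator is `_*|*_` (inferred from the example, since the syntax line shows only `_*|`) and that the third value is in seconds, as the post states. The function name `parse_days_in_status` is made up for illustration.

```python
from typing import Dict, Tuple

ENTRY_SEP = "_*|*_"   # assumed separator between statuses (seen in the example)
FIELD_SEP = "_*:*_"   # separator between the three values of one status


def parse_days_in_status(raw: str) -> Dict[str, Tuple[int, int]]:
    """Map status ID -> (times in status, seconds in status)."""
    result: Dict[str, Tuple[int, int]] = {}
    if not raw:
        return result  # the field is empty until the issue is Done
    for entry in raw.split(ENTRY_SEP):
        status_id, times, seconds = entry.split(FIELD_SEP)
        result[status_id] = (int(times), int(seconds))
    return result


sample = (
    "10060_*:*_1_*:*_1121033406_*|*_3_*:*_1_*:*_7409_*|*_10000_*:*_1_*:*_270003163"
    "_*|*_10088_*:*_1_*:*_2595005_*|*_10087_*:*_1_*:*_1126144_*|*_10001_*:*_1_*:*_0"
)
parsed = parse_days_in_status(sample)
times, seconds = parsed["10087"]
print(times, seconds / 86400)  # times in status, duration converted to days
```

A status the issue never passed through is simply missing from the dictionary, which matches the behaviour described above.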
I am trying to use REGEXEXTRACT function to match a pattern for example:
=REGEXEXTRACT(C2, "(10060)_\*:\*_\d+_\*:\*_\d+_\*|")
and it returns an empty value, where I expect 10060. It caught my attention that when I use the REGEXMATCH function it returns TRUE:
=REGEXMATCH(C2, "(10060)_\*:\*_\d+_\*:\*_\d+_\*|")
so the syntax is not clear to me. Google points to the following documentation as its regular expression reference. The issue seems to be the vertical bar |; per this documentation it is a special character that should be represented as \v, but that doesn't work: REGEXMATCH then returns FALSE. I am trying to use an online regex tester that implements the Google Sheets syntax (RE2). I found ReGo, but I don't know whether it is a valid one.
I also tried the SPLIT function like this:
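A possible explanation, sketched in Python: an unescaped `|` at the end of a pattern is an alternation whose second branch is empty, so the pattern can match the empty string at the very start of the text and leave the capture group unset. That would be consistent with REGEXMATCH returning TRUE while REGEXEXTRACT returns nothing. Note also that `\v` denotes a vertical tab, not a vertical bar; the literal bar is written `\|`. Python's `re` module is not RE2, so treat this as an illustration of the shared alternation behaviour rather than an exact reproduction of Sheets. The input string is a made-up variant of the example above that does not begin with 10060.

```python
import re

# Hypothetical input where 10060 is not the first status in the string.
text = "3_*:*_1_*:*_7409_*|*_10060_*:*_1_*:*_1121033406_*|*_10001_*:*_1_*:*_0"

# Trailing "|" left unescaped: "<pattern> OR <empty string>".
unescaped = re.compile(r"(10060)_\*:\*_\d+_\*:\*_\d+_\*|")
# Trailing "|" escaped: a literal vertical bar.
escaped = re.compile(r"(10060)_\*:\*_\d+_\*:\*_\d+_\*\|")

m1 = unescaped.search(text)
m2 = escaped.search(text)

print(repr(m1.group(0)), m1.group(1))  # empty match at position 0, group is None
print(m2.group(1))                     # the captured status ID
```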
=query(SPLIT(C2, "_*:*_"), "SELECT Col1")
It separates the values from the previous pattern well, but it seems a more complicated way to get everything I need from the Days in Status string. In this case, I am getting the first Status ID. The number of columns returned by SPLIT will vary because it depends on the number of statuses the issue transitioned through to reach Done status.
It seems to be a complex task given all the issues I have encountered, but maybe some of you have dealt with this before and can suggest some ideas. It requires properly parsing the information and then extracting it into specific columns with an ARRAYFORMULA, where it applies for a given status from the Status column.
Here is a sample Google spreadsheet with the input information. I would like to populate Times In QA (C column) and Duration in QA (D column; the value is provided in seconds and I need days, but that is a minor task) for the In QA status, and then do the same for the other statuses. I added the Settings tab to map each Status ID to my Status, so I would need a lookup function to match the Status column in the Jira Issues tab. I would like a solution without helper columns; maybe it will require some script.
https://docs.google.com/spreadsheets/d/1ys6oiel1aJkQR9nfxWJsmEyd7XiNkVB-omcNL0ohckY/edit?usp=sharing

try:
=INDEX(IFERROR(1/(1/QUERY(1*IFNA(REGEXEXTRACT(C2:C, "10087.{5}(\d+).{5}(\d+)")),
"select Col1,Col2/86400 label Col2/86400''"))))
...so after we do REGEXEXTRACT, rows that cannot be extracted from will output an #N/A error, so we wrap it in IFNA to remove those errors. then we multiply by 1 to convert everything to numbers (regex always outputs plain text). then we use QUERY to convert the 2nd column from seconds into days in one go (dividing by 86400). at this point every row has some value, so to get rid of zeros for rows we don't need (like row 2,3,5,8,9,etc) and keep the output numeric, we use the IFERROR(1/(1/ wrapping. and finally, we use INDEX or ARRAYFORMULA to process our array.
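The same steps can be sketched in Python for verification outside Sheets. This is only an illustration of the logic, not the formula itself: the function name `times_and_days` is hypothetical, and rows without a match become `(None, None)` instead of blank cells. One difference worth noting is that the `1/(1/x)` trick in the formula also blanks a genuine zero, while this sketch keeps it.

```python
import re
from typing import List, Optional, Tuple

# Same pattern as in the formula: status ID, 5 separator chars, count,
# 5 separator chars, seconds.
PATTERN = re.compile(r"10087.{5}(\d+).{5}(\d+)")

Row = Tuple[Optional[int], Optional[float]]


def times_and_days(cells: List[Optional[str]]) -> List[Row]:
    rows: List[Row] = []
    for cell in cells:
        match = PATTERN.search(cell or "")
        if match is None:
            rows.append((None, None))       # plays the role of IFNA / blank row
            continue
        times = int(match.group(1))         # regex output is text, so convert
        days = int(match.group(2)) / 86400  # seconds -> days
        rows.append((times, days))
    return rows


cells = [
    "10060_*:*_1_*:*_1121033406_*|*_10087_*:*_1_*:*_1126144_*|*_10001_*:*_1_*:*_0",
    "",    # issue not Done yet
    None,  # empty cell
]
print(times_and_days(cells))
```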

Related

Importrange + Query + Matches + Regexp

I am trying to filter out data from a different sheet with a specific account number. However the code below doesn't give out any results
=query(importrange(Setup!B1,"Sheet1!A2:F"),"Select * where Col4 matches '\d\d-\d\d\d\d-[14-7]\d\d\d'",1)
This is supposed to return all accounts where the 1st digit in the 3rd group of numbers is either 1, 4, 5, 6 or 7. The account numbers all have the same width, following the format xx-xxxx-xxxx.
Try using this as your match criteria:
\d{2}-\d{4}-[14-7]\d{3}
Also, while I can't see your data, make sure you actually have a header in the first row of your IMPORTRANGE results (which you've requested with the 1 at the end of the QUERY). If you don't actually have headers, the 1 will cause your first data row to be treated as a header rather than filtered; if that is the case, just remove the ,1 from the end of the QUERY.
If this doesn't produce the results you want, it may be due to mixed data types in your raw data that are being filtered out by the QUERY. In that case, you can try using FILTER and REGEXMATCH instead:
=ArrayFormula(FILTER(IMPORTRANGE(Setup!B1,"Sheet1!A2:F"),REGEXMATCH(IMPORTRANGE(Setup!B1,"Sheet1!D2:D"),"\d{2}-\d{4}-[14-7]\d{3}")))
It is always hard to write complex formulas sight unseen. If none of these solutions (which work in my local sheet) produce the results you expect, I encourage you to share a link to both of your sheets. The raw data sheet being called by IMPORTRANGE can be "View Only"; but you'll want to set the Share permission on the second sheet (the one with the IMPORTRANGE formula itself) to "Anyone with the link..." and "Editor," so that those here can access it to test.
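The requirement stated in the question (the first digit of the third group is 1, 4, 5, 6 or 7, and the remaining three digits are unrestricted) can be checked outside Sheets with a short Python sketch. The account numbers below are invented for illustration. `fullmatch` is used on the assumption that QUERY's `matches` compares against the whole value.

```python
import re

# [14-7] is a character class: the digit 1, or any digit from 4 to 7.
ACCOUNT = re.compile(r"\d{2}-\d{4}-[14-7]\d{3}")

# Hypothetical account numbers in the xx-xxxx-xxxx format.
accounts = [
    "12-3456-1234",  # third group starts with 1 -> keep
    "12-3456-2234",  # starts with 2 -> drop
    "12-3456-4000",  # starts with 4 -> keep
    "12-3456-7999",  # starts with 7 -> keep
    "12-3456-8123",  # starts with 8 -> drop
]

kept = [a for a in accounts if ACCOUNT.fullmatch(a)]
print(kept)
```

Note that only the first digit of the third group goes through the character class; applying a quantifier such as `{3}` to the class would also restrict the second and third digits.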

Searching within the result of a vlookup using a range of values and parsing text

MY GOAL:
parse a MM/DD date from the result of a vlookup so that it can be used in a project plan
BACKGROUND:
The vlookup result contains multiple values separated by a "•" (I don't need all of them)
The value I'm looking to parse is not always in the same location in the vlookup result (otherwise I could use the RIGHT formula)
There is a finite number of the values I'm looking to retrieve (and I know them already)
The value that I'm looking to retrieve contains some text with a date range; I only want the first four values in the date range (MM/DD)
I'd like to achieve all this with a single formula with the result in a single cell
CURRENT FORMULA
The formula that I've been working on that is not working is:
=ARRAYFORMULA(if(iserror(search(Iterations!D2:D7,(VLOOKUP(A2,'Results {2596503}'!$C$2:$L$183,3)))),,))
I've set up a sheet called "Erik Help" with the following formulas in B2 and C2:
=ArrayFormula(IF(A2:A="","",MID(VLOOKUP(A2:A,data!A2:B,2,FALSE),FIND(REGEXEXTRACT(VLOOKUP(A2:A,data!A2:B,2,FALSE),"[0-9]-[0-9]"),VLOOKUP(A2:A,data!A2:B,2,FALSE))-4,5)))
and
=ArrayFormula(IF(A2:A="","",MID(VLOOKUP(A2:A,data!A2:B,2,FALSE),FIND(REGEXEXTRACT(VLOOKUP(A2:A,data!A2:B,2,FALSE),"[0-9]-[0-9]"),VLOOKUP(A2:A,data!A2:B,2,FALSE))+2,5)))
respectively.
They may be longer than actually needed, but you did not share realistic results in Column B or list which symbols may appear in Column B other than in the date; so I tried to account for either a hyphen or a forward slash possibly appearing in Column B in places other than within the date span.
Your analytics sheet also shows a formula that is sorting the results from data!A:A. So even though in your example the original data order happens to be the same as in analytics!A:A, that is not a given (again, based on your formula). Therefore, the VLOOKUP is also necessary.
You did not indicate whether you need to further use these returned date-snippets in calculations, or whether you just need to view them. So the results generated in "Erik Help" are text.
If you want usable numbers/dates, you add further issues that would need to be controlled for in the formula, because you'll only be extracting month and day, not year. That's fine right now. But what about when the date range to be extracted is "12/28-01/13"? If you simply make these values/dates, they will both be assigned to the current year. So the end date here will wind up being earlier than the start date.
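The year-rollover problem can be illustrated with a small Python sketch. It assumes the start date belongs to a year supplied by the caller and that a range never spans more than one year boundary; the function name `parse_range` and the sample strings are hypothetical, not taken from the shared sheet.

```python
import re
from datetime import date
from typing import Optional, Tuple

# MM/DD-MM/DD anywhere in the looked-up text.
RANGE = re.compile(r"(\d{2})/(\d{2})-(\d{2})/(\d{2})")


def parse_range(text: str, year: int) -> Optional[Tuple[date, date]]:
    match = RANGE.search(text)
    if match is None:
        return None
    start_month, start_day, end_month, end_day = map(int, match.groups())
    start = date(year, start_month, start_day)
    end = date(year, end_month, end_day)
    if end < start:
        # The range crosses New Year, so the end date belongs to the next year.
        end = date(year + 1, end_month, end_day)
    return start, end


print(parse_range("Sprint 7 • 12/28-01/13 • done", 2020))
print(parse_range("Sprint 2 • 03/01-03/14", 2021))
```

Without the `end < start` check, both dates in the first example would land in the same year and the end date would precede the start date.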
Because of this, I've added a second sheet, "Erik Help 2," which contains extended formulas to account for these cases while still returning the date format you want as actual dates which can be used in calculations.
EDIT
(following your note on the sheet: "I would like to remove col b altogether and nest in the formulas in col c and d")
You can adjust the range B2:B by replacing it with your already existing formula in B2.
The new adjusted formula will become
=ArrayFormula(IFNA(SPLIT(REGEXEXTRACT(VLOOKUP(ARRAYFORMULA(sort(unique(data!A2:A))),data!$A$1:$C,2),"\d+\/\d+-\d+\/\d+"),"-")))
Original answer
You can use the following formula:
=ArrayFormula(IFNA(SPLIT(REGEXEXTRACT(B2:B,"\d{2}\/\d{2}-\d{2}\/\d{2}"),"-")))
Make sure you format the results as Date.
(Please adjust ranges to your needs)
Functions used:
ArrayFormula
IFNA
SPLIT
REGEXEXTRACT
try:
=ARRAYFORMULA(IF(A2:A="",,IFNA(TEXT(SPLIT(REGEXEXTRACT(
VLOOKUP(data!A2:A, data!A:C, 2), "\d+/\d+-\d+/\d+"), "-"), "mm/dd"))))

Google Sheets Data Validation not rejecting invalid input

I have a sheet where I control provided services with columns filled with execution and conclusion dates.
These columns have data validation to reject invalid dates and to stop the user from entering weekend days or holidays (which are listed on another page of the same spreadsheet). So it has to be custom formula validation.
Validation formula:
=AND(ISDATE(K2)=TRUE;K2>=J2;WEEKDAY(K2)<>1;WEEKDAY(K2)<>7;COUNTIF(Holidays!$A:$A;"="&K2)=0)
also tried
=AND(ISDATE(K2)=TRUE;K2>=J2;WEEKDAY(K2)<>1;WEEKDAY(K2)<>7;ISNA(MATCH(K2;Holidays!$A:$A;0))=TRUE)
and also tried using INDIRECT("Holidays!$A:$A") on both options
***Column K has the data validation and Conclusion date is the input. Column J has execution dates. And row 1 has titles.
The problem:
Data validation input rejection seems to work fine for the first couple of hours, sometimes a full day, but after this random period of time it stops working. Actually, it still validates, but it only shows the red warning flag instead of rejecting the input, even though the "Reject input" option is still checked.
My guess is that the problem resides on the reference being in another sheet, but I don't see any other way to do this, as including the holiday list to the main sheet would pollute it and hiding columns wouldn't be as practical since users update the list constantly.
Is there a way to make it work?
P.S. Conditional Formatting used to return error even when using INDIRECT for external reference but now Google seems to have fixed it.
Hope someone can help me.
custom formula for data validation:
=(ISDATE(A1))*
(WEEKDAY(A1, 2)<>6)*
(WEEKDAY(A1, 2)<>7)*
(NOT(REGEXMATCH(TO_TEXT(A1), TEXTJOIN("|", 1, INDIRECT("Sheet2!H:H")))))
custom formula for conditional formatting (valid green):
=(ISDATE(A1))*
(WEEKDAY(A1, 2)<>6)*
(WEEKDAY(A1, 2)<>7)*
(NOT(REGEXMATCH(TO_TEXT(A1), TEXTJOIN("|", 1, INDIRECT("Sheet2!H:H")))))
custom formula for conditional formatting (invalid red):
=((ISDATE(A1))*
(WEEKDAY(A1, 2)<>6)*
(WEEKDAY(A1, 2)<>7)*
(NOT(REGEXMATCH(TO_TEXT(A1), TEXTJOIN("|", 1, INDIRECT("Sheet2!H:H")))))=0)*
(A1<>"")
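The rule these formulas encode (a real date, not Saturday or Sunday, not in the holiday list) can be written out in Python to test expected outcomes independently of Sheets. This is a sketch under assumptions: the function name and the holiday set are invented, and the optional `execution` argument mirrors the `K2>=J2` check from the question's formula, which the formulas above leave out.

```python
from datetime import date
from typing import Optional, Set


def is_valid_conclusion(day: object, holidays: Set[date],
                        execution: Optional[date] = None) -> bool:
    if not isinstance(day, date):
        return False                  # ISDATE
    if day.isoweekday() in (6, 7):    # Monday=1 ... Sunday=7, like WEEKDAY(x, 2)
        return False
    if day in holidays:               # holiday list lookup
        return False
    if execution is not None and day < execution:
        return False                  # conclusion must not precede execution
    return True


holidays = {date(2017, 5, 1)}  # hypothetical holiday list

print(is_valid_conclusion(date(2017, 5, 2), holidays))  # a Tuesday
print(is_valid_conclusion(date(2017, 5, 6), holidays))  # a Saturday
print(is_valid_conclusion(date(2017, 5, 1), holidays))  # listed holiday
```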
spreadsheet demo
I find the issue happens when the user is copying and pasting into the cell, instead of typing in, as part of a larger section of information. This breaks up the data validation into "pieces" because the copy and paste doesn't come with the data validation. I'm not sure if this is the case with you, but it may mean training users to copy and paste values only or only hard keying the information.

How can I resolve INDEX MATCH errors caused by discrepancies in the spelling of names across multiple data sources?

I've set up a Google Sheets workbook that synthesizes data from a few different sources via manual input, IMPORTHTML and IMPORTRANGE. Once the data is populated, I'm using INDEX MATCH to filter and compare the information and to RANK each data set.
Since I have multiple data inputs, I'm running into a persistent issue of names not being written exactly the same between sources, even though they're the same person. First names are the primary culprit (i.e. Mary Lou vs Marylou vs Mary-Lou vs Mary Louise) but some last names with special symbols (umlauts, accents, tildes) are also causing errors. When Sheets can't recognize a match, the INDEX MATCH and RANK functions both break down.
I'm wondering how to better unify the data automatically so my Sheet understands that each occurrence is actually the same person (or "value").
Since you can't edit the results of an IMPORTHTML directly, I've set up "helper columns" and used functions like TRIM and SPLIT to try and fix instances as I go, but it seems like there must be a simpler path.
It feels like IFS could work but I can't figure how to integrate it. Also thinking this may require a script, which I'm just beginning to study.
Here's a simplified example of what I'm trying to achieve and the corresponding errors: Sample Spreadsheet
The first tab is attempting to pull and RANK data from tabs 2 and 3. Sample formulas from the Summary tab, row 3 (Amelia Rose):
Cell B3: =INDEX('Q1 Sales'!B:B, MATCH(A3,'Q1 Sales'!A:A,0))
Cell C3: =RANK(B3,$B$2:B,1)
Cell D3: =INDEX('Q2 Sales'!B:B, MATCH(A3,'Q2 Sales'!A:A,0))
Cell E3: =RANK(D3,$D$2:D,1)
I'd be grateful for any insight on how to best index 'Q2Sales'!B3 as the correct value for 'Summary'!D3. Thanks in advance - the thoughtful answers on Stack Overflow have gotten me this far!
to counter every possible scenario do it like this:
=ARRAYFORMULA(IFERROR(VLOOKUP(LOWER(REGEXREPLACE(A2:A, "-|\s", )),
{REGEXEXTRACT(LOWER(REGEXREPLACE('Q2 Sales'!A2:A, "-|\s", )),
TEXTJOIN("|", 1, LOWER(REGEXREPLACE(A2:A, "-|\s", )))), 'Q2 Sales'!B2:B}, 2, 0)))
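The idea behind the formula is to compare names by a normalized key rather than by their raw spelling. A minimal Python sketch of such a key follows. Lowercasing and removing hyphens and whitespace mirror the formula; stripping accents via Unicode decomposition is an extra step added here as an assumption, since the formula does not do it. The surname and sales figure are invented.

```python
import re
import unicodedata


def name_key(name: str) -> str:
    # Split accented letters into base letter + combining mark, then drop the marks
    # (assumption: accents should not distinguish two people).
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Same clean-up as REGEXREPLACE(..., "-|\s", ) plus LOWER.
    return re.sub(r"[-\s]", "", no_accents).lower()


# Hypothetical Q2 data keyed by normalized name.
q2_sales = {name_key("Mary-Lou Müller"): 1200}

for spelling in ["Mary Lou Müller", "Marylou Muller", "Mary-Lou Müller"]:
    print(spelling, "->", q2_sales.get(name_key(spelling)))
```

A variant such as "Mary Louise" produces a different key, so cases like that still need an explicit alias table or manual review.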

How to write IF AND regular expression match

I'm trying to write a simple formula for Google Sheets. The logic is as follows:
if(it is a specific date & it is today){fill cell color with this color}
I know this needs to be done in the conditional formatting section but I am unable to get it right.
I've tried:
if(TODAY(),RegExMatch("Tuesday May 2, 2017"))
RegExMatch("Tuesday May 2, 2017") AND IF(TODAY())
IF(TODAY() AND RegExMatch("Tuesday May 2, 2017"))
but none of those work and return errors such as 'parse & invalid' when attempting to write it in the cell box.
REGEXMATCH can be used in Conditional formatting (eg) but it seems way overkill here. Please select the relevant range (I am assuming ColumnA - populated with 'true' dates, not text) and clear any existing CF rules from it. Format, Conditional formatting..., Format cells if... Custom formula is and
=and(A1=today(),A1=42858)
with fill colour of choice and Done.
Here 42858 happens to be the index number for today, but would be replaced with that for your specific date.
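To find the number for another date, count the days since 30 December 1899, which Sheets treats as day 0. A small Python sketch of that arithmetic (the helper name `to_serial` is made up):

```python
from datetime import date

SHEETS_EPOCH = date(1899, 12, 30)  # serial number 0 in Google Sheets


def to_serial(day: date) -> int:
    """Number of days between the Sheets epoch and the given date."""
    return (day - SHEETS_EPOCH).days


print(to_serial(date(2017, 5, 2)))  # 42857
print(to_serial(date(2017, 5, 3)))  # 42858
```

So under this convention 42858 corresponds to 3 May 2017, and the date named in the question, 2 May 2017, would be 42857.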
Have you tried just getting the value of TODAY()? It returns the date in mm/dd/yyyy format. Your RegExMatch will always fail.
You don't need to use any formula. Use this guide to see how you can use conditional formatting rules on individual or multiple cells. The correct way to do what you want to do is to select a cell, click on Format -> Conditional formatting... -> Format cells if... -> Date is -> today
If you're referring to the cell box in the 'Custom formula' section, you simply need to write =TODAY().
Not sure I'm following exactly, but if your date is in col A:
=AND(DATEVALUE(a1)=datevalue("5/2/2017"),DATEVALUE(a1)=DATEVALUE(today()))
this seems to work. The date has to be parseable to date by sheets for DATEVALUE to work though.
Image has the column in an IF as well setting 1 or 0 just an example of the logic too.