Google Sheets: Determining if a time falls within two specified times - if-statement

I'm trying to find a way to determine if a time falls between two specific times - with two different scenarios to flag. So far, I'm coming up empty (and frustrated!)
Column B has date/times such as:
February 9, 2022 09:55AM
February 9, 2022 01:15PM
February 9, 2022 09:39PM
Flag 1: Time is between 4AM and Noon
Flag 2: Time is between 8PM and 4AM -- does this need to be broken down into two separate conditions given that it spreads over midnight?
Resulting Output in Column C:
FLAG 1
[Blank Cell - No Flag]
FLAG 2
Appreciate any ideas - thanks to the community, as always.
CTO

try:
=ARRAYFORMULA(IFERROR(IF(
(TIMEVALUE(A1:A)>=TIMEVALUE("4:00:00"))*
(TIMEVALUE(A1:A)< TIMEVALUE("12:00:00")), "Flag 1",IF(
(TIMEVALUE(A1:A)>=TIMEVALUE("20:00:00"))*
(TIMEVALUE(A1:A)<=TIMEVALUE("23:59:59"))+
(TIMEVALUE(A1:A)>=TIMEVALUE("00:00:00"))*
(TIMEVALUE(A1:A)< TIMEVALUE("04:00:00")), "Flag 2", ))))

You can use a much simpler formula that involves a bit of math:
=arrayformula(if(A1:A="","",iferror(choose(1+mod(3+int((mod(A1:A,1)-4/24)*24/8),3),"FLAG 1","","FLAG 2"))))
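We extract the time from the date (mod), offset the result by -4 hours (-4/24), and integer-divide it by 8 hours (*24/8, which is the same as /(8/24)) to get the index 0, 1 or 2. Times before 4AM give -1 at this point; the outer mod(3+...,3) wraps that around to index 2, so the span over midnight needs no separate condition.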
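The same index arithmetic can be sketched outside Sheets. This is only an illustration in Python, not part of the formula above: the function name `flag_for` and the tuple `FLAGS` are made up for the example, and it works in whole seconds instead of day fractions to sidestep floating-point rounding at the boundaries.

```python
from datetime import datetime

# Index 0 = 04:00-12:00, index 1 = 12:00-20:00, index 2 = 20:00-04:00
FLAGS = ("FLAG 1", "", "FLAG 2")  # hypothetical name, mirrors the CHOOSE arguments

def flag_for(moment: datetime) -> str:
    """Return the flag for the time-of-day part of a datetime."""
    seconds = moment.hour * 3600 + moment.minute * 60 + moment.second
    # Shift so that 04:00 becomes zero, then cut the day into 8-hour blocks.
    # Times before 04:00 land in block -1; "% 3" wraps them to block 2.
    index = ((seconds - 4 * 3600) // (8 * 3600)) % 3
    return FLAGS[index]
```

With the three sample values from the question (09:55AM, 01:15PM, 09:39PM) this gives FLAG 1, an empty string, and FLAG 2.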

Trying to find Top 10 products within categories through Regex

I have a ton of products, separated into different categories.
I've aggregated each product's revenue within its category, and I now need to locate the top 10.
The issue is that not every product has sold within a given timeframe, and some categories don't even have 10 products, leaving me with fewer than 10 values.
As an example, these are some of the values:
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,3,3,5,6,20,46,47,53,78,92,94,111,115,139,161,163,208,278,291,412,636,638,729,755,829,2673
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,57,124,158,207,288,547
0,0,90,449,1590,10492
0
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,7,12,14,32,32,37,62,64,64,64,94,100,103,109,113,114,114,129,133,148,152,154,160,167,177,188,205,207,207,209,214,214,224,225,238,238,244,247,254,268,268,285,288,298,301,305,327,333,347,348,359,362,368,373,402,410,432,452,462,462,472,482,495,511,512,532,566,597,599,600,609,620,636,639,701,704,707,728,747,768,769,773,805,833,899,937,1003,1049,1150,1160,1218,1230,1262,1327,1377,1396,1474,1532,1547,1565,1760,1768,1836,1962,1963,2137,2293,2423,2448,2451,2484,2529,2609,3138,3172,3195,3424,3700,3824,4310,4345,4415,4819,4943,5083,5123,5158,5334,5734,6673,7160,7913,9298,9349,10148,11047,11078,12929,18535,20756,28850,63447
63,126
How would you get as close as possible to capturing the top 10 within a category, and how would you ensure that it is only products that have sold, that are included as a possibility? And all of this through Regex.
My current setup only finds the top 3 and is very basic:
Step 1: ^.*\,(.*\,.*\,.*)$ finding top 3
Step 2: ^(.*)\,.*\,.*$ finding the lowest value of the top 3 products
Step 3: Checking if original revenue value is higher than, or equal to, step 2 value.
Step 4: If yes, then bestseller, otherwise just empty value.
Thanks in advance
You didn't specify a programming language so I'm going with Javascript here but this regex is quite compatible with almost any regex flavor:
(?:[1-9]\d*,){0,9}[1-9]\d*$
(?:[1-9]\d*,){0,9} - between 0 and 9 times, find numbers followed by a comma; ignore zero revenue
[1-9]\d* - guarantee a non-zero revenue one time
$ - end line anchor
https://regex101.com/r/1xBQD3/1
If your data were to have leading zeros like 0,0,00090,00449,01590,10492 for some reason then you would need this regex which is 33% more expensive:
(?:0*[1-9]\d*,){0,9}0*[1-9]\d*$
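As a rough check of the first pattern, here is a small Python sketch. The helper names `top_sold` and `bestseller_threshold` are invented for the example, and it assumes each category's values arrive as one ascending, comma-separated string, as in the question.

```python
import re

# Pattern from the answer above: up to ten trailing non-zero values.
TOP_TEN = re.compile(r"(?:[1-9]\d*,){0,9}[1-9]\d*$")

def top_sold(revenues: str) -> list[int]:
    """Return up to ten of the highest non-zero values, lowest first."""
    match = TOP_TEN.search(revenues)
    if match is None:  # nothing sold in this category
        return []
    return [int(value) for value in match.group().split(",")]

def bestseller_threshold(revenues: str):
    """Lowest revenue that still counts as a bestseller, or None."""
    top = top_sold(revenues)
    return top[0] if top else None
```

For the category `0,0,90,449,1590,10492` this returns the four non-zero values, and for the category containing only `0` it returns an empty list. The threshold helper corresponds to step 2 of the setup in the question: a product is a bestseller when its revenue is at least that value.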

Why is the incorrect date displaying in Stata

The local system datetime is 10:34 PM 1/8/2021.
In Stata I write
local datestamp: di %tdCCYY-NN-DD daily("S_DATE","DMY")
display `datestamp'
and the output is 2012
If I write
di %tdCCYY-NN-DD daily("S_DATE","DMY")
I get 2021-01-08
Why the discrepancy? This is puzzling to me. I clearly assigned datestamp yet when I display it obviously something is wrong.
Executive summary: display saw 2021-01-08 and evaluated it as an expression in numbers. 2021 - 1 - 8 = 2012, so 2012 was what you saw.
This is a subtle question, but the answer will show Stata's perfect logic, by its own rules.
The code as posted in the question omits the crucial $ sign before S_DATE, which indicates a global macro, specifically a system macro containing the current daily date, obtained from your operating system.
It is now 9 January 2021 in my time zone, but my example will work as well as yours to show what is going on. You defined a local macro, and then you included a reference to that local macro in a call to display. The display command has a designed inclination to calculate the result of any expression it sees before it displays the result of that calculation.
Taking this more slowly: There are two quite distinct steps to the interpretation of your display command. First, as a matter of interpreting any Stata command line, all references to local and global macros are replaced with the contents of those macros (if they exist; it is not an error to refer to a macro that does not exist, but that is not an issue here). Second, display evaluates any expression it sees and then displays the result of that expression. Despite its name, display is not designed to show you directly any macro that exists, although that is what happens if the result of evaluating it leaves it the same as when it was presented. Thus if a local macro contains the string foo, that is what display will show you -- unless foo is the name of a scalar or variable, in which case the name won't be shown, just the values of that scalar or that variable (in the first observation, in the latter case).
The command to see exactly what is inside a macro, without interpretation or calculation, is macro list.
To the point, consider the different results here. In the first display command, the quotation marks " " are functional, not ornamental, and instruct display to treat its input as a string. Without the quotation marks, display is inclined to treat what it sees as numeric, and here it sees an expression, 2021 MINUS 1 MINUS 9, which evaluates to 2011. The leading zeros are ignored. In your case your date was 2021-01-08 and the result was 2012, as you reported.
. local datestamp: di %tdCCYY-NN-DD daily("$S_DATE","DMY")
. di "`datestamp'"
2021-01-09
. di `datestamp'
2011
You get the right answer with the last statement in your question. You fed display a number but instructed it to use a daily date display format to interpret that number, and you got exactly what you asked for and expected. 22288 is, or was, 8 January 2021 on a scale with origin 0 at 1 January 1960.
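The two numbers in this answer can be double-checked outside Stata. The Python lines below are only an arithmetic illustration, not Stata code; the variable names are invented for the example.

```python
from datetime import date

stamp = "2021-01-08"  # the text stored in the local macro in the question
year, month, day = (int(part) for part in stamp.split("-"))

# What an unquoted display effectively computes: 2021 minus 1 minus 8.
as_expression = year - month - day

# The daily date number: days elapsed since 1 January 1960.
days_since_1960 = (date(year, month, day) - date(1960, 1, 1)).days
```

Here `as_expression` is 2012 and `days_since_1960` is 22288, matching the values discussed above.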

Regex to validate any kind of date format

I'm trying to find any kind of date format in a text as:
04.04.17
4/5/2016
6 December 1900
9 Dec 2014
1st of May 1920
2017
Dec. 21
October 10, 1930
October 10th, 2017
March 10-12 2015
Years only 1800 until 2017
That's what I have so far:
(0?[1-9]|[12][0-9]|3[01])?([\/\-\.]|st of\s|nd of\s|rd of\s|th of\s|\s)(Jan.?(uary)?|Feb.?(ruary)?|Mar.?(ch)?|Apr.?(il)?|May|Jun.?(e)?|Jul.?(y)?|Aug.?(ust)?|Sep.?(tember)?|Oct.?(ober)?|Nov.?(ember)?|Dec.?(ember)?|0?[1-9]|1[012])([\/\-\.]|\s)(((18|19)\d{2}|20[01][0-7])|[01][0-7])
The expression above can find the formats no. 1 to 5. If I try to work with the question mark quantifier after the first groups to find dates like "Dec. 21" and "2017" it does not work for the other date formats anymore.
Furthermore, the format no. 1 to 7 is more or less dd/mm/yyyy. However, format no. 8 to 10 is mm/dd/yyyy.
Any advice to solve this problem in one regex expression?
Thank you in advance!
Suggestion: instead of a monster regex, which would be nearly impossible to maintain, how about having an array of regexes, one for each format you're accepting? Then loop through the array to see if the input matches any of them. It would be easier to maintain, and would likely run faster, too.
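A minimal sketch of that idea, in Python. The patterns below are illustrative guesses written for the ten sample formats in the question, not a complete or authoritative date grammar; the names `DATE_PATTERNS` and `is_date` are made up, and whole-string matching is assumed (use `search` instead of `fullmatch` to find dates inside longer text).

```python
import re

# Building blocks (assumptions made for this example)
MONTH = (r"(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|July?"
         r"|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\.?")
YEAR = r"(?:1[89]\d{2}|200\d|201[0-7])"   # 1800 through 2017
DAY = r"(?:0?[1-9]|[12]\d|3[01])"
ORD = r"(?:st|nd|rd|th)?"

# One small regex per accepted format, tried in turn.
DATE_PATTERNS = [
    re.compile(rf"{DAY}[./-](?:0?[1-9]|1[0-2])[./-](?:{YEAR}|\d{{2}})"),  # 04.04.17, 4/5/2016
    re.compile(rf"{DAY}{ORD}(?: of)? {MONTH} {YEAR}"),         # 6 December 1900, 1st of May 1920
    re.compile(rf"{MONTH} {DAY}(?:-{DAY})?{ORD},? {YEAR}"),    # October 10th, 2017; March 10-12 2015
    re.compile(rf"{MONTH} {DAY}"),                             # Dec. 21
    re.compile(YEAR),                                          # 2017
]

def is_date(text: str) -> bool:
    """True if the whole string matches any one of the accepted formats."""
    return any(pattern.fullmatch(text) for pattern in DATE_PATTERNS)
```

Adding or tightening a format then means touching one short pattern rather than the whole expression.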

How do I generate a mean by year and industry in Stata

I'm trying to generate in Stata the mean per year (e.g. 2002-2012) for each industry (by 2 digit SIC codes, so c. 50 different industries)
I found how to do it for one year with:
by sic_2digit, sort: egen test = mean(oancf_at_rsd10) if fyear == 2004
Is there a more efficient way to do this instead of repeating the command 10 times by hand and then adding the values together?
You can specify more than one variable with by:.
by sic_2digit fyear, sort: egen test = mean(oancf_at_rsd10)
Check out the help for by:, which gives the syntax and an example, and also that for collapse.

Given the life time of different elephants, find the period when maximum number of elephants lived

I came across an interview question:
"Given life times of different elephants. Find the period when maximum number of elephants were alive." For example:
Input: [5, 10], [6, 15], [2, 7]
Output: [6,7] (3 elephants)
I wonder if this problem can be related to the Longest substring problem for 'n' number of strings, such that each string represents the continuous range of a time period.
For example:
[5,10] <=> 5 6 7 8 9 10
If not, what can be a good solution to this problem ? I want to code it in C++.
Any help will be appreciated.
For each elephant, create two events: elephant born, elephant died. Sort the events by date. Now walk through the events and just keep a running count of how many elephants are alive; each time you reach a new maximum, record the starting date, and each time you go down from the maximum record the ending date.
This solution doesn't depend on the dates being integers.
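The event walk described above could look roughly like this. It is a sketch in Python rather than the C++ the question asks for, the function name `busiest_period` is invented, and it assumes closed intervals, so an elephant born at the moment another dies counts as overlapping it.

```python
def busiest_period(lifetimes):
    """Return (start, end, count) for the first period with the most elephants alive."""
    events = []
    for born, died in lifetimes:
        events.append((born, 0, +1))  # 0 sorts a birth before a death at the same time
        events.append((died, 1, -1))
    events.sort()

    alive = best = 0
    best_start = best_end = None
    at_max = False  # True while the running count equals the recorded maximum
    for time, _, delta in events:
        if delta == -1 and at_max:
            best_end = time  # first death after reaching the maximum closes the period
            at_max = False
        alive += delta
        if delta == +1 and alive > best:
            best, best_start, at_max = alive, time, True
    return best_start, best_end, best
```

For the input in the question, `busiest_period([(5, 10), (6, 15), (2, 7)])` returns `(6, 7, 3)`. Sorting dominates the cost, so the walk is O(n log n), and the same structure ports directly to a sorted `std::vector` of events in C++.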
If I were you at the interview, I would create a std::array sized to the maximum age of an elephant and then increment the elements covered by each elephant, like:
[5,10] << increment all elements from index 5 to 10 in the array.
Then I would scan the array for the largest count (sorting it would lose the indices).
It is also possible to use a std::map such as map<int,int> (1st - period, 2nd - number of elephants). It will be sorted by key by default.
I'm wondering whether you know a better solution.
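A sketch of this counting-array idea, written in Python for brevity. The name `busiest_years` is invented, and the approach assumes the dates are non-negative integers that are small enough to index an array.

```python
def busiest_years(lifetimes):
    """Return (start, end, count) using one counter per integer year."""
    if not lifetimes:
        return None
    last = max(died for _, died in lifetimes)
    alive = [0] * (last + 1)
    for born, died in lifetimes:
        for year in range(born, died + 1):  # closed interval [born, died]
            alive[year] += 1

    peak = max(alive)
    start = alive.index(peak)  # first year that reaches the peak
    end = start
    while end + 1 <= last and alive[end + 1] == peak:
        end += 1
    return start, end, peak
```

For the example input this returns `(6, 7, 3)`. The work grows with the total length of all lifetimes, so it suits short integer ranges better than the sorted-events approach does.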
This is similar to a program that checks whether parentheses are missing. It is also related to date range overlap. This subject is beaten to death on StackOverflow and elsewhere. Here it is:
Determine Whether Two Date Ranges Overlap
I have implemented this by placing all of the range start/end points in one vector of structs (or classes) and then sorting them. Then you can run through the vector and detect transitions in the level of elephants. (Number of elephants -- funny way of stating the problem!)
From your input I see that all the time periods overlap; in that case the solution is simple.
We have been given each range as [start, end],
so the answer is the maximum of all starts and the minimum of all ends.
Just traverse the time periods once and find the maximum of all starts and the minimum of all ends.
Note: this solution is applicable only when all the time periods overlap.
In your example:
Maximum of all starts = 6
Minimum of all ends = 7
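As a sketch of this special case (Python, with the invented name `common_overlap`), the check below also reports when the assumption fails, that is, when no single moment is shared by every period.

```python
def common_overlap(periods):
    """Return (start, end) shared by every period, or None if there is none."""
    start = max(s for s, _ in periods)  # latest start
    end = min(e for _, e in periods)    # earliest end
    return (start, end) if start <= end else None
```

`common_overlap([(5, 10), (6, 15), (2, 7)])` gives `(6, 7)`, while two disjoint periods such as `[(1, 2), (3, 4)]` give `None`.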
I would just make two arrays, one for the times elephants are born and one for the times elephants die, and sort both of them.
Now keep a counter (initially zero). Traverse both arrays together, always taking the smaller of the two current elements. If the element comes from the start array, increment the counter; otherwise decrement it. The maximum value of the counter, and the time at which it occurs, can be found easily this way.
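A possible rendering of the two-array walk, in Python. The name `max_alive` is invented, every lifetime is assumed to satisfy born <= died, and a birth is taken before a death at the same time (closed intervals).

```python
def max_alive(lifetimes):
    """Return (count, time) for the first moment the most elephants are alive."""
    births = sorted(born for born, _ in lifetimes)
    deaths = sorted(died for _, died in lifetimes)

    i = j = alive = best = 0
    best_time = None
    # Once every birth is consumed the count can only fall, so stop there.
    while i < len(births):
        if births[i] <= deaths[j]:  # next event is a birth
            alive += 1
            if alive > best:
                best, best_time = alive, births[i]
            i += 1
        else:                       # next event is a death
            alive -= 1
            j += 1
    return best, best_time
```

On the example input this returns `(3, 6)`: three elephants alive, starting at time 6.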