Google Scripts replace function to prefix a regexp in CAPS - regex

In GAS, using .replace(), is it possible to match any term within a long text string that is at least 5 consecutive ALL CAPS characters (possibly with 1 space in it) and prefix it with a string, such as ][? There may be multiple matches within the text string, so I want to insert markers that delimit each phrase beginning with an ALL CAPS category.
An example of a similar type of text would be this (structurally similar, but with other sensitive data):
"VACATION: Approved by Supervisor - Frequency 1-3 times per year; duration not to exceed 5 days. SICK LEAVE: Approved by Supervisor - Frequency up to 8 per year, no more than 5 days consecutively without MD excuse. FMLA FEDERAL: Approved by HR - Frequency as needed, must be approved at least 14 days in advance, or within 24 hours of employee's identified need."
I have learned, through Serge, how to replace globally, which was a big help, but the more I research regexp's, the more confusing it gets. I tried substituting the all caps regexp for a specific term, but failed. I think that I could go through and extract all of the all caps regexp's and use them in a replace with multiple values, but it seems that would be a very long way around.
Is it possible, in a couple of lines to make the above text look like this:
"][VACATION: Approved by Supervisor - Frequency 1-3 times per year; duration not to exceed 5 days. ][SICK LEAVE: Approved by Supervisor - Frequency up to 8 per year, no more than 5 days consecutively without MD excuse. ][FMLA FEDERAL: Approved by HR - Frequency as needed, must be approved at least 14 days in advance, or within 24 hours of employee's identified need."
My intention is to then split on the ], which would mean that new cells would start with the all caps term and end with ]. I have the code to convert the text to an array (there are lots of entries), to use .replace() to find and replace within the array, and to set the values back into the sheet, but I just don't know if there is a way either to prefix the match (my research says lookbehind isn't possible in GAS) or to pick up the all caps value, add the string "][", and put it back.
If this is asking too much, or feels like I haven't included any code, here is the first part that Serge already helped with: Looking for a Google script that will perform CTRL+F replace for a string
Here is the code, as I used it, combining Serge's previous help and the new recommendation. I had to fix some case issues with a term before running the all caps because some people can't follow a template, but it works.
function insertSplitMarkers(){
  var sh = SpreadsheetApp.getActiveSpreadsheet().getSheetByName('Freq Iso');
  var data = sh.getRange(2,1,sh.getLastRow(),sh.getLastColumn()).getValues();// get all data
  var regexp = /(([A-Z]\s*){5,})/g;
  for(var n=0;n<data.length;n++){
    for(var m=0;m<data[0].length;m++){
      if(typeof(data[n][m])=='string'){ // if it is a string
        data[n][m]=data[n][m].replace(/Interventions/g,'INTERVENTIONS');// use the regex replace with /g parameter meaning "globally"
        data[n][m]=data[n][m].replace(regexp, "][$1");
      }
    }
  }
  Logger.log(data);
  sh.getRange(2,1,data.length,data[0].length).setValues(data);
}

It looks like this will do what you want although as is, it will also pick out aoAOEOUE:
var yourString = "VACATION: Approved by Supervisor - Frequency 1-3 times per year; duration not to exceed 5 days. SICK LEAVE: Approved by Supervisor - Frequency up to 8 per year, no more than 5 days consecutively without MD excuse. FMLA FEDERAL: Approved by HR - Frequency as needed, must be approved at least 14 days in advance, or within 24 hours of employee's identified need.";
var regexp = /(([A-Z]\s*){5,})/g;
var newString = yourString.replace(regexp, "][$1");
Logger.log(newString);

@user3169581 I've adjusted your regex slightly to try to avoid matching whitespace around the desired phrase and to make sure you get the whole phrase. It requires a small adjustment in the replace:
var regexp = /\b([A-Z\s]{5,})(:)/g
...
data[n][m] = data[n][m].replace(regexp,"][$1$2")
Link to regex101 with working matching here: http://regex101.com/r/rD5kS9
HTH
EDIT: for some reason the existing answer wasn't showing up for me when I started this response. Forgive the redundancy.
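For anyone who wants to try the colon-anchored pattern outside a spreadsheet, here is a minimal sketch in plain JavaScript. It assumes every category ends with a colon, as in the sample text. The helper names insertMarkers and splitOnMarkers are made up for illustration, and the sample string is a shortened version of the one in the question.

```javascript
// Hypothetical helper: prefix each ALL-CAPS category (5+ capitals or
// whitespace characters followed by a colon) with the marker "][".
function insertMarkers(text) {
  return text.replace(/\b([A-Z\s]{5,})(:)/g, "][$1$2");
}

// Hypothetical helper: split on "]" and drop the empty leading piece
// that appears when the text itself starts with a category.
function splitOnMarkers(text) {
  return insertMarkers(text).split("]").filter(function (part) {
    return part.length > 0;
  });
}

var sample = "VACATION: Approved by Supervisor. " +
  "SICK LEAVE: Approved by Supervisor, no more than 5 days without MD excuse. " +
  "FMLA FEDERAL: Approved by HR.";
var parts = splitOnMarkers(sample);
```

Short capitalized tokens such as MD and HR are left alone because they are shorter than 5 characters and are not followed by a colon.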

Related

How can you find a pattern within X words? Attempting to re-create dtsearch w/n logic

The real deal has a bit more words I'm searching for, but the basic idea is that I am looking for date within 5 words of (birthday|birth date|birthdate)
Trying to have this done in both directions (2/1/2020 word1 birthday as well as birthday word1 2/1/2020)
I'm not using any kind of Python RegEx variations. Essentially limited to a text editor due to limited resources.
This pattern looks for a group of numbers that looks like a date, with a maximum of 5 spaces between it and one of the keywords.
NB: The 'date' format is very loose. There is no effort to check days in the month, months in the year, etc. For example, 50/15/9999 will be accepted.
(birthday|birth date|birthdate)( [^ ]*){0,5}(\d\d?([-\/])\d\d?([-\/])\d{2,4})|(\d\d?([-\/])\d\d?([-\/])\d{2,4})( [^ ]*){0,5}(birthday|birth date|birthdate)
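To check the pattern in both directions, it can be dropped into a quick JavaScript test. The poster is limited to a text editor, so this is only a sketch for experimenting, and the sample sentences are invented. Note that each ( [^ ]*) step consumes exactly one space, so the limit of 5 also covers the space directly before the date (or keyword), which leaves room for at most 4 intervening words.

```javascript
// The pattern from the answer, unchanged, as a JavaScript regex literal.
var nearDate = /(birthday|birth date|birthdate)( [^ ]*){0,5}(\d\d?([-\/])\d\d?([-\/])\d{2,4})|(\d\d?([-\/])\d\d?([-\/])\d{2,4})( [^ ]*){0,5}(birthday|birth date|birthdate)/;

// Keyword before the date, and date before the keyword.
var keywordFirst = nearDate.test("birthday word1 2/1/2020");
var dateFirst = nearDate.test("2/1/2020 word1 birthday");

// Four intervening words use up all 5 allowed spaces; five words do not fit.
var fourWords = nearDate.test("birthday a b c d 2/1/2020");
var fiveWords = nearDate.test("birthday a b c d e 2/1/2020");
```

If exactly 5 intervening words should be allowed, raising the quantifier to {0,6} would be one way to do it.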

RegEx for PostgreSQL 'interval' function

I need to implement regex validation for a value that will be used on my server side to get data where a certain timestamp is older (smaller) than now() - interval 'myValue'.
The PostgreSQL interval function is explained here; in short, it can take values like 2 days, 3 years, or 12 hours, and you can also combine several values, such as 2 days 6 hours 30 minutes.
I currently have a regex /^\d+\s(seconds?|minutes?|hours?|days?|weeks?|months?|years?)$/i that accepts only one value (e.g. 2 days), but can't figure out how to allow multiple values, and set a rule that a certain string from this group can only be repeated once or not at all.
This regex /^\d+\s(seconds?|minutes?|hours?|days?|weeks?|months?|years?)(\s\d+\s(seconds?|minutes?|hours?|days?|weeks?|months?|years?))*$/i allows nesting but also allows repetition of values e.g. 2 days 12 hours 6 hours 2 minutes which will result in a fatal error in pSQL query.
I tried restricting repetition of values in this group with \1 and {0,1} combination of regex operators but I just can't nail it precisely enough.
NOTE: Regex is unfortunately the only way I can validate this value, since I have access neither to the server-side controller that receives this value nor to the client-side frontend of this form. I can't just throw exceptions or skip the query because it is part of an important cron job and must be stable at all times.
(All I have access to is json schema of this value, and therefore can only define regex pattern for it)
Any help is appreciated, thanks.
You can use
^(?!.*(second|minute|hour|day|week|month|year).*\1)\d+\s+(?:second|minute|hour|day|week|month|year)s?(?:\s+\d+\s+(?:second|minute|hour|day|week|month|year)s?)*$
See the regex demo
Details
^ - start of string
(?!.*(second|minute|hour|day|week|month|year).*\1) - no repetition of second, minute, hour, day, week, month or year is allowed anywhere in the string
\d+\s+(?:second|minute|hour|day|week|month|year)s? - 1 or more digits, one or more whitespaces, then either second, minute, hour, day, week, month or year, and then an optional s letter
(?:\s+\d+\s+(?:second|minute|hour|day|week|month|year)s?)* - zero or more repetitions of one or more whitespaces followed by the pattern described above
$ - end of string.
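A quick way to try the pattern described above is a small JavaScript check. This is only a sketch: the variable names are invented, and the i flag from the question's original pattern is left off here, so add it if case-insensitive matching is needed.

```javascript
// The pattern from the answer, unchanged, as a JavaScript regex literal.
var intervalRe = /^(?!.*(second|minute|hour|day|week|month|year).*\1)\d+\s+(?:second|minute|hour|day|week|month|year)s?(?:\s+\d+\s+(?:second|minute|hour|day|week|month|year)s?)*$/;

// A single value and a combination of distinct units should pass.
var single = intervalRe.test("2 days");
var combined = intervalRe.test("2 days 6 hours 30 minutes");

// "hours" appears twice, so the negative lookahead rejects the string.
var repeated = intervalRe.test("2 days 12 hours 6 hours 2 minutes");
```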
Forget it. The only complete documentation of the supported values for interval is the implementation (the guts are in ParseDateTime).
Consider these:
SELECT INTERVAL '12 00:12:00';
interval
══════════════════
12 days 00:12:00
(1 row)
SELECT INTERVAL '12 d 12 mins';
interval
══════════════════
12 days 00:12:00
(1 row)
SELECT INTERVAL '3-2';
interval
════════════════
3 years 2 mons
(1 row)
What I would do in your place is to write a function that casts the string to interval and catches and reports an error:
CREATE FUNCTION interval_ok(text) RETURNS boolean
LANGUAGE plpgsql AS
$$BEGIN
PERFORM CAST ($1 AS interval);
RETURN TRUE;
EXCEPTION
WHEN invalid_datetime_format THEN
RETURN FALSE;
END;$$;

Date format comparison

I have a question. I am trying to write a regex for comparing dates. The problem is that the month and day, when they are a single digit, can be written as either 03 or 3. For instance, possible values are:
2015/03/27 or 2015/4/12 or 2015/07/05 or 2015/2/2 or 2015/02/3
What i did so far is:
^(?<Month>\d(0([0-1]|1[0-2])|([1-12])){1,2})/(?<Day>\d{1,2})/(?<Year>(?:\d{4}|\d{2}))$
I started to make now for month:
(?<Month>\d(0([0-1]|1[0-2])|([1-12])){1,2})
(0([0-1]|1[0-2])|([1-12])){1,2})
so {1,2} - because can be one digit or two for instance (12, 2, 02)
0([0-1]|1[0-2]) | ([1-12])) - because can be two digits or one
Somehow I can't work this into the final version.
Can you help me out?
Using just \d, you might end up with fake dates, like 12/67/4567.
Also, your input has another date format: Year/Month/Day.
I suggest using this regex for your input format:
^(?<Year>(?:19|20)\d{2})\/(?<Month>0?[1-9]|1[0-2])\/(?<Day>3[01]|0?[1-9]|[12][0-9])$
See demo
Optional 0s are made possible due to the ? quantifier after 0.
If it is for .NET, you do not have to escape /s.
To validate the date, use the classes and methods of the programming environment you are using. Here is an example in C#:
var resultFromRegex = "2015/03/27";
DateTime validDate;
var isValid = DateTime.TryParseExact(resultFromRegex, "yyyy/MM/dd", new System.Globalization.CultureInfo("en-US"), System.Globalization.DateTimeStyles.None, out validDate);
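The same two-step idea (regex for the shape, then a real date check) can be sketched in JavaScript, which also supports named groups. The function name parseDate is hypothetical, and the round-trip through Date is just one possible way to reject impossible days such as February 31.

```javascript
// The Year/Month/Day pattern from the answer as a JavaScript regex literal.
var dateRe = /^(?<Year>(?:19|20)\d{2})\/(?<Month>0?[1-9]|1[0-2])\/(?<Day>3[01]|0?[1-9]|[12][0-9])$/;

// Hypothetical helper: returns {year, month, day} or null if the input is invalid.
function parseDate(s) {
  var m = dateRe.exec(s);
  if (!m) return null;
  var year = Number(m.groups.Year);
  var month = Number(m.groups.Month);
  var day = Number(m.groups.Day);
  // Date rolls impossible days over into the next month, so compare the
  // round-tripped month and day with what was parsed.
  var d = new Date(Date.UTC(year, month - 1, day));
  if (d.getUTCMonth() !== month - 1 || d.getUTCDate() !== day) return null;
  return { year: year, month: month, day: day };
}

var padded = parseDate("2015/03/27");
var unpadded = parseDate("2015/2/2");
var badMonth = parseDate("2015/13/01");
var badDay = parseDate("2015/02/31");
```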

extract number from string in Oracle

I am trying to extract a specific text from an Outlook subject line. This is required to calculate turn around time for each order entered in SAP. I have a subject line as below
SO# 3032641559 FW: Attached new PO 4500958640- 13563 TYCO LJ
My final output should be like this: 3032641559
I have been able to do this in MS excel with the formulas like this
=IFERROR(INT(MID([#[Normalized_Subject]],SEARCH(30,[#[Normalized_Subject]]),10)),"Not Found")
In the above formula, [#[Normalized_Subject]] is the name of the column in which the SO number exists. I have been asked to do this in Oracle, but I am very new to it. Your help on this would be greatly appreciated.
Note: in the above subject line the number 30 is common in every subject line.
The last parameter of REGEXP_SUBSTR() indicates the sub-expression you want to pick. In this case you can't just match 30 then some more numbers as the second set of digits might have a 30. So, it's safer to match the following, where x are more digits.
SO# 30xxxxxx
As a regular expression this becomes:
SO#\s30\d+
where \s indicates a space, \d indicates a numeric character, and the + means that you want to match as many as there are. But we can use the sub-expression substringing available; to do that you need to have sub-expressions, i.e. create groups where you want to split the string:
(SO#\s)(30\d+)
Put this in the function call and you have it:
regexp_substr(str, '(SO#\s)(30\d+)', 1, 1, 'i', 2)
SQL Fiddle
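The grouping idea is not specific to Oracle. As a rough illustration only (this is JavaScript, not Oracle SQL, and the variable names are invented), picking the second group of the same pattern from the sample subject line looks like this:

```javascript
// Sample subject line from the question.
var subject = "SO# 3032641559 FW: Attached new PO 4500958640- 13563 TYCO LJ";

// Group 1 is the "SO# " prefix, group 2 is the number starting with 30.
var match = /(SO#\s)(30\d+)/i.exec(subject);

// Take only the second group, or null when the subject has no SO number.
var orderNumber = match ? match[2] : null;
```

Here match[2] plays the role of the last argument (2) in the regexp_substr call above.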

How to increment date using regex

So, I have a spinEdit that should display the year and month in this format yyyyMM. I am using RegEx to mask the value to that format but when I want to increment from say 201212 to 201301, it fails and displays 20121. The RegEx I am using looks like this
([0-9][0-9][0-9][0-9])(0[1-9])|(1[0-2])
The issue is that incrementing the value (add 1 to month) isn't incrementing the year field when the month is at 12. The same happens in reverse where decreasing the value (minus 1 month) isn't decreasing the year, 201301 - 1 takes it to 2013. Is there a way to fix this using just RegEx?
I think it is possible, but not as a pure regex solution: you need to have Linux and bash available (personally I find the date function in bash ve). I had to get the date (as a string) from a filename and compare it to a date in a script. Below is the code snippet:
#!/bin/bash
# yyyymm you got after regex
inputdate=201307
# number of months you want to subtract
x=8
# outputdate should be 201211 (requires GNU date)
outputdate=$(date -d "${inputdate}01 -$x month" +"%Y%m")
I believe there may be a way, but it would be far too complicated to be worthwhile in practice. So, keeping things simple: it is not possible.
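Since a regex can only validate the shape of the value, the rollover itself has to be done with arithmetic. A minimal sketch in JavaScript, assuming the value is always a 6-character yyyyMM string (the function name addMonths is hypothetical and not part of any spin-edit control):

```javascript
// Hypothetical helper: add (or subtract) whole months to a yyyyMM string.
function addMonths(yyyymm, delta) {
  var year = Number(yyyymm.slice(0, 4));
  var month = Number(yyyymm.slice(4, 6));
  // Count months from year 0 so that year rollover falls out of the division.
  var total = year * 12 + (month - 1) + delta;
  var newYear = Math.floor(total / 12);
  var newMonth = (total % 12) + 1;
  return String(newYear).padStart(4, "0") + String(newMonth).padStart(2, "0");
}

var next = addMonths("201212", 1);
var previous = addMonths("201301", -1);
var eightBack = addMonths("201307", -8);
```

The regex can then be kept purely as an input mask, with the increment and decrement handlers calling something like this instead of adding 1 to the raw number.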