Regex: how to match specific words and specific delimiters

I'm trying to capture some input from the user.
/mon,thu',
/mon',
/mon,thu,wed',
/mon,thu-sun'
/mon,tue-thu,sun'
So the "business logic" is that the user can enter any of the following words: mon, tue, wed, thu, fri, sat, sun.
They can be separated by either - or ,.
If they are separated by -, there can only be one day on either side, i.e. mon-wed, not mon-wed-sun.
If they are separated by a ,, then only one of mon, tue, wed, thu, fri, sat, sun can be on either side of it.
Basically, , represents a specific day and - represents a range of days.
The closest I have been able to get is:
(\bmon\b|\btue\b|\bwed\b|\bthu\b|\bfri\b|\bsat\b|\bsun\b)

I've come up with this:
(mon|tue|wed|thu|fri|sat|sun)(, ?(mon|tue|wed|thu|fri|sat|sun))*(- ?(mon|tue|wed|thu|fri|sat|sun))?(, ?(mon|tue|wed|thu|fri|sat|sun))*
The idea here is that it matches day(,day)*(-day)?(,day)*
It matches the following:
mon,thu
mon
mon,thu,wed
mon,thu-sun
mon,tue-thu,sun
mon, tue, wed, thu, fri, sat, sun (even with a space after each ,)
mon-wed
but not:
mon-wed-sun
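The claim above only holds when the pattern is anchored to the whole input. A minimal sketch in Python (an assumption; the question does not name a language) that runs the same structure with `re.fullmatch`, which requires the entire string to match, and with `re.search`, which does not. The names `DAY`, `PATTERN` and `matches_whole` are illustrative.

```python
import re

DAY = r"(?:mon|tue|wed|thu|fri|sat|sun)"
# Same structure as the answer's pattern: day(, ?day)*(- ?day)?(, ?day)*
PATTERN = re.compile(rf"{DAY}(?:, ?{DAY})*(?:- ?{DAY})?(?:, ?{DAY})*")

def matches_whole(text: str) -> bool:
    """True only if the entire string matches (fullmatch anchors both ends)."""
    return PATTERN.fullmatch(text) is not None

accepted = ["mon,thu", "mon", "mon,thu,wed", "mon,thu-sun",
            "mon,tue-thu,sun", "mon, tue, wed", "mon-wed"]
for case in accepted:
    print(case, matches_whole(case))

# Unanchored, the pattern still finds the "mon-wed" prefix of an invalid input.
prefix = PATTERN.search("mon-wed-sun").group(0)
print(prefix, matches_whole("mon-wed-sun"))
```

Note that this structure allows only a single range per input, so an input such as mon-tue,thu-fri is rejected as a whole. Whether that is acceptable depends on the requirements.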

If you are using a PCRE pattern (or any other regex dialect that supports defines), you might want to avoid repeating the weekdays over and over, e.g. using
(?(DEFINE)
(?<weekday>\b(?:mon|tue|wed|thu|fri|sat|sun)\b)
(?<field>(?&weekday)(?:-(?&weekday))?)
)
(?&field)(?:,(?&field))*
in verbose mode. See https://regex101.com/r/ELPd6V/1
Note that this would profit from cleaning up the text around it first and then applying anchors; currently mon-tue-wed will give you two matches.
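In dialects without DEFINE (Python's built-in `re` module is one), the same weekday/field structure can be assembled by string composition instead. The sketch below is one possible way to do that, with `re.fullmatch` supplying the anchors; Python and the names `WEEKDAY`, `FIELD`, `SCHEDULE` and `is_valid_schedule` are assumptions for illustration.

```python
import re

# Day names from the question.
WEEKDAY = r"(?:mon|tue|wed|thu|fri|sat|sun)"
# A field is a single day or a two-day range such as mon-wed.
FIELD = rf"{WEEKDAY}(?:-{WEEKDAY})?"
# One or more fields separated by commas.
SCHEDULE = re.compile(rf"{FIELD}(?:,{FIELD})*")

def is_valid_schedule(text: str) -> bool:
    """True if the whole string is a comma-separated list of days or ranges."""
    return SCHEDULE.fullmatch(text) is not None

for case in ["mon,thu", "mon", "mon,thu,wed", "mon,thu-sun",
             "mon,tue-thu,sun", "mon-wed-sun", "mon,-thu"]:
    print(case, is_valid_schedule(case))
```

Because each field may carry its own range, inputs with several ranges (mon-tue,thu-fri) pass, while chained ranges (mon-wed-sun) do not.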

This should solve your issue. Basically, it matches a day (mon to sun), then an optional comma that is not followed by a -, i.e. (?:,(?!-))??, followed by an optional (0 or 1 times) -day group (again mon to sun).
In the case of mon-sun, the optional comma does not match and is skipped, allowing the - part to succeed.
const logicalRE = /(mon|tue|wed|thu|fri|sat|sun)(?:,(?!-))??(?:-(mon|tue|wed|thu|fri|sat|sun)){0,1}/g;
/* cases */
const tCases = ["mon,thu", "mon", "mon,thu,wed", "mon,thu-sun", "mon,tue-thu,sun", "mon-wed-sun", "mon,-thu"];
tCases.forEach(tCase => {
  console.log(tCase.match(logicalRE));
});
demo link: https://regex101.com/r/BcqUye/5


How to reformat this datetime without regex in Google Sheets?

In Google Sheets I want to reformat the datetime Mon, 08 Mar 2021 10:57:15 GMT into 08/03/2021.
Using RegEx I achieve the goal with
=to_date(datevalue(REGEXEXTRACT("Mon, 08 Mar 2021 10:57:15 GMT","\b[0-9]{2}\s\D{3}\s[0-9]{4}\b")))
But how can I do it without RegEx? This datetime format seems to be a classic one. Can it really be that no built-in formula can do it? I rather think I am missing the right knowledge here...
Please try the following formula and format as date
=TRIM(LEFT(INDEX(SPLIT(K13,","),,2),12))*1
(do adjust according to your locale)
Another option is to use Custom Script.
Example:
Code:
function formatDate(date) {
  // Lowercase "yyyy" is the calendar year; uppercase "YYYY" is the week-based year
  return Utilities.formatDate(new Date(date), "GMT", "dd/MM/yyyy");
}
Formula in B1: =formatDate(A1)
Reference:
Custom Functions in Google Sheets
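For comparison outside of Sheets: the string follows the date format used in email and HTTP headers, so many standard libraries can parse it without a regex. A minimal Python sketch of that idea (Python and the function name `reformat_http_date` are assumptions for illustration, not part of the Sheets answers):

```python
from email.utils import parsedate_to_datetime

def reformat_http_date(text: str) -> str:
    """Parse an email/HTTP style date string and return it as dd/mm/yyyy."""
    return parsedate_to_datetime(text).strftime("%d/%m/%Y")

print(reformat_http_date("Mon, 08 Mar 2021 10:57:15 GMT"))  # 08/03/2021
```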

How to restrict matches to the first 5 lines of an email body using regex in GAS [closed]

I'm using the following script which is working correctly to pull 2 fields out of an email body.
This is causing the script execution time to increase significantly due to the amount of content in the body. Is there a way to make this search through only the first 5 lines of the email body?
First lines of e-mail:
Name: Full Report
Store: River North (Wells St)
Date Tripped: 19 Feb 2020 1:07 PM
Business Date: 19 Feb 2020 (Open)
Message:
Information:
This alert was tripped based on a user defined trigger: Every 15 minutes.
Script:
// Gets the first (latest) thread with the given label
var threads = GmailApp.getUserLabelByName('South Loop').getThreads(0, 1);
if (threads && threads.length > 0) {
  // Get the first email message of the thread
  var message = threads[0].getMessages()[0];
  // Get the plain text body of the email message
  // You may also use getRawContent() for parsing HTML
  var tmp,
      subject = message.getSubject(),
      content = message.getPlainBody();
  // Implement parsing rules using regular expressions
  if (content) {
    tmp = content.match(/Date Tripped:\s*([:\w\s]+)\r?\n/);
    var tripped = (tmp && tmp[1]) ? tmp[1].trim() : 'N/A';
    tmp = content.match(/Business Date:\s([\w\s]+\(\w+\))/);
    var businessdate = (tmp && tmp[1]) ? tmp[1].trim() : 'N/A';
  }
}
You can use the pattern /^(?:.*\r?\n){0,5}/ to grab the first 5 lines of the email, then run your search against this smaller string. Here's a browser example with hardcoded content, but I tested it in Google Apps Script.
const Logger = console; // Remove this for GAS!
const content = `Name: Full Report
Store: River North (Wells St)
Date Tripped: 19 Feb 2020 1:07 PM
Business Date: 19 Feb 2020 (Open)
Message:
Information:
This alert was tripped based on a user defined trigger: Every 15 minutes.`;
const searchPattern = /(Date Tripped|Business Date): *(.+?)\r?\n/g;
const matches = [...content.match(/^(?:.*\r?\n){0,5}/)[0]
.matchAll(searchPattern)]
const result = Object.fromEntries(matches.map(e => e.slice(1)));
Logger.log(result);
If you wish to dynamically inject the search terms, use:
const Logger = console; // Remove this for GAS!
const content = `Name: Full Report
Store: River North (Wells St)
Date Tripped: 19 Feb 2020 1:07 PM
Business Date: 19 Feb 2020 (Open)
Foo: this will match because it's on line 5
Bar: this won't match because it's on line 6
Information:
`;
const searchTerms = ["Date Tripped", "Business Date", "Foo", "Bar"];
const searchPattern = new RegExp(`(${searchTerms.join("|")}): *(.+?)\r?\n`, "g");
const matches = [...content.match(/^(?:.*\r?\n){0,5}/)[0]
.matchAll(searchPattern)]
const result = Object.fromEntries(matches.map(e => e.slice(1)));
Logger.log(result);
ES5 version if you're using the older engine:
var Logger = console; // Remove this for GAS!
var content = "Name: Full Report\nStore: River North (Wells St)\nDate Tripped: 19 Feb 2020 1:07 PM\nBusiness Date: 19 Feb 2020 (Open)\nMessage:\nInformation:\nThis alert was tripped based on a user defined trigger: Every 15 minutes.\n";
var searchPattern = /(Date Tripped|Business Date): *(.+?)\r?\n/g;
var truncatedContent = content.match(/^(?:.*\r?\n){0,5}/)[0];
var result = {};
for (var m; m = searchPattern.exec(truncatedContent); result[m[1]] = m[2]);
Logger.log(result);
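The truncate-then-search idea is not specific to Apps Script. A minimal sketch of the same two steps in Python (an assumption chosen for illustration; the names `FIRST_LINES`, `FIELD` and `extract_fields` are hypothetical):

```python
import re

# Up to five leading lines, each ending in a newline.
FIRST_LINES = re.compile(r"(?:.*\r?\n){0,5}")
# Field name, colon, value up to the end of the line.
FIELD = re.compile(r"(Date Tripped|Business Date): *(.+?)\r?\n")

def extract_fields(content: str) -> dict:
    """Search only the first five lines of the body for the two fields."""
    head = FIRST_LINES.match(content).group(0)
    return dict(FIELD.findall(head))

content = (
    "Name: Full Report\n"
    "Store: River North (Wells St)\n"
    "Date Tripped: 19 Feb 2020 1:07 PM\n"
    "Business Date: 19 Feb 2020 (Open)\n"
    "Message:\n"
    "Information:\n"
    "This alert was tripped based on a user defined trigger: Every 15 minutes.\n"
)
result = extract_fields(content)
print(result)
```

The second regex only ever sees the short `head` string, so its cost no longer depends on the length of the full body.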
@ggorlen's answer is not precise enough for my taste. Let's have a look at regex101.
My problem with (?:.*\r?\n){0,5} is this: in English, this regex says:
Take any number of characters (0 or more) ending with a newline.
Do this between 0 and 5 times.
This means that even an empty string matches. If you do a global match, there are a lot of those.
So, how could you grab the first 5 lines? Be exact! Something like
^([^\r\n]+\r?\n){5}
See regex101
P.S. @ggorlen mentioned I left the default multiline matching on in regex101, and he's right about that. Your preference may vary: choosing between ignoring messages with fewer than 5 lines and accepting strings with empty lines depends on your particular case.
P.S.2 I've adapted my wording and disabled the multiline and global settings in regex101 to display my concerns with it.
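The trade-off between the two patterns can be seen on a few small inputs. The sketch below uses Python (an assumption for illustration) and hypothetical names `LENIENT`, `STRICT` and `head`; it compares how each pattern treats a short message and a message with an empty line.

```python
import re

LENIENT = re.compile(r"(?:.*\r?\n){0,5}")     # up to five lines, empty lines allowed
STRICT = re.compile(r"(?:[^\r\n]+\r?\n){5}")  # exactly five non-empty lines

def head(pattern, text):
    """Return the matched leading lines, or None if the pattern does not match."""
    m = pattern.match(text)  # match() only tries at the start of the string
    return m.group(0) if m else None

short = "a\nb\nc\n"               # fewer than five lines
with_blank = "a\n\nb\nc\nd\ne\n"  # empty second line
full = "1\n2\n3\n4\n5\n6\n"       # six non-empty lines

for name, text in [("short", short), ("with_blank", with_blank), ("full", full)]:
    print(name, repr(head(LENIENT, text)), repr(head(STRICT, text)))
```

The lenient pattern returns whatever lines are available, while the strict one returns nothing for the short message and for the message with an empty line. Both agree on the six-line input.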

Extract Date and Time in ABAP via Regex

I want to separate the time and date from this string using REGEX because I feel like it is the only way I can separate them. But I am not really familiar with how to do it; maybe someone can help me out here.
The original string: Your item was delivered in or at the mailbox at 3:34 pm on September 1, 2016 in TEXAS, MT 59102
The output I want to achieve/populate:
lv_time = 3:34 pm
lv_date = September 1, 2016
With my current attempt I am only able to cut it like this:
lv_status = Your item was delivered in or at the mailbox at
lv_time = 3
lv_date = :34 pm on September 1, 2016 in TEXAS, MT 59102.
Here's the code I have so far:
DATA: lv_status TYPE string,
lv_time TYPE string,
lv_date TYPE string,
lv_off TYPE i.
lv_status = 'Your item was delivered in or at the mailbox at 3:34 pm on September 1, 2016 in TEXAS, MT 59102.'.
FIND REGEX '(\d+)\s*(.*)' IN lv_status SUBMATCHES lv_time lv_date MATCH OFFSET lv_off.
lv_status = lv_status(lv_off).
You asked for it, here it comes:
\b((1[0-2]|0?[1-9]):([0-5][0-9]) ([AaPp][Mm])) on (January|February|March|April|May|June|July|August|September|October|November|December)\D?(\d{1,2}\D?)?\D?((?:19[7-9]\d|20\d{2})|\d{2})
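This accepts a time in 12-hour h:mm am/pm format and a date made up of a full month name (January to December), an optional day, and a year from 1970 to 2099 (or a two-digit year).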
Each part is captured in its own group.
The demo shows a version that allows abbreviated month names:
Demo
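To see which group holds which part, the pattern can be tried against the sample string. The sketch below does this in Python rather than ABAP (an assumption, chosen only for illustration); the variable names mirror the question, and the date is taken as the span from the month group to the year group.

```python
import re

PATTERN = re.compile(
    r"\b((1[0-2]|0?[1-9]):([0-5][0-9]) ([AaPp][Mm])) on "
    r"(January|February|March|April|May|June|July|August|"
    r"September|October|November|December)"
    r"\D?(\d{1,2}\D?)?\D?((?:19[7-9]\d|20\d{2})|\d{2})"
)

status = ("Your item was delivered in or at the mailbox at 3:34 pm "
          "on September 1, 2016 in TEXAS, MT 59102")

m = PATTERN.search(status)
lv_time = m.group(1)                   # group 1 is the whole time
lv_date = status[m.start(5):m.end(7)]  # from the month name to the end of the year
print(lv_time)  # 3:34 pm
print(lv_date)  # September 1, 2016
```

In ABAP the same groups would be received through SUBMATCHES; how many target variables are needed depends on how many groups you keep as capturing.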

Groovy String replacement with link

I have a multi-line string from git log in a variable
and want to replace matched lines with hyperlinks,
while keeping some parts of the original string, using Groovy.
Example:
commit 7a1825abc69f1b40fd8eb3b501813f21e09bfb54
Author: Filip Stefanov
Date: Mon Nov 21 11:05:08 2016 +0200
TICKET-1
Test change
Change-Id: I7b4028e504de6c4a48fc34635d4b94ad038811a6
Should look like:
commit 7a1825abc69f1b40fd8eb3b501813f21e09bfb54
Author: Filip Stefanov
Date: Mon Nov 21 11:05:08 2016 +0200
<a href=http://localhost:8080/browse/TICKET-1>TICKET-1</a>
Test change
<a href=http://localhost:8081/#/q/I7b4028e504de6c4a48fc34635d4b94ad038811a6,n,z>Change-Id: I7b4028e504de6c4a48fc34635d4b94ad038811a6</a>
I'm pretty bad at Groovy regex and don't know how to use grouping or closures. So far I have:
mystring.replaceAll(/TICKET-/, "http://localhost:8080/browse/TICKET-")
NOTE:
TICKET {int} and Change-Id {hash} are variables
mystring.replaceAll(/(TICKET-\d++)/, '<a href=http://localhost:8080/browse/$1>$1</a>')
.replaceAll(/Change-Id: (I\p{XDigit}++)/, '<a href=http://localhost:8081/#/q/$1,n,z>Change-Id: $1</a>')
Of course, you have to adjust the patterns for the dynamic parts as needed. Currently they require at least one digit after TICKET- and, after Change-Id:, an I followed by at least one hex digit.
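The technique here is a capture group in the pattern that is referenced again in the replacement string. For readers more used to another language, here is a rough Python sketch of the same substitution (Python and the function name `linkify` are assumptions; the URLs are the ones from the question, and Python writes the group reference as `\1` instead of `$1`):

```python
import re

log = """commit 7a1825abc69f1b40fd8eb3b501813f21e09bfb54
Author: Filip Stefanov
Date: Mon Nov 21 11:05:08 2016 +0200
TICKET-1
Test change
Change-Id: I7b4028e504de6c4a48fc34635d4b94ad038811a6"""

def linkify(text: str) -> str:
    """Wrap ticket ids and Change-Id lines in anchor tags, reusing the captured text."""
    text = re.sub(r"(TICKET-\d+)",
                  r"<a href=http://localhost:8080/browse/\1>\1</a>", text)
    text = re.sub(r"Change-Id: (I[0-9a-fA-F]+)",
                  r"<a href=http://localhost:8081/#/q/\1,n,z>Change-Id: \1</a>", text)
    return text

print(linkify(log))
```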

Extract text with regex between ","

I have been struggling to write a regex that extracts the information below, divided into 3 parts separated by ",". So far only the first and second parts (Friday and the date) have succeeded.
Friday, 26 Apr 2013, 18:30
I hope someone has the experience.
Best regards
Why not simply split the string and trim the excess whitespace of the individual parts? For example, verbosely written in C#:
string input = "Friday, 26 Apr 2013, 18:30";
string[] parts = input.Split(',');
for (int i = 0; i < parts.Length; i++)
{
parts[i] = parts[i].Trim();
}
Console.WriteLine(parts[0]); // "Friday"
Console.WriteLine(parts[1]); // "26 Apr 2013"
Console.WriteLine(parts[2]); // "18:30"
If you really want to use a regular expression for this, ^(.*),(.*),(.*)$ should work:
string input = "Friday, 26 Apr 2013, 18:30";
Regex regex = new Regex("^(.*),(.*),(.*)$", RegexOptions.Singleline);
Match match = regex.Match(input);
Console.WriteLine(match.Groups[1].Value.Trim()); // "Friday"
Console.WriteLine(match.Groups[2].Value.Trim()); // "26 Apr 2013"
Console.WriteLine(match.Groups[3].Value.Trim()); // "18:30"
Adding appropriate error checking is left as an exercise for the reader.
The following regular expression matches this whole part:
, 18:30
I hope someone has the experience.
Best regards
,+\s[0-9]+:[0-9]+ \r*.*
But yeah, that's kind of ultra specific to this ", Hour:Minutes [...]" format. You should do a split if you're using PHP or the equivalent in your language.
I think what you really want is something like this:
from datetime import datetime
s="Friday, 26 Apr 2013, 18:30"
d=datetime.strptime(s, "%A, %d %b %Y, %H:%M")
d
Out[7]: datetime.datetime(2013, 4, 26, 18, 30)
See the strptime and date format docs for details :)
Edit: sorry, I was somehow assuming you were using Python. Other languages have similar idioms though, e.g. PHP's date_parse, C#'s DateTime.Parse, etc.
You didn't specify a language so I'm going to answer this with a standard REGEX approach.
(?<=(^|,\s+)).+?(?=(,|$)) Will work for you.
Let me break up what it's doing.
(?<=(^|,\s+)) - Look behind for the start of the string or a comma followed by whitespace, but don't include it in the match. All matches must have this in front of them.
.+? - Grab all characters, but don't be greedy.
(?=(,|$)) - Look ahead for the end of the string or a comma. All matches must have this behind them.
When run on your test case of Friday, 26 Apr 2013, 18:30, I get 3 matches:
Friday
26 Apr 2013
18:30
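Some engines reject a lookbehind whose alternatives differ in length, as (^|,\s+) does; Python's built-in `re` module is one of them. In such a case the same three parts can be obtained by splitting on the separator instead. A minimal Python sketch (Python is an assumption here, since the question names no language):

```python
import re

s = "Friday, 26 Apr 2013, 18:30"

# Split on a comma together with any whitespace around it.
parts = re.split(r"\s*,\s*", s.strip())
print(parts)

# The same result without a regex: split on commas, then trim each part.
plain_parts = [part.strip() for part in s.split(",")]
print(plain_parts)
```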
Like m01's answer, you could try this approach with C#:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Globalization;
namespace TestDate
{
class Program
{
static void Main(string[] args)
{
string dateString = "Friday, 26 Apr 2013, 18:30"; // Modified from MSDN
string format = "dddd, dd MMM yyyy, HH:mm";
DateTime dateTime = DateTime.ParseExact(dateString, format, CultureInfo.InvariantCulture);
Console.WriteLine(dateTime);
Console.Read();
}
}
}
This will print the date and time in the format configured for the user's locale. For me it printed out 4/26/2013 6:30:00 PM.