RegEx expression that will match Date format - regex

I have a line with "1999-08-16"^^xsd:date. What will be the regex to capture the whole string as "1999-08-16"^^xsd:datein flex file? And, is it possible to capture only "1999-08-16" as a string. If yes, then what will be the regex for it in flex?

Try this one:
^"(\d{4}-(?:0?[1-9]|1[012])-(?:30|31|[12]\d|0?[1-9]))"\^\^xsd:date$
Explanation:
^ - start of line
\d{4} - year part
(?:0?[1-9]|1[012]) - month, can be:
01-09 or 1-9 (that's why 0?)
10,11,12 (1[012] part)
?: means non-capturing group (if we need just alternating matches
with |, but not outputting them to user)
(?:30|31|[12]\d|0?[1-9]) - day part, can be:
30,31,
10-29 (part [12]\d)
1-9, or 01-09 (0?[1-9])
Also we use non-capturing group for matching day
$ matches end of line
Everything in between "" is captured to standard capturing group, as you need to extract date
Demo
NOTICE:
When matching days we put 1-9 day numbers in LAST alternating group:
(?:30|31|[12]\d|0?[1-9])
that's because regex engine when given alternating matches uses FIRST matched result and other matched alternatives are ignored. For example-
in string 1 11 expression:
(?:\d{2}|\d) gives 2 matches
(?:\d|\d{2}) gives 3 matches

To capture whole string \"[0-9]{4}-[0-9]{2}-[0-9]{2}\"\^\^[^ ]* can be used.

Related

regex match two words based on a matching substring

there are 4 strings as shown below
ABC_FIXED_20220720_VALUEABC.csv
ABC_FIXED_20220720_VALUEABCQUERY_answer.csv
ABC_FIXED_20220720_VALUEDEF.csv
ABC_FIXED_20220720_VALUEDEFQUERY_answer.csv
Two strings are considered as matched based on a matching substring value (VALUEABC, VALUEDEF in the above shown strings). Thus I am looking to match first 2 (having VALUEABC) and then next 2 (having VALUEDEF). The matched strings are identified based on the same value returned for one regex group.
What I tried so far
ABC.*[0-9]{8}_(.*[^QUERY_answer])(?:QUERY_answer)?.csv
This returns regex group-1 (from (.*[^QUERY_answer])) value "VALUEABC" for first 2 strings and "VALUEDEF" for next 2 strings and thus desired matching achieved.
But the problem with above regex is that as soon as the value ends with any of the characters of "QUERY_answer", the regex doesn't match any value for the grouping. For instance, the below 2 strings doesn't match at all as the VALUESTU ends with "U" here :
ABC_FIXED_20220720_VALUESTU.csv
ABC_FIXED_20220720_VALUESTUQUERY_answer.csv
I tried to use Negative Lookahead:
ABC.*[0-9]{8}_(.*(?!QUERY_answer))(?:QUERY_answer)?.csv
but in this case the grouping-1 value is returned as "VALUESTU" for first string and "VALUESTUQUERY_answer" for second string, thus effectively making the 2 strings unmatched.
Any way to achieve the desired matching?
With your shown samples please try following regex.
^ABC_[^_]*_[0-9]+_(.*?)(?:QUERY_answer)?\.csv$
OR to match exact 8 digits try:
^ABC_[^_]*_[0-9]{8}_(.*?)(?:QUERY_answer)?\.csv$
Here is the online demo for above regex.
Explanation: Adding detailed explanation for above regex.
^ABC_[^_]*_ ##Matching from starting of value ABC followed by _ till next occurrence of _.
[0-9]+_ ##Matching continuous occurrences of digits followed by _ here.
(.*?) ##Creating one and only capturing group using lazy match which is opposite of greedy match.
(?:QUERY_answer)? ##In a non-capturing group matching QUERY_answer and keeping it optional.
\.csv$ ##Matching dot literal csv at the end of the value.
You need
ABC.*[0-9]{8}_(.*?)(?:QUERY_answer)?\.csv
See the regex demo.
Note
.*[^QUERY_answer] matches any zero or more chars other than line break chars as many as possible, and then any one char other than Q, U, E, etc., i.e. any char in the negated character class. This is replaced with .*?, to match any zero or more chars other than line break chars as few as possible.
(?:QUERY_answer)? - the group is made non-capturing to reduce grouping complexity.
\.csv - the . is escaped to match a literal dot.

replaceAll regex to remove last - from the output

I was able to achieve some of the output but not the right one. I am using replace all regex and below is the sample code.
final String label = "abcs-xyzed-abc-nyd-request-xyxpt--1-cnaq9";
System.out.println(label.replaceAll(
"([^-]+)-([^-]+)-(.+)-([^-]+)-([^-]+)", "$3"));
i want this output:
abc-nyd-request-xyxpt
but getting:
abc-nyd-request-xyxpt-
here is the code https://ideone.com/UKnepg
You may use this .replaceFirst solution:
String label = "abcs-xyzed-abc-nyd-request-xyxpt--1-cnaq9";
label.replaceFirst("(?:[^-]*-){2}(.+?)(?:--1)?-[^-]+$", "$1");
//=> "abc-nyd-request-xyxpt"
RegEx Demo
RegEx Details:
(?:[^-]+-){2}: Match 2 repetitions of non-hyphenated string followed by a hyphen
(.+?): Match 1+ of any characters and capture in group #1
(?:--1)?: Match optional --1
-: Match a -
[^-]+: Match a non-hyphenated string
$: End
The following works for your example case
([^-]+)-([^-]+)-(.+[^-])-+([^-]+)-([^-]+)
https://regex101.com/r/VNtryN/1
We don't want to capture any trailing - while allowing the trailing dashes to have more than a single one which makes it match the double --.
With your shown samples and attempts, please try following regex. This is going to create 1 capturing group which can be used in replacement. Do replacement like: $1in your function.
^(?:.*?-){2}([^-]*(?:-[^-]*){3})--.*
Here is the Online demo for above regex.
Explanation: Adding detailed explanation for above regex.
^(?:.*?-){2} ##Matching from starting of value in a non-capturing group where using lazy match to match very near occurrence of - and matching 2 occurrences of it.
([^-]*(?:-[^-]*){3}) ##Creating 1st and only capturing group and matching everything before - followed by - followed by everything just before - and this combination 3 times to get required output.
--.* ##Matching -- to all values till last.

Regex to remove time zone stamp

In Google Sheets, I have time stamps with formats like the following:
5/25/2022 14:13:05
5/25/2022 13:21:07 EDT
5/25/2022 17:07:39 GMT+01:00
I am looking for a regex that will remove everything after the time, so the desired output would be:
5/25/2022 14:13:05
5/25/2022 13:21:07
5/25/2022 17:07:39
I have come up with the following regex after some trial and error, although I am not sure if it is prone to errors: [^0-9:\/' '\n].*
And the function in Google Sheets that I plan to use is REGEXREPLACE().
My goal is to be able to do calculations regardless of one's time zone, however the result will be stamped with the user's local time zone.
Could someone confirm this is correct? Appreciate any feedback I can get!
You can use
=REGEXREPLACE(A1, "^(\S+\s\S+).*", "$1")
=REGEXREPLACE(A1, "^([\d/]+\s[\d:]+).*", "$1")
See the regex demo #1 / regex demo #2.
Details:
^ - start of string
(\S+\s\S+) - Group 1: one or more non-whitespaces, one or more whitespaces and one or more non-whitespaces
[\d/]+\s[\d:]+ - one or more digits or / chars, a whitespace, one or more digits or colons
.* - any zero or more chars other than line break chars as many as possible.
The $1 is a replacement backreference that refers to the Group 1 value.
With your shown samples, attempts please try following regex in REGEXREPLACE. This will help to match time stamp specifically. Here is the Online demo for following regex. This will create only 1 capturing group with which we are replacing the whole value(as per requirement).
=REGEXREPLACE(A1, "^((?:\d{1,2}\/){2}\d{4}\s+(?:\d{1,2}:){2}\d{1,2}).*", "$1")
Explanation: Adding detailed explanation for above used regex.
^( ##Matching from starting of the value and creating/opening one and only capturing group.
(?:\d{1,2}\/){2} ##Creating a non-capturing group and matching 1 to 2 digits followed by / with 2 times occurrence here.
\d{4}\s+ ##Matching 4 digits occurrence followed by 1 or more spaces here.
(?:\d{1,2}:){2} ##In a non-capturing group matching 1 to 2 occurrence of digits followed by colon and this combination should occur2 times.
\d{1,2} ##Matching 1 to 2 occurrences of digits.
) ##Closing capturing group here.
.* ##This will match everything till last but its not captured.
You can do this without REGEX by simply splitting and adding the first and second index.
=ARRAYFORMULA(
IF(ISBLANK(A2:A),,
INDEX(SPLIT(A2:A," "),0,1)+
INDEX(SPLIT(A2:A," "),0,2)))

Regex (PCRE): Match all digits in a line following a line which includes a certain string

Using PCRE, I want to capture only and all digits in a line which follows a line in which a certain string appears. Say the string is "STRING99". Example:
car string99 house 45b
22 dog 1 cat
women 6 man
In this case, the desired result is:
221
As asked a similar question some time ago, however, back then trying to capture the numbers in the SAME line where the string appears ( Regex (PCRE): Match all digits conditional upon presence of a string ). While the question is similar, I don't think the answer, if there is one at all, will be similar. The approach using the newline anchor ^ does not work in this case.
I am looking for a single regular expression without any other programming code. It would be easy to accomplish with two consecutive regex operations, but this not what I'm looking for.
Maybe you could try:
(?:\bstring99\b.*?\n|\G(?!^))[^\d\n]*\K\d
See the online demo
(?: - Open non-capture group:
\bstring99\b - Literally match "string99" between word-boundaries.
.*?\n - Lazy match up to (including) nearest newline character.
| - Or:
\G(?!^) - Asserts position at the end of the previous match but prevent it to be the start of the string for the first match using a negative lookahead.
) - Close non-capture group.
[^\d\n]* - Match 0+ non-digit/newline characters.
\K - Resets the starting point of the reported match.
\d - Match a digit.

JScript Regex - extract dates preceded by substrings

I've got oneline string that includes several dates. In JScript Regex I need to extract dates that are proceded by case insensitive substrings of "dat" and "wy" in the given order. Substrings can be preceded by and followed by any character (except new line).
reg = new RegExp('dat.{0,}wy.{0,}\\d{1,4}([\-/ \.])\\d{1,2}([\-/ \.])\\d{1,4}','ig');
str = ('abc18.Dat wy.03/12/2019FFF*Dato dost2009/03/03**data wy2020-09-30')
result = str.match(reg).toString()
Received result: 'Dat wy.03/12/2019FFF*Dato dost2009/03/03**data wy2020-09-30'
Expected result: 'Dat wy.03/12/2019,data wy2020-09-30' or preferably: '03/12/2019,2020-09-30'
Thanks.
Several issues.
You want to match as few as possible between the substrings and date, but your current regex uses greed .{0,} (same like .*). See this Question and use .*? instead.
dat.*?wy.*?FOO can still skip over any other dat. To avoid skipping over, use what some call a Tempered Greedy Token. The .*? becomes (?:(?!dat).)*? for NOT skipping over.
Not really an issue, but you can capture the date separator and reuse it.
If you want to extract only the date part, also use capturing groups. I put a demo at regex101.
dat(?:(?!dat).)*?wy.*?(\d{1,4}([/ .-])\d{1,2}\2\d{1,4})
There are many ways to achieve your desired outcome. Another idea, I would think of - if you know, there will never appear any digits between the dates, use \D for non-digit instead of the .
dat\D*?wy\D*(\d{1,4}([/ .-])\d{1,2}\2\d{1,4})
You might use a capturing group with a backreference to make sure the separators like - and / are the same in the matched date.
\bdat\w*\s*wy\.?(\d{4}([-/ .])\d{2}\2\d{2}|\d{2}([-/ .])\d{2}\3\d{4})
\bdat\w*\s*wy\.? A word boundary, match dat followed by 0+ word chars and 0+ whitespace chars. Then match wy and an optional .
( Capture group 1
\d{4}([-/ .])\d{2}\2\d{2} Match a date like format starting with the year where \2 is a backreference to what is captured in group 2
| Or
\d{2}([-/ .])\d{2}\3\d{4} Match a date like format ending with the year where \3 is a backreference to what is captured in group 3
) Close group
The value is in capture group 1
Regex demo
Note That you could make the date more specific specifying ranges for the year, month and day.