So I have the following string example in a Google Spreadsheet:
AN_U_John_Doe_01.01.1900_24.01.2022.pdf
I want to REGEXEXTRACT the second date, which would be:
24.01.2022
I did the following, which works but I'm sure there's a better way to do this:
REGEXEXTRACT($A1;"(\d+\.\d+\.\d+\.)") which results in: 24.01.2022. (notice the dot at the end)
and then I do REGEXEXTRACT($B1;"(\d+\.\d+\.\d+)") which gets rid of the dot.
Is there a way to do this in one REGEXEXTRACT? Also, the front part of the string might not always be the same; it can be shorter or longer. Only the dates are always at the end like that.
You can join your two REGEXEXTRACTs by using this:
=REGEXEXTRACT($A1,"(\d+\.\d+\.\d+)\.")
try:
=REGEXEXTRACT(A1; "_(\d+\.\d+\.\d+)\.pdf$")
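To check the end-anchored pattern outside of Sheets, here is a minimal Python sketch. It assumes Python's re module handles this particular pattern the same way as the engine behind REGEXEXTRACT, and the helper name extract_last_date is made up for illustration.

```python
import re

# Hypothetical helper, not part of Google Sheets.
# The dots are escaped so they match only a literal ".", and the
# pattern is anchored on the ".pdf" suffix at the end of the string.
LAST_DATE = re.compile(r"_(\d+\.\d+\.\d+)\.pdf$")

def extract_last_date(filename):
    match = LAST_DATE.search(filename)
    return match.group(1) if match else None

print(extract_last_date("AN_U_John_Doe_01.01.1900_24.01.2022.pdf"))
```

Because the pattern is anchored at the end, the length of the front part of the name does not matter.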
I am trying to remove square brackets around a date field in Google Data Studio so I can treat it as a proper date dimension.
It looks like this:
[2020-05-20 00:00:23]
I am using REGEXP_REPLACE(Date, "/[\[\]']+/g", "") and I want the output to look like this:
2020-05-20 00:00:23
It keeps giving me error results and will not work. I cannot figure out what I am doing wrong here; I've used https://www.regextester.com/ to verify that it should work.
Regarding Dates, it can be achieved with a single TODATE Calculated Field:
TODATE(Date, "[%Y-%m-%d %H:%M:%S]", "%Y%m%d%H%M%S")
The Date Type can then be set as required:
YYYYMMDD: Date
YYYYMMDDhh: Date Hour
YYYYMMDDhhmm: Date Hour Minute
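The idea behind the TODATE call, parsing with an input format that contains the literal brackets and then emitting a compact format, can be sketched in Python. This is only an analogy: strptime and strftime are Python functions, not Data Studio's TODATE, and the variable names are illustrative.

```python
from datetime import datetime

raw = "[2020-05-20 00:00:23]"

# The brackets are literal characters in the input format,
# so no separate bracket-stripping step is needed.
parsed = datetime.strptime(raw, "[%Y-%m-%d %H:%M:%S]")

# Compact output, analogous to the "%Y%m%d%H%M%S" output format above.
compact = parsed.strftime("%Y%m%d%H%M%S")
print(compact)
```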
You need to use a plain regex pattern, not regex literal notation (/.../g).
Note that REGEXP_REPLACE removes all occurrences found, thus, there is no need for a g flag.
Use
REGEXP_REPLACE(Date, "[][]+", "")
to remove all square brackets in Date.
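The difference between a bare pattern and literal notation can be seen in a small Python sketch. It assumes Python's re.sub as a stand-in for REGEXP_REPLACE, and strip_brackets is a hypothetical helper name.

```python
import re

def strip_brackets(value):
    # Pass the bare pattern: no /.../ delimiters and no g flag.
    # re.sub replaces every occurrence by default.
    return re.sub(r"[\[\]]+", "", value)

sample = "[2020-05-20 00:00:23]"
print(strip_brackets(sample))

# With the delimiters left in, the slashes and the "g" are treated as
# literal characters, so nothing in the sample matches.
unchanged = re.sub(r"/[\[\]']+/g", "", sample)
print(unchanged)
```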
I have text in a column like /AB/25MAR92/ and /AB/25MAR1992/. I am trying to extract just 25MAR92 and 25MAR1992 from the column for a date calculation that I have to work on. Can you please help with the REGEXP_SUBSTR function for this issue?
Thanks!
You could try:
\b\d{1,2}[A-Z]{3}\d{2,4}\b
but this will also match 02MAR992. To exclude this possibility use:
\b\d{1,2}[A-Z]{3}(?:\d{2}|\d{4})\b
This will match 02MAR1992 and 02MAR92 but will not match 02MAR992.
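A quick way to check the stricter pattern is a small Python sketch. It assumes Python's re module as a stand-in for the database's regex engine (which may not support \b or (?:...)), and find_date_token is a hypothetical helper name.

```python
import re

# Two-digit or four-digit year only; a three-digit year is rejected.
DATE_TOKEN = re.compile(r"\b\d{1,2}[A-Z]{3}(?:\d{2}|\d{4})\b")

def find_date_token(text):
    match = DATE_TOKEN.search(text)
    return match.group(0) if match else None

for sample in ("/AB/25MAR92/", "/AB/25MAR1992/", "02MAR992"):
    print(sample, "->", find_date_token(sample))
```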
I suggest using a pattern like this:
\/(\d{2}[A-Z]{3}(19|20)?\d{2})\/
Four-digit years are limited to 1900-2099; two-digit years are accepted as-is.
If you do not want to allow any 2-digit value for the day (\d{2}),
you could use the pattern (0[1-9]|[12][0-9]|3[01]) instead, which matches 01-31:
\/((0[1-9]|[12][0-9]|3[01])[A-Z]{3}(19|20)?\d{2})\/
Or, if you allow dates like /AB/2MAR92/ that have days without a leading zero,
use (0[1-9]|[12][0-9]|3[01]|[1-9]) instead:
\/((0[1-9]|[12][0-9]|3[01]|[1-9])[A-Z]{3}(19|20)?\d{2})\/
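The slash-anchored variant with day validation can be tried in a short Python sketch. It assumes Python's re module behaves like the target engine for this pattern, and extract_slashed_date is a hypothetical helper name.

```python
import re

# Group 1 is the whole date token between the slashes.
# Days: 01-31 or 1-9; four-digit years must start with 19 or 20.
SLASHED_DATE = re.compile(
    r"/((0[1-9]|[12][0-9]|3[01]|[1-9])[A-Z]{3}(19|20)?\d{2})/"
)

def extract_slashed_date(text):
    match = SLASHED_DATE.search(text)
    return match.group(1) if match else None

for sample in ("/AB/25MAR92/", "/AB/2MAR92/", "/AB/32MAR92/", "/AB/25MAR1892/"):
    print(sample, "->", extract_slashed_date(sample))
```

Note that the pattern only checks the shape of the token; it does not check that the three letters are a real month or that the day exists in that month.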
I've used / as anchors. If you don't like that, you can use \b.
In reaction to your latest comments, my recommended pattern looks like this:
\b\d{1,2}[A-Z]{3}(?:19|20)?\d{2}\b
I'm trying to create a custom filter in Google Analytics to remove the query parts of the URL which I don't want to see. The URL has the following structure:
[domain]/?p=899:2000:15018702722302::NO:::
I would like to create a regex which skips the first 12 characters (that is, up to /?p=899:2000) and replaces whatever comes after that with nothing.
So I made this one: https://regex101.com/r/Xgbfqz/1 (which could be simplified to .{0,12}), but I would actually like to skip those characters and have the regex match only whatever comes after them, so that I can tell Google Analytics to replace it with "".
The part in the url that is always the same is
?p=[3numbers]:[0-4numbers]
Thank you
Your regular expression:
\/\?p=\d{3}\:\d{0,4}(.*)
Tested in Golang RegEx 2 and RegEx101
It searches for /?p=###:#### (three digits, a colon, then zero to four digits) and captures the rest of the string to the right.
(extra) JavaScript:
var paragraf = '[domain]/?p=899:2000:15018702722302::NO:::';
var regex = /\/\?p=\d{3}:\d{0,4}(.*)/;
var match = regex.exec(paragraf);
// match[1] holds everything after the fixed ?p=###:#### prefix
alert('The rest of the right side of the string: ' + match[1]);
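To go from capturing the tail to actually removing it, one option is to capture the fixed prefix instead and replace the whole match with that group. The following Python sketch shows the idea; how a Google Analytics filter expresses the replacement is not covered here, and strip_query_tail is a hypothetical helper name.

```python
import re

# Group 1 is the part to keep: /?p= + three digits + ":" + up to four digits.
PREFIX = re.compile(r"(/\?p=\d{3}:\d{0,4}).*")

def strip_query_tail(url):
    # Replace the whole match (prefix plus tail) with just the prefix.
    return PREFIX.sub(r"\1", url)

print(strip_query_tail("[domain]/?p=899:2000:15018702722302::NO:::"))
```

URLs that do not contain the ?p= prefix are returned unchanged.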
Or simply use "[domain]/?p=899:2000:15018702722302::NO:::".substr(12)
You can try this:
/\?p\=\d{3}:\d{0,4}
Which matches just this: ?p=[3numbers]:[0-4numbers]
Not sure about replacing though.
https://regex101.com/r/Xgbfqz/1
I need to put together a regex that matches a pattern only if the string does not begin with 'N'.
Here is my pattern so far [A-E]+[-+]?.
Now I want to make sure that it does not match something like:
N\A
NA
NB+
NB-
NCAB
This is for REGEXP_SUBSTR command in Oracle SQL DB
UPDATE
It looks like I should have been more specific, sorry
I want to extract from a string [A-E]+[-+]? but if the string also matches ^(N|n) then I want my regex to return nothing.
See examples below:
String    Returns
N/A       (nothing)
F1/AAA    AAA
NABC      (nothing)
FABC      ABC
To match a character between A and E not preceded by N, you can use:
([^N]|^)[A-E]+
If you want to avoid fields that contain N[A-E], negate that pattern in your query. In other words, use two predicates: this one to exclude NA and the first to find A.
To be more clear:
WHERE NOT REGEXP_LIKE(coln, 'N[A-E]') AND REGEXP_LIKE(coln, '[A-E]')
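The two-predicate filter can be mirrored in Python to see which sample strings it keeps. This is a sketch that assumes re.search behaves like REGEXP_LIKE for these simple patterns; passes_filter is a hypothetical helper name.

```python
import re

def passes_filter(value):
    # Mirrors: NOT REGEXP_LIKE(coln, 'N[A-E]') AND REGEXP_LIKE(coln, '[A-E]')
    has_excluded = re.search(r"N[A-E]", value) is not None
    has_wanted = re.search(r"[A-E]", value) is not None
    return has_wanted and not has_excluded

for sample in ("NA", "NB+", "NCAB", "FABC", "F1/AAA", "N/A"):
    print(sample, "->", passes_filter(sample))
```

Note that N/A passes this filter, because the N is not immediately followed by a letter from A to E.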
OK, I figured it out. I broadened the scope of the problem a little and realized that I can use the other parameters of REGEXP_SUBSTR so that only the second subexpression is returned:
REGEXP_SUBSTR(field1, '^([^NA-D][^A-D]*)?([A-D]+[-+]?)',1,1,'i',2)
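The same pattern can be tried in Python, where group(2) plays the role of the final subexpression argument and re.IGNORECASE plays the role of 'i'. This is a sketch that assumes Python's re module matches this pattern the way Oracle does; grade_or_none is a hypothetical helper name.

```python
import re

# Group 1 optionally consumes a leading run that does not start with N
# (or A-D); group 2 is the part to return.
PATTERN = re.compile(r"^([^NA-D][^A-D]*)?([A-D]+[-+]?)", re.IGNORECASE)

def grade_or_none(value):
    match = PATTERN.search(value)
    return match.group(2) if match else None

for sample in ("N/A", "F1/AAA", "NABC", "FABC"):
    print(sample, "->", grade_or_none(sample))
```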
I still have to give you guys the credit, lot of good ideas that led me to here.
Just throw a [^N]? in front. That should do it.
OOPS...
That actually needs to include an " OR ^ "...
It should look like this:
([^N]|^)[A-E]+[-+]?
Sorry about that...It looks like the right answer already got posted anyway.
I have the following date string - "2013-02-20T17:24:33Z"
I want to write a regex to extract just the date part "2013-02-20". How do I do that? Any help will be appreciated.
Thanks,
Murtaza
You could use a capture group for this.
/(\d{4}-\d{2}-\d{1,2}).*/
Using $1, you can get your desired part.
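Since the question does not name a language, here is how the capture group approach might look in Python, as an illustration only. In Python the group is read with group(1) rather than $1, and date_part is a hypothetical helper name.

```python
import re

def date_part(timestamp):
    # Group 1 is the date; the trailing .* consumes the time portion.
    match = re.match(r"(\d{4}-\d{2}-\d{1,2}).*", timestamp)
    return match.group(1) if match else None

print(date_part("2013-02-20T17:24:33Z"))
```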
Well, a straightforward approach would be \d\d\d\d-\d\d-\d\d, but you can also use quantifiers to make it look nicer: \d{4}-\d{2}-\d{2}.
Just search for the first T and use substring. I assume you always get a well-formatted date string.
If the date string is not guaranteed to be valid, you can use any date related library to parse and validate the input (validation includes the calendar logic, which regex fails to achieve), and reformat the output.
No sample code, since you didn't mention the language.
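As an illustration only, assuming Python since no language was given, the two ideas above (validate with a date library, then cut at the first T) might be combined like this. The helper name date_part_checked is made up, and the format string assumes the input always has the exact shape shown in the question.

```python
from datetime import datetime

def date_part_checked(timestamp):
    # strptime applies calendar logic and raises ValueError for
    # impossible dates such as 2013-02-30.
    datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
    # Everything before the first "T" is the date part.
    return timestamp.split("T", 1)[0]

print(date_part_checked("2013-02-20T17:24:33Z"))
```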
Using Substring:
string date = "2013-02-20T17:24:33Z";
string h = date.Substring(0, 10);