Regular Expression to extract a string based on delimiter - regex

I am trying to extract a substring from a string based on the delimiter '.' (period). Can someone share how to do it using regexp_extract, please? Thanks.
Input:
15.075
0.035
Output:
075
035

From this answer, it appears that you can use parentheses to capture part of the match, as you would in most regex systems. That is, match the whole "\.[0-9]+", but only capture the numeric portion, by surrounding it with parentheses, like this:
select regexp_extract(input, r'\.([0-9]+)');
This says to match a period followed by one or more digits, and to extract the digits only. The leading r marks the string as a raw string literal, so the backslash reaches the regex engine as-is instead of being treated as a string escape.
Reference: https://cloud.google.com/bigquery/query-reference?hl=en#regularexpressionfunctions

It seems that you will want to use REGEXP_EXTRACT:
REGEXP_EXTRACT(number, r'\.(\d+)')
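For readers who want to try the capture-group idea outside BigQuery, here is a minimal sketch in Python using the same pattern. The function name extract_after_period is made up for illustration, and returning None for a non-matching input is an assumption, not something the answers above specify.

```python
import re

# Same idea as the answers above: match a literal period,
# then capture only the digits that follow it.
FRACTION = re.compile(r"\.([0-9]+)")


def extract_after_period(value):
    """Return the digits after the first period, or None if nothing matches."""
    match = FRACTION.search(value)
    return match.group(1) if match else None


results = [extract_after_period(v) for v in ["15.075", "0.035", "42"]]
print(results)
```

Group 1 holds only the captured digits, which is why the period itself is not part of the result.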

Related

enclosing regular expression in parentheses in Notepad++ [duplicate]

I have a big list of items in Excel. I copied them to Notepad++ because it has regex built in.
An item could be AuAC21-XTS02L or BgUX20-C02S, etc. Basically, I want to replace these two with Au(AC21-XTS02)L and Bg(UX20-C02)S.
With the regular expression \D\D\d\d-(\D){1,3}\d\d I can find exactly the part of the text that I want to enclose in parentheses, but I don't know how to do the replacement.
I tried using (\D\D\d\d-(\D){1,3}\d\d) as the replacement, but then I just get something like Au(DDdd-D{1,3}dd)L.
Any help would be appreciated.
You can store the whole matched string in a group and then replace that with ($1). Note that depending on your Notepad++ version you may need to use \ instead of $ to refer to a matching group (i.e. the replacement string would be (\1))
Take a look at this Regex101 snippet: https://regex101.com/r/0e1Wcc/1
It will convert a sample input like,
it could be AuAC21-XTS02L or BgUX20-C02S etc.
into
it could be Au(AC21-XTS02)L or Bg(UX20-C02)S etc.
You can use the full match $0 for the pattern \D\D\d\d-\D{1,3}\d\d; a capture group is not needed. Put it between escaped parentheses in the replacement: \($0\)
The output will be
Au(AC21-XTS02)L or Bg(UX20-C02)S
Note that \D matches any character except a digit, so it could also match a space or a newline.
Looking at the example strings, a bit more precise match (using the same replacement \($0\)) could be:
[A-Z][a-z]\K[A-Z]{2}\d\d-[A-Z0-9]{1,3}\d\d(?=[A-Z])
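The "wrap the whole match in parentheses" idea can also be sketched in Python, for anyone who wants to check the patterns outside Notepad++. Two assumptions to note: Python's re module has no \K, so the stricter variant below uses a lookbehind in its place, and \g<0> is Python's way of referring to the whole match (the $0 of the answers above). The helper name wrap_matches is made up for illustration.

```python
import re

# Pattern from the question, without the capture group that is not needed.
LOOSE = re.compile(r"\D\D\d\d-\D{1,3}\d\d")

# Stricter variant. Python's re has no \K, so a fixed-width lookbehind
# plays the role of the "[A-Z][a-z]\K" prefix.
STRICT = re.compile(r"(?<=[A-Z][a-z])[A-Z]{2}\d\d-[A-Z0-9]{1,3}\d\d(?=[A-Z])")


def wrap_matches(pattern, text):
    # \g<0> inserts the whole match, so each match gets wrapped in parentheses.
    return pattern.sub(r"(\g<0>)", text)


line = "it could be AuAC21-XTS02L or BgUX20-C02S etc."
print(wrap_matches(LOOSE, line))
print(wrap_matches(STRICT, line))
```

Both calls give the same result on this sample line; the stricter pattern only differs on input where \D would match a space, a newline, or other non-letter characters.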

How to extract big mgrs using regex

I have an input json:
{"id":12345,"mgrs":"04QFJ1234567890","code":"12345","user":"db3e1a-3c88-4141-bed3-206a"}
I would like to use a regular expression to extract the MGRS at 1 km (1000 m) precision; in my example the result should be: 04QFJ1267
The first 2 symbols are always digits, the next 3 are always letters, and the rest are always digits. The MGRS always has a fixed length of 15 characters.
Is it possible?
Thanks.
All you really need to do is remove characters 8-10 and 13-15. If you want or need to do that with a regex, you could use a regex replace (edited to also remove the rest of the string):
.*?(\w{7})\d{3}(\d{2})\d+.*
and replacement string:
$1$2
I see now you are using Java. So the relevant code line might look like:
resultString = subjectString.replaceAll(".*?(\\w{7})\\d{3}(\\d{2})\\d+.*", "$1$2");
The above assumes all your strings look like what you showed, and there is no need to test to be sure that "mgrs" is in the string.
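As a rough cross-check of the "drop characters 8-10 and 13-15" idea, here is a sketch in Python. It assumes the JSON is parsed first and the regex is applied to the "mgrs" value only, which is a different route from the single replaceAll call above; the names MGRS_1M and mgrs_to_1km are made up for illustration. The pattern encodes the layout stated in the question: 2 digits, 3 letters, 10 digits.

```python
import json
import re

# 2 digits + 3 letters + 10 digits, per the layout described in the question.
# Group 1 keeps the zone, square and first two easting digits;
# group 2 keeps the first two northing digits.
MGRS_1M = re.compile(r"(\d{2}[A-Za-z]{3}\d{2})\d{3}(\d{2})\d{3}")


def mgrs_to_1km(mgrs):
    """Shorten a 15-character MGRS string to the 9-character form."""
    match = MGRS_1M.fullmatch(mgrs)
    if match is None:
        raise ValueError("unexpected MGRS format: " + repr(mgrs))
    return match.group(1) + match.group(2)


raw = '{"id":12345,"mgrs":"04QFJ1234567890","code":"12345","user":"db3e1a-3c88-4141-bed3-206a"}'
record = json.loads(raw)
print(mgrs_to_1km(record["mgrs"]))

# The answer's own pattern, applied to the whole JSON string.
print(re.sub(r".*?(\w{7})\d{3}(\d{2})\d+.*", r"\1\2", raw))
```

Parsing the JSON first avoids depending on where the MGRS value happens to sit in the string, at the cost of an extra step.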

Using parameters in regular expressions

I am trying to use Notepad++ to do a regex search and replace that replaces a string of characters but keeps one part of the string. My description isn't very good, so perhaps it will be better if I just give an example.
Throughout an XML doc I have the following elements...
<AddressLine3>addressLine3>
<AddressLine2>addressLine2>
I want to replace these with
<addressLine3> <addressLine2>
So I need to maintain the address line number.
I know that
AddressLine([0-9]{1})>addressLine([0-9]{1})
is a valid regex, but I'm not sure what to put in the "Replace with" field to tell it to keep whatever value was found by ([0-9]{1}).
Thanks.
It's \{number of the group}, so \1, \2, ...
Edit, following your clarification (I changed your regex a bit for simpler groups):
(AddressLine[0-9]{1}>)(addressLine[0-9]{1}) is replaced by \2
You can capture the parts in groups and use them in the replacement:
Find:(AddressLine[0-9])>(addressLine[0-9])
Replace:$1 <$2
Find what : (<AddressLine\d>)AddressLine\d
Replace by: $1
You have to select the "Regular expression" search mode.
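To see how a group reference carries the matched digit into the replacement, here is a small sketch in Python using the two-group pattern from the first answer. Python writes the reference as \2, like older Notepad++ versions; the function name fix_tags and the two-line sample are assumptions made for illustration.

```python
import re

# Two-group pattern from the first answer: group 1 is the part to drop,
# group 2 is the part to keep (including whatever digit was matched).
PATTERN = re.compile(r"(AddressLine[0-9]>)(addressLine[0-9])")


def fix_tags(xml_text):
    # \2 re-inserts the text matched by the second group.
    return PATTERN.sub(r"\2", xml_text)


sample = "<AddressLine3>addressLine3>\n<AddressLine2>addressLine2>"
print(fix_tags(sample))
```

The leading "<" and trailing ">" are outside the match, so they stay in place around the kept group.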

Regex substring

I'm trying to select a substring using regex and I'm going round in circles. I need to select everything before the first "_".
Example URL - GI_2013_JUNE_10_VOL3_LASTCHANCE
So the result I'm looking for from the URL above would be "GI". The text before the first "_" can vary in length.
Any help would be much appreciated.
The regex would be:
^[^_]+
and grab the whole regex match. But as a comment says, using a substring function is more efficient!
^[^_]*
...is the expression you're looking for.
It basically says: Select everything that is not an underscore, starting at the beginning of the string.
http://regexr.com?356in
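Both suggestions, the regex and the plain substring approach mentioned in the comment, can be compared in a short Python sketch. The function names are made up for illustration, and using str.partition for the non-regex route is one possible choice, not something the answers prescribe.

```python
import re


def before_first_underscore(text):
    # ^[^_]* always matches (possibly the empty string), so group(0) is safe.
    return re.match(r"^[^_]*", text).group(0)


def before_first_underscore_plain(text):
    # Non-regex alternative: everything before the first "_",
    # or the whole string if there is no underscore.
    return text.partition("_")[0]


url = "GI_2013_JUNE_10_VOL3_LASTCHANCE"
print(before_first_underscore(url))
print(before_first_underscore_plain(url))
```

The only difference between ^[^_]+ and ^[^_]* is the empty case: with + there is no match at all when the string starts with an underscore, while * matches the empty string.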

Regular Expression Time Format?

I have a regular expression which accepts a time in a specific format, like the following:
"10:00".
I want to change the regular expression to accept either this format or a single dash ("-") only.
Here is the expression:
/^((\d)|(0\d)|(1\d)|(2[0-3]))\:((\d)|([0-5]\d))$/
Key points to solving this:
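Square brackets ([ and ]) are used to enclose character classes.
The pipe | means or, but only outside a character class; inside square brackets it is a literal pipe character.
[:-] means check for either a literal : or a hyphen -.
The resulting pattern, which accepts either separator between hours and minutes, is:
^((\d)|(0\d)|(1\d)|(2[0-3]))[:-]((\d)|([0-5]\d))$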
Just use an alternative:
^<your regex>$|^-$
This will match either a time in your format or a single hyphen-minus.
Does this regex need to have so many brackets?
/^(([01]?\d|2[0-3]):[0-5]\d|-)$/
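A quick way to check the "time or single dash" alternation is to run it against a few inputs. The sketch below uses the shortened pattern from the last answer in Python, with fullmatch standing in for the ^ and $ anchors; the name is_time_or_dash is made up for illustration. Note that this shortened pattern requires two-digit minutes, unlike the original expression, which also allowed a single digit.

```python
import re

# Shortened pattern from the last answer; fullmatch supplies the anchoring.
# Left alternative: H:MM or HH:MM with hours 0-23; right alternative: "-".
TIME_OR_DASH = re.compile(r"(?:[01]?\d|2[0-3]):[0-5]\d|-")


def is_time_or_dash(value):
    return TIME_OR_DASH.fullmatch(value) is not None


for candidate in ["10:00", "9:05", "23:59", "-", "24:00", "10-00", "--"]:
    print(candidate, is_time_or_dash(candidate))
```

The first four inputs are accepted and the last three are rejected, since the dash is only allowed as the entire value, not as a separator.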