REGEX string extraction between underscore and file extension - regex

I have a series of strings like the following one:
abc_8g_1980_312.tif
from which I would like to extract the string '312', i.e. everything between the 3rd underscore and the file extension '.tif'.
I'm testing on https://regex101.com/
with this regular expression: (\d{3})(\.tif$)
but I'm not getting the result I want.
Any suggestions would be appreciated.

To get the last 3 digits after an underscore with extension .tif you can also use lookarounds asserting _ to the left, and .tif to the right at the end of the string.
(?<=_)\d{3}(?=\.tif$)
Regex demo
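For reference, a minimal Python sketch comparing the pattern from the question with the lookaround version. Python's re module is an assumption here, since the question only mentions regex101. The original pattern does match, but its full match includes .tif, so the digits have to be read from capture group 1:

```python
import re

name = "abc_8g_1980_312.tif"

# Pattern from the question: the full match includes ".tif",
# but capture group 1 already holds the three digits.
m = re.search(r"(\d{3})(\.tif$)", name)
full_match, group_one = m.group(0), m.group(1)

# Lookaround version: the full match is only the digits.
lookaround = re.search(r"(?<=_)\d{3}(?=\.tif$)", name).group(0)

print(full_match, group_one, lookaround)
```

So either reading group 1 or switching to lookarounds should give the same digits.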

Assuming what you want to capture would always be the last underscore-separated term in your file name, you could use:
(?<=_)[^_]+(?=\.)
Demo
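A small sketch of this more general pattern in Python (the helper name last_term and the second file name are made up for illustration):

```python
import re
from typing import Optional

# Last underscore-separated term, up to (not including) the extension dot.
LAST_TERM = re.compile(r"(?<=_)[^_]+(?=\.)")

def last_term(filename: str) -> Optional[str]:
    """Return the term after the last underscore, or None if nothing matches."""
    m = LAST_TERM.search(filename)
    return m.group(0) if m else None

print(last_term("abc_8g_1980_312.tif"))
print(last_term("x_y_longer-term.png"))
```

Unlike the \d{3} version, this does not care how long the last term is or whether it consists of digits.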

Related

Regex to MATCH number string (with optional text) in a sentence

I am trying to write a regex that matches only strings like this:
89-72
10-123
109-12
122-311(a)
22-311(a)(1)(d)(4)
These strings are embedded in sentences and sometimes there are 2 potential matches in the sentence like this:
In section 10-123 which references section 122-311(a) there is a phone number 456-234-2222
I do not want to match the phone. Here is my current working regex
\d{2,3}\-\d{2,3}(\([a-zA-Z0-9]\))*
see DEMO
I've been looking on Stack and have not found anything yet. Any help would be appreciated. Will be using this in a google sheet and potentially postgres.
Based on the regex suggested by @Wiktor Stribiżew:
=REGEXEXTRACT(A1,REPT("\b(\d{2,3}-\d{2,3}\b(?:\([A-Za-z0-9]\))*)(?:[^-]|$)(?:.*)",LEN(REGEXREPLACE(REGEXREPLACE(A1,"\b(\d{2,3}-\d{2,3}\b(?:\([A-Za-z0-9]\))*)(?:[^-]|$)", char (9)),"[^"&char(9)&"]",""))))
The formula will return all matches.
String:
A
In 22-311(a)(1)(d)(4) section 10-123 which ... 122-311(a) ... number 456-234-2222
Output:
B C D
22-311(a)(1)(d)(4) 10-123 122-311(a)
Solution
To extract all matches from a string, use this pattern:
=REGEXEXTRACT(A1,
REPT(basic_regex & "(?:.*)",
LEN(REGEXREPLACE(REGEXREPLACE(A1,basic_regex, char (9)),"[^"&char(9)&"]",""))))
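The tail of the formula:
LEN(REGEXREPLACE(REGEXREPLACE(A1,basic_regex, char (9)),"[^"&char(9)&"]",""))
only counts how many times the pattern occurs in the string (3 in this example): every match is replaced with a tab character, everything that is not a tab is removed, and the length of what remains is the number of matches.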
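Outside of Sheets, the same "all matches plus their count" idea can be sketched in Python. This is only an illustration and assumes Python's re module, which is not the RE2 engine that Sheets uses:

```python
import re

# Pattern from the formula above, written as a Python raw string.
BASIC_REGEX = r"\b(\d{2,3}-\d{2,3}\b(?:\([A-Za-z0-9]\))*)(?:[^-]|$)"

text = ("In 22-311(a)(1)(d)(4) section 10-123 which ... "
        "122-311(a) ... number 456-234-2222")

# With a single capture group, findall returns that group for every match.
matches = re.findall(BASIC_REGEX, text)

# Number of matches, i.e. what the LEN(...) part of the formula works out.
match_count = len(matches)

print(matches, match_count)
```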
To avoid matching the phone number, you have to indicate that the match must be neither preceded nor followed by \d or -. Google Sheets uses RE2, which does not support lookaround assertions (see the list of supported features), so as far as I can tell the only solution is to require a character before and after the match, or the string boundary:
(?:^|[^-\d])\d{2,3}\-\d{2,3}(\([a-zA-Z0-9]\))*(?:$|[^-\d])
(?:^|[^-\d]) means either the start of a line (^) or a character that is neither - nor a digit (you might want to change that and forbid all letters as well). $ is the end of a line. Note that ^ and $ only match at line boundaries, rather than only at the start and end of the whole string, with the /m flag.
As you can see here this finds the correct strings, but with additional spaces around some of the matches.
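One way to get rid of the extra spaces is to wrap the part you want in a capture group and read only that group. A minimal Python sketch, assuming Python's re module rather than RE2 (the added capture group is my change, not part of the pattern above):

```python
import re

# Same pattern as above, with a capture group around the part to keep.
# The boundary characters are still consumed, but they stay outside group 1.
PATTERN = re.compile(
    r"(?:^|[^-\d])(\d{2,3}-\d{2,3}(?:\([a-zA-Z0-9]\))*)(?:$|[^-\d])"
)

sentence = ("In section 10-123 which references section 122-311(a) "
            "there is a phone number 456-234-2222")

# findall returns the content of group 1 for each match.
sections = PATTERN.findall(sentence)

print(sections)
```

Because the boundary characters are consumed, two matches separated by a single character would share that character and the second one would be missed.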

With a regular expression, how can I get the file's name?

I've got this file here:
\\prdflsrvcl2.unicreprd.local\Integracao-PRD\GestaoTangiveis\APD\FW_A_enviar_correio_electronico_Imagem_(384).jpg-1-52FFN8.msg
With the regular expression, I want this part: FW_A_enviar_correio_electronico_Imagem_(384).jpg-1-52FFN8.msg
I'm using the following regex:
[[:alnum:]-_]+\.[a-zA-Z]*$
However, I get .jpg-1-52FFN8.msg instead of what I want.
However if the file name was without the dot before the jpg, I would get FW_A_enviar_correio_electronico_Imagem_(384)jpg-1-52FFN8.msg instead.
Basically, I want the filename with the extension.
Thanks.
Try the following regex:
[^\\]+$
It matches every character from the end of the string back to the last occurrence of \, i.e. everything after the last backslash.
Example: https://regex101.com/r/eJ8zG2/1
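You can use the following regex and refer to the first capturing group, which is (.*):
/(?!.*\\)(.*)/g
The negative lookahead only lets the match start at a position with no \ anywhere after it, so the group captures everything after the last backslash.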
Example
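A quick check of the first pattern in Python (the choice of Python's re module is an assumption; the question does not name an engine):

```python
import re

# Raw string so the backslashes in the UNC path are kept literally.
path = (r"\\prdflsrvcl2.unicreprd.local\Integracao-PRD\GestaoTangiveis\APD"
        r"\FW_A_enviar_correio_electronico_Imagem_(384).jpg-1-52FFN8.msg")

# One or more non-backslash characters anchored at the end of the string.
file_name = re.search(r"[^\\]+$", path).group(0)

print(file_name)
```

The pattern does not look at dots or parentheses at all, which is why the (384) and the extra .jpg in the name no longer get in the way.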

How to match either a subset (preferred), or the whole line in a regex?

I have a string that looks something like this:
"Element 1 | Element 2| Element 3: element 4"
I want to substring the portion of the source string that follows the colon (to the end of the source string), but if there is no colon, then I want to grab the whole string.
What I've tried so far are variations around this:
:.*|.*
:?.*
etc.
However, while they'll match if either the colon is present or not, they don't prefer the substring when the colon is found.
I've been playing with this on http://regexpal.com.
Ultimately, this will be used in a CMDB tool for matching CIs - so a general solution would be ideal, rather than language- or engine-specific.
You can use the following:
(:.*|[^:]*)$
See DEMO
Explanation:
if there is no colon, then I want to grab the whole string
This condition can be expressed with a negated character class that excludes the colon.
You can use:
(?:^|:)[^:\n]*$
RegEx Demo
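Both patterns include the colon itself in the match. If only the text after the colon is wanted, a capture group helps. A minimal Python sketch, where the function name after_colon, the added group, and the whitespace stripping are my assumptions rather than part of the answers above:

```python
import re

# Second pattern from above, with a capture group around the part after the colon.
AFTER_COLON = re.compile(r"(?:^|:)([^:\n]*)$")

def after_colon(source: str) -> str:
    """Return the text after the last colon, or the whole string if there is none."""
    m = AFTER_COLON.search(source)
    # Fall back to the input if nothing matches (e.g. multi-line input).
    return m.group(1).strip() if m else source

print(after_colon("Element 1 | Element 2| Element 3: element 4"))
print(after_colon("Element 1 | Element 2"))
```

Note that with several colons in the string this keeps the text after the last one, whereas (:.*|[^:]*)$ starts at the first colon.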

REGEXP to grab all text before second underscore, including second underscore

So I have strings that come across like this:
GRF_STHB_010_00
ABC_AB9_004_01
BGH_NP2_002_03
AG2_BVT_007_010
The text before the first underscore can be any combo of Letters or Numbers.
The text before the second underscore can also be any combo of letters or numbers.
I want to be able to grab the whole string before the 2nd underscore, including the second underscore.
I have come up with this for now:
^([^\d]*)
It works for the first one, and finds:
GRF_STHB_
But for the others it stops at the first digit it finds:
ABC_AB
BGH_NP
AG
I need this to work in REGEXP because this is being included in a spreadsheet for grabbing data.
How can I adjust it so that it works with numbers and would have a result of:
GRF_STHB_
ABC_AB9_
BGH_NP2_
AG2_BVT_
Here is a quick tester for anyone that can help:
regexpal.com
Thanks!
You can use this regex for this:
^([^_]*_){2}
Online Demo: http://regex101.com/r/cX7hL7
You can use this:
^[^_]*_[^_]*_
You can use this regex and take the first capture group:
^([^_]*_[^_]*_).*$
Demo
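To check the first pattern against the sample strings from the question, here is a short Python sketch (Python's re module is an assumption; the spreadsheet's REGEXP flavor may differ in details):

```python
import re

# Two repetitions of "anything but underscores, then an underscore".
PREFIX = re.compile(r"^([^_]*_){2}")

codes = ["GRF_STHB_010_00", "ABC_AB9_004_01", "BGH_NP2_002_03", "AG2_BVT_007_010"]

# group(0) is the whole match, which includes the second underscore.
prefixes = [PREFIX.match(code).group(0) for code in codes]

print(prefixes)
```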

Regex to match _ or end of string

I'm working with MATLAB's regexp() and I'm trying to find a regular expression that would match only file names containing Cyto but not CytoBlue. My problem is that the file names look either like Texture_Variance_Cyto_4_90 and Texture_Variance_CytoBlue_4_90, or HIST_9BinsHistBin7_Cyto and HIST_9BinsHistBin7_CytoBlue.
If I just try to match Cyto, I also capture all the files containing CytoBlue. If I try to match Cyto_, I miss the file names where Cyto is the last element. I guess I'd need something that says "match either _ or the end of the string". I tried Cyto[_\Z] but that does not work: I again miss all the elements that end with Cyto.
Cyto(?=$|_)
This matches Cyto, followed by ("(?=...)") the end of the string ("$") or _. Note that the underscore is not returned as part of the match.
Use this regex: Cyto(_.*?(?= ))?\b
MATLAB supports positive and negative lookaheads, so this should work:
Cyto(?!Blue)
...meaning "Cyto" not followed by "Blue".
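As a rough illustration of the lookahead approach, here is the Cyto(?=$|_) pattern applied to the four file names from the question. The sketch uses Python's re module rather than MATLAB's regexp(), so treat it as a check of the pattern, not of MATLAB's behavior:

```python
import re

# "Cyto" followed by either the end of the string or an underscore.
CYTO = re.compile(r"Cyto(?=$|_)")

names = [
    "Texture_Variance_Cyto_4_90",
    "Texture_Variance_CytoBlue_4_90",
    "HIST_9BinsHistBin7_Cyto",
    "HIST_9BinsHistBin7_CytoBlue",
]

# Keep only the names where the pattern is found somewhere in the string.
cyto_only = [name for name in names if CYTO.search(name)]

print(cyto_only)
```

The negative-lookahead variant is looser: it only rules out Blue, so it would also accept a hypothetical name containing something like CytoRed.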