With a regular expression, how can I get the file's name? - regex

I've got this file here:
\\prdflsrvcl2.unicreprd.local\Integracao-PRD\GestaoTangiveis\APD\FW_A_enviar_correio_electronico_Imagem_(384).jpg-1-52FFN8.msg
With the regular expression, I want this part: FW_A_enviar_correio_electronico_Imagem_(384).jpg-1-52FFN8.msg
I'm using the following regex:
[[:alnum:]-_]+\.[a-zA-Z]*$
However, I get .jpg-1-52FFN8.msg instead of what I want.
If the file name had no dot before the jpg, however, I would get FW_A_enviar_correio_electronico_Imagem_(384)jpg-1-52FFN8.msg instead.
Basically, I want the filename with the extension.
Thanks.

Try the following regex:
[^\\]+$
It matches every character from the end of the string back to the last occurrence of \.
Example: https://regex101.com/r/eJ8zG2/1
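As a quick check of the [^\\]+$ pattern, here is a minimal Python sketch. Python is an assumption (the question does not name a language), and file_name is a hypothetical helper introduced only for illustration. It returns whatever follows the last backslash, or None when the path ends in a backslash.

```python
import re
from typing import Optional

# The UNC path from the question, written as a raw string.
path = r"\\prdflsrvcl2.unicreprd.local\Integracao-PRD\GestaoTangiveis\APD\FW_A_enviar_correio_electronico_Imagem_(384).jpg-1-52FFN8.msg"


def file_name(full_path: str) -> Optional[str]:
    """Return the run of non-backslash characters at the end of the string."""
    match = re.search(r"[^\\]+$", full_path)
    return match.group(0) if match else None


print(file_name(path))
```

Because the character class excludes only the backslash, dots, parentheses and hyphens in the name need no special handling.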

You can use the following regex and refer to the first capturing group, which is (.*):
/(?!.*\\)(.*)/g
The negative lookahead (?!.*\\) only lets the match start at a position with no \ anywhere after it, so (.*) captures what follows the last \.

Related

regular expression replace removes first and last character when using $1

I have a string like this:
&breakUp=Mumbai;city,Puma;brand&
where Mumbai;city and Puma;brand are filters (let's say) separated by a comma (,). I have to add more filters, like Delhi;State.
I am using the following regular expression to find the above string:
&breakUp=.([\w;,]*).&
and the following replacement pattern:
&breakUp=$1,Delhi;State&
It finds the string correctly, but the replacement drops the first and last characters of the filters and gives the following result:
&breakUp=umbai;city,Puma;bran,Delhi;State&
How to resolve this?
Also, if I have no filters, I don't want that first comma. For example,
&breakUp=&
should become
&breakUp=Delhi;State&
How to do it?
My guess is that your expression is nearly right: it just contains two extra . characters, each of which consumes one character outside the capture group. Removing them gives:
&breakUp=([\w;,]*)&
To skip the empty case &breakUp=& altogether, we can require at least one character inside the group:
&breakUp=([^&]+)&
Your problem is the leading and trailing periods: each one matches any single character, so the first and last characters of the filters fall outside the capture group.
Try using this regex:
&breakUp=([\w;,]*)&
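The answers above fix the lost characters but leave the empty case (&breakUp=&) open. A minimal Python sketch of one way to cover both cases follows; Python and the helper name add_filter are assumptions for illustration, not part of the original question. It uses a replacement function so the comma is added only when filters already exist.

```python
import re

# The corrected pattern from the answers: no stray "." around the group.
BREAKUP = re.compile(r"&breakUp=([\w;,]*)&")


def add_filter(query: str, new_filter: str) -> str:
    """Append new_filter to the breakUp list, adding a comma only if needed."""

    def replace(match: "re.Match[str]") -> str:
        existing = match.group(1)
        joined = f"{existing},{new_filter}" if existing else new_filter
        return f"&breakUp={joined}&"

    return BREAKUP.sub(replace, query)


print(add_filter("&breakUp=Mumbai;city,Puma;brand&", "Delhi;State"))
print(add_filter("&breakUp=&", "Delhi;State"))
```

In an engine that only accepts a replacement string such as $1, the same effect would need two separate replacements, one for the empty case and one for the non-empty case.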

Find first point with regex

I want a regex that returns only the characters before the first dot.
Example:
T420_02.DOMAIN.LOCAL
I want only T420_02
Please help me.
You can use the following regex: ^(.*?)(?=\.)
The captured group contains what you need (T420_02 in your example).
This simple expression should do what you need, assuming you want to match it at the beginning of the string:
^(.+?)\.
The capture group contains the string before (but not including) the first dot.
Here's a fiddle: http://www.rexfiddle.net/s8l0bn3
Use regex pattern ^[^.]+(?=[.])
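To compare the suggestions, here is a small Python sketch (the language choice and the helper name before_first_dot are assumptions). It applies the last pattern, ^[^.]+(?=[.]), and returns None when the input contains no dot or starts with one.

```python
import re
from typing import Optional


def before_first_dot(text: str) -> Optional[str]:
    """Return the characters before the first dot, or None if there are none."""
    match = re.search(r"^[^.]+(?=[.])", text)
    return match.group(0) if match else None


print(before_first_dot("T420_02.DOMAIN.LOCAL"))
```

The character class [^.] stops at the first dot by construction, so no lazy quantifier is needed.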

RegEx: capture entire group content

I am writing a parser for some Oracle commands, like
LOAD DATA
INFILE /DD/DATEN
TRUNCATE
PRESERVE BLANKS
INTO TABLE aaa.bbb
( some parameters... )
I already created a regex to match the entire command. I am now looking for a way to capture the name of the input file ("/DD/DATEN" for instance here).
My problem is that using the following regex will only return the last character of the first group ("N").
^\s*LOAD DATA\s*INFILE\s*(\w|\\|/)+\s*$
Any ideas?
Many thanks in advance
EDIT: following @HamZa's question, here is the entire regex to parse the Oracle LOAD DATA INFILE command (simplified, though):
^\s*LOAD DATA\s*INFILE\s*((?:\w|\\|/)+)\s*((?:TRUNCATE|PRESERVE BLANKS)\s*){0,2}\s*INTO TABLE\s*((?:\w|\.)+)\s*\(\s*((\w+)\s*POSITION\s*\(\s*\d+\s*\:\s*\d+\s*\)\s*((DATE\s*\(\s*(\d+)\s*\)\s*\"YYYY-MM-DD\")|(INTEGER EXTERNAL)|(CHAR\s*\(\s*(\d+)\s*\)))\s*\,{0,1}\s*)+\)\s*$
Let's point out the culprit in your regex: (\w|\\|/)+. What happens here?
You're matching either a word character or a backslash/forward slash and putting it in group 1 with (\w|\\|/); the + then tells the regex engine to repeat that group one or more times, so the group ends up holding only the last character matched. What you actually want is to match those characters several times and capture the whole run. For that you can wrap a non-capturing group (?:...) in a capturing one: ((?:\w|\\|/)+).
You could also just use a character class: ([\w\\/]+). Your regex would then look like
^\s*LOAD DATA\s*INFILE\s*([\w\\/]+)\s*$
On a side note: that end anchor $ will cause your regex to fail if you're not using multiline mode. Or is it that you intentionally didn't post the full regex :) ?
Not tested but...
^\s*LOAD DATA\s*INFILE\s*(\S+)\s*$
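The difference between repeating a group and grouping a repetition can be seen in a short Python sketch. Python is an assumption here, and the sample input is cut down to the first two lines of the command so that the end anchor $ still matches.

```python
import re

# Assumed sample input: only the first two lines of the command in the question.
command = "LOAD DATA\nINFILE /DD/DATEN"

# The quantifier is outside the group: each repetition overwrites group 1.
repeated = re.match(r"^\s*LOAD DATA\s*INFILE\s*(\w|\\|/)+\s*$", command)

# The quantifier is inside the group: group 1 holds the whole run.
wrapped = re.match(r"^\s*LOAD DATA\s*INFILE\s*([\w\\/]+)\s*$", command)

print(repeated.group(1))  # N
print(wrapped.group(1))   # /DD/DATEN
```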

Regular expression to get the filename without extension from a full file path

How can I extract the filename without extension from the following file path:
D:\Projects\Extract\downtown - second.pdf
The following regular expression gives me the filename with extension: [^\\]*$
e.g. downtown - second.pdf
The following regular expression gives me the full path without extension: (.+)(?=(\.))
e.g. D:\Projects\Extract\downtown - second
I'm struggling to combine the two into one regular expression to give me the results I want: downtown - second
Your second regex matches everything up to the last period (.), directory included, because .+ is greedy and nothing in it excludes the backslash.
To get just the file name without extension, you can use this regex:
[^\\]*(?=[.][a-zA-Z]+$)
I have just replaced (.+) in your second regex with the [^\\]* from your first regex, and added a pattern that matches the extension (pdf here) up to the end of the string.
This pattern now matches zero or more characters other than a backslash (\), followed by a . and then one or more letters making up the extension.
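A minimal Python sketch of the combined pattern follows. Python and the helper name stem are assumptions for illustration; the standard-library call pathlib.PureWindowsPath(...).stem is shown only as a cross-check, not as what the answer proposes.

```python
import re
from pathlib import PureWindowsPath
from typing import Optional

path = r"D:\Projects\Extract\downtown - second.pdf"


def stem(full_path: str) -> Optional[str]:
    """Match non-backslash characters that are followed by ".letters" at the end."""
    match = re.search(r"[^\\]*(?=[.][a-zA-Z]+$)", full_path)
    return match.group(0) if match else None


print(stem(path))
print(PureWindowsPath(path).stem)  # standard-library cross-check
```

Note that the extension part [a-zA-Z]+ accepts letters only, so a name ending in something like .mp3 would not match and the helper would return None.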
I came up with this one, which handles most of the possibilities:
/[^\\\/]+(?=\.[\w]+$)|[^\\\/]+$/
/path/to/file
/path/to/file.txt
/path.with/dots.to/file.txt
/path/to/file.with.dots.txt
file.txt
C:\path\to\file.txt
and so on...
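The listed paths can be run through that alternation in a short Python sketch. Python and the helper name base_name are assumptions; the pattern is the one above with the JavaScript-style / delimiters removed.

```python
import re
from typing import Optional

# First alternative: name followed by ".ext" at the end (extension excluded).
# Second alternative: a final path component with no extension at all.
PATTERN = re.compile(r"[^\\/]+(?=\.\w+$)|[^\\/]+$")


def base_name(path: str) -> Optional[str]:
    match = PATTERN.search(path)
    return match.group(0) if match else None


samples = [
    "/path/to/file",
    "/path/to/file.txt",
    "/path.with/dots.to/file.txt",
    "/path/to/file.with.dots.txt",
    "file.txt",
    r"C:\path\to\file.txt",
]
for sample in samples:
    print(sample, "->", base_name(sample))
```

Only the last extension is stripped, so file.with.dots.txt yields file.with.dots.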
I captured file from /path/to/file.pdf by using the following regex:
[^/]*(?=\.[^.]+($|\?))
Hope this helps you
I had to use an extra backslash before the first ']' to make this work
[^\\\]*(?=[.][a-zA-Z]+$)
I use this pattern
[^\/]+[.+\.].*$ for / path separator
[^\\]+[.+\.].*$ for \ path separator
which matches the filename at the end of the string regardless of which characters it contains. There is one exception: if a folder in the path has a period in its name, the match will start at that folder instead. Linux hidden directories that start with a ., like .rvm, are unaffected.
Hope this helps.
http://rubular.com/r/LNrI4inMU1

Regex to match _ or end of string

I'm working with MATLAB's regexp() and I'm trying to find a regular expression that would match only file names containing Cyto but not CytoBlue. My problem is that the file names look either like Texture_Variance_Cyto_4_90 and Texture_Variance_CytoBlue_4_90, or HIST_9BinsHistBin7_Cyto and HIST_9BinsHistBin7_CytoBlue.
If I just try to match Cyto, I also capture all the files containing CytoBlue. If I try to match Cyto_, I miss the file names where Cyto is the last element. I guess I'd need something that says "match either _ or the end of the string". I tried Cyto[_\Z], but that does not work: I again miss all the elements that end with Cyto.
Cyto(?=$|_)
This matches Cyto, followed by ("(?=...)") the end of the string ("$") or _. Note that the underscore is not returned as part of the match.
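The question is about MATLAB's regexp(), but the lookahead can be illustrated in Python, which uses the same (?=...) syntax. This is a sketch under that assumption, run against the four file names quoted in the question.

```python
import re

names = [
    "Texture_Variance_Cyto_4_90",
    "Texture_Variance_CytoBlue_4_90",
    "HIST_9BinsHistBin7_Cyto",
    "HIST_9BinsHistBin7_CytoBlue",
]

# "Cyto" must be followed by an underscore or by the end of the string.
pattern = re.compile(r"Cyto(?=$|_)")

cyto_only = [name for name in names if pattern.search(name)]
print(cyto_only)
```

The lookahead checks what follows without consuming it, which is why the underscore is not part of the match.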
Use this regex: Cyto(_.*?(?= ))?\b
MATLAB supports positive and negative lookaheads, so this should work:
Cyto(?!Blue)
...meaning "Cyto" not followed by "Blue".