Regex - Filename may contain a parentheses group

I want to match the main name and the file count without the parentheses.
For example:
8680733046449.png
8680733046449 (3).png
These files have the same name. I want to separate the second file's name (8680733046449) and the file count (3) (without the parentheses).
If the file name does not contain any parentheses, just match the name.
My regex is:
/^(.*)\s?\((\d+)\)\.png$/
This regex matches files that have parentheses, but not the ones without.
Test here: http://www.regexr.com/38pup

You need to use a non-greedy quantifier for the name part. Otherwise, it will match the space and parentheses. You also need to make the part in parentheses optional.
/^(.*?)\s?(\((\d+)\))?\.png$/
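A minimal JavaScript sketch of the pattern above, applied to the two file names from the question. The helper splitName is hypothetical; it reads group 1 as the name and group 3 as the count (group 2 is the count together with its parentheses).

```javascript
// Lazy name (.*?), optional whitespace, optional "(N)" group, then ".png" at the end.
const countedPng = /^(.*?)\s?(\((\d+)\))?\.png$/;

// Hypothetical helper: returns the base name and the numeric count (null when absent).
function splitName(fileName) {
  const m = countedPng.exec(fileName);
  if (m === null) return null; // not a .png file name
  return { name: m[1], count: m[3] === undefined ? null : Number(m[3]) };
}

console.log(splitName("8680733046449.png"));     // { name: '8680733046449', count: null }
console.log(splitName("8680733046449 (3).png")); // { name: '8680733046449', count: 3 }
```

Because (.*?) is lazy, the engine tries the shortest possible name first and only extends it when the rest of the pattern fails, so " (3)" lands in the optional group rather than in the name.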

If I understood you correctly, I think this would work for you:
^(.+?)(\s?\((\d+)\))?\.png$
Notice that I changed the * after the first dot to + to avoid empty filenames, and made it lazy (+?) so that the name group does not swallow the optional count.
Kind regards.

Related

With a regular expression, how can I get the file's name?

I've got this file here:
\\prdflsrvcl2.unicreprd.local\Integracao-PRD\GestaoTangiveis\APD\FW_A_enviar_correio_electronico_Imagem_(384).jpg-1-52FFN8.msg
With the regular expression, I want this part: FW_A_enviar_correio_electronico_Imagem_(384).jpg-1-52FFN8.msg
I'm using the following regex:
[[:alnum:]-_]+\.[a-zA-Z]*$
However, I get .jpg-1-52FFN8.msg instead of what I want.
However if the file name was without the dot before the jpg, I would get FW_A_enviar_correio_electronico_Imagem_(384)jpg-1-52FFN8.msg instead.
Basically, I want the filename with the extension.
Thanks.
Just try the following regex:
[^\\]+$
It will match all characters from the end of the string back to the last occurrence of \.
Example: https://regex101.com/r/eJ8zG2/1
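A quick JavaScript check of [^\\]+$ against the path from the question; note that every backslash has to be doubled inside a JavaScript string literal.

```javascript
// The path from the question, with each backslash doubled for the string literal.
const path = "\\\\prdflsrvcl2.unicreprd.local\\Integracao-PRD\\GestaoTangiveis\\APD\\FW_A_enviar_correio_electronico_Imagem_(384).jpg-1-52FFN8.msg";

// One or more non-backslash characters, anchored to the end of the string.
const fileNamePattern = /[^\\]+$/;

const fileName = path.match(fileNamePattern)[0];
console.log(fileName); // FW_A_enviar_correio_electronico_Imagem_(384).jpg-1-52FFN8.msg
```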
You can use the following regex and refer to the first capturing group, which is (.*):
/(?!.*\\)(.*)/g
It starts matching at the first position that is not followed by any \ later in the string.

Regular expression to get filename without extension from full file path

How can I extract the filename without extension from the following file path:
D:\Projects\Extract\downtown - second.pdf
The following regular expression gives me the filename with extension: [^\\]*$
e.g. downtown - second.pdf
The following regular expression gives me the filename without extension: (.+)(?=(\.))
e.g. D:\Projects\Extract\downtown - second
I'm struggling to combine the two into one regular expression to give me the results I want: downtown - second
Your 2nd regex gives you the complete string up to the last period (.), including the directory path, because (.+) also matches backslashes.
To get just the file name without extension, you can use this regex:
[^\\]*(?=[.][a-zA-Z]+$)
I have just replaced (.+) in your 2nd regex with the [^\\]* from your first regex, and added a pattern that matches the extension (pdf) at the end of the string.
This pattern matches zero or more characters other than a backslash (\), followed by a . and then one or more letters making up the extension. Because the dot and the extension sit inside a lookahead, they are checked but not included in the match.
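The same lookahead pattern sketched in JavaScript, using the path from the question plus a made-up path whose folder contains a dot:

```javascript
const path = "D:\\Projects\\Extract\\downtown - second.pdf";

// Characters other than "\" that are followed (lookahead only) by ".letters" at the end.
const baseNamePattern = /[^\\]*(?=[.][a-zA-Z]+$)/;

const baseName = path.match(baseNamePattern)[0];
console.log(baseName); // downtown - second

// A dot in a folder name is harmless: the lookahead requires ".letters" right at the end.
console.log("C:\\a\\b.c\\report.docx".match(baseNamePattern)[0]); // report
```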
I came up with this one, which captures most of the possibilities:
/[^\\\/]+(?=\.[\w]+$)|[^\\\/]+$/
/path/to/file
/path/to/file.txt
/path.with/dots.to/file.txt
/path/to/file.with.dots.txt
file.txt
C:\path\to\file.txt
and so on...
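A JavaScript sketch that runs this pattern over the sample paths listed above (the C:\ one written with doubled backslashes, as a JS string literal requires):

```javascript
// First alternative: the name before the last extension; second: a last segment with no extension.
const fileNamePattern = /[^\\\/]+(?=\.[\w]+$)|[^\\\/]+$/;

const samples = [
  "/path/to/file",
  "/path/to/file.txt",
  "/path.with/dots.to/file.txt",
  "/path/to/file.with.dots.txt",
  "file.txt",
  "C:\\path\\to\\file.txt",
];

const names = samples.map((s) => s.match(fileNamePattern)[0]);
console.log(names.join(", ")); // file, file, file, file.with.dots, file, file
```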
I captured file from /path/to/file.pdf by using the following regex:
[^/]*(?=\.[^.]+($|\?))
Hope this helps you
I had to use an extra backslash before the first ']' to make this work
[^\\\]*(?=[.][a-zA-Z]+$)
I use these patterns:
[^\/]+[.+\.].*$ for / path separator
[^\\]+[.+\.].*$ for \ path separator
which match the filename at the end of the string without worrying about which characters it contains. The one exception is a path with a folder that has a period in its name; that will get these patterns upset. Linux hidden directories that start with a ., like .rvm, are unaffected.
Hope this helps.
http://rubular.com/r/LNrI4inMU1

Regex for extracting filename from path

I need to extract just the filename (no file extension) from the following path:
\\my-local-server\path\to\this_file may_contain-any&character.pdf
I've tried several things, most based on something like http://regexr.com?302m5, but can't quite get there.
^\\(.+\\)*(.+)\.(.+)$
This regex has been tested on these two examples:
\var\www\www.example.com\index.php
\index.php
The first block, (.+\\)*, matches the directory path.
The second block, (.+), matches the file name without extension.
The third block, (.+)$, matches the extension.
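A JavaScript sketch of the pattern above on the two tested examples and the path from the question. It assumes, like the pattern itself, that the path starts with a backslash.

```javascript
const pathPattern = /^\\(.+\\)*(.+)\.(.+)$/;

const inputs = [
  "\\var\\www\\www.example.com\\index.php",
  "\\index.php",
  "\\\\my-local-server\\path\\to\\this_file may_contain-any&character.pdf",
];

// m[2] is the name without extension, m[3] the extension (without the dot).
const results = inputs.map((p) => {
  const m = pathPattern.exec(p);
  return { name: m[2], extension: m[3] };
});

for (const r of results) console.log(r.name, r.extension);
```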
This will get the filename but will also get the dot. You might want to truncate the last character in your code.
[\w-]+\.
Update
@Geoman, if you have spaces in the file name, use the modified pattern below:
[ \w-]+\. (space added in brackets)
This is just a slight variation on @hmd's answer, so you don't have to truncate the dot:
[ \w-]+?(?=\.)
Really, thanks go to @hmd; I've only slightly improved on it.
Try this:
[^\\]+(?=\.pdf$)
It matches a run of characters other than a backslash that is followed by .pdf at the end of the string.
You can also (and maybe it's even better) put the part you want into a capturing group, like this:
([^\\]+)\.pdf$
But how you refer to this group (the part in parentheses) depends on the language or regex flavor you're using. In most cases it'll be something like $1 or \1, or the library will provide a method for getting a capturing group by its number after the match.
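A JavaScript sketch of both variants on the path from the question. The third call, a String.prototype.replace with a ^.*\\ prefix added so the whole path is replaced, is my own addition to show the $1 syntax in a replacement string.

```javascript
const path = "\\\\my-local-server\\path\\to\\this_file may_contain-any&character.pdf";

// Lookahead version: the whole match is the file name; ".pdf" is only asserted.
const viaLookahead = path.match(/[^\\]+(?=\.pdf$)/)[0];

// Capturing-group version: JavaScript exposes group 1 as index 1 of the match array.
const viaGroup = path.match(/([^\\]+)\.pdf$/)[1];

// In a replacement string the same group is written as $1.
const viaReplace = path.replace(/^.*\\([^\\]+)\.pdf$/, "$1");

console.log(viaLookahead); // this_file may_contain-any&character
console.log(viaGroup);     // this_file may_contain-any&character
console.log(viaReplace);   // this_file may_contain-any&character
```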
I use @"[^\\]+$"
That gives the filename including the extension.
I'm using this regex to replace the file's name with index. It matches a contiguous run of characters that contains no slash (forward or back) and is followed by a . and a run of word characters at the end of the string. It retrieves the filename including spaces and dots, but leaves out the last extension.
const regex = /[^\\/]+?(?=\.\w+$)/
console.log('/path/to/file.png'.match(regex))
console.log('/path/to/video.webm'.match(regex))
console.log('/path/to/weird.file.gif'.match(regex))
console.log('/path with/spaces/and file.with.spaces'.match(regex))
If anyone is looking for a JavaScript regular expression for Windows absolute (and relative) file paths:
var path = "c:\\my-long\\path_directory\\file.html";
(/(\w?\:?\\?[\w\-_\\]*\\+)([\w-_]+)(\.[\w-_]+)/gi).exec(path);
Output is:
[
"c:\my-long\path_directory\file.html",
"c:\my-long\path_directory\",
"file",
".html"
]
Here's a slight modification to Angelo's excellent answer that allows for spaces in the path, filename and extension as well as missing parts:
function parsePath (path) {
var parts = (/(\w?\:?\\?[\w\-_ \\]*\\+)?([\w-_ ]+)?(\.[\w-_ ]+)?/gi).exec(path);
return {
path: parts[0] || "",
folder: parts[1] || "",
name: parts[2] || "",
extension: parts[3] || "",
};
}
If you want to return the file name with its extension, the regex should be as below:
[A-Za-z0-9_\-\.]+\.[A-Za-z0-9]+$
It works for:
path/to/your/filename.some
path/to/your/filename.some.other
path\to\your\filename.some
path\to\your\filename.some.other
http://path/to/your/filename.some
http://path/to/your/filename.some.other
And so on
It returns the full file name with extension (e.g. filename.some or filename.some.other).
If you want to return the file name without the last extension, the regex should be as below:
[A-Za-z0-9_\-\.]+(?=\.[A-Za-z0-9]+$)
It returns the full file name without the last extension (e.g. "filename" for "filename.some" and "filename.some" for "filename.some.other").
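A JavaScript sketch running both patterns over a few of the paths listed above:

```javascript
const withExtension = /[A-Za-z0-9_\-\.]+\.[A-Za-z0-9]+$/;
const withoutLastExtension = /[A-Za-z0-9_\-\.]+(?=\.[A-Za-z0-9]+$)/;

const paths = [
  "path/to/your/filename.some",
  "path\\to\\your\\filename.some.other",
  "http://path/to/your/filename.some.other",
];

// match() without the g flag returns the leftmost match at index 0.
const withExt = paths.map((p) => p.match(withExtension)[0]);
const withoutLast = paths.map((p) => p.match(withoutLastExtension)[0]);

console.log(withExt.join(" | "));     // filename.some | filename.some.other | filename.some.other
console.log(withoutLast.join(" | ")); // filename | filename.some | filename.some
```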
Click the Explain button on the linked TEST pages to see how these patterns work.
This is specific to the pdf extension.
TEST ^.+\\([^.]+)\.pdf$
This is specific to any extension, not just pdf.
TEST ^.+\\([^.]+)\.[^\.]+$
([^.]+)
This is the $1 capture group to extract the filename without the extension.
\\my-local-server\path\to\this_file may_contain-any&character.pdf
will return
this_file may_contain-any&character
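A JavaScript sketch of the any-extension pattern. The second input is a made-up name with an extra dot, included to show a limitation: [^.]+ cannot cross a dot, so that path produces no match.

```javascript
const anyExtension = /^.+\\([^.]+)\.[^\.]+$/;

const m = anyExtension.exec("\\\\my-local-server\\path\\to\\this_file may_contain-any&character.pdf");
console.log(m[1]); // this_file may_contain-any&character

// A file name containing an extra dot is not matched at all.
console.log(anyExtension.exec("C:\\docs\\report.v2.pdf")); // null
```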
TEST ^(.*[\\\/])?(.*?)(\.[^.]*?|)$
example:
/^(.*[\\\/])?(.*?)(\.[^.]*?|)$/.exec("C:\\folder1\\folder2\\foo.ext1.ext")
result:
0: "C:\folder1\folder2\foo.ext1.ext"
1: "C:\folder1\folder2\"
2: "foo.ext1"
3: ".ext"
the $1 capture group is the folder
the $2 capture group is the name without extension
the $3 capture group is the extension (only the last)
It works for:
C:\folder1\folder2\foo.ext
C:\folder1\folder2\foo.ext1.ext
C:\folder1\folder2\name-without extension
only name
name.ext
/folder1/folder2/foo.ext
C:\folder1\folder2\foo
C:\folder1\folder2\
C:\special&chars\folder2\f [oo].ext1.e-x-t
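A JavaScript sketch of the pattern above. The splitPath helper is hypothetical; it defaults the folder to an empty string because group 1 is undefined when the input has no folder part.

```javascript
const pathParts = /^(.*[\\\/])?(.*?)(\.[^.]*?|)$/;

// Hypothetical helper: group 1 = folder, group 2 = name, group 3 = last extension.
function splitPath(p) {
  const [, folder = "", name, extension] = pathParts.exec(p);
  return { folder, name, extension };
}

console.log(splitPath("C:\\folder1\\folder2\\foo.ext1.ext"));
console.log(splitPath("only name"));
console.log(splitPath("/folder1/folder2/foo.ext"));
console.log(splitPath("C:\\folder1\\folder2\\"));
```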
An answer with:
File name and directory space support
Named capture group
Gets unlimited file extensions (captures file.tar.gz, not just file.tar)
*NIX and Win support
^.+(\\|\/)(?<file_name>([^\\\/\n]+)(\.)?[^\n\.]+)$
Explanation:
^.+(\\|\/) Gets anything up to the final / or \ in a file path
(?<file_name> Begin named capture group
([^\\\/\n]+) Gets anything except a newline or a path separator (\ or /)
(\.)?[^\n\.]+ Not really needed but it works well for issues with odd characters in file names
)$ End named capture group and end line
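A JavaScript sketch of the named-group pattern (JavaScript supports named groups since ES2018 and exposes them on match.groups); the input paths are made up:

```javascript
const fileNamePattern = /^.+(\\|\/)(?<file_name>([^\\\/\n]+)(\.)?[^\n\.]+)$/;

const inputs = ["/var/www/index.php", "C:\\Users\\me\\archive.tar.gz", "/home/me/My File.txt"];

for (const p of inputs) {
  const m = fileNamePattern.exec(p);
  console.log(m.groups.file_name); // index.php, then archive.tar.gz, then My File.txt
}

// ^.+ followed by a separator is required, so a bare file name does not match.
console.log(fileNamePattern.exec("index.php")); // null
```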
Note that if you're putting this in a string literal and need to escape backslashes (such as in C), every backslash has to be doubled:
"^.+(\\\\|\\/)(?<file_name>([^\\\\\\/\\n]+)(\\.)?[^\\n\\.]+)$"
Here is an alternative that works on Windows/Unix:
"^(([A-Z]:)?[\.]?[\\/]?.*[\\/])*(.+)\.(.+)"
First block: path
Second block: dummy
Third block: file name
Fourth block: extension
Tested on:
".\var\www\www.example.com\index.php"
"\var\www\www.example.com\index.php"
"/var/www/www.example.com/index.php"
"./var/www/www.example.com/index.php"
"C:/var/www/www.example.com/index.php"
"D:/var/www/www.example.com/index.php"
"D:\\var\\www\\www.example.com\\index.php"
"\index.php"
"./index.php"
This regular expression extracts the file extension; if group 2 isn't null, it's the extension.
.*\\(.*\.(.+)|.*$)
Also, one more for files in a directory and in the root:
^(.*\\)?(.*)(\..*)$
for file in dir
Full match 0-17 `\path\to\file.ext`
Group 1. 0-9 `\path\to\`
Group 2. 9-13 `file`
Group 3. 13-17 `.ext`
for file in root
Full match 0-8 `file.ext`
Group 2. 0-4 `file`
Group 3. 4-8 `.ext`
For most cases (Windows and Unix paths, separator, bare file name, dot, file extension), the following is enough:
// grab the dir part (1), the bare file name (2) and the extension with its dot (3)
path.replaceAll("""^(.*)[\\/](.*)([.]{1}.*)""","$2")
Direct approach:
To answer your question as it's written, this will provide the most exact match:
^\\\\my-local-server\\path\\to\\(.+)\.pdf$
General approach:
This regex is short and simple, matches any filename in any folder (with or without extension) on both windows and *NIX:
.*[\\/]([^.]+)
If a file has multiple dots in its name, the above regex will capture the filename up to the first dot. This can easily be modified to match until the last dot if you know that you will not have files without extensions or that you will not have a path with dots in it.
If you know that the folder will only contain .pdf files or you are only interested in .pdf files and also know that the extension will never be misspelled, I would use this regex:
.*[\\/](.+)\.pdf$
Explanation:
. matches anything except line terminators.
* repeats the previous match from zero to as many times as possible.
[\\/] matches the last backslash or forward slash (previous ones are consumed by .*). It is possible to omit either the backslash or the forward slash if you know that only one type of environment will be used.
If you want to capture the path, surround .* or .*[\\/] in parentheses.
Parentheses capture what is matched inside them.
[^.] matches anything that is not a literal dot.
+ repeats the previous match one or more times, as many as possible.
\. matches a literal dot.
pdf matches the string pdf.
$ asserts the end of the string.
If you want to match files with zero, one or multiple dots in their names placed in a variable path which also may contain dots, it will start to get ugly. I have not provided an answer for this scenario as I think it is unlikely.
Edit: To also capture filenames without a path, replace the first part with (?:.*[\\/])?, which is an optional non-capturing group.
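A JavaScript sketch of the general pattern with the optional non-capturing group, plus the .pdf variant; the sample paths besides the one from the question are made up:

```javascript
const general = /(?:.*[\\/])?([^.]+)/; // optional folder part, then everything up to the first dot
const pdfOnly = /.*[\\/](.+)\.pdf$/;

const uncPath = "\\\\my-local-server\\path\\to\\this_file may_contain-any&character.pdf";

console.log(general.exec(uncPath)[1]);         // this_file may_contain-any&character
console.log(general.exec("report.pdf")[1]);    // report
console.log(general.exec("/docs/a.b.c")[1]);   // a (stops at the first dot)
console.log(pdfOnly.exec("/docs/a.b.pdf")[1]); // a.b
```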
Does this work...
.*\/(.+)$
Posting here so I can get feedback
Here is a solution to extract the file name without the dot of the extension.
I begin with the answer from @Hammad Khan and add the dot to the character class, so dots can be part of the file name:
[ \w-.]+\.
Then use a regex lookahead (?= ) for a dot, so the match stops at the last dot (the dot before the extension), and the dot does not appear in the result:
[ \w-.]+(?=[.])
Reordered; it's not necessary, but it looks better:
[\w-. ]+(?=[.])
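A JavaScript sketch of the final pattern. One adjustment is mine: the hyphen is escaped as \- so it is read as a literal character regardless of engine; the sample names are made up.

```javascript
// Word characters, hyphens, dots and spaces, stopping right before the last dot.
const nameBeforeLastDot = /[\w\-. ]+(?=[.])/;

console.log("my report.v2.pdf".match(nameBeforeLastDot)[0]);           // my report.v2
console.log("C:\\docs\\my report.v2.pdf".match(nameBeforeLastDot)[0]); // my report.v2
```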
Try this:
[^\\]+$
You can also add the extension for specificity:
[^\\]+pdf$

How to match a string that does not end in a certain substring?

How can I write a regular expression for strings that do not contain some string at the end?
In my project, all classes whose names don't end with some string such as "controller" or "map" should inherit from a base class. How can I do this using a regular expression?
But using both of these:
public*.class[a-zA-Z]*(?<!controller|map)$
public*.class*.(?<!controller)$
there isn't any match!!!
Do a search for all filenames matching this:
(?<!controller|map|anythingelse)$
(Remove the |anythingelse if no other keywords, or append other keywords similarly.)
If you can't use negative lookbehinds (the (?<!..) bit), do a search for filenames that do not match this:
(?:controller|map)$
And if that still doesn't work (might not in some IDEs), remove the ?: part and it probably will - that just makes it a non-capturing group, but the difference here is fairly insignificant.
If you're using something where the full string must match, then you can just prefix either of the above with ^.* to do that.
Update:
In response to this:
but using both
public*.class[a-zA-Z]*(?<!controller|map)$
public*.class*.(?<!controller)$
there isnt any match case!!!
Not quite sure what you're attempting with the public/class stuff there, so try this:
public.*class.*(?<!controller|map)$
The . is a regex char that means "anything except newline", and the * means zero or more times.
If this isn't what you're after, edit the question with more details.
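A JavaScript sketch of both approaches from this answer, assuming an engine with lookbehind support (JavaScript has it since ES2018). The class names are made up.

```javascript
// Negative lookbehind: the end of the string must not be preceded by "controller" or "map".
const notEndingWithKeyword = /(?<!controller|map)$/;

// Fallback for engines without lookbehind: match the keyword at the end and negate the result.
const endingWithKeyword = /(?:controller|map)$/;

const names = ["userservice", "usercontroller", "sitemap", "mapper"];
const viaLookbehind = names.filter((n) => notEndingWithKeyword.test(n));
const viaNegation = names.filter((n) => !endingWithKeyword.test(n));

console.log(viaLookbehind.join(", ")); // userservice, mapper
console.log(viaNegation.join(", "));   // userservice, mapper
```

Both patterns are case-sensitive, so a name like UserController passes both filters; adding the i flag makes it fail them.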
Depending on your regex implementation, you might be able to use a lookbehind for this task. This would look like
(?<!SomeText)$
This matches any lines NOT having "SomeText" at their end. If you cannot use that, the expression
^(?!.*SomeText$).*$
matches any line not ending with "SomeText" as well (empty lines included).
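Both forms sketched in JavaScript, testing each line separately (no g or m flag, so test() carries no state between calls). The sample lines are made up.

```javascript
const viaLookahead = /^(?!.*SomeText$).*$/;
const viaLookbehind = /(?<!SomeText)$/;

const lines = ["foo", "fooSomeText", "SomeText in the middle"];
const keptLookahead = lines.filter((line) => viaLookahead.test(line));
const keptLookbehind = lines.filter((line) => viaLookbehind.test(line));

console.log(keptLookahead.join(" | "));  // foo | SomeText in the middle
console.log(keptLookbehind.join(" | ")); // foo | SomeText in the middle
```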
You could write a regex that contains two groups: one consists of one or more characters before controller or map, the other contains controller or map and is optional. The first group has to be lazy (+?); otherwise it swallows the keyword and the optional group never captures anything.
^(.+?)(controller|map)?$
With that you may match your string, and if there is a group() method in the regex API you use and group(2) is empty, the string does not end with controller or map.
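A JavaScript sketch of this two-group approach. The helper endsWithKeyword is hypothetical, and the first group is lazy (.+?): with a greedy (.+), group 1 swallows the keyword and group 2 is always empty, as the last line shows. JavaScript reports a group that did not participate as undefined.

```javascript
// Lazy (.+?) so the optional group gets a chance to capture the keyword.
const keywordPattern = /^(.+?)(controller|map)?$/;

// Hypothetical helper: true when group 2 participated, i.e. the name ends with a keyword.
function endsWithKeyword(name) {
  const m = keywordPattern.exec(name);
  return m !== null && m[2] !== undefined;
}

console.log(endsWithKeyword("usercontroller")); // true
console.log(endsWithKeyword("userservice"));    // false

// With a greedy (.+), group 1 takes the whole name and group 2 stays empty.
console.log(/^(.+)(controller|map)?$/.exec("usercontroller")[2]); // undefined
```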
Check if the name does not match [a-zA-Z]*controller or [a-zA-Z]*map.
Finally, I did it this way:
public.*class.*[^(controller|map|spec)]$
it worked

%:s/\([0-9]*\)_\(*\)/\2 will not rename files

Can someone please edit %:s/\([0-9]*\)_\(*\)/\2 so that I can rename files? For example, if the file name is 5555_word_word.jpg, I want the new file name to be word_word.jpg. I feel like I am so close!
You may want to simplify and have it just delete leading numbers and the underscore:
s/^[0-9]\+_//
Try this:
:%s/\([0-9]*\)_\(.*\)/\2
The . will match any character (part of the second grouping) and the * will greedily match any amount of them. Your original regex was missing that directive. This will also rename files of the form _word_word.txt to word_word.txt. If you want to require digits to match (probably a good idea), use:
:%s/\([0-9]\+\)_\(.*\)/\2
The \+ directive means to match 1 or more instances.
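For comparison outside Vim, the same substitutions sketched in JavaScript (where + is a quantifier without a backslash, unlike Vim's default magic mode). The helper names are made up.

```javascript
// Same substitution as :%s/\([0-9]\+\)_\(.*\)/\2, applied to a single file name.
const requireDigits = (name) => name.replace(/^([0-9]+)_(.*)$/, "$2");

// Shorter variant: just delete the leading digits and the underscore.
const stripPrefix = (name) => name.replace(/^[0-9]+_/, "");

console.log(requireDigits("5555_word_word.jpg")); // word_word.jpg
console.log(stripPrefix("5555_word_word.jpg"));   // word_word.jpg
console.log(requireDigits("_word_word.txt"));     // _word_word.txt (no digits, so unchanged)
```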
Your version is fine, but you forgot a period, and you should probably anchor it to the beginning of a line or to a word boundary using either ^ or \<.
:%s/^\([0-9]*\)_\(.*\)/\2/
You can use \v to clean up some of those slashes.
:%s/\v^([0-9]*)_(.*)/\2/
You can use \ze to avoid capture groups.
:%s/^[0-9]*_\ze.*//
But the trailing .* is superfluous, because it matches anything. So use Seth's version, it's the simplest.