Regular Expression remove specific text in file name - regex

I am using a file transfer tool that allows regular expressions to rename files as they are copied into a new folder, so I am working with regular expressions only and not inside a code base. I have a large set of files that follow a specific naming convention, with a version number at the end of the file name. My goal is to remove this version number along with the underscore that precedes it.
Here are some examples of the file names:
the_file_name_DS_017_EN_35.pdf
the_file_name_DS_037_SP_35.pdf
different_filename_DS_EN_5.pdf
I am looking to change them to:
the_file_name_DS_017_EN.pdf
the_file_name_DS_037_SP.pdf
different_filename_DS_EN.pdf
I am trying to remove the version number so that the file naming convention on my new server will always be the same. I am not good with regex and this is what I tried so far but to no avail:
Using _[^_]+$ selects from the last underscore through the .pdf extension.
Using \_(.*?)\. selects from the first underscore up to the period.
How do I select from the last underscore up to the period, removing that text but keeping the period? Maybe there is a better method? Thanks in advance!

If your regex engine supports positive lookaheads, you can use the pattern below and replace the match with nothing:
(_\d+)(?=\.pdf$)
Explanation:
(_\d+) matches an underscore followed by one or more digits
(?=\.pdf$) is a positive lookahead that requires the .pdf extension at the end of the file name to follow, without including it in the match
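As a quick check outside the transfer tool, the lookahead pattern can be tried in Python, whose re module supports lookaheads. This is only a sketch: the function name strip_version is made up for illustration, and the tool's own regex flavor may differ.

```python
import re

# Underscore plus digits, only when directly followed by ".pdf" at the end.
VERSION_SUFFIX = re.compile(r"_\d+(?=\.pdf$)")

def strip_version(name: str) -> str:
    """Remove the trailing _<digits> version; the lookahead keeps '.pdf'."""
    return VERSION_SUFFIX.sub("", name)

for original in (
    "the_file_name_DS_017_EN_35.pdf",
    "the_file_name_DS_037_SP_35.pdf",
    "different_filename_DS_EN_5.pdf",
):
    print(original, "->", strip_version(original))
```

The _017 in the middle of a name is left alone because it is not followed by .pdf.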

Try this regular expression:
_[0-9]*\.
and replace it with
.
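A small Python sketch of this replace-with-a-period approach follows. The STRICTER variant and the file name draft_.pdf are assumptions added for illustration: because * allows zero digits, the pattern as given also rewrites a bare "_." sequence, which + avoids.

```python
import re

ORIGINAL = r"_[0-9]*\."  # pattern as given in the answer
STRICTER = r"_[0-9]+\."  # assumption: require at least one digit

def strip_version(name: str, pattern: str = ORIGINAL) -> str:
    """Replace '_<digits>.' with '.' so the extension's period survives."""
    return re.sub(pattern, ".", name)

print(strip_version("the_file_name_DS_017_EN_35.pdf"))
print(strip_version("draft_.pdf"))            # underscore is removed
print(strip_version("draft_.pdf", STRICTER))  # name is left unchanged
```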

Related

matching the frame identifier of a limited file sequence with regex

I have some code that can use regex to filter and find a list of files.
I want to filter these files by their name, and select the final set of numbers in the filename.
For example, say I have this sequence of file names:
render1frame0001.png ... render1frame0200.png
Finding the last 4 digits isn't that hard. The issue is when the file name itself has numbers in it.
Some platforms have different margins for frame counts, so I need it to be able to match more or less than 4 characters, while also ignoring any numbers in the file name.
So in the example above, it would match 0001 through 0200, ignoring the 1 inside the filename.
I don't have much experience with regex and this particular problem seems a little niche, so I don't exactly know what to do here.
Also, the files can possibly have other extensions, such as jpeg, so I guess it should also be able to work around different extensions.
Essentially I want to match the first occurrence of a group of numbers... but from the end of the string backwards. That behavior could possibly get around the extension and any extra numbers in the file name.
Is this possible?
Use lookahead and lookbehind to capture the numeric value.
regex = /(?<=\w+)\d+(?=\.[a-z]+)/gi (best), or /\d+(?=\.[a-z]+)/gi, or /\d+(?=\.[a-z]+$)/gi (if you pass a single file name at a time). Note that the first form uses a variable-length lookbehind, which only some engines (for example JavaScript and .NET) accept.
Your input is: "render1frame0001.png ... render1frame0200.png".
You have to match the digits that come directly before the file extension ('.png').
For more details about regular expression follow this: Puzzling with Regular Expression
Since each string is a filename that includes the file extension, you should be able to use the regular expression below, regardless of the file type:
(\d+)\.[a-z]+$
Note: since the regular expression contains the $ character, each filename should be parsed individually.
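To see the capture-group pattern applied one filename at a time, here is a minimal Python sketch. The helper name frame_id and the case-insensitive flag are assumptions for illustration, not part of the answer.

```python
import re
from typing import Optional

# Digits immediately before the final ".ext"; anchored with $ so each
# filename has to be tested individually.
FRAME_RE = re.compile(r"(\d+)\.[a-z]+$", re.IGNORECASE)

def frame_id(filename: str) -> Optional[str]:
    """Return the frame digits before the extension, or None if absent."""
    match = FRAME_RE.search(filename)
    return match.group(1) if match else None

for name in ("render1frame0001.png", "render1frame0200.jpeg", "notes.txt"):
    print(name, "->", frame_id(name))
```

The 1 in render1frame is skipped because it is not followed by a period, and the padding width does not matter because \d+ takes however many digits are there.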

How to exclude file extension from string with regex

I want to be able to get two matching groups from a regex and exclude a third.
This is an example of a string I want to match:
my-file-name-0.44.0.6-SOME-SNAPSHOT.zip
I want two matching groups, one for the file name without the version and one for the version without the file extension.
Group 1: my-file-name
Group 2: 0.44.0.6-SOME-SNAPSHOT
Excluded: .zip
The file name can be anything, but the version will always have a hyphen before it, and the file extension can also be anything.
This is what I have come up with, but can't figure out the exclude part.
(.*?)-([0-9.]{1,4}.*)
Append \. to your regex:
(.*?)-([0-9.]{1,4}.*)\.
However you may want to modify it a little:
(.*?)-(\d.*)\.\w+
Use this regular expression to remove the file extension:
/(.*)\.[^.]+$/
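The two suggestions above can be tried in Python as a rough check. The function names are hypothetical, and fullmatch is used here on the assumption that the whole file name should be consumed.

```python
import re
from typing import Optional, Tuple

# Lazy name, a hyphen, a version that starts with a digit, then ".ext".
NAME_VERSION = re.compile(r"(.*?)-(\d.*)\.\w+")

def split_name_version(filename: str) -> Optional[Tuple[str, str]]:
    """Return (name, version) with the extension left out, or None."""
    match = NAME_VERSION.fullmatch(filename)
    return match.groups() if match else None

def strip_extension(filename: str) -> str:
    """Drop everything from the last period onwards."""
    return re.sub(r"(.*)\.[^.]+$", r"\1", filename)

sample = "my-file-name-0.44.0.6-SOME-SNAPSHOT.zip"
print(split_name_version(sample))
print(strip_extension(sample))
```

The lazy first group stops at the first hyphen that is followed by a digit, which is why the hyphens inside my-file-name stay in group 1.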

Regex in bash scripting

I have two similar file names that need to go into different directories. I tried using the following regexes.
File 1: abc_xyz_2016_12_02.out
File 2: abc_xyz_test_2016-12-02.out
Regex used:
regex_abc_xyz="abc_xyz_[0-9]{1,4}-[0-9]{1,2}-[0-9]{1,2}.out"
regex_abc_xyz_test="abc_xyz_test_[0-9]{1,4}-[0-9]{1,2}-[0-9]{1,2}.out"
regex_abc_xyz works but regex_abc_xyz_test is failing.
Using your example test strings (the first of which I assume was mistyped, using underscores between the date components instead of hyphens), I entered these together with your regular expressions into RegEx 101. Both matched the appropriate filenames.
As one user stated, you ought to escape your period, i.e. \.out, but otherwise, your regular expressions are fine.
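The effect of the unescaped period can be illustrated in Python. This is a sketch only: bash's =~ operator uses POSIX extended regular expressions, but this particular pattern is assumed to behave the same way in both flavors, and the name ending in Xout is made up for the demonstration.

```python
import re

UNESCAPED = r"abc_xyz_[0-9]{1,4}-[0-9]{1,2}-[0-9]{1,2}.out"
ESCAPED = r"abc_xyz_[0-9]{1,4}-[0-9]{1,2}-[0-9]{1,2}\.out"

def matches(pattern: str, name: str) -> bool:
    """True when the whole file name matches the pattern."""
    return re.fullmatch(pattern, name) is not None

print(matches(UNESCAPED, "abc_xyz_2016-12-02.out"))  # intended match
print(matches(UNESCAPED, "abc_xyz_2016-12-02Xout"))  # also accepted: '.' is a wildcard
print(matches(ESCAPED, "abc_xyz_2016-12-02Xout"))    # rejected once the period is escaped
print(matches(ESCAPED, "abc_xyz_2016_12_02.out"))    # underscore-separated date never matches
```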
However, if all you need is to separate two lots of files into two different directories, and each lot begins with a fixed string (I’m inferring this from your regex patterns, which start with abc_xyz and abc_xyz_test), could you not use a wildcard expression to move the latter group first, then the remaining group second?
So:
mv abc_xyz_test*.out /path/to/new/folder/
Then:
mv abc_xyz*.out /path/to/new/folder/
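Why the order of the two moves matters can be sketched with Python's fnmatch module, which applies shell-style wildcards to plain strings without touching the file system. The route function is a hypothetical helper for illustration.

```python
from fnmatch import fnmatchcase
from typing import List, Tuple

def route(names: List[str]) -> Tuple[List[str], List[str]]:
    """Split names the way the two mv commands would, narrow glob first."""
    test_group = [n for n in names if fnmatchcase(n, "abc_xyz_test*.out")]
    remaining = [n for n in names if n not in test_group]
    plain_group = [n for n in remaining if fnmatchcase(n, "abc_xyz*.out")]
    return test_group, plain_group

files = ["abc_xyz_2016-12-02.out", "abc_xyz_test_2016-12-02.out"]
print(route(files))
# The broad glob also matches the test file, so it must run second.
print(fnmatchcase("abc_xyz_test_2016-12-02.out", "abc_xyz*.out"))
```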

Regex for SublimeText Snippet

I've been stuck for a while on this Sublime Snippet now.
I would like to display the correct package name when creating a new class, using TM_FILEPATH and TM_FILENAME.
When printing TM_FILEPATH variable, I get something like this:
/Users/caubry/d/[...]/src/com/[...]/folder/MyClass.as
I would like to transform this output, so I could get something like:
com.[...].folder
This includes:
Removing anything before /com/[...]/folder/MyClass.as;
Removing the TM_FILENAME, with its extension; in this example MyClass.as;
And finally finding all the slashes and replacing them by dots.
So far, this is what I've got:
${1:${TM_FILEPATH/.+(?:src\/)(.+)\.\w+/\l$1/}}
and this displays:
com/[...]/folder/MyClass
I do understand how to replace slashes with dots, such as:
${1:${TM_FILEPATH/\//./g/}}
However, I'm having difficulty combining this logic with the previous one, as well as removing the TM_FILENAME at the end.
I'm really inexperienced with Regex, thanks in advance.
:]
EDIT: [...] indicates variable number of folders.
We can do this in a single replacement with some trickery. What we'll do is, we put a few different cases into our pattern and do a different replacement for each of them. The trick to accomplish this is that the replacement string must contain no literal characters, but consist entirely of "backreferences". In that case, those groups that didn't participate in the match (because they were part of a different case) will simply be written back as an empty string and not contribute to the replacement. Let's get started.
First, we want to remove everything up until the last src/ (to mimic the behaviour of your snippet - use an ungreedy quantifier if you want to remove everything until the first src/):
^.+/src/
We just want to drop this, so there's no need to capture anything - nor to write anything back.
Now we want to match subsequent folders until the last one. We'll capture the folder name, also match the trailing /, but write back the folder name and a period. But I said no literal text in the replacement string! So the period has to come from a capture as well. This is where the assumption comes into play that your file always has an extension. We can grab the period from the file name with a lookahead. We'll also use that lookahead to make sure that there's at least one more folder ahead:
^.+/src/|\G([^/]+)/(?=[^/]+/.*([.]))
And we'll replace this with $1$2. Now if the first alternative catches, groups $1 and $2 will be empty, and the leading bit is still removed. If the second alternative catches, $1 will be the folder name, and $2 will have captured a period. Sweet. The \G is an anchor that ensures that all matches are adjacent to one another.
Finally, we'll match the last folder and everything that follows it, and only write back the folder name:
^.+/src/|\G([^/]+)/(?=[^/]+/.*([.]))|\G([^/]+)/[^/]+$
And now we'll replace this with $1$2$3 for the final solution.
A conceptually similar variant would be:
^.+/src/|\G([^/]+)/(?:(?=[^/]+/.*([.]))|[^/]+$)
replaced with $1$2. I've really only factored out the beginning of the second and third alternatives.
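The backreference-only replacement can be approximated in Python as a rough check. Two caveats: Python's re module has no \G anchor, so it is dropped here (for a well-formed path the matches come out adjacent anyway), and the sample path fills the elided [...] parts with made-up folder names. Since Python 3.5, groups that did not participate are substituted as empty strings, which is the behavior the trick relies on.

```python
import re

# Three-alternative pattern from above, minus \G (not available in re).
PATTERN = re.compile(
    r"^.+/src/"                        # prefix up to the last src/: dropped
    r"|([^/]+)/(?=[^/]+/.*([.]))"      # folder with more folders ahead
    r"|([^/]+)/[^/]+$"                 # last folder plus the file name
)

def package_name(path: str) -> str:
    """Write back only captured text; unmatched groups become ''."""
    return PATTERN.sub(r"\1\2\3", path)

# Hypothetical path standing in for the elided one in the question.
print(package_name("/Users/caubry/d/project/src/com/example/folder/MyClass.as"))
```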
Finally, if Sublime is using Boost's extended format string syntax, it is actually possible to get characters into the replacement conditionally (without magically conjuring them from the file extension):
^.+/src/|\G/[^/]+$|\G(/)?([^/]+)
Now we have the first alternative for everything up to src (which is to be removed), the second alternative for the last slash and file name (which is to be removed), and the last alternative for all folders you want to keep. The file-name alternative is listed before the folder alternative because alternatives are tried from left to right, and the folder alternative would otherwise also match the final slash and file name. This time I put the slash to be replaced optionally at the beginning. With a conditional replacement we can write a . there if and only if that slash was matched:
(?1.:)$2
Unfortunately, I can't test this right now and I don't know an online tester that uses Boost's regex engine. But this should do the trick just fine.
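Since the Boost format string cannot easily be tested online, the conditional idea can at least be emulated in Python with a replacement function. This is a sketch under assumptions: Python has neither \G nor the (?1.:) syntax, so \G is dropped and the condition is written out by hand. The file-name alternative is placed before the folder alternative here, because with left-to-right alternation the folder alternative would otherwise also consume the final slash and file name. The sample path is hypothetical.

```python
import re

PATTERN = re.compile(
    r"^.+/src/"       # prefix up to the last src/: removed
    r"|/[^/]+$"       # final slash and file name: removed
    r"|(/)?([^/]+)"   # folder, optionally preceded by a slash
)

def replacement(match: "re.Match[str]") -> str:
    """Emulates (?1.:)$2: emit '.' only if the slash group matched."""
    dot = "." if match.group(1) else ""
    return dot + (match.group(2) or "")

def package_name(path: str) -> str:
    return PATTERN.sub(replacement, path)

print(package_name("/Users/caubry/d/project/src/com/example/folder/MyClass.as"))
```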

How to extract file location using Regular Expressions(VB.NET)

I am facing a problem whereby I am given a string that contains a path to a file and the file's name, and I only want to extract the path (without the file's name).
For example, I will receive something like
C:\Users\OopsD\Projects\test.acdbd
and from that string I want to extract only
C:\Users\OopsD\Projects
I was trying to create a RegEx to match a backslash followed by a word, followed by a dot followed by another word - this is to match the
\test.acdbd
part and replace it with empty string so that the final result is
C:\Users\OopsD\Projects
Can anyone, familiar with RegEx, help me on this one? Also, I will be using regular expressions quite a lot in the future. Is there a (free) program I can download to create regular expressions?
Are you really sure you need to be using Regex for such a simple task? How about this:
' FileInfo parses the path; Directory is the folder that contains the file.
Dim file As New IO.FileInfo("C:\Users\OopsD\Projects\test.acdbd")
MsgBox(file.Directory.FullName)
Regarding a free program for Regex, I would definitely recommend http://www.gskinner.com/RegExr/ - I use it all the time. But you should always consider the alternatives before going the Regex way.
The regex that you are looking for is as below; it matches the final backslash and the file name, which you can then replace with an empty string:
\\[^\\]+$
where,
\\ (escaped backslash):Matches a literal backslash, the path separator in your example. A pattern written with / would not work on this path, because the path contains no forward slashes.
[^\\] (negated character class):Matches any single character that is not a backslash. Inside square brackets a leading caret negates the class; it does not act as the start-of-string anchor here.
$ (dollar):Matches at the end of the string the regex pattern is applied to. Matches a position rather than a character. Most regex flavors have an option to make the dollar match before line breaks (i.e. at the end of a line in a file) as well. Also matches before the very last line break if the string ends with a line break.
+ (plus):Repeats the previous item once or more. Greedy, so as many items as possible will be matched before trying permutations with fewer matches of the preceding item, up to the point where the preceding item is matched only once.
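For comparison, here is a small Python sketch of both routes on the sample path. It assumes backslash-separated Windows paths, so the pattern is written with escaped backslashes; the function names are made up, and pathlib's PureWindowsPath stands in for the non-regex approach.

```python
import re
from pathlib import PureWindowsPath

SAMPLE = r"C:\Users\OopsD\Projects\test.acdbd"

def directory_by_regex(path: str) -> str:
    """Remove the last backslash and everything after it."""
    return re.sub(r"\\[^\\]+$", "", path)

def directory_by_pathlib(path: str) -> str:
    """Let the path library find the parent directory instead."""
    return str(PureWindowsPath(path).parent)

print(directory_by_regex(SAMPLE))
print(directory_by_pathlib(SAMPLE))
```

PureWindowsPath parses Windows-style paths on any operating system, so the sketch does not need to run on Windows.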
More reference material can be found at this link.
Many regex tools are available. Some of them are:
www.gskinner.com/RegExr/
www.txt2re.com
Rubular- It is not just for Ruby.