Regex for matching file suffix where filename contains multiple periods - regex

I am trying to match a series of suffixes, but our file names often contain multiple periods, for example "Document A V0.1.1.docx".
I have tried a number of solutions, but none seem to work for me, and I'm not a great regex user. The pattern at the end of this post is where I got to, but it fails to ignore the multiple periods in the filename.
The regex will be used in the DocFetcher program to exclude large numbers of files from indexing, where the contents of files with these suffixes are of no use to the person using DocFetcher.
Suggestions?
(?i).*\.(jar|doc|docx|etc...)
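A minimal sketch for checking this pattern outside DocFetcher, written in Python. It assumes the pattern has to match the whole file name (how DocFetcher applies the pattern is an assumption here, not something stated above). The suffix list is cut down to the three suffixes spelled out in the question, and the helper name `is_excluded` is hypothetical.

```python
import re

# Suffix list limited to the suffixes spelled out in the question.
EXCLUDE = re.compile(r"(?i).*\.(jar|doc|docx)")

def is_excluded(filename: str) -> bool:
    """Return True when the whole file name ends in one of the listed suffixes."""
    # fullmatch anchors the pattern at both ends, so the greedy .* swallows
    # every earlier period and only the final suffix is compared.
    return EXCLUDE.fullmatch(filename) is not None

for name in ["Document A V0.1.1.docx", "Document A V0.1.1.txt", "archive.JAR"]:
    print(name, "->", is_excluded(name))
```

If the tool searches for the pattern anywhere in the name instead of matching the whole name, appending `$` to the pattern should give the same effect.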

Related

matching the frame identifier of a limited file sequence with regex

I have some code that can use regex to filter and find a list of files.
I want to filter these files by their name, and select the final set of numbers in the filename.
For example, say I have this sequence of file names:
render1frame0001.png ... render1frame0200.png
Finding the last 4 digits isn't that hard. The issue is when the file name itself has numbers in it.
Some platforms pad frame counts to different widths, so I need it to be able to match more or fewer than 4 characters, while also ignoring any numbers in the file name.
So in the example above, it would match 0001 through 0200, ignoring the 1 inside the filename.
I don't have much experience with regex and this particular problem seems a little niche, so I don't exactly know what to do here.
Also, the files can possibly have other extensions, such as jpeg, so I guess it should also be able to work around different extensions.
Essentially I want to match the first occurrence of a group of numbers... but from the end of the string backwards. That behavior could possibly get around the extension and any extra numbers in the file name.
Is this possible?
Use lookahead and lookbehind to extract the numeric value:
regex = /(?<=\w+)\d+(?=\.[a-z]+)/gi (best) or /\d+(?=\.[a-z]+)/gi or /\d+(?=\.[a-z]+$)/gi (if you pass a single file name at a time).
Your input is: "render1frame0001.png ... render1frame0200.png".
You have to match the numbers just before the file extension ('.png').
For more details about regular expressions, see: Puzzling with Regular Expression
Since each string is a filename that includes the file extension, you should be able to use the regular expression below, regardless of the file type:
(\d+)\.[a-z]+$
Note: since the regular expression contains the $ character, each filename should be parsed individually.
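As a rough illustration, the expression can be tried in Python, one filename at a time. The helper name `frame_number` and the third file name are made up for this sketch.

```python
import re

# Digits immediately before the final extension; $ anchors at the end.
FRAME = re.compile(r"(\d+)\.[a-z]+$")

def frame_number(filename):
    """Return the digit run just before the extension, or None if there is none."""
    match = FRAME.search(filename)
    return match.group(1) if match else None

# The last name is a hypothetical example with extra digits and a jpeg extension.
for name in ["render1frame0001.png", "render1frame0200.png", "shot2_v3_15.jpeg"]:
    print(name, "->", frame_number(name))
```

The digit inside `render1frame` is skipped because it is not followed by a period and an extension.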

Regular Expression remove specific text in file name

I am using a file transfer tool that allows the use of regular expressions to rename files as they are copied into a new folder, so I am working with regular expressions only and not inside a code base. I have a large set of files with a specific naming convention, with a version number at the end of the file name. My goal is to remove this version number along with the underscore in front of it.
Here are some examples of the file names:
the_file_name_DS_017_EN_35.pdf
the_file_name_DS_037_SP_35.pdf
different_filename_DS_EN_5.pdf
I am looking to change them to:
the_file_name_DS_017_EN.pdf
the_file_name_DS_037_SP.pdf
different_filename_DS_EN.pdf
I am trying to remove the version number so that the file naming convention on my new server will always be the same. I am not good with regex and this is what I tried so far but to no avail:
Using _[^_]+$ selects everything from the last underscore onward, including the .pdf extension.
Using \_(.*?)\. selects from the first underscore up to the period.
How do I select the last underscore until the period removing that text but keeping the period? Maybe there is a better method? Thanks in advance!
If your regex engine supports positive lookaheads, you can match the following and replace it with nothing:
(_\d+)(?=\.pdf$)
Explanation:
(_\d+) matches an underscore followed by one or more digits
(?=\.pdf$) is a positive lookahead that requires the .pdf extension at the end of the file name without including it in the match
Alternatively, try this regular expression:
_[0-9]*\.
and replace it with
.
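A small Python sketch of the lookahead approach, run against the sample names from the question. The transfer tool's own regex engine may behave differently, and the helper name `strip_version` is hypothetical.

```python
import re

# Underscore plus digits, only when directly followed by ".pdf" at the end.
VERSION = re.compile(r"_\d+(?=\.pdf$)")

def strip_version(filename):
    """Remove the trailing _<digits> version marker, keeping the extension."""
    return VERSION.sub("", filename)

names = [
    "the_file_name_DS_017_EN_35.pdf",
    "the_file_name_DS_037_SP_35.pdf",
    "different_filename_DS_EN_5.pdf",
]
for name in names:
    print(name, "->", strip_version(name))
```

The `_017` and `_037` parts are left alone because they are not immediately followed by `.pdf`.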

Regex in bash scripting

I have two similar file names that need to go into different directories. I tried using the following regexes.
File 1: abc_xyz_2016_12_02.out
File 2: abc_xyz_test_2016-12-02.out
Regex used:
regex_abc_xyz="abc_xyz_[0-9]{1,4}-[0-9]{1,2}-[0-9]{1,2}.out"
regex_abc_xyz_test="abc_xyz_test_[0-9]{1,4}-[0-9]{1,2}-[0-9]{1,2}.out"
regex_abc_xyz works but regex_abc_xyz_test is failing.
Using your example test strings (the first of which I assume was mistyped, using underscores between the date components instead of hyphens), I entered these together with your regular expressions into RegEx 101. Both matched the appropriate filenames.
As one user stated, you ought to escape your period, i.e. \.out, but otherwise, your regular expressions are fine.
However, if all you need is to separate two sets of files into two different directories, and each set begins with a fixed string (I'm inferring this from your patterns, which start with abc_xyz and abc_xyz_test), could you not use a wildcard expression to move the latter group first, then the remaining group second?
So:
mv abc_xyz_test*.out /path/to/new/folder/
Then:
mv abc_xyz*.out /path/to/new/folder/
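For comparison, here is a sketch of the two patterns in Python rather than bash, with the period escaped as suggested. It assumes whole-name matching; the function name `classify` and the labels "test" and "main" are hypothetical.

```python
import re

# Patterns from the question, with the period before "out" escaped.
regex_abc_xyz = re.compile(r"abc_xyz_[0-9]{1,4}-[0-9]{1,2}-[0-9]{1,2}\.out")
regex_abc_xyz_test = re.compile(r"abc_xyz_test_[0-9]{1,4}-[0-9]{1,2}-[0-9]{1,2}\.out")

def classify(filename):
    """Return which group a file name belongs to, or None if neither matches."""
    if regex_abc_xyz_test.fullmatch(filename):
        return "test"
    if regex_abc_xyz.fullmatch(filename):
        return "main"
    return None

for name in [
    "abc_xyz_2016-12-02.out",
    "abc_xyz_test_2016-12-02.out",
    "abc_xyz_2016_12_02.out",  # underscores in the date, as typed in the question
]:
    print(name, "->", classify(name))
```

The name with underscores between the date components matches neither pattern, since both expect hyphens.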

Extracting substring in linux using expr and regex

So I have just begun learning regular expressions. I have to extract a substring within a large string.
My string is basically one huge line containing a lot of stuff. I have identified the pattern on which to base the extraction. I need the number in a line like this:
A lot of stuff<li>65,435 views</li>a lot of stuff
The number here is just an example.
This entire string is in fact one big line and my file views.txt contains a lot of such lines.
So I tried this,
while read p
do
y=`expr "$p": ".*<li>\(.*\) views "`
echo $y
done < views.txt
I wished to iterate over all such lines within this views.txt file and print out the numbers.
And I get a syntax error. I really have no idea what is going wrong here. I believe that I have correctly flanked the number by <li> and views including the spaces.
My (limited) interpretation of the above regex leads me to believe that it would output the number.
Any help is appreciated.
The syntax error is because the ":" is not separated from "$p" by a space (or tab). With that fixed, the regex has a trailing blank which will prevent it matching. Fixing those two problems, your sample script works as intended.
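The effect of the trailing blank can be seen with the same pattern in Python. This is a swapped-in illustration, not the expr command itself, and the helper name `extract_views` is hypothetical.

```python
import re

# Same idea as the expr pattern, without the trailing blank after "views".
VIEWS = re.compile(r".*<li>(.*) views")
# The pattern as originally written, with the trailing blank kept.
VIEWS_TRAILING_BLANK = re.compile(r".*<li>(.*) views ")

def extract_views(line, pattern=VIEWS):
    """Return the text between <li> and ' views', or None if nothing matches."""
    # re.match anchors at the start of the string, as expr does.
    match = pattern.match(line)
    return match.group(1) if match else None

line = "A lot of stuff<li>65,435 views</li>a lot of stuff"
print(extract_views(line))
print(extract_views(line, VIEWS_TRAILING_BLANK))
```

With the trailing blank, nothing matches because "views" is followed directly by `</li>` in the sample line.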

How to filter images and exclude filenames that include hyphen and image size in Filezilla?

I have a folder in my server that contains several images resized by WordPress. I want to upload only the images that are not resized. Here's an example:
File Names :
2CVictoria_and_Albert_Museum_London-708x400.jpg
2CVictoria_and_Albert_Museum_London-336x190.jpg
2CVictoria_and_Albert_Museum_London.jpg
I want only files that end with .jpg without the -708x400 and 336x190
2CVictoria_and_Albert_Museum_London.jpg
How do I create a filter in Filezilla that will ignore all the images that have been resized and show only the original image? Please include all the filter rules if you can.
Based on your criteria, this should work.
^[[:alnum:]_]+\.jpg$
and after the edit requested:
^[[:alnum:]_-]+[^-0-9x]+\.jpg
and after the second edit requested:
^[[:alnum:]_-]+[^x]{5}\.jpg
I don't have Filezilla installed at the moment, but Filezilla uses POSIX regular expressions. This means we can't use lookaheads, but that shouldn't be a problem.
The expression uses a character class [[:alnum:]_] with a quantifier, allowing it to match any number of letters, digits and underscores. Then we match a period, jpg and the end of the string, so any suffix using a dash will fail.
In the second expression, before the .jpg, we make sure to match any number of characters that are not dashes, digits or xs.
In the third expression, the [^x]{5} is a bit of a hack: it ensures there is no "x" character in the last five characters, therefore excluding files ending in 300x225.jpg for instance. This also wrongly excludes some files (false negatives), such as myphotox.jpg, as well as short file names such as abc.jpg
One way to get around the short file problem is the expression below, which also accepts filenames of one to four characters that don't include an x. Still a hack.
^[[:alnum:]_-]+[^x]{5}\.jpg|^[^x]{1,4}\.jpg
If there are exceptions, let me know so we can tweak the regex.
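A quick way to sanity-check the first and third expressions is sketched below in Python. Python's `re` module does not support POSIX classes such as `[[:alnum:]]`, so the class is rewritten as `A-Za-z0-9` (an ASCII-only assumption), and Filezilla's own matching may differ. The helper name `check` is hypothetical.

```python
import re

# [[:alnum:]] rewritten as A-Za-z0-9 because Python's re lacks POSIX classes.
FIRST = re.compile(r"^[A-Za-z0-9_]+\.jpg$")
THIRD = re.compile(r"^[A-Za-z0-9_-]+[^x]{5}\.jpg")

def check(pattern, filename):
    """Return True when the pattern is found in the file name."""
    return pattern.search(filename) is not None

names = [
    "2CVictoria_and_Albert_Museum_London-708x400.jpg",
    "2CVictoria_and_Albert_Museum_London-336x190.jpg",
    "2CVictoria_and_Albert_Museum_London.jpg",
    "myphotox.jpg",
    "abc.jpg",
]
for name in names:
    print(name, "first:", check(FIRST, name), "third:", check(THIRD, name))
```

Only the third expression rejects `myphotox.jpg` and `abc.jpg`, which is the false-negative behaviour described above.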