delete file in pentaho - regex

I need to delete some ".text" files, but I cannot delete all files with this extension. I wanted to use regular expressions to select the files by filename. Can you help me?

There are several steps you can use to achieve that. The easiest is the job step Delete Files, where you can specify multiple folders and a RegExp for each individual folder.
Example RegExp that ignores the numbers in the name: hs_err_pid.*\.txt
It matches any file name that begins with hs_err_pid and ends in .txt, with any number of characters (including none) between hs_err_pid and .txt.
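To check what that pattern would select before pointing the step at a real folder, here is a minimal Python sketch. It only tests names and deletes nothing. It assumes the pattern has to match the whole file name, and the sample names are made up for illustration.

```python
import re

# Same pattern as in the answer. fullmatch requires the whole name to match
# (an assumption about how the tool applies the pattern).
PATTERN = re.compile(r"hs_err_pid.*\.txt")

def files_to_delete(names):
    """Return the names the pattern selects; nothing is deleted here."""
    return [name for name in names if PATTERN.fullmatch(name)]

# Hypothetical file names, for illustration only.
sample = ["hs_err_pid1234.txt", "hs_err_pid.txt", "notes.txt", "hs_err_pid99.log"]
print(files_to_delete(sample))  # ['hs_err_pid1234.txt', 'hs_err_pid.txt']
```

Note that hs_err_pid.txt is selected too, because .* also matches an empty string.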

Related

Regex for file name in a directory

I have two files in a directory: FileAbc_1.xml and FileAbc.xml. I want to write a regex that selects only FileAbc_1.xml.
My regex is: FileAbc.*.xml
It picks up both file names, but I only want FileAbc_1.xml. Any help would be greatly appreciated.
This will work for you (note the escaped dot, so that it only matches a literal "."):
FileAbc_[0-9]+\.xml
That should just be: FileAbc_\d\.xml
(assuming there's never more than one digit after the underscore)
For anything that starts with FileAbc and ends with .xml, with at least one character in between, you can go with FileAbc.+\.xml
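A quick way to compare the suggestions is to run each pattern against both file names. This is a small Python sketch that assumes whole-name matching; the labels in the dictionary are just for illustration.

```python
import re

names = ["FileAbc_1.xml", "FileAbc.xml"]

# The asker's pattern followed by the three suggested ones.
patterns = {
    "original": r"FileAbc.*.xml",
    "digits": r"FileAbc_[0-9]+\.xml",
    "one digit": r"FileAbc_\d\.xml",
    "any suffix": r"FileAbc.+\.xml",
}

def matches(pattern):
    """Names that the pattern matches in full."""
    return [n for n in names if re.fullmatch(pattern, n)]

results = {label: matches(p) for label, p in patterns.items()}
for label, hits in results.items():
    print(label, hits)
```

Only the original pattern picks up FileAbc.xml: its .* can match nothing and the unescaped dot then matches the "." before xml.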

Target file names using Regex

If I have a list of file names in an XML file and want to remove all instances where the file name doesn't have a file extension, how can I do this using regular expressions? I need to do the replace in TextWrangler and have no other option, unfortunately.
For example, if I have a list like this in the XML:
<name>AAA_A026C032_150522_R4RO.mov</name>
<name>BBB_A016D032_150809_R4RO.aiff</name>
<name>CCC_A026C038_151010_R4RO</name>
<name>DDGS_A006C132_150409_R4RO.mp3</name>
<name>EFFD_B026C001_150607_R4RO</name>
<name>FGHG_A026C032_141215_R4RO.cine</name>
How can the files without a file extension be targeted using regular expressions? I would like to replace these (clear them) in the output document.
Thanks in advance,
Matt
'(?!>\w+\.[a-zA-Z0-9]+)>(\w+)'
This pattern captures the names of the files without extensions in its first capturing group. I don't know how to use TextWrangler, but I assume that once you can match the file name string, you can probably work out the replacement from there.
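To see what the pattern does on the sample list, here is a rough Python sketch (not TextWrangler, just the same regex run through Python's re module). Replacing each match with a bare ">" is my assumption about what "clear them" should mean; it leaves empty <name></name> elements behind.

```python
import re

xml = """<name>AAA_A026C032_150522_R4RO.mov</name>
<name>BBB_A016D032_150809_R4RO.aiff</name>
<name>CCC_A026C038_151010_R4RO</name>
<name>DDGS_A006C132_150409_R4RO.mp3</name>
<name>EFFD_B026C001_150607_R4RO</name>
<name>FGHG_A026C032_141215_R4RO.cine</name>"""

# Pattern from the answer: a ">" that is NOT followed by "name.extension",
# then the extension-less name in group 1.
NO_EXTENSION = re.compile(r"(?!>\w+\.[a-zA-Z0-9]+)>(\w+)")

# Names without an extension.
found = NO_EXTENSION.findall(xml)

# Clear them: keep the ">" and drop the name.
cleared = NO_EXTENSION.sub(">", xml)

print(found)
print(cleared)
```

In an editor, the equivalent would be searching for the pattern and replacing with ">".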

Regex expression to match a string but exclude something at the same time

I want to ask this as concisely as possible, so please forgive me if I'm leaving something out. I want the expression to match all cases except where an exact filename string is present.
The backup software I'm using supports regular expressions, and I want to set up an exclusion that skips every file of a particular extension, except for certain files that I need to back up, so those must not match.
For this example, the files I want to exclude are *.FLV:
(?i).*\.flv
I want to include three files in my backups: abc123.flv, ghk432.flv, and fdw917.flv
This is where I'm having trouble, even with including just one of the three files in the backup:
(?i).*\.flv^(?!(abc123\.flv))&
The expression is being added to an Exclusion List for Code42 CrashPlan backup; their support unfortunately cannot assist with complex regular expressions.
The closest thing I can supply as an example is their Example 3: Using An Exclude To Include:
.*/Documents/((?!(.*\.(doc|rtf)|.*/)$).)*$
http://support.code42.com/Administrator/3.6_And_4.0/Configuring/Using_Include_And_Exclude_Filters
However, it excludes all files within directories named "Documents" and includes any files in those folders with doc or rtf file extensions. I'm trying to create an expression that works on file extensions regardless of folder location.
Logically, it seems like I need to write this as some kind of if-then-else statement, but regex is not my forte.
Use an anchored negative look ahead with an alternation for the files you want to keep:
^(?i)(?!.*(abc123|ghk432|fdw917)\.flv).*\.flv
The negative lookahead asserts that the following input does not match its regex, and the pipe character means "or".
Try to put the negative lookahead at the position of the filename in the path:
^([^/]*/)*(?!(abc123|ghk432|fdw917)\.flv$)[^/]*\.flv$
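The difference between the two suggestions shows up on a name that merely ends in one of the protected names. Below is a small Python sketch with made-up paths. It assumes the exclusion pattern must match the whole path, which may not be exactly how CrashPlan applies it, and it passes the (?i) flag as re.IGNORECASE instead of writing it inline.

```python
import re

# First suggestion: the lookahead may find a protected name anywhere in the path.
KEEP_ANYWHERE = re.compile(
    r"^(?!.*(abc123|ghk432|fdw917)\.flv).*\.flv", re.IGNORECASE
)
# Second suggestion: the lookahead sits exactly where the file name starts.
KEEP_EXACT = re.compile(
    r"^([^/]*/)*(?!(abc123|ghk432|fdw917)\.flv$)[^/]*\.flv$", re.IGNORECASE
)

def excluded(pattern, path):
    """True if the path matches the exclusion pattern (so it is NOT backed up)."""
    return pattern.fullmatch(path) is not None

# Hypothetical paths, for illustration only.
for path in ["/videos/clip.flv", "/videos/abc123.flv", "/videos/xabc123.flv"]:
    print(path, excluded(KEEP_ANYWHERE, path), excluded(KEEP_EXACT, path))
```

Both patterns exclude clip.flv and keep abc123.flv, but only the second one excludes xabc123.flv, because the first treats any name ending in abc123.flv as protected.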

Find all file names that match a pattern

I am trying to find a way to list all file names in a folder that match this pattern:
20131106XXXXX.pdf
The prefix is the date, the content and length of XXXXX vary across files, and I only care about pdf files.
Could anyone advise a way to do this?
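Try this (anchored with ^ and $ so that the eight digits must start the name and .pdf must end it):
list.files(path="./yourdir", pattern="^[[:digit:]]{8}.*\\.pdf$")
You can use a regex with dir() as well:
files <- dir(pattern="^[0-9]{8}.*\\.pdf$")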
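To see which names such a pattern accepts without touching a real folder, the same regex can be tried on a plain list. This sketch uses Python rather than R, purely for illustration, with made-up file names and the pattern anchored at both ends.

```python
import re

# Eight digits at the start, anything in between, ".pdf" at the very end.
DATED_PDF = re.compile(r"^[0-9]{8}.*\.pdf$")

def dated_pdfs(names):
    """Keep only names that look like <8 digits><anything>.pdf."""
    return [n for n in names if DATED_PDF.search(n)]

# Hypothetical file names, for illustration only.
sample = [
    "20131106report.pdf",
    "20131106.pdf",
    "x20131106a.pdf",     # digits are not at the start
    "20131106a.pdf.bak",  # does not end in .pdf
    "2013110a.pdf",       # only seven digits
]
print(dated_pdfs(sample))  # ['20131106report.pdf', '20131106.pdf']
```

Without the ^ and $ anchors, the third and fourth names would be accepted as well.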

Append Filename of Txt Files on a Line Inside the File Notepad++

I'm not a coder/programmer, so my knowledge of regex is limited to what I can find on Google and sites like Stack Overflow.
I have a series of files, around 10k, with different filenames. I want to put the filename of each file into a line within that txt file, preferably on the first or last line of the file.
So if I have a file with the filename "Caste System in Nepal.txt", I want to see "Caste System in Nepal" on either the first or second line of the txt, without the quotes.
Can anybody help me? Thanks a lot. :)
Have a go with the following:
(.+?)\.\w+
The bit in the brackets you'd then use for your file name. This assumes that there is only one dot in the filename before the extension. Otherwise, if you have filenames like Document.name.txt, you'll need a more complex regex.
I'm not sure you can do this from within Notepad++, as you'll need the list of files to begin with. If you already have this, you can use find and replace to find .txt (if they all have the .txt extension) and replace it with nothing. If you have other extensions, try a regexp like \.[^.]+$ and replace it with nothing. That should match the last full stop and everything after it.
If you don't have the list of files in your text editor, you can get it from a Bourne-compatible shell with something like find . -type f -print > ../file-you-want-the-list-in.txt, run in the directory that contains all the files.
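If a small script is an option instead of Notepad++, the whole task can be sketched in Python. This is only a sketch under some assumptions: the files are UTF-8 text, they all sit directly in one folder, and the name should go on the first line. The function name prepend_filename is made up for this example. Try it on a copy of the folder first.

```python
from pathlib import Path

def prepend_filename(folder):
    """Insert each .txt file's name (without extension) as its first line."""
    changed = []
    for path in sorted(Path(folder).glob("*.txt")):
        # Assumption: the files are UTF-8 encoded.
        text = path.read_text(encoding="utf-8")
        # stem drops only the last extension:
        # "Caste System in Nepal.txt" -> "Caste System in Nepal"
        title = path.stem
        if text.split("\n", 1)[0] == title:
            continue  # title line already present, so reruns change nothing
        path.write_text(title + "\n" + text, encoding="utf-8")
        changed.append(path.name)
    return changed

# Usage (the folder path is a placeholder):
# prepend_filename("path/to/copy-of-your-folder")
```

Because Path.stem only strips the last extension, a name like Document.name.txt becomes Document.name, which avoids the multiple-dot problem mentioned above.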