Kettle wildcard subdirectory regex

I'm trying to process a file in a Kettle transformation. The target file has a static name, say TARGETED.LOG, and it sits in a subdirectory whose name contains a variable date component. So the full path will be something like:
c:\username\kettleworkspace\report_[DDMMYYYY]\TARGETED.LOG.
Any advice?

Use the Get File Names step with the Include subfolders option, and pass the resulting list of files to your Text File Input step with the Accept filenames from previous step option enabled.
Between these two steps you will probably want to add a filter step so that only the file you need gets through.
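
The filtering idea can be sketched outside Kettle as follows. This is only an illustration in Python, not Kettle configuration: the function name select_targeted and the pattern report_\d{8} (eight digits standing in for DDMMYYYY) are assumptions made for the example.

```python
import re
from pathlib import PureWindowsPath

# Assumed folder naming: "report_" followed by eight digits (DDMMYYYY).
REPORT_DIR = re.compile(r"report_\d{8}")


def select_targeted(paths):
    """Keep paths named TARGETED.LOG whose parent folder is report_<8 digits>."""
    selected = []
    for raw in paths:
        path = PureWindowsPath(raw)
        # Check the fixed file name and the dated parent folder separately.
        if path.name == "TARGETED.LOG" and REPORT_DIR.fullmatch(path.parent.name):
            selected.append(raw)
    return selected
```

The same pattern could serve as the regular expression for the file list or for the filter between the two steps, but how it is entered there depends on your Kettle version.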

Related

I wonder if I can perform data-pipeline by directory of a specific name with DataFusion

I'm using google-cloud-platform data fusion.
Assuming that the bucket's path is as follows:
test_buk/...
In the test_buk bucket there are four files:
20190901, 20190902
20191001, 20191002
Let's say there is a directory inside test_buk called dir.
One bundle of files shares the prefix 201909 (e.g., 20190901, 20190902),
and another bundle shares the prefix 201910 (e.g., 20191001, 20191002).
I'd like to run the data pipeline for the 201909 and 201910 bundles.
Here's what I've tried:
I set the regex path filter to
gs://test_buk/dir//2019 and ran the data pipeline.
With the regex path filter set, no input is read, and consequently there is no output.
How do I restrict a Data Fusion pipeline to one bundle of files within a specific directory?
If you use the raw path (gs://test_buk/dir/) directly in the filter, the special characters in it may not be escaped correctly in the regex. That could be why no input file matches your filter and nothing reaches the pipeline.
I suggest using ".*" to match the initial part instead (since you also specify the path, no files in other folders will match the filter).
So I would use the following expressions, depending on which group of files you want (feel free to change the file extension):
path = gs://test_buk/dir/
regex path filter = .*201909.*\.csv or .*201910.*\.csv
If you would like to know more about the regex used, you can take a look at (1)
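
To check such an expression before putting it in the pipeline, it can be tried against a few sample paths. The sketch below uses Python's re module and assumes the filter must match the whole path, which is an assumption about how Data Fusion applies it; the sample paths and the helper name filter_paths are made up for the example.

```python
import re


def filter_paths(paths, pattern):
    """Return the paths that the pattern matches in full."""
    regex = re.compile(pattern)
    # fullmatch mimics a filter that must cover the entire path string.
    return [p for p in paths if regex.fullmatch(p)]


# Hypothetical sample paths, using the .csv extension from the expressions above.
SAMPLE_PATHS = [
    "gs://test_buk/dir/20190901.csv",
    "gs://test_buk/dir/20190902.csv",
    "gs://test_buk/dir/20191001.csv",
    "gs://test_buk/dir/20191002.csv",
]

september = filter_paths(SAMPLE_PATHS, r".*201909.*\.csv")
october = filter_paths(SAMPLE_PATHS, r".*201910.*\.csv")
```

Under that assumption, a filter such as gs://test_buk/dir//2019 matches none of the sample paths, because it does not cover the rest of the file name.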

Pick up a particular file from a directory using regex in Talend

My directory contains files named WM_PersonFile_22022018, WM_PersonFile_23022018, WM_PersonFile_24022018, WM_PersonFile_25022018, and new files arrive daily. I am using tFileList to iterate through the files.
What regex should I use in the Filemask to pick up the most recent file? Should Use Glob Expressions as Filemask be unchecked?
I tried "*.txt", which picks up all the files.
A regex will help you filter for the right files, but picking the newest one needs some other logic. With tFileList, you may be able to sort by date and take only the first result.
Alternatively, if you also want to check that the date in the filename is correct, you might need to add a little logic with a tMap, tAssert, tJava or tJavaRow.
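
The "little logic" could look roughly like the following. It is a Python sketch of the idea rather than Talend code, and it assumes the eight digits are DDMMYYYY (as the sample names suggest); the function name newest_file is hypothetical. In a Talend job the equivalent would go into a tJava or tJavaRow component.

```python
import re
from datetime import datetime

# Assumed naming scheme: WM_PersonFile_ followed by DDMMYYYY.
NAME_PATTERN = re.compile(r"WM_PersonFile_(\d{8})")


def newest_file(names):
    """Return the name carrying the latest valid date, or None if there is none."""
    dated = []
    for name in names:
        match = NAME_PATTERN.search(name)
        if match is None:
            continue  # not one of the daily files
        try:
            stamp = datetime.strptime(match.group(1), "%d%m%Y")
        except ValueError:
            continue  # eight digits that do not form a real date
        dated.append((stamp, name))
    return max(dated)[1] if dated else None
```

Parsing the date matters because the names do not sort chronologically as plain text: WM_PersonFile_01032018 is newer than WM_PersonFile_25022018 but sorts before it.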

Reading dates from filenames

I want to extract dates from the suffixes of files in a particular folder. The contents of such a folder look something like:
Packed_Folder_1_2016.06.10
Packed_Folder_1_2016.08.06
Packed_Folder_1_2015.09.03
packed_Folder_1_2015.01.08
... (so on and so forth, always in the same path just different suffixes)
There is no pattern to the dates. I need to make a VS form (2013) that reads the names of the files and stores the date differences.
Notice how the filenames always follow a pattern? It's always Packed_Folder_1_####.##.##, where the last part is a date.
So what you want to do is list the file names in the folder, and try to find a file that matches the pattern. You could use a regular expression to match the filename (it would be something like R"(Packed_Folder_1_\d{4}\.\d{2}\.\d{2})").
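
As an illustration of that approach, here is a short sketch. It is written in Python rather than the Visual Studio language the question targets, it assumes the suffix is YYYY.MM.DD, and it matches case-insensitively because one sample name starts with a lowercase "packed". The function names are made up for the example.

```python
import re
from datetime import date

# Same pattern as above, with groups added to capture year, month and day.
PATTERN = re.compile(r"Packed_Folder_1_(\d{4})\.(\d{2})\.(\d{2})", re.IGNORECASE)


def extract_dates(names):
    """Return the dates found in matching names, sorted oldest first."""
    dates = []
    for name in names:
        match = PATTERN.fullmatch(name)
        if match:
            year, month, day = (int(part) for part in match.groups())
            # date() raises ValueError if the digits are not a real date.
            dates.append(date(year, month, day))
    return sorted(dates)


def day_gaps(dates):
    """Return the number of days between each pair of consecutive dates."""
    return [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
```

Passing the folder listing to extract_dates and the result to day_gaps gives the differences to store.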
You are talking about Forms, so I am assuming you are able to use Visual C++. If that is the case, you can look at the FileSystemWatcher class.
You instantiate it with a given path (file or directory), and it raises events when the target changes (modification, creation, rename; you can select which ones). You can then update your reference whenever a change is relevant to you.

CFZip of certain file type

Is it possible to use cfzip to create a zip file containing only files of a certain type? I need to pull the .bak files out of a folder that holds several file types and put them into a zip file.
Thanks
Colin
Steve already pointed you to the CF documentation, which has an example of the delete action. To create a zip, the simplest way is as follows:
<!--- Unique archive name, so repeated or concurrent runs do not collide --->
<cfset fileName = createUUID() />
<!--- Zip only the .bak files in D:\myfolder, without descending into subfolders --->
<cfzip file="D:\#fileName#.zip" action="zip"
       source="D:\myfolder"
       filter="*.bak"
       recurse="no" />
If you want to include files from subdirectories, set recurse="yes". To filter on multiple file types, use a comma-separated list in filter, like this: filter="*.jpg, *.gif, *.png"
Note: I used a dynamic file name in case you want to run this script multiple times with different file names, or in case multiple users access the script at the same time.

Name exported files with names list from txt-file using command prompt

I have tons of files and I want to convert them to another format using the command prompt.
After exporting each converted file, the program needs to name it.
I have a txt document that contains a list of the future file names.
So I need to name the exported files one by one: the first gets the first name in the list, the second gets the second name, the third gets the third, and so on.
EXAMPLE: I have these files to import in the directory
00001.umg
00002.umg
00003.umg
00004.umg
00005.umg
00006.umg
00007.umg
(and so on)
I have a txt document looking like this
Great name.brt
Wonderful.brt
Most beautiful.brt
Beautiful File.brt
Random File.brt
Define.brt
Excellent file.brt
...
(and so on)
So, after conversion, I need the exported files to be named after the strings in the txt document, in the same order. The first file must take the first name in the txt document, the second file the second name, and so on.
Thank you!
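
The pairing described above could be sketched like this. It is in Python rather than a command-prompt batch file, and it only builds the list of (source file, new name) pairs; the conversion itself is left to whatever program does the export. The function name pair_names is hypothetical, and sorting the source files by name is an assumption about what "first" and "second" mean.

```python
def pair_names(source_files, name_lines):
    """Pair each source file, in sorted order, with the name on the same line of the list."""
    ordered = sorted(source_files)
    # Drop blank lines and trailing newlines from the txt document.
    names = [line.strip() for line in name_lines if line.strip()]
    if len(names) < len(ordered):
        raise ValueError("the name list is shorter than the file list")
    return list(zip(ordered, names))
```

Each pair can then be handed to the converter as its input file and output name, or used to rename the exported files afterwards.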