how to use value of #[server.dateTime.weekOfYear] in mule-app.properties or configuration xml - regex

I am downloading files from FTP. I can download files that match a defined pattern or name and then process them in Java.
The problem is that I need to download a new file every week. The file name looks like "constant-prefix-2013-W51.zip". My current XML is:
<ftp:inbound-endpoint
host="${ftp.host}"
port="${ftp.port}"
path="${ftp.pathInbound}"
user="${ftp.user}"
password="${ftp.password}"
responseTimeout="10000"
doc:name="KBB_FTP" >
<file:filename-wildcard-filter pattern="MyFile-2013-W51.zip"/>
</ftp:inbound-endpoint>
Flow Reference: Mule: How to pass File from FTP to Java class in Mule ESB?
This code downloads the requested file successfully, but I need to add the year and week values to the file pattern dynamically.
I have tried the following patterns without success:
1. pattern="MyFile-2013-W#[server.dateTime.weekOfYear].zip"
2. pattern="MyFile-2013-W${server.dateTime.weekOfYear}.zip"
I know the second pattern is wrong, because it does not refer to a property defined in a .properties file. I also added a property to mule-app.properties like this:
calendar.weekOfYear=#[server.dateTime.weekOfYear]
and used the following pattern:
3. pattern="MyFile-2013-W${calendar.weekOfYear}.zip"
None of these works. I want the year (e.g. 2013) and the week (e.g. 51) to be inserted dynamically, but in every case the expression text itself ends up in the file name literally instead of being replaced by digits.

The file:filename-wildcard-filter does not support MEL expressions. Use an expression-filter instead, like this:
<message-filter throwOnUnaccepted="false">
<expression-filter
expression="#[message.outboundProperties.originalFilename == 'MyFile-2013-W'+server.dateTime.weekOfYear+'.zip']" />
</message-filter>
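To check which file name the filter should accept in a given week, the expected name can be computed outside Mule. The following is a minimal Python sketch, not Mule code. It assumes ISO-8601 week numbering and a two-digit, zero-padded week, neither of which is confirmed by the question, and `expected_filename` is a hypothetical helper.

```python
from datetime import date


def expected_filename(day: date, prefix: str = "MyFile") -> str:
    """Build a name like 'MyFile-2013-W51.zip' for the week containing `day`.

    Assumption: ISO-8601 week numbering, week zero-padded to two digits.
    """
    iso_year, iso_week, _ = day.isocalendar()
    return f"{prefix}-{iso_year}-W{iso_week:02d}.zip"


if __name__ == "__main__":
    print(expected_filename(date(2013, 12, 18)))  # MyFile-2013-W51.zip
    print(expected_filename(date(2014, 12, 29)))  # MyFile-2015-W01.zip
```

Around New Year the ISO year can differ from the calendar year (29 December 2014 falls in week 1 of 2015), so if the year also becomes dynamic it should come from the same week calculation as the week number.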

Use an expression filter and build the file name from #[server.dateTime.getWeekOfYear()] followed by .zip.
You can adapt this expression to the date format you need.

ColdFusion CF2021 xmlParse(file) returning wddx encoded object

UPDATE: After doing some more poking around, it looks as though the problem has to do with where CF is looking for the DTD file referenced in the XML.
We have the DTDs, but it looks as though CF isn't finding them, so it isn't sure how to parse the XML according to the DTD. I determined this by having it parse XML without any DTD, and it worked as expected and as I wanted - returning a parsed xmlDoc, not a string.
Is there some way of setting the default directory for where CF should look for the DTD specified in the XML?
We're running CF2021, and xmlParse(file), which should return a parsed XML object, is instead returning the file contents as a string inside a wddx-encoded object. We have just migrated from a CF2018 server running on a remote hosting service to CF2021 running on an AWS box.
In order to return the XML object we need, I need to run xmlParse on the file, then wddx2cfml on the object, then xmlParse again on the string.
Is there a reason why xmlParse, which should return a parsed XML object, is instead behaving this way?
We pass the system file location to the method. Call it docPath, and it'd look something like g:\appName\xmlFiles\20230125.xml
Then we have, in cfscript:
doc = xmlParse(docPath);
When I dump that to a file, I get what I described above. When I change it to the following, I get what I want:
docFile = xmlParse(docPath);
cfwddx(action="wddx2cfml", input="#docFile#", output="xmlString");
xmlDoc = xmlParse(xmlString);
But I don't understand why this is necessary, and I'm concerned about having to change it everywhere in the code that we use xmlParse. For the record, this also occurs in tagged CF as well as cfscript, so it's not that.
Putting the dtd files in CF's WEB-INF folder solved the problem. CF was able to match the DTD with the DOCTYPE and properly parse the XML.

I wonder if I can perform data-pipeline by directory of a specific name with DataFusion

I'm using Google Cloud Data Fusion.
Assume the bucket's path is as follows:
test_buk/...
In the test_buk bucket there are four files:
20190901, 20190902
20191001, 20191002
Let's say there is a directory inside test_buk called dir.
I have one bundle of files that share the prefix 201909 (e.g. 20190901, 20190902)
and another bundle that shares the prefix 201910 (e.g. 20191001, 20191002).
I'd like to run the data pipeline for the 201909 and 201910 bundles.
Here's what I've tried: I set the regex path filter to
gs://test_buk/dir//2019
and ran the data pipeline.
With the regex path filter set, no input is read, and consequently there is no output.
How do I set up a pipeline in Data Fusion that processes only a specific bundle within a directory?
If you put the raw path (gs://test_buk/dir/) directly into the filter, its special characters may not be escaped correctly for the regex. That could be why no input file matches your filter and reaches the pipeline.
I suggest using ".*" to match the initial part instead (since you are also specifying the path, files in other folders will not match the filter).
Therefore, I would use the following expressions depending on the group of files you want to use (feel free to change the extension of the files):
path = gs://test_buk/dir/
regex path filter = .*201909.*\.csv or .*201910.*\.csv
If you would like to know more about the regex used, you can take a look at (1)
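The suggested filters can be tried against sample object paths before running the pipeline. This is a rough Python sketch, not Data Fusion code. It assumes the filter has to match the whole object path, and the `.csv` file names and the `select` helper are hypothetical.

```python
import re

# Hypothetical object paths; the .csv extension is an assumption.
PATHS = [
    "gs://test_buk/dir/20190901.csv",
    "gs://test_buk/dir/20190902.csv",
    "gs://test_buk/dir/20191001.csv",
    "gs://test_buk/dir/20191002.csv",
]


def select(pattern: str, candidates: list[str]) -> list[str]:
    """Return the candidates that the pattern matches in full."""
    regex = re.compile(pattern)
    return [path for path in candidates if regex.fullmatch(path)]


if __name__ == "__main__":
    print(select(r".*201909.*\.csv", PATHS))          # the two September files
    print(select(r".*201910.*\.csv", PATHS))          # the two October files
    print(select(r"gs://test_buk/dir//2019", PATHS))  # nothing matches
```

Under the whole-path assumption, the filter from the question selects nothing because it neither covers the rest of the file name nor matches the single slash after dir.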

Pick up a particular file from a directory using regex in Talend

My directory contains files named WM_PersonFile_22022018, WM_PersonFile_23022018, WM_PersonFile_24022018 and WM_PersonFile_25022018; a new file arrives every day. I am using tFileList to iterate through the files.
What regex should I use in the Filemask to pick up the most recent file? Should "Use Global Expressions as Filemask" be unchecked?
I tried "*.txt", which picks up all the files.
A regex will help you filter for the right set of files, but picking the newest one takes some other logic.
With tFileList, you may be able to sort by date and take only the first result.
Alternatively, if you also want to check that the date in the file name is correct, you may need to add a little logic with a tMap, tAssert, tJava or tJavaRow.
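The "little logic" for choosing the newest file by the date in its name could look like the sketch below. It is written in Python for illustration only; in Talend the same idea would sit in a tJava or tJavaRow. It assumes the suffix is in ddMMyyyy order (22022018 read as 22 February 2018), and `newest_file` is a hypothetical helper.

```python
import re
from datetime import datetime

# Assumption: the eight digits after the prefix are ddMMyyyy.
NAME_RE = re.compile(r"WM_PersonFile_(\d{8})")


def newest_file(names):
    """Return the name carrying the latest valid date, or None if none qualify."""
    dated = []
    for name in names:
        match = NAME_RE.search(name)
        if match is None:
            continue
        try:
            day = datetime.strptime(match.group(1), "%d%m%Y").date()
        except ValueError:
            continue  # digits that do not form a real date are skipped
        dated.append((day, name))
    return max(dated)[1] if dated else None


if __name__ == "__main__":
    files = [
        "WM_PersonFile_24022018.txt",
        "WM_PersonFile_25022018.txt",
        "WM_PersonFile_01032018.txt",
    ]
    print(newest_file(files))  # WM_PersonFile_01032018.txt
```

Parsing the date matters because a plain string sort on ddMMyyyy names would rank 25022018 above 01032018.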

Reading dates from filenames

I want to extract dates from the suffixes of files in a particular folder. The contents of such a folder look something like:
Packed_Folder_1_2016.06.10
Packed_Folder_1_2016.08.06
Packed_Folder_1_2015.09.03
packed_Folder_1_2015.01.08
... (so on and so forth, always in the same path just different suffixes)
There is no pattern to the dates. I need to build a form in Visual Studio 2013 that reads the file names and stores the differences between the dates.
Notice how the filenames always follow a pattern? It's always Packed_Folder_1_####.##.##, where the last part is a date.
So what you want to do is list the file names in the folder, and try to find a file that matches the pattern. You could use a regular expression to match the filename (it would be something like R"(Packed_Folder_1_\d{4}\.\d{2}\.\d{2})").
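As an illustration of the matching step, here is a short sketch in Python rather than the C++ the question targets. It assumes the suffix is year.month.day, matches case-insensitively because one of the sample names starts with a lowercase "p", and uses hypothetical helper names.

```python
import re
from datetime import date

# Same pattern as above, with capture groups; case-insensitive on purpose.
PATTERN = re.compile(r"Packed_Folder_1_(\d{4})\.(\d{2})\.(\d{2})$", re.IGNORECASE)


def extract_dates(names):
    """Return the dates found in matching names, sorted ascending."""
    found = []
    for name in names:
        match = PATTERN.search(name)
        if match:
            year, month, day = map(int, match.groups())
            found.append(date(year, month, day))
    return sorted(found)


def day_gaps(dates):
    """Return the number of days between each pair of consecutive dates."""
    return [(later - earlier).days for earlier, later in zip(dates, dates[1:])]


if __name__ == "__main__":
    names = [
        "Packed_Folder_1_2016.06.10",
        "Packed_Folder_1_2016.08.06",
        "Packed_Folder_1_2015.09.03",
        "packed_Folder_1_2015.01.08",
    ]
    print(day_gaps(extract_dates(names)))  # [238, 281, 57]
```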
You mention Forms, so I assume you are able to use Visual C++. If that is the case, have a look at the FileSystemWatcher class.
You instantiate it with a given path (file or directory), and it raises events for changes to the target (change, creation, rename; you can select which ones). You could then update your stored data whenever a relevant change occurs.

Writing a regular expression for nutch's regex-urlfilter.txt file

I'm having some problems with regex-urlfilter.txt file.
I want to crawl only links that have numbers before '.html', should be easy but I can't get it right...
Here's an example:
http://www.utiltrucks.com/annonce-occasion-camion-poids-lourd/marque-renault/modele-midliner/ref-71015.html
http://www.utiltrucks.com/annonce-occasion-camion-poids-lourd/dpt-.html
I want to catch the first link.
I've tried with the following entry in regex-urlfilter:
# accept anything else
+http://www.utiltrucks.com/annonce-occasion.+?[0-9]+.html
I get a message:
0 records selected for fetching, exiting ...
Anybody got an idea how to pull this off?
Note that your URL filters must also match your seed URLs; otherwise the seeds are filtered out and Nutch never gets a chance to parse them and extract the links you want.
For example, if your seed file contains the URL http://www.utiltrucks.com/home, then you should also add an entry like this to your regex-urlfilter file:
+http://www.utiltrucks.com/home
The same should be done for every page on the path from your seed URLs to the target pages you want to extract links from.
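The effect can be reproduced with a small sketch. It uses Python's `re` module as a stand-in for the Java regular expressions Nutch uses, and it searches each rule anywhere in the URL, which is an assumption about the filter's behaviour. The `+` prefix from regex-urlfilter.txt is left out because it is not part of the regex, the dots are escaped here, and `accepted` is a hypothetical helper.

```python
import re

# Accept rules written as plain regexes (without the leading '+').
TARGET_RULE = r"^https?://www\.utiltrucks\.com/annonce-occasion.+?[0-9]+\.html$"
SEED_RULE = r"^https?://www\.utiltrucks\.com/home$"

BASE = "http://www.utiltrucks.com/"
SEED = BASE + "home"
DETAIL = (BASE + "annonce-occasion-camion-poids-lourd/"
          "marque-renault/modele-midliner/ref-71015.html")
LISTING = BASE + "annonce-occasion-camion-poids-lourd/dpt-.html"


def accepted(url, rules):
    """Return True if any accept rule matches somewhere in the URL."""
    return any(re.search(rule, url) for rule in rules)


if __name__ == "__main__":
    only_target = [TARGET_RULE]
    print(accepted(DETAIL, only_target))   # True
    print(accepted(LISTING, only_target))  # False
    print(accepted(SEED, only_target))     # False: the seed itself is dropped
    print(accepted(SEED, [TARGET_RULE, SEED_RULE]))  # True
```

With only the target rule, the seed URL is rejected before anything is fetched, which is consistent with the "0 records selected for fetching" message.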
You have to start your URL pattern like this:
+^(http|https)://www.example.com