Do you know if there is a way to use the transport.vfs.MoveAfterProcess parameter of a File Inbound Endpoint to move the processed file to several folders (with a separator, multiple entries, etc.)?
Thanks
After having a deeper look at the source code (https://github.com/wso2/carbon-mediation/blob/master/components/inbound-endpoints/org.wso2.carbon.inbound.endpoint/src/main/java/org/wso2/carbon/inbound/endpoint/protocol/file/FilePollingConsumer.java) it does not look possible, so I'll have to create a copy of the file after reading it.
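For anyone who lands here: since MoveAfterProcess only takes a single target folder, the extra copies have to be made separately after processing. A minimal sketch of that copy step, in plain Python and with hypothetical paths (independent of WSO2 itself):

    import shutil
    from pathlib import Path

    # Hypothetical paths -- MoveAfterProcess handles only one target,
    # so any additional copies must be made by hand after processing.
    processed_file = Path("/in/processed/order.xml")
    extra_targets = [Path("/archive/a"), Path("/archive/b")]

    for target in extra_targets:
        target.mkdir(parents=True, exist_ok=True)  # make sure the folder exists
        shutil.copy2(processed_file, target / processed_file.name)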
Related
I am a beginner. I have to list two files (a.xlsx and mark.txt) on an SFTP server, fetch them, and only process them when both files are present.
This is the logic:
If I have "mark.txt", I process a.xlsx and delete "mark.txt".
On the next run, when I don't have "mark.txt", I don't process anything.
If I have "mark.txt" again, I process a.xlsx and delete "mark.txt".
Repeat.
I've tried ListSFTP, then FetchSFTP, and then RouteOnAttribute, but I don't know how to solve it.
Thank you in advance for your help
What you could do is look for the file a.xlsx and process it if found. When NiFi picks up this file, it can delete it, so the next time it looks for the .xlsx file it will be a new one. Therefore, if the file isn't found, nothing happens. Looking for the .txt and then pulling the .xlsx isn't the best way to do this; just pull the XLSX directly.
One way to do what you're asking is to look for mark.txt and, if it is found, write a script in a language like Python to get the file, instead of having to write a custom NiFi processor. This would be something like ListFile -> ExecuteStreamCommand, where the ExecuteStreamCommand runs the Python script.
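As a rough sketch of that second option, the script run by ExecuteStreamCommand could look something like this (this assumes the paramiko library; the host, credentials, and paths are made up):

    import paramiko

    # Hypothetical connection details -- replace with your SFTP server's.
    transport = paramiko.Transport(("sftp.example.com", 22))
    transport.connect(username="user", password="secret")
    sftp = paramiko.SFTPClient.from_transport(transport)

    remote_dir = "/incoming"
    names = sftp.listdir(remote_dir)

    # Only proceed when the marker file is present alongside the data file.
    if "mark.txt" in names and "a.xlsx" in names:
        sftp.get(f"{remote_dir}/a.xlsx", "/tmp/a.xlsx")   # fetch the data file
        sftp.remove(f"{remote_dir}/mark.txt")             # consume the marker
        # ... hand /tmp/a.xlsx to the rest of the flow ...

    sftp.close()
    transport.close()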
I have a Django server running which uses files in a directory, /PROCESSING_DOCS/*.json. An API call dynamically adds more files to this folder. Now I need to maintain a queue that tracks the files added to that folder dynamically.
How can I implement this? I don't have any idea.
Here are a few suggestions right off the top of my head:
If you just need to keep a log of what files were added, their processing status, etc.:
since you're already doing a lot of I/O, you can add another file (e.g. named files_queue) and append the filenames, one per line. Later you may add additional details (CSV-style) about each file, though searching through it will become a challenge if this file grows big.
related to the first idea: if the number of files is not an issue, you may create a marker file (a .lock file, for example) for each file processed and store all processing details in it, which makes it easy to search.
if your application is connected to a database, create a table (e.g. named files_queue) and insert one row per file. Later you may add additional columns to the table to store details about each file (see the sketch below).
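A minimal sketch of that table as a Django model (all names are illustrative):

    from django.db import models

    class FileQueueEntry(models.Model):
        # One row per file dropped into /PROCESSING_DOCS/
        filename = models.CharField(max_length=255, unique=True)
        processed = models.BooleanField(default=False)
        added_at = models.DateTimeField(auto_now_add=True)

    # Enqueue when the API call lands a new file:
    #   FileQueueEntry.objects.create(filename="doc1.json")
    # Dequeue in the worker, oldest first:
    #   FileQueueEntry.objects.filter(processed=False).order_by("added_at")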
If you're looking for a queue manager, there are a few solutions just a "python queue" Google search away. I personally have used RabbitMQ.
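For example, with RabbitMQ and the pika client, enqueuing a newly added file is only a few lines (the queue name and path are just examples):

    import pika

    # Connect to a local broker and declare a durable queue.
    connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
    channel = connection.channel()
    channel.queue_declare(queue="files_queue", durable=True)

    # Publish the path of the newly added file; a worker consumes it later.
    channel.basic_publish(exchange="",
                          routing_key="files_queue",
                          body="/PROCESSING_DOCS/doc1.json")
    connection.close()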
Hope this helps,
Cheers!
I've been scouring the web for hours looking for an approach to this problem, and I just can't find one. Hopefully someone can fast-track me. I'd like to achieve the following behaviour:
When running ember s and a file of a certain extension is changed, I'd like to analyze the contents of that file and write to several other files in the same directory.
To give a specific example, let's assume I have a file called app/dashboard/dashboard.ember. dashboard.ember consists of 3 concatenated files: app/dashboard/controller.js, .../route.js, and .../template.hbs, with a reasonable delimiter between the files. When dashboard.ember is saved, I'd like to call a function (inside an addon, I assume) that reads the file, splits it at the delimiters, and writes the corresponding split files. ember-cli should then pick up the changed source files (.js, .hbs, etc.) that it knows how to handle, ignoring the .ember file.
I could write this as a standalone application, of course, but I feel like it should be integrated with the ember-cli build environment; I just can't figure out what concoction of hooks and tools I should use to achieve this.
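For what it's worth, the splitting step itself is small. Here is a sketch of the standalone version in Python (the delimiter format is an assumption), which leaves open exactly the ember-cli integration question above:

    from pathlib import Path

    # Assumed format: each section begins with a line like "// ----- controller.js".
    DELIMITER = "// ----- "

    def split_ember_file(bundle):
        """Split a .ember bundle into its constituent source files."""
        bundle = Path(bundle)
        name, lines = None, []
        for line in bundle.read_text().splitlines():
            if line.startswith(DELIMITER):
                if name:  # write out the previous section
                    (bundle.parent / name).write_text("\n".join(lines) + "\n")
                name, lines = line[len(DELIMITER):].strip(), []
            else:
                lines.append(line)
        if name:  # write out the final section
            (bundle.parent / name).write_text("\n".join(lines) + "\n")

    split_ember_file("app/dashboard/dashboard.ember")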
Suppose I want to write a program that deletes all files in a given directory, except for a few files which the user can choose to preserve (using a config file, for example).
Is there a native way to do this? The direct approach is to loop over all the files and decide for each one whether to call DeleteFile. But is that the right approach?
Thanks.
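For reference, the direct loop described in the question, sketched in Python for brevity (a Win32 program would do the same with FindFirstFile/FindNextFile and DeleteFile; the preserve set stands in for the config file):

    import os

    preserve = {"config.ini", "important.dat"}  # stand-in for the user's config file
    directory = r"C:\some\dir"

    for name in os.listdir(directory):
        full = os.path.join(directory, name)
        # Skip subdirectories and any file the user chose to keep.
        if os.path.isfile(full) and name not in preserve:
            os.remove(full)  # corresponds to DeleteFile in the Win32 API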
The goal is to write output to different folders (different paths) using one reducer.
I use the old MapReduce API, and I made a small modification to MultipleOutputs (loosening the restriction), and it works.
But the output format I use extends FileOutputFormat, which in turn refers to FileOutputCommitter.
And I find there will be a _SUCCESS file in only one folder. Will that be a problem?
There is also an empty part-00000 file; I don't know why it is generated.
_SUCCESS is written only once, after the job completes. It is useful for checking whether the job is complete. I don't think there is any risk with that. Just be aware that it is created only after the job completes, and know where to look for it if you are relying on it.
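For example, a downstream step can gate on that marker before reading the output. A small sketch using the standard hdfs CLI (the output path is illustrative):

    import subprocess

    output_dir = "/user/me/job-output"  # illustrative path

    # `hdfs dfs -test -e` exits 0 if the path exists, non-zero otherwise.
    done = subprocess.call(["hdfs", "dfs", "-test", "-e", f"{output_dir}/_SUCCESS"]) == 0
    if done:
        print("job finished; safe to read", output_dir)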
Regarding the part- files, take a look at
map reduce output files: part-r-* and part-*