Event-wait task in Informatica

Which of the following statements about workflow events is correct?
1) A predefined (or file-wait) Event Wait task waits for a file to appear at the specified location. As soon as the file appears, the event succeeds and the file can be processed by the subsequent mapping.
2) A predefined (or file-wait) Event Wait task waits for a file to appear at the specified location. As soon as the file appears, it is deleted and the subsequent task will start.

Option 1 is true. By default, Informatica doesn't delete the filewatch file that the predefined event is waiting for.
If you want the file to be deleted as soon as the subsequent task starts, check the 'Delete Filewatch File' option on the Properties tab of the Event Wait task.
Delete Filewatch File property

Related

How to prevent other workers from accessing a message that is currently being processed?

I am working on a project that will require multiple workers to access the same queue to get information about a file which they will manipulate. Files range in size from mere megabytes to hundreds of gigabytes. For this reason, a visibility timeout doesn't seem to make sense, because I cannot be certain how long processing will take. I have thought of a couple of ways, but if there is a better way, please let me know.
1) The message is deleted from the original queue and put into a 'waiting' queue. When the program finishes processing the file, it deletes the message; otherwise the message is deleted from the waiting queue and put back into the original queue.
2) The message id is checked against a database. If the message id is found, it is ignored. Otherwise the program starts processing the message and inserts the message id into the database.
Thanks in advance!
Use the default-provided SQS timeout but take advantage of ChangeMessageVisibility.
You can specify the timeout in several ways:
When the queue is created (default timeout)
When the message is retrieved
By having the worker call back to SQS and extend the timeout
If you are worried that you do not know the appropriate processing time, use a default value that is good for most situations, but don't make it so big that things become unnecessarily delayed.
Then, modify your workers to make a ChangeMessageVisibility call to SQS periodically to extend the timeout. If a worker dies, the message stops being extended and it will reappear on the queue to be processed by another worker.
See: MessageVisibility documentation
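
A minimal sketch of that heartbeat pattern, using the AWS SDK for C++ (the function name and the 60 s/30 s values are placeholders to tune, not anything prescribed by SQS):

#include <aws/core/Aws.h>
#include <aws/sqs/SQSClient.h>
#include <aws/sqs/model/ChangeMessageVisibilityRequest.h>
#include <atomic>
#include <chrono>
#include <thread>

// Run this on a side thread while the worker processes the file.
// While 'done' is false, the message stays invisible to other workers;
// if the worker dies, the heartbeat stops and the message reappears
// once the current timeout expires.
void KeepMessageInvisible(const Aws::SQS::SQSClient& sqs,
                          const Aws::String& queueUrl,
                          const Aws::String& receiptHandle,
                          const std::atomic<bool>& done)
{
    while (!done) {
        Aws::SQS::Model::ChangeMessageVisibilityRequest req;
        req.SetQueueUrl(queueUrl);
        req.SetReceiptHandle(receiptHandle);
        req.SetVisibilityTimeout(60);      // hide for another 60 seconds
        sqs.ChangeMessageVisibility(req);  // the periodic heartbeat call
        std::this_thread::sleep_for(std::chrono::seconds(30));
    }
}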

How to monitor a file for changes?

How do I monitor an RTF file to check whether it has been updated recently (let's say within the last 15 minutes)? If it is not being updated, the main thread should be notified. I am thinking of using the WaitForSingleObject function to wait for any changes in the last 15 minutes. How can I implement this functionality?
I believe what you are looking for is file change notifications: with FindFirstChangeNotification, FindNextChangeNotification, and ReadDirectoryChangesW you can monitor a file or directory for changes, renames, writes, and so on.
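A minimal sketch of that approach for the asker's 15-minute scenario (the directory path is a placeholder; FindFirstChangeNotification watches a directory, so point it at the folder containing the RTF file):

#include <windows.h>
#include <stdio.h>

int main()
{
    // Watch the folder containing the file for write changes.
    HANDLE hChange = FindFirstChangeNotificationW(
        L"C:\\docs",                     // placeholder directory
        FALSE,                           // don't watch subdirectories
        FILE_NOTIFY_CHANGE_LAST_WRITE);
    if (hChange == INVALID_HANDLE_VALUE) return 1;

    for (;;) {
        DWORD wait = WaitForSingleObject(hChange, 15 * 60 * 1000);
        if (wait == WAIT_TIMEOUT) {
            printf("file not updated for 15 minutes\n");
            break;                       // signal the main thread here
        }
        // Something changed; re-arm the notification and keep waiting.
        if (!FindNextChangeNotification(hChange)) break;
    }
    FindCloseChangeNotification(hChange);
    return 0;
}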
Presumably your platform is Windows, since you mention WaitForSingleObject. In that case the function you are looking for is ReadDirectoryChangesW. This will allow you to be notified as soon as changes are made, without performing any polling.
Jim Beveridge has an excellent pair of articles that go into some depth:
http://qualapps.blogspot.com/2010/05/understanding-readdirectorychangesw.html
http://qualapps.blogspot.com/2010/05/understanding-readdirectorychangesw_19.html
You can stat() the file, check its modification date and act appropriately.
You can also periodically compute a checksum of the file and compare it to the previous one.
For RTF files you can also take the size of the file and compare it to the previous size; if it's been modified it's very likely the size will be different.
All those methods will probably introduce more overhead than the system calls mentioned by others.
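For completeness, a minimal polling sketch of the stat() idea (POSIX; the file name and the 30-second poll interval are placeholders):

#include <sys/stat.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

int main()
{
    struct stat st;
    if (stat("notes.rtf", &st) != 0) return 1;   // placeholder file
    time_t mtime = st.st_mtime;                  // last seen mod time
    time_t lastChange = time(NULL);
    for (;;) {
        sleep(30);                               // poll every 30 seconds
        if (stat("notes.rtf", &st) == 0 && st.st_mtime != mtime) {
            mtime = st.st_mtime;                 // the file was updated
            lastChange = time(NULL);
        } else if (time(NULL) - lastChange >= 15 * 60) {
            printf("no update for 15 minutes\n");
            break;
        }
    }
    return 0;
}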
In my opinion, you can achieve this in two ways. You can write a file system filter driver that monitors write operations on the file; however, that is a bit of a stretch.
The other way is simple. In your main thread, compute a hash of your RTF file and cache it. Create an event in the non-signaled state, create a callback function, and create a worker thread. Wait on the event in the worker thread for 15 minutes. After the timeout, generate the hash of your file again and compare it with the cached hash. If they don't match, notify your main thread through the callback function.

Correct way to ensure single instance and pass on arguments for a Windows C++ console application

I'm writing a simple Windows C++ console application. If the application is started a second time (on the same computer), it should not spawn a new instance but pass its command line arguments to the instance already running.
I have managed to ensure that the application only runs as one instance by using a mutex, but I am unable to notify the first instance that the application has been started a second time and pass on the command line arguments.
Use case:
listener.exe -start // starts listener
listener.exe -stop // stops listener
If you just want to communicate a simple boolean value (start/stop, for example), then you probably need an Event object.
If you want to exchange more complex data between processes, you could use named pipes or perhaps blocks of shared memory.
The first listener should wait on an event object dedicated to shutdown. When you launch listener.exe -stop, it simply sets the global shutdown event, and if the first instance is running, it exits. A named event object is required so that other processes can refer to it. Also note that firing the command a second time launches another process; there is no implicit IPC through the command interpreter.
listener.exe -start:
Create a named event (CreateEvent)
Wait on the event in the main thread or any suitable thread (WaitForSingleObject)
On the event, initiate shutdown
listener.exe -stop:
Get a handle to the named event (OpenEvent)
Set the event (SetEvent) so that the waiting thread in the first process knows the shutdown event has fired, and exits
Some reference:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms686915(v=vs.85).aspx
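A condensed sketch of that handshake (the event name is a placeholder; the Global\ prefix makes it visible across sessions):

#include <windows.h>
#include <wchar.h>

int wmain(int argc, wchar_t** argv)
{
    const wchar_t* name = L"Global\\ListenerShutdown";
    if (argc > 1 && wcscmp(argv[1], L"-stop") == 0) {
        // Second invocation: open the existing named event and signal it.
        HANDLE h = OpenEventW(EVENT_MODIFY_STATE, FALSE, name);
        if (h) { SetEvent(h); CloseHandle(h); }
        return 0;
    }
    // -start: create the named event and block until -stop signals it.
    HANDLE h = CreateEventW(NULL, TRUE, FALSE, name);  // manual-reset
    if (!h) return 1;
    // ... start the listener work here ...
    WaitForSingleObject(h, INFINITE);  // returns once -stop fires SetEvent
    CloseHandle(h);
    return 0;  // initiate shutdown
}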
There are many types of IPC. One technique that worked well for me on Windows was using a separate thread to process messages for a message-only window. Once you determine you are the primary instance of the program, or listener (via mutex creation, as in your scenario), create the message-only window and start a thread to process its messages. For secondary instances, if there is anything on the command tail, pass it as a string to the message-only window using the WM_COPYDATA message. The listener ignores all other messages, except perhaps a token telling it to quit. Once the secondary instance has passed the message to the message-only window, it exits.
This can work very well in a scenario where dozens of secondary instances may open at once. One example: the user selects 50 files in an Explorer folder, right-clicks, and runs the program. The listener processes the message-only window in a dedicated thread and queues up the strings (in this case filenames) for processing.
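A condensed sketch of the message-only-window technique (the class name "ListenerMsgWnd" is a placeholder, and the message pump runs on the main thread here for brevity):

#include <windows.h>
#include <wchar.h>

static LRESULT CALLBACK WndProc(HWND hwnd, UINT msg, WPARAM wp, LPARAM lp)
{
    if (msg == WM_COPYDATA) {
        const COPYDATASTRUCT* cds =
            reinterpret_cast<const COPYDATASTRUCT*>(lp);
        const wchar_t* args = static_cast<const wchar_t*>(cds->lpData);
        // ... queue 'args' (e.g. a filename) for processing ...
        return TRUE;
    }
    return DefWindowProcW(hwnd, msg, wp, lp);
}

int wmain(int argc, wchar_t** argv)
{
    // Message-only windows are found via FindWindowEx with HWND_MESSAGE.
    HWND existing = FindWindowExW(HWND_MESSAGE, NULL, L"ListenerMsgWnd", NULL);
    if (existing && argc > 1) {
        // Secondary instance: forward the command tail and exit.
        COPYDATASTRUCT cds = {};
        cds.cbData = (DWORD)((wcslen(argv[1]) + 1) * sizeof(wchar_t));
        cds.lpData = argv[1];
        SendMessageW(existing, WM_COPYDATA, 0, (LPARAM)&cds);
        return 0;
    }
    // Primary instance: register the class, create the message-only
    // window, and pump messages.
    WNDCLASSW wc = {};
    wc.lpfnWndProc   = WndProc;
    wc.hInstance     = GetModuleHandleW(NULL);
    wc.lpszClassName = L"ListenerMsgWnd";
    RegisterClassW(&wc);
    CreateWindowExW(0, L"ListenerMsgWnd", NULL, 0, 0, 0, 0, 0,
                    HWND_MESSAGE, NULL, wc.hInstance, NULL);
    MSG m;
    while (GetMessageW(&m, NULL, 0, 0) > 0) DispatchMessageW(&m);
    return 0;
}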

NodeJS Filesystem Watch throwing event twice or more often

I'm watching the config files of my NodeJS server on Ubuntu using:
for (var index in cfgFiles) {
    fs.watch(cfgFiles[index], function(event, fileName) {
        logger.info("======> EVENT: " + event);
        updateConfigData(fileName);
    });
}
So whenever I save a config file, the "change" event is received at least twice by the handler function for the same file name, causing updateConfigData() to be executed multiple times. I experienced the same behavior when watching config files using C++/inotify.
Does anyone have a clue what causes this behavior?
Short answer: it is not Node; the file really is changed twice.
Long answer
I have a very similar approach that I use for my development setup. My manager process watches all js source files if it is a development machine and restarts the children in the cluster.
I had not paid any attention to this, since it was just a development setup; but after I read your question, I took a look and realized that I have the same behavior.
I edit files on my local computer and my editor updates them over sftp whenever I save. At every save, the change event on the file is triggered twice.
I checked listeners('change') for the FSWatcher object returned by the fs.watch call, but it shows my event handler only once.
Then I did the test I should have done first: "touch file.js" on the server, and it triggered only once. So, for me, it was not Node; the file really was changed twice. When a file is opened for writing (instead of appending), it probably triggers a change event because it empties the content. Then, when the new content is written, it triggers the event a second time.
This does not cause any problem for me; but if you want to prevent it, you can add an odd-even check in your event handler by keeping a call count for each file and doing the real work only on every second call.
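Since the asker saw the same double event with C++/inotify, it's worth noting that on that path the duplication can be avoided at the source: watching IN_CLOSE_WRITE instead of IN_MODIFY reports one event per save, because the truncate and the subsequent write are only reported once the writer closes the file. A minimal sketch (the config file path is a placeholder):

#include <sys/inotify.h>
#include <unistd.h>
#include <stdio.h>

int main()
{
    char buf[4096];
    int fd = inotify_init();
    if (fd < 0) return 1;
    // One event per completed save, instead of one per write syscall.
    inotify_add_watch(fd, "cfg/app.conf", IN_CLOSE_WRITE);
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);  // blocks until an event
        if (n <= 0) break;
        printf("config saved; reload it once here\n");
    }
    close(fd);
    return 0;
}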
See my response to a similar question, which explains that the problem is caused by your editor making multiple edits to the file on save.

Rotating logs without restart, multiple process problem

Here is the deal:
I have a multi-process system (pre-fork model, similar to Apache). All processes write to the same log file (in fact a binary log file recording requests and responses, but no matter).
I protect against concurrent access to the log via a shared memory lock, and when the file reaches a certain size, the process that notices it first rolls the logs by:
closing the file.
renaming log.bin -> log.bin.1, log.bin.1 -> log.bin.2 and so on.
deleting logs that are beyond the max allowed number of logs. (say, log.bin.10)
opening a new log.bin file
The problem is that the other processes are unaware of the rotation and continue to write to the old log file (which was renamed to log.bin.1).
I can think of several solutions:
some sort of RPC to notify the other processes to reopen the log (maybe even a signal). I don't particularly like it.
have processes check the file length via the opened file stream, somehow detect that the file was renamed under them, and reopen the log.bin file.
None of those is very elegant in my opinion.
Thoughts? Recommendations?
Your solution seems fine, but you should store the inode of the current logging file in shared memory (see stat(2), the stat.st_ino member).
This way, every process keeps a local variable with the inode of the file it has open.
The shared variable is updated by the single process that performs the rotation; every other process detects the rotation by comparing its local inode with the shared inode, and a mismatch triggers a reopen.
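A sketch of that check in POSIX calls (sharedIno stands for the inode published in shared memory; how it gets there is up to the existing shared-memory setup):

#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

// Returns a file descriptor that is guaranteed to point at the live
// log.bin: either the one passed in, or a freshly reopened one.
int ensure_current_log(int fd, ino_t sharedIno, const char* path)
{
    struct stat st;
    if (fstat(fd, &st) == 0 && st.st_ino == sharedIno)
        return fd;                       // still writing the live file
    close(fd);                           // fd points at log.bin.1 now
    return open(path, O_WRONLY | O_APPEND | O_CREAT, 0644);
}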
What about opening the file by name each time before writing a log entry?
get shared memory lock
open file by name
write log entry
close file
release lock
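A compact sketch of this per-entry pattern (flock stands in here for the asker's shared-memory lock; both just serialize the writers):

#include <sys/file.h>
#include <fcntl.h>
#include <unistd.h>

void write_entry(const char* path, const void* rec, size_t len)
{
    // Opening by name on every write means we always get the live
    // log.bin, no matter who rotated it in the meantime.
    int fd = open(path, O_WRONLY | O_APPEND | O_CREAT, 0644);
    if (fd < 0) return;
    flock(fd, LOCK_EX);                  // serialize concurrent writers
    write(fd, rec, len);
    flock(fd, LOCK_UN);
    close(fd);
}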
Or you could create a logging process, which receives log messages from the other processes and handles all the rotation transparently for them.
You don't say what language you're using, but your processes should all log to a logging process, and the logging process abstracts the file writing.
Logging client1 -> |
Logging client2 -> |
Logging client3 -> | Logging queue (with process lock) -> logging writer -> file roller
Logging client4 -> |
You could copy log.bin to log.bin.1 and then truncate the log.bin file.
That way the other processes can keep writing through their old file pointers, which now refer to the emptied file.
See also man logrotate:
copytruncate
    Truncate the original log file to zero size in place after creating a copy, instead of moving the old log file and optionally creating a new one. It can be used when some program cannot be told to close its logfile and thus might continue writing (appending) to the previous log file forever. Note that there is a very small time slice between copying the file and truncating it, so some logging data might be lost. When this option is used, the create option will have no effect, as the old log file stays in place.
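
A sketch of the same copy-then-truncate idea in plain POSIX calls:

#include <fcntl.h>
#include <unistd.h>

int copy_truncate(const char* live, const char* rotated)
{
    char buf[1 << 16];
    int in = open(live, O_RDONLY);
    if (in < 0) return -1;
    int out = open(rotated, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) { close(in); return -1; }
    ssize_t n;
    while ((n = read(in, buf, sizeof buf)) > 0)
        write(out, buf, (size_t)n);
    close(out);
    close(in);
    // As the man page warns, entries written between the copy and the
    // truncate are lost in this window.
    return truncate(live, 0);
}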
Since you're using shared memory and you know how many processes are using the log file, you can create an array of flags in shared memory telling each process that the file has been rotated. Each process then resets its own flag so that it doesn't keep re-opening the file.