log4cpp stops working properly after some time - C++

I have a log4cpp implementation in a multi-process environment. The logger is configured once during initialization and then shared among forked processes which serve HTTP requests.
During the first minute or so, I see the log rolling perfectly fine at the query-per-second load (say it runs at 100 QPS).
After that, logging slows down dramatically. So I logged the pid as well and noticed that only one process gets to write to the log for a stretch (around 10-15 seconds), then another process starts writing, and so on. Processes don't die; they just don't get a chance to write.
This is different from what happens when the server starts. At that time, every other log line is written by a different process. (Also, I write one log line per process at the end of serving a request.)
At this point, I can't think of what could be going wrong.
This is how my log4cpp conf file looks:
log4cpp.rootCategory=DEBUG,rootAppender
log4cpp.appender.rootAppender=org.apache.log4cpp.RollingFileAppender
log4cpp.appender.rootAppender.fileName=/tmp/mylogfile.log
log4cpp.appender.rootAppender.layout=org.apache.log4cpp.PatternLayout
log4cpp.appender.rootAppender.layout.ConversionPattern=%d|%p|%m%n
log4cpp.category.http.server.main=INFO,MAIN
log4cpp.additivity.http.server.main=false
log4cpp.appender.MAIN=org.apache.log4cpp.RollingFileAppender
log4cpp.appender.MAIN.maxBackupIndex=10
log4cpp.appender.MAIN.maxFileAge=1
log4cpp.appender.MAIN.append=true
log4cpp.appender.MAIN.fileName=/tmp/mylogfile.log
log4cpp.appender.MAIN.layout=org.apache.log4cpp.PatternLayout
log4cpp.appender.MAIN.layout.ConversionPattern=%d|%p|%m%n
Edit, more updates: Thanks @Botje for your time.
I see that whenever a new child process is created, it is only that process that gets to write to the log. That tells me that all the references the other processes were holding become invalid.
I also tried setting the additivity property to true. With that, the server starts out properly writing into /tmp/myfile.log, then switches to writing into /tmp/myfile.log.1 within a minute, and then stops writing a minute later.
At that point the logs get directed to stderr, which is redirected to another log file.

I did notice that the log4cpp FileAppender uses seek to determine the file size before writing log entries. If the file handle is shared between processes, that will cause writes to end up at the start of the file instead of at the end. Even if you fix that, you still have multiple processes that each think they are in charge of log file rotation.
I suggest you have all processes write to a common UDP/TCP/Unix socket and designate one process that collects all log entries and actually writes them to a file. You don't have to reinvent the wheel: you can use the syslog protocol and either the system syslog or a copy running in userspace.
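For illustration, routing each worker's log lines through the standard syslog(3) calls might look roughly like this minimal sketch (the program name and facility are arbitrary choices, not taken from the question):

// Sketch: each worker process hands its log lines to the local syslog daemon
// instead of writing the file itself; the daemon (or a userspace syslog such
// as rsyslog/syslog-ng) owns the file and its rotation.
#include <syslog.h>

int main() {
    // LOG_PID adds the process id to every entry, replacing the hand-rolled
    // pid logging described in the question.
    openlog("http-server", LOG_PID | LOG_NDELAY, LOG_LOCAL0);

    // ... inside the request handler, once per request:
    syslog(LOG_INFO, "served request in %d ms", 42);

    closelog();
    return 0;
}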

Related

How to prevent procmail from crashing the platform and make it run one process at a time?

I have the problem that I capture emails and they arrive in masses, and every time they arrive in masses the platform crashes. The question is how to make it run the processes one at a time; is it possible? Currently I filled the entire procmail server with multiple processes at once, on top of the load from the executives who were working, and the server died; we had to reboot and delete data from procmail to get it working again.
Once we capture the data, it keeps working and spawning subprocesses.
This is the code:
SHELL = /bin/sh
LOGFILE = /var/log/procmail.log
LOGABSTRACT = "all"
VERBOSE = "on"
:0c
| php /srv/platform/laravel/artisan platform:catchemail >> /var/log/procmail_catchemail.log 2>&1
:0:
/var/log/plaform_catchemail
If by "platform" you mean the PHP script, you can serialize access to it by using a lock file.
:0c:.catchemail.lock
| php /srv/platform/laravel/artisan platform:catchemail >> /var/log/procmail_catchemail.log 2>&1
This means, if the file .catchemail.lock does not exist in your $MAILDIR, go ahead and create it, and hold it for the duration of this recipe.
If it does exist, sleep and try again.
There is a failure scenario if the lock is held for too long; Procmail's default behavior in this case is to bounce the message (i.e. cause the delivering MTA to regard it as undeliverable, and return an error message to the sender). You probably want to avoid that, ideally by telling the MTA to attempt delivery again at a later time. (The precise mechanism will depend on your MTA; but basically, by setting a suitable exit code.) But what's feasible and scalable ultimately depends on how many messages you receive vs how many you can process under this constraint.

C++ pause/resume system on large operation

I have a C++ program that loads a file with a few million lines and starts processing it. The same operation was done by a PHP script, but in order to reduce the execution time I switched to C++.
In the old script, I checked whether a file named after the current operation id exists in a "pause" folder; the file is empty, it is just there to signal that a pause is requested. The script checks every 5 iterations whether such a file exists, and if so it sits in an empty loop until the file is deleted (a.k.a. resume):
foreach ($lines as $line)
{
    $isFinished = $index >= $countData - 1;
    if ($index % 5 == 0)
    {
        do
        {
            $isPaused = file_exists("/home/pauses/".$content->{'drop-id'});
        } while ($isPaused);
    }
    // Starts processing the line here
}
But since disk access is relatively slow, I don't want to follow the same approach, so I was thinking of some sort of commands that simulate this:
$ kill cpp_program // C++ program returns the last index checked e.g: 37710
$ ./main 37710
$ // cpp_program escapes the first 37709 lines and continues its job
What do you think of this approach? Is it feasible? Is it not too time-consuming? Is there a better approach?
Thank you
Edit: A clarification, because this seems a little ambiguous: this task runs in the background, and there is another application which starts this one. I want to be able to send a command from the management app (through Linux commands) to the background task to pause/resume it.
Jumping to the 37710th line of a text file sadly requires reading all the lines before it on most operating systems.
On most operating systems, text files are binary files with a convention about newlines. But the OS doesn't cache where the newlines are.
So to find the newlines, you have to read every byte.
If your program saved the byte offset of the file it had reached, it could seek to that location, however.
You can save the state of your program to some config file as you are shutting down, and set it to resume by default when it starts up again. This will require catching the signal you use to shut down, making your main logic notice the signal flag being set, and then cleanly shutting down. It is a very C-esque operation.
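A minimal sketch of that idea, assuming SIGTERM is the shutdown signal and using made-up names for the input file and the saved state file:

// Sketch: catch SIGTERM, remember the byte offset reached, and resume from
// that offset on the next run. File names are placeholders.
#include <csignal>
#include <fstream>
#include <string>

static volatile std::sig_atomic_t stop_requested = 0;

extern "C" void on_signal(int) { stop_requested = 1; }

int main(int argc, char** argv) {
    std::signal(SIGTERM, on_signal);

    std::ifstream in("input.txt");
    // Resume: a byte offset (unlike a line number) can be seeked to directly.
    if (argc > 1) in.seekg(std::stoll(argv[1]));

    std::string line;
    while (!stop_requested && std::getline(in, line)) {
        // ... process the line ...
    }

    if (stop_requested) {
        // Save where we got to; tellg() is the byte offset after the last
        // fully processed line.
        std::ofstream state("resume.state");
        state << static_cast<long long>(in.tellg());
    }
    return 0;
}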
Now, a different traditional way to make a program controllable remotely is to have it listen on a TCP port (and/or stdin) and take command line commands there.
To go that way, you'd write a REPL component, then hook that up to whatever input and output.
Either you'd do the REPL in a coroutine-like way between processing files, or you'd spawn a separate thread to do the REPL and have it communicate asynchronously with the processing thread.
However, this could be beyond your skill. Each step of this (writing a REPL system, having it not block the main work, responding to commands, then attaching it to a TCP port) would take some effort and learning on your part.
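For a rough sense of the separate-thread variant (reading commands from stdin only, no TCP), something like the following sketch could work; all names are illustrative:

// Sketch: one thread reads "pause"/"resume"/"quit" from stdin; the main
// thread processes lines and blocks while paused.
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <string>
#include <thread>

std::mutex m;
std::condition_variable cv;
bool paused = false;
bool quit = false;

void command_loop() {
    std::string cmd;
    while (std::getline(std::cin, cmd)) {
        std::lock_guard<std::mutex> lk(m);
        if (cmd == "pause")  paused = true;
        if (cmd == "resume") paused = false;
        if (cmd == "quit")   quit = true;
        cv.notify_all();
        if (quit) break;
    }
}

int main() {
    std::thread repl(command_loop);
    for (long i = 0; i < 1000000 /* lines to process */; ++i) {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [] { return !paused || quit; });  // block while paused
        if (quit) break;
        lk.unlock();
        // ... process line i ...
    }
    repl.join();  // needs "quit" (or EOF on stdin) for the command thread to exit
    return 0;
}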

C++: How to set a timeout (not reading input, not threaded)?

Got a large C++ function on Linux that calls a whole lot of other functions, making up an algorithm. At various points, given certain bad inputs, the algorithm can get "stuck" and go on forever. Adding a timeout seems appropriate, as all the potential "stuck" points cannot be predicted. But despite scouring the Internet for timeout examples, I've only found how to apply timeouts when either the thing you're timing is a separate thread or it's reading input. My code is a single thread and does not modify file descriptors, so I'm not having any luck. Do I basically have no choice but to thread it?
I am not sure about the situation; server applications or embedded applications often run for years in the background without stopping. I think one option is to let your program run in the background and log to a file (or the screen) periodically, and, if you really want to stop the program after a certain time, you can use the timeout command or a script to kill your program after that time, say, timeout 15s your-prog.
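If you would rather enforce the limit from inside the single-threaded program instead of via timeout(1) (a different technique from the answer above), one common Linux option is alarm(2) with a SIGALRM handler that sets a flag the algorithm checks at convenient points; a rough sketch:

// Sketch: abort a long-running single-threaded computation after a wall-clock
// timeout using alarm(2). The handler only sets a flag; the algorithm still
// has to check that flag wherever it can.
#include <csignal>
#include <unistd.h>

volatile std::sig_atomic_t timed_out = 0;

extern "C" void on_alarm(int) { timed_out = 1; }

int run_algorithm() {
    while (!timed_out) {
        // ... one step of the algorithm; loops that might run forever on bad
        // inputs stop once the flag is set ...
        break;  // placeholder so the sketch terminates
    }
    return timed_out ? -1 : 0;
}

int main() {
    std::signal(SIGALRM, on_alarm);
    alarm(15);               // request SIGALRM in 15 seconds
    int rc = run_algorithm();
    alarm(0);                // cancel the timer if we finished early
    return rc;
}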

Request for suggestions on doing IPC/event capture

I have a simple Python server script which forks off multiple instances (say N) of a C++ program. The C++ program generates some events that need to be captured.
The events are currently being captured in a log file (one log file per forked process). In addition, I need to periodically (every T minutes) get the rate at which the events are being produced across all child processes, to either the Python server or some other program listening for these events (still not sure). Based on the rate of these events, some "re-action" may be taken by the server (say, reducing the number of forked instances).
Some pointers I have briefly looked at:
grep log files - go through the running processes' log files (.running), filter the entries generated in the last T minutes, analyse the data and report
socket IPC - add code to the C++ program to send the events to some server program which analyses the data after T minutes, reports and starts all over again
redis/memcache (not completely sure) - add code to the C++ program to use some distributed store to capture all the generated data, analyse the data after T minutes, report and start all over again
Please let me know your suggestions.
Thanks
If time is not of the essence (T minutes sounds like it is long compared to whatever events are happening in the C++ programs that are kicked off), then don't make things any more complicated than they need to be. Forget IPC (sockets, shared mem, etc); just have each C++ program log what you need to know about time/performance and let the Python script check the logs every T minutes when you need the data. Don't waste time overcomplicating something that you can do in a simple manner.
As an alternative to your socket IPC suggestion, how about 0mq? It's a library (in C, with Python bindings available) that can do message transfer at the inter-thread, inter-process or inter-machine level. Pretty simple to get going, and pretty quick.
I'm not affiliated with it. I'm just evaluating it for other uses and thought it might be a fit for you as well.
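As a rough illustration of what the 0mq route could look like on the C++ side, each child might PUSH one small message per event to a collector (for example the Python server with pyzmq) that binds a PULL socket; the endpoint and payload format below are made up:

// Sketch: each forked C++ child PUSHes one message per event to a collector
// that binds a PULL socket and aggregates the rates. Endpoint and payload
// are illustrative.
#include <zmq.h>
#include <cstdio>
#include <unistd.h>

int main() {
    void* ctx  = zmq_ctx_new();
    void* sock = zmq_socket(ctx, ZMQ_PUSH);
    zmq_connect(sock, "tcp://127.0.0.1:5555");

    // ... wherever the program currently writes an event to its log file:
    char msg[64];
    int len = std::snprintf(msg, sizeof msg, "event pid=%d", (int)getpid());
    zmq_send(sock, msg, len, 0);

    zmq_close(sock);
    zmq_ctx_destroy(ctx);
    return 0;
}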

Rotating logs without restart, multiple process problem

Here is the deal:
I have a multi-process system (pre-fork model, similar to Apache). All processes write to the same log file (in fact a binary log file recording requests and responses, but no matter).
I protect against concurrent access to the log via a shared memory lock, and when the file reaches a certain size the process that notices it first rolls the logs by:
closing the file.
renaming log.bin -> log.bin.1, log.bin.1 -> log.bin.2 and so on.
deleting logs that are beyond the max allowed number of logs. (say, log.bin.10)
opening a new log.bin file
The problem is that the other processes are unaware of this and in fact continue to write to the old log file (which was renamed to log.bin.1).
I can think of several solutions:
some sort of RPC to notify the other processes to reopen the log (maybe even a signal). I don't particularly like it.
have processes check the file length via the opened file stream, somehow detect that the file was renamed under them, and reopen the log.bin file.
None of those is very elegant in my opinion.
thoughts? recommendations?
Your solution seems fine, but you should store the inode number of the current log file in shared memory (see stat(2) and the st_ino member).
This way, each process keeps a local variable with the inode of the file it has open.
The shared variable is updated only by the process that performs the rotation; all the other processes become aware by noticing that their local inode differs from the shared one, which should trigger a reopen.
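Concretely, each process's write path might then look something like this sketch (the shared inode value and the file name are placeholders):

// Sketch: before writing, compare the inode of the file we have open with the
// inode of the current "log.bin" published in shared memory. A mismatch means
// the file was rotated under us, so reopen it by name.
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

int log_fd = -1;

void ensure_current_log(ino_t shared_inode /* read from shared memory */) {
    struct stat st;
    if (log_fd >= 0 && fstat(log_fd, &st) == 0 && st.st_ino == shared_inode)
        return;                       // still writing to the current file

    if (log_fd >= 0) close(log_fd);   // we were holding the rotated file
    log_fd = open("log.bin", O_WRONLY | O_APPEND | O_CREAT, 0644);
}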
What about opening the file by name each time before writing a log entry, as in the sketch after this list?
get shared memory lock
open file by name
write log entry
close file
release lock
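Under the existing shared-memory lock, that per-entry sequence could be as small as the following sketch (lock()/unlock() are placeholders for the lock from the question):

// Sketch: open the log by name for every entry, so a rename done by the
// rotating process is picked up automatically on the next write.
#include <cstddef>
#include <fcntl.h>
#include <unistd.h>

// Placeholders for the question's existing shared-memory lock.
void lock()   { /* acquire shared-memory lock */ }
void unlock() { /* release shared-memory lock */ }

void write_log_entry(const char* buf, size_t len) {
    lock();
    int fd = open("log.bin", O_WRONLY | O_APPEND | O_CREAT, 0644);
    if (fd >= 0) {
        write(fd, buf, len);
        close(fd);
    }
    unlock();
}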
Or you could create a logging process, which receives log messages from the other processes and handles all the rotating transparently from them.
You don't say what language you're using, but your processes should all log to a logging process, and that logging process abstracts the file writing.
Logging client1 -> |
Logging client2 -> |
Logging client3 -> | Logging queue (with process lock) -> logging writer -> file roller
Logging client4 -> |
You could copy log.bin to log.bin.1 and then truncate the log.bin file (see the sketch after the man page excerpt below).
That way the other processes can still write to their old file pointer, which now points at the empty file.
See also man logrotate:
copytruncate
    Truncate the original log file to zero size in place after creating a copy, instead of moving the old log file and optionally creating a new one. It can be used when some program cannot be told to close its logfile and thus might continue writing (appending) to the previous log file forever. Note that there is a very small time slice between copying the file and truncating it, so some logging data might be lost. When this option is used, the create option will have no effect, as the old log file stays in place.
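In code, a copy-then-truncate rotation under the existing lock might look roughly like this sketch (paths are placeholders, and it needs C++17 for std::filesystem):

// Sketch: copytruncate-style rotation. The other processes keep their file
// descriptors open (ideally in O_APPEND mode) and simply continue writing to
// the now-empty log.bin. As the man page notes, entries written between the
// copy and the truncate are lost.
#include <filesystem>
#include <unistd.h>

void rotate_copytruncate() {
    std::filesystem::copy_file(
        "log.bin", "log.bin.1",
        std::filesystem::copy_options::overwrite_existing);
    truncate("log.bin", 0);   // the original inode stays in place
}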
Since you're using shared memory, and if you know how many processes are using the log file, you can create an array of flags in shared memory, telling each of the processes that the file has been rotated. Each process then resets its own flag so that it doesn't re-open the file continuously.
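A small sketch of that flag array, set up in anonymous shared memory before fork() (the fixed process count and the indexing scheme are assumptions):

// Sketch: one "please reopen" flag per process in a shared, anonymous mmap
// created before fork(). The rotating process sets every flag; each writer
// checks and clears only its own slot before logging.
#include <sys/mman.h>
#include <atomic>
#include <new>

constexpr int kNumProcs = 8;               // assumed to be known in advance
std::atomic<int>* reopen_flags = nullptr;  // lives in shared memory

void init_flags() {                        // call once, before fork()
    void* mem = mmap(nullptr, kNumProcs * sizeof(std::atomic<int>),
                     PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    reopen_flags = static_cast<std::atomic<int>*>(mem);
    for (int i = 0; i < kNumProcs; ++i)
        new (&reopen_flags[i]) std::atomic<int>(0);
}

void announce_rotation() {                 // run by the process that rotates
    for (int i = 0; i < kNumProcs; ++i) reopen_flags[i].store(1);
}

bool should_reopen(int my_index) {         // each writer checks before logging
    return reopen_flags[my_index].exchange(0) == 1;
}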