I've been searching for a C++ logging library for several days now, but somehow I'm not happy with the existing solutions like Boost.Log or Pantheios. I'm originally a Java developer, and I would like a logging library with a logger that behaves more like an object. I would like to do the following things:
Create an instance of a logging object Logger(filepath, filename)
and use a log(severity, message) method to log messages of different severities to the text file.
The remaining problem with these features is that I do not know in advance how many of these logging objects will exist, or whether the files will share the same path. Maybe I could handle this with Boost, but I don't understand the example in the "Text multi-file backend" part of the documentation. In particular, what do these code snippets from the example do?
Snippet 1.
// Set up the file naming pattern
backend->set_file_name_composer
(
sinks::file::as_file_name_composer(expr::stream << "logs/" << expr::attr< std::string >("RequestID") << ".log")
);
Snippet 2.
// Set the formatter
sink->set_formatter
(
expr::stream
<< "[RequestID: " << expr::attr< std::string >("RequestID")
<< "] " << expr::smessage
);
This code raises four questions (or issues) in my head:
Does that mean that I just have to set the attribute RequestID and then the logger will decide which file to put the message in? How would I do that?
Is it even possible with Boost to have log files in different paths?
What will happen if different threads access the same file?
Will this code in init_logging() affect the application-wide behaviour of the Boost logging library? Is this done by some kind of ... global variables?
Maybe my thoughts are too naive. Is there even a way to get something like what I described at the beginning of my post?
If you're new to Boost.Log you should read about the library design first; it is quite different from Java. Despite the difference, it is possible to configure the library in a similar way to log4j, and this answer will help get you started.
Now, to your questions:
Does that mean that I just have to set the attribute RequestID and then the logger will decide which file to put the message in? How would I do that?
In the particular case of text_multifile_backend, the sink decides which file each log record is written to. The set_file_name_composer call sets a function object that composes the log file name, and as you can see, it involves the RequestID attribute. Naturally, you can use whatever attribute(s) you like, including channels. You should also know that text_multifile_backend is not the only way (and probably not the most efficient way) to achieve what you want. If the number of different log files is limited, it is typically better to add several text file sinks, one for each file, and set up filtering so that each sink receives its own log records. This approach is described in the answer I linked above.
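In case a concrete example helps, here is a minimal sketch of that multi-sink approach, assuming the set of request IDs ("A" and "B") and the log file names are known in advance; all names and paths here are placeholders of mine, not anything mandated by Boost.Log:

// Minimal sketch: one file sink per known file, each filtered on the "RequestID" attribute.
#include <string>
#include <boost/log/core.hpp>
#include <boost/log/expressions.hpp>
#include <boost/log/utility/setup/file.hpp>
#include <boost/log/utility/setup/common_attributes.hpp>

namespace logging = boost::log;
namespace expr = boost::log::expressions;
namespace keywords = boost::log::keywords;

void init_logging()
{
    // Records tagged with RequestID == "A" go to one file...
    logging::add_file_log
    (
        keywords::file_name = "logs/request_A.log",
        keywords::filter = expr::attr< std::string >("RequestID") == "A"
    );
    // ...and records tagged with RequestID == "B" go to another.
    logging::add_file_log
    (
        keywords::file_name = "logs/request_B.log",
        keywords::filter = expr::attr< std::string >("RequestID") == "B"
    );
    logging::add_common_attributes();
}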
Regarding adding attributes, there are different ways depending on the use case and the attribute set you want to add them to. In the case of channels, this attribute is automatically provided by the logger: you just create the logger with the channel name, and every log record you make through that logger will have it attached as an attribute. The RequestID attribute from the example you pointed to can be added in several ways. Here are a few common examples (a short sketch follows the list):
It could be added to a logger manually. This is typical if you create a logger for processing a request (in a broad sense - whatever 'request' means in your application) and write all log messages related to the request processing through that logger.
It could be added to a logger as a scoped attribute. This is useful if you don't have a dedicated logger for every request but have a common logger somewhere that is used to write logs related to request processing.
It could be added as a scoped attribute to thread-specific attributes. This will help if request processing involves multiple loggers in different parts of the program, but at a given point of time only a single thread (the current one) is processing a particular request. Other threads may be processing other requests and set their own thread-specific attributes - they will not interfere.
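A rough sketch of these three options; the attribute values, the shared g_common_logger, and the function name are my own illustrations, not part of the Boost.Log API:

#include <string>
#include <boost/log/attributes/constant.hpp>
#include <boost/log/attributes/scoped_attribute.hpp>
#include <boost/log/sources/logger.hpp>
#include <boost/log/sources/record_ostream.hpp>

namespace src = boost::log::sources;
namespace attrs = boost::log::attributes;

src::logger g_common_logger; // a shared logger used by the request-processing code

void handle_request(const std::string& request_id)
{
    // 1. Attribute attached manually to a dedicated per-request logger.
    src::logger request_logger;
    request_logger.add_attribute("RequestID", attrs::constant< std::string >(request_id));
    BOOST_LOG(request_logger) << "processing started";

    // 2. Scoped attribute on a shared logger; it is removed at the end of the scope.
    BOOST_LOG_SCOPED_LOGGER_ATTR(g_common_logger, "RequestID", attrs::constant< std::string >(request_id));
    BOOST_LOG(g_common_logger) << "still tagged with the request id";

    // 3. Scoped thread-specific attribute; every logger used by this thread sees it.
    BOOST_LOG_SCOPED_THREAD_ATTR("RequestID", attrs::constant< std::string >(request_id));
    BOOST_LOG(g_common_logger) << "tagged via the thread-specific attribute";
}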
Is it even possible with Boost to have log files in different paths?
Of course. As I said, this can be done by adding more than one file sink to the core. And by its nature, text_multifile_backend is already able to write to more than one file.
What will happen if different threads access the same file?
Boost.Log has support for multithreading. At the sink level, the sink frontends implement thread synchronization. For instance, the synchronous_sink frontend will block contending threads from writing to a single file concurrently. Log records can, however, be written to different sinks concurrently.
Loggers also have single-threaded and multi-threaded versions, and the latter do additional locking to protect their internal structures from concurrent access. This protection, however, does not extend to sinks (i.e. even if you use an _mt logger, the sink frontend still has to synchronize threads).
Will this code in init_logging() affect the application-wide behaviour of the Boost logging library? Is this done by some kind of ... global variables?
There are a number of singletons in Boost.Log, yes. Most notably, the logging core, in which you register all sinks and all global and thread-specific attributes. Adding a new sink will affect the whole application, as records from all loggers will start going to that sink (this is why you should generally configure a sink before adding it to the core). Loggers themselves are not tied to sinks; which sink a log record ends up in is defined solely by filters. But as I mentioned, it is possible to associate loggers and sinks with the help of attributes and filters and manage them in a related manner. You will have to write a wrapper class that provides the interface you described and, along with the Boost.Log logger, creates and configures the corresponding sink.
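To make the idea concrete, here is a very rough sketch of what such a wrapper could look like. The severity enum, the per-instance "Tag" attribute, and all names are assumptions of mine rather than anything prescribed by Boost.Log, and error handling is omitted:

// Rough sketch: each Logger instance creates one dedicated, filtered file sink.
#include <string>
#include <boost/shared_ptr.hpp>
#include <boost/make_shared.hpp>
#include <boost/log/core.hpp>
#include <boost/log/expressions.hpp>
#include <boost/log/keywords/file_name.hpp>
#include <boost/log/sinks/sync_frontend.hpp>
#include <boost/log/sinks/text_file_backend.hpp>
#include <boost/log/sources/severity_logger.hpp>
#include <boost/log/sources/record_ostream.hpp>
#include <boost/log/attributes/constant.hpp>

namespace logging = boost::log;
namespace sinks = boost::log::sinks;
namespace src = boost::log::sources;
namespace expr = boost::log::expressions;
namespace attrs = boost::log::attributes;
namespace keywords = boost::log::keywords;

enum severity_level { debug, info, warning, error };

class Logger
{
    typedef sinks::synchronous_sink< sinks::text_file_backend > file_sink;

public:
    Logger(const std::string& filepath, const std::string& filename)
    {
        const std::string full_name = filepath + "/" + filename;

        // Tag every record made through this instance so filters can route it.
        logger_.add_attribute("Tag", attrs::constant< std::string >(full_name));

        // Create a dedicated file sink that only accepts records carrying this tag.
        boost::shared_ptr< sinks::text_file_backend > backend =
            boost::make_shared< sinks::text_file_backend >(keywords::file_name = full_name);
        sink_ = boost::make_shared< file_sink >(backend);
        sink_->set_filter(expr::attr< std::string >("Tag") == full_name);
        logging::core::get()->add_sink(sink_);
    }

    ~Logger()
    {
        logging::core::get()->remove_sink(sink_);
    }

    void log(severity_level severity, const std::string& message)
    {
        BOOST_LOG_SEV(logger_, severity) << message;
    }

private:
    src::severity_logger_mt< severity_level > logger_;
    boost::shared_ptr< file_sink > sink_;
};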
I think the log4cxx logging library may be what you need. It handles severity levels for you when writing messages to a log file.
Here is a reference to get you started:
http://www.yolinux.com/TUTORIALS/Log4cxx.html
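For illustration, a minimal usage sketch with log4cxx's default configuration (the logger name and messages are placeholders; in practice you would usually load a configuration file instead):

#include <log4cxx/logger.h>
#include <log4cxx/basicconfigurator.h>

int main()
{
    // Default setup: a console appender with a simple layout.
    log4cxx::BasicConfigurator::configure();

    log4cxx::LoggerPtr logger = log4cxx::Logger::getLogger("MyModule");
    LOG4CXX_INFO(logger, "application started");
    LOG4CXX_WARN(logger, "something looks suspicious");
    LOG4CXX_ERROR(logger, "something went wrong");
    return 0;
}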
Let's suppose I want to use the Akka actor model to create a program that crunches data coming from files.
Since the model, as far as I understand it, works best if the actors really are unaware of where they are running, passing the path of the file in the message seems like an error: when the app scales, some actors will possibly not have access to that path. Conversely, passing the entire file as bytes would not be an option either, due to resource issues (what if the file is big, or getting bigger?).
What is the correct strategy to handle this situation? On the same note: would the assumption of a distributed file system be a good enough reason to accept paths as messages?
I don't think there's a single definitive answer, because it depends on the nature of the data and the "crunching". However, in the typical case where you really are doing data processing of the files, you are going to have to read the files into memory at some point. So, yes, the general answer is to read the entire file as bytes.
In answer to the question of "what if the file is bigger", that's why we have streaming libraries like Akka Streams. For example, a common case might be to use Alpakka to watch for files in a local directory (or on FTP), parse them into records, filter/map the records to do some initial cleansing, and then stream those records to distributed actors for processing. Because you are streaming, Akka does not try to load the whole file into memory at once, and you get the benefit of backpressure so that you don't overload the actors doing the processing.
That's not to say a distributed file system might not have uses, for example to give you high availability. If you upload a file to the local filesystem of an Akka node and the node fails, you obviously lose access to your file. But that's really a separate issue from how you do distributed processing.
I am using "multiprocessing.Process" to launch multiple subrocesses. Each subprcess is the the same python script, which instantiate Logger and write different levels into log files. As long as it's the same script it creates Logger with the same name in each subprocess.
Also each subprocess has unique ID and logs info including that unique id.
I have found out that log file missing some IDs completely, i.e there is no log output for the whole subprocess.
The answer is here:
Although logging is thread-safe, and logging to a single file from multiple threads in a single process is supported, logging to a single file from multiple processes is not supported, because there is no standard way to serialize access to a single file across multiple processes in Python. If you need to log to a single file from multiple processes, one way of doing this is to have all the processes log to a SocketHandler, and have a separate process which implements a socket server which reads from the socket and logs to file. (If you prefer, you can dedicate one thread in one of the existing processes to perform this function.) This section documents this approach in more detail and includes a working socket receiver which can be used as a starting point for you to adapt in your own applications.
If you are using a recent version of Python which includes the multiprocessing module, you could write your own handler which uses the Lock class from this module to serialize access to the file from your processes. The existing FileHandler and subclasses do not make use of multiprocessing at present, though they may do so in the future. Note that at present, the multiprocessing module does not provide working lock functionality on all platforms (see https://bugs.python.org/issue3770).
https://docs.python.org/2/howto/logging-cookbook.html#logging-cookbook
I'm trying to use Cap’n Proto in an existing project consisting of a client and a server communicating over UDS. I don't have the resources (and I doubt it would be accepted) to redo all of the client-server RPC, but I wanted to benefit from Cap’n Proto's serialization mechanisms. Unfortunately, it seems to me that it's impossible.
The biggest problem is the server side, which is single-threaded (and will remain so unless there are serious arguments for multithreading) and uses its own poll-based loop. All events are read partially; the server can't block waiting for any event to be fully read - and this is where I am stuck. We have our own protocol and classes which wrap a message, which can consume bytes from a file descriptor and notify when the event has been fully read, so the server can process it. I think I've analysed most of the Cap’n Proto interfaces (serialization, async serialization) and it seems that they can't be used this way without modifications.
I really hope that I've missed something. Did I?
There are two ways you can solve this:
Hard way: You can attempt to integrate with the KJ async I/O framework (used by Cap'n Proto). The KJ event loop can actually integrate with other event loops and run on top of them -- but it's tricky. For example, node-capnp includes code to integrate the KJ event loop with libuv, as seen in the first part of this source file. Once you have the necessary glue, you can write KJ-style async code that uses the interfaces in capnp/serialize-async.h.
Easy way: Instead of trying to integrate KJ, you can write simple code using your event infrastructure which reads data from the file descriptor directly and then uses capnp::expectedSizeInWordsFromPrefix() (from capnp/serialize.h) to figure out if it has received the whole message yet. If that function returns a number greater than what you already have, then you don't have the full message and have to keep waiting. Once you have the full message, you can then use capnp::FlatArrayMessageReader to parse it.
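Here is a rough sketch of that easy way, assuming you accumulate the incoming data into a word-aligned buffer yourself; the buffer handling and the MyStruct root type mentioned in the comment are my own illustrations:

// Sketch of the "easy way": buffer incoming data yourself and ask Cap'n Proto
// whether a complete message has arrived yet (no KJ event loop involved).
#include <vector>
#include <capnp/serialize.h>

// Returns true (and parses the message) once `buffer` holds a full message.
// `buffer` is assumed to start at a message boundary and to be filled word-by-word.
bool try_consume_message(std::vector< capnp::word >& buffer)
{
    kj::ArrayPtr< const capnp::word > prefix(buffer.data(), buffer.size());

    // How many words does the message need, judging from what we have so far?
    size_t expected = capnp::expectedSizeInWordsFromPrefix(prefix);
    if (expected > buffer.size())
        return false;  // not complete yet, keep reading from the fd

    capnp::FlatArrayMessageReader reader(prefix);
    // ... use reader.getRoot<MyStruct>() to process the message ...

    // Remove the consumed words so the next message starts at the front.
    buffer.erase(buffer.begin(), buffer.begin() + expected);
    return true;
}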
Well, my problem is the following. I have a piece of code that runs on several virtual machines, and each virtual machine has N interfaces (one thread per interface). The problem itself is receiving a message on one interface and redirecting it through another interface in the fastest possible manner.
What I'm doing is: when I receive a message on one interface (unicast), I calculate which interface I want to redirect it through and save all the information about the message (the datagram, plus all the extra info I want) with a function I made. Then, on the next iteration, the program checks whether there are new messages to redirect and whether the correct interface is reading them. And so on... But this makes the program exchange information very slowly...
Is there any mechanism that can speed things up?
Somebody has already invented this particular wheel - it's called MPI.
Take a look at either Open MPI or MPICH.
Why don't you use queuing? As the messages come in, put them on a queue and notify each processing module to pick them up from the queue.
For example:
MSG comes in
Module 1 puts it on queue
Module 2,3 get notified
Module 2 picks it up from the queue and saves it in the database
In parallel, Module 3 picks it up from the queue and processes it
The key is "in parallel". Since these modules are different threads, while Module 2 is saving to the db, Module 3 can massage your message.
You could use JMS or MQ or make your own queue.
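The question doesn't name a language, so purely as an illustration of the "make your own queue" idea, here is a minimal C++ sketch: one thread pushes incoming messages, and worker modules block on a condition variable until something arrives (the message type and names are placeholders):

// Minimal thread-safe queue: producers push, workers wait and pop.
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>

class MessageQueue
{
public:
    void push(std::string msg)
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(msg));
        }
        cv_.notify_one();  // wake up one waiting worker
    }

    std::string pop()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !queue_.empty(); });  // block until a message arrives
        std::string msg = std::move(queue_.front());
        queue_.pop();
        return msg;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::string> queue_;
};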
It sounds like you're trying to do parallel computing across multiple "machines" (even if virtual). You may want to look at existing protocols such as MPI (Message Passing Interface) to handle this domain, as they have quite a few features that help in this type of scenario.
I developed a logger for testing our modules in C++ (a Win32 console application built with Visual Studio on Windows).
The logger runs in one thread.
While it is displaying output in the console window, the thread gets preempted and some other module's thread runs.
So the output of other modules gets mixed with the logger's output in the console window.
Is there any way to avoid preemption of the logger thread, so that the entire logger output appears in one place in the console window?
Writing to a file instead of the output window is one solution. But as the drive names may differ between machines, it's difficult to hardcode the logger output file path. Even then, we could still write code to find the drives available on a machine and write to the first drive, etc. But the tester may not understand where to search for the logger output file.
Add the thread ID to the logger output, and then use a log viewer that can filter.
DebugView (under windows) allows you to add highlight filters to dynamic logging.
The standard solution is to use a mutex. After formatting, but before starting the output to the console, you lock the mutex. When all output has been sent, you unlock the mutex again. If a second thread comes in, its attempt to lock the mutex will cause that thread to block until the first thread is done.
Critical sections in Windows behave like mutexes and are also usable here. They use slightly different terminology: you don't "lock" them, you "enter" and "leave" a critical section with EnterCriticalSection and LeaveCriticalSection.
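A minimal sketch of that idea (the function names are placeholders; this only helps if every module routes its console output through the same guarded call):

#include <windows.h>
#include <iostream>
#include <string>

CRITICAL_SECTION g_console_lock;

void init_console_lock()
{
    InitializeCriticalSection(&g_console_lock);
}

void log_line(const std::string& line)
{
    EnterCriticalSection(&g_console_lock);   // other threads' output waits here
    std::cout << line << std::endl;          // the whole line goes out as one unit
    LeaveCriticalSection(&g_console_lock);
}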
Preventing thread preemption is generally dangerous. You can try to temporarily increase the thread priority, but I don't advise it (it is risky and will not work on a multiprocessor machine, among other problems).
Other ways:
Rewrite all modules to use only your logger for output.
If other modules only write to cout/stdout, have the logger write to cerr/stderr. This will not prevent the intermingled output in the console, but it will once you redirect the two streams to different files.
I think the best solution is to simply separate the logger output from the rest of your program's output. You mentioned the possibility of writing the logging to a file. If your only hang-up with this solution is coding an appropriate path, then you can choose the output path dynamically:
#include <shlobj.h>  // SHGetSpecialFolderPath

TCHAR buffer[ MAX_PATH ];
SHGetSpecialFolderPath( NULL, buffer, CSIDL_LOCAL_APPDATA, TRUE );  // TRUE: create the folder if missing
This will give you the local app data folder for the current user. You can then just append your log file name.
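For example (the log file name is a placeholder; PathAppend comes from Shlwapi and requires linking against Shlwapi.lib):

#include <shlwapi.h>
#pragma comment(lib, "shlwapi.lib")

PathAppend( buffer, TEXT("mylogger.log") );
// 'buffer' now holds something like C:\Users\<user>\AppData\Local\mylogger.log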