Let's suppose I want to use the Akka actor model to create a program that crunches data coming from files.
Since the model, as far as I understand it, pays off when the actors really are unaware of where they are running, passing the path of the file in the message seems like an error: as the app scales out, some actors will possibly not have access to that path. On the other hand, passing the entire file as bytes would not be an option due to resource issues (what if the file is big, or gets bigger?).
What is the correct strategy to handle this situation? On the same note: would the assumption of having a distributed file system be a good enough reason to accept paths as messages?
I don't think there's a single definitive answer, because it depends on the nature of the data and the "crunching". However, in the typical case where you really are doing data processing of the files, you are going to have to read the files into memory at some point. So, yes, the general answer is to read the entire file as bytes.
In answer to the question of "what if the file is bigger", that's why we have streaming libraries like Akka Streams. For example, a common case might be to use Alpakka to watch for files in a local directory (or on an FTP server), parse them into records, filter/map the records to do some initial cleansing, and then stream those records to distributed actors for processing. Because you are using streaming, Akka is not trying to load the whole file into memory at once, and you get the benefit of backpressure so that you don't overload the actors doing the processing.
That's not to say a distributed file system might not have its uses, for example for high availability: if you upload a file to the local filesystem of an Akka node and that node fails, you obviously lose access to your file. But that's really a separate issue from how you do distributed processing.
Looking for a C++ library, or an easy and robust combination of libraries, that will provide a durable disk-backed queue for variable-sized binary blocks.
My app produces messages that are sent out to subscribers (the messages are variable-sized binary blobs). In case of a subscriber failure, restart, or networking issue, I need something like a circular buffer to queue them up until the subscriber returns. Available RAM is not enough to handle the worst-case failure scenario, so I'm looking for an easy way to offload the data to disk.
In the best case: set a maximum disk space (e.g. 100 GB) and a file name, recover data after an application restart, a .push_back() / .front() / .pop_front()-like API, no performance drawback when the queue is small (the 99.99% case), and no need for strict persistence (no fsync() on every message).
Average case : data is not preserved between restarts
A combination of Boost libraries would be highly preferable.
Multiple processes are writing to the same file simultaneously. When the file size exceeds a limit (for example 10 MB), the current file is renamed (sample.txt to sample1.txt, like a rolling appender) and a new file is created under the same name.
My issue is that with multiple processes writing at the same time, when the size limit is reached and the file should be closed, one of the processes may still be writing to it, and the file rolling does not happen. Can anyone help?
One strategy that I've used also works on a distributed computing system across multiple machines.
If you create a library which will package log messages and then send them via TCP to a destination, then you can have as many processes as you like writing to the same logger. You'd need a server at that destination to receive the log messages and write them to one file.
Generally, inter-process communication happens via either shared memory or networking. With networking we can go not only inter-process but also inter-machine. If we just use the destination localhost or 127.0.0.1, the packet never actually reaches the network card; most drivers are smart enough to just pass the packet to any listening processes, which gives good performance too.
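As a rough illustration of the idea, here is a minimal C++ sketch using POSIX sockets. The collector port (5140) and the connection-per-line approach are assumptions purely for illustration; a real logging library would keep the connection open and batch messages, and the collector process on the other end would be the only writer of the actual log file.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <string>

// Send one log line to a hypothetical collector listening on the loopback
// interface; because the destination is 127.0.0.1 the packet never leaves
// the machine, and only the collector ever touches the log file.
bool send_log_line(const std::string& line)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return false;

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(5140);                      // assumed collector port
    inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);  // loopback destination

    bool ok = connect(fd, (sockaddr*)&addr, sizeof(addr)) == 0 &&
              send(fd, line.data(), line.size(), 0) == (ssize_t)line.size();
    close(fd);
    return ok;
}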
I've been searching for a logging library in C++ for several days now, but somehow I'm not very happy with the existing solutions like Boost.Log or Pantheios. Originally I'm a Java developer. I would like to have a logging library with a logger which behaves more like an object. I would like to do the following things:
Create an instance of a logging object Logger(filepath, filename)
and use a log(severity, message) method to log different messages to the text file
The outstanding problem with these features is that I do not know in advance how many of these logging objects will exist, or whether the files will share the same filepath. Maybe I could handle this with Boost, but I don't understand the example in the "Text multi-file backend" part of the documentation. In particular, what do these code snippets from the example do?
Snippet 1.
// Set up the file naming pattern
backend->set_file_name_composer
(
sinks::file::as_file_name_composer(expr::stream << "logs/" << expr::attr< std::string >("RequestID") << ".log")
);
Snippet 2.
// Set the formatter
sink->set_formatter
(
expr::stream
<< "[RequestID: " << expr::attr< std::string >("RequestID")
<< "] " << expr::smessage
);
This code raises four questions (or issues) in my head:
Does that mean that I just have to set the attribute RequestID and then the logger will decide which file to put the message in? How would I do that?
Is it even possible with Boost to have log files in different paths?
What will happen if different threads access the same file?
Will this code in init_logging() affect the application-wide behaviour of the Boost logging library? Is this done by some kind of ... global variables?
Maybe my thoughts are too naive. Is there even a way to get something like what I described at the beginning of my post?
If you're new to Boost.Log you should read about the library design first; it is quite different from Java. Despite the differences, it is possible to configure the library in a similar way to log4j, and this answer will help get you started.
Now, to your questions:
Does that mean that I just have to set the attribute RequestID and then the logger will decide which file to put the message in? How would I do that?
In the particular case of text_multifile_backend the sink will decide to what file every log record will be written. The set_file_name_composer call sets a function object that composes the log file name, and as you can see, it involves the RequestID attribute. Naturally, you can use whatever attribute(s) you like, including channels. You should also know that text_multifile_backend is not the only way (and probably not the most efficient way) to achieve what you want. If the number of different log files is limited, it is typically better to add several text file sinks, one for each file, and set up filtering so that each sink receives its own log records. This approach is described in the answer I linked above.
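To make the several-sinks-with-filters approach concrete, here is a minimal sketch using channel loggers; the file names and the channel names ("net", "store") are invented for illustration.

#include <string>
#include <boost/log/core.hpp>
#include <boost/log/expressions.hpp>
#include <boost/log/sources/channel_logger.hpp>
#include <boost/log/sources/record_ostream.hpp>
#include <boost/log/utility/setup/file.hpp>

namespace logging = boost::log;
namespace expr = boost::log::expressions;
namespace keywords = boost::log::keywords;
namespace src = boost::log::sources;

int main()
{
    // One sink per file; the filter routes records by their Channel attribute.
    logging::add_file_log(
        keywords::file_name = "logs/network.log",
        keywords::filter = expr::attr< std::string >("Channel") == "net");

    logging::add_file_log(
        keywords::file_name = "logs/storage.log",
        keywords::filter = expr::attr< std::string >("Channel") == "store");

    // Each logger attaches its channel name to every record it emits.
    src::channel_logger< > net_lg(keywords::channel = "net");
    src::channel_logger< > store_lg(keywords::channel = "store");

    BOOST_LOG(net_lg) << "connection established";  // ends up in network.log
    BOOST_LOG(store_lg) << "file written";          // ends up in storage.log
}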
Regarding adding attributes, there are different ways depending on the use case and the attribute set you want to add it to. In the case of channels, this attribute is automatically provided by the logger: you just create the logger with the channel name, and every log record you make through that logger will have it attached as an attribute. The RequestID attribute from the example you pointed to could be added in any number of ways. Here are a few common examples (a short sketch follows the list):
It could be added to a logger manually. This is typical, if you create a logger for processing a request (in a broad meaning - whatever 'request' means in your application), and write all log messages related to the request processing through that logger.
It could be added to a logger as a scoped attribute. This is useful if you don't have a dedicated logger for every request but have a common logger somewhere that is used to write logs related to request processing.
It could be added as a scoped attribute to thread-specific attributes. This will help if request processing involves multiple loggers in different parts of the program, but at a given point of time only a single thread (the current one) is processing a particular request. Other threads may be processing other requests and set their own thread-specific attributes - they will not interfere.
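A minimal sketch of those three options (the function name and the request_id parameter are made up; error handling is omitted):

#include <string>
#include <boost/log/attributes/constant.hpp>
#include <boost/log/attributes/scoped_attribute.hpp>
#include <boost/log/sources/logger.hpp>
#include <boost/log/sources/record_ostream.hpp>

namespace src = boost::log::sources;
namespace attrs = boost::log::attributes;

void handle_request(src::logger& common_lg, const std::string& request_id)
{
    // 1. A dedicated per-request logger with the attribute added manually.
    src::logger request_lg;
    request_lg.add_attribute("RequestID", attrs::constant< std::string >(request_id));
    BOOST_LOG(request_lg) << "processing started";

    // 2. A scoped attribute on a shared logger: attached only while in scope.
    {
        BOOST_LOG_SCOPED_LOGGER_TAG(common_lg, "RequestID", request_id);
        BOOST_LOG(common_lg) << "written through the common logger";
    }

    // 3. A thread-scoped attribute: every record made by this thread,
    //    through any logger, carries RequestID until the scope ends.
    {
        BOOST_LOG_SCOPED_THREAD_TAG("RequestID", request_id);
        BOOST_LOG(common_lg) << "visible to all loggers on this thread";
    }
}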
Is it even possible with Boost to have log files in different paths?
Of course. As I said, this can be done by adding more than one file sink to the core. By its nature, text_multifile_backend is already able to write to more than one file.
What will happen if different threads access the same file?
Boost.Log has support for multithreading. On the sinks level, sink frontends implement thread synchronization. For instance, the synchronous_sink frontend will block contending threads from writing to a single file concurrently. Log records can be written to different sinks concurrently though.
Loggers also have single-threaded and multi-threaded versions, and the latter do additional locking to protect their internal structures from concurrent access. This protection, however, does not extend to sinks (i.e. even if you use an _mt logger, the sink frontend still has to synchronize threads).
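As an illustration, a manually constructed sink using the synchronous_sink frontend might look roughly like this (the file name is arbitrary):

#include <fstream>
#include <boost/make_shared.hpp>
#include <boost/shared_ptr.hpp>
#include <boost/log/core.hpp>
#include <boost/log/sinks/sync_frontend.hpp>
#include <boost/log/sinks/text_ostream_backend.hpp>

namespace logging = boost::log;
namespace sinks = boost::log::sinks;

void init_logging()
{
    // The synchronous_sink frontend serializes access to the backend, so
    // concurrent threads cannot interleave their writes to the same file.
    typedef sinks::synchronous_sink< sinks::text_ostream_backend > sink_t;
    boost::shared_ptr< sink_t > sink = boost::make_shared< sink_t >();

    sink->locked_backend()->add_stream(
        boost::make_shared< std::ofstream >("app.log"));

    // Registering the sink with the core makes it visible application-wide.
    logging::core::get()->add_sink(sink);
}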
Will this code in init_logging() affect the application-wide behaviour of the Boost logging library? Is this done by some kind of ... global variables?
There are a number of singletons in Boost.Log, yes. Most notably the logging core, in which you register all sinks and all global and thread-specific attributes. Adding a new sink will affect the whole application, as records from all loggers will start going to that sink (this is why you should generally configure a sink before adding it to the core). Loggers themselves are not tied to sinks; which sink the log records end up in is determined solely by filters. But as I mentioned, it is possible to associate loggers and sinks with the help of attributes and filters and manage them together. You would have to write a wrapper class that provides the interface you described and, along with a Boost.Log logger, creates and configures the corresponding sink.
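As a very rough sketch of such a wrapper, assuming routing is done by tagging every record with its target file name (this is not an existing Boost.Log component, just one way to glue the pieces together):

#include <string>
#include <boost/log/attributes/constant.hpp>
#include <boost/log/expressions.hpp>
#include <boost/log/sources/record_ostream.hpp>
#include <boost/log/sources/severity_logger.hpp>
#include <boost/log/trivial.hpp>
#include <boost/log/utility/setup/file.hpp>

namespace logging = boost::log;
namespace expr = boost::log::expressions;
namespace keywords = boost::log::keywords;
namespace src = boost::log::sources;
namespace attrs = boost::log::attributes;

class Logger {
public:
    Logger(const std::string& filepath, const std::string& filename)
    {
        const std::string target = filepath + "/" + filename;
        // One file sink per Logger instance, accepting only records whose
        // Target attribute matches this instance's file.
        logging::add_file_log(
            keywords::file_name = target,
            keywords::filter = expr::attr< std::string >("Target") == target);
        lg_.add_attribute("Target", attrs::constant< std::string >(target));
    }

    void log(logging::trivial::severity_level severity, const std::string& message)
    {
        BOOST_LOG_SEV(lg_, severity) << message;
    }

private:
    src::severity_logger< logging::trivial::severity_level > lg_;
};

Usage would then be close to what you described: Logger lg("logs", "worker.log"); lg.log(logging::trivial::info, "started");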
I think you need the log4cxx logging library. It handles log levels when you write to the log file.
Here is a reference to get you started:
http://www.yolinux.com/TUTORIALS/Log4cxx.html
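A minimal usage sketch (assuming a log4cxx.properties file next to the binary that configures a file appender; the logger name "app" is arbitrary):

#include <log4cxx/logger.h>
#include <log4cxx/propertyconfigurator.h>

int main()
{
    // Load appenders, layouts and levels from the properties file.
    log4cxx::PropertyConfigurator::configure("log4cxx.properties");

    log4cxx::LoggerPtr logger = log4cxx::Logger::getLogger("app");
    LOG4CXX_INFO(logger, "application started");
    LOG4CXX_ERROR(logger, "something went wrong");
    return 0;
}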
I've used FileSystemWatcher in the past. However, I am hoping someone can explain how it actually is working behind the scenes.
I plan to utilize it in an application I am making and it would monitor about 5 drives and maybe 300,000 files.
Does the FileSystemWatcher actually do "checking" on the drive - as in, will it cause wear and tear on the drive? Also, does it impact the hard drive's ability to "sleep"?
This is where I do not understand how it works - whether it is scanning the drives on a timer, etc., or whether it's waiting for some type of notification from the OS before it does anything.
I just do not want to implement something that is going to cause extra reads on a drive and keep the drive from sleeping.
Nothing like that. The file system driver simply monitors the normal file operations requested by other programs that run on the machine against the filters you've selected. If there's a match, it adds an entry to an internal buffer that records the operation and the filename, which completes the driver request and raises an event in your program. You'll get the details of the operation passed to you from that buffer.
So nothing extra happens to the operations themselves; there is no extra disk activity at all. It is all just software that runs. The overhead is minimal; nothing slows down noticeably.
The short answer is no. The FileSystemWatcher calls the ReadDirectoryChangesW API passing it an asynchronous flag. Basically, Windows will store data in an allocated buffer when changes to a directory occur. This function returns the data in that buffer and the FileSystemWatcher converts it into nice notifications for you.
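For the curious, the underlying Win32 call looks roughly like this in C++. This is the simpler synchronous form (FileSystemWatcher uses the overlapped, asynchronous form), and the directory path is just an example:

#include <windows.h>
#include <cstdio>

int main()
{
    // Open the directory with FILE_LIST_DIRECTORY access;
    // FILE_FLAG_BACKUP_SEMANTICS is required for directory handles.
    HANDLE dir = CreateFileW(L"C:\\watched",
                             FILE_LIST_DIRECTORY,
                             FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
                             NULL, OPEN_EXISTING,
                             FILE_FLAG_BACKUP_SEMANTICS, NULL);
    if (dir == INVALID_HANDLE_VALUE)
        return 1;

    alignas(DWORD) BYTE buffer[64 * 1024];
    DWORD bytesReturned = 0;

    // Blocks until the OS has buffered change records for this directory;
    // no polling or scanning of the disk is involved.
    while (ReadDirectoryChangesW(dir, buffer, sizeof(buffer), TRUE,
                                 FILE_NOTIFY_CHANGE_FILE_NAME |
                                 FILE_NOTIFY_CHANGE_LAST_WRITE,
                                 &bytesReturned, NULL, NULL))
    {
        FILE_NOTIFY_INFORMATION* info = (FILE_NOTIFY_INFORMATION*)buffer;
        for (;;)
        {
            wprintf(L"change: %.*ls\n",
                    (int)(info->FileNameLength / sizeof(WCHAR)), info->FileName);
            if (info->NextEntryOffset == 0)
                break;
            info = (FILE_NOTIFY_INFORMATION*)((BYTE*)info + info->NextEntryOffset);
        }
    }

    CloseHandle(dir);
    return 0;
}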
I'm working on a little client that interfaces with a game server. The server sends messages to the connected client over HTTP. It's relatively easy to parse the text messages coming into the client and form responses to send back.
Now what I'm trying to figure out is how to break up the process. I want to have a thread receiving the messages, parsing them into some data object, and placing them into an "incoming" queue to be processed. Then another thread reads messages from this queue and processes them (the brains or AI of the client) and makes responses back to the server.
I want the thread that watches the incoming data to do the text processing (break up the messages, pull the important data out, etc.) so the AI thread doesn't have that overhead. But the problem is that the server can send a couple hundred different types of messages to the client (what the client can see, other players, whether you are firing, etc.). I want to package this data into a neat little structure so the AI can handle it quickly, and so the AI can be rewritten easily.
But how do I write a function that can pull something off a queue and know what type of message it is (so I know what data is contained within the message)?
Example messages:
ALIVE (tells you if you are alive)
It has only one data object, the current game time
DAM (tells if you are damaged)
Has a whole bunch of data, who damaged you, how much, what gun it is, if you can see them, etc.
Is it possible to make an object that can handle all of these different message types and be interpreted by a single function? Very few messages have common attributes, so I don't think inheritance or just making one really big message class would work very well...
I'm not looking for a full solution here, just point me in the right direction and hopefully I'll be able to learn a bit on the way :-)
Basically what you're asking about is called a protocol: how data is exchanged and interpreted. Traditionally you'd define your own (and odds are it would start out rather naive -- sending plain text data with newlines to indicate the end of a command, or something like that). After a while you begin to realize that more is needed (how do you handle binary data? how do you handle errors? etc., etc.)
Fortunately there are libraries out there to make life easier for you. These days I tend to favor simple RPC-like libraries for most of my needs. Examples include Protocol Buffers (from Google), Apache Thrift (originally from Facebook), and Apache Avro.
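If you do end up rolling your own, the "how do I know what type of message I pulled off the queue" part of the question can be handled with a tagged union. Here is a rough C++ sketch using std::variant; the ALIVE/DAM message names come from the question, but the field names are invented, and a real client would use a thread-safe queue between the parser thread and the AI thread:

#include <iostream>
#include <queue>
#include <string>
#include <type_traits>
#include <variant>

// Hypothetical message structs modeled on the ALIVE and DAM examples.
struct AliveMsg {
    double game_time;
};

struct DamageMsg {
    std::string attacker;
    int amount;
    std::string weapon;
    bool attacker_visible;
};

// One queue element can hold any of the known message types.
using Message = std::variant<AliveMsg, DamageMsg>;

// The AI side dispatches on the concrete type with std::visit.
void process(const Message& msg)
{
    std::visit([](const auto& m) {
        using T = std::decay_t<decltype(m)>;
        if constexpr (std::is_same_v<T, AliveMsg>)
            std::cout << "alive at t=" << m.game_time << "\n";
        else if constexpr (std::is_same_v<T, DamageMsg>)
            std::cout << "hit by " << m.attacker << " for " << m.amount << "\n";
    }, msg);
}

int main()
{
    std::queue<Message> incoming;  // filled by the parser thread in the real client
    incoming.push(AliveMsg{12.5});
    incoming.push(DamageMsg{"player7", 35, "railgun", true});

    while (!incoming.empty()) {
        process(incoming.front());
        incoming.pop();
    }
}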