I have been searching this site and the Boost.Log doc for a way to do this but have come up empty so far.
The doc (https://www.boost.org/doc/libs/1_74_0/libs/log/doc/html/log/detailed/sink_backends.html) mentions the ability to set a text_stream_backend to flush after each log record written by calling auto_flush(true).
While this works well for debugging, I was wondering whether it is possible to configure a custom number of log records to be received by the core (or sink?) before a flush() occurs. My goal is to strike a balance between useful live logging (I can see the log records frequently enough with a tail -f) and performance.
Alternatively, would it be possible to configure the size of the buffer containing log records so that once it fills up, it gets flushed?
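If there is no built-in option for either, I was considering a small custom sink backend along the lines of the sketch below, which simply counts records and flushes every N of them (the class name, the file handling, and the interval are my own; this is only an illustration of the idea, not tested code).

    #include <cstddef>
    #include <fstream>
    #include <string>

    #include <boost/log/core.hpp>
    #include <boost/log/sinks/basic_sink_backend.hpp>
    #include <boost/log/sinks/sync_frontend.hpp>
    #include <boost/make_shared.hpp>

    namespace sinks = boost::log::sinks;

    // Hypothetical backend: writes formatted records to a file and flushes
    // only every flush_every records instead of after each one.
    class counting_flush_backend
        : public sinks::basic_formatted_sink_backend<char, sinks::synchronized_feeding>
    {
    public:
        counting_flush_backend(const std::string& path, std::size_t flush_every)
            : m_file(path), m_flush_every(flush_every) {}

        void consume(const boost::log::record_view&, const string_type& formatted)
        {
            m_file << formatted << '\n';
            if (m_flush_every != 0 && ++m_count % m_flush_every == 0)
                m_file.flush();
        }

    private:
        std::ofstream m_file;
        std::size_t m_flush_every;
        std::size_t m_count = 0;
    };

    // Registration, e.g. flushing every 64 records:
    //   auto backend = boost::make_shared<counting_flush_backend>("app.log", 64);
    //   auto sink = boost::make_shared<sinks::synchronous_sink<counting_flush_backend>>(backend);
    //   boost::log::core::get()->add_sink(sink);

Would something like this be reasonable, or is there an existing knob I have missed?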
I am using libcurl with CURLOPT_WRITEFUNCTION to download a certain file.
I ask for a certain buffer size using CURLOPT_BUFFERSIZE.
When my callback function is called for the first time and I receive roughly that many bytes, much more data has actually been downloaded already.
For example, if I ask for 1024 bytes of data, by the time I first receive them the process has already consumed about 100K of data (based on Process Explorer and similar tools; I can see the continuous stream of data and ACKs in Wireshark), so I assume it is downloading ahead and buffering the data.
What I am trying to achieve here is the ability to cancel the retrieval based on the first few chunks of data, without downloading anything that is unnecessary.
Is there a way to prevent that sort of buffering and only download the next chunk of data once I have finished processing the current one (or at least not to buffer tens and hundreds of kilobytes)?
I would prefer the solution to be server agnostic, so CURLOPT_RANGE won't actually work here.
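For context, my write callback is essentially the sketch below. I know I can abort the transfer by returning a count different from what libcurl delivered (curl_easy_perform() then fails with CURLE_WRITE_ERROR), but that only stops the transfer; it does not seem to prevent the read-ahead described above. The URL and the abort condition are placeholders.

    #include <curl/curl.h>
    #include <string>

    // Inspect the first chunk(s) and abort early by returning a value that
    // differs from the number of bytes delivered.
    static size_t write_cb(char* ptr, size_t size, size_t nmemb, void* userdata)
    {
        std::string* seen = static_cast<std::string*>(userdata);
        const size_t n = size * nmemb;
        seen->append(ptr, n);

        // Placeholder check: keep going only while the prefix looks acceptable.
        const bool keep_going = seen->size() < 4 || seen->compare(0, 4, "GOOD") == 0;
        return keep_going ? n : 0;   // returning 0 (!= n) aborts the download
    }

    int main()
    {
        curl_global_init(CURL_GLOBAL_DEFAULT);
        CURL* curl = curl_easy_init();
        std::string seen;

        curl_easy_setopt(curl, CURLOPT_URL, "https://example.com/file.bin");
        curl_easy_setopt(curl, CURLOPT_BUFFERSIZE, 1024L);   // size of each callback chunk
        curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_cb);
        curl_easy_setopt(curl, CURLOPT_WRITEDATA, &seen);

        CURLcode rc = curl_easy_perform(curl);   // CURLE_WRITE_ERROR if aborted above
        curl_easy_cleanup(curl);
        curl_global_cleanup();
        return rc == CURLE_OK ? 0 : 1;
    }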
I have my own MediaSink in Windows Media Foundation with one stream. In the OnClockStart method, I instruct the stream to queue (i) MEStreamStarted and (ii) MEStreamSinkRequestSample on itself. To implement the queue, I use IMFMediaEventQueue, and using the mftrace tool I can also see that someone dequeues the events.
The problem is that ProcessSample of my stream is actually never called. This also has the effect that no further samples are requested, because this is done after processing a sample like in https://github.com/Microsoft/Windows-classic-samples/tree/master/Samples/DX11VideoRenderer.
Is the described approach the right way to implement the sink? If not, what would be the right way? If so, where could I search for the problem?
Some background info: The sink is an RTSP sink based on live555. Since the latter is also sink-driven, I thought it would be a good idea to queue a MEStreamSinkRequestSample whenever live555 requests more data from me. This is working as intended.
However, this solution has the problem that new samples are only requested as long as a client is connected to live555. If I now add a tee before the sink, e.g. to show a local preview, the system gets out of control, because the tee accumulates samples on the output of my sink which are never fetched. I then started playing around with discardable samples (cf. https://social.msdn.microsoft.com/Forums/sharepoint/en-US/5065a7cd-3c63-43e8-8f70-be777c89b38e/mixing-rate-sink-and-rateless-sink-on-a-tee-node?forum=mediafoundationdevelopment), but the problem is that, depending on which side is discardable, either the stream does not start, queues keep growing, or the frame rate of the faster sink is artificially limited.
Therefore, the next idea was to rewrite my sink so that it always requests a new sample once it has processed the current one and puts all samples into a ring buffer for live555, so that whenever clients are connected they can retrieve their data from there, and otherwise the samples are simply discarded. This does not work at all: now my sink does not get anything, even without the tee.
The observation is: if I just request a lot of samples (as in the original approach), at some point I get data. However, if I request only one (I also tried moderately larger numbers, up to 5), ProcessSample is just not called, so no subsequent requests can be generated. I send MEStreamStarted once the clock is started or restarted, exactly as described at https://msdn.microsoft.com/en-us/library/windows/desktop/ms701626, and after that I request the first sample. In my understanding, MEStreamSinkRequestSample should not get lost, so I should get something even on a single request. Is that a misunderstanding? Should I keep requesting until I get something?
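For reference, the event queuing on my side boils down to the two helpers sketched below (the event queue is created with MFCreateEventQueue(); I am using the stream-sink event names from the documentation, and the helper names are mine).

    #include <windows.h>
    #include <cguid.h>
    #include <mfidl.h>

    // Signal that the stream sink has started (queued from OnClockStart).
    HRESULT NotifyStarted(IMFMediaEventQueue* pQueue)
    {
        return pQueue->QueueEventParamVar(MEStreamSinkStarted, GUID_NULL, S_OK, nullptr);
    }

    // Ask the pipeline for one sample. Queued once after the started event,
    // and again from ProcessSample() after each delivered sample is handled.
    HRESULT RequestSample(IMFMediaEventQueue* pQueue)
    {
        return pQueue->QueueEventParamVar(MEStreamSinkRequestSample, GUID_NULL, S_OK, nullptr);
    }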
I came across a situation where I have to log the last 1000 events present in a queue.
What would be the best solution to handle this while reducing costly file operations?
At present we are completely rewriting the file with all the queue entries.
Of the two solutions mentioned below, which one is better? Or is there any other option to speed up the logging?
Making the log message size fixed and using the file pointer to do read/write operations (see the ring-file sketch after this list).
Creating multiple files and, when a request comes, reading the last 1000 events from the most recent files.
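For option 1, what I have in mind is roughly the ring file sketched below: fixed-size slots, one seek and one fixed-size write per event (the record size, the capacity, and the file name are placeholders).

    #include <cstdio>
    #include <cstring>
    #include <string>

    constexpr std::size_t RECORD_SIZE = 256;    // placeholder fixed record size
    constexpr std::size_t CAPACITY    = 1000;   // keep the last 1000 events

    // Ring file of fixed-size slots: each log call overwrites exactly one slot.
    class RingLogFile {
    public:
        explicit RingLogFile(const char* path)
            : m_file(std::fopen(path, "r+b"))
        {
            if (!m_file)
                m_file = std::fopen(path, "w+b");   // create on first use
        }
        ~RingLogFile() { if (m_file) std::fclose(m_file); }

        void log(const std::string& msg)
        {
            char slot[RECORD_SIZE] = {};            // pad/truncate to the fixed size
            std::strncpy(slot, msg.c_str(), RECORD_SIZE - 1);
            std::fseek(m_file, static_cast<long>(m_next * RECORD_SIZE), SEEK_SET);
            std::fwrite(slot, 1, RECORD_SIZE, m_file);
            m_next = (m_next + 1) % CAPACITY;       // wrap after CAPACITY slots
        }

    private:
        std::FILE*  m_file;
        std::size_t m_next = 0;                     // next slot to overwrite
    };

A real implementation would also have to persist the write index (e.g. in a small header slot) so that a reader can reconstruct the order of the last 1000 events.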
There are several considerations here that can't all be optimized simultaneously. Among them are:
the latency and throughput of the process emitting the logging messages
the total number of IO operations
the latency of reading log messages
There probably is no "best way". You need to find a working point that suits your requirements.
For example, Nathan Oliver basically suggested in the comments to have the emitting process write to some aux file, and once it is full, to rename aux to log.
This idea has very low latency for the emitter and an essentially optimal total number of IO operations. Conversely (at least depending on the implementation), it has unbounded latency for the reader. Say the logger emits 1700 messages, then indefinitely stops logging: there is no bound on the time it will take the log reader to access the last 700 messages.
So, this idea might be excellent in some settings, but in other settings it might be considered less adequate.
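A minimal sketch of that rename-based emitter (the file names and the message window are illustrative; error handling is omitted):

    #include <cstdio>
    #include <fstream>
    #include <string>

    // "Write to aux, rename when full": the emitter pays only sequential
    // appends plus one rename per window of messages.
    class RenamingLogger {
    public:
        explicit RenamingLogger(std::size_t windowSize) : m_window(windowSize) { open(); }

        void log(const std::string& msg)
        {
            m_aux << msg << '\n';
            if (++m_count == m_window) {
                m_aux.close();
                // Atomic replacement on POSIX; on Windows the old file may need
                // to be removed first (std::rename fails if the target exists).
                std::rename("log.aux", "log.txt");
                m_count = 0;
                open();
            }
        }

    private:
        void open() { m_aux.open("log.aux", std::ios::trunc); }

        std::ofstream m_aux;
        std::size_t   m_window;
        std::size_t   m_count = 0;
    };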
A different way of doing it (with a different working point) is to have the process emitting the messages again write to some aux file. When aux either holds a number of messages exceeding some threshold (possibly less than 1000), or a certain amount of time has passed, the emitter should rename aux to a temporarily named file in a tmp directory.
Meanwhile, a background process can periodically scan the tmp directory. When it sees files there, it should read:
the log file (which is the only file viewed externally)
the files it found in tmp sorted by modification time
It should retain the last 1000 messages (at most), write them to some tmp_log file, rename it to log, and then erase the files it read in tmp.
This has reasonable latency for both emitter and reader, but more total IO accesses. YMMV.
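A rough sketch of such a background merger (the paths and the 1000-message window are illustrative; error handling is omitted):

    #include <algorithm>
    #include <deque>
    #include <filesystem>
    #include <fstream>
    #include <string>
    #include <vector>

    namespace fs = std::filesystem;

    // Combine the current log with the files the emitter dropped into tmp/,
    // keep only the newest `keep` lines, publish the result via rename, and
    // delete the consumed tmp files.
    void merge_once(const fs::path& logFile, const fs::path& tmpDir, std::size_t keep = 1000)
    {
        std::vector<fs::path> pending;
        for (const auto& e : fs::directory_iterator(tmpDir))
            if (e.is_regular_file())
                pending.push_back(e.path());
        if (pending.empty())
            return;

        std::sort(pending.begin(), pending.end(),
                  [](const fs::path& a, const fs::path& b)
                  { return fs::last_write_time(a) < fs::last_write_time(b); });

        std::deque<std::string> window;                  // newest `keep` lines
        auto consume = [&](const fs::path& p) {
            std::ifstream in(p);
            for (std::string line; std::getline(in, line); ) {
                window.push_back(line);
                if (window.size() > keep)
                    window.pop_front();
            }
        };
        if (fs::exists(logFile))
            consume(logFile);                            // existing published log first
        for (const auto& p : pending)
            consume(p);                                  // then tmp files, oldest first

        const fs::path staged = logFile.string() + ".staged";
        {
            std::ofstream out(staged, std::ios::trunc);
            for (const auto& line : window)
                out << line << '\n';
        }
        fs::rename(staged, logFile);                     // publish the new log
        for (const auto& p : pending)
            fs::remove(p);
    }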
When Log4cxx decides to write the logs it has cached to the file (as configured previously), is the decision buffer based or timer based?
Also, can I configure Log4cxx to write the logs when I send them to it, not when it decides to?
When you set your file on the RollingFileAppender with setFile(), you can tell it whether you want buffered IO or not. This option will automatically configure setImmediateFlush() accordingly.
The code for the buffered writer shows that the flushing decision is made based exclusively on size (the buffer is flushed if the buffer plus the new output would exceed the buffer size).
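So if you want every record on disk as soon as you log it, disabling buffered IO (equivalently, keeping immediate flush on) is the way to go. A minimal programmatic sketch follows; the exact signatures vary somewhat between log4cxx versions, and the file name and pattern are illustrative.

    #include <log4cxx/logger.h>
    #include <log4cxx/rollingfileappender.h>
    #include <log4cxx/patternlayout.h>
    #include <log4cxx/helpers/pool.h>

    int main()
    {
        using namespace log4cxx;

        helpers::Pool pool;
        auto* appender = new RollingFileAppender();
        appender->setLayout(LayoutPtr(new PatternLayout(LOG4CXX_STR("%d %-5p %m%n"))));
        appender->setFile(LOG4CXX_STR("app.log"));
        appender->setAppend(true);
        appender->setBufferedIO(false);      // unbuffered writer...
        appender->setImmediateFlush(true);   // ...flushed after every record
        appender->activateOptions(pool);
        Logger::getRootLogger()->addAppender(AppenderPtr(appender));

        LOG4CXX_INFO(Logger::getRootLogger(), "on disk as soon as it is logged");
        return 0;
    }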
I want to get notified about changes in a directory (new file addition/deletion/update).
I used the ReadDirectoryChangesW API, which correctly notifies about any change in the directory. However, the API accepts a buffer in which it returns the details of the file(s) added/deleted/modified in the directory.
This poses a limitation, as the amount of change in the directory is unpredictable and can sometimes be huge, for example 1000 files being added to the directory.
In that case I would always need to be ready with a buffer large enough to hold notifications about all 1000 files.
I don't want to always create this large buffer.
Is there any other alternate way which is more efficient?
If I read the documentation correctly, it will return as many changes as fit in your buffer, and then when you next call it, it will give you more changes. If you want to get 1000 files' worth of changes at once, you've got to give it a big buffer, but if you can handle them in smaller chunks, just pass in a smaller buffer and you'll get the rest of the changes on subsequent calls.
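For example, a synchronous watcher that deliberately reuses a small buffer could look like the sketch below (the directory path and notify filters are illustrative). Changes that happen between calls are tracked by the system for the open handle, and if that internal tracking overflows, the call reports zero bytes and you have to rescan the directory yourself.

    #include <windows.h>
    #include <iostream>
    #include <string>
    #include <vector>

    int main()
    {
        // A directory handle needs FILE_LIST_DIRECTORY access and
        // FILE_FLAG_BACKUP_SEMANTICS.
        HANDLE hDir = CreateFileW(L"C:\\watched", FILE_LIST_DIRECTORY,
            FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
            nullptr, OPEN_EXISTING, FILE_FLAG_BACKUP_SEMANTICS, nullptr);
        if (hDir == INVALID_HANDLE_VALUE)
            return 1;

        std::vector<BYTE> buffer(16 * 1024);   // deliberately modest, reused buffer
        DWORD bytes = 0;
        while (ReadDirectoryChangesW(hDir, buffer.data(), static_cast<DWORD>(buffer.size()),
                   TRUE, FILE_NOTIFY_CHANGE_FILE_NAME | FILE_NOTIFY_CHANGE_LAST_WRITE,
                   &bytes, nullptr, nullptr))
        {
            if (bytes == 0)
            {
                // Too many changes to report: rescan the directory yourself.
                continue;
            }
            auto* info = reinterpret_cast<FILE_NOTIFY_INFORMATION*>(buffer.data());
            for (;;)
            {
                std::wcout << std::wstring(info->FileName,
                               info->FileNameLength / sizeof(WCHAR)) << L'\n';
                if (info->NextEntryOffset == 0)
                    break;
                info = reinterpret_cast<FILE_NOTIFY_INFORMATION*>(
                    reinterpret_cast<BYTE*>(info) + info->NextEntryOffset);
            }
        }
        CloseHandle(hDir);
        return 0;
    }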
One approach is to use ReadDirectoryChangesW() simply as a way to be notified that there has been some change in the directory, and then to use this notification as an event to review the directory for changes yourself.
The idea is to discover what has changed yourself rather than depending on ReadDirectoryChangesW() to tell you what has changed.
The documentation for the function indicates that a system buffer is allocated to track changes, and it is possible, with a large number of changes, that this buffer will overflow. This results in an error being returned and requires you to discover what has changed yourself anyway.
This article on using ReadDirectoryChangesW() may help you.
In my case, I am using the function to monitor a print spooler folder into which a number of text files might be dropped. The number of files is small, so I have just allocated a large buffer. I then use a queue to provide the list of files to print to the actual print functionality, which runs on another thread.
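If the "notification as a trigger" route fits better, one simple version is to keep a snapshot of the directory and diff it against a fresh scan whenever the change notification arrives. The sketch below uses std::filesystem; names are illustrative.

    #include <filesystem>
    #include <iostream>
    #include <map>

    namespace fs = std::filesystem;

    // Snapshot of a directory: file name -> last write time.
    using Snapshot = std::map<fs::path, fs::file_time_type>;

    Snapshot scan(const fs::path& dir)
    {
        Snapshot snap;
        for (const auto& e : fs::directory_iterator(dir))
            if (e.is_regular_file())
                snap[e.path().filename()] = e.last_write_time();
        return snap;
    }

    void report_changes(const Snapshot& before, const Snapshot& after)
    {
        for (const auto& [name, time] : after)
        {
            auto it = before.find(name);
            if (it == before.end())
                std::cout << "added:    " << name.string() << '\n';
            else if (it->second != time)
                std::cout << "modified: " << name.string() << '\n';
        }
        for (const auto& entry : before)
            if (after.find(entry.first) == after.end())
                std::cout << "deleted:  " << entry.first.string() << '\n';
    }

    // Usage, driven by the watcher:
    //   Snapshot current = scan(watchedDir);
    //   // when a change notification arrives:
    //   Snapshot fresh = scan(watchedDir);
    //   report_changes(current, fresh);
    //   current = std::move(fresh);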