How to handle 100 Mbps input stream when my program can process data only at 1 Mbps rate - c++

I am working on a project where the input data stream can arrive at 100 Mbps.
My program may run overnight capturing this data, so it will generate a huge data file. The logic that interprets the data is complex and can process only 1 Mb of data per second.
We also dump the bytes to a log file after they are processed. We do not want to lose any incoming data, and at the same time we want the program to work in real time. So we maintain a circular buffer that acts as a cache.
Right now the only way to keep incoming data from being lost is to increase the size of this buffer.
Please suggest a better way to do this. What alternative ways of caching can I try?

Stream the input to a file. Really, there is no other choice: the data comes in faster than you can process it.
You could create one file per second of input data. That way you can start processing old files while new files are still being streamed to disk.
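A minimal sketch of the chunked-file idea, under some assumptions: ChunkWriter and the chunk_NNNNNN.bin naming are hypothetical, and files are rotated by size rather than by wall-clock time (at 100 Mbps, one second of input is roughly 12.5 MB, so a size limit of that order approximates "one file per second").

```cpp
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <string>
#include <utility>

// Hypothetical helper: writes the incoming stream into numbered chunk files
// (chunk_000000.bin, chunk_000001.bin, ...). A chunk is closed as soon as it
// holds max_bytes, so a separate reader can work on closed chunks.
class ChunkWriter {
public:
    // max_bytes must be greater than zero.
    ChunkWriter(std::string dir, std::size_t max_bytes)
        : dir_(std::move(dir)), max_bytes_(max_bytes) {}

    // Append received bytes, rotating to a new file whenever one fills up.
    void write(const char* data, std::size_t len) {
        while (len > 0) {
            if (!out_.is_open()) open_next();
            const std::size_t room = max_bytes_ - written_;
            const std::size_t n = len < room ? len : room;
            out_.write(data, static_cast<std::streamsize>(n));
            written_ += n;
            data += n;
            len -= n;
            if (written_ == max_bytes_) close();
        }
    }

    void close() {
        if (out_.is_open()) out_.close();
    }

    // Number of chunk files opened so far.
    std::size_t chunks_started() const { return next_index_; }

    static std::string chunk_name(const std::string& dir, std::size_t index) {
        char buf[32];
        std::snprintf(buf, sizeof buf, "chunk_%06zu.bin", index);
        return dir + "/" + buf;
    }

private:
    void open_next() {
        out_.open(chunk_name(dir_, next_index_++),
                  std::ios::binary | std::ios::trunc);
        written_ = 0;
    }

    std::string dir_;
    std::size_t max_bytes_;
    std::size_t written_ = 0;
    std::size_t next_index_ = 0;
    std::ofstream out_;
};
```

The processing side would then pick up chunk files in index order. One way to avoid reading a half-written chunk is to write under a temporary name and rename the file once it is closed; that part is left out here.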

Related

Is there a way prevent libcurl from buffering?

I am using libcurl with CURLOPT_WRITEFUNCTION to download a certain file.
I ask for a certain buffer size using CURLOPT_BUFFERSIZE.
When my callback function is called the first time and I get about that many bytes, much more data has actually been downloaded.
For example, if I ask for 1024 bytes of data, by the time I first get that, the process has already consumed 100K of data (based on Process Explorer and similar tools; I can also see the continuous stream of data and ACKs in Wireshark), so I assume it is downloading in advance and buffering the data.
The thing I am trying to achieve here is to be able to cancel the retrieval based on first few chunks of data without downloading anything that is unnecessary.
Is there a way to prevent that sort of buffering and only download the next chunk of data once I have finished processing the current one (or at least not to buffer tens and hundreds of kilobytes)?
I would prefer the solution to be server agnostic, so CURLOPT_RANGE won't actually work here.

Reading small separated chunks of a large file (C++)

I am reading a proprietary binary data file format. The format is basically header, data, size_of_previous_data, header, data, size_of_previous_data, header, data, size_of_previous_data, ...
The header includes the number of bytes in the next chunk of data, and that same size is also listed immediately after the data. The header is 256 bytes, the data is typically ~2 MB and the size_of_previous_data is a 32-bit int.
The files are generally large (~GB), and I often have to search through tens of them for the data I want. In order to do this, the first thing I do in my code is index each of the files, i.e. read in just the headers and record the location of the associated data (file and byte number). My code basically reads the header using fstream::read(), checks the data size, skips the data using fstream::seekg(), then reads in the size_of_previous_data, and repeats until it reaches the end of the file.
My problem is that this indexing is painfully slow. The data is on an internal 7200 rpm hard drive on my Windows 10 laptop and Task manager shows that my hard drive usage is maxed out, but I am only getting read speeds of about 1.5 MB/s with response times typically >70 ms. I am reading the file using a std::fstream using fstream::get() to read the headers and fstream::seekg() to move to the next header.
I have profiled my code and almost the entire time is spent in the fstream::read() code to read the size_of_previous_data value. I presume that when I do this the data immediately after this is buffered so my fstream::read() to get the next header takes practically no time.
So I am wondering if there is a way to optimise this. Almost my entire buffer in any buffered read is likely to be wasted (97% of it, if it is an 8 kB buffer). Is there a way to shrink this, and is it likely to be worth it (perhaps the underlying OS buffers too, in a way I cannot change)?
Assuming that a disk seek takes about 10 ms (from Latency Numbers Every Programmer Should Know) and that your file is 11 GB consisting of 2 MB chunks, you need about 11 GB / 2 MB = 5500 seeks, so the theoretical minimum running time is 5500 * 10 ms = 55 seconds.
If you're already in that order of magnitude, the most effective way of speeding this up might be to buy an SSD.
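For reference, here is a rough sketch of the indexing loop from the question. The position of the size field inside the header (kSizeFieldOffset) and its byte order are assumptions, since the format is proprietary; IndexEntry and index_file are illustrative names. The sketch skips the trailing size_of_previous_data instead of reading it, which saves one read per record but still costs one seek per record, so the seek-time estimate above still applies.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <fstream>
#include <string>
#include <vector>

constexpr std::size_t kHeaderSize = 256;     // from the question
constexpr std::size_t kSizeFieldOffset = 0;  // ASSUMPTION: location of the data size in the header

struct IndexEntry {
    std::uint64_t data_offset;  // byte offset of the data chunk in the file
    std::uint32_t data_size;    // size of the data chunk in bytes
};

// Reads only the headers and records where each data chunk lives.
std::vector<IndexEntry> index_file(const std::string& path) {
    std::vector<IndexEntry> index;
    std::ifstream in(path, std::ios::binary);
    char header[kHeaderSize];
    std::uint64_t pos = 0;  // offset of the current header

    while (in.read(header, kHeaderSize)) {
        std::uint32_t size = 0;
        // ASSUMPTION: the size is stored in the host's byte order.
        std::memcpy(&size, header + kSizeFieldOffset, sizeof size);
        index.push_back({pos + kHeaderSize, size});

        // Jump over the data and the 4-byte size_of_previous_data trailer.
        pos += kHeaderSize + size + sizeof(std::uint32_t);
        in.seekg(static_cast<std::streamoff>(pos));
    }
    return index;
}
```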

Buffering to the hard disk

I am receiving a large quantity of data at a fixed rate. I need to do some processing on this data on a different thread, but this may run slower than the data is coming in, so I need to buffer the data. Due to the quantity of data coming in the available RAM would be quickly exhausted, so it needs to overflow onto the hard disk. What I could do with is something like a filesystem-backed pipe, so the writer could be blocked by the filesystem, but not by the reader running too slowly.
Here's a rough set of requirements:
Writing should not be blocked by the reader running too slowly.
If data is read slowly enough that the available RAM is exhausted, it should overflow to the filesystem. It's OK for writes to the disk to block.
Reading should block if no data is available unless the stream has been closed by the writer.
If the reader is able to keep up with the data then it should never hit the hard disk as the RAM buffer would be sufficient (nice but not essential).
Disk space should be recovered as the data is consumed (or soon after).
Does such a mechanism exist in Windows?
This looks like a classic message queue. Did you consider MSMQ or similar? MSMQ has all the properties you are asking for. You may want to use direct addressing to avoid Active Directory (see http://msdn.microsoft.com/en-us/library/ms700996(v=vs.85).aspx ) and use a local or TCP/IP queue address.
Use an actual file. Write to the file as the data is received, and in another process read the data from the file and process it.
You even get the added benefit of not needing multithreading.
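A rough sketch of the "actual file" approach, assuming length-prefixed records; FileQueueWriter and FileQueueReader are made-up names. The reader polls instead of blocking and the file is never trimmed, so this covers only part of the requirements listed in the question. The OS file cache will typically serve recently written data from RAM, but that is not guaranteed.

```cpp
#include <cstdint>
#include <fstream>
#include <string>
#include <utility>

// Writer side: appends length-prefixed records and flushes after each one
// so that a reader on the same file can see them.
class FileQueueWriter {
public:
    explicit FileQueueWriter(const std::string& path)
        : out_(path, std::ios::binary | std::ios::app) {}

    void push(const std::string& record) {
        const std::uint32_t len = static_cast<std::uint32_t>(record.size());
        out_.write(reinterpret_cast<const char*>(&len), sizeof len);
        out_.write(record.data(), len);
        out_.flush();
    }

private:
    std::ofstream out_;
};

// Reader side: remembers how far it has read and retries from that offset.
class FileQueueReader {
public:
    explicit FileQueueReader(const std::string& path)
        : in_(path, std::ios::binary) {}

    // Returns false if no complete record is available yet.
    bool try_pop(std::string& record) {
        in_.clear();                      // reset EOF/fail state from earlier attempts
        in_.seekg(offset_, std::ios::beg);
        std::uint32_t len = 0;
        if (!in_.read(reinterpret_cast<char*>(&len), sizeof len)) return false;
        std::string buf(len, '\0');
        if (len > 0 && !in_.read(&buf[0], len)) return false;  // record only partly written
        offset_ += static_cast<std::streamoff>(sizeof len) + len;
        record = std::move(buf);
        return true;
    }

private:
    std::ifstream in_;
    std::streamoff offset_ = 0;
};
```

The writer and the reader can live in different processes as long as both open the same path. A real version would also need a way to signal "writer closed" and to reclaim disk space, for example by rotating to a new file periodically and deleting fully consumed ones.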

Replaying stored data at a fixed rate

I am working on a problem where I want to replay data stored in a file at a specified rate, for example 25,000 records/second.
The file is in ASCII format. Currently, I read each line of the file and apply a regex to extract the data. 2-4 lines make up a record. I timed this operation and it takes close to 15 microseconds to generate each record.
The time taken to publish each record is 6 microseconds.
If I perform the reading and writing sequentially, then I would end up with 21 microseconds to publish each record. So effectively, this means my upper bound is ~47K records per second.
If I decide to multithread the reading and writing, then I will be able to send out a packet every 9 microseconds (neglecting the locking penalty, since reader and writer share the same queue), which gives a throughput of 110K ticks per second.
Is my design above correct?
What kind of queue and locking construct has the minimum penalty when a single producer and a single consumer share a queue?
If I would like to scale beyond this, what's the best approach?
My application is in C++
If it takes 15 us to read/prepare a record, then your maximum throughput will be about 1 sec / 15 us = 67K records/sec. You can ignore the 6 us part, as the single thread reading the file cannot generate more records than that. (Try it: change the program to only read/process and discard the output.) I'm not sure how you got 9 us.
To make this fly beyond 67k/sec ...
A) Estimate the maximum number of records per second you can read from the disk to be formatted. While this depends a lot on hardware, a figure of 20 MB/sec is typical for an average laptop. This number gives you the upper bound to aim for, and as you get close to it you can ease off trying.
B) Create a single thread just to read the file and incur the IO delay. This thread should write to large preallocated buffers, say 4 MB each. See http://en.wikipedia.org/wiki/Circular_buffer for a way of managing these. You are looking to hold maybe 1000 records per buffer (a guess, but certainly not just 8 or so records). Pseudocode:
    while not EOF
        Allocate big buffer
        While not EOF and not buffer full
            Read file using fgets() or whatever
            Apply only very small preprocessing, ideally none
            Save into buffer
        Release buffer for other threads
C) Create another thread (or several, if the order of records is not important) to process a ring buffer when it is full; this is your regex step. This thread in turn writes to another set of output ring buffers. (Tip: keep the ring buffer control structures apart in memory.)
    While run-program
        Wait/get an input buffer to process, semaphores/mutex/whatever you prefer
        Allocate output buffer
        Process records from input buffer,
        Place result in output buffer
        Release output buffer for next thread
        Release input buffer for reading thread
D) Create your final thread to consume the data. It isn't clear whether this output is being written to disk or to the network, so this might affect the disk-reading thread.
    Wait/get input buffer from processed records pool
    Output records to wherever
    Return buffer to processed records pool
Notes.
Preallocate all buffers and pass them back to where they came from. E.g. you might have 4 buffers between the file reading thread and the processing threads; when all 4 are in use, the file reader waits for one to be free, it doesn't just allocate new buffers.
Try not to memset() buffers if you can avoid it; it is a waste of memory bandwidth.
You won't need many buffers, perhaps 6 per ring buffer?
The system will auto tune to slowest thread ( http://en.wikipedia.org/wiki/Theory_of_constraints ) so if you can read and prepare data faster than you want to output it, all the buffers will fill up and everything will pause except the output.
As the threads are passing reasonable amounts of data each sync point, the overhead of this will not matter too much.
The above design is how some of my code reads CSV files as quickly as possible; basically it all comes down to input IO bandwidth as the limiting factor.
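The buffer hand-off described in B) to D) could be sketched roughly as follows, collapsed to two stages for brevity. This is an illustration under assumptions, not the code the answer refers to: BlockingQueue, Buffer and run_pipeline are hypothetical names, a vector of strings stands in for a 4 MB raw buffer, and a mutex plus condition variable is used where a lock-free ring would also work.

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Minimal blocking queue used to hand whole buffers between threads.
template <typename T>
class BlockingQueue {
public:
    void push(T item) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push_back(std::move(item));
        }
        cv_.notify_one();
    }
    // Blocks until an item is available; returns false once closed and drained.
    bool pop(T& item) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty() || closed_; });
        if (q_.empty()) return false;
        item = std::move(q_.front());
        q_.pop_front();
        return true;
    }
    void close() {
        {
            std::lock_guard<std::mutex> lock(m_);
            closed_ = true;
        }
        cv_.notify_all();
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<T> q_;
    bool closed_ = false;
};

using Buffer = std::vector<std::string>;  // one buffer holds many records

// A reader thread fills buffers from `lines`; the calling thread consumes them.
// buffer_count and records_per_buffer must both be at least 1.
// Returns the number of records consumed.
std::size_t run_pipeline(const std::vector<std::string>& lines,
                         std::size_t buffer_count,
                         std::size_t records_per_buffer) {
    BlockingQueue<Buffer> free_buffers, full_buffers;
    for (std::size_t i = 0; i < buffer_count; ++i) {
        Buffer b;
        b.reserve(records_per_buffer);  // preallocate once, reuse afterwards
        free_buffers.push(std::move(b));
    }

    std::thread reader([&] {
        std::size_t i = 0;
        Buffer buf;
        // Waits here when every buffer is in use (back-pressure).
        while (i < lines.size() && free_buffers.pop(buf)) {
            buf.clear();
            while (i < lines.size() && buf.size() < records_per_buffer)
                buf.push_back(lines[i++]);
            full_buffers.push(std::move(buf));
        }
        full_buffers.close();
    });

    std::size_t consumed = 0;
    Buffer buf;
    while (full_buffers.pop(buf)) {
        consumed += buf.size();             // stand-in for "output records"
        free_buffers.push(std::move(buf));  // hand the buffer back to the reader
    }
    reader.join();
    return consumed;
}
```

Because each synchronisation point passes a whole buffer rather than a single record, the locking cost is spread over many records, which is the point made in the notes above.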

Using libpcap, is there a way to determine the file offset of a captured packet from an offline pcap file?

I'm writing a program to reconstruct TCP streams captured by Snort. Most of the examples I've read regarding session reconstruction either:
load the entire pcap file into memory to start with (not a solution because of hardware constraints and the fact that some of the capture files are 10 GB in size), or
cache each packet in memory as they read through the capture and discard the irrelevant ones as they go; this presents basically the same problems as reading the entire file into memory
My current solution was to write my own pcap file parser, since the format is simple. I save the offset of each packet in a vector and can reload each one after I've passed it. This, like libpcap, only streams one packet into memory at a time; I am only using sequence numbers and flags for ordering, NOT the packet data. Compared with libpcap, it is noticeably slower: processing a 570 MB capture with libpcap takes roughly 0.9 seconds whereas my code takes 3.2 seconds. However, I have the advantage of being able to seek backwards without reloading the entire capture.
If I were to stick with libpcap for speed reasons, I was thinking I could just make a currentOffset variable with an initial value of 24 (the size of the pcap file global header), push it to a vector every time I load a new packet, and increment it every time I call pcap_next_ex by the size of the packet + 16 (for the size of the pcap record header). Then, whenever I wanted to read an individual packet, I could load it using conventional means and seek to packetOffsets[packetNumber].
Is there a better way to do this using libpcap?
Solved the problem myself.
Before I call pcap_next_ex, I push ftell(pcap_file(myPcap)) into a vector<unsigned long>. I manually parse the packets after that as needed.
EZPZ. It just took 24+ hours of brain wrack...
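For comparison, the manual bookkeeping described in the question (start at 24, add 16 plus the packet size per record) can be sketched as below. It assumes the classic pcap file format rather than pcapng, and that "size of the packet" means the captured length stored in the file (caplen), not the original length on the wire. packet_offsets is an illustrative name.

```cpp
#include <cstdint>
#include <vector>

constexpr std::uint64_t kPcapGlobalHeaderSize = 24;  // classic pcap file header
constexpr std::uint64_t kPcapRecordHeaderSize = 16;  // per-packet record header

// Given the captured length of each packet in file order, returns the file
// offset at which each packet's record header starts.
std::vector<std::uint64_t> packet_offsets(const std::vector<std::uint32_t>& caplens) {
    std::vector<std::uint64_t> offsets;
    offsets.reserve(caplens.size());
    std::uint64_t current = kPcapGlobalHeaderSize;
    for (std::uint32_t caplen : caplens) {
        offsets.push_back(current);
        current += kPcapRecordHeaderSize + caplen;
    }
    return offsets;
}
```

The ftell approach from the answer gets the same offsets without this arithmetic, since it asks the underlying FILE for its position directly.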