Could I read lines from a C++ socket using Ubuntu? - c++

I wonder if I could read several lines from a C++ socket using Ubuntu?
Please note that every line is to be used for a different purpose (e.g. maybe the first is used as a string and the second as a char array).
I.e. Could I put those two lines directly after each other without encountering any problem?
read(socketFileDescriptor, buffer1, BUFFER_SIZE);
read(socketFileDescriptor, buffer2, BUFFER_SIZE);
Thanks in advance,
Regards,

You can call read twice in sequence without any problem in itself.
What you get from each call may not correspond to a single line of input though. read basically just does "raw" reading, much like it does when reading from a file on disk--if data is available, it reads whatever is there, up to the size you specified, which may be less than you asked for and may span more than one line.
TCP treats data as a stream, so data you pass to two (or more) separate calls to write could end up being put into a single packet and transmitted together. On the receiving end, all that data could be read by a single call to read--or, depending on the buffer size you pass, it might read only part of one, or might read all of the first and part of the second, etc.
If you want to read the input as "lines", you could (for one example) create a stream buffer that reads data from a socket, and create an iostream object that parses data from that buffer to read lines. This initially seems attractive to many people (it did to me, anyway), but has never worked out very well, at least for me. Iostreams basically assume a synchronous protocol, but sockets are mostly asynchronous. Trying to treat sockets as synchronous tends to lead to more problems rather than to solutions.
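If you do decide to split on newlines yourself rather than wiring up iostreams, here is a minimal sketch (my own names, not a standard API) assuming a blocking socket and '\n'-terminated lines:
// A minimal sketch, not a streambuf: keep whatever read() returns and hand
// back one '\n'-terminated line at a time. Assumes a blocking socket.
#include <unistd.h>
#include <cstddef>
#include <stdexcept>
#include <string>

class LineReader {
public:
    explicit LineReader(int fd) : fd_(fd) {}

    // Returns false on orderly shutdown with no pending data.
    bool readLine(std::string &line) {
        std::string::size_type pos;
        while ((pos = pending_.find('\n')) == std::string::npos) {
            char buf[4096];
            ssize_t n = ::read(fd_, buf, sizeof buf);
            if (n < 0) throw std::runtime_error("read failed");
            if (n == 0) {                  // peer closed the connection
                if (pending_.empty()) return false;
                line.swap(pending_);       // hand back the unterminated tail
                pending_.clear();
                return true;
            }
            pending_.append(buf, static_cast<std::size_t>(n));
        }
        line.assign(pending_, 0, pos);     // the line, without the '\n'
        pending_.erase(0, pos + 1);        // keep the rest for the next call
        return true;
    }

private:
    int fd_;
    std::string pending_;
};
The first line returned could then be kept as a std::string and the second copied into a char array, as in the question.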

Related

Parser for TCP buffers

I want to implement a protocol to share data between a server and a client.
I don't know which one is right. With performance as the main criterion, can anyone suggest the best protocol for parsing the data?
I have one in mind; I don't know its actual name, but it looks like this:
[Header][Message][Header][Message]
The header contains the length of the message, and the header size is fixed.
I have tried implementing this by performing a lot of concatenation and substring operations, which are costly. Can anyone suggest the best implementation for this?
The question is very broad.
On the topic of avoiding buffer/string concatenation, look at Buffer Sequences, described in Boost Asio's "Scatter-Gather" documentation.
For parsing there are two common solutions:
small messages
Receive the data into a buffer, e.g. 64k. Then use pointers into that buffer to parse the header and message. Since the messages are small there can be many messages in the buffer, and you would call the parser again as long as there is data left in the buffer. Note that the last message in the buffer might be truncated, in which case you have to keep the partial message and read more data into the buffer; if the message is near the end of the buffer, copying it to the front might be necessary (see the sketch after the note below).
large messages
With large messages it makes sense to first only read the header. Then parse the header to get the message size, allocate an appropriate buffer for the message and then read the whole message into it before parsing it.
Note: In both cases you might want to handle overly large messages by either skipping them or terminating the connection with an error. For the first case a message can not be larger than the buffer and should be a lot smaller. For the second case you don't want to allocate e.g. a gigabyte to buffer a message if they are supposed to be around 1MB only.
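As a rough sketch of the small-messages case (the 4-byte little-endian length header and handleMessage() are assumptions, not part of any real protocol), it could look something like this:
// Parse as many complete [4-byte length][payload] records as are in the
// buffer, then move the partial tail to the front and read more after it.
#include <cstdint>
#include <cstring>
#include <sys/socket.h>

void handleMessage(const char *data, std::uint32_t len);   // your code

void receiveLoop(int fd) {
    char buf[64 * 1024];
    std::size_t used = 0;                          // bytes currently in buf
    for (;;) {
        ssize_t n = ::recv(fd, buf + used, sizeof buf - used, 0);
        if (n <= 0) break;                         // error or connection closed
        used += static_cast<std::size_t>(n);

        std::size_t off = 0;
        while (used - off >= 4) {                  // enough bytes for a header?
            std::uint32_t len;
            std::memcpy(&len, buf + off, 4);       // assumes a little-endian sender
            if (len > sizeof buf - 4) return;      // oversized message: give up
            if (used - off < 4 + len) break;       // last message is truncated
            handleMessage(buf + off + 4, len);
            off += 4 + len;
        }
        std::memmove(buf, buf + off, used - off);  // keep the partial tail
        used -= off;
    }
}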
For sending messages it's best to first collect all the output. A std::vector can be sufficient. Or a rope of strings. Avoid copying the message into a larger buffer over and over. At most copy it once at the end when you have all the pieces. Using writev() to write a list of buffers instead of copying them all into one buffer can also be a solution.
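A hedged sketch of the writev() idea, where header and body are just stand-in strings; a robust version would also retry if writev() reports a short write:
#include <sys/uio.h>
#include <string>

bool sendMessage(int fd, const std::string &header, const std::string &body) {
    iovec iov[2];
    iov[0].iov_base = const_cast<char *>(header.data());
    iov[0].iov_len  = header.size();
    iov[1].iov_base = const_cast<char *>(body.data());
    iov[1].iov_len  = body.size();
    ssize_t n = ::writev(fd, iov, 2);      // both pieces go out in one call
    return n == static_cast<ssize_t>(header.size() + body.size());
}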
As for the best protocol... What's best? Simply sending the data in binary format is fastest but will break when you have different architectures or versions. Something like Google protocol buffers (protobuf) can solve that, but at the cost of some speed. It all depends on what your needs are.

C++ - What is the need of both buffer and stream?

Although I have read about buffers and streams and how they work with files in C++, I don't know what the need of a buffer is if a stream is there; the stream is always there to transfer a file's data to the program. So why do we use buffers to store data (seemingly performing the same task that a stream does), and what are buffered and unbuffered streams?
Consider a stream that writes to a file. If there were no buffer, then every time your program wrote a single byte to the stream, it would have to write a single byte to the file. That's very inefficient. So streams have buffers to decouple operations on one side of the stream from operations on the other side of the stream.
OK, let's start from scratch. Suppose you want to work with files. For this purpose you would have to manage how data gets into your file, whether sending the data to the file succeeded or not, and all the other basic details. Now, either you can manage all that on your own, which would take a lot of time and hard work, or you can use a stream.
Yes, you can use a stream for such purposes. Streams work through an abstraction mechanism, i.e. we C++ programmers don't have to know how they work internally; we only know that we are at one end of the stream (our program's side), we offer our data to the stream, and it has the responsibility to transfer the data to the other end (the file's side).
E.g.:
ofstream file("abc.txt"); //Here an object of output file stream is created
file<<"Hello"; //We are just giving our data to stream and it transfers that
file.close(); //The closing of file
Now, if you work with files you should know that file access is a really expensive operation, i.e. it takes more time to access a file than to access memory, and we don't want to perform file operations every single time. So programmers came up with the buffer: a part of the computer's memory that stores data temporarily while handling files.
Suppose that, instead of going to the file every time you need data, you just read some memory location where the file's data has been copied temporarily. That is a much less expensive task, because you are reading memory, not the file.
Streams that use a buffer in their work, i.e. they open the file and by default copy data from the file into the buffer, are called buffered streams, whereas streams that don't use any buffer are called unbuffered streams.
Now, if you write data to a buffered stream, that data is queued up until the stream is flushed (flushing means writing the buffer's contents out to the file). With an unbuffered stream the data reaches the file sooner (from the point of view of the user at one end of the stream), because it is not stored temporarily in a buffer and is sent to the file as it arrives at the stream.
A buffer and a stream are different concepts.
A buffer is a part of the memory to temporarily store data. It can be implemented and structured in various ways. For instance, if one wants to read a very large file, chunks of the file can be read and stored in the buffer. Once a certain chunk is processed the data can be discarded and the next chunk can be read. A chunk in this case could be a line of the file.
Streams are the way C++ handles input and output. Their implementation uses buffers.
I do agree that streams are probably the most poorly written and most badly understood part of the standard library. People use them every day, and many haven't the slightest clue how the constructs they use actually work. For a little fun, ask around what std::endl is - you might find some of the answers funny.
At any rate, streams and streambufs have different responsibilities. Streams are supposed to provide formatted input and output - that is, translate an integer to a sequence of bytes (or the other way around) - while buffers are responsible for conveying the sequence of bytes to the medium.
Unfortunately, this design is not obvious from the implementation. For instance, we have all those numerous streams - file stream and string stream, for example - while the only difference between them is the buffer; the stream code remains exactly the same. I believe many people would redesign streams if they had their way, but I am afraid this is not going to happen.
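To make that split concrete, here is a small sketch using only standard components (the file name "report.txt" is just for illustration): the same formatting code runs on top of whichever streambuf carries the bytes.
#include <fstream>
#include <iostream>
#include <sstream>

void writeReport(std::ostream &out) {      // formatting layer, media-agnostic
    out << "answer = " << 42 << '\n';
}

int main() {
    std::stringbuf inMemory;               // buffer that keeps bytes in memory
    std::filebuf   onDisk;
    onDisk.open("report.txt", std::ios::out);

    std::ostream memStream(&inMemory);     // same stream class, different buffers
    std::ostream fileStream(&onDisk);

    writeReport(memStream);
    writeReport(fileStream);

    std::cout << inMemory.str();           // what the in-memory buffer collected
}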

Can TCP data overlap in the buffer

If I keep sending data to a receiver, is it possible for the sent data to run together, so that it accumulates in the buffer and the next read from the buffer also picks up data from another send?
I'm using Qt and readAll() to receive data and parse it. The data has some structure, so I can tell whether it is already complete and whether it is valid at all, but I'm worried that data from one send will overlap with another when I call readAll() and so invalidate this supposed-to-be valid data.
If it can happen, how do I prevent/control it? Or is that something the OS/API worries about instead? I'm worried partly because of how the method is called. lol
TCP is a stream-based connection, not a packet-based connection, so you may not assume that what is sent in one go will also be received in one go. You still need some kind of protocol to packetize your stream.
For sending strings, you could use the NUL character as a separator, or you could begin with a header that contains a magic number and a length.
According to http://qt-project.org/doc/qt-4.8/qiodevice.html#readAll this function snarfs all the data and returns it as an array. I don't see how the API raises concerns about overlapping data. The array is returned by value, and given that it represents the entire stream, what would it even overlap with? Are you worried that the returned object actually has reference semantics (i.e. that it just holds pointers to storage that is re-used in other calls to the same function)?
If send and receive buffers overlap in any system, that's a bug, unless special care is taken that the use is completely serialized (i.e. a buffer is only ever used for sending or only ever for receiving, without any mixup).
Why don't you use a fixed-length header followed by a variable-length packet, with the header holding the length of the packet?
This way you can avoid worrying about packet boundaries. For example, instead of just sending the string, send the length of the string followed by the string. On the receiving end, always read the length first and then, based on the length, read the string.
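A minimal sketch of that header-first approach using plain POSIX recv() (the 4-byte little-endian length field is an assumption about the protocol; a real implementation should also put a sanity cap on the length):
#include <cstdint>
#include <string>
#include <sys/socket.h>

bool readExact(int fd, char *dst, std::size_t len) {
    while (len > 0) {
        ssize_t n = ::recv(fd, dst, len, 0);
        if (n <= 0) return false;          // error or peer closed mid-message
        dst += n;
        len -= static_cast<std::size_t>(n);
    }
    return true;
}

bool readMessage(int fd, std::string &msg) {
    std::uint32_t len = 0;
    if (!readExact(fd, reinterpret_cast<char *>(&len), sizeof len)) return false;
    msg.resize(len);                       // consider a sanity limit on len here
    return len == 0 || readExact(fd, &msg[0], len);
}
The same idea works with Qt: accumulate whatever readAll() returns and only hand a message to the parser once the announced length has fully arrived.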

Buffering Incomplete High Speed Reads

I am reading ~100 bytes of data at 100 Hz from a serial port. My buffer is 1024 bytes, so often my buffer doesn't get completely used. Sometimes, however, I get hiccups from the serial port and the buffer gets filled up.
My data is organized as a [header]data[checksum]. When my buffer gets filled up, sometimes a message/data is split across two reads from the serial port.
This is a simple problem, and I'm sure there are a lot of different approaches. I am ahead of schedule so I would like to research different approaches. Could you guys name some paradigms that cover buffering in high speed data that might need to be put together from two reads? Note, the main difference I see in this problem from say other buffering I've done (image acquisition, tcp/ip), is that there we are guaranteed full packets/messages. Here a "packet" may be split between reads, which we will only know once we start parsing the data.
Oh yes, note that the data buffered in from the read has to be parsed, so to make things simple, the data should be contiguous when it reaches the parsing. (Plus I don't think that's the parser's responsibility)
Some Ideas I Had:
Carry over unused bytes to my original buffer, then fill it with the read after the left over bytes from the previous read. (For example, we read 1024 bytes, 24 bytes are left at the end, they're a partial message, memcpy to the beginning of the read_buffer_, pass the beginning + 24 to read and read in 1024 - 24)
Create my own class that just gets blocks of data. It has two pointers, read/write and a large chunk of memory (1024 * 4). When you pass in the data, the class updates the write pointer correctly, wraps around to the beginning of its buffer when it reaches the end. I guess like a ring buffer?
I was thinking maybe using a std::vector<unsigned char>. Dynamic memory allocation, guaranteed to be contiguous.
Thanks for the info guys!
Define some 'APU' application-protocol-unit class that will represent your '[header]data[checksum]'. Give it some 'add' function that takes a char parameter and returns a 'valid' bool. In your serial read thread, create an APU and read some data into your 1024-byte buffer. Iterate the data in the buffer, pushing it into the APU add() until either the APU add() function returns true or the iteration is complete. If the add() returns true, you have a complete APU - queue it off for handling, create another one and start add()-ing the remaining buffer bytes to it. If the iteration is complete, loop back round to read more serial data.
The add() method would use a state-machine, or other mechanism, to build up and check the incoming bytes, returning 'true' only in the case of a full sanity-checked set of data with the correct checksum. If some part of the checking fails, the APU is 'reset' and waits to detect a valid header.
The APU could maybe parse the data itself, either byte-by-byte during the add() data input, just before add() returns with 'true', or perhaps as a separate 'parse()' method called later, perhaps by some other APU-processing thread.
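As a rough sketch of such an APU (the 0x7E start marker, the one-byte length field and the additive checksum are assumptions made only to keep the example short):
// Feed it one byte at a time; add() returns true when a complete,
// checksum-verified unit has arrived.
#include <cstddef>
#include <cstdint>
#include <vector>

class Apu {
public:
    bool add(unsigned char byte) {
        switch (state_) {
        case State::Header:
            if (byte == 0x7E) {            // wait for the start-of-frame marker
                data_.clear();
                sum_ = 0;
                state_ = State::Length;
            }
            return false;
        case State::Length:
            expected_ = byte;
            state_ = expected_ ? State::Data : State::Checksum;
            return false;
        case State::Data:
            data_.push_back(byte);
            sum_ = static_cast<std::uint8_t>(sum_ + byte);
            if (data_.size() == expected_) state_ = State::Checksum;
            return false;
        case State::Checksum:
            state_ = State::Header;        // ready for the next unit either way
            return byte == sum_;           // false means: reset and resync
        }
        return false;
    }

    const std::vector<unsigned char> &data() const { return data_; }

private:
    enum class State { Header, Length, Data, Checksum };
    State state_ = State::Header;
    std::size_t expected_ = 0;
    std::uint8_t sum_ = 0;
    std::vector<unsigned char> data_;
};
The serial read loop then just walks its 1024-byte buffer, calling add() on each byte, and queues data() for the parser whenever add() returns true; any partial unit simply continues with the bytes from the next read.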
When reading from a serial port at speed, you typically need some kind of handshaking mechanism to control the flow of data. This can be hardware (e.g. RTS/CTS), software (Xon/Xoff), or controlled by a higher level protocol. If you're reading a large amount of data at speed without handshaking, your UART or serial controller needs to be able to read and buffer all the available data at that speed to ensure no data loss. On 16550 compatible UARTs that you see on Windows PCs, this buffer is just 14 bytes, hence the need for handshaking or a real time OS.

Reading large txt efficiently in c++

I have to read a large text file (> 10 GB) in C++. It is a CSV file with variable-length lines. When I try to read it line by line using ifstream, it works but takes a long time; I guess this is because each time I read a line it goes to the disk and reads, which makes it very slow.
Is there a way to read in buffers, for example reading 250 MB in one shot (using the read method of ifstream) and then getting lines from that buffer? I see a lot of issues with such a solution, like the buffer ending with an incomplete line, etc.
Is there a solution for this in C++ which handles all these cases? Are there any open-source libraries that can do this, for example Boost?
Note: I would like to avoid C-style FILE* pointers.
Try using the Windows memory-mapped file functions. The calls are buffered and you get to treat the file as if it's just memory.
memory mapped files
IOstreams already use buffers much as you describe (though usually only a few kilobytes, not hundreds of megabytes). You can use pubsetbuf to get it to use a larger buffer, but I wouldn't expect any huge gains. Most of the overhead in IOstreams stems from other areas (like using virtual functions), not from lack of buffering.
If you're running this on Windows, you might be able to gain a little by writing your own stream buffer, and having it call CreateFile directly, passing (for example) FILE_FLAG_SEQUENTIAL_SCAN or FILE_FLAG_NO_BUFFERING. Under the circumstances, either of these may help your performance substantially.
If you want real speed, then you're going to have to stop reading lines into std::string, and start using char*s into the buffer. Whether you read that buffer using ifstream::read() or memory mapped files is less important, though read() has the disadvantage you note about potentially having N complete lines and an incomplete one in the buffer, and needing to recognise that (can easily do that by scanning the rest of the buffer for '\n' - perhaps by putting a NUL after the buffer and using strchr). You'll also need to copy the partial line to the start of the buffer, read the next chunk from file so it continues from that point, and change the maximum number of characters read such that it doesn't overflow the buffer. If you're nervous about FILE*, I hope you're comfortable with const char*....
As you're proposing this for performance reasons, I do hope you've profiled to make sure that it's not your CSV field extraction etc. that's the real bottleneck.
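For what it's worth, a rough sketch of that chunk-and-scan approach (processLine() is a placeholder for the CSV handling; '\n' line endings are assumed):
// Read big blocks with ifstream::read(), scan them in place for '\n', and
// carry any partial last line over to the next block.
#include <cstring>
#include <fstream>
#include <vector>

void processLine(const char *begin, const char *end);      // your CSV code

void readByChunks(const char *path) {
    std::ifstream in(path, std::ios::binary);
    std::vector<char> buf(1 << 20);        // 1 MB block; tune as needed
    std::size_t carried = 0;               // bytes of a partial line kept at the front

    while (in) {
        in.read(buf.data() + carried,
                static_cast<std::streamsize>(buf.size() - carried));
        std::size_t avail = carried + static_cast<std::size_t>(in.gcount());
        if (avail == 0) break;

        const char *start = buf.data();
        const char *end   = buf.data() + avail;
        const char *nl;
        while ((nl = static_cast<const char *>(std::memchr(start, '\n', end - start)))) {
            processLine(start, nl);        // one complete line, no copying
            start = nl + 1;
        }
        carried = static_cast<std::size_t>(end - start);
        std::memmove(buf.data(), start, carried);           // keep the partial line
        if (carried == buf.size())         // a single line longer than the block
            buf.resize(buf.size() * 2);
    }
    if (carried) processLine(buf.data(), buf.data() + carried);   // last line, no '\n'
}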
I hope this helps -
http://www.cppprog.com/boost_doc/doc/html/interprocess/sharedmemorybetweenprocesses.html#interprocess.sharedmemorybetweenprocesses.mapped_file
BTW, you wrote "I see a lot of issues with such a solution, like the buffer ending with an incomplete line" - in this situation, how about reading 250 MB and then reading char by char until you reach the delimiter, to complete the line?