Boost Beast WebSockets with several indeterminate writes per read - c++

Using boost/beast websockets in C++
I've read up on the issues with beast websockets not supporting non-blocking reads, and the fact that there's no way to check if data is available, and that doing reads and writes in separate threads is probably not thread safe.
The issue I have, then, is figuring out the correct approach to this problem:
The IBM Watson speech-to-text WebSockets API allows you to send chunks of audio data as they become available (or in pieces from an existing file.) However, you do not get text replies for each chunk.
Instead, you keep sending it audio data until it recognizes a pause or an end of utterance, and then it finally sends back some results.
In other words, you may have to do several writes before a read will return anything, and there's no way to predict how many writes you will have to do.
Without a non-blocking read function, and without putting the blocking read in a separate thread, how do I keep sending data and then only retrieving results when they're available?

Don't confuse the lack of thread safety with a lack of full-duplex capability. You can call async_read and then follow it with a call to async_write. This will result in two "pending" asynchronous operations. The write operation will complete shortly afterwards, and the read operation will remain pending until a message is received.
Asio's asynchronous model is "reactive." That means that your completion handler gets called when something happens. You don't "check to see if data is available." Beast doesn't reinvent the wheel here, it adopts the asynchronous model of Asio. If you understand how to write asynchronous network programs with Asio, this knowledge will transfer over to Beast.
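To make that concrete, here is a minimal sketch of one pending async_read alongside writes issued as audio chunks become available. It assumes an already-connected websocket::stream<tcp::socket> driven by a single thread; the session type and the start_read/send_chunk names are illustrative, not part of Beast or the Watson API.

```cpp
// Minimal sketch: one pending async_read plus writes issued as audio
// chunks become available. Assumes `ws` is already connected and that
// everything runs on a single io_context thread; lifetime management
// (e.g. shared_from_this) is omitted for brevity.
#include <boost/asio.hpp>
#include <boost/beast/core.hpp>
#include <boost/beast/websocket.hpp>

namespace beast = boost::beast;
namespace websocket = beast::websocket;
using tcp = boost::asio::ip::tcp;

struct session
{
    websocket::stream<tcp::socket> ws;
    beast::flat_buffer buffer;

    explicit session(boost::asio::io_context& ioc) : ws(ioc) {}

    // Keep exactly one read pending at all times. It completes only when
    // the service decides to send back a result, however many writes ago.
    void start_read()
    {
        ws.async_read(buffer,
            [this](beast::error_code ec, std::size_t)
            {
                if (ec) return;
                // ... handle the result message held in `buffer` ...
                buffer.consume(buffer.size());
                start_read(); // re-arm the read for the next result
            });
    }

    // Called whenever a new audio chunk is ready. Only one write may be
    // outstanding at a time, so a real program would queue chunks (see the
    // write-queue discussion further down).
    void send_chunk(boost::asio::const_buffer chunk)
    {
        ws.binary(true);
        ws.async_write(chunk,
            [](beast::error_code ec, std::size_t) { /* check ec ... */ });
    }
};
```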

Related

What should I do if boost::beast write_some doesn't write everything?

I am sending data on a boost::beast::websocket
I would like to send the data synchronously, so I am trying to decide if I should use write or write_some.
From this SO answer (which is about asio rather than beast specifically, but I assume(!) the same rules apply?) I understand that write will block until the entire message is confirmed sent, whereas write_some may return early, and will return the number of bytes sent which may not be all the bytes which were requested be sent.
In my particular use-case I am using a single thread, and the write is done from within this thread's context (i.e. from inside a callback invoked after entering io_context.run()).
Since I don't want to block the caller for some indeterminate amount of time, I want to avoid using write if there is a more elegant solution.
So if I then turn to write_some, I am uncertain what I should do if the number of bytes sent is less than the number of bytes I requested be sent?
How I would normally handle this with standard tcp sockets is use non-blocking mode, and when I get back EWOULDBLOCK, enqueue the data and carry on. When the socket becomes writeable again, only then complete the write (much akin to an asio async_write). Since non-blocking is not supported in beast, I'm wondering what the analogous approach is?
Presumably I need to perform some additional write operation to ensure the rest of the bytes are sent in due course?
The beast docs say
Callers are responsible for synchronizing operations on the socket
using an implicit or explicit strand, as per the Asio documentation.
The websocket stream asynchronous interface supports one of each of
the following operations to be active at the same time:
async_read or async_read_some
async_write or async_write_some
async_ping or async_pong
async_close
Is it ok to start an async write of the remaining bytes, so long as I ensure that a new synchronous write/write_some isn't started before the outstanding async write has completed?
If I cannot start an async write to complete the send of the remaining bytes, how is one supposed to handle a synchronous write_some which doesn't completely send all bytes?
As to why I don't just use async_write always, I have additional slow processing to do after the attempt to write, such as logging etc. Since I am using a single thread, and the call to async_write happens within that thread, the write will only occur after I return control to the event loop.
So what I'd like to do is attempt to write synchronously (which will work in 90% of the cases) so the data is sent, and then perform my slow tasks which would otherwise delay the write. In the 10% of cases where a sync write doesn't complete immediately, then an alternative async_write operation should be employed - but only in the fallback situation.
Possibly related: I see that write_some has a flag fin, which should be set to true if this is the last part of the message.
I am only ever attempting to write complete messages, so should I always use true for this?

What happens in boost::asio when TCP TX buffer fills up?

I'm trying to get to grips with boost asio but I'm having trouble understanding some of the behavior behind the asynchronous interface.
I have a simple setup with a client and a server.
The client calls async_write regularly with a fixed amount of data
The server polls for data regularly
What happens when the server stops polling for data?
I guess the various buffers would fill up in the server OS and it would stop sending ACKs ?
Regardless of what happens it seems that the client can happily continue to send several gigabytes of data without receiving any error callback (doesn't receive any success either of course).
I assume the client OS stops accepting packets at one point since they can't be TX'ed?
Does this mean that boost::asio buffers data internally?
If it does, can I use socket.cancel() to drop packets in case I don't want to wait for delivery ? (I need to make sure ASIO forgets about my packets so I can reuse old buffers for new packets)
asio doesn't buffer internally. And you will always get signaled if you can't transfer more data to the remote.
E.g. if you use synchronous writes in asio, they will block until the data could be sent (or at least copied into the kernel send buffers). If you use async writes, the callback/acknowledgement will only be called once the data could be sent. If you use nonblocking writes, you get EAGAIN/WOULD_BLOCK errors. If you use multiple async_write calls in parallel - well, you shouldn't do that; its behavior is undefined according to the asio docs:
This operation is implemented in terms of zero or more calls to the stream's async_write_some function, and is known as a composed operation. The program must ensure that the stream performs no other write operations (such as async_write, the stream's async_write_some function, or any other composed operations that perform writes) until this operation completes.
Guarantee in your application that you only ever perform a single async write operation at a time, and once it finishes, write the next piece of data. If you need to write data in the meantime, you would need to buffer it inside your application.
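A common way to guarantee that is an application-side outgoing queue drained by a chain of async_write calls. A rough sketch, assuming a connected TCP socket and that all calls happen on the same io_context thread (or strand):

```cpp
// Rough sketch of serializing writes with an application-side queue.
// `socket_` is assumed to be a connected tcp::socket and every call is
// assumed to happen on the same io_context thread (or strand).
#include <boost/asio.hpp>
#include <deque>
#include <string>

class writer
{
    boost::asio::ip::tcp::socket& socket_;
    std::deque<std::string> queue_;

public:
    explicit writer(boost::asio::ip::tcp::socket& s) : socket_(s) {}

    void send(std::string msg)
    {
        bool idle = queue_.empty();     // no async_write currently in flight?
        queue_.push_back(std::move(msg));
        if (idle)
            do_write();                 // start the chain
    }

private:
    void do_write()
    {
        boost::asio::async_write(socket_, boost::asio::buffer(queue_.front()),
            [this](boost::system::error_code ec, std::size_t)
            {
                if (ec) return;         // handle/close on error
                queue_.pop_front();
                if (!queue_.empty())
                    do_write();         // chain the next pending message
            });
    }
};
```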

send the full contents of a ring buffer on subscription and then send new data

I'm a beginner in boost::asio.
I need to code a module which reads from a pipe and puts the data into a ring buffer (I've no problem in how to implement this part).
Another part of the module waits for a consumer to open a new TCP connection or unix domain socket; when the connection is made, it sends the full ring buffer contents and then sends any new data as soon as it is pushed into the ring buffer. Multiple consumers are allowed, and a consumer can open a new connection at any time.
The first naive implementation I thought of is to keep a separate asio::streambuf for every connection, push the entire ring buffer into it on connection, and then push every new piece of data. But that seems a very sub-optimal approach both in memory and CPU cycles, as data has to be copied for every connection, maybe multiple times, since I don't know whether boost::asio::send (or the Linux TCP/IP stack) copies the data.
As my idea is to use no multithreading at all, I'm thinking of using some custom asio::streambuf-derived class which shares the actual buffer with the ring buffer but keeps a separate read pointer per connection, without the need for any lock.
Mine seems to be a pretty unusual need, because I'm unable to find any related documentation/question dealing with a similar subject, and the boost documentation seems pretty brief and scarce to me (see e.g.: http://www.boost.org/doc/libs/1_57_0/doc/html/boost_asio/reference/basic_streambuf.html).
It would be nice if someone could point me to some ideas that I could take as starting point to implement my design or point me to an alternative design if he/she considers mine bad, un-implementable and/or improvable.
You should just do what you intend to.
You absolutely don't need a streambuf to use with Boost Asio: http://www.boost.org/doc/libs/release/doc/html/boost_asio/reference/buffer.html
If the problem is how to avoid having the producer "wait" until all consumers (read: connections) are done transmitting the data, you can always use ye olde trick of alternating output buffers.
Many ring buffer implementations allow direct splicing of a complete sequence of elements at once, (e.g. boost lockfree spsc_queue cache memory access). You could use such an operation to your advantage.
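For illustration, if the ring buffer can expose its storage as at most two contiguous regions, those regions can be wrapped in asio buffers and written without an intermediate streambuf or copy. The first_region/second_region accessors below are hypothetical, not part of any real library:

```cpp
// Hypothetical sketch: send the ring buffer's contents without a streambuf
// by wrapping its (at most two) contiguous regions in asio buffers.
// first_region()/second_region() are assumed accessors returning
// pointer/size pairs; they are not part of any real library.
#include <boost/asio.hpp>
#include <array>
#include <cstddef>
#include <utility>

template <class Ring>
void send_snapshot(boost::asio::ip::tcp::socket& socket, const Ring& ring)
{
    std::pair<const char*, std::size_t> r1 = ring.first_region();
    std::pair<const char*, std::size_t> r2 = ring.second_region(); // may be empty

    std::array<boost::asio::const_buffer, 2> regions{
        boost::asio::buffer(r1.first, r1.second),
        boost::asio::buffer(r2.first, r2.second)};

    // Scatter-gather write: both regions go out in one composed operation,
    // with no copy into a per-connection buffer.
    boost::asio::write(socket, regions);
}
```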
Also relevant:
TCP Zero copy using boost
It appears that performance is a topic here.
Independent of whether boost::asio is used or some hand-knitted solution, performance (throughput) might already be down the drain due to the fact (as stated in the comment section of the OP) that single bytes are being traded (read from the pipe).
After the initial "burst phase" when a consumer connects, single bytes trickle from the pipe to the connected consumer sockets with read() and write() operations per byte (or a few bytes, if the application is not constantly polling).
Given that (the price for the read() and write() system calls is paid for tiny amounts of data), I dare theorize that anything about multiple queues versus a single queue etc. is already in the shadow of that basic "design flaw". I put "design flaw" in quotes because it cannot always be avoided; sometimes you simply have to handle exactly such a situation.
So, if throughput cannot be optimized anyway, I would recommend the most simple and straightforward solution which can be conceived.
The "no threads" statement in the OP implies non-blocking file descriptors for both the accept socket, the consumer data sockets and the pipe. Will this be another 100% CPU/core eating polling application? If this is not some kind of special ops hyper-optimized problem, I would rather not advice to use non-blocking file descriptors. Also, I would not worry about zero-copy or not.
One easy approach with threads would be to have the consumer sockets non-blocking, while pipe is in blocking mode. The thread which reads the pipe then pumps the data into a queue and calls the function which services all currently connected consumers. The listen socket (the one calling accept()) is in signaled state, when new client connections are pending. With mechanisms like kqueue (bsd) or epoll (linux etc.) or WaitForMultipleObjects (windows), the pipe reader thread can react to that situation as well.
In the times when nothing is to be done, your application is sleeping/blocking and friendly to our environment :)
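A minimal asio-flavoured sketch of that arrangement might look like the following; the chunk size, the consumer list, and the synchronous fan-out in the handler are simplifications of my own, and error handling is omitted:

```cpp
// Sketch of the suggested split: a dedicated thread blocks on the pipe and
// hands each chunk over to the io_context thread, which fans it out to the
// currently connected consumers. The consumer list, the 4096-byte chunk
// size and the synchronous fan-out are simplifications; error handling is
// omitted.
#include <boost/asio.hpp>
#include <unistd.h>   // ::read on the pipe
#include <memory>
#include <vector>

using socket_ptr = std::shared_ptr<boost::asio::ip::tcp::socket>;

void pipe_reader(int pipe_fd, boost::asio::io_context& ioc,
                 std::vector<socket_ptr>& consumers)
{
    for (;;)
    {
        auto chunk = std::make_shared<std::vector<char>>(4096);
        ssize_t n = ::read(pipe_fd, chunk->data(), chunk->size()); // blocking
        if (n <= 0)
            break;
        chunk->resize(static_cast<std::size_t>(n));

        // Hop onto the io_context thread; all socket access happens there,
        // so the consumer list needs no lock. A real program would queue the
        // chunk per consumer and use async_write instead of blocking here.
        boost::asio::post(ioc, [&consumers, chunk]
        {
            for (auto& sock : consumers)
                boost::asio::write(*sock, boost::asio::buffer(*chunk));
        });
    }
}
```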

multi socket architecture with asio

I have a client-server architecture with 10 servers holding permanent connections to a single client; the software is written in C++ and uses the boost asio libraries.
All the connections are created in the initialization phase, and they are always open during the execution.
When the client needs some information, it sends a request to all of the servers. Each server finds the information needed and answers the client.
In the client there is a single thread in charge of receiving the messages from all of the sockets; in particular, I use only one io_service and one async_read on each of the sockets.
When a message arrives on one of the sockets, the async_read reads the first N bytes, which are the header of the message, and then calls a function that uses read (synchronous) to read the rest of the message. On the server side, the header and the rest of the message are sent with a single write (synchronous).
The architecture works properly, but I noticed that sometimes the synchronous read takes more time (~0.24 s) than usual.
In theory the data is ready to be read, because the synchronous read is called when the async_read has already read the header. I also saw that if I use only one server instead of 10, this problem doesn't occur. Furthermore, I noticed that the problem is not caused by the size of the message.
Is it possible that the problem occurs because the io_service is not able to handle all 10 async_reads? In particular, if all the sockets receive a message at the same time, could the io_service lose some time managing the queues and slow down my synchronous read?
I haven't posted the code because it is difficult to extract it from the project, but if you don't understand my description I could write an example.
Thank you.
1) When the async_read completion handler gets invoked, it doesn't mean that some data is available; it means that all the data available up to that moment has already been read (unless you specified a restricting completion condition). So the subsequent synchronous read might wait until some more data arrives.
2) Blocking inside a completion handler is a bad idea, because you actually block all the other completion handlers and other functors posted to that io_service. Consider changing your design.
If you go for an asynchronous design, don't mix in some synchronous parts. Replace all your synchronous reads and writes with asynchronous ones. Both reads and writes will block your thread while the asynchronous variants will not.
Further, if you know the number of expected bytes exactly after reading the header you should request exactly that number of bytes.
If you don't know it, you could go for a single async_read_some with the size of the biggest message you expect. async_read_some will notify you how many bytes were actually read.
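For example, with a fixed-size length-prefix header the whole chain can stay asynchronous, so the io_service thread never blocks. The 4-byte prefix and the names below are made up for illustration:

```cpp
// Fully asynchronous header-then-body read. The 4-byte big-endian length
// prefix and the names are made up for illustration; connection setup and
// lifetime management are omitted.
#include <boost/asio.hpp>
#include <array>
#include <cstdint>
#include <vector>

struct connection
{
    boost::asio::ip::tcp::socket socket;
    std::array<std::uint8_t, 4> header{};
    std::vector<std::uint8_t> body;

    explicit connection(boost::asio::io_context& ioc) : socket(ioc) {}

    void read_message()
    {
        // Step 1: read exactly the header bytes.
        boost::asio::async_read(socket, boost::asio::buffer(header),
            [this](boost::system::error_code ec, std::size_t)
            {
                if (ec) return;
                std::uint32_t body_size = (std::uint32_t{header[0]} << 24) |
                                          (std::uint32_t{header[1]} << 16) |
                                          (std::uint32_t{header[2]} << 8)  |
                                           std::uint32_t{header[3]};

                // Step 2: read exactly body_size bytes, still asynchronously,
                // so the io_service thread is never blocked waiting for data.
                body.resize(body_size);
                boost::asio::async_read(socket, boost::asio::buffer(body),
                    [this](boost::system::error_code ec2, std::size_t)
                    {
                        if (ec2) return;
                        // ... handle the complete message in `body` ...
                        read_message(); // wait for the next message
                    });
            });
    }
};
```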

Program structure for bi-directional TCP communication using Boost::Asio

First off, I hope my question makes sense and is even possible! From what I've read about TCP sockets and Boost::ASIO, I think it should be.
What I'm trying to do is to set up two machines and have a working bi-directional read/write link over TCP between them. Either party should be able to send some data to be used by the other party.
The first confusing part about TCP(/IP?) is that it requires this client/server model. However, reading shows that either side is capable of writing or reading, so I'm not yet completely discouraged. I don't mind establishing an arbitrary party as the client and the other as the server. In my application, that can be negotiated ahead of time and is not of concern to me.
Unfortunately, all of the examples I come across seem to focus on a client connecting to a server, and the server immediately sending some bit of data back. But I want the client to be able to write to the server also.
I envision some kind of loop wherein I call io_service.poll(). If the polling shows that the other party is waiting to send some data, it will call read() and accept that data. If there's nothing waiting in the queue, and it has data to send, then it will call write(). With both sides doing this, they should be able to both read and write to each other.
My concern is how to avoid situations in which both enter into some synchronous write() operation at the same time. They both have data to send, and then sit there waiting to send it on both sides. Does that problem just imply that I should only do asynchronous write() and read()? In that case, will things blow up if both sides of a connection try to write asynchronously at the same time?
I'm hoping somebody can ideally:
1) Provide a very high-level structure or best practice approach which could accomplish this task from both client and server perspectives
or, somewhat less ideally,
2) Say that what I'm trying to do is impossible and perhaps suggest a workaround of some kind.
What you want to do is absolutely possible. Web traffic is a good example of a situation where the "client" sends something long before the server does. I think you're getting tripped up by the words "client" and "server".
What those words really describe is the method of connection establishment. In the case of "client", it's "active" establishment; in the case of "server" it's "passive". Thus, you may find it less confusing to use the terms "active" and "passive", or at least think about them that way.
With respect to finding example code that you can use as a basis for your work, I'd strongly encourage you to take a look at W. Richard Stevens' "Unix Network Programming" book. Any edition will suffice, though the 2nd Edition will be more up to date. It will be only C, but that's okay, because the socket API is C only. boost::asio is nice, but it sounds like you might benefit from seeing some of the nuts and bolts under the hood.
My concern is how to avoid situations in which both enter into some synchronous write() operation at the same time. They both have data to send, and then sit there waiting to send it on both sides. Does that problem just imply that I should only do asynchronous write() and read()? In that case, will things blow up if both sides of a connection try to write asynchronously at the same time?
It sounds like you are somewhat confused about how protocols are used. TCP only provides a reliable stream of bytes, nothing more. On top of that, applications speak a protocol so they know when and how much data to read and write. Both the client and the server writing data concurrently can lead to a deadlock if neither side is reading the data. One way to avoid that is to use a deadline_timer to cancel the asynchronous write operation if it has not completed within a certain amount of time.
You should be using asynchronous methods when writing a server. Synchronous methods are appropriate for some trivial client applications.
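As a rough sketch of the timeout idea mentioned above, using steady_timer (a modern equivalent of deadline_timer); the 5-second value and the names are arbitrary:

```cpp
// Sketch of cancelling a stuck async_write with a timer. The 5-second
// timeout and the names are arbitrary; the socket and timer are assumed
// to outlive the operations.
#include <boost/asio.hpp>
#include <chrono>
#include <string>

void write_with_timeout(boost::asio::ip::tcp::socket& socket,
                        boost::asio::steady_timer& timer,
                        const std::string& payload)
{
    timer.expires_after(std::chrono::seconds(5));
    timer.async_wait([&socket](boost::system::error_code ec)
    {
        if (!ec)               // timer expired (was not cancelled)
            socket.cancel();   // pending async_write ends with operation_aborted
    });

    boost::asio::async_write(socket, boost::asio::buffer(payload),
        [&timer](boost::system::error_code ec, std::size_t)
        {
            timer.cancel();    // write finished or failed; stop the watchdog
            if (ec == boost::asio::error::operation_aborted)
            {
                // Timed out: the peer isn't reading. Decide how to recover,
                // e.g. close the connection.
            }
        });
}
```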
TCP is full-duplex, meaning you can send and receive data in whatever order you want. To prevent a deadlock in your own protocol (the high-level behaviour of your program), when you have the opportunity to both send and receive, you should receive as a priority. With epoll in level-triggered mode that looks like: poll for both send and receive readiness; if you can receive, do so; otherwise, if you can send and have something to send, do so. I don't know how boost::asio or threads fit here; you do need some measure of control over how sends and receives are interleaved.
The word you're looking for is "non-blocking", which is entirely different from POSIX asynchronous I/O (which involves signals).
The idea is that you use something like fcntl(fd, F_SETFL, O_NONBLOCK). write() will return the number of bytes successfully written (if positive), and both read() and write() return -1 and set errno = EAGAIN if no progress can be made (no data to read, or the write window is full).
You then use something like select/epoll/kqueue which blocks until a socket is readable/writable (depending on the flags set).
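A bare-bones sketch of that pattern with plain POSIX calls (the helper names are mine, and error handling is kept to a minimum):

```cpp
// Bare-bones non-blocking socket pattern with plain POSIX calls.
// Helper names are mine; error handling is minimal.
#include <fcntl.h>
#include <sys/select.h>
#include <unistd.h>
#include <cerrno>
#include <cstddef>

// Put the file descriptor into non-blocking mode.
void make_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

// Try to write; returns bytes written, 0 if the send window is full
// (EAGAIN/EWOULDBLOCK), or -1 on a real error.
ssize_t try_write(int fd, const char* data, std::size_t len)
{
    ssize_t n = ::write(fd, data, len);   // may be a partial write
    if (n >= 0)
        return n;
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return 0;                         // no progress; retry when writable
    return -1;
}

// Block until the socket is writable again (select shown here; epoll/kqueue
// are the scalable equivalents).
bool wait_writable(int fd)
{
    fd_set wfds;
    FD_ZERO(&wfds);
    FD_SET(fd, &wfds);
    return ::select(fd + 1, nullptr, &wfds, nullptr, nullptr) > 0;
}
```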