Boost Asio and usage of OpenSSL in multithreaded app [duplicate] - c++

Do I create one strand that all of my SSL sockets share, or one strand per SSL context (shared by any associated sockets)?
The Boost.Asio SSL documentation states the following, but it doesn't mention contexts. I assume this means I must use only one strand for everything, but I think this was written before OpenSSL had multithreading support.
SSL and Threads
SSL stream objects perform no locking of their own. Therefore, it is essential that all asynchronous SSL operations are performed in an implicit or explicit strand. Note that this means that no synchronisation is required (and so no locking overhead is incurred) in single threaded programs.
I'm most likely going to have only one SSL context, but I'm wondering if it's more proper for the strand to be owned by the SSL context, or by the global network service.
I did provide a handler to CRYPTO_set_locking_callback in case that matters.

I think there is some confusion in this thread because there are a few things that need to be clarified. Let's start by asserting that ::asio::ssl::context == SSL_CTX. The two are one and the same.
Second, when using boost::asio::ssl, unless you're doing something crazy where you're bypassing the internal init objects, there's no need for you to manually set the crypto locking callbacks. This is done for you, as you can see in the sources here.
In fact, you may cause issues by doing this, because the destructor for the init object operates under the assumption that it has done this work internally. Take that last bit with a grain of salt, because I have not reviewed this in depth.
Third, I believe you're confusing SSL streams with SSL contexts. For simplicity, think of the SSL stream as the socket, and think of the SSL context as a separate object that the sockets can use for various SSL functions, such as handshaking with a particular negotiation key or, as a server, providing information about your server certificate to a connected client so that you can handshake with that client.
The mention of strands comes down to preventing possible simultaneous IO against one specific stream (socket), not contexts. Obviously attempting to read into a buffer and write from the same buffer at the same time on the same socket would be an issue. So, when you supply completion handlers wrapped by strands to the various ::asio::async_X methods, you're enforcing a specific ordering to prevent the aforementioned scenario. You can read more in this answer given by someone who knows much more about this than I.
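To make that concrete, here is a minimal sketch (illustrative class and member names, not taken from any particular project) of an explicit strand guarding one SSL stream: every completion handler that touches the stream is wrapped by the same strand, so Asio never runs two of them at once even when io_service::run is called from many threads.
#include <boost/asio.hpp>
#include <boost/asio/ssl.hpp>
#include <boost/bind.hpp>

namespace asio = boost::asio;

class connection
{
public:
    connection(asio::io_service& io, asio::ssl::context& ctx)
        : strand_(io), stream_(io, ctx) {}

    void start_read()
    {
        // The handler is wrapped by strand_, so it cannot run concurrently
        // with any other handler wrapped by the same strand.
        stream_.async_read_some(asio::buffer(read_buf_),
            strand_.wrap(boost::bind(&connection::on_read, this,
                asio::placeholders::error,
                asio::placeholders::bytes_transferred)));
    }

    void start_write(std::size_t n)
    {
        asio::async_write(stream_, asio::buffer(write_buf_, n),
            strand_.wrap(boost::bind(&connection::on_write, this,
                asio::placeholders::error)));
    }

private:
    void on_read(const boost::system::error_code&, std::size_t) { /* ... */ }
    void on_write(const boost::system::error_code&) { /* ... */ }

    asio::io_service::strand strand_;                  // one strand per stream
    asio::ssl::stream<asio::ip::tcp::socket> stream_;  // the "socket"
    char read_buf_[4096];
    char write_buf_[4096];
};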
Now as far as contexts go, David Schwartz points out in the comments (and in another answer he wrote that I need to dig up) that the whole purpose of a context is to provide information that facilitates the function of SSL across multiple SSL streams. He appears to imply that they essentially must be thread safe, given their intended purpose. I believe he is perhaps speaking within the context of ::asio::ssl::context, only because of the way that ::asio::ssl correctly employs the thread-safety callbacks, or perhaps he's just speaking in the context of using OpenSSL correctly in a multithreaded program.
Regardless, beyond such comments and answers on SO, and my own practical experience, it's incredibly difficult to come across concrete evidence of this in documentation, or clearly defined boundaries between what is and isn't thread safe. boost::asio::ssl::context is, as David also points out, simply a very thin wrapper around SSL_CTX. I would further add that it's meant to give a more "C++-ish" feel to working with the underlying structure(s). It was probably also designed with some intent of decoupling ::asio::ssl from the underlying implementation library, but it doesn't achieve this; the two are tightly bound. David mentions, again rightly, that this thin wrapper is poorly documented, and one must look at the implementation to get insight.
If you start digging into the implementation, there's a rather easy way to find out what is and isn't thread safe when it comes to contexts. You can do a search for CRYPTO_LOCK_SSL_CTX within sources like ssl_lib.c.
int SSL_CTX_set_generate_session_id(SSL_CTX *ctx, GEN_SESSION_CB cb)
{
    CRYPTO_w_lock(CRYPTO_LOCK_SSL_CTX);
    ctx->generate_session_id = cb;
    CRYPTO_w_unlock(CRYPTO_LOCK_SSL_CTX);
    return 1;
}
As you can see, CRYPTO_w_lock is used, which brings us back to the official page about OpenSSL and threads, here, which states:
OpenSSL can safely be used in multi-threaded applications provided
that at least two callback functions are set, locking_function and
threadid_func.
Now we come full circle to the linked asio/ssl/detail/impl/openssl_init.ipp source code in the first paragraph of this answer, where we see:
do_init()
{
    ::SSL_library_init();
    ::SSL_load_error_strings();
    ::OpenSSL_add_all_algorithms();
    mutexes_.resize(::CRYPTO_num_locks());
    for (size_t i = 0; i < mutexes_.size(); ++i)
        mutexes_[i].reset(new boost::asio::detail::mutex);
    ::CRYPTO_set_locking_callback(&do_init::openssl_locking_func);
    ::CRYPTO_set_id_callback(&do_init::openssl_id_func);
#if !defined(SSL_OP_NO_COMPRESSION) \
    && (OPENSSL_VERSION_NUMBER >= 0x00908000L)
    null_compression_methods_ = sk_SSL_COMP_new_null();
#endif // !defined(SSL_OP_NO_COMPRESSION)
       // && (OPENSSL_VERSION_NUMBER >= 0x00908000L)
}
Note of course:
CRYPTO_set_locking_callback
CRYPTO_set_id_callback
So at least in terms of ::asio::ssl::context, thread safety here has nothing to do with strands and everything to do with OpenSSL working as it is designed to work when used correctly in a multithreaded program.
Coming to the original question, now with all of this explained, David also gave the answer very simply in the comments with:
The most common method is to use one strand per SSL connection.
Take the example of an HTTPS server that serves up content for example.com. The server has a single context, configured with information such as the certificate for example.com. A client connects, and this context is used for all connected clients to perform the handshake and such. You wrap your connected client in a new session object, where you handle that client. It is within this session that you would have a single strand, either implicit or explicit, to protect the socket, not the context.
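Here is a minimal sketch of that shape (illustrative names and certificate paths, not the linked proxy code): one ssl::context configured once and shared by every connection, while each accepted client gets its own session object and, with it, its own strand-protected socket.
#include <boost/asio.hpp>
#include <boost/asio/ssl.hpp>
#include <boost/bind.hpp>
#include <boost/enable_shared_from_this.hpp>
#include <boost/shared_ptr.hpp>

namespace asio = boost::asio;
using asio::ip::tcp;

class session : public boost::enable_shared_from_this<session>
{
public:
    session(asio::io_service& io, asio::ssl::context& ctx)
        : strand_(io), stream_(io, ctx) {}   // the context is borrowed, not owned

    asio::ssl::stream<tcp::socket>::lowest_layer_type& socket()
    {
        return stream_.lowest_layer();
    }

    void start()
    {
        stream_.async_handshake(asio::ssl::stream_base::server,
            strand_.wrap(boost::bind(&session::handle_handshake,
                shared_from_this(), asio::placeholders::error)));
    }

private:
    void handle_handshake(const boost::system::error_code& ec)
    {
        if (!ec) { /* serve the client via strand-wrapped reads and writes ... */ }
    }

    asio::io_service::strand strand_;        // protects this session's socket only
    asio::ssl::stream<tcp::socket> stream_;
};

class https_server
{
public:
    https_server(asio::io_service& io, unsigned short port)
        : io_(io),
          acceptor_(io, tcp::endpoint(tcp::v4(), port)),
          context_(asio::ssl::context::sslv23_server)    // the single shared context
    {
        context_.use_certificate_chain_file("example.com.crt");   // assumed file names
        context_.use_private_key_file("example.com.key", asio::ssl::context::pem);
        start_accept();
    }

private:
    void start_accept()
    {
        boost::shared_ptr<session> s(new session(io_, context_));
        acceptor_.async_accept(s->socket(),
            boost::bind(&https_server::handle_accept, this, s,
                asio::placeholders::error));
    }

    void handle_accept(boost::shared_ptr<session> s,
                       const boost::system::error_code& ec)
    {
        if (!ec) s->start();
        start_accept();                                  // keep accepting
    }

    asio::io_service& io_;
    tcp::acceptor acceptor_;
    asio::ssl::context context_;
};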
While I'm not an expert by any means and I welcome corrections to this answer, I have put everything that I know about these subjects into practice in an open source transparent filtering HTTPS proxy. It has a comment-to-code ratio of a little over 50% with over 17K lines total, so everything I know is written down there (be it right or wrong ;)). If you'd like to see an example of this stuff in action, you can look at the TlsCapableHttpBridge.hpp source, which acts as both a client and a server on a per-host, per-connection basis.
Server contexts and certificates are spoofed/generated once and shared across all clients spanning multiple threads. The only manual locking done is during storage and retrieval of the contexts. There is one strand per side of the bridge, one for the real downstream client socket and one for the upstream server connection, although they technically aren't even necessary because the order of operations creates an implicit strand anyway.
Note that the project is under development as I'm rewriting many things, (dep build instructions are not present yet), but everything is functional in terms of the MITM SSL code, so you're looking at a fully functional class and associated components.

UPDATE
The gist of this answer is contested by David Schwartz, whose authority in this area I hold in high esteem.
There are reasons to expect that ssl contexts can be shared between threads - at least for some operations, if only to facilitate SSL session resumption.
I think David has experience with SSL contexts as OpenSSL uses them. Boost.Asio uses that in turn (at least on all platforms I know of). So, either David writes an answer sharing his knowledge, or you or I would have to spend some time with the OpenSSL documentation and the Boost.Asio source code to figure out the effective constraints that apply to Boost.Asio's ssl::context usage.
Below are the constraints as currently documented.
[old answer text follows]
Thread Safety
In general, it is safe to make concurrent use of distinct objects, but unsafe to make concurrent use of a single object. However, types such as io_service provide a stronger guarantee that it is safe to use a single object concurrently.
Logically, because the documentation doesn't mention thread-safety of the ssl_context class in particular, you must conclude that it is not.
It doesn't matter that you know that the underlying SSL library supports this if you use some particular hooks (like you mention). This only tells you that it might not be hard to make ssl_context thread-aware.
But until you (work with the library devs to) provide this patch, it's not available.
Long story short, you access each ssl_context from a single strand.

I'd say it depends on what your protocol is like. If it's HTTP, there's no need to use an (explicit) strand as you don't read and write to your socket in parallel.
In fact, what would cause problems, is code like this:
void func()
{
    async_write(...);
    async_read(...);
}
because here, if your io_service has a pool of threads associated with it, the actual read and write could be carried out in parallel by several threads.
If you only have one thread per io_service, there's no need for a strand. The same is true if you're implementing HTTP, for example. In HTTP, you don't read and write to the socket in parallel, due to the layout of the protocol: you read a request from the client (though this might be done in several async calls), then you somehow process the request and headers, then you (async or not) send your reply.
Pretty much the same you can also read in ASIO's strand documentation.
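For illustration, here is a minimal sketch (illustrative names) of the chained pattern described above: because the read is only started from the write's completion handler, at most one operation is ever outstanding on the socket, which forms an implicit strand without any explicit synchronisation.
#include <boost/asio.hpp>
#include <boost/bind.hpp>
#include <boost/enable_shared_from_this.hpp>
#include <string>

namespace asio = boost::asio;
using asio::ip::tcp;

class http_exchange : public boost::enable_shared_from_this<http_exchange>
{
public:
    explicit http_exchange(asio::io_service& io) : socket_(io) {}

    tcp::socket& socket() { return socket_; }

    void start(const std::string& request)
    {
        request_ = request;
        // Only one async operation at a time: write the request first...
        asio::async_write(socket_, asio::buffer(request_),
            boost::bind(&http_exchange::handle_write, shared_from_this(),
                asio::placeholders::error));
    }

private:
    void handle_write(const boost::system::error_code& ec)
    {
        if (ec) return;
        // ...and only start reading once the write has completed.
        asio::async_read_until(socket_, response_, "\r\n\r\n",
            boost::bind(&http_exchange::handle_headers, shared_from_this(),
                asio::placeholders::error,
                asio::placeholders::bytes_transferred));
    }

    void handle_headers(const boost::system::error_code& ec, std::size_t)
    {
        if (ec) return;
        // parse the headers from response_, then continue reading the body...
    }

    tcp::socket socket_;
    std::string request_;
    asio::streambuf response_;
};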

Related

If the Proactor Design Pattern is superior for asyncronous I/O, why isn't it default in ASIO?

I recently got bonked pretty hard in a code review, implementing an ASIO UDP socket within an interface adapter; it seems that there was another input UDP socket implemented and both input and output were assumed to be on the same thread. So, I'm wondering why the ASIO socket libraries don't maintain a static thread (socket context) and use that for each socket? What is the motivation and the trade-offs to consider in employing the Proactor pattern?
Edit / Addendum
After seeing some of the comments about my questions being unclear, I'm adding this code snippet based upon the class definition that I was told didn't follow the Proactor pattern:
class InterfaceAdapter{
public:
    typedef std::vector<MsgFragment> MsgPackets;

    InterfaceAdapter() :
        mySocket(myContext) {}

    void sendDataToSystem(const DataStruct& originalData);

private:
    asio::io_context myContext;
    asio::ip::udp::socket mySocket;

    MsgPackets transformData(const DataStruct& originalData);
    void sendPackets(const MsgPackets& msgs);
};
Apparently I needed to use a globally-scoped asio::io_context instead of having it as a private member & using it to default-construct the socket?
Your question is not really clear and mixes or implies several things, and I assume that is also the cause of your problem:
Boost.Asio implements the proactor pattern: asynchronous operations complete and their handlers, i.e. callbacks, are then executed. This can be done concurrently if the user chooses to do so by running boost::asio::io_context::run on more than one thread.
Boost.Asio gives that freedom to you; it makes no sense for the library to restrict itself to the single-threaded corner case without any motivation. Static or global variables (such as a static thread) are also broadly considered very bad style.
However, your question suggests that your program, i.e. its architecture, is designed to be single-threaded, and the code you wrote used Asio the way one would with multiple threads (which should not add any significant runtime overhead anyway), or your reviewer misunderstood the Boost.Asio semantics. Without your code and the specific reasons given, this remains speculation, though.
Addendum to your addendum:
No, it does not have to be global. I assume the reviewer's point is that you have your own asio::io_context, which you usually don't need: your class seems to be just about sending packets, so it should be indifferent to which io_context it runs on. That's the reason Boost sockets and the like just take a reference to an io_context. I do this myself, for example in an RTP class I wrote, where I just store a reference to the io_context that the overall RTSP video streaming server manages.
However, I fear your company/reviewer doesn't use Boost.Asio otherwise, and your adapter might be the first thing using Boost.Asio internally while not being intended to leak this implementation detail. Then it depends on how your class is used: will there usually be only one instance for the whole program lifetime, like this? Then it could manage its own io_context, but I assume you will rather create several instances of it. Imagine a Boost TCP connection, i.e. a socket, creating threads and everything just for itself, for every TCP connection a server has; that would be stupid.
So the solution would be to either take an io_context& in your constructor, or, if you want to avoid exposing the Boost details, another class designed by you, which the creator of InterfaceAdapter has to keep and reuses every time it creates a new interface adapter. Should that not be possible, then you should really refactor your whole program, but that would be the moment where mediocre code would start to use globals. Even then, don't make your InterfaceAdapter or a boost::io_context global, but something like a class my_io_singleton which still has to be passed into your InterfaceAdapter, so that one day your code can be easily refactored.
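A minimal sketch of that first option, reusing the question's class name but with placeholder types, so treat the details as assumptions rather than a finished design:
#include <asio.hpp>
#include <vector>

struct MsgFragment {};   // placeholder for the question's real type
struct DataStruct {};    // placeholder for the question's real type

class InterfaceAdapter {
public:
    typedef std::vector<MsgFragment> MsgPackets;

    // The caller decides which io_context (and which threads) drive the socket.
    explicit InterfaceAdapter(asio::io_context& ctx)
        : myContext(ctx), mySocket(ctx) {}

    void sendDataToSystem(const DataStruct& originalData)
    {
        sendPackets(transformData(originalData));
    }

private:
    MsgPackets transformData(const DataStruct&) { return MsgPackets(); }
    void sendPackets(const MsgPackets&) { /* mySocket.async_send_to(...) ... */ }

    asio::io_context& myContext;   // a reference, not an owned instance
    asio::ip::udp::socket mySocket;
};

// Usage: one io_context for the whole program (or subsystem), many adapters.
//   asio::io_context ctx;
//   InterfaceAdapter a(ctx), b(ctx);
//   ctx.run();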
Update 2
The next thing could derail you again, so I advise you to read it only after reading the part above and after you have done some more implementation with Boost.Asio, as it is not really important or relevant for your case: to be fair, there are rare occasions where Boost io_contexts seem to be singleton-like, which I stumbled over myself, but those are just convenience functions built into Asio, and one could argue they might be better left out. They can simply be ignored.

Two threads using the same websock handle- does it cause any issue?

We have a C++ application that sends and receives WebSocket messages:
One thread to send the message (using WinHttpWebSocketSend)
The second thread to receive (using WinHttpWebSocketReceive)
But the same WebSocket handle is used across these two threads. Will it cause any problems? I don't know if we have to handle it another way. It works in our application (we are able to send and receive messages), but I don't know if it will have any problems in the production environment. Does anyone have better ideas?
Like most platforms, nearly all Windows API system calls do not provide thread barriers beyond preventing simultaneous access to the key parts of the kernel. While I could not say for sure (the documentation doesn't seem to answer your explicit question), I would be surprised if the WinHTTP API provides barriers that prevent multiple threads from stepping on each other (so to speak), particularly because it's really just a "helper" API that uses the somewhat lower-level Winsock stuff directly, and I would take it upon myself to implement the necessary barriers.
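As a minimal sketch of what "implement the necessary barriers" could look like, assuming the documented WinHttpWebSocketSend/WinHttpWebSocketReceive signatures: a thin wrapper that serializes every call on the shared handle with one mutex. The obvious caveat is that a blocking Receive would hold the lock and stall Send, which is one more argument for the callback-based approach described below.
#include <windows.h>
#include <winhttp.h>
#include <mutex>

class LockedWebSocket {
public:
    explicit LockedWebSocket(HINTERNET hWebSocket) : h_(hWebSocket) {}

    DWORD Send(WINHTTP_WEB_SOCKET_BUFFER_TYPE type, void* buf, DWORD len)
    {
        std::lock_guard<std::mutex> lock(m_);    // one call on the handle at a time
        return ::WinHttpWebSocketSend(h_, type, buf, len);
    }

    DWORD Receive(void* buf, DWORD len, DWORD* bytesRead,
                  WINHTTP_WEB_SOCKET_BUFFER_TYPE* type)
    {
        std::lock_guard<std::mutex> lock(m_);
        return ::WinHttpWebSocketReceive(h_, buf, len, bytesRead, type);
    }

private:
    HINTERNET h_;
    std::mutex m_;
};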
I'm also wondering why you're using threads in this manner to begin with. I know essentially nothing about the WinHTTP API, but I did notice WINHTTP_OPTION_ASSURED_NON_BLOCKING_CALLBACKS, which leads me to believe that you can implement an asynchronous approach that would prevent any thread-safety issues to begin with (and probably be much faster and more memory efficient).
It appears that the callback mechanism for WinHTTP is rather expressive. See WINHTTP_STATUS_CALLBACK. Presumably, you can simply use non-blocking operation, create an event listener, and associate the connection handle with dwContext. No threads involved.

TCP Two-Way Communication using Qt

I am trying to setup a TCP communication framework between two computers. I would like each computer to send data to the other. So computer A would perform a calculation, and send it to computer B. Computer B would then read this data, perform a calculation using it, and send a result back to computer A. Computer A would wait until it receives something from computer B before proceeding with performing another calculation, and sending it to computer B.
This seems conceptually straightforward, but I haven't been able to locate an example that details two-way (bidirectional) communication via TCP. I've only found one-way server-client communication, where a server sends data to a client. These are some examples that I have looked at closely so far:
Server-Client communication
Synchronized server-client communication
I'm basically looking to have two "servers" communicate with each other. The synchronized approach above is, I believe, important for what I'm trying to do. But I'm struggling to setup a two-way communication framework via a single socket.
I would appreciate it greatly if someone could point me to examples that describe how to setup bidirectional communication with TCP, or give me some pointers on how to set this up, from the examples I have linked above. I am very new to TCP and network communication frameworks and there might be a lot that I could be misunderstanding, so it would be great if I could get some clear pointers on how to proceed.
This answer does not go into specifics, but it should give you a general idea, since that's what you really seem to be asking for. I've never used Qt before, I do all my networking code with BSD-style sockets directly or with my own wrappers.
Stuff to think about:
Protocol. Hand-rolled or existing?
Existing protocols can be heavyweight, depending on what your payload looks like. Examples include HTTP and Google ProtoBuf; there are many more.
Handrolled might mean more work, but more controlled. There are two general approaches: length-based and sentinel-based.
Length-based means embedding the length in the first bytes. It requires caring about endianness, and thinking about what happens if a message is longer than can be represented in the length field. If you do this, I strongly recommend that you define your packet formats in some data file and then generate the low-level packet encoding logic (a short framing sketch follows this protocol list).
Sentinel-based means ending the message when some character (or sequence) is seen. Common sentinels are '\0', '\n', and "\r\n". If the rest of your protocol is also text-based, this means it is much easier to debug.
For both designs, you have to think about what happens if the other side tries to send more data than you are willing (or able) to store in memory. In either case, limiting the payload size to a 16-bit unsigned integer is probably a good idea; you can stream replies with multiple packets. Note that serious protocols (based on UDP + crypto) typically have a protocol-layer size limit of 512-1500 bytes, though application-layer may be larger of course.
For both designs, EOF on the socket in the middle of a message (before the length is satisfied or the sentinel is seen) means you must drop the message and log an error.
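A short, self-contained sketch of the length-based variant (illustrative helper names, a 16-bit big-endian length prefix, not tied to any particular library):
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

// Prepend a 2-byte network-order length to the payload.
std::vector<std::uint8_t> encode_frame(const std::string& payload)
{
    if (payload.size() > 0xFFFF)
        throw std::length_error("payload too large for 16-bit length prefix");
    std::vector<std::uint8_t> frame(2 + payload.size());
    frame[0] = static_cast<std::uint8_t>(payload.size() >> 8);   // high byte first
    frame[1] = static_cast<std::uint8_t>(payload.size() & 0xFF);
    std::memcpy(frame.data() + 2, payload.data(), payload.size());
    return frame;
}

// Try to pull one complete message out of the receive buffer.
// Returns true and erases the consumed bytes when a full frame is present.
bool decode_frame(std::vector<std::uint8_t>& buffer, std::string& message)
{
    if (buffer.size() < 2)
        return false;                        // length prefix not complete yet
    std::size_t len = (static_cast<std::size_t>(buffer[0]) << 8) | buffer[1];
    if (buffer.size() < 2 + len)
        return false;                        // body not complete yet
    message.assign(buffer.begin() + 2, buffer.begin() + 2 + len);
    buffer.erase(buffer.begin(), buffer.begin() + 2 + len);
    return true;
}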
Main loop. Qt probably has one you can use, but I don't know about it.
It's possible to develop simple programs using solely blocking operations, but I don't recommend it. Always assume the other end of a network connection is a dangerous psychopath who knows where you live.
There are two fundamental operations in a main loop:
Socket events: a socket reports being ready for read, or ready to write. There are also other sorts of events that you probably won't use, since most useful information can be found separately in the read/write handlers: exceptional/priority, (write)hangup, read-hangup, error.
Timer events: when a certain time delta has passed, interrupt the wait-for-socket-events syscall and dispatch to the timer heap. If you don't have any timers, pass the syscall's notion of "infinity". But if you have long sleeps, you might want some arbitrary, relatively small cap like "10 seconds" or "10 minutes" depending on your application, because long timer intervals can do all sorts of weird things with clock changes, hibernation, and such. It's possible to avoid those if you're careful enough and use the right APIs, but most people don't.
Choice of multiplex syscall:
The p versions below include atomic signal mask changing. I don't recommend using them; instead if you need signals either add signalfd to the set or else emulate it using signal handlers and a (nonblocking, be careful!) pipe.
select/pselect is the classic, available everywhere. You cannot have more than FD_SETSIZE file descriptors, which may be very small (but can be #defined on the command line if you're careful enough). It is inefficient with sparse sets. The timeout is specified in microseconds for select and nanoseconds for pselect, but chances are you can't actually get that resolution. Only use this if you have no other choice.
poll/ppoll solves the problems of sparse sets, and more significantly the problem of listening to more than FD_SETSIZE file descriptors. It does use more memory, but it is simpler to use. poll is POSIX, ppoll is GNU-specific. For both, the API provides nanosecond granularity for the timeout, but you probably can't get that. I recommend this if you need BSD compatibility and don't need massive scalability, or if you only have one socket and don't want to deal with epoll's headaches.
epoll solves the problem of having to respecify the file descriptor and event list every time, by keeping the list of file descriptors in the kernel. Among other things, this means that when the low-level kernel event occurs, epoll can immediately be made aware of it, regardless of whether the user program is already in a syscall or not. It supports edge-triggered mode, but don't use it unless you're sure you understand it. Its API only provides millisecond granularity for the timeout, but that's probably all you can rely on anyway. If you are able to target only Linux, I strongly suggest you use this (a minimal loop sketch follows this list of syscalls), except possibly if you can guarantee only a single socket at once, in which case poll is simpler.
kqueue is found on BSD-derived systems, including Mac OS X. It is supposed to solve the same problems as epoll, but instead of keeping things simple by using file descriptors, it has all sorts of strange structures and does not follow the "do only one thing" principle. I have never used it. Use this if you need massive scalability on BSD.
IOCP. This only exists on Windows and some obscure Unixen. I have never used it and it has significantly different semantics. Use this, but be aware that much of this post is not applicable because Windows is weird. But why would you use Windows for any sort of serious system?
io_uring. A new API in Linux 5.1 that significantly reduces the number of syscalls and memory copies. It is worth it if you have a lot of sockets, but since it's so new, you must provide a fallback path.
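For illustration, a minimal sketch of the epoll flavour of such a loop (Linux only, level-triggered, illustrative names, no real timer heap):
#include <sys/epoll.h>
#include <unistd.h>
#include <cstdio>
#include <functional>
#include <map>

int main()
{
    int epfd = ::epoll_create1(0);
    if (epfd < 0) { std::perror("epoll_create1"); return 1; }

    // fd -> handler; a real program would store handler objects with
    // virtual on_readable()/on_writable() members instead of lambdas.
    std::map<int, std::function<void(unsigned)>> handlers;

    auto add_fd = [&](int fd, unsigned interest, std::function<void(unsigned)> h) {
        epoll_event ev{};
        ev.events = interest;          // e.g. EPOLLIN | EPOLLOUT
        ev.data.fd = fd;
        if (::epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == 0)
            handlers[fd] = std::move(h);
    };

    // Register stdin as a stand-in for a listening or connected socket.
    add_fd(0, EPOLLIN, [](unsigned ev) {
        if (ev & EPOLLIN) { /* read and buffer the available bytes ... */ }
    });

    for (;;) {
        epoll_event events[64];
        int n = ::epoll_wait(epfd, events, 64, 10000);   // 10 s timeout
        if (n < 0) { std::perror("epoll_wait"); break; }
        if (n == 0) { /* timer tick: service the timer heap here */ continue; }
        for (int i = 0; i < n; ++i)
            handlers[events[i].data.fd](events[i].events);
    }
    ::close(epfd);
    return 0;
}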
Handler implementation:
When the multiplex syscall signifies an event, look up the handler for that file number (some class with virtual functions) and call the relevant event handlers (note there may be more than one event).
Make sure all your sockets have O_NONBLOCK set and also disable Nagle's algorithm (since you're doing buffering yourself), except possibly for connect() before the connection is made, since non-blocking connect requires confusing logic, especially if you want to play nice with multiple DNS results.
For TCP sockets, all you need is accept in the listening socket's handler, and read/write family in the accept/connected handler. For other sorts of sockets, you need the send/recv family. See the "see also" in their man pages for more info - chances are one of them will be useful to you sometimes, do this before you hard-code too much into your API design.
You need to think hard about buffering. Buffering reads means you need to be able to check the header of a packet to see if there are enough bytes to do anything with it, or if you have to store the bytes until next time. Also remember that you might receive more than one packet at once (I suggest you rethink your design so that you don't mandate blocking until you get the reply before sending the next packet). Buffering writes is harder than you think, since you don't want to be woken when there is a "can write" event on a socket for which you have no data to write. The application should never write to the socket directly, only queue a write. Though TCP_CORK might imply a different design, I haven't used it.
Do not provide a network-level public API of iterating over all sockets. If needed, implement this at a higher level; remember that you may have all sorts of internal file descriptors with special purposes.
All of the above applies to both the server and the client. As others have said, there is no real difference once the connection is set up.
Edit 2019:
The documentation of D-Bus and 0MQ are worth reading, whether you use them or not. In particular, it's worth thinking about 3 kinds of conversations:
request/reply: a "client" makes a request and the "server" does one of 3 things: 1. replies meaningfully, 2. replies that it doesn't understand the request, 3. fails to reply (either due to a disconnect, or due to a buggy/hostile server). Don't let un-acknowledged requests DoS the "client"! This can be difficult, but this is a very common workflow.
publish/subscribe: a "client" tells the "server" that it is interested in certain events. Every time the event happens, the "server" publishes a message to all registered "clients". Variations: , subscription expires after one use. This workflow has simpler failure modes than request/reply, but consider: 1. the server publishes an event that the client didn't ask for (either because it didn't know, or because it doesn't want it yet, or because it was supposed to be a oneshot, or because the client sent an unsubscribe but the server didn't process it yet), 2. this might be a magnification attack (though that is also possible for request/reply, consider requiring requests to be padded), 3. the client might have disconnected, so the server must take care to unsubscribe them, 4. (especially if using UDP) the client might not have received an earlier notification. Note that it might be perfectly legal for a single client to subscribe multiple times; if there isn't naturally discriminating data you may need to keep a cookie to unsubscribe.
distribute/collect: a "master" distributes work to multiple "slaves", then collects the results, aka map/reduce any many other reinvented terms for the same thing. This is similar to a combination of the above (a client subscribes to work-available events, then the server makes a unique request to each clients instead of a normal notification). Note the following additional cases: 1. some slaves are very slow, while others are idle because they've already completed their tasks and the master might have to store the incomplete combined output, 2. some slaves might return a wrong answer, 3. there might not be any slaves, 4.
D-Bus in particular makes a lot of decisions that seem quite strange at first, but do have justifications (which may or may not be relevant, depending on the use case). Normally, it is only used locally.
0MQ is lower-level and most of its "downsides" are solved by building on top of it. Beware of the MxN problem; you might want to artificially create a broker node just for messages that are prone to it.
#include <QAbstractSocket>
#include <QtNetwork>
#include <QTcpServer>
#include <QTcpSocket>
QTcpSocket* m_pTcpSocket;
Connect to host: set up connections with the TCP socket and implement your slots. If data bytes are available, the readyRead() signal will be emitted.
void connectToHost(QString hostname, int port){
    if(!m_pTcpSocket)
    {
        m_pTcpSocket = new QTcpSocket(this);
        m_pTcpSocket->setSocketOption(QAbstractSocket::KeepAliveOption, 1);
    }

    connect(m_pTcpSocket, SIGNAL(readyRead()), SLOT(readSocketData()), Qt::UniqueConnection);
    connect(m_pTcpSocket, SIGNAL(error(QAbstractSocket::SocketError)), SIGNAL(connectionError(QAbstractSocket::SocketError)), Qt::UniqueConnection);
    connect(m_pTcpSocket, SIGNAL(stateChanged(QAbstractSocket::SocketState)), SIGNAL(tcpSocketState(QAbstractSocket::SocketState)), Qt::UniqueConnection);
    connect(m_pTcpSocket, SIGNAL(disconnected()), SLOT(onConnectionTerminated()), Qt::UniqueConnection);
    connect(m_pTcpSocket, SIGNAL(connected()), SLOT(onConnectionEstablished()), Qt::UniqueConnection);

    if(!(QAbstractSocket::ConnectedState == m_pTcpSocket->state())){
        m_pTcpSocket->connectToHost(hostname, port, QIODevice::ReadWrite);
    }
}
Write:
void sendMessage(QString msgToSend){
    QByteArray l_vDataToBeSent;
    QDataStream l_vStream(&l_vDataToBeSent, QIODevice::WriteOnly);
    l_vStream.setByteOrder(QDataStream::LittleEndian);
    l_vStream << msgToSend.length();
    l_vDataToBeSent.append(msgToSend);
    m_pTcpSocket->write(l_vDataToBeSent, l_vDataToBeSent.length());
}
Read:
void readSocketData(){
    while(m_pTcpSocket->bytesAvailable()){
        QByteArray receivedData = m_pTcpSocket->readAll();
    }
}
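For completeness, a sketch of an alternative readSocketData() implementation that actually extracts the frames produced by sendMessage() above; m_buffer is an assumed QByteArray member and messageReceived an assumed signal. Note also that the writer sends msgToSend.length(), a character count, so this only lines up byte-for-byte for ASCII payloads; sending the byte count of the encoded data would be more robust.
void readSocketData(){
    m_buffer.append(m_pTcpSocket->readAll());        // accumulate partial frames

    while(m_buffer.size() >= 4){
        QDataStream header(m_buffer);
        header.setByteOrder(QDataStream::LittleEndian);
        qint32 len = 0;
        header >> len;                               // the 4-byte length prefix

        if(m_buffer.size() < 4 + len)
            break;                                   // wait for the rest of the frame

        QByteArray payload = m_buffer.mid(4, len);
        m_buffer.remove(0, 4 + len);
        emit messageReceived(QString::fromUtf8(payload));
    }
}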
TCP is inherently bidirectional. Get one way working (client connects to server). After that both ends can use send and recv in exactly the same way.
Have a look at QWebSocket, which is based on HTTP and also allows for HTTPS.


Is Perforce's C++ P4API thread-safe?

Simple question - is the C++ API provided by Perforce thread-safe? There is no mention of it in the documentation.
By "thread-safe" I mean for server requests from the client. Obviously there will be issues if I have multiple threads trying to set client names and such on the same connection.
But given a single connection object, can I have multiple threads fetching changelists, getting status, translating files through a p4 map, etc.?
Late answer, but... From the release notes themselves:
Known Limitations
The Perforce client-server protocol is not designed to support
multiple concurrent queries over the same connection. For this
reason, multi-threaded applications using the C++ API or the
derived APIs (P4API.NET, P4Perl, etc.) should ensure that a
separate connection is used for each thread or that only one
thread may use a shared connection at a time.
It does not look like the client object has thread affinity, so in order to share a connection between threads, one just has to use a mutex to serialize the calls.
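As a minimal sketch of that serialization, assuming the usual ClientApi/ClientUser classes from the Perforce C++ API headers (treat the exact API usage as an assumption to verify against your p4api version):
#include <mutex>
#include "clientapi.h"

class SerializedConnection {
public:
    // client must already be Init()'ed; this wrapper only serializes Run().
    explicit SerializedConnection(ClientApi& client) : client_(client) {}

    void Run(const char* command, ClientUser* ui)
    {
        std::lock_guard<std::mutex> lock(mutex_);   // one command at a time
        client_.Run(command, ui);
    }

private:
    ClientApi& client_;
    std::mutex mutex_;
};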
If the documentation doesn't mention it, then it is not safe.
Making something thread-safe in any sense is often difficult and may result in a performance penalty because of the addition of locks. It wouldn't make sense to go through the trouble and then not mention it in the documentation.