Simple question - is the C++ API provided by Perforce thread-safe? There is no mention of it in the documentation.
By "thread-safe" I mean for server requests from the client. Obviously there will be issues if I have multiple threads trying to set client names and such on the same connection.
But given a single connection object, can I have multiple threads fetching changelists, getting status, translating files through a p4 map, etc.?
Late answer, but... From the release notes themselves:
Known Limitations
The Perforce client-server protocol is not designed to support
multiple concurrent queries over the same connection. For this
reason, multi-threaded applications using the C++ API or the
derived APIs (P4API.NET, P4Perl, etc.) should ensure that a
separate connection is used for each thread or that only one
thread may use a shared connection at a time.
It does not look like the client object has thread affinity, so in order to share a connection between threads, one just has to use a mutex to serialize the calls.
If the documentation doesn't mention it, then it is not safe.
Making something thread-safe in any sense is often difficult and may result in a performance penalty because of the addition of locks. It wouldn't make sense to go through the trouble and then not mention it in the documentation.
Related
We are having a C++ application to send and receive WebSocket messages
One thread to send the message (using WinHttpWebSocketSend)
the second thread to receive (using WinHttpWebSocketReceive)
But the same WebSocket handle is used across these 2 threads. Will it cause any problems? I don't know if we have to handle it another way. It works in our application - we are able to send and receive messages - but I don't know if it will have any problem in the production environment. Any one has better ideas?
Like most platforms, nearly all Windows API system calls do not provide thread barriers beyond preventing simultaneous access to the key parts of the kernel. While I could not say for sure (the documentation doesn't seem to answer your explicit question) I would be surprised if the WinHTTP API provides barriers that prevent multiple threads from stepping on each other (so to speak)--particularly because it's really just a "helper" API that uses the somewhat lower level Winsock stuff directly--and I would take it upon myself to implement the necessary barriers.
I'm also wondering why you're using threads in this manner to begin with. I know essentially nothing about the WinHTTP API, but I did notice WINHTTP_OPTION_ASSURED_NON_BLOCKING_CALLBACKS which leads me to believe that you can implement an asynchronous approach which would prevent any thread-safety issues to begin with (and probably be much faster and memory efficient).
It appears that the callback mechanism for WinHTTP is rather expressive. See WINHTTP_STATUS_CALLBACK. Presumably, you can simply use non-blocking operation, create an event listener, and associate the connection handle with dwContext. No threads involved.
Do I create one strand that all of my SSL sockets share, or one strand per SSL context (shared by any associated sockets)?
Boost.Asio SSL documentation states this, but it doesn't mention contexts. I assume that this means I must use only one strand for everything, but I think this was written before OpenSSL had multithreading support.
SSL and Threads
SSL stream objects perform no locking of their own. Therefore, it is essential that all asynchronous SSL operations are performed in an implicit or explicit strand. Note that this means that no synchronisation is required (and so no locking overhead is incurred) in single threaded programs.
I'm most likely only going to have only one SSL context, but I'm wondering if it's more proper for the strand to be owned by the SSL context, or by the global network service.
I did provide a handler to CRYPTO_set_locking_callback in case that matters.
I think there is some confusion on this thread because there's a few things that need to be clarified. Lets start by asserting that ::asio::ssl::context == SSL_CTX. The two are one.
Second, when using boost::asio::ssl, unless you're doing something crazy where you're bypassing the internal init objects, there's no need for you to manually set the crypto locking callbacks. This is done for you, as you can see in the sources here.
In fact, you may cause issues by doing this, because the destructor for the init object operates under the assumption that they have done this work internally. Take that last bit with a grain of salt, because I have not reviewed this in depth.
Third, I believe you're confusing SSL streams with SSL contexts. For simplicity, think of the SSL stream as the socket, think of the SSL context as separate object that the sockets can use for various SSL functions, such as handshaking with a particular negotiation key, or as a server, providing information about your server certificate to a connected client so that you can handshake with the client.
The mention of strands comes down to preventing possible simultaneous IO against one specific stream (socket), not contexts. Obviously attempting to read into a buffer and write from the same buffer at the same time on the same socket would be an issue. So, when you supply completion handlers wrapped by strands to the various ::asio::async_X methods, you're enforcing a specific ordering to prevent the aforementioned scenario. You can read more in this answer given by someone who knows much more about this than I.
Now as far as contexts go, David Schwartz points out in the comments and another answer he wrote I need to dig up, that whole purpose of contexts themselves are to provide information that facilitates the function of SSL across multiple SSL streams. He appears to imply that they essentially must be thread safe, given their intended purpose. I believe perhaps he is speaking within the context of ::asio::ssl::context, only because of the way that ::asio::ssl correctly employs the thread safety callbacks, or perhaps he's just speaking in the context of using openSSL correctly in a multithreaded program.
Regardless, beyond such comments and answers on SO, and my own practical experience, it's incredibly difficult to come across concrete evidence of this in documentation, or clearly defined boundaries between what is and isn't thread safe. boost::asio::ssl:context is, as David also points out, simply a very thin wrapper around SSL_CTX. I would further add that its meant to give a more "c++ ish" feel to working with the underlying structure(s). It was probably also designed with some intent of decoupling ::asio::ssl and underlying implementation library, but it doesn't achieve this, the two are tightly bound. David mentions again rightly that this thin wrapper is poorly documented, and one must look at the implementation to get insight.
If you start digging into the implementation, there's a rather easy way to find out what is and isn't thread safe when it comes to contexts. You can do a search for CRYPTO_LOCK_SSL_CTX within sources like ssl_lib.c.
int SSL_CTX_set_generate_session_id(SSL_CTX *ctx, GEN_SESSION_CB cb)
{
CRYPTO_w_lock(CRYPTO_LOCK_SSL_CTX);
ctx->generate_session_id = cb;
CRYPTO_w_unlock(CRYPTO_LOCK_SSL_CTX);
return 1;
}
As you can see, CRYPTO_w_lock is used, which brings us back to the official page about openSSL and threads, here, which states:
OpenSSL can safely be used in multi-threaded applications provided
that at least two callback functions are set, locking_function and
threadid_func.
Now we come full circle to the linked asio/ssl/detail/impl/openssl_init.ipp source code in the first paragraph of this answer, where we see:
do_init()
{
::SSL_library_init();
::SSL_load_error_strings();
::OpenSSL_add_all_algorithms();
mutexes_.resize(::CRYPTO_num_locks());
for (size_t i = 0; i < mutexes_.size(); ++i)
mutexes_[i].reset(new boost::asio::detail::mutex);
::CRYPTO_set_locking_callback(&do_init::openssl_locking_func);
::CRYPTO_set_id_callback(&do_init::openssl_id_func);
#if !defined(SSL_OP_NO_COMPRESSION) \
&& (OPENSSL_VERSION_NUMBER >= 0x00908000L)
null_compression_methods_ = sk_SSL_COMP_new_null();
#endif // !defined(SSL_OP_NO_COMPRESSION)
// && (OPENSSL_VERSION_NUMBER >= 0x00908000L)
}
Note of course:
CRYPTO_set_locking_callback
CRYPTO_set_id_callback
So at least in terms of ::asio::ssl::context, thread safety here has nothing to do with strands and everything to do with openSSL working as openSSL is designed to work when used correctly in a multithreaded program.
Coming to the original question, now with all of this explained, David also gave the answer very simply in the comments with:
The most common method is to use one strand per SSL connection.
Take an example of a HTTPS server that serves up content of example.com. The server has a single context, configured with information such as the certificate for example.com. A client connects, this context is used on all connected clients to perform the handshake and such. You wrap your connected client in a new session object, where you handle that client. It is within this session that you would have a single strand, either implicit or explicit, to protect the socket, not the context.
While I'm not an expert by any means and I welcome corrections to this answer, I have put everything that I know about these subjects into practice in an open source transparent filtering HTTPS proxy. It's a little over 50% of comments to code ratio with over 17K lines total, so everything I know is written down there (be it right or wrong ;)). If you'd like to see an example of this stuff in action, you can look at the TlsCapableHttpBridge.hpp source, which acts as both a client a server on a per-host, per-connection basis.
Server contexts and certificates are spoofed/generated once and shared across all clients panning multiple threads. The only manual locking done is during storage and retrieval of the contexts. There is one strand per side of the bridge, one for the real downstream client socket and one for the upstream server connection, although they technically aren't even necessary because the order of operations creates an implicit strand anyway.
Note that the project is under development as I'm rewriting many things, (dep build instructions are not present yet), but everything is functional in terms of the MITM SSL code, so you're looking at a fully functional class and associated components.
UPDATE
The gist of this answer is contested by David Schwarz, whose authority in this area I hold in high esteem.
There are reasons to expect that ssl contexts can be shared between threads - at least for some operations, if only to facilitate SSL session resumption.
I think David has experience with SSL context as OpenSSL uses it. Boost ASIO uses that in turn (at least on all platforms I know of). So, either David writes an answer sharing his knowledge, or you/me would have to spend some time with the OpenSSL documentation and Boost Asio source code to figure out the effective constraints that apply to Boost Asio's ssl::context usage.
Below are the constraints as currently documented.
[old answer text follows]
Thread Safety
In general, it is safe to make concurrent use of distinct objects, but unsafe to make concurrent use of a single object. However, types such as io_service provide a stronger guarantee that it is safe to use a single object concurrently.
Logically, because the documentation doesn't mention thread-safety of the ssl_context class in particular, you must conclude that it is not.
It doesn't matter that you know that the underlying SSL library supports this if you use some particular hooks (like you mention). This only tells you that it might not be hard to make ssl_context thread-aware.
But until you (work with the library devs to) provide this patch, it's not available.
Long story short, you access each ssl_context from a single strand.
I'd say it depends on how your protocol is like. If it's HTTP, there's no need to use an (explicit) strand as you don't read and write to your socket in parallel.
In fact, what would cause problems, is code like this:
void func()
{
async_write(...);
async_read(...);
}
because here -if your io_service() has a POOL of threads associated with it-, actual read and write could be carried out in parallel by several threads.
If you only have one thread per io_service, no need for a strand. The same is true if you're implementing HTTP for example. In HTTP, you don't read and write to the socket in parallel, due to the layout of the protocol. You read a request from the client -though this might be done in several async calls-, then you somehow process request and headers, then you (async or not) send your reply.
Pretty much the same you can also read in ASIO's strand documentation.
Do I create one strand that all of my SSL sockets share, or one strand per SSL context (shared by any associated sockets)?
Boost.Asio SSL documentation states this, but it doesn't mention contexts. I assume that this means I must use only one strand for everything, but I think this was written before OpenSSL had multithreading support.
SSL and Threads
SSL stream objects perform no locking of their own. Therefore, it is essential that all asynchronous SSL operations are performed in an implicit or explicit strand. Note that this means that no synchronisation is required (and so no locking overhead is incurred) in single threaded programs.
I'm most likely only going to have only one SSL context, but I'm wondering if it's more proper for the strand to be owned by the SSL context, or by the global network service.
I did provide a handler to CRYPTO_set_locking_callback in case that matters.
I think there is some confusion on this thread because there's a few things that need to be clarified. Lets start by asserting that ::asio::ssl::context == SSL_CTX. The two are one.
Second, when using boost::asio::ssl, unless you're doing something crazy where you're bypassing the internal init objects, there's no need for you to manually set the crypto locking callbacks. This is done for you, as you can see in the sources here.
In fact, you may cause issues by doing this, because the destructor for the init object operates under the assumption that they have done this work internally. Take that last bit with a grain of salt, because I have not reviewed this in depth.
Third, I believe you're confusing SSL streams with SSL contexts. For simplicity, think of the SSL stream as the socket, think of the SSL context as separate object that the sockets can use for various SSL functions, such as handshaking with a particular negotiation key, or as a server, providing information about your server certificate to a connected client so that you can handshake with the client.
The mention of strands comes down to preventing possible simultaneous IO against one specific stream (socket), not contexts. Obviously attempting to read into a buffer and write from the same buffer at the same time on the same socket would be an issue. So, when you supply completion handlers wrapped by strands to the various ::asio::async_X methods, you're enforcing a specific ordering to prevent the aforementioned scenario. You can read more in this answer given by someone who knows much more about this than I.
Now as far as contexts go, David Schwartz points out in the comments and another answer he wrote I need to dig up, that whole purpose of contexts themselves are to provide information that facilitates the function of SSL across multiple SSL streams. He appears to imply that they essentially must be thread safe, given their intended purpose. I believe perhaps he is speaking within the context of ::asio::ssl::context, only because of the way that ::asio::ssl correctly employs the thread safety callbacks, or perhaps he's just speaking in the context of using openSSL correctly in a multithreaded program.
Regardless, beyond such comments and answers on SO, and my own practical experience, it's incredibly difficult to come across concrete evidence of this in documentation, or clearly defined boundaries between what is and isn't thread safe. boost::asio::ssl:context is, as David also points out, simply a very thin wrapper around SSL_CTX. I would further add that its meant to give a more "c++ ish" feel to working with the underlying structure(s). It was probably also designed with some intent of decoupling ::asio::ssl and underlying implementation library, but it doesn't achieve this, the two are tightly bound. David mentions again rightly that this thin wrapper is poorly documented, and one must look at the implementation to get insight.
If you start digging into the implementation, there's a rather easy way to find out what is and isn't thread safe when it comes to contexts. You can do a search for CRYPTO_LOCK_SSL_CTX within sources like ssl_lib.c.
int SSL_CTX_set_generate_session_id(SSL_CTX *ctx, GEN_SESSION_CB cb)
{
CRYPTO_w_lock(CRYPTO_LOCK_SSL_CTX);
ctx->generate_session_id = cb;
CRYPTO_w_unlock(CRYPTO_LOCK_SSL_CTX);
return 1;
}
As you can see, CRYPTO_w_lock is used, which brings us back to the official page about openSSL and threads, here, which states:
OpenSSL can safely be used in multi-threaded applications provided
that at least two callback functions are set, locking_function and
threadid_func.
Now we come full circle to the linked asio/ssl/detail/impl/openssl_init.ipp source code in the first paragraph of this answer, where we see:
do_init()
{
::SSL_library_init();
::SSL_load_error_strings();
::OpenSSL_add_all_algorithms();
mutexes_.resize(::CRYPTO_num_locks());
for (size_t i = 0; i < mutexes_.size(); ++i)
mutexes_[i].reset(new boost::asio::detail::mutex);
::CRYPTO_set_locking_callback(&do_init::openssl_locking_func);
::CRYPTO_set_id_callback(&do_init::openssl_id_func);
#if !defined(SSL_OP_NO_COMPRESSION) \
&& (OPENSSL_VERSION_NUMBER >= 0x00908000L)
null_compression_methods_ = sk_SSL_COMP_new_null();
#endif // !defined(SSL_OP_NO_COMPRESSION)
// && (OPENSSL_VERSION_NUMBER >= 0x00908000L)
}
Note of course:
CRYPTO_set_locking_callback
CRYPTO_set_id_callback
So at least in terms of ::asio::ssl::context, thread safety here has nothing to do with strands and everything to do with openSSL working as openSSL is designed to work when used correctly in a multithreaded program.
Coming to the original question, now with all of this explained, David also gave the answer very simply in the comments with:
The most common method is to use one strand per SSL connection.
Take an example of a HTTPS server that serves up content of example.com. The server has a single context, configured with information such as the certificate for example.com. A client connects, this context is used on all connected clients to perform the handshake and such. You wrap your connected client in a new session object, where you handle that client. It is within this session that you would have a single strand, either implicit or explicit, to protect the socket, not the context.
While I'm not an expert by any means and I welcome corrections to this answer, I have put everything that I know about these subjects into practice in an open source transparent filtering HTTPS proxy. It's a little over 50% of comments to code ratio with over 17K lines total, so everything I know is written down there (be it right or wrong ;)). If you'd like to see an example of this stuff in action, you can look at the TlsCapableHttpBridge.hpp source, which acts as both a client a server on a per-host, per-connection basis.
Server contexts and certificates are spoofed/generated once and shared across all clients panning multiple threads. The only manual locking done is during storage and retrieval of the contexts. There is one strand per side of the bridge, one for the real downstream client socket and one for the upstream server connection, although they technically aren't even necessary because the order of operations creates an implicit strand anyway.
Note that the project is under development as I'm rewriting many things, (dep build instructions are not present yet), but everything is functional in terms of the MITM SSL code, so you're looking at a fully functional class and associated components.
UPDATE
The gist of this answer is contested by David Schwarz, whose authority in this area I hold in high esteem.
There are reasons to expect that ssl contexts can be shared between threads - at least for some operations, if only to facilitate SSL session resumption.
I think David has experience with SSL context as OpenSSL uses it. Boost ASIO uses that in turn (at least on all platforms I know of). So, either David writes an answer sharing his knowledge, or you/me would have to spend some time with the OpenSSL documentation and Boost Asio source code to figure out the effective constraints that apply to Boost Asio's ssl::context usage.
Below are the constraints as currently documented.
[old answer text follows]
Thread Safety
In general, it is safe to make concurrent use of distinct objects, but unsafe to make concurrent use of a single object. However, types such as io_service provide a stronger guarantee that it is safe to use a single object concurrently.
Logically, because the documentation doesn't mention thread-safety of the ssl_context class in particular, you must conclude that it is not.
It doesn't matter that you know that the underlying SSL library supports this if you use some particular hooks (like you mention). This only tells you that it might not be hard to make ssl_context thread-aware.
But until you (work with the library devs to) provide this patch, it's not available.
Long story short, you access each ssl_context from a single strand.
I'd say it depends on how your protocol is like. If it's HTTP, there's no need to use an (explicit) strand as you don't read and write to your socket in parallel.
In fact, what would cause problems, is code like this:
void func()
{
async_write(...);
async_read(...);
}
because here -if your io_service() has a POOL of threads associated with it-, actual read and write could be carried out in parallel by several threads.
If you only have one thread per io_service, no need for a strand. The same is true if you're implementing HTTP for example. In HTTP, you don't read and write to the socket in parallel, due to the layout of the protocol. You read a request from the client -though this might be done in several async calls-, then you somehow process request and headers, then you (async or not) send your reply.
Pretty much the same you can also read in ASIO's strand documentation.
I need to execute parallel HTTP requests using Libcurl.
From what I understand I need to create a new handle for each thread
and use CURLOPT_WRITEDATA with some kind of Thread Local Storage.
Does the multi interface makes this task a little easier?
I'm also using cookies, does using CURLOPT_COOKIEFILE and CURLOPT_COOKIEJAR will make
Libcurl load the cookie file for each thread?
As you probably know, libcurl is not thread safe, so you should ensure that the libcurl handle is never shared between multiple threads. Each thread will need to store local data (among other things, the connection handle).
From this, it ensues that for each handle, i.e., for each thread, libcurl will need to read the cookie file from scratch, since this information cannot be shared. This is not a problem, in my opinion, although there could be issues when updating it (you will have multiple thread attempting it).
About the multi interface, it allows you to multiplex multiple transfers, so it is another approach to what you are trying to do but in a single thread.
UPDATE March 2013
libcurl is now thread-safe.
libcurl is free, thread-safe, IPv6 compatible, feature rich, well supported, fast, thoroughly documented and is already used by many known, big and successful companies and numerous applications."
This is not a direct answer, but why do you need multithreading for parallel HTTP requests?
The multi interface is designed for this purpose: you add multiple handles and then process all of them with one call, all in the same thread. From the documentation:
Enable multiple simultaneous transfers in the same thread without
making it complicated for the application.
If you want multiple threads, I suggest you use the easy interface in each thread, and forget about the multi interface.
Sharing simply shares data between easy handles, you can use the interface with/without the multi interface. If you do use multiple threads, you have to provide your own locking.
Also check out libcurl share interface. It was designed for this purpose, i.e. to share data between requests:
You can have multiple easy handles share data between them. Have them
update and use the same cookie database, DNS cache, TLS session cache!
This way, each single transfer will take advantage from data updates
made by the other transfer(s). The sharing interface, however, does
not share active or persistent connections between different easy
handles.
We're doing a small benchmark of MySQL where we want to see how it performs for our data.
Part of that test is to see how it works when multiple concurrent threads hammers the server with various queries.
The MySQL documentation (5.0) isn't really clear about multi threaded clients. I should point out that I do link against the thread safe library (libmysqlclient_r.so)
I'm using prepared statements and do both read (SELECT) and write (UPDATE, INSERT, DELETE).
Should I open one connection per thread? And if so: how do I even do this.. it seems mysql_real_connect() returns the original DB handle which I got when I called mysql_init())
If not: how do I make sure results and methods such as mysql_affected_rows returns the correct value instead of colliding with other thread's calls (mutex/locks could work, but it feels wrong)
As maintainer of a fairly large C application that makes MySQL calls from multiple threads, I can say I've had no problems with simply making a new connection in each thread. Some caveats that I've come across:
Edit: it seems this bullet only applies to versions < 5.5; see this page for your appropriate version: Like you say you're already doing, link against libmysqlclient_r.
Call mysql_library_init() (once, from main()). Read the docs about use in multithreaded environments to see why it's necessary.
Make a new MYSQL structure using mysql_init() in each thread. This has the side effect of calling mysql_thread_init() for you. mysql_real_connect() as usual inside each thread, with its thread-specific MYSQL struct.
If you're creating/destroying lots of threads, you'll want to use mysql_thread_end() at the end of each thread (and mysql_library_end() at the end of main()). It's good practice anyway.
Basically, don't share MYSQL structs or anything created specific to that struct (i.e. MYSQL_STMTs) and it'll work as you expect.
This seems like less work than making a connection pool to me.
You could create a connection pool. Each thread that needs a connection could request a free one from the pool. If there's no connection available then you either block, or grow the pool by adding a new connection to it.
There's an article here describing the pro's and cons of a connection pool (though it is java based)
Edit: Here's a SO question / answer about connection pools in C
Edit2: Here's a link to a sample Connection Pool for MySQL written in C++. (you should probably ignore the goto statements when you implement your own.)
Seems clear to me from the mySQL Docs that any specific MYSQL structure can be used in a thread without difficulty - using the same MYSQL structure in different threads simultaneously is clearly going to give you extremely unpredictable results as state is stored within the MYSQL connection.
Thus either create a connection per thread or used a pool of connections as suggested above and protect access to that pool (i.e. reserving or releasing a connection) using some kind of Mutex.
MySQL Threaded Clients in C
It states that mysql_real_connect() is not thread safe by default. The client library needs to be compiled for threaded access.