Is a cassandra session thread safe? (using cpp driver) - c++

I am developing a multi-threaded application and using Cassandra for the back-end.
Earlier, I created a separate session for each child thread and closed the session before killing the thread after its execution. But then I thought it might be an expensive job so I now designed it like, I have a single session opened at the start of the server and any number of clients can use that session for querying purposes.
Question: I just want to know if this is correct, or is there a better way to do this? I know connection pooling is an option but, is that really needed in this scenario?

It's certainly thread safe in the Java driver, so I assume the C++ driver is the same.
You are encouraged to only create one session and have all your threads use it so that the driver can efficiently maintain a connection pool to the cluster and process commands from your client threads asynchronously.
If you create multiple sessions on one client machine or keep opening and closing sessions, you would be forcing the driver to keep making and dropping connections to the cluster, which is wasteful of resources.
Quoting this Datastax blog post about 4 simple rules when using the DataStax drivers for Cassandra:
Use one Cluster instance per (physical) cluster (per application
lifetime)
Use at most one Session per keyspace, or use a single
Session and explicitely specify the keyspace in your queries
If you execute a statement more than once, consider using a PreparedStatement
You can reduce the number of network roundtrips and also have atomic operations by using Batches

The C/C++ driver is definitely thread safe at the session and future levels.
The CassSession object is used for query execution. Internally, a session object also manages a pool of client connections to Cassandra and uses a load balancing policy to distribute requests across those connections. An application should create a single session object per keyspace as a session object is designed to be created once, reused, and shared by multiple threads within the application.
They actually have a section called Thread Safety:
A CassSession is designed to be used concurrently from multiple threads. CassFuture is also thread safe. Other than these exclusions, in general, functions that might modify an object’s state are NOT thread safe. Objects that are immutable (marked ‘const’) can be read safely by multiple threads.
They also have a note about freeing objects. That is not thread safe. So you have to make sure all your threads are done before you free objects:
NOTE: The object/resource free-ing functions (e.g. cass_cluster_free, cass_session_free, … cass_*_free) cannot be called concurrently on the same instance of an object.
Source:
http://datastax.github.io/cpp-driver/topics/

Related

Lock service using SQL DataBase

I have a requirement to synchronize concurrent access to a shared resource modified by different processes which run on different hosts. I am thinking to synchronize this by creating a lock table in a sql database which is accessible from a service that can access the database. All the process will first request for lock from the service and only the one getting the lock will go forward and change the shared resource. Processes will then release the lock after their computation. The lock table will hold information like host, pid, lock creation time of the process currently holding the lock so as to clear the lock if the current process holding the lock has died unexpectedly and some other process has requested for the lock.
I am not inclined for a zookeeper based solution as the traffic in my case is minimal(2-5 process may run in a single day and so the probability of concurrent access is already minimal) and so I am not thinking to maintain a separate service for lock but extend one of the existing service itself by adding an additional table in its database.
I wanted suggestions on this approach or if there is some other simpler solution for this problem.

C++ gRPC thread number configuration

Does gRPC server/client have any concept of thread pool for connections? That it is possible to reuse threads, pre-allocate threads, queue request on thread limit reached, etc.
If no, how does it work, is it just allocating/destroying a thread when it need, without any limitation and/or reuse? If yes, is it possible to configure it?
It depends whether you are using sync or async API.
For a sync client, your RPC calls blocks the calling thread, so it is not really relevant. For a sync server, there is an internal threadpool handling all the incoming requests, you can use a grpc::ResourceQuota on a ServerBuilder to limit the max number of threads used by the threadpool.
For async client and server, gRPC uses CompletionQueue as a way for users to define their own threading model. A common way of building clients and servers is to use a user-provided threadpool to run CompletionQueue::Next in each thread. Then once it gets some tag from the Next call, you can cast it to a user-defined type and run some methods to proceed the state. In this case, the user has full control of threads being used.
Note gRPC does create some internal threads, but they should not be used for the majority of the rpc work.

Deliberate Blocking in Akka Actors

I understand that Akka actors should not block in order to stay reactive to messages, but how do I structure my service where I want to monitor a process running for an indefinite period of time?
For example, we are using the Amazon Kinesis Connector library. You create a connector with a given configuration, which inherits from Runnable, and then call the Run() method. The connector simply runs indefinitely, pulling data from Kinesis, and writing it to Amazon S3. In fact, if the runnable returns, then that is an error, and it needs to be restarted.
Approach (1) would be to simply create a child actor for each Kinesis Connector running, and if the Run() method returns, you throw an exception, the Supervising Actor notices the exception and restarts the child actor. One connector per child actor per thread.
Approach (2) would be for the child actor to wrap the Kinesis Connector in a Future, and if the future returns, the actor would restart the Connector in another Future. Conceivably a single actor could manage multiple Connectors, but does this mean each Future is executing in a separate thread?
Which approach would be most in line with the philosophy of Akka, or is there some other approach people recommend? In general, I want to catch any problems with any Connector, and restart it. In total there would not be more than a half dozen Connectors running in parallel.
I would take approach 1. It should be noted though that actors do not have a dedicated thread by default but they share a thread pool (the so called dispatcher, see: http://doc.akka.io/docs/akka/2.3.6/scala/dispatchers.html). This means that blocking is inherently dangerous because it exhausts the threads of the pool not letting other non-blocked actors to run (since the blocked actors do not put the thread back into the pool). Therefore you should separate blocking calls into a fixed size pool of dedicated actors, and you should assign these actors a PinnedDispatcher. This latter step ensures that these actors do not interfere with each other (they each have a dedicated thread) and ensures that these actors do not interfere with the rest of the system (all of the other actors will run on another dispatchers, usually on default-dispatcher). Be sure though to limit the number of actors running on the PinnedDispatcher since the number of used threads will grow with the number of actors on that dispatcher.
Of your two options, I'd say 1 is the more appropriate. No.2 suffers from the fact that, in order to exit from the future monad's world you need to call an Await somewhere, and there you need to specify a max duration which, in your case, does not make sense.
Maybe you could look into other options before going for it, tough. A few keywords that may inspire you are streams and distributed channels.

Connecting to remote services from multiple threaded requests

I have a boost asio application with many threads, similar to a web server, handling hundreds of concurrent requests. Every request will need to make calls to both memcached and redis (via libmemcached and redispp respectively). Is the best practice in this situation to make a separate connection to both redis and memcached from each thread (effectively tripling the open sockets on the server, three per request)? Or is there a way for me to build a static object, with a single memcached/redis connection, and allow all threads to share that single connection? I'm a bit confused when it comes to the thread safety of something like this, and everything needs to be asynchronous between the threads, but blocking for each thread's individual request (so each thread has a linear progression, but many threads can be in different places in their own progression at any given time). Does that make sense?
Thanks so much!
Since memcached have syncronous protocol you should not write next request before you got answer to prevous. So, no other thread can chat in same memcached connection. I'd prefer to make thread-local connection if you work with it in "blocking" mode.
Or you can make it work in "async" manner: make pool of connections, pick a connection from it (and lock it). After request is done, return it to pool.
Also, you can make a request queue and process it in special thread(s) (using multigets and callbacks).

C++ MySQL and multithreading - 1 DB connection per user?

Is in multithreaded application suitable to have 1 connection per 1 connected client? To me it seems ineffective but if there is not connection pooling, how it could be done when one wants to let each connection communicate with DB?
Thanks
If you decide to share a connection amongst threads, you need to be sure that one thread completely finishes with the connection before another uses it (use a mutex, semaphore, or critical section to protect the connections). Alternately, you could write your own connection pool. This is not as hard as it sounds ... make 10 connections (or however big your pool needs to be) on startup and allocate/deallocate them on demand. Again protecting with mutex/cs/sema.
That depends on your architecture.
It sounds like you're using a server->distributed client model? In that case I would implement some sort of a layer for DB access, and hide connection pooling, etc. behind a data access facade.