How to share rte_keepalive across different applications? - dpdk

In the DPDK Keep Alive Sample Application, each slave core accesses the global rte_global_keepalive_info to mark itself as still alive.
Consider the case when you have a master application that uses core 1, and a slave application that uses core 2. The master application needs to regularly check if the slave application is still alive. So the master creates rte_global_keepalive_info and expects that the slave will regularly call rte_keepalive_mark_alive() using this variable.
If, however, the master and slave applications cannot share global variables as they are distinct processes with separate memory allocations, how is it possible for the slave application to "mark alive" the rte_global_keepalive_info created by the master application? Should the master still use rte_keepalive_create() to create the rte_global_keepalive_info variable?

Basically, both processes should use some form of Inter Process Communication, for example, use a shared memory with shm_open(3)
There is an example for that, please have a look at keepalive shared memory management
and
keepalive Agent example

Related

Does DPDK support to launch a new thread by slave?

Documentation of DPDK shows that rte_eal_remote_launch can only be called by master lcore. Does it mean I can only launch extra threads in runtime with master thread? Can I assign it to slave threads?
ps: There is another question. The documentation also said things like:
Note: This function is not designed to offer optimum performance. It
is just a practical way to launch a function on another lcore at
initialization time.
What does it mean? Is there another more effiective way to launch a thread?
Based on the logical cores specified as part of EAL Arguments, one can choose either a single or multiple cores for DPDK LCORE. The DPDK API rte_eal_remote_launch helps to start a specific function with arguments under DPDK context on specific LCORE. If not invoked with the API, the thread will be in wait state (created and waiting to run user application). One should use the API in the context of the master core for launching new functions to be running under DPDK EAL context.
Can I assign it to slave threads?
[Answer] One run alternative threads or worker threads other than DPDK lcore threads also by using
Service cores using rte_service API
non-dpdk or non-eal threads using rte_thread_register
Please refer the DPDK documentation for sample with explanation for use of rte_remote_launch

Running multiple app instances on a single container in PCF

We have an internal installation of PCF.
A developer wants to push a stateless (obeys 12 factor rules) nodejs app which will spawn other app instances i.e leverage nodejs clustering as per https://nodejs.org/api/cluster.html. Hence there would be multiple processes running on each container. Any known issues with this from a PCF perspective? I appreciate it violates the rule/suggestion of one app instance per container but that is just a suggestion :) All info welcome.
Regards
John
When running an application on Cloud Foundry that spawns child processes, the number one thing you need to watch out for is memory consumption. You set a memory limit when you push your application which is for the entire container. That includes the parent process, whatever child processes are spawned and a little overhead (currently init process, sshd process & health check).
Why is this a problem? Most buildpacks make the assumption that only one process will be running and that it will consume as much memory as possible while staying under the defined memory limit. They attempt to configure the software which is running your application to do this. When you spawn child processes, this breaks the buildpack's assumptions and can create scenarios where your application will exceed the defined memory limit. When this happens, even by one byte, the process will be killed and restarted.
If you're concerned with scaling your application, you should not try to spin off child processes in one extra large container. Instead, let the platform help you and scale up the number of application instances. The platform can easily do this and by using multiple smaller containers you can scale just as well. In fact, if you already have a 12-factor app, it should be well positioned to work in this manner.
Good luck!

Is a cassandra session thread safe? (using cpp driver)

I am developing a multi-threaded application and using Cassandra for the back-end.
Earlier, I created a separate session for each child thread and closed the session before killing the thread after its execution. But then I thought it might be an expensive job so I now designed it like, I have a single session opened at the start of the server and any number of clients can use that session for querying purposes.
Question: I just want to know if this is correct, or is there a better way to do this? I know connection pooling is an option but, is that really needed in this scenario?
It's certainly thread safe in the Java driver, so I assume the C++ driver is the same.
You are encouraged to only create one session and have all your threads use it so that the driver can efficiently maintain a connection pool to the cluster and process commands from your client threads asynchronously.
If you create multiple sessions on one client machine or keep opening and closing sessions, you would be forcing the driver to keep making and dropping connections to the cluster, which is wasteful of resources.
Quoting this Datastax blog post about 4 simple rules when using the DataStax drivers for Cassandra:
Use one Cluster instance per (physical) cluster (per application
lifetime)
Use at most one Session per keyspace, or use a single
Session and explicitely specify the keyspace in your queries
If you execute a statement more than once, consider using a PreparedStatement
You can reduce the number of network roundtrips and also have atomic operations by using Batches
The C/C++ driver is definitely thread safe at the session and future levels.
The CassSession object is used for query execution. Internally, a session object also manages a pool of client connections to Cassandra and uses a load balancing policy to distribute requests across those connections. An application should create a single session object per keyspace as a session object is designed to be created once, reused, and shared by multiple threads within the application.
They actually have a section called Thread Safety:
A CassSession is designed to be used concurrently from multiple threads. CassFuture is also thread safe. Other than these exclusions, in general, functions that might modify an object’s state are NOT thread safe. Objects that are immutable (marked ‘const’) can be read safely by multiple threads.
They also have a note about freeing objects. That is not thread safe. So you have to make sure all your threads are done before you free objects:
NOTE: The object/resource free-ing functions (e.g. cass_cluster_free, cass_session_free, … cass_*_free) cannot be called concurrently on the same instance of an object.
Source:
http://datastax.github.io/cpp-driver/topics/

Redundancy without central control point?

If it possible to provide a service to multiple clients whereby if the server providing this service goes down, another one takes it's place- without some sort of centralised "control" which detects whether the main server has gone down and to redirect the clients to the new server?
Is it possible to do without having a centralised interface/gateway?
In other words, its a bit like asking can you design a node balancer without having a centralised control to direct clients?
Well, you are not giving much information about the "service" you are asking about, so I'll answer in a generic way.
For the first part of my answer, I'll assume you are talking about a "centralized interface/gateway" involving ip addresses. For this, there's CARP (Common Address Redundancy Protocol), quoted from the wiki:
The Common Address Redundancy Protocol or CARP is a protocol which
allows multiple hosts on the same local network to share a set of IP
addresses. Its primary purpose is to provide failover redundancy,
especially when used with firewalls and routers. In some
configurations CARP can also provide load balancing functionality. It
is a free, non patent-encumbered alternative to Cisco's HSRP. CARP is
mostly implemented in BSD operating systems.
Quoting the netbsd's "Introduction to CARP":
CARP works by allowing a group of hosts on the same network segment to
share an IP address. This group of hosts is referred to as a
"redundancy group". The redundancy group is assigned an IP address
that is shared amongst the group members. Within the group, one host
is designated the "master" and the rest as "backups". The master host
is the one that currently "holds" the shared IP; it responds to any
traffic or ARP requests directed towards it. Each host may belong to
more than one redundancy group at a time.
This might solve your question at the network level, by having the slaves takeover the ip address in order, without a single point of failure.
Now, for the second part of the answer (the application level), with distributed erlang, you can have several nodes (a cluster) that will give you fault tolerance and redundancy (so you would not use ip addresses here, but "distributed erlang" -a cluster of erlang nodes- instead).
You would have lots of nodes lying around with your Distributed Applciation started, and your application resource file would contain a list of nodes (ordered) where the application can be run.
Distributed erlang will control which of the nodes is "the master" and will automagically start and stop your application in the different nodes, as they go up and down.
Quoting (as less as possible) from http://www.erlang.org/doc/design_principles/distributed_applications.html:
In a distributed system with several Erlang nodes, there may be a need
to control applications in a distributed manner. If the node, where a
certain application is running, goes down, the application should be
restarted at another node.
The application will be started at the first node, specified by the
distributed configuration parameter, which is up and running. The
application is started as usual.
For distribution of application control to work properly, the nodes
where a distributed application may run must contact each other and
negotiate where to start the application.
When started, the node will wait for all nodes specified by
sync_nodes_mandatory and sync_nodes_optional to come up. When all
nodes have come up, or when all mandatory nodes have come up and the
time specified by sync_nodes_timeout has elapsed, all applications
will be started. If not all mandatory nodes have come up, the node
will terminate.
If the node where the application is running goes down, the
application is restarted (after the specified timeout) at the first
node, specified by the distributed configuration parameter, which is
up and running. This is called a failover
distributed = [{Application, [Timeout,] NodeDesc}]
If a node is started, which has higher priority according to
distributed, than the node where a distributed application is
currently running, the application will be restarted at the new node
and stopped at the old node. This is called a takeover.
Ok, that was meant as a general overview, since it can be a long topic :)
For the specific details, it is highly recommended to read the Distributed OTP Applications chapter for learnyousomeerlang (and of course the previous link: http://www.erlang.org/doc/design_principles/distributed_applications.html)
Also, your "service" might depend on other external systems like databases, so you should consider fault tolerance and redundancy there, too. The whole architecture needs to be fault tolerance and distributed for "the service" to work in this way.
Hope it helps!
This answer is a general overview to high availability for networked applications, not specific to Erlang. I don't know too much about what is available in the OTP framework yet because I am new to the language.
There are a few different problems here:
Client connection must be moved to the backup machine
The session may contain state data
How to detect a crash
Problem 1 - Moving client connection
This may be solved in many different ways and on different layers of the network architecture. The easiest thing is to code it right into the client, so that when a connection is lost it reconnects to another machine.
If you need network transparency you may use some technology to sync TCP states between different machines and then reroute all traffic to the new machine, which may be entirely invisible for the client. This is much harder to do than the first suggestion.
I'm sure there are lots of things to do in-between these two.
Problem 2 - State data
You obviously need to transfer the session state from the crashed machine unto the backup machine. This is really hard to do in a reliable way and you may lose the last few transactions because the crashed machine may not be able to send the last state before the crash. You can use a synchronized call in this way to be really sure about not losing state:
Transaction/message comes from the client into the main machine.
Main machine updates some state.
New state is sent to backup machine.
Backup machine confirms arrival of the new state.
Main machine confirms success to the client.
This may potentially be expensive (or at least not responsive enough) in some scenarios since you depend on the backup machine and the connection to it, including latency, before even confirming anything to the client. To make it perform better you can let the client check with the backup machine upon connection what transactions it received and then resend the lost ones, making it the client's responsibility to queue the work.
Problem 3 - Detecting a crash
This is an interesting problem because a crash is not always well-defined. Did something really crash? Consider a network program that closes the connection between the client and server, but both are still up and connected to the network. Or worse, makes the client disconnect from the server without the server noticing. Here are some questions to think about:
Should the client connect to the backup machine?
What if the main server updates some state and send it to the backup machine while the backup have the real client connected - will there be a data race?
Can both the main and backup machine be up at the same time or do you need to shut down work on one of them and move all sessions?
Do you need some sort of authority on this matter, some protocol to decide which one is master and which one is slave? Who is that authority? How do you decentralise it?
What if your nodes loses their connection between them but both continue to work as expected (called network partitioning)?
See Google's paper "Chubby lock server" (PDF) and "Paxos made live" (PDF) to get an idea.
Briefly,this solution involves using a consensus protocol to elect a master among a group of servers that handles all the requests. If the master fails, the protocol is used again to elect the next master.
Also, see gen_leader for an example in leader election which works with detecting failures and transferring service ownership.

Isapi filter - state

I have a isapi filer and I want to add a logic based on the incoming domain ( my server farm hosts many domains).
There domain list is dynamic , I can export these domain list into a text file and read it from the isapi , but is there a way to keep this file in memory (is array or linked list) to save the IO call.
similar to global application state .
How are your worker processes distributed across your servers? Do you have one server with one worker process, or multiple servers?
If you have one server with one worker process, you can just read the file into a static array or string to manage it (just make sure you account for concurrent threads reading/modifying it simultaneously)
If you have multiple worker processes on just one server, you can use named shared memory. I've used this before in ISAPI filters to share information, and it works pretty well. It should even take care of concurrency for you. You can read more here: http://msdn.microsoft.com/en-us/library/aa366551%28v=vs.85%29.aspx
If you're spread across multiple servers, you could use a distributed cache like memcached. This is more complex to set up, but it'll give you good performance. There's a thread on setting this up here: C++ api for memcache