The Play application we are using has to be kept alive at all times. It is essentially a RabbitMQ listener with an infinite while loop that keeps listening for messages from the message queue.
The following code is placed in a controller:
public class Application extends Controller {

    // placeholder name; the original snippet references QUEUE_NAME without declaring it
    private static final String QUEUE_NAME = "my_queue";

    public static Result wanHLPT() {
        ckmsg();
        return ok();
    }

    public static Result ckmsg() {
        try {
            ConnectionFactory factory = new ConnectionFactory();
            factory.setHost("localhost");
            Connection connection = factory.newConnection();
            Channel channel = connection.createChannel();
            channel.queueDeclare(QUEUE_NAME, false, false, false, null);
            // assumed consumer setup (not shown in the original snippet)
            QueueingConsumer consumer = new QueueingConsumer(channel);
            channel.basicConsume(QUEUE_NAME, true, consumer);
            while (true) {
                QueueingConsumer.Delivery delivery = consumer.nextDelivery();
                String message = new String(delivery.getBody());
                // business logic
                process();
            }
        } catch (Exception e) {
            return internalServerError(e.getMessage());
        }
    }
}
But the Play application throws an Akka exception [AskTimeoutException: Timed out].
Please provide any pointers on how to handle this.
Here is the exception:
! #6f977jo6m - Internal server error, for (GET) [/myapp] ->
play.api.Application$$anon$1: Execution exception[[AskTimeoutException: Timed out]]
at play.api.Application$class.handleError(Application.scala:289) ~ [play_2.10.jar:2.1.0]
at play.api.DefaultApplication.handleError(Application.scala:383) ~[play_2.10.jar:2.1.0]
at play.core.server.netty.PlayDefaultUpstreamHandler$$anon$2$$anonfun$handle$1.apply(PlayDefaultUpstreamHandler.scala:132) ~[play_2.10.jar:2.1.0]
at play.core.server.netty.PlayDefaultUpstreamHandler$$anon$2$$anonfun$handle$1.apply(PlayDefaultUpstreamHandler.scala:128) ~[play_2.10.jar:2.1.0]
at play.api.libs.concurrent.PlayPromise$$anonfun$extend1$1.apply(Promise.scala:113) ~[play_2.10.jar:2.1.0]
at play.api.libs.concurrent.PlayPromise$$anonfun$extend1$1.apply(Promise.scala:113) ~[play_2.10.jar:2.1.0]
at play.api.libs.concurrent.PlayPromise$$anonfun$extend$1$$anonfun$apply$1.apply(Promise.scala:104) ~[play_2.10.jar:2.1.0]
at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) ~[scala-library.jar:na]
at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) ~[scala-library.jar:na]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ~[na:1.7.0]
at java.lang.Thread.run(Thread.java:781) ~[na:1.7.0]
akka.pattern.AskTimeoutException: Timed out
at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:310) ~[akka-actor_2.10.jar:na]
at akka.actor.DefaultScheduler$$anon$8.run(Scheduler.scala:193) ~[akka-actor_2.10.jar:na]
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:137) ~[akka-actor_2.10.jar:na]
at scala.concurrent.forkjoin.ForkJoinTask$AdaptedRunnableAction.exec(ForkJoinTask.java:1417) ~[scala-library.jar:na]
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:262) ~[scala-library.jar:na]
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:975) ~[scala-library.jar:na]
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1478) ~[scala-library.jar:na]
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104) ~[scala-library.jar:na]
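One direction for keeping such a listener alive without tying up an HTTP request is to start the blocking loop when the application starts rather than from a controller action. A minimal sketch, assuming Play 2.1's GlobalSettings onStart hook; the Global class and the daemon-thread setup below are illustrative, not taken from the code above:
// app/Global.java -- sketch only
public class Global extends play.GlobalSettings {
    @Override
    public void onStart(play.Application app) {
        // Run the blocking consumer loop on its own daemon thread so it never
        // occupies a request thread (which is what leads to the ask timeout).
        Thread listener = new Thread(new Runnable() {
            public void run() {
                controllers.Application.ckmsg();
            }
        });
        listener.setDaemon(true);
        listener.start();
    }
}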
We've implemented a Java gRPC service that runs in the cloud, with a unidirectional (client-to-server) streaming RPC which looks like:
rpc PushUpdates(stream Update) returns (Ack);
A C++ client (a mobile device) calls this RPC as soon as it boots up, to continuously send an update every 30 or so seconds, perpetually, as long as the device is up and running.
ChannelArguments chan_args;
// this will be secure channel eventually
auto channel_p = CreateCustomChannel(remote_addr, InsecureChannelCredentials(), chan_args);
auto stub_p = DialTcc::NewStub(channel_p);
// ...
Ack ack;
auto strm_ctxt_p = make_unique<ClientContext>();
auto strm_p = stub_p->PushUpdates(strm_ctxt_p.get(), &ack);
// ...
while (true) {
// wait until we are ready to send a new update
Update updt;
// populate updt;
if(!strm_p->Write(updt)) {
// stream is not kosher, create a new one and restart
break;
}
}
Now different kinds of network interruptions happen while this is happening:
the gRPC service running in the cloud may go down (for maintenance) or may simply become unreachable.
the device's own ip address keeps changing as it is a mobile device.
We've seen that on such events, neither the channel nor the Write() API detects the network disconnection reliably. At times the client keeps calling Write() (which doesn't return false) but the server doesn't receive any data (Wireshark doesn't show any activity on the outgoing port of the client device).
What are the best practices to recover in such cases, so that the server starts receiving the updates within X seconds of such an event occurring? It is understandable that there would be a loss of X seconds' worth of data whenever such an event happens, but we want to recover reliably within X seconds.
gRPC version: 1.30.2, Client: C++14/Linux, Server: Java/Linux
Here's how we've hacked this. I want to check if this can be made any better or anyone from gRPC can guide me about a better solution.
The protobuf for our service looks like this. It has an RPC for pinging the service, which is used frequently to test connectivity.
// Message used in IsAlive RPC
message Empty {}
// Acknowledgement sent by the service for updates received
message UpdateAck {}
// Messages streamed to the service by the client
message Update {
...
...
}
service GrpcService {
// for checking if we're able to connect
rpc Ping(Empty) returns (Empty);
// streaming RPC for pushing updates by client
rpc PushUpdate(stream Update) returns (UpdateAck);
}
Here is how the C++ client looks; it does the following:
Connect():
Create the stub for calling the RPCs, if the stub is nullptr.
Call Ping() at regular intervals until it succeeds.
On success, call the PushUpdate(...) RPC to create a new stream.
On failure, reset the stream to nullptr.
Stream(): do the following in a while(true) loop:
Get the update to be pushed.
Call Write(...) on the stream with the update to be pushed.
If Write(...) fails for any reason, break; control then goes back to Connect().
Once every 30 minutes (or some regular interval), reset everything (stub, channel, stream) to nullptr to start afresh. This is required because at times Write(...) does not fail even when there is no connection between the client and the service: the Write(...) calls succeed, but the outgoing port on the client shows no activity in Wireshark!
Here is the code:
constexpr int GRPC_TIMEOUT_S = 10;
constexpr int RESTART_INTERVAL_M = 15;
constexpr int GRPC_KEEPALIVE_TIME_MS = 10000;
string root_ca, tls_key, tls_cert; // for SSL
string remote_addr = "https://remote.com:5445";
...
...
void ResetStreaming() {
if (stub_p) {
if (strm_p) { // graceful restart/stop, this pair of API are called together, in this order
if (!strm_p->WritesDone()) {
// Log a message
}
strm_p->Finish(); // Log if return value of this is NOT grpc::OK
}
strm_p = nullptr;
strm_ctxt_p = nullptr;
stub_p = nullptr;
channel_p = nullptr;
}
}
void CreateStub() {
if (!stub_p) {
ChannelArguments chan_args;
chan_args.SetInt(GRPC_ARG_KEEPALIVE_TIME_MS, GRPC_KEEPALIVE_TIME_MS);
channel_p = CreateCustomChannel(
remote_addr,
SslCredentials(SslCredentialsOptions{root_ca, tls_key, tls_cert}),
chan_args);
stub_p = GrpcService::NewStub(channel_p);
}
}
void Stream() {
const auto restart_time = steady_clock::now() + minutes(RESTART_INTERVAL_M);
while (!stop) {
// restart every RESTART_INTERVAL_M (15m) even if ALL IS WELL!!
if (steady_clock::now() > restart_time) {
break;
}
Update updt = GetUpdate(); // get the update to be sent
if (!stop) {
if (channel_p->GetState(true) == GRPC_CHANNEL_SHUTDOWN ||
!strm_p->Write(updt)) {
// could not write!!
return; // we will Connect() again
}
}
}
// stopped due to stop = true or interval to create new stream has expired
ResetStreaming(); // channel, stub, stream are recreated once in every 15m
}
bool PingRemote() {
ClientContext ctxt;
ctxt.set_deadline(system_clock::now() + seconds(GRPC_TIMEOUT_S));
Empty req, resp;
CreateStub();
if (stub_p->Ping(&ctxt, req, &resp).ok()) {
static UpdateAck ack;
strm_ctxt_p = make_unique<ClientContext>(); // need new context
strm_p = stub_p->PushUpdate(strm_ctxt_p.get(), &ack);
return true;
}
if (strm_p) {
strm_p = nullptr;
strm_ctxt_p = nullptr;
}
return false;
}
void Connect() {
while (!stop) {
if (PingRemote() || stop) {
break;
}
sleep_for(seconds(5)); // wait before retrying
}
}
// set to true from another thread when we want to stop
atomic<bool> stop{false}; // brace-init: atomic is not copy-initializable before C++17
void StreamUntilStopped() {
if (stop) {
return;
}
strm_thread_p = make_unique<thread>([&] {
while (!stop) {
Connect();
Stream();
}
});
}
// called by the thread that sets stop = true
void Finish() {
strm_thread_p->join();
}
With this, we are seeing that streaming recovers within 15 minutes (or RESTART_INTERVAL_M) whenever there is a disruption for any reason. This code runs in a fast path, so I am curious to know whether it can be made any better.
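As a side note on detecting silently broken connections sooner: the channel above already sets GRPC_ARG_KEEPALIVE_TIME_MS; the related keepalive arguments sketched below (values illustrative, not tuned for any particular network) can make the transport notice a dead peer and fail Write() faster:
ChannelArguments chan_args;
// Send an HTTP/2 ping every 10s even when no data is in flight, and treat a
// missing ping ack within 5s as a dead connection.
chan_args.SetInt(GRPC_ARG_KEEPALIVE_TIME_MS, 10000);
chan_args.SetInt(GRPC_ARG_KEEPALIVE_TIMEOUT_MS, 5000);
chan_args.SetInt(GRPC_ARG_KEEPALIVE_PERMIT_WITHOUT_CALLS, 1);
chan_args.SetInt(GRPC_ARG_HTTP2_MAX_PINGS_WITHOUT_DATA, 0); // do not cap pings without data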
I am trying to write a long-running Subscriber service in Java. I have set up Listeners to listen for any failures inside the Subscriber service. I am trying to make this fault tolerant, and there are a few things I do not quite understand. Below are my doubts/questions.
I have followed the basic setup shown here: https://github.com/googleapis/google-cloud-java/blob/master/google-cloud-examples/src/main/java/com/google/cloud/examples/pubsub/snippets/SubscriberSnippets.java. Specifically, I have set up addListener as shown below.
As shown in the following code, initializeSubscriber acts as a state variable which determines whether the Subscriber service should restart. Inside the while loop, this variable is continuously monitored to determine whether a restart is required.
My questions here are:
1. How do I raise an exception inside Subscriber.Listener's failed method and capture it in the main while loop? I tried throwing a new Exception() in the failed method and catching it in the catch block inside the while loop; however, I am unable to compile the code because it is a checked exception.
2. As shown here, I use a Java Executor thread to run the Listener. How do I handle Listener failures? Will I be able to catch Listener failures under the general Exception catch block as shown here?
try {
boolean initializeSubscriber = true;
while (true) {
try {
if (initializeSubscriber) {
createSingleThreadedSubscriber();
addErrorListenerToSubscriber();
subscriber.startAsync().awaitRunning();
initializeSubscriber = false;
}
// Checks the status of subscriber service every minute
Thread.sleep(60000);
} catch (Exception ex) {
LOGGER.error("Could not start the Subscriber service", ex);
cleanupSubscriber();
initializeSubscriber = true;
}
}
} catch (RuntimeException e) {
} finally {
shutdown();
}
private void addErrorListenerToSubscriber() {
subscriber.addListener(
new Subscriber.Listener() {
@Override
public void failed(Subscriber.State from, Throwable failure) throws RuntimeException {
LOGGER.info("Subscriber reached a failed state due to " + failure.getMessage()
+ ",Restarting Subscriber service");
initializeSubscriber = true;
}
},
Executors.newSingleThreadExecutor());
}
private void cleanupSubscriber() {
try {
if (subscriber != null) {
subscriber.stopAsync().awaitTerminated();
}
if (!subscriptionListener.isShutdown()) {
subscriptionListener.shutdown();
}
} catch (Exception ex) {
LOGGER.error("Error in cleaning up Subscriber thread " + ex);
}
}
It should not be necessary to add a listener to the subscriber if you just want to recreate the subscriber on a failure. You could instead catch the exception on awaitTerminated:
try {
boolean initializeSubscriber = true;
while (initializeSubscriber) {
try {
createSingleThreadedSubscriber();
subscriber.startAsync().awaitRunning();
initializeSubscriber = false;
subscriber.awaitTerminated();
} catch (Exception ex) {
LOGGER.error("Error in the Subscriber service", ex);
cleanupSubscriber();
initializeSubscriber = true;
}
}
} catch (RuntimeException e) {
} finally {
shutdown();
}
If the subscriber shut down successfully because of a call to stopAsync, then awaitTerminated will not throw an exception. If there was some kind of exception, then awaitTerminated will throw an IllegalStateException, because the state will be FAILED instead of TERMINATED.
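For illustration, a small sketch of catching that case and inspecting the cause (failureCause() comes from the ApiService interface that Subscriber implements):
try {
    subscriber.awaitTerminated();
} catch (IllegalStateException ex) {
    // State is FAILED rather than TERMINATED; the underlying error is available here.
    Throwable cause = subscriber.failureCause();
    LOGGER.error("Subscriber terminated with failure", cause);
}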
Note that transient errors are handled by the library itself. For example, if the server becomes briefly unavailable due to a network hiccup, the library will seamlessly reconnect and continue to deliver messages. Failures that result in a change in state for the subscriber are likely permanent failures, such as permission issues (where the account running the subscriber does not have permission to subscribe to the subscription) or resource issues (such as the subscription having been deleted). In these permanent-failure cases, recreating the subscriber will likely just result in the same error unless one takes manual steps to intervene and fix the problem.
I have a server method that waits for new incoming TCP connections; for each connection I'm creating two threads (detached) for handling various tasks.
void MyClass::startServer(boost::asio::io_service& io_service, unsigned short port) {
tcp::acceptor TCPAcceptor(io_service, tcp::endpoint(tcp::v4(), port));
bool UARTToWiFiGatewayStarted = false;
for (;;) {
auto socket(std::shared_ptr<tcp::socket>(new tcp::socket(io_service)));
/*!
* Accept a new connected WiFi client.
*/
TCPAcceptor.accept(*socket);
socket->set_option( tcp::no_delay( true ) );
MyClass::enableCommunicationSession();
// start one worker thread.
std::thread(WiFiToUARTWorkerSession, socket, this->LINport, this->LINbaud).detach();
// only if this is the first connected client:
if(false == UARTToWiFiGatewayStarted) {
std::thread(UARTToWifiWorkerSession, socket, this->UARTport, this->UARTbaud).detach();
UARTToWiFiGatewayStarted = true;
}
}
}
This works fine for starting the communication, but the problem appears when the client disconnects and connects again (or at least tries to connect again).
When the current client disconnects, I stop the communication (by stopping the internal infinite loops in both functions, after which they return).
void MyClass::WiFiToUARTWorkerSession(std::shared_ptr<tcp::socket> socket, ...) {
/*!
* various code here...
*/
try {
while(true == MyClass::communicationSessionStatus) {
/*!
* Buffer used for storing the UART-incoming data.
*/
unsigned char WiFiDataBuffer[max_incoming_wifi_data_length];
boost::system::error_code error;
/*!
* Read the WiFi-available data.
*/
size_t length = socket->read_some(boost::asio::buffer(WiFiDataBuffer), error);
/*!
* Handle possible read errors.
*/
if (error == boost::asio::error::eof) {
break; // Connection closed cleanly by peer.
}
else if (error) {
// this will cause the infinite loops in both worker functions to stop; when they stop, the functions will return.
MyClass::disableCommunicationSession();
sleep(1);
throw boost::system::system_error(error); // Some other error.
}
uart->write(WiFiDataBuffer, length);
}
}
catch (std::exception &exception) {
std::cerr << "[APP::exception] Exception in thread: " << exception.what() << std::endl;
}
}
I expect that when I reconnect, the communication should work again (MyClass::startServer(...) will create and detach two new worker threads that will do the same things).
The problem is that when I connect the second time I get:
terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::system::system_error> >'
what(): write: Broken pipe
From what I found about this error, it seems that the server (this application) sends something via TCP to a client that has already disconnected.
What am I doing wrong?
How can I solve this problem?
A read of length 0 with no error is also an indication of eof. The boost::asio::error::eof error code is normally more useful when you're checking the result of a composed operation.
When this error condition is missed, the code as presented will call write on a socket which has now been shut down. You have used the form of write which does not take a reference to an error_code; this form will throw if there is an error. And there will be an error: the read has failed.
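A minimal sketch of the non-throwing pattern described above, using the error_code overloads; the reply/replyLength names are placeholders, and the session handling mirrors the question's MyClass helpers:
boost::system::error_code error;
size_t length = socket->read_some(boost::asio::buffer(WiFiDataBuffer), error);
if (error == boost::asio::error::eof || (!error && length == 0)) {
    // Peer closed the connection -- treat either signal as EOF.
    MyClass::disableCommunicationSession();
    break;
}
// When writing back to the socket, prefer the overload that reports the error
// instead of throwing, so a broken pipe can end the session gracefully.
boost::asio::write(*socket, boost::asio::buffer(reply, replyLength), error);
if (error) {
    MyClass::disableCommunicationSession();
    break;
}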
I am trying to get a Scala program to communicate with a C++ program via ZeroMQ, using the request-reply pattern. The Scala program should send a request to which the C++ program replies.
However, I see the error:
org.zeromq.ZMQException: Operation cannot be accomplished in current state
All I can find in the documentation is that one has to read the response before sending a second request. In my case I am issuing a request, followed by a read of the response (this is where the exception is thrown).
Code of the server:
#include "zmq.hpp"
#include <string>
#include <iostream>
#include <thread>
int main()
{
zmq::context_t context(1);
zmq::socket_t socket(context, ZMQ_REP);
socket.bind("tcp://*:5555");
while (1) {
zmq::message_t request;
socket.recv(&request);
std::string requ = std::string(static_cast<char*>(request.data()), request.size());
std::cout << requ << std::endl;
// Write response
zmq::message_t req(2);
memcpy((void *)req.data(), "ok", 5);
socket.send(req);
}
}
Code of the client:
import org.zeromq.ZMQ
import org.zeromq.ZMQ.{Context, Socket}
object Adapter {
def main( args: Array[String] ) = {
val context = ZMQ.context(1)
val socket = context.socket(ZMQ.REQ)
println { "Connecting to backend" }
socket.connect("tcp://127.0.0.1:5555")
val request = "1 1 1 1".getBytes()
request(request.length - 1) = 0.toByte
println { "Sending Request" }
if (!socket.send(request, 0))
println{ "could not send"}
println { "Receiving Response" }
val reply = socket.recv(0)
println { "Received reply: " + new String(reply, 0, reply.length - 1) }
}
}
The complete output of sbt:
OpenJDK 64-Bit Server VM warning: You have loaded library /tmp/jna7980154308052950568.tmp which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
Connecting to backend
Sending Request
Receiving Response
[error] (run-main-0) org.zeromq.ZMQException: Operation cannot be accomplished in current state
org.zeromq.ZMQException: Operation cannot be accomplished in current state
at org.zeromq.ZMQ$Socket.raiseZMQException(ZMQ.java:448)
at org.zeromq.ZMQ$Socket.recv(ZMQ.java:368)
at ZeroMQActor$.main(ZeroMQExample.scala:56)
at ZeroMQActor.main(ZeroMQExample.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
[trace] Stack trace suppressed: run last compile:run for the full output.
java.lang.RuntimeException: Nonzero exit code: 1
at scala.sys.package$.error(package.scala:27)
[trace] Stack trace suppressed: run last compile:run for the full output.
[error] (compile:run) Nonzero exit code: 1
[error] Total time: 5 s, completed Jun 16, 2015 4:42:42 PM
Sbt pulls in Scala 2.9.1 and akka-zeromq 2.0. I have installed ZeroMQ 3.5 from source, but I see the same behavior when I install the Ubuntu package libzmq3-dev. One possible workaround is using JeroMQ, a pure-Java implementation of ZeroMQ, but I would prefer to depend on one zmq library in my whole stack rather than dealing with interop issues.
Thanks in advance.
I believe
memcpy((void *)req.data(), "ok", 5);
should be
memcpy((void *)req.data(), "ok", 2);
... which could be enough to break message handling.
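A slightly safer way to build the reply is to size the message from the payload so the copy length cannot drift (a sketch using the same old-style zmq.hpp API as the question):
// Size the zmq message from the payload itself.
const std::string payload = "ok";
zmq::message_t reply(payload.size());
memcpy(reply.data(), payload.data(), payload.size());
socket.send(reply);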
I instantiate a PostgreSQL connection through libpqxx. I query the database and get a correct response. After that I tried the following error case: after the instance of pqxx::connection has been created, I pause my program, manually kill the Postgres backend process serving the connection from a Linux shell, and resume the program. It continues until it tries to create a new transaction pqxx::work, where it throws pqxx::broken_connection. I handle this exception and try to reconnect with a call to pqxx::connection::activate(), but another pqxx::broken_connection gets thrown. How do I reconnect to the DB without instantiating another pqxx::connection?
P.S. Reactivation is not inhibited. I use the standard connection type:
namespace pqxx
{
typedef basic_connection<connect_direct> connection;
}
OK, nobody has answered. I noticed that after I manually kill the process behind the connection, several successive calls to pqxx::connection::activate eventually get it reconnected, so that's my workaround.
class dbconnection : public pqxx::connection
{
public:
dbconnection(std::string options) : pqxx::connection(options) { };
void reconnect()
{
    static int times = 0; // counts successive reconnection attempts
    try
    {
        times++;
        if(!this->is_open())
        {
            this->activate(); // throws pqxx::broken_connection if still unreachable
        }
        times = 0; // success: reset the counter
    }
    catch(const pqxx::broken_connection & e)
    {
        if(times > 10) // give up after ten successive failures
        {
            times = 0;
            return;
        }
        this->reconnect(); // retry recursively
    }
};
};
I call dbconnection::reconnect each time after I catch a pqxx::broken_connection. Let me know if you have a better solution.
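For illustration, a hypothetical usage sketch of the class above, retrying a transaction once after reconnecting; the connection string and query are placeholders:
dbconnection conn("dbname=mydb user=myuser");
try {
    pqxx::work txn(conn);
    txn.exec("SELECT 1");
    txn.commit();
} catch (const pqxx::broken_connection &) {
    conn.reconnect();           // workaround from above
    pqxx::work retry(conn);     // a broken transaction must be restarted, not resumed
    retry.exec("SELECT 1");
    retry.commit();
}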