Occasional high latency in qpid application - c++

I'm hoping someone can help me with an issue I'm seeing with a Qpid C++ application I'm using. Essentially, we have one application publishing a status to a last_value_queue at about a 10Hz rate and a couple other applications continuously processing this status. The receivers also use the status as a kind of heartbeat and will complain if the status message isn't updated for a certain amount of time (500ms, to be exact.)
This works fine for about a day, after which we start seeing issues. Every couple of hours, a single fetch call by a receiver will block for over 500 ms (sometimes for up to 900 ms). This behavior continues until we restart the broker.
I'm no expert, but I don't think I'm doing anything particularly dumb. I've been able to repeat this behavior with a pair of small applications that connect to the broker. Every 100ms the sender sends a std::chrono::time_point object set to the current time. The receiver fetches the message and calculates the delay to the millisecond. The delay is always 0ms or 1ms, except for the single spikes every hour or so after the initial day of everything being happy. The connection is created like so:
qpid::messaging::Connection c("host1:5672","{ reconnect: true}");
and the sender and receiver are both created with the string
"testQueue; { mode: browse, create: always, node: { type: queue, x-declare:{ arguments:{'qpid.last_value_queue_key':'key','qpid.replicate':'none'}}}}"
High availability replication is enabled on the broker, but I have it explicitly disabled for everything for the purpose of my testing. I see no difference in behavior whether the broker and apps are running on the same host or on different hosts on the LAN. Using qpid-stat, I can see that the broker replication queue is still transmitting quite a bit of data, but its message count is always at 0, so I don't think it's sending more than it can handle. Can anyone think of anything I might be missing that could cause this behavior? We're using Qpid 0.26 and the C++ broker.
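For reference, a minimal sketch of the sender/receiver test pair described above (a sketch only: error handling is omitted, and serializing the std::chrono::time_point as milliseconds since the epoch is just one possible encoding):

// Build either role with the Qpid Messaging C++ API; pass "send" to run the sender.
#include <qpid/messaging/Connection.h>
#include <qpid/messaging/Session.h>
#include <qpid/messaging/Sender.h>
#include <qpid/messaging/Receiver.h>
#include <qpid/messaging/Message.h>
#include <qpid/messaging/Duration.h>
#include <chrono>
#include <thread>
#include <string>
#include <iostream>

static const std::string ADDRESS =
    "testQueue; { mode: browse, create: always, node: { type: queue, "
    "x-declare:{ arguments:{'qpid.last_value_queue_key':'key','qpid.replicate':'none'}}}}";

static long long nowMs() {
    using namespace std::chrono;
    return duration_cast<milliseconds>(system_clock::now().time_since_epoch()).count();
}

void runSender(qpid::messaging::Session& session) {
    qpid::messaging::Sender sender = session.createSender(ADDRESS);
    for (;;) {
        qpid::messaging::Message msg;
        msg.setContent(std::to_string(nowMs()));   // the "time_point", encoded as ms since epoch
        sender.send(msg);
        std::this_thread::sleep_for(std::chrono::milliseconds(100));   // ~10 Hz
    }
}

void runReceiver(qpid::messaging::Session& session) {
    qpid::messaging::Receiver receiver = session.createReceiver(ADDRESS);
    for (;;) {
        qpid::messaging::Message msg = receiver.fetch(qpid::messaging::Duration::SECOND * 2);
        long long delay = nowMs() - std::stoll(msg.getContent());
        if (delay > 1) std::cout << "spike: " << delay << " ms" << std::endl;   // normally 0-1 ms
        session.acknowledge();
    }
}

int main(int argc, char** argv) {
    qpid::messaging::Connection c("host1:5672", "{ reconnect: true}");
    c.open();
    qpid::messaging::Session session = c.createSession();
    if (argc > 1 && std::string(argv[1]) == "send") runSender(session);
    else runReceiver(session);
    c.close();
    return 0;
}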

Related

High data usage with AWS IoT Core

I have developed an application that does simple publish-subscribe messaging with the AWS IoT Core service.
As per the requirements of the AWS IoT SDK, I need to call aws_iot_mqtt_yield() frequently.
Following is the description of the function aws_iot_mqtt_yield:
Called to yield the current thread to the underlying MQTT client. This
time is used by the MQTT client to manage PING requests to monitor the
health of the TCP connection as well as periodically check the socket
receive buffer for subscribe messages. Yield() must be called at a
rate faster than the keepalive interval. It must also be called at a
rate faster than the incoming message rate as this is the only way the
client receives processing time to manage incoming messages. This is
the outer function which does the validations and calls the internal
yield above to perform the actual operation. It is also responsible
for client state changes
I am calling this function at a period of 1 second.
Because it sends PINGs over the TCP connection, it causes too much data usage in the long run, given that the system is idle most of the time.
My system also runs over LTE, and paying more money for idle time is not acceptable for us.
I tried extending the period from 1 second to 30 seconds to limit our data usage, but that adds up to 30 seconds of latency when receiving messages from the cloud.
My requirement is to achieve fast message delivery with low additional data usage for maintaining the connection with AWS.
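For illustration, a minimal sketch of such a once-per-second yield loop, assuming the aws-iot-device-sdk-embedded-C API (aws_iot_mqtt_yield(client, timeout_ms)); the client is assumed to be initialized and connected elsewhere, and the keep-alive interval that drives the PINGs is set in the connect parameters:

#include "aws_iot_mqtt_client_interface.h"
#include <unistd.h>

void service_loop(AWS_IoT_Client *client) {
    for (;;) {
        // Lets the client read incoming publishes and send a PINGREQ when
        // the keep-alive interval is close to expiring.
        IoT_Error_t rc = aws_iot_mqtt_yield(client, 100 /* ms spent in yield */);
        if (rc != SUCCESS && rc != NETWORK_ATTEMPTING_RECONNECT) {
            // log / handle the error, possibly reconnect
        }
        sleep(1);  // the 1-second period mentioned above
    }
}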

GAE - how to avoid service request timing out after 1 day

As I explained in this post, I'm trying to scrape tweets from Twitter.
I implemented the suggested solution with services, so that the actual heavy lifting happens in the backend.
The problem is that after about one day, I get this error
"Process terminated because the request deadline was exceeded. (Error code 123)"
I guess this is because the manual scaling has the requests timing out after 24 hours.
Is it possible to make it run for more than 24 hours?
You can't make a single request/task run for more than 24 hours, but you can split the work into parts, each lasting up to a day. It's unwise to let a request run indefinitely; that's why App Engine closes requests after a certain time, to prevent idle or looping requests that never finish.
I would recommend having your task fire a call at the end that queues the next task; that way it's automatic and you don't have to queue a task every day by hand. Make sure there's a cursor or some other way for your task to record its progress so the next one won't duplicate work.

How to survive a database outage?

I have a web service built with Spring, Hibernate, and c3p0. I also have a service-wide cache (holding the results of every request ever made to the service) which can be used to return results when the service isn't able to respond (for whatever reason). The cache might return stale results when the database is down, but that's OK.
I recently faced a database outage and my service came to a crashing halt.
I want the clients of my service to survive any future database outages.
For that, I need my service to:
Handle new incoming requests like this: quickly say that the database is down and throw some exception (fast-fail).
Ensure requests already being processed don't last longer than x seconds. How do I get the thread handling the request interrupted?
Cache the whole database in memory for read-only purposes (is this insane?).
There are some observations that I made:
If there are one or more connections with status ESTABLISHED, then no attempt to check out a new connection is made. It seems like any one connection with status ESTABLISHED is handed over to the thread receiving the request. This thread then just hangs until the database comes back up.
I would like to make this request fast-fail by knowing, before handing a connection over to a thread, whether the database is up or not. If it isn't, the service should throw an exception instead of hanging.
If there's no connection with status ESTABLISHED, then the request fails in 10 seconds with the exception "Could not checkout a new connection". This is because my checkout timeout is set to 10 s.
If the service is processing a request when the database goes down and then makes a call to the database, the thread making that call gets stuck forever. It resumes execution only after the database comes back.
I would like to interrupt that thread after, say, x seconds, whether or not it was able to complete the request.
Are there ways to accomplish what I seek?
Thanks in advance.

How can I immediately detect my phone going back online with IBM MobileFirst?

I am facing a problem with the heartbeat interval in my IBM MobileFirst application.
The first time the app runs, everything is OK. If I go offline, it recognizes that I am offline. The problem is that if I go offline and then go online again, the app tries to send the heartbeat and keeps trying for about 20-30 seconds. Even after my phone is back online, I am still 'offline' in the application because the heartbeat is still trying to be sent successfully. Only 20-30 seconds later do I receive notice that the connection was successful, and then the app recognizes that I am online. Is there a way to avoid this delay?
I want the app to know that I am offline/online as soon as possible. Is there a way to achieve this?
This is my initOptions where I am using the timeout:
var wlInitOptions = {
    timeout: 5000,
    ...
And this is my app.js where I am using WL.Client.setHeartBeatInterval:
WL.Client.setHeartBeatInterval(5);

document.addEventListener(WL.Events.WORKLIGHT_IS_CONNECTED, function (event) {
    WL.Logger.error('We are online, lower the heartbeat');
    WL.Client.setHeartBeatInterval(5);
}, false);

document.addEventListener(WL.Events.WORKLIGHT_IS_DISCONNECTED, function (event) {
    WL.Logger.error('We are no longer online, raise heartbeat');
    WL.Client.setHeartBeatInterval(1);
}, false);
The heartbeat only indirectly recognises that you are online, via a successful heartbeat request - it doesn't look at the phone's hardware to determine online/offline status. Therefore, there will always be a delay, which on average will be half of the heartbeat interval. If you wish to increase the speed at which it recognises you're online, you need to reduce the heartbeat interval. Of course, that will increase network traffic.
There are Cordova plugins, such as this one, which instead look at the phone's hardware and detect whether it is online or offline, providing events you can listen to. They don't (as far as I know) attempt to initiate a network connection to a remote host, so that will tell you only whether the phone thinks it has a network connection, not whether it is stable/robust/fast. As far as I know, MFP doesn't have that functionality built in. To be clear, the plugin is not supported by IBM and I haven't tested it.

When to use delay queue feature of Amazon SQS?

I understand the concept of delay queues in Amazon SQS, but I wonder why they are useful.
What are SQS delay queues used for?
Thanks
One use case I can think of is in distributed applications that have eventual-consistency semantics. The system consuming the message may have a dependency, such as a correlation identifier, that needs to be available, and hence may need to wait a certain guaranteed duration of time before the correlation data becomes visible. In this case, it makes sense for the message to be delayed for a certain duration.
Like you, I was confused as to the use case for delay queues, until I stumbled across one in my own work. My application needs an internal queue where each item waits at least one minute between checks for completion.
So instead of having to manage a "last-checked-time" on every object, I just shove the object's ID into an SQS queue message with a delay time of 60 seconds, and my main loop then becomes a simple long poll against the queue.
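For what it's worth, a rough sketch of that pattern using the AWS SDK for C++ (the queue URL and the object-ID payload are placeholders; Aws::InitAPI/ShutdownAPI and error handling are omitted):

#include <aws/core/Aws.h>
#include <aws/sqs/SQSClient.h>
#include <aws/sqs/model/SendMessageRequest.h>
#include <aws/sqs/model/ReceiveMessageRequest.h>
#include <aws/sqs/model/DeleteMessageRequest.h>
#include <string>

void scheduleCheck(Aws::SQS::SQSClient& sqs, const std::string& queueUrl,
                   const std::string& objectId) {
    Aws::SQS::Model::SendMessageRequest req;
    req.SetQueueUrl(queueUrl);
    req.SetMessageBody(objectId);   // just the ID; no "last-checked-time" to track
    req.SetDelaySeconds(60);        // message stays invisible for one minute
    sqs.SendMessage(req);
}

void pollLoop(Aws::SQS::SQSClient& sqs, const std::string& queueUrl) {
    for (;;) {
        Aws::SQS::Model::ReceiveMessageRequest req;
        req.SetQueueUrl(queueUrl);
        req.SetWaitTimeSeconds(20);   // long poll
        auto outcome = sqs.ReceiveMessage(req);
        if (!outcome.IsSuccess()) continue;
        for (const auto& msg : outcome.GetResult().GetMessages()) {
            // check completion of msg.GetBody() here; if not done yet,
            // call scheduleCheck(...) again with the same ID
            Aws::SQS::Model::DeleteMessageRequest del;
            del.SetQueueUrl(queueUrl);
            del.SetReceiptHandle(msg.GetReceiptHandle());
            sqs.DeleteMessage(del);
        }
    }
}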
A few off the top of my head:
Emails - Let's say you have a service that sends reminder emails triggered from queue messages. You'd have to delay enqueueing the message in that case.
Race conditions - Delivery delays can be used to overcome race conditions in distributed systems. For example, a service could insert a row into a table and send a message about its availability to other services. They can't use the new entry just yet, so you have to delay publishing the SQS message.
Handling retries - Sometimes if a message fails you want to retry with exponential backoffs. This requires re-enqueuing the message with longer delays.
I've built a suite of APIs to make queue message scheduling easy. You can call our APIs to schedule queue messages, and to cancel, edit, and check on the status of such messages. Think of it like a scheduler microservice.
www.schedulerapi.com
If you are looking for a solution, let me know. I've built these schedulers before at work for delivering emails at high scale, so I have experience with similar use cases.
One use-case can be:
Think of a time-critical operation like a scheduled equity trade order.
Suppose one of your subsystems fetches all the orders scheduled for the next 60 minutes and puts them in a queue (which will be read by another subsystem).
If you send these orders directly, they will be visible in the queue immediately and will be processed in the order they arrive.
But most likely they will not execute at the exact time (hour:minute:second) the customer wanted, and this will affect the outcome.
So to solve this, the first subsystem adds delay seconds (the difference between the execution time and the current time), so each message only becomes visible after that delay, i.e. at the exact time the user wanted.
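A quick sketch of that delay computation using std::chrono (note that SQS caps DelaySeconds at 900, i.e. 15 minutes, so anything scheduled further out needs another mechanism such as re-enqueueing):

#include <chrono>
#include <algorithm>

// Returns the value to pass as DelaySeconds for a message that should become
// visible at executeAt.
int delaySecondsFor(std::chrono::system_clock::time_point executeAt) {
    using namespace std::chrono;
    long long remaining = duration_cast<seconds>(executeAt - system_clock::now()).count();
    // Clamp to SQS's allowed range of 0-900 seconds.
    return static_cast<int>(std::max<long long>(0, std::min<long long>(900, remaining)));
}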