Is there an EventHub Explorer debugging tool? - azure-eventhub

I have an EventHub receiver running in a WebJob.
We are currently on the Basic tier, so we are limited to a single consumer group.
I'm looking for a debugging tool that would allow me to inspect the EventHub stream without affecting the running WebJob.
I tried the Azure Service Bus Explorer, but it gives this exception:
Exception: A receiver with a higher epoch '14' already exists. A new receiver with epoch 0 cannot be created.
Make sure you are creating receiver with increasing epoch value to ensure connectivity, or ensure all old epoch receivers are closed or disconnected.
Using a different consumer group solves this issue, but as I said, I would like to stay on the Basic tier if possible.

Related

Spring Data Neo4J - Unable to acquire connection from pool within configured maximum time

We have a reactive REST API using Spring Data Neo4j (Spring Boot v2.7.5) deployed to Kubernetes. When running a stress test to determine the breaking point, once the volume of requests the service can handle has been exceeded, the service does not auto-recover, even after the load has dropped to a level the service can handle.
After the service has fallen over, the Neo4j health indicator shows:
“org.neo4j.driver.exceptions.ClientException: Unable to acquire connection from the pool within configured maximum time of 60000ms”
With respect to connection/configuration settings we are using defaults configured by SDN.
Observations:
Up until the point at which the service breaks, only a small number of connections are utilised; at the point at which it breaks, the number of connections in use jumps up to the max pool size and the above-mentioned error is observed. No matter how much time passes (even well beyond the max connection lifetime), the service is unable to acquire a connection from the pool. Upon manually shutting down and restarting the service/pod, the service returns to a healthy state.
As an interim solution we now check the Neo4j health indicator; if the mentioned error is present, the liveness state is set to down, which triggers Kubernetes to restart the service automatically. However, I'm wondering if there is an underlying issue with the connections in the pool not getting 'cleaned up'?
You can take a look at this discussion https://github.com/spring-projects/spring-data-neo4j/issues/2632
I had the same issue. The problem is that either Spring Framework or the Neo4j reactive transaction manager doesn't close connections in a complex reactive flow, e.g. when there are a lot of inner calls/mappings and an exception is thrown somewhere inside.
So as a workaround you can add @Transactional in such places to avoid multiple transactions being created.
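As a rough illustration (the class, repository, and DTO names below are hypothetical, not from the original project), annotating the outermost reactive method keeps the inner calls and mappings inside a single transaction instead of opening a new connection for each inner publisher:

import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;
import reactor.core.publisher.Flux;

@Service
public class PersonService {

    private final PersonRepository repository; // hypothetical reactive Neo4j repository

    public PersonService(PersonRepository repository) {
        this.repository = repository;
    }

    // One transaction (and one pooled connection) for the whole reactive flow,
    // so the connection is released even if an inner step throws.
    @Transactional(readOnly = true)
    public Flux<PersonDto> findAllEnriched() {
        return repository.findAll()
                .flatMap(this::enrich); // inner calls/mappings reuse the same transaction
    }

    private Flux<PersonDto> enrich(Person person) {
        // further lookups/mappings that previously ran in their own transactions
        return Flux.just(new PersonDto(person.getName()));
    }
}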

How to stream events with GCP platform?

I am looking into building a simple solution where producer services push events to a message queue and then have a streaming service make those available through gRPC streaming API.
Cloud Pub/Sub seems well suited for the job; however, scaling the streaming service means that each copy of that service would need to create its own subscription and delete it before scaling down, which seems unnecessarily complicated and not what the platform was intended for.
On the other hand, Kafka seems to work well for something like this, but I'd like to avoid having to manage the underlying platform itself and instead leverage the cloud infrastructure.
I should also mention that the reason for having a streaming API is to allow streaming towards a frontend (which may not have access to the underlying infrastructure).
Is there a better way to go about doing something like this with the GCP platform without going the route of deploying and managing my own infrastructure?
If you essentially want ephemeral subscriptions, then there are a few things you can set on the Subscription object when you create a subscription:
Set the expiration_policy to a smaller duration. When a subscriber is not receiving messages for that time period, the subscription will be deleted. The tradeoff is that if your subscriber is down due to a transient issue that lasts longer than this period, then the subscription will be deleted. By default, the expiration is 31 days. You can set this as low as 1 day. For pull subscribers, the subscribers simply need to stop issuing requests to Cloud Pub/Sub for the timer on their expiration to start. For push subscriptions, the timer starts based on when no messages are successfully delivered to the endpoint. Therefore, if no messages are published or if the endpoint is returning an error for all pushed messages, the timer is in effect.
Reduce the value of message_retention_duration. This is the time period for which messages are kept in the event a subscriber is not receiving messages and acking them. By default, this is 7 days. You can set it as low as 10 minutes. The tradeoff is that if your subscriber disconnects or gets behind in processing messages by more than this duration, messages older than that will be deleted and the subscriber will not see them.
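For reference, here is a minimal Java sketch (project, topic, and subscription names are made up) of creating a pull subscription with both of these settings through the admin client:

import com.google.cloud.pubsub.v1.SubscriptionAdminClient;
import com.google.protobuf.Duration;
import com.google.pubsub.v1.ExpirationPolicy;
import com.google.pubsub.v1.ProjectSubscriptionName;
import com.google.pubsub.v1.PushConfig;
import com.google.pubsub.v1.Subscription;
import com.google.pubsub.v1.TopicName;

public class EphemeralSubscriptionExample {
  public static void main(String[] args) throws Exception {
    // Hypothetical names for this sketch.
    TopicName topic = TopicName.of("my-project", "events");
    ProjectSubscriptionName subscription =
        ProjectSubscriptionName.of("my-project", "events-instance-1");

    try (SubscriptionAdminClient admin = SubscriptionAdminClient.create()) {
      admin.createSubscription(
          Subscription.newBuilder()
              .setName(subscription.toString())
              .setTopic(topic.toString())
              .setPushConfig(PushConfig.getDefaultInstance()) // empty push config = pull subscription
              .setAckDeadlineSeconds(10)
              // Delete the subscription after 1 day without an active subscriber.
              .setExpirationPolicy(
                  ExpirationPolicy.newBuilder()
                      .setTtl(Duration.newBuilder().setSeconds(86_400).build())
                      .build())
              // Keep unacked messages for at most 10 minutes.
              .setMessageRetentionDuration(Duration.newBuilder().setSeconds(600).build())
              .build());
    }
  }
}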
Subscribers that cleanly shut down could probably just call DeleteSubscription themselves so that the subscription goes away immediately, but for ones that shut down unexpectedly, setting these two properties will minimize the time for which the subscription continues to exist and the number of messages (that will never get delivered) that will be retained.
Keep in mind that Cloud Pub/Sub quotas limit one to 10,000 subscriptions per topic and per project. Therefore, if a lot of subscriptions are created and either active or not cleaned up (manually, or automatically after expiration_policy's ttl has passed), then new subscriptions may not be able to be created.
I think your original idea was better than ephemeral subscriptions, tbh. I mean, it works, but it feels totally unnatural. It depends on what your requirements are; for example, do clients only need to receive messages while they're connected, or do they all need to get all messages?
Only While Connected
Your original idea was better imo. What I probably would have done is to create a gRPC stream service that clients could connect to. The implementation is essentially an observer pattern. The consumer will receive a message and then iterate through the subscribers to do a "Send" to all of them. From there, any time a client connects to the service, it just registers itself with that observer collection and unregisters when it disconnects. Horizontal scaling is passive since clients are sticky to whatever instance they've connected to.
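To make that shape concrete, here is a rough Java sketch; the EventsGrpc, Event, and SubscribeRequest types are assumptions from a hypothetical proto that defines a server-streaming Subscribe RPC:

import io.grpc.stub.ServerCallStreamObserver;
import io.grpc.stub.StreamObserver;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Assumes a generated service along the lines of:
//   rpc Subscribe(SubscribeRequest) returns (stream Event);
public class EventStreamService extends EventsGrpc.EventsImplBase {

  // Observer collection: one entry per connected client.
  private final Set<StreamObserver<Event>> subscribers = ConcurrentHashMap.newKeySet();

  @Override
  public void subscribe(SubscribeRequest request, StreamObserver<Event> responseObserver) {
    // Register the client and unregister it when the call is cancelled (client disconnects).
    ServerCallStreamObserver<Event> observer =
        (ServerCallStreamObserver<Event>) responseObserver;
    observer.setOnCancelHandler(() -> subscribers.remove(responseObserver));
    subscribers.add(responseObserver);
    // The stream stays open until the client disconnects; the server never completes it here.
  }

  // Called by the message-queue consumer for every message it pulls.
  public void broadcast(Event event) {
    for (StreamObserver<Event> subscriber : subscribers) {
      try {
        subscriber.onNext(event);
      } catch (RuntimeException e) {
        subscribers.remove(subscriber); // drop clients we can no longer reach
      }
    }
  }
}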
Everyone always gets the message, eventually
The concept is similar to the above, but the client doesn't implicitly un-register from the observer on disconnect. Instead, it registers and un-registers explicitly (through a method/command designed to do so). Modify the 'on disconnected' logic to tell the observer list that the client has gone offline. The consumer's broadcast logic is then slightly different: it iterates through the list and says "if online, then send, else queue", sending the message to an ephemeral queue that belongs to the client. Your 'on connect' logic will then send all messages that are in the queue to the client before informing the consumer that it's back online. Basically an inbox. Setting up ephemeral, self-deleting queues is really easy in most products like RabbitMQ. You'll have to do a bit of managing whether or not it's OK to delete a queue, though; for example, never delete the queue unless the client explicitly unsubscribes or has been inactive for a long time. Fail to do that, and the whole inbox idea falls apart.
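A rough sketch of that inbox idea, with all names hypothetical and an in-memory queue standing in for whatever broker-backed ephemeral queue (RabbitMQ, a per-client subscription, etc.) you would actually use:

import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

public class InboxBroadcaster {

  // Stand-ins for the gRPC stream and message type.
  public interface EventSink { void send(Event event); }
  public static final class Event { /* payload omitted */ }

  private static final class ClientState {
    volatile EventSink sink;                           // null while the client is offline
    final Queue<Event> inbox = new ConcurrentLinkedQueue<>();
  }

  private final Map<String, ClientState> clients = new ConcurrentHashMap<>();

  // Explicit subscribe: creates the inbox; it survives disconnects.
  public void register(String clientId) {
    clients.putIfAbsent(clientId, new ClientState());
  }

  // Explicit unsubscribe: the only place the inbox is deleted.
  public void unregister(String clientId) {
    clients.remove(clientId);
  }

  // 'On connect': drain the inbox, then mark the client online.
  // (Ignores the race between draining and going online, for brevity.)
  public void onConnected(String clientId, EventSink sink) {
    ClientState state = clients.get(clientId);
    if (state == null) return;
    Event queued;
    while ((queued = state.inbox.poll()) != null) {
      sink.send(queued);
    }
    state.sink = sink;
  }

  // 'On disconnected': keep the inbox, just mark the client offline.
  public void onDisconnected(String clientId) {
    ClientState state = clients.get(clientId);
    if (state != null) state.sink = null;
  }

  // Consumer broadcast: "if online, then send, else queue".
  public void broadcast(Event event) {
    for (ClientState state : clients.values()) {
      EventSink sink = state.sink;
      if (sink != null) {
        sink.send(event);
      } else {
        state.inbox.add(event);
      }
    }
  }
}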
The selected answer above is most similar to what I'm describing here, in that the subscription is the queue. If I did this, then I'd probably implement it as an internal bus instead of an observer (since it would be unnecessary): you create a consumer on demand for a connecting client that literally just forwards the messages. The message consumer subscribes and unsubscribes based on whether or not the client is connected. As Kamal noted, you'll run into problems if your scale exceeds the maximum number of subscriptions allowed by Pub/Sub. If you find yourself in that position, you can unshackle that constraint by implementing the pattern above. It's basically the same pattern, but you shift the responsibility over to your infra, where the only constraint is your own resources.
gRPC makes this mechanism pretty easy. Alternatively, for web, if you're on a Microsoft stack, then SignalR makes this pretty easy too. Clients connect to the hub, and you can publish to all connected clients. The consumer pattern here remains mostly the same, but you don't have to implement the observer pattern by hand.
(Diagram omitted; the arrows in it are in the direction of dependency, not data flow.)

Google Cloud IoT: a few config update messages are missing when sending config updates frequently from Cloud Functions to a device

I am using config updates and Cloud Functions for communication between a mobile application and an ESP32 device by following the example here, but when I send config update messages frequently, some of them are not delivered; say, out of 5 only 3 config update messages go through. I have two questions:
1) How frequently can we send config updates to avoid missing updates?
2) Is there an alternative way to communicate between Cloud Functions and the IoT device?
According to the docs: [IoT docs]
Configuration updates are limited to 1 update per second, per device. However, for best results, device configuration should be updated much less often — at most, once every 10 seconds. The update rate is calculated as the time between the most recent server acknowledgment and the next update request.
If your operations are mostly configuration updates, I cannot think of another alternative that would perform better.
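Given that limit, one option on the sending side is to coalesce updates so the device receives at most one config version every 10 seconds. A minimal, hypothetical Java sketch (the actual call to Cloud IoT is left as a callback, e.g. a wrapper around modifyCloudToDeviceConfig):

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

public class ConfigUpdateThrottler {

  private final AtomicReference<String> pendingConfig = new AtomicReference<>();
  private final Consumer<String> sender; // e.g. wraps the modifyCloudToDeviceConfig call

  public ConfigUpdateThrottler(Consumer<String> sender) {
    this.sender = sender;
    ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    // Every 10 seconds, push the most recent pending config (if any).
    scheduler.scheduleAtFixedRate(this::flush, 10, 10, TimeUnit.SECONDS);
  }

  // Called whenever the mobile app requests a new device config.
  // Intermediate values are collapsed into the latest one instead of being sent individually.
  public void requestUpdate(String configJson) {
    pendingConfig.set(configJson);
  }

  private void flush() {
    String config = pendingConfig.getAndSet(null);
    if (config != null) {
      sender.accept(config);
    }
  }
}

Note that a stateless Cloud Function cannot hold this scheduler in memory across invocations, so in practice the pending config would have to live somewhere persistent keyed by device; treat this purely as an illustration of the rate-limiting idea.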

Orleans EventHub stream provider

I am using the EventHubStream provider in a project based on Orleans.
After the system has been running for a few minutes, Orleans starts throwing a QueueCacheMissException while a producer is trying to push an event through OnNext.
I have tried to increase the size of the cache, but that helped only for a while.
Is this normal behavior due to the size of the cache?
In this situation, should I unsubscribe and subscribe again? I have tried to resume the stream, but that didn't work; the stream was in a faulted state... any ideas?
It is likely that the service is reading events from EventHub faster than the grains are processing them. EventHub can deliver events at a rate of ~1k/second per partition.
The latest version of the EventHub stream provider supports backpressure, which should prevent this problem, but it has not been released yet. You can, however, build your own NuGet packages.

Occasional high latency in qpid application

I'm hoping someone can help me with an issue I'm seeing with a Qpid C++ application I'm using. Essentially, we have one application publishing a status to a last_value_queue at about a 10Hz rate and a couple of other applications continuously processing this status. The receivers also use the status as a kind of heartbeat and will complain if the status message isn't updated for a certain amount of time (500ms, to be exact).
This works fine for about a day, after which we start seeing issues. Every couple of hours, a single fetch call by a receiver will block for over 500ms (sometimes for up to 900ms). This behavior will continue until we restart the broker.
I'm no expert, but I don't think I'm doing anything particularly dumb. I've been able to repeat this behavior with a pair of small applications that connect to the broker. Every 100ms the sender sends a std::chrono::time_point object set to the current time. The receiver fetches the message and calculates the delay to the millisecond. The delay is always 0ms or 1ms, except for the single spikes every hour or so after the initial day of everything being happy. The connection is created like so:
qpid::messaging::Connection c("host1:5672","{ reconnect: true}");
and the sender and receiver are both created with the string
"testQueue; { mode: browse, create: always, node: { type: queue, x-declare:{ arguments:{'qpid.last_value_queue_key':'key','qpid.replicate':'none'}}}}"
High availability replication is enabled on the broker, but I have it explicitly disabled for everything for the purpose of my testing. I see no difference in behavior when the broker and apps are running on the same host or different hosts on the LAN. Using qpid-stat, I can see that the broker replication queue is still transmitting quite a bit of data, but its message count is always at 0 so I don't think it's sending more than it can handle. Can anyone think of anything I might be missing that could cause this behavior? We're using the Qpid 0.26 and the C++ broker.