What is the difference between EventHub Client and EventHub Processor? - azure-eventhub

I find there are two ways to receive EventHub message data:
Using the EventHub Processor, which seems to use checkpoints to save progress, so that processing can resume when the process running the EventProcessor on a specific partition dies/crashes:
public class SimpleEventProcessor : IEventProcessor
{
    public Task CloseAsync(PartitionContext context, CloseReason reason)
    {
        Console.WriteLine($"Processor Shutting Down. Partition '{context.PartitionId}', Reason: '{reason}'.");
        return Task.CompletedTask;
    }

    public Task OpenAsync(PartitionContext context)
    {
        Console.WriteLine($"SimpleEventProcessor initialized. Partition: '{context.PartitionId}'");
        return Task.CompletedTask;
    }

    public Task ProcessErrorAsync(PartitionContext context, Exception error)
    {
        Console.WriteLine($"Error on Partition: {context.PartitionId}, Error: {error.Message}");
        return Task.CompletedTask;
    }

    public Task ProcessEventsAsync(PartitionContext context, IEnumerable<EventData> messages)
    {
        foreach (var eventData in messages)
        {
            var data = Encoding.UTF8.GetString(eventData.Body.Array, eventData.Body.Offset, eventData.Body.Count);
            Console.WriteLine($"Message received. Partition: '{context.PartitionId}', Data: '{data}'");
        }
        return context.CheckpointAsync();
    }
}
Using the EventHub client to receive messages:
EventHubClient eventHub = EventHubClient.CreateFromConnectionString(connectionString);
var receiver = eventHub.CreateReceiver("consumer1", "0", EventPosition.FromStart());
var received = await receiver.ReceiveAsync(10);
What is the difference between them? Can we save a checkpoint with the second way? How do we handle the crash case with the second way? Why are there two different ways?

EventHubClient, a.k.a. the low-level API, is used for building connectors. In this case, the developer is responsible for managing partition receivers, checkpoints, load distribution, crash recovery, and so on. Most won't use this API to receive; again, this API is for building source-to-sink connectors.
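To answer "can we save the checkpoint with the second way": yes, but you have to persist the position yourself. Below is a minimal sketch assuming the same Microsoft.Azure.EventHubs client as in the question; the file-based offset store and the connectionString variable are illustrative placeholders, not part of the API.

var eventHub = EventHubClient.CreateFromConnectionString(connectionString);
// resume from the offset we persisted ourselves, or from the start of the partition
var startAt = File.Exists("partition0.offset")
    ? EventPosition.FromOffset(File.ReadAllText("partition0.offset"))
    : EventPosition.FromStart();
var receiver = eventHub.CreateReceiver("consumer1", "0", startAt);
var received = await receiver.ReceiveAsync(10);
if (received != null)
{
    foreach (var eventData in received)
    {
        // process the event, then persist its offset as our own "checkpoint"
        File.WriteAllText("partition0.offset", eventData.SystemProperties.Offset);
    }
}

Lease management, crash recovery, and load distribution across instances remain entirely your responsibility with this approach.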
The Processor Host comes with built-in checkpointing, load distribution, and partition receiver management. This API might look like overkill, since you have to implement IEventProcessor and provide a storage account for the checkpoint store; however, in the long run it is more worry-free.
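For comparison, a minimal sketch of hosting the SimpleEventProcessor above with EventProcessorHost (Microsoft.Azure.EventHubs.Processor) might look like this; the hub name, connection strings, and container name are placeholders. The host persists checkpoints and leases in the given Azure Storage container and rebalances partitions across running instances.

var host = new EventProcessorHost(
    "my-hub",                                   // Event Hub name (placeholder)
    PartitionReceiver.DefaultConsumerGroupName, // consumer group
    eventHubConnectionString,                   // placeholder connection strings
    storageConnectionString,
    "eventhub-checkpoints");                    // blob container for leases/checkpoints
await host.RegisterEventProcessorAsync<SimpleEventProcessor>();
// ... events flow into SimpleEventProcessor.ProcessEventsAsync until shutdown ...
await host.UnregisterEventProcessorAsync();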

Related

Setting JMSMessageID on stubbed JMS endpoints in Camel unit tests

I have a route that I am testing. I use stub://jms:queue:whatever to send/receive messages and extend CamelTestSupport for my test classes. I am having an issue with one of the routes, which has a bean that uses an idempotent repository to store messages by "message id"; it reads and stores the JMSMessageID property from the exchange.
The problem I run into is that I can't figure out a way to set this property on messages sent to stubbed endpoints. Every time the method that requires this property is called, the id returns null and I have to handle it as a null pointer. I can do this, but the cleanest approach would be to just set the header on the test message. I tried includeSentJMSMessageId=true on the endpoint, and I tried using sendBodyAndHeader on the producer, passing "JMSMessageID", "ID: whatever" as arguments, but it doesn't appear to work. I read that the driver/connection factory is supposed to set the header, but I'm not too familiar with how/where to do this. And since I am using stubbed endpoints, I'm not creating any brokers/connection factories in my unit tests.
So don't stub out the JMS component; replace it with a processor and then add the preferred JMSMessageID in the processor.
Something like this code:
@Test
void testIdempotency() throws Exception {
    mockOut.expectedMinimumMessageCount(1);

    // specify the route to test
    AdviceWithRouteBuilder.adviceWith(context, "your-route-name", enrichRoute -> {
        // replace the from with an endpoint we can call directly
        enrichRoute.replaceFromWith("direct:start");
        // replace the JMS endpoint with a processor so it can act as the JMS endpoint
        enrichRoute.weaveByToUri("jms:queue:whatever").replace().process(new Processor() {
            @Override
            public void process(Exchange exchange) throws Exception {
                // set that ID to the one I want to test
                exchange.getIn().setHeader("JMSMessageID", "some-value-to-test");
            }
        });
        // add an endpoint at the end to check if we received a message
        enrichRoute.weaveAddLast().to(mockOut);
    });
    context.start();

    // send some message
    Map<String, Object> sampleMsg = getSampleMessageAsHashMap("REQUEST.json");
    // get the response
    Map<String, Object> response = (Map<String, Object>) template.requestBody("direct:start", sampleMsg);

    // you will need to check if the response is what you expected.
    // Check the headers etc.
    mockOut.assertIsSatisfied();
}
The JMSMessageID can only be set by the provider. It cannot be set by a client despite the fact that javax.jms.Message has setJMSMessageID(). As the JavaDoc states:
This method is for use by JMS providers only to set this field when a message is sent. This message cannot be used by clients to configure the message ID. This method is public to allow a JMS provider to set this field when sending a message whose implementation is not its own.

AWS Kinesis KCL skips records added before startup

I started to use both KPL and KCL to exchange data between services. But whenever the consumer service is offline, all data sent by the KPL is lost forever. So I only get those chunks of data that were sent while the consumer service is up and its ShardConsumer is ready. I need to start from the last consumed point, or somehow process the data left behind.
Here is my ShardProcessor code:
@Override
public void initialize(InitializationInput initializationInput) {
}

@Override
public void processRecords(ProcessRecordsInput processRecordsInput) {
    processRecordsInput.records()
        .forEach(record -> {
            // my logic
        });
}

@Override
public void leaseLost(LeaseLostInput leaseLostInput) {
}

@Override
public void shardEnded(ShardEndedInput shardEndedInput) {
    try {
        shardEndedInput.checkpointer().checkpoint();
    } catch (ShutdownException | InvalidStateException e) {
        LOG.error("Kinesis error on Shard Ended", e);
    }
}

@Override
public void shutdownRequested(ShutdownRequestedInput shutdownRequestedInput) {
    try {
        shutdownRequestedInput.checkpointer().checkpoint();
    } catch (ShutdownException | InvalidStateException e) {
        LOG.error("Kinesis error on Shutdown Requested", e);
    }
}
And configuration code:
public void configure(String streamName, ShardRecordProcessorFactory factory) {
    Region region = Region.of(awsRegion);
    KinesisAsyncClient kinesisAsyncClient =
        KinesisClientUtil.createKinesisAsyncClient(KinesisAsyncClient.builder().region(region));
    DynamoDbAsyncClient dynamoClient = DynamoDbAsyncClient.builder().region(region).build();
    CloudWatchAsyncClient cloudWatchClient = CloudWatchAsyncClient.builder().region(region).build();
    ConfigsBuilder configsBuilder =
        new ConfigsBuilder(streamName, appName, kinesisAsyncClient, dynamoClient, cloudWatchClient,
            UUID.randomUUID().toString(), factory);
    Scheduler scheduler = new Scheduler(
        configsBuilder.checkpointConfig(),
        configsBuilder.coordinatorConfig(),
        configsBuilder.leaseManagementConfig(),
        configsBuilder.lifecycleConfig(),
        configsBuilder.metricsConfig(),
        configsBuilder.processorConfig(),
        configsBuilder.retrievalConfig()
            .retrievalSpecificConfig(new PollingConfig(streamName, kinesisAsyncClient))
    );
    Thread schedulerThread = new Thread(scheduler);
    schedulerThread.setDaemon(true);
    schedulerThread.start();
}
There are two ways to address this. First, the problem.
By default, the KCL is configured to start reading the stream at LATEST. This setting tells the stream reader to pick up the stream at the "current" timestamp.
In your case, you have data in that stream that was placed in there before "now." In order to read that data, you might want to consider reading the earliest data you have in the stream. If you set up a default stream, the stream will store data for 24 hours.
To read the data from the "beginning" of that stream, or 24 hours before you start the KCL application, you'll want to set the stream reader to TRIM_HORIZON. This setting is called initialPositionInStream. You can read about it here. There are three different settings documented in the API.
To solve your issue, the preferred method, as noted in the first link, is to add an entry to the properties file. If you're not using a properties file, you can simply add this to your Scheduler ctor:
Scheduler scheduler = new Scheduler(
    configsBuilder.checkpointConfig(),
    configsBuilder.coordinatorConfig(),
    configsBuilder.leaseManagementConfig(),
    configsBuilder.lifecycleConfig(),
    configsBuilder.metricsConfig(),
    configsBuilder.processorConfig(),
    configsBuilder.retrievalConfig()
        .initialPositionInStreamExtended(InitialPositionInStreamExtended.newInitialPosition(TRIM_HORIZON))
        .retrievalSpecificConfig(new PollingConfig(streamName, kinesisAsyncClient))
);
One thing to keep in mind with this setting is startup behavior when you have data in the stream and you start at TRIM_HORIZON. In this scenario, the RecordProcessor will iterate through records as fast as it can. This could create performance issues at the Kinesis API, or even in downstream systems (wherever you're sending the data once the RecordProcessor has it).

Preventing a WCF client from issuing too many requests

I am writing an application where the Client issues commands to a web service (CQRS)
The client is written in C#
The client uses a WCF Proxy to send the messages
The client uses the async pattern to call the web service
The client can issue multiple requests at once.
My problem is that sometimes the client simply issues too many requests and the service starts returning that it is too busy.
Here is an example. I am registering orders, and there can be anywhere from a handful up to a few thousand.
var taskList = Orders.Select(order => _cmdSvc.ExecuteAsync(order))
    .ToList();
await Task.WhenAll(taskList);
Basically, I call ExecuteAsync for every order and get a Task back. Then I just wait for them all to complete.
I don't really want to fix this server-side because no matter how much I tune it, the client could still kill it by sending for example 10,000 requests.
So my question is: can I configure the WCF client in any way so that it simply takes all the requests and sends a maximum of, say, 20, and once one completes it automatically dispatches the next, etc.? Or is the Task I get back linked to the actual HTTP request, so that it cannot return until the request has actually been dispatched?
If this is the case and WCF Client simply cannot do this form me, I have the idea of decorating the WCF Client with a class that queues commands, returns a Task (using TaskCompletionSource) and then makes sure that there are no more than say 20 requests active at a time. I know this will work but I would like to ask if anyone knows of a library or a class that does something like this?
This is kind of like Throttling but I don't want to do exactly that because I don't want to limit how many requests I can send in a given period of time but rather how many active requests can exist at any given time.
Based on @PanagiotisKanavos' suggestion, here is how I solved this.
RequestLimitCommandService acts as a decorator for the actual service, which is passed in to the constructor as innerSvc. Once someone calls ExecuteAsync, a completion source is created which, along with the command, is posted to the ActionBlock; the caller then gets back a Task from the completion source.
The ActionBlock will then call the processing method. This method sends the command to the web service. Depending on what happens, this method will use the completion source to either notify the original sender that a command was processed successfully or attach the exception that occurred to the source.
public class RequestLimitCommandService : IAsyncCommandService
{
    private class ExecutionToken
    {
        public TaskCompletionSource<bool> Source { get; }
        public ICommand Command { get; }

        public ExecutionToken(TaskCompletionSource<bool> source, ICommand command)
        {
            Source = source;
            Command = command;
        }
    }

    private readonly IAsyncCommandService _innerSvc;
    private readonly ActionBlock<ExecutionToken> _block;

    public RequestLimitCommandService(IAsyncCommandService innerSvc, int maxDegreeOfParallelism)
    {
        _innerSvc = innerSvc;
        var options = new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = maxDegreeOfParallelism };
        _block = new ActionBlock<ExecutionToken>(Execute, options);
    }

    public Task ExecuteAsync(ICommand command)
    {
        var source = new TaskCompletionSource<bool>();
        var token = new ExecutionToken(source, command);
        _block.Post(token);
        return source.Task;
    }

    private async Task Execute(ExecutionToken token)
    {
        try
        {
            await _innerSvc.ExecuteAsync(token.Command);
            token.Source.SetResult(true);
        }
        catch (Exception ex)
        {
            token.Source.SetException(ex);
        }
    }
}
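For completeness, the decorator drops in where the original proxy was used. This is a hypothetical usage sketch based on the ExecuteAsync loop from the question, with a limit of 20 in-flight requests:

// wrap the generated WCF command service once and reuse it
IAsyncCommandService throttled = new RequestLimitCommandService(_cmdSvc, maxDegreeOfParallelism: 20);
// same pattern as before: every order gets a Task, but at most 20 requests are in flight;
// the rest wait in the ActionBlock's input queue until a slot frees up
var taskList = Orders.Select(order => throttled.ExecuteAsync(order)).ToList();
await Task.WhenAll(taskList);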

How can I get the ActorRef from Source.actorRef()?

I want to do some server-side events (SSE) to a web app. I think I have all the SSE plumbing up and going. I now need to create a Source on the Akka HTTP side of the house.
I found you can do something like this:
val source = Source.actorRef(5, akka.stream.OverflowStrategy.dropTail)
What I want to do is somehow "publish" to this source, presumably by sending an actor a message. I see from the docs that this call creates Source<T,ActorRef>.
How can I get this ActorRef instance so I can send messages to it?
To obtain the materialized ActorRef from Source.actorRef, the stream has to be running. For example, let's say that you want to send the SSE payload data (in the form of a String) to this actor, which converts that data to ServerSentEvent objects to send to the client. You could do something like:
val (actor, sseSource) =
  Source.actorRef[String](5, akka.stream.OverflowStrategy.dropTail)
    .map(s => /* convert the String to a ServerSentEvent */)
    .keepAlive(1.second, () => ServerSentEvent.heartbeat)
    .toMat(BroadcastHub.sink[ServerSentEvent])(Keep.both)
    .run()
// (ActorRef, Source[ServerSentEvent, NotUsed])
Now you can send messages to the materialized actor:
actor ! "quesadilla"
And use sseSource in your route:
path("events") {
get {
complete(sseSource)
}
}
Note that there is no backpressure with this approach (i.e., messages to the actor are fire-and-forget).

How to lock a long async call in a WebApi action?

I have this scenario where I have a WebApi and an endpoint that when triggered does a lot of work (around 2-5min). It is a POST endpoint with side effects and I would like to limit the execution so that if 2 requests are sent to this endpoint (should not happen, but better safe than sorry), one of them will have to wait in order to avoid race conditions.
I first tried to use a simple static lock inside the controller like this:
lock (_lockObj)
{
    var results = await _service.LongRunningWithSideEffects();
    return Ok(results);
}
This is of course not possible because of the await inside the lock statement.
Another solution I considered was to use a SemaphoreSlim implementation like this:
await semaphore.WaitAsync();
try
{
    var results = await _service.LongRunningWithSideEffects();
    return Ok(results);
}
finally
{
    semaphore.Release();
}
However, according to MSDN:
The SemaphoreSlim class represents a lightweight, fast semaphore that can be used for waiting within a single process when wait times are expected to be very short.
Since in this scenario the wait times may even reach 5 minutes, what should I use for concurrency control?
EDIT (in response to plog17):
I do understand that passing this task onto a service might be the optimal way, however, I do not necessarily want to queue something in the background that still runs after the request is done.
The request involves other requests and integrations that take some time, but I would still like the user to wait for this request to finish and get a response regardless.
This request is expected to be only fired once a day at a specific time by a cron job. However, there is also an option to fire it manually by a developer (mostly in case something goes wrong with the job) and I would like to ensure the API doesn't run into concurrency issues if the developer e.g. double-sends the request accidentally etc.
If only one request of that sort can be processed at a given time, why not implement a queue?
With such a design, there is no more need to lock or wait while processing the long-running request.
The flow could be:
The client POSTs /ResourcesToProcess and should quickly receive 202-Accepted.
The HttpController simply queues the task to be processed (and returns the 202-Accepted).
Another service (a Windows service?) dequeues the next task to process.
The task is processed.
The resource status is updated.
During this process, the client should easily be able to get the status of previously made requests:
If the task is not found: 404-NotFound ("Resource not found for id 123").
If the task is processing: 200-OK ("123 is processing").
If the task is done: 200-OK with the process response.
Your controller could look like:
public class TaskController : ApiController
{
    // constructor and private members

    [HttpPost, Route("")]
    public IHttpActionResult QueueTask(RequestBody body)
    {
        messageQueue.Add(body);
        return StatusCode(HttpStatusCode.Accepted);
    }

    [HttpGet, Route("{taskId}")]
    public IHttpActionResult GetTask(string taskId)
    {
        YourThing thing = tasksRepository.Get(taskId);
        if (thing == null)
        {
            return Content(HttpStatusCode.NotFound, "thing does not exist");
        }
        if (thing.IsProcessing)
        {
            return Ok("thing is processing");
        }
        if (thing.ResponseContent == null)
        {
            return Ok("thing is not processing yet");
        }
        // here we assume the thing has been processed
        return Ok(thing.ResponseContent);
    }
}
This design suggests that you do not handle the long-running process inside your WebApi. Indeed, that may not be the best design choice. If you still want to do so, you may want to read:
Long running task in WebAPI
https://blogs.msdn.microsoft.com/webdev/2014/06/04/queuebackgroundworkitem-to-reliably-schedule-and-run-background-processes-in-asp-net/
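For reference, a rough sketch of the QueueBackgroundWorkItem approach from the second link (ASP.NET 4.5.2+, System.Web.Hosting) is shown below; _service is the long-running service from the question, and the action and its route are assumed to live in the same ApiController. The controller returns immediately, and the runtime tracks the work item so a graceful shutdown can wait briefly for it to finish. Note that this trades the "user waits for the response" behavior for a fire-and-return-202 model.

[HttpPost, Route("start")]
public IHttpActionResult StartLongRunningJob()
{
    // hand the long-running call to the ASP.NET host; the request does not wait for it
    HostingEnvironment.QueueBackgroundWorkItem(async cancellationToken =>
    {
        await _service.LongRunningWithSideEffects();
    });
    return StatusCode(HttpStatusCode.Accepted);
}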