How to lock a long async call in a WebApi action? - concurrency

I have this scenario where I have a WebApi and an endpoint that when triggered does a lot of work (around 2-5min). It is a POST endpoint with side effects and I would like to limit the execution so that if 2 requests are sent to this endpoint (should not happen, but better safe than sorry), one of them will have to wait in order to avoid race conditions.
I first tried to use a simple static lock inside the controller like this:
lock (_lockObj)
{
var results = await _service.LongRunningWithSideEffects();
return Ok(results);
}
This is, of course, not possible, because C# does not allow await inside a lock statement.
Another solution I considered was to use a SemaphoreSlim implementation like this:
await semaphore.WaitAsync();
try
{
var results = await _service.LongRunningWithSideEffects();
return Ok(results);
}
finally
{
semaphore.Release();
}
However, according to MSDN:
The SemaphoreSlim class represents a lightweight, fast semaphore that can be used for waiting within a single process when wait times are expected to be very short.
Since in this scenario the wait times may even reach 5 minutes, what should I use for concurrency control?
EDIT (in response to plog17):
I do understand that passing this task onto a service might be the optimal way, however, I do not necessarily want to queue something in the background that still runs after the request is done.
The request involves other requests and integrations that take some time, but I would still like the user to wait for this request to finish and get a response regardless.
This request is expected to be only fired once a day at a specific time by a cron job. However, there is also an option to fire it manually by a developer (mostly in case something goes wrong with the job) and I would like to ensure the API doesn't run into concurrency issues if the developer e.g. double-sends the request accidentally etc.

If only one request of that sort can be processed at a given time, why not implement a queue?
With such a design, there is no need to lock or wait while the long-running request is processed.
The flow could be:
Client POSTs to /RessourcesToProcess and should quickly receive 202 Accepted
The HTTP controller simply queues the task for processing (and returns the 202 Accepted)
Another service (a Windows service?) dequeues the next task to process
Process the task
Update the resource status
During this process, the client should easily be able to get the status of previously made requests:
If the task is not found: 404 Not Found. Resource not found for id 123.
If the task is processing: 200 OK. 123 is processing.
If the task is done: 200 OK. Process response.
Your controller could look like:
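// Assumes ASP.NET Core; on Web API 2 derive from ApiController and return IHttpActionResult
public class TaskController : ControllerBase
{
//constructor and private members
[HttpPost, Route("")]
public IActionResult QueueTask(RequestBody body)
{
// Only enqueue here; a separate worker does the actual processing
messageQueue.Add(body);
return Accepted();
}
[HttpGet, Route("{taskId}")]
public IActionResult GetTask(string taskId)
{
YourThing thing = tasksRepository.Get(taskId);
if (thing == null)
{
return NotFound("thing does not exist");
}
if (thing.IsProcessing)
{
return Ok("thing is processing");
}
// Assumption: ResponseContent stays null until the worker has finished
if (thing.ResponseContent == null)
{
return Ok("thing is not processing yet");
}
//here we assume thing had been processed
return Ok(thing.ResponseContent);
}
}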
This design means that you do not handle the long-running process inside your WebApi, which may not be the best place for it anyway. If you still want to do so, you may want to read:
Long running task in WebAPI
https://blogs.msdn.microsoft.com/webdev/2014/06/04/queuebackgroundworkitem-to-reliably-schedule-and-run-background-processes-in-asp-net/

Related

How can I prevent a DynamoDB Stream handler from infinitely processing a record when I use batchItemFailures

I have a DynamoDB stream which is triggering a lambda handler that looks like this:
let failedRequestId: string | undefined
await asyncForEachSerial(event.Records, async (record) => {
try {
await handle(record.dynamodb.OldImage, record.dynamodb.NewImage, record, context)
return true
} catch (e) {
// Remember the failing record so the stream checkpoints there
failedRequestId = record.dynamodb.SequenceNumber
}
return false //break;
})
return {
// Report no failures when every record in the batch succeeded
batchItemFailures: failedRequestId ? [ { itemIdentifier: failedRequestId } ] : []
}
I have my lambda set up with a DestinationConfig.onFailure pointing to a DLQ I configured in SQS. The idea behind the handler is to process a batch of events and interrupt at the first failure. Then it reports the most recent failure in 'batchItemFailures' which tells the stream to continue at that record next try. (I pulled the idea from this article)
My current issue is that if there is a genuine failure of my handle() function on one of those records, then my exit code will mark that record as my checkpoint for the next handler call. However, the DLQ condition never triggers, and I end up processing that record over and over again. I should also note that I am trying to avoid reprocessing records multiple times, since handle() is not idempotent.
How can I elegantly handle errors while maintaining batching, but without triggering my handle() function more than once for well-behaved stream records?
I'm not sure if you have found the answer you were looking for. I'll respond in case someone else comes across this issue.
There are 2 other parameters you'd want to use to avoid that issue. Quoting documentation (https://docs.aws.amazon.com/lambda/latest/dg/with-ddb.html):
Retry attempts – The maximum number of times that Lambda retries when the function returns an error. This doesn't apply to service errors or throttles where the batch didn't reach the function.
Maximum age of record – The maximum age of a record that Lambda sends to your function.
Basically, you'll have to specify how many times a failure should be retried and how far back in the events Lambda should look.

Using a lock in C++ across multiple tasks

I am not really seeking code examples, but I'm hoping someone can review my program design and provide feedback. I am trying to figure out how do I ensure I have one instance of my "workflow" running at a time.
I am working in C++.
This is my workflow:
I read rows off of a Postgres database.
If the table has any records, I want to do these instructions:
Read the records and transform them to JSON
Send the JSON document to a remote Web service
Parse the response from the service. The service tells me which records were saved or not saved, based on their primary key.
I delete the successfully saved records
I log the unsuccessful records (there's another process that consumes the logs and so my work is done).
I want to perform all of these steps using a separate thread (or "task", whatever higher-level abstraction is available in C++), and I want to make sure that if my function for [1] gets called multiple times, the additional calls basically get "dropped" if step 1 is already in flight.
In C++, I believe I can use a flag and a mutex. I use something like std::lock_guard<std::mutex> at the top of my method. Then the next line checks for a flag.
// MyWorkflow.cpp
std::mutex myMutex;
int inFlight = 0;
void process() {
std::lock_guard<std::mutex> guard(myMutex);
if (inFlight) {
return;
}
inFlight = 1;
std::vector<Widget> widgets = readFromMyTable();
std::string json = getJson(&widgets);
... // Send the json to the remote service and handle the response
}
Okay, let me explain my confusion. I want to use Curl to perform the HTTP request. But Curl works asynchronously. And so if I make the asynchronous HTTP call via Curl, my update function will just return and myMutex will be released, right?
I think in my asynchronous response handler, I need to call a second function that's in MyWorkflow.cpp
void markCompletion() {
std::lock_guard<std::mutex> guard(myMutex);
inFlight = 0; // Reset the inflight flag here
}
Is this the right approach? I am worried that if an exception is thrown anywhere before I call markCompletion(), I will block all future callers. I think I need to ensure I have proper exception handling and always call markCompletion().
I am terribly sorry for asking such a noob question, but I really want to learn to do this the right way.
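One way to address the worry about exceptions is to let a destructor reset the flag instead of relying on an explicit markCompletion() call. The following is only a minimal sketch under stated assumptions: the names InFlightGuard and tryAcquire are hypothetical (not from the post), the flag is a std::atomic<bool> instead of a mutex-protected int, and the asynchronous HTTP layer is assumed to let you capture a std::shared_ptr in its completion callback.

```cpp
#include <atomic>
#include <functional>
#include <memory>
#include <stdexcept>

// Hypothetical replacement for the mutex + int flag from the question.
std::atomic<bool> inFlight{false};

// Resets the in-flight flag when the last owner of the guard goes away,
// whether that happens on normal completion or during stack unwinding.
struct InFlightGuard {
    bool owns = false;
    ~InFlightGuard() {
        if (owns) {
            inFlight.store(false);
        }
    }
};

// Tries to claim the workflow. Returns nullptr if a run is already in flight.
std::shared_ptr<InFlightGuard> tryAcquire() {
    // Allocate first, so a failed allocation cannot leave the flag set.
    auto guard = std::make_shared<InFlightGuard>();
    bool expected = false;
    if (!inFlight.compare_exchange_strong(expected, true)) {
        return nullptr;  // another call already holds the flag: drop this one
    }
    guard->owns = true;
    return guard;
}

// Runs `work` unless another run is active. `work` receives the guard so it
// can keep a copy alive inside an asynchronous completion callback.
bool process(const std::function<void(std::shared_ptr<InFlightGuard>)>& work) {
    auto guard = tryAcquire();
    if (!guard) {
        return false;
    }
    work(guard);
    return true;
}
```

With this shape, the flag is cleared when the last copy of the guard is destroyed. If work throws before handing the guard to a callback, unwinding destroys the guard and later callers are not blocked; if the guard is captured by the response handler, the flag stays set until that handler has run and been released.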

MismatchingMessageCorrelationException : Cannot correlate message ‘onEventReceiver’: No process definition or execution matches the parameters

We are facing a MismatchingMessageCorrelationException for the receive task in some cases (less than 5%).
The callback that notifies the receive task is done by:
protected void respondToCallWorker(
@NonNull final String correlationId,
final CallWorkerResultKeys result,
@Nullable final Map<String, Object> variables
) {
try {
runtimeService.createMessageCorrelation("callWorkerConsumer")
.processInstanceId(correlationId)
.setVariables(variables)
.setVariable("callStatus", result.toString())
.correlateWithResult();
} catch(Exception e) {
e.printStackTrace();
}
}
When I check the logs, I find that the executed query is this one:
select distinct RES.* from ACT_RU_EXECUTION RES
inner join ACT_RE_PROCDEF P on RES.PROC_DEF_ID_ = P.ID_
WHERE RES.PROC_INST_ID_ = 'b2362197-3bea-11eb-a150-9e4bf0efd6d0' and RES.SUSPENSION_STATE_ = '1'
and exists (select ID_ from ACT_RU_EVENT_SUBSCR EVT
where EVT.EXECUTION_ID_ = RES.ID_ and EVT.EVENT_TYPE_ = 'message'
and EVT.EVENT_NAME_ = 'callWorkerConsumer' )
Sometimes, when I look for the process instance in the database, I find it waiting in the receive task:
SELECT DISTINCT * FROM ACT_RU_EXECUTION RES
WHERE id_ = 'b2362197-3bea-11eb-a150-9e4bf0efd6d0'
However, when I check the event subscription, it has not yet been created in the database:
select ID_ from ACT_RU_EVENT_SUBSCR EVT
where EVT.EXECUTION_ID_ = 'b2362197-3bea-11eb-a150-9e4bf0efd6d0'
and EVT.EVENT_TYPE_ = 'message'
and EVT.EVENT_NAME_ = 'callWorkerConsumer'
I think the solution is to persist the "receive task" before the response arrives in respondToCallWorker, but sadly I can't figure out how.
I tried "async before" on callWorker and on "Message consumer", but it did not work.
I also tried camunda.bpm.database.jdbc-batch-processing=false and got the same results.
I also tried parallel branches, but then I get an OptimisticLockingException and a MismatchingMessageCorrelationException.
Maybe I am doing it wrong.
Thanks for your help.
This is an interesting problem. As you already found out, the error happens when you try to correlate the result from the "worker" before the main process has committed its transaction, so no message subscription is registered yet at the time you correlate.
This problem in process orchestration is described and analyzed in this blog post, which is definitely worth reading.
Taken from that post, here is a design that should solve the issue:
You make message send and receive parallel and put an async before the send task.
By doing so, the async continuation job for the send event and the message subscription are written in the same transaction, so when the async message send executes, you already have the subscription waiting.
Although this should work and solve the issue on the BPMN model level, it might be worth considering options that do not require remodeling the process.
First, instead of calling the worker directly from your delegate, you could (assuming you are on Spring Boot) publish a "CallWorkerCommand" (a simple POJO) and use a TransactionalEventListener on a Spring bean to execute the actual call. By doing so, the BPMN process first finishes its transaction, including the message subscription, and only afterwards does Spring execute your worker call.
Second, you could use a retry mechanism like resilience4j around your correlate message call, so in the rare cases where the result comes too quickly, you fail and retry a second later.
Another solution I could think of, since you seem to be using an "external worker" pattern here, is to use an external-task-service task directly, so the send/receive synchronization gets solved by the Camunda external worker API.
So many options to choose from. I would possibly prefer the external task, followed by the transactionalEventListener, but that is a matter of personal preference.

Implementing a custom async task type and await

I am developing a C++ app in which I need to receive messages from an MQ and then parse them according to their type, and for a particular reason I want to make this process (receiving a single message followed by processing it) asynchronous. Since I want to keep things as simple as possible, so that the next developer has no problem continuing the code, I have written a very small class to implement asynchrony.
I first raise a new thread and pass a function to the thread:
task = new thread([&] {
result = fn();
isCompleted = true;
});
task->detach();
and in order to await the task I do the following:
while (!isCompleted && !(*cancelationToken))
{
Sleep(5);
}
state = 1; // marking the task as completed
So far there is no problem and I have not faced any bug or error but I am not sure if this is "a good way to do this" and my question is focused on determining this.
Read about std::future and std::async.
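If your task runs on another core or processor, a plain (non-atomic) isCompleted variable is not synchronized between the threads: each core may hold its own cached copy, and the compiler may keep the value in a register. So you may wait longer than needed, or never see the update at all.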
If you have to wait for something it is better to use a semaphore.
As said in comments, using standard methods is better anyway.
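As a minimal sketch of the std::async / std::future suggestion, the polling loop from the question could look roughly like this. The helper name runAndAwait and its signature are hypothetical, and the cancellation token is assumed to be a std::atomic<bool> rather than the raw pointer used in the question.

```cpp
#include <atomic>
#include <chrono>
#include <functional>
#include <future>
#include <utility>

// Runs fn on another thread and waits for it, checking a cancellation flag
// every few milliseconds. Returns true and stores the result in `out` on
// completion; returns false, leaving `out` untouched, if cancellation was
// requested first.
//
// Caveat: a future obtained from std::async blocks in its destructor until
// the task has finished, so returning early here still waits for fn to end.
// To actually stop early, fn itself has to observe the cancellation flag.
template <typename T>
bool runAndAwait(std::function<T()> fn,
                 const std::atomic<bool>& cancelRequested,
                 T& out) {
    std::future<T> result = std::async(std::launch::async, std::move(fn));
    while (result.wait_for(std::chrono::milliseconds(5)) !=
           std::future_status::ready) {
        if (cancelRequested.load()) {
            return false;
        }
    }
    // get() also rethrows any exception that fn threw on the worker thread.
    out = result.get();
    return true;
}
```

Compared with the detached thread and the plain isCompleted flag, the future carries both the completion signal and the result with proper synchronization, and exceptions thrown by the task are propagated to the waiting thread instead of being lost.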

Play Framework how to purposely delay a response

We have a Play app, currently using version 2.6. We are trying to prevent dictionary attacks against our login by delaying a "failed login" message back to our users when they provide a failed password. We currently hash and salt and have all the best practices, but we are not sure if we are delaying correctly. So we have in our Controller:
public Result login() { return ok(loginHtml); }
and we have a:
public Result loginAction()
{
// Check for user in database
User user = User.find.query()...
// Was the user found?
if (user == null) {
// Wrong password! Delay and redirect
Thread.sleep(10000); // <-- how do we delay correctly?
return redirect(routes.Controller.login());
}
// User is not null, so all good!
...
}
We are not sure if Thread.sleep(10000) is the best way to delay a response, since this might hang other requests that come in, or use too many threads from the default pool. We have noticed that under 80+ hits per second the Play Framework does not route our HTTP calls to the Routes. That is, if we receive an HTTP POST request, our app will not even send that request to the Controller until 20+ seconds later. HOWEVER, in the SAME time period, if we get an HTTP GET request, our app will process that GET instantly!
Currently we have 300 threads as the min/max in our Akka settings for the default fork pool. Any insights would be appreciated. We run a t2.xlarge AWS EC2 instance running Ubuntu.
Thank you.
Thread.sleep blocks the current thread; please try to avoid using it in production code as much as possible.
What you need is CompletionStage / CompletableFuture, or any other abstraction for dealing with asynchronous programming and asynchronous actions.
Please take a look at the documentation on asynchronous actions for more details: https://www.playframework.com/documentation/2.8.x/JavaAsync
In your case, the solution would look something like this (excuse me, please, this might have mistakes; I'm primarily a Scala engineer):
import play.libs.concurrent.HttpExecutionContext;
import play.mvc.*;
import javax.inject.Inject;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
public class LoginController extends Controller {
private HttpExecutionContext httpExecutionContext;
// Create and inject separate ScheduledExecutorService
private ScheduledExecutorService executor;
@Inject
public LoginController(HttpExecutionContext ec,
ScheduledExecutorService executor) {
this.httpExecutionContext = ec;
this.executor = executor;
}
public CompletionStage<Result> loginAction() {
User user = User.find.query()...
if (user == null) {
// Complete the response after 10 seconds without blocking a request thread
CompletableFuture<Result> delayed = new CompletableFuture<>();
executor.schedule(() -> delayed.complete(redirect(routes.Controller.login())), 10, TimeUnit.SECONDS);
return delayed;
} else {
// return another response, wrapped with CompletableFuture.completedFuture(...)
}
}
}
Hope this helps!
I don't like this approach at all. This hogs threads for no reason and can probably cause your entire system to lock up if someone finds out you are doing this and they have malicious ideas. Let me propose a better approach:
In the User table store a nullable LocalDateTime of the last login attempt time.
When you fetch the user from the DB, check the last attempt time (compare to LocalDateTime.now()). If 10 seconds have passed since the last attempt, perform the password comparison.
If the passwords don't match, store the last attempt time as now.
This can also be handled gracefully on the front end if you provide good error responses.
EDIT: If you want to delay login attempts NOT based on the user, you could create an attempt table and store last attempt by IP address.
If you really want to do it your way, which I don't recommend, you need to read up on this first: https://www.playframework.com/documentation/2.8.x/ThreadPools