Historic external task logs - camunda

I have a test case that starts a process using the REST API POST /process-definition/key/{key}/start with one invalid variable in the request body. From the response I take the ID of the process instance and check its external task logs using the REST API GET /history/external-task-log.
I expect the historic external task log for that process instance to have failureLog set to true, but the response actually has deletionLog set to true.
Is this expected behavior, or is it a bug?

Continue request django rest framework

I have a request that takes more than 3 minutes to process. I want the server to accept the request, immediately respond with 200, and deliver the result once the work has finished.
The workflow you've described is called asynchronous task execution.
The main idea is to move time- or resource-consuming work out of the code that handles HTTP requests and delegate it to some kind of worker. The worker might be a different thread or process, or even a separate service running on a different server.
This makes your application more responsive, because users get the HTTP response much sooner. With this approach you can also show UI-friendly things such as progress bars and status indicators for the task, define retry policies for failed tasks, and so on.
Example workflow:
user makes HTTP request initiating the task
the server creates the task, adds it to the queue and returns the HTTP response with task_id immediately
the front-end code starts ajax polling to get the results of the task passing task_id
the server handles polling HTTP requests and gets status information for this task_id. It returns the info (whether results or "still waiting") with the HTTP response
the front-end displays spinner if server returns "still waiting" or the results if they are ready
The most popular way to do this in Django is to use the Celery distributed task queue.
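The example workflow above can be sketched without any framework. This is only an illustration of the submit-then-poll pattern, not Celery itself: a standard-library thread pool stands in for the worker, and the names start_task, poll_task and slow_job are hypothetical.

```python
import time
import uuid
from concurrent.futures import ThreadPoolExecutor

# A thread pool stands in for the worker; a real setup would use a task queue.
executor = ThreadPoolExecutor(max_workers=2)
tasks = {}  # maps task_id -> Future


def slow_job(n):
    """Placeholder for the long-running work (hypothetical)."""
    time.sleep(0.1)
    return n * 2


def start_task(n):
    """Handler for the initial request: enqueue the job and return at once."""
    task_id = str(uuid.uuid4())
    tasks[task_id] = executor.submit(slow_job, n)
    return {"task_id": task_id}


def poll_task(task_id):
    """Handler for polling requests: report the status, plus the result if ready."""
    future = tasks.get(task_id)
    if future is None:
        return {"status": "unknown"}
    if not future.done():
        return {"status": "pending"}
    if future.exception() is not None:
        return {"status": "failed", "error": str(future.exception())}
    return {"status": "done", "result": future.result()}
```

In a Django project the two functions would be two views: one that returns the task_id immediately and one that the front-end polls. An in-memory dict only works within a single process, which is one reason a real task queue is normally used instead.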
When a request comes in, validate it first. Then send the response and use some mechanism to complete the work in the background. Make sure up front that the request can actually be completed. You can use pipelining, where every task is put into a pipeline. Django-Celery is an option, but don't use it unless you need it; look for the simplest way to solve the problem.

"LAMBDA_RUNTIME" Error on high-volume Lambda Function

I'm currently using a Lambda function written in JavaScript that is set up with an SQS event source to automatically pull messages from an SQS queue and do some basic processing on the message contents. I cannot show the code, but the Lambda function's execution is basically:
For each message in the batch it receives as part of the event:
It parses the body, which is a JSON string, into a Javascript object.
It reads the S3 object referenced in the parsed message using getObject.
It puts a record into a DynamoDB table using put.
If there were no errors, it deletes the individual SQS message that was processed from the Queue using deleteMessage.
This SQS queue is high-volume and receives messages in bulk, regularly building up a backlog of millions of messages. The Lambda is normally able to scale to process hundreds of thousands of messages concurrently. This solution has worked well for me with other applications in the past, but I'm now encountering the following intermittent error that reliably begins to appear as the Lambda scales up:
[ERROR] [#############] LAMBDA_RUNTIME Failed to post handler success response. Http response code: 400.
I've been unable to find any information about what this error means or what causes it. There appears to be no discernible pattern as to which executions encounter it. The function is usually able to run for a brief period without the error and scale to expected levels. But then the error starts to appear quite suddenly and destroys the Lambda's throughput by forcing it to scale down.
Does anyone know what this "LAMBDA_RUNTIME" error means and what might cause it? My Lambda Function runtime is Node v12.
Your function is being invoked asynchronously, so when it finishes it signals the caller whether it was successful.
There should be an error a few milliseconds earlier, probably an unhandled exception that isn't being logged. If that's the case, your function ends without knowing about the exception and tries to post a success response.
I get the same error, only with a different response code:
[ERROR] [1638918279694] LAMBDA_RUNTIME Failed to post handler success response. Http response code: 413.
I went to the Lambda function in the AWS console and ran a test with a custom event I built, and the error I got there was:
{
"errorMessage": "Response payload size exceeded maximum allowed payload size (6291556 bytes).",
"errorType": "Function.ResponseSizeTooLarge"
}
So this is the actual error; CloudWatch doesn't show it, but the test section of the Lambda console does.
I think I'll have to write the result to an S3 file or something similar, but that's another matter.
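The idea of returning a pointer instead of the full result could look roughly like the sketch below. It is an assumption-laden illustration: build_response and the store callback are hypothetical names, the upload itself is left to the caller (for example an S3 put), and the default limit is deliberately set somewhat below the 6291556 bytes quoted in the error message because the exact accounting is not known here.

```python
import json

# Assumed safety limit, chosen below the 6291556 bytes from the error message.
DEFAULT_LIMIT_BYTES = 6_000_000


def build_response(result, store, limit=DEFAULT_LIMIT_BYTES):
    """Return the result inline if it is small enough, else a pointer to it.

    `store` is a caller-supplied function (hypothetical) that persists the
    serialized body somewhere, e.g. an S3 object, and returns its location.
    """
    body = json.dumps(result)
    if len(body.encode("utf-8")) <= limit:
        return {"inline": True, "body": body}
    # Too large to return directly: persist it and return only the location.
    location = store(body)
    return {"inline": False, "location": location}
```

The caller then has to check the inline flag and fetch the stored object when the result was offloaded.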

AWS Storage gateway : refresh cache Too many requests have been sent to server

I am apparently calling the AWS Storage Gateway refreshCache method too frequently (as the message suggests), but I am not sure how long I need to wait before calling it again. Any help will be appreciated.
AWSStorageGateway gatewayClient = AWSStorageGatewayClientBuilder.standard().build();
RefreshCacheRequest cacheRequest = new RefreshCacheRequest();
cacheRequest.setFileShareARN(this.fileShareArn);
gatewayClient.refreshCache(cacheRequest);
com.amazonaws.services.storagegateway.model.InvalidGatewayRequestException: Too many requests have been sent to server. (Service: AWSStorageGateway; Status Code: 400; Error Code: InvalidGatewayRequestException; Request ID: f1ffa249-6908-4ae1-9f71-93fe7f26b2af)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1712)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1367)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1113)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:770)
I think you can refer to the official documentation: https://docs.aws.amazon.com/storagegateway/latest/APIReference/API_RefreshCache.html
As it says:
When this API is called, it only initiates the refresh operation. When the API call completes and returns a success code, it doesn't necessarily mean that the file refresh has completed. You should use the refresh-complete notification to determine that the operation has completed before you check for new files on the gateway file share.
So I guess that after calling the AWS Storage Gateway refreshCache method you must wait until the refresh has completed. If you call the method again during this period, an exception will be raised.
For the solution, see Monitoring Your File Share to set up a notification.
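If waiting for the notification is not practical, a common fallback is to retry with exponential backoff. Note that this is a different technique from the notification approach described above and is not taken from the AWS documentation. The sketch below is language-neutral in spirit and written in Python; call_with_backoff, operation and is_throttled are hypothetical names, and the delays are arbitrary.

```python
import time


def call_with_backoff(operation, is_throttled, max_attempts=5,
                      base_delay=1.0, sleep=time.sleep):
    """Call operation(); on a throttling error wait 1x, 2x, 4x... base_delay.

    `is_throttled` is a caller-supplied predicate (hypothetical) that decides
    whether an exception means "too many requests" and is worth retrying.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception as exc:
            # Give up on unrelated errors or once the attempts are used up.
            if not is_throttled(exc) or attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

With the Java client from the question, the equivalent predicate would check for InvalidGatewayRequestException with the "Too many requests" message.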

AWS lambda execution fails only first time I run it with 'customer function error'

I trigger a lambda function via API gateway and everything works perfectly with the one exception that the very first time I trigger it on a given day it fails.
Strangely, the lambda function logs don't show any errors. I get my usual START log statement and then the request and context of the trigger, then after 5s, it ends unexpectedly.
When I look into the API gateway logs this is the error it returns:
Lambda execution failed with status 200 due to customer function error: 2018-12-10T11:00:31.208Z cc233168-fc9n-11fc-a05a-577bb4sd2b2ccc Task timed out after 5.01 seconds.
Has anyone encountered a similar problem? What is customer function error and how may I resolve this?
Without knowing much about the code you are running, I would call this a cold start. A cold start happens on the first request after your function has not been called for a long time. Notice that the error message says "Task timed out after 5.01 seconds", which is the timeout currently set; you can increase it.
Alternatively, you could reduce the impact of cold starts by making them shorter:
author your Lambda functions in a language that doesn't incur a high cold start time, i.e. Node.js, Python, or Go
choose a higher memory setting for functions on the critical path of handling user requests (i.e. anything that the user would have to wait for a response from, including intermediate APIs)
optimize your function's dependencies and package size
You can also set up a scheduled job through CloudWatch that pings your API at a regular interval.
Adding to Yash's answer:
I've only seen Lambda execution failed with status 200 in API Gateway execution logs, though in case it can manifest in other ways: ensure you have execution logging enabled for the endpoint. If you didn't already have it enabled you'll need to wait for the problem to manifest again.
You can verify it's a cold start problem as follows:
In the log entry with the error, grab the #logStream value and the timestamp of the event; the log stream value is a long alphanumeric string like a4f8115980dc83a511eeedc493a78741
Open the log group for that endpoint's execution log -> find the log stream with the identifier you just grabbed
Narrow the date/time range to a window around the time where the event occurred
If you chose a narrow window and it's a cold start problem, I would expect the offending request to be the first one in the list. Click the "There are older events to load. Load more." link at the top of the list.
You should now see a gap of time between the last request received and the offending request.
In my case the error says connection reset by peer which leads me to think it's behaving as though a virtual machine were put to sleep then awoken in the sense that it believes TCP connections it previously had open are still valid.
In the short term the solution we're going with is to implement a retry strategy.
Besides the cold-start problem, there's another potential aspect of this problem: your API Gateway access log format.
Do the following:
Find the access log entries that correspond to the offending request in the execution log.
Is the HTTP status == 502?
502s in the API Gateway access log usually (always?) indicate the Lambda responded with malformed JSON.
The most obvious reason for it returning malformed JSON is a bug in your code. One of the less obvious reasons: a mistake in the access log format.
If you suspect that's the case, look for the following:
Quoted fields that shouldn't be, e.g. $context.error.messageString
Un-quoted fields that should be. A common idiom is to leave numeric fields un-quoted because it makes insights queries like this work: | filter #status >= 500. As convenient as that is, if the field isn't guaranteed to produce a numeric result then the JSON response will be malformed.
Trailing commas in {} bodies
Here's the documentation for many of the context variables, though one thing to keep in mind: the context variables that are available differ between the different API Gateway endpoint types (lambda, websocket, etc).
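The quoting pitfalls listed above can be checked offline by rendering the log format with sample values and parsing the result. This is only a rough sketch: render and is_valid_json are hypothetical helpers, the substitution is a simple regex rather than API Gateway's real rendering, and filling missing variables with "-" is an assumption about how unavailable values appear.

```python
import json
import re


def render(template, values):
    """Replace each $context.x.y variable with a sample value ("-" if missing)."""
    return re.sub(r"\$context(?:\.[A-Za-z]+)+",
                  lambda m: values.get(m.group(0), "-"),
                  template)


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Un-quoted status: fine when numeric, malformed when the value is not a number.
template = '{"requestId": "$context.requestId", "status": $context.status}'
ok = render(template, {"$context.requestId": "abc", "$context.status": "200"})
bad = render(template, {"$context.requestId": "abc"})
```

Here ok parses as JSON while bad does not, because the un-quoted status field was rendered as a bare "-". The same check catches a quoted $context.error.messageString if the sample value already carries its own quotes.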

Wso2 bps process hangs without reason

I have a simple BPEL process with 3 invokes in a loop. One of the instances hangs without any visible reason: the process is in the active state but is no longer executing. The last logged activity is the call to an invoke. I searched the database and found that both the request and the response are present in the ode_message table, and they look correct. But the output variable of the invoke is not filled in the ode_xml_data table. There are no logs in BPS from the time the message arrived. Is there any way to find out what went wrong?
I'm using WSO2 BPS 2.1.2.