I tried to create a wrapper around Google Text-to-Speech, copied verbatim from your help pages. From the logs it seems it tried to create a Jetty instance (to forward the call to another service?) but failed because it couldn't assign it a network address read from the defaults. I will include the log output for your reference. Can you suggest changes to the code, or a different strategy to use with Cloud Functions?
Log output follows:
"textPayload": "java.io.IOException: com.google.api.gax.rpc.DeadlineExceededException: io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: deadline exceeded after 19.821560546s. [buffered_nanos=2713815445, buffered_nanos=17108132243, waiting_for_connection]\n\tat com.gcp.cloudfunctions.Test.service(Test.java:91)\n\tat com.google.cloud.functions.invoker.NewHttpFunctionExecutor.service(NewHttpFunctionExecutor.java:67)\n\tat javax.servlet.http.HttpServlet.service(HttpServlet.java:790)\n\tat org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:755)\n\tat org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:547)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1297)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)\n\tat org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:485)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1212)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat com.google.cloud.functions.invoker.runner.Invoker$NotFoundHandler.handle(Invoker.java:379)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat org.eclipse.jetty.server.Server.handle(Server.java:500)\n\tat org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383)\n\tat org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:547)\n\tat org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:375)\n\tat org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:270)\n\tat org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)\n\tat org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)\n\tat org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:806)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:938)\n\tat java.base/java.lang.Thread.run(Thread.java:834)\nCaused by: com.google.api.gax.rpc.DeadlineExceededException: io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: deadline exceeded after 19.821560546s. 
[buffered_nanos=2713815445, buffered_nanos=17108132243, waiting_for_connection]\n\tat com.google.api.gax.rpc.ApiExceptionFactory.createException(ApiExceptionFactory.java:51)\n\tat com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:72)\n\tat com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:60)\n\tat com.google.api.gax.grpc.GrpcExceptionCallable$ExceptionTransformingFuture.onFailure(GrpcExceptionCallable.java:97)\n\tat com.google.api.core.ApiFutures$1.onFailure(ApiFutures.java:68)\n\tat com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:982)\n\tat com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)\n\tat com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1138)\n\tat com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:957)\n\tat com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:748)\n\tat io.grpc.stub.ClientCalls$GrpcFuture.setException(ClientCalls.java:522)\n\tat io.grpc.stub.ClientCalls$UnaryStreamToFuture.onClose(ClientCalls.java:497)\n\tat io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:426)\n\tat io.grpc.internal.ClientCallImpl.access$500(ClientCallImpl.java:66)\n\tat io.grpc.internal.ClientCallImpl$1CloseInContext.runInContext(ClientCallImpl.java:416)\n\tat io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)\n\tat io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)\n\tat java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)\n\tat java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\t... 
1 more\n\tSuppressed: com.google.api.gax.rpc.AsyncTaskException: Asynchronous task failed\n\t\tat com.google.api.gax.rpc.ApiExceptions.callAndTranslateApiException(ApiExceptions.java:57)\n\t\tat com.google.api.gax.rpc.UnaryCallable.call(UnaryCallable.java:112)\n\t\tat com.google.cloud.texttospeech.v1beta1.TextToSpeechClient.synthesizeSpeech(TextToSpeechClient.java:268)\n\t\tat com.google.cloud.texttospeech.v1beta1.TextToSpeechClient.synthesizeSpeech(TextToSpeechClient.java:241)\n\t\tat com.gcp.cloudfunctions.CloudTextToSpeech.getSpeech(CloudTextToSpeech.java:97)\n\t\tat com.gcp.cloudfunctions.Test.service(Test.java:53)\n\t\tat com.google.cloud.functions.invoker.NewHttpFunctionExecutor.service(NewHttpFunctionExecutor.java:67)\n\t\tat javax.servlet.http.HttpServlet.service(HttpServlet.java:790)\n\t\tat org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:755)\n\t\tat org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:547)\n\t\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)\n\t\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1297)\n\t\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)\n\t\tat org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:485)\n\t\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)\n\t\tat org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1212)\n\t\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\t\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\t\tat com.google.cloud.functions.invoker.runner.Invoker$NotFoundHandler.handle(Invoker.java:379)\n\t\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\t\tat org.eclipse.jetty.server.Server.handle(Server.java:500)\n\t\tat org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383)\n\t\tat org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:547)\n\t\tat org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:375)\n\t\tat org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:270)\n\t\tat org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)\n\t\tat org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)\n\t\tat org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)\n\t\tat org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:806)\n\t\tat org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:938)\n\t\t... 1 more\nCaused by: io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: deadline exceeded after 19.821560546s. [buffered_nanos=2713815445, buffered_nanos=17108132243, waiting_for_connection]\n\tat io.grpc.Status.asRuntimeException(Status.java:533)\n\t... 12 more",
"insertId": "000000-a16ab144-e2f2-4fe6-ae1a-fc939a327991",
"resource": {
"type": "cloud_function",
"labels": {
"region": "us-central1",
"function_name": "function-1",
"project_id": "nice-theater-281908"
}
},....
Update:
It turns out that the line throwing the DEADLINE_EXCEEDED error is the culprit. Googling the error suggests the client did not receive a response from the server in time (before the timeout). Here is the line causing the error:
SynthesizeSpeechResponse response = textToSpeechClient.synthesizeSpeech(input, voice, audioConfig);
Any advice on how to work around it, or how to increase the timeout for the process? I'm willing to give anything a shot.
synthesizeSpeech is a synchronous method, so you have to wait for the response. Cloud Functions have a default timeout, but you can extend it. You could also try breaking the text into smaller pieces, so each request is more likely to finish within the Cloud Function time limit; the sketch below illustrates the idea.
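To make the chunking idea concrete, here is a minimal sketch. It is written in Python rather than the asker's Java (the same options exist in both clients), the 4,000-character limit is an assumption chosen to stay under the API's per-request input cap, and the sentence-based splitter is deliberately naive:

from google.cloud import texttospeech

client = texttospeech.TextToSpeechClient()
voice = texttospeech.VoiceSelectionParams(language_code="en-US")
audio_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.MP3)

def chunks(text, limit=4000):
    # Naive splitter: break on sentence boundaries, keeping each piece
    # under `limit` characters (an oversize single sentence still passes through).
    piece = ""
    for sentence in text.split(". "):
        if piece and len(piece) + len(sentence) > limit:
            yield piece
            piece = ""
        piece += sentence + ". "
    if piece:
        yield piece

def synthesize_all(text):
    audio = b""
    for piece in chunks(text):
        response = client.synthesize_speech(
            input=texttospeech.SynthesisInput(text=piece),
            voice=voice,
            audio_config=audio_config,
            timeout=120.0,  # per-call deadline, well above the ~20 s seen in the log
        )
        # Concatenating MP3 segments works in practice; LINEAR16/WAV output
        # would need header handling instead.
        audio += response.audio_content
    return audio

Each request then carries a generous per-call deadline and a small payload, so no single RPC should run anywhere near the function's own timeout.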
I have
datetime_format = "%Y-%m-%dT%H:%M:%S.%f%z"
in /etc/awslogs/awslogs.conf
And I have a log entry like this:
{
  "level": "info",
  "ts": "2023-01-08T21:46:03.381067Z",
  "caller": "bot/bot.go:172",
  "msg": "Creating test subscription declined",
  "user_id": "0394c017-2a94-416c-940c-31b1aadb12ee"
}
However, the timestamp does not get parsed; I see this warning in the logs:
2023-01-08 21:46:03,423 - cwlogs.push.reader - WARNING - 9500 - Thread-4 - Fall back to previous event time: {'timestamp': 1673211877689, 'start_position': 6469L, 'end_position': 6640L}, previousEventTime: 1673211877689, reason: timestamp could not be parsed from message.
Update: I tried removing the level field:
{
  "ts": "2023-01-08T23:15:00.518545Z",
  "caller": "bot/bot.go:172",
  "msg": "Creating test subscription declined",
  "user_id": "0394c017-2a94-416c-940c-31b1aadb12ee"
}
and it still does not work.
There are 2 different formats of CloudWatch logs agent configuration:
https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/AgentReference.html. This agent is deprecated, as mentioned in the alert section of the page.
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-Agent-Configuration-File-Details.html. This is the configuration for the new unified CloudWatch agent, and it doesn't have a datetime_format parameter to configure; instead it has timestamp_format.
Since you have mentioned datetime_format, I'm assuming you are using the old agent. In that case, %z refers to a UTC offset in the form +HHMM or -HHMM (+0000, -0400, +1030), as per the linked documentation [1 above]. Your timestamp doesn't include an offset, so your format should be %Y-%m-%dT%H:%M:%S.%fZ. Here the Z, like the T, just represents a literal character. Also, specify time_zone as UTC. You can sanity-check the format locally, as in the sketch below.
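Since the old agent is a Python program and datetime_format takes Python strptime directives, you can verify the format on your machine before touching the agent. A quick sketch:

from datetime import datetime, timezone

ts = "2023-01-08T21:46:03.381067Z"

# 'Z' (like 'T') is matched as a literal character, not a directive.
parsed = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")

# strptime alone yields a naive datetime; the agent's time_zone = UTC
# option is what tells it how to interpret the value.
print(parsed.replace(tzinfo=timezone.utc).isoformat())
# -> 2023-01-08T21:46:03.381067+00:00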
I get this error when I launch, from zero, more than 4 processes at once:
{
  "insertId": "61a4a4920009771002b74809",
  "jsonPayload": {
    "asctime": "2021-11-29 09:59:46,620",
    "message": "Exception in callback <bound method ResumableBidiRpc._on_call_done of <google.api_core.bidi.ResumableBidiRpc object at 0x3eb1636b2cd0>>: ValueError('Cannot invoke RPC: Channel closed!')",
    "funcName": "handle_event",
    "lineno": 183,
    "filename": "_channel.py"
  }
}
This is the pub-sub schema:
[diagram: pub-sub-schema]
The error seems to happen at step 9 or 10.
The actual code is:
import logging

from google.cloud import pubsub_v1

# publisher, topic_path, encoded_message, message_key, PROJECT_ID and
# aggregator_callback_handler are defined elsewhere in the service.
future = publisher.publish(
    topic_path,
    encoded_message,
    msg_attribute=message_key
)
future.add_done_callback(
    lambda f: logging.info(...)
)

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(
    PROJECT_ID,
    "..."
)
streaming_pull_future = subscriber.subscribe(
    subscription_path,
    callback=aggregator_callback_handler.handle_message
)
aggregator_callback_handler.callback = streaming_pull_future

wait_result(
    timeout=300,
    pratica=...,
    f_check_res_condition=lambda: aggregator_callback_handler.response is not None
)

streaming_pull_future.cancel()
subscriber.close()
The module aggregator_callback_handler handles .nack and .ack.
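(The question doesn't include that module, but from the calls above its shape is roughly the following; every name here is inferred, not the poster's actual code.)

class AggregatorCallbackHandler:
    def __init__(self):
        self.response = None   # polled by wait_result(...) above
        self.callback = None   # set to the streaming pull future above

    def handle_message(self, message):
        # Keep the first response and ack it; nack anything else so it
        # gets redelivered elsewhere.
        if self.response is None:
            self.response = message.data
            message.ack()
        else:
            message.nack()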
The error is returned for a few seconds, then the VMs on which the services are hosted scale and the error stops. The same happens if, instead of launching the processes all together, I scale manually, launching them one by one and leaving some sleep in between.
I've already checked the timeouts and moved the subscriber outside of the context manager, but those solutions don't work.
Any idea on how to handle this?
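Not an answer, but given the observation above that the error stops once the backend has scaled, or when the processes come up one at a time, one mitigation is to stagger start-up deliberately. A minimal sketch, where start_worker is a hypothetical stand-in for however the processes are actually launched:

import time

def staggered_start(start_worker, n_workers, delay_s=5.0):
    # Launch workers one at a time, pausing so each streaming-pull channel
    # is established (and the backend has scaled) before the next one starts.
    workers = []
    for i in range(n_workers):
        workers.append(start_worker(i))
        time.sleep(delay_s)
    return workers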
Setup
AWS Lambda (3s timeout)
NodeJS 12.x
mssql 2.6.1
tedious (a dependency of mssql; mssql 2.6.1 installs tedious 6.7.0)
SQL Server DB in RDS (db.t3.small)
I'm also dealing with a fair bit of traffic, with roughly 1k invocations per minute.
Problem
Most of the time the Lambda executes just fine. Roughly 0.35% of the time the Lambda throws an error. The logs look like this:
In the screenshot (not reproduced here) you can see that the function STARTs, prints some debug info, then throws an error, ENDs, and REPORTs.
While this is a "timeout" error, the error message says,
RequestError: Timeout: Request failed to complete in 15000ms
This confuses me because as you see in the REPORT log, the invocation time was just 255.33ms total.
Question
The obvious question is how does something timeout after 15 seconds in just 255ms? Is this an issue with tedious, mssql, my code, or something else? If my code is relevant to the question please let me know and I can add it. I assume the code is basically functional because it works > 99% of the time.
Failed Theories:
The logs are interleaved and the errors are not from the 255.33ms invocation. That's wrong, because in the screenshot the request IDs match up.
There's an intermittent error connecting to the DB, possibly a DNS issue. I have seen very rare EAI_AGAIN DNS errors when resolving the DB host, but that doesn't seem to match the steadiness of this error.
I didn't spot anything super helpful in the GitHub issues for Tedious.
Full Error
{
  "errorType": "Runtime.UnhandledPromiseRejection",
  "errorMessage": "RequestError: Timeout: Request failed to complete in 15000ms",
  "reason": {
    "errorType": "RequestError",
    "errorMessage": "Timeout: Request failed to complete in 15000ms",
    "code": "ETIMEOUT",
    "originalError": {
      "errorType": "RequestError",
      "errorMessage": "Timeout: Request failed to complete in 15000ms",
      "code": "ETIMEOUT",
      "message": "Timeout: Request failed to complete in 15000ms",
      "stack": [
        "RequestError: Timeout: Request failed to complete in 15000ms",
        " at RequestError (/var/task/node_modules/mssql/node_modules/tedious/lib/errors.js:32:12)",
        " at Connection.requestTimeout (/var/task/node_modules/mssql/node_modules/tedious/lib/connection.js:1212:46)",
        " at Timeout._onTimeout (/var/task/node_modules/mssql/node_modules/tedious/lib/connection.js:1180:14)",
        " at listOnTimeout (internal/timers.js:549:17)",
        " at processTimers (internal/timers.js:492:7)"
      ]
    },
    "name": "RequestError",
    "number": "ETIMEOUT",
    "precedingErrors": [],
    "stack": [
      "RequestError: Timeout: Request failed to complete in 15000ms",
      " at Request.userCallback (/var/task/node_modules/mssql/lib/tedious/request.js:429:19)",
      " at Request.callback (/var/task/node_modules/mssql/node_modules/tedious/lib/request.js:56:14)",
      " at Connection.endOfMessageMarkerReceived (/var/task/node_modules/mssql/node_modules/tedious/lib/connection.js:2407:20)",
      " at Connection.dispatchEvent (/var/task/node_modules/mssql/node_modules/tedious/lib/connection.js:1279:15)",
      " at Parser.<anonymous> (/var/task/node_modules/mssql/node_modules/tedious/lib/connection.js:1072:14)",
      " at Parser.emit (events.js:315:20)",
      " at Parser.EventEmitter.emit (domain.js:482:12)",
      " at Parser.<anonymous> (/var/task/node_modules/mssql/node_modules/tedious/lib/token/token-stream-parser.js:37:14)",
      " at Parser.emit (events.js:315:20)",
      " at Parser.EventEmitter.emit (domain.js:482:12)"
    ]
  },
  "promise": {},
  "stack": [
    "Runtime.UnhandledPromiseRejection: RequestError: Timeout: Request failed to complete in 15000ms",
    " at process.<anonymous> (/var/runtime/index.js:35:15)",
    " at process.emit (events.js:315:20)",
    " at process.EventEmitter.emit (domain.js:482:12)",
    " at processPromiseRejections (internal/process/promises.js:209:33)",
    " at processTicksAndRejections (internal/process/task_queues.js:98:32)"
  ]
}
I'm trying the AutoML Vision ML codelab from the Cloud Healthcare API GitHub tutorials.
https://github.com/GoogleCloudPlatform/healthcare/blob/master/imaging/ml_codelab/breast_density_auto_ml.ipynb
I ran the "Export DICOM data" cell in the "Convert DICOM to JPEG" section; the request, as well as all the prerequisite cell code, succeeded.
But waiting for operation completion times out, and the operation never finishes.
(The ExportDicomData request status on the Dataset page stays "Running" all day. I tried many times, and all the requests piled up, stuck at "Running". A few times I started from scratch, and the results were the same.)
What I have done so far:
1) Removed "output_config", since an INVALID_ARGUMENT error occurs otherwise.
https://github.com/GoogleCloudPlatform/healthcare/issues/133
2) Enabled the Cloud Resource Manager API, since it is needed.
This is the cell code:
# Path to export DICOM data.
# (os, json, the authorized `http` object, and the project_id, location,
# dataset_id, dicom_store_id and jpeg_folder variables come from earlier cells.)
dicom_store_url = os.path.join(HEALTHCARE_API_URL, 'projects', project_id,
                               'locations', location, 'datasets', dataset_id,
                               'dicomStores', dicom_store_id)
path = dicom_store_url + ":export"

# Headers (send request in JSON format).
headers = {'Content-Type': 'application/json'}

# Body (encoded in JSON format).
# output_config = {'output_config': {'gcs_destination': {'uri_prefix': jpeg_folder, 'mime_type': 'image/jpeg; transfer-syntax=1.2.840.10008.1.2.4.50'}}}
output_config = {'gcs_destination': {'uri_prefix': jpeg_folder, 'mime_type': 'image/jpeg; transfer-syntax=1.2.840.10008.1.2.4.50'}}
body = json.dumps(output_config)

resp, content = http.request(path, method='POST', headers=headers, body=body)
assert resp.status == 200, 'error exporting to JPEG, code: {0}, response: {1}'.format(resp.status, content)
print('Full response:\n{0}'.format(content))

# Record operation_name so we can poll for it later.
response = json.loads(content)
operation_name = response['name']
This is the result of waiting.
Waiting for operation completion...
Full response:
{
  "name": "projects/my-datalab-tutorials/locations/us-central1/datasets/sample-dataset/operations/18300485449992372225",
  "metadata": {
    "#type": "type.googleapis.com/google.cloud.healthcare.v1beta1.OperationMetadata",
    "apiMethodName": "google.cloud.healthcare.v1beta1.dicom.DicomService.ExportDicomData",
    "createTime": "2019-08-18T10:37:49.809136Z"
  }
}
AssertionErrorTraceback (most recent call last)
<ipython-input-18-1a57fd38ea96> in <module>()
21 timeout = time.time() + 10*60 # Wait up to 10 minutes.
22 path = os.path.join(HEALTHCARE_API_URL, operation_name)
---> 23 _ = wait_for_operation_completion(path, timeout)
<ipython-input-18-1a57fd38ea96> in wait_for_operation_completion(path, timeout)
15
16 print('Full response:\n{0}'.format(content))
---> 17 assert success, "operation did not complete successfully in time limit"
18 print('Success!')
19 return response
AssertionError: operation did not complete successfully in time limit
API Version is v1beta1.
I was wondering if somebody has any suggestions.
Thank you.
After several more attempts, and leaving it running overnight, it finally succeeded. I don't know why.
There was a recent update to the codelabs. The error message is due to the timeout in the codelab and not the actual operation. This has been addressed in the update. Please let me know if you are still running into any issues!
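For anyone who hits the old notebook behavior, a workaround is simply to poll with a longer deadline than the codelab's 10 minutes. Below is a minimal sketch, assuming the notebook's authorized httplib2-style http object is passed in and that the operation resource gains "done": true when finished (standard long-running-operation behavior); the helper name mirrors the traceback above, but this is not the codelab's actual code.

import json
import time

def wait_for_operation_completion(http, path, max_wait_s=60 * 60, poll_s=30):
    # Poll the long-running operation until it reports "done": true,
    # or give up after max_wait_s seconds (here: one hour).
    deadline = time.time() + max_wait_s
    while time.time() < deadline:
        resp, content = http.request(path, method='GET')
        assert resp.status == 200, 'error polling operation: {0}'.format(content)
        response = json.loads(content)
        if response.get('done'):
            return response
        time.sleep(poll_s)
    raise AssertionError('operation did not complete successfully in time limit')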
I have a 3-node Akka cluster, with 3 actors running on each node. The cluster runs fine for about 2 hours, but after that I start getting the following warnings:
[INFO] [06/07/2018 15:08:51.923] [ClusterSystem-akka.remote.default-remote-dispatcher-6] [akka.tcp://ClusterSystem@192.168.2.8:2552/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FClusterSystem%40192.168.2.7%3A2552-112] No response from remote for outbound association. Handshake timed out after [15000 ms].
[WARN] [06/07/2018 15:08:51.923] [ClusterSystem-akka.remote.default-remote-dispatcher-18] [akka.tcp://ClusterSystem@192.168.2.8:2552/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2FClusterSystem%40192.168.2.7%3A2552-8] Association with remote system [akka.tcp://ClusterSystem@192.168.2.7:2552] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://ClusterSystem@192.168.2.7:2552]] Caused by: [No response from remote for outbound association. Handshake timed out after [15000 ms].]
[WARN] [06/07/2018 16:07:06.347] [ClusterSystem-akka.actor.default-dispatcher-101] [akka.remote.PhiAccrualFailureDetector@3895fa5b] heartbeat interval is growing too large: 2839 millis
Edit: the Akka Cluster Management response from the API:
{
  "selfNode": "akka.tcp://ClusterSystem@127.0.0.1:2551",
  "leader": "akka.tcp://ClusterSystem@127.0.0.1:2551",
  "oldest": "akka.tcp://ClusterSystem@127.0.0.1:2551",
  "unreachable": [
    {
      "node": "akka.tcp://ClusterSystem@127.0.0.1:2552",
      "observedBy": [
        "akka.tcp://ClusterSystem@127.0.0.1:2551",
        "akka.tcp://ClusterSystem@127.0.0.1:2560"
      ]
    }
  ],
  "members": [
    {
      "node": "akka.tcp://ClusterSystem@127.0.0.1:2551",
      "nodeUid": "105742380",
      "status": "Up",
      "roles": [
        "Frontend",
        "dc-default"
      ]
    },
    {
      "node": "akka.tcp://ClusterSystem@127.0.0.1:2552",
      "nodeUid": "-150160059",
      "status": "Up",
      "roles": [
        "RuleExecutor",
        "dc-default"
      ]
    },
    {
      "node": "akka.tcp://ClusterSystem@127.0.0.1:2560",
      "nodeUid": "-158907672",
      "status": "Up",
      "roles": [
        "RuleExecutor",
        "dc-default"
      ]
    }
  ]
}
Edit 1: cluster setup configuration and failure detector configuration:
cluster {
  jmx.multi-mbeans-in-same-jvm = on
  roles = ["Frontend"]
  seed-nodes = [
    "akka.tcp://ClusterSystem@192.168.2.9:2551"]
  auto-down-unreachable-after = off

  failure-detector {
    # FQCN of the failure detector implementation.
    # It must implement akka.remote.FailureDetector and have
    # a public constructor with a com.typesafe.config.Config and
    # akka.actor.EventStream parameter.
    implementation-class = "akka.remote.PhiAccrualFailureDetector"

    # How often keep-alive heartbeat messages should be sent to each connection.
    # heartbeat-interval = 10 s

    # Defines the failure detector threshold.
    # A low threshold is prone to generate many wrong suspicions but ensures
    # a quick detection in the event of a real crash. Conversely, a high
    # threshold generates fewer mistakes but needs more time to detect
    # actual crashes.
    threshold = 18.0

    # Number of the samples of inter-heartbeat arrival times to adaptively
    # calculate the failure timeout for connections.
    max-sample-size = 1000

    # Minimum standard deviation to use for the normal distribution in
    # AccrualFailureDetector. Too low standard deviation might result in
    # too much sensitivity for sudden, but normal, deviations in heartbeat
    # inter arrival times.
    min-std-deviation = 100 ms

    # Number of potentially lost/delayed heartbeats that will be
    # accepted before considering it to be an anomaly.
    # This margin is important to be able to survive sudden, occasional,
    # pauses in heartbeat arrivals, due to for example garbage collect or
    # network drop.
    acceptable-heartbeat-pause = 15 s

    # Number of member nodes that each member will send heartbeat messages to,
    # i.e. each node will be monitored by this number of other nodes.
    monitored-by-nr-of-members = 2

    # After the heartbeat request has been sent the first failure detection
    # will start after this period, even though no heartbeat message has
    # been received.
    expected-response-after = 10 s
  }
}
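For what it's worth, the 15-second handshake timeout in the log is a remoting setting, separate from the cluster failure detector shown above. A minimal sketch of raising it for classic remoting follows; treat the key name and value as assumptions to verify against your Akka version's reference.conf, and note that the "heartbeat interval is growing too large" warning usually points at GC pauses or thread starvation rather than at configuration.

akka.remote {
  # Raised from the 15 s default that produced
  # "Handshake timed out after [15000 ms]" in the log above.
  handshake-timeout = 30 s
}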
}