Timeout for incomplete HTTP requests - istio

I'm new to EnvoyFilters, and I want to set the timeout for incomplete HTTP requests to 40 seconds or less.
The EnvoyFilter API exposes the HTTP connection manager extension, where you can set:
common_http_protocol_options:
  idle_timeout: 40s
I'm not confident that this is the right solution for what I'm trying to implement, since a lot of these concepts are new to me. Your help is much appreciated.

Related

How to specify AWS SQS request timeout with C++ API?

I'm using AWS SQS and making a GetQueueURL request at startup. The problem is that if there is no network connection, the request blocks and doesn't time out until more than 400 seconds have passed (using default ClientConfiguration parameters).
The only timeout parameters I can see in ClientConfiguration are:
httpRequestTimeoutMs
requestTimeoutMs
connectTimeoutMs
However, changing these seems to have no effect in the case mentioned above. I tried setting all three to 1000 ms and, without a network connection, the GetQueueUrl request still blocks for several minutes before it times out.
If it helps, when the timeout does occur, the error I get on the request is: curlCode: 28, Timeout was reached
What am I missing here? Thanks!
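One common reason configured timeouts appear to have no effect is that an SDK retries failed attempts by default, so the total wall-clock time can be far longer than any single per-attempt timeout. The question is about the C++ SDK, but as a point of comparison only, the AWS SDK for Go v2 lets you bound the whole operation, retries included, with a context deadline; a minimal sketch (queue name and timeout are placeholders, not values from the question):

package main

import (
	"context"
	"fmt"
	"time"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/sqs"
)

func main() {
	// Bound the whole operation (connection attempts, retries and all) to 5 seconds.
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}

	client := sqs.NewFromConfig(cfg)
	out, err := client.GetQueueUrl(ctx, &sqs.GetQueueUrlInput{
		QueueName: aws.String("my-queue"), // placeholder queue name
	})
	if err != nil {
		// With no network connectivity, this returns within ~5 seconds
		// instead of blocking for minutes.
		fmt.Println("GetQueueUrl failed:", err)
		return
	}
	fmt.Println("queue URL:", aws.ToString(out.QueueUrl))
}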

AWS HTTP API Gateway 503 Service Unavailable

I have an HTTP API Gateway with an HTTP integration backend server on EC2. The API gets a lot of traffic during the day, and looking at the logs I realized that it sometimes returns a 503 HTTP code with the body:
{ "message": "Service Unavailable" }
When I found this out, I tried running the HTTP requests many times in Postman; out of every twenty attempts I get at least one 503.
I then thought the integration server was busy, but it is not under load, and when I go directly to the integration server I get 200 responses every time.
The timeout parameter is set to 30000 ms and the endpoint's average response time is 200 ms, so timeouts are not the problem. Also, the 503 does not come back after the 30-second timeout; it comes back instantly.
Can anyone help me?
Thanks
I solved this issue by editing the keep-alive connection parameters of my internal integration server. AWS API Gateway expects standard keep-alive settings on the backend, so I kept tweaking my NGINX parameters until the errors stopped.
I had the same issue with a self-built Node microservice integrated into AWS API Gateway. After reconfiguring the CloudWatch logs I got a further indicator of what was wrong: INTEGRATION_NETWORK_FAILURE
Verify that your problem is the same, i.e. through more detailed log output:
In API Gateway → Logging, add more output under "Log format".
Use this or similar content for "Log format":
{"httpMethod":"$context.httpMethod","integrationErrorMessage":"$context.integrationErrorMessage","protocol":"$context.protocol","requestId":"$context.requestId","requestTime":"$context.requestTime","resourcePath":"$context.resourcePath","responseLength":"$context.responseLength","routeKey":"$context.routeKey","sourceIp":"$context.identity.sourceIp","status":"$context.status","errMsg":"$context.error.message","errType":"$context.error.responseType","intError":"$context.integration.error","intIntStatus":"$context.integration.integrationStatus","intLat":"$context.integration.latency","intReqID":"$context.integration.requestId","intStatus":"$context.integration.status"}
After hitting the API Gateway endpoint and reproducing the failure, consult the logs again; the integration error fields should now surface the INTEGRATION_NETWORK_FAILURE.
Solve it in the Node.js microservice (using Express)
Add timeouts for headers and keep-alive to the Express server's underlying HTTP server once it starts listening.
const app = require('express')();

// If not already set, and you need to advertise the keep-alive through the
// HTTP response, you might want to use this middleware:
/*
app.use((req, res, next) => {
  res.setHeader('Connection', 'keep-alive');
  res.setHeader('Keep-Alive', 'timeout=30');
  next();
});
*/

/* ...your main logic... */

const server = app.listen(8080, 'localhost', () => {
  console.warn(`⚡️[server]: Server is running at http://localhost:8080`);
});

server.keepAliveTimeout = 30 * 1000; // <- important lines
server.headersTimeout = 35 * 1000;   // <- important lines
Reason
Some AWS components seem to expect connections to be kept alive, even if the server signals otherwise (Connection: close). When API Gateway (and possibly AWS ELBs) tries to reuse such a connection, the reuse fails because the other side has most likely already closed it, hence the reported "NETWORK_FAILURE".
The error appears intermittent because API Gateway seems to close unused connections after a while, which gives the next request a clean connection. I can only assume they keep connections open this way for performance reasons.

Go: i/o timeout reading a request proxied by AWS ALB

We have a pretty standard API server written in Go. The HTTP handler unmarshals the request body (JSON) into a protobuf struct and sends it off to be processed.
The service is deployed as ECS containers on AWS, fronted by an ALB. The request volume is fairly high, and we observed that about 0.2% of requests fail with messages like this:
read tcp $ECS_CONTAINER_IP:$PORT->$REMOTE_IP:$REMOTE_RANDOM_PORT: i/o timeout
We tracked it down: the error is returned from the jsonpb.Unmarshal call. Our tracing tells us that all of these i/o-timeout requests take 5s.
Within the container, we ran ss -t | wc -l to count in-flight connections, and the number is quite reasonable (about 200-300, far below the nofile ulimit).
Some quick awk/sort/uniq shows that the in-flight requests coming from the ALB nodes are roughly balanced.
Any idea how we should proceed from here?
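A consistent 5-second ceiling usually points at a fixed read deadline somewhere on the server side rather than at the ALB. As a starting point for the investigation, a minimal sketch of where such deadlines usually live on Go's net/http server; the concrete values are illustrative assumptions, not taken from the question:

package main

import (
	"net/http"
	"time"
)

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/ingest", func(w http.ResponseWriter, r *http.Request) {
		// The handler reads r.Body here (e.g. via jsonpb.Unmarshal). With a short
		// ReadTimeout, a client or ALB that delivers the body slowly produces
		// "read tcp ...: i/o timeout" while the body is still being read.
		w.WriteHeader(http.StatusOK)
	})

	srv := &http.Server{
		Addr:    ":8080",
		Handler: mux,
		// ReadTimeout covers reading the entire request, body included.
		// A 5s value here would line up with the 5s failures described above (assumption).
		ReadTimeout: 5 * time.Second,
		// Alternatively, bound only the header read and handle slow bodies
		// explicitly inside the handler.
		ReadHeaderTimeout: 2 * time.Second,
		// Keep-alive idle timeout; commonly kept above the ALB idle timeout
		// (60s by default) so the server never closes a connection the ALB
		// still considers reusable.
		IdleTimeout: 120 * time.Second,
	}
	srv.ListenAndServe()
}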

Elastic search 403 Request throttled due to too many requests /_bulk

I am trying to sync 1 million records to ES using the bulk API in batches of 2k.
But after inserting around 25k-32k documents, Elasticsearch throws the following exception:
Unable to parse response body: org.elasticsearch.ElasticsearchStatusException
ElasticsearchStatusException[Unable to parse response body]; nested: ResponseException[method [POST], host [**********], URI [/_bulk?timeout=1m], status line [HTTP/1.1 403 Request throttled due to too many requests]
403 Request throttled due to too many requests /_bulk]; nested: ResponseException[method [POST], host [************], URI [/_bulk?timeout=1m], status line [HTTP/1.1 403 Request throttled due to too many requests]
403 Request throttled due to too many requests /_bulk];
I am using AWS Elasticsearch.
I think I need to implement a wait strategy to handle this, something like repeatedly checking the ES status and only calling bulk insert when ES reports it is okay.
But I am not sure how to implement it. Does ES offer anything pre-built for this?
Or is there a better way to handle it?
Thanks in advance.
Update:
I am using AWS Elasticsearch version 6.8.
Thanks #dravit for including my previous SO answer in the comments. After following the discussion, it seems the OP wants to improve the performance of bulk indexing and wants exponential backoff, which I don't think Elasticsearch provides out of the box.
I see that you are putting in a pause of 1 second after every second, which will not work in all cases, and if you have a large number of batches and documents to index it will certainly take a lot of time. A few more suggestions from my side to improve performance:
Follow my tips to improve reindexing speed in Elasticsearch, see which of the things listed there apply to you, and measure by what factor they improve your speed.
Find a batching strategy that best suits your environment. I am not sure, but this article from #spinscale, the developer of the Java high-level REST client, might help, or you can ask a question on https://discuss.elastic.co/. I remember he shared a very good batching strategy in one of his webinars but couldn't find the link to it.
Watch various ES metrics besides the bulk thread pool and queue size, and if your ES still has capacity, see whether you can increase the queue size and the rate at which you send requests to ES.
Check the error handling guide here:
If you receive persistent 403 Request throttled due to too many requests or 429 Too Many Requests errors, consider scaling vertically. Amazon Elasticsearch Service throttles requests if the payload would cause memory usage to exceed the maximum size of the Java heap.
Scale your application vertically or increase the delay between requests.
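Since the question asks whether ES offers anything pre-built: a wait strategy with exponential backoff is also easy to hand-roll around whatever client you use. A minimal sketch in Go, where sendBulk is a hypothetical stand-in for your client's _bulk call and the delays are arbitrary starting values:

package main

import (
	"errors"
	"fmt"
	"math/rand"
	"time"
)

// sendBulk is a hypothetical stand-in for whatever _bulk call your client makes.
// It should return the HTTP status code of the response.
func sendBulk(batch []string) (int, error) {
	// ... perform the real _bulk request here ...
	return 200, nil
}

// indexWithBackoff retries a throttled batch with exponentially growing delays.
func indexWithBackoff(batch []string, maxRetries int) error {
	delay := 500 * time.Millisecond
	for attempt := 0; attempt <= maxRetries; attempt++ {
		status, err := sendBulk(batch)
		if err != nil {
			return err // transport-level failure, not throttling
		}
		if status < 400 {
			return nil // batch accepted
		}
		if status != 403 && status != 429 {
			return fmt.Errorf("bulk request failed with status %d", status)
		}
		// 403 ("Request throttled") and 429 ("Too Many Requests") are the
		// throttling responses quoted above: back off and try again.
		// Jitter keeps parallel workers from retrying in lockstep.
		time.Sleep(delay + time.Duration(rand.Int63n(int64(delay/2))))
		delay *= 2
	}
	return errors.New("bulk request still throttled after all retries")
}

func main() {
	batch := []string{`{"index":{}}`, `{"field":"value"}`} // placeholder bulk lines
	if err := indexWithBackoff(batch, 5); err != nil {
		fmt.Println("giving up:", err)
	}
}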

Google Places API error 502 - The server encountered a temporary error

We run a website that obtains location data through the Google Places API. We have 150k daily searches available, which we haven't reached yet as the website has only been live for a few weeks. We have suddenly started receiving a 502 error. A notification in the Console says: “The server encountered a temporary error and could not complete your request.” Is this a temporary error? Are there any suggestions on what we can do? The website hasn't been available for 40 minutes.
When you receive a 5xx status or UNKNOWN_ERROR in the response, you should implement retry logic. Google has the following recommendation in their web services documentation:
In rare cases something may go wrong serving your request; you may receive a 4XX or 5XX HTTP response code, or the TCP connection may simply fail somewhere between your client and Google's server. Often it is worthwhile re-trying the request as the followup request may succeed when the original failed. However, it is important not to simply loop repeatedly making requests to Google's servers. This looping behavior can overload the network between your client and Google causing problems for many parties.
A better approach is to retry with increasing delays between attempts. Usually the delay is increased by a multiplicative factor with each attempt, an approach known as Exponential Backoff.
https://developers.google.com/maps/documentation/directions/web-service-best-practices#exponential-backoff
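The quoted advice translates into a small retry helper around the HTTP call. A minimal sketch in Go; the attempt count, initial delay, and request URL are placeholders, not values from Google's documentation:

package main

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

// getWithBackoff retries 5xx responses with exponentially increasing delays,
// as recommended in the documentation quoted above.
func getWithBackoff(url string, maxAttempts int) ([]byte, error) {
	delay := time.Second
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		resp, err := http.Get(url)
		if err == nil {
			body, readErr := io.ReadAll(resp.Body)
			resp.Body.Close()
			switch {
			case readErr == nil && resp.StatusCode < 400:
				return body, nil
			case readErr == nil && resp.StatusCode < 500:
				// A 4xx is not something retrying will fix.
				return nil, fmt.Errorf("request failed with status %d", resp.StatusCode)
			}
			// 5xx (or a read error): fall through and retry after a delay.
		}
		time.Sleep(delay)
		delay *= 2 // exponential backoff
	}
	return nil, fmt.Errorf("giving up on %s after %d attempts", url, maxAttempts)
}

func main() {
	// Placeholder URL; substitute your real Places API request.
	body, err := getWithBackoff("https://maps.googleapis.com/maps/api/place/details/json?placeid=PLACE_ID&key=API_KEY", 5)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(len(body), "bytes received")
}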
However, if retry logic with exponential backoff doesn't help and the error persists for a long time, you should file a bug in the Google issue tracker.
I hope this addresses your doubt!
UPDATE
There was an issue on Google's side yesterday (November 6, 2017); you can refer to the following bug report, which explains the issue:
https://issuetracker.google.com/issues/68938173