JMeter Test Result - Connect time is 100% of request (HTTP) sample time

I am using JMeter 5.1 to conduct performance testing (HTTP requests); the system under test is a web application on Google Cloud Platform. In JMeter I configured 10 threads (users), and I got error messages for the failed requests: failed: Connection timed out (Connection timed out)
Checking the results in "View Results in Table", I found that Connect time = Sample time and there is no Latency time.
Under this condition, what should I try in order to find the root cause? Is there any analysis direction or method?

You can take a look at this link; it explains some causes related to your issue.
In addition, if you are using a GCP HTTP(S) Load Balancer, you should take a look at the Stackdriver logs. They will provide more clues about your issue. You may need to change the timeout values for your load balancer or your backend.

This is due to a TCP connection timeout: if JMeter fails to complete the TCP handshake within the connect timeout, it will fail the associated HTTP Request sampler.
Latency (also known as TTFB) will be 0 because there is no response from the server at all.
The same goes for sent bytes and for header size in bytes.
If response times over 2 minutes are acceptable and expected for your application, you can increase the Connect timeout; the relevant setting lives under the "Advanced" tab of the HTTP Request sampler (or, even better, HTTP Request Defaults).
For example this is how you can increase the Connect timeout to 5 minutes:
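i.e. on the Advanced tab, set the "Connect" field under "Timeouts (milliseconds)" to 300000 (5 minutes expressed in milliseconds).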
With regard to the root cause, the reasons are numerous, e.g.:
wrong protocol/port combination
the port is blocked by the OS firewall or a GCP firewall rule (a quick way to rule this out is sketched after this list)
the application or VM is overloaded and cannot respond
incorrect configuration of the application server/database/other middleware
etc.
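To rule out the protocol/port and firewall causes, a plain TCP connect from the load generator machine tells you whether the handshake itself can complete. A minimal sketch (the host, port and timeout below are placeholders for your own values):

    import java.net.InetSocketAddress;
    import java.net.Socket;

    public class ConnectCheck {
        public static void main(String[] args) throws Exception {
            String host = "your-app.example.com";  // placeholder: host under test
            int port = 443;                        // placeholder: port JMeter targets
            int timeoutMs = 5000;                  // fail fast instead of waiting ~2 minutes
            try (Socket socket = new Socket()) {
                socket.connect(new InetSocketAddress(host, port), timeoutMs);
                System.out.println("TCP handshake OK");
            } catch (java.io.IOException e) {
                System.out.println("TCP handshake failed: " + e);
            }
        }
    }

If the handshake fails here as well, the problem is at the network/firewall level rather than anywhere in the JMeter test plan.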

Related

libcurl: send GET requests after timeout limit is reached

Problem:
OS: Ubuntu 20.04.1 LTS
Recently, when the target URL updates its content, libcurl has had unexpected polling delays / timeouts of anywhere between 2 and 20+ seconds between sending a GET request to the target URL and receiving any response.
I have no idea what has been causing this behaviour, and have detailed all of the strace reports, tshark results, entire libcurl C++ program, attempts to diagnose, and other terminal outputs at the following SO question, but have had no luck in diagnosing this for about four months:
libcurl: abnormal GET response delays
There seems to be something between the client server and remote server that is stopping packets from being returned, but only when the page changes its content. During this polling delay / timeout, no other requests can be sent - therefore any new data uploaded on the remote server cannot be retrieved quickly.
This issue did not exist before mid-July 2021. Given that after four months this problem still hasn't been solved, I want to attempt a workaround that will still send requests to the target when this polling delay presents itself. I won't understand what caused the polling timeouts, but hopefully I will be able to retrieve the data without delays like the program used to do.
Target URL: https://ir.eia.gov/wpsr/table4.csv
Summary questions:
Q1. Is there a timeout option with libcurl that, when exceeded, the program does not exit but instead sends another GET request to the target URL?
Q2. Since this problem only arises when the target URL makes a scheduled content update, could there be a chance the target URL changes its IP address and thus there is some delay caused by a DNS resolution server in between the client and remote side on the return leg? I am going to attempt to use a tool like pingPlotter to see if there is a delay at some specific IP address between the outbound GET request and the response.
Before any scheduled page content changes, the latency between the outbound GET request and the response is <100 ms.
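Regarding Q1: libcurl itself will not retry for you; it exposes CURLOPT_CONNECTTIMEOUT_MS and CURLOPT_TIMEOUT_MS, and curl_easy_perform() returns CURLE_OPERATION_TIMEDOUT when the limit is exceeded, so the "send another GET" part has to be a loop in your own code. Purely to illustrate that retry-on-timeout pattern (shown here in Java with java.net.http.HttpClient; the timeout and retry values are arbitrary examples):

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.time.Duration;

    public class PollWithRetry {
        public static void main(String[] args) throws Exception {
            HttpClient client = HttpClient.newBuilder()
                    .connectTimeout(Duration.ofSeconds(5))       // connect timeout
                    .build();
            HttpRequest request = HttpRequest.newBuilder(
                        URI.create("https://ir.eia.gov/wpsr/table4.csv"))
                    .timeout(Duration.ofSeconds(10))             // whole-request timeout
                    .GET()
                    .build();
            for (int attempt = 1; attempt <= 5; attempt++) {     // bounded number of retries
                try {
                    HttpResponse<String> response =
                            client.send(request, HttpResponse.BodyHandlers.ofString());
                    System.out.println("HTTP " + response.statusCode()
                            + ", " + response.body().length() + " bytes");
                    break;                                       // success: stop retrying
                } catch (java.net.http.HttpTimeoutException e) {
                    System.out.println("attempt " + attempt + " timed out, retrying");
                }
            }
        }
    }

With libcurl the equivalent is checking whether curl_easy_perform() returned CURLE_OPERATION_TIMEDOUT and re-issuing the transfer, ideally with a short pause so the retries do not hammer the server during the delay window.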

Got an error reading communication packets in Google Cloud SQL

Since March 31st I've been getting the following error in Google Cloud SQL:
Got an error reading communication packets.
I have been using Google Cloud SQL for 2 years, but have never faced such a problem before.
I'm very worried about it.
This is the detailed error message:
textPayload: "2019-04-29T17:21:26.007574Z 203385 [Note] Aborted connection 203385 to db: {db_name} user: {db_username} host: 'cloudsqlproxy~{private ip}' (Got an error reading communication packets)"
While it is true that this error message often occurs after a maintenance period, it isn't necessarily a cause for concern, as this is a known behavior of MySQL.
Possible explanations for why this issue is happening are:
A large increase of connection requests to the instance, with the number of active connections increasing over a short period of time.
Freezing / unavailability of the instance can also occur due to a burst of connections happening in a very short time interval. It is observed that this freezing always happens with an increase of connection requests. This increase in connections causes the instance to be overloaded and hence unavailable to respond to further connection requests until the number of connections decreases or the instance stabilizes.
The server was too busy to accept new connections.
There were high rates of previous connections that were not closed correctly.
The client terminated the connection abnormally.
The readTimeout setting being set too low in the MySQL driver (see the JDBC sketch below).
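On the readTimeout point: with MySQL Connector/J the connect and read timeouts can be set directly on the JDBC URL. A minimal sketch, where the host, database and credentials are placeholders and the millisecond values are only illustrative:

    import java.sql.Connection;
    import java.sql.DriverManager;

    public class TimeoutAwareConnect {
        public static void main(String[] args) throws Exception {
            // Placeholders: host, database, user and password are not from the question.
            String url = "jdbc:mysql://10.0.0.5:3306/mydb"
                    + "?connectTimeout=10000"   // give up on establishing the connection after 10 s
                    + "&socketTimeout=60000";   // give up on reads that stall for more than 60 s
            try (Connection conn = DriverManager.getConnection(url, "db_user", "db_pass")) {
                System.out.println("connected: " + !conn.isClosed());
            }
        }
    }

Setting these too low produces exactly the kind of aborted connections the log line above complains about, so it is one of the simpler things to rule out.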
In an excerpt from the documentation, it is stated that:
There are many reasons why a connection attempt might not succeed. Network communication is never guaranteed, and the database might be temporarily unable to respond. Make sure your application handles broken or unsuccessful connections gracefully.
An outdated Cloud SQL Proxy version can also be the reason for such incidents; upgrading to the latest version (v1.23.0) is a possible troubleshooting step.
The IP from which you are trying to connect may not be added to the Authorized Networks of the Cloud SQL instance.
Some possible workarounds for this issue, depending on your case, could be one of the following:
In the case that the issue is related to a high load, you could retry the connection using an exponential backoff to prevent sending too many simultaneous connection requests. The best practice here is to exponentially back off your connection requests and add randomized backoffs to avoid throttling and potentially overloading the instance. As a way to mitigate this issue in the future, it is recommended that connection requests be spaced out to prevent overloading (a minimal sketch of this pattern follows below). Depending on how you are connecting to Cloud SQL, exponential backoffs may already be in use by default with certain ORM packages.
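For illustration, one way to implement the exponential backoff with jitter described above (the JDBC URL, credentials and retry limits are placeholder values):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;
    import java.util.concurrent.ThreadLocalRandom;

    public class BackoffConnect {
        // Placeholder connection details.
        private static final String URL = "jdbc:mysql://10.0.0.5:3306/mydb";

        public static Connection connectWithBackoff(int maxAttempts) throws Exception {
            long delayMs = 250;                                  // initial backoff
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    return DriverManager.getConnection(URL, "db_user", "db_pass");
                } catch (SQLException e) {
                    if (attempt == maxAttempts) throw e;         // give up after the last attempt
                    long jitter = ThreadLocalRandom.current().nextLong(delayMs / 2 + 1);
                    Thread.sleep(delayMs + jitter);              // randomized, growing pause
                    delayMs = Math.min(delayMs * 2, 30_000);     // cap the exponential growth
                }
            }
            throw new IllegalStateException("unreachable");
        }
    }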
If the issue could be related to an accumulation of long-running inactive connections, you can find out whether that is your case by running show full processlist on your database and looking for connections with a high Time, or connections where Command is Sleep (a small sketch of checking this from a client follows below).
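For example, a quick way to list suspiciously idle connections from any MySQL client, sketched here in Java (the connection details and the 600-second threshold are arbitrary examples):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class IdleConnectionReport {
        public static void main(String[] args) throws Exception {
            // Placeholder connection details.
            String url = "jdbc:mysql://10.0.0.5:3306/mydb";
            try (Connection conn = DriverManager.getConnection(url, "db_user", "db_pass");
                 Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery("SHOW FULL PROCESSLIST")) {
                while (rs.next()) {
                    String command = rs.getString("Command");
                    long idleSeconds = rs.getLong("Time");
                    if ("Sleep".equals(command) && idleSeconds > 600) {  // arbitrary threshold
                        System.out.printf("connection %d from %s idle for %d s%n",
                                rs.getLong("Id"), rs.getString("Host"), idleSeconds);
                    }
                }
            }
        }
    }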
If this is your case you have a few possible options:
If you are not using a connection pool, you could update the client application logic to properly close connections immediately at the end of an operation, or use a connection pool to limit the lifetime of your connections. In particular, it is ideal to manage the connection count by using a connection pool: unused connections are recycled, and the number of simultaneous connection requests can be limited through the maximum pool size parameter (see the pool sketch below).
If you are using a connection pool, you could return idle connections to the pool immediately at the end of an operation and set a shorter timeout by adjusting the wait_timeout or interactive_timeout flag values. Set the Cloud SQL wait_timeout flag to 600 seconds to force connections to be refreshed.
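As one example of the pooling approach, a minimal sketch using HikariCP (the choice of library, the pool size and the lifetimes are illustrative, not prescribed by Cloud SQL):

    import com.zaxxer.hikari.HikariConfig;
    import com.zaxxer.hikari.HikariDataSource;

    public class PooledDataSource {
        public static HikariDataSource create() {
            HikariConfig config = new HikariConfig();
            // Placeholder connection details.
            config.setJdbcUrl("jdbc:mysql://10.0.0.5:3306/mydb");
            config.setUsername("db_user");
            config.setPassword("db_pass");
            config.setMaximumPoolSize(10);    // cap simultaneous connections to the instance
            config.setIdleTimeout(60_000);    // retire connections idle for more than 60 s
            config.setMaxLifetime(570_000);   // keep below the 600 s wait_timeout suggested above
            return new HikariDataSource(config);
        }
    }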
To check network and port connectivity:
Step 1. Confirm TCP connectivity on port 3306 with tcptraceroute or netcat.
Step 2. If Step 1 succeeded, connect with the mysql client and check for any timeout or error.
When the client might be terminating the connection abruptly, you could check for the following:
If the MySQL client or mysqld server is receiving a packet bigger than max_allowed_packet bytes, or the client is receiving a "packet too large" message, you could send smaller packets or increase the max_allowed_packet flag value on both client and server.
If there are transactions that are not being properly committed using both "begin" and "commit", the client application logic needs to be updated to properly commit the transaction.
There are several utilities that I think will be helpful here: if you can, install the mtr and tcpdump utilities to monitor the packets during these connection-increasing events.
It is strongly recommended to enable the general_log database flag. Another suggestion is to also enable the slow_query_log database flag and output it to a file. Also have a look at this GitHub issue comment and go through the list of additional solutions proposed for this issue here.
This error message indicates a connection issue, either because your application doesn't terminate connections properly or because of a network issue.
As suggested in these troubleshooting steps for MySQL or PostgreSQL instances from the GCP docs, you can start debugging by checking that you follow best practices for managing database connections.

AWS, Load Balancer 504 error after a few requests

I am repeating a question that I posted at https://forums.aws.amazon.com/thread.jspa?threadID=275855&tstart=0 to reach more people.
Hi,
I am trying to deploy a REST service in AWS. The current architecture is:
Domain name (Route 53) -> Load Balancer -> Single EC2 instance (bound to an Elastic IP). And I use TLS/SSL certificate issued by a Certificate Manager.
The instance is Ubuntu 16.04 machine, and the service is implemented with (bare) Vert.X (==no proxy server).
However, a 504 error (gateway timeout) occurs after a few different requests (each of which takes <1 s) in a series, and then it stops responding. After those first few requests, the requests no longer reach the server instance. I checked that it happens in the same way whether I access the domain name or the load balancer directly, and I have confirmed that the exact same scenario works when I use the instance's URL directly.
I spun up a dummy server returning "hello world" and it works fine with the load balancer. The problem must be caused by some incompatibility between the load balancer and the server code, but I can't figure out where to start.
I have checked several threads complaining about 504 errors and followed some of the instructions, but they did not work. In particular, I set the keep-alive option in Vert.x and set the idle timeout longer than the balancer's. Since the delays with direct communication are never longer than that idle timeout, I believe it is not the problem anyway. I have also checked the Security Groups and confirmed that the right ports are open. (The first few requests work, so that cannot be the problem either.)
Does any of you have a sense where I should start looking at? Even better, know the source of the problem?
Thanks in advance.
EDIT: I just found the issue in some of the code. I've answered myself below. Thanks for reading!
Found the issue in my code. Some of the APIs (implemented by my colleague...) were not flushing the buffer of the HTTP responses in the server.
In Vert.X Java, it was resp.end().
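For illustration, a minimal Vert.x handler that ends (and therefore flushes) every response might look like this (the port and payload are made up for the example):

    import io.vertx.core.Vertx;
    import io.vertx.core.http.HttpServerResponse;

    public class Server {
        public static void main(String[] args) {
            Vertx vertx = Vertx.vertx();
            vertx.createHttpServer()
                 .requestHandler(req -> {
                     HttpServerResponse resp = req.response();
                     resp.putHeader("content-type", "application/json");
                     // end() writes the body and completes/flushes the response;
                     // without it the load balancer never sees the response finish.
                     resp.end("{\"status\":\"ok\"}");
                 })
                 .listen(8080);
        }
    }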
It was somehow working with direct access, probably because the buffer was flushed at some point, but that flush apparently wasn't seen by the load balancer.
Hope nobody experiences this, but in case...

WebRequest timed out in windows service

I use WebRequest in a client to consume a web service on Internet. Each request is triggered in a separate thread.
It works well when the client is hosted in IIS, but most of the requests get a timeout error when the client is hosted in a Windows service.
When I tried to debug the problem using Fiddler, the WebRequest worked well as all traffic went through 127.0.0.1:8888
Without Fiddler, the traffic goes to Internet directly through a random port, and the time out problem hits again.
The windows service runs under Local System account.
Why do I get timeouts when the client runs in a Windows service without using a proxy?
Update: My original question wasn't clear. The requests are made concurrently (or at very short intervals). This has to do with the connection limit in the ServicePoint class. By default only 2 connections are allowed to the same external destination; if the destination is local, the limit is int.MaxValue. That's why Fiddler can magically fix the problem with its proxy. So I manually set DefaultConnectionLimit to 100 and the requests are on the wire.
Adjusting HttpWebRequest Connection Timeout in C#
The most common source of problems that is "magically" fixed by running Fiddler is when your .NET code fails to call Close() on the object returned by GetResponseStream(). See http://www.telerik.com/automated-testing-tools/blog/13-02-28/help-running-fiddler-fixes-my-app.aspx for more details.

How would Zeus(load balancer) handle the closed connection

Currently, we have run into a timeout issue. Our application is based on Jetty and uses Zeus as a load balancer. maxIdleTime is set to its default value of 30000 in jetty.xml. When a request/connection exceeds 30 seconds, the connection status changes to TIME_WAIT, but we get an HTTP 500 Internal Error on the browser side.
I guess the HTTP 500 error comes from Zeus but I want to confirm this: how would Zeus handle the closed connection?
OR
Does the Jetty service send the 500 to Zeus? If so, how can I confirm this?
The surefire way to iron out what is happening here is to sniff the packets between the load balancer and the Jetty server using something like Ethereal (now Wireshark) or tcpdump, and to use the network tooling in something like Firebug or the Chrome developer tools to see what is happening on that side of the connection. You can also turn on debug logging on the Jetty side to see what it is doing specifically.
Regardless, if you're hitting your timeout settings then you need to either increase those settings or decide on a proper strategy to deal with them, assuming you don't want that 500 error in the browser.