PoolingHttpClientConnectionManager - TTL Constructor & related issues - webservicetemplate

I was creating the HttpComponentsMessageSender bean as below:
@Bean
public HttpComponentsMessageSender reservationHttpComponent() {
    HttpComponentsMessageSender httpComponentsMessageSender = new HttpComponentsMessageSender();
    httpComponentsMessageSender.setConnectionTimeout(reservationConnectionTimeOut);
    httpComponentsMessageSender.setReadTimeout(reservationReadTimeOut);
    return httpComponentsMessageSender;
}
With this I was getting an intermittent read timeout: the first request after a 30-minute break failed with a read timeout, and all further transactions succeeded. After another 30-minute break, the first transaction again failed with a read timeout and the following ones succeeded.
I tried fixing it with the code below:
@Bean
public HttpComponentsMessageSender reservationHttpComponent() {
    RequestConfig requestBuilder = RequestConfig.custom()
            .setSocketTimeout(reservationReadTimeOut)
            .setConnectionRequestTimeout(reservationConnectionTimeOut)
            .setConnectTimeout(reservationConnectionTimeOut)
            .setCircularRedirectsAllowed(false)
            .build();
    org.apache.http.client.HttpClient httpClient = HttpClientBuilder.create()
            .setConnectionManager(getConnManager())
            .addInterceptorFirst(new HttpComponentsMessageSender.RemoveSoapHeadersInterceptor())
            .setDefaultRequestConfig(requestBuilder)
            .build();
    HttpComponentsMessageSender messageSender = new HttpComponentsMessageSender();
    messageSender.setHttpClient(httpClient);
    return messageSender;
}
private PoolingHttpClientConnectionManager getConnManager() {
    PoolingHttpClientConnectionManager connectionManager = new PoolingHttpClientConnectionManager();
    connectionManager.setDefaultMaxPerRoute(connectionManagerDefaultMaxPerRoute);
    connectionManager.setMaxTotal(connectionManagerMaxTotal);
    connectionManager.setDefaultSocketConfig(SocketConfig.custom()
            .setSoTimeout(reservationReadTimeOut).build());
    return connectionManager;
}
The above resolved the read timeout issue, but then I started getting the following error:
message:Error Occurred While Retrieving Reservation - I/O error: Timeout waiting for connection from pool; nested exception is org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for connection from pool
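For context, this exception is raised when a request waits longer than the connection request timeout for a free pooled connection. A minimal Java sketch of that mechanism follows, assuming a hypothetical `BoundedPoolSketch` class built on a semaphore. It is a simplified model for reasoning about the error, not HttpClient's implementation.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Hypothetical, simplified stand-in for a bounded connection pool (not HttpClient code). */
class BoundedPoolSketch {
    private final Semaphore permits;

    /** maxTotal plays the role of the pool's maximum number of connections. */
    BoundedPoolSketch(int maxTotal) {
        this.permits = new Semaphore(maxTotal);
    }

    /** Tries to lease a slot; returns false if none frees up within the timeout. */
    boolean tryLease(long timeoutMillis) {
        try {
            return permits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    /** Returns a slot to the pool; a caller that skips this leaks capacity. */
    void release() {
        permits.release();
    }

    /** Number of slots currently free. */
    int available() {
        return permits.availablePermits();
    }
}
```

In this model a caller that never calls `release()` permanently removes capacity, which is one common way such a pool runs dry. Whether that is what happened here cannot be determined from the snippet alone.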
While looking for a fix, I tried initializing PoolingHttpClientConnectionManager with a TTL:
PoolingHttpClientConnectionManager connectionManager = new PoolingHttpClientConnectionManager(1, TimeUnit.MINUTES);
Then I ran multiple performance test scripts (.jmx), firing 70-80 TPS at my service, which uses the code above for its external dependency, and everything looked good.
However, I am not sure whether passing a TTL to the PoolingHttpClientConnectionManager constructor is the most appropriate solution. I am looking for suggestions on whether this solution can cause any further issues, or whether there is a better approach.
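One way to picture what the TTL argument does: each pooled connection gets a fixed maximum lifetime counted from its creation, after which it is discarded instead of being reused. The sketch below is a simplified Java model of that rule. `PooledEntrySketch` and its fields are hypothetical names chosen for illustration, not HttpClient classes.

```java
/** Hypothetical, simplified model of a pooled connection with a total time to live. */
class PooledEntrySketch {
    private final long createdAtMillis;
    private final long ttlMillis;

    PooledEntrySketch(long createdAtMillis, long ttlMillis) {
        this.createdAtMillis = createdAtMillis;
        this.ttlMillis = ttlMillis;
    }

    /** True once the entry has lived for the full TTL, however recently it was used. */
    boolean isExpired(long nowMillis) {
        return nowMillis - createdAtMillis >= ttlMillis;
    }
}
```

With a 1-minute TTL, a connection left idle for 30 minutes is already expired when the next request arrives, so a fresh one is opened rather than reusing a socket the remote side may have dropped. The trade-off is that busy connections are also recycled every minute. HttpClient 4.4 and later additionally offer `evictExpiredConnections()` and `evictIdleConnections(long, TimeUnit)` on `HttpClientBuilder`, which may be worth evaluating alongside the TTL.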

Related

Nest.js hanging after a certain time without any logs

We have been using Nest.JS as our backend solution for almost 2 years now. It worked fine and didn't give us any big issues until this last week. As the title describes, it hangs all of a sudden, without producing any kind of log. We use it with pm2, and it is hosted on an AWS t3.medium machine running Ubuntu.
We are using the exception filter provided by nest.js in their documentation:
import {
  ExceptionFilter,
  Catch,
  ArgumentsHost,
  HttpException,
  HttpStatus,
} from '@nestjs/common';
import { HttpAdapterHost } from '@nestjs/core';
@Catch()
export class AllExceptionsFilter implements ExceptionFilter {
  constructor(private readonly httpAdapterHost: HttpAdapterHost) {}
  catch(exception: unknown, host: ArgumentsHost): void {
    // In certain situations `httpAdapter` might not be available in the
    // constructor method, thus we should resolve it here.
    const { httpAdapter } = this.httpAdapterHost;
    const ctx = host.switchToHttp();
    const httpStatus =
      exception instanceof HttpException
        ? exception.getStatus()
        : HttpStatus.INTERNAL_SERVER_ERROR;
    const responseBody = {
      statusCode: httpStatus,
      timestamp: new Date().toISOString(),
      path: httpAdapter.getRequestUrl(ctx.getRequest()),
    };
    httpAdapter.reply(ctx.getResponse(), responseBody, httpStatus);
  }
}
Is it possible to log the reason why the application shut down and the specific route that caused the problem? That would make it easier to fix.
Edit: We have also subscribed to Sentry, but those logs don't reach it either.

PubSub "Connection reset by peer" on gcp

I encountered the following exception:
"System.IO.IOException: Unable to read data from the transport connection: Connection reset by peer.\n ---> System.Net.Sockets.SocketException (104): Connection reset by peer\n --- End of inner exception stack trace ---\n at Google.Cloud.PubSub.V1.SubscriberClientImpl.SingleChannel.HandleRpcFailure(Exception e)\n at Google.Cloud.PubSub.V1.SubscriberClientImpl.SingleChannel.HandlePullMoveNext(Task initTask)\n at Google.Cloud.PubSub.V1.SubscriberClientImpl.SingleChannel.StartAsync()\n at Google.Cloud.PubSub.V1.Tasks.ForwardingAwaiter.GetResult()\n at Google.Cloud.PubSub.V1.Tasks.Extensions.<>c__DisplayClass4_0.<g__Inner|0>d.MoveNext()\n--- End of stack trace from previous location ---\n
The "Invoke" function is executed by the scheduler every 5 seconds to pull a message from my topic:
public async Task Invoke()
{
    var subscriber = await SubscriberClient.CreateAsync(CreateSubscriptionName());
    await subscriber.StartAsync((msg, cancellationToken) =>
    {
        //....
        return Task.FromResult(SubscriberClient.Reply.Ack);
    });
    await subscriber.StopAsync(CancellationToken.None);
}
How can I fix this?
Thanks!
I've already checked the docs:
PublisherClient and SubscriberClient are expensive to create, so when regularly publishing or subscribing to the same topic or subscription then a singleton client instance should be created and used for the lifetime of the application.
But I still don't know how to do that.
My guess is that I left too many connections open?

AWS .Net Core SDK Simple Email Service Suppression List Not Working

I am trying to retrieve the SES account-level suppression list using AWS SDK in .Net Core:
Below is my code:
public class SimpleEmailServiceUtility : ISimpleEmailServiceUtility
{
    private readonly IAmazonSimpleEmailServiceV2 _client;
    public SimpleEmailServiceUtility(IAmazonSimpleEmailServiceV2 client)
    {
        _client = client;
    }
    public async Task<ListSuppressedDestinationsResponse> GetSuppressionList()
    {
        ListSuppressedDestinationsRequest request = new ListSuppressedDestinationsRequest();
        request.PageSize = 10;
        ListSuppressedDestinationsResponse response = new ListSuppressedDestinationsResponse();
        try
        {
            response = await _client.ListSuppressedDestinationsAsync(request);
        }
        catch (Exception ex)
        {
            Console.WriteLine("ListSuppressedDestinationsAsync failed with exception: " + ex.Message);
        }
        return response;
    }
}
But it doesn't seem to be working. The request takes too long and then returns an empty response, or the error below if I remove the try/catch:
An unhandled exception occurred while processing the request.
TaskCanceledException: A task was canceled.
System.Threading.Tasks.TaskCompletionSourceWithCancellation<T>.WaitWithCancellationAsync(CancellationToken cancellationToken)
TimeoutException: A task was canceled.
Amazon.Runtime.HttpWebRequestMessage.GetResponseAsync(CancellationToken cancellationToken)
Can anyone please advise if I am missing something?
Thank you!
I have tested your code and everything works correctly.
using Amazon;
using Amazon.SimpleEmailV2;
using Amazon.SimpleEmailV2.Model;
internal class Program
{
    private async static Task Main(string[] args)
    {
        var client = new AmazonSimpleEmailServiceV2Client("accessKeyId", "secretAccessKey", RegionEndpoint.USEast1);
        var utility = new SimpleEmailServiceUtility(client);
        var result = await utility.GetSuppressionList();
    }
}
<PackageReference Include="AWSSDK.SimpleEmailV2" Version="3.7.1.127" />
Things that you can check:
Try again, maybe it was a temporary problem.
Try with the latest version, the one I am using (if you are not already).
How far are you from the region you are querying? Try making the same request from an EC2 instance in that region.
Finally found the issue: I was using `awsConfig.DefaultClientConfig.UseHttp = true;` in startup, which was causing the problem. Removing it fixed the issue and everything seems to be working fine now.

GCP Pub/Sub: Acknowledgement deadline doesn't work when reading messages

After creating a subscription on GCP Pub/Sub with "Acknowledgement deadline" set to its maximum value (600 seconds), my Java client in Spring Boot keeps receiving the same message every 60 seconds while the task is still running.
We have a simple consumer very similar to this:
https://spring.io/guides/gs/messaging-gcp-pubsub/
We need to perform operations which can run for some minutes each, but even setting 600 seconds, if the task hasn't finished after 60 seconds, the message arrives again.
Has anyone experienced something similar?
Thanks
Update:
These are the main dependencies:
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>2.1.10.RELEASE</version>
</parent>
<spring-cloud-gcp.version>1.2.6.RELEASE</spring-cloud-gcp.version>
<dependency>
<groupId>org.springframework.cloud</groupId>
<artifactId>spring-cloud-gcp-starter-pubsub</artifactId>
</dependency>
And this is the setup in the consumer (it's exactly as the spring boot guide linked above explains):
@Bean
public PubSubInboundChannelAdapter messageChannelAdapter(
        @Qualifier("pubsubInputChannel") MessageChannel inputChannel,
        PubSubTemplate pubSubTemplate) {
    LOGGER.info("pubsubInputChannel");
    PubSubInboundChannelAdapter adapter =
            new PubSubInboundChannelAdapter(pubSubTemplate, subscription);
    adapter.setOutputChannel(inputChannel);
    adapter.setAckMode(AckMode.AUTO_ACK);
    return adapter;
}
@Bean
public MessageChannel pubsubInputChannel() {
    return new DirectChannel();
}
@Bean
@ServiceActivator(inputChannel = "pubsubInputChannel")
public MessageHandler messageReceiver() {
    return message -> {
        String json = new String((byte[]) message.getPayload());
        LOGGER.info("Message arrived! Payload: " + json);
        //Main code here
        //Operations might take some minutes to be finished
    };
}

Azure Webjob, KeyVault Configuration extension, Socket Error

I need some help determining whether this is a bug in my code or in the Key Vault configuration extension.
I have a .NET Core console-based WebJob. It was all working fine until a few weeks ago, when we started getting occasional startup errors: Socket Error 10060 - Socket timed out, or "A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond"
These were all related to loading configuration layers (app settings, env, command line and Key Vault). The errors stemmed from Key Vault once Build was executed on the host builder.
I initially added the retry policy with the default HttpStatusCodeErrorDetectionStrategy and an exponential back-off, but it is not executing.
Finally, I added my own retry policy with my own detection strategy (see below). It is still not being fired.
I have stripped the code down to a hello-world-like example and included the messages from the WebJob.
Here is the code summary:
Main
public static async Task<int> Main(string[] args)
{
    var host = CreateHostBuilder(args)
        .UseConsoleLifetime()
        .Build();
    using var serviceScope = host.Services.CreateScope();
    var services = serviceScope.ServiceProvider;
    //**stripped down to logging just for debug
    var loggerFactory = host.Services.GetRequiredService<ILoggerFactory>();
    var logger = loggerFactory.CreateLogger("Main");
    logger.LogDebug("Hello Test App Started OK. Exiting.");
    //**Normally lots of service calls go here to do real work**
    return 0;
}
HostBuilder - why hostbuilder? We use lots of components that are built for webapi and webapps so it was convenient to use a similar services model.
public static IHostBuilder CreateHostBuilder(string[] args)
{
    var host = Host
        .CreateDefaultBuilder(args)
        .ConfigureAppConfiguration((ctx, config) =>
        {
            //override with keyvault
            var azureServiceTokenProvider = new AzureServiceTokenProvider(); //this is awesome - it will use MSI or Visual Studio connection
            var keyVaultClient = new KeyVaultClient(new KeyVaultClient.AuthenticationCallback(azureServiceTokenProvider.KeyVaultTokenCallback));
            var retryPolicy = new RetryPolicy<ServerErrorDetectionStrategy>(
                new ExponentialBackoffRetryStrategy(
                    retryCount: 5,
                    minBackoff: TimeSpan.FromSeconds(1.0),
                    maxBackoff: TimeSpan.FromSeconds(16.0),
                    deltaBackoff: TimeSpan.FromSeconds(2.0)
                )
            );
            retryPolicy.Retrying += RetryPolicy_Retrying;
            keyVaultClient.SetRetryPolicy(retryPolicy);
            var prebuiltConfig = config.Build();
            config.AddAzureKeyVault(prebuiltConfig.GetSection("KeyVaultSettings").GetValue<string>("KeyVaultUri"), keyVaultClient, new DefaultKeyVaultSecretManager());
            config.AddCommandLine(args);
        })
        .ConfigureLogging((ctx, loggingBuilder) => //note - this is run AFTER app configuration - whatever the order it is in.
        {
            loggingBuilder.ClearProviders();
            loggingBuilder
                .AddConsole()
                .AddDebug()
                .AddApplicationInsightsWebJobs(config => config.InstrumentationKey = ctx.Configuration["APPINSIGHTS_INSTRUMENTATIONKEY"]);
        })
        .ConfigureServices((ctx, services) =>
        {
            services
                .AddApplicationInsightsTelemetry();
            services
                .AddOptions();
        });
    return host;
}
Event - this is never fired.
private static void RetryPolicy_Retrying(object sender, RetryingEventArgs e)
{
    Console.WriteLine($"Retrying, count = {e.CurrentRetryCount}, Last Exception={e.LastException}, Delay={e.Delay}");
}
Retry Policy - only fires for the non-MSI attempt to contact the keyvault.
public class ServerErrorDetectionStrategy : ITransientErrorDetectionStrategy
{
    public bool IsTransient(Exception ex)
    {
        if (ex != null)
        {
            Console.WriteLine($"Exception {ex.Message} received, {ex.GetType()?.FullName}");
            HttpRequestWithStatusException httpException;
            if ((httpException = ex as HttpRequestWithStatusException) != null)
            {
                switch(httpException.StatusCode)
                {
                    case HttpStatusCode.RequestTimeout:
                    case HttpStatusCode.GatewayTimeout:
                    case HttpStatusCode.InternalServerError:
                    case HttpStatusCode.ServiceUnavailable:
                        return true;
                }
            }
            SocketException socketException;
            if((socketException = (ex as SocketException)) != null)
            {
                Console.WriteLine($"Exception {socketException.Message} received, Error Code: {socketException.ErrorCode}, SocketErrorCode: {socketException.SocketErrorCode}");
                if (socketException.SocketErrorCode == SocketError.TimedOut)
                {
                    return true;
                }
            }
        }
        return false;
    }
}
WebJob Output
[SYS INFO] Status changed to Initializing
[SYS INFO] Run script 'run.cmd' with script host - 'WindowsScriptHost'
[SYS INFO] Status changed to Running
[INFO]
[INFO] D:\local\Temp\jobs\triggered\HelloWebJob\42wj5ipx.ukj>dotnet HelloWebJob.dll
[INFO] Exception Response status code indicates server error: 401 (Unauthorized). received, Microsoft.Rest.TransientFaultHandling.HttpRequestWithStatusException
[INFO] Exception A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. received, System.Net.Http.HttpRequestException
[ERR ] Unhandled exception. System.Net.Http.HttpRequestException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
[ERR ] ---> System.Net.Sockets.SocketException (10060): A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
[ERR ] at System.Net.Http.ConnectHelper.ConnectAsync(String host, Int32 port, CancellationToken cancellationToken)
[ERR ] --- End of inner exception stack trace ---
[ERR ] at Microsoft.Rest.RetryDelegatingHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
[ERR ] at System.Net.Http.HttpClient.FinishSendAsyncBuffered(Task`1 sendTask, HttpRequestMessage request, CancellationTokenSource cts, Boolean disposeCts)
[ERR ] at Microsoft.Azure.KeyVault.KeyVaultClient.GetSecretWithHttpMessagesAsync(String vaultBaseUrl, String secretName, String secretVersion, Dictionary`2 customHeaders, CancellationToken cancellationToken)
[ERR ] at Microsoft.Azure.KeyVault.KeyVaultClientExtensions.GetSecretAsync(IKeyVaultClient operations, String secretIdentifier, CancellationToken cancellationToken)
[ERR ] at Microsoft.Extensions.Configuration.AzureKeyVault.AzureKeyVaultConfigurationProvider.LoadAsync()
[ERR ] at Microsoft.Extensions.Configuration.AzureKeyVault.AzureKeyVaultConfigurationProvider.Load()
[ERR ] at Microsoft.Extensions.Configuration.ConfigurationRoot..ctor(IList`1 providers)
[ERR ] at Microsoft.Extensions.Configuration.ConfigurationBuilder.Build()
[ERR ] at Microsoft.Extensions.Hosting.HostBuilder.BuildAppConfiguration()
[ERR ] at Microsoft.Extensions.Hosting.HostBuilder.Build()
[ERR ] at HelloWebJob.Program.Main(String[] args) in C:\Users\mark\Source\Repos\HelloWebJob\HelloWebJob\Program.cs:line 21
[ERR ] at HelloWebJob.Program.<Main>(String[] args)
[SYS INFO] Status changed to Failed
[SYS ERR ] Job failed due to exit code -532462766
This is an issue in the KV connectivity which is identified by the PG. Below is an official statement from Product Group:
The Microsoft Azure App Service Team has identified an issue with the
Key Vault references for App Service and Azure Functions feature
related to intermittent failure to resolve references at runtime.
Engineers identified a regression in the system that reduced the
performance and availability of our scale unit’s ability to retrieve
key vault references at runtime. A patch has been written and deployed
to our fleet of VMs to mitigate this issue.
We are continuously taking steps to improve the Azure Web App service
and our processes to ensure such incidents do not occur in the future,
and in this case, it includes (but is not limited to): Improving
detection and testing of performance and availability of the Key Vault
App Setting References feature Improvements to our platform to ensure
high availability of this feature at runtime. We apologize for any
inconvenience.
For almost everyone, updating packages to the new Microsoft.Azure packages has mitigated this issue, so trying those would be my first suggestion.
Thanks @HarshitaSingh-MSFT, that makes sense, though I searched for this when I had the problem and couldn't find it.
As a workaround, I added some basic retry code to the startup.
Main looks like this for now:
public static async Task<int> Main(string[] args)
{
    IHost host = null;
    int retries = 5;
    while (true)
    {
        try
        {
            Console.WriteLine("Building Host...");
            var hostBuilder = CreateHostBuilder(args)
                .UseConsoleLifetime();
            host = hostBuilder.Build();
            break;
        }
        catch (HttpRequestException hEx)
        {
            Console.WriteLine($"HTTP Exception in host builder. {hEx.Message}, Name:{hEx.GetType().Name}");
            // Retry only socket timeouts, and only while attempts remain.
            // Anything else is rethrown so an unexpected failure cannot loop forever.
            var isSocketTimeout = hEx.InnerException is SocketException se
                && se.SocketErrorCode == SocketError.TimedOut;
            if (!isSocketTimeout || retries <= 0)
            {
                throw;
            }
            Console.WriteLine("Socket error in host builder. Retrying...");
            retries--;
            await Task.Delay(5000);
            host?.Dispose();
        }
    }
    using var serviceScope = host.Services.CreateScope();
    var services = serviceScope.ServiceProvider;
    var transferService = services.GetRequiredService<IRunPinTransfer>();
    var result = await transferService.ProcessAsync();
    return result;
}