Frequent HttpRequestException when talking to Cosmos DB from Web App - web-services

We have a web service (Azure App Service) deployed to Azure that talks to our Azure Cosmos DB via the standard C# SDK for Cosmos DB/Document DB.
Both - app service and Cosmos DB account/collections - are in the same resource group and in the same location in Azure.
For certain bulk operations where the web service performs a burst of requests to Cosmos DB we are frequently getting errors in the web service when talking to the database:
System.AggregateException: One or more errors occurred. ---> System.Net.Http.HttpRequestException: An error occurred while sending the request. ---> System.Net.WebException: Unable to connect to the remote server ---> System.Net.Sockets.SocketException: An attempt was made to access a socket in a way forbidden by its access permissions
at System.Net.Sockets.Socket.DoBind(EndPoint endPointSnapshot, SocketAddress socketAddress)
at System.Net.Sockets.Socket.InternalBind(EndPoint localEP)
at System.Net.Sockets.Socket.BeginConnectEx(EndPoint remoteEP, Boolean flowContext, AsyncCallback callback, Object state)
at System.Net.Sockets.Socket.UnsafeBeginConnect(EndPoint remoteEP, AsyncCallback callback, Object state)
at System.Net.ServicePoint.ConnectSocketInternal(Boolean connectFailure, Socket s4, Socket s6, Socket& socket, IPAddress& address, ConnectSocketState state, IAsyncResult asyncResult, Exception& exception)
--- End of inner exception stack trace ---
at System.Net.HttpWebRequest.EndGetResponse(IAsyncResult asyncResult)
at System.Net.Http.HttpClientHandler.GetResponseCallback(IAsyncResult ar)
--- End of inner exception stack trace ---
[...]
at Microsoft.Azure.Documents.Query.ProxyDocumentQueryExecutionContext.<ExecuteNextAsync>d__0.MoveNext()<---
Each of our ApiController instances statically allocates a single repository class, which in turn fetches a IReliableReadWriteDocumentClient instance in its constructor and holds it for its entire lifetime via
IDocumentDbInitializer dbinit = new DocumentDbInitializer();
Client = dbinit.GetClient(endpointUrl, myAuthKey, connectionPolicy);
So in my understanding we should use only 2 Document DB clients for our 2 repositories in the entire web service.
Things we've tried so far:
throttle the requests on the client during the bulk operation to less than 3/s
reduce the clients ConnectionPolicy.MaxConnectionLimit from default (50) to 20
increase the apps ServicePointManager.DefaultConnectionLimit
None of these measures significantly reduced the number of exceptions we're experiencing.
Any suggestions how to avoid this error in the first place?
Additional Cosmos DB SDK functionality to tune/configure/adapt for our use case..?

Related

Spring Data Neo4J - Unable to acquire connection from pool within configured maximum time

We have a Reactive REST API using Spring Data Neo4j (SpringBoot v2.7.5) deployed to Kubernetes. When running a stress test to determine the breaking point, once the volume of requests that the service can handle has been breached, the service does not auto-recover, even after the load has dropped to a level at which the service can handle.
After the service has fallen over the Neo4J health indicator shows:
“org.neo4j.driver.exceptions.ClientException: Unable to acquire connection from the pool within configured maximum time of 60000ms”
With respect to connection/configuration settings we are using defaults configured by SDN.
Observations:
Up until the point at which the service breaks only a small number of connections are utilised, at the point at which it breaks the connections in use jumps up to the max pool size and the above mentioned error is observed. No matter how much time passes (even well beyond the max connection lifetime) the service is unable to acquire a connection from the pool. Upon manually shutting down and restarting the service/pod the service returns to a healthy state.
As an interim solution we now check the Neo4J health indicator, if the mentioned error is present the liveness state is set to down which triggers Kubernetes to restart the service automatically. However, I’m wondering if there is an underlying issue with the connections in the pool not getting ‘cleaned up’?
You can take a look at this discussion https://github.com/spring-projects/spring-data-neo4j/issues/2632
I had the same issue. The problem is that either Spring Framework or Neo4j reactive transaction manager doesn't close connections in a complex reactive flow e.g. when there are a lot of inner calls/mappings and somewhere inside an exception is thrown.
So as a workaround you can add #Transactional in such places to avoid multiple transactions to be created.

Wildfly fail to respond to some of the requests when there are too many requests

I am using Wildfly 9.0. I am developing a web program with two main parts.
One is submitting the request the server itself's webservice, and another one is a web service which receive request and work on it with the database.
The whole flow is like
1.user open a webpage rendered from the server and submit some information
2.the information is submitted to the server
3.the server then submit a HTTP request to the web service of the same server
4.the web service handle the request and reply.
5.the resulting information is then rendered to the user
For example, some of the requests are generated like this in the server to the web service...
String url = "http://127.0.0.1:8080/myPathOfRFWS/someWS";
HttpUriRequest httpReq = new HttpPost(url);
httpReq.addHeader("userKey", "someString");
try (DefaultHttpClient client = new DefaultHttpClient();) {
HttpResponse resp = client.execute(httpReq);
return resp;
} catch (Exception e) {
logException(e);
}
The code work perfectly fine under normal situation.
However, when I am performing Stress Test on the server with JMeter (maybe like submitting 60 requests in 60seconds, each with some steps of processing, so probably there may be around 8-10 on-going requests running at the same time in JMeter), I find that there are often a few requests failing to be received on the web service side, and hence no response to the client.
I have checked with the thread dump, and find that the server has submitted the request to web service, but the web service seems doesn't receive the request nor do any action after the action is passed through the filter.
Here are some part of the thread dump.
"default task-26" #298 prio=5 os_prio=0 tid=0x000000001a497000 nid=0xd3c runnable [0x000000001fa7d000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.WindowsSelectorImpl$SubSelector.poll0(Native Method)
at sun.nio.ch.WindowsSelectorImpl$SubSelector.poll(WindowsSelectorImpl.java:296)
at sun.nio.ch.WindowsSelectorImpl$SubSelector.access$400(WindowsSelectorImpl.java:278)
at sun.nio.ch.WindowsSelectorImpl.doSelect(WindowsSelectorImpl.java:159)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
- locked <0x00000000fde172b0> (a sun.nio.ch.Util$2)
- locked <0x00000000fde172a0> (a java.util.Collections$UnmodifiableSet)
- locked <0x00000000fde17040> (a sun.nio.ch.WindowsSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:101)
at org.xnio.nio.SelectorUtils.await(SelectorUtils.java:46)
at org.xnio.nio.NioSocketConduit.awaitReadable(NioSocketConduit.java:345)
at org.xnio.conduits.AbstractSourceConduit.awaitReadable(AbstractSourceConduit.java:66)
at io.undertow.conduits.ReadDataStreamSourceConduit.awaitReadable(ReadDataStreamSourceConduit.java:101)
at io.undertow.conduits.FixedLengthStreamSourceConduit.awaitReadable(FixedLengthStreamSourceConduit.java:272)
at org.xnio.conduits.ConduitStreamSourceChannel.awaitReadable(ConduitStreamSourceChannel.java:151)
at io.undertow.channels.DetachableStreamSourceChannel.awaitReadable(DetachableStreamSourceChannel.java:77)
at io.undertow.server.HttpServerExchange$ReadDispatchChannel.awaitReadable(HttpServerExchange.java:1997)
at org.xnio.channels.Channels.readBlocking(Channels.java:295)
at io.undertow.servlet.spec.ServletInputStreamImpl.readIntoBuffer(ServletInputStreamImpl.java:170)
at io.undertow.servlet.spec.ServletInputStreamImpl.read(ServletInputStreamImpl.java:146)
at io.undertow.servlet.spec.ServletInputStreamImpl.read(ServletInputStreamImpl.java:133)
at org.jboss.resteasy.plugins.providers.ProviderHelper.writeTo(ProviderHelper.java:124)
at org.jboss.resteasy.plugins.providers.FileProvider.readFrom(FileProvider.java:88)
at org.jboss.resteasy.plugins.providers.FileProvider.readFrom(FileProvider.java:34)
at org.jboss.resteasy.core.interception.AbstractReaderInterceptorContext.readFrom(AbstractReaderInterceptorContext.java:59)
at org.jboss.resteasy.core.interception.ServerReaderInterceptorContext.readFrom(ServerReaderInterceptorContext.java:62)
at org.jboss.resteasy.core.interception.AbstractReaderInterceptorContext.proceed(AbstractReaderInterceptorContext.java:51)
at org.jboss.resteasy.security.doseta.DigitalVerificationInterceptor.aroundReadFrom(DigitalVerificationInterceptor.java:32)
at org.jboss.resteasy.core.interception.AbstractReaderInterceptorContext.proceed(AbstractReaderInterceptorContext.java:53)
at org.jboss.resteasy.plugins.interceptors.encoding.GZIPDecodingInterceptor.aroundReadFrom(GZIPDecodingInterceptor.java:59)
at org.jboss.resteasy.core.interception.AbstractReaderInterceptorContext.proceed(AbstractReaderInterceptorContext.java:53)
at org.jboss.resteasy.core.MessageBodyParameterInjector.inject(MessageBodyParameterInjector.java:150)
at org.jboss.resteasy.core.MethodInjectorImpl.injectArguments(MethodInjectorImpl.java:89)
at org.jboss.resteasy.core.MethodInjectorImpl.invoke(MethodInjectorImpl.java:112)
at org.jboss.resteasy.core.ResourceMethodInvoker.invokeOnTarget(ResourceMethodInvoker.java:296)
at org.jboss.resteasy.core.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:250)
at org.jboss.resteasy.core.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:237)
at org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:356)
at org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:179)
at org.jboss.resteasy.plugins.server.servlet.ServletContainerDispatcher.service(ServletContainerDispatcher.java:220)
at org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher.service(HttpServletDispatcher.java:56)
at org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher.service(HttpServletDispatcher.java:51)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at io.undertow.servlet.handlers.ServletHandler.handleRequest(ServletHandler.java:86)
at io.undertow.servlet.handlers.FilterHandler$FilterChainImpl.doFilter(FilterHandler.java:130)
at io.undertow.websockets.jsr.JsrWebSocketFilter.doFilter(JsrWebSocketFilter.java:151)
at io.undertow.servlet.core.ManagedFilter.doFilter(ManagedFilter.java:60)
at io.undertow.servlet.handlers.FilterHandler$FilterChainImpl.doFilter(FilterHandler.java:132)
at com.mypackage.web.filter.AccessFilter.doFilter(AccessFilter.java:54)
Locked ownable synchronizers:
- <0x0000000084370a18> (a java.util.concurrent.ThreadPoolExecutor$Worker)
"default task-23" #295 prio=5 os_prio=0 tid=0x000000001a494800 nid=0x3d8 runnable [0x000000001dfac000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:170)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:160)
at org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:84)
at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:273)
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140)
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:260)
at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283)
at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:251)
at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:197)
at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:271)
at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:123)
at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:685)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:487)
at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
at com.mypackage.WsLib.callUriWs(WsLib.java:241)
Locked ownable synchronizers:
- <0x0000000084363dd8> (a java.util.concurrent.ThreadPoolExecutor$Worker)
Please advise if there is anything I can do to solve the issue or to check with the issue.

Microsoft Redis Session State Provider on AWS issue

I'm building a web farm environment for an ecommerce package. The package runs on .NET + AWS.
The package gives the ability to store session state in Redis Cache. The problem is that when I use Amazon Redis (v 2.82 I think) I get connectivity exceptions when trying to use the cache. No such issues on Azure.
the only difference that I see between the configs is that azure requires password whereas AWS does not.
System.NullReferenceException: Object reference not set to an instance of an object.
at StackExchange.Redis.ServerEndPoint.get_LastException()
at StackExchange.Redis.ExceptionFactory.GetServerSnapshotInnerExceptions(ServerEndPoint[] serverSnapshot)
at StackExchange.Redis.ExceptionFactory.NoConnectionAvailable(Boolean includeDetail, RedisCommand command, Message message, ServerEndPoint server, ServerEndPoint[] serverSnapshot)
at StackExchange.Redis.ConnectionMultiplexer.ExecuteSyncImpl[T](Message message, ResultProcessor`1 processor, ServerEndPoint server)
at StackExchange.Redis.RedisBase.ExecuteSync[T](Message message, ResultProcessor`1 processor, ServerEndPoint server)
at StackExchange.Redis.RedisDatabase.ScriptEvaluate(String script, RedisKey[] keys, RedisValue[] values, CommandFlags flags)
at Microsoft.Web.Redis.StackExchangeClientConnection.<>c__DisplayClass4.<Eval>b__3()
at Microsoft.Web.Redis.StackExchangeClientConnection.RetryForScriptNotFound(Func`1 redisOperation)
at Microsoft.Web.Redis.StackExchangeClientConnection.RetryLogic(Func`1 redisOperation)
at Microsoft.Web.Redis.StackExchangeClientConnection.Eval(String script, String[] keyArgs, Object[] valueArgs)
at Microsoft.Web.Redis.RedisConnectionWrapper.Set(ISessionStateItemCollection data, Int32 sessionTimeout)
at Microsoft.Web.Redis.RedisSessionStateProvider.SetAndReleaseItemExclusive(HttpContext context, String id, SessionStateStoreData item, Object lockId, Boolean newItem)

Queued Emails Fail in AWS VPC Environment

I have a system that sends out emails and queues them if volume increases.
If it's 1 or 2 emails per hour, it sends out fine.
On busier times, the queue gets backed up and soon email sending starts to fail.
I've load-tested the SMTP server from my local - no problems there.
However, when I try to do the load-test from my EC2 server
- which is in a private subnet of a VPC
- where outgoing traffic goes through a NAT
- and where we've set up our own DNS server
the first few go out fine - and the rest starts to throw errors.
Here's the C# exception ToString()
System.Net.Mail.SmtpException: Failure sending mail. ---> System.Net.WebExcept
ion: Unable to connect to the remote server ---> System.Net.Sockets.SocketExcept
ion: A connection attempt failed because the connected party did not properly re
spond after a period of time, or established connection failed because connected
host has failed to respond [SMTPServerIP]:25
at System.Net.Sockets.Socket.DoConnect(EndPoint endPointSnapshot, SocketAddre
ss socketAddress)
at System.Net.ServicePoint.ConnectSocketInternal(Boolean connectFailure, Sock
et s4, Socket s6, Socket& socket, IPAddress& address, ConnectSocketState state,
IAsyncResult asyncResult, Exception& exception)
--- End of inner exception stack trace ---
at System.Net.ServicePoint.GetConnection(PooledStream PooledStream, Object ow
ner, Boolean async, IPAddress& address, Socket& abortSocket, Socket& abortSocket
6)
at System.Net.PooledStream.Activate(Object owningObject, Boolean async, Gener
alAsyncDelegate asyncCallback)
at System.Net.PooledStream.Activate(Object owningObject, GeneralAsyncDelegate
asyncCallback)
at System.Net.ConnectionPool.GetConnection(Object owningObject, GeneralAsyncD
elegate asyncCallback, Int32 creationTimeout)
at System.Net.Mail.SmtpConnection.GetConnection(ServicePoint servicePoint)
at System.Net.Mail.SmtpTransport.GetConnection(ServicePoint servicePoint)
at System.Net.Mail.SmtpClient.GetConnection()
at System.Net.Mail.SmtpClient.Send(MailMessage message)
--- End of inner exception stack trace ---
at System.Net.Mail.SmtpClient.Send(MailMessage message)
at ConsoleApplication2.Program.Main(String[] args) in c:\ConsoleApplication2\
Program.cs:line 49
I'm pretty sure it's not a code issue - most likely a setting on the NAS or DNS.
Can someone point me in the right direction?
After some more googling - it seems to be a throttle on AWS' end
Source:
http://aws.amazon.com/ec2/faqs/
Q: Are there any limitations in sending email from EC2 instances?
Yes. In order to maintain the quality of EC2 addresses for sending email, we enforce default limits on the amount of email that can be sent from EC2 accounts. If you wish to send larger amounts of email from EC2, you can apply to have these limits removed from your account by filling out this form .
Better details here:
Amazon EC2/SES SMTP Timeout

web service best practice - server timeout longer than http client timeout

I am trying to build a web service on top of hbase, so the code looks roughly like:
#GET
#Path("/blabla")
#Override
public List<String> getEvents($$$params$$$) {
......
//calling hbase query the events
......
}
When Hbase service is down, the hbase Java API keeps retrying to connect to Hbase region server util eventually it times out and throws a RT Exception:
NoServerForRegionException: Unable to find region for event,,99999999999999 after 10 tries.
The logic has no problem, my issue here is that the HttpClient times out way before hbase times out the retries. Then my web service API consumer gets no response, ugly.
Question -
What's the best practice here if you have server's timeout potentially longer than the http connection itself? How to have the web service respond to client gracefully in this case?
set the cashing for you scan object to some reasonable value. another thing, since you are using a web service to show the results to your users, i am assuming that you must be showing only a few rows(or records) at a time. you can use Hbase PageFilter so that you get only a specified no of rows each time and don't have to wait to get all the rows in one shot.