Why would AWS ALB disconnect when using Socket.io - amazon-web-services

I'm trying to get an ALB/Node.js/socket.io solution working in its simplest form and I'm running into an issue where the handshake disconnects. At the moment, I am intentionally using only one node in the TargetGroup to eliminate variables related to node switching and session stickiness for now.
When connecting directly to the node via my NAT instance, it works fine. The disconnect only happens when going through the ALB.
Here is what I have set up:
ALB with Listener HTTP 80 -> 8081 (no SSL)
2 AZs, both with routes to internet (as required for ALBs)
One socket.io EC2 node in one of the AZs
Path Pattern for /socket.io/* to socket.io target group (with my one node in it)
Default pattern is also socket.io target group
Stickiness Enabled (should not need for one node, but did it anyway)
Here is what I see in the socket.io node client:
Thu, 22 Dec 2016 20:59:26 GMT socket.io-client:manager opening ws://52.72.198.58
Thu, 22 Dec 2016 20:59:26 GMT engine.io-client:socket creating transport "websocket"
Thu, 22 Dec 2016 20:59:26 GMT engine.io-client:socket setting transport websocket
Thu, 22 Dec 2016 20:59:26 GMT socket.io-client:manager connect attempt will timeout after 20000
Thu, 22 Dec 2016 20:59:26 GMT engine.io-client:socket socket close with reason: "transport close"
And here is what I see on the socket.io node server:
Thu, 22 Dec 2016 20:59:26 GMT socket.io:socket joined room U_qmSv_7gvP_JOFsAAAL
Thu, 22 Dec 2016 20:59:26 GMT socket.io:client client close with reason transport close
Thu, 22 Dec 2016 20:59:26 GMT socket.io:socket closing socket - reason transport close
When I go through my NAT to the same socket.io EC2 node, it all works with no transport closes.
So somehow the ALB is closing the connection immediately during a successful handshake.
Since it works via the NAT, I think the socket.io node and client are ok. And since I see the DEBUG entries in node, I know the ALB is able to reach the socket.io node ok. And since I only have one single socket.io node, there should be no issues with sessions and stickiness.
What could be contributing to the immediate disconnect when using ALB?
EDIT: I have also found that if the socket.io client making the request to the ELB is on an EC2 instance, then it works. This implies something in the network path between the client and the ELB. I've yet to find a case where this works other than when the client is on EC2. It works everywhere via the NAT, just not via the ELB.

After lots of trial and error, I determined that, in my case, this was caused by the port the ALB/ELB listens on falling in a specific range (80-83). The HTTP portion of the handshake works, but the second phase, the WebSocket upgrade, disconnects.
There were no restrictions in the VPC related to this port range, so the issue is in the network between my client and the ELB.
In conclusion, the issue is not anything in AWS or how I had set up resources, it lies elsewhere outside AWS. If I find the exact cause I will post a comment back to this answer.
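To narrow down which listener ports lose the upgrade phase of the handshake, a bare upgrade request can be sent from the affected client network. The following is a minimal sketch, not part of the original answer: the function names are hypothetical, the path is a placeholder (the real socket.io path and query string depend on the socket.io version), and it only checks for a 101 response with the expected Sec-WebSocket-Accept value.

```python
import base64
import hashlib
import os
import socket

# Fixed GUID defined by RFC 6455 for computing Sec-WebSocket-Accept.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def expected_accept(key: str) -> str:
    """Return the Sec-WebSocket-Accept value a compliant server sends for key."""
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def upgrade_succeeds(host: str, port: int, path: str = "/", timeout: float = 5.0) -> bool:
    """Send a bare WebSocket upgrade request and report whether it was accepted."""
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}:{port}\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Key: {key}\r\n"
        "Sec-WebSocket-Version: 13\r\n"
        "\r\n"
    )
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(request.encode("ascii"))
            reply = sock.recv(4096).decode("latin-1")
    except OSError:
        # Refused, reset, or timed out somewhere on the path.
        return False
    parts = reply.split("\r\n", 1)[0].split(" ", 2)
    return len(parts) >= 2 and parts[1] == "101" and expected_accept(key) in reply
```

Running upgrade_succeeds against the same load balancer on several listener ports, from both a working and a failing client network, would show whether the failure follows the port.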

var socket = io.connect("https://mywebsite/myroom", {'reconnect': true});
Increase the heartbeatTimeout and the closeTimeout on initialization:
var socketConnectTimeInterval;
socket.on('connect', function () {
    // Assumes the older socket.io client API, where the connection object is exposed as socket.socket.
    socket.socket.heartbeatTimeout = 500000;
    socket.socket.closeTimeout = 500000;
    socket.on('disconnect', function () {
        // Keep retrying until the connection is back, then reload the page.
        // A delay of 0 retries as fast as the browser allows.
        socketConnectTimeInterval = setInterval(function () {
            socket.socket.reconnect();
            if (socket.socket.connected) {
                clearInterval(socketConnectTimeInterval);
                console.log('reconnected');
                location.reload();
            }
        }, 0);
    });
});
Also increase the Idle Timeout on the load balancer in AWS.
Hopefully that prevents the timeout issue!
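The idle timeout can also be changed through the API rather than the console. This is a minimal sketch assuming an Application Load Balancer, boto3 installed, and AWS credentials already configured. The function names and the ARN argument are placeholders, and the 1-4000 second range is the documented ALB range as far as I know.

```python
def idle_timeout_attributes(seconds: int) -> list:
    """Build the Attributes payload for an ALB idle-timeout change."""
    if not 1 <= seconds <= 4000:  # assumed valid range for ALB idle timeout
        raise ValueError("idle timeout must be between 1 and 4000 seconds")
    return [{"Key": "idle_timeout.timeout_seconds", "Value": str(seconds)}]


def set_idle_timeout(load_balancer_arn: str, seconds: int) -> None:
    """Apply the idle timeout to the load balancer (needs boto3 and credentials)."""
    import boto3  # imported here so the helper above works without boto3

    client = boto3.client("elbv2")
    client.modify_load_balancer_attributes(
        LoadBalancerArn=load_balancer_arn,
        Attributes=idle_timeout_attributes(seconds),
    )
```

The timeout should stay longer than the interval between heartbeats, otherwise the load balancer may drop a connection that is merely quiet.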

Download Limit of AWS API Gateway

We have a service that downloads time series data from InfluxDB. We do not manipulate the Influx response; after updating some meta information, we push the records through as they are.
So there is no Content-Length attached to the response.
We want to expose this service via Amazon API Gateway. Is it possible to integrate such a service with API Gateway, and in particular, is there any limit on response size? The service does not wait for the whole query result before responding, but will API Gateway do the same, or will it wait for all the data to be written to the output stream?
When I tried it, I observed a Content-Length header being added by API Gateway.
HTTP/1.1 200 OK
Date: Tue, 26 Apr 2022 06:03:31 GMT
Content-Type: application/json
Content-Length: 3024
Connection: close
x-amzn-RequestId: 41dfebb4-f63e-43bc-bed9-1bdac5759210
X-B3-SpanId: 8322f100475a424a
x-amzn-Remapped-Connection: keep-alive
x-amz-apigw-id: RLKwCFztliAFR2Q=
x-amzn-Remapped-Server: akka-http/10.1.8
X-B3-Sampled: 0
X-B3-ParentSpanId: 43e304282e2f64d1
X-B3-TraceId: d28a4653e7fca23d
x-amzn-Remapped-Date: Tue, 26 Apr 2022 06:03:31 GMT
Does this mean that API Gateway waits for the whole response/EOF from the integration?
If so, what is the maximum number of bytes the API Gateway buffer can hold?
Will API Gateway time out if the response from the integration is too large or does not end within a stipulated time?

boost::asio::connect reports success on wrong subnet

Using Boost v1.74:
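#include <array>
#include <iostream>
#include <utility>
#include <boost/asio.hpp>
int main()
{
    auto ctx = boost::asio::io_context{};
    auto socket = boost::asio::ip::tcp::socket{ctx};
    auto ep = boost::asio::ip::tcp::endpoint{
        boost::asio::ip::make_address_v4("192.168.0.52"),
        80};
    // Throws boost::system::system_error if the connection attempt fails.
    boost::asio::connect(socket, std::array{std::move(ep)});
    std::cout << "Success!" << std::endl;
}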
The IP address of my machine on my local network is 192.168.0.31/24, and so trying to connect to a non-existent address in the same subnet with the above code gives:
10:24:55: Starting /home/cmannett85/workspace/build-scratch-Desktop-Debug/scratch ...
terminate called after throwing an instance of 'boost::wrapexcept<boost::system::system_error>'
what(): connect: No route to host
10:24:59: The program has unexpectedly finished.
This is all expected. If I change the bottom octet of the subnet in the address (e.g. 192.168.1.52), then the app just waits for a few minutes - presumably because it sent messages to any routers to see if they own the requested subnet. There aren't any routers on my network, so it eventually times out:
10:27:39: Starting /home/cmannett85/workspace/build-scratch-Desktop-Debug/scratch ...
terminate called after throwing an instance of 'boost::wrapexcept<boost::system::system_error>'
what(): connect: Connection timed out
10:29:49: The program has unexpectedly finished.
Again, as expected. If I change the next octet instead (e.g. 192.167.0.52), I would expect the same behaviour, since it is just as unknown a subnet as the previous one. But it succeeds!
10:31:22: Starting /home/cmannett85/workspace/build-scratch-Desktop-Debug/scratch ...
Success!
This address is definitely not on my network:
$ ping 192.167.0.52
PING 192.167.0.52 (192.167.0.52) 56(84) bytes of data.
^C
--- 192.167.0.52 ping statistics ---
17 packets transmitted, 0 received, 100% packet loss, time 16368ms
So why is the code reporting that it is connected? And why is changing the second octet different to the third?
Any IP address of the form 192.168.xx.xx belongs to a private network that is not routable on the internet, so no internet router will forward it. The only way packets get routed off your subnet is if you configure a route on your own router or host. 192.167.xx.xx, by contrast, is an internet-routable network. Presumably there is a host out on the internet that uses the address you specified, so if your host is connected to the internet, some internet router will get your packet to that address.
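The difference between the two ranges can be checked with Python's standard ipaddress module. This is only an illustration using the addresses from the question; the classify helper is a hypothetical name.

```python
import ipaddress


def classify(text: str) -> str:
    """Label an IPv4 address as private or publicly routable."""
    addr = ipaddress.ip_address(text)
    return "private" if addr.is_private else "publicly routable"


for text in ("192.168.0.52", "192.168.1.52", "192.167.0.52"):
    print(f"{text}: {classify(text)}")
```

Only the 192.168.x.x addresses fall inside the private 192.168.0.0/16 block, which is why changing the second octet behaves differently from changing the third.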
It's something related to my VPN. I didn't think it was relevant as the tunnel address is 10.17.0.60/16, but disabling it makes the above code work as expected.
Thanks to a suggestion by @dewaffled, curl shows that something on the other side of this connection completes the TCP handshake but, after a timeout of a few minutes, closes the connection.
$ curl -v http://192.167.0.52
* Trying 192.167.0.52:80...
* Connected to 192.167.0.52 (192.167.0.52) port 80 (#0)
> GET / HTTP/1.1
> Host: 192.167.0.52
> User-Agent: curl/7.74.0
> Accept: */*
>
* Recv failure: Connection reset by peer
* Closing connection 0
curl: (56) Recv failure: Connection reset by peer
I know nothing about how VPNs work, but I suspect this is an implementation detail of my particular provider. Hopefully this 'answer' will limit confusion for anyone else!
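The same check can be scripted. Below is a minimal sketch with a hypothetical helper name. Note that a True result only shows that some device on the path completed the TCP handshake, which in this case appears to have been the VPN rather than the target host.

```python
import socket


def tcp_handshake_completes(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if something on the path completes the TCP handshake."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refusal, "No route to host", and timeouts.
        return False
```

Comparing the result with the VPN enabled and disabled, for example tcp_handshake_completes("192.167.0.52", 80), would confirm whether the tunnel is the party answering.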

AWS API Gateway Method to Serve static content from S3 Bucket

I want to serve my Lambda microservices through API Gateway, which does not seem to be a big problem.
Each of my microservices has a JSON-Schema specification of the resource it provides. Since it is a static file, I would like to serve it from an S3 bucket rather than running another Lambda function to serve it.
So while
GET,POST,PUT,DELETE http://api.domain.com/ressources
should be forwarded to a lambda function. I want
GET http://api.domain.com/ressources/schema
to serve my schema.json from S3.
My naive first approach was to set up the resource and methods for "/v1/contracts/schema - GET - Integration Request" and configure it to behave as an HTTP Proxy with the endpoint URL pointing straight to the contracts JSON-Schema. I get a 500 Internal Server Error.
Execution log for request test-request
Fri Nov 27 09:24:02 UTC 2015 : Starting execution for request: test-invoke-request
Fri Nov 27 09:24:02 UTC 2015 : API Key: test-invoke-api-key
Fri Nov 27 09:24:02 UTC 2015 : Method request path: {}
Fri Nov 27 09:24:02 UTC 2015 : Method request query string: {}
Fri Nov 27 09:24:02 UTC 2015 : Method request headers: {}
Fri Nov 27 09:24:02 UTC 2015 : Method request body before transformations: null
Fri Nov 27 09:24:02 UTC 2015 : Execution failed due to configuration error: Invalid endpoint address
Am I on completely the wrong path, or am I just missing some configuration?
Unfortunately there is a limitation when using TestInvoke with API Gateway proxying to Amazon S3 (and some other AWS services) within the same region. This will not be the case once deployed, but if you want to test from the console you will need to use a bucket in a different region.
We are aware of the issue, but I can't commit to when this issue would be resolved.
In one of my setups I put a CloudFront distribution in front of both an API Gateway and an S3 bucket, which are both configured as origins.
I mostly did it in order to be able to use an SSL certificate issued by AWS Certificate Manager, which can only be set on stand-alone CloudFront distributions, and not on API Gateways.
I just had a similar error, but for a totally different reason: if the S3 bucket name contains a period (as in data.example.com or similar), the proxy request will bail out with an SSL certificate issue!
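A likely explanation for the period problem, stated here as an assumption rather than something the answer confirms: with virtual-hosted-style URLs the bucket name becomes part of the hostname, and a wildcard certificate name such as *.s3.amazonaws.com covers exactly one DNS label. The sketch below illustrates that matching rule; both helper names are hypothetical and the global endpoint is assumed.

```python
def wildcard_matches(pattern: str, hostname: str) -> bool:
    """Check a hostname against a certificate name such as '*.s3.amazonaws.com'.

    A leading '*' stands for exactly one DNS label, so it cannot span dots.
    """
    pattern_labels = pattern.lower().split(".")
    host_labels = hostname.lower().split(".")
    if len(pattern_labels) != len(host_labels):
        return False
    if pattern_labels[0] != "*" and pattern_labels[0] != host_labels[0]:
        return False
    return pattern_labels[1:] == host_labels[1:]


def virtual_hosted_name(bucket: str) -> str:
    """Virtual-hosted-style hostname for a bucket (global endpoint assumed)."""
    return f"{bucket}.s3.amazonaws.com"


print(wildcard_matches("*.s3.amazonaws.com", virtual_hosted_name("mybucket")))
print(wildcard_matches("*.s3.amazonaws.com", virtual_hosted_name("data.example.com")))
```

If this is the cause, addressing the bucket in path style or using a bucket name without periods would avoid the mismatch.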

server1 instance in websphere shuts down regularly

I have a WSDL web service in the server1 instance of WebSphere.
This server1 instance shuts down regularly. No error logs are generated when a shutdown occurs.
However, whenever the server1 instance of WebSphere is started, these errors and exceptions are generated:
The certificate (Owner: "CN=SOAPRequester, OU=TRL, O=IBM, ST=Kanagawa, C=JP") with alias "soaprequester" from keystore "D:\IBM\WEBSPH~1\APPSER~1\etc\ws-security\samples\dsig-sender.ks" has expired: java.security.cert.CertificateExpiredException: NotAfter: Sat Oct 01 19:24:06 CST 2011
The certificate (Owner: "CN=SOAPProvider, OU=TRL, O=IBM, ST=Kanagawa, C=JP") with alias "soapprovider" from keystore "D:\IBM\WEBSPH~1\APPSER~1\etc\ws-security\samples\dsig-receiver.ks" has expired: java.security.cert.CertificateExpiredException: NotAfter: Sat Oct 01 19:30:39 CST 2011
Method createManagedConnctionWithMCWrapper caught an exception during creation of the ManagedConnection for resource jms/BPECF, throwing ResourceAllocationException. Original exception: javax.resource.spi.ResourceAdapterInternalException: createQueueConnection failed
com.ibm.mqservices.MQInternalException: MQJE001: An MQException occurred: Completion Code 2, Reason 2063
MQJE027: Queue manager security exit rejected connection with error code 23
javax.jms.JMSSecurityException: MQJMS2013: invalid security authentication supplied for MQQueueManager
My questions are:
1. Is MQ required by the WSDL service?
2. Could any of these 5 errors be causing the frequent downtime?
As far as I understand, you have WebSphere Process Server configured with WebSphere MQ as the message bus.
An MQ queue might be represented as a JMS binding in a SOAP over JMS configuration (see the IBM article).
Regarding the errors:
The first 2 errors are simple: the certificates have expired. You should update them.
I assume exceptions 3-5 are a single error; there is an answer to this question on Stack Overflow.
Reason code 2063 indicates a security-related problem.

RTSP getting stream data

I have an IP camera that can give me media data over RTSP.
I am developing an application to get the media data.
I use C++ and Qt3.
I create a socket and connect it to my device's IP on port 554.
My first query is:
SETUP rtsp://192.168.4.160/ufirststream RTSP/1.0\r\n
CSeq: 1\r\n
Transport: RTP/AVP; client_port=554\r\n\r\n
And get an answer:
RTSP/1.0 200 OK
CSeq: 1
Date: Sat, Mar 24 2012 17:24:59 GMT
Transport: RTP/AVP;unicast;destination=192.168.4.186;source=192.168.4.160;client_port=0-1;server_port=2000-2001
Session: 413F4DDB
I parse it to get the session value and send the next query:
PLAY rtsp://192.168.4.160/ufirststream RTSP/1.0
CSeq: 1
Session: 413F4DDB
And server says:
RTSP/1.0 200 OK
CSeq: 1
Date: Sat, Mar 24 2012 17:25:02 GMT
Session: 413F4DDB
RTP-Info: url=rtsp://192.168.4.160/ufirststream/track1;seq=6716;rtptime=406936711
So how can I get the media data? I thought the PLAY method would make the server send me a stream, but it only gives me an RTSP URL and other info.
I need a binary stream from the camera. Can you advise me on my next step?
The Transport header of the SETUP request indicates which protocol will be used to send the stream, and client_port specifies the ports on which your client will be listening.
Try opening 2 consecutive UDP ports and pass that range as client_port=port1-port2 instead of 554. These two ports will be used for the RTP and the RTCP streams (video and control data).
In addition, the RTP port number should be an even number, and the RTCP port the next odd number (See that question if you want the port range to be random rather than user selected).
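The port-pair advice can be sketched as follows. This is an illustration only: the function names and the 50000-50100 search range are arbitrary assumptions, the sockets bind on all interfaces, and the "unicast" parameter is added to mirror the server's reply in the question.

```python
import socket


def open_rtp_rtcp_pair(start: int = 50000, end: int = 50100):
    """Bind two consecutive UDP ports: an even one for RTP, the next odd one for RTCP."""
    first_even = start + (start % 2)
    for rtp_port in range(first_even, end, 2):
        rtp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        rtcp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        try:
            rtp.bind(("", rtp_port))
            rtcp.bind(("", rtp_port + 1))
            return rtp, rtcp
        except OSError:
            # One of the two ports is taken; release both and try the next pair.
            rtp.close()
            rtcp.close()
    raise OSError("no free even/odd UDP port pair in range")


def setup_request(url: str, cseq: int, rtp_port: int) -> str:
    """Build an RTSP SETUP request advertising the bound port pair."""
    return (
        f"SETUP {url} RTSP/1.0\r\n"
        f"CSeq: {cseq}\r\n"
        f"Transport: RTP/AVP;unicast;client_port={rtp_port}-{rtp_port + 1}\r\n"
        "\r\n"
    )
```

After the PLAY request succeeds, the media data should arrive as RTP packets on the first socket, so the client reads them with rtp.recvfrom(...) rather than from the RTSP TCP connection.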