I fear that a client loses connection abruptly and get disconnected without leaving any info to the backend. If this happens, will the socket keep active?
I have found the answer in a issue on the django channels repository:
Using websocket auto-ping to periodically assert clients are still connected to sockets, and cutting them loose if not. This is a feature that's already in daphne master if you want to try it, and which is coming in the next release; the ping interval and time-till-considered-dead are both configurable. This solves the (more common) clients-disconnecting-uncleanly issue.
Storing the timestamp of the last seen action from a client in a database, and then pruning ones you haven't heard of in a while. This is the only approach that will also get round disconnect not being sent because a Daphne server was killed, but it's more specialised, so Channels can't implement it directly.
Link to the issue on github
Related
From 31th March I've got following error in Google Cloud SQL:
Got an error reading communication packets.
I have been using Google Cloud SQL for 2 years, but never faced with such problem.
I'm very worried about it.
This is detail error message:
textPayload: "2019-04-29T17:21:26.007574Z 203385 [Note] Aborted connection 203385 to db: {db_name} user: {db_username} host: 'cloudsqlproxy~{private ip}' (Got an error reading communication packets)"
While it is true that this error message often occurs after a maintenance period, it isn't necessarily a cause for concern as this is a known behavior by MySQL.
Possible explanations about why this issue is happening are :
The large increase of connection requests to the instance, with the
number of active connections increasing over a short period of time.
The freezing / unavailability of the instance can also occur due to
the burst of connections happening in a very short time interval. It
is observed that this freezing always happens with an increase of
connection requests. This increase in connections causes the
instance to be overloaded and hence unavailable to respond to
further connection requests until the number of connections
decreases or the instance stabilizes.
The server was too busy to accept new connections.
There were high rates of previous connections that were not closed
correctly.
The client terminated it abnormally.
readTimeout setting being set too low in the MySQL driver.
In an excerpt from the documentation, it is stated that:
There are many reasons why a connection attempt might not succeed.
Network communication is never guaranteed, and the database might be
temporarily unable to respond. Make sure your application handles
broken or unsuccessful connections gracefully.
Also a low Cloud SQL Proxy version can be the reason for such
incident issues. Possible upgrade to the latest version of (v1.23.0)
can be a troubleshooting solution.
IP from where you are trying to connect, may not be added to the
Authorized Networks in the Cloud SQL instance.
Some possible workaround for this issue, depending which is your case could be one of the following:
In the case that the issue is related to a high load, you could
retry the connection, using an exponential backoff to prevent
from sending too many simultaneous connection requests. The best
practice here is to exponentially back off your connection requests
and add randomized backoffsto avoid throttling, and potentially
overloading the instance. As a way to mitigate this issue in the
future, it is recommended that connection requests should be
spaced-out to prevent overloading. Although, depending on how you
are connecting to Cloud SQL, exponential backoffs may already be in
use by default with certain ORM packages.
If the issue could be related to an accumulation of long-running
inactive connections, you would be able to know if it is your case
using show full processliston your database looking for
the connections with high Time or connections where Command is
Sleep.
If this is your case you would have a few possible options:
If you are not using a connection pool you could try to update the client application logic to properly close connections immediately at the end of an operation or use a connection pool to limit your connections lifetime. In particular, it is ideal to manage the connection count by using a connection pool. This way unused connections are recycled and also the number of simultaneous connection requests can be limited through the use of the maximum pool size parameter.
If you are using a connecting pool, you could return the idle connections to the pool immediately at the end of an operation and set a shorter timeout by adjusting wait_timeout or interactive_timeoutflag values. Set CloudSQL wait_timeout flag to 600 seconds to force refreshing connections.
To check the network and port connectivity once -
Step 1. Confirm TCP connectivity on port 3306 with tcptraceroute or
netcat.
Step 2. If [Step 1] succeeded then try to check if there are any
errors in using mysql client to check timeout/error.
When the client might be terminating the connection abruptly you
could check for:
If the MySQL client or mysqld server are receiving a packet bigger
than max_allowed_packet bytes, or the client receiving a packet
too large message,if it so you could send smaller packets or
increase the max_allowed_packet flag value on both client
and server. If there are transactions that are not being properly
committed using both "begin" and "commit", there is the need to
update the client application logic to properly commit the
transaction.
There are several utilities that I think will be helpful here,
if you can install mtr and the tcpdump utilities to
monitor the packets during these connection-increasing events.
It is strongly recommended to enable the general_log in the
database flags. Another suggestion is to also enable the slow_query
database flag and output to a file. Also have a look at this
GitHub issue comment and go through the list of additional
solutions proposed for this issue here
This error message indicates a connection issue, either because your application doesn't terminate connections properly or because of a network issue.
As suggested in these troubleshooting steps for MySQL or PostgreSQL instances from the GCP docs, you can start debugging by checking that you follow best practices for managing database connections.
In my application (c++) I have a service exposed as:
grpc foo(stream Request) returns (Reply) { }
The issue is that when the server goes down (CTRL-C) the stream on the client side keeps going indeed the
grpc::ClientWriter::Write
doesn't return false. I can confirm that using netstat I don't see any connection between the client and the server (apart a TIME_WAIT one that after a while goes away) and the client keeps calling that Write without errors.
Is there a way to see if the underlying connection is still up instead to rely on the Write return value ? I use grpc version 1.12
update
I discovered that the underlying channel goes in status IDLE but still the ClientWriter::Write doesn't report the error, I don't know if this is intended. During the streaming I'm trying now to reestablish a connection with the server every time the channel status is not GRPC_CHANNEL_READY
This could happen in a few scenarios but the most common element is a connection issue. We have KEEPALIVE support in gRPC to tackle exactly this issue. For C++, please refer to https://github.com/grpc/grpc/blob/master/doc/keepalive.md on how to set this up. Essentially, endpoints would send pings at certain intervals and expect a reply within a certain timeframe.
It seems with Django Channels each time anything happens on the websocket there is no persistent state. Even within the same websocket connection, you can not preserve anything between each call to receive() on a class based consumer. If it can't be serialized into the channel_session, it can't be stored.
I assumed that the class based consumer would be persisted for the duration of the web socket connection.
What I'm trying to build is a simple terminal emulator, where a shell session would be created when the websocket connects. Read data would be passed as input to the shell and the shell's output would be passed out the websocket.
I can not find a way to persist anything between calls to receive(). It seems like they took all the bad things about HTTP and brought them over to websockets. With each call to conenct(), recieve(), and disconnect() the whole Consumer class is reinstantiated.
So am I missing something obvious. Can I make another thread and have it read from a Group?
Edit: The answers to this can be found in the comments below. You can hack around it. Channels 3.0 will not instantiate the Consumers on every receive call.
The new version of Channels does not have this limitation. Consumers stay in memory for the duration of the websocket request.
I am trying to build a scalable chat server in Clojure. I am using http-kit, compojure and redis pub/sub to communicate between diffrent nodes (fan-out approach). The server will use websockets for connection b/w client-server with a fallback to long polling. A single user can have multiple connections to chat with one connection per tabs in the browser and message should be delivered to the all the connections.
So basically when the user connects I store the channel in a atom with a random uuid as
{:userid1 [{:socketuuid "random uuid#1 for uerid1" :socket "channel#1 for userid1"}
{:socketuuid "random uuid#2" :socket "channel#2"}]
:userid2 [{:socketuuid "random uuid#1 for userid2" :socket "channel#1 for userid2}]}
the message is POSTed to a common route for both websockets and long polling channels, the message structure looks like
{:from "userid1" :to "userid2" :message "message content"}
the server finds all the channels in the atom for the :from and :to user ids and send the message to the connected channels for the respective users, also it publishes the message over the redis server where the connected nodes look for channels stored in their own atom and deliver message to the respective users.
So the problem I am running into is, how to properly implement presence. Basically http-kit send you a status when a channel disconnects the status can be "server-close" or "client-close", while I can handle server disconnects (the client will reconnect automatically) but I am running into problem when the disconnect happens from client side, for eg. the user navigates to another page and will connect after a few seconds. How do I decide that the user has went offline when the client disconnects. Also I am concerned about message arrival b/w reconnects in long polling mode (my long polling timeout is 30 seconds).
Also please suggest a good presence mechanism for the above architecture. Thanks.
Please comment if you need more info. Thanks
Edit #1:
Can you recommend a good tutorial/ material on implementing presence in a chat server, I cant seem to find anything on it.
My current solution -> I am currently maintaining a global count and a last connected timestamp for the connected channels of a particular userid and when a user disconnects the count is decreased, and a timeout is implemented for 10 seconds which will check if the user has reconnected again (i.e. the last connected stamp is 10 seconds old and count is still zero), if not then the user is said to have gone offline, would you recommend this solution, If not why, or any improvements or better methods are appreciated.
Also please note I am using the timer/scheduled-task in http-kit, would these timeout significant performance effects?
There are two different cases here from client side.
Long Polling. I cannot see how this is a problem, if the client window closes, there wont be no polling anymore. One client less which asks for data.
Websockets. There is a close method available in the protocol. The client should send a notification if you implement it correctly. See here: Closing WebSocket correctly (HTML5, Javascript) for instance.
Does that answer your question?
System Background:
Its basically a client/server application. Server is an embedded device and Client is a windows app developed in C++.
Issue: After a runtime of about a week, communication breaks between client/server,
because of this the server is not able to connect back to the client and needs a restart to recover. Looks like System is experiencing Socket re-connection problem. Also The network sometimes experiences intermittent failures.
Abrupt Termination at remote end
Port locking
Want some suggestions on how to cleanup the socket or shutdown cleanly so that re-connection happens properly. Other alternate solutions?
Thanks,
Hussain
It does not sound like you are in a position to easily write a stress test app to reproduce this more quickly out of band, which is what I would normally suggest. A pragmatic solution might be to periodically restart the server and client at a time when you think the system is least busy, or when problems arise. This sounds like cheating but many production systems I have been involved with take this approach to maximize system uptime.
My preferred solution here would be to abstract the server and client socket code (hopefully your design allows this to be done without too much work) and use it to implement client and server test apps that can be used to stress test only the socket code by simulating a lot of normal socket traffic in a short space of time - this helps identify timing windows and edge cases that could cause problems over time, and might speed up the process of obtaining a debuggable repro - you can simulate network error in your test code by dropping the socket on the client or server periodically.
A further step to take on the strategic front would be to ensure that you have good diagnostics in your socket handlers on client and server side. Track socket open and close, with special focus on your socket error and reconnect paths given you know the network is unreliable. Make sure the logs are output sequential with a timestamp. Something as simple as this might quickly show you what error or conditions trigger your problems. You can quickly make sure the logs are correct and complete using the test apps I mentioned above.
One thing you might want to check is that you are not being hit by lack of ability to reuse addresses. Sometimes when a socket gets closed, it cannot be immediately reused for a reconnect attempt as there is still residual activity on one or other end. You may be able to get around this (based on my Windows/Winsock experience) by experimenting with SO_REUSEADDR and SO_LINGER on your sockets. however, my first focus in your case would be on ensuring the socket code on client and server handles all errors and mainline cases correctly, before worrying about this.
A common issue is that when a connection is dropped, it is kept opened by the OS in TIME_WAIT state. If you want to restart the server socket, it will not be able to reopen the same port directly because it is still present for the OS.
To avoid that, you need to set the parameter SO_REUSEADDR so that the OS allows you to reuse the port if it is in TIME_WAIT state for a server socket.
Example:
int optval=1;
// set SO_REUSEADDR on a socket to true (1):
setsockopt(s1, SOL_SOCKET, SO_REUSEADDR, &optval, sizeof optval);
I'm experiencing something similar with encrypted connections. I believe in my case it is because the client dropped the connection and reconnected in less than the 4 minute FIN_WAIT period. The initial connection is recycled (by the os) and the server doesn't see the drop out. The SSL authentication is lost when the client loses connection so the client tries to re-authenticate. This is during what the servers considers the middle of a conversation. The server then hangs up on the client. I think the server ssl code considers this a man in the middle attack or just gets confused and closes the connection.