How to debug "could not receive data from client: Connection reset by peer" - django

I'm running a django-celery application on Ubuntu-12.04.
When I run a celery task from my web interface, I get the following error, taken form postgresql-9.3 logfile (maximum level of log):
2013-11-12 13:57:01 GMT tss_usr 8113 LOG: could not receive data from client: Connection reset by peer
tss_usr is the postgresql user of the django application database and (in this example) 8113 is the pid of the process who killed the connection, I guess.
Have you got any idea on why this happens or at least how to debug this issue?
To make things work again I need to restart postgresql which is extremely uncomfortable.

I know this is an older post, but I just found it because I had the same error today in my postgres logs. I narrowed it down to a PDO select statement. I'm using Zend Framework 1.10.3 on Ubuntu Precise.
The following pdo statement generated an error if $opinion is a long text string. The column opinion is type Text in my postgres table. The query succeeds if $opinion is under a certain number of characters. 1000 characters works fine. 2000 characters fails with "could not receive data from client: Connection reset by peer".
$select = $this->db->select()
->from( 'datauserstopics' )
->where("opinion = ?",trim($opinion))
->where("datatopicsid = ?",trim($tid))
->where("datausersid= ?",$datausersid);
$stmt = $this->db->query($select);
I circumvented the problem by using:
->where("substr(opinion,1,100) = ?",trim(substr($opinion,1,100)))
This is not a perfect solution, but for my purposes, the select statement using substr() suffices.
Note that I have no problem inserting long strings into the same table/column. The disconnect problem only appears for me on the PDO select with relatively long text strings.

I'm getting it in 2017 with 9.4, I have no text fields, don't know what a PDO is. My select statement is about 50 bytes long, I'm trying to fetch an int4 and a double precision. I suspect the error message can mean multiple things.
I've since found https://dba.stackexchange.com/questions/142350/postgres-could-not-receive-data-from-client-connection-reset-by-peer which indicates it could be a problem with the client configuration. My client is libpg and PQconnectdb() is giving me a CONNECTION_OK return. It works at least partly.

For me, restarting the hypervisor where both the Postgres and the application using it helped. I've seen stack traces in dmesg before, though.

Related

Poco::Data::MySQL 'Got packets out of order' error

I got an ER_NET_PACKETS_OUT_OF_ORDER error when running a multithreaded C++ app using Poco::Data::MySQL and Poco::Data::SessionPool. The error message looks like this:
MySQL: [MySQL]: [Comment]: mysql_stmt_prepare error [mysql_stmt_error]: Got packets out of order [mysql_stmt_errno]: 1156 [mysql_stmt_sqlstate]: 08S01 [statemnt]: ...
The app is making queries from multiple threads every 100ms. The connections are provided by a common SessionPool.
I got around this problem by adding reset=true to the connection string. However, as stated in the official docs, adding this option may result in problems with encoding.

Twisted ssh - session execCommand implementation

Good day. I apologize for asking for obvious things because I'm writing in PHP and I know Python at the level "I started learning this yesterday". I've already spent a few days on this - but to no avail.
I downloaded twisted example of the SSH server for version 20.3 from here https://docs.twistedmatrix.com/en/twisted-20.3.0/conch/examples/. Line 162 has an execCommand method that I need to implement to make it work. Then I noticed a comment in this method "We don't support command execution sessions". Therefore, the question: Is this comment apply only to the example, or twisted library entirely. Ie, is it possible to implement this method to make the example server will work as I need?
More information. I don't think that this info is required to answer my questions above.
Why do I need it? I'm trying to compile an environment for writing functional (!) tests (there would be no such problems with the unit tests, I guess). Our API uses the SSH client (phpseclib / SSH2) by 30%+ of endpoints. Whatever I do, I had only 3 options of the results depending on how did I implement this method: (result: success, response: "" - empty; result: success, response: "1"; result: failed, response: "Unable to fulfill channel request at… SSH2.php:3853"). Those were for an SSH2 Client. If the error occurs (3rd case), the server shows logs in the terminal:
[SSHServerTransport, 0,127.0.0.1] Got remote error, code 11 reason: ""
[SSHServerTransport, 0,127.0.0.1] connection lost
I just found this works:
def execCommand(self, protocol, cmd):
protocol.write('Some text to return')
protocol.session.conn.sendEOF(protocol.session)
If I don't send EOF the client throws a timeout error.

Poloniex & websockets

===SIMPLE & SHORT===
Does anybody have working application that talks with Poloniex through WAMP in these days (January, 2018)?
===MORE SPECIFIC===
I used several info sources to make it work using combo: autobahn-cpp & C++. Windows 10 OS.
I was able to connect to wss://api.poloniex.com, realm1. Plus I was able to subscribe and get subscription ID. But I never got any events even when everything established.
===RESEARCH===
During research in the web I saw a lot of controversial information:
1. Claims, that wss://api2.poloniex.com should be used, and channels names are actually numbers - How to connect to poloniex.com websocket api using a python library
2. This answer gave me base code, but I am getting anything more than just connections, also by following this answer - wss://api.poloniex.com is correct address - Connecting to Poloniex Push-API
3. I saw post (sorry, lost the link), there were comments made that websockets implementation are basically broken on poloniex. They were posted 6 months ago.
===SPECS===
1. Windows 10
2. Autobahn-Cpp
3. wss://api.poloniex.com:443 ; realm1
4. Different subscriptions: ticker, BTC_ETH, 148, 1002, etc..
5. Source code I got from here
===WILL HELP AS WELL===
Is there any way to get all valid subscriptions or, probably, those, that have more than 0 subscribers? I mean, does WAMP have a way to do that?
Is there any known issues with Autobahn-Cpp and poloniex combo?
Is there any simpler way to test WAMP elsewhere to make sure Autobahn isn't a problem? Like any other well documented & supported online projects that accept WAMP websocket communication?
I can receive the correct tick order book data from wss://api2.poloniex.com use python3
but sometime The channel 1002 may stop sending the new tick info.
wss://api.poloniex.com:443 ; realm1
This may be the issue as I've been using api2 and here is the code that works, and has been working for the past 2 quarters non-stop. Its in python, but should be easy enough to port to C++.
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import websocket
import json
def on_error(ws, error):
print(error)
def on_close(ws):
print("### closed ###")
connection.close()
def on_open(ws):
print("ONOPEN")
ws.send(json.dumps({'command':'subscribe','channel':'BTC_ETH'}))
def on_message(ws, message):
message = json.loads(message)
print(message)
websocket.enableTrace(True)
ws = websocket.WebSocketApp("wss://api2.poloniex.com/",
on_message = on_message,
on_error = on_error,
on_close = on_close)
ws.on_open = on_open
ws.run_forever()
the code is pretty much self-explanatory (You can check all channels/pairs on Poloniex API website), just save it and run in terminal
python3 fileName.py
should provide You with BTCETH raw stream of orders and trades on console output.
Playing with the message/subscriptions You can then do as You please with it.
It seems that websockets in Poloniex are unstable. Therefore I can stop my attempts make Autobahn-Cpp work with it at least by now and move on.

Jetty 8.1 flooding the log file with "Dispatched Failed" messages

We are using Jetty 8.1 as an embedded HTTP server. Under overload conditions the server sometimes starts flooding the log file with these messages:
warn: java.util.concurrent.RejectedExecutionException
warn: Dispatched Failed! SCEP#76107610{l(...)<->r(...),d=false,open=true,ishut=false,oshut=false,rb=false,wb=false,w=true,i=1r}...
The same message is repeated thousands of times, and the amount of logging appears to slow down the whole system. The messages itself are fine, our request handler ist just to slow to process the requests in time. But the huge number of repeated messages makes things actually worse and makes it more difficult for the system to recover from the overload.
So, my question is: is this a normal behaviour, or are we doing something wrong?
Here is how we set up the server:
Server server = new Server();
SelectChannelConnector connector = new SelectChannelConnector();
connector.setAcceptQueueSize( 10 );
server.setConnectors( new Connector[]{ connector } );
server.setThreadPool( new ExecutorThreadPool( 32, 32, 60, TimeUnit.SECONDS,
new ArrayBlockingQueue<Runnable>( 10 )));
The SelectChannelEndPoint is the origin of this log message.
To not see it, just set your named logger of org.eclipse.jetty.io.nio.SelectChannelEndPoint to LEVEL=OFF.
Now as for why you see it, that is more interesting to the developers of Jetty. Can you detail what specific version of Jetty you are using and also what specific JVM you are using?

Debugging livelock in Django/Postgresql

I run a moderately popular web app on Django with Apache2, mod_python, and PostgreSQL 8.3 with the postgresql_psycopg2 database backend. I'm experiencing occasional livelock, identifiable when an apache2 process continually consumes 99% of CPU for several minutes or more.
I did an strace -ppid on the apache2 process, and found that it was continually repeating these system calls:
sendto(25, "Q\0\0\0SSELECT (1) AS \"a\" FROM \"account_profile\" WHERE \"account_profile\".\"id\" = 66201 \0", 84, 0, NULL, 0) = 84
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
poll([{fd=25, events=POLLIN|POLLERR, revents=POLLIN}], 1, -1) = 1
recvfrom(25, "E\0\0\0\210SERROR\0C25P02\0Mcurrent transaction is aborted, commands ignored until end of transaction block\0Fpostgres.c\0L906\0Rexec_simple_query\0\0Z\0\0\0\5E", 16384, 0, NULL, NULL) = 143
This exact fragment repeats continually in the trace, and was running for over 10 minutes before I finally killed the apache2 process. (Note: I edited this to replace my previous strace fragment with a new one that shows full the full string contents rather than truncated.)
My interpretation of the above is that django is attempting to do an existence check on my table account_profile, but at some earlier point (before I started the trace) something went wrong (SQL parse error? referential integrity or uniqueness constraint violation? who knows?), and now Postgresql is returning the error "current transaction is aborted". For some reason, instead of raising an Exception and giving up, it just keeps retrying.
One possibility is that this is being triggered in a call to Profile.objects.get_or_create. This is the model class that maps to the account_profile table. Perhaps there is something in get_or_create that is designed to catch too broad a set of exceptions and retry? From the web server logs, it appears that this livelock might have occurred as a result of a double-click on the POST button in my site's registration form.
This condition has occurred a couple of times over the past few days on the live site, and results in a significant slowdown until I intervene, so pretty much anything other than infinite deadlock would be an improvement! :)
This turned out to be entirely my fault. I found the spot where the select (1) as 'a' statement seemed to originate (in django/models/base.py) and hacked it to log a traceback, which pointed clearly at my code.
I had some code that makes up a unique email "key" for each Profile. These keys are randomly generated, so because there is some possibility of overlap, I run it in a try/except within a while loop. My assumption was that the database's unique constraint would cause the save to fail if the key was not unique, and I'd be able to try again.
Unfortunately, in Postgresql you cannot simply try again after an integrity error. You have to issue a COMMIT or ROLLBACK command (even if you're in autocommit mode, apparently) before you can try again. So I had an infinite loop of failing save attempts where I was ignoring the error message.
Now I look for a more specific exception (django.db.IntegrityError) and run a limited number of attempts so that the loop is not infinite.
Thanks to everyone for viewing/answering.
Your analysis sounds pretty good. Clearly it's not picking up the fact that the transaction is aborted. I suggest you report this as a bug to the django project...