I'm trying to get stats for my Celery queue (RabbitMQ) using the celery.app.control.Inspect().stats() API. I'm doing this on a web server, and I can only get the stats once: if I refresh the page I get a "[Errno 104] Connection reset by peer" error. How can I deal with this?
/__init__.py

from celery import Celery

celtasks = Celery(app.name, broker="rabbit mq url")

/helpers.py

from . import celtasks

def get_stats():
    stats = celtasks.control.inspect().stats()
    return stats
Whenever there is a request, the get_stats function is hit. It only works for the first request; after that it fails with the connection reset by peer error.
If I assume the connection has been reset and try to create the connection again, I still get the error.
Updated /helpers.py:

from celery import Celery

def get_stats():
    celtasks = Celery(app.name, broker="rabbit mq url")
    stats = celtasks.control.inspect().stats()
    return stats
RabbitMQ logs:
=WARNING REPORT==== 10-Jul-2017::14:11:54 ===
closing AMQP connection <0.29185.6> (10.246.170.70:48618 -> 10.24.83.115:5672):
connection_closed_abruptly
=WARNING REPORT==== 10-Jul-2017::14:11:54 ===
closing AMQP connection <0.29197.6> (10.246.170.70:48620 -> 10.24.83.115:5672):
connection_closed_abruptly
"rabbit#oser000300.log-20170625" 9054L, 361662C
In most cases, "connection reset by peer" means the server closed the connection itself, but the client does not know it. When the client tries to communicate with the server over this broken connection, it receives this error. In your case, the idle time between two stats() calls is probably too long, so the server decides the connection is useless and closes it.
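A minimal sketch of one way around that, as a variation on the updated /helpers.py (the broker URL here is a placeholder, not the real one): build a short-lived Celery app per request and close its connection pool afterwards, so a connection RabbitMQ has already dropped is never reused.

    from celery import Celery

    BROKER_URL = "amqp://guest:guest@rabbit-host:5672//"  # placeholder broker URL

    def get_stats():
        # Fresh app per request, so no stale broker connection is reused.
        celtasks = Celery("myapp", broker=BROKER_URL)
        try:
            # timeout bounds how long inspect() waits for worker replies
            return celtasks.control.inspect(timeout=5).stats()
        finally:
            celtasks.close()  # release the cached broker connections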
We have a small testing Kubernetes cluster running on AWS, using CoreOS, as per the instructions here. Currently this consists of only a master and a worker node. In the couple of weeks we've been running this cluster, we've noticed that the worker instance occasionally fails. The first time this happened the instance was subsequently killed and restarted by the auto-scaling group it is in. Today the same thing happened, but we were able to log in to the instance before it was shut down and retrieve some information; however, it remains unclear to me exactly what has caused this problem.
The node failure seems to happen on an irregular basis, and there is no evidence that there is anything abnormal happening which would precipitate this (external load etc).
Subsequent to the failure (Kubernetes node status NotReady) the instance was still running, but the kubelet and docker services were inactive (start failed with result 'dependency'). The flanneld service was running, but with a restart time after the time the node failure was seen.
Logs from around the time of the node failure don't seem to show anything clearly pointing to a cause. There are a couple of kubelet-wrapper errors at about the time the failure was seen:
`Jul 22 07:25:33 ip-10-0-0-92.ec2.internal kubelet-wrapper[1204]: E0722 07:25:33.121506 1204 kubelet.go:2745] Error updating node status, will retry: nodes "ip-10-0-0-92.ec2.internal" cannot be updated: the object has been modified; please apply your changes to the latest version and try again`
`Jul 22 07:25:34 ip-10-0-0-92.ec2.internal kubelet-wrapper[1204]: E0722 07:25:34.557047 1204 event.go:193] Server rejected event '&api.Event{TypeMeta:unversioned.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:api.ObjectMeta{Name:"ip-10-0-0-92.ec2.internal.1462693ef85b56d8", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"4687622", Generation:0, CreationTimestamp:unversioned.Time{Time:time.Time{sec:0, nsec:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*unversioned.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil)}, InvolvedObject:api.ObjectReference{Kind:"Node", Namespace:"", Name:"ip-10-0-0-92.ec2.internal", UID:"ip-10-0-0-92.ec2.internal", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"NodeHasSufficientDisk", Message:"Node ip-10-0-0-92.ec2.internal status is now: NodeHasSufficientDisk", Source:api.EventSource{Component:"kubelet", Host:"ip-10-0-0-92.ec2.internal"}, FirstTimestamp:unversioned.Time{Time:time.Time{sec:63604448947, nsec:0, loc:(*time.Location)(0x3b1a5c0)}}, LastTimestamp:unversioned.Time{Time:time.Time{sec:63604769134, nsec:388015022, loc:(*time.Location)(0x3b1a5c0)}}, Count:2, Type:"Normal"}': 'events "ip-10-0-0-92.ec2.internal.1462693ef85b56d8" not found' (will not retry!)
Jul 22 07:25:34 ip-10-0-0-92.ec2.internal kubelet-wrapper[1204]: E0722 07:25:34.560636 1204 event.go:193] Server rejected event '&api.Event{TypeMeta:unversioned.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:api.ObjectMeta{Name:"ip-10-0-0-92.ec2.internal.14626941554cc358", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"4687645", Generation:0, CreationTimestamp:unversioned.Time{Time:time.Time{sec:0, nsec:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*unversioned.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil)}, InvolvedObject:api.ObjectReference{Kind:"Node", Namespace:"", Name:"ip-10-0-0-92.ec2.internal", UID:"ip-10-0-0-92.ec2.internal", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"NodeReady", Message:"Node ip-10-0-0-92.ec2.internal status is now: NodeReady", Source:api.EventSource{Component:"kubelet", Host:"ip-10-0-0-92.ec2.internal"}, FirstTimestamp:unversioned.Time{Time:time.Time{sec:63604448957, nsec:0, loc:(*time.Location)(0x3b1a5c0)}}, LastTimestamp:unversioned.Time{Time:time.Time{sec:63604769134, nsec:388022975, loc:(*time.Location)(0x3b1a5c0)}}, Count:2, Type:"Normal"}': 'events "ip-10-0-0-92.ec2.internal.14626941554cc358" not found' (will not retry!)`
followed by what looks like some etcd errors:
`Jul 22 07:27:04 ip-10-0-0-92.ec2.internal rkt[1214]: 2016-07-22 07:27:04,721 [WARNING][1305/140149086452400] calico.etcddriver.driver 810: etcd watch returned bad HTTP status topoll on index 5237916: 400
Jul 22 07:27:04 ip-10-0-0-92.ec2.internal rkt[1214]: 2016-07-22 07:27:04,721 [ERROR][1305/140149086452400] calico.etcddriver.driver 852: Error from etcd for index 5237916: {u'errorCode': 401, u'index': 5239005, u'message': u'The event in requested index is outdated and cleared', u'cause': u'the requested history has been cleared [5238006/5237916]'}; triggering a resync.
Jul 22 07:27:04 ip-10-0-0-92.ec2.internal rkt[1214]: 2016-07-22 07:27:04,721 [INFO][1305/140149086452400] calico.etcddriver.driver 916: STAT: Final watcher etcd response time: 0 in 630.6s (0.000/s) min=0.000ms mean=0.000ms max=0.000ms
Jul 22 07:27:04 ip-10-0-0-92.ec2.internal rkt[1214]: 2016-07-22 07:27:04,721 [INFO][1305/140149086452400] calico.etcddriver.driver 916: STAT: Final watcher processing time: 7 in 630.6s (0.011/s) min=90066.312ms mean=90078.569ms max=90092.505ms
Jul 22 07:27:04 ip-10-0-0-92.ec2.internal rkt[1214]: 2016-07-22 07:27:04,721 [INFO][1305/140149086452400] calico.etcddriver.driver 919: Watcher thread finished. Signalled to resync thread. Was at index 5237916. Queue length is 1.
Jul 22 07:27:04 ip-10-0-0-92.ec2.internal rkt[1214]: 2016-07-22 07:27:04,743 [WARNING][1305/140149192694448] calico.etcddriver.driver 291: Watcher died; resyncing.`
and a few minutes later a large number of failed connections to the master (10.0.0.50):
`Jul 22 07:36:41 ip-10-0-0-92.ec2.internal rkt[1214]: 2016-07-22 07:36:37,641 [WARNING][1305/140149086452400] urllib3.connectionpool 647: Retrying (Retry(total=2, connect=None, read=None, redirect=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f7700b85b90>: Failed to establish a new connection: [Errno 113] Host is unreachable',)': http://10.0.0.50:2379/v2/keys/calico/v1?waitIndex=5239006&recursive=true&wait=true
Jul 22 07:36:41 ip-10-0-0-92.ec2.internal rkt[1214]: 2016-07-22 07:36:37,641 [INFO][1305/140149086452400] urllib3.connectionpool 213: Starting new HTTP connection (2): 10.0.0.50`
Although these errors are presumably related to the node/instance failure, they don't mean a lot to me, and they certainly don't point to the underlying cause. If anyone can see anything here that would suggest a possible cause of the node/instance failure (and how we can go about rectifying it), that would be greatly appreciated!
A couple of things in your description and logs confuse me: you say you use the Docker runtime, yet there is rkt in your logs; you say you use flannel in your cluster, yet there is calico in your logs...
Anyway, from the logs you provide it looks more like your etcd went down, which means kubelet and calico can't update their state, and the apiserver will regard them as down. There is not enough information here; I can only suggest that you capture etcd's logs the next time you see this.
Another suggestion: it's better not to use the same etcd for both the Kubernetes cluster and calico.
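As a rough aid for the next occurrence, here is a minimal sketch of an etcd reachability check you could run from the affected node; it assumes the etcd v2 endpoint that appears in the calico log lines (http://10.0.0.50:2379).

    # Quick etcd reachability/health probe (endpoint taken from the calico logs above).
    import requests

    ENDPOINT = "http://10.0.0.50:2379"

    def check_etcd(endpoint=ENDPOINT):
        try:
            resp = requests.get(endpoint + "/health", timeout=5)
            print("etcd /health:", resp.status_code, resp.text)
        except requests.RequestException as exc:
            print("etcd unreachable:", exc)

    if __name__ == "__main__":
        check_etcd()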
I connected to a remote Gremlin Server via the Gremlin Groovy shell. The connection succeeded, but every remote command I try to execute gives a timeout error, even the command :> 1+1.
gremlin> :remote connect tinkerpop.server conf/senthil.yaml
==>Connected - 10.40.40.65/10.40.40.65:50080
gremlin> :> 1+1
Host did not respond in a timely fashion - check the server status and submit again.
Display stack trace? [yN]
org.apache.tinkerpop.gremlin.groovy.plugin.RemoteException: Host did not respond in a timely fashion - check the server status and submit again.
at org.apache.tinkerpop.gremlin.console.groovy.plugin.DriverRemoteAcceptor.submit(DriverRemoteAcceptor.java:120)
at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:215)
at org.apache.tinkerpop.gremlin.console.commands.SubmitCommand.execute(SubmitCommand.groovy:41)
at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:215)
at org.codehaus.groovy.tools.shell.Shell.execute(Shell.groovy:101)
at org.codehaus.groovy.tools.shell.Groovysh.super$2$execute(Groovysh.groovy)
at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
This is my conf file: remote.yaml
hosts: [10.40.40.65]
port: 50080
serializer: { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { serializeResultToString: true }}
I'm using DynamoDB + Titan.
You might not have a truly successful connection. The console (and the underlying driver) is optimistic in that it doesn't really fail a connection until a request is sent, as it expects the server may come online "later". I would go back to investigating whether the server is running, whether you have the right IP, whether the server's host property is set to something other than "localhost" if you are connecting remotely, whether the port is open, whether you are using a compatible version of TinkerPop, etc.
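A quick sanity check along those lines, sketched in Python and assuming the host/port from the question (10.40.40.65:50080):

    # Basic TCP reachability check for the Gremlin Server address.
    import socket

    def port_open(host="10.40.40.65", port=50080, timeout=5):
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    print(port_open())  # False suggests a firewall, a wrong IP, or a server that isn't listening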
I am using openshift-django17 to bootstrap my application on OpenShift. Before I moved to Django 1.7, I was using the author's previous repository, openshift-django16, and I did not have the problem I will describe next. After running successfully for approximately 6 hours I get the following error:
Service Temporarily Unavailable

The server is temporarily unable to service your request due to maintenance downtime or capacity problems. Please try again later.
After I restart the application it works without any problem for a few hours, then I get this error again. The gears should never enter idle mode, as I am posting data every 5 minutes through a RESTful POST API from outside the app. I have run the rhc tail command and I think the problem lies in HAProxy:
==> app-root/logs/haproxy.log <==
[WARNING] 081/155915 (497777) : config : log format ignored for proxy 'express' since it has no log address.
[WARNING] 081/155915 (497777) : Server express/local-gear is DOWN, reason: Layer 4 connection problem, info: "Connection refused", check duration: 0ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[ALERT] 081/155915 (497777) : proxy 'express' has no server available!
[WARNING] 081/155948 (497777) : Server express/local-gear is UP, reason: Layer7 check passed, code: 200, info: "HTTP status check returned code 200", check duration: 11ms. 1 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
[WARNING] 081/170359 (127633) : config : log format ignored for proxy 'stats' since it has no log address.
[WARNING] 081/170359 (127633) : config : log format ignored for proxy 'express' since it has no log address.
[WARNING] 081/170359 (497777) : Stopping proxy stats in 0 ms.
[WARNING] 081/170359 (497777) : Stopping proxy express in 0 ms.
[WARNING] 081/170359 (497777) : Proxy stats stopped (FE: 1 conns, BE: 0 conns).
[WARNING] 081/170359 (497777) : Proxy express stopped (FE: 206 conns, BE: 312 co
I also run a CRON job once a day, but I am 99% sure it does not have anything to do with this. It looks like a problem on the OpenShift side, right? I have posted this issue on the GitHub repository of the author, who suggested I try Stack Overflow.
It turned out this was due to a bug in openshift-django17 that set DEBUG in settings.py to True even though it was specified as False in the environment variables (pull request for the fix here). The 503 Service Temporarily Unavailable error appeared because of OpenShift memory limit violations caused by DEBUG being turned on, as stated in the Django settings documentation for DEBUG:
It is also important to remember that when running with DEBUG turned on, Django will remember every SQL query it executes. This is useful when you’re debugging, but it’ll rapidly consume memory on a production server.
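For illustration only (not the actual pull request), a settings.py pattern that keeps DEBUG off unless an environment variable explicitly enables it might look like this:

    # Illustrative sketch: derive DEBUG from the environment instead of hard-coding it.
    import os

    DEBUG = os.environ.get("DEBUG", "False").lower() in ("true", "1", "yes")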
Where can I go look to find the source of a connection reset error? Here are the details:
I have a Clojure applet that uses clj-http.client.
I need to track down what is causing the following error:
Feb 14, 2013 5:16:04 PM org.apache.http.impl.client.DefaultRequestDirector execute
INFO: I/O exception (java.net.SocketException) caught when processing request: Connection reset
Feb 14, 2013 5:16:04 PM org.apache.http.impl.client.DefaultRequestDirector execute
INFO: Retrying request
We have looked through the server's IIS logs, and cannot find any error indicating a connection reset. We've also looked at the server's Event Logs, and cannot find an error that matches the error I'm getting in the client. As a matter of fact, the IIS logs look OK. I can see my address verification "GET" requests right in the log.
It's just a guess, but I often get that error message when the web server is configured to respond to a different host name. If it is serving www.example.com/my/service and I open a connection to 1.2.3.4/my/service, then it hangs up with "connection reset".
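A small diagnostic sketch in that spirit (Python used for illustration; the hostname and IP are the placeholders from above): hit the raw IP with and without the expected Host header and see whether only one of the two gets reset.

    # Compare responses with and without the expected Host header to spot a
    # virtual-host mismatch. Hostname/IP below are placeholders.
    import requests

    for headers in ({}, {"Host": "www.example.com"}):
        try:
            r = requests.get("http://1.2.3.4/my/service", headers=headers, timeout=10)
            print(headers, "->", r.status_code)
        except requests.RequestException as exc:
            print(headers, "->", exc)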