OpenShift HAProxy issues - Django

I am using openshift-django17 to bootstrap my application on OpenShift. Before I moved to Django 1.7, I was using the author's previous repository, openshift-django16, and I did not have the problem I am about to describe. After running successfully for approximately 6 hours I get the following error:
Service Temporarily Unavailable
The server is temporarily unable to service your request due to maintenance downtime or capacity problems. Please try again later.
After I restart the application it works without any problem for some hours, then I get this error again. The gears should never enter idle mode, as I am posting some data every 5 minutes through a RESTful POST API from outside of the app. I have run the rhc tail command and I think the error lies in HAProxy:
==> app-root/logs/haproxy.log <==
[WARNING] 081/155915 (497777) : config : log format ignored for proxy 'express' since it has no log address.
[WARNING] 081/155915 (497777) : Server express/local-gear is DOWN, reason: Layer 4 connection problem, info: "Connection refused", check duration: 0ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[ALERT] 081/155915 (497777) : proxy 'express' has no server available!
[WARNING] 081/155948 (497777) : Server express/local-gear is UP, reason: Layer7 check passed, code: 200, info: "HTTP status check returned code 200", check duration: 11ms. 1 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
[WARNING] 081/170359 (127633) : config : log format ignored for proxy 'stats' since it has no log address.
[WARNING] 081/170359 (127633) : config : log format ignored for proxy 'express' since it has no log address.
[WARNING] 081/170359 (497777) : Stopping proxy stats in 0 ms.
[WARNING] 081/170359 (497777) : Stopping proxy express in 0 ms.
[WARNING] 081/170359 (497777) : Proxy stats stopped (FE: 1 conns, BE: 0 conns).
[WARNING] 081/170359 (497777) : Proxy express stopped (FE: 206 conns, BE: 312 co
I also run a cron job once a day, but I am 99% sure it does not have anything to do with this. It looks like a problem on the OpenShift side, right? I have posted this issue on the GitHub repository of the author, who suggested I try Stack Overflow.

It turned out this was due to a bug in openshift-django17 that set DEBUG in settings.py to True even though it was specified as False in the environment variables (pull request for the fix here). The reason 503 Service Temporarily Unavailable appeared was OpenShift memory limit violations caused by DEBUG being turned on, as stated in the Django settings documentation for DEBUG:
It is also important to remember that when running with DEBUG turned on, Django will remember every SQL query it executes. This is useful when you’re debugging, but it’ll rapidly consume memory on a production server.
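The usual pitfall behind this kind of bug is that environment variables arrive as strings, so the string "False" is still truthy. A minimal sketch of the kind of check that avoids it in settings.py (the actual fix in the pull request may differ; the default value here is an assumption):
import os

# Environment variables are strings, so bool('False') would still be True;
# compare against the expected literal instead.
DEBUG = os.environ.get('DEBUG', 'False').lower() == 'true'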

Related

Cloud Run finishes but Cloud Scheduler thinks that job has failed

I have a Cloud Run service set up and a Cloud Scheduler task that calls an endpoint on that service. When the task completes (the HTTP handler returns), I'm seeing the following error:
The request failed because the HTTP connection to the instance had an error.
However, the actual handler returns HTTP 200 and exits successfully. Does anyone know what this error means and under what circumstances it shows up?
I'm also attaching a screenshot of the logs.
Does your job take longer than 120 seconds? I was having the same issue and figured out that Node versions prior to 13 have a 120-second server.timeout limit. I installed Node 13 in my Docker image and the problem is gone.
Error 503 is returned by the Google Frontend (GFE). The Cloud Run service either has a transient issue, or the GFE has determined that your service is not ready or not working correctly.
In your log entries, I see a POST request. 7 ms later is the error 503. This tells me your Cloud Run application is not yet ready (in a ready state determined by Cloud Run).
One minute, 8 seconds before, I see ReplaceService. This tells me that your service is not yet in a running state and that if you retry later, you will see success.
I've run an incremental sleep test on my Flask endpoint, which returns 200 after 1 min, 2 min and 10 min of waiting time. Having triggered the endpoint via Cloud Scheduler, the job failed only in the 10 min test. I found that one of the properties of my Cloud Scheduler job was causing the failure. The following solved my issue.
gcloud scheduler jobs describe <my_test_scheduler>
There, you'll see a property called 'attemptDeadline' which was set to 180 seconds by default.
You can update that property using:
gcloud scheduler jobs update http <my_test_scheduler> --attempt-deadline 1000s
Ref: scheduler update
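If you prefer to make the same change programmatically, here is a sketch using the Python client (this assumes the google-cloud-scheduler package is installed; the project, location and job names below are placeholders):
from google.cloud import scheduler_v1
from google.protobuf import duration_pb2, field_mask_pb2

client = scheduler_v1.CloudSchedulerClient()
# Fully qualified job name: projects/<project>/locations/<location>/jobs/<job>
job = client.get_job(name="projects/my-project/locations/us-central1/jobs/my_test_scheduler")

# Raise the 180s default so longer-running handlers are not reported as failed.
job.attempt_deadline = duration_pb2.Duration(seconds=1000)
client.update_job(job=job, update_mask=field_mask_pb2.FieldMask(paths=["attempt_deadline"]))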

Web Deploy Failed Publish

When I publish from Visual Studio it fails with the message "Publish Failed".
The output:
Start Web Deploy Publish the Application/package to https:// -myStite-...
D:\VisualStudio\MSBuild\Microsoft\VisualStudio\v15.0\Web\Microsoft.Web.Publishing.targets(4292,5): Error : Web deployment task failed. ((30/10/2017 08:19:07) An error occurred when the request was processed on the remote computer.)
(30/10/2017 08:19:07) An error occurred when the request was processed on the remote computer.
Value cannot be null.
at System.Version.Parse(String input)
at System.Version..ctor(String version)
at Microsoft.Web.Deployment.DeploymentAgentWorkerRequest.get_MaximumSupportedVersion()
at Microsoft.Web.Deployment.DeploymentAgent.HandleClientServerVersionMismatch(DeploymentAgentWorkerRequest workerRequest)
at Microsoft.Web.Deployment.DeploymentAgent.HandleRequestWorker(DeploymentAgentAsyncData asyncData)
at Microsoft.Web.Deployment.DeploymentAgent.HandleRequest(DeploymentAgentAsyncData asyncData)
at Microsoft.Web.Deployment.DeploymentAgent.BeginProcessRequest(DeploymentAgentWorkerRequest workerRequest, AsyncCallback callback, Object extraData)
Publish failed to deploy.
========= Build: 1 succeeded, 0 failed, 0 up-to-date, 0 skipped ==========
========= Publish: 0 succeeded, 1 failed, 0 skipped ==========
In the Event Viewer on the server, I see this exception:
User: -my user-
Client IP: 10.0.0.138
Content-Type:
Version: 9.0.0.0
MSDeploy.VersionMin:
MSDeploy.VersionMax:
MSDeploy.Method:
MSDeploy.RequestId:
MSDeploy.RequestCulture:
MSDeploy.RequestUICulture:
ServerVersion: 9.0.1955.0
A tracing deployment agent exception occurred that was propagated to the client. Request ID ''. Request Timestamp: '30/10/2017 08:02:11'. Error Details:
System.ArgumentNullException: Value cannot be null.
Parameter name: input
at System.Version.Parse(String input)
at System.Version..ctor(String version)
at Microsoft.Web.Deployment.DeploymentAgentWorkerRequest.get_MaximumSupportedVersion()
at Microsoft.Web.Deployment.DeploymentAgent.HandleClientServerVersionMismatch(DeploymentAgentWorkerRequest workerRequest)
at Microsoft.Web.Deployment.DeploymentAgent.HandleRequestWorker(DeploymentAgentAsyncData asyncData)
at Microsoft.Web.Deployment.DeploymentAgent.HandleRequest(DeploymentAgentAsyncData asyncData)
In the past it worked very well (same saved configuration), but now it sometimes fails.
I had the exact same problem and finally solved it after hours of experimenting. The problem was being caused by my ISP blocking/interfering with just that type of traffic. Everything else seemed to work on the ISP, but not publishing my website from Visual Studio (MsDeploy). Fortunately I have other ISPs providing broadband at my home so I was able to connect to a different ISP immediately and then publish successfully.
The problem occurred on the mobile broadband service provider Three.co.uk. Other broadband providers that I tried have not had this problem.
It was the following discussion (I translated it from Chinese) which made me realise that my ISP was causing the issue.
https://t.codebug.vip/questions-453963.htm
There is a post there from someone with the exact same problem, who has a web server behind CloudFlare, and that is what was causing the problem for them. I am guessing that CloudFlare was put there by their ISP.
I also found another message from someone who tracked the problem down to their ISP:
https://social.msdn.microsoft.com/Forums/en-US/7b345fc9-5e3d-4015-99a9-26746ca3d84f/socket-error-10054-when-deploying-web-app-to-azure-via-visual-studio?forum=windowsazurewebsitespreview
Running the publish from behind a VPN solved the issue for them.
So just try publishing via a different ISP or from behind a VPN. If you only have one fixed line broadband provider, then perhaps try tethering to your phone's broadband provider and publish over that connection.

Route53 Domain Transfer - Registry error - 2400 : Command failed (421 SESSION TIMEOUT)

I am trying to transfer a domain using Route53 and after a few minutes I receive an email with the following error.
Registry error - 2400 : Command failed (421 SESSION TIMEOUT)
Anyone have any ideas what this means or how to get around it?
I have never seen your error. There is a document on transferring domains with error messages. The reason I am responding is that I have seen domain transfers to Route 53 fail without ever learning why they failed. Maybe this will help you.
NSI Registry Registrar Protocol (RRP)
421 Command failed due to server error. Client should try again
A transient server error has caused RRP command failure. A subsequent retry may produce successful results.

Spark - Remote Akka Client Disassociated

I am setting up Spark 0.9 on AWS and am finding that when I launch the interactive PySpark shell, my executors / remote workers are first registered:
14/07/08 22:48:05 INFO cluster.SparkDeploySchedulerBackend: Registered executor:
Actor[akka.tcp://sparkExecutor@ip-xx-xx-xxx-xxx.ec2.internal:54110/user/Executor#-862786598] with ID 0
and then disassociated almost immediately, before I have the chance to run anything:
14/07/08 22:48:05 INFO cluster.SparkDeploySchedulerBackend: Executor 0 disconnected,
so removing it
14/07/08 22:48:05 ERROR scheduler.TaskSchedulerImpl: Lost an executor 0 (already
removed): remote Akka client disassociated
Any idea what might be wrong? I've tried adjusting the JVM options spark.akka.frameSize and spark.akka.timeout, but I'm pretty sure this is not the issue since (1) I'm not running anything to begin with, and (2) my executors are disconnecting a few seconds after startup, which is well within the default 100s timeout.
Thanks!
Jack
I had a very similar problem, if not the same.
It started to work for me once the workers were connecting to the master using the very same name the master thought it had.
My log messages were something like:
ERROR remote.EndpointWriter: AssociationError [akka.tcp://sparkWorker@idc1-hrm1.heylinux.com:7078] -> [akka.tcp://sparkMaster@vagrant-centos64.vagrantup.com:7077]: Error [Association failed with [akka.tcp://sparkMaster@vagrant-centos64.vagrantup.com:7077]].
ERROR remote.EndpointWriter: AssociationError [akka.tcp://sparkWorker@192.168.121.127:7078] -> [akka.tcp://sparkMaster@idc1-hrm1.heylinux.com:7077]: Error [Association failed with [akka.tcp://sparkMaster@idc1-hrm1.heylinux.com:7077]]
WARN util.Utils: Your hostname, idc1-hrm1 resolves to a loopback address: 127.0.0.1; using 192.168.121.187 instead (on interface eth0)
So check the log of the master and see what name it thinks it has.
Then use that very same name on the workers.
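For example, if the master's log shows it registered itself as idc1-hrm1.heylinux.com (that hostname is taken from the log lines above and is only illustrative), point the driver and the workers at exactly that URL. A minimal driver-side PySpark sketch:
from pyspark import SparkConf, SparkContext

# Use the exact hostname the master logged for itself, not an alias or IP.
conf = SparkConf().setMaster("spark://idc1-hrm1.heylinux.com:7077").setAppName("connect-test")
sc = SparkContext(conf=conf)
print(sc.parallelize(range(10)).sum())  # a simple action to confirm executors stay registered
sc.stop()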

How to set VMware vApp runtime lease to NEVER EXPIRE via the Java API

How can I set a vApp's runtime lease to NEVER EXPIRE when deploying the vApp?
I'm using the VMware vCloud Java API.
When I deploy the vApp I use this code:
_vapp.deploy(false, 1000000, false).waitForTask(0);
The second parameter affects the runtime lease. I've tried 0 or 1 but it had no effect, and I got an error on deployment. How can I set this to NEVER EXPIRE?
Setting the value to 0 will result in an infinite lease assuming your org's lease settings allow such a lease.