504 Gateway Time-out from google cloud platform, but only sometimes - flask

I'm hosting a CouchBase single node cluster in GCP and a flask backend OpenShift cluster which supports angular frontend. The problem is, when a post is called in my flask by my angular, it is taking too much time to get connected the VM (couchbase) and hence flask has to return a "504 Gateway Time-out". But this happens only sometimes. Sometimes it just works very well with proper speed. Not able to troubleshoot. The total data size is less than 100M, and everything is 100% memory resident in Couchbase. So I guess this is not a problem with Couchbase. Just the connection latency to GCP.

My guess is that the first time your flask backend is trying to connect for the first time to your VM, it's taking more than usual as it needs to establish the connection, authenticate and possibly do other things depending on your use-case.
This is a common problem when hosting your app on App Engine or something similar and the solution there is to use "warm-up requests". This basically spins up the whole connection (and in app engine case the instance) and makes a test connection just so when the desired connection comes, everything is already set up.
So I suggest that you check how warm-up requests work and configure something similar between your flask and VM. So basically a route in flask with the only purpose of establishing a test connections with a test package. This way your next connection will be up to speed with no 504 errors.

try to clear to cache of load balancer in GCP console
I already faced same kind of issue and resolved it using above technique

Related

Django extremely slow page loads when using remote database

I have a working Django application that is running locally using an sqlite3 database without problem. However, when I change the Django database settings to use my external AWS RDS database all my pages start taking upwards of 40 seconds to load. I have checked my AWS metrics and my instance is not even close to being fully utilized. When I make a request to a view with no database read/write operations I also get the same problem. My activity monitor shows my local CPU spiking with each request. It shows a process named 'WindowsServer' using most of the CPU during each request.
I am aware more latency is expected when using a remote database but I don't think this should result in 40 second page lags. What other problems that could be causing this behaviour?
AWS database monitoring
Local machine
So your computer has connection to the server in Amazon, that's the problem with latency. Production servers should be in the same place as DB servers(or should have very very good connection, so the latency is lowered as much as possible.)
--edit--
So we need more details. What is your ISP? What is your connection properties? Uplink, downlink? What are pings to servers in AWS?

504 gateway timeout for any requests to Nginx with lot of free resources

We have been maintaining a project internally which has both web and mobile application platform. The backend of the project is developed in Django 1.9 (Python 3.4) and deployed in AWS.
The server stack consists of Nginx, Gunicorn, Django and PostgreSQL. We use Redis based cache server to serve resource intensive heavy queries. Our AWS resources include:
t1.medium EC2 (2 core, 4 GB RAM)
PostgreSQL RDS with one additional read-replica.
Right now Gunicorn is set to create 5 workers (by following the 2*n+1 rule). Load wise, there are like 20-30 mobile users making requests in every minute and there are 5-10 users checking the web panel every hour. So I would say, not very much load.
Now this setup works alright for 80% days. But when something goes wrong (for example, we detect a bug in the live system and we had to switch off the server for maintenance for few hours. In the mean time, the mobile apps have a queue of requests ready in their app. So when we make the backend live, a lot of users hit the system at the same time.), the server stops behaving normally and started responding with 504 gateway timeout error.
Surprisingly every time this happened, we found the server resources (CPU, Memory) to be free by 70-80% and the connection pool in the databases are mostly free.
Any idea where the problem is? How to debug? If you have already faced a similar issue, please share the fix.
Thank you,

jetty 404 error page on hot deployment

I am currently using Jetty 9.1.4 on Windows.
When I deploy the war file without hot deployment config, and then restart the Jetty service. During that 5-10 seconds starting process, all client connections to my Jetty server are waiting for the server to finish loading. Then clients will be able to view the contents.
Now with hot deployment config on, the default Jetty 404 error page shows within that 5-10 second loading interval.
Is there anyway I can make the hot deployment has the same behavior as the complete restart - clients connections will wait instead seeing the 404 error page ?
Unfortunately this does not seem to be possible currently after talking with the Jetty developers on IRC #jetty.
One solution I will try to use are two Jetty instances with a loadbalancing reverse proxy (e.g. nginx) before them and taking one instance down for deployment.
Of course this will instantly lead to new requirements (session persistence/sharing) which need to be handled. So in conclusion: much work to do in the Java world for zero downtime on deployments.
Edit: I will try this, seems like a simple enough solution http://rafaelsteil.com/zero-downtime-deploy-script-for-jetty/ Github: https://github.com/rafaelsteil/jetty-zero-downtime-deploy

Move to 2 Django physical servers (front and backend) from a single production server?

I currently have a growing Django production server that has all of the front end and backend services running on it. I could keep growing that server larger and larger, but instead I want to try and leave that main server as my backend server and create multiple front end servers that would run apache/nginx and remotely connect to the main production backend server.
I'm using slicehost now, so I don't think I can benefit from having the multiple servers run on an intranet. How do I do this?
The first step in scaling your server is usually to separate the database server. I'm assuming this is all you meant by "backend services", unless you give us any more details.
All this needs is a change to your settings file. Change DATABASE_HOST from localhost to the new IP of your database server.
If your site is heavy on static content, creating a separate media server could help. You may even look into a CDN.
The first step usually is to separate the server running actual Python code and the database server. Any background jobs that does processing would probably run on the database server. I assume that when you say front end server, you actually mean a server running Python code.
Now, as every request will have to do a number of database queries, latency between the webserver and the database server is very important. I don't know if Slicehost has some feature to allow you to create two virtual machines that are "close" in terms of network latency(a quick google search did not find anything). They seem like nice guys, so maybe you could ask them if they have such a service or could make an exception.
Anyway, when you do have two machines on Slicehost, you could check the latency between them by simply pinging between them. When you have the result you will probably know if this is at all feasible or not.
Further steps depends on your application. If it is media heavy, then maybe using a separate media server would make sense. Otherwise the normal step is to add more web servers.
--
As a side note, I personally think it makes more sense to invest in real dedicated servers with dedicated network equipment for this kind of setup. This of course depends on what budget you are on.
I would also suggest looking into Amazon EC2 where you can provision servers that are magically close to each other.

BizTalkServerIsolatedHost disappeared from one server in multi-server group

Afternoon all,
We have a group of four BizTalk servers: two orchestration hosts and two adapter hosts. We have a number of orchestrations exposed as web services, and for the purposes of this question, it is important to note that these web services are hosted on the adapter servers, and run under the BizTalkServerIsolatedHost host instance.
This morning, we started seeing odd errors on both of the adapter servers when SOAP calls came into the web services, like this:
The Messaging Engine failed to
register the adapter for “SOAP” for
the receive location blahblahblah.
Please verify that the receive
location exists, and that the isolated
adapter runs under an account that has
access to the BizTalk databases.
We restarted IIS on both servers, which fixed the errors on ONE server, but the other server continued to fail. The errors continued after a reboot as well.
After chasing our tails for a while, we eventually discovered that the BizTalkServerIsolatedHost host instance on the still-failing server was gone. Just... gone. These applications have been in production for months. Everything had been working swimmingly through the morning, until this just happened.
I don't want to muddy the waters, because I think the problems are unrelated, but in the interest of providing enough information, this problem exactly coincided with a problem in our load-balancing network hardware. The load balancer, which provides a single URL to consumers, and round-robins between the two adapter servers, just stopped working. This problem has not been resolved, so I don't know what happened, but it certainly made troubleshooting more interesting...
So, I have two questions:
Has anyone seen this before, where a host instance disappears?
We cannot find anything in the event viewer or anywhere else that says the host instance was deleted. Is this logged somewhere?
Thanks,
Jason