I have a working Django application that runs locally against an SQLite3 database without problems. However, when I change the Django database settings to use my external AWS RDS database, all my pages start taking upwards of 40 seconds to load. I have checked my AWS metrics and the instance is not even close to being fully utilized. I also get the same problem when I make a request to a view with no database read/write operations. My activity monitor shows my local CPU spiking with each request, with a process named 'WindowsServer' using most of the CPU.
I am aware that more latency is expected when using a remote database, but I don't think it should result in 40-second page lags. What other problems could be causing this behaviour?
[Screenshots: AWS database monitoring / local machine activity monitor]
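For reference, the settings change in question looks roughly like this (a minimal sketch, not my actual settings; the engine, endpoint and credentials are placeholders, and CONN_MAX_AGE is just one knob that matters once the database is remote, since it controls whether connections are reused between requests):

# settings.py (sketch) -- placeholder values, assuming a Postgres RDS instance
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "myapp",
        "USER": "myapp_user",
        "PASSWORD": "********",
        "HOST": "mydb.xxxxxxxx.us-east-1.rds.amazonaws.com",  # placeholder RDS endpoint
        "PORT": "5432",
        # Reuse connections between requests so each page load does not pay a
        # fresh TCP handshake to the remote database.
        "CONN_MAX_AGE": 60,
    }
}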
Your computer is connecting to the server in Amazon over the public internet, and that is where the latency comes from. Production servers should be in the same place as the DB servers (or at least have a very, very good connection to them, so the latency is as low as possible).
--edit--
So we need more details. What is your ISP? What are your connection properties? Uplink, downlink? What are your pings to the AWS servers?
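For example, a rough way to measure the round trip from the machine running Django to the database endpoint (a sketch; the hostname and port are placeholders):

# measure_rtt.py -- time a few TCP connects to the database endpoint
import socket
import time

HOST = "mydb.xxxxxxxx.us-east-1.rds.amazonaws.com"  # placeholder RDS endpoint
PORT = 5432

for _ in range(5):
    start = time.time()
    with socket.create_connection((HOST, PORT), timeout=10):
        pass  # connect and close; we only care about the handshake time
    print("TCP connect took %.1f ms" % ((time.time() - start) * 1000))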
We use a single-tenant architecture for our instances. Each instance contains three Django apps, i.e. Django, Celery (worker and beat), and a few other things that don't interact with the database. We deploy cloudsql-proxy as a sidecar for these Django containers, which run as a Pod in Google Kubernetes Engine. We are using CloudSQL (Postgres 9.6) from Google, and it has a public IP address.
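For context, Django talks to the proxy on localhost inside the Pod, roughly like this (a sketch with placeholder names and values, not our exact settings):

# settings.py (sketch) -- Django connects to the cloudsql-proxy sidecar on
# 127.0.0.1; the proxy then dials the CloudSQL instance on port 3307.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "app_db",        # placeholder
        "USER": "app_user",      # placeholder
        "PASSWORD": "********",  # placeholder
        "HOST": "127.0.0.1",     # the proxy listening inside the Pod
        "PORT": "5432",
        "OPTIONS": {"connect_timeout": 10},  # illustrative: fail fast instead of hanging
    }
}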
The problem is that we are getting OperationalErrors on the Django side, i.e.
OperationalError: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
and when we check the Pod's logs at the time the OperationalError occurred, we see the following error from the cloudsql-proxy container:
couldn't connect to db_instance: dial tcp our_db_instance_public_ip:3307: connect: connection timed out
It is not that the connection to the database doesn't work. It works most of the time, but sometimes it throws the above errors, which is a pain because we run Celery tasks every other minute and they fail because of this. Sometimes it occurs while an end user is interacting with our application and their requests fail.
Our application isn't under very high load. We set the maximum connections of our database to 1000, and the peak number of connections is around 35 (the sum of all instances' connections). I checked the database stats and it seems pretty happy, i.e. CPU utilization almost never goes above 50%, disk is 30% used, and memory usage is around 50%.
I can provide more details if needed. Would appreciate any help!
I'm hosting a Couchbase single-node cluster in GCP and a Flask backend on an OpenShift cluster that serves an Angular frontend. The problem is that when my Angular frontend makes a POST call to Flask, connecting to the VM (Couchbase) takes too long, and hence Flask has to return a "504 Gateway Time-out". But this happens only sometimes; other times it works very well at proper speed. I'm not able to troubleshoot it. The total data size is less than 100M, and everything is 100% memory-resident in Couchbase, so I guess this is not a problem with Couchbase, just the connection latency to GCP.
My guess is that the first time your Flask backend connects to your VM, it takes longer than usual because it needs to establish the connection, authenticate, and possibly do other things depending on your use case.
This is a common problem when hosting your app on App Engine or something similar, and the solution there is to use "warm-up requests". This basically spins up the whole connection (and, in App Engine's case, the instance) and makes a test connection, so that when the real connection comes, everything is already set up.
So I suggest that you check how warm-up requests work and configure something similar between your Flask app and the VM: basically a route in Flask whose only purpose is to establish a test connection with a test payload. This way your next connection will be up to speed with no 504 errors.
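A minimal sketch of what that could look like on the Flask side (get_couchbase_bucket() and the key name are placeholders for however your app currently obtains its Couchbase connection):

# warmup.py (sketch) -- a Flask route whose only job is to establish the connection
from flask import Flask, jsonify

app = Flask(__name__)

def get_couchbase_bucket():
    # Placeholder: return the already-configured Couchbase bucket/cluster object
    # used elsewhere in the app; connecting and authenticating happen here.
    raise NotImplementedError

@app.route("/_warmup")
def warmup():
    bucket = get_couchbase_bucket()
    try:
        # Cheap read of a made-up key just to force the connection open;
        # a missing-key error is fine here.
        bucket.get("warmup-probe")
    except Exception:
        pass
    return jsonify(status="warm")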
Try clearing the cache of the load balancer in the GCP console.
I already faced the same kind of issue and resolved it using this technique.
We have been maintaining a project internally which has both web and mobile application platforms. The backend of the project is developed in Django 1.9 (Python 3.4) and deployed on AWS.
The server stack consists of Nginx, Gunicorn, Django and PostgreSQL. We use a Redis-based cache server to serve resource-intensive heavy queries. Our AWS resources include:
t1.medium EC2 (2 core, 4 GB RAM)
PostgreSQL RDS with one additional read-replica.
Right now Gunicorn is set to create 5 workers (following the 2*n+1 rule). Load-wise, there are about 20-30 mobile users making requests every minute and 5-10 users checking the web panel every hour. So I would say, not very much load.
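For reference, the worker count comes from a config roughly like this (a sketch of a gunicorn.conf.py; the bind address and timeout are illustrative, not our exact values):

# gunicorn.conf.py (sketch) -- 2 cores -> 2*2+1 = 5 workers
import multiprocessing

workers = multiprocessing.cpu_count() * 2 + 1  # the 2*n+1 rule mentioned above
bind = "127.0.0.1:8000"  # Nginx proxies to this address (placeholder)
timeout = 30             # seconds before a silent worker is killed and restarted (illustrative)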
Now this setup works alright about 80% of the time. But when something goes wrong (for example, we detect a bug in the live system and have to switch off the server for maintenance for a few hours; in the meantime the mobile apps queue up requests, so when we bring the backend back up a lot of users hit the system at the same time), the server stops behaving normally and starts responding with 504 gateway timeout errors.
Surprisingly, every time this happened we found the server resources (CPU, memory) to be 70-80% free and the connection pools in the databases mostly idle.
Any idea where the problem is? How to debug? If you have already faced a similar issue, please share the fix.
Thank you,
I have a Django app without views; I only use it to provide a REST API using the django-piston package.
Since deploying it to Amazon EC2 with mod_wsgi, it freezes after some requests, and CPU usage goes to 100%, split between the python and httpd processes.
I'm using Postgres 8.4, Python 2.5 and Django 'ENGINE': 'django.contrib.gis.db.backends.postgis'.
Logs don't show me any problem. How can I debug the problem?
Sounds like you're on a micro instance. Micro instances are able to burst large amounts of CPU for a VERY short amount of time; after that they must drop to very low background levels for an extended duration, or else Amazon will harshly throttle them. If you're getting concurrent requests, even a lightly CPU-intensive app would most likely cause the throttling to kick in.
Micro instances are only usable for very, very light traffic on something like a very basic blog, and that's about it.
Their user guide goes into this in detail: Micro Instance guide.
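If you want to see the throttling from inside the instance, one rough check is the "steal" column in /proc/stat, which climbs when the hypervisor is withholding CPU from the guest (a Linux-only sketch):

# cpu_steal.py (sketch) -- rough check of hypervisor CPU steal on a Linux instance
import time

def read_cpu_times():
    with open("/proc/stat") as f:
        fields = f.readline().split()  # aggregate "cpu" line
    # user, nice, system, idle, iowait, irq, softirq, steal
    return [int(x) for x in fields[1:9]]

before = read_cpu_times()
time.sleep(5)
after = read_cpu_times()
deltas = [b - a for a, b in zip(before, after)]
total = sum(deltas)
steal_pct = 100.0 * deltas[7] / total if total else 0.0
print("CPU steal over the last 5 seconds: %.1f%%" % steal_pct)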
Let me first explain that I am very new when it comes to using AppFabric to improve the responsiveness of your application. I am trying to configure a server cluster with 2 nodes using the XML provider over a network shared location.
My requirement is that the cached data should be created on both hosts, so that if one of the hosts is down, the other host in the cluster can serve the request and provide the cached data. As I said, I have 2 hosts in my cluster and one of them is defined as the lead host. Now, when I save data in the cache, I cannot see the data on both hosts (I'm not sure whether there is any specific command that lets you see the data on a specific host). So what I want to test is this: I'll stop one of the cache hosts and see if I am still able to get the data from the second cache host.
thanks in advance
-Nitin
What you're talking about here is High Availability. To enable this, you'll need to be running Windows Server Enterprise Edition - if you're on Standard Edition then you just can't do it. You also really need a minimum of three hosts, so that if one goes down there are still two copies of your cached data to provide failover. If you can meet these requirements then the only extra step to create a highly-available cache is to set the Secondaries flag when you call new-cache e.g.
new-cache myHACache -Secondaries 1
There's no programmatic way to query what data is held on a specific host, because you only ever address the logical cache, not an individual physical host.
From our experience, using SQL authentication to the database does not work. It's clearly stated that only the Integrated Security option is supported. We also faced issues with the service running with "Integrated Security", since our SQL cluster was running under a domain account while AppFabric needs to run under "Network Service", and we couldn't successfully connect to the SQL cluster from the AppFabric service.
This was a painful experience for us, and I hope AppFabric caching improves the way it sends out error messages and error codes, and also lets us decide how we want to connect to SQL. It's kind of stupid having to undergo the pain of "has to run as Network Service" and "no SQL authentication".