Remote (localhost) connection and performance of AWS RDS is very slow - amazon-web-services

I am experiencing the same issue that many others have had (based on my research), but none of them seems to have found a solution.
I am connecting from localhost to my AWS database (server located in Dublin - eu-west-1) and I am located in Denmark.
From my investigation, I concluded that none of the following is the culprit:
The internet connection I am using is 1GB/s, so it is definitely not the problem!
Bandwidth download - 500 GB, upload - 150 GB.
Max connections to database is 312.
Latency shouldn't be a problem, as everything is within the same continent.
I referred to this post, but none of the posted solutions fits my case: AWS RDS painfully slow when connecting from local machine
My queries are taking way too long (40+ seconds).
What could be causing it?!
EDIT:
I currently am using PDO.
I have tested it with ODBC connection, same problem there.
If you are thinking that the amount of data is affecting it, that's wrong. I am fetching the same amount of data from the server-hosted website at lightning speed.

Related

Django extremely slow page loads when using remote database

I have a working Django application that is running locally using an sqlite3 database without problem. However, when I change the Django database settings to use my external AWS RDS database all my pages start taking upwards of 40 seconds to load. I have checked my AWS metrics and my instance is not even close to being fully utilized. When I make a request to a view with no database read/write operations I also get the same problem. My activity monitor shows my local CPU spiking with each request. It shows a process named 'WindowsServer' using most of the CPU during each request.
I am aware more latency is expected when using a remote database, but I don't think this should result in 40-second page lags. What other problems could be causing this behaviour?
AWS database monitoring
Local machine
So your computer connects to a server at Amazon, and latency is the problem. Production servers should be in the same place as the DB servers (or should have a very good connection to them, so that latency is as low as possible).
--edit--
So we need more details. What is your ISP? What are your connection properties (uplink, downlink)? What are the ping times to servers in AWS?
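One way to put a number on the latency asked about above is to time the TCP handshake to the database endpoint from the local machine. The following is a minimal sketch, not something from the original answer: the function name `tcp_connect_times` is made up for illustration, and the host and port are whatever your own database endpoint uses.

```python
import socket
import time


def tcp_connect_times(host, port, attempts=5, timeout=5.0):
    """Return the time in seconds taken by each TCP connect to host:port."""
    samples = []
    for _ in range(attempts):
        start = time.perf_counter()
        # create_connection resolves the name and completes the TCP handshake,
        # so each sample includes DNS lookup time as well as network latency.
        with socket.create_connection((host, port), timeout=timeout):
            samples.append(time.perf_counter() - start)
    return samples
```

Calling it with the RDS hostname and the database port returns one sample per attempt. Connect times of a few tens of milliseconds would suggest that raw latency alone does not explain 40-second page loads, although many sequential round trips per page can still add up.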

GCP Compute Engine limits download to 50 K/s?

For some reason, download traffic from a virtual machine on GCP (Google Cloud Platform) running Debian 9 is limited to 50 K/s. Upload seems to be fine, in line with my local upload link.
It is the same with scp or https download. Any suggestions as to what might be wrong or where to look?
Machine type: n1-standard-1 (1 vCPU, 3.75 GB memory)
CPU platform: Intel Skylake
Zone: europe-west4-a
Network interfaces: Premium tier
Thanks,
Mihaelus
Simple test:
wget https://hrcki.primasystems.si/Nova/assets/download.test.html
Output:
--2018-10-18 15:21:00-- https://hrcki.primasystems.si/Nova/assets/download.test.html
Resolving hrcki.primasystems.si (hrcki.primasystems.si)... 35.204.252.248
Connecting to hrcki.primasystems.si (hrcki.primasystems.si)|35.204.252.248|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 541422592 (516M) [text/html]
Saving to: `download.test.html.1'
0% [] 1,073,152 48.7K/s eta 2h 59m
It is always good to minimize variables when trying to diagnose. So while it is unlikely that the use of HTTP is why things are that slow, you might consider using netperf or iperf3 to measure TCP bulk transfer performance between your VM in GCP and your local system. You can do that either "by hand" or via PerfKit Benchmarker https://cloud.google.com/blog/products/networking/perfkit-benchmarker-for-evaluating-cloud-network-performance
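If neither netperf nor iperf3 is at hand, the idea of a TCP bulk transfer test can be approximated with plain sockets. This is only a rough sketch and not a replacement for those tools: the helper names `run_sink` and `measure_send_rate` are invented here, the sink would run on one machine and the sender on the other, and the port is whatever you open between them.

```python
import socket
import time


def run_sink(server_sock):
    """Accept one connection and discard everything it sends."""
    conn, _ = server_sock.accept()
    with conn:
        while conn.recv(65536):
            pass  # read until the sender signals end of data


def measure_send_rate(host, port, total_bytes=8 * 1024 * 1024):
    """Send at least total_bytes over one TCP connection; return bytes/second."""
    chunk = b"\0" * 65536
    sent = 0
    start = time.perf_counter()
    with socket.create_connection((host, port)) as sock:
        while sent < total_bytes:
            sock.sendall(chunk)
            sent += len(chunk)
        sock.shutdown(socket.SHUT_WR)  # tell the sink no more data is coming
        sock.recv(1)  # returns b"" once the sink has read everything and closed
    elapsed = time.perf_counter() - start
    return sent / elapsed
```

Because it uses a bare TCP connection, a result close to 50 K/s here would point away from HTTP or scp as the cause, which is the variable the answer suggests eliminating.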
It can be helpful to have packet traces - from both ends when possible - to look at. You want the packet traces to be started before the test - it is important to see the packets used to establish the TCP connection(s). They do not need to be "full packet" traces, and often you don't want them to be. Capturing just the first 96 bytes of each packet would be sufficient for this sort of investigation.
You might also consider taking snapshots of the network statistics offered by the OSes running in your GCP VM and local system. For example, if running *nix, take a snapshot of "netstat -s" before and after the test. And perhaps run a traceroute from each end towards the other.
Network statistics and packet traces, along with as many details about the two endpoints as possible are among the sorts of things support organizations are likely to request when looking to help resolve an issue of this sort.

504 gateway timeout for any requests to Nginx with lot of free resources

We have been maintaining a project internally which has both web and mobile application platform. The backend of the project is developed in Django 1.9 (Python 3.4) and deployed in AWS.
The server stack consists of Nginx, Gunicorn, Django and PostgreSQL. We use Redis based cache server to serve resource intensive heavy queries. Our AWS resources include:
t1.medium EC2 (2 core, 4 GB RAM)
PostgreSQL RDS with one additional read-replica.
Right now Gunicorn is set to create 5 workers (following the 2*n+1 rule). Load-wise, there are about 20-30 mobile users making requests every minute and 5-10 users checking the web panel every hour. So I would say, not very much load.
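For reference, the 2*n+1 rule mentioned above works out as follows for the 2-core instance. This is a small illustrative sketch: `recommended_workers` is a hypothetical helper, while `workers` is the setting name Gunicorn reads from a Python config file.

```python
import multiprocessing


def recommended_workers(cores=None):
    """Apply the (2 * cores) + 1 rule of thumb for the Gunicorn worker count."""
    if cores is None:
        cores = multiprocessing.cpu_count()
    return 2 * cores + 1


# In a gunicorn.conf.py file, `workers` sets the number of worker processes.
workers = recommended_workers(2)  # 2 cores, as on the EC2 instance described
```

With 2 cores this gives the 5 workers described in the question. With synchronous workers, that also means at most 5 requests are being processed at any moment, which is worth keeping in mind when a burst of queued requests arrives.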
Now this setup works fine on about 80% of days. But when something goes wrong (for example, we detect a bug in the live system and have to switch off the server for maintenance for a few hours; in the meantime, the mobile apps build up a queue of pending requests, so when we bring the backend back up, a lot of users hit the system at the same time), the server stops behaving normally and starts responding with 504 gateway timeout errors.
Surprisingly, every time this happened, we found the server resources (CPU, memory) to be 70-80% free and the database connection pools mostly free.
Any idea where the problem is? How to debug? If you have already faced a similar issue, please share the fix.
Thank you,

Websocket performance on AWS EC2

I have issues with websocket performance on AWS EC2.
I use websockets to listen to a server with incoming network rate 100-300 Kb/sec. Just listening, not sending. On EC2, every 10-20 minutes, I get disconnected (code 1006 - abnormal connection loss - no reason given). I have tested with t2.micro (which I believe should be more than enough for such a small task) and t2.large. I use US East, which should be close to the source.
This is to be compared with only one disconnection every few hours when I run the same app on my personal computer, in a different country. I have used two different libraries (Python aiohttp and websockets) to confirm that I have the same issues.
This points to an issue with network quality on EC2. However I'm not sure if this websockets task is demanding, so this is surprising.
Did anyone experience this before? What other diagnostics can I do to better understand the root cause?

Strange apache lag in requests

I have an Apache2 and Django (mod_wsgi) setup that provides a RESTful API. I have a set of automated tests for this that execute ~1000 API requests (pure HTTP GET/POST/PUT/DELETE) in sequential order.
The problem is, for every 80 requests or so, I get a strange lag/timeout for exactly 5s or 10s. See timestamp examples here:
Request 1: 2013-08-30T03:49:20.915
Response 1: 2013-08-30T03:49:30.940
Request 2: 2013-08-30T03:50:32.559
Response 2: 2013-08-30T03:50:37.597
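To make the size of each lag explicit, the elapsed time for the two request/response pairs above can be computed directly from the timestamps. This is just an illustrative calculation; the name `elapsed_seconds` is made up here.

```python
from datetime import datetime

# The two request/response timestamp pairs quoted above.
pairs = [
    ("2013-08-30T03:49:20.915", "2013-08-30T03:49:30.940"),
    ("2013-08-30T03:50:32.559", "2013-08-30T03:50:37.597"),
]


def elapsed_seconds(request_ts, response_ts):
    """Return the time between a request and its response, in seconds."""
    fmt = "%Y-%m-%dT%H:%M:%S.%f"
    delta = datetime.strptime(response_ts, fmt) - datetime.strptime(request_ts, fmt)
    return delta.total_seconds()


delays = [elapsed_seconds(req, resp) for req, resp in pairs]
# delays is approximately [10.025, 5.038]
```

The two delays come out at roughly 10.025 s and 5.038 s, which would be consistent with a fixed 10 s or 5 s wait plus a few tens of milliseconds of ordinary request handling.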
I can't figure out why this happens. I have an apache config with KeepAlive Off (recommended setup setting for Django) but otherwise standard install for Ubuntu 12.04 LTS.
I'm running the tests from the same server where the webserver is. I first thought this was some kind of DNS cache thing, so I added the hostname I'm requesting to /etc/hosts, but the problem persists.
The system is idle and has plenty of free CPU and memory when these lags/timeouts happen.
The lag is not specific to a certain request (URL), it seems kinda random.
Considering that it's always exactly to the millisecond 5s or 10s, it feels like this is some specific setting somewhere causing this.
In case it provides some insight, watch my talk from PyCon US.
http://lanyrd.com/2013/pycon/scdyzk/
The talk deals with things like process churn and startup costs. One thing you shouldn't do is set maximum requests if you don't really need it.
Also consider trying New Relic to help diagnose where the issue is. That will save a lot of guessing about whether it is a web application or backend service infrastructure issue.
As far as seeing how such monitoring can help, watch another one of my PyCon talks.
http://lanyrd.com/2012/pycon/spcdg/
This was a DNS issue; adding the domain name I used locally to /etc/hosts actually solved the problem. I just hadn't rebooted the server for the changes to take effect. I thought restarting networking would take care of that, but apparently not.
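A quick way to check for this kind of problem is to time name resolution on its own, separately from the HTTP request. The sketch below is an illustration only: `time_name_lookup` is a hypothetical helper, and the hostname to pass is whichever name the tests request.

```python
import socket
import time


def time_name_lookup(hostname, port=80):
    """Return how long one getaddrinfo() lookup of hostname takes, in seconds."""
    start = time.perf_counter()
    socket.getaddrinfo(hostname, port)
    return time.perf_counter() - start
```

A name that resolves from /etc/hosts should typically come back in well under a millisecond, whereas lookups that occasionally take about 5 s or 10 s would suggest resolver timeouts and retries, since 5 seconds is a common default resolver timeout on Linux.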