Game server - high latency - google-cloud-platform

I'm trying to host a Spigot Minecraft 1.12.2 server on Ubuntu. The server is set up and working properly, but the ping isn't great. I'm playing from India and the VM instance's region is set to Germany (Frankfurt), so I should be getting somewhere between 130 and 200 ms of latency, but it's always above 300 ms and sometimes even 1000 ms. I ran tracert from the Windows command prompt, and the packets seem to go to the USA first and then to Germany. I asked several friends to ping the server and they all get the same result. How can I fix this? Is there any way to route packets straight to Germany instead of going through the US first?
I made a new instance in the Mumbai region in India, which is where I live. I get a ping of 3 ms on the server selection menu, but it jumps to 200 ms once I join.
I expect around 130-160 ms, which is what I get on other servers in that region. Other players who live near Germany are getting high pings. I can't make this server public with a major issue like this.
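To see what the game connection itself experiences, rather than only the path tracert reports, one option is to time TCP handshakes to the server's port. The Python sketch below uses only the standard library; it assumes the server accepts TCP connections on the port you pass (25565 is Minecraft's default port), and the function names are illustrative, not part of any tool.

```python
import socket
import statistics
import time


def tcp_connect_rtts(host, port, samples=5, timeout=3.0):
    """Time `samples` TCP handshakes to host:port and return them in ms."""
    rtts = []
    for _ in range(samples):
        start = time.perf_counter()
        # create_connection returns once the TCP handshake has completed,
        # so the elapsed time is roughly one network round trip.
        # Pass an IP address so that a DNS lookup is not included.
        with socket.create_connection((host, port), timeout=timeout):
            pass
        rtts.append((time.perf_counter() - start) * 1000.0)
    return rtts


def summarize(rtts):
    """Return (minimum, median) RTT in ms; the minimum is least affected by queuing."""
    return min(rtts), statistics.median(rtts)
```

For example, `summarize(tcp_connect_rtts("203.0.113.10", 25565))` against a hypothetical server address returns the minimum and median handshake times in milliseconds. Running it against both the Frankfurt and the Mumbai instance gives a like-for-like comparison.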

Have a look at the network map on this page: https://cloud.google.com/about/locations/#network-tab
As you can see, Google's network has no direct link between Europe and India, so traffic has to take a detour around the other side of the world, through Asia and the US.
Within a region, however (for example Germany to Germany or India to India), you should get low latency.

You're probably experiencing this issue because of the instance's machine type and vCPU count.
As stated in the documentation:
"Outbound or egress traffic from a virtual machine is subject to maximum network egress throughput caps. These caps are dependent on the number of vCPUs that a virtual machine instance has. Each core is subject to a 2 Gbits/second (Gbps) cap for peak performance. Each additional core increases the network cap, up to a theoretical maximum of 16 Gbps for each virtual machine".
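The quoted rule amounts to min(2 × vCPUs, 16) Gbps. A minimal Python sketch of that arithmetic, assuming the limits exactly as quoted (the current documentation may list different figures):

```python
def egress_cap_gbps(vcpus, per_core_gbps=2.0, max_gbps=16.0):
    """Peak egress cap under the quoted rule: 2 Gbps per vCPU, capped at 16 Gbps."""
    if vcpus < 1:
        raise ValueError("an instance has at least one vCPU")
    return min(vcpus * per_core_gbps, max_gbps)


for n in (1, 2, 4, 8, 16):
    print(f"{n:>2} vCPU(s): {egress_cap_gbps(n):g} Gbps")
```

Keep in mind that the rule caps throughput (Gbps), which is a different quantity from round-trip latency (ms).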
Unfortunately, with so little information about your setup, I cannot help you further.
Please provide more information about your setup and your customers' needs.
For example, who will your customers be, and in which country? Is that why you're using a European region for your services even though you live in India?

Related

GCP Network Egress charges are growing exponentially

Our network egress charges are growing month on month. Going by the cost, we are egressing upwards of 800GB a month, around 300KB/s on average (600Kb/s daytime and 200kb/s night time).
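As a sanity check, a constant average rate can be converted into monthly volume and back. A Python sketch, assuming decimal units (1 KB = 1000 bytes, 1 GB = 10^9 bytes); if the bill is in GiB the figures shift by about 7%:

```python
SECONDS_PER_DAY = 86_400


def monthly_egress_gb(avg_kb_per_s, days=30):
    """Total egress in GB (10**9 bytes) for a constant average rate in KB/s (10**3 bytes)."""
    return avg_kb_per_s * 1_000 * SECONDS_PER_DAY * days / 1e9


def avg_rate_kb_per_s(monthly_gb, days=30):
    """Average rate in KB/s needed to send `monthly_gb` GB in `days` days."""
    return monthly_gb * 1e9 / (SECONDS_PER_DAY * days) / 1_000
```

With these definitions, 300 KB/s sustained for a 30-day month is about 777.6 GB and for a 31-day month about 803.5 GB, while 800 GB in 30 days needs an average of roughly 308.6 KB/s, so the two figures in the question are consistent with each other.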
I analyzed all the scripts that could be sending out data, but none of them sends data at this volume. I turned them off one by one, but it didn't make much difference.
I briefly turned on VPC logs, then downloaded and analyzed them. The traffic is distributed across many IPs: about 300 different IPs per minute, averaging 10-12kb each, so about 33Mb/min. No IP stood out.
I noticed that most of them use port 443.
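One way to get more out of the flow logs is to aggregate by port rather than by IP: bytes the VM sends from local port 443 are HTTPS responses it serves (which should appear in the Apache access log), while bytes sent to remote port 443 come from a process on the VM acting as an HTTPS client (which would not). The sketch below assumes each log entry has been flattened into a dict with hypothetical src_ip, src_port, dest_port and bytes_sent keys, and that vm_ip is the address the logs record for the VM; adapt it to the actual export format.

```python
from collections import Counter


def summarize_egress(records, vm_ip):
    """Split the VM's outbound bytes by its local port and by the remote port.

    Each record is assumed to be a flat dict with src_ip, src_port,
    dest_port and bytes_sent keys (a simplified, hypothetical shape).
    """
    by_local_port = Counter()   # 443 here: the VM answering HTTPS requests
    by_remote_port = Counter()  # 443 here: the VM making outbound HTTPS calls
    for rec in records:
        if rec["src_ip"] != vm_ip:
            continue  # only traffic leaving the VM counts as egress
        size = int(rec["bytes_sent"])
        by_local_port[rec["src_port"]] += size
        by_remote_port[rec["dest_port"]] += size
    return by_local_port, by_remote_port
```

If most of the bytes land under remote port 443, it is worth looking for a process making outbound connections rather than at Apache.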
When I use nethogs to identify the process doing most of the egress, it only shows Apache, and only at 50Kb/s. Where is the rest of the egress?
I considered the possibility of a DDoS attack, but that should show up in the Apache access logs, and they do not show any suspicious IPs or URLs.
I'm looking for hints on which direction to take. Apologies if I've left out any crucial detail you need to analyze the issue; I will keep adding details to the question.
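nethogs attributes traffic to processes; the kernel's per-interface counters show the total that actually left the network card, which makes a useful cross-check. A Linux-only Python sketch that reads /proc/net/dev twice and computes the average transmit rate; the interface name (for example ens4 or eth0) is an assumption to check with `ip link`:

```python
import time


def tx_bytes(proc_net_dev_text, iface):
    """Return the transmitted-bytes counter for `iface` from /proc/net/dev text."""
    # The first two lines are column headers; each following line is
    # "iface: 8 receive counters, then 8 transmit counters".
    for line in proc_net_dev_text.splitlines()[2:]:
        name, _, data = line.partition(":")
        if name.strip() == iface:
            return int(data.split()[8])  # 9th field = transmit bytes
    raise KeyError(iface)


def tx_rate_kb_per_s(before_text, after_text, iface, seconds):
    """Average egress rate in KB/s between two /proc/net/dev snapshots."""
    delta = tx_bytes(after_text, iface) - tx_bytes(before_text, iface)
    return delta / seconds / 1_000


def sample_tx_rate(iface, seconds=10.0):
    """Read /proc/net/dev twice, `seconds` apart (Linux only)."""
    with open("/proc/net/dev") as f:
        before = f.read()
    time.sleep(seconds)
    with open("/proc/net/dev") as f:
        after = f.read()
    return tx_rate_kb_per_s(before, after, iface, seconds)
```

`sample_tx_rate("ens4", 60)` returns KB/s over one minute. If that figure is close to the rate implied by the bill while nethogs shows far less, the remaining traffic is leaving the interface without nethogs attributing it to a process.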
If you suspect a DDoS attack has already happened, I would recommend using Cloud Armor, but before that, check whether you have followed all the recommended mitigations against DDoS attacks.
https://cloud.google.com/files/GCPDDoSprotection-04122016.pdf
What you're experiencing is most probably a DDoS attack, just as sankar wrote.
By your account, nothing in the logs stands out in particular, which makes the DDoS theory more probable.
Using Cloud Armor seems the easiest way to protect your server/app out of the box without much effort, since one of its key features is Adaptive Protection:
Google Cloud Armor Adaptive Protection helps you protect your Google Cloud applications, websites, and services against L7 distributed denial-of-service (DDoS) attacks such as HTTP floods and other high-frequency layer 7 (application-level) malicious activity.
Adaptive Protection builds machine-learning models that do the following:
Detect and alert on anomalous activity
Generate a signature describing the potential attack
Generate a custom Google Cloud Armor WAF rule to block the signature
This way you will be able to avoid most attacks of that kind and save money. Even though you pay for this feature, it should pay off financially, not to mention that your server will be a lot more secure and you can focus on other things.
UPDATE
There may be one more reason.
A rootkit typically patches the kernel or other software libraries to alter the behavior of the operating system. Once this is happening, you cannot trust anything that the operating system tells you.
As a result, typical tools won't show the traffic or any suspicious processes.
Have a look at the list of tools that may be helpful to detect any rootkits.

GCP Compute Engine limits download to 50 K/s?

For some reason, download traffic from a virtual machine on GCP (Google Cloud Platform) running Debian 9 is limited to 50K/s. Upload seems to be fine, in line with my local upload link.
It is the same with scp or HTTPS downloads. Any suggestions as to what might be wrong, or where to look?
Machine type: n1-standard-1 (1 vCPU, 3.75 GB memory)
CPU platform: Intel Skylake
Zone: europe-west4-a
Network interfaces: Premium tier
Thanks,
Mihaelus
Simple test:
wget https://hrcki.primasystems.si/Nova/assets/download.test.html
Output:
--2018-10-18 15:21:00--  https://hrcki.primasystems.si/Nova/assets/download.test.html
Resolving hrcki.primasystems.si (hrcki.primasystems.si)... 35.204.252.248
Connecting to hrcki.primasystems.si (hrcki.primasystems.si)|35.204.252.248|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 541422592 (516M) [text/html]
Saving to: `download.test.html.1'
0% [] 1,073,152 48.7K/s eta 2h 59m
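For scale, the rate wget reports can be converted into bits per second. A small Python sketch, assuming wget's K means 1024 bytes:

```python
def kib_per_s_to_mbit_per_s(kib_per_s):
    """Convert a rate in KiB/s (1024-byte units) to megabits per second."""
    return kib_per_s * 1024 * 8 / 1e6


def eta_seconds(total_bytes, done_bytes, kib_per_s):
    """Remaining transfer time at a constant rate."""
    return (total_bytes - done_bytes) / (kib_per_s * 1024)


eta = eta_seconds(541_422_592, 1_073_152, 48.7)
print(f"{kib_per_s_to_mbit_per_s(48.7):.2f} Mbit/s, ETA {eta / 3600:.2f} h")
```

This works out to about 0.4 Mbit/s and roughly 3 hours for the remaining data, in line with wget's own ETA.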
It's always good to minimize variables when diagnosing. So while it is unlikely that HTTP is the reason things are so slow, you might consider using netperf or iperf3 to measure TCP bulk-transfer performance between your VM in GCP and your local system. You can do that either by hand or via PerfKit Benchmarker: https://cloud.google.com/blog/products/networking/perfkit-benchmarker-for-evaluating-cloud-network-performance
It can be helpful to have packet traces to look at, from both ends when possible. Start the packet traces before the test: it is important to see the packets used to establish the TCP connection(s). They do not need to be full-packet traces, and often you don't want them to be; capturing just the first 96 bytes of each packet is sufficient for this sort of investigation.
You might also consider taking snapshots of the network statistics offered by the OSes on your GCP VM and your local system, for example, on *nix, a snapshot of "netstat -s" before and after the test, and perhaps a traceroute from each end towards the other.
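The idea behind a bulk-transfer test is simple: push a large amount of data through one TCP connection and divide bytes by elapsed time. The Python sketch below illustrates it with both ends on one machine; it is not a replacement for iperf3, where you would run `iperf3 -s` on one end and `iperf3 -c <server address>` on the other so that the measurement crosses the real network path. The function names are illustrative.

```python
import socket
import threading
import time

CHUNK = 64 * 1024


def _receive_all(server_sock, result):
    """Accept one connection and count bytes until the sender closes it."""
    conn, _ = server_sock.accept()
    total = 0
    with conn:
        while True:
            data = conn.recv(CHUNK)
            if not data:
                break
            total += len(data)
    result["bytes"] = total


def bulk_tcp_throughput(total_bytes, host="127.0.0.1"):
    """Push about `total_bytes` through one TCP connection; return (bytes, Mbit/s)."""
    with socket.create_server((host, 0)) as server:
        port = server.getsockname()[1]
        result = {}
        receiver = threading.Thread(target=_receive_all, args=(server, result))
        receiver.start()
        payload = b"\x00" * CHUNK
        start = time.perf_counter()
        with socket.create_connection((host, port)) as sender:
            sent = 0
            while sent < total_bytes:
                sender.sendall(payload)
                sent += len(payload)
        receiver.join()  # wait until every byte has been read
        elapsed = time.perf_counter() - start
    received = result["bytes"]
    return received, received * 8 / elapsed / 1e6
```

On loopback the result mostly reflects CPU and kernel overhead; splitting the sender and receiver across the VM and the local system is what would measure the path under investigation.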
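On Linux, the TCP counters that `netstat -s` reports are read from /proc/net/snmp, which alternates a header line with a value line for each protocol. The sketch below parses two saved snapshots and reports what changed, plus the share of segments retransmitted during the test; the function names are illustrative.

```python
def parse_proc_net_snmp(text):
    """Parse /proc/net/snmp-style text into {"Proto.Counter": value}.

    The file alternates a header line ("Tcp: RtoAlgorithm ...") with a
    value line ("Tcp: 1 200 ...") for each protocol.
    """
    counters = {}
    lines = text.strip().splitlines()
    for header, values in zip(lines[0::2], lines[1::2]):
        proto, names = header.split(":", 1)
        _, numbers = values.split(":", 1)
        for name, number in zip(names.split(), numbers.split()):
            counters[f"{proto}.{name}"] = int(number)
    return counters


def changed_counters(before, after):
    """Counters whose value changed between two snapshots, as deltas."""
    return {k: after[k] - before[k] for k in after if k in before and after[k] != before[k]}


def retransmit_percent(delta):
    """Retransmitted segments as a percentage of segments sent in the interval."""
    out = delta.get("Tcp.OutSegs", 0)
    return 100.0 * delta.get("Tcp.RetransSegs", 0) / out if out else 0.0
```

A retransmission share that climbs during the slow download points towards packet loss somewhere on the path, which the packet traces can then help locate.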
Network statistics and packet traces, along with as many details about the two endpoints as possible are among the sorts of things support organizations are likely to request when looking to help resolve an issue of this sort.

Websocket performance on AWS EC2

I have issues with websocket performance on AWS EC2.
I use websockets to listen to a server with an incoming network rate of 100-300 Kb/sec, just listening, not sending. On EC2, I get disconnected every 10-20 minutes (code 1006, abnormal connection loss, no reason given). I have tested with t2.micro (which I believe should be more than enough for such a small task) and t2.large. I use US East, which should be close to the source.
Compare this with only one disconnection every few hours when I run the same app on my personal computer, in a different country. I have used two different libraries (Python aiohttp and websockets) and see the same issue with both.
This points to an issue with network quality on EC2. However, I'm not sure this websockets task is demanding, so this is surprising.
Did anyone experience this before? What other diagnostics can I do to better understand the root cause?

What does amazon AWS mean by "network performance"?

When choosing an Amazon AWS instance type to launch, each type has a "Network Performance" property, which is either "Low", "Moderate", or "High".
I'm wondering what exactly this means. Will my ping be worse if I choose Low? Or will it be fine as long as not many users are logged in at once?
I'm launching a real-time multiplayer game, so I am curious what exactly is meant by "network performance". I actually need fairly little memory and processing power, but instances with those specs usually have "Low" network performance.
Does anyone have experience with the different network performance levels, or more information?
It's not official, but Serhiy Topchiy did a benchmark with different instance types:
http://epamcloud.blogspot.com.br/2013/03/testing-amazon-ec2-network-speed.html
For US-EAST-1, it seems that LOW corresponds to 50Mb/s, Moderate corresponds to 300Mb/s and High corresponds to 1Gb/s.
My recent experience here: https://serverfault.com/questions/1094608/benchmarking-aws-outbound-internet-bandwidth-egress-up-to-25-gbps
We ran a live video broadcast on two AWS EC2 servers, hosting 500 viewers, that degraded catastrophically after 10 minutes.
We reproduced the outbound bandwidth throttling with iperf (see link above).
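Under the unofficial tier figures from the benchmark above, the arithmetic for a broadcast is bandwidth = viewers × per-viewer bitrate. A Python sketch, assuming a hypothetical 2 Mbit/s stream and ignoring protocol overhead:

```python
# Unofficial US-EAST-1 figures from the 2013 benchmark quoted above, in Mbit/s.
TIER_MBIT_PER_S = {"Low": 50, "Moderate": 300, "High": 1000}


def required_mbit_per_s(viewers, stream_mbit_per_s):
    """Outbound bandwidth needed to send one stream to every viewer."""
    return viewers * stream_mbit_per_s


def max_viewers(tier, stream_mbit_per_s):
    """How many viewers a tier could serve before hitting its bandwidth."""
    return int(TIER_MBIT_PER_S[tier] // stream_mbit_per_s)


STREAM = 2.0  # hypothetical per-viewer bitrate in Mbit/s
for tier in TIER_MBIT_PER_S:
    print(f"{tier:>8}: ~{max_viewers(tier, STREAM)} viewers at {STREAM} Mbit/s")
print(f"500 viewers need ~{required_mbit_per_s(500, STREAM):.0f} Mbit/s in total")
```

At that assumed bitrate a Moderate instance would saturate at around 150 viewers, and 500 viewers need about 1 Gbit/s of egress in total, so for this kind of workload the bandwidth tier can matter more than CPU or memory.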
I believe it was mentioned at the re:Invent 2013 conference that the different levels relate to the underlying network connection: some servers have 10 Gb connections (High), some have 1 Gb (Moderate) and some have 100 Mb (Low).
I cannot find any on-line documentation to confirm this, however.
Edit: There is an interesting article on packet-per-second limits available here.
Since this question was first posed, AWS has released more information on the networking stack, and many of the newer instance families can support up to 25Gbps with the appropriate ENA drivers. It looks like much of the increased performance is due to the new Nitro system.

Winsock IOCP Server Stress Test Issue

I have a Winsock IOCP server written in C++ using TCP/IP connections. I have tested this server locally, using the loopback address with a client simulator, and have been able to get upwards of 60,000 clients with no problem. The issue is when I run the server at my house and the client simulator at a friend's house. Everything works fine until we hit around 3700 connections; after that, every call to connect() on the client side fails with 10060 (WSAETIMEDOUT, the Winsock timed-out error). Last night the number was 3700, but it has been around 300 before, and we also saw it near 1000. Whatever the number is, every time we run the simulation it fails right around that number (within 10 or so).
Both computers run Windows 7 Ultimate. We have both set the TCP/IP registry setting MaxTcpConnections to around 16 million. We also changed MaxUserPort from its default of 5000 to 65k. No useful information shows up in the Event Viewer. We both watched Resource Monitor as well: we haven't even reached 1% network utilization, and CPU usage is close to 0%.
We just got off the phone with our ISP. They say they are not limiting us in any way, but the representative seemed unsure and ended up hanging up on us anyway after a 30-minute hold.
We are trying everything to figure this issue out but cannot find a solution. I would be very grateful if someone could give us a hand.
P.S. Both computers are on Verizon FiOS with the same Verizon router. Also note that the server uses WSAAccept, NOT AcceptEx. The client simulator spreads its connection attempts over many seconds, though, so I am fairly sure the connects are not getting backlogged. We have tried changing the speed at which the client simulator connects, and no matter what speed we set, it fails around the same number each time.
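A client simulator only exercises a router's per-connection tracking if it keeps every connection open while opening the next one. The following Python sketch is a hypothetical stand-in for the C++ simulator described above, not the poster's code: it opens connections until one fails and reports how many were held at that point.

```python
import socket


def count_concurrent_connections(host, port, attempts, timeout=5.0):
    """Open up to `attempts` TCP connections and keep them all open.

    Returns how many were established before the first failure; holding
    the sockets open keeps every connection alive in any device on the path.
    """
    held = []
    try:
        for _ in range(attempts):
            try:
                held.append(socket.create_connection((host, port), timeout=timeout))
            except OSError as exc:  # includes timeouts (socket.timeout / TimeoutError)
                print(f"connection {len(held) + 1} failed: {exc!r}")
                break
        return len(held)
    finally:
        for sock in held:
            sock.close()
```

Running it from two machines behind the same router and seeing each stall at about half the single-machine total would be consistent with a shared per-router limit rather than a per-client one.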
UPDATE
We simulated 2 separate clients (on 2 separate machines) on network A, with the server running on network B. Each client was only able to open about half as many connections (about 1600) to the server. We were initially using a port below 1,000; this has been changed to one above 50,000. The router logs on both sides showed nothing. We are both using the Actiontec MI424WR Verizon FiOS router. This leads me to believe the problem is not with the client code. The server throws no errors and shows no unexpected behavior. Could this be an ISP/router issue?
UPDATE
The solution has been found. The Verizon router we were using (MI424WR revision C) cannot handle more than 3700 connections; we confirmed this with a separate set of networks. Thanks for the help, guys!
Thanks
- Rick
I would have guessed that this was a MaxUserPort issue, but you say you've changed that. Did you reboot after changing it?
Run the test on the exact same computers on your local network (this will take the computers out of the equation).
Could the issue be that one of your routers is not up to the job?