What does Amazon AWS mean by "network performance"?

When choosing an Amazon AWS instance type to launch, each type has a "Network Performance" property that is either "Low", "Moderate", or "High".
I'm wondering what exactly this means. Will my ping suffer if I choose Low? Or will it be fine as long as not many users are logged in at once?
I'm launching a real-time multiplayer game, so I am curious as to exactly what is meant by "network performance". I actually need fairly low memory and processing power, but instances matching those criteria usually have "Low" network performance.
Does anyone have experience with the different network performance tiers, or more information?

It's not official, but Serhiy Topchiy did a benchmark with different instance types:
http://epamcloud.blogspot.com.br/2013/03/testing-amazon-ec2-network-speed.html
For US-EAST-1, it seems that Low corresponds to about 50 Mb/s, Moderate to about 300 Mb/s, and High to about 1 Gb/s.
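If you want to reproduce a benchmark like that yourself, a minimal sketch with iperf3 between two instances in the same region might look like this (the receiver's IP is a placeholder):

# On the receiving instance: start iperf3 in server mode
iperf3 -s

# On the sending instance: run a 30-second TCP throughput test
# (10.0.0.5 is a placeholder for the receiver's private IP)
iperf3 -c 10.0.0.5 -t 30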

My recent experience here: https://serverfault.com/questions/1094608/benchmarking-aws-outbound-internet-bandwidth-egress-up-to-25-gbps
We ran a live video broadcast for 500 viewers on two AWS EC2 servers; it degraded catastrophically after 10 minutes.
We reproduced the outbound bandwidth throttling with iperf (see link above).

I believe it was mentioned at the re:Invent 2013 conference that the different tiers correspond to the underlying network connection: some hosts have 10 Gbit/s links (High), some have 1 Gbit/s (Moderate), and some have 100 Mbit/s (Low).
I cannot find any online documentation to confirm this, however.
Edit: There is an interesting article on the packets-per-second limit available here.
Since this question was first posed, AWS has released more information on the networking stack, and many of the newer instance families can support up to 25 Gbps with the appropriate ENA drivers. Much of the increased performance appears to come from the new Nitro system.
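As a rough sketch, you can check whether ENA is actually in use on an instance like this (the interface name and instance ID are placeholders):

# Inside the instance: confirm the ENA module is present and bound to the NIC
modinfo ena
ethtool -i eth0

# From the AWS CLI: check the instance's enaSupport attribute
# (the instance ID is a placeholder)
aws ec2 describe-instances --instance-ids i-0123456789abcdef0 \
  --query "Reservations[].Instances[].EnaSupport"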

Related

What will happen if my virtual machine is too slow?

I have a newbie question here; I'm new to cloud and Linux. I'm using Google Cloud now and wondering, when choosing a machine configuration:
What if my machine is too slow? Will it make the app crash, or just slow it down?
How fast should my VM be? The graph below shows the last 6 hours of CPU usage for a Python script I'm running. It obviously stays under 2% CPU most of the time, but there is a small spike. Should I care about the spike? Also, what should my maximum CPU usage be before I upgrade? If a script I'm running uses 50-60% of the CPU most of the time, I assume I'm safe, but what's the maximum before you upgrade?
What if my machine is too slow? Will it make the app crash, or just slow it down?
It depends.
Some applications will just respond more slowly. Some will fail if they have timeout restrictions. Some will begin to thrash, which means the app suddenly becomes very, very slow.
A general rule, which varies among architects, is to never consume more than 80% of any resource. I use a 50% rule so that my service can handle burst traffic or denial-of-service attempts.
Based on your graph, your service is fine. The spike is probably normal system processing. If the spike went to 100%, I would be concerned.
Once your service consumes more than 50% of a resource (CPU, memory, disk I/O, etc.), it is time to upgrade that resource.
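If you want numbers rather than eyeballing the console graph, a quick sketch from inside the VM (assuming a Debian-based image; mpstat comes from the sysstat package):

# Install the sampling tools (Debian/Ubuntu)
sudo apt-get install -y sysstat

# Average CPU utilization sampled every 5 seconds, 12 times (one minute total)
mpstat 5 12

# Live per-process view, sorted by CPU, to see what causes spikes
top -o %CPU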
Also, consider that there are other services you might want to add. Examples are load balancers, Cloud Storage, CDNs, and firewalls such as Cloud Armor. Those services tend to offload requirements from your service and make it more resilient, available, and performant. The biggest plus is that your service is usually faster for the end user. Some of those services are so cheap that I almost always deploy them.
You should choose a machine family based on your needs. Check the link below for details and recommendations.
https://cloud.google.com/compute/docs/machine-types
If CPU is your concern, you should create a managed instance group that automatically scales based on CPU usage, as shown in the sketch after the link below. Usually 80-85% is a good maximum CPU target. Check the link below for details.
https://cloud.google.com/compute/docs/autoscaler/scaling-cpu
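As a sketch, setting that up with gcloud might look like the following (the group name, template, zone, and sizes are placeholders):

# Create a managed instance group from an existing instance template
gcloud compute instance-groups managed create my-mig \
  --template=my-template --size=1 --zone=us-central1-a

# Autoscale on CPU: add replicas once average utilization passes 80%
gcloud compute instance-groups managed set-autoscaling my-mig \
  --zone=us-central1-a \
  --max-num-replicas=5 \
  --target-cpu-utilization=0.80 \
  --cool-down-period=90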
You should also consider the availability your workload needs, to keep costs efficient. See the link below for other useful info.
https://cloud.google.com/compute/docs/choose-compute-deployment-option

GCP Network Egress charges are growing exponentially

Our network egress charges are growing month over month. Going by the cost, we are egressing upwards of 800 GB a month, around 300 KB/s on average (600 KB/s in the daytime and 200 KB/s at night).
I analyzed all the scripts that could be sending out data, but none of them sends data at this volume. I turned them off one by one, but it didn't make much difference.
I briefly turned on VPC flow logs, then downloaded and analyzed them. The traffic is spread across IPs: about 300 different IPs per minute, averaging 10-12 KB each, so about 33 Mb/min. No IPs stood out.
I noticed most of them use port 443.
When I used nethogs to identify the process doing most of the egress, it only showed Apache, at only 50 KB/s. Where is the rest of the egress?
I considered the possibility of a DDoS attack, but that should show up in the Apache access logs, and they show no suspicious IPs or URLs.
Looking for hints on the direction I should take. Apologies if I am missing any crucial detail you need to analyze the issue; I will keep adding more details to the question.
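One way to cross-check nethogs is to measure at the interface level, independent of per-process accounting; a rough sketch (eth0 is an assumption; check with ip link):

# Bytes sent at the interface level over one minute
cat /sys/class/net/eth0/statistics/tx_bytes
sleep 60
cat /sys/class/net/eth0/statistics/tx_bytes
# The difference between the two readings is bytes egressed in that minute.

# Capture a one-minute sample of outbound packets for offline analysis
# (direction filtering with -Q needs a reasonably recent tcpdump)
sudo timeout 60 tcpdump -i eth0 -Q out -w egress-sample.pcap

If the interface counters agree with the billed rate but nethogs does not, the gap points at traffic that nethogs cannot attribute to a process.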
If you suspect a DDoS attack has already happened, I would recommend using Cloud Armor, but before that, check whether you have followed all the mitigations to avoid DDoS attacks.
https://cloud.google.com/files/GCPDDoSprotection-04122016.pdf
What you're experiencing is most probably a DDoS attack, just as sankar wrote.
Since, according to you, nothing stands out in the logs, the DDoS theory becomes more probable.
Using Cloud Armor seems the easiest way to protect your server/app out of the box without too much effort, since one of its key features is Adaptive Protection:
Google Cloud Armor Adaptive Protection helps you protect your Google Cloud applications, websites, and services against L7 distributed denial-of-service (DDoS) attacks such as HTTP floods and other high-frequency layer 7 (application-level) malicious activity. Adaptive Protection builds machine-learning models that do the following:
Detect and alert on anomalous activity
Generate a signature describing the potential attack
Generate a custom Google Cloud Armor WAF rule to block the signature
This way you will be able to avoid most attacks of that kind and save money. Even though you pay for this feature, it should be beneficial in terms of money, not to mention that your server will be a lot more secure and you can focus on other things.
UPDATE:
There may be one more reason.
A rootkit typically patches the kernel or other software libraries to alter the behavior of the operating system. Once this is happening, you cannot trust anything that the operating system tells you.
That way, typical tools won't show the traffic or any suspicious processes.
Have a look at the available tools that may help detect rootkits.
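For example, two commonly used scanners (package names assume a Debian-based system):

# Install and run two common rootkit scanners (Debian/Ubuntu)
sudo apt-get install -y rkhunter chkrootkit
sudo rkhunter --check --skip-keypress
sudo chkrootkit
# Note: scanners running on a compromised host can themselves be deceived;
# scanning from clean boot media is more reliable.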

How to reduce the bandwidth cost on an (Open)VPN server?

I have a general question. I made a VPN accessible from mobile platforms; it's hosted on AWS and everything works fine.
The only problem is that it costs a lot of money (the Amazon bill is over $1000), and not because of CPU or memory usage, which are negligible, but because of bandwidth usage.
For our DevOps friends: it's not that I need an orchestrator like Kubernetes; it works the same way whether 300 people are connected or only 1, and everything works fine on a single instance (only the bill changes).
Is there any way to reduce the cost? Did I set up the VPN the correct way? How can I reduce the bandwidth per user?
I run hosted OpenVPN servers on Amazon on an Ubuntu 18.04 machine, and the clients are mobile applications.
You could probably save a bit of bandwidth (between the client and server) by enabling compression:
--compress [algorithm]
Enable a compression algorithm. The algorithm parameter may be "lzo", "lz4", or empty. LZO and LZ4 are different compression algorithms, with LZ4 generally offering the best performance with least CPU usage. For backwards compatibility with OpenVPN versions before v2.4, use "lzo" (which is identical to the older option "--comp-lzo yes").
https://openvpn.net/community-resources/reference-manual-for-openvpn-2-4/
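As a minimal sketch, assuming OpenVPN 2.4+ on both server and clients, the server config change might look like:

# /etc/openvpn/server.conf (OpenVPN 2.4+)
compress lz4          # enable LZ4 compression on the server side
push "compress lz4"   # push the same setting to connecting clients

Note that compression only helps with compressible payloads; traffic that is already encrypted or compressed (video, HTTPS) will see little benefit.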

AWS EC2 Performance explanation

I have a REST API web server, built in .NET Core, that serves data-heavy APIs.
It is hosted on AWS EC2. I have noticed that the average response time for certain APIs is ~4 seconds, and if I turn up the EC2 specs, the response time goes down to a few milliseconds. I guess this is expected. What I don't understand is that even when I load test the APIs on a lower-end instance, the server never crosses 50% utilization of memory or CPU. So what is the correct technical explanation for the APIs performing faster when the lower-end instance never reaches 100% memory/CPU utilization?
There is no simple answer; there are so many EC2 variations that you first need to figure out what is slowing down your API.
When you 'turn up' your EC2 instance, you are getting some combination of more memory, faster CPU, faster disk, and more network bandwidth, and we can't tell which of those 'more' features is improving your performance. Different instance classes are optimized for different problems.
It could be as simple as the better network bandwidth, or it could be that your application is disk-bound and the better instance you chose is optimized for I/O performance.
Knowing which resource your instance is short on would help you decide which instance type to upgrade to; or, as you have found out, just upgrade to something 'bigger' and be happy with the performance (at the tradeoff of being more expensive).
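As a starting point, sampling the usual suspects during a load test might look like this (assuming a Linux instance; iostat comes from the sysstat package):

# While the load test runs, sample system-wide stats every 2 seconds:
vmstat 2      # high 'wa' suggests I/O wait; 'si'/'so' activity suggests swapping
iostat -x 2   # high '%util' or 'await' on a device suggests a disk bottleneck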

GCP Compute Engine limits download to 50 KB/s?

For some reason, download traffic from a virtual machine on GCP (Google Cloud Platform) with Debian 9 is limited to 50 KB/s. Upload seems to be fine, in line with my local upload link.
It is the same with scp or HTTPS download. Any suggestions as to what might be wrong, or where to look?
Machine type: n1-standard-1 (1 vCPU, 3.75 GB memory)
CPU platform: Intel Skylake
Zone: europe-west4-a
Network interfaces: Premium tier
Simple test:
wget https://hrcki.primasystems.si/Nova/assets/download.test.html
Output:
--2018-10-18 15:21:00--  https://hrcki.primasystems.si/Nova/assets/download.test.html
Resolving hrcki.primasystems.si (hrcki.primasystems.si)... 35.204.252.248
Connecting to hrcki.primasystems.si (hrcki.primasystems.si)|35.204.252.248|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 541422592 (516M) [text/html]
Saving to: 'download.test.html.1'
 0% [                                  ] 1,073,152   48.7K/s  eta 2h 59m
It is always good to minimize variables when trying to diagnose. So while it is unlikely that the use of HTTP is why things are this slow, you might consider using netperf or iperf3 to measure TCP bulk-transfer performance between your VM in GCP and your local system. You can do that either by hand or via PerfKit Benchmarker: https://cloud.google.com/blog/products/networking/perfkit-benchmarker-for-evaluating-cloud-network-performance
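By hand, a basic netperf TCP_STREAM run might look like this (the VM's external IP is a placeholder):

# On the GCP VM: start the netperf server daemon
netserver

# On the local system: run a 30-second TCP bulk-transfer test toward the VM
# (replace 203.0.113.10 with the VM's external IP)
netperf -H 203.0.113.10 -l 30 -t TCP_STREAM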
It can be helpful to have packet traces, from both ends when possible, to look at. You want the packet traces to be started before the test; it is important to see the packets used to establish the TCP connection(s). They do not need to be "full packet" traces, and often you don't want them to be. Capturing just the first 96 bytes of each packet would be sufficient for this sort of investigation.
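A sketch of such a capture (the interface and file names are placeholders):

# Capture only the first 96 bytes of each packet, started before the test
sudo tcpdump -i eth0 -s 96 -w transfer-trace.pcap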
You might also consider taking snapshots of the network statistics offered by the OSes running in your GCP VM and your local system. For example, if running *nix, take a snapshot of "netstat -s" output before and after the test. And perhaps a traceroute from each end towards the other.
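For example, on a *nix system:

# Snapshot network statistics before and after the test, then compare
netstat -s > stats-before.txt
# ... run the transfer test ...
netstat -s > stats-after.txt
diff stats-before.txt stats-after.txt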
Network statistics and packet traces, along with as many details about the two endpoints as possible, are among the sorts of things support organizations are likely to request when helping resolve an issue of this sort.