How to reduce the bandwidth cost on an (Open)VPN server? - amazon-web-services

I have a general question. I built a VPN that is accessible from mobile platforms; it's hosted on AWS and everything works fine.
The only problem is that it costs a lot of money (the Amazon bill is over $1000), and not because of CPU or memory usage, which are negligible, but because of bandwidth usage.
For our DevOps friends: it's not that I need an orchestrator like Kubernetes, because the setup behaves the same whether 300 people are connected or only 1; everything works fine on a single instance (only the bill changes).
Is there any way to reduce the cost? Did I set up the VPN correctly? How can I reduce the bandwidth per user?
The OpenVPN servers are hosted on AWS (Ubuntu 18.04) and the clients are mobile applications.

You could probably save a little bit of bandwidth (between the client and server) by enabling compression:
--compress [algorithm]
Enable a compression algorithm. The algorithm parameter may be "lzo",
"lz4", or empty. LZO and LZ4 are different compression algorithms,
with LZ4 generally offering the best performance with least CPU usage.
For backwards compatibility with OpenVPN versions before v2.4, use
"lzo" (which is identical to the older option "--comp-lzo yes").
https://openvpn.net/community-resources/reference-manual-for-openvpn-2-4/
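For illustration, enabling LZ4 on both sides could look like this (a minimal sketch, assuming OpenVPN 2.4+ on the server and on all mobile clients; the file paths are just examples):

# /etc/openvpn/server.conf (example path)
compress lz4
push "compress lz4"

# client .ovpn profile
compress lz4

Keep in mind that traffic which is already encrypted or compressed (HTTPS, video, etc.) compresses poorly, so the savings may be modest.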

Related

What will happen if my virtual machine is too slow?

I have a newbie question here; I'm new to cloud and Linux. I'm using Google Cloud now and I'm wondering, when choosing a machine configuration:
What if my machine is too slow? Will it make the app crash, or just slow it down?
How fast should my VM be? The image below shows
the last 6 hours of CPU usage for a Python script I'm running. It is obviously using less than 2% of the CPU most of the time, but there is a small spike. Should I care about the spike? Also, how high should my CPU usage get before I upgrade? If a script I'm running is using 50-60% of the CPU most of the time, I assume I'm safe, but what's the maximum before you upgrade?
What if my machine is too slow? Will it make the app crash, or just slow it down?
It depends.
Some applications will just respond more slowly. Some will fail if they have timeout restrictions. Some applications will begin to thrash, which means the app suddenly becomes very, very slow.
A general rule, which varies among architects, is to never consume more than 80% of any resource. I use the rule 50% so that my service can handle burst traffic or denial of service attempts.
Based on your graph, your service is fine. The spike is probably normal system processing. If the spike went to 100%, I would be concerned.
Once your service consumes more than 50% of a resource (CPU, memory, disk I/O, etc.), it is time to upgrade that resource.
Also, consider that there are other services that you might want to add. Examples are load balancers, Cloud Storage, CDNs, firewalls such as Cloud Armor, etc. Those types of services tend to offload requirements from your service and make your service more resilient, available and performant. The biggest plus is your service is usually faster for the end user. Some of those services are so cheap, that I almost always deploy them.
You should choose a machine family based on your needs. Check the link below for details and recommendations.
https://cloud.google.com/compute/docs/machine-types
If CPU is your concern, you should create a managed instance group that automatically scales based on CPU usage. Usually 80-85% is a good target for maximum CPU utilization. Check the link below for details.
https://cloud.google.com/compute/docs/autoscaler/scaling-cpu
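For example, a minimal sketch of CPU-based autoscaling with gcloud (the instance group name, zone, and limits here are assumptions, not values from the question):

# Hypothetical managed instance group "my-instance-group"
gcloud compute instance-groups managed set-autoscaling my-instance-group \
  --zone us-central1-a \
  --min-num-replicas 1 \
  --max-num-replicas 5 \
  --target-cpu-utilization 0.80 \
  --cool-down-period 90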
You should also consider the availability needed for your workload to keep costs efficient. See below link for other useful info.
https://cloud.google.com/compute/docs/choose-compute-deployment-option

GCP Network Egress charges are growing exponentially

Our network egress charges are growing month over month. Going by the cost, we are egressing upwards of 800 GB in a month, to the tune of 300 KB/s on average (600 KB/s during the day and 200 KB/s at night).
I analyzed all the scripts that could be sending out data, but none of them sends data at this volume. I turned them off one by one, but it didn't make much difference.
I temporarily turned on VPC flow logs, then downloaded and analyzed them. The traffic is distributed across IPs: about 300 different IPs in a minute, averaging 10-12 KB each, so about 33 Mb/min. No IPs stood out.
I noticed most of them were using port 443.
When I used nethogs to identify the process doing most of the egress, it only showed Apache, and only at about 50 KB/s. Where is the rest of the egress?
I mooted the possibility of a DDoS attack, but that should show up in the Apache access logs, and they do not show any suspicious IP/URL.
I am looking for hints on the direction I should take. Apologies if I am missing any crucial detail you need to analyze the issue; I will keep adding more details to the question.
If you suspect that a DDoS attack has already happened, I would recommend using Cloud Armor, but before that, check whether you have followed all the mitigations to avoid DDoS attacks.
https://cloud.google.com/files/GCPDDoSprotection-04122016.pdf
What you're experiencing is most probably a DDoS attack, just as sankar wrote.
According to you, nothing in particular stands out in the logs, which makes the DDoS theory more probable.
Using Cloud Armor seems like the easiest way to protect your server/app out of the box without too much effort, since one of its key features is Adaptive Protection:
Google Cloud Armor Adaptive Protection helps you protect your Google
Cloud applications, websites, and services against L7 distributed
denial-of-service (DDoS) attacks such as HTTP floods and other
high-frequency layer 7 (application-level) malicious activity.
Adaptive Protection builds machine-learning models that do the
following:
Detect and alert on anomalous activity
Generate a signature describing the potential attack
Generate a custom Google Cloud Armor WAF rule to block the signature
This way you will be able to avoid most attacks of that kind and save money. Even though you pay for this feature, it should still work out cheaper overall, not to mention that your server will be a lot more secure and you can focus on other things.
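As a rough sketch of what that setup can look like with gcloud (the policy and backend service names below are made up; adjust them to your own resources):

# Hypothetical names: my-policy, my-backend-service
gcloud compute security-policies create my-policy
gcloud compute security-policies update my-policy --enable-layer7-ddos-defense
gcloud compute backend-services update my-backend-service \
  --security-policy my-policy --global

Note that Cloud Armor policies attach to a load-balanced backend service, so a VM serving traffic directly would first need to sit behind a Google Cloud load balancer.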
---------- U P D A T E --------------
There may be one more reason.
A rootkit typically patches the kernel or other software libraries to alter the behavior of the operating system. Once that happens, you cannot trust anything the operating system tells you.
That is why typical tools won't show the traffic or any suspicious processes.
Have a look at the list of tools that may be helpful for detecting rootkits.
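As a generic illustration (the answer does not name specific tools; this assumes a Debian/Ubuntu host), two commonly used scanners can be run like this:

# Install and run two common rootkit scanners
sudo apt-get install rkhunter chkrootkit
sudo rkhunter --check --sk   # --sk skips the interactive "press enter" prompts
sudo chkrootkit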

AWS EC2 Performance explanation

I have a REST API web server, built in .NET Core, that has data-heavy APIs.
This is hosted on AWS EC2. I have noticed that the average response time for certain APIs is ~4 seconds, and if I turn up the EC2 specs, the response time goes down to a few milliseconds. I guess this is expected. What I don't understand is that even when I load test the APIs on a lower-end instance, the server never crosses 50% memory/CPU utilization. So what is the correct technical explanation for the APIs performing faster, if the lower-end instance never reaches 100% memory/CPU utilization?
There is no simple answer; there are so many EC2 variations that you first need to figure out what is slowing down your API.
When you 'turn up' your EC2 instance, you are getting some combination of more memory, faster CPU, faster disk and more network bandwidth, and we can't tell which of those 'more' features is improving your performance. Different instance classes are optimized for different problems.
It could be as simple as the better network bandwidth, or it could be that your application is disk-bound and the better instance you chose is optimized for I/O performance.
Figuring out which resource your instance is lacking would help you decide which type of instance to upgrade to; or, as you have found out, just upgrade to something 'bigger' and be happy with the performance (at the tradeoff of it being more expensive).
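As a rough way to narrow this down on the instance itself (assuming the sysstat package is installed for iostat/sar), watch CPU, disk, and network while repeating your load test:

# Overall CPU and memory pressure
top
# Per-device disk utilization and I/O wait, refreshed every second
iostat -x 1
# Per-interface network throughput, refreshed every second
sar -n DEV 1

Whichever metric saturates first on the small instance points at the resource the bigger instance is giving you more of.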

Cloud Redis latency causes (vs. local redis on macbook pro)

Redis can give sub-millisecond response times. That's a great promise. I'm testing Heroku Redis and I get 1 ms up to about 8 ms for a zincrby. I'm using microtime() in PHP to wrap the call. This Heroku Redis (I'm using the free plan) is a shared instance and there is resource contention, so I expect response times for identical queries to vary, and they certainly do.
I'm curious about the cause of the difference in performance vs. Redis installed on my MacBook Pro via Homebrew. There's obviously no network latency there. What I'm curious about is: does this mean that any cloud Redis (i.e. connecting over the network, say within AWS) is always going to be quite a bit slower than having one cloud server and running Redis on the same physical machine, thus eliminating network latency?
There is also resource contention in these cloud offerings, unless a private server is chosen, which costs a lot more.
Some numbers: my local MacBook Pro consistently gives 0.2 ms for the identical zincrby that takes between 1 ms and 8 ms on the Heroku Redis.
Is network latency the cause of this?
No, probably not.
The typical latency of a 1 Gbit/s network is about 200 µs. That's 0.2 ms.
What's more, in AWS you're probably on 10 Gbit/s at least.
As this page in the Redis manual explains, the main cause of the latency variation between these two environments will almost certainly be the higher intrinsic latency (there's a Redis command to test this on any particular system: redis-cli --intrinsic-latency 100, see the manual page above) that comes from running in a Linux container.
i.e., network latency is not the dominant cause of the variation seen here.
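For reference, the intrinsic-latency test mentioned above is run on the machine that hosts redis-server; the trailing number is just the test duration in seconds:

# Measure the baseline latency introduced by the host/container itself
redis-cli --intrinsic-latency 100

The manual suggests comparing this number across environments to separate system-induced latency from network latency (though on a managed service like Heroku Redis you may not have shell access to run it).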
Here is a checklist (from the Redis manual page linked above).
If you can afford it, prefer a physical machine over a VM to host the server.
Do not systematically connect/disconnect to the server (especially true for web-based applications). Keep your connections as long-lived as possible.
If your client is on the same host as the server, use Unix domain sockets.
Prefer to use aggregated commands (MSET/MGET), or commands with variadic parameters (if possible), over pipelining (see the example after this list).
Prefer to use pipelining (if possible) over a sequence of round trips.
Redis supports Lua server-side scripting to cover cases that are not suitable for raw pipelining (for instance, when the result of a command is an input for the following commands).
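As a small illustration of the aggregated-commands item above (the keys and values here are made up):

# One round trip instead of three
redis-cli MSET user:1:name alice user:2:name bob user:3:name carol
redis-cli MGET user:1:name user:2:name user:3:name

Each round trip you avoid saves roughly the full client-server latency, which matters far more against a remote cloud instance than against localhost.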

What does amazon AWS mean by "network performance"?

When choosing an amazon aws instance type to launch, there is a property of each type which is "Network Performance" which is either "Low", "Moderate", or "High".
I'm wondering what this exactly means. Will my ping be lower if I choose Low? Or will it be OK as long as many users aren't logged in at once?
I'm launching a real-time multiplayer game, so I am curious as to exactly what is meant by "network performance". I actually need fairly low memory and processing power, but instances with those criteria usually have "Low" network performance.
Does anyone have experience with the different network performance tiers, or more information?
It's not official, but Serhiy Topchiy did a benchmark with different instance types:
http://epamcloud.blogspot.com.br/2013/03/testing-amazon-ec2-network-speed.html
For US-EAST-1, it seems that LOW corresponds to 50Mb/s, Moderate corresponds to 300Mb/s and High corresponds to 1Gb/s.
My recent experience here: https://serverfault.com/questions/1094608/benchmarking-aws-outbound-internet-bandwidth-egress-up-to-25-gbps
We ran a live video broadcast on two AWS EC2 servers, hosting 500 viewers, that degraded catastrophically after 10 minutes.
We reproduced the outbound bandwidth throttling with iperf (see link above).
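For anyone wanting to reproduce this, a basic iperf3 run looks like the following (the host name is a placeholder; the server side should be a machine whose bandwidth you trust):

# On the reference server
iperf3 -s
# On the EC2 instance: 4 parallel streams for 60 seconds
iperf3 -c test-server.example.com -P 4 -t 60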
I believe it was mentioned at the re:Invent 2013 conference that the different tiers are related to the underlying network connection: some hosts have 10 Gbit connections (High), some have 1 Gbit (Moderate) and some have 100 Mbit (Low).
I cannot find any on-line documentation to confirm this, however.
Edit: There is an interesting article on the packets-per-second limit available here.
Since this question was first posed, AWS has released more information on the networking stack, and many of the newer instance families can support up to 25Gbps with the appropriate ENA drivers. It looks like much of the increased performance is due to the new Nitro system.
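If you want to verify that a given instance is actually using ENA (the instance ID and interface name below are placeholders), you can check both from the API and on the instance:

# From a machine with the AWS CLI configured: is ENA support enabled?
aws ec2 describe-instances --instance-ids i-0123456789abcdef0 \
  --query "Reservations[].Instances[].EnaSupport"
# On the instance itself: which driver backs the interface?
ethtool -i eth0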