Meaning of I/O requests on Amazon RDS - amazon-web-services

I really tryed but I can not find this information over internet. I'm using the AWS Free Usage Tier, I have one EC2 (t1.micro) instance with Windows Server 2008 and one RDS (t1.micro) instance. After one month of usage with a really small website (about 30 visits per day), it has just one contact form, about 6 inserts on the database per day, now the billings:
EC2:
2,000,000 I/O = free tier
1,118,431 I/O = $0.16 exceeded
**3,118,431 I/O TOTAL - I have no ideia why so many I/O
RDS:
10,000,000 I/O = free tier
123,715,372 I/O = $17.32 exceeded
**133,715,372 IO TOTAL - I have no ideia why so many I/O
I really want to know what one I/O means, I thought that one I/O = one request, but now I think that is not it. Somebody can clarify me?

Finally, after more than one month I figure out where was the problem:
I found a thread on AWS Forums about MS SQL dev DB continuously at 100%, an Amazon employee talked about "canceled queries that must is still running". Their answer:
"then your only option here is to try to Reboot your RDS
instance which should stop the query from being executed"
So, I did. Now my I/O requests was reduced on 90%.

Related

Seeing RDS Deadlock errors, could this be tied to IOPS limit

I'm seeing some errors on our AWS RDS MySQL server:
General error: 1205 Lock wait timeout exceeded; try restarting transaction
Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction
Looked at the RDS console monitoring tab, and there seems read IOPS is cut off 1, perhaps indicating that the disk IO is not keeping up with the requests. Funny thing is that write IOPS does not seem to be cut off 2. In general there's very few app server requests that fail due to the database error, but would like to get this sorted.
CPU load on the RDS server peaks around 50%. This makes me think the db.t3.small RDS size is sufficient.
The database is tiny, just 20GB and was created some years ago, so it's on magnetic storage. Have read that this means there's a limit of 200 IOPS, which matches the approx 150 + 50 IOPS peaks seen. I am therefore thinking about moving to General Purpose SSD. However for the small db this will only provide 100 IOPS as baseline performance according to the docs, but according to the docs, a burst load of 3000IOPS is possible.
Does this sound like a good move, and any other suggestions on what to do?
I have been running with General Purpose SSD for a couple of days now. The MySQL deadlock errors have not been seen since, so in case someone else finds the question, change from Magnetic to General Purpose SSD in RDS is certainly something to try out if you have similar problems.

Load Testing SQL Alchemy: "TimeoutError: QueuePool limit of size 3 overflow 0 reached, connection timed out, timeout 30"

I have a SQL-Alchemy based web-application that is running in AWS.
The webapp has several c3.2xlarge EC2 instances (8 CPUs each) behind an ELB which take web requests and then query/write to the shared database.
The Database I'm using is and RDS instance of type: db.m4.4xlarge.
It is running MariaDB 10.0.17
My SQL Alchemy settings are as follows:
SQLALCHEMY_POOL_SIZE = 3
SQLALCHEMY_MAX_OVERFLOW = 0
Under heavy load, my application starts throwing the following errors:
TimeoutError: QueuePool limit of size 3 overflow 0 reached, connection timed out, timeout 30
When I increase the SQLALCHEMY_POOL_SIZE from 3 to 20, the error goes away for the same load-test. Here are my questions:
How many total simultaneous connections can my DB handle?
Is it fair to assume that Number of Number of EC2 instances * Number of Cores Per instance * SQLALCHEMY_POOL_SIZE can go up to but cannot exceed the answer to Question #1?
Do I need to know any other constraints regarding DB connection pool
sizes for a distributed web-app like mine?
MySQL can handle virtually any number of "simultaneous" connections. But if more than a few dozen are actively running queries, there may be trouble.
Without knowing what your queries are doing, one cannot say whether 3 is a limit or 300.
I recommend you turn on the slowlog to gather information on which queries are the hogs. A well-tuned web app can easily survive 99% of the time on 3 connections.
The other 1% -- well, there can be spikes. Because of this, 3 is unreasonably low.

How does Amazon Tier/Instance time works?

I want to get a free tier + cheap medium instance which I was told that I can get cheaply for $5 and I can use it only for 100 hours. Here's what I'm confused about, does that 100 hours counts from the time when the instance start working and responding or where I only use it, for example I use it for 2 hours today and a week later I have 98 hours left to use ?
Kind Regards
When you start a server, you are "using" it until you shut it down. You are charged for the entire time the server is running (in 1 hour increments). Just because you aren't logged into the server and actively using it is irrelevant in regards to billing. As long as the server exists it is using resources and you will be billed for those resources.

Amazon Elasticache Failover

We have been using AWS Elasticache for about 6 months now without any issues. Every night we have a Java app that runs which will flush DB 0 of our redis cache and then repopulate it with updated data. However we had 3 instances between July 31 and August 5 where our DB was successfully flushed and then we were not able to write the new data to the database.
We were getting the following exception in our application:
redis.clients.jedis.exceptions.JedisDataException:
redis.clients.jedis.exceptions.JedisDataException: READONLY You can't
write against a read only slave.
When we look at the cache events in Elasticache we can see
Failover from master node prod-redis-001 to replica node
prod-redis-002 completed
We have not been able to diagnose the issue and since the app was running fine for the past 6 months I am wondering if it is something related to a recent Elasticache release that was done on the 30th of June.
https://aws.amazon.com/releasenotes/Amazon-ElastiCache
We have always been writing to our master node and we only have 1 replica node.
If someone could offer any insight it would be much appreciated.
EDIT: This seems to be an intermittent problem. Some days it will fail other days it runs fine.
We have been in contact with AWS support for the past few weeks and this is what we have found.
Most Redis requests are synchronous including the flush so it will block all other requests. In our case we are actually flushing 19m keys and it takes more then 30 seconds.
Elasticache performs a health check periodically and since the flush is running the health check will be blocked, thus causing a failover.
We have been asking the support team how often the health check is performed so we can get an idea of why our flush is only causing a failover 3-4 times a week. The best answer we can get is "We think its every 30 seconds". However our flush consistently takes more then 30 seconds and doesn't consistently fail.
They said that they may implement the ability to configure the timing of the health check however they said this would not be done anytime soon.
The best advice they could give us is:
1) Create a completely new cluster for loading the new data on, and
instead of flushing the previous cluster, re-point your application(s)
to the new cluster, and remove the old one.
2) If the data that you are flushing is an update version of the data,
consider not flushing, but updating and overwriting new keys?
3) Instead of flushing the data, set the expiry of the items to be
when you would normally flush, and let the keys be reclaimed (possibly
with a random time to avoid thundering herd issues), and then reload
the data.
Hope this helps :)
Currently for Redis versions from 6.2 AWS ElastiCache has a new feature of thread monitoring. So the health check doesn't happen in the same thread as all other actions of Redis. Redis can continue to proceed a long command / lua script, but will still considered healthy. Because of this new feature failovers should happen less.

Amazon EC2 EBS i/o costs

I am hosting my application on amazon ec2 , on one of their micro linux instances.
It costs (apart from other costs) $0.11 per 1 million I/O requests . I was wondering how much I/O requests does it take when I have say 1000 users using it for say 1 hours per day for 1 month ?
I guess my main concern is : if a hacker keeps hitting my login page (simple html) , will it increase the I/O request count ? I guess yes, as every time the server needs to do something to server that page.
There are a lot of factors that will impact your IO requests, as #datasage says, try it and see how it behaves under your scenario. Micro Linux instances are incredible cheap to begin with, but if you are really concerned, setup a billing alert that will notify you when your usage passes a pre-determined threshold - if it suddenly spikes up, you can take some action to shut it down if that is what you want.
https://portal.aws.amazon.com/gp/aws/developer/account?ie=UTF8&action=billing-alerts
Take a look at CloudWatch, and (for free) set up a VolumeWriteOps and VolumeReadOps alarm to work with Amazon Simple Notification Service (SNS) to send you a text message and eMail notice right away if things get too busy, before the bill gets high! (A billing alert will let you know too late - after it has reached the threshold.)
In general though, from my experience, you will not have the problem you outline. Scan the EC2 Discussion Forum at forums.aws.amazon.com where you would find evidence of this kind of problem if were prevalent; it does not seem to be happening.
#Dilpa yes you are right. If some brute force attack will occur to your website eg: somebody hitting to your loginn page then it will increase the server I/O if you enable loging for your webserver. Webserver will keep log to it's log files of every event and that will increase your I/O. Just verify your webserver log for such kind of attack and you can prevent them.