Is the below DynamoDB scenario valid? - amazon-web-services

Can we get exceptions because of network failures while writing data to DynamoDB, even after the data has already been written to the tables?

Yes. This can happen:
You post the data to DynamoDB successfully
DynamoDB successfully writes the data
The DynamoDB servers return a 200 OK
A network error causes the TCP connection to time out or fail in some other way before the response makes it back from the DynamoDB servers to your client
In this case, the data exists happily on the server, but your code was never notified about it.
See the Two Generals' Problem: https://en.wikipedia.org/wiki/Two_Generals%27_Problem
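One common mitigation is to make the write idempotent, so that a retry after a lost response is harmless. A minimal sketch, assuming the AWS SDK for Java v1; the Orders table and orderId key are made up for illustration:

import java.util.Map;

import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.ConditionalCheckFailedException;
import com.amazonaws.services.dynamodbv2.model.PutItemRequest;

public class IdempotentWrite {
    public static void main(String[] args) {
        AmazonDynamoDB client = AmazonDynamoDBClientBuilder.defaultClient();
        // the client generates the id, so a retry writes the same item
        Map<String, AttributeValue> item = Map.of(
                "orderId", new AttributeValue("order-123"),
                "status", new AttributeValue("CREATED"));
        PutItemRequest put = new PutItemRequest()
                .withTableName("Orders")
                .withItem(item)
                // only create the item if it does not already exist, so a
                // retry after a lost 200 OK cannot double-apply the write
                .withConditionExpression("attribute_not_exists(orderId)");
        try {
            client.putItem(put);
        } catch (ConditionalCheckFailedException e) {
            // the first attempt actually succeeded; treat the retry as done
        }
    }
}

With the condition in place, a client that never saw the 200 OK can simply retry: either the retry creates the item, or the ConditionalCheckFailedException tells it the first attempt already landed.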

Related

How should I handle asynchronous processes that occur after API calls in AWS?

I'm designing the backend for a website that uses API Gateway and Lambda to handle API requests, many of which target a MySQL DB on RDS. Some processes need to happen asynchronously, but I'm debating which approach is best practice or cleaner.
In the given scenario, every time a user creates a new row in a certain table, let's say an email also needs to be sent asynchronously. There are many other similar scenarios, but this one will set the precedent.
Option 1: In the lambda that handles the API request, first write to the MySQL instance to add the new row. When the response from MySQL comes back successful, write to something like SQS which will later be read from another lambda that sends an email. When the response from SQS is successful that the record was added to the queue, send a 201 response saying the REST API call was successful.
Option 2: In the lambda that handles the API request, write to the MySQL instance to add the new row. When the response from MySQL comes back successful, send a 201 response saying the REST API call was successful. Then set up a DMS (Database Migration Service) task that runs indefinitely, sending database modification binlogs to a Kinesis stream, which triggers a lambda that handles all DB changes, reads the change as a new row in a certain table, and sends an email (see the sketch after the lists below).
Option 1:
less infrastructure
more direct tracking of logic from an API call
1 extra http call (to sqs) delaying response times for an api for a web page
Option 2:
more infrastructure (dms task, replication instance)
scaling out shards may mean loss of ordering when processing binlog events, if ordering is a requirement (it is)
side question: Are you able to choose hash key for kinesis for dms tasks from mysql?
a single codebase for reacting to all modifications in the DB may actually make following logic in code simpler
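For concreteness, a rough sketch of the Option 2 consumer mentioned above, assuming a Java Lambda behind the Kinesis stream; the exact shape of the DMS change records is not reproduced here, so the payload handling is a placeholder:

import java.nio.charset.StandardCharsets;

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.events.KinesisEvent;

public class BinlogConsumer {
    public void handleRequest(KinesisEvent event, Context context) {
        for (KinesisEvent.KinesisEventRecord record : event.getRecords()) {
            // records arrive in order within a shard; ordering across shards
            // is not guaranteed, which is the concern listed above
            String change = new String(
                    record.getKinesis().getData().array(), StandardCharsets.UTF_8);
            // placeholder: if the change is an INSERT into the watched table,
            // send the email (e.g. via SES or SNS)
            context.getLogger().log(change);
        }
    }
}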
Is this the tradeoff or am I missing something? What is best practice in this scenario?
Option 1 in my view seems most logical, but I would replace SQS and the second lambda with SNS. So, modified option 1 could be:
Option 1: In the lambda that handles the API request, first write to the MySQL instance to add the new row. When the response from MySQL comes back successful, publish a confirmation message to an SNS topic that sends the email. When the response from SNS is successful, send a 201 response saying the REST API call was successful.
This should be faster, cheaper and easier to implement than using SQS and a second lambda for sending the email.
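A minimal sketch of this modified option, assuming the AWS SDK for Java v1; the topic ARN, handler shape and insertRow helper are hypothetical:

import com.amazonaws.services.sns.AmazonSNS;
import com.amazonaws.services.sns.AmazonSNSClientBuilder;
import com.amazonaws.services.sns.model.PublishRequest;

public class CreateRowHandler {
    private static final AmazonSNS SNS = AmazonSNSClientBuilder.defaultClient();
    // hypothetical topic; an email subscription on it delivers the message
    private static final String TOPIC_ARN =
            "arn:aws:sns:us-east-1:123456789012:row-created";

    public int handle(String rowJson) {
        insertRow(rowJson);                                   // 1. write to MySQL first
        SNS.publish(new PublishRequest(TOPIC_ARN, rowJson));  // 2. then publish to SNS
        return 201;                                           // 3. only now confirm success
    }

    private void insertRow(String rowJson) {
        // JDBC insert omitted for brevity
    }
}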

MySQL lost connection error

Currently, I am working on a project that integrates MySQL with an IOCP server to collect sensor data, where the collected data is verified from the client.
However, there are situations where MySQL loses the connection.
The queries themselves are simple: insert a single row of records, or get the average value over a date interval.
The data of each sensor flows into the DB at the same time every 5 seconds. When sensor messages arrive sporadically or overlap with a client's message, the connection is dropped with:
lost connection to mysql server during query
After this message started appearing, I changed max_allowed_packet, and also adjusted interactive_timeout, net_read_timeout, net_write_timeout and wait_timeout.
It seems that an error occurs when queries overlap.
Please let me know if you know the solution.
I had a similar issue on a MySQL server with very simple queries where the number of concurrent queries was high. I had to disable the query cache to solve the issue. You could try disabling the query cache using the following statements.
SET GLOBAL query_cache_size = 0;
SET GLOBAL query_cache_type = 0;
Please note that a server restart will enable the query cache again. Put the configuration in the MySQL configuration file if you need it preserved.
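For example, something like this in my.cnf (assuming MySQL 5.x; note the query cache was removed entirely in MySQL 8.0):
[mysqld]
query_cache_size = 0
query_cache_type = 0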
Can you run the command below and check the current timeouts?
SHOW VARIABLES LIKE '%timeout';
You can change the timeout, if needed -
SET GLOBAL <timeout_variable>=<value>;

What is the state of a new object in DynamoDB immediately after the client application is returned a 200 OK?

I am trying to learn how writes/updates work internally in DynamoDB. This is what I could find.
AWS Tutorial Link
"When your application writes data to a DynamoDB table and receives an HTTP 200 response (OK), all copies of the data are updated. The data will eventually be consistent across all storage locations, usually within one second or less."
For example: if my DynamoDB table has 50 partitions and is replicated across 3 availability zones in a region, what happens in DynamoDB
After it receives an API request to create an item
After it sends the 200 OK response to the client
I would really appreciate any document that talks about this, or hearing from you directly.
Thanks
As per this, DynamoDB replicates its data across 3 availability zones within the region.
So the question is how it manages the availability of the data.
Assume there is one receiver which receives requests from users.
For a write request, the receiver uses an m/n quorum for data consistency, where:
n is the number of availability zones
m is ((n+1)/2), to maintain consistency.
In this case, it is 2/3.
Now when the receiver gets a write request, it sends the command to write the data to all 3 zones but waits for only 2 zones to respond. When 2 of the zones have written the value, the receiver sends 200 OK to the user without waiting for zone 3 to respond.
Let's say the user now immediately wants to retrieve the data that was written.
For a read request, the receiver uses a quorum of 1/(number of availability zones); in this case, 1/3.
So the receiver requests the data from all zones and, say, zone A responds first; that response is immediately sent to the user.
Given the 2/3 write quorum, assume the data is currently stored in zone A and zone B, while zone C is not yet updated.
Now when we read, if zone A or B responds first we get the new value, but if zone C responds first the result is stale or not found; this is the reason AWS says DynamoDB is eventually consistent.
When we query with a strongly consistent read, the read quorum changes to 2/3, which makes sure the updated value is sent to the user, because at any time at least 2 availability zones hold the newest value.
Note: this is just a simplified explanation and I am not associated with Amazon; they might be using some other things behind the scenes.
Hope that helps
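To make the difference concrete, a minimal sketch assuming the AWS SDK for Java v1; the Items table and key are made up for illustration:

import java.util.Map;

import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.GetItemRequest;

public class ReadConsistencyDemo {
    public static void main(String[] args) {
        AmazonDynamoDB client = AmazonDynamoDBClientBuilder.defaultClient();
        Map<String, AttributeValue> key =
                Map.of("id", new AttributeValue("item-1"));

        // eventually consistent (the default): may return a stale copy right
        // after a write, as in the zone C scenario above
        client.getItem(new GetItemRequest("Items", key));

        // strongly consistent: reads from a quorum, so it reflects all writes
        // already acknowledged with 200 OK (at a higher read-capacity cost)
        client.getItem(new GetItemRequest("Items", key).withConsistentRead(true));
    }
}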

DynamoDB - is there a need to call shutdown()?

Considering this code:
import java.util.Iterator;

import com.amazonaws.regions.Regions;
import com.amazonaws.services.dynamodbv2.document.DynamoDB;
import com.amazonaws.services.dynamodbv2.document.Item;
import com.amazonaws.services.dynamodbv2.document.ItemCollection;
import com.amazonaws.services.dynamodbv2.document.QueryOutcome;
import com.amazonaws.services.dynamodbv2.document.Table;
import com.amazonaws.services.dynamodbv2.document.spec.QuerySpec;
import com.amazonaws.services.dynamodbv2.document.utils.NameMap;
import com.amazonaws.services.dynamodbv2.document.utils.ValueMap;

//build the query spec: #1/:v1 are placeholders for the key name and value
QuerySpec spec = new QuerySpec()
    .withKeyConditionExpression("#1 = :v1")
    .withNameMap(new NameMap().with("#1", "tableKey"))
    .withValueMap(new ValueMap().withString(":v1", "none.json"));
//connect DynamoDB instance over AWS
DynamoDB dynamoDB = new DynamoDB(Regions.US_WEST_2);
//get the table instance
String tableName = "WFMHistoricalProcessedFiles";
Table table = dynamoDB.getTable(tableName);
ItemCollection<QueryOutcome> items = table.query(spec);
//iterate over the results
Iterator<Item> it = items.iterator();
Item item = null;
while (it.hasNext()) {
    item = it.next();
    System.out.println(item.toJSONPretty());
}
When using DynamoDB to run a Query or a Scan like in the example above, is there an actual need to call shutdown() in order to close the connection?
The documentation seems pretty clear.
shutdown
void shutdown()
Shuts down this client object, releasing any resources that might be held open. This is an optional method, and callers are not expected to call it, but can if they want to explicitly release any open resources. Once a client has been shutdown, it should not be used to make any more requests.
http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/dynamodbv2/AmazonDynamoDB.html#shutdown--
But to clarify what you specifically asked about:
in order to close the connection?
There is not exactly a "connection" to DynamoDB. It's accessed over HTTPS, statelessly, when requests are sent... so where your code says // connect DynamoDB instance over AWS, that really isn't accurate. You're constructing an object that will not actually connect until around the time you call table.query().
That connection might later be kept-alive for a short time for reuse, but even if true, it isn't "connected to" DynamoDB in any meaningful sense. At best, it's connected to a front-end system inside AWS that is watching for the next request and will potentially forward that request to DynamoDB if it's syntactically valid and authorized.
But this idle connection, if it exists, isn't consuming any DynamoDB resources in a way that should degrade performance of your application or others accessing the same DynamoDB table.
Good practice, of course, suggests that if you have the option of cleaning something up, it's potentially a good idea to do so, but it seems clearly optional.
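If you do want to release resources explicitly, a hedged sketch: build the low-level client yourself and shut it down when the application is done (assuming the AWS SDK for Java v1):

import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.document.DynamoDB;

public class ShutdownDemo {
    public static void main(String[] args) {
        AmazonDynamoDB client = AmazonDynamoDBClientBuilder.standard()
            .withRegion("us-west-2")
            .build();
        DynamoDB dynamoDB = new DynamoDB(client);
        try {
            // ... queries and scans as in the example above ...
        } finally {
            //optional: releases pooled HTTP connections and any idle threads
            dynamoDB.shutdown();
        }
    }
}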

How to survive a database outage?

I have a web service built with Spring, Hibernate and c3p0. I also have a service-wide cache (which holds the results of every request ever made to the service) that can be used to return results when the service isn't able to (for whatever reason). The cache might return stale results when the database is out, but that's OK.
I recently faced a database outage and my service came to a crashing halt.
I want the clients of my service to survive any database outage that happens in the future.
For that, I need my service to:
Handle new incoming requests like this: quickly say that the database is down and throw some exception (fast-fail).
Requests already being processed shouldn't last longer than x seconds: how do I make the thread handling the request be interrupted somehow?
Cache the whole database in memory for read-only purposes (is this insane?).
There are some observations that I made:
If there are one or more connections with status ESTABLISHED, then an attempt to check out a new connection is not made. It seems like one of the connections with status ESTABLISHED is handed over to the thread receiving the request. Now, this thread just hangs until the database comes back up.
I would want to make this request fast-fail by knowing, before handing a connection over to a thread, whether the db is up or not. If not, the service should throw an exception instead of hanging.
If there's no connection with status ESTABLISHED, then the request fails in 10 seconds with the exception "Could not checkout a new connection". This is due to my checkout timeout being set to 10s.
If the service was processing a request when the db went down and the service then makes a call to the db, the thread making that call gets stuck forever. It resumes execution only after the db comes back.
I would like to interrupt the thread after, say, x seconds, whether or not it was able to complete the request.
Are there ways to accomplish what I seek?
Thanks in advance.
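A sketch of the kind of configuration that can bound these waits, assuming c3p0 with plain JDBC; the URL and all values are illustrative only:

import java.sql.Connection;
import java.sql.Statement;

import com.mchange.v2.c3p0.ComboPooledDataSource;

public class FastFailDataSource {
    public static void main(String[] args) throws Exception {
        ComboPooledDataSource ds = new ComboPooledDataSource();
        ds.setJdbcUrl("jdbc:mysql://db-host:3306/app"); // placeholder URL
        ds.setCheckoutTimeout(10_000);          // fail checkout after 10 s
        ds.setTestConnectionOnCheckout(true);   // detect dead connections early

        try (Connection c = ds.getConnection();
             Statement s = c.createStatement()) {
            s.setQueryTimeout(5);               // abort a running query after 5 s
            s.executeQuery("SELECT 1");
        }
    }
}

The checkout timeout covers the "no ESTABLISHED connection" case, testConnectionOnCheckout covers the stale ESTABLISHED connection case, and Statement.setQueryTimeout bounds an in-flight query instead of interrupting the thread directly.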