I'm exploring moving an application built on top of MySQL into Spanner and am not sure if I can replicate certain functionality from our MySQL db.
basically a simplified version of our mysql schema would look like this
users idnamebalanceuser_transactionsiduser_idexternal_idamountuser_locksuser_iddate
when the application receives a transaction for a user the app starts a mysql transaction, updates the user_lock for that user, checks if the user has sufficient balance for the transaction, creates a new transaction, and then updates the balance. It is possible the application receive transactions for a user at the same time and so the lock forces them to be sequential.
Is it possible to replicate this in Spanner? How would I do so? Basically If the application receives two transactions at the same time I want to ensure that they are given an order and that the changed data from the first transaction is propagated to the second transaction.
Cloud Spanner would do this by default since it provides serializability which means that all transactions appear to have occurred in serial order. You can read more about the transaction semantics here:
https://cloud.google.com/spanner/docs/transactions#rw_transaction_semantics
Related
I have an application that needs to update the UI with the results of an Amplify Datastore query. I am making the query as soon as the component mounts/renders, but the results of the query are empty even though I know there is available data. If I add a timeout of 1 second or greater before making the query, then the query returns the expected data. My hunch is that this is because the query is returning an empty set of data before the response from the delta sync table, which shows there is data to be fetched, is returned.
Is there any type of event provided by Datastore that would allow me to wait until the data store is initialized or has data to query before making the query?
I understand that I could use the .observe functionality of datastore for a similar effect, but this is currently not an option.
First, if you do not use the Datastore start method then sync from the backend starts when the first query is submitted. Queries are run against the local store so data won't be there yet.
Second, Datastore publishes events on the amplify hub so that you can monitor changes, such as a set of data being synced, Datastore being ready and even Datastore being ready and all data synced locally.
See the documentation on Datastore.start
and the documentation for Datastore events for more information.
I'm not sure how to achieve consistent read across multiple SELECT queries.
I need to run several SELECT queries and to make sure that between them, no UPDATE, DELETE or CREATE has altered the overall consistency. The best case for me would be something non blocking of course.
I'm using MySQL 5.6 with InnoDB and default REPEATABLE READ isolation level.
The problem is when I'm using RDS DataService beginTransaction with several executeStatement (with the provided transactionId). I'm NOT getting the full result at the end when calling commitTransaction.
The commitTransaction only provides me with a { transactionStatus: 'Transaction Committed' }..
I don't understand, isn't the commit transaction fonction supposed to give me the whole (of my many SELECT) dataset result?
Instead, even with a transactionId, each executeStatement is returning me individual result... This behaviour is obviously NOT consistent..
With SELECTs in one transaction with REPEATABLE READ you should see same data and don't see any changes made by other transactions. Yes, data can be modified by other transactions, but while in a transaction you operate on a view and can't see the changes. So it is consistent.
To make sure that no data is actually changed between selects the only way is to lock tables / rows, i.e. with SELECT FOR UPDATE - but it should not be the case.
Transactions should be short / fast and locking tables / preventing updates while some long-running chain of selects runs is obviously not an option.
Issued queries against the database run at the time they are issued. The result of queries will stay uncommitted until commit. Query may be blocked if it targets resource another transaction has acquired lock for. Query may fail if another transaction modified resource resulting in conflict.
Transaction isolation affects how effects of this and other transactions happening at the same moment should be handled. Wikipedia
With isolation level REPEATABLE READ (which btw Aurora Replicas for Aurora MySQL always use for operations on InnoDB tables) you operate on read view of database and see only data committed before BEGIN of transaction.
This means that SELECTs in one transaction will see the same data, even if changes were made by other transactions.
By comparison, with transaction isolation level READ COMMITTED subsequent selects in one transaction may see different data - that was committed in between them by other transactions.
Let's say that I have several AWS Lambda functions that make up my API. One of the functions reads a specific value from a specific key on a single Redis node. The business logic goes as follows:
if the key exists:
serve the value of that key to the client
if the key does not exist:
get the most recent item from dynamoDB
insert that item as the value for that key, and set an expiration time
delete that item from dynamoDB, so that it only gets read into memory once
Serve the value of that key to the client
The idea is that every time a client makes a request, they get the value they need. If the key has expired, then lambda needs to first get the item from the database and put it back into Redis.
But what happens if 2 clients make an API call to lambda simultaneously? Will both lambda processes read that there is no key, and both will take an item from a database?
My goal is to implement a queue where a certain item lives in memory for only X amount of time, and as soon as that item expires, the next item should be pulled from the database, and when it is pulled, it should also be deleted so that it won't be pulled again.
I'm trying to see if there's a way to do this without having a separate EC2 process that's just keeping track of timing.
Is redis+lambda+dynamoDB a good setup for what I'm trying to accomplish, or are there better ways?
A Redis server will execute commands (or transactions, or scripts) atomically. But a sequence of operations involving separate services (e.g. Redis and DynamoDB) will not be atomic.
One approach is to make them atomic by adding some kind of lock around your business logic. This can be done with Redis, for example.
However, that's a costly and rather cumbersome solution, so if possible it's better to simply design your business logic to be resilient in the face of concurrent operations. To do that you have to look at the steps and imagine what can happen if multiple clients are running at the same time.
In your case, the flaw I can see is that two values can be read and deleted from DynamoDB, one writing over the other in Redis. That can be avoided by using Redis's SETNX (SET if Not eXists) command. Something like this:
GET the key from Redis
If the value exists:
Serve the value to the client
If the value does not exist:
Get the most recent item from DynamoDB
Insert that item into Redis with SETNX
If the key already exists, go back to step 1
Set an expiration time with EXPIRE
Delete that item from DynamoDB
Serve the value to the client
We have sysdig running on our WSO2 API gateway machine and we notice that it fires a large number of SQL queries to the database for a minute, than waits a minute and repeats.
The query looks like this:
Every minute it goes wild, waits for a minute and goes wild again with a request of the following format:
SELECT REG_PATH, REG_USER_ID, REG_LOGGED_TIME, REG_ACTION, REG_ACTION_DATA
FROM REG_LOG
WHERE REG_LOGGED_TIME>'2016-02-29 09:57:54'
AND REG_LOGGED_TIME<'2016-03-02 11:43:59.959' AND REG_TENANT_ID=-1234
There is no load on the server. What is causing this? What can we do to avoid this?
screen shot sysdig api gateway process
This particular query is the result of the registry indexing task that runs in the background. The REG_LOG table is being queried periodically to retrieve the latest registry actions. The indexing task cannot be stopped. However, one can configure the frequency of the indexing task through the following parameter that is in the registry.xml. See [1] for more information.
indexingFrequencyInSeconds
If this table is filled up, one can clean the data using a simple SQL query. However, when deleting the records, one must be careful not to delete all the data. The latest records of each resource path should be left in the REG_LOG table since reindexing of data requires at least one reference of each resource path.
Also, if required, before clearing up the REG_LOG table, you can take a dump of the data in case you do not want to loose old records. Hope this answer provides information you require.
[1] - https://docs.wso2.com/display/Governance510/Configuration+for+Indexing
I work with SQL Server database with ODBC, C++. I want to detect modifications in some tables of the database: another application inserts or updates rows and I have to detect all these modifications. It does not have to be the immediate trigger, it is acceptable to use polling to periodically check database tables for modifications.
Below is the way I think this can be done, and need your opinions whether this is the standard/right way of doing this, or any better approaches exist.
What I've thought of is this: I add triggers in SQL Server, which, on any modification, will insert the identifiers of modified/added rows into special table, which I will check periodically from my application. Suppose there are 3 tables: Customers, Products, Services. i will make three additional tables: Change_Customers, Change_Products, Change_Services, and will insert the identifiers of modified rows of the respective tables. Then I will read these Change_* tables from my application periodically and delete processed records.
Now if you agree that above solution is right, I have another question: Is it better to have separate Change_* tables for each of my tables I wish to monitor, or is it better to have one fat Changes table which will contain the changes from all tables.
Query Notifications is the technology designed to do exactly what you're describing. You can leverage Query Notifications from managed clients via the well known SqlDependency class, but there are native Ole DB and ODBC ways too. See Working with Query Notifications, the paragraphs about SSPROP_QP_NOTIFICATION_MSGTEXT (OleDB) and SQL_SOPT_SS_QUERYNOTIFICATION_MSGTEXT (ODBC). See The Mysterious Notification for an explanation how Query Notifications work.
This is the only polling-free solution that work with any kind of updates. Triggers and polling for changes has severe scalability and performance issues. Change Data Capture and Change Tracking are really covering a different topic (synchronizing datasets for occasionally connected devices, eg. Sync Framework).
Change Data Capture(CDC)--http://msdn.microsoft.com/en-us/library/cc645937.aspx
First you will need to enable CDC in database
::
USE db_name
GO
EXEC sys.sp_cdc_enable_db
GO
Enable CDC on table then
:: sys.sp_cdc_enable_table
Then you can query changes
If your version of Sql Server is 2005 - you may use Notification Services
If your Sql Server is 2008+ - there is most preferrable way to use triggers and log changes to log tables and periodically poll these tables from application to see the changes