I can not update when I use replication? - google-cloud-platform

In the Cloud SQL documentation, specifically on this link, "https://cloud.google.com/sql/docs/mysql/features", mention the following:
unsupported statements for second generation instances
The following statements are not compatible because the second-generation instances use GTID replication:
statements CREATE TABLE ... SELECT
CREATE TEMPORARY TABLE statements within transactions
Transactions or statements that update transactional and non-transactional tables
I often make updates to my data, so, that would be a problem for my solution.
In the google cloud documentation, in the same link mentioned above, it refers to the official documentation of mysql, which mentions that the restriction of updates are only with non-transactional engines.
Maybe I'm getting confused, but I really want to know if I can update my data or not. Sorry for the lousyness of English, it is obvious to say that I do not master it.

The way I interpret this line: Transactions or statements that update both transactional and nontransactional tables is that transactions or statements that update BOTH transactional AND nontransactional tables at the same time, is not supported. BOTH being the key word.
I would like to think that you can still update transactional tables, and in a separately operation, update non-transactional tables.

Related

DynamoDB Indexing Assistance and Getting My Data Out

I preface all of this to say I’m still actively learning DynamoDB, and I think an answer to my question will help me understand a few things.
I have an analytics microservice that I’m pushing custom (internal) analytics events into a DynamoDB table. Columns in our Dynamo rows/items include data like:
User ID
IP Address
Event Action
Timestamp
Split Test ID
Split Test Value
One of the main questions we want to pull from this db is:
"How many users saw split test x with values y?"
I’m struggling to understand how I should index my database to account for this kind of requests? I set up a “Keys Only” index targeting Split Test ID, and the query to gather these are fairly efficient, but it only pulls UserID and Split Test ID. Ideally I want an efficient query that returns multiple other associated values as well…
How do I achieve this? Do I need to be doing something much differently? Additionally, if any of my understanding of Dynamo, based on my explanations, sounds completely lacking in some regard, please point me in the right direction!
You're thinking of DynamoDB as a schema-less database, which it obviously is. However, that does not mean that a schema is not important. Schemas in NoSQL databases are usually more important than they are in SQL databases and they are usually less straightforward.
The most important thing to determine how you will store your data is how you will access it. You will have to take into account all the ways that you will want to access your data and ensure it is possible by creating the necessary data columns and necessary indexes. In this case, if you want to know how many times two values are combined in a certain way, you could easily add a column that has these combined values (e.g., splitId#splitValue ) and use that in your indexes.
If you want to know more about advanced patterns and such, I advise you to watch this pretty famous re:invent talk by Rick Houlihan or to read the DynamoDB book.
As a last note, I want to add that switching to a SQL server usually is not the solution. Picking NoSQL over SQL is usually based on non-functional requirements. There is a reason NoSQL databases are used in applications that require very low-latency retrieval of data in huge datasets, but as with everything, trade-offs are the name of the game.

Automating dynamodb scripts

Like we used to do with rdbms sql scripts. I wanted to do a similar thing with my dynamodb table.
Currently its very difficult to track changes from environment to environment(dev - qa -prod). We are directly making changes via the console.
What I want to do is, Keep the table data/json in the git version control and whenever any dev makes a change, we should be able to just run a script that will be able migrate the respective changes to on the dynamodb table eg. update/create/delete the tables, add/remove/update the records.
But I am not able to find a proper way/guide to achieve this currently. I am using javascript/nodejs as our base language.
Any help regarding this scenario will be appriciable.
Thanks
ref : https://forums.aws.amazon.com/thread.jspa?threadID=342538
As far as I can tell you described to separate issues:
Changing the tables "structure"
Updating records after the update
Before I go into my answer, remember that DynamoDB is a NoSQL database and your previous RDBMS was a relational database. Operational tasks can differ very much for both types of databases.
1. Automating changes to the "structure" of the table
For this you can check out Infrastructure as Code tools like Terraform, CloudFormation or Pulumi.
But since DynamoDB is a NoSQL database, you only can do a few things like setting your hash and sort key etc and defining indices. Adding "fields" to the DynamoDB is not done with those tools, because except for the hash and sort key, there are no fields. Everything else does not follow a explicit (sql) schema.
2. Updating records after an update
If you do not have a lot of records, you could write yourself a simple tool or script to do the relevant work using the AWS SDK and run that during your CI/CD pipeline. A simple approach would be to have a "migrations" folder and if there is a file in it, the pipeline will execute it. So after the migration is done, just remove the file again. Not great, but pragmatic.
If you have a lot of records this won't work that great anymore, at least if you want to have a downtime-less deployment. In that case you will have update your software to be able to work with the old and new versions of the records structure, while you gradually update all records in the background (using a script etc.). Once all the records are updated, you can remove the code paths that handle the old structure.

GCP CLOUD SQL denies permission for pre aggregation

I am trying to use pre aggregations over CLOUD SQL on Google Cloud Platform but the database is denying access and giving error Statement violates GTID consistency.
Any help is appreciated.
Cube.js done pre-aggregation by CREATE TABLE ... SELECT, but you are using MySQL on top of Google SQL with --enforce-gtid-consistency (has limitations).
Since only transactionally safe statements can be logged, there is a limitation to use CREATE TABLE ... SELECT (and some another SQL), because this statement is actually logged as two separate events.
There are two ways how to solve this issue:
1. Use pre-aggregations to an external database. (recommended way).
https://cube.dev/docs/pre-aggregations/#read-only-data-source-pre-aggregations
2. Use not documented flag loadPreAggregationWithoutMetaLock
Attention: This flag is an experimental and can be removed or changed in the feature..
Take a look at the source code
You can pass it directly in the driver constructor. This will produce two SQL statements to pass the limitation:
CREATE TABLE
INSERT INTO
Thanks

How to do "Select COUNT(*)" in DynamoDB from the AWS management console or any other 3rd party GUI?

Hi Im trying to query some table in DynamoDB. However from what I read I can only do it using some code or form the CLI. Is there a way to do complex queries from the GUI? I tried playing with it but can't seem to figure out how to do a simple COUNT(*). Please help.
Go to DynamoDB Console;
Select the table that you want to count
Go to "overview" page/tab
In table properties, click on Manage Live Count
Click Start Scan
This will give you the count of items of the table at that moment. Just be warned that this count is eventually consistent; what means that if someone is performing changes in the table at that exact moment your end result will not be exact (but probably very close to reality).
Digressing a little bit (only in case you're new to DynamoDB):
DynamoDB is a NoSQL database. It doesn't support the same commands that are common in SQL databases. Mainly because it doesn't support the same consistency model provided by SQL databases.
In SQL databases, when you send a count(*) query your RDMS make some very educated guesses and take some short paths to discover the number of lines in the table. It does that because reading your entire table to give you this answer would take too much time.
DynamoDB doesn't have means to make these educated guesses. When you want to know how many items one table have the only option it has is to read all of them counting one by one. That is the exact task that the command mentioned in the beginning of this answer does. It scans the entire table counting all the items one by one.
Because of that, when you perform this task it will bill you the entire table read (DynamoDB bills you per reads and writes). And maybe after you started the scan someone put another item in the the table while you are still counting. In that case it will not restart the count because by design DynamoDB is eventually consistent.

How to monitor database updates from application?

I work with SQL Server database with ODBC, C++. I want to detect modifications in some tables of the database: another application inserts or updates rows and I have to detect all these modifications. It does not have to be the immediate trigger, it is acceptable to use polling to periodically check database tables for modifications.
Below is the way I think this can be done, and need your opinions whether this is the standard/right way of doing this, or any better approaches exist.
What I've thought of is this: I add triggers in SQL Server, which, on any modification, will insert the identifiers of modified/added rows into special table, which I will check periodically from my application. Suppose there are 3 tables: Customers, Products, Services. i will make three additional tables: Change_Customers, Change_Products, Change_Services, and will insert the identifiers of modified rows of the respective tables. Then I will read these Change_* tables from my application periodically and delete processed records.
Now if you agree that above solution is right, I have another question: Is it better to have separate Change_* tables for each of my tables I wish to monitor, or is it better to have one fat Changes table which will contain the changes from all tables.
Query Notifications is the technology designed to do exactly what you're describing. You can leverage Query Notifications from managed clients via the well known SqlDependency class, but there are native Ole DB and ODBC ways too. See Working with Query Notifications, the paragraphs about SSPROP_QP_NOTIFICATION_MSGTEXT (OleDB) and SQL_SOPT_SS_QUERYNOTIFICATION_MSGTEXT (ODBC). See The Mysterious Notification for an explanation how Query Notifications work.
This is the only polling-free solution that work with any kind of updates. Triggers and polling for changes has severe scalability and performance issues. Change Data Capture and Change Tracking are really covering a different topic (synchronizing datasets for occasionally connected devices, eg. Sync Framework).
Change Data Capture(CDC)--http://msdn.microsoft.com/en-us/library/cc645937.aspx
First you will need to enable CDC in database
::
USE db_name
GO
EXEC sys.sp_cdc_enable_db
GO
Enable CDC on table then
:: sys.sp_cdc_enable_table
Then you can query changes
If your version of Sql Server is 2005 - you may use Notification Services
If your Sql Server is 2008+ - there is most preferrable way to use triggers and log changes to log tables and periodically poll these tables from application to see the changes