If I start mysqldump on a database and then create a new table with new data, will that table be dumped? What is the concurrency behavior here?
Well, that is not guaranteed. From the MySQL manual:
--single-transaction

This option sends a START TRANSACTION SQL statement to the server before dumping data. It is useful only with transactional tables such as InnoDB and BDB, because then it dumps the consistent state of the database at the time when BEGIN was issued without blocking any applications.

When using this option, you should keep in mind that only InnoDB tables are dumped in a consistent state. For example, any MyISAM or MEMORY tables dumped while using this option may still change state.

While a --single-transaction dump is in process, to ensure a valid dump file (correct table contents and binary log coordinates), no other connection should use the following statements: ALTER TABLE, CREATE TABLE, DROP TABLE, RENAME TABLE, TRUNCATE TABLE. A consistent read is not isolated from those statements, so use of them on a table to be dumped can cause the SELECT that is performed by mysqldump to retrieve the table contents to obtain incorrect contents or fail.

The --single-transaction option and the --lock-tables option are mutually exclusive because LOCK TABLES causes any pending transactions to be committed implicitly.

This option is not supported for MySQL Cluster tables; the results cannot be guaranteed to be consistent due to the fact that the NDBCLUSTER storage engine supports only the READ_COMMITTED transaction isolation level. You should always use NDB backup and restore instead.

To dump large tables, you should combine the --single-transaction option with --quick.
If you want to back up or move your live DB, you should consider MySQL replication.
Say table sample exists. We then run a CREATE OR REPLACE TABLE sample AS (...). Will that table be accessible throughout the process? For example, while we perform the CREATE OR REPLACE statement, another job is also querying the table. Is there a chance that we get a 404 Not Found exception?
No, you won't. The replacement is atomic and the new data is immediately available, with no downtime. However, any running queries that have already read the data and are still processing it won't see the new data (because that data lives in memory and is no longer read from BigQuery storage).
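For illustration (my_project, my_dataset and sample_staging are hypothetical names), the statement below swaps the table's contents in place as a single atomic operation; concurrent jobs querying sample resolve to either the old or the new contents, never to a missing table:

-- BigQuery: atomically replaces the table's schema and contents in place
CREATE OR REPLACE TABLE `my_project.my_dataset.sample` AS
SELECT * FROM `my_project.my_dataset.sample_staging`;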
I'm not sure how to achieve consistent reads across multiple SELECT queries.
I need to run several SELECT queries and make sure that between them, no UPDATE, DELETE or CREATE has altered the overall consistency. The best case for me would be something non-blocking, of course.
I'm using MySQL 5.6 with InnoDB and default REPEATABLE READ isolation level.
The problem is that when I use RDS DataService beginTransaction with several executeStatement calls (passing the provided transactionId), I'm NOT getting the full result at the end when calling commitTransaction.
The commitTransaction call only provides me with a { transactionStatus: 'Transaction Committed' }.
I don't understand: isn't the commitTransaction function supposed to give me the whole dataset result (of my many SELECTs)?
Instead, even with a transactionId, each executeStatement returns its own individual result... This behaviour is obviously NOT consistent.
With SELECTs in one transaction under REPEATABLE READ, you should see the same data and not see any changes made by other transactions. Yes, data can be modified by other transactions, but while you are inside the transaction you operate on a snapshot (read view) and can't see those changes. So it is consistent.
To make sure that no data is actually changed between the SELECTs, the only way is to lock the tables/rows, e.g. with SELECT ... FOR UPDATE - but that should not be needed here.
Transactions should be short and fast; locking tables / preventing updates while some long-running chain of SELECTs runs is obviously not an option.
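For the MySQL 5.6 / InnoDB case described above, a minimal sketch (table and column names are placeholders):

-- All SELECTs in this transaction read from the same snapshot
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START TRANSACTION WITH CONSISTENT SNAPSHOT;   -- snapshot is taken right here
SELECT COUNT(*)    FROM orders;
SELECT SUM(amount) FROM order_lines;          -- consistent with the COUNT above
COMMIT;

-- Only if you truly need to block concurrent writers between the reads:
-- SELECT * FROM orders WHERE id = 42 FOR UPDATE;

All SELECTs inside that transaction read from the snapshot taken at START TRANSACTION WITH CONSISTENT SNAPSHOT, so they are mutually consistent even while other sessions keep committing changes.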
Queries against the database run at the time they are issued. Their results stay uncommitted until commit. A query may be blocked if it targets a resource another transaction has acquired a lock on. A query may fail if another transaction modified a resource, resulting in a conflict.
Transaction isolation determines how the effects of this transaction and of other transactions happening at the same moment are handled; see the Wikipedia article on transaction isolation.
With isolation level REPEATABLE READ (which, by the way, Aurora Replicas for Aurora MySQL always use for operations on InnoDB tables), you operate on a read view of the database and see only data committed before the BEGIN of the transaction.
This means that SELECTs in one transaction will see the same data, even if changes were made by other transactions.
By comparison, with transaction isolation level READ COMMITTED, subsequent SELECTs in one transaction may see different data that was committed in between them by other transactions.
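A small illustration of that difference (accounts is a placeholder table; the interleaving with a second session is shown as comments):

-- Session 1 (REPEATABLE READ):
START TRANSACTION;
SELECT balance FROM accounts WHERE id = 1;   -- say it returns 100
-- Session 2 now runs and commits:
--   UPDATE accounts SET balance = 200 WHERE id = 1;
SELECT balance FROM accounts WHERE id = 1;   -- still 100: same read view
COMMIT;
-- Under READ COMMITTED, the second SELECT would return 200 instead.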
I have an application that requires me to pull certain information from DB#1 and push it to DB#2 every time a certain entry in a table from DB#1 is updated. The polling rate doesn't need to be extremely fast, but it probably shouldn't be any slower than 1 second.
I was planning on writing a small service using the C++ Connector library, but I am worried about putting too much load on DB#1. Is there a more efficient way of doing this, such as built-in functionality within an SQL script?
There are many methods to accomplish this, so other factors you prefer may drive the approach.
If the SQL Server databases are on the same server instance:
A trigger on the DB1 tables that pushes changes to the DB2 tables
A stored procedure (in DB1 or DB2) that uses MERGE to identify changes and sync them to DB2, with a SQL Agent job calling the procedure on your schedule (see the sketch after this answer)
Enable Change Tracking on the database and the desired tables, then use a stored procedure + SQL Agent job to send changes without running any queries against the source tables
If on different instances or servers (these can also work on the same instance):
An SSIS package to identify changes and push them to DB2 (bonus: it can work with Change Data Capture)
Merge Replication to synchronize changes
AlwaysOn Availability Groups to synchronize entire dbs
Microsoft Sync Framework
Knowing nothing about your preferences or comfort levels, I would probably start with Merge Replication - it can be a bit tricky and tedious to set up, but it performs very well.
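As a rough sketch of the MERGE + SQL Agent job option (database, table and column names are hypothetical and would need to match your schema):

-- Sketch: a procedure that syncs DB1.dbo.Source into DB2.dbo.Target (same instance)
CREATE PROCEDURE dbo.SyncSourceToTarget
AS
BEGIN
    SET NOCOUNT ON;
    MERGE DB2.dbo.Target AS t
    USING DB1.dbo.Source AS s
        ON t.Id = s.Id
    WHEN MATCHED AND t.SomeValue <> s.SomeValue THEN
        UPDATE SET t.SomeValue = s.SomeValue
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (Id, SomeValue) VALUES (s.Id, s.SomeValue)
    WHEN NOT MATCHED BY SOURCE THEN
        DELETE;            -- optional: mirror deletes as well
END;

A SQL Agent job would then EXEC dbo.SyncSourceToTarget on whatever schedule you need.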
You can create a trigger in DB1 and a database link between DB1 and DB2. That way the trigger fires natively within DB1 and transfers the data directly to DB2.
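On SQL Server, for databases on the same instance, that could look roughly like the sketch below (trigger, table and column names are hypothetical; for a separate server you would write to a linked server's four-part name instead):

USE DB1;
GO
-- Fires for new rows; an UPDATE trigger would handle changed rows similarly
CREATE TRIGGER dbo.trg_Source_PushToDb2
ON dbo.Source
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO DB2.dbo.Target (Id, SomeValue)
    SELECT i.Id, i.SomeValue
    FROM inserted AS i;   -- "inserted" holds the rows just added in DB1
END;
GO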
We have a PostgreSQL server running in production and plenty of workstations with isolated development environments. Each one has its own local PostgreSQL server (with no replication with the production server). Developers need to receive the updates stored in the production server periodically.
I am trying to figure out how to dump the contents of several selected tables from the server in order to update the tables on the development workstations. The biggest challenge is that the tables I'm trying to synchronize may have diverged (developers may add - but not delete - new fields to the tables through the Django ORM, while the schema of the production database remains unchanged for a long time).
Therefore the updated records and new fields of the tables stored on the workstations must be preserved from being overwritten.
I guess that direct dumps (e.g. pg_dump -U remote_user -h remote_server -t table_to_copy source_db | psql target_db) are not suitable here.
UPD: If possible, I would also like to avoid using a third (intermediate) database while transferring the data from the production database to the workstations.
I would recommend the following approach.
I'll outline the example based on a single table, customer.
1. We want to copy some entries from this table on production. Obviously, a full table dump will break the new stuff that exists on the development envs;
2. Therefore, create a table with a similar structure, but a different name, say customer_$. Another way is to create a dedicated schema for such “copying” tables. You might also want to include a couple of extra columns there, like copy_id and/or copy_stamp;
3. Now you can INSERT INTO customer_$ SELECT ... to populate your copying table with the wanted data. You might need to think about how to do this, though. In the tool we use here we can supply predicate data via the -w switch, like -w "customer_id IN (SELECT id FROM cust2copy)";
4. After you've populated your copying table(s), you can dump them. Make sure to use the following switches to pg_dump:
   --column-inserts to explicitly list target columns, because on the development env the copying table might have a different structure. This might be “slow” for big volumes, though;
   --table / -t to specify the tables to dump.
5. On the target env, make sure to (1) empty the copying tables and (2) prevent parallel activities of a similar nature;
6. Load the data into the copying tables;
7. Now comes the most interesting part: you need to check that the data you're about to INSERT into the main tables will not conflict with any of the constraints defined on those tables. You might have:
   PRIMARY KEY violations. You can (1) replace existing entries, (2) merge entries together, (3) skip entries from the copying tables, or (4) choose to assign different IDs in the copying tables;
   UNIQUE KEY violations; most likely you'll have to UPDATE some columns in the copying tables;
   FOREIGN KEY violations; you'll either have to give up on such entries or copy over the missing stuff from production as well;
   CHECK violations; you'll have to investigate these manually.
8. After the checks are done and the data in the copying tables is fixed, you can copy it into the main tables (see the SQL sketch at the end of this answer).
This is a very formal description of the approach. For example, for step #7 we have a huge pile of extra tools to do ID or ID-range remapping, to manipulate data in the copying tables, and to adjust security settings, ownership, some defaults, etc.
Also, we have a so-called catalogue for this tool, which allows us to group logically tied tables under common names. For example, to copy customers from production we have to check around 50 tables in order to satisfy all possible dependencies.
I haven't seen similar tools in the wild though so far.
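A minimal SQL sketch of steps 2, 3 and 7-8 above, assuming customer has an id primary key and hypothetical name/email columns (ON CONFLICT requires PostgreSQL 9.5+; the dump/restore of customer_$ in between is done with pg_dump as described in step 4):

-- Step 2: a copying table mirroring production's structure, plus a bookkeeping column
CREATE TABLE customer_$ (LIKE customer INCLUDING DEFAULTS);
ALTER TABLE customer_$ ADD COLUMN copy_stamp timestamptz DEFAULT now();

-- Step 3: populate it with only the wanted rows
INSERT INTO customer_$ (id, name, email)
SELECT id, name, email
FROM customer
WHERE id IN (SELECT id FROM cust2copy);

-- Steps 7-8 (on the workstation, after dumping and loading customer_$):
-- one way to handle PRIMARY KEY conflicts while listing columns explicitly
INSERT INTO customer (id, name, email)
SELECT id, name, email
FROM customer_$
ON CONFLICT (id) DO NOTHING;   -- or DO UPDATE SET ..., depending on your policy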
Is it possible to execute 2 INSERT or UPDATE statements using cfquery?
If yes, how?
If no, what is the best way to execute multiple queries in ColdFusion while opening only one connection to the DB?
I think every time we call cfquery we are opening a new connection to the DB.
Is it possible to execute 2 INSERT or UPDATE statements using cfquery?
Most likely yes. But whether you can run multiple statements is determined by your database type and driver/connection settings. For example, when you create an MS SQL datasource, IIRC multiple statements are allowed by default, whereas MySQL drivers often disable multiple statements by default to help avoid SQL injection. So in that case you must enable multiple statements explicitly in your connection settings; otherwise you cannot use multiple statements. There are also some databases (usually desktop ones like MS Access) that do not support multiple statements at all. So I do not think there is a blanket answer to this question.
If the two INSERT/UPDATE statements are related, you should definitely use cftransaction as Sam suggested. That ensures the statements are treated as a single unit: either they all succeed or they all fail, so you are not left with partial or inconsistent data. In order to accomplish that, a single connection will be used for both queries in the transaction.
I think every time we call cfquery we are opening a new connection to the DB.
As Sam mentioned, that depends on your settings and whether you are using cftransaction. If you enable Maintain Connections (under Datasource settings in the CF Administrator), CF will maintain a pool of open connections. So when you run a query, CF just grabs an open connection from the pool rather than opening a new one each time. When using cftransaction, the same connection should be used for all queries, regardless of whether Maintain Connections is enabled or not.
Within the data source settings you can tell it whether to keep connections open or not with the Maintain Connections setting.
Starting with, I believe, ColdFusion 8, datasources are set up to run only one query at a time due to concerns about SQL injection. To change this you would need to modify the connection string.
Your best bet is to turn on Maintain Connections and if needed use cftransaction:
<cftransaction>
<cfquery name="ins" datasource="dsn">
insert into table1 values(<cfqueryparam value="#url.x#">)
</cfquery>
<cfquery name="ins" datasource="dsn">
insert into table2 values(<cfqueryparam value="#url.x#">)
</cfquery>
</cftransaction>
And always, always use cfqueryparam for values submitted by users.
The MySQL driver in CF8 does not allow multiple statements in one cfquery by default.
As Sam says, you can use cftransaction to group many statements together,
or, in the ColdFusion Administrator under Data & Services | Data Sources,
add
allowMultiQueries=true
to the Connection String field.
I don't have a CF server to try it on, but it should work fine IIRC.
Something like:
<cfquery name="doubleInsert" datasource="dsn">
insert into table1 values(x,y,z);
insert into table1 values(a,b,c)
</cfquery>
If you want a more specific example, you will have to give more specific information.
Edit: Thanks to @SamFarmer: newer versions of CF than I have used may prevent this.
Sorry for the Necro (I'm new to the site).
You didn't mention what DB you're using. If you happen to use MySQL, you can add as many records as the max heap size will allow.
I regularly insert up to ~4500 records at a time with the default heap size (but that'll depend on the amount of data you have).
INSERT INTO yourTable (x,y,z) VALUES ('a','b','c'),('d','e','f'),('g','h','i')
All DBs should do this IMO.
HTH
Use CFTRANSACTION to group multiple queries into a single unit.
Any queries executed with CFQUERY and placed between <cftransaction> and </cftransaction> tags are treated as a single transaction. Changes to data requested by these queries are not committed to the database until all actions within the transaction block have executed successfully. If an error occurs in a query, all changes made by previous queries within the transaction block are rolled back.
Use the ISOLATION attribute for additional control over how the database engine performs locking during the transaction.
For more information visit http://www.adobe.com/livedocs/coldfusion/5.0/CFML_Reference/Tags103.htm