Data integrity in a concurrent environment

We know that a DBMS harnesses many techniques to ensure data integrity and satisfy the ACID properties when many transactions run simultaneously in a concurrent environment. Apart from setting the isolation level, what other means does a DBMS use to ensure data integrity? What is the relationship between these techniques? Could you please give a detailed explanation?

Related

Is DynamoDB suitable for a growing or pivoting product?

Amazon said
NoSQL design requires a different mindset than RDBMS design. For an RDBMS, you can create a normalized data model without thinking about access patterns. You can then extend it later when new questions and query requirements arise. For DynamoDB, by contrast, you shouldn't start designing your schema until you know the questions it needs to answer. Understanding the business problems and the application use cases up front is absolutely essential.
It seems that I should design the tables only after the product is designed, for efficient query cost.
But a product can pivot or have new features appended. In the early stage, nobody knows where the product will go.
Is DynamoDB suitable for a growing or pivoting product?
In my opinion, the main benefit of DynamoDB over other NoSQL solutions is that it is a managed database service. You pay for reads and writes, and you never have to worry about scaling to handle more data or more users. If you are building a prototype or don't have the technical know-how to set up a database server and host it in the cloud, it can be useful and cost effective. It has its limitations, however, so if you do have technical resources, consider another open source NoSQL option.
I think that statement by Amazon is confusing and is probably more marketing than anything else. Use NoSQL in cases where your data is only accessed as distinct elements that do not have to be combined in a complex manner. It's also helpful if you don't have an exact schema defined, because NoSQL doesn't require a hard-set schema: you can store any fields in a table, and you can always add new fields. This is helpful when things are changing rapidly and you don't want to migrate everything as strictly as an RDBMS would require. If, however, you're going to have to run complex logic or calculations combining data from across tables, you should use an RDBMS. You could use NoSQL for some data and an RDBMS for other data in a hybrid fashion, but in that case you probably wouldn't want to use DynamoDB, because you'd want full ownership to set it up properly. Hope this helps; I'm sure others have more to say, and I welcome comments to help me refine my answer.
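To illustrate the "no hard-set schema" point, here is a minimal boto3 sketch (the products table, its key, and all attributes are hypothetical): only the key schema is fixed, so items in the same table can carry different fields, and new fields can be added later without a migration.

```python
import boto3

# Assumes AWS credentials are configured and a hypothetical table "products"
# already exists with partition key "product_id".
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("products")

# Two items in the same table with different non-key attributes:
# DynamoDB only enforces the key schema.
table.put_item(Item={"product_id": "p-1", "name": "Widget", "price": 10})
table.put_item(Item={"product_id": "p-2", "name": "Gadget", "color": "red",
                     "tags": ["new", "featured"]})
```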

Accessing Data while updating the same data on AWS DynamoDB

I am planning to build a mini content management system (CMS) and am checking the possibility of storing the content in DynamoDB. Will the services be able to access the content while the same content is being updated? (The scenario is updating content in the CMS and publishing it.)
Or would CloudSearch be a better solution than DynamoDB for such a use case?
Thanks in advance!
Please think about your use case and decide whether it requires eventually consistent reads or strongly consistent reads.
Read Consistency
Eventually Consistent Reads
When you read data from a DynamoDB table, the response might not reflect the results of a recently completed write operation. The response might include some stale data. If you repeat your read request after a short time, the response should return the latest data.
Strongly Consistent Reads
When you request a strongly consistent read, DynamoDB returns a response with the most up-to-date data, reflecting the updates from all prior write operations that were successful. A strongly consistent read might not be available in the case of a network delay or outage.
Note:
DynamoDB uses eventually consistent reads, unless you specify otherwise. Read operations (such as GetItem, Query, and Scan) provide a ConsistentRead parameter: If you set this parameter to true, DynamoDB will use strongly consistent reads during the operation.
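For example, with boto3 the difference is just the ConsistentRead flag on the read call (the table and key names below are hypothetical):

```python
import boto3

# Hypothetical table "cms_content" with partition key "content_id".
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("cms_content")

# Default: eventually consistent read -- may return stale data shortly after a write.
response = table.get_item(Key={"content_id": "article-42"})

# Strongly consistent read -- reflects all successful prior writes, at the cost of
# higher latency/throughput and possible unavailability during a network issue.
response = table.get_item(Key={"content_id": "article-42"}, ConsistentRead=True)
item = response.get("Item")
```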
AWS DynamoDB tool (Java) for transaction management:
Out of the box, DynamoDB provides two of the four ACID properties: Consistency and Durability. Within a single item, you also get Atomicity and Isolation, but when your application needs to involve multiple items you lose those properties. Sometimes that's good enough, but many applications, especially distributed applications, would appreciate some of that Atomicity and Isolation as well. Fortunately, DynamoDB provides the tools (especially optimistic concurrency control) so that an application can achieve these properties and have full ACID transactions.
You can use this tool if you are using Java or the AWS SDK for Java with DynamoDB. I am not sure whether a similar tool is available for other languages.
One of the features available in this library is isolated reads.
Isolated reads: read operations on multiple items are not interfered with by other transactions.
DynamoDB transaction library for Java
Transaction design
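That library is Java-only, but the underlying optimistic concurrency control technique can be sketched with any SDK using conditional writes. Below is a minimal boto3 illustration, not the library's actual protocol; the table, attributes, and version scheme are assumptions:

```python
import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("documents")  # hypothetical table with partition key "doc_id"

def update_title(doc_id, new_title, expected_version):
    """Write only if no other writer has bumped the version attribute in the meantime."""
    try:
        table.update_item(
            Key={"doc_id": doc_id},
            UpdateExpression="SET #t = :t, #v = :new",
            ConditionExpression="#v = :expected",
            ExpressionAttributeNames={"#t": "title", "#v": "version"},
            ExpressionAttributeValues={
                ":t": new_title,
                ":expected": expected_version,
                ":new": expected_version + 1,
            },
        )
        return True
    except ClientError as e:
        if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False  # a concurrent update won; re-read the item and retry
        raise
```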

Databases in a microservices pattern/architecture

I'm trying to understand the layout of the microservices pattern. Given that each microservice would run on its own VM (for the sake of example), how does the database fit into this architecture?
Would each service, in turn, connect to the consolidated database to read/write data?
Thanks for any insight
There's no one-size-fits-all solution.
The general principle is that each microservice should make the right decision for itself in terms of what the right persistence architecture should be. It might be connected to a central SQL database, or it could be using a filesystem, or it could be using NoSQL data store, or memcached, or whatever. (This is why people talk about eventual consistency a lot with microservices.)
You want to do it this way to really capture the benefits of microservices.
You want each microservice to be independently shippable, so that you're not blocked on anything. Stronger coupling to centralized infrastructure reduces the independence of the microservice.
Persistence requirements are highly variable. If you're running a search microservice, you don't need the ACID semantics of a typical SQL database. If you're doing payments, you need ACID. If you're storing and processing images, you might just use the filesystem. Etc.
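As a rough sketch of that idea (all names here are made up), each service owns its persistence behind its own small interface, so the payments service can keep transactional guarantees while a search or image service picks something else entirely, with no shared coupling:

```python
from abc import ABC, abstractmethod
import sqlite3

class PaymentStore(ABC):
    """Persistence port owned by the payments service alone."""
    @abstractmethod
    def save(self, payment_id: str, amount_cents: int) -> None:
        ...

class SqlitePaymentStore(PaymentStore):
    """One possible adapter; another service might use the filesystem, NoSQL, or a cache."""
    def __init__(self, path: str = "payments.db"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS payments (id TEXT PRIMARY KEY, amount_cents INTEGER)"
        )

    def save(self, payment_id: str, amount_cents: int) -> None:
        with self.conn:  # transactional write, which payments actually needs
            self.conn.execute(
                "INSERT INTO payments (id, amount_cents) VALUES (?, ?)",
                (payment_id, amount_cents),
            )
```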
In my experience, when dealing with microservice-oriented architecture (mSOA) it always comes down to a data warehouse solution in the end. And this is the natural choice if you have a dedicated DB (cluster) per microservice. After all, the business should be able to use that information from your domain. Even Data Vault modeling would be a good fit here.

Redshift as a Web App Backend?

I am building an application (using Django's ORM) that will ingest a lot of events, let's say 50/s (1-2 KB per message). Initially, some "real time" processing and monitoring of the events is in scope, so I'll be using Redis to keep some of that data for making decisions, expunging it when it makes sense. For now, I was going to persist all of the entities, including events, in Postgres for "at rest" storage.
In the future I will need "analytical" capability for dashboards and other features. I want to use Amazon Redshift for this. I considered just going straight for Redshift and skipping Postgres. But I also see folks say that it should play more of a passive role. Maybe I could keep a window of data in the SQL backend and archive to Redshift regularly.
My question is:
Is it even normal to use something like Redshift as a backend for web applications, or does it typically play more of a passive role? If not, is it realistic to think I can scale Postgres enough to start with only it for the event data? And if not, does the "window of data and archival" method make sense?
EDIT: Here are some things I had seen before writing the post:
Some say "yes, go for it" regarding the "should I use Redshift for this" question.
Some say "eh, not performant enough for most web apps" and support the "front it with a Postgres database" camp.
Redshift (ParAccel) is an OLAP-optimised DB, based on a fork of a very old version of PostgreSQL.
It's good at parallelised read-mostly queries across lots of data. It's bad at many small transactions, especially many small write transactions as seen in typical OLTP workloads.
You're partway in between. If you don't mind a data loss window, then you could reasonably accumulate data points and have a writer thread or two write batches of them to Redshift in decent sized transactions.
If you can't afford any data loss window and expect to be processing 50+ TPS, then don't consider using Redshift directly. The round-trip costs alone would be horrifying. Use a local database - or even a file based append-only journal that you periodically rotate. Then periodically upload new data to Redshift for analysis.
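A sketch of that accumulate-and-batch approach with psycopg2 (Redshift speaks the PostgreSQL wire protocol); the DSN, table, columns, batch size, and flush interval are all assumptions:

```python
import queue
import threading
import psycopg2
from psycopg2.extras import execute_values

REDSHIFT_DSN = "host=example.redshift.amazonaws.com port=5439 dbname=analytics user=app password=secret"  # placeholder
events = queue.Queue()   # web workers enqueue (ts, payload) tuples here
BATCH_SIZE = 500         # flush in decent-sized transactions
FLUSH_SECONDS = 5        # bounded data-loss window

def writer_loop():
    conn = psycopg2.connect(REDSHIFT_DSN)
    while True:
        batch = []
        try:
            # Block until at least one event arrives, then drain up to BATCH_SIZE.
            batch.append(events.get(timeout=FLUSH_SECONDS))
            while len(batch) < BATCH_SIZE:
                batch.append(events.get_nowait())
        except queue.Empty:
            pass
        if batch:
            with conn, conn.cursor() as cur:  # one transaction per batch
                execute_values(cur, "INSERT INTO events (ts, payload) VALUES %s", batch)

threading.Thread(target=writer_loop, daemon=True).start()
```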
A few other good reasons you probably shouldn't use Redshift directly:
OLAP DBs with column store designs often work best with star schemas or similar structures. Such schemas are slow and inefficient for OLTP workloads as inserts and updates touch many tables, but they make querying the data along various axes for analysis much more efficient.
Using an ORM to talk to an OLAP DB is asking for trouble. ORMs are quite bad enough on OLTP-optimised DBs, with their unfortunate tendency toward n+1 SELECTs and/or wasteful chained left joins, tendency to do many small inserts instead of a few big ones, etc. This will be even worse on most OLAP-optimised DBs.
Redshift is based on a painfully old PostgreSQL with a bunch of limitations and incompatibilities. Code written for normal PostgreSQL may not work with it.
Personally I'd avoid an ORM entirely for this - I'd just accumulate data locally in an SQLite or a local PostgreSQL or something, sending multi-valued INSERTs or using PostgreSQL's COPY to load chunks of data as I received it from an in-memory buffer. Then I'd use appropriate ETL tools to periodically transform the data from the local DB and merge it with what was already on the analytics server.
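For instance, loading an in-memory buffer into a local PostgreSQL staging table with COPY might look roughly like this (the table, columns, and connection string are hypothetical):

```python
import csv
import io
import psycopg2

def flush_buffer(conn, rows):
    """rows: a list of (ts, payload) tuples accumulated in memory."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    buf.seek(0)
    with conn, conn.cursor() as cur:
        # COPY is far cheaper than issuing many single-row INSERTs.
        cur.copy_expert(
            "COPY events_staging (ts, payload) FROM STDIN WITH (FORMAT csv)",
            buf,
        )

conn = psycopg2.connect("dbname=events_local")  # hypothetical local database
flush_buffer(conn, [("2016-01-01T00:00:00Z", '{"type": "click"}')])
```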
Now forget everything I just said and go do some benchmarks with a simulation of your app's workload. That's the only really useful way to tell.
In addition to Redshift's slow transaction processing (by modern DB standards) there's another big challenge:
Redshift only supports serializable transaction isolation, most likely as a compromise to support ACID transactions while also optimizing for parallel OLAP, mostly-read workloads.
That can result in all kinds of concurrency-related failures that would not have been failures on a typical DB that supports read-committed isolation by default.
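In practice, application code that talks to Redshift directly needs to be prepared to retry transactions aborted by serialization conflicts. A rough psycopg2 sketch; the error-matching string and retry policy are assumptions, not Redshift-documented behaviour:

```python
import time
import psycopg2

def run_with_retry(conn, sql, params, attempts=5):
    """Retry a statement whose transaction is aborted by a serializable-isolation conflict."""
    for attempt in range(attempts):
        try:
            with conn, conn.cursor() as cur:  # commits on success, rolls back on error
                cur.execute(sql, params)
            return
        except psycopg2.Error as exc:
            # Assumption: match on the error text; adjust to whatever your driver surfaces.
            if "serializable isolation violation" not in str(exc).lower():
                raise  # not a concurrency conflict, don't mask it
            time.sleep(2 ** attempt * 0.1)  # simple exponential backoff before retrying
    raise RuntimeError("transaction kept conflicting under serializable isolation")
```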

Check changes before running synchronization

Is there any way to check for changes in the database before running synchronization with MS Sync Framework?
I have a database with about 100 tables; 80% of these tables are not changed very often. I divided the database into multiple scopes to handle the sync priority. Even when there is no change in the database, it takes a long time to finish synchronization.
I suggest you trace the sync process to find out what's going on: How to: Trace the Synchronization Process
There is no specific API call in the Sync Framework SDK for simply checking whether a table has changed; most of the API calls will do an actual change enumeration (read: query the base and tracking tables).
If you have a large number of rows in your tables, you might want to set a retention period on the Sync Framework metadata to keep it small. See How to: Clean Up Metadata for Collaborative Synchronization (SQL Server)
Yes. Check out the Sync Framework Team Blog post on Synchronization Services for ADO.NET for Devices: Improving performance by skipping tables that don’t need synchronization