Berkeley-DB: Atomic transactions over multiple databases - c++

I want to use different Berkeley-DB databases to store different classes of objects in my application. Transactions in a single DB can be done atomically using DbTxn::commit. However, if I'm using multiple databases, I have to create multiple transactions (one for each database), right? In this case, if committing the first succeeds but the second fails, is there a way to roll back the already committed first transaction? (As far as I understand DbTxn::abort, it can no longer be used once the transaction has been committed.)
Is there some way to achieve atomic transactions across multiple databases?

If you are using multiple databases then you DON'T have to create multiple transactions. By using a single transaction, you can operate on multiple DBs.
Please see this link for the documentation of Db::open().
It has the 'DbTxn *txnid' parameter. You can specify a transaction handle returned by the DB_ENV->txn_begin() API, so a transaction should be started before opening the DBs.
Carefully read the note under the parameter 'txnid' in the given documentation link.
Note that you should not specify the DB_AUTO_COMMIT flag in the Db::open() call. Instead, pass the same transaction handle as the 'txnid' parameter for every DB you want to operate on. In this way you can achieve atomic transactions across multiple databases.
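The question is tagged C++, but the pattern is compact to show with the Python bindings (bsddb3), which wrap the same library; the calls below map one-to-one onto DbEnv::open(), DbEnv::txn_begin(), Db::open() with a txnid, and DbTxn::commit()/abort(). A hedged sketch with made-up paths, keys and values (the environment directory is assumed to already exist):

from bsddb3 import db  # Berkeley DB Python bindings (a.k.a. pybsddb)

# One transactional environment shared by both databases.
env = db.DBEnv()
env.open('/tmp/bdb-env',   # directory must already exist
         db.DB_CREATE | db.DB_INIT_MPOOL | db.DB_INIT_LOCK |
         db.DB_INIT_LOG | db.DB_INIT_TXN)

# A single transaction covers the opens and the writes on both DBs.
txn = env.txn_begin()
users = db.DB(env)
users.open('users.db', dbtype=db.DB_BTREE, flags=db.DB_CREATE, txn=txn)
orders = db.DB(env)
orders.open('orders.db', dbtype=db.DB_BTREE, flags=db.DB_CREATE, txn=txn)

try:
    users.put(b'alice', b'{"name": "Alice"}', txn=txn)
    orders.put(b'order-1', b'{"owner": "alice"}', txn=txn)
    txn.commit()   # both writes become durable together
except db.DBError:
    txn.abort()    # neither database is modified
    raise
finally:
    orders.close()
    users.close()
    env.close()

If either put fails, the abort rolls back the changes to both databases, which is exactly the atomicity the question asks about.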

In general, you need something like a distributed transaction manager; the full answer fills books. See "The Berkeley DB Book", Chapter 9, "Distributed Transactions and Data-Distribution Strategies", ISBN-10: 1-59059-672-2.

Negative impact of a Django model with multiple fields (75+ fields) [duplicate]

I'm in the process of building a web app that takes user input and stores it for retrieval and data manipulation. There are essentially 100-200 static fields that the user needs to input to create the Company model.
I see how I could break the Company model into multiple 1-to-1 Django models that map back to a Company, such as:
Company General
Company Notes
Company Financials
Company Scores
But why would I not create a single Company model with 200 fields?
Are there noticeable performance tradeoffs when trying to load a Query Set?
In my opinion, it would be wise for your codebase to have multiple related models. This gives you more room to scale and makes it easier to navigate your model fields. Also, when you want to write a custom serializer, or custom views that deal with only some of your fields, it is ideal not to have to retrieve 100+ fields every time.
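As a hedged illustration of that split (the field names here are invented, not from the question): keep the frequently used columns on Company and push the rarely needed bulk behind a OneToOneField, so list views never have to pull the 100+ extra fields.

from django.db import models

class Company(models.Model):
    # The "hot" fields that list views and searches actually touch.
    name = models.CharField(max_length=255)
    ticker = models.CharField(max_length=16, blank=True)

class CompanyFinancials(models.Model):
    # The rarely needed bulk of the fields sits behind a 1-to-1 link.
    company = models.OneToOneField(
        Company, on_delete=models.CASCADE, related_name='financials')
    revenue = models.DecimalField(max_digits=16, decimal_places=2, null=True)
    ebitda = models.DecimalField(max_digits=16, decimal_places=2, null=True)
    # ... dozens more rarely used columns ...

When you do need the extra columns, Company.objects.select_related('financials') fetches both rows in one JOIN; with a single wide model you can get a similar effect with .only()/.defer(), but the split keeps the intent visible in the schema.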
Turns out I wasn't asking the right question. This is the question I was really asking; it's more a database question than a Django question, I believe: Why use a 1-to-1 relationship in database design?
From the logical standpoint, a 1:1 relationship should always be merged into a single table.
On the other hand, there may be physical considerations for such "vertical partitioning" or "row splitting", especially if you know you'll access some columns more frequently or in a different pattern than the others, for example:
- You might want to cluster or partition the two "endpoint" tables of a 1:1 relationship differently. If your DBMS allows it, you might want to put them on different physical disks (e.g. the more performance-critical on an SSD and the other on a cheap HDD).
- You have measured the effect on caching and you want to make sure the "hot" columns are kept in cache, without "cold" columns "polluting" it.
- You need a concurrency behavior (such as locking) that is "narrower" than the whole row. This is highly DBMS-specific.
- You need different security on different columns, but your DBMS does not support column-level permissions.
- Triggers are typically table-specific. While you can theoretically have just one table and have the trigger ignore the "wrong half" of the row, some databases may impose additional limits on what a trigger can and cannot do. For example, Oracle doesn't let you modify the so-called "mutating" table from a row-level trigger - by having separate tables, only one of them may be mutating, so you can still modify the other from your trigger (but there are other ways to work around that).
Databases are very good at manipulating the data, so I wouldn't split the table just for the update performance, unless you have performed the actual benchmarks on representative amounts of data and concluded the performance difference is actually there and significant enough (e.g. to offset the increased need for JOINing).
On the other hand, if you are talking about "1:0 or 1" (and not a true 1:1), this is a different question entirely, deserving a different answer...

Firestore in Datastore mode does not seem to be strongly consistent

I am using cloud endpoints with objectify and Firestore in Datastore mode. Although it says in the documentation that all queries are strongly consistent, I have found that they are not in the following examples:
Example 1
I made an endpoint that queries for an entity by a property, adds +1 to a count property on it, and saves it back to the datastore. I then have 50 different clients all execute that method at the same time. I would expect the count property to be 50; however, it usually ends up somewhere between 25 and 30.
Example 2
I have an endpoint that queries for an entity by a property. If the entity does not exist, I create the entity and save it to the datastore. If it exists, I just return it. Again, I hit this endpoint with 50 different clients at the same time. I would expect there to only be one entity in the Datastore. However, I will have maybe 5-10 of the same entity.
It seems to me this is not strongly consistent. If I take my code in the above endpoints and put it in a transaction with retries, all works as intended. I looked around in objectify to see if there is a ReadOptions set somewhere, but from what I can see there is not, so it should be using the default of read_consistency=STRONG.
For example 1, you need to use transactions to ensure that writes do not stomp on each other.
For example 2, again you need to use a transaction to get consistency across clients.
Strong consistency means that if a client writes a value, it can read or query it back after the write succeeds. It does not mean that if one client reads a value, another client reads the same value, and each applies a transformation and tries to write it back, the blind writes from the two clients will somehow merge together.
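The question uses objectify on Java, but the read-modify-write-inside-a-transaction pattern this answer calls for looks like this with the Python google-cloud-datastore client (kind and property names are made up); if two transactions collide, the commit fails and the whole function should be retried:

from google.cloud import datastore

client = datastore.Client()

def increment_counter(kind='Counter', name='my-counter'):
    key = client.key(kind, name)
    # The read and the write happen in one transaction, so concurrent
    # callers serialize instead of overwriting each other's +1.
    with client.transaction():
        entity = client.get(key)
        if entity is None:              # also covers the get-or-create case
            entity = datastore.Entity(key=key)
            entity['count'] = 0
        entity['count'] += 1
        client.put(entity)

This is effectively what objectify's ofy().transact(...) gives you, which is why wrapping the endpoint code in a transaction with retries made both examples behave as expected.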

Create fault tolerance example with Dynamodb streams

I have been looking at DynamoDB to create something close to a transaction. I was watching this video presentation: https://www.youtube.com/watch?v=KmHGrONoif4 in which the speaker shows, around the 30 minute mark, ways to make DynamoDB operations as close to ACID-compliant as can be. He shows that the best concept is to use DynamoDB streams, but doesn't show a demo or an example. I have a very simple scenario I am looking at: I have one table called USERS. Each user has a list of friends. If two users no longer wish to be friends, they must be removed from both users' entities (I can't afford for one friend to be deleted from one entity and, due to a crash for example, the second user entity's friend attribute not be updated, causing inconsistent data). I was wondering if someone could provide a simple walk-through of how to accomplish something like this to see how it all works? If code could be provided, that would be great to see how it works.
Cheers!
Here is the transaction library that he is referring to: https://github.com/awslabs/dynamodb-transactions
You can read through the design: https://github.com/awslabs/dynamodb-transactions/blob/master/DESIGN.md
Here is the Kinesis client library:
http://docs.aws.amazon.com/kinesis/latest/dev/developing-consumers-with-kcl.html
When you're writing to DynamoDB, you can get an output stream with all the operations that happen on the table. That stream can be consumed and processed by the Kinesis Client Library.
In your case, have your client remove it from the first user, then from the second user. In the Kinesis Client Library, when you are consuming the stream and see a user removed, look at who he was friends with and go check/remove the mirror entry if needed - that removal should probably be done through the same means. It's not truly a transaction, and it relies on the fact that KCL guarantees that the records from the stream will be processed.
To add to the confusion, KCL itself uses DynamoDB to store where in the stream it is and to checkpoint processed records.
You should try to minimize the need for transactions, which are a nice concept on a small scale but can't really scale once you become very successful and need to support millions and billions of records.
If you are thinking in a NoSQL mindset, you can consider using a slightly different data model. One simple example is to use a Global Secondary Index on a single table, keyed on the "friend-with" attribute. When you add a single record for a pair of friends, both the record and the index are updated in a single action, and the same holds when you delete the friendship record.
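A hedged sketch of that single-item-per-friendship model with boto3 (the table name, key names and index name are all invented): a lone PutItem or DeleteItem maintains the relationship for both users, and the GSI, which DynamoDB maintains for you, lets it be queried from either side.

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource('dynamodb')
# Hypothetical table: partition key 'user', sort key 'friend', plus a
# GSI 'friend-user-index' with the two keys reversed.
friendships = dynamodb.Table('Friendships')

def add_friendship(user, friend):
    friendships.put_item(Item={'user': user, 'friend': friend})

def remove_friendship(user, friend):
    # One delete removes the relationship for both users at once.
    friendships.delete_item(Key={'user': user, 'friend': friend})

def friends_of(user):
    forward = friendships.query(
        KeyConditionExpression=Key('user').eq(user))['Items']
    reverse = friendships.query(
        IndexName='friend-user-index',
        KeyConditionExpression=Key('friend').eq(user))['Items']
    return [i['friend'] for i in forward] + [i['user'] for i in reverse]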
If you choose to use the update streams mechanism or the Global Secondary Index one, you should take into consideration the "eventual consistency" of a distributed system. Consistency is usually achieved within milliseconds, but it can also take longer. You should analyze the business implications and the technical measures you can take to deal with it. For example, you can verify the existence of both records (the main table as well as the index, if you found it in the index) before you present it to the user.

Unit-testing PostgreSQL row-level locks

I am currently adding unit tests to a rather large quantity of PostgreSQL stored procedures, using pgTap.
Some of the procedures perform operations which lock rows explicitly. These locks are critical to the application.
How do I write tests that check that the rows that need to be locked have been, and that rows which shouldn't be locked aren't?
The only "clue" I have at the moment is the pgrowlocks extension, which allows a transaction to check for rows locked by another transaction. However, the current transaction doesn't seem to see its own locks, so I'd have to use something to synchronise two transaction, and unless I am quite mistaken, there's no way to do that using pgTap.
(note: using PostgreSQL 9.1)
If you can identify the ctid of the rows in question, and know which transaction should have the rows locked, maybe you could use the pageinspect extension and look at the tuple info flags and xmax? The info flags should indicate the row is locked, and xmax should be set to the transaction id holding it.
How do I write tests that check that the rows that need to be locked have been, and that rows which shouldn't be locked aren't?
Open a separate transaction, try to lock the same row with NOWAIT, and catch the exception.
PostgreSQL has no support for autonomous transactions, so to open a separate transaction from within a pgTAP test you will have to resort to dblink or a similar extension.
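The suggestion above stays inside pgTAP via dblink; if an external test harness is acceptable, the same NOWAIT probe can also be driven from a second session, for example with psycopg2 (the DSN, table and id column here are placeholders, and the table name is assumed to come from a trusted test fixture):

import psycopg2
from psycopg2 import errors

def row_is_locked(dsn, table, row_id):
    # Probe from a second session: FOR UPDATE NOWAIT fails immediately
    # (SQLSTATE 55P03) instead of blocking behind the lock under test.
    conn = psycopg2.connect(dsn)
    try:
        with conn.cursor() as cur:
            try:
                cur.execute(
                    "SELECT 1 FROM " + table + " WHERE id = %s FOR UPDATE NOWAIT",
                    (row_id,))
                return False   # we acquired the lock, so nobody else held it
            except errors.LockNotAvailable:
                return True    # another open transaction holds the row lock
    finally:
        conn.rollback()        # release anything the probe grabbed
        conn.close()

The procedure under test has to be running in its own still-open transaction while the probe executes, which is exactly the cross-transaction coordination pgTAP alone cannot provide. (psycopg2.errors requires psycopg2 2.8+; on older versions you would catch psycopg2.OperationalError and inspect pgcode.)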
PS: I found this link, where Robert Haas explains why row-level locks are not tracked in pg_locks:
(...) ungranted tuple locks show up in
pg_locks, but they disappear once granted. (PostgreSQL would run out
of lock table space on even a medium-sized SELECT FOR UPDATE query if
we didn't do this.)
On the other hand, I don't quite understand why you want to test for lock existence - it's guaranteed after a successful LOCK command.

Postgresql locks deadlock

I am developing a system using Django + Postgresql. It's my first time with postgresql but I chose it because I needed the transactions and foreign key features.
In a certain view I have to lock my tables with AccessExclusiveLock, to prevent any read or write during this view. That's because I do some checks on the whole data before I save/update my entities.
I noticed an intermittent error that happens from time to time. It's because of a select statement that runs directly after the lock statement and demands an AccessShareLock. I read on the PostgreSQL website that AccessShareLock conflicts with AccessExclusiveLock.
What I can't understand is why this happens in the first place. Why would PostgreSQL ask for an implicit lock if it already holds an explicit lock that covers the implicit one? The second thing I can't understand is why this view is running on 2 different PostgreSQL processes? Aren't they supposed to be collected in a single transaction?
Thanks in advance.
In PostgreSQL, instead of acquiring exclusive access locks, I would recommend setting the appropriate transaction isolation level on your session. So, before running your "update", send the following commands to your database:
begin;
set transaction isolation level repeatable read;
-- your SQL commands here
commit;
According to your description, you need the repeatable read isolation level.
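Since the question is a Django/psycopg2 setup, here is a hedged sketch of the same thing from Python (the connection string and table are placeholders); recent Django versions can also set this per database via the 'isolation_level' entry in DATABASES OPTIONS:

import psycopg2
import psycopg2.extensions

# Equivalent of the BEGIN / SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
# block above, configured once for the whole session.
conn = psycopg2.connect("dbname=app user=app")   # placeholder DSN
conn.set_session(
    isolation_level=psycopg2.extensions.ISOLATION_LEVEL_REPEATABLE_READ)

with conn:                              # one transaction: commit on success
    with conn.cursor() as cur:
        # Do the consistency checks and the update inside this transaction.
        cur.execute("SELECT count(*) FROM mytable WHERE state = %s", ("open",))
        (n_open,) = cur.fetchone()
        if n_open == 0:
            cur.execute("UPDATE mytable SET state = %s WHERE id = %s",
                        ("open", 1))

conn.close()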