The use case is to keep track of the exact number of items in a warehouse.
The warehouse receives incoming items from multiple customers and has to keep track of the item count per customer, so that the warehouse owner always knows the accurate count of items for each customer.
So, if we were to use QLDB to increment item_count per customer_id as items enter the warehouse, would QLDB be able to handle multi-item transactions?
If there were a read/write conflict, would the write to QLDB fail? We want the writes to be consistent, but we are okay with reading T1's data even if the current data is at T2.
Short answer: yes.
QLDB supports transactions under optimistic concurrency control (OCC). Each transaction can contain multiple statements. These statements can query the current state of the ledger to determine whether the transaction can proceed; if it can, keep issuing statements until you are ready to commit. Your commit will be rejected if any other transaction interfered with it (transactions must be serializable).
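As a rough illustration, a multi-statement transaction with the Python QLDB driver (pyqldb) could look like the sketch below; the ledger, table, and column names are made up for this example, and execute_lambda automatically retries the function when a commit is rejected because of an OCC conflict.

from pyqldb.driver.qldb_driver import QldbDriver

driver = QldbDriver(ledger_name="warehouse-ledger")  # assumed ledger name

def increment_item_count(txn, customer_id, delta):
    # Statement 1: read the current count for this customer inside the transaction.
    cursor = txn.execute_statement(
        "SELECT item_count FROM ItemCounts WHERE customer_id = ?", customer_id)
    row = next(iter(cursor), None)
    if row is None:
        # Statement 2a: first delivery for this customer.
        txn.execute_statement(
            "INSERT INTO ItemCounts ?",
            {"customer_id": customer_id, "item_count": delta})
    else:
        # Statement 2b: bump the existing count; both statements commit (or fail) together.
        txn.execute_statement(
            "UPDATE ItemCounts SET item_count = ? WHERE customer_id = ?",
            row["item_count"] + delta, customer_id)

# The driver re-runs the whole function if another transaction interferes with the commit.
driver.execute_lambda(lambda txn: increment_item_count(txn, "customer-123", 1))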
I have a single-table design for my app. However, I have certain rows in the table that have important information that I plan to use to query different kinds of data. Let me explain. My app handles alarms triggered by users. When an alarm gets triggered I record a lot of info about that alert. My goal is to create GSIs so I can retrieve and sort all the info about the alarm that was triggered. Let me give you an example of a row in my table.
PK: ShipmentReceived
SK: AL#TR#2020-08-19T23:37:41.513Z
GSI1PK: AL#TR
GSI1SK: 2020-08-19T23:37:41.513Z
GSI2PK: AL#TR#LO
GSI2SK: Building1#WingA#Floor1#OfficeB#2020-08-19T23:37:41.513Z
GSI3PK: user#example.com
GSI3SK: 2020-08-19T23:37:41.513Z
GSI4PK: 1234567
GSI4SK: 2020-08-19T23:37:41.513Z
GSI5PK: AL#TR#HOW
GSI5SK: PC#KS
OtherProperties: Other values go in other columns
NOTE: AL#TR means: "Alarm Triggered" and AL#TR#LO means "Alarm triggered from location". AL#TR#HOW indicates how the alarm was triggered. 1234567 is a "device ID" used to trigger the alarm.
This kind of structure allows me to query for all sorts of interesting data. For example:
All of the ShipmentReceived alarms sorted by date
GSI1: I can get all of the alarms triggered at the company and sort them by date (That includes ShipmentReceived, PackageSent, etc)
GSI2: I can get all of the alarms triggered at a certain location and I can sort them by date.
GSI3: I can get all of the alarms triggered by a specific user and I can sort by date.
GSI4: I can get all of the alarms triggered by a specific device and I can sort them by date.
GSI5: Allows me to sort the alarms by method used to trigger them.
I am reading the DynamoDB documentation and I see that it says it is not recommended to use indexes on items that are not queried often. A lot of these GSIs will not be queried often at all, just very sporadically.
My question is: am I doing this wrong by creating 5 different GSIs in this case? Is there a better way to model this data? I thought about inserting multiple rows with related information instead of having everything in one row, but I do not know whether that is a better approach. Any other ideas?
I'm on the DynamoDB team in Seattle, and this response is from one of my colleagues:
"Anytime you need to group or sort the same entities differently, you need to make a new GSI for that access pattern. When you have multiple entity types stored in the same table you can reuse the GSI (aka GSI overloading) for those access patterns on different entities. But in your case, all of the access patterns are about grouping and sorting alarm entities so each would need a different GSI.
"However, GSIs exist to speed up or make cheaper read requests with the trade-off being a higher write expense (to keep the GSIs updated). This makes sense in access patterns that have a high read:write ratio and where the response must come back quickly. But for read access patterns that are done infrequently and for which there isn't a low-latency requirement, it might be cheaper to simply do a Scan operation compared to the cost of having a GSI. For example, for a batch job that runs once a day or once a week it might be cheaper to scan the table once a day or once a week."
The DynamoDB documentation on transactional writes states:
Multiple transactions updating the same items simultaneously can cause conflicts that cancel the transactions. We recommend following DynamoDB best practices for data modeling to minimize such conflicts.
If there are multiple simultaneous TransactWriteItems requests on the same item, will all of the transactional write requests fail with TransactionCanceledException, or will at least one request succeed?
This depends entirely on how the conflicts play out. If all the writes in a transaction succeed, the transaction succeeds. If some other write or transaction modifies one of the items while the transaction is in flight, the transaction fails.
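As a rough boto3 sketch (table, key, and attribute names are invented), a caller can detect the cancellation and decide whether to retry; the CancellationReasons list in the error response shows, per item, whether the cause was a transaction conflict, a failed condition check, throttling, and so on.

import boto3

client = boto3.client("dynamodb")

def confirm_order(order_id, item_id):
    # Two writes that must succeed or fail together (names are illustrative).
    try:
        client.transact_write_items(
            TransactItems=[
                {"Update": {
                    "TableName": "Orders",
                    "Key": {"order_id": {"S": order_id}},
                    "UpdateExpression": "SET order_status = :s",
                    "ExpressionAttributeValues": {":s": {"S": "CONFIRMED"}},
                }},
                {"Update": {
                    "TableName": "Inventory",
                    "Key": {"item_id": {"S": item_id}},
                    "UpdateExpression": "SET stock = stock - :one",
                    "ConditionExpression": "stock > :zero",
                    "ExpressionAttributeValues": {":one": {"N": "1"}, ":zero": {"N": "0"}},
                }},
            ]
        )
    except client.exceptions.TransactionCanceledException as e:
        # One reason entry per item; e.g. "TransactionConflict" means another
        # transaction touched that item at the same time. Retry with backoff.
        print(e.response["CancellationReasons"])
        raise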
I'm not sure how to achieve consistent read across multiple SELECT queries.
I need to run several SELECT queries and make sure that between them no UPDATE, DELETE or CREATE has altered the overall consistency. The best case for me would be something non-blocking, of course.
I'm using MySQL 5.6 with InnoDB and default REPEATABLE READ isolation level.
The problem is that when I use the RDS DataService beginTransaction with several executeStatement calls (passing the provided transactionId), I'm NOT getting the full result at the end when calling commitTransaction.
The commitTransaction call only gives me { transactionStatus: 'Transaction Committed' }.
I don't understand; isn't the commit transaction function supposed to give me the whole dataset (from my many SELECTs)?
Instead, even with a transactionId, each executeStatement returns its own individual result... This behaviour is obviously NOT consistent.
With SELECTs in one transaction under REPEATABLE READ you should see the same data and not see any changes made by other transactions. Yes, data can be modified by other transactions, but while inside the transaction you operate on a read view and can't see those changes. So it is consistent.
To make sure that no data is actually changed between the SELECTs, the only way is to lock the tables/rows, e.g. with SELECT ... FOR UPDATE - but that should not be necessary here.
Transactions should be short and fast; locking tables and preventing updates while some long-running chain of SELECTs runs is obviously not an option.
Queries issued against the database run at the time they are issued. Their effects stay uncommitted until you commit. A query may be blocked if it targets a resource another transaction has acquired a lock on, and it may fail if another transaction modified that resource, resulting in a conflict.
Transaction isolation determines how the effects of this transaction and of other transactions running at the same moment are handled. See Wikipedia.
With isolation level REPEATABLE READ (which, by the way, Aurora Replicas for Aurora MySQL always use for operations on InnoDB tables) you operate on a read view of the database and see only data committed before the transaction began.
This means that the SELECTs in one transaction will see the same data, even if changes were made by other transactions in the meantime.
By comparison, with isolation level READ COMMITTED, subsequent SELECTs in one transaction may see different data - data that was committed in between them by other transactions.
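To tie this back to the RDS Data API behaviour described in the question: each executeStatement returns its own result set, which you collect as you go, and commitTransaction only ever returns a status. Under REPEATABLE READ, every SELECT issued with the same transactionId reads from the same snapshot, so the collected results are mutually consistent. A rough boto3 (rds-data) sketch, with placeholder ARNs, database, and SQL:

import boto3

rds = boto3.client("rds-data")

CLUSTER_ARN = "arn:aws:rds:..."            # placeholder
SECRET_ARN = "arn:aws:secretsmanager:..."  # placeholder
DB = "mydb"                                # placeholder

tx = rds.begin_transaction(resourceArn=CLUSTER_ARN, secretArn=SECRET_ARN, database=DB)
tx_id = tx["transactionId"]

results = []
for sql in ("SELECT COUNT(*) FROM orders", "SELECT COUNT(*) FROM order_lines"):
    # Each call returns its own result set; collect them here.
    resp = rds.execute_statement(
        resourceArn=CLUSTER_ARN, secretArn=SECRET_ARN, database=DB,
        sql=sql, transactionId=tx_id)
    results.append(resp["records"])

# commit_transaction only returns a status string such as 'Transaction Committed';
# it never returns the accumulated result sets of the statements above.
status = rds.commit_transaction(
    resourceArn=CLUSTER_ARN, secretArn=SECRET_ARN, transactionId=tx_id)
print(status["transactionStatus"], results)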
I have an order table in the OLTP system.
Each order record has an OrderStatus field.
When an end user creates an order, the OrderStatus field is set to "Open".
When somebody cancels the order, the OrderStatus field is set to "Canceled".
When the order process is finished (the order is transformed into an invoice), the OrderStatus field is set to "Close".
There are more than one hundred million records in the table in the OLTP system.
I want to design and populate a data warehouse and data marts on the HDFS layer.
In order to build the data marts, I need to import the whole order table into HDFS and then reflect changes to the table continuously.
First, I can import the whole table into HDFS in the initial load process by using Sqoop. It may take a long time, but I will only do this once.
When an order record is updated or a new order record is entered, I need to reflect those changes in HDFS. How can I achieve this for such a big transaction table?
Thanks
One of the easier ways is to work with database triggers in your OLTP source DB: every time an insert or update happens, use that trigger to push an update event to your Hadoop environment.
On the other hand (this depends on the requirements for your data users) it might be enough to reload the whole data dump every night.
Also, if there is some kind of last-changed timestamp, it might be possible to load only the newest data and do some kind of delta check.
This all depends on your data structure, your requirements and the resources at hand.
There are several other ways to do this, but usually those involve messaging, development and new servers, and I suppose in your case this infrastructure or those resources are not available.
EDIT
Since you have a last-changed date, you might be able to pull the data with a statement like
SELECT columns FROM table WHERE lastchangedate >= (now - 24 hours)
or whatever your interval for loading might be.
Then process the data with Sqoop, ETL tools, or the like. If a record is already available in your Hadoop environment, you want to UPDATE it; if it is not, INSERT it with your appropriate mechanism. This is sometimes called UPSERTING.
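Purely as an illustration of the delta-pull-and-upsert idea (not a production pipeline), here is a small Python sketch against a generic DB-API connection; the table, columns, and the in-memory target dictionary are stand-ins for the real HDFS-side copy. At scale, Sqoop's incremental import or an ETL tool plays this role.

from datetime import datetime, timedelta

def pull_delta(conn, since):
    # Fetch only the rows changed since the last load (column names are assumed).
    cur = conn.cursor()
    cur.execute(
        "SELECT order_id, order_status, last_changed "
        "FROM orders WHERE last_changed >= ?", (since,))
    return cur.fetchall()

def upsert(target, rows):
    # target: dict keyed by order_id, standing in for the HDFS-side copy.
    # Existing keys are updated, new keys are inserted - the UPSERT described above.
    for order_id, status, changed in rows:
        target[order_id] = {"order_status": status, "last_changed": changed}

# Example: run once a day, looking back 24 hours.
# rows = pull_delta(oltp_conn, datetime.utcnow() - timedelta(hours=24))
# upsert(target_copy, rows)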
I have a DB2 table containing a large number of records to be sent out to an external system via MQ. There is a column in the table containing the record status (sent or pending to be sent).
I wrote a scheduler program to continually check whether there are records in the table that are "pending to be sent". If there are, the program sends the pending records out and updates their status accordingly.
That scheduler will be started in multiple transactions, so I expect multiple instances of the same program to be running concurrently.
My question is: how do I prevent the same records from being picked up and sent by multiple schedulers at the same time?
I was told to use a cursor with row-level locks, but I am not sure how this works.
Remarks: I am working with CICS COBOL in a z/OS environment.
I think you have a design problem. We accomplish something similar to what you are trying to do by having a trigger on the DB2 table which sends an MQ message to a queue that is defined to trigger a CICS transaction.
In your case, you can probably dispense with CICS altogether and just do as #BillWoodger suggests and send the message when you set the pending flag.
One way to do this is as follows:
1) Determine the clustering index for the large DB2 table.
2) Then have different instances of the program look only at different portions of this clustering index. E.g. if the clustering index is on a unique numeric ID field like Account ID, and the ID size is Integer 9, then have instance one look at account IDs from 0 to 099999999, instance two look at account IDs from 100000000 to 199999999, and so on (a small sketch of how those ranges could be derived follows below).
This way you can declare your cursor WITH HOLD and perform updates and commits as needed.
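For illustration only, a small Python sketch (with made-up numbers) of how each program instance could derive its slice of the key range:

def id_range(instance_no, total_instances, max_id=999_999_999):
    # Split the clustering-key space evenly; each program instance only
    # processes rows whose account ID falls inside its own slice.
    width = (max_id + 1) // total_instances
    low = instance_no * width
    high = max_id if instance_no == total_instances - 1 else low + width - 1
    return low, high

# With 10 instances: instance 0 handles 0-99999999, instance 1 handles 100000000-199999999, ...
print(id_range(0, 10), id_range(1, 10))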
CICS will coordinate SQL transactions with DB2 for you. Each of the CICS transactions you run will be able to select rows and lock them for update, and DB2 can coordinate between all of them and prevent the same records from being selected more than once, if you do two things.
When you read the rows that qualify, use a SELECT FOR UPDATE type operation, this will lock every row you retrieve and prevent other concurrent transactions from accessing the same one (also requires you BIND with row level locks unless you want full pages locked, see your DBA about the options based on row size).
Before you release the records or end the CICS transaction, you must do something to flag those records as "sent" so that other, waiting, concurrent transactions do not grab them and send them again. This could be as simple as adding a sent Y/N column to the table and adding "AND sent <> 'Y'" to your SELECT's WHERE clause. After you have sent the records, do an UPDATE on those records and set sent = 'Y'. Depending on your row data, you could perhaps use something else, like the time sent; it just needs to be something that excludes the row from reselection.
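To illustrate the flow (not the CICS COBOL syntax), here is a rough Python sketch against a generic DB-API connection; the table and column names are invented. In the actual COBOL program this would be a DECLARE CURSOR ... FOR UPDATE OF with embedded SQL, plus the BIND option for row-level locks mentioned above.

def send_pending(conn, mq_send):
    read_cur = conn.cursor()
    # FOR UPDATE locks each row fetched, so a concurrent instance cannot pick it up too.
    read_cur.execute(
        "SELECT record_id, payload FROM outbound_queue "
        "WHERE sent <> 'Y' FOR UPDATE OF sent")
    update_cur = conn.cursor()
    for record_id, payload in read_cur.fetchall():
        mq_send(payload)  # hand the record to MQ
        # Flag the row so waiting transactions exclude it via "AND sent <> 'Y'".
        update_cur.execute(
            "UPDATE outbound_queue SET sent = 'Y' WHERE record_id = ?", (record_id,))
    conn.commit()  # commit releases the row locks; the rows are now flagged as sent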