z/OS CICS DB2 COBOL program to process database entries concurrently

I have a DB2 table containing a large number of records to be sent to an external system via MQ. A column in the table records whether each record has been sent or is pending.
I wrote a scheduler program that continually checks whether there are records in the table that are pending. If there are, the program sends the pending records and updates their status accordingly.
The scheduler will be started in multiple transactions, so I expect multiple instances of the same program to run concurrently.
My question is how to prevent the same records from being picked up and sent by multiple schedulers at the same time.
I was told to use a cursor with row-level locks, but I am not sure how this works.
Remarks: I am working in CICS COBOL in a z/OS environment.

I think you have a design problem. We accomplish something similar to what you are trying to do with a trigger on the DB2 table that sends an MQ message to a queue defined to trigger a CICS transaction.
In your case, you can probably dispense with CICS altogether and, as @BillWoodger suggests, send the message when you set the pending flag.

One way to do this is as follows:
1) Determine the clustering index for the large DB2 table.
2) Have each instance of the program look only at its own portion of this clustering index. For example, if the clustering index is on a unique numeric ID field such as Account ID, and the ID is a 9-digit integer, then have instance 1 look at account IDs 0 to 99999999, instance 2 at account IDs 100000000 to 199999999, and so on.
Because the ranges do not overlap, no two instances ever read the same row, so each instance can open its cursor WITH HOLD and perform updates and commits as needed.
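A minimal sketch of the range split, written in Python rather than COBOL and assuming a 9-digit numeric clustering key. The function name partition_ranges and the ACCOUNT_ID and STATUS column names in the comment are hypothetical. It divides the key space into contiguous, non-overlapping ranges, one per scheduler instance:

```python
def partition_ranges(min_id, max_id, instances):
    """Split the inclusive key range [min_id, max_id] into `instances`
    contiguous, non-overlapping sub-ranges, one per scheduler instance."""
    if instances < 1:
        raise ValueError("need at least one instance")
    total = max_id - min_id + 1
    size, extra = divmod(total, instances)
    ranges = []
    start = min_id
    for i in range(instances):
        # The first `extra` ranges get one more key so every key is covered.
        end = start + size + (1 if i < extra else 0) - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


# Each instance would bind its own bounds into a cursor such as (hypothetical columns):
#   SELECT ... WHERE ACCOUNT_ID BETWEEN :LOW AND :HIGH AND STATUS = 'PENDING'
print(partition_ranges(0, 999_999_999, 10)[:2])
# [(0, 99999999), (100000000, 199999999)]
```

Splitting on the clustering key keeps each instance's cursor scanning a contiguous slice of the table. The trade-off is that a slice with few pending rows leaves its instance idle while another instance is busy.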

CICS will coordinate SQL transactions with DB2 for you. Each CICS transaction you run can select and lock rows for update, and DB2 coordinates between all of them to prevent the same record from being selected by more than one transaction, provided you do two things.
First, when you read the qualifying rows, use a SELECT ... FOR UPDATE cursor. This locks each row as you fetch it and prevents other concurrent transactions from taking the same row for update. (It also requires row-level locking on the table space, LOCKSIZE ROW, unless you are happy to have whole pages locked. Check the options with your DBA, because the right choice depends on row size.)
Second, before you release the records or end the CICS transaction, you must flag them as "sent" so that other waiting concurrent transactions do not grab them and send them again. This could be as simple as adding a SENT Y/N column to the table and adding AND SENT <> 'Y' to your WHERE clause. After you have sent the records, UPDATE them to set SENT = 'Y'. Depending on your row data, you might use something else, such as the time sent. It just needs to be something that excludes the row from reselection.

Related

Can QLDB be used to atomically increment an attribute?

The use case is to keep track of the exact number of items in a warehouse.
The warehouse receives incoming items from multiple customers and has to keep track of the item count per customer, so that the warehouse owner knows the accurate count of items per customer.
So, if we used QLDB to increment item_count per customer_id as items enter the warehouse, would QLDB be able to handle multi-item transactions?
If there were a read/write inconsistency, would the write to QLDB fail? We want the writes to be consistent, but we are okay with reading T1's data if the current data is at T2.
Short answer: yes.
QLDB supports transactions under optimistic concurrency control (OCC). Each transaction can contain multiple statements. These statements can query the current state of the ledger to determine whether the transaction can proceed. If it can, keep issuing statements until you are ready to commit. Your commit will be rejected if any other transaction interfered with it (transactions must be serializable).
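To illustrate the commit-time check, here is a toy Python model of optimistic concurrency control with a version number per key. It is not the QLDB API: VersionedStore, ConflictError and increment are hypothetical names. The model only shows why a conflicting commit is rejected and how a client re-reads and retries:

```python
import threading


class ConflictError(Exception):
    """Raised when a commit finds the value changed since it was read."""


class VersionedStore:
    """Toy key/value store that validates versions at commit time (OCC)."""

    def __init__(self):
        self._data = {}                # key -> (version, value)
        self._lock = threading.Lock()  # makes each check-and-write atomic

    def read(self, key):
        with self._lock:
            return self._data.get(key, (0, 0))

    def commit(self, key, read_version, new_value):
        with self._lock:
            current_version, _ = self._data.get(key, (0, 0))
            if current_version != read_version:
                raise ConflictError(key)  # another transaction got there first
            self._data[key] = (current_version + 1, new_value)


def increment(store, customer_id, amount, max_retries=10):
    """Read-modify-write with retry, like re-running a rejected transaction."""
    for _ in range(max_retries):
        version, count = store.read(customer_id)
        try:
            store.commit(customer_id, version, count + amount)
            return
        except ConflictError:
            continue  # re-read the latest committed state and try again
    raise ConflictError(customer_id)
```

No update is lost: a stale writer is rejected and must retry against the newer count, which matches the "writes must be consistent" requirement in the question.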

How to achieve consistent read across multiple SELECT using AWS RDS DataService (Aurora Serverless)

I'm not sure how to achieve consistent read across multiple SELECT queries.
I need to run several SELECT queries and make sure that between them no UPDATE, DELETE or CREATE has altered the overall consistency. Ideally, the solution would be non-blocking.
I'm using MySQL 5.6 with InnoDB and default REPEATABLE READ isolation level.
The problem is that when I use RDS DataService beginTransaction followed by several executeStatement calls (passing the returned transactionId), I do NOT get the full result at the end when calling commitTransaction.
commitTransaction only returns { transactionStatus: 'Transaction Committed' }.
I don't understand: isn't the commitTransaction function supposed to give me the whole dataset from all of my SELECTs?
Instead, even with a transactionId, each executeStatement call returns its own individual result. This behaviour is obviously NOT consistent.
With several SELECTs in one transaction under REPEATABLE READ, you see the same data and do not see changes made by other transactions. Other transactions can still modify the data, but inside your transaction you operate on a consistent view and cannot see those changes, so the reads are consistent.
The only way to make sure that no data actually changes between the SELECTs is to lock tables or rows, e.g. with SELECT ... FOR UPDATE, but that should not be necessary here.
Transactions should be short and fast. Locking tables or blocking updates while a long-running chain of SELECTs runs is not an option.
Queries issued against the database run at the moment they are issued, and each one returns its own result then. commitTransaction does not return query results. The effects of writes stay uncommitted until commit. A query may block if it targets a resource on which another transaction holds a lock, and it may fail if another transaction modified that resource in a conflicting way.
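A minimal sketch of collecting the results, assuming boto3's rds-data client (for example boto3.client("rds-data")). The ARNs, the database name and the function name run_selects_in_transaction are placeholders. Each execute_statement call returns its own records, so the caller gathers them, and commit_transaction only reports the status:

```python
def run_selects_in_transaction(client, resource_arn, secret_arn, database, statements):
    """Run several SELECTs in one Data API transaction and collect each
    statement's records; commit_transaction itself returns only a status."""
    tx = client.begin_transaction(
        resourceArn=resource_arn, secretArn=secret_arn, database=database)
    tx_id = tx["transactionId"]
    results = []
    try:
        for sql in statements:
            resp = client.execute_statement(
                resourceArn=resource_arn, secretArn=secret_arn,
                database=database, sql=sql, transactionId=tx_id)
            # Each call returns its own result set, read from the same snapshot.
            results.append(resp.get("records", []))
    except Exception:
        client.rollback_transaction(
            resourceArn=resource_arn, secretArn=secret_arn, transactionId=tx_id)
        raise
    status = client.commit_transaction(
        resourceArn=resource_arn, secretArn=secret_arn, transactionId=tx_id)
    return results, status["transactionStatus"]
```

The combined dataset is simply the list of per-statement results. Because all statements share one transaction, under REPEATABLE READ they read from the same snapshot.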
Transaction isolation determines how the effects of this transaction and of other transactions running at the same time are handled (see Wikipedia).
With the REPEATABLE READ isolation level (which, incidentally, Aurora Replicas for Aurora MySQL always use for operations on InnoDB tables), you operate on a read view of the database and see only data committed before that view was created. In InnoDB the view is created at the first consistent read in the transaction (or at START TRANSACTION WITH CONSISTENT SNAPSHOT), not at BEGIN itself.
This means that the SELECTs in one transaction see the same data, even if other transactions commit changes in the meantime.
By contrast, under READ COMMITTED, subsequent SELECTs in one transaction may see different data, namely whatever other transactions committed between them.
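SQLite in WAL mode behaves in a comparable way and makes the snapshot easy to observe locally. This is an analogy, not MySQL. The sketch opens a reader and a writer on a temporary database file, and the reader's second SELECT still sees the snapshot taken at its first read. The table name orders is hypothetical:

```python
import os
import sqlite3
import tempfile


def snapshot_demo():
    """Return the reader's row counts: first read, second read, after commit."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        setup = sqlite3.connect(path)
        setup.execute("PRAGMA journal_mode=WAL")  # readers keep a snapshot
        setup.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
        setup.execute("INSERT INTO orders (id) VALUES (1)")
        setup.commit()
        setup.close()

        # isolation_level=None: BEGIN/COMMIT are issued explicitly below.
        reader = sqlite3.connect(path, isolation_level=None)
        writer = sqlite3.connect(path, isolation_level=None)
        try:
            reader.execute("BEGIN")
            first = reader.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
            writer.execute("INSERT INTO orders (id) VALUES (2)")  # autocommits
            second = reader.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
            reader.execute("COMMIT")
            after = reader.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
        finally:
            reader.close()
            writer.close()
    return first, second, after


print(snapshot_demo())  # (1, 1, 2)
```

As with InnoDB's default, the snapshot starts at the reader's first SELECT rather than at BEGIN, and the writer is never blocked.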

Reflecting changes on big tables in HDFS

I have an order table in the OLTP system.
Each order record has an OrderStatus field.
When an end user creates an order, OrderStatus is set to "Open".
When somebody cancels the order, OrderStatus is set to "Canceled".
When order processing finishes (the order is turned into an invoice), OrderStatus is set to "Close".
There are more than one hundred million records in the table in the OLTP system.
I want to design and populate a data warehouse and data marts on the HDFS layer.
To design the data marts, I need to import the whole order table into HDFS and then continuously reflect changes to the table.
First, I can import the whole table into HDFS in an initial load using Sqoop. It may take a long time, but I only need to do it once.
When an order record is updated or a new order record is entered, I need to reflect the change in HDFS. How can I achieve this in HDFS for such a big transaction table?
Thanks
One of the easier ways is to use database triggers in your OLTP source database: every time a change happens, the trigger pushes an update event to your Hadoop environment.
On the other hand (depending on your data users' requirements), it might be enough to reload the whole data dump every night.
Also, if there is some kind of last-changed timestamp, you could load only the newest data and do some kind of delta check.
It all depends on your data structure, your requirements and the resources you have at hand.
There are several other ways to do this, but they usually involve messaging, development and new servers, and I suppose that in your case this infrastructure or those resources are not available.
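The trigger approach can be sketched with SQLite standing in for the OLTP database. Here the triggers write each insert or status change into a hypothetical order_changes table. drain_changes returns the events after a given change ID, which a separate job (not shown) would push to Hadoop:

```python
import sqlite3

SCHEMA = """
CREATE TABLE orders (
    order_id     INTEGER PRIMARY KEY,
    order_status TEXT NOT NULL
);
CREATE TABLE order_changes (
    change_id  INTEGER PRIMARY KEY AUTOINCREMENT,
    order_id   INTEGER NOT NULL,
    new_status TEXT NOT NULL
);
-- Record every new order ...
CREATE TRIGGER orders_ai AFTER INSERT ON orders
BEGIN
    INSERT INTO order_changes (order_id, new_status)
    VALUES (NEW.order_id, NEW.order_status);
END;
-- ... and every status change (Open -> Canceled or Close).
CREATE TRIGGER orders_au AFTER UPDATE OF order_status ON orders
BEGIN
    INSERT INTO order_changes (order_id, new_status)
    VALUES (NEW.order_id, NEW.order_status);
END;
"""


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def drain_changes(conn, after_change_id):
    """Return (events, last_change_id) for changes after after_change_id."""
    rows = conn.execute(
        "SELECT change_id, order_id, new_status FROM order_changes "
        "WHERE change_id > ? ORDER BY change_id",
        (after_change_id,),
    ).fetchall()
    events = [(order_id, status) for _, order_id, status in rows]
    last = rows[-1][0] if rows else after_change_id
    return events, last
```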
EDIT
Since you have a last changed date, you might be able to pull the data with a statement like
SELECT columns FROM table WHERE lastchangedate > (now - 24 hours)
or whatever your interval for loading might be.
Then process the data with Sqoop, an ETL tool or similar. If a record already exists in your Hadoop environment, UPDATE it; if it does not, INSERT it with the appropriate mechanism. This is sometimes called UPSERTING.
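A sketch of the delta load and upsert in Python, assuming a hypothetical orders table with order_id, order_status and last_changed columns, and a dict standing in for the copy held in Hadoop. Instead of now - 24 hours, it filters on the highest last_changed value already loaded, so a skipped run does not leave a gap:

```python
import sqlite3


def pull_changed(conn, since):
    """Rows changed after the given watermark (ISO-8601 strings sort correctly)."""
    return conn.execute(
        "SELECT order_id, order_status, last_changed FROM orders "
        "WHERE last_changed > ? ORDER BY last_changed",
        (since,),
    ).fetchall()


def upsert(target, rows):
    """Insert new orders, overwrite existing ones; return (inserted, updated)."""
    inserted = updated = 0
    for order_id, status, changed in rows:
        if order_id in target:
            updated += 1
        else:
            inserted += 1
        target[order_id] = {"status": status, "last_changed": changed}
    return inserted, updated


def new_watermark(target, previous):
    """Highest last_changed loaded so far; the next run filters on this."""
    return max((row["last_changed"] for row in target.values()), default=previous)
```

This still assumes that last_changed is set on every change and never moves backwards.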

Tracking changes in Django with PostgreSQL

I have a Django project with PostgreSQL as database.
There are a few tables that describe state (let's call them "state tables").
Several servers can modify state (each one modifies its own table).
A few servers read the state (let's call them "readers") and modify internal data based on the current state of the tables.
I'd like to give the readers a way to know which rows in the state tables changed, so that they don't have to scan all the tables all the time.
Currently I have a special tracking table and a post_save() trigger on all state tables. The post_save trigger saves the table name and the row ID.
Initially the plan was to define a sequence ID on the tracking table and check whether the "last known tracking ID" is the largest. If it is not, I would scan all the newer tracked entries and know which states had changed.
However, it seems that PostgreSQL sequence IDs are not guaranteed to be sequential. I don't mind gaps between them, but I do rely on tracking record N+1 having a bigger ID than record N.
Any advice?
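The concern about ordering can be shown with a small pure-Python model (not PostgreSQL or Django code). A sequence hands out IDs when the insert runs, not when the transaction commits. A transaction holding ID 1 can therefore become visible after one holding ID 2, and a reader that has already advanced its watermark to 2 never sees ID 1:

```python
import itertools


def simulate_out_of_order_commit():
    """Model two writers that take IDs from a shared sequence but commit in
    the opposite order, and a reader that keeps a 'last known ID' watermark."""
    sequence = itertools.count(1)  # like nextval(): IDs handed out immediately
    visible = []                   # tracking rows whose transaction committed

    t1_id = next(sequence)         # T1 inserts first and gets ID 1 ...
    t2_id = next(sequence)         # ... T2 inserts second and gets ID 2
    visible.append(t2_id)          # but T2 commits first

    last_known = 0
    seen = [i for i in visible if i > last_known]
    last_known = max(seen)         # reader advances its watermark to 2

    visible.append(t1_id)          # T1 finally commits ID 1
    seen_later = [i for i in visible if i > last_known]
    missed = sorted(set(visible) - set(seen) - set(seen_later))
    return seen, seen_later, missed


print(simulate_out_of_order_commit())  # ([2], [], [1])
```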

API Gateway generating 11 SQL queries per second on REG_LOG

We have sysdig running on our WSO2 API gateway machine, and we notice that the gateway fires a large number of SQL queries at the database for a minute, then waits a minute and repeats.
The queries have the following format:
SELECT REG_PATH, REG_USER_ID, REG_LOGGED_TIME, REG_ACTION, REG_ACTION_DATA
FROM REG_LOG
WHERE REG_LOGGED_TIME>'2016-02-29 09:57:54'
AND REG_LOGGED_TIME<'2016-03-02 11:43:59.959' AND REG_TENANT_ID=-1234
There is no load on the server. What is causing this? What can we do to avoid this?
[Screenshot: sysdig output for the API gateway process]
This particular query comes from the registry indexing task that runs in the background. The REG_LOG table is queried periodically to retrieve the latest registry actions. The indexing task cannot be stopped, but you can configure how often it runs with the following parameter in registry.xml. See [1] for more information.
indexingFrequencyInSeconds
If this table fills up, you can clean out the data with a simple SQL query. However, when deleting records, be careful not to delete all the data: the latest record for each resource path must stay in the REG_LOG table, since reindexing requires at least one reference to each resource path.
Also, if required, you can take a dump of the REG_LOG data before clearing it, in case you do not want to lose old records. Hope this answer provides the information you need.
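A cleanup sketch along those lines, run here against SQLite with a cut-down REG_LOG table. It assumes a REG_LOG_ID column that increases over time, so that the highest ID per path and tenant is the latest record. Verify this against your schema, and take the dump first. MySQL rejects a DELETE whose subquery reads the same table (error 1093), so on MySQL the subquery would need to be wrapped in a derived table:

```python
import sqlite3

# Keep only the newest row per (REG_PATH, REG_TENANT_ID); REG_LOG_ID is assumed
# to grow over time so that MAX(REG_LOG_ID) identifies the latest record.
CLEANUP_SQL = """
DELETE FROM REG_LOG
WHERE REG_LOG_ID NOT IN (
    SELECT MAX(REG_LOG_ID)
    FROM REG_LOG
    GROUP BY REG_PATH, REG_TENANT_ID
)
"""


def cleanup_reg_log(conn):
    """Delete older REG_LOG rows and return how many were removed."""
    cur = conn.execute(CLEANUP_SQL)
    conn.commit()
    return cur.rowcount
```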
[1] - https://docs.wso2.com/display/Governance510/Configuration+for+Indexing