Oracle Materialized View Refresh fails with ORA-01555

I have a materialized view set to refresh on demand:
CREATE MATERIALIZED VIEW XYZ
REFRESH COMPLETE ON DEMAND
AS
SELECT * FROM ABC WHERE LAST_UPD > SYSDATE-30;
When I run a procedure to refresh it, the refresh fails every couple of days.
Refresh command:
dbms_mview.refresh(list => 'XYZ',
method => 'C',
parallelism => 0,
atomic_refresh => false);
Error:
1 - ERROR IN MERGE : ORA-12008: error in materialized view refresh path
ORA-01555: snapshot too old: rollback segment number 406 with name "_SYSSMU406_3487494604$" too small
ORA-02063: preceding line from IJSFASIEBEL
I've read that using SELECT * to create the materialized view can cause this error,
but I've dropped and recreated the view many times; the refresh runs fine one day and errors out the next.
No changes were made to the base table.
Can anyone tell me what the error message means or what might be causing the issue?

The problem is that your rollback (undo) segments are not large enough to preserve the read-consistent data the refresh query needs, given the other updates happening on the database at the same time.
There is a full discussion of what this means here:
https://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:275215756923
Possible solutions:
Create larger rollback segments to allow more changes to occur during the refresh without running out of rollback space
Create an index on LAST_UPD to speed up the query, if it actually helps (see the sketch below)
Run the refresh at a quieter time of day
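A minimal SQL sketch of the first two suggestions, using the table and column from the question; the index name and the retention value are illustrative assumptions, and since ORA-02063 indicates the query runs over the IJSFASIEBEL database link, any undo change belongs on that remote database:
-- Index the filter column used by the materialized view query (index name is an assumption)
CREATE INDEX abc_last_upd_idx ON abc (last_upd);
-- Give long-running queries a better chance of keeping their read-consistent view;
-- 3600 seconds is only an example value, to be set on the source database
ALTER SYSTEM SET UNDO_RETENTION = 3600;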

Pratheek Ponnuru,
Please check whether the table has any LOB columns, then check those LOBs for corruption.
If a LOB is corrupted, this error can appear.
I faced the same issue recently: I checked all the LOBs in the table for corruption and,
on further investigation, found some corrupted LOB segments, which I later set to blob_null().
-- Milind Kale
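A minimal PL/SQL sketch of such a corruption scan, assuming a hypothetical BLOB column LOB_COL on the base table ABC; reading each LOB forces Oracle to touch the LOB segment, so corrupt rows raise an error:
DECLARE
  v_chunk RAW(32767);
BEGIN
  FOR r IN (SELECT ROWID AS rid, lob_col FROM abc WHERE lob_col IS NOT NULL) LOOP
    BEGIN
      v_chunk := DBMS_LOB.SUBSTR(r.lob_col, 32767, 1);  -- read the first chunk of the LOB
    EXCEPTION
      WHEN OTHERS THEN
        DBMS_OUTPUT.PUT_LINE('Possible corrupt LOB at rowid ' || r.rid || ': ' || SQLERRM);
    END;
  END LOOP;
END;
/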

"sequence contains no matching element" on Group By operations in Power Query

Power BI newbie question here.
Whenever I add a Group By step with a Text.Combine() or a Max() aggregate, applying changes or refreshing data results in the aforementioned exception.
My datasource is a D365 Dataverse connection, and all queries run just fine until I add a step to group and aggregate. As an example, starting with a very simple query with 2 columns (demandId, kor_subcontractorbillnumber), I want to concatenate all billNumbers related to a given demandId into a CSV column:
= Table.Group(#"Table Buffer", {"demandId"}, {{"BillNumbers", each Text.Combine([kor_subcontractorbillnumber],", "), type nullable text}})
As seen in the attached screenshot, the preview on screen seems correct: the expected result is displayed in the BillNumbers column, and no error is reported in the column quality indicators. All is fine... until I click Apply, which raises the exception.
I tried to clean the columns as much as possible before grouping (removing empty values, errors, duplicates, etc.), as well as adding an extra step to store the results in a table buffer before grouping, but with no luck.
Browsing through SO, I found that similar issues could be related to:
Wrong relationship cardinalities: this does not apply here, I guess, since everything is correct in the buffer table until I group
Power BI Desktop update: some users have reported in the past that an update broke something and gave the same exception. In my case, the issue started occurring after upgrading to the July 2022 version, and unfortunately it seems I can't downgrade to a previous version. I've only been using Power BI since June and don't have enough experience to tell whether the July update actually broke something, though some reports stopped working a short time after the update.
Even stranger: if I remove the last step (Group By) and create a new query referencing this one... I can add a Group By step and apply my changes... until I refresh my report, at which point all the embedded queries fail with the same exception, even those absolutely unrelated to my changes.
Could anyone explain what I'm doing wrong, or tell me whether you have experienced the same behavior with the latest version of Power BI Desktop (2.107.841.0 64-bit)? Anything that could point me in the right direction would be welcome.
Thanks for your help!
After many tries, I eventually stumbled upon a workaround: instead of the Group By step, I clicked on the very last step of my query and selected 'Extract Previous'. This created a new query (result of all previous steps), and I was able to perform my Group By on this new query without any errors.
I have no idea how this is different from adding the Group By at the end of the first query... but the exception is gone. Kind of a code smell anyway... I'm marking my own question as answered in case it can help someone, but I'd be more than happy if someone could shed some light on the underlying reason for this issue.
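In M terms, the workaround amounts to something like the following sketch, where "Base" is a hypothetical query holding every step up to (but not including) the Group By, and the grouping expression is the same one as in the question:
// Hypothetical second query, as produced by 'Extract Previous' or by referencing the first query
let
    Source = Base,
    Grouped = Table.Group(Source, {"demandId"}, {{"BillNumbers", each Text.Combine([kor_subcontractorbillnumber], ", "), type nullable text}})
in
    Grouped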

Is it possible to run queries in parallel in Redshift?

I wanted to do an insert and an update at the same time in Redshift. To do this I insert the data into a temporary table, remove the updated entries from the original table, and then insert all the new and updated entries. Because these statements run concurrently, entries are sometimes duplicated: the delete starts before the insert has finished. With a very long sleep between operations this does not happen, but the script is very slow. Is it possible to run queries in parallel in Redshift?
Hope someone can help me, thanks in advance!
You should read up on MVCC (multi-version concurrency control) and transactions. Redshift can only run one query at a time per session, but that is not the issue. You want to COMMIT both changes at the same time (COMMIT is the action that makes changes visible to others). You do this by wrapping your SQL statements in a transaction (BEGIN ... COMMIT) executed in the same session (it is not clear whether you are using multiple sessions). All changes made within the transaction are only visible to the session making them UNTIL COMMIT, when ALL the changes made by the transaction become visible to everyone at the same moment.
A few things to watch out for: if your connection is in AUTOCOMMIT mode then you may break out of your transaction early and COMMIT partial results. Also, while you are working in a transaction your view of the source tables is unchanging (so you see consistent data during your transaction) and this information isn't allowed to change underneath you. This means that if you have multiple sessions changing table data you need to be careful about the order in which they COMMIT so that the right version of the data is presented to each session.
begin transaction;
<run your delete and insert here>
end transaction;
In this specific case do this:
-- Load the staging table outside the transaction; a temp table is only visible to this session
create temp table stage (like target);
insert into stage
select * from source
where source.filter = 'filter_expression';
-- Delete and insert inside one transaction so both changes become visible at the same COMMIT
begin transaction;
delete from target
using stage
where target.primarykey = stage.primarykey;
insert into target
select * from stage;
end transaction;
drop table stage;
See:
https://docs.aws.amazon.com/redshift/latest/dg/c_best-practices-upsert.html
https://docs.aws.amazon.com/redshift/latest/dg/t_updating-inserting-using-staging-tables-.html
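On the client side, a minimal Python/psycopg2 sketch of the same pattern, making sure autocommit is off so the delete and insert only become visible together; the connection details, table names, and key column are placeholders:
import psycopg2

# Placeholder connection details for a Redshift cluster
conn = psycopg2.connect(host="my-cluster.example.redshift.amazonaws.com",
                        port=5439, dbname="dev", user="awsuser", password="...")
conn.autocommit = False  # psycopg2's default; stated explicitly to avoid partial COMMITs

with conn:  # the connection context manager commits on success, rolls back on error
    with conn.cursor() as cur:
        cur.execute("delete from target using stage "
                    "where target.primarykey = stage.primarykey")
        cur.execute("insert into target select * from stage")
# both changes become visible to other sessions at the same moment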

Waiting for a table to be completely deleted

I have a table that has to be refreshed daily from an external source. All the recommendations I read say to delete the whole table and re-create it instead of deleting all the items.
I tried the suggested method, but the deleteTable function returns successfully even though the table is still in the "Table is being deleted" state, as seen from the DynamoDB console. Sometimes this takes more than a minute.
What is the proper way of deleting and re-creating a table? Should I just keep trying createTable until the already exists error goes away?
I am using Node.js.
(The table is a list of some 5,000+ bus stops. The source doesn't specify how often the data changes nor give any indicator that there are changes. I found a small number of changes once every few weeks.)
If you are using boto3 (Python), there is a waiter called TableNotExists:
Polls DynamoDB.Client.describe_table() every 20 seconds until a successful state is reached. An error is returned after 25 failed checks.
Or, you could just do that polling yourself.
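A minimal boto3 sketch of that wait-then-recreate flow; the table name 'BusStops' and the key schema are placeholder values, not taken from the question:
import boto3

client = boto3.client('dynamodb')

client.delete_table(TableName='BusStops')

# Block until the table has actually disappeared (20-second polls, 25 attempts by default)
client.get_waiter('table_not_exists').wait(TableName='BusStops')

# Safe to recreate now
client.create_table(
    TableName='BusStops',
    AttributeDefinitions=[{'AttributeName': 'stopId', 'AttributeType': 'S'}],
    KeySchema=[{'AttributeName': 'stopId', 'KeyType': 'HASH'}],
    BillingMode='PAY_PER_REQUEST',
)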
I would suggest changing the table name each day, using the current date as part of the table name. Then you can create the new table and start populating it without having to wait for the delete of the previous day's table to complete.
If the response from the createTable method is a Table already exists exception, the exception also contains a retryDelay property that is a number.
I can't find documentation on retryDelay, but it seems to be a time duration in seconds.
I use the Table already exists exception to detect that the table is not yet completely deleted, and, if so, back off for the period specified in the retryDelay property. After a few iterations, the table can be successfully created.
Sometimes the value in retryDelay can be more than 20.
This approach has worked without issues for me every time.

PDI - Update field value in Logging tables

I'm trying to create a transformation that can change a field value in my database (PostgreSQL is what I use).
Case:
In my PostgreSQL database I have a table called Monitoring with several fields such as id, date, starttime, endtime, duration, transformation name, status, desc. All of those values come from Transformation Logging.
So, when I run the transformation it inserts a row into the Monitoring table and sets the status field to Running. When it is done, it updates the status to Finish. What I'm trying to do is define the values in the table fields myself rather than take them from Transformation Logging, so I can customize the values the way I want.
The goal is to update the transformation status value from 'running' to 'finish/error/abort etc.' in my database using Pentaho and display that status in a web app.
I have been thinking of using the Modified Java Script step to do it, but is there maybe another, better way? (I just need opinions on this.)
Apart from my remark, did you try the Value Mapper?
Modified Java Script is not a good idea to use; ideally, it shouldn't be used due to performance issues. You can use the "add constant" step or "User defined Java Class" as an alternative.
You cannot change the values of the built-in logging tables, for the simple reason that they are reserved for PDI usage. This causes a known issue in case of hard errors: for example, the status is not set to finish when the database server crashes, or when a NullException is not caught by the PDI code.
You have some workarounds:
The simplest, the one used in the ETL-Pilot, is to test (Status = Finish OR LogDate < 15 minutes ago) in the web app.
You can update the table when the transformation is not running. For example, put an hourly (or more frequent) crontab that sets to Finish the status of any transformation whose LogDate is older than 15 minutes, as sketched after this list. This crontab may be a simple SQL script, or be included in a transformation that also checks the table sizes and/or sends an email in case of potential error.
You can copy the table (if that is a non-locking operation in your DB system), modify the Status column, and use this copy for your web app.
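A minimal SQL sketch of that crontab update, assuming the logging table is called monitoring and the column names follow the LogDate/Status references above; adjust the names to your actual logging table:
-- Run from cron: mark as finished any transformation still flagged as running
-- whose last log entry is older than 15 minutes.
UPDATE monitoring
SET    status = 'Finish'
WHERE  status = 'Running'
  AND  logdate < NOW() - INTERVAL '15 minutes';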

Can data be changed in my tables that will break batch updating while it modifies multiple rows?

I have an update query that is based on the result of a select, typically returning more than 1000 rows.
If some of these rows are updated by other queries before this update can touch them, could that cause a problem with the records? For example, could they get out of sync with the original query?
If so would it be better to select and update individual rows rather than in batch?
If it makes a difference, the query is being run on Microsoft SQL Server 2008 R2
Thanks.
No.
A table cannot be updated while something else is in the process of updating it.
Databases use concurrency control and have ACID properties to prevent exactly this type of problem.
I would recommend reading up on isolation levels. The default in SQL Server is READ COMMITTED, which means that other transactions cannot read data that has been updated but not committed by a given transaction.
This means that data returned by your select/update statement will be an accurate reflection of the database at a moment in time.
If you were to change your database to READ UNCOMMITTED then you could get into a situation where the data from your select/update is out of sync.
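For illustration, a minimal T-SQL sketch of a batch update run explicitly under the default isolation level; the table and column names (mytable, myOtherTable, value) echo the snippets further down, and the id join key is an assumption:
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;  -- SQL Server's default, stated explicitly

BEGIN TRAN;
-- Rows changed but not yet committed by other sessions are not read here
UPDATE t
SET    value = mot.value
FROM   mytable AS t
JOIN   myOtherTable AS mot ON mot.id = t.id;
COMMIT TRAN;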
If you're selecting first, then updating, you can use a transaction
BEGIN TRAN
-- your select WITHOUT LOCKING HINT
-- your update based upon select
COMMIT TRAN
However, if you're updating directly from a select, then there's no need to worry about it: a single transaction is implied.
UPDATE mytable
SET value = mot.value
FROM myOtherTable mot
WHERE mytable.id = mot.id  -- hypothetical join key
BUT... do NOT do the following, otherwise you'll run into a deadlock
UPDATE mytable
SET value = mot.value
FROM myOtherTable mot WITH (NOLOCK)
WHERE mytable.id = mot.id  -- same hypothetical join key